BUILDING INTELLIGENT
CONVERSATIONAL TUTORS
AND MENTORS FOR TEAM
COLLABORATIVE PROBLEM
SOLVING: GUIDANCE FROM
THE 2015 PROGRAM FOR
INTERNATIONAL STUDENT
ASSESSMENT
Arthur C. Graesser, Nia Dowell, Andrew J. Hampton,
Anne M. Lippert, Haiying Li and
David Williamson Shaffer
ABSTRACT
This chapter describes how conversational computer agents have been used in
collaborative problem-solving environments. These agent-based systems are
designed to (a) assess the students’ knowledge, skills, actions, and various
other psychological states on the basis of the students’ actions and the con-
versational interactions, (b) generate discourse moves that are sensitive to
the psychological states and the problem states, and (c) advance a solution
to the problem. We describe how this was accomplished in the Programme
for International Student Assessment (PISA) for Collaborative Problem
Solving (CPS) in 2015. In the PISA CPS 2015 assessment, a single human
test taker (15-year-old student) interacts with one, two, or three agents that
stage a series of assessment episodes. This chapter proposes that this PISA
framework could be extended to accommodate more open-ended natural lan-
guage interaction for those languages that have developed technologies for
automated computational linguistics and discourse. Two examples support
this suggestion, with associated relevant empirical support. First, there is
AutoTutor, an agent that collaboratively helps the student answer difficult
questions and solve problems. Second, there is CPS in the context of a multi-
party simulation called Land Science in which the system tracks progress
and knowledge states of small groups of 3-4 students. Human mentors or
computer agents prompt them to perform actions and exchange open-ended
chat in a collaborative learning and problem-solving environment.
Keywords: AutoTutor; collaboration; collaborative problem solving;
conversational agents; PISA; problem solving
This chapter describes how conversational computer agents can be used in
collaborative problem-solving environments. These agent-based systems are
designed to (1) assess the team members’ knowledge, skills, actions, and various
other psychological states on the basis of their actions and conversation,
(2) generate discourse moves that are sensitive to their psychological states and
the problem states, and (3) advance a solution to the problem. The develop-
ment of agent-based systems has traditionally focused on tutorial dialogue
between a single human and computer tutor. There are also a number of sys-
tems that have a single human interact with two or more agents to help them
learn subject matter in various domains (such as literacy, numeracy, science,
engineering, and technology). At the horizon is the use of agents to facilitate
team learning, problem solving, and work. This chapter describes how conver-
sational agents have been used in tutorial dialogue and how they are starting to
be used in team collaborative problem solving.
A broad theoretical framework is of course desired for guiding the design of
agents in collaborative problem solving. This chapter adopts the framework
that was articulated in the Programme for International Student Assessment
(PISA) for Collaborative Problem Solving (CPS) in 2015 (OECD, 2013). In the
PISA CPS 2015 assessment, a single human test taker (15-year-old student)
interacts with one, two, or three agents that stage a series of assessment
episodes. The human’s responses in these assessment episodes were used to
scale the test taker's CPS proficiency. However, the responses consisted of selecting
alternatives on action palettes or chat menus rather than natural language
because it was not feasible for the computer to interpret natural language in
several dozen languages and cultures. This chapter proposes that this PISA
framework could be extended to accommodate more open-ended natural lan-
guage interaction for those languages that have developed technologies for
automated computational linguistics and discourse. This can be accomplished
by combining the advances of automated analyses of tutorial dialogue in
natural language with the theoretical framework provided by PISA CPS 2015.
The chapter begins by justifying the need to better understand collaborative
problem solving and describing the theoretical framework of PISA CPS 2015.
We subsequently describe AutoTutor, an agent that collaboratively helps the
student answer difficult questions and solve problems by holding a conversation
in natural language. AutoTutor illustrates how computers can semantically
interpret dialogue in natural language, assess the quality of human contribu-
tions, and guide the agent in adaptively responding to help students learn. Our
contention is that these automated discourse mechanisms can be transferred to
team learning and problem solving environments. We show how this has been
attempted in the context of a multi-party simulation called Land Science, in
which the system tracks progress and knowledge states of small groups of 3-4
students in computer-mediated chat. Human mentors or computer agents
prompt them to perform actions and exchange open-ended chat in a collabora-
tive learning and problem solving environment.
This chapter has the primary lens on communication in natural language,
computer agents, and collaborative problem solving. Nevertheless, we assume
this work can generalize to broader contexts and applications. There are
channels of communication other than natural language, such as facial expres-
sions, gesture, and physical action. Investigations of computer agents can pre-
sumably be extended to communication among humans and all sorts of hybrid
human-agent combinations among team members. Collaborative problem
solving has many similarities to other team efforts, such as collaborative learn-
ing and coordinated work. Communication is essential in all of these team
efforts, so the present focus on natural language is far reaching.
WHY FOCUS ON COLLABORATIVE PROBLEM
SOLVING?
Much of the planning, problem solving, and decision making in the modern
world is performed by teams. Many problems are so complex that it takes a
group of experts with diverse perspectives and talents to collaborate in finding
optimal solutions. The success of a team can be threatened by a social loafer,
a saboteur, an uncooperative unskilled member, or a counterproductive
alliance, whereas solutions can be facilitated by a strong leader who draws out
different perspectives, helps negotiate conflicts, assigns roles, and promotes
team communication (Cesareni, Cacciamani, & Fujita, 2016; Fiore, Wiltshire,
Oglesby, O'Keefe, & Salas, 2014; Salas, Cooke, & Rosen, 2008). To understand
these dynamics, many researchers advocate national assessments and
educational curricula that designate collaborative problem
solving (CPS) as an important twenty-first-century skill (Care, Scoular, &
Griffin, 2016; Griffin & Care, 2015; Hesse, Care, Buder, Sassenberg, & Griffin,
2015; National Research Council, 2011; Von Davier, Zhu, & Kyllonen, 2017).
At the international level, CPS was selected by the Organisation for
Economic Co-operation and Development (OECD) as a new development for
the Programme for International Student Assessment (PISA) in the 2015 inter-
national survey of student skills and knowledge (Graesser, Forsyth, & Foltz,
2017; Graesser, Foltz, et al., 2017; OECD, 2013). Fifteen-year-old students
from over 50 countries completed this PISA CPS 2015 assessment in addition
to assessments of mathematics, science, literacy, and other proficiencies.
It is important to acknowledge that CPS is a category of team interaction
that is different from other categories. There normally are objective criteria on
whether the problem is solved so we can assess whether, or the extent to which,
the team solves the problem successfully. There can also be analyses on the
extent to which different team members contribute to the solution. Team mem-
bers play different roles in guiding the team (teamwork) or solving the problem
(taskwork) en route to a group solution, which can be tracked with automated
measures. This is different than collaborative learning, which involves an
assessment of whether each team member and the group as a whole have
learned a subject matter according to measured criteria (such as an achievement
test, as opposed to the quality of a solution to a problem). CPS is also different
than coordinated work, as in the case of a team that produces artifacts accord-
ing to a well-established plan. CPS is believed to be a more difficult and practi-
cally useful proficiency than collaborative learning and coordinated work in the
twenty-first century.
Conversational agents were used in the PISA CPS 2015 assessment that
followed the definition in the assessment framework:
Collaborative problem solving competency is the capacity of an individual to effectively
engage in a process whereby two or more agents attempt to solve a problem by sharing the
understanding and effort required to come to a solution and pooling their knowledge, skills
and efforts to reach that solution. (OECD, 2013; p. 6)
An agent could be either a human or a computer agent that interacts with the
student according to this definition. The final decision was to have computer
agents in the final assessment. That is, a single student interacted with one
to three computer agents in each problem rather than interacting with other
humans.
Although conversational agents have been used to assess and to facilitate
collaborative interactions (Graesser, 2016; Graesser, Forsyth, et al., 2017;
Tegos, Demetriadis, Papadopoulos, & Weinberger, 2016), the decision to have
the students interact with computer agents during the PISA CPS 2015 assess-
ment was motivated entirely by logistical and assessment constraints. PISA
required a computer-based assessment that would measure CPS skills of indi-
vidual students in a short time window (two 30-minute sessions) and 2-3
problem solving scenarios per session. It was believed that the computer agents
could systematically control the interaction and thereby provide reliable and
valid assessments within the time constraints. A student could be assessed in
multiple teams, multiple tasks, different characteristics of team members, and
multiple phases of a problem in a controlled interaction. This would be logisti-
cally impossible with humanhuman interaction. It often takes a few minutes
for a new group of humans to get acquainted in computer-mediated conversa-
tion before important problem solving processes begin. There is no guarantee
that a student would be paired with several groups of students with the ideal
combination of characteristics and assessment episodes. In contrast, assess-
ments with computer agents handle the challenges of (a) assembling groups of
humans (via computer mediated communication) in an expedient manner
within rigid time constraints, (b) the necessity of having multiple teams per stu-
dent to obtain reliable and valid assessments, and (c) measurement uncertainty
and error when a student is paired with humans who are unresponsive or
defiant. A systematic design of computer agents was ultimately created that
provided control, many activities and interactions per unit time, and multiple
groups of agents.
Another serious logistical constraint was that each assessment had to be
translated into several dozen languages and adapted to many cultures. OECD
has always had an English and a French version of each assessment scenario,
against which each home-language translation is compared. Country
representatives examine the translation and ultimately sign off that it is
adequate. OECD manages to achieve approvals from the participating countries,
but there is always the possibility that differences in language and culture
affect scores for particular items (El Masri, Baird, & Graesser, 2016). The main
repercussion of this constraint is that it is practically and financially
impossible to score open-ended responses in natural language. The expense and
time required for human experts to grade responses were prohibitive, and
automated natural language processing is sufficiently reliable for only a
handful of languages, so the different countries would not be treated equally.
Therefore, assessments could only be inferred from actions performed on a
computer interface or options selected from alternatives in a chat menu. Quite
clearly, this is very different from open-ended chat interactions, and it prompted
discussion among OECD, the participating countries, and researchers on how to evaluate
the use of computer agents in PISA CPS 2015.
The computer agents solved many of the logistical and assessment problems,
but questions emerged on this approach to assessing CPS. How similar are
these CPS environments with agents to bona fide environments among
humans? How well can these agents simulate actual humans? How similar are
the assessments between humans and agents (H-A) to the assessments among
humans (H-H)? Some researchers are in the process of answering these ques-
tions, particularly the last question (Rosen, 2014).
The traditional conception of “computer-support” in collaborative learning
consists of stationary technology, such as structured interfaces, prompts, and
assignment of students to scripted roles (Fischer, Kollar, Stegmann, & Wecker,
2013). However, more recent research efforts highlight the benefit of interactive
and context-sensitive assessment and support in group learning interactions
(Erkens, Bodemer, & Hoppe, 2016; Gilbert et al., 2017; Liu, Von Davier, Hao,
Kyllonen, & Zapata-Rivera, 2015; Rosé & Ferschke, 2016; Tegos et al., 2016).
These same trends would apply to CPS per se. The computer agents help solve
assessment concerns in PISA CPS 2015, but they can also provide adaptive and
interactive computer support technologies, particularly when coupled with
open-ended natural language processing.
Once again, this chapter shows how interactive computer supports, namely
intelligent conversational agents and open-ended natural language processing,
can enhance collaborative interactions and assessment of CPS proficiencies.
Toward this goal, we next clarify how PISA CPS 2015 is assessing CPS profi-
ciency with the agent-based approach. We later examine how the components
of the PISA CPS 2015 framework can be assessed with open-ended natural lan-
guage for languages that have sufficient advances in computational linguistics
and discourse (such as English). We also discuss the possibilities and benefits of
integrating advances within the context of AutoTutor tutorial dialogues
(Graesser, 2016) with automated analyses and facilitation of chat interactions
among groups of 3-4 students who collaboratively learn and solve problems.
This is one path on a roadmap for incorporating these technologies in team
learning and problem solving.
PISA 2015 COLLABORATIVE PROBLEM SOLVING
The problem solving dimension of CPS directly incorporated the PISA 2012
problem solving framework for individuals (Funke, 2010; Greiff et al., 2014;
OECD, 2010). This draws on influential theoretical frameworks for analyzing
CPS, such as the teamwork processing model of O’Neil, Chuang, and Baker
(2010), teamwork models of Fiore and colleagues (Fiore et al., 2010; Salas
et al., 2008), and the Assessment and Teaching of 21st Century Skills findings
(Griffin & Care, 2015; Hesse et al., 2015). All of these include both a cognitive
and a collaborative dimension, as well as a differentiation between teamwork
and taskwork. The four cognitive processes (focusing on taskwork) of the prob-
lem solving assessment in both PISA 2012 and 2015 were: (A) exploring and
understanding, (B) representing and formulating, (C) planning and executing,
and (D) monitoring and reflecting. It should be noted that the A and B
processes were difficult to differentiate in both PISA 2012 and 2015. The collab-
oration processes (focusing on teamwork) of CPS 2015 had three competencies:
(1) establishing and maintaining shared understanding, (2) taking appropriate
action, and (3) establishing and maintaining team organization. When the four
problem solving processes are crossed with the three collaboration competen-
cies, there are 12 skills in the resulting matrix representing the competencies of
CPS. The 4 × 3 matrix appears in Table 1. A satisfactory assessment of CPS
would assess the skill levels of students for each of these 12 cells. These skill
levels contributed to a student’s overall CPS proficiency score.
Problem solving scenarios needed to be carefully composed to allow scores
to be computed on each of the 12 cells. All of the problems had instructions on
what needed to be accomplished, a work area for the problem to be solved, and
a chat facility for the human to interact with the agents. Agents and humans
often differed on what information they could see or have access to (following
the hidden profile problem in collaboration), so they needed to have a
conversation to achieve common ground (Clark, 1996; Dillenbourg, 1999).
Conversations were also necessary for task assignments among team members,
task progress, and task achievements. The agents sometimes made errors, were
slackers, or disagreed, so the system measured whether the human test taker
took steps to handle these problems. These capabilities and manipulations
created a broad set of situations to enable assessment of all 12 cells in the
Table 1 matrix.

Table 1. Matrix of Collaborative Problem Solving Skills for PISA CPS 2015.

Columns: (1) Establishing and Maintaining Shared Understanding; (2) Taking
Appropriate Action to Solve the Problem; (3) Establishing and Maintaining
Team Organization.

(A) Exploring and understanding:
(A1) Discovering perspectives and abilities of team members
(A2) Discovering the type of collaborative interaction to solve the problem,
along with goals
(A3) Understanding roles to solve the problem

(B) Representing and formulating:
(B1) Building a shared representation and negotiating the meaning of the
problem (common ground)
(B2) Identifying and describing tasks to be completed
(B3) Describing roles and team organization (communication protocol/rules of
engagement)

(C) Planning and executing:
(C1) Communicating with team members about the actions to be/being performed
(C2) Enacting plans
(C3) Following rules of engagement (e.g., prompting other team members to
perform their tasks)

(D) Monitoring and reflecting:
(D1) Monitoring and repairing the shared understanding
(D2) Monitoring results of actions and evaluating success in solving the problem
(D3) Monitoring, providing feedback, and adapting the team organization and roles

One advantage of computer agent assessment is the degree of control over
the conversation. The discourse contributions of the two agents (a1, b2) and
the digital media (m) can be coordinated so that each [a1, b2, m] sequential dis-
play is functionally a single episodic unit (U) to which the human responds
through language, action, or silence in a particular human turn (HT). Thus,
there is an orchestrated finite-state transition network that alternates between
episodic units (U) and human turns (HT), which is formally isomorphic to a
dialogue. This is different from a collaboration in which many people can speak
simultaneously and overlap in time (Dascalu, Trausan-Matu, McNamara, &
Dessus, 2015). There can be conditional branching in the state transition net-
work so that the computer’s generation of U
nþ1
at turn nþ1 is contingent on
the state of the human turn HT
n
at turn n. There is a finite number of states
associated with each human turn (HT
n
) in PISA CPS 2015, with two to five
options at each turn (i.e., either chat options or alternative actions to be per-
formed). The complexity of the branching depends on the number of finite
states at each turn and the length of exchanges. In the PISA assessment, the
number of options is small at each turn and the length of the branching is short
for each episodic unit. To foreground what will come later, it would be feasible
to accommodate open-ended natural language within these constraints. There
could be a semantic match score between the student’s verbal contributions and
the correct answer of the episodic unit, with a small number of branching
options, for example, correct, incomplete, incorrect, no response, bad answer
(Cai, Graesser, & Hu, 2015; Zapata-Rivera, Jackson, & Katz, 2015); these lim-
ited options would allow extended but manageable branching options in the
state transition network.
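To make the branching concrete, the following sketch (in Python) illustrates how an open-ended human turn could be mapped onto a small set of states that drive the transition to the next episodic unit. The matcher, thresholds, and transition labels are illustrative assumptions rather than the PISA implementation.

```python
import re

def tokens(text):
    """Crude tokenizer; a real system would use an LSA/regular-expression matcher."""
    return set(re.findall(r"[a-z']+", text.lower()))

def semantic_match(student_text, reference):
    """Toy 0-1 match score standing in for the semantic matchers described later."""
    ref = tokens(reference)
    return len(tokens(student_text) & ref) / max(len(ref), 1)

def classify_turn(student_text, correct_answer, bad_answer):
    """Map human turn HT_n onto one of a small number of branching states."""
    if not student_text.strip():
        return "no_response"
    if semantic_match(student_text, bad_answer) >= 0.7:
        return "bad_answer"
    good = semantic_match(student_text, correct_answer)
    if good >= 0.7:
        return "correct"
    if good >= 0.3:
        return "incomplete"
    return "incorrect"

# Episodic unit U_{n+1} is then selected by a transition table keyed on the state of HT_n.
TRANSITIONS = {
    "correct": "close_unit_and_advance",
    "incomplete": "agent_gives_hint",
    "incorrect": "agent_gives_correction",
    "bad_answer": "agent_gives_correction",
    "no_response": "agent_prompts_again",
}

state = classify_turn("the forces are equal",
                      "The magnitudes of the forces are equal.",
                      "The lighter object exerts no force.")
print(state, "->", TRANSITIONS[state])
```

With only a handful of states per human turn, the branching remains as tractable as the chat-menu version while admitting natural language input.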
In the PISA assessment, there is only one score associated with each episodic
unit and each episodic unit is aligned with one and only one cell in the Table 1
matrix. These constraints are compatible with conventional psychometric
modeling, which requires a fixed set of items (i.e., episodic units). Consequently,
PISA CPS 2015 had a fixed sequence of episodic units (U_1, U_2, …, U_M) that were
distributed throughout the problem solving scenario. The decisions of the
human for each episodic unit determined the score for that unit. Moreover, the
conversations were finessed so they would naturally close at the end of each
episodic unit by either an agent’s speech acts (e.g., “We should do X, let’s go
on”) or an event in the scenario (such as an announcement that a train station
was closed in a transportation problem). After one episodic unit closed, the next
unit would systematically appear. Assessment scores were collected for each
student for the M episodic units that were distributed among the cells in the
Table 1 matrix.
Students are assessed on a diverse set of situations in PISA CPS 2015. Those
who respond randomly to the response options would obviously score low on
CPS proficiency as well as on the collaboration and problem solving dimen-
sions. A student may be a good team player but not take the initiative when
problems arise (e.g., an agent fails to respond or gives an incorrect answer). A
student may take on some initiative when breakdowns occur, but fail to handle
complex cognitive problems. A student who scores high in CPS proficiency
leads the team in achieving group goals during difficult times (conflicts,
incorrect actions, unresponsive team members) and can also handle complex
problems with many cognitive components that burden working memory and
require reasoning. An adequate CPS assessment would require episodic units
for all of these situations. Assessment with agents (H-A) can guarantee
complete coverage; assessments with other humans (H-H) cannot.
Once again, the construction of problem scenarios and episodic units was
critical in PISA CPS 2015. It was important to select problems that had high
interdependency, such that team members could not solve the problems alone
and needed to communicate, formulate plans together, assign roles, and track
each other’s progress. Many problems were hidden profile problems in which
the team members did not have access to the same information and needed to
establish shared understanding through conversational exchanges. Team mem-
bers needed equal status so that they did not fear penalty in cultures that
stigmatize questions or requests from a low-status person to a high-status person. In
some problems, unresponsive, low ability, or uncooperative agents required
a high ability student to monitor team members, troubleshoot problems, and
sometimes be pushy.
There are many impressive characteristics of the PISA CPS 2015 assessment.
The framework is theoretically and empirically grounded (OECD, 2013). The
assessment covers a broad range of important CPS situations and aligns with
traditional psychometric methodology. Students can be scaled on different
levels of CPS proficiency, with the reliability and validity of the proficiency
scale and levels currently undergoing analysis.
Nevertheless, both advocates and critics of the PISA CPS 2015 assessment
have raised important questions about potential liabilities (Graesser, Foltz,
et al., 2017). Do scenarios with agents reflect bona fide CPS mechanisms among
humans? To what extent does the assessment with agents match an assessment
that could be accomplished among humans? Does the limited set of response
options prevent an adequate assessment of CPS compared with open-ended
chat responses? It is beyond the scope of this chapter to resolve these important
questions. Instead, we explore whether it is feasible to assess CPS and the
Table 1 cells with open-ended verbal responses in the chat scenarios. We also
clarify the important role of agents in these assessments.
Our research builds on a community of researchers in the learning sciences,
team science, computer supported collaborative learning, and computational
linguistics who have investigated successful versus unsuccessful conversation
patterns among team members in small groups by analyzing computer-
mediated interactions in chat, discussion forums, and other digital environments
(Cen, Ruta, Powell, Hirsch, & Ng, 2016; Dascalu et al., 2015; Dowell et al.,
2015; Foltz & Martin, 2008; Liu et al., 2015; Morgan, Keshtkar, Duan, &
Graesser, 2012; Mu, Stegmann, Mayfield, Rosé, & Fischer, 2012; Nash &
Shaffer, 2013; Rosé et al., 2008; Shaffer et al., 2009; Tausczik & Pennebaker,
2013; Von Davier & Halpin, 2013). We have investigated the conversations
using a variety of automated text analysis tools, such as Linguistic Inquiry and
Word Count (Pennebaker, Booth, & Francis, 2007), Coh-Metrix (Graesser
et al., 2014; McNamara, Graesser, McCarthy, & Cai, 2014), latent semantic
analysis (Foltz, Kintsch, & Landauer, 1998), epistemic network analysis
(Shaffer, 2017; Shaffer et al., 2009), and state-transition networks that track
speech acts of team players (Morgan et al., 2012). These automated tools have
been applied to conversations in their entirety, to subsets of the conversation at
a particular window size (e.g., five consecutive turns), to single conversational
turns, to adjacent conversational turns, and to turns of specific team members.
The conversation profile includes measures of team cohesion, percentage of
on-topic versus off-topic contributions, amount of new information, character-
istics of team members (e.g., leader, organizer, follower, social loafer), alliances
between team members, and presence of specific conversation patterns.
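As a concrete illustration of the window-based analyses mentioned above, the sketch below (Python) applies an arbitrary turn-level metric over a sliding window of chat turns; the metric and the topic vocabulary are hypothetical placeholders for the tools cited above, not any of their actual implementations.

```python
from typing import Callable, List

def windowed_scores(turns: List[str], metric: Callable[[str], float],
                    window: int = 5) -> List[float]:
    """Score each window of consecutive turns by pooling their text."""
    scores = []
    for start in range(max(len(turns) - window + 1, 1)):
        pooled = " ".join(turns[start:start + window])
        scores.append(metric(pooled))
    return scores

# Example metric: proportion of words drawn from a (hypothetical) topic vocabulary.
TOPIC_WORDS = {"zoning", "stakeholder", "survey", "plan", "land"}

def on_topic(text: str) -> float:
    words = text.lower().split()
    return sum(w in TOPIC_WORDS for w in words) / max(len(words), 1)

chat = ["hi everyone", "did you read the zoning plan?",
        "the stakeholder survey favors housing", "lol", "ok what is our plan"]
print(windowed_scores(chat, on_topic, window=3))
```

The same scaffolding can be pointed at whole conversations, single turns, adjacent turns, or the turns of one team member simply by changing the window.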
It is conceivable that open-ended student responses with other humans or
with computer agents could cover all of the cells in the Table 1 matrix. If so,
there is hope of assessing CPS proficiencies with open-ended student responses.
The feasibility of this possibility is explored in the next section. If not, it is
prudent to pursue the PISA CPS 2015 approach with scripted agents, a fixed
sequence of episodic units, decisions on a limited number of alternatives at
each human turn, and a small number of conversation paths within each epi-
sodic unit.
SCORING OPEN-ENDED STUDENT RESPONSES
WITH AUTOTUTOR
The simplest agent collaboration is a dialogue in which the human interacts
with only one agent. In this chapter, we use AutoTutor (Graesser, 2016;
Nye, Graesser, & Hu, 2014) as an example. In this system, a tutor agent collab-
oratively interacts with the human student to solve a problem or answer a
difficult question that requires reasoning. AutoTutor presents problems to
solve (reflected in difficult questions that require reasoning) that cover one to
seven sentence-like ideas (i.e., propositions, claims, clauses) in an ideal answer.
The human student and tutor agent co-construct a solution over multiple conver-
sational turns. It may take up to 100 conversational turns back and forth to
solve a problem.
Automatic Evaluation of Student Contributions in AutoTutor
AutoTutor evaluates the meaning of student contributions during the course
of the tutorial interaction. Consider a typical example problem with AutoTutor
in physics:
PHYSICS PROBLEM: If a lightweight car and a massive truck have a head-on collision,
upon which vehicle is the impact force greater? Which vehicle undergoes the greater change
in its motion, and why?
A conversation would have many turns between the student and agent in
answering these questions. As the dialogue evolves, AutoTutor compares the
student’s verbal contributions within a single turn and also the previous student
turns in the conversation against (a) a set of good answers (called expectations)
and (b) a set of bad answers (slips and misconceptions). For example, E1 is an
example expectation and M1 is an example misconception for this problem.
E1: The magnitudes of the forces exerted by the two objects on each other are equal.
M1: A lighter object exerts no force on a heavier object.
AutoTutor has a semantic matcher that matches the verbal contributions of the
student to the expectations and misconceptions in order to assess how well the
student is performing on the physics problem. Performance increases with high
matches to good answers and low matches to bad answers.
Advances in computational linguistics and semantics have made impressive
gains in the accuracy of semantic matches between one short text (i.e., a sen-
tence or two) and another short text (Rus, Lintean, Graesser, & McNamara,
2012; Rus & Stefanescu, 2016). The AutoTutor research team has evaluated
many computational semantic matchers over the years in AutoTutor and other
intelligent tutoring systems with conversational agents (Cai et al., 2011;
Graesser, Penumatsa, Ventura, Cai, & Hu, 2007; Rus et al., 2012). Semantic
matchers automatically compute the semantic similarity between a student’s
verbal contribution and an expectation (or misconception), with a similarity
score that varies from 0 to 1. These semantic match algorithms have included
keyword overlap scores, word overlap scores that place higher weight on lower
frequency words in the English language, scores that consider the order of
words, latent semantic analysis cosine values, comparisons to regular expres-
sions, and procedures that compute semantic logical entailment; some of these
algorithms are defined in this chapter. Excellent results can be achieved by
a combination of latent semantic analysis (Landauer, Foltz, & Laham, 1998;
Landauer, McNamara, Dennis, & Kintsch, 2007), frequency-weighted word
overlap (rarer words and negations have higher weight), regular expressions
(Jurafsky & Martin, 2008), and semantic entailment (Rus et al., 2012). For
example, in one analysis of AutoTutor in the area of research methods, latent
semantic analysis together with regular expressions had high agreement scores
in direct comparisons with human experts (Cai et al., 2011). Two human
experts along with a computational model using latent semantic analyses and
regular expressions both evaluated a sample of 892 student answers to
AutoTutor questions. The correlation of similarity scores between AutoTutor
and human expert judges was r = 0.67, which was about the same as between
two experts (r = 0.69). Interestingly, syntactic parsers did not prove useful in
these analyses because a high percentage of the students’ contributions are tele-
graphic, elliptical, and ungrammatical. At the time of this writing, the best
automated semantic matcher is the SEMILAR system developed by Rus and
Stefanescu (2016). SEMILAR won the semantic textual similarity competition
at SemEval-2015, the premier international forum for semantic evaluation.
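As one example of the algorithms listed above, the following sketch implements a frequency-weighted word-overlap score in Python; the corpus counts, the weighting function, and the extra weight on negations are assumptions made for illustration, not the published algorithms.

```python
# Rarer words (and negations) receive more weight when scoring how much of an
# expectation is covered by a student contribution. Corpus counts are invented.

import math
import re

CORPUS_COUNTS = {"the": 1_000_000, "force": 2_000, "equal": 5_000,
                 "magnitude": 800, "not": 500_000}

def weight(word: str) -> float:
    count = CORPUS_COUNTS.get(word, 100)           # unseen words treated as rare
    w = 1.0 / math.log(count + 2)                  # lower frequency -> higher weight
    return w * 2.0 if word in {"not", "no", "never"} else w

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def weighted_overlap(student: str, expectation: str) -> float:
    s, e = set(tokens(student)), tokens(expectation)
    covered = sum(weight(w) for w in e if w in s)
    total = sum(weight(w) for w in e)
    return covered / total if total else 0.0

print(round(weighted_overlap("the forces have equal magnitude",
                             "the magnitude of the force is equal"), 2))
```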
It is beyond the scope of this chapter to provide a technical specification of
the components in these automated semantic matchers. However, it is impor-
tant to briefly clarify both latent semantic analyses and regular expressions
because they together have proven adequate in AutoTutor on several topics,
such as computer literacy, physics, and electronics. They can also be used in
tracking performance in CPS assessments.
Latent Semantic Analysis (LSA)
LSA (Foltz et al., 1998; Landauer et al., 2007) computes the conceptual similar-
ity between words, sentences, paragraphs, or texts by considering implicit world
knowledge in addition to the explicit words. It is a mathematical, statistical
technique for representing world knowledge, based on a large corpus of texts.
The central assumption is that the meaning of a word is captured by the com-
pany of other words that surround it in naturalistic documents. Two words are
similar in meaning to the extent that they share similar surrounding words in
documents. For example, the word glass will be highly associated with words of
the same functional context, such as cup, liquid, and pour.
LSA is different than a dictionary or a thesaurus because the highly associ-
ated words may be in different syntactic classes and not follow structured
definition frames. Instead, LSA considers how words are used in naturalistic
documents. LSA starts with a very large word-by-document matrix that counts
how often each word appears in each document. This forms a largely sparse
matrix (lots of 0s); if there are 100,000 words and 50,000 documents (e.g., para-
graphs), there would be 5 billion cells in the matrix. LSA uses a statistical tech-
nique called singular value decomposition to condense the matrix of the large
corpus of texts to 100-500 statistical dimensions. Each word is represented as
a vector of values on the K dimensions. The conceptual similarity between any
two text excerpts (e.g., word, clause, sentence, text) is computed as the geomet-
ric cosine between the values of the words (on the K dimensions) in one text
excerpt versus the other. The value of the cosine typically varies from 0 (not at
all similar) to 1 (perfectly similar). Many other classes of high dimensional
semantic spaces do as well or slightly better than LSA, but these nuances are
beyond the scope of this chapter.
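The following toy sketch (Python with NumPy) walks through the mechanics just described: a word-by-document count matrix, reduction by singular value decomposition, and cosine comparisons between text excerpts. The four-document corpus and the two retained dimensions are, of course, far smaller than the corpora and 100-500 dimensions used in practice.

```python
import numpy as np

docs = ["pour the liquid into the glass cup",
        "the glass cup holds liquid",
        "the truck and car collide with great force",
        "the force of the collision moves the car"]

vocab = sorted({w for d in docs for w in d.split()})
counts = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

# Truncated SVD: keep k latent dimensions; each word becomes a k-dimensional vector.
k = 2
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
word_vectors = U[:, :k] * S[:k]
word_index = {w: i for i, w in enumerate(vocab)}

def text_vector(text):
    """Represent a text excerpt as the sum of its word vectors."""
    idx = [word_index[w] for w in text.split() if w in word_index]
    return word_vectors[idx].sum(axis=0) if idx else np.zeros(k)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

print(round(cosine(text_vector("glass of liquid"), text_vector("pour the cup")), 2))
print(round(cosine(text_vector("glass of liquid"), text_vector("car collision")), 2))
```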
LSA-based semantic similarity can be used in a number of ways when evalu-
ating student contributions. There can be comparisons of the student contribu-
tions to the expectations and misconceptions associated with the problem. For
example, if there are four expectations, the first student turn in the conversation
would have four semantic match scores, one for each expectation. As the con-
versation progresses, these four scores would be updated with each conversa-
tional turn of the student. An expectation would be considered covered when
the match score for an expectation meets or exceeds some threshold value.
When the conversation ends, the student’s overall performance on a problem
can be computed as the mean match score of the four expectations; alterna-
tively, it could be the mean of the expectations minus the mean of the
misconceptions.
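A minimal sketch of this bookkeeping, assuming some semantic_match() scorer (an LSA cosine, regular expressions, or a combination) is available, might look as follows; the threshold and the toy matcher in the demonstration are illustrative, not AutoTutor's values.

```python
THRESHOLD = 0.7

def track_coverage(student_turns, expectations, misconceptions, semantic_match):
    """Update running match scores as turns accumulate; return which expectations
    are covered and an overall score (mean expectation match minus mean
    misconception match)."""
    history = ""
    exp_scores = [0.0] * len(expectations)
    mis_scores = [0.0] * len(misconceptions)
    for turn in student_turns:
        history += " " + turn                      # cumulative student contributions
        for i, e in enumerate(expectations):
            exp_scores[i] = max(exp_scores[i], semantic_match(history, e))
        for j, m in enumerate(misconceptions):
            mis_scores[j] = max(mis_scores[j], semantic_match(history, m))
    covered = [s >= THRESHOLD for s in exp_scores]
    overall = (sum(exp_scores) / len(exp_scores)
               - sum(mis_scores) / max(len(mis_scores), 1))
    return covered, overall

def toy_match(text, ref):
    ref_words = set(ref.lower().split())
    return len(set(text.lower().split()) & ref_words) / len(ref_words)

covered, overall = track_coverage(
    ["the forces are equal", "the truck changes motion more"],
    expectations=["forces equal magnitude", "truck less change motion"],
    misconceptions=["lighter object exerts no force"],
    semantic_match=toy_match)
print(covered, round(overall, 2))
```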
A second use of LSA-based similarity is to evaluate different types of dis-
course coherence between the student and tutor (Graesser, Jeon, Yang, & Cai,
2007). That is, to what extent is the content of a student's turn T_n related
conceptually to the content of the tutor agent's previous turn T_{n-1} or any of
the turns in the previous conversation (T_1, T_2, …, T_{n-1})? These would be com-
puted as sim(T_{n-1}, T_n) and max{sim(T_{n-m}, T_n)}, where n > m. Alternatively,
we could also compute whether the tutor’s turns are coherently related to the
student’s contributions. The two of these together would yield coherence scores
for the entire conversation.
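Under the assumption that each turn has already been mapped to a vector (for example, by the LSA sketch above), these two coherence scores reduce to a few lines of code:

```python
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def local_coherence(turn_vectors, n):
    """sim(T_{n-1}, T_n): similarity of turn n to the immediately preceding turn."""
    return cosine(turn_vectors[n - 1], turn_vectors[n])

def global_coherence(turn_vectors, n):
    """max over earlier turns of sim(T_{n-m}, T_n), where n > m."""
    return max(cosine(turn_vectors[i], turn_vectors[n]) for i in range(n))

turn_vectors = [np.array([1.0, 0.2]), np.array([0.9, 0.3]), np.array([0.1, 1.0])]
print(round(local_coherence(turn_vectors, 2), 2),
      round(global_coherence(turn_vectors, 2), 2))
```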
A third use of LSA-based similarity is to evaluate the newness, givenness, and
relevance of each conversational turn. The distinction between given (old) infor-
mation versus new information in discourse is a foundational distinction in the-
ories of discourse processing (Haviland & Clark, 1974; Prince, 1981) and
assessment (Von Davier & Halpin, 2013). Given information includes words,
concepts, and ideas that have already been mentioned in the discourse, in this
case a conversation on a particular tutoring problem. New information builds
on the given information or launches a new thread of ideas. Relevance is
another foundational construct in discourse processing theories (Sperber &
Wilson, 1995). Discourse contributions are expected to be relevant to the topic
at hand and discourse goals. There are LSA-based metrics that compute the
newness (N), givenness (G), and relevance (R) of each turn in the conversation,
with values that vary from 0 to 1 (Hempelmann et al., 2005; Hu et al., 2003;
McCarthy et al., 2012). The statistical method is called span in the computation
of G and N. The LSA vector of an incoming turn, V(T), is compared with the
existing vector of the preceding discourse, V(P); the existing vector determines
the span of the preceding discourse. The portion of V(T) that is parallel with
the V(P) is the computation for G (given) whereas the component of the vector
that is perpendicular is the computation of N (new). McCarthy et al. (2012)
reported that there was a high correlation between the G and N values and the
decisions of experts who annotated discourse samples with Prince’s given-new
theory. Regarding relevance, it is possible to compute semantic similarity scores
between the vector V(T) and the subject matter being tutored, as reflected in
comparisons to excerpts from a textbook for example.
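A simplified sketch of the span idea, using a single vector V(P) for the preceding discourse rather than the full subspace, is shown below; it illustrates the projection logic rather than the published algorithm.

```python
import numpy as np

def given_new(v_t: np.ndarray, v_p: np.ndarray):
    """Project the incoming turn vector V(T) onto the preceding-discourse vector
    V(P); the parallel component indexes given information (G) and the
    perpendicular component indexes new information (N)."""
    norm_t = np.linalg.norm(v_t)
    if norm_t == 0 or np.linalg.norm(v_p) == 0:
        return 0.0, 0.0
    unit_p = v_p / np.linalg.norm(v_p)
    parallel = (v_t @ unit_p) * unit_p        # component of V(T) along V(P)
    perpendicular = v_t - parallel            # residual, orthogonal to V(P)
    given = np.linalg.norm(parallel) / norm_t       # G in [0, 1]
    new = np.linalg.norm(perpendicular) / norm_t    # N in [0, 1]
    return given, new

print(given_new(np.array([2.0, 1.0]), np.array([1.0, 0.0])))
```

In this single-vector simplification G and N satisfy G^2 + N^2 = 1; the full span method projects onto the subspace spanned by several preceding vectors.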
All of these LSA-based metrics can be directly applied to CPS among stu-
dents. This includes semantic matches to expectations and misconceptions, as
well as measures of coherence, newness, givenness, and relevance. However, the
pragmatic ground rules are quite different in tutorial dialogues than in CPS dia-
logues among humans. Good tutors do not merely lecture but instead attempt
to get students to be active learners and articulate the expectations during
problem solving (Chi, Siler, Jeong, Yamauchi, & Hausmann, 2001; Dzikovska,
Steinhauser, Farrow, Moore, & Campbell, 2014; Graesser, Person, & Magliano,
1995). This is accomplished through a variety of dialogue moves (such as hinting
and evaluative feedback) that are described later. The tutor often withholds
information and waits for the student to contribute. In contrast, students who
interact in CPS are not prone to withhold information, generate hints, or
quiz team members. Instead, the students are all doing what they can to contrib-
ute useful information to solve the problem. In summary, the ground rules of
the discourse constitute a central component in the design of team learning
environments.
Regular Expressions
Each expectation and misconception in AutoTutor is expressed in both natural
language and regular expressions. The natural language format is used in the
LSA analyses whereas regular expressions are needed to accommodate (a) mis-
spellings, (b) words that are functionally equivalent in the context of a particu-
lar problem to be solved, and (c) structure-sensitive semantic matches. Regular
expressions are structured symbolic expressions that can accommodate a large
number of alternative verbalizations (Jurafsky & Martin, 2008). In the context
of the example physics problem involving vehicle collision, the notion of a
“collision” could be captured with a number of words in different word classes:
collision, collisions, collide, collides, colliding, and so on. The regular expres-
sion col.* (the wildcard allows any other letters in the word) can accommodate all
of these words plus misspellings (colision, colishun). It is important to have
enough letters in the regular expression for a word to distinguish it from other
words that might be expressed in that particular discourse context. So co.*
would probably not work because alternative words in that context might be
come or counterforce.
Regular expressions allow functionally equivalent terms with “or” operators
designated as a |. So a car might be referred to as a car, an automobile, a
vehicle, or simply a pronoun (it). Regarding structure-sensitive semantic
matches, regular expressions can group constituents, order elements within a
constituent, and embed constituents. This allows these expressions to distin-
guish, for example, “the truck moves a shorter distance than the car” from “the
car moves a shorter distance than the truck.”
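The following Python snippet illustrates the three uses of regular expressions described above; the specific patterns are invented for this example and are not AutoTutor's production patterns.

```python
import re

# (a) Misspellings and morphological variants of "collision"
collide_re = re.compile(r"\bcol+i\w*", re.IGNORECASE)
print(bool(collide_re.search("the vehicles colide head on")))      # True

# (b) Functionally equivalent referring expressions joined with "|"
car_re = r"\b(car|automobile|vehicle|it)\b"
truck_re = r"\b(truck|lorry)\b"

# (c) Structure-sensitive matching: the order of constituents matters
truck_shorter = re.compile(truck_re + r".*\bmoves?\b.*\bshorter\b.*" + car_re,
                           re.IGNORECASE)
print(bool(truck_shorter.search("the truck moves a shorter distance than the car")))
# True
print(bool(truck_shorter.search("the car moves a shorter distance than the truck")))
# False: the constituents appear in the wrong order for this pattern
```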
Regular expressions are needed to tune the semantic matcher component in
ways that cannot be provided by LSA. LSA cannot reliably handle any of the
above problems (misspellings, context-functional synonyms, and structure-
sensitive semantic analysis). As already discussed, data consistently show that
there is substantial added value when adding regular expressions to LSA in
evaluations that compare the semantic matching models to expert human
judgments (Cai et al., 2011).
Some aspects of the subject matter and skills are not routinely measured by
this semantic matching approach with AutoTutor. AutoTutor does not directly
measure deductive, inductive, and other types of logical reasoning unless
the expectations capture the relevant semantic islands underlying reasoning.
AutoTutor does not directly measure particular steps, procedures, and phases
of problem solving unless those are anticipated ahead of time. Instead,
AutoTutor tracks the content through pattern matching processes that compare
the student’s verbal responses to the expectations and misconceptions.
Automatic Generation of Dialogue Moves in AutoTutor
AutoTutor needs to generate dialogue moves to advance conversation. AutoTutor
incorporates the dialogue generation mechanisms of human tutors as well as more
ideal tutoring strategies that even expert tutors rarely exhibit (Cade, Copeland,
Person, & D’Mello, 2008; Graesser, 2016; Graesser, D’Mello, & Person, 2009).
These analyses of tutoring have revealed that students essentially never give a
satisfactory answer on the first turn after receiving a problem. It takes a conversa-
tion to assess what they know, to build on their knowledge, and converge on a
satisfactory answer.
AutoTutor has a conversational mechanism that includes follow-up
exchanges that draw out more of what the student might know. A pump is a
generic expression to get the student to provide more information, such as
“What else?” or “Tell me more.” Hints and prompts are selected by the tutor to
get the student to articulate missing content words, phrases, and propositions.
A hint tries to get the student to express a complex idea (e.g., proposition,
clause, sentence) whereas a prompt is a question that tries to get the student to
express a single word or phrase. For example, a hint to get the student to artic-
ulate expectation E1 above might be “What about the forces exerted by the
vehicles on each other?” This hint would ideally elicit the answer “The magni-
tudes of the forces are equal.” A prompt to get the student to say “equal”
would be “What are the magnitudes of the forces of the two vehicles on each
other?” The tutor generates an assertion if the student fails to express the expec-
tation after multiple hints and prompts. AutoTutor provides cycles of pump →
hint → prompt → assertion for each expectation after the student's initial
response to the main question; the cycle ends as soon as the student articulates
the expectation or the assertion is expressed. As the student and tutor express
information over many turns, the list of expectations is eventually covered and
the main task is completed. The pump → hint → prompt → assertion cycles
have been validated by correlations between the students’ prior knowledge
about physics and the proportion of AutoTutor dialogue moves that are
pumps, hints, prompts, or assertions (Jackson & Graesser, 2006). Correlations
between pretest scores on the Force Concept Inventory (FCI) and the propor-
tion of AutoTutor dialogue moves in each category show the predicted trend
(pump → hint → prompt → assertion), with correlations varying monotonically
from 0.5 to −0.4.
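A hedged sketch of this cycle for a single expectation is given below; the threshold, the move texts, and the toy matcher in the demonstration are assumptions, not AutoTutor's internal values.

```python
COVERAGE_THRESHOLD = 0.7

def scaffold_expectation(expectation, moves, get_student_turn, semantic_match):
    """moves: dict with 'pump', 'hint', 'prompt', and 'assertion' texts authored
    for this expectation; get_student_turn() returns the next student turn."""
    history = ""
    for move_type in ("pump", "hint", "prompt", "assertion"):
        print("TUTOR:", moves[move_type])
        if move_type == "assertion":           # tutor covers the expectation itself
            return "asserted"
        history += " " + get_student_turn()
        if semantic_match(history, expectation) >= COVERAGE_THRESHOLD:
            return "covered_by_student"        # cycle ends as soon as it is covered
    return "asserted"

def toy_match(text, ref):
    ref_words = set(ref.lower().split())
    return len(set(text.lower().split()) & ref_words) / len(ref_words)

answers = iter(["hmm", "they push on each other",
                "the magnitudes of the forces are equal"])
moves = {"pump": "What else?",
         "hint": "What about the forces the vehicles exert on each other?",
         "prompt": "The magnitudes of the forces are what?",
         "assertion": "The magnitudes of the forces are equal."}
print(scaffold_expectation("the magnitudes of the forces are equal",
                           moves, lambda: next(answers), toy_match))
```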
Discourse Structure of AutoTutor Dialogues
The conversational structure of AutoTutor is based on systematic qualitative
and quantitative analyses of human-to-human expert tutoring sessions
(Graesser & Person, 1994; Graesser et al., 1995). The following frequent con-
versation patterns in human tutoring have been simulated in AutoTutor.
•Tutoring sessions are organized around problems, challenging questions, and
tasks. The outer loop of tutoring consists of the selection of major tasks for
the tutor and student to work on (VanLehn, 2006) whereas the inner loop
consists of the steps and dialogue interactions to manage the interaction
within these major tasks. After the tutor and student settle on a major task
(outer loop), the tutor guides the specific agenda within the task (inner loop).
•A 5-step tutoring frame guides the major task. Once a problem is selected to
work on, a 5-step tutoring frame is launched (Graesser & Person, 1994).
(1) TUTOR asks a difficult question or presents a problem.
(2) STUDENT gives an initial answer.
(3) TUTOR gives short feedback on the quality of the answer.
(4) TUTOR and STUDENT have a multi-turn collaborative dialogue to
improve the answer.
(5) TUTOR assesses whether the student understands the correct answer.
Step 4 in this 5-step tutoring frame involves collaborative discussion and
joint action. Step 4 is the heart of the collaborative interaction.
•Expectation and misconception tailored (EMT) dialogue guides micro-
adaptation (the inner loop). This structure has already been described with
the pump → hint → prompt → assertion cycles to draw out what the student
knows as a solution to the problem is collaboratively constructed. When
students express misconceptions (incorrect beliefs, errors, bugs) they are
immediately corrected in AutoTutor.
•Tutor turns are well structured. Most turns of the tutor have three informa-
tional components during the inner loop of step 4 of the 5-step frame:
Tutor Turn → Short Feedback + Dialogue Advancer + Floor Shift
The first component is short feedback (positive, neutral, negative) on the
quality of the student’s last turn. The second component is a dialogue
advancer that moves the tutoring agenda forward with either pumps, hints,
prompts, assertions with correct information, corrections of misconceptions,
or answers to student questions. The third component shifts the conversa-
tional floor with cues from the tutor to the student. For example, the tutor
ends each turn with a question or a gesture to cue the student to do the
talking.
•Tutors are sensitive to the speech act categories of the student’s last turn.
AutoTutor segments the content of each student turn into speech act units
and assigns each unit to a category. The natural language of the students is
often fragmentary, ungrammatical, and not semantically well formed so there
are limits to the accuracy of segmentation and classification. However, the
vast majority of student turns are short, typically one or two speech acts,
allowing reasonable performance on speech act classification (Olney et al.,
2003; Samei, Li, Keshtkar, Rus, & Graesser, 2014). The primary speech act
categories of students that AutoTutor accommodates are: short responses
(e.g., yes, no, okay), statement contributions, questions, and metacognitive
expressions (e.g., “I don’t know,” “I’m lost”). Analyses of multi-party chat
conversations among humans have included these four categories plus three
others: expressive evaluations (e.g., “this is ridiculous”), greetings, and
requests (Samei et al., 2014).
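A rough rule-based sketch of this classification step is shown below; production systems use trained classifiers (Olney et al., 2003; Samei et al., 2014), so the keyword rules here are only illustrative.

```python
import re

def classify_speech_act(turn: str) -> str:
    """Assign a short student turn to one of the four main categories."""
    t = turn.strip().lower()
    if re.fullmatch(r"(yes|no|ok|okay|yeah|nope|sure)[.!]?", t):
        return "short_response"
    if re.search(r"\b(i don'?t know|i'?m lost|i am lost|i am confused)\b", t):
        return "metacognitive"
    if t.endswith("?") or re.match(r"(what|why|how|when|where|who|can|could|do|does|is|are)\b", t):
        return "question"
    return "statement"

for turn in ["okay", "I don't know", "why are the forces equal?",
             "the truck exerts a bigger force"]:
    print(turn, "->", classify_speech_act(turn))
```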
AutoTutor has a dialogue advancer network with production rules that
respond appropriately to the four main speech act categories of students that
frequently occur in tutorial dialogue (Cai, Feng, Baer, & Graesser, 2014;
Graesser, Person, Harter, & Tutoring Research Group, 2001). The generation
of the content in a tutor turn is sensitive to both the speech act categories of
the student’s previous turn and the semantic match scores for the expectations
and misconceptions at that point in the conversation. The expectation and
misconception tailored dialogue attempts to get the student to articulate the
expectations through the pump → hint → prompt → assertion cycles until the
semantic matches reach some threshold for the expectations; when a match
score is sub-threshold, an expectation is not covered, so the tutor needs to pres-
ent scaffolding dialogue moves (pumps, hints, prompts) in an attempt to
achieve successful pattern completion. That is, the hints and prompts are strate-
gically selected to achieve the pattern completion for an expectation. Suppose
an expectation has constituents A, B, C, and D and that the student has
expressed A and B, but not C and D. The tutor would generate hints and
prompts to attempt to get the student to cover C and D so that the threshold
for the expectation would be reached. The tutor gives up covering the expecta-
tion after a few hints and prompts, ultimately generating an assertion to cover
the expectation in the conversation.
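The constituent-targeting logic can be sketched as follows; the data structures (authored hints and prompts keyed by constituent) and the threshold are hypothetical, intended only to make the selection strategy concrete.

```python
def next_scaffold(constituents, student_history, semantic_match,
                  hints, prompts, attempts, threshold=0.7):
    """constituents: short phrases A, B, C, D that make up an expectation.
    hints/prompts: dicts mapping each constituent to an authored dialogue move.
    attempts: dict counting how often each constituent has been targeted."""
    missing = [c for c in constituents
               if semantic_match(student_history, c) < threshold]
    if not missing:
        return ("advance", None)                      # expectation is covered
    target = missing[0]
    if attempts.get(target, 0) == 0:
        move = ("hint", hints[target])                # broad nudge first
    elif attempts.get(target, 0) == 1:
        move = ("prompt", prompts[target])            # then elicit a word or phrase
    else:
        move = ("assert", target)                     # give up and assert the content
    attempts[target] = attempts.get(target, 0) + 1
    return move
```

The same bookkeeping doubles as assessment, because the set of constituents a student covers without scaffolding is itself informative about mastery.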
It is important to emphasize that the collaborative dialogue manifests much
more of what a student knows than what is revealed after the main question is
asked, that is, step 2 in the 5-step tutoring frame. The semantic match scores
are small or modest for the set of expectations when step 2 is completed
because the responses are short, typically only one or two sentences. The match
scores for the expectations are much higher after the step 4 collaborative inter-
action is completed. This underscores the importance of the AutoTutor agent
in providing a more detailed assessment of the student’s knowledge. Without
the AutoTutor interaction, an assessment would underestimate what the
student knows.
Tutoring versus Collaborative Problem Solving among Peers
Our coverage of AutoTutor illustrates how CPS is automated in a tutoring
environment and how the system assesses the student’s knowledge and mastery
of the material. These same approaches can be applied to automated assess-
ment of CPS. Chat conversations among humans on a team can be assessed on
coherence, newness, givenness, and relevance using precisely the same algo-
rithms as are articulated for LSA-based measures. There can be semantic
matches to the expectations and misconceptions of the episodic
units aligned with the cells in Table 1 for PISA CPS 2015. This can be imple-
mented with LSA-based measures, regular expressions, and more advanced
semantic matching evaluators such as SEMILAR (Rus, Lintean, Banjade,
Niraula, & Stefanescu, 2013; Rus & Stefanescu, 2016). Conversational agents
can generate dialogue moves that elicit or verify the human’s mastery of the
skills in these assessments. The agents can take on different roles, such as peers
on the team or mentors who help the team accomplish their task of solving the
problem.
As discussed earlier, the ground rules of the conversation differ between
tutoring and CPS among peers when there is no tutor or mentor present. The
pedagogical goal of AutoTutor is to have dialogue moves that encourage the
student to supply the answer information to solve the problem. CPS instead
involves team members contributing information and actions that lead to a
team solution. Nevertheless, in the context of CPS assessment, AutoTutor is
quite relevant even though the pragmatic context diverges. In assessment, the
agent can express dialogue moves that give the test taker a second chance with
a follow-up question or request, just to make sure the test taker has every
opportunity to contribute. For example, one important dimension of CPS is
whether the test taker takes the initiative in correcting problems or advancing
solutions (row C in Table 1), as opposed to merely responding to questions and
requests. A student with initiative would communicate and act in ways that
solve a problem, without being prompted by a peer. A student who is respon-
sive but has little initiative would only contribute when another entity nudges
the student with a question or request. A low ability student would be silent
when nudged or would give random responses. The episodic units and agents in
the PISA CPS 2015 assessment incorporated these different levels of assessment
when many of the cells in Table 1 were assessed, but did not accommodate nat-
ural language input of the test taker. The feasibility of natural language human
input is a central underlying question posed in this chapter.
The ground rules of the conversation are more similar to AutoTutor when a
tutor agent or mentor is added to the team in order to enhance team CPS
(Gilbert et al., 2017). The tutor or mentor agent tracks the conversation among
the team members and generates discourse moves that advance the conversa-
tion to cover expectations, correct misconceptions, encourage silent team mem-
bers to contribute, redirect the conversation when the discussion is off topic,
and so on. More will be said about the discourse moves of tutor/mentor agents
in team conversations later in this chapter.
AUTOMATED ASSESSMENT OF COLLABORATIVE
PROBLEM SOLVING WITHOUT AND WITH AGENTS
Automated assessments of CPS and collaborative learning have been developed
in previous research projects that analyze computer-mediated communication
among team members (Cen et al., 2016; Dowell, Graesser, & Cai, 2016; Foltz &
Martin, 2008; Gress, Fior, Hadwin, & Winne, 2010; Liu et al., 2015; Mu et al.,
2012; Nash & Shaffer, 2013; Rosé et al., 2008; Shaffer, 2017; Shaffer et al.,
2009; Tausczik & Pennebaker, 2013; Von Davier & Halpin, 2013). These sys-
tems analyze the conversations that transpire, both at the level of the conversation
as a whole and at the level of individual team members' language. These research efforts have
automatically analyzed constructs such as group cohesion, responsivity of
individual team members to a group, emotions of team members, and the
personality of team members.
Our contention is that CPS assessment of a team and team members will be
limited without outside involvement of either a human mentor or an agent in
the role of a mentor, tutor, or fellow student. Students will take the easy road
of being polite and the team will take the easy road of “groupthink” (getting a
quick solution that makes most people happy, Janis, 1982) unless there is a
human, agent, or task that shakes up that sanguine world with some challenges,
contradictions, skepticism, follow-ups on task completion, and modeling of
good behavior (Dillenbourg, 1999). Moreover, such assessments and interven-
tions need to be accomplished in real time so that the agents or humans can
quickly respond to the team or particular team members.
This section describes how CPS assessment has been applied to a particular
computer-mediated communication environment called the Land Science simu-
lation (Bagley & Shaffer, 2015; Shaffer, 2017). We illustrate a number of meth-
ods that automatically analyze the language and discourse of chat interactions
between students and a human mentor. We point out how these approaches
have alignments to the PISA CPS 2015 assessment that is captured by the cells
in the Table 1 matrix. The section ends with the proposal that CPS assessment
would benefit from a team member agent that produces conversation moves
that contribute to CPS skills of team members and the team as a whole.
Land Science: Collaborative Learning and Problem Solving of Small Teams
with a Mentor
In the virtual internship Land Science (Bagley & Shaffer, 2015; Shaffer, 2017),
students play the role of interns at Regional Design Associates, a fictional
urban and regional planning firm. Their problem solving task is to prepare a
rezoning plan for the city of Lowell, Massachusetts, that addresses the requests
of various stakeholder groups (business, environment, industry, or housing)
that have views on socioeconomic and ecological issues, some of which are
incompatible. The students read about the different viewpoints and preferences
of stakeholders and eventually prepare individual reports on how to handle
competing concerns. During the course of making these decisions, students dis-
cuss options with their project teams through online chat. They also use profes-
sional tools, such as a geographic information system model of Lowell and
preference surveys, to model the effects of land-use changes and obtain
stakeholder feedback. At the end of the internship, students write a proposal
in which they present and justify their rezoning plans. During this process,
there is a mentor who keeps the small group of three to four students moving
forward, but does not encourage any particular solution to the problem solving
tasks.
The 10-hour game is divided into 14 rooms with different goals and objec-
tives. These rooms and descriptions are presented in Table 2. There is a
sequence of rooms, each of which involves chat interaction among team mem-
bers (plus the mentor) and a product (e.g., survey, recommendation, justifica-
tion) provided by each student independently. As the mentor watches over the
collaboration among students, the mentor has access to an AutoSuggester tool
that analyses the current conversation and provides pre-scripted suggestions
that the human mentor may choose to execute. The mentor also has the ability
to type freely to students as a group or individually. The first 10 rooms have
students assigned to a group with the same stakeholder; the final four rooms
have representatives from each stakeholder group in a new group that has to
negotiate solutions that consider multiple perspectives and compromises among
the stakeholders.
Table 2. Description of Rooms in the Land Science Simulation.

(1) Entrance interview: Players introduce themselves to their designated group and complete the entrance survey. Groups consist of 3–4 players and one mentor.
(2) Request for proposals: Players read the request for proposals, which gives a broad overview of the game, and then groups hold team meetings to discuss the request for proposals.
(3) Virtual site visit and site assessment: Players read and take notes on designated stakeholder requirements, navigate the necessary note taking tools within the game environment, and write a personal reflection. Stakeholders have one key interest (business, environment, industry, or housing). Groups do not take notes on stakeholders from each interest.
(4) iPlan practice: Players learn to use the iPlan tool through changing land use codes and discussing how the changes impact the virtual environment.
(5) Target identification matrix 1: Individual players indicate a value taking into consideration stakeholder input. Then the team as a whole recommends an agreed upon value.
(6) Preference survey 1: Players create an iPlan map designed to hit the targets determined in the target identification matrix 1 and write justifications for the decisions made.
(7) Stakeholder assessment 1: Players review feedback from stakeholders on the preference surveys, make additional changes to the iPlan map, and justify the changes made.
(8) Target identification matrix 2: Steps from target identification matrix 1 are repeated with more specific targets.
(9) Preference survey 2: Steps from preference survey 1 are repeated based on the target identification matrix 2.
(10) Stakeholder assessment 2: Steps from stakeholder assessment 1 are repeated.
(11) Final plan: Players are grouped into new teams with players who worked previously with different stakeholders. They collaborate to create an iPlan map that meets all the stakeholders' needs. When players are re-grouped, they have to negotiate with stakeholder groups they had not encountered yet.
(12) Final proposal: Players write a formal final proposal and justification.
(13) Reflection: Players write personal reflections on the game process.
(14) Exit interview: Players complete an exit survey.
A Qualitative Analysis of Mentor Speech Acts in Land Science
What do mentors do during these Land Science chat interactions? In order to
answer this question, we analyzed 200 mentor turns, randomly sampled from
the mentors’ free-type moves (as opposed to those from the AutoSuggester).
The student participants were 91 students in middle school and high school
in the United States who were assembled in groups of 3–4 students (noting
that groups shift after room 10). There were 50,100 chat turns of students and
mentors in the corpus, with 4,399 unique mentor turns. The 200 mentor turns
in this analysis were randomly sampled from that set.
We qualitatively annotated the five turns before and the five turns after each
unique mentor turn, based on the timestamp of the conversations. These turns
before and after the mentor turn could be expressed by any team member,
including the mentor. This chat window of five turns has frequently been
adopted to analyze the context of particular turns (Collier, Ruis, & Shaffer,
2016; Samei et al., 2014). The before and after turns were extracted in order to
define the context of the unique mentor turn and to help identify patterns
within these conversations.
All turns were assigned to one of the following speech act categories:
Statement, Question, Request, Reaction, MetaStatement, Expressive Evaluation,
and Greeting. The percentages of observations in these categories were 20%,
20%, 15%, 35%, 4%, 4%, and 1%, respectively. Table 3 contains an example
of each speech act category.
Table 3. Speech Act Category Examples.

Expressive evaluation: "That's a really great observation"
Greeting: "Goodmorning, Play112"
Metastatement: "ooops overtyped, sorry"
Statement: "There should be a message waiting for you from our supervisor"
Question: "in order to make a plan that everyone can live with, what do we need to know about what they want?"
Reaction: "ok, thanks Player105"
Request: "Please remember to send Maggie an email letting her know that you finished it"

For each of the mentor turns, the speech act content and speech act categories were examined for the student and mentor turns that occurred five turns in the past and five turns in the future. For each of these sequences of 11 turns, the annotators induced what higher level goal the mentor was trying to achieve. The coding scheme developed for the purposes of this investigation includes the following categories: Seed Planting, Explanation, Elaboration, Game Status,
Spoon Feeding, Technical, and Verification. Seed Planting refers to instances in
which the mentor expresses a hint that prompts players to respond in somewhat
elaborate ways. Explanation refers to instances in which a mentor explains or
talks about something that was previously stated. Often these explanations
come in the form of clarifications of definitions, correcting misconceptions, and
providing verbal feedback to players. Elaborations are instances in which a
mentor’s chat references the behavior of the player or players. Examples of
elaborations include asking players to help other players, giving behavioral
feedback, and giving directions outside of standard game play. Game Status is
a mentor chat that indicates where players are, where they should be, what is
happening, or what will be happening during game play. These utterances
include informing players of emails, deadlines, and instructions. Spoon Feeding
refers to utterances in which a mentor asks players for information that can be
procured with very little effort, or a mentor gives players an answer to a
question without prompting them to answer it themselves. Technical refers to
mentor chats that pertain to some technical aspect of the game, such as game
functionality, instructions, saving work, and connection issues. Verification
includes utterances in which a mentor corrects or acknowledges a minor spelling
error or slip. The following percentages reflect the occurrence of these higher
level goals in this sample: Seed Planting (10%), Explanation (34%), Elaboration
(13%), Game Status (19%), Spoon Feeding (5%), Technical (17%), and
Verification (4%).
These annotations illuminate the nature of the mentors’ moves but are
limited in two fundamental ways. First, the higher level goals were annotated
by human judges (no automation). Machine learning methods might extract
diagnostic features that recognize the higher level goals, just as has been
accomplished for speech act classification, but that would require a corpus that
is two orders of magnitude beyond what is available in 200 mentor turns.
Second, these higher level goals are not aligned with the cells in the PISA CPS
2015 framework of Table 1. That is, it is not clear how these goal categories
map onto any theoretical framework. The human annotations contribute to
academic science but do not advance automated assessment during CPS unless
there is a large enough corpus.
State Transition Networks of Speech Acts
In the AutoTutor section, we claimed that speech act classifiers offered reason-
able accuracy in assigning speech acts to the categories when compared to
expert classifications (Olney et al., 2003; Rus et al., 2012; Samei et al., 2014).
We conducted automated analyses of the speech act categories in the chat
conversations in Land Science (and an earlier similar system called Urban
Science). These analyses adopted the Table 3 speech act categories and the
speech act classifier of Samei et al. (2014). Some of these speech acts pressure
the audience to respond, such as questions and requests, where it is impolite
not to respond (Sacks, Schegloff, & Jefferson, 1974). That is, questions call for
an answer and requests call for an action to satisfy politeness norms, but the
recipient has the option of whether or not to respond. However, other speech
act categories grant considerable options to the recipients as to whether they
choose to respond.
The PISA CPS 2015 assessment often tracks whether the student (a) initiates progress in the various Table 1 cells and serves as a leader, (b) merely responds to questions and requests, or, at worst, (c) does not respond or responds randomly. The selection of many of the chat options in the assess-
ment reflects these alternatives. A cooperative CPS partner takes charge when
needed or responds to others when that is essential.
State transition networks can play a role in automatically assessing the extent to which a team member takes initiative, responds to questions and requests, or responds randomly. The distribution of a team member's speech act categories can provide a reasonable assessment of whether that member takes initiative. A team member who leads would have a high proportion of questions, requests, and statements, whereas a non-leader who is nevertheless involved in the group would have a distribution of speech act categories that drifts toward reactions. However, the assessment can go further with state transition networks. We can compute the probabilities of adjacency pairs in a corpus of chat sequences, that is, the transition probability between two adjacent "participant-speech-act-category" nodes, [P-SAC_n → P-SAC_n+1]. For example, what is the probability that a
team member responds to a question or request by another team member? If
that probability is low, the team member in question is not responsive to other
team members.
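To make this computation concrete, the following minimal sketch (in Python) estimates the adjacency transition probabilities between participant-speech-act-category nodes from a classified chat log. The speaker roles, category labels, and example turns are illustrative placeholders rather than data from the Land Science corpus.

```python
from collections import Counter, defaultdict

def transition_probabilities(turns):
    """Estimate P(next node | current node) over adjacent chat turns.

    `turns` is a chronological list of (speaker_role, speech_act) tuples,
    e.g., ("Mentor", "Question"). Each tuple is one P-SAC node.
    """
    pair_counts = Counter(zip(turns, turns[1:]))   # counts of adjacency pairs
    node_counts = Counter(turns[:-1])              # how often each node starts a pair
    probs = defaultdict(dict)
    for (current, nxt), count in pair_counts.items():
        probs[current][nxt] = count / node_counts[current]
    return probs

# Illustrative (fabricated) sequence: a mentor question followed by student moves.
chat = [("Mentor", "Question"), ("Student", "Reaction"), ("Student", "Statement"),
        ("Mentor", "Request"), ("Student", "Reaction")]
for node, targets in transition_probabilities(chat).items():
    print(node, "->", targets)
```

A responsiveness estimate of the kind described above would then be read off this table, for example, the probability mass flowing from another member's Question or Request node into the focal member's Reaction or Statement nodes.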
State transition networks have been created in Land Science and the related
Urban Science simulation by calculating the conditional probability of each
adjacency transition between categorized speech acts, as well as the overall rela-
tive frequency of each speech act in the corpus (Morgan et al., 2012). Fig. 1
depicts a network in the Urban Science domain for chat interactions between 3–4 students and a mentor. The speech act categories are self-explanatory, whereas the speaker symbols are M = Mentor, S = Student, and O = another student responding to a previous student. There are three speaker categories (M, S, O) and eight speech act categories (including junctures that signify the beginning or end of an exchange). This results in a 24 × 24 matrix of transi-
tion probabilities. Fig. 1 shows only those links with a transition probability
that is 0.15 or higher. These values average over all of the students and conver-
sations. Analyses of conversations in Land Science and Urban Science support
the claim that the transition probabilities are quantitatively stable across
(a) spoken interactions versus computer-mediated communication with chat
facilities (Morgan, Burkett, Bagley, & Graesser, 2011) and (b) rooms that are
discussion-oriented versus action-oriented (Morgan et al., 2012). However, the
relative frequency distribution of P-SAC nodes is quite different across rooms
and between spoken and computer-mediated communication.
Some measures of CPS can theoretically be derived from the distribution of
participant-speech-act-category nodes and the transition probabilities between
these node categories. As mentioned already, students who take initiative would
be expected to have a high proportion of questions, requests, and statements,
whereas responsive team members (but not leaders) would have a relatively
high proportion of reactions. A disruptive team member would have a high
proportion of negative expressive evaluations and a social loafer would have a
low number of contributions compared with other team members. Regarding
the state transitions, responsive team members would be expected to have a
relatively high transition probability between questions and requests of others
and their reactions or statements; these transition probabilities would be low
for unresponsive team members. Thus, these probabilistic metrics have rele-
vance to many of the cells in the framework for PISA CPS 2015.
Nevertheless, there are some limitations of these metrics extracted from state
transition networks when assessing the CPS proficiency of team members and
the team as a whole. First, the network analysis specifies the category of the
speech acts but not the content of the speech acts. Therefore, this network
approach does not indicate which of the cells, rows, or columns in Table 1 are
being referenced. Second, the state transition networks only consider adjacent
speech acts, so it misses more global conversation patterns. Adjacencies are of
course important units of conversation (Sacks et al., 1974), as in the case of
question–answer or greeting–greeting. However, conversation patterns with three or more speech acts are also important in negotiations and collaborative planning, but are not captured by mere adjacencies.

Fig. 1. State Transition Network for Discussion-oriented Rooms in Land Science.
Latent Semantic Analysis (LSA)
LSA (Landauer et al., 2007) can be used to analyze the content of the team
member’s contributions, as described in the section on AutoTutor. Foltz and
Martin (2008) have successfully used LSA to analyze the coherence of teams
and characteristics of individual team members. Similarly, unpacking temporal
patterns in group interactions and understanding how these patterns relate to
group and individual student performance are acknowledged as a high priority for
research in team science (Kapur, 2011; Reimann, 2009; Sawyer, 2014; Stahl,
2005; Suthers, 2006; Von Davier et al., 2017). However, the use of automated
approaches for identifying the dynamics of interactions between group mem-
bers has rarely been investigated until recently (Reimann, 2009).
LSA-based statistical metrics could potentially provide an assessment of
establishing a shared understanding (Clark, 1996) and building on what other team members know, both of which are theoretically important in CPS (see column 1 in
Table 1). The relevance of a contribution to the topic at hand is important for
distinguishing turns that are on-topic versus off-topic. As discussed in the
AutoTutor section, it is possible to statistically measure relevance (R), given-
ness (G), and newness (N) of individual turns that are expressed by individual
team members and the team as a whole. A good collaborative team member
contributes relevant information that is new and also builds on other team
member’s topic-relevant ideas. Scores for R, G, and N can be automatically
computed by LSA, as discussed earlier. A team member who productively leads
the conversation would have a vector of RGN measures such as (0.9, 0.4, 0.6).
Team members who echo ideas of others in a conversation would have a (0.9,
0.5, 0.0) vector if they stay on topic, but a (0.0, 0.5, 0.0) vector on off-topic
talk. A team member with a (0.0, 0.0, 0.9) vector would be in their own irrelevant world and not helpful to collaboration.
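As one hedged illustration of how such RGN vectors might be computed, the sketch below assumes that each turn and the topic (or expectation) have already been mapped into some semantic vector space, such as an LSA space in the original work; the use of cosine similarity and the treatment of newness as the complement of givenness are simplifying assumptions, not the published scoring rules.

```python
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def rgn_scores(turn_vec, topic_vec, prior_turn_vecs):
    """Rough relevance/givenness/newness scores for one conversational turn.

    turn_vec        : vector for the current turn
    topic_vec       : vector for the problem topic or expectation under discussion
    prior_turn_vecs : vectors for the preceding turns in the conversation
    """
    relevance = max(0.0, cosine(turn_vec, topic_vec))
    givenness = max((max(0.0, cosine(turn_vec, p)) for p in prior_turn_vecs), default=0.0)
    newness = 1.0 - givenness   # crude proxy: information not already given
    return relevance, givenness, newness

# Toy usage with made-up 3-dimensional "semantic" vectors.
topic = np.array([1.0, 0.2, 0.0])
prior = [np.array([0.9, 0.1, 0.1])]
turn = np.array([0.8, 0.3, 0.5])
print(rgn_scores(turn, topic, prior))
```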
These LSA-based vector predictions were recently tested by Dowell (2017) in
collaborative learning and problem solving environments. One of her corpora
was a Land Science chat corpus that attempted to assess characteristics of team
members. A sample of N = 38 participants interacted in 19 CPS simulations with Land Science. Each game had multiple rooms (Table 2) and each room
had multiple chat sessions. There were a total of 630 distinct chat sessions, with
a reasonable distribution among student players and mentors. As previously
mentioned, players in the simulation game communicated both with other
players and with mentors using a chat feature embedded in the interface.
For the purposes of detecting the social roles of players, we analyzed all of
the players’ chats as well as the mentors who periodically entered into
the conversations. There were also teachers and non-player characters who did
not participate in the chat interactions but played roles in other dimensions of
the simulation.
A number of measures were collected on each move of a student and mentor
in the Land Science corpus. Participation is the relative proportion of a parti-
cipant’s contributions (turns) out of the total number of group contributions.
Responsiveness (analogous to G for givenness) is an LSA-based measure that
assesses how responsive a student’s contributions are to all other group mem-
bers’ previous contributions in the conversation. Conversely, social impact is an
LSA-based measure of how turn contributions of a student have a semantic
similarity to other student contributions in the future conversation. Student
cohesion is an LSA-based measure of how semantically similar a student’s con-
tribution is to that student’s previous conversational turns (i.e., is a student say-
ing the same thing over and over?). Newness is the amount of new information
in a contribution, as defined earlier. Communication density is an LSA-based
measure that assesses how much information in a turn is distinctive to the
topic, compared with everyday topics of conversation.
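A rough sketch of how a few of these measures might be operationalized is given below. It again assumes precomputed semantic vectors for each turn, uses a fixed window of five turns, and averages cosine similarities; these are simplifications and are not the exact formulas reported by Dowell (2017).

```python
import numpy as np
from collections import defaultdict

def cosine(a, b):
    d = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / d) if d else 0.0

def team_measures(turns, window=5):
    """Per-participant measures from a chat log.

    `turns` is a chronological list of (participant, vector) pairs, where the
    vector represents the semantic content of the turn (e.g., from LSA).
    Returns participation, responsiveness (similarity to others' recent turns),
    social impact (similarity of others' upcoming turns to this turn), and
    internal cohesion (similarity to the participant's own previous turns).
    """
    stats = defaultdict(lambda: {"participation": 0.0, "responsiveness": [],
                                 "social_impact": [], "internal_cohesion": []})
    n = len(turns)
    for i, (who, vec) in enumerate(turns):
        s = stats[who]
        s["participation"] += 1.0 / n
        past = turns[max(0, i - window):i]
        future = turns[i + 1:i + 1 + window]
        s["responsiveness"] += [cosine(vec, v) for p, v in past if p != who]
        s["social_impact"] += [cosine(vec, v) for p, v in future if p != who]
        s["internal_cohesion"] += [cosine(vec, v) for p, v in past if p == who]
    summary = {}
    for who, s in stats.items():
        summary[who] = {"participation": round(s["participation"], 3)}
        for name in ("responsiveness", "social_impact", "internal_cohesion"):
            summary[who][name] = float(np.mean(s[name])) if s[name] else 0.0
    return summary

# Toy usage with made-up 2-dimensional turn vectors for three participants.
turns = [("Mentor", np.array([1.0, 0.0])), ("A", np.array([0.9, 0.1])),
         ("B", np.array([0.2, 0.8])), ("A", np.array([0.8, 0.2]))]
print(team_measures(turns))
```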
A correlation matrix among these measures had some interesting patterns. Those individuals with high participation also had high newness, communication density, and internal cohesion (r correlations between 0.35 and 0.56), but not a comparatively high social impact and responsivity (r = 0.03 to 0.06). So the highest information contributors were not necessarily the most sensitive and impactful members on the team. Another finding is that those with the highest social impact were also the most responsive (r = 0.40). Finally, those with new information had the highest communication density (r = 0.79). It appears that there are differences between the individuals who contribute new information and those who manage the social interaction. Interestingly, Dowell et al. (2015) documented that those who learn most are content-centered rather than social-
centered (keeping the group going) when analyzing the discourse of team mem-
bers with Coh-Metrix, an automated tool that analyses natural language at
multiple levels of language and discourse (Graesser et al., 2014; McNamara
et al., 2014). There does seem to be a differentiation between team members
who focus on substantive content of the subject matter and team members who
keep the conversation alive and moving forward. The former may learn more
(i.e., taskwork, the rows in Table 1) but the latter may contribute more to CPS
progress (i.e., teamwork, the columns in Table 1).
The LSA-based assessments have the advantages of assessing content, track-
ing substantive communication between team members, and providing semantic
links to the content of episodic units associated with the Table 1 cells of PISA
CPS 2015. We can assess how well the content of individual team members semantically resonates with the expectations and misconceptions of Table 1 cells
when aggregated over many turns. We can determine whether team members
respond to other team members, whether the turns spawn conversations among
other team members, and whether the content is relevant to the subject matter.
Nevertheless, these LSA measures also have limitations. First, LSA scores
ignore syntax, semantic precision, and structure-sensitive computation. This
motivates the need for regular expressions in any comparison to a Table 1 con-
tent rubric. Second, these assessments are most reliable when they aggregate
scores over many turns, as opposed to individual turns. Thus, the grain size of
reliable assessment specification is global rather than local, even though LSA-
based scores can be collected on each turn (with limited reliability and validity).
It is an open question what a good window size of chat turns should be in these
analyses, but some of our results suggest that a moving window of five chat
turns is reasonable.
Epistemic Network Analyses
Epistemic network analysis (ENA) attempts to assess the complex thinking,
discourse, reasoning, and topics addressed in professional disciplines and com-
munities (Nash & Shaffer, 2011; Shaffer, 2017; Shaffer, Collier, & Ruis, 2016;
Shaffer et al., 2009). There does not need to be a golden standard on what is
said in ENA, but there does need to be a disciplinary style of thinking and
talking that resonates with the expertise of the community of stakeholders.
Do stakeholders give evidence for claims? Do they track causal chains of
reasoning? Do they talk about important topics of the discipline? Do stake-
holders minimize vernacular small talk? Do team members eventually learn
disciplinary discourse over time as they interact with good professional role
models?
ENA has four features in its assessment of the chat discourse in CPS. First,
ENA models complex thinking by representing it as a network of connections
among critical knowledge, skills, values, and epistemic moves in the profes-
sional domain. ENA measures the strength of association among these cogni-
tive elements that characterize complex thinking and quantifies changes in the
composition and strength of those connections over time. Second, ENA models
collaboration by accounting for the cognitive connections that each individual
student contributes to the group conversation. That is, ENA models connec-
tions among concepts when considering how team members interact rather
than merely quantifying who talks to whom. Third, ENA constructs a metric
space that enables comparison of individual or group networks through (a) dif-
ference graphs, which visualize the differences in weighted connections between
two networks, and (b) summary statistics that reflect the weighted structure
of connections in the networks, allowing comparison of large numbers of
networks. Fourth, model parameters can initially be prepared theoretically and
then modified through statistical machine learning algorithms as more data are
collected. ENA attempts to preserve correspondences between qualitative data
and quantitative models.
It is beyond the scope of this chapter to precisely specify the algorithms that
underlie ENA and the process of applying ENA to CPS data (see Shaffer et al.,
2016, for the ENA Toolkit). The initial steps consist of (a) annotating chat turn
sequences (i.e., stanzas, sliding turn windows of about length five) on important
cognitive categories (i.e., expressions of skills, knowledge, identity, values, and
epistemic content), based on the words expressed in those turns, (b) computing
a matrix with the co-occurrence of these cognitive categories within these turn
sequences, and (c) reducing the large set of co-occurrence matrices to a small
number of dimensions through singular value decomposition. When there are
only two or three dimensions, it is possible to plot each cognitive category in a
two- or three-dimensional metric space. The size of the cognitive category in
the space reflects its relative frequency, and thickness of the links between the
concept categories reflects the co-occurrence frequency. The node and link
patterns in these networks can be compared for different team members, the
team as a whole, and different chat contexts associated with the profession.
Comparison of different networks is accomplished through summary statistics
and standard inferential statistics after the ENA spaces have been suitably nor-
malized in a way that allows direct comparisons.
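The sketch below illustrates, under many simplifications, the first steps described above: counting co-occurrences of coded categories within a sliding window of turns and reducing the resulting vectors with singular value decomposition. It is not the ENA Toolkit of Shaffer et al. (2016); the category codes, window size, and example data are placeholders.

```python
import numpy as np
from itertools import combinations

def cooccurrence_vector(coded_turns, categories, window=5):
    """Count co-occurrences of cognitive categories within sliding turn windows.

    `coded_turns` is a list of sets; each set holds the category codes assigned
    (by annotators or an automated coder) to one chat turn. Returns a flattened
    vector over all unordered category pairs.
    """
    index = {pair: k for k, pair in enumerate(combinations(sorted(categories), 2))}
    vec = np.zeros(len(index))
    for start in range(max(1, len(coded_turns) - window + 1)):
        window_codes = set().union(*coded_turns[start:start + window])
        for pair in combinations(sorted(window_codes), 2):
            vec[index[pair]] += 1
    return vec

# One vector per unit of analysis (e.g., per student); stack and reduce with SVD.
categories = {"data", "stakeholders", "tools", "justification"}
units = [
    [{"data", "justification"}, {"stakeholders"}, {"tools", "data"}],
    [{"tools"}, {"tools", "stakeholders"}, {"data"}],
]
X = np.vstack([cooccurrence_vector(u, categories) for u in units])
X = X - X.mean(axis=0)            # center before projection
U, S, Vt = np.linalg.svd(X, full_matrices=False)
points = U * S                    # coordinates of each unit in the reduced space
print(points[:, :2])              # first two ENA-like dimensions
```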
ENA has been applied to the Land Science chat corpora (Collier et al.,
2016). The logfile contained team chat conversations (41,332 lines of chat in
total) from 265 students who used Land Science. There were novices on the
team that included high school students (N = 110) and relative experts that included college students enrolled in an introductory urban science course (N = 155). The chat utterances were coded for 17 cognitive categories associ-
ated with the epistemic frame of urban planning, including:
Knowledge of stakeholder representation. Knowledge of stakeholders, whose
requests pertain to social, economic, and environmental issues.
Skills and practices of urban planning using tools of the domain. Discussion or
actions involving the tools of the urban planning domain, such as a virtual
site visit to key regions in the city, a stakeholder preference survey, and
iPlan, a geographic information system-enabled zoning model.
Data-based justifications. Justifications using data such as graphs, results
tables, numerical values, or research papers.
Fig. 2 depicts how frequently the 17 cognitive categories (larger points) are
represented in the ENA network as well as the connections between the cogni-
tive categories (thicker lines). This graph is associated with the relatively expert
students, as opposed to the novice students. The strongest nodes referred to
knowledge of social issues, environmental issues, data, stakeholders, and urban
planning tools in addition to the skill of using urban planning tools. Regarding
connections between nodes, the relative experts were found to connect “data-
based justifications” with knowledge elements as well as with skills and other
justifications. This distribution of connections was not prevalent among the
novice students when we inspected their graphs.
The similarity of these graphic diagrams has been analyzed for different students, teams, contexts of conversation, and time phases of a learning envi-
ronment. Large samples of networks can be examined in order to discover
trends in discourse thinking in addition to CPS mechanisms. Team members
can be compared on the similarity of their ENA profiles. All of this can be
accomplished without a gold standard, and the methodology can be applied
to both ill-structured and well-structured domains of knowledge. These are all
encouraging characteristics of this approach.
Fig. 2. Mean ENA Network Diagrams that Show the Connections Made by the Relatively Expert Students.

This approach does have its liabilities from the standpoint of automation and assessment of CPS. First, the researchers need to declare a priori or discover through data mining the set of cognitive categories and the words associated with each category before the system can be automated by computer. Fortunately, the process of completing this development is being reduced by authoring tools available to professional experts and designers of virtual internships (Shaffer, Ruis, & Graesser, 2015). Second, it is unclear how the
ENA diagrams and cognitive categories are aligned with the 12 cells in the
Table 1 matrix for PISA CPS 2015. Perhaps the discourse thinking parameters
are particularly relevant to row D (monitoring and reflecting) and the identity
concept categories are relevant to column 3 (establishing and maintaining team
organization). These are all open questions for further research.
Matches to Expectations and Misconceptions in Episodic Units
The automated measures of open-ended conversations and the discourse of
professional talk are important accomplishments, but they do not go the
distance in revealing how well the talk matches good solutions and important
collaboration milestones in CPS. We argue that an adequate CPS assessment
with natural language requires mechanisms that include semantic matching
components to ideal expectations and likely misconceptions in the context of
particular problems to be solved by the team. If our argument is correct, then
the computational architecture of AutoTutor is directly applicable to the use of
agents in collaborative learning and problem solving.
One important first step in open-ended natural language assessments of CPS
would be to piggyback on the episodic units in the existing PISA CPS 2015
materials, as discussed in the first section. For each episodic unit there is an
ideal answer in the set of multiple choice options in a chat move. The ideal
answer (the correct response in the MC test) would be the expectation to assess
semantic matches, whereas the incorrect options would constitute a few of the misconceptions that team members may express. Moreover, each episodic unit
for evaluation could have a set of expectations that are deemed as good
answers instead of only one golden answer. As the conversation for an episodic
unit is discussed in chat, the test taker’s chat content would be compared to the
set of good answers and the set of bad answers through the semantic similarity
models discussed in the AutoTutor section of this chapter. This approach
would require a set of expectations and a set of misconceptions for each
episodic unit. It would also require that the episodic units cover the 12 cells in
Table 1, with content in each cell that is readily distinguishable from other cells.
Otherwise, it would be impossible to know which verbal content in the
chat is associated with each of the 12 cells.
PISA CPS 2015 currently has each episodic unit assigned to one and only
one cell in the Table 1 matrix. Thus, at each multiple choice point, there can be
a set of expectations associated with the correct answer and a set of misconcep-
tions associated with each distractor item. The chat contributions of the student
(in the relevant speech act categories) would be semantically matched to each
option (a match score varying from 0 to 1) and that would be the score for
that cell in the Table 1 matrix. Suppose that the match score was 0.6 with the
expectation and 0.1, 0.1 and 0.0 for three wrong answers in the multiple choice;
the score for that cell could be computed as [0.6 / (0.6 + 0.1 + 0.1 + 0.0)] = 0.75.
In contrast, a 0.2 match to an expectation and a distractor vector of (0.6, 0.1,
0.1) would have a score of 0.2 and a 0.75 misconception to follow up and
verify. Consequently, the existing PISA CPS 2015 items could be evaluated
with open-ended conversation through this approach and compared with the
existing multiple choice interface.
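To illustrate the arithmetic, here is a small sketch of how such a cell score might be computed from semantic match values. The `semantic_match` argument is a stand-in for whatever similarity model is used to compare the chat contribution with each option (LSA, regular expressions, or a toolkit such as SEMILAR); it is not a specific published function.

```python
def cell_score(contribution, expectation, distractors, semantic_match):
    """Score one Table 1 cell from a chat contribution.

    semantic_match(text_a, text_b) should return a similarity in [0, 1].
    The expectation match is normalized by the total match mass across the
    expectation and all distractors, mirroring the worked example in the text.
    """
    e = semantic_match(contribution, expectation)
    d = [semantic_match(contribution, wrong) for wrong in distractors]
    total = e + sum(d)
    score = e / total if total > 0 else 0.0
    # The strongest-matching distractor flags a candidate misconception to follow up on.
    strongest_misconception = max(d) if d else 0.0
    return score, strongest_misconception

# Reproducing the worked example: matches of 0.6 to the expectation and
# (0.1, 0.1, 0.0) to the three distractors give a cell score of 0.75.
fake_matches = {"expectation": 0.6, "d1": 0.1, "d2": 0.1, "d3": 0.0}
print(cell_score("chat text", "expectation", ["d1", "d2", "d3"],
                 lambda a, b: fake_matches[b]))
```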
It should be acknowledged that it takes substantial effort to design the epi-
sodic units, family of expectations and misconceptions, and set of discourse
moves in this proposed approach to automated assessment and facilitation of
CPS. The time and expense of this effort can to some extent be mitigated with
authoring tools for developing the scripted content (Sottilare, Graesser, Hu, &
Brawner, 2015) and tool kits for evaluating semantic similarity (Rus et al.,
2013). The feasibility of this approach is one direction for future research.
DESIGNING AGENTS TO FACILITATE CPS
The above analysis of CPS is incomplete because it does not specify the
discourse moves generated by a fellow student agent, tutor, or mentor agent.
The system’s “comprehension” of the students’ verbal contributions is only half
of the mechanism to worry about, the other half being the “production” of the
agent’s conversation moves.
In CPS assessment, a computer agent expresses conversation moves that
give the test taker a second chance with a more direct question or request. For
example, an important construct in CPS is whether the team member takes the
initiative during difficult challenges, merely responds to questions/requests, or
is entirely unresponsive and unhelpful. Many of the cells in Table 1 have chat
options in the episodic units and follow-up conversation paths that attempt to get at this construct. Without intelligent agents, these follow-ups would
not systematically occur and an adequate assessment of CPS proficiency would
be compromised. Our contention is that computer agents will provide a more
accurate assessment of CPS proficiency because they can address many subtle-
ties of collaboration.
CPS assessment is important, but the goal of the tutor or mentor agent is to
facilitate CPS by well-constructed, well-timed, and adaptively intelligent dis-
course moves (Gilbert et al., 2017). There are a number of reasons that few
intelligent tutoring systems have been developed to train teams. One important
reason is that it is difficult to create systems that are able to assess complex
interaction patterns among team members and the quality of the team as a
unit. Adequate assessment of team performance is necessary for effective train-
ing of team learning and problem solving, as elaborated in a recent meta-
analysis of the impact of teams on learning and the resulting implications for
team training (Sottilare et al., 2017). As discussed in the previous section, it is
important to design CPS environments so that informative assessments can be
collected. That requires the capacity to design tasks that have the affordances
for informative CPS assessments. Gilbert and colleagues (2017) propose a
framework to guide the authoring process for team tutors, and demonstrate the
framework using a case study about a team tutor that was developed using a
military surveillance scenario for teams of two. Their work offers conceptual
scaffolding for authors of intelligent tutoring systems that are designed for
CPS. Consequently, another direction for future research is to design and test
intelligent tutoring systems that facilitate CPS.
Besides helping overcome difficulty in assessing complex team interactions,
computer agents can yield assessments in comparatively short amounts of time.
In PISA, there is only about an hour of assessment per student. The likelihood
is near zero of encountering many subtle but important situations in which
particular components of CPS can be assessed. For example, it may take 100 hours of chat before the test taker encounters a situation in which (a) the entire group wants to make a particular decision or take an action and (b) the test taker has the option to disagree and propose a different solution that is more on target. Some test takers would never encounter this option because the particular ensemble of team members is resistant to groupthink; if so, that aspect of CPS would be indeterminate (missing data). However, all of these situations can be manufactured with a smartly designed construction of episodic units, thereby yielding a complete and theoretically mature assessment of
CPS. This approach to intelligent conversation-based assessments with agents
is currently being pursued with open-ended responses at Educational Testing
Service (Liu et al., 2015; Zapata-Rivera et al., 2015). Without the agents push-
ing the envelope of assessment, the experience with meaningful episodic units
would be extremely rare.
There are more generic ways of having agents drum up content from team
members that may or may not be constrained by the expectations and miscon-
ceptions of episodic units. Some of these generic production rules have been articulated in Graesser, Cai, et al. (2017), as summarized below; a speculative sketch of how such rules might be triggered in code follows the list.
•If the team is stuck and not producing contributions on the relevant topic,
then the agent says “What’s the goal here?” or “Let’s get back on track.”
•If the team meanders from topic to topic without much coherence, then the
agent says “I’m lost!” or “What are we doing now?”
•If the team is saying pretty much the same thing over and over, then the
agent says “So what’s new?” or “Can we move on?”
•If a particular team member (Harry) is loafing, the agent says “What do you
think, Harry?”
•If a particular team member (Sally) is dominating the conversation exces-
sively, the agent says “I wonder what other people think about this?”
•If one or more team members express unprofessional language, the agent
says “Let’s get serious now. I don’t have all day.”
•If someone asks the agent a question or makes a request, the agent says
“Sorry, but I’m busy now.”
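The minimal sketch below shows one way generic production rules like these might be triggered. The conversation-state features (e.g., `turns_on_topic_recently`, `turn_counts`) and the thresholds are hypothetical placeholders for whatever the underlying assessment layer computes; the sketch is not the implemented logic of any existing agent.

```python
from dataclasses import dataclass, field
import random

@dataclass
class ConversationState:
    """Hypothetical summary features computed by an assessment layer."""
    turns_on_topic_recently: int = 0
    topic_shifts_in_window: int = 0
    repeated_content_ratio: float = 0.0
    turn_counts: dict = field(default_factory=dict)   # member -> number of recent turns

def select_agent_move(state, window_size=10):
    """Return a generic facilitation move, or None if no rule fires."""
    if state.turns_on_topic_recently == 0:
        return random.choice(["What's the goal here?", "Let's get back on track."])
    if state.topic_shifts_in_window > 3:
        return random.choice(["I'm lost!", "What are we doing now?"])
    if state.repeated_content_ratio > 0.8:
        return random.choice(["So what's new?", "Can we move on?"])
    if state.turn_counts:
        quietest = min(state.turn_counts, key=state.turn_counts.get)
        loudest = max(state.turn_counts, key=state.turn_counts.get)
        if state.turn_counts[quietest] == 0:       # a loafing team member
            return f"What do you think, {quietest}?"
        if state.turn_counts[loudest] > 0.7 * window_size:   # a dominating member
            return "I wonder what other people think about this?"
    return None

state = ConversationState(turns_on_topic_recently=4, topic_shifts_in_window=1,
                          repeated_content_ratio=0.2,
                          turn_counts={"Harry": 0, "Sally": 8, "Pat": 2})
print(select_agent_move(state))
```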
An important unanswered question from the standpoint of assessment is
whether a system with agents that express generic dialogue moves like those
above would result in more reliable and valid assessments than a system with-
out agents. Rules 13 might unveil the potential of a team that would not be
manifested without the agent. Rules 46 might expose the potential of individ-
ual team members that otherwise would not be exhibited. At this point in the
science, we lack systematic empirical research that addresses these possibilities.
Meanwhile, we believe that CPS assessment from open-ended chat with no
agents and no expectation- and misconception-tailored conversation is substan-
tially limited. We also suspect that intelligent tutoring systems will be needed
for more nuanced and impactful improvements on CPS performance of teams.
Again, these views require empirical testing in future research.
ACKNOWLEDGMENTS
The research was supported by the National Science Foundation (DRK-12-
0918409, DRK-12 1418288), the Institute of Education Sciences (R305C
120001), Army Research Lab (W911INF-12-2-0030), and the Office of Naval
Research (N00014-12-C-0643; N00014-16-C-3027). Any opinions, findings,
and conclusions or recommendations expressed in this material are those of the
authors and do not necessarily reflect the views of NSF, IES, or DoD. The
Tutoring Research Group (TRG) is an interdisciplinary research team com-
prised of researchers from psychology, computer science, and other depart-
ments at University of Memphis (visit http://www.autotutor.org).
REFERENCES
Bagley, E., & Shaffer, D. W. (2015). Learning in an urban and regional planning practicum: The view from educational ethnography. Journal of Interactive Learning Research, 26(4), 369–393.
Cade, W., Copeland, J., Person, N., & D'Mello, S. K. (2008). Dialogue modes in expert tutoring. In B. Woolf, E. Aimeur, R. Nkambou, & S. Lajoie (Eds.), Proceedings of the ninth international conference on intelligent tutoring systems (pp. 470–479). Berlin: Springer-Verlag.
Cai, Z., Feng, S., Baer, W., & Graesser, A. C. (2014). Instructional strategies in trialog-based intelligent tutoring systems. In R. Sottilare, A. C. Graesser, X. Hu, & B. Goldberg (Eds.), Design recommendations for intelligent tutoring systems: Adaptive instructional strategies (Vol. 2, pp. 225–235). Orlando, FL: Army Research Laboratory.
Cai, Z., Graesser, A. C., Forsyth, C., Burkett, C., Millis, K., Wallace, P., Halpern, D., & Butler, H. (2011, November). Trialog in ARIES: User input assessment in an intelligent tutoring system. In W. Chen & S. Li (Eds.), Proceedings of the 3rd IEEE international conference on intelligent computing and intelligent systems (pp. 429–433). Guangzhou: IEEE Press.
Cai, Z., Graesser, A. C., & Hu, X. (2015). ASAT: AutoTutor script authoring tool. In R. Sottilare, A. C. Graesser, X. Hu, & K. Brawner (Eds.), Design recommendations for intelligent tutoring systems: Authoring tools (Vol. 3, pp. 199–210). Orlando, FL: Army Research Laboratory.
Care, E., Scoular, C., & Griffin, P. (2016). Assessment of collaborative problem solving in education environments. Applied Measurement in Education, 29(4), 250–264. doi:10.1080/08957347.2016.1209204
Cen, L., Ruta, D., Powell, L., Hirsch, B., & Ng, J. (2016). Quantitative approach to collaborative learning: Performance prediction, individual assessment, and group composition. International Journal of Computer-Supported Collaborative Learning, 11(2), 187–225. doi:10.1007/s11412-016-9234-6
Cesareni, D., Cacciamani, S., & Fujita, N. (2016). Role taking and knowledge building in a blended university course. International Journal of Computer-Supported Collaborative Learning, 1–31. doi:10.1007/s11412-015-9224-0
Chi, M. T. H., Siler, S., Jeong, H., Yamauchi, T., & Hausmann, R. G. (2001). Learning from human tutoring. Cognitive Science, 25, 471–533.
Clark, H. H. (1996). Using language. Cambridge: Cambridge University Press.
Collier, W., Ruis, A., & Shaffer, D. W. (2016). Local versus global connection making in discourse. In C. K. Looi, J. L. Polman, U. Cress, & P. Reimann (Eds.), International conference of the learning sciences (pp. 426–433). Singapore: International Society of the Learning Sciences.
Dascalu, M., Trausan-Matu, S., McNamara, D. S., & Dessus, P. (2015). ReaderBench: Automated evaluation of collaboration based on cohesion and dialogism. International Journal of Computer-Supported Collaborative Learning, 10(4), 395–423. doi:10.1007/s11412-015-9226-y
Dillenbourg, P. (1999). Collaborative learning: Cognitive and computational approaches. New York, NY: Elsevier Science Inc.
Dowell, N. M. (2017). A computational linguistic analysis of learners' discourse in computer-mediated group learning environments. Dissertation, University of Memphis, Memphis, TN.
Dowell, N. M., Graesser, A. C., & Cai, Z. (2016). Language and discourse analysis with Coh-Metrix: Applications from educational material to learning environments at scale. Journal of Learning Analytics, 3, 72–95.
Dowell, N. M., Oleksandra, S., Joksimović, S., Graesser, A. C., Dawson, S., Gašević, S., … Kovanović, V. (2015). Modeling learners' social centrality and performance through language and discourse. In O. Santos, J. Boticario, C. Romero, M. Pechenizkiy, A. Merceron, P. Mitros, J. Luna, C. Mihaescu, P. Moreno, A. Hershkovitz, S. Ventura, & M. Desmarais (Eds.), Proceedings of the 8th international conference on educational data mining (pp. 250–257). International Educational Data Mining Society.
Dzikovska, M., Steinhauser, N., Farrow, E., Moore, J., & Campbell, G. (2014). BEETLE II: Deep natural language understanding and automatic feedback generation for intelligent tutoring in basic electricity and electronics. International Journal of Artificial Intelligence in Education, 24, 284–332.
El Masri, Y. H., Baird, J., & Graesser, A. C. (2016). Language effects in international testing: The case of PISA 2006 science items. Assessment in Education: Principles, Policy, & Practice, 23, 427–455.
Erkens, M., Bodemer, D., & Hoppe, H. U. (2016). Improving collaborative learning in the classroom: Text mining based grouping and representing. International Journal of Computer-Supported Collaborative Learning, 11(4), 387–415. doi:10.1007/s11412-016-9243-5
Fiore, S. M., Rosen, M. A., Smith-Jentsch, K. A., Salas, E., Letsky, M., & Warner, N. (2010). Toward an understanding of macrocognition in teams: Predicting processes in complex collaborative contexts. Human Factors, 52, 203–224.
Fiore, S. M., Wiltshire, T. J., Oglesby, J. M., O'Keefe, W. S., & Salas, E. (2014). Complex collaborative problem-solving processes in mission control. Aviation, Space, and Environmental Medicine, 85(4), 456–461.
Fischer, F., Kollar, I., Stegmann, K., & Wecker, C. (2013). Toward a script theory of guidance in computer-supported collaborative learning. Educational Psychologist, 48(1), 56–66. doi:10.1080/00461520.2012.748005
Foltz, P. W., Kintsch, W., & Landauer, T. K. (1998). The measurement of textual coherence with Latent Semantic Analysis. Discourse Processes, 25, 285–307.
Foltz, P. W., & Martin, M. J. (2008). Automated communication analysis of teams. In E. Salas, G. F. Goodwin, & S. Burke (Eds.), Team effectiveness in complex organisations and systems: Cross-disciplinary perspectives and approaches (pp. 411–431). New York, NY: Routledge.
Funke, J. (2010). Complex problem solving: A case for complex cognition? Cognitive Processes, 11, 133–142.
Gilbert, S. B., Slavina, A., Dorneich, M. C., Sinatra, A. M., Bonner, D., Johnston, J., … Winer, E. (2017). Creating a team tutor using GIFT. International Journal of Artificial Intelligence in Education. doi:10.1007/s40593-017-0151-2
Graesser, A. C. (2016). Conversations with AutoTutor help students learn. International Journal of Artificial Intelligence in Education, 26, 124–132.
Graesser, A. C., Cai, Z., Hu, X., Foltz, P. W., Greiff, S., Kuo, B. C., … Shaffer, D. W. (2017). Assessment of collaborative problem solving. In R. Sottilare, A. C. Graesser, X. Hu, & G. Goodwin (Eds.), Design recommendations for intelligent tutoring systems: Volume 5—Assessment. Orlando, FL: U.S. Army Research Laboratory.
Graesser, A. C., D'Mello, S. K., & Person, N. K. (2009). Meta-knowledge in tutoring. In D. J. Hacker, J. Dunlosky, & A. C. Graesser (Eds.), Handbook of metacognition in education. Mahwah, NJ: Routledge.
Graesser, A. C., Foltz, P. W., Rosen, Y., Shaffer, D. W., Forsyth, C., & Germany, M. (2017). Challenges of assessing collaborative problem solving. In E. Care, P. Griffin, & M. Wilson (Eds.), Assessment and teaching of 21st century skills (pp. 75–91). Heidelberg: Springer Publishers.
Graesser, A. C., Forsyth, C. M., & Foltz, P. (2017). Assessing conversation quality, reasoning, and problem solving performance with computer agents. In B. Csapo, J. Funke, & A. Schleicher (Eds.), On the nature of problem solving: A look behind PISA 2012 problem solving assessment (pp. 245–261). Heidelberg: OECD Series.
Graesser, A. C., Jeon, M., Yang, Y., & Cai, Z. (2007). Discourse cohesion in text and tutorial dialogue. Information Design Journal, 15, 199–213.
Graesser, A. C., McNamara, D. S., Cai, Z., Conley, M., Li, H., & Pennebaker, J. (2014). Coh-Metrix measures text characteristics at multiple levels of language and discourse. Elementary School Journal, 115, 210–229.
Graesser, A. C., Penumatsa, P., Ventura, M., Cai, Z., & Hu, X. (2007). Using LSA in AutoTutor: Learning through mixed initiative dialogue in natural language. In T. Landauer, D. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp. 243–262). Mahwah, NJ: Erlbaum.
Graesser, A. C., & Person, N. K. (1994). Question asking during tutoring. American Educational Research Journal, 31, 104–137.
Graesser, A. C., Person, N. K., Harter, D., & Tutoring Research Group. (2001). Teaching tactics and dialog in AutoTutor. International Journal of Artificial Intelligence in Education, 12(3), 257–279.
Graesser, A. C., Person, N. K., & Magliano, J. P. (1995). Collaborative dialogue patterns in naturalistic one-to-one tutoring. Applied Cognitive Psychology, 9, 495–522.
Greiff, S., Wüstenberg, S., Csapó, B., Demetriou, A., Hautamäki, J., Graesser, A. C., & Martin, R. (2014). Domain-general problem solving skills and education in the 21st century. Educational Research Review, 13, 74–83.
Gress, C. L. Z., Fior, M., Hadwin, A. F., & Winne, P. H. (2010). Measurement and assessment in computer-supported collaborative learning. Computers in Human Behavior, 26(5), 806–814. doi:10.1016/j.chb.2007.05.012
Griffin, P., & Care, E. (2015). ATC21S method. In P. Griffin & E. Care (Eds.), Assessment and teaching of 21st century skills: Methods and approach. Dordrecht: Springer.
Haviland, S. E., & Clark, H. H. (1974). What's new? Acquiring new information as a process in comprehension. Journal of Verbal Learning and Verbal Behavior, 13, 512–521.
Hempelmann, C. F., Dufty, D., McCarthy, P., Graesser, A. C., Cai, Z., & McNamara, D. S. (2005). Using LSA to automatically identify givenness and newness of noun-phrases in written discourse. In B. Bara (Ed.), Proceedings of the 27th annual meeting of the cognitive science society (pp. 941–946). Mahwah, NJ: Erlbaum.
Hesse, F., Care, E., Buder, J., Sassenberg, K., & Griffin, P. (2015). A framework for teachable collaborative problem solving skills. In P. Griffin & E. Care (Eds.), Assessment and teaching of 21st century skills (pp. 37–55). Heidelberg: Springer.
Hu, X., Cai, Z., Franceschetti, D., Penumatsa, P., Graesser, A. C., Louwerse, M. M., … Tutoring Research Group. (2003). LSA: The first dimension and dimensional weighting. In Proceedings of the 25th meeting of the cognitive science society (pp. 1–6).
Jackson, G. T., & Graesser, A. C. (2006). Applications of human tutorial dialog in AutoTutor: An intelligent tutoring system. Revista Signos, 39(60), 31–48.
Janis, I. L. (1982). Groupthink: Psychological studies of policy decisions and fiascoes. Boston, MA: Cengage Learning.
Jurafsky, D., & Martin, J. (2008). Speech and language processing. Englewood, NJ: Prentice Hall.
Kapur, M. (2011). Temporality matters: Advancing a method for analyzing problem-solving processes in a computer-supported collaborative environment. International Journal of Computer-Supported Collaborative Learning, 6(1), 39–56. doi:10.1007/s11412-011-9109-9
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25, 259–284.
Landauer, T. K., McNamara, D. S., Dennis, S., & Kintsch, W. (2007). Handbook of latent semantic analysis. Mahwah, NJ: Erlbaum.
Liu, L., Von Davier, A., Hao, J., Kyllonen, P., & Zapata-Rivera, D. (2015). A tough nut to crack: Measuring collaborative problem solving. In R. Yigal, S. Ferrara, & M. Mosharraf (Eds.), Handbook of research on technology tools for real-world skill development (pp. 344–359). Hershey, PA: IGI Global.
McCarthy, P. M., Dufty, D., Hempelman, C., Cai, Z., Graesser, A. C., & McNamara, D. S. (2012). Evaluating givenness/newness. In P. M. McCarthy & C. Boonthum-Denecke (Eds.), Applied natural language processing: Identification, investigation, and resolution (pp. 457–478). Hershey, PA: IGI Global.
McNamara, D. S., Graesser, A. C., McCarthy, P. M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge: Cambridge University Press.
Morgan, B., Burkett, C., Bagley, E., & Graesser, A. C. (2011). Typed versus spoken conversations in a multi-party epistemic game. In G. Biswas, S. Bull, J. Kay, & A. Mitrovic (Eds.), Proceedings of 15th international conference on artificial intelligence in education (pp. 513–515). Berlin: Springer-Verlag.
Morgan, B., Keshtkar, F., Duan, Y., & Graesser, A. C. (2012). Using state transition networks to analyze multi-party conversations in a serious game. In S. A. Cerri & B. Clancey (Eds.), Proceedings of the 11th international conference on intelligent tutoring systems (ITS 2012) (pp. 162–167). Berlin: Springer-Verlag.
Mu, J., Stegmann, K., Mayfield, E., Rosé, C., & Fischer, F. (2012). The ACODEA framework: Developing segmentation and classification schemes for fully automatic analysis of online discussions. International Journal of Computer-Supported Collaborative Learning, 7(2), 285–305.
Nash, P., & Shaffer, D. W. (2011). Mentor modeling: The internalization of modeled professional thinking in an epistemic game. Journal of Computer Assisted Learning, 27(2), 173–189.
Nash, P., & Shaffer, D. W. (2013). Epistemic trajectories: Mentoring in a game design practicum.
Instructional Science,41(4): 745771.
National Research Council. (2011). Assessing 21st century skills. Washington, DC: National
Academies Press.
Nye, B. D., Graesser, A. C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural
language tutoring. International Journal of Artificial Intelligence in Education,24(4), 427469.
OECD. (2010). PISA 2012 field trial problem solving framework. Paris: OECD. Retrieved from
http://www.oecd.org/dataoecd/8/42/46962005.pdf
OECD. (2013). PISA 2015 collaborative problem solving framework. Paris: OECD. Retrieved
from http://www.oecd.org/pisa/pisaproducts/Draft%20PISA%202015%20Collaborative%20
Problem%20Solving%20Framework%20.pdf
Olney, A., Louwerse, M., Mathews, E., Marineau, J., Hite-Mitchell, H., & Graesser, A. C. (2003).
Utterance classification in AutoTutor. In J. Burstein & C. Leacock (Eds.), Proceedings of the
HLT-NAACL 03 workshop on building educational applications using natural language proces-
sing. Philadelphia, PA: Association for Computational Linguistics.
O’Neil, H. F., Chuang, S. H., & Baker, E. L. (2010). Computer-based feedback for computer-based
collaborative problem-solving. In D. Ifenthaler, P. Pirnay-Dummer, & N. M. Seel (Eds.),
Computer-based diagnostics and systematic analysis of knowledge (pp. 261279). New York,
NY: Springer-Verlag.
Pennebaker, J. W., Booth, R., & Francis, M. (2007). LIWC2007: Linguistic inquiry and word count.
Austin, TX: liwc.net. 2007. Retrieved from liwc.net
Prince, E. (1981). Toward a taxonomy of given-new information. In P. Cole (Ed.), Radical pragmat-
ics (pp. 223256). New York, NY: Academic Press.
Reimann, P. (2009). Time is precious: Variable- and event-centred approaches to process analysis in
CSCL research. International Journal of Computer-Supported Collaborative Learning,4(3),
239257. doi:10.1007/s11412-009-9070-z
Rosen, Y. (2014). Comparability of conflict opportunities in human-to-human and human-to-agent
online collaborative problem solving. Technology, Knowledge and Learning,19, 147174.
Rosé, C. P., & Ferschke, O. (2016). Technology support for discussion based learning: From computer supported collaborative learning to the future of massive open online courses. International Journal of Artificial Intelligence in Education, 26(2), 660–678. doi:10.1007/s40593-016-0107-y
Rosé, C., Wang, Y. C., Cui, Y., Arguello, J., Stegmann, K., Weinberger, A., & Fischer, F. (2008). Analyzing collaborative learning processes automatically: Exploiting the advances of computational linguistics in computer-supported collaborative learning. International Journal of Computer-Supported Collaborative Learning, 3, 237–271.
Rus, V., Lintean, M. C., Banjade, R., Niraula, N. B., & Stefanescu, D. (2013). SEMILAR: The semantic similarity toolkit. In ACL conference system demonstrations (pp. 163–168).
Rus, V., Lintean, M., Graesser, A. C., & McNamara, D. S. (2012). Text-to-text similarity of statements. In P. McCarthy & C. Boonthum-Denecke (Eds.), Applied natural language processing: Identification, investigation, and resolution (pp. 110–121). Hershey, PA: IGI Global.
Rus, V., & Stefanescu, D. (2016). Toward non-intrusive assessment in dialogue-based intelligent tutoring systems. In State-of-the-art and future directions of smart learning (pp. 231–241). Singapore: Springer.
Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50(4), 696–735.
Salas, E., Cooke, N. J., & Rosen, M. A. (2008). On teams, teamwork, and team performance: Discoveries and developments. Human Factors: The Journal of the Human Factors and Ergonomics Society, 50(3), 540–547.
Samei, B., Li, H., Keshtkar, F., Rus, V., & Graesser, A. C. (2014). Context-based speech act classification in intelligent tutoring systems. In S. Trausan-Matu, K. Boyer, M. Crosby, & K. Panou (Eds.), Proceedings of the twelfth international conference on intelligent tutoring systems (pp. 236–241). Berlin: Springer.
Sawyer, R. K. (2014). The new science of learning. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (pp. 1–20). Cambridge: Cambridge University Press.
Shaffer, D. W. (2017). Quantitative ethnography. Madison, WI: Cathcart Press.
Shaffer, D. W., Collier, W., & Ruis, A. R. (2016). A tutorial on epistemic network analysis: Analyzing the structure of connections in cognitive, social, and interaction data. Journal of Learning Analytics, 3, 9–45.
Shaffer, D. W., Hatfield, D., Svarovsky, G., Nash, P., Nulty, A., Bagley, E. A., … Mislevy, R. J. (2009). Epistemic network analysis: A prototype for 21st century assessment of learning. The International Journal of Learning and Media, 1(1), 1–21.
Shaffer, D. W., Ruis, A. R., & Graesser, A. C. (2015). Authoring networked learner models in complex domains. In R. Sottilare, A. C. Graesser, X. Hu, & K. Brawner (Eds.), Design recommendations for intelligent tutoring systems: Authoring tools (Vol. 3, pp. 179–192). Orlando, FL: Army Research Laboratory.
Sottilare, R. A., Burke, C. S., Salas, E., Sinatra, A. M., Johnston, J. H., & Gilbert, S. B. (2017). Designing adaptive instruction for teams: A meta-analysis. International Journal of Artificial Intelligence in Education. doi:10.1007/s40593-017-0146-z
Sottilare, R., Graesser, A. C., Hu, X., & Brawner, K. (Eds.) (2015). Design recommendations
for intelligent tutoring systems: Authoring tools (Vol. 3). Orlando, FL: Army Research
Laboratory.
Sperber, D., & Wilson, D. (1995). Relevance: Communication and cognition. Oxford: Blackwell.
Stahl, G. (2005). Group cognition in computer-assisted collaborative learning. Journal of Computer Assisted Learning, 21(2), 79–90. doi:10.1111/j.1365-2729.2005.00115.x
Suthers, D. D. (2006). Technology affordances for intersubjective meaning making: A research agenda for CSCL. International Journal of Computer-Supported Collaborative Learning, 1(3), 315–337. doi:10.1007/s11412-006-9660-y
Tausczik, Y. R., & Pennebaker, J. W. (2013). Improving teamwork using real-time language feedback. Proceedings of Human Factors in Computing Systems (CHI), 459–468.
Tegos, S., Demetriadis, S., Papadopoulos, P. M., & Weinberger, A. (2016). Conversational agents for academically productive talk: A comparison of directed and undirected agent interventions. International Journal of Computer-Supported Collaborative Learning, 11(4), 417–440. doi:10.1007/s11412-016-9246-2
VanLehn, K. (2006). The behavior of tutoring systems. International Journal of Artificial Intelligence in Education, 16(3), 227–265.
Von Davier, A., & Halpin, P. (2013). Collaborative problem solving and the assessment of cognitive skills: Psychometric considerations. Research Report No. ETS RR-13-41 (pp. 1–42). Educational Testing Service. Retrieved from http://www.ets.org/research/contact.html
Von Davier, A., Zhu, M., & Kyllonen, P. C. (Eds.). (2017). Innovative assessment of collaboration.
New York, NY: Springer.
Zapata-Rivera, D., Jackson, G. T., & Katz, I. (2015). Authoring conversation-based assessment scenarios. In R. Sottilare, X. Hu, A. Graesser, & K. Brawner (Eds.), Design recommendations for adaptive intelligent tutoring systems (Vol. 3, pp. 169–178). Orlando, FL: Army Research Laboratory.