COMMENT
Published: 1 March 2017 | Nature Human Behaviour 1, 0028 (2017) | DOI: 10.1038/s41562-016-0028 | www.nature.com/nathumbehav

Towards artificial intelligence-based assessment systems
Rose Luckin
‘Stop and test’ assessments do not rigorously evaluate a student’s understanding of a topic. Artificial intelligence-based assessment provides constant feedback to teachers, students and parents about how the student learns, the support they need and the progress they are making towards their learning goals.
Decades of research have shown that knowledge and understanding cannot be rigorously evaluated through a series of 90-minute exams. The prevailing exam paradigm is stressful, unpleasant, can turn students away from education, and requires that both students and teachers take time away from learning. And yet globally we persist in relying on these blunt instruments, sending students off to universities and the workplace ill-equipped for their futures.
Perhaps one reason for the long-lasting persistence of ‘stop and test’ forms of assessment is that the alternatives available so far have been unattractive and equally, or even more, unreliable than current examination systems. For example, within the school education system, marks from work that students complete as part of their course have formed part, or all, of their exam result. Fears about the extent to which such coursework is truly the sole work of the student have reduced the attractiveness of this option and we have moved back towards exams. In higher education, ‘open book exams’ have been used to reduce the pressure on students to remember large amounts of information. This type of approach can help, but it tackles only a small part of the overall problem, in this case the pressure on memory. Other stressful and unreliable features remain, such as the exam conditions, the very limited range of the assessment, and the accuracy of marking.
However, the situation is now different and a realistic and economically attractive alternative lies at our fingertips. We have the technology to build a superior assessment system, one based on artificial intelligence (AI), but we now need to see if we have the social and moral appetite to disrupt tradition.
AI is everywhere
AI can be defined as the ability of computer systems to behave in ways that we would think of as essentially human. AI systems are designed to interact with the world through capabilities, such as speech recognition, and intelligent behaviours, such as assessing a situation and taking sensible actions towards a goal1. The use of AI in our day-to-day life has increased dramatically: we use the intelligent search behind Google, the AI voice recognition and knowledge management in the iPhone’s personal assistant, Siri, and navigation tools such as Citymapper to help us travel effectively in cities. Clever AI has penetrated general use to become so useful that it is not labelled as AI any more2. We trust it with our personal, medical and financial data without a thought, so why not trust it with the assessment of our children’s knowledge and understanding?
AI and assessment
The application of AI to education has been the subject of academic research for more than 30 years, with the aim of making “computationally precise and explicit forms of educational, psychological and social knowledge which are often left implicit”3. The evidence from existing AI systems that assess learning as well as provide tutoring is positive with respect to their assessment accuracy4. AI is a powerful tool to open up the ‘black box of learning’, by providing a deep, fine-grained understanding of when and how learning actually happens.

Figure 1 | A simple Open Learner Model for tracking how a child is using the help facilities of a piece of science software. The map in the dialogue box entitled ‘Activities’ depicts the area of the curriculum that the child is studying, with each node representing a curriculum topic. When the user clicks on a node in this map, the bar chart below and to the left of the map indicates the level of difficulty of the work that the child has completed while working on this topic, and the dots on the ‘dice’ below and to the right of the map indicate how much help the child has received. Figure courtesy of Ecolab (Luckin, 2016).
In order to open this black box of learning, AI assessment systems need information about: (1) the curriculum, subject area and learning activities that each student is completing; (2) the details of the steps each student takes as they complete these activities; and (3) what counts as success within each of these activities and within each of the steps towards the completion of each activity.
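To make these three kinds of information concrete, the following is a minimal sketch, in Python, of how such records might be structured. All class and field names here are illustrative assumptions, not part of any system described in this article.

```python
from dataclasses import dataclass

@dataclass
class Activity:
    """(1) Curriculum context: what the student is working on."""
    curriculum_topic: str      # e.g. a node in the curriculum map of Fig. 1
    subject_area: str          # e.g. "science"
    difficulty_level: int      # difficulty of the task set

@dataclass
class StepRecord:
    """(2) One step the student takes within an activity."""
    description: str
    correct: bool
    hints_used: int

@dataclass
class SuccessCriteria:
    """(3) What counts as success for an activity and its steps."""
    min_correct_steps: int
    max_hints_allowed: int

    def is_met(self, steps: list) -> bool:
        correct = sum(1 for s in steps if s.correct)
        hints = sum(s.hints_used for s in steps)
        return correct >= self.min_correct_steps and hints <= self.max_hints_allowed
```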
AI techniques, such as computer modelling and machine learning, are applied to this information and the AI assessment system forms an evaluation of the student’s knowledge of the subject area being studied. AI assessment systems can also be used to assess students’ skills, such as collaboration and persistence, as well as students’ characteristics, such as confidence and motivation. The information collection and processing carried out by an AI assessment system to form an evaluation of each student’s progress takes place over a period of time. Unlike the 90-minute exam, this period of time may be a whole school semester, a year, several years or more.
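As a concrete illustration of the kind of modelling involved, the sketch below uses Bayesian Knowledge Tracing, a standard student-modelling technique from the AI-in-education literature. The article does not name a specific algorithm, so this choice, and the parameter values, are assumptions for illustration only.

```python
def bkt_update(p_known: float, correct: bool,
               p_slip: float = 0.1, p_guess: float = 0.2,
               p_learn: float = 0.15) -> float:
    """One Bayesian Knowledge Tracing step: revise the probability that a
    student has mastered a skill, given one observed response.
    Parameter values are illustrative assumptions."""
    if correct:
        num = p_known * (1 - p_slip)
        posterior = num / (num + (1 - p_known) * p_guess)
    else:
        num = p_known * p_slip
        posterior = num / (num + (1 - p_known) * (1 - p_guess))
    # Allow for learning between practice opportunities.
    return posterior + (1 - posterior) * p_learn

# Assessment as a stream of evidence over a semester, not one exam sitting:
p = 0.3
for outcome in [True, False, True, True, True]:
    p = bkt_update(p, outcome)
print(f"estimated mastery: {p:.2f}")
```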
The output from AI assessment software provides the ingredients that can be synthesized and interpreted to produce visualizations (Fig. 1). These visualizations, referred to as Open Learner Models (OLMs), represent a student’s knowledge, skills or resource requirements, and they help teachers and students understand their performance and its assessment5. For example, an AI assessment system collects data about students’ achievements, their emotional state, or motivation. This data can be analysed and used to create an OLM to: (1) help teachers understand their students’ approach to learning so as to shape their future teaching appropriately; and (2) help motivate students by enabling them to track their own progress and encouraging them to reflect on their learning.
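Here is a minimal sketch of how the two per-topic quantities shown in the Fig. 1 display (difficulty level reached and help received) might be aggregated from raw interaction records. The field names are assumptions; the real Ecolab display is richer.

```python
def olm_summary(records: dict) -> dict:
    """Aggregate per-topic interaction records into the two quantities the
    Fig. 1 display shows for each curriculum node: the difficulty level of
    completed work and the amount of help received."""
    summary = {}
    for topic, steps in records.items():
        summary[topic] = {
            "max_difficulty": max(s["difficulty"] for s in steps),
            "total_help": sum(s["hints_used"] for s in steps),
        }
    return summary

print(olm_summary({
    "forces": [{"difficulty": 1, "hints_used": 2},
               {"difficulty": 2, "hints_used": 0}],
}))
```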
AIAssess (Box 1) is a generic AI assessment system that exemplifies just one approach to assessing how much a student knows and understands. The system is suitable for subjects such as mathematics or science and is based on existing research tools6,7. However, there are many different AI techniques, such as natural language processing, speech recognition and semantic analysis, that can be used to evaluate student learning, and an appropriate mix of tools would be required for other subjects, such as spoken language or history, and skills such as collaborative problem-solving.
The cost of AI assessment
Building AI systems is not cheap and a large-scale project would certainly need extremely careful management. There is no reliable estimate of the cost of a scaled-up AI assessment system that could assess multiple school subject areas and skills.
One way of getting a glimpse of the scale of initial investment needed to develop a national AI assessment system would be to look at the costs of other large AI projects. In January 2016, the Obama administration announced that it planned to invest US$4 billion over a decade (US$400 million per year) to make autonomous vehicles viable8, and in November 2015, Toyota committed to an initial investment of US$1 billion over the next five years (US$200 million per year) to establish and staff two new AI and robotics research and development operation centres9.
Box 1 | AIAssess

AIAssess is intelligent assessment software designed for students learning science and mathematics: it assesses as students learn. AIAssess was developed by researchers at UCL Knowledge Lab through multiple evaluated implementations5,6. Specifically, AIAssess provides activities that assess and develop conceptual knowledge by offering students differentiated tasks of increasing levels of difficulty as the student progresses.
In order to ensure that the student keeps persevering, AIAssess provides different levels of hints and tips to help the student complete each task. It assesses each student’s knowledge of the subject matter, as well as their metacognitive awareness, that is, their knowledge of their own ability and learning needs, which is a key skill possessed by effective students and a good predictor of future performance.
To assess each student’s progress AIAssess uses: a Knowledge Component that stores AIAssess’s knowledge about science and mathematics so that it can check if each student’s work is correct; an Analytics Component that collects and analyses data about each student’s interactions with the software; and a Student Model Component that constantly calculates and stores what AIAssess judges to be each student’s subject knowledge and metacognitive awareness.
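A hedged sketch of how these three components might fit together follows; the interfaces are assumptions for illustration, not the actual AIAssess internals.

```python
class KnowledgeComponent:
    """Domain knowledge: can check individual solution steps, not just answers."""
    def check_step(self, task_id: str, step: str) -> bool:
        raise NotImplementedError  # domain-specific rules live here

class AnalyticsComponent:
    """Collects each interaction: steps taken, hints used, task difficulty."""
    def __init__(self):
        self.events = []

    def log(self, student_id, task_id, step_correct, hints_used, difficulty):
        self.events.append(
            (student_id, task_id, step_correct, hints_used, difficulty))

class StudentModelComponent:
    """Constantly recalculated judgements of knowledge and metacognition."""
    def __init__(self):
        self.judgements = {}  # student_id -> current estimates

    def update_from(self, analytics: AnalyticsComponent):
        # Strengthen or weaken judgements from the logged evidence
        # (see the update sketch later in this box).
        pass
```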
The AIAssess Knowledge Component is fine-grained so that it can generate correct and incorrect steps toward a solution, not just correct and incorrect answers. For any given task that the student is required to perform, AIAssess can generate all possible steps that a student might take as they complete each task.
The AIAssess Analytics Component collects each student’s interactions with the software. Specifically, it collects data about each step the student takes towards a task solution, the number of hints or tips that the student requires to successfully complete each step and each task, and the difficulty level of each task the student completes.
The AIAssess Student Model Component uses outputs from the Analytics Component to strengthen or weaken its judgement about every student’s:
• Knowledge and understanding of each concept in a mathematics or science curriculum, by assessing each student’s ability to complete a solution step, or entire task, correctly without any hints or tips.
• Potential for development in their knowledge and understanding of each concept in a mathematics or science curriculum, by assessing each student’s ability to complete a solution step, or entire task, correctly with a particular level of hints or tips.
• Metacognitive awareness of their knowledge and understanding, and the extent to which they need to use hints and tips to succeed, by assessing each student’s accuracy in determining the level of hints or tips they need in order to complete a solution step correctly, and in evaluating the level of difficulty at which they can succeed (a minimal sketch of such an update follows below).
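The update rule and weights below are illustrative assumptions, not the published AIAssess algorithm; the sketch simply shows how one completed step could strengthen or weaken the three judgements just listed.

```python
def update_judgements(j: dict, correct: bool, hints_used: int,
                      hints_requested: int, rate: float = 0.1) -> dict:
    """j holds three estimates in [0, 1]: 'knowledge', 'potential' and
    'metacognition'. Nudge each towards the evidence from one step."""
    def nudge(value, evidence):
        return value + rate * (evidence - value)

    # Knowledge: success with no help at all.
    j["knowledge"] = nudge(j["knowledge"], float(correct and hints_used == 0))
    # Potential for development: success at some level of scaffolding.
    j["potential"] = nudge(j["potential"], float(correct))
    # Metacognitive awareness: asked for roughly the help actually needed.
    j["metacognition"] = nudge(
        j["metacognition"], float(correct and hints_requested == hints_used))
    return j

j = {"knowledge": 0.5, "potential": 0.5, "metacognition": 0.5}
j = update_judgements(j, correct=True, hints_used=1, hints_requested=1)
print(j)
```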
At any point in time, AIAssess can produce a visualization (Fig. 1) that illustrates its judgements about a student’s performance on a particular task, across a set of tasks, and across all tasks completed. This Open Learner Model can be interrogated so that teachers and learners can trace the evidence that supports each judgement the software makes.
If we add the estimated costs of these two projects, this suggests a budget of US$600 million per year for a complex AI project. It therefore seems reasonable to suggest that a country such as England might need to spend the equivalent of US$600 million (£500 million) per year to make AI assessment a reality for a set of core subjects and skills, at least to start with, until the upfront system development costs have been covered and the focus could shift to maintenance and improvement.
It is also hard to estimate the cost of the current exam system to make any comparison. There are no publicly available up-to-date data about the costs of the existing English exam system. The most recent information is in a 2005 report, which was prepared by PricewaterhouseCoopers for the then exam regulator, the Qualifications and Curriculum Authority (QCA)10. This report estimated the cost of the English school exam system as £610 million per annum (Table 1).
If we use Bank of England historical inflation rate data to convert this to a figure for 2015, then the figure is about £845 million (US$1.03 billion), a cumulative inflation factor of roughly 1.39. Although the English examination system is not the same in 2016 as it was in 2005, it is not simpler and is unlikely to be any less expensive, so a figure of £845 million as an estimate of the cost of the English exam system in 2016 seems conservative. Although designing a nationwide learning assessment system may well be more complex than designing autonomous vehicles, comparing the level of investment in an existing complex AI project to the cost of the current examination system in England puts the enterprise of building such a system within a realistic context.
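The conversion itself is simple compound-growth arithmetic; the factor below is inferred from the two figures given in the text, not quoted from the Bank of England series.

```python
cost_2005 = 610   # £ million, PwC estimate for 2005 (Table 1)
cost_2015 = 845   # £ million, the article's inflation-adjusted figure

implied_factor = cost_2015 / cost_2005            # ~1.385 over ten years
avg_annual_inflation = implied_factor ** 0.1 - 1  # ~3.3% per year
print(f"factor: {implied_factor:.3f}, average annual: {avg_annual_inflation:.1%}")
```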
We also need to bear in mind that the initial outlay for an AI assessment system would be much greater than the ongoing development and maintenance costs. This is in contrast to the human-resource-heavy exam systems, for which the costs inevitably rise each year due to the increasing numbers of students, and therefore the increasing number of examiners, and to inflation.
Social equality
The benefits of developing an AI assessment approach go beyond economics. Education is the key to changing people’s lives, and yet the changes that education makes to people’s lives are not always for the better. The less able and poorer students in society are generally least well served by education systems. Wealthier families can afford to pay for the coaching and tutoring that can help students access the best schools and pass exams. AI would provide a fairer, richer assessment system that would evaluate students across a longer period of time and from an evidence-based, value-added perspective. It would not be possible for students to be coached specifically for an AI assessment, because the assessment would be happening ‘in the background’ over time, without necessarily being obvious to the student. AI assessment systems would be able to demonstrate how a student deals with challenging subject matter, how they persevere and how quickly they learn when given appropriate support. In addition, national AI assessment systems would offer support and formative feedback to help students improve.
Ethical concerns
The ethical questions around AI in general are equally, if not more, acute when it comes to education. For example, the sharing of data introduces a host of challenges, from individual privacy to proprietary intellectual property concerns. If we are to build scaled AI assessment systems that will be welcomed by students, teachers and parents, it will be essential to work with educators and system developers to specify data standards that prioritize both the sharing of data and the ethics underlying data use. It is also essential that we use the older AI approaches that involve modelling as well as the more modern machine-learning techniques. The modelling approach to AI can make the AI system’s reasoning transparent in a way that machine-learning techniques cannot, and it will be essential to be able to explain the assessment decisions made by any AI assessment system and constantly provide informative feedback to students, teachers and parents.
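To illustrate the transparency point, here is a hedged sketch of a model-based judgement that carries its own evidence trace, echoing the interrogable Open Learner Model of Box 1; the structure is an assumption for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Judgement:
    claim: str        # e.g. "understands photosynthesis"
    estimate: float   # current strength of the judgement, 0-1
    evidence: list    # human-readable reasons, one per model update

def explain(judgement: Judgement) -> str:
    """Render an assessment decision with the evidence behind it."""
    lines = [f"{judgement.claim}: {judgement.estimate:.0%}, because:"]
    lines += [f"  - {reason}" for reason in judgement.evidence]
    return "\n".join(lines)

j = Judgement("understands photosynthesis", 0.8,
              ["solved three tasks at difficulty level 2 without hints",
               "self-selected hint level matched actual need twice"])
print(explain(j))
```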
Looking forward
How do we progress from the current system to achieve a step change in assessment using AI? We need to advance on three fronts. Socially, we need to engage teachers, learners, parents and other education stakeholders to work with scientists and policymakers to develop the ethical framework within which AI assessment can thrive and bring benefit. Technically, we need to build international collaborations between academic and commercial enterprises to develop the scaled-up AI assessment systems that can deliver a new generation of exam-free assessment. And politically, we need leaders to recognize the possibilities that AI can bring to drive forward much-needed educational transformation within tightening budgetary constraints. Initiatives on these three fronts will require financial support from governments and private enterprise working together. Initially, it may be more tractable to focus on a single subject area as a pilot project. This approach would enable us to firm up the costs and demonstrate the benefits so that we can free teachers and students from the burden of examinations. ❐
Rose Luckin is Professor of Learner Centred Design, UCL Knowledge Lab, Institute of Education, University College London, 23–29 Emerald Street, London WC1N 3QS, UK.
e-mail: r.luckin@ucl.ac.uk
References
1. Luckin, R., Holmes, W., Griffiths, M. & Forcier, L. B. Intelligence Unleashed: An Argument for AI in Education (Pearson, 2016); http://go.nature.com/2jwF0zx
2. Bostrom, N. & Yudkowsky, E. in Cambridge Handbook of Artificial Intelligence (eds Frankish, K. & Ramsey, W. M.) 316–334 (Cambridge Univ. Press, 2011).
3. Self, J. Int. J. Artif. Intell. Educ. 10, 350–364 (1999).
4. Hill, P. & Barber, M. Preparing for a Renaissance in Assessment (Pearson, 2014).
5. Mavrikis, M. Int. J. Artif. Intell. Tools 19, 733–753 (2010).
6. Luckin, R. & du Boulay, B. Int. J. Artif. Intell. Educ. 26, 416–430 (2016).
7. Bull, S. & Kay, J. Int. J. Artif. Intell. Educ. 17, 89–120 (2007).
8. Spector, M. & Ramsey, M. U.S. proposes spending $4 billion to encourage driverless cars. The Wall Street Journal (14 January 2016); http://go.nature.com/2jZePEM
9. Toyota will establish new artificial intelligence research and development company. Toyota http://bit.ly/2jRt1gW (5 November 2015).
10. Memorandum Submitted by Association of School and College Leaders (ASCL) (UK Parliament, 2007); http://go.nature.com/2jpIBBN
Competing interests
The author declares no competing interests.
Table 1 | The cost of the English examination system (2005).

                                     Direct costs   Time costs   Total
QCA core costs                       £8m            –            £8m
QCA NCT costs                        £37m           –            £37m
Awarding body costs                  £264m          –            £264m
Exam centres: invigilation           –              £97m         £97m
Exam centres: support and sundries   £61m           £9m          £70m
Exam centres: exams officers         –              £134m        £134m
Total costs                          £370m          £240m        £610m

Source: a memorandum submitted by the Association of School and College Leaders (ASCL) to the House of Commons Select Committee on Children, Schools and Families10. NCT, national curriculum tests.