MS of article published in The Curriculum Journal, 19(4): 243-254
Alternative perspectives on learning outcomes:
challenges for assessment
The nature and quality of the outcomes of learning are central to any discussion of
the learner’s experience, from whichever perspective that experience is considered.
For those outcomes to be assessed it is also necessary to articulate in some way the
constructs on which such judgments are based. For the assessments of outcomes to
be valid the inferences drawn from the evidence of learning should be demonstrably
aligned to the learning outcomes.
The project that is reported in this paper, ‘Assessment of Significant Learning
Outcomes’ (ASLO), is a seminar series funded by the Teaching and Learning
Research Programme (TLRP) of the Economic and Social Research Council (ESRC).
It has its origins in two features of debates about the alignment between the outcomes
that are assessed and the programmes of learning to which those assessments purport
to relate. The first of these is the challenge, all too familiar to practitioners and
policy-makers as well as to academics, of maximising the validity of assessments.
The second, less frequently acknowledged, is the way in which the debates about
alignment are conceptualised in different ways across different educational contexts.
Within the UK, discussion of how the procedures for assessing learning outcomes are
aligned takes on a somewhat different character in each of the four constituent
countries and a fundamentally different form when the focus is on the school
curriculum than when, say, workplace learning is the context under consideration.
With the latter consideration in mind, five case studies were chosen to illuminate the
differences in the way alignment is conceptualised:
• A school subject: mathematics education in England.
• Learning to learn: an EC project to develop indicators.
• Workplace learning in the UK.
• Higher education in the UK.
• Vocational education in England.
The aim of each of the context-specific seminars (see note 1) in the ASLO series was to clarify
the terms in which the alignment of assessment procedures to learning outcomes is
discussed. This necessarily involved exploring how, and by whom, control over
programmes of learning is exercised in each context as well as how those who are
engaged in the discussions perceive and express the issues involved. The main aim
was to identify insights that may have implications beyond the context from which
they emerged rather than to develop an overarching conceptual framework that could
be applicable to any context.
The roots of the ASLO project can be found in the work of the Assessment Reform
Group (ARG) and in TLRP’s Learning Outcomes Thematic Group (LOTG). Since its
inception as a response to the policy changes in curriculum and assessment brought in
by the Education Reform Act 1988, the ARG has reviewed the implications for policy
and practice of research on assessment. It has taken a particular interest in the
relationship between assessment and pedagogy (Gardner, 2006) and between
assessment and curriculum, especially through its work on enhancing quality in
assessment (Harlen, 1994). In recent years the assessment/pedagogy interaction has
been a prominent focus of the Group’s work (for example ARG, 2002).
Validity is the natural starting point for the assessment dimension of the project,
drawing on the work of Crooks et al. (1996), Stobart (2008) and others. There are
recurring themes concerning the technical aspects of validity that can be traced across
diverse contexts. Some contributors to the validity debate (Lissitz & Samuelsen,
2007) have made the case for a focus on the content and process of the assessment
itself. Others (Messick, 1989, 1995) argue that consideration of the consequences of
assessment is also necessary because construct validity is undermined if inappropriate
inferences are drawn from the assessment evidence.
[Note 1: The evidence reported here relates to the contexts at the time the seminars were held, between January
and October 2007.]
Whatever position is taken in that debate, any discussion of validity will require an
answer to the question: ‘valid in relation to what?’ A rational consideration of valid
assessment procedures presupposes that the curriculum is expressed clearly enough
for the alignment of the one to the other to be feasible. This in turn assumes that the
constructs of interest are already established, agreed and expressed in unambiguous
terms. In practice, the desired outcomes of learning are often strongly contested and
there is a multiplicity of ways, at every level from programme design through to the
individual student and her/his teacher, of expressing the anticipated outcomes of learning.
The TLRP’s remit has been to sponsor research ‘with the potential to improve
outcomes for learners’. In 2004, a grounded analysis by the Programme’s Learning
Outcomes Thematic Group (LOTG) of the outcomes mentioned in the first thirty
TLRP projects to be funded led it to propose seven categories of outcome:
• Attainment – often school curriculum based or measures of basic competence
in the workplace.
• Understanding – of ideas, concepts, processes.
• Cognitive and creative – imaginative construction of meaning, arts or
• Using – how to practise, manipulate, behave, engage in processes or systems.
• Higher-order learning – advanced thinking, reasoning, metacognition.
• Dispositions – attitudes, perceptions, motivations.
• Membership, inclusion, self-worth – affinity towards, readiness to contribute
to the group where learning takes place.
(James & Brown, 2005, 10-11)
The linking of these two distinct strands of academic debate began with the setting
out of three research questions the ASLO project intended to address:
• What are the significant learning outcomes that are not being assessed in a
system that relies wholly on test-based assessment procedures?
• What are the indicators of student performance which have been / could be
developed in relation to such learning outcomes?
• What are the assessment procedures that do not rely on testing but do give /
could give dependable measures of student performance in relation to those learning outcomes?
Framing the research questions in such terms is indicative of the project team’s
initial concerns about the limited range of outcomes that are seen to be prioritised in
contexts that rely on tests and examinations as the only sources of evidence about
students’ learning. All three questions imply that deficiencies in the alignment of
indicators of students’ learning are attributable to the narrow range of outcomes that it
is feasible to measure if assessment systems make use only of tests or examinations as
sources of evidence of learning.
As the seminar series developed other dimensions of the alignment of assessment to
learning outcomes were identified. The first of these was that a discourse within
which ‘learning outcomes’ are explicit is apparent only in certain of the contexts
under review and in only two of the five contexts is the term itself in widespread use.
In one of these contexts – indicators of ‘learning to learn’ across the countries of the
European Union (EU) – learning outcomes are central to a political debate the terms of
which were established in 2000 by the European Council’s setting of the so-called
‘Lisbon Objectives’ for education and training. In the other – higher education in
the UK – learning outcomes are part of the discourse about learning developed by the
organisation overseeing higher education institutions (HEIs), the Quality Assurance
Agency. In that context, where responsibility for defining outcomes is devolved to
individual institutions, outcomes are being articulated and codified not at a whole
system level but by the teaching staff responsible for the multiplicity of course units
that are brought together in degree programmes.
A second, related dimension that emerged during the course of the seminar series was
the way in which the role that assessment is seen to have in each context colours the
debate in that context about alignment. Not only is the extent to which the term
‘learning outcomes’ has currency variable across the five contexts but it is also clear,
more broadly, that the very nature of ‘curriculum’ and ‘assessment’ is seen in
fundamentally different ways in each context. In the workplace learning context,
‘curriculum’ itself is not a term in common usage; ‘assessment’ relates to becoming
qualified for the workplace. In the National Curriculum in England, ‘curriculum’
has been interpreted by policy-makers as what must be taught in all state-funded
schools; ‘assessment’ is coloured by the extent to which data on student performance
is aggregated and used as an indicator of the quality of schooling. In the vocational
education context, the definition of what is learned in terms of what will be formally
assessed has taken root to such an extent that in many vocational programmes
‘curriculum’ and ‘assessment’ are indistinguishable because all learning activities are
The discussion that follows relates therefore only in part to the project’s initial three
research questions. It reflects also a growing awareness as the seminar series
proceeded that rethinking the alignment of assessment to curriculum calls for a more
fundamental questioning of the terms in which debates about alignment (or, more
commonly, misalignment) are conducted. To echo one of the conclusions of the
TLRP’s Learning Outcomes Thematic Group:
‘The first challenge would be to convince stakeholders that the existing models no longer serve us
well; the second would be to convince them that alternatives are available or feasible to develop.
Alternatives would also need to be succinct, robust and communicable...’ (James & Brown, 2005,
In the course of the ASLO project seminar series a number of themes emerged as
offering insights into some, if not all, of the contexts under review. What follows
here is a brief discussion of each of those themes, illustrated as appropriate by
reference to the relevant contexts.
The reader who is interested in a specific context can refer to reports of each seminar
on the project’s website at: http://www.tlrp.org/themes/seminar/daugherty/index.html
A conference paper by the project team (Daugherty et al., 2007) also includes an
overview of the issues that arose in each of the first four context-specific seminars.
How, and by whom, the constructs involved are defined, interpreted and made real, in
terms of curriculum, pedagogy and assessment practices, has emerged as a major
issue in each of the case study contexts. ‘Construct validity’ has long been a central
concern in the field of assessment without the constructs themselves necessarily being
critically explored or closely defined. Even if the constructs have been considered at
the levels of assessment theory and qualification design, they may not be applied in
the day-to-day practice of assessors. At the other end of the curriculum/assessment
relationship the constructs informing the design of programmes of learning have often
been strongly contested. For example, in school mathematics diverse traditions,
interest groups and constituencies are involved, leading to strongly differentiated
views about desirable learning outcomes.
Only in two of the project’s case studies, school mathematics and learning to learn
indicators, was a particular learning domain in focus but uncertainties about learning
domains and the associated constructs were also apparent in the three sector-focused
studies. For example, none of the participants in the discussion of vocational
education was confident that ‘business studies’ had been adequately defined either by
those designing such programmes as a basis for national qualifications or by teachers
of business studies.
This suggests a need to clarify the constructs within a domain that inform the
development both of the programmes of learning and of the related assessments. That
requires much more than reference to, say, a domain of ‘school mathematics’ as
distinct from, say, ‘academic mathematics’. It calls for answers to the question
‘school mathematics for what?’ and yet, even amongst those who advocate a greater
emphasis on ‘functional mathematics’, there is no consensus on the constructs
underpinning that concept. If learning mathematics in school is to contribute to using
what has been learned to solve everyday ‘real life’ problems, how mathematics is
constructed for that purpose, and the ways in which learning outcomes are assessed,
will differ fundamentally from a mathematics curriculum with different priorities.
In relation to workplace learning, how knowledge is represented, and the ways in
which learners’ capabilities are then assessed, offers a contrast to the construct
dilemmas in school subjects. The types of professional learning that have been
studied by Eraut (2007) all depend to some extent on the novice professional
acquiring a formalised knowledge base. But, crucially, the informal, day-to-day and
tacit nature of necessary knowledge is just as important to effective performance, or
‘capability’. Judgment of those learning outcomes needs to take place in situations
that are as close as possible to the ‘real life’ workplace context. Whereas the hoped-
for subsequent applicability of learning mathematics in school may or may not figure
in ‘school mathematics’, the learner’s ability to deploy his/her knowledge in a real life
situation is central to any conceivable version of, say, ‘nurse education and training’.
In workplace contexts, as Eraut has argued, learning is better understood as ‘complex
performances developed over time’, with formative and summative assessments as
‘windows on the learning trajectories of individuals’.
The language used to define programmes of learning and the way formal
specifications are translated into student experiences differ in each of the five
contexts. The ‘subject benchmarks’ for higher education, developed under the
auspices of the QAA and interpreted at the levels of institutions and course units, are
quite different in form and substance from the statutory regulations for school subjects
in England though the latter are, of course, also mediated and modified in schools and
A central question in all the contexts investigated is who exercises control over the
design of programmes, their implementation and the assessment of outcomes. In two
of the case studies, school mathematics in England and the learning to learn indicators
project, organisations of the state have been prominent. In the other three, workplace
learning, vocational education and higher education, a variable degree of control of
curriculum and assessment is being exerted by such organisations. In all five
contexts, a diverse array of expertise, interest groups and government agencies dabble
in the specification and assessment of learning outcomes, thus contributing to
incoherence and exacerbating the alignment/congruence problem.
Progression is a key concern in the design and implementation of learning
programmes, and in particular for the implementation of assessment for learning.
However, its relevance to summative assessment depends on the structure of the
assessment system. If the only high-stakes summative test is a terminal one, then the
desired final outcomes are laid down, the test constructors have to reflect these in as
valid a way as they can, and the teachers discern, from study of a syllabus and of
examples of the test instruments and procedures, how best to focus their work.
Progression will also be an issue where the focus is on formative assessment. For
example, prior to the introduction of the national curriculum in England, secondary
teachers would apply their own models of progression over the five years of their
subject programmes. Where the focus is on formative assessment, and high-stakes
summative pressures are absent, our case studies show that the models
of progression needed for formative purposes are very diverse. In studies of this
issue in school science and mathematics (Wilson & Draney, 2004; Denvir & Brown,
2004) detailed models have been developed, based on research into the
conceptual understanding of many science topics, with guidance to teachers on how to
use such models in a formative way.
The study by Anderson and Hounsell (2007) of assessment in higher education
presents a very different picture. For Biology, the emphasis is on progression in
relation to the modes and ground rules for developing and communicating new
knowledge, whilst the contrast is sharper still in History, where the aim is for students
to learn to experience participation in historical ways of thinking and acting. For such
aims, the particular topic is a context for the learning, and whilst assessment is
grounded therein, the context as such is not important. This contrast is reflected across
schemes of progression, with explicit and analytic models appropriate at one pole, and
more holistic models reliant on connoisseurship appropriate at the other. Most
subjects attempt a mixture of both approaches.
The other lesson from higher education has been the low profile of regulation, which,
given the lack of debate on alignment between assessment and the curriculum, has left
teachers with freedom to implement their own models of progression. The same is
true, for different reasons, in the case of vocational education, where the priority of
encouraging fragile learners to achieve test success has led to coaching to the test, i.e.
to a shallow model of progression in learning. Workplace learning is different again;
here progression is expressed in terms of development from novice to expert.
However, there are great variations from one context to another, so that it is hard to
codify progression without over-specifying and so undermining professional learning.
The compromise tends to be for qualifications based on minimum competence, with
progression beyond that level left to individual mentoring and appraisal.
In all of these cases, summative assessment requirements, driven by concerns for
uniformity and accountability, constrain the freedom of teachers and trainers to use
their own judgment in nurturing progression. This constraint can become far tighter if
accountability for progression is required, as in the key stage assessment system in
England. When that system was being planned it would have been possible for the
curriculum to have been set out without reference to an explicit model of progression.
For evidence of attainment at the end of each key stage, the assessment system could
have supplied an aggregate mark as the basis for summative grading. However, for
evidence of progress in learning, the system would have to be criterion referenced to
identify for each student the level of progression for which an acceptable degree of
mastery had been attained.
The system of ‘levels of attainment’ that was eventually put in place attempts to
follow the progression model, but the level descriptions that characterise the
presumed progression have been under-researched, and the National Curriculum tests,
facing the formidable difficulty of criterion referencing, are a vehicle for ranking
attainment rather than a source of evidence of progression in learning. Tests, in
mathematics as in other National Curriculum ‘core’ subjects, have become hurdles
which provide little useful information for formative purposes.
The combination of high-stakes external assessment with a loose specification of the
curriculum elevates the status of the test specification and of the tests based upon it,
for it will be to them that teachers look for translation of the vagueness into explicit
requirements. Given that test development is constrained by the conditions and
resources imposed for high-stakes testing, the result is a summative system which is
in tension with formative practices and very weak in its power to reflect or guide
Wilson and Black (2007) draw attention to the paradox that a more tightly prescribed
curriculum might be more helpful to learners: if the curriculum were very tightly
specified in setting out a detailed sequence of progression, it would follow that the
test constructors would have close guidance and would not be faced with deciding
how to interpret vague statements of aims in formulating specific and concrete
questions. Thus the power of the test constructors, which they may well find an
embarrassment, would be reduced, whilst that of the curriculum writers would be
enhanced. If the sequence of progression were well founded in relation to models of
learning in each subject discipline, then there could be better synergy between
assessment and effective pedagogy. Limitations in the criterion referencing for the
testing, and in constraints on test validity, would still be obstacles to be tackled, but
there might be a clearer basis for tackling them.
Assessment procedures and their impacts
Another major issue to emerge across the case study contexts was the impact of
assessment procedures on the alignment between intended or desirable outcomes from
learning and those outcomes which actually emerge. From a measurement
perspective, alignment is often conceived quite narrowly – in terms of content validity
– where misalignment between an assessment instrument and intended learning
outcomes represents a threat to the integrity of inferences from assessment results.
However, it can be conceived more broadly too, where misalignment represents a
threat to the integrity of learning itself. This resonates with the notion of systemic
validity, as discussed by Frederiksen and Collins (1989):
“A systemically valid test is one that induces in the education system
curricular and instructional changes that foster the development of the
cognitive skills that the test is designed to measure.” (p.27)
The case study contexts highlighted numerous situations in which the nature of an
assessment procedure threatened to disrupt the acquisition of desirable learning
outcomes by students. This disruption occurred when assessment procedures led
either to the failure to acquire desirable outcomes from learning, or to the acquisition
of undesirable outcomes from learning. In both cases, potential impacts were
attributable either to the design of the assessment instrument or to the nature of the
assessment event itself.
The failure to acquire desirable outcomes from learning
Some of the impacts attributable to the design of an assessment instrument occur
when only a sub-set of intended learning outcomes are, or can be, routinely assessed.
In measurement terms, to design an instrument to this specification would involve
intentional construct under-representation. This threat was evident in the school
mathematics context, which was characterised by the requirement for short tests to
cover a very full curriculum. This tends to rule out ‘real world’ problem-solving test
items requiring extended thinking and analysis. It is also evident in the assessment of
English, where speaking and listening are central elements of the National Curriculum
in England but do not feature in national curriculum tests.
Interestingly, this threat was often avoided in workplace contexts. The very idea of a
single curriculum is inappropriate here, given that desirable learning outcomes vary within
workplace learning contexts, where each learner will have specific workplace
experiences (e.g. nurses in different wards acquiring subtly different sets of learning
outcomes). In this situation, a standardised assessment format would seriously risk
channelling learners into common learning trajectories, potentially leaving them unfit
for the specific requirements of their particular roles.
Another set of impacts occurs when an assessment is designed to assess certain
intended learning outcomes, but fails to assess them in practice. In measurement
terms, this would reflect unintentional construct under-representation. This threat was
perhaps most salient in the higher education context, where an increasing codification
of learning outcomes seemed to be associated with a decreasing ability to reward high
quality learning. There was a sense of the sum of the parts (the individual learning
outcomes which were amenable to description) being unavoidably smaller than the
whole (the high quality learning which was far less amenable).
For both of these kinds of impact, the mechanism by which disruption may occur is
as follows: students do not need to acquire the desirable learning outcomes to succeed
on the assessment; either they, or their teachers, or everyone realises this; either their
teachers decide not to teach these outcomes or the students decide not to acquire them.
Impacts attributable to the nature of the assessment events, rather than to the design
of the assessment instruments, occur when the assessment fails to facilitate the
acquisition of important learning outcomes, as might otherwise be intended. That is,
where the assessment fails as a pedagogical tool in its own right. Again, this was
particularly salient in the higher education context, where the increased codification
of assessment objectives was reported as hampering effective feedback to students.
Subtle feedback relating to higher level outcomes was being replaced by formulaic
feedback relating to lower level ones. A similar threat was observed in the workplace
context where codified assessment arrangements will tend to inhibit the kind of
informal conversations and feedback that are crucial to effective workplace learning.
The acquisition of undesirable outcomes from learning
A different type of impact attributable to the design of an assessment instrument
occurs when success on the assessment can be optimised by the acquisition of
undesirable, construct-irrelevant learning outcomes. The most obvious of undesirable
learning outcomes is cheating behaviour. As above, the mechanism by which
disruption occurs is straightforward: students discover that they can succeed on the
assessment using construct-irrelevant techniques; so they decide to do so; and these
techniques become the primary outcomes from learning. This aspect of the impact of
assessment procedures was not prominent in the evidence reviewed by the ASLO
project. However, recent changes to coursework requirements for the GCSE
examination in England, in maths and in other subjects, have been prompted not only
by doubts about the educational value of stereotyped responses to coursework tasks
but also by concerns about the prevalence of cheating.
Finally, impacts attributable to the nature of the assessment event also occur when
the assessment process actively corrupts the learning process. This threat was
observed in the vocational context, as fragmented assessment procedures combined
with poorly conceived curricula disposed students towards instrumental approaches to learning.
Reassuringly, there seemed to be no necessary or direct causal relationship between
assessment procedures and their impacts. However, we should be acutely aware of
consequences arising from the uses to which assessment results are put. The higher
the stakes associated with those uses the more sensitive a system may be to
System-level accountability as a driver of alignment
Accountability takes very different forms, has different purposes and stakeholders
and has different effects on the interpretation of learning outcomes within each of the
assessment systems we reviewed in the seminar series. Two of the case studies in
particular reveal just how influential the political imperatives for system level
accountability can be in determining the role of assessment in shaping relevant
constructs but, perhaps more crucially, in shaping how teachers and students then
interpret and enact those constructs.
The fierce debates that have surrounded the Mathematics National Curriculum from
its inception to recent calls for it to be reshaped in favour of ‘functional’ mathematics
show both the effects of target-driven measures and the shifting emphasis towards
aggregate data on pupil attainment as an indicator of the performance of teachers,
schools, local authorities and the whole system.
Such problems with alignment in this context become less a matter of how valid a
national test in mathematics is as a measure of ‘school mathematics’ and more a
question of how valid the test is as a source of data on system performance.
Predictably perhaps, this affects how teachers and students regard both the teaching
that leads to the test, and the test itself, with pedagogy geared increasingly towards
enhancing pupil performance. In this context, educational questions about the nature,
purpose and content of the mathematics curriculum, and how far mathematics should
be functional or more broadly-based, are distorted by different, often implicit
expectations about what counts as valid assessment for different, competing purposes.
Evidence from the EU Learning to Learn Indicators project (Fredriksson & Hoskins,
2007) made clear that this is another case of an assumed learning domain and its
associated constructs being shaped by accountability considerations, albeit with a
view to attaching due importance to a concept that is widely held to be vital for
learning in rapidly changing knowledge economies. As can be seen to occur in other
education systems where the main policy driver is system accountability (Herman &
Haertel, 2005), it is probable that any adoption by the EU of learning to learn as an
indicator of system performance would have substantial ‘washback’ effects. In this
case, the effect could be amplified because the pilot indicator, if adopted, would be
used to draw comparisons across national systems.
The case study on vocational education in England showed system-level
accountability as being far less driven by debates about what counts as valid
vocational education outcomes and measures of those outcomes, within specific
vocational subjects. Indeed, the porous and insecure nature of many vocational
subjects, and the very diverse purposes of vocational education, mean that
accountability is not derived from learning outcomes and measures of those outcomes.
Instead, political targets for retention, student progression and achievement and
teachers’ concerns that young people who are seen to be educationally and socially
disadvantaged should achieve qualifications, pressurise schools and colleges to coach
students through assessment criteria in order to maximise achievement.
Accountability is also affected by the demands of awarding bodies for standardised
achievement rates and grading decisions across subjects and between very diverse
providers. This in turn reinforces a compliant, instrumental emphasis on maximising
achievement and showing exactly how grades are arrived at.
It can be argued, then, that system level accountability in vocational education places
much more political emphasis on ‘delivering’ targets for retention, progression,
participation and achievement in general terms, rather than notions of educational or
cognitive progression and achievement within clearly defined and debated subject
domains. In turn, the dominance of teacher assessment in vocational education leads
to strongly regulated moderation and verification procedures to standardise
judgements. The combined effect is highly regimented and instrumental approaches
to both formative and summative assessment.
The ASLO project’s five case studies have mined a rich seam of examples of student
learning outcomes being socially constructed in diverse ways. In the two case study
contexts where desirable learning outcomes are formally specified through state-led
regulatory procedures the constructs involved are strongly and openly contested. In
the other three case studies, focussing on sectors of education within the UK rather
than on particular domains, the contestation of learning outcomes is also evident.
Each of the case studies exhibits the contested nature of the outcomes in a different
way. For school mathematics in England, the state controls the definition of the
subject but it is operationalised through the high stakes tests of students at ages 11 and
14 and then mediated through the actions of teachers and students. The search by
EU-funded groups for a pragmatic definition of ‘learning to learn’ and the devising of
pilot indicators is an equivalent process on a cross-national basis. The political
imperative to identify indicators has brought about a situation that McCormick (2007,
1) has characterised as ‘the proverbial assessment tail wagging the curriculum dog’.
On vocational education programmes in England, tightly-drawn specifications for
qualifications set goals for learners that are then interpreted and mediated in a range