The predictive validity of A-level grades and teacher-predicted grades in UK medical school applicants: A retrospective analysis of administrative data in a time of COVID-19
I C McManus a,b,e,*
i.mcmanus@ucl.ac.uk ORCID 0000-0003-3510-4814
Katherine Woolf a,c,e
k.woolf@ucl.ac.uk ORCID 0000-0003-4915-0715
David Harrison a,e
david.harrison@ucl.ac.uk ORCID 0000-0002-6639-6752
Paul A Tiffin b,d,f
paul.tiffin@york.ac.uk ORCID 0000-0003-1770-5034
Lewis W Paton b,f
lewis.paton@york.ac.uk ORCID 0000-0002-3328-5634
Kevin Yet Fong Cheung b,g
Cheung.K@cambridgeenglish.org ORCID 0000-0002-9548-2932
Daniel T. Smith a,b,h
daniel.smith@gmc-uk.org ORCID 0000-0003-1215-5811
a UKMEDP089 project
b UKMEDP051 project
c UKMEDP089 Principal Investigator
d UKMEDP051 Principal Investigator
e Research Dept of Medical Education, University College London, London WC1E 6BT, UK
f Department of Health Sciences, University of York, Heslington, York, YO10 5DD, UK
g Cambridge Assessment, Shaftesbury Road, Cambridge CB2 8EA, UK
h General Medical Council, Regent’s Place, 350 Euston Road, London NW1 3JN, UK
* Author for correspondence: i.mcmanus@ucl.ac.uk
Abstract
Objectives
To compare in UK medical students the predictive validity of attained A-level grades and
teacher-predicted A-levels for undergraduate and postgraduate outcomes. Teacher-
predicted A-level grades are a plausible proxy for the teacher-estimated grades that
replaced UK examinations in 2020 as a result of the COVID-19 pandemic. The study also
models the likely future consequences for UK medical schools of replacing public A-level
examination grades with teacher-predicted grades.
Design
Longitudinal observational study using UK Medical Education Database (UKMED) data.
Setting
UK medical education and training.
Participants
Dataset 1: 81,202 medical school applicants in 2010 to 2018 with predicted and attained A-
level grades. Dataset 2: 22,150 18-year old medical school applicants in 2010 to 2014 with
predicted and attained A-level grades, of whom 12,600 had medical school assessment
outcomes and 1,340 had postgraduate outcomes available.
Outcome measures
Undergraduate and postgraduate medical examination results in relation to attained and
teacher-predicted A-level results.
Results
Dataset 1: Teacher-predicted grades were accurate for 48.8% of A-levels, over-predicted in
44.7% of cases and under-predicted in 6.5% of cases. Dataset 2: Undergraduate and
postgraduate outcomes correlated significantly better with attained than with teacher-
predicted A-level grades. Modelling suggests that using teacher-estimated grades instead of
attained grades will mean that 2020 entrants are more likely to under-attain compared with
previous years, 13% more gaining the equivalent of the lowest performance decile and
16% fewer reaching the equivalent of the current top decile, with knock-on effects for
postgraduate training.
Conclusions
The replacement of attained A-level examination grades with teacher-estimated grades as a
result of the COVID-19 pandemic may result in 2020 medical school entrants having
somewhat lower academic performance compared to previous years. Medical schools may
need to consider additional teaching for entrants who are struggling, or who might need
extra support for missed aspects of A-level teaching.
Strengths and limitations of this study
This is the first comparison of the predictive validity of teacher-predicted and attained A-
level grades for performance in undergraduate and postgraduate assessments five to eight
years later.
The large sample size of all UK medical applicants from 2010 to 2018 provides adequate
statistical power, and the complete population data means the results are unlikely to be
biased.
The teacher-predicted grades are those provided by schools as a part of university application, and probably form a good proxy for the “centre assessment grades” introduced by Ofqual during the COVID-19 crisis of 2020.
This study includes medical school applicants only, so generalisability to students on other university courses is uncertain; however, the over-prediction of grades we find in medical school applicants is similar to that found elsewhere for university applicants in general.
Background
“… the … exam hall [is] a level playing field for all abilities, races and genders to get the
grades they truly worked hard for and in true anonymity (as the examiners marking don’t
know you). [… Now we] are being given grades based on mere predictions.” Yasmin Hussein,
letter to The Guardian, March 29th 2020 1.
“[Let’s] be honest, this year group will always be different…” Dave Thomson, blogpost on
FFT Educational Lab 2
“One headmistress commented that ‘entrance to university on teachers’ estimates may be
fraught with unimagined difficulties’. … If there is in the future considerable emphasis on
school assessment, some work of calibration is imperatively called for.” James Petch, December 1964 3.
UK schools closed on 20th March 2020 in response to the COVID-19 pandemic, and Key Stage 5 [Level
3] public examinations such as A-levels and SQA assessments were cancelled for summer 2020, and
replaced by a complex system involving teacher assessments of the grades students would have
achieved had they taken the examinations. A-levels and SQA assessments, like other national
examinations in the UK, are normally set and marked anonymously by examination boards which are
entirely separate from schools, and teachers usually play no part in this external assessment process.
A-levels are good predictors of performance at university in general4, and at medical schools
specifically5 6. Within this context the present paper compares achieved A-level grades with teacher-
predicted grades, and in particular considers their relative predictive validities for educational
outcomes at UK medical schools. The analyses were originally described in May 2020 and published
as a preprint 7 while events were still ongoing and outcomes were not known. The present paper
maintains much of that structure, and while mostly looking forward from 2020, also in part looks
back from the perspective of 2021, meaning that past, present and future tenses are intermingled.
On April 3rd 2020 Ofqual (Office of Qualifications and Examinations Regulation) in England announced that A-level, GCSE and other exams under its purview would be replaced by Calculated Grades, at the core of which are teachers’ estimates of the grades that their students would attain (called Centre Assessment Grades, CAGs), which would then be moderated by Ofqual using a computer algorithm which included the prior performance of the school attended by candidates (see the Calculated Grades subsection below for details). The Scottish Qualifications Authority (SQA)
and other national bodies also announced similar processes for their examinations. Inevitably the announcement of Calculated Grades resulted in confusion and uncertainty among examination candidates, particularly those needing A-levels or SQA Advanced Highers [a] to meet conditional offers for admission to university in autumn 2020. Universities also faced a major problem for student selection, having lost A-levels, which are “the single most important bit of information [used in selection]” 8.
Some of the tensions implicit in Calculated Grades are well seen in the quotation above by Yasmin
Hussein, a GCSE student in Birmingham, with its clear emphasis that a key strength of current
examination systems such as GCSEs, A-levels and similar qualifications, is their anonymity and
externality, with assessors who know nothing of the students whose work they are marking. In contrast, the replacement of actual grades attained in the exam hall with what Hussein describes as ‘mere predictions’ raises a host of questions, not the least being the possibility of bias when judgements are made by teachers.
[a] SQA Highers are taken the year before (rather like AS-levels used to be) and therefore they will be available for 2020 applicants. Advanced Highers will not be available and will be estimated.
Context of the current paper and the situation at the time of writing
Since the appearance of COVID-19 in Europe in early 2020, the situation has been and still is rapidly
changing. As mentioned earlier, this paper was originally written in May 2020, but was revised and
submitted to the journal, essentially as the preprint but with some additions, in November 2020, when Europe was in the midst of a second wave and England, Wales, Scotland and Northern Ireland in a second national lockdown. The paper took almost six months to be reviewed, with
revisions only being requested in May 2021 with the third UK national lockdown still not ended. To
help the reader situate the current paper we explain briefly here what the exams situation was in
the UK from April to August 2020, with more details provided in a postscript in Section 1 of the
Supplementary Information.
University selection in the UK for admission in October 2020 began in the autumn of 2019, with medical school applicants submitting applications to UCAS for up to four medical schools by October 15th. Selection, which may include interviews and other assessments, is usually completed by the end of March, with students being told of offers or rejections. Offers are usually conditional on A-levels and other qualifications to be taken in May, with results announced in August. In Spring 2020, as UK universities entered the final phases of the annual academic cycle of student selection, the present paper
considered the potential problems of using teacher-estimated grades such as the Calculated Grades
proposed by Ofqual, rather than attained grades obtained in the usual way via examinations. The
pre-print of May 2020 was circulated primarily for information to medical school admissions tutors.
By August 2020 some immediate effects on selection were shown when the algorithms used by
regulators resulted in many students, particularly those from historically poorly performing schools,
having their expected results adjusted downwards. This forced the Scottish Government, followed
then by the English and Welsh Governments, to accept either teacher-estimated Centre Assessment
Grades (CAGs) without moderation by an algorithm, or the Calculated Grade, whichever was the
higher.
As expected in the preprint, given that teacher-estimated grades were found to be higher than attained A-level grades, the scrapping of the algorithm resulted in a significant increase in grades compared with 2019 [b], with an immediate impact on the numbers of students meeting university
conditional offers. Longer-term impacts are still to be seen, with some likely to result from the lower
predictive validity of teacher-estimated grades, and a likely increase in under-performing students in
medical schools and postgraduate training.
Medical school admissions
This paper mainly concentrates on medical school applications. UK medical education has a range of
useful educational measures, including admissions tests during selection, and outcomes at the end
of undergraduate training, which are linked together through UKMED (United Kingdom Medical
Education Database; https://www.ukmed.ac.uk/). UKMED provides a sophisticated platform for
assessing predictive validity in multiple entry cohorts in undergraduate and postgraduate training 9.
The current paper should also be read in parallel with a second study from some members of the present team, which assesses the attitudes and perceptions of current medical school applicants towards calculated grades and other changes in selection, in the UKMACS (UK Medical Applicants Cohort Study) 10 11.
[b] https://ffteducationdatalab.org.uk/2020/08/gcse-and-a-level-results-2020-how-grades-have-changed-in-every-subject/
Fundamental questions about selection in 2020 concerned the likely nature of Calculated Grades, and the extent to which they would predict outcomes as well as actual or attained grades did. The discussion will involve actual grades, and then four types of teacher-estimated grades: predicted grades (sent to UCAS at application to university), centre assessment grades (CAGs, submitted by schools to Ofqual in 2020), calculated grades (CAGs adjusted using an algorithm) and forecasted A-level grades (submitted by teachers to exam boards pre-2015 as a quality check for real exam grades). These related but different assessments are summarised in Box 1 below, together with final grades, which were the grades eventually accepted by UCAS and were the higher of the calculated grade or centre assessment grade. It should be noted that we have tried to use ‘teacher-predicted grades’ only to refer to the grades included as a part of the normal UCAS process, whereas the term ‘teacher-estimated grades’ is used in a more generic sense.
Calculated grades
The status of calculated grades was made clear by Ofqual in April 2020:
“The grades awarded to students will have equal status to the grades awarded in other years and
should be treated in this way by universities, colleges and employers. On the results slips and
certificates, grades will be reported in the same way as in previous years”. 12, p.6.
The decisions of Ofqual are supported by Ministerial statement, and universities and other bodies
have little choice therefore but to abide by them, although that does not mean that other factors
may not need to be taken into account in some cases, as often occurs when applicants do not attain
the grades in conditional offers.
None of the above means that calculated grades actually will be equivalent to conventional attained
grades. Calculated grades will not actually be attained grades, they may well behave differently to
attained grades, and in measurement terms they actually are not attained grades, even though in
administrative and even in legal terms, by fiat, they have to be treated as equivalent. From the
perspective of educational research, the key issue is the extent to which calculated grades actually
will or can behave in an identical way to attained grades.
In April 2020 Ofqual issued guidance on how calculated grades would be provided for candidates for whom examinations had been cancelled. Essentially, teachers would be required, for individual candidates taking individual subjects within a candidate assessment centre (usually a school), to estimate grades for candidates, and then to rank order candidates within grades, to produce centre assessment grades. A statistical standardisation process would then be carried out centrally using a computer algorithm. Ranking is needed because standardisation “will need more granular information than the grade alone” (12, p.7), presumably to break ties at grade boundaries which occur because of standardisation. Standardisation, to produce calculated grades, would use an algorithm that took into account the typical distribution of results from that centre for that subject in the three previous years, along with aggregated centre data on SATS and previous exam attainment as in GCSEs [c]. This approach is consistent with Ofqual’s approach to standard-setting. Following Cresswell 13, Ofqual has argued that during times of change in assessments, and perhaps more
generally, there should be a shift away from “comparable performance” (i.e. criterion-referencing),
and that there is an “ethical imperative” to use “comparable outcomes” (i.e. norm-referencing) to
minimise advantages and disadvantages to the first cohort taking a new assessment, as perhaps also
for later cohorts as teachers improve at teaching new assessments 14.
[c] It was this standardisation process that Governments reversed in August 2020 after the protests against calculated grades.
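The standardisation step described above can be illustrated with a toy sketch in Python. This is not Ofqual's actual algorithm, which was considerably more complex; the function name, input format and rescaling rule below are all our own illustrative assumptions. The sketch simply maps teacher-ranked candidates onto a centre's historical grade distribution, which is the core norm-referencing idea.

```python
def moderate(ranked_candidates, historical_grade_counts):
    """Toy norm-referenced moderation (illustrative only, not Ofqual's algorithm).

    ranked_candidates: candidate IDs ordered best first (the teacher rankings).
    historical_grade_counts: grade counts from prior years for this centre and
    subject, listed best grade first, e.g. {'A*': 2, 'A': 3, 'B': 4}.
    """
    n = len(ranked_candidates)
    total = sum(historical_grade_counts.values())
    # Rescale the historical distribution to the current cohort size.
    grades = []
    for grade, count in historical_grade_counts.items():
        grades += [grade] * round(count * n / total)
    # Pad or trim with the lowest historical grade so lengths match exactly.
    lowest = list(historical_grade_counts)[-1]
    grades = (grades + [lowest] * n)[:n]
    return dict(zip(ranked_candidates, grades))

# Three ranked candidates mapped onto an even historical distribution:
print(moderate(['c1', 'c2', 'c3'], {'A*': 1, 'A': 1, 'B': 1}))
# {'c1': 'A*', 'c2': 'A', 'c3': 'B'}
```

A consequence visible even in this sketch is the one that caused the August 2020 protests: a strong candidate in a centre with a historically weak grade distribution cannot be awarded a grade the centre has not previously produced.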
Box 1: A-level grades: Actual, predicted, centre assessment, calculated, final, forecasted, and
teacher-estimated grades
Actual or attained grades. The grades awarded by examination boards/awarding organisations
based on written and other assessments which are set and marked externally. Typically sat in
May and June of year 13, with results announced in mid-August.
Predicted grades. Teacher estimates of the likely attained grades of candidates, provided to UCAS
in the first term of year 13, and by October 15th for medical and some other applicants.
Centre assessment grades. Used in the production of Calculated grades (see below). Provided by
examination centres (typically schools) between 1st and 12th of June 2020, consisting of teacher-
estimated grades and candidate rankings within examination centres.
Calculated grades. The final grades to be provided for candidates by exam boards for Summer
2020 assessments, in the absence of attained grades. Based on centre assessment grades, with
final calculated grades involving standardisation/adjustment by exam boards using an algorithm.
Calculated grades, “will have equal status to the grades awarded in other years and should be
treated in this way by universities, colleges and employers” (Ofqual). These grades were often
referred to as the ‘algorithm grades’ and were abandoned by the UK governments in August 2020.
Final grades. The grades used by UCAS in the 2020 admissions cycle: the higher of the calculated grade or the centre assessment grade.
Forecasted grades. Prior to 2015, teachers, in May of Year 13, provided to exam boards a forecast
of the likely grades of candidates along with rankings. Forecasted grades are therefore produced later in the academic cycle than predicted grades, close to the time examinations are actually sat.
Teacher-estimated grades. Generic term used in this paper to refer to grades estimated by
teachers. Includes predicted grades, centre assessment grades, calculated grades, and forecasted
grades.
Ofqual said that centre assessment grades, the core of calculated grades, “are not the same as …
predicted grades provided to UCAS in support of university applications” 15, (p.7). Predicted grades
in particular are provided by schools in October of year 13 and centre assessment grades in
May/June of year 13, seven months later, when Ofqual says that teachers should also consider
classwork, bookwork, assignments, mock exams and previous examinations such as AS-levels (taken
only by a minority of candidates now), but should not include GCSE results, or any student work
carried out after 20th March. Whether centre assessment grades, or calculated grades centre
assessment grades moderated by the algorithm - will be fundamentally different from predicted
grades is ultimately an empirical question, which should be answerable when UCAS data for 2020
are available for medical school applicants in UKMED. In the meantime, and it is a core and a
reasonable assumption, that centre assessment grades and hence calculated grades will probably
correlate highly with earlier predicted grades, except for a small proportion of candidates who have
improved dramatically from October 2019 to March 2020. Predicted grades, which have been
collected for decades, should therefore act as a reasonable proxy in research terms for centre
assessment grades and therefore calculated grades, particularly in the absence of any other
information.
The rationale for using A-level grades in selection
Stepping back slightly it is worth revisiting the reasons that A-levels exist and why universities use
them in selection. A-levels assess at least three things: subject knowledge, intellectual ability, and
study habits such as conscientiousness 16. Knowledge and understanding of, say, chemistry is
probably necessary for the high level study of medical science and medicine, to which it provides an
underpinning, and experience suggests that students without such knowledge may have problems.
A-levels also provide evidence for a student’s intellectual ability and capability of extended study at
a high level. A-levels are regarded as a ‘gold standard’ qualification because of the rigour and
objectivity of their setting and marking (see for example Ofqual’s ‘Reliability Programme’17). Their
measurement is therefore reliable, and the presumption is that they are also valid, in some of the
many senses of that word 18-20, and as a result are unbiased. A crucial assumption is of predictive
validity, that future outcomes at or after university are higher or better in those who have higher or
better A-levels, as found both in predicting degree classes in general 4 21 22 and medical school
performance in particular 5 23. There is also an assumption of incremental validity, A-levels being
better predictors than other measures 6. At the other extreme, A-levels could be compared conceptually with, say, a mere assertion by a friend or colleague that, “Oh yes, they know lots of chemistry”. That is likely to be neither reliable, valid nor unbiased, and hence is a base metal compared with the gold standard of A-levels. The empirical question therefore is where on the continuum from gold to base metal calculated grades or teacher-predicted grades lie.
The issue of predictive validity has been little discussed in relation to calculated grades, but in a TES
(Times Educational Supplement) survey of teachers, there were comments that, “predictions and
staff assessments would never have the same validity as an exam”, so that, “Predictions, past
assessment data and mock data is not sufficient, and will never beat the real thing in terms of
accuracy" 24. The changes in university selection inevitably meant that difficult policy decisions
needed to be made by universities and medical schools. Even in the absence of direct, high-quality evidence, policy-makers still have an obligation to make decisions and, it is therefore argued, must take theory, related evidence and so on into account 25. This paper provides both a review of other
evidence, and also results on the related issue of predicted grades, which it will be argued are likely
to behave in a way that is similar to calculated grades.
Review of literature on predicted and forecasted grades
Predicted grades in university selection
A notable feature of UK universities is that selection mostly takes place before A-levels or equivalent
qualifications have been sat, so offers are largely conditional on later attained grades. As a result,
UCAS application forms, since their inception in 1964, have included predicted grades, estimates by
teachers of the A-level grades a student is likely to achieve. Admissions tutors also use other
information in making conditional offers. A majority of applicants in England, applying in year 13 for university entry at age 18, will have taken GCSEs at age 16 in year 11; a few still take AS-levels in year 12; some students submit an EPQ (Extended Project Qualification); and UCAS forms also contain candidate statements and school references. Medical school applicants mostly also take admissions tests such as U(K)CAT or BMAT at the beginning of year 13, and many will take part in interviews or MMIs (multiple mini-interviews) [d].
Predicted grades have always been controversial. A House of Commons Briefing Paper in 2019 noted
that the UK was unusual among developed countries in using predicted grades [e], and said that,
“The use of predicted grades for university admissions has been questioned for a long time. Many critics argue that predicted grades should not be used for university entry because they are not sufficiently accurate and it has been suggested that disadvantaged students in particular lose out under this system.” 26 p.4
Others have suggested that as well as being “biased”, “predicting A-level grades is clearly an
imprecise science” 27 (p.418). There have been repeated suggestions over the years, none as yet
successful, that predicted grades should be replaced with a PQA (Post-Qualification Applications)
system. As Nick Hillman puts it,
“The oddity of our system is not so much that people apply before receiving their results; the oddity is that huge weight is put on predicted grades, which are notoriously unreliable. … PQA could tackle this…” [f]
The system of predicted grades is indeed odd, but also odd is the sparsity of academic research into
predicted grades. The most important question that seems almost never to have been asked, and
certainly not answered, is the fundamental one of whether it is predicted grades or actual grades
which are better at predicting outcomes. Petch3, in his 1964 monograph which was one of the first
serious discussions of the issues, considers that predicted and actual grades may be fundamentally
different, perhaps being “complementary and not contradictory” (p.29), one being about scholarly
attitude and the other about examination prowess, primarily because “the school knows the
candidate as a pupil, knowledge not available to the examiners”. For Petch, either a zero correlation
or a perfect correlation between predicted and actual grades would be problematic, the latter
perhaps implying that actual grades might be seen as redundant (p.6).
The advent of Ofqual’s calculated grades, which are in effect predicted grades carried out by
teachers in a slightly different way, means there was a serious need in 2020 to know how effective
predicted grades were likely to be as a substitute for attained A-level grades, and the same concern
will apply in 2021, with Ofqual implementing a different model for teacher-estimated grades [g]. Are teacher-predicted grades in fact ‘notoriously unreliable’, being ‘mere predictions’, or do they have predictive validity equivalent to that of attained grades?
[d] See https://www.medschools.ac.uk/studying-medicine/making-an-application/entry-requirements
[e] https://www.bbc.co.uk/news/education-44525719
[f] https://www.hepi.ac.uk/2019/08/14/pqa-just-what-does-it-mean/
[g] https://www.gov.uk/government/publications/awarding-qualifications-in-summer-2021/awarding-qualifications-in-summer-2021
The research literature on predicted grades
As part of Section 1 of the Supplementary Information to this paper we have included a more
detailed overview of research studies on predicted grades. Here we will merely provide a brief set of
comments.
Most studies look at predictions at the level of individual exam subjects, which at A-level are graded
from E to A or, from 2010 onwards, from E to A*. The most informative data show all combinations
of predicted grades against attained grades, and Figure 1 gives an example for medical school
applicants. Many commentators, though, look only at over-predictions (‘optimistic’) and under-
predictions (‘pessimistic’). Figure 2 summarises data from five studies of university applicants.
Accurate predictions occur in 52% of cases when A is the maximum grade and 17% when A* is the
maximum grade (and with more categories accuracy is likely to be lower). Grades are mostly over-
predicted, in 42% of cases pre-2010 and 73% post-2010, with under-prediction rarer at 7% of cases
pre-2010 and 10% post-2010. A number of studies have reported that under-prediction is more
common in lower socio-economic groups, non-White applicants, and applicants from state school or
further education 28-30. A statistical issue means such differences are not easy to interpret, as a
student predicted A* cannot be under-estimated, and therefore under-estimation will inevitably be
more frequent in groups with lower overall levels of attainment. This issue is discussed and analysed
at length in Section 5 of the Supplementary Information in relation to applicants from private-sector
schools.
Some studies also consider grade-point predictions, the sum of grade scores for the three best attaining subjects, scored A*=12, A=10, B=8, etc. [h]. In particular a large study by UCAS 31 showed that
applicants missing their predictions (i.e. they were over-predicted) tended to have lower predicted
grades, lower GCSE attainment, were more likely to have taken physics, chemistry, biology and
psychology, and were from disadvantaged areas. To some extent the same statistical problems of
interpretation apply as with analysis at the level of individual exam subjects. For a number of years
UCAS only provided grade-point predictions, and they are included in the P51 data analysed below.
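As a concrete illustration, here is a minimal sketch in Python of this grade-point scoring (the function name is ours; the points for grades below B are assumed to follow the same 2-point spacing described in footnote [h]):

```python
# Grade points per the scheme described in the text: A*=12, A=10, B=8, ...
# (lower grades assumed to follow the same 2-point spacing).
POINTS = {'A*': 12, 'A': 10, 'B': 8, 'C': 6, 'D': 4, 'E': 2}

def grade_point_score(grades):
    """Sum of the three highest-scoring grades (illustrative helper)."""
    return sum(sorted((POINTS[g] for g in grades), reverse=True)[:3])

print(grade_point_score(['A*', 'A', 'A', 'B']))  # 12 + 10 + 10 = 32
```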
What are predicted grades and how are they made?
UCAS says that “A predicted grade is the grade of qualification an applicant’s school or college believes they’re likely to achieve in positive circumstances.” [i] Later, though, the document says predicted grades should be “in the best interests of applicants – fulfilment and success at college or university is the end goal”, and “aspirational but achievable – stretching predicted grades are motivational for students, unattainable predicted grades are not” (all emphases in original).
Predicted grades should be professional judgements and be data-driven, including the use of, “past
Level 2 and Level 3 performance, and/or internal examinations to inform …predictions”.
Few empirical studies have asked how teachers estimate grades, with not much progress since 1964
when Petch said, “Little seems to be known about measures taken by schools to standardize
evaluations of pupils” 3 (p.7). Two important exceptions are the studies of Child and Wilson [j] in 2015 and Gill 32 in May 2018, with only the latter published.
[h] In some studies a scoring of A*=6, A=5, B=4 is used. The 12, 10, 8... scoring was introduced so that AS-levels, weighted at half an A-level, could be scored as A=5, B=4, etc. (there being no A* grade at AS-level). For most purposes A*=12, A=10 ... is equivalent in all respects to A*=6, A=5, etc., apart from a scaling factor.
[i] https://www.ucas.com/advisers/managing-applications/predicted-grades-what-you-need-know [Accessed 13th April 2020].
[j] Child, S., & Wilson, F. (2015). An investigation of A level teachers’ methods when estimating student grades. Cambridge Assessment internal report. Cambridge, UK: Cambridge Assessment.
Gill sent questionnaires to selected OCR (Oxford, Cambridge and Royal Society of Arts) Examination Board exam centres concerning
Chemistry, English Literature and Psychology exams. Teachers said the most important information
used in predicting grades was performance in mock exams, observations of quality of work and
commitment, oral presentation, the opinion of other teachers in the same subject and in other
subjects, and the head of department. Some teachers raised concerns about the lack of high stakes
for mock exams which meant that some students did not treat them seriously. AS-level grades were
an important aid in making predictions, and there were concerns about the loss of AS-levels to help
in prediction, as also mentioned elsewhere 33, and that is relevant to 2020 where most candidates
will not have taken AS-levels.
The studies considered so far are almost entirely concerned with teacher predictions of A-level grades, since these are important for university admissions. More generally, studies looking at a wider range of teacher estimates, often in younger children, find a tendency for over-estimation across a range of skills 34, with judgements often being systematically lower for marginalised learners 35. A different
position is taken in a genetically-informed study of twins, which suggests, in a forcefully worded
conclusion, that, “Teachers can reliably and validly monitor students’ progress, abilities and
inclinations. … For these reasons, we suggest that teacher assessments could replace some, or all,
high-stakes exams” 36. The study, however, uses only correlations as measures of accuracy, and cannot assess over- or under-estimation. Also, teacher ratings were only available at ages 7, 11 and 14, at the same time as standardised tests were carried out, but were not available for GCSEs at age 16, or
for A-levels and University Entrance at age 18, and as such are not informative for the purposes of
the present study.
Predicted grades in other Key Stage 5 qualifications than A-levels
Almost all studies on predicted grades have considered A-levels, with a few occasional exceptions
looking at GCSEs. We know of no studies on the Extended Project Qualification (EPQ) in England, on Scottish Highers and Advanced Highers, or on any other qualifications. Section 3 of the Supplementary
Information includes data on both EPQ and SQA examinations.
Forecasted grades
Until 2015, teachers in the May of school year 13 provided awarding organisations with forecasted
grades, and those forecasts in part contributed to quality control of grades by the boards. Since
forecasted grades were produced five to seven months after predicted grades, and closer to the
exam date, they might be expected to be more accurate than predicted grades, being based on
better and more recent information. Forecasted grades are important as they are more similar than
predicted grades to the proposed calculated grades in the way they are calculated, and it is noted
that “they may differ somewhat from the predicted grades sent to UCAS as part of the university
application process” 37. Three formal analyses are available, for candidates in 2009 38, 2012 39 and
2014 37, and four other studies from 1940 40, 1963 3, 1977 41 and 2018 32 are also available, with one
post-2000 study before A* grades were introduced and three after (Figure 2). Petch 40 also provides
a very early description of forecasted grades, looking at teachers’ predictions of pass or fail in School
Certificate examinations in 1940, which also show clear over-prediction.
Forecasted A-level grades are similar in accuracy to predicted grades pre-2010 (42% vs 52%) but are
more accurate post-2010 (47% vs 17%), in part due to a drop in accuracy of predicted grades when
A* grades are available. Despite there being no aspirational or motivational reasons for teachers to
over-predict forecasted grades, particularly in the 1977 and 2018 studies, over-prediction
nevertheless remains as frequent as with predicted grades (pre-2010: 39%; post-2010: 37%) and remains more common than under-prediction (pre-2010: 20%; post-2010: 16%). Overall it is perhaps possible that calculated grades may be somewhat more accurate than predicted grades, but forecasted grades appear broadly similar in their behaviour to predicted grades. Two sets of forecasted
grades are available for GCSEs 42 43, and they show similar proportions of over and under-prediction
as do results for A-levels. Over-prediction seems to be a feature of all predictions by teachers.
The three non-official studies of forecasted grades also asked teachers to rank-order candidates, a
procedure which was included in calculated grades. The 1963 data 3 found a median correlation of rankings and exam marks within schools of 0.78, the 1977 data 41 a correlation of 0.66, and the recent 2018 data 32 a correlation of about 0.82. The three estimates, mean r = 0.75, are somewhat
higher than a meta-analytic estimate of .63 (SE =.03) for teachers’ ability to predict academic
achievement 44.
The Gill study 32 is also of interest as one teacher commented on the difficulty of providing rankings
with 260 students sitting one exam, and the author noted that, “it was easier for smaller centres to
make predictions because they know individual students better” (p.42), with it also being the case
that responses to the questionnaire were more likely to come from smaller centres. The 1963 study of Petch 3, as well as commenting on “considerable divergencies … in the methods by which
estimates were produced” (p.27), as in the variable emphasis put on mock exams, also adds that,
“some of the comments from schools suggested that at times there may be a moral ingredient
lurking about some of the estimates”(p.28).
Overall it seems possible, but unlikely, that calculated grades might be more accurate than predicted grades; these studies also make clear the problems teachers face in ranking and grading candidates.
It also remains possible that examining boards have far more extensive and unpublished data on
forecasted grades that they intend to use in assessing the likely effectiveness of calculated grades.
Applicants to medical school
So far, this review section has been entirely about university applicants across all subjects and the
entire range of A-level grades. Only a handful of studies have looked at predicted grades in medical
school applicants.
Lumb and Vail emphasised the importance of teacher-predicted grades since they determine in large
part how shortlisting takes place 45. In a study of 1995 applicants they found 52% of predictions were
accurate, 41% were over-estimated and 7% under-estimated 45, values very similar to those reported
in university selection in general (Figure 2).
A study by one of the present team used path modelling to assess the causal inter-relationships of
GCSE grades, predicted grades, receipt of an offer, attained A-level grades, and acceptance at
medical school 46. Predicted grades were related to GCSE grades (beta=0.89), and attained A-level
grades were predicted by both GCSE grades (beta=0.44) and predicted A-level grades (beta=0.74).
The study supports claims that teachers may well be using GCSE grades in part to provide predicted
grades, which is perhaps not unreasonable given the clear correlation.
Richardson et al 47 in an important and seemingly unique study looked at the relative predictive
validity of predicted as compared with attained A-level grades. Using a composite outcome of pre-
clinical performance there was a minimal correlation with predicted grades (r=.024) compared with
a correlation of 0.318 (p<.001) with attained A-level grades. To our knowledge this is the only study
of any sort assessing the predictive validity of predicted vs attained A-level grades.
The present study
Although calculated grades are novel and untested in their details, predicted grades have been
around for half a century, and there is also a small literature on forecasted grades. This paper will try
to answer several empirical questions about predicted grades, for which data are now available in
UKMED. Predicted grades will then be used, faute de mieux, to make inferences about the likely
consequence of using calculated grades.
Empirical questions to be addressed.
The relationship between predicted and attained grades in medical school applicants
Few previous studies have looked in detail at this high-performing group of students. We will also
provide brief results on Scottish Highers and Advanced Highers, and the EPQ (Extended Project
Qualification), neither of which has been discussed elsewhere to our knowledge.
The predictive validity of predicted grades in comparison with attained grades
A fundamental question concerning calculated grades is whether teacher-predicted grades are
better or worse at predicting outcomes than are actual A-level grades. The relationship between
predicted grades and actual grades cannot itself answer that question. Instead what matters is the
relative performance of predicted and actual grades in predicting subsequent outcomes at the end
of undergraduate or postgraduate training. The only study on this of which we are aware, a relatively small one in medical students 47, found that only actual grades had predictive validity.
Method
The method provided here is brief. A fuller description including a detailed table of measures can be
found in Section 2 of the Supplementary Information. Overall the project is UKMEDP112, approved
by the UKMED Research Group in May 2020, with data coming from two separate but related
UKMED projects, both of which included predicted grades.
Project UKMEDP089, “The UK Medical Applicant Cohort Study: Applications and Outcomes Study”,
approved Dec 7th, 2018, with Dr Katherine Woolf as principal investigator, is an ongoing analysis of
medical student selection as a part of UKMACS (UK Medical Applicant Cohort Study [k]). The data
upload of 21st Jan 2020 included detailed information from UCAS and HESA on applicants for
medicine from 2007 to 2018.
Project UKMEDP051, “A comparison of the properties of BMAT, GAMSAT and UKCAT”, approved
Sept 25th, 2017, with Dr Paul Tiffin as principal investigator, is an ongoing analysis of the predictive
validity of admissions tests and other selection methods such as A-levels and GCSEs in relation to
undergraduate and postgraduate attainment. The present analysis used the download files dated
13th May 2019 [l]. UCAS data are included, although when the present analysis began the file had not yet included the detailed subject-level information available in UKMEDP089 [m]. Outcome data for the
P51 dataset are extensive, and in particular undergraduate progression data are included, such as
UKFPO EPM and SJT, and PSA (Prescribing Safety Assessment), as well as performance on some
postgraduate examinations (MRCP Part1 and MRCS Part A).
Data from HESA, and hence UKMED, are required to be reported using HESA’s rounding and suppression criteria (https://www.hesa.ac.uk/about/regulation/data-protection/rounding-and-suppression-anonymise-statistics), and those criteria have been used for all UKMED data. In particular, the presence of a zero or the absence of a percentage may not always mean that there are no individuals in a cell of a table, and all integers are rounded to the nearest 5.
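For readers unfamiliar with the convention, a minimal sketch in Python of rounding a count to the nearest 5 (the function name is ours; note that Python's round() uses banker's rounding at exact ties, which may differ from HESA's tie-breaking rule):

```python
def hesa_round(count):
    """Round a count to the nearest multiple of 5 (illustrative sketch).

    Ties follow Python's banker's rounding, which may differ from
    HESA's exact rule at boundary values.
    """
    return int(round(count / 5) * 5)

print([hesa_round(n) for n in (1, 3, 12, 12602)])  # [0, 5, 10, 12600]
```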
Results
A fuller description of the results can be found in Section 3 of the Supplementary Information.
[k] https://ukmacs.wordpress.com/
[l] UKCAT51_APP_ALL_DATA_13052019_FILE1.SAV and UKCAT51_APP_ALL_DATA_13052019_FILE2.SAV.
[m] An upload for P51 was made available on 20th April 2020 but was not included in the present analyses.
The relationships between predicted and actual grades in medical school applicants.
Predicted and actual A-level grades for individual A-level examinations
Figure 1 shows the relationship between predicted and attained A-level grades for 237,030
examinations from 2010 to 2018 (i.e. assessments including A* outcomes). 39.3% of predicted
grades are A* compared with 23.7% of attained grades. Figure 1.a shows predicted grades in
relation to attained grades, with bold font for accurate predictions, green and blue shading for
under-prediction and orange and red shading for over-prediction. Overall 48.8% of predicted grades
are accurate, which is higher than for university applications in general (see Figure 2), reflecting the
high proportion of A and A* grades (69%). Over-prediction occurred in 44.7% of cases, and under-
prediction in 6.5% of cases. Figure 1.b shows the data as percentages. About half of A* predictions result in an attained A grade, and over a third of predicted A grades result in grade B or lower. Predicted and attained grades have a Pearson correlation of r = 0.63.
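The accuracy, over-prediction and under-prediction proportions reported above can be computed from paired grades along the following lines (a sketch with hypothetical data; the column names and the post-2010 grade ordering are our assumptions):

```python
import pandas as pd

# Ordered post-2010 A-level scale, worst to best.
ORDER = ['E', 'D', 'C', 'B', 'A', 'A*']

# Hypothetical paired grades standing in for the UKMED records.
df = pd.DataFrame({'predicted': ['A*', 'A', 'A', 'B'],
                   'attained':  ['A',  'A', 'B', 'B']})

pred = df['predicted'].map(ORDER.index)
att = df['attained'].map(ORDER.index)
print(f"accurate        {(pred == att).mean():.1%}")
print(f"over-predicted  {(pred > att).mean():.1%}")   # teacher too optimistic
print(f"under-predicted {(pred < att).mean():.1%}")   # teacher too pessimistic
```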
Differences between A-level subjects
There is little in the literature on the extent to which different A-level subjects may differ in the
accuracy of their predictions, perhaps with different degrees of bias or correlation. Detailed results
are presented in Section 3 of the Supplementary Information. Overall, Biology, Chemistry, Maths and
Physics are very similar in terms of over-prediction and correlation with actual grades. However, General Studies is particularly over-estimated compared with other subjects.
Extended Project Qualification (EPQ) and SQA Advanced Highers
Section 3 of the Supplementary Information contains information on these qualifications. SQA
Advanced Highers, as well as the EPQ, show similar proportions of over-estimation as other
qualifications (see Figure 2).
Reliability of predicted and attained A-level grades
Considering the best three A-level grades, the reliability of an overall score can be calculated from
the correlations of the individual subjects. For 66,006 candidates with at least three paired
predicted and actual grades, Cronbach’s alpha was 0.827 for actual grades and 0.786 for predicted
grades, with a highly significant difference. The difference may in part reflect the higher proportion
of A* grades in predicted than actual grades, and hence a greater ceiling effect, but may also reflect
greater measurement precision in the marking of actual A-levels.
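The alpha of a three-grade composite can be derived from the average inter-subject correlation using the standardised (Spearman-Brown) form of Cronbach's alpha; a short sketch under that assumption (the average correlation of 0.615 below is illustrative, chosen to reproduce the reported alpha for attained grades, and is not a value reported in the paper):

```python
def standardized_alpha(mean_r, k=3):
    """Cronbach's alpha for a k-item composite from the mean
    inter-item correlation (standardised/Spearman-Brown form)."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# An average inter-subject correlation of ~0.615 gives alpha of about 0.827,
# matching the value reported for attained grades (illustrative only).
print(round(standardized_alpha(0.615), 3))  # 0.827
```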
How reliable are attained A-level grades?
Attained A-level grades, like any behavioural measurement, are not perfectly reliable, in the sense
that if a candidate took a parallel test containing equivalent but different items it is highly unlikely
that they would get exactly the same mark as on the first attempt. They may, for instance, have
been lucky (or unlucky) at their first attempt, being asked questions on topics which they happened
to have studied or revised more (or revised less), and so on. Reliability is a technical subject [n] with
many different approaches 48 49. For continuous measures of raw scores, the reliability can be
expressed as a coefficient such as alpha (and in one A-level maths test in 2011, alpha for the full test
was about 0.97 50, although it is suggested that value is unusually high). Boards though do not report
raw scores, but instead award grades on a scale such as A* to E. The ‘classification accuracy’ of
grades is harder to estimate, and is greater with fewer grade points, wider grade intervals, and a
wide spread of candidate ability 50. There seem to be few published estimates of classification
accuracy for A-levels (although they do exist for GCSEs and AS-levels 50).
Estimating classification accuracy for the present high-attaining group of medical school applicants is
not easy. A fundamental limit for any applicants is that predicted grades cannot possibly predict
[n] See https://www.gov.uk/government/publications/reliability-of-assessment-compendium for a range of important papers commissioned and published by Ofqual.
actual grades better than attained grades predict themselves (the reliability or classification
accuracy). However, from considering the correlation of the three best predicted and actual grades it is unlikely that such a limit has currently been reached. The correlation of actual with predicted grades is .585, and the alpha reliabilities are .827 for actual grades and .786 for predicted grades (see above). The disattenuated correlation between predicted and actual grades is therefore .585/√(.827 × .786) = 0.726, which is substantially less than one, with predicted grades accounting for only about a half of the true variance present in actual grades. If the disattenuated correlation were close to one then it could be argued that predicted grades were doing as well as they could possibly do, given that attained grades are not perfectly reliable, but that is clearly far from the case.
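The disattenuation calculation above is straightforward to reproduce; a minimal sketch using the values reported in the text (the classical correction for attenuation, r_true = r_observed / √(reliability_x × reliability_y)):

```python
import math

# Values reported in the text.
r_observed = 0.585       # correlation of attained with predicted grade totals
alpha_attained = 0.827   # alpha reliability of attained grades
alpha_predicted = 0.786  # alpha reliability of predicted grades

# Classical correction for attenuation.
r_true = r_observed / math.sqrt(alpha_attained * alpha_predicted)
print(round(r_true, 3))  # 0.726
```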
True scores and actual scores
From a theoretical, psychometric, point of view it could be argued that it is neither actual nor
predicted grades which need to be estimated for applicants, but their ‘true ability scores’, or the
‘latent scores’, to use the technical expressions, of which predicted and actual grades are but
imperfect estimates. In an ideal world that would be the case, and a well-constructed exam tries to
get as close as possible to true scores. However, it is not possible to know true scores (if it were possible, the boards would provide selectors with those scores). Selection itself does not work on true scores
but on the actual grades that are written down, by teachers for predicted grades, and as grades on
exam result certificates by boards. They are the currency in which transactions are conducted during
selection, so that a predicted grade of less than a certain level means a candidate will not get a
conditional offer, and likewise too low an actual grade means a candidate holding a conditional offer
will be rejected. For that reason it is not strictly the correlation of predicted and actual grades which
matters, the two measures being treated as symmetric, but the forward prediction of actual grades
from predicted grades, i.e. the actual grades conditional on the predicted grades (as shown in figure
1b).
Predictive validity of predicted and attained A-level grades in medical students.
Predictive validity in UKMEDP051
The version of the P51 data used here consists entirely of applicants applying to medical schools, but
there is also follow-up into undergraduate and postgraduate training. Predicted A-level grades were
available only for the UCAS application cycles of 2010 to 2014 (i.e. applying in October 2009 for university entry in the academic year 2010/11, etc.), and consisted of a single score in the range 4 to
36 points, based on the sum of the three highest predicted grades, scored as A*=12, A=10, etc. The
modal score for 38,965 applicants was 30 (equivalent to AAA; mean=31.17; SD= 3.58; Median = 32;
5th, 25th, 75th and 95th percentiles= 26, 30, 34 and 36). For simplicity the study was restricted to
applicants aged 18 in the year of application, who had both predicted and attained A-levels, which
also ensured the sample contained only first applications for non-graduate courses, from candidates
who had not taken pre-2010 A-levels, when A* grades were not available. Overall, 22,955 applicants
were studied. Other selection measures included were GCSEs (mean grade for best eight grades), as
well as U(K)CAT and BMAT scores, based on the most recent attempt, which in most cases was also the first attempt. For simplicity we used the total of the four sub-scores of U(K)CAT, and the total of
Section 1 and 2 scores for BMAT.
Follow-up is complicated as application cohorts enter medical school in different years, and spread
out in time through medical school and training. Figure 3 uses an Ibry chart 51-54 to show the
educational progression of typical 18-year old medical school entrants, through to postgraduate
qualifications. There are however many variants on this theme. The horizontal axis shows academic
years (September to August) and training years (August to July), with career stages, key events and
measures used on the vertical axis, with coloured boxes indicating typical students, although there
are many variants on entry and progression. The blue boxes show typical students on a five-year
course who entered medical school in October 2010 at the age of 18. They would have taken GCSEs
in June 2008 in school year 11, in the 2007/8 academic year, and some would have taken AS-levels in
June 2009. Applicants would have taken aptitude tests in school year 13, most taking either U(K)CAT
or BMAT but some taking both tests. U(K)CAT would have been taken between July and September
2009, and BMAT in November 2009. UCAS applications are submitted in October, with teachers
providing teacher-estimated grades. Note that U(K)CAT results are known before UCAS applications,
but BMAT results are not known until after application. A-levels would have been taken in May-June
2010, with results known in August 2010, and successful applicants entering medical school in Oct
2010. Students on a five year course would start the 2nd medical school year in Oct 2011, the 3rd and
4th years in 2012 and 2013, and during their final year beginning in Oct 2014 they would take the SJT
and PSA tests and be awarded an EPM score, with graduation in May 2015. The first of the two
Foundation years starts in August 2015, and core or specialist training begins in August 2017.
Medical students at some schools take an optional or a compulsory intercalated BSc (iBSc) between
years 2 and 3. As a result they are then a year later in progressing to the later stages, and are shown
by the green boxes in figure 3. Although years are broadly divided into Basic Medical Science and
Clinical stages, some medical schools have courses which are far more integrated55.
The above description is for 18-year olds entering in 2010. The present study included the 2010 to 2014 entry cohorts (shown by the solid black box in the lower left of figure 3). For simplicity the last of those cohorts is the only other one shown, with the 2014 entrants having red boxes to show progression for a five-year course, and orange for a six-year course including an iBSc. It should be re-
emphasised that all career trajectories are idealized, and in reality students and doctors have many
and varied training trajectories.
Data were available up until the 2018 academic year, and years after that are therefore shown
greyed out in figure 3. Although all cohorts had data for EPM, SJT and PSA, the later entry cohorts
are less likely to have postgraduate qualifications.
Undergraduate outcome measures were for simplicity restricted to the deciles of the UKFPO’s
Educational Performance Measure (EPM), the raw score of the UKFPO’s Situational Judgement Test
(SJT), and the score relative to the pass mark of the Prescribing Safety Assessment (PSA), all at first
attempt. Relatively few doctors, mostly from the earlier cohorts, had progressed through to
postgraduate assessments, but sufficient numbers for analysis were present for MRCP(UK) Part 1
and MRCS Part A, scores being analysed at the first attempt. It should be noted that while U(K)CAT,
BMAT, PSA, SJT, and postgraduate assessments are nationally standardised, EPM deciles are locally
standardised within medical schools.
The EPM is a complicated measure summarising academic progression through the first four years of medical school, with individual medical schools deciding what measures to include 56, and is expressed as deciles within each school and graduating cohort year. EPM is here used as the main undergraduate outcome measure. EPM deciles are confusing, as UKFPO scores them in the reverse of the conventional order, the first decile being the highest performance and the tenth the lowest [o]. Here, for ease of interpretation, we reverse the scoring in what we call revDecile, so that higher scores indicate higher performance. It should also be remembered that deciles are not an equal-interval scale (figure 4).
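As a minimal sketch of this reversal (assuming deciles coded 1 to 10, with 1 the UKFPO's highest band):

```python
def rev_decile(ukfpo_decile: int) -> int:
    """Reverse-score a UKFPO EPM decile (1 = best) so that higher = better."""
    assert 1 <= ukfpo_decile <= 10, "UKFPO EPM deciles run from 1 to 10"
    return 11 - ukfpo_decile

# The top UKFPO decile (1) becomes revDecile 10, and so on
print([rev_decile(d) for d in range(1, 11)])  # [10, 9, 8, ..., 1]
```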
Correlations between the measures are summarised in Figure 5. Large differences in Ns reflect some
measures being used in applicants during selection, and others being outcome measures that are
only present in entrants, as well as the smaller numbers of doctors who had progressed to
postgraduate assessments. The distinction is emphasised by dividing the correlation matrix into
three separate parts. Correlations of selection and outcome measures necessarily show range restriction because candidates have been selected on the basis of the selection measures, and likewise doctors taking postgraduate examinations may be self-selected for earlier examination performance.

[o] https://foundationprogramme.nhs.uk/wp-content/uploads/sites/2/2019/11/UKFP-2020-EPM-Framework-Final-1.pdf
Figure 5 contains much of interest (see also Section 3 of the Supplementary Information), but the
most important question for present purposes is the extent to which Predicted and Attained A-level
grades (shown in pink and green in Figure 5) differ in their prediction of the five outcome measures,
remembering that undergraduate outcomes are typically five or six years after selection, and
postgraduate outcomes are seven or eight years after selection.
Attained A-levels predict EPM with a simple Pearson correlation of r=0.297, compared with a correlation of only 0.198 for predicted grades (simple correlations, r, are shown in blue in figure 5). N is large for these correlations, and hence the difference, using a test for correlated correlations 57, is highly significant (Z=12.6, p<10^-33). Multiple regression (see Section 3 of the Supplementary Information) suggests that predicted grades may have a small amount of predictive variance which is not shared with attained A-levels. Figure 4 shows mean EPM revDecile scores in relation to actual and predicted A-levels. It is clear that attained grades predict well, A*A*A* entrants scoring an average of two deciles higher at the end of the course than those with AAA grades, each extra grade raising average performance by about two-thirds of a decile. In contrast, the slope is clearly shallower for predicted grades, at slightly less than half a decile per predicted A-level grade, indicating a poorer prediction. The broad pattern of results is similar for the other undergraduate outcomes, SJT and PSA, and is shown in Section 3 of the Supplementary Information.
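For readers wishing to reproduce the comparison of the two correlations, the following is a sketch of the Meng, Rosenthal and Rubin Z test cited above 57. The correlations with EPM are those given in the text; the predictor intercorrelation (rx) and sample size (n) used here are illustrative assumptions, since the exact values sit in figure 5 rather than in this passage.

```python
from math import atanh, sqrt
from scipy.stats import norm

def meng_z(r1, r2, rx, n):
    """Z test comparing two correlated correlations (Meng, Rosenthal & Rubin, 1992).
    r1, r2: correlations of two predictors with the same outcome;
    rx:     correlation between the two predictors;
    n:      sample size."""
    rbar2 = (r1 ** 2 + r2 ** 2) / 2             # mean squared correlation
    f = min((1 - rx) / (2 * (1 - rbar2)), 1.0)  # f is capped at 1
    h = (1 - f * rbar2) / (1 - rbar2)
    z = (atanh(r1) - atanh(r2)) * sqrt((n - 3) / (2 * (1 - rx) * h))
    return z, 2 * norm.sf(abs(z))

# r1 and r2 from the text; rx and n are assumed here for illustration
z, p = meng_z(0.297, 0.198, rx=0.6, n=12600)
print(f"Z = {z:.1f}, p = {p:.3g}")  # approximately reproduces the reported Z = 12.6
```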
The two postgraduate outcome measures, MRCP(UK) (Membership of the Royal Colleges of Physicians (UK)) Part 1 and MRCS (Membership of the Royal College of Surgeons) Part A, although both based on smaller, but still substantial, numbers of doctors, show the same pattern: actual grades correlate more highly with MRCP(UK) Part 1 (r=.421) than do predicted grades (r=.283; Z=4.54, p=.000055). Likewise, actual grades correlate more highly with MRCS Part A (r=.421) than do predicted grades (r=.358; Z=3.67, p=.000238).
The simple correlations (r) in figure 5 are inevitably range restricted, as A-level grades and predicted A-level grades have themselves been used as a part of the selection process. Range restriction can be taken into account using the method of Hunter, Schmidt and Le 6 58 (see also 59), which uses uX, the ratio of the standard deviation of a predictor in the restricted (selected) population to that in the unrestricted (applicant) population, with values below one indicating more range restriction. Figure 5 shows uX at the bottom of the columns, and it can be seen that it is much lower for actual A-level grades than for predicted A-level grades, suggesting that actual grades are more important in the selection process than are predicted grades. Construct-level predictive validity (CLPV) 6 can be calculated, taking the reliability of measures into account, using .827 for attained A-levels and .785 for predicted A-levels (see earlier), with all other reliabilities set at 0.9 in the absence of better estimates. Note that the calculation, unlike that carried out previously 6, for simplicity does not take censorship/ceiling effects of A-levels into account, and a fuller analysis will be presented elsewhere. Given the greater range restriction, the CLPVs, ρTPa (shown as rTPa in figure 5), are relatively higher for actual A-level grades than for predicted A-level grades. CLPV for predicting EPM is 0.403 for actual A-level grades compared with 0.251 for predicted A-level grades. For predicting postgraduate qualifications, the CLPVs for MRCP(UK) Part 1 and MRCS Part A are .601 and .519 for attained A-level grades, compared with .360 and .216 respectively for predicted A-level grades.
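The full calculation follows Hunter, Schmidt and Le's method, which handles restriction operating indirectly through a composite of selection measures and incorporates the reliabilities given above; that is beyond a short sketch. The simpler classical correction for direct range restriction (Thorndike's Case II) nevertheless conveys the role of uX, and is sketched below under that simplifying assumption.

```python
from math import sqrt

def correct_case2(r, u):
    """Thorndike Case II correction for direct range restriction.
    r: correlation observed in the restricted (selected) group;
    u: SD of the predictor in the restricted group divided by its SD in
       the unrestricted (applicant) group; u < 1 indicates restriction."""
    U = 1.0 / u
    return (r * U) / sqrt(1 + r ** 2 * (U ** 2 - 1))

# The more restricted the predictor (smaller u), the larger the correction
for u in (1.0, 0.8, 0.6, 0.4):
    print(f"u = {u:.1f}: corrected r = {correct_case2(0.297, u):.3f}")
```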
There are suggestions that predicted grades may not be equivalent in candidates from state schools and private schools, grades being predicted more accurately in independent schools 28 29. That is examined in Section 5 of the Supplementary Information, and while there is clear evidence, as found before in the UKCAT-12 study 60, that private school entrants underperform relative to expectations based on their A-levels, there is no evidence that predicted grades behave differently in candidates from private schools.
A practical question relevant to calculated grades concerns the extent to which, in the absence of attained A-level grades, other selection measures such as GCSEs, U(K)CAT and BMAT can replace the predictive variance of attained A-level grades. That will be considered for EPM, where the sample sizes are large. Attained grades alone give r=0.297, and predicted grades alone give r=.198, accounting for less than half as much outcome variance. Adding GCSEs to a regression model including just predicted grades increases the multiple R to .225, and also including U(K)CAT and BMAT increases it to .231, though that is still substantially less than the .297 for attained A-levels alone. In the absence of attained A-level grades, prediction is improved by including GCSEs and U(K)CAT or BMAT, but the prediction still falls short of that for actual A-levels alone.
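The arithmetic behind the "less than half as much outcome variance" claim, using the figures just given (a check of the reported values rather than any new analysis):

```python
r_attained, r_predicted = 0.297, 0.198  # correlations with EPM from the text

print(f"attained A-levels: {100 * r_attained ** 2:.1f}% of variance")   # ~8.8%
print(f"predicted grades:  {100 * r_predicted ** 2:.1f}% of variance")  # ~3.9%
print(f"ratio: {r_predicted ** 2 / r_attained ** 2:.2f}")               # ~0.44, i.e. less than half

# Adding GCSEs (R = .225), then aptitude tests (R = .231), to predicted grades
for label, R in (("+ GCSEs", 0.225), ("+ GCSEs + U(K)CAT/BMAT", 0.231)):
    print(f"{label}: {100 * R ** 2:.1f}% of variance")
```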
Modelling the effect of only predicted grades being available for selection
In the context of the 2020 pandemic, an important question is the extent to which future outcomes
may change as a result of selection being in terms of calculated grades. Calculated grades
themselves were not known at the time of the study, but predicted grades are probably a
reasonable surrogate for them in the first instance. A modelling exercise was therefore carried out whereby the numbers of students in the various EPM revDeciles were tabulated in relation to predicted grades at five grade levels, 36 pts ≡ A*A*A*, 34 pts ≡ A*A*A, 32 pts ≡ A*AA, 30 pts ≡ AAA and ≤ 28 pts ≡ ≤ AAB, with the probability of each decile found for each predicted A-level band. Assuming that selection results in the usual numbers of entrants with grades of A*A*A*, A*A*A, etc., but based on calculated grades rather than actual grades, the expected numbers of students in the various EPM deciles can be found. Figure 6 shows deciles as standard UKFPO deciles (1 = highest), UKFPO scores (43 = highest), and revDeciles (10 = highest). The blue column shows the actual proportions in the deciles based on attained A-level grades. Note that for various reasons there are not exactly equal proportions in the ten deciles [p]. Based on selection on attained A-level grades, 7.2% of students are in the lowest performing decile, compared with an expected proportion of 8.1% for selection on predicted grades, an increase of 0.9 percentage points, which is a relative increase of 13.0% in the proportion in the lowest decile, with an odds ratio of 1.141 of attaining the lowest decile. For the highest scoring decile, the proportion decreases from 10.1% with actual A-level grades to 8.8% if predicted A-level grades are used, an absolute decrease of 1.4 percentage points and a relative decrease of 13.4% in the proportion in the top decile, with an odds ratio of 0.853.
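These odds ratios follow directly from the quoted proportions; the small discrepancies from the published values of 1.141 and 0.853 simply reflect rounding of the percentages to one decimal place:

```python
def odds_ratio(p1, p0):
    """Odds ratio for an outcome with probability p1 versus p0."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Lowest decile: expected 8.1% under predicted grades vs 7.2% under attained
print(f"lowest decile:  OR = {odds_ratio(0.081, 0.072):.3f}")  # ~1.14
# Highest decile: expected 8.8% under predicted grades vs 10.1% under attained
print(f"highest decile: OR = {odds_ratio(0.088, 0.101):.3f}")  # ~0.86
```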
Of course, the above calculations are based on the assumption that the 'deciles' for calculated grades are expressed at the same standard as currently. Were the outcomes to be re-standardised so that all deciles were equally represented, then of course at finals no noticeable difference in performance would be apparent [q]. However, the academic backbone would still be present, and overall poorer performance on statistically equated postgraduate exams 61 would be expected.

[p] In part this reflects the fact that some students, particularly weak ones, are given an EPM score, but then fail finals.

[q] This is based on deciles being calculated in a way that is equated to the levels used in the present calculation. Of course, if calculated strictly as deciles, then of necessity 10% will still remain in the top decile, and so on. That difficulty of deciles will in large part be removed when the 2020 entrants graduate, as the UKMLA should be on stream by then.
Discussion
The present data make clear that under a half of predicted grades are accurate, with 45% being higher than attained grades, and only around 7% being lower. The data also show that attained grades are far better predictors of medical school performance than are predicted grades, which account for only about a third as much outcome variance as attained grades. Attained grades are also more reliable than predicted grades.
Validation is the bottom line for all measures used during selection, and in the present case it is
validation against assessment five to eight years down the line from the original A-levels, in both
undergraduate and postgraduate assessments. That is strong support for what we have called ‘the
academic backbone’, prior attainment providing the underpinning for later attainment, and hence
there are correlations in performance at all stages of training from GCSEs through to medical
degrees and on into postgraduate assessments 5.
Our findings contradict suggestions that holistic judgments by teachers, in the form of predicted grades, are better predictors of outcomes because teachers may know their students better than examiners do. The immense efforts by exam boards and large numbers of trained markers to refine educational measurements are therefore gratifying and reassuring. Careful measurement does matter.
An important question is whether there is some variance in predicted and actual grades which is complementary. We found that adding predicted grades to the model predicting outcomes improved the multiple correlation coefficient only slightly, accounting for only an additional 0.25% of variance. This suggests that predicted grades may provide a very small amount of additional information in predicting outcomes. What that information might be is unclear, and it is possible that it is what Petch called 'scholarly attitude'. At present, though, it is worth remembering that examination grades at A-level are primarily predicting further examination grades at the end of medical school, although EPM scores do include formal assessments of coursework, and practical and clinical skills. If other outcome measures, perhaps to do with communication, caring or other non-cognitive skills, were available, then predicted grades might show greater predictive value.
The present data inevitably have some limitations. There is little likelihood of bias since complete
population samples have been considered, and there is good statistical power with large sample
sizes. Inevitably not all outcomes can be considered, mainly because the cohorts analysed have not
yet progressed sufficiently through postgraduate training. However, those postgraduate outcomes
which are included do show substantial effects which are highly significant statistically.
Our questions about predicted grades have been asked in the practical context of the cancellation of
A-level assessments and their replacement by calculated grades, as a result of the COVID-19
pandemic. It seems reasonable to assume, given the literature on predicted grades, and particularly
on forecasted grades, that calculated grades will probably have similar predictive ability to predicted
grades, but perhaps be a little more effective due to occurring later in the academic cycle. Such a
conclusion would be on firmer ground if exam boards had analysed the predictive validity of the data
they had collected on forecasted grades, particularly in comparison with predicted and actual
grades. Such data may exist, and if so then they need to be seen. In their absence, the present data
may be the best available guesstimates of the likely predictive validity of calculated rather than
actual grades.
A potential limitation of our study is that we do not include the calculated and final grades for students who applied for admission in 2020; however, calculated and final grades for 2020 will be available in UKMED in 2021, and since that year group will also have the teacher-predicted grades submitted to UCAS, an immediate question of interest will be the extent of the correlation of the measures, and hence whether teacher-predicted grades are indeed a proxy for calculated grades. Having said that, it will not be possible to calculate the predictive validity of teacher-predicted and calculated grades for a number of years, until the cohort progresses through undergraduate training.
Medium- and long-term predictive validity inevitably take time to acquire, and practical decision-
making sometimes has to be based on proxy, surrogate measures, with teacher-predicted grades at
application to UCAS being a reasonable substitute. If it were the case that teacher-predicted grades
for UCAS and teacher-estimated grades as a part of calculated grades were fundamentally
discrepant then serious questions would be raised about one or other set of estimates. The same
applies to the teacher-estimated grades being used as a substitute for A-levels in the summer of
2021, which will apply to the cohort applying for entry to medical school in 2021.
Under-prediction
Under-prediction is a particular risk in cases where teachers do not know their students well, or in
some cases perhaps underestimate their ability because of attitude, personal characteristics, or
other factors. There is some evidence that teacher-assessed grades relate more to student personality than do grades in national examinations 62 63, although effects were relatively weak. Any such biases are traditionally solved by the externality and objectivity of national examinations.
Petch, once again, put it well, describing,
“instances, where, in the examination room, candidates have convinced the examiners that
they are capable of more than their schools said that they were … Paradoxical as it will
seem, examiners are not always on the side of authority; an able rebel can find his wider
scope within the so-called cramping confines of an examination.” 3(p.29).
There is a clear echo here of the quote by Yasmin Hussein with which this paper began. Hussein is not alone in these concerns, and the UKMACS study in April 2020 found that concerns about fairness were particularly present in medical school applicants from non-selective schools, from Black, Asian and Minority Ethnic (BAME) applicants, from female applicants, and from those living in more deprived areas 10.
Effects of loss of schooling.
A further consideration is more general and asks what the broader effects of the COVID-19
pandemic may be on medical education. Students at all levels of education have had teaching and
learning disrupted, often extensively, and that is also true of all stages of medical education. The
2020 cohort of applicants/entrants will not have been assessed formally at A-level. As well as meaning that they may only have calculated grades, which are likely to be less accurate, they will also have missed out on significant amounts of teaching. UK students who should have taken A-level exams in 2020 missed around 30 to 40 school days; those in the year below, from whom 2021 medical school entrants will be drawn, will have missed around 80 days. Burgess and Sievertsen 64, using data from two studies 65 66, estimate that 60 lost school days result in a reduction in performance of about 6% of a standard deviation, which they say is "non-trivial" (for comparison, a rule of thumb is that students in school improve by about one third of a standard deviation in each school year 67). These effects are also likely to differ by socio-economic background, particularly given variability in the effectiveness of home schooling. Applicants not taking A-levels will also suffer from the loss of the enhanced learning that occurs when learners are tested (the 'testing effect'), for which meta-analyses have found effect sizes of about 0.50 68 69, which is also non-trivial. Taken overall, 2020 entrants to medical school, and perhaps those in 2021 as well, may, without additional support, perform less well in the future as a result of missing out both on education and on its proper assessment.
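As a rough check of scale, a linear extrapolation of the Burgess and Sievertsen estimate (an assumption made here for illustration; learning loss need not be linear in days) gives the following:

```python
# Burgess and Sievertsen: 60 lost school days ~ 6% of an SD of performance
SD_LOSS_PER_DAY = 0.06 / 60

for days in (30, 40, 80):
    print(f"{days} lost days ~ {100 * SD_LOSS_PER_DAY * days:.0f}% of an SD")

# For scale, one school year of teaching ~ one third of an SD of improvement
print(f"one school year ~ {100 / 3:.0f}% of an SD")
```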
Conclusions
The events of 2020 as a result of the COVID-19 pandemic were extraordinary, and unprecedented situations occurred, of which the cancellation of GCSE and A-level examinations was but one example. The current study should not be seen as criticism of the response of Ofqual to that situation; given the circumstances in which it found itself, with examinations cancelled (when the Chair of Ofqual, Roger Taylor, had recommended socially-distanced or delayed exams), Ofqual's solution to the problems had many obvious virtues. We began this paper by quoting a letter to a
newspaper in March 2020 at the beginning of lockdown by a student taking GCSEs, and so it is
probably appropriate to finish with a letter to a different newspaper by an A-level student. Written
at the height of the A-levels crisis, in August 2020, it raises many subtle, important and mostly
neglected questions, ones which researchers will need to grapple with in the future:
“Ofqual’s grading system appears to be lacking in advocates. Blinded by rhetoric about what
protesters call a ‘classist’ algorithm, key facts have been overlooked. It is very clear that
teachers are shockingly bad at predicting grades; using teacher predictions there will be a
12% inflation in higher grades compared with last year. While some centres predicted
accurately, some centres predicted only the highest grades for their students. This U-turn
from the government entails a huge injustice for the pupils who had fair and accurate
predictions, as well as for those taking exams next year. In the zero-sum game of university
applications, the results of these pupils make them appear weaker than they are.
Irresponsible teachers who over-predicted their pupils’ results ought to be ashamed that
they too have thereby ‘dashed the dreams’ of many young people across the country. That it
is less obvious does not make it any less true.” (Letter to The Times, 19th August 2020, by Seb
Bird, A-level student, Bristol)70.
For most university applicants there already existed predicted grades from the previous autumn, when UCAS applications were submitted, but these would have been on average half a grade or so too high, being aspirational as much as realistic; for medical applicants they would also have been made by October 2019, whereas calculated grades would be based on teacher predictions in May 2020, albeit with several months of courses missing since March 2020.
In May 2020 we wrote that raw teacher-predicted grades would have wrecked much university planning, particularly coming so late in the year, after offers had been made, as numbers of acceptances would inevitably have been far too high 7. That in fact happened, and quotas for university entries had to be abandoned in August 2020, including for medicine, which had knock-on effects into first-year university courses, and probably beyond. There was also a risk that predicted grades could have been systematically higher from some schools than others (the ones with a tendency to call all of their geese swans), and that probably applies also to the centre assessment grades sent to examination boards and mostly eventually accepted without central standardisation in August 2020. The consequences of that will not become apparent for a few years.
This paper has provided evidence that the grades awarded to medical applicants in summer 2020
will probably not predict future outcomes with the same effectiveness as actual, attained grades,
and that is a problem that universities and medical schools and postgraduate deaneries will have to
work with, probably for many years as the 2020 cohort works through the system. It seems likely
therefore, as Thomson has said, “… this year group will always be different…”2.
Figure captions
Figure 1: Predicted vs attained A-level grades for individual subjects in applicants to UK medical schools. Accurate predictions are in bold; yellow: over-estimates by 1 grade; orange: over-estimates by 2+ grades; green: under-estimates by 1 grade; blue: under-estimates by 2+ grades. a) Counts; b) attained grades as percentages within predicted grades.
Figure 2: Over-estimated, under-estimated and accurate predicted grades in various studies. Black
font: predicted grades; red font: forecasted grades; yellow background: pre-2000; blue background:
pre-2010; bold, underlined: averaged results post-2000.
Figure 3: An Ibry chart illustrating the progression of the 2010 to 2014 medical school entry cohorts
through secondary schooling, application to medical school, undergraduate and post-graduate
training, with the timing of key events shown. See text for further details.
Figure 4: Mean EPM revDeciles (95% CI) in relation to actual A-level grades (green) and predicted A-
level grades (red)
Figure 5: Correlation matrix of selection measures, undergraduate outcome measures, and postgraduate outcome measures (separated by grey lines for clarity). Cells indicate simple Pearson correlations (r; in blue), construct-level predictive validity (rTPa; in red) and sample size (N; in black). Range restriction is indicated by uX. See text for details.
Figure 6: Predicted decile outcomes if selection were on Predicted A-level grades (blue) rather than
actual A-level grades (orange).
Acknowledgements
We are grateful to Paul Garrud, Gill Wyness, Paul Newton, Colin Melville and Christian Woodward
for their comments on earlier versions of this manuscript, and to Jon Dowell, Peter Tang, Rachel
Greatrix and other members of the UKMED Research Group and Advisory Board for their assistance
in fast-tracking the pre-print of this paper, and for their comments on it. We also thank Tim Gill for
providing us with an unpublished manuscript.
Contributors
DTS prepared the data extracts, provided details on data sources and variable definitions where
required and commented on manuscript drafts. ICM originated the idea for the study, and discussed
it with other authors throughout the project. ICM wrote the first draft of the manuscript, and KW,
DH, PAT, LP, KYFC and DTS have read, reviewed and commented on earlier drafts and contributed
ideas, as well as approving the final draft, both of the preprint, and of the present paper.
Funding
KW is a National Institute for Health Research (NIHR) Career Development Fellow (NIHR CDF-2017-10-008) and is principal investigator for the UKMACS and UKMEDP089 projects, supported by NIHR funding.
DH is funded by NIHR grant CDF-2017-10-008 to KW.
PAT‘s research time is supported by an NIHR Career Development Fellowship (CDF 2015-08-11), and
PAT is also principal investigator for the UKMEDP051 project.
LWP is partly supported by NIHR grant CDF 2015-08-11 to PAT, and a portion of his research time is
funded by the UCAT board.
ICM, KYFC and DTS have received no specific funding for this project.
Disclaimers
KW and DH state that this publication presents independent research funded by the National
Institute for Health Research (NIHR). The views expressed are those of the authors and not
necessarily those of the NHS, the NIHR or the Department of Health and Social Care.
PAT states this research is supported by an NIHR Career Development Fellowship (CDF 2015-08-11).
This paper presents independent research partly funded by the National Institute for Health
Research. The views expressed are those of the authors and not necessarily those of the NHS, the
NIHR or the Department of Health and Social Care.
KYFC is employed as the Head of Marking and Results at Cambridge Assessment English. The views
expressed are those of the authors and do not represent the views of Cambridge Assessment.
DTS is employed by the GMC as a data analyst working on the UKMED project. The views expressed
here are his views and not the views of the GMC.
Data sources. UK Medical Education Database (UKMED) UKMEDP051 data extract generated on 13th
May 2019. UKMEDP089 extract generated 21st January 2020. UKMEDP112 project, using
UKMEDP051 and UKMEDP089 data, approved for publication on 29th May 2020. We are grateful to
UKMED for the use of these data. However, UKMED bears no responsibility for their analysis or
interpretation.
UKMEDP051 data includes information derived from that collected by the Higher Education Statistics
Agency Limited (HESA) and provided to the GMC (HESA Data). Source: HESA Student Record
2002/2003 to 2014/2015. Copyright Higher Education Statistics Agency Limited. The Higher Education Statistics Agency Limited makes no warranty as to the accuracy of the HESA Data and cannot accept responsibility for any inferences or conclusions derived by third parties from data or other information supplied by it.
UKMEDP051 and UKMEDP089 include Universities and Colleges Admissions Service (UCAS) data
provided to the GMC (UCAS data). Source: UCAS (application cycles 2007 to 2018). Copyright
Universities and Colleges Admissions Service (UCAS). UCAS makes no warranty as to the accuracy of
the UCAS Data and cannot accept responsibility for any inferences or conclusions derived by third
parties from data or other information supplied by it.
All data from HESA are required to be reported using their rounding and suppression criteria
(https://www.hesa.ac.uk/about/regulation/data-protection/rounding-and-suppression-anonymise-
statistics) and we have applied those criteria to all UKMED-based tables and values reported here.
Competing interests
ICM is a member of the UKMED Research Group and the UKMED Advisory Board, and is also on the
UKMACS advisory group.
PAT is a member of the UKMED Research Group. PAT has previously received research
funding from the ESRC, the EPSRC, the Department of Health for England, the UCAT Board,
and the GMC. In addition, PAT has previously performed consultancy work on behalf of his
employing University for the UCAT Board and Work Psychology Group and has received travel
and subsistence expenses for attendance at the UCAT Research Group.
KYFC is a member of the UKMED Research Group, and is an employee of Cambridge Assessment - a
group of exam boards that owns and administers the BioMedical Admissions Test (BMAT); UK GCSEs
and A-levels; and International GCSEs and A-levels.
DTS is a member of the UKMED Research Group and the UKMED Advisory Board and is employed by
the GMC as a data analyst working on the UKMED project.
KW, DH and LP declare no competing interests.
Authorship
ICM conceived the idea for the study, conducted the statistical analysis and wrote the first draft of
the paper. All authors contributed to subsequent drafts and approved the final version.
Ethical approval
Queen Mary Research Ethics Committee, University of London, agreed on 11 November 2015 that
there was no need for ethical review of UK Medical Education Database research studies.
https://www.ukmed.ac.uk/documents/UKMED_research_projects_ethics_exemption.pdf
Provenance and peer review
Not commissioned; reviewed by the UKMED Research Subgroup and Advisory Board, and submitted to an externally peer-reviewed journal. This paper was fast-tracked through the UKMED governance processes, as with other COVID-19 related research projects elsewhere.
Patient and public involvement.
No patient involvement.
Data sharing statement
Researchers wishing to re-analyse the data used for this study can apply for access to the same
datasets via UKMED (www.ukmed.ac.uk).
References
1. Hussein Y. Cancellation of GCSE is unfair to some students. The Guardian 2020(March 29th):
https://www.theguardian.com/world/2020/mar/29/cancellation-of-gcse-exams-unfair-to-
some-students.
2. Thomson D. Moderating teacher judgments in 2020 [Blog post, 25th March 2020]. London: FFT Education Datalab: https://ffteducationdatalab.org.uk/2020/03/moderating-teacher-judgments-in-2020/ (accessed 16th April 2020) 2020.
3. Petch JA. School estimates and examination results compared. Manchester: Joint Matriculation
Board 1964.
4. Higher Education Funding Council for England [HEFCE]. Differences in student outcomes: The
effect of student characteristics. Data Analysis March 2018/05,. Bristol: HEFCE 2018.
5. McManus IC, Woolf K, Dacre J, et al. The academic backbone: Longitudinal continuities in
educational achievement from secondary school and medical school to MRCP(UK) and the
Specialist Register in UK medical students and doctors. BMC Medicine
2013;11:242. doi:10.1186/1741-7015-11-242.
6. McManus IC, Dewberry C, Nicholson S, et al. Construct-level predictive validity of educational
attainment and intellectual aptitude tests in medical student selection: Meta-regression of
six UK longitudinal studies. BMC Medicine 2013;11:243. doi:10.1186/1741-7015-11-243.
7. McManus IC, Woolf K, Harrison D, et al. Calculated grades, predicted grades, forecasted grades
and actual A-level grades: Reliability, correlations and predictive validity in medical school
applicants, undergraduates, and postgraduates in a time of COVID-19. medRxiv 2020;doi:
https://doi.org/10.1101/2020.06.02.20116830
8. McKie A. Scrapped exams may spark UK admissions 'scramble'. Times Higher Education 2020;26th
March 2020:9-9.
9. Dowell J, Cleland J, Fitzpatrick S, et al. The UK medical education database (UKMED): What is it? Why and how might you use it? BMC Medical Education 2018;18(6):1-8. doi:10.1186/s12909-017-1115-9.
10. Woolf K, Harrison D, McManus IC. The attitudes, perceptions and experiences of medical school
applicants following the closure of schools and cancellation of public examinations due to
the COVID-19 pandemic in 2020. medRxiv 2020;submitted
11. Woolf K, Harrison D, McManus C. The attitudes, perceptions and experiences of medical school
applicants following the closure of schools and cancellation of public examinations in 2020
due to the COVID-19 pandemic: a cross-sectional questionnaire study of UK medical
applicants. BMJ open 2021;11(3):e044753.
12. Ofqual. Summer 2020 grades for GCSE, AS and A level, Extended Project Qualification and
Advanced Extension Award in maths: Guidance for teachers, students, parents and carers.
Coventry: Ofqual: Ofqual/20/6607/2
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_
data/file/877842/Summer_2020_grades_for_GCSE_AS_A_level_EPQ_AEA_in_maths_-
_guidance_for_teachers_students_parents.pdf (accessed 3rd April 2020) 2020.
13. Cresswell M. Heaps, prototypes and ethics: The consequence of using judgments of student
performance to set examination standards in a time of change. London: Institute of
Education 2003.
14. Ofqual. Setting GCSE, AS and A Level Grade Standards in Summer 2014 and 2015. London:
https://www.gov.uk/government/publications/setting-gcse-and-a-level-grade-standards-in-
summer-2014-and-2015 [accessed 18th April 2020] 2020.
15. Ofqual. Summer 2020 grades for GCSE, AS and A level, Extended Project Qualification and
Advanced Extension Award in maths: Information for Heads of Centre, Heads of Department
and teachers on the submission of Centre assessment grades. Coventry: Ofqual:
Ofqual/20/6607/1
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_
data/file/877930/Summer_2020_grades_for_GCSE_AS_A_level_EPQ_AEA_in_maths_-
_guidance_for_heads_of_centres.pdf (accessed 3rd April 2020) 2020.
16. McManus IC, Powis DA, Wakeford R, et al. Intellectual aptitude tests and A levels for selecting UK
school leaver entrants for medical school. BMJ 2005;331:555-59.
17. Opposs D, He Q. The reliability programme: Final Report. Coventry: Office of Qualifications and
Examinations Regulation (http://www.ofqual.gov.uk/files/reliability/11-03-16-Ofqual-The-
Final-Report.pdf) 2011.
18. Downing SM. Validity: On the meaningful interpretation of assessment data. Med Educ 2003;37:830-37.
19. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for educational and psychological testing. Washington, DC: American Educational Research Association 2014.
20. Kane MT. Validating the interpretations and uses of test scores. Journal of Educational Measurement 2013;50(1):1-73.
21. Bekhradnia B, Thompson J. Who does best at University? London: Higher Education Funding
Council England
(http://webarchive.nationalarchives.gov.uk/20081202000732/http://hefce.ac.uk/Learning/
whodoes/) 2002.
22. Higher Education Funding Council for England [HEFCE]. Differences in degree outcomes: The
effect of subject and student characteristics. Issues Paper 2015/21. Bristol: HEFCE 2015.
23. McManus IC, Smithers E, Partridge P, et al. A levels and intelligence as predictors of medical
careers in UK doctors: 20 year prospective study. BMJ 2003;327:139-42.
24. Lough C. GCSEs: Only 39% teachers think 2020 grades fair for all: Plan for teacher-assessed GCSE
and A-level grades prompts concerns about potential teacher bias, TES survey of 19,000
finds. TES (Times Educational Supplement) 2020;13th May
2020:https://www.tes.com/news/coronavirus-gcses-only-39-teachers-think-2020-grades-
fair-all.
25. Lilford R. Policy makers should use evidence, but what should they do in an evidence vacuum?
ARC West Midlands News Blog [NIHR Applied Research Collaboration, West Midlands]
2020;2(4 (April 24th, 2020)):1-2 (https://arcwm.files.wordpress.com/2020/04/arc-wm-
newsblog-20-04-24.pdf).
26. Hubble S, Bolton P. The review of university admissions [Briefing Paper Number 8538, 10 April 2019]. London: House of Commons Library [https://researchbriefings.files.parliament.uk/documents/CBP-8538/CBP-8538.pdf] 2019.
27. Snell M, Thorpe A, Hoskins S, et al. Teachers' Perceptions and A-Level Performance: Is There Any
Evidence of Systematic Bias? Oxford Review of Education 2008;34(4):403-23.
28. Everett N, Papageorgiou J. Investigating the Accuracy of Predicted A Level Grades as part of 2009
UCAS Admission Process. London: Department for Business, Innovation and Skills 2011.
29. Wyness G. Predicted grades: Accuracy and impact. A report of University and College Union.
London: University and College Union (https://www.ucu.org.uk/media/8409/Predicted-
grades-accuracy-and-impact-Dec-16/pdf/Predicted_grades_report_Dec2016.pdf) 2016.
30. UCAS. End of cycle report 2017: Qualifications and competition. Cheltenham: UCAS
[https://www.ucas.com/data-and-analysis/ucas-undergraduate-releases/ucas-
undergraduate-analysis-reports/2017-end-cycle-report] 2017.
31. UCAS. Factors associated with predicted and achieved A level attainment, August 2016.
Cheltenham: UCAS: https://www.ucas.com/file/71796/download?token=D4uuSzur 2016.
32. Gill T. Methods used by teachers to predict final A level grades for their students. Research Matters (UCLES) 2019(28):33-42.
33. Walland E, Darlington E. Insights on trends in AS Levels, the EPQ and Core Maths: summary report. Cambridge: Cambridge Assessment: https://www.cambridgeassessment.org.uk/.../527125-insights-on-trends-in-as-levels-the-epq-and-core-maths-summary-report.pdf 2019.
34. Urhahne D, Wijnia L. A Review on the Accuracy of Teacher Judgments. Educational Research
Review 2020:100374.
35. Meissel K, Meyer F, Yao ES, et al. Subjectivity of teacher judgments: Exploring student
characteristics that influence teacher judgments of student ability. Teaching and Teacher
Education 2017;65:48-60.
36. Rimfeld K, Malanchini M, Hannigan LJ, et al. Teacher assessments during compulsory education
are as reliable, stable and heritable as standardized test scores. Journal of Child Psychology
and Psychiatry 2019;60(12):1278-88.
37. Gill T, Benton T. The accuracy of forecast grades for OCR A levels in June 2014: Statistics Report
Series No 90. Cambridge: Cambridge Assessment
[https://www.cambridgeassessment.org.uk/Images/241261-the-accuracy-of-forecast-
grades-for-ocr-a-levels-in-june-2014.pdf] 2015.
38. Gill T, Rushton N. The accuracy of forecast grades for OCR A levels: Statistics Report Series No 26.
Cambridge: Cambridge Assessment [https://www.cambridgeassessment.org.uk/our-
research/all-published-resources/statistical-reports/150215-the-accuracy-of-forecast%20-
grades-for-ocr-a-levels-in-june-2012.pdf/] 2011.
39. Gill T, Chang Y. The accuracy of forecast grades for OCR A levels in June 2012: Statistics Report Series No. 64. Cambridge: Cambridge Assessment 2013.
40. Petch JA. Fifty years of examining: The Joint Matriculation Board, 1903-1953. 1953.
41. Murphy RJL. Teachers' assessments and GCE results compared. Educational Research
1979;22(1):54-59.
42. Gill T, Chang Y. The accuracy of forecast grades for OCR GCSEs in June 2013: Statistics Report
Series No 89. Cambridge: Cambridge Assessment
[https://www.cambridgeassessment.org.uk/Images/241260-the-accuracy-of-forecast-
grades-for-ocr-gcses-in-june-2013.pdf] 2015.
43. Gill T, Benton T. The accuracy of forecast grades for OCR GCSEs in June 2014: Statistics Report
Series No 91. Cambridge: Cambridge Assessment
[https://www.cambridgeassessment.org.uk/Images/241265-the-accuracy-of-forecast-
grades-for-ocr-gcses-in-june-2014.pdf] 2015.
44. Südkamp A, Kaiser J, Möller J. Accuracy of Teachers' Judgments of Students' Academic Achievement: A Meta-Analysis. Journal of Educational Psychology 2012;104(3):743-62.
45. Lumb AB, Vail A. Applicants to medical school: the value of predicted school leaving grades. Med
Educ 1997;31:307-11.
46. McManus IC, Richards P, Winder BC, et al. Medical school applicants from ethnic minorities:
identifying if and when they are disadvantaged. Brit Med J 1995;310:496-500.
47. Richardson PH, Winder B, Briggs K, et al. Grade predictions for school-leaving examinations: do
they predict anything? Med Educ 1998;32:294-97.
48. Wilmut J, Wood R, Murphy R. A review of research into the reliability of examinations: A
discussion paper prepared for the School Curriculum and Assessment Authority. Nottingham:
School of Education [available at www.gov.uk/systems/uploads] 1996.
49. Bramley T, Dhawan V. Estimates of Reliability of Qualifications. Cambridge: Cambridge
Assessment:[https://assets.publishing.service.gov.uk/government/uploads/system/uploads/
attachment_data/file/578868/2011-03-16-estimates-of-reliability-of-qualifications.pdf]
2010.
50. Wheadon C, Stockford I. Classification accuracy and consistency in GCSE and A level
examinations offered by the Assessment and Qualifications Alliance (AQA) November 2008
to June 2009. Office of Qualifications and Examinations Regulation: Coventry
(http://www.ofqual.gov.uk/files/reliability/11-03-16-AQA-Classification-Accuracy-and-
Consistency-in-GCSE-and-A-levels.pdf) 2011.
51. Marey EJ. La Méthode graphique dans les sciences expérimentales et particulièrement en physiologie et en médecine. Paris, 1878.
52. Wainer H, Harik P, Neter J. Visual Revelations: Stigler's Law of Eponymy and Marey's Train
Schedule: Did Serjev Do It Before Ibry, and What About Jules Petiet? Chance 2013;26(1):53-
56.
53. Tufte ER. The visual display of quantitative information. Cheshire, CT: Graphics Press 2018.
54. Garrud P, McManus IC. Impact of accelerated, graduate-entry medicine courses: a comparison of
profile, success, and specialty destination between graduate entrants to accelerated or
standard medicine courses in UK. BMC Medical Education 2018;18:
250:https://doi.org/10.1186/s12909-018-1355-3.
55. Devine OP, Harborne AC, Horsfall HL, et al. The Analysis of Teaching of Medical Schools (AToMS)
survey: an analysis of 47,258 timetabled teaching events in 25 UK medical schools relating to
timing, duration, teaching formats, teaching content, and problem-based learning. BMC
Medicine 2020;18(126):https://doi.org/10.1186/s12916-020-01571-4.
56. Curtis S, Smith D. A comparison of undergraduate outcomes for students from gateway courses
and standard entry medicine courses. BMC Medical Education
2020;20(4):https://doi.org/10.1186/s12909-019-1918-y.
57. Meng X-L, Rosenthal R, Rubin DB. Comparing correlated correlation coefficients. Psychological
Bulletin 1992;111(1):172-75.
58. Hunter JE, Schmidt FL, Le H. Implications of direct and indirect range restriction for meta-analysis
methods and findings. Journal of Applied Psychology 2006;91(3):594-612.
59. Fife DA, Mendoza JL, Terry R. Revisiting Case IV: A reassessment of bias and standard errors of
Case IV under range restriction. British Journal of Mathematical and Statistical Psychology
2013;66(3):521-42.
60. McManus IC, Dewberry C, Nicholson S, et al. The UKCAT-12 study: Educational attainment, aptitude test performance, demographic and socio-economic contextual factors as predictors of first year outcome in a collaborative study of twelve UK medical schools. BMC Medicine 2013;11:244. doi:10.1186/1741-7015-11-244.
61. McManus IC, Chis L, Fox R, et al. Implementing statistical equating for MRCP(UK) Parts 1 and 2. BMC Medical Education 2014;14(204). http://www.biomedcentral.com/1472-6920/14/204; doi:10.1186/1472-6920-14-204.
62. Papageorgiou KA, Likhanov M, Costantini G, et al. Personality, Behavioral strengths and
difficulties and performance of adolescents with high achievements in science, literature, art
and sports. Personality and Individual Differences 2020;160:109917.
63. Zimmermann F, Schütte K, Taskinen P, et al. Reciprocal effects between adolescent externalizing
problems and measures of achievement. Journal of educational psychology 2013;105(3):747.
64. Burgess S, Sievertsen HH. Schools, skills, and learning: The impact of COVID-19 on education. https://voxeu.org/article/impact-COVID-19-education (1st April 2020; accessed 31st May 2020), 2020.
65. Carlsson M, Dahl GB, Öckert B, et al. The effect of schooling on cognitive skills. Review of
Economics and Statistics 2015;97(3):533-47.
66. Lavy V. Do differences in schools' instruction time explain international achievement gaps?
Evidence from developed and developing countries. The Economic Journal
2015;125(588):397-424.
67. Hanushek EA, Woessmann L. The economic impacts of learning losses (OECD Education Working Paper). Education Working Papers (OECD Publishing, Paris) 2020;No. 225. doi: https://doi.org/10.1787/21908d74-e
68. Rowland CA. The effect of testing versus restudy on retention: A meta-analytic review of the
testing effect. Psychological Bulletin 2014;140(6):1432-63.
69. Yang C, Luo L, Vadillo MA, et al. Testing (quizzing) boosts classroom learning: A systematic and
meta-analytic review. Psychological Bulletin 2021
70. Bird S. A-levels fiasco. The Times 2020 August 19th.
... This contrasts with the teacher-assessed grades that were issued in 2021, which represented the grades at which students had evidenced achievement through other assessments (Ofqual, 2022). Previous work by McManus et al. (2021) has shown that performance at university correlates better with attained rather than predicted grades at A-level. Given the general optimism displayed by the 2020 CAGs, this could suggest that some of the 2020 university entrants may under attain at university relative to other cohorts. ...
... On top of this, this issue has been exacerbated in recent times by the global COVID-19 pandemic, as many students for many years did not have to sit exams, and instead were given in ated "teacher assessed grades" [11]. It is reported that these grades were (on the most part) in ated, with one report showing that 44.7% of grades for medical students were overpredicted [12]. For a lot of students, this will have led to an overestimation of individual ability, and a lack of skilldevelopment so students were entering higher education programmes with grades they might not have truly been capable of, and lacking in examination, revision, and problem-solving skills [13]. ...
Preprint
Full-text available
Background One of the most challenging times for a student is their transition to university from previous education, and this transition can be particularly difficult if their expectations vary greatly from what is likely to be genuine when they arrive. In healthcare education, this can be exacerbated as students also need to be safe and professional from the beginning of their academic journey. This highlights the importance of understanding students’ expectations as they join higher education, in order to identify any areas where expectations may need to be managed effectively. Methods This project utilised online surveys of: (1) incoming (transitioning) students (n = 37), (2) current students (n = 21), and (3) academic staff (n = 13) – all involved in healthcare programmes at a UK-based University. The questions were targeted around perceptions of outcomes, performance, workload, extracurricular activities, professionalism and support. Quantitative data were analysed using t-tests and ANOVAs; qualitative data were investigated using inductive thematic analysis. Results Data show that on average, both incoming and current students expect to graduate with a first-class degree but expect that they will ned to work harder than they did previously to achieve it. Data also suggest that overall there is a relatively large agreement in expectation between all three participant groups, however there are mismatches surrounding: the need to attend lectures and what constitutes an achievable grade. In particular, incoming students who reported having a close relative go to university and support them with their expectations were significantly less likely to agree that they need to attend lectures to achieve success. Thematic analyses highlight positive experiences of university align with a healthy work-life balance, good support structures, and good relationships; whereas negative experiences are associated with feelings of isolation, struggling, and experiencing poor mental health. Conclusions Overall these data suggest that best practice in teaching and supporting students could be facilitated through implementing small changes to help support management of student expectations as they join the institution. These include: study skills support, training in understanding grade boundaries in higher education, and reviewing how we utilise timetabled lecture sessions to ensure they are perceived to be maximally useful.
... The global pandemic of COVID-19 has accelerated the popularity of online learning platforms (E-Leaning) [5] [20], while traditional prediction models for offline learning do not apply to the predictive assessment of online learning effectiveness, the literature [21] [22] [23] explored performance prediction approaches for online learning but were only able to perform performance prediction for learning on specific online platforms, and the authors did not discuss methods to integrate with traditional offline teaching performance prediction. At the same time, the dramatic shift in teaching methods has raised concerns among some researchers about the quality of teaching and learning [24] [25]. Since the level of knowledge and competence acquired through online learning is only 1/3 of that of traditional offline teaching [26]. ...
Article
Full-text available
Early warning of student performance is using data analytics to predict future performance and intervene in advance. The popularity of machine learning has improved prediction accuracy. However, most current models only consider subjective student factors without examining the external environment and objective elements. Meanwhile, the global pandemic of Pneumonia has brought serious disruptions to teaching and learning in universities, and existing models cannot cope with this challenge. In this study, we propose a neural network model that integrates various internal and external factors and incorporates data-level sample synthesis and multi-classification cost-sensitive learning methods to achieve early warning of student performance in universities and improve teaching quality and management. Experimental results show that the model can be applied to teaching scenarios with a mixture of online and offline teaching, has higher accuracy than previous prediction mechanisms that only consider some student’s academic characteristics, and outperforms traditional machine learning methods.
... For example, in the UK, those completing their final year of schooling were awarded calculated A-level grades in both 2020 and 2021 rather than sitting formal examinations, in a process fraught with much difficulty and dispute. Initial studies have reported reduced validity of these calculated grades compared to actual grades, and this is likely to cause difficulties for students when progressing into university, and pose issues for higher education institutes, as they may not accurately reflect student ability [5]. Francis et al highlight a variety of issues for incoming students and their new institutions, including variation due to different educational experiences, lack of experience with examinations and formal assessment, and knowledge and practical skills gaps [6]. ...
Conference Paper
Full-text available
This paper reports on a preliminary study that was carried out to understand the experiences of engineering students transitioning to on-campus learning following the Covid-19 pandemic. Two cohorts were considered: year 1 students joining the university for the first time after having experienced considerable disruption for the final two years of their schooling and year 2 students who experienced their first year at university almost entirely online. Data was gathered from student surveys which found that the greatest areas of difficulty for students were the academic level of the programme and the workload. A limited comparison was drawn between this finding and some pre-pandemic data which suggests that the difficulty that students had in this area was higher than for students before the pandemic, indicating that two years of disrupted education may have had a negative impact on students’ preparedness for higher education. Qualitative open-ended responses by students showed that there was a clear preference for face-to-face teaching, but that students see clear benefits to online resources and lecture recordings, and value having some flexibility in how they learn. Some reduction in student performance was noted.
Article
Successful completion of the Intercollegiate Membership of the Royal Colleges of Surgeons (MRCS) examination is mandatory for surgical trainees entering higher specialist training in the United Kingdom. Despite its international reputation, and the value placed on the examination in surgical training, there has been little evidence of its predictive validity until recently. In this review, we present a summary of findings of four recent Intercollegiate studies assessing the predictive validity of the MRCS Part A (written) examination. Data from all four studies showed statistically significant positive correlations between the MRCS Part A and other written examinations taken by surgical trainees over the course of their education. The studies summarised in this review provide compelling evidence for the predictive validity of this gatekeeping examination. This review will be of interest to trainees, training institutions and the Royal Colleges given the value placed on the examination by surgical training programmes.
Article
Full-text available
The Intercollegiate Membership of the Royal Colleges of Surgeons (MRCS) is a high-stakes postgraduate examination taken by thousands of surgical trainees worldwide every year. The MRCS is a challenging assessment, highly regarded by surgical training programmes and valued as a gatekeeper to the surgical profession. The examination is taken at considerable personal, social and financial cost to surgical trainees, and failure has significant implications for career progression. Given the value placed on MRCS, it must be a reliable and valid assessment of the knowledge and skills of early-career surgeons. Our first article 'Establishing the Predictive Validity of the Intercollegiate Membership of the Royal Colleges of Surgeons Written Examination: MRCS Part A' discussed the principles of assessment reliability and validity and outlined the mounting evidence supporting the predictive validity of the MRCS Part A (the multiple-choice questionnaire component of the examination). This, the second article in the series discusses six recently published studies investigating the predictive validity of the MRCS Part B (the clinical component of the examination). All national longitudinal cohort studies reviewed have demonstrated significant correlations between MRCS Part B and other assessments taken during the UK surgical training pathway, supporting the predictive validity of MRCS Part B. This review will be of interest to trainees, trainers and Royal Colleges given the value placed on the examination by surgical training programmes.
Article
The COVID-19 pandemic has created significant challenges for UK schools, but a time of cancelled exams and uncertainty around future examinations can provide opportunities to explore novel assessment methods. Hence, the 2020 proposal of the Ofqual algorithm which combines teachers' estimated grades and schools' historical performance seemed timely. However, the algorithmically calculated grades resulted in a public backlash and withdrawal of the proposal. While the failed Ofqual algorithm could be considered an example of AI, we do not yet have a thorough understanding of its numerical accuracy and how it performs in comparison to other AI models. This paper investigates this novel application: the potential use of a range of AI models as assessment tools in a selective, independent, secondary school in England. The following questions were examined: (1) how accurate are modern AI models in predicting GCSE exam grades? (2) what are the differences in model accuracy across subjects and can these be explained by qualitative differences in teachers' grading practices? Results indicate that while models yield acceptable mean absolute errors, individual mispredictions can be larger than desired. Subject differences highlighted that grading subjectivity is less significant in science, technology, engineering, and maths (STEM) subjects, which could explain why objective models fail to predict non-STEM grades more frequently. In summary, numerical results indicate that grade prediction could be an interesting novel application of AI, but more research is needed to reduce outliers.
Thesis
Full-text available
Completion of the Intercollegiate Membership of the Royal Colleges of Surgeons examination (MRCS) is mandatory for progression in surgical training in the United Kingdom (UK). Until recently, little was known about individual and organisational influences on examination performance. Identifying these factors is of great interest to trainees and to the training institutions responsible for producing safe and appropriately qualified surgeons. Examination of the literature reveals that three key factors commonly impact performance in other postgraduate medical assessments: differences in teaching and training experiences, academic ability, and sociodemographic differences. The current thesis therefore aimed to investigate the impact of these three factors on MRCS examination performance. This was achieved by conducting seven longitudinal cohort studies, analysing academic performance data for all UK MRCS candidates over a 13-year period. The studies identified significant differences in MRCS pass rates across UK undergraduate and postgraduate training locations. However, MRCS performance was not related to subjective measures of training quality, and few training institutions appear to offer incremental value, over and above the academic ability of their trainees, with regard to later postgraduate performance. These differences are therefore likely to reflect the most popular training locations recruiting the top academic performers. Prior academic attainment, as a measure of academic ability, was the strongest predictor of later success at MRCS. However, even after accounting for this, there were significant group-level performance differences at MRCS according to individual differences in personal and social circumstances. This thesis will be of interest to candidates and those in charge of the delivery of surgical training. These findings are already being used to advocate for change in the culture of surgical training and for the equitable redistribution of support and resources to those at risk of failing this key assessment. The thesis studies also open up several new questions for future research.
Article
Full-text available
Over the last century, hundreds of studies have demonstrated that testing is an effective intervention to enhance long-term retention of studied knowledge and to facilitate mastery of new information, compared with restudying and many other learning strategies (e.g., concept mapping), a phenomenon termed the testing effect. How robust is this effect in applied settings beyond the laboratory? The current review integrated data from 48,478 students, extracted from 222 independent studies, to investigate the magnitude, boundary conditions, and psychological underpinnings of test-enhanced learning in the classroom. The results show that overall testing (quizzing) raises student academic achievement to a medium extent (g = 0.499). The magnitude of the effect is modulated by a variety of factors, including the learning strategy used in the control condition, test format consistency, material matching, provision of corrective feedback, number of test repetitions, test administration location and timepoint, treatment duration, and experimental design. The documented findings support three theories to account for the classroom testing effect: additional exposure, transfer-appropriate processing, and motivation. In addition to their implications for theory development, these results have practical significance for enhancing teaching practice and guiding education policy, and they highlight important directions for future research.
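For reference, the g reported above is Hedges' g, whose standard definition (stated here for the reader, not reproduced from the review) scales the tested-minus-control mean difference by the pooled standard deviation, with a small-sample correction J:

g = J \cdot \frac{\bar{x}_T - \bar{x}_C}{s_p}, \qquad s_p = \sqrt{\frac{(n_T - 1) s_T^2 + (n_C - 1) s_C^2}{n_T + n_C - 2}}, \qquad J \approx 1 - \frac{3}{4(n_T + n_C - 2) - 1}

On this scale, g = 0.499 means that quizzed classes outperformed comparison classes by about half a pooled standard deviation.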
Article
Full-text available
Objective Describe the experiences and views of medical applicants from diverse social backgrounds following the closure of schools and universities and the cancellation of public examinations in the UK due to COVID-19. Design Cross-sectional questionnaire study, part of the longitudinal UK Medical Applicant Cohort Study (UKMACS). Setting UK medical school admissions in 2020. Participants 2887 participants completed an online questionnaire from 8 April to 22 April 2020. Eligible participants had registered to take the University Clinical Aptitude Test in 2019 and agreed to be invited to take part, or had completed a previous UKMACS questionnaire; had been seriously considering applying to medicine in the UK for entry in 2020; and were UK residents. Main outcome measures Views on calculated grades, views on medical school admissions and teaching in 2020 and 2021, reported experiences of education during the national lockdown. Results Respondents were concerned about the calculated grades that replaced A-level examinations: female and Black, Asian and Minority Ethnic applicants felt teachers would find it difficult to grade and rank students accurately, and applicants from non-selective state schools and those living in deprived areas had concerns about the standardisation process. Calculated grades were generally not considered fair enough to use in selection, but were considered fair enough to use in combination with other measures, including interview and aptitude test scores. Respondents from non-selective state (public) schools reported less access to educational resources compared with private/selective school pupils, less online teaching in real time and less time studying during lockdown. Conclusions The COVID-19 pandemic has had, and will continue to have, significant long-term impacts on the selection, education and performance of our medical workforce. It is important that the views and experiences of applicants from diverse backgrounds are considered in decisions affecting their future and the future of the profession.
Preprint
Full-text available
Objective To describe medical applicants' experiences of education and their views on changes to medical school admissions, including the awarding of calculated grades, following the 2020 closure of schools and universities and the cancellation of public examinations in the United Kingdom due to the COVID-19/coronavirus pandemic; and to understand how applicants from diverse social backgrounds might differ in these regards. Design Cross-sectional questionnaire study forming part of the longitudinal United Kingdom Medical Applicant Cohort Study (UKMACS). Setting United Kingdom medical school admissions. Participants 2887 participants (68% female; 64% with at least one degree-educated parent; 63% with at least one parent in the highest socioeconomic group) completed an online questionnaire between 8 and 22 April 2020. To be invited to complete the questionnaire, participants had to have registered to take the University Clinical Aptitude Test (UCAT) in 2019 and to have agreed to be invited to take part in the study, or to have completed one or more previous UKMACS questionnaires. They also needed to have been seriously considering applying to study medicine in the UK for entry in 2020 between May and October 2019, and to be resident in the UK or Islands/Crown Dependencies. Main outcome measures Views on calculated grades, views on potential changes to medical school admissions and teaching in 2020 and 2021, reported experiences of education following the closure of educational institutions in March 2020. Results Respondents had concerns about the calculated grades that would replace A-level examinations, especially female applicants and applicants from Black, Asian and Minority Ethnic (BAME) backgrounds, who felt teachers would find it difficult to grade and rank students accurately, as well as those from non-selective state schools and those living in deprived areas, who had some concerns about the grade standardisation process. Calculated grades were not considered fair enough by a majority to use in the acceptance or rejection of medical offer-holders, but several measures, including interview and aptitude test scores, were considered fair enough to use in combination. Respondents from non-selective state (public) schools reported less use of and less access to educational resources compared with their counterparts at private/selective schools; in particular, they reported less online teaching in real time and spent less time studying during the lockdown. Conclusions The coronavirus pandemic will have significant and long-term impacts on the selection, education and performance of our future medical workforce. It is important that the views and experiences of medical applicants from diverse backgrounds are taken into consideration in decisions affecting their futures and the future of the profession.
Preprint
Full-text available
Calculated A-level grades will replace actual, attained A-levels and other Key Stage 5 qualifications in 2020 in the UK as a result of the COVID-19 pandemic. This paper assesses the likely consequences for medical schools in particular, beginning with an overview of the research literature on predicted grades, and concluding that calculated grades are likely to correlate strongly with the predicted grades that schools currently provide on UCAS applications. A notable absence from the literature is evidence on whether predicted grades are better or worse than actual grades in predicting university outcomes. This paper provides such evidence, demonstrating the reduced predictive validity of predicted A-level grades in comparison with actual A-level grades. The present study analyses the extensive data on predicted and actual grades available in UKMED (United Kingdom Medical Education Database), a large-scale administrative dataset containing longitudinal data from medical school application through undergraduate and then postgraduate training. In particular, predicted as well as actual A-level grades are available, along with undergraduate and postgraduate outcomes that can be used to assess the predictive validity of measures collected at selection. This study looks at two UKMED datasets. In the first dataset we compared actual and predicted A-level grades in 237,030 A-levels taken by medical school applicants between 2010 and 2018: 48.8% of predicted grades were accurate, grades were over-predicted in 44.7% of cases and under-predicted in 6.5% of cases. Some A-level subjects, General Studies in particular, showed a higher degree of over-estimation, and similar over-prediction was found for Extended Project Qualifications and for SQA Advanced Highers. The second dataset considered 22,150 18-year-old applicants to medical school from 2010 to 2014 who had both predicted and actual A-level grades. Of these, 12,600 students entered medical school and had final-year outcomes available; in addition, there were postgraduate outcomes for 1,340 doctors. Undergraduate outcomes are predicted significantly better by actual, attained A-level grades than by predicted A-level grades, as is also the case for postgraduate outcomes. Modelling the effect of selecting only on calculated grades suggests that, because of the lower predictive ability of predicted grades, medical school cohorts for the 2020 entry year are likely to under-attain, with 13% more gaining the equivalent of the current lowest decile of performance and 16% fewer gaining the equivalent of the current top decile, effects which are then likely to follow through into postgraduate training. The problems of predicted/calculated grades can be ameliorated to some extent, although not entirely, by taking U(K)CAT, BMAT and perhaps other measures into account to supplement calculated grades. Medical schools will probably also need to consider whether additional teaching and support are needed for entrants who are struggling or who have missed out on important aspects of A-level teaching, so that standards are maintained.
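The selection argument in this abstract can be illustrated with a minimal Monte Carlo sketch (all weights and correlations below are illustrative assumptions, not the UKMED estimates): if attained grades carry more signal about the ability that drives later outcomes than teacher-predicted grades do, a cohort selected on predictions will have weaker expected outcomes than one selected on attained grades.

import numpy as np

rng = np.random.default_rng(42)
n = 100_000
ability = rng.normal(size=n)                          # hypothetical latent ability
attained = 0.8 * ability + 0.6 * rng.normal(size=n)   # stronger signal of ability (assumed)
predicted = 0.6 * ability + 0.8 * rng.normal(size=n)  # noisier signal (assumed)
outcome = 0.7 * ability + 0.71 * rng.normal(size=n)   # proxy for later exam performance

def mean_outcome_of_top(score, frac=0.1):
    # mean outcome among the top `frac` of applicants ranked on `score`
    return outcome[score >= np.quantile(score, 1 - frac)].mean()

print(f"selected on attained grades:  {mean_outcome_of_top(attained):.3f}")
print(f"selected on predicted grades: {mean_outcome_of_top(predicted):.3f}")

Selecting on the noisier score admits more low-ability applicants at the margin, which is the mechanism behind the modelled shift in performance deciles.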
Article
Full-text available
Background: Which subjects UK medical schools teach, how they teach them, and how much time they devote to each is unclear. Whether teaching differences matter is a separate, important question. This study provides a detailed picture of timetabled undergraduate teaching activity at 25 UK medical schools, particularly in relation to problem-based learning (PBL). Method: The Analysis of Teaching of Medical Schools (AToMS) survey used detailed timetables provided by 25 schools with standard 5-year courses. Timetabled teaching events were coded in terms of course year, duration, teaching format, and teaching content. Ten schools used PBL. Teaching times from timetables were validated against two other studies that had assessed GP teaching and lecture, seminar, and tutorial times. Results: A total of 47,258 timetabled teaching events in the academic year 2014/2015 were analysed, including SSCs (student-selected components) and elective studies. A typical UK medical student receives 3960 timetabled hours of teaching during their 5-year course. There was a clear difference between the initial 2 years, which mostly contained basic medical science content, and the later 3 years, which mostly consisted of clinical teaching, although some clinical teaching occurred in the first 2 years. Medical schools differed in the duration, format, and content of teaching. Two main factors underlay most of the variation between schools: Traditional vs PBL teaching and Structured vs Unstructured teaching. A curriculum map comparing medical schools was constructed using those factors. PBL schools differed on a number of measures, having more PBL teaching time, fewer lectures, more GP teaching, less surgery, less formal teaching of basic science, and more sessions with unspecified content. Discussion: UK medical schools differ in both the format and the content of their teaching. PBL and non-PBL schools clearly differ, albeit with substantial variation within groups and overlap in the middle. The important question of whether differences in teaching matter in terms of outcomes is analysed in a companion study (MedDifs), which examines how teaching differences relate to university infrastructure, entry requirements, student perceptions, and outcomes in Foundation Programme and postgraduate training.
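The two-factor description of between-school variation is the kind of result a principal components analysis of a schools-by-teaching-measures matrix would produce. A compact sketch follows (synthetic numbers; the actual AToMS measures and loadings differ):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# hypothetical standardised teaching measures for 25 schools
# (e.g. lecture, PBL, GP teaching, surgery and basic science hours)
X = rng.normal(size=(25, 5))

pca = PCA(n_components=2)
positions = pca.fit_transform(X)      # each school's coordinates on the two factors
print(pca.explained_variance_ratio_)  # share of between-school variation per factor

In such an analysis, the two coordinate axes would play the role of the Traditional vs PBL and Structured vs Unstructured dimensions used to build the curriculum map.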
Article
Full-text available
Background: Gateway courses are increasingly popular widening participation routes into medicine. These six-year courses provide a more accessible entry route into medical school and aim to support under-represented students' progress and graduation as doctors. There is little evidence on the performance of gateway students, and this study compares attainment and aptitude on entry, and outcomes at graduation, of students on the UK's three longest-running gateway courses with students studying on a standard entry medical degree (SEMED) course at the same institutions. Methods: Data were obtained from the UK Medical Education Database for students starting between 2007 and 2012 at three UK institutions. These data included A-levels and University Clinical Aptitude Test scores on entry to medical school, and the Educational Performance Measure (EPM) decile, Situational Judgement Test (SJT) and Prescribing Safety Assessment (PSA) scores as outcome measures. Multiple regression models were used to test for differences in outcomes between the two types of course, controlling for attainment and aptitude on entry. Results: Four thousand three hundred and forty students were included in the analysis, 560 on gateway courses and 3785 on SEMED courses. Students on SEMED courses had higher attainment (Cohen's d = 1.338) and aptitude (Cohen's d = 1.078) on entry. On exit, SEMED students had higher EPM scores (Cohen's d = 0.616) and PSA scores (Cohen's d = 0.653). When accounting for attainment and aptitude on entry, course type is still a significant predictor of EPM and PSA, but the proportion of the variation in outcome explained by course type drops from 6.4% to 1.6% for EPM decile and from 5.3% to less than 1% for the PSA score. There is a smaller significant difference in SJT scores, with SEMED students having higher scores (Cohen's d = 0.114). However, when measures of performance on entry are accounted for, course type is no longer a significant predictor of SJT scores. Conclusions: This study shows that the differences between gateway and SEMED students on the available measures are greater on entry to their medical degrees than on exit. This provides modest evidence that gateway courses allow students from under-represented groups to achieve greater academic potential.
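Two statistics in this abstract are worth unpacking (standard definitions, stated for reference rather than taken from the study itself): Cohen's d scales a between-group mean difference by the pooled standard deviation, and the 'proportion of variation explained by course type' is the increment in R² when course type is added to a regression that already contains the entry measures:

d = \frac{\bar{x}_{\mathrm{SEMED}} - \bar{x}_{\mathrm{gateway}}}{s_p}, \qquad \Delta R^2 = R^2_{\mathrm{entry+course}} - R^2_{\mathrm{entry}}

The fall from 6.4% to 1.6% for the EPM decile is a fall in this increment, indicating that most of the exit gap is carried by attainment and aptitude at entry.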
Article
Full-text available
Background Children in the UK go through rigorous teacher assessments and standardized exams throughout compulsory (elementary and secondary) education, culminating in the GCSE exams (General Certificate of Secondary Education) at the age of 16 and A-level exams (General Certificate of Education Advanced level) at the age of 18. These exams are a major tipping point, directing young individuals towards different lifelong trajectories. However, little is known about the associations between teacher assessments and exam performance, or how well these two measurement approaches predict educational outcomes at the end of compulsory education and beyond. Methods The current investigation used the UK-representative Twins Early Development Study (TEDS) sample of over 5,000 twin pairs studied longitudinally from childhood to young adulthood (age 7–18). We used teacher assessments and exam performance across development to investigate, using genetically sensitive designs, the associations between teacher assessments and standardized exam scores, as well as teacher assessments' prediction of exam scores at ages 16 and 18 and of university enrolment. Results Teacher assessments of achievement are as reliable, stable and heritable (~60%) as test scores at every stage of the educational experience. Teacher and test scores correlate strongly phenotypically (r ~ .70) and genetically (genetic correlation ~.80), both contemporaneously and over time. Earlier exam performance accounts for additional variance in standardized exam results (~10%) at age 16, when controlling for teacher assessments. However, exam performance explains less additional variance in later academic success: ~5% for exam grades at 18 and ~3% for university entry, when controlling for teacher assessments. Teacher assessments also predict additional variance in later exam performance and university enrolment when controlling for previous exam scores. Conclusions Teachers can reliably and validly monitor students' progress, abilities and inclinations. High-stakes exams may shift the educational experience away from learning and towards exam performance. For these reasons, we suggest that teacher assessments could replace some, or all, high-stakes exams.
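As a rough guide to the heritability figure quoted above, the classical Falconer approximation (a textbook simplification; TEDS analyses fit full twin models rather than this formula) doubles the gap between monozygotic and dizygotic twin correlations:

h^2 \approx 2\,(r_{MZ} - r_{DZ})

For example, twin correlations of r_{MZ} = .75 and r_{DZ} = .45 would give h^2 ≈ .60, consistent with the ~60% reported.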
Article
In everyday school life, teachers need a wide range of judgment competencies to accurately assess student characteristics, learning and task requirements. The purpose of this literature review is to synthesize the methodological, empirical, theoretical, and practical knowledge from 40 years of research on the accuracy of teacher judgments. We define the accuracy of teacher judgments and differentiate the term from other related constructs. We explain the methodological approaches and summarize the main research findings on the accuracy of teacher judgments of student characteristics and task difficulties. Furthermore, we empirically demonstrate that teachers tend to overestimate student achievement on standardized tests. We discuss possible moderators of teachers' judgment accuracy and show the effects on teaching and the learning of students. We present the main theoretical approaches that can explain the empirical findings and describe ways to improve teacher judgment accuracy. In the discussion, we address important implications for research and practice.
Article
Individual variation in personality is related to differences in behavioral difficulties and achievement in unselected samples, and in samples selected for high achievement in various domains. This is the first study to explore and compare the connections between self-report measures of personality (Big Five and Dark Triad), behavioral strengths and difficulties, and school achievement in four tracks of high-achieving adolescents (N = 1,179) selected based on their exceptional performance in: Science, Arts, Sports and Literature. Personality was more strongly related to behavioral strengths and difficulties than to achievement in all tracks. As such, personality traits may be indirectly linked with achievement via behavioral strengths and difficulties. For example, narcissism correlated negatively with behavioral difficulties but did not significantly correlate with achievement. However, achievement was correlated negatively with behavioral difficulties. Network analyses indicated that teacher-awarded grades, but not anonymous exam grades, were weakly connected with personality. Specifically, teachers awarded higher grades to students with more ‘desirable’ personality traits such as high agreeableness. Results also showed track differences in the networks of personality, behavior and achievement. These findings are discussed in the context of personality as a resilience factor against behavioural difficulties and as a contributor to school achievement in gifted adolescents.