College entrance exams
A modern “shibboleth”?
The Covid-19 pandemic led many US colleges to drop requirements for
admissions tests. Daniel Robinson and Howard Wainer consider what
the consequences of this decision might be – for students and universities
One of the earliest documented uses of a test for selection purposes is described in the Bible, Judges 12:4–6, where, during the time of David, after the Gileadites captured the fords of the Jordan leading to Ephraim, they developed a one-item verbal test to determine the tribe to which the survivors of the battle belonged. If a survivor said, “Let me cross over”, the men of Gilead would ask, “Are you an Ephraimite?” If he said no, the survivor would be asked to say “shibboleth”. If he pronounced it incorrectly as “sibboleth”, the survivor would be killed as a suspected Ephraimite. Forty-two thousand men met their end this way. Perhaps the total also included some Gileadites with a speech impairment.1

In the intervening three millennia we have learned a great deal more about the proper use of tests. We know that the more serious the consequences of the test, the more reliable and valid its scores must be. But we also know that – like the “shibboleth” – no test perfectly measures what it is supposed to measure. Consider US college admissions tests, for example. The most common, the Scholastic Assessment Test (SAT) and the American College Test (ACT), are standardised tests that score college applicants on writing, reading, mathematics and other abilities in an effort to determine their readiness for the next stage of their education. Colleges use these scores in deciding which students to admit, hoping to pick the best and brightest. However, critics contend that admissions test scores are biased against certain groups of students, and that the scores do not serve the purpose assigned to them – to identify those most likely to succeed (see box, “Tests for different purposes”).

Currently, numerous colleges and universities are considering whether to continue using SAT and ACT scores in admissions decisions. Thus, the question that we discuss here is what role such scores should play in the decision about where a student can go to college. Or are they simply a modern “shibboleth”?

Tests for different purposes

Although it would seem that colleges that accept all of their applicants, like most community colleges and for-profits, would have no use for admissions tests, this is not so. Test scores can still serve an important role in helping both the student and the college decide on a curriculum that would best serve the student’s education. At the other extreme, elite colleges that accept fewer than 5% of applicants use admissions tests very differently. Such colleges are searching for students whose academic performance allows them to flourish in a very rigorous environment. Such applicants, although often obvious within their individual high schools, may be harder for highly selective colleges to find, since those colleges often draw their student bodies from an international pool and regional quotas. Experience (e.g., the National Merit Scholarship Program) has shown that a well-designed test is an efficient, practical, and valid tool for initial screening and searching. But because different colleges have different purposes, it is not obvious that the same admissions test could (or should) serve all uses with equal efficacy.

Dropping test requirements

During the Covid-19 pandemic, gathering large numbers of examinees safely in testing centres was a task of insuperable difficulty. Remote testing was not possible for some students because of limited access to computers and the internet, and monitoring for cheating was impossible, so there were concerns that the reliability of tests would be undermined. As a result, about 600 US colleges simply decided to suspend the requirement for admissions test scores as part of the application process (bit.ly/3KkazZ4). Since then, many have been considering making that suspension permanent. In May 2020, for example, the University of California board of regents voted unanimously to discontinue the testing requirement. They gave themselves 5 years to develop a “better” standardised test (bit.ly/3Kh8YD7); otherwise they will drop the testing requirement permanently for state residents.

Some immediate effects of not requiring admissions tests appear, at first glance, to be positive. Selective colleges – those that admit fewer than 50% of applicants – experienced substantial increases in applications in spring 2021, especially from first-generation, minority, and low-income applicants (bit.ly/3KeUOm0). Whether this increase was solely due to dropping test requirements is unknown. (As of February 2021, only 44% of applicants submitted
test scores, compared to 77% in 2020.) In
any case, this increase in applications is a
welcome counter to the 4.5% pandemic-
induced decline in enrolments during the
2020–21 academic year, compared to the
previous year (bloom.bg/3OzdEaZ). Given
such encouraging results, why do we need
admissions tests?
Unpopular, but necessary
Admissions tests have never been particularly
popular, but lately criticisms of them have
increased. Writing in the Chronicle of Higher
Education, Jon Boeckenstedt referred to them
as “pseudo-academic factors” that add “almost
nothing to an admission officer’s ability to
predict an individual student’s academic
performance in college” (bit.ly/38w730t).
Several arguments have been made for
getting rid of admissions tests in favour of test-
optional and test-free admissions. (See Sackett
et al. for a more thorough treatment of these
arguments.2) Three are particularly relevant
for our purposes here, and for each we offer
some refutations.
Admissions tests do not predict
college performance
This is the most common argument, but it is
incorrect. The correlation between SAT score
and first-year college grade point average
(GPA) has been estimated to be 0.55, which
is about the same as the correlation between
high school GPA and college GPA.3 There is an
even stronger positive relationship between
a college’s average admissions test score
and its six-year graduation rate. Using data
from over 1,000 colleges from the National
Center for Educational Statistics’ Integrated
Postsecondary Education Data System, the
correlations between graduation rate and
ACT Composite Score or SAT Verbal + Math
are about 0.8.
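For concreteness, the college-level calculation behind such figures can be done in a few lines. The sketch below is a minimal illustration, not the authors’ actual pipeline; the file name and column names (ipeds_2019.csv, act_25th, grad_rate_6yr) are assumed placeholders for an IPEDS extract.

```python
# Sketch: the college-level calculation behind Figure 1. The file and
# column names stand in for an IPEDS extract; they are assumptions.
import pandas as pd
from scipy import stats

df = pd.read_csv("ipeds_2019.csv")  # one row per college (assumed)

x = df["act_25th"]       # 25th-percentile composite ACT score
y = df["grad_rate_6yr"]  # six-year graduation rate (%)

# Pearson correlation between admissions scores and graduation rate.
r, p = stats.pearsonr(x, y)

# Least-squares line: graduation rate = slope * ACT score + intercept.
fit = stats.linregress(x, y)

print(f"r = {r:.2f}")
print(f"graduation rate = {fit.slope:.1f} x ACT score {fit.intercept:+.1f}")
```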
Admissions tests are merely a proxy for
student wealth
The correlation between a student’s SAT
score and socio-economic status (SES)
is 0.42. But do SAT and SES account for
basically the same variance in first-year
college GPA scores? Not really. After removing
SES influence from SAT scores, the overall
predictive validity of the latter is reduced
by only about 0.03. Thus, most of what
admissions tests measure is something that
is not student wealth.2
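The “removing SES influence” step is a semi-partial correlation: regress SAT on SES, keep the residuals, and correlate those with first-year GPA. Below is a minimal sketch of that idea, assuming a hypothetical student-level file; the column names are illustrative.

```python
# Sketch: how much does SAT's predictive validity shrink once SES is
# removed from it? File and column names are illustrative assumptions.
import numpy as np
import pandas as pd

df = pd.read_csv("students.csv")  # assumed: sat, ses, fygpa

# Raw predictive validity: correlation of SAT with first-year GPA.
r_raw = np.corrcoef(df["sat"], df["fygpa"])[0, 1]

# Regress SAT on SES and keep the residuals: the part of SAT scores
# that SES cannot explain.
slope, intercept = np.polyfit(df["ses"], df["sat"], 1)
sat_free_of_ses = df["sat"] - (slope * df["ses"] + intercept)

# Semi-partial correlation: validity of the SES-free SAT scores.
r_sp = np.corrcoef(sat_free_of_ses, df["fygpa"])[0, 1]

# Sackett et al. report a drop of only about 0.03 between these two.
print(f"raw r = {r_raw:.2f}, SES-removed r = {r_sp:.2f}")
```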
Admissions tests are biased
This conclusion is drawn because admissions
test scores differ among racial groups.
From this fact, critics infer that admissions
tests are one of the barriers that prevent
underrepresented minorities from getting
a college degree. First, differences in scores
among racial groups are a serious matter,
but they are not only seen in admissions test
scores; similar differences are observed in
tests administered throughout elementary
and high school as part of “high-stakes”
testing,4,5 and in high school GPA as well
(bit.ly/3KeXYGo). These likely reflect the
legacy of systemic racism and society’s
unequal allocation of resources.6 Second,
the fact that there are differences in scores
among racial groups does not, by itself, make
a test biased. For true test bias to occur, one
needs to show that the tests have different
predictive validities for different groups. The
SAT and ACT predict college GPAs that are
higher than actual college GPAs for black and
Hispanic students, while underpredicting for
white and Asian students.2 Finally, getting
rid of admissions tests and instead basing
admission decisions on other measures does
not remove bias from the equation.
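Differential predictive validity is directly testable. The sketch below, on a hypothetical student-level data set (the file and column names are assumptions), fits one regression with group-specific intercepts and slopes; predictive bias would show up as significant group terms.

```python
# Sketch: testing for differential prediction, the criterion for test
# bias described above. The data file and column names are assumed.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # assumed: fygpa, sat, group

# One regression with group-specific intercepts and slopes. If the
# group terms differ significantly from zero, the test predicts
# differently for different groups -- predictive bias. Identical
# prediction lines mean no bias by this criterion, even when mean
# scores differ between groups.
model = smf.ols("fygpa ~ sat * C(group)", data=df).fit()
print(model.summary())
```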
Consequences of a test ban
Suppose colleges ignore the above
refutations and ban admissions tests anyway.
What can we expect as a result? As has
already been documented, getting rid of
admissions tests may contribute to increases
in the numbers of applications most colleges
receive from underrepresented minorities. A
more diverse student body is a positive and
we all benefit from such heterogeneity.
But if we eliminate admissions tests and
their standardised measurement, colleges
are left with high school GPA, class rank,
admissions essays, recommendation
letters, résumés showing extracurricular
involvement, etc., to use in admission
decisions. All of these have a measure
of subjectivity that requires heroic and
untestable assumptions to justify using
them to make comparative decisions
among students. Admissions tests, on the
other hand, are more reliable and also both
objective and standardised.
Admissions tests can predict college GPA
with only about 2 hours of data gathering,
whereas it takes over 3 years to gather the
information to compute high school GPA/
rank, and its equitable computation is
complicated for students whose records are
incomplete (e.g., because of transferring
from another district or because of extended
absences due to illness or family travel). Such
missing data are much more common among
students who experience a more transient
home life. Thus, high school GPA may not
fairly or accurately capture the academic
potential of these students.
We would also argue that admissions test
scores are fairer than high school GPA. An
important difference between using tests
and grades that require human judgement
is that while both can contain biases, tests
are subject to control, like experiments, and
so biases in tests can be discovered and
corrected. Using human judgement is more
like an observational study without any such
user control. Such judgements are just found
“in the street”,7 and we have no way to really
know what they mean. A biased observer
here and there is essentially invisible.
In the past, there were test items
that performed differently in different
subpopulations of students. But such items
can be detected and elided, so that the test is
constantly being improved and made fairer.
Testing companies expend considerable
resources in fairness reviews by subject-
matter experts and in statistical analyses
looking for differential performance of any
items in different subgroups.8
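One standard check of this kind is the Mantel–Haenszel differential item functioning (DIF) test, which asks whether examinees from two groups, matched on total test score, perform differently on a given item. Below is a minimal sketch; the response file and its column names are illustrative assumptions.

```python
# Sketch: Mantel-Haenszel DIF check for a single item. Examinees are
# matched on total test score; within each matched stratum we tabulate
# group membership against item correctness. Data layout is assumed.
import pandas as pd
from statsmodels.stats.contingency_tables import StratifiedTable

df = pd.read_csv("responses.csv")  # assumed: total_score, group, item_correct

tables = []
for _, stratum in df.groupby("total_score"):
    # 2x2 table per stratum: group (reference/focal) x item (right/wrong).
    t = pd.crosstab(stratum["group"], stratum["item_correct"])
    if t.shape == (2, 2):  # skip strata missing a group or an outcome
        tables.append(t.to_numpy().astype(float))

st = StratifiedTable(tables)
# A common odds ratio near 1 means the item behaves the same for both
# groups once ability is matched; a significant departure flags the item.
print(st.test_null_odds())
print("common odds ratio:", st.oddsratio_pooled)
```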
In the past, admissions tests were sometimes
opposed precisely because they were less
biased than the alternatives now being
proposed. For example, Henry Chauncey,
before he founded the Educational Testing
Service, was an admissions officer at Harvard
and an avid fan of admissions tests, one of
which had then been in use at Columbia for
some time. Chauncey proposed its use to
Harvard President Abbott Lawrence Lowell
who rejected it because it would not exclude
enough Jewish students. He preferred
quotas instead (bit.ly/3EQcjYG). Recently,
Harvard removed Lowell’s portrait from his
eponymous residence hall because of “his
racism, homophobia, anti-Semitism, and
xenophobia” (bit.ly/3KyAjkB).
Figure 1 displays scatterplots (and regression
lines) of 25th percentile ACT scores and six-
year graduation rates for (a) all students, (b)
Hispanic students, and (c) black (non-Hispanic)
students at 266 different US universities. As the
three graphs convey, regardless of ethnicity,
there is a positive association between college
graduation rate and admissions test scores.
For all students, for every one-point ACT
score increase, the graduation rate increases
by 3.2 percentage points. For both black and
Hispanic students, for every one-point ACT
score increase, the graduation rate increases
by 3.6 percentage points.

Figure 1: Scatterplots (with regression lines) of composite ACT score (25th percentile, x-axis) against six-year graduation rate (%, y-axis). (a) Colleges with higher ACT scores have higher graduation rates for their entire student bodies (graduation rate = 3.2 x ACT score - 6.7, r = 0.9). The same fundamental relationship is evident for (b) Hispanic students (graduation rate = 3.6 x ACT score - 20.9, r = 0.9) and (c) black (non-Hispanic) students (graduation rate = 3.6 x ACT score - 25.3, r = 0.8), although with somewhat different slopes. Source: 2019 data obtained for 266 R1 and R2 universities from the Integrated Postsecondary Education Data System.

Banning admissions tests is likely to
mean that, for the most selective colleges,
there will be no change in overall numbers
of admissions as they are already at full
capacity. If they admit more students than
usual who score lower on admissions tests,
then they may expect lower graduation
rates for those students. The less selective
colleges that have capacity for more students
will admit more students than before.
This increase in students will initially be
welcomed as it helps address colleges’
financial issues. However, similar to the most
selective colleges, if they admit students who
previously would not have been admitted
due to low admissions test scores, then they
may also expect lower graduation rates for
those students.
Even if college leaders are aware that
their relaxed testing criteria will put some
students at risk of failure, what changes will
they make to ensure that these students
return after year 1 (retention) and graduate
within 4 or 6 years? Will they greatly expand
academic support services for underprepared
students? Or will instructors be pressured
to lower the standards and rigour of their
courses to avoid failing more students than
before? The former plan requires additional
resources (e.g., money, faculty, space) that
most budgets sadly lack, a situation only
made worse by the pandemic.
The latter plan requires malfeasance and was
predicted several years ago.9 It is simply not
possible to increase enrolment by admitting
more underprepared students while
maintaining the same graduation rate and
academic standards.
Adopting a test-optional policy is certainly
not a new idea. One of the conclusions in
a report by the National Association for
College Admission Counseling in 2008 was to
encourage colleges to make the submission
of admissions test scores optional. In 1969,
Bowdoin College, a small, selective, liberal
arts college in Maine, adopted exactly that
policy for its applicants. Although the SAT was
optional for Bowdoin, it was still required by
other schools to which Bowdoin applicants
might also apply. Essentially all of the
students who were accepted and attended
Bowdoin took the SAT – 72% submitted their
scores and 28% did not. Not surprisingly,
overall SAT scores were about a standard
deviation lower for those who did not include
their scores in their application materials
than for those who did. The students who did
not submit their scores did predictably worse
as measured by first-year college GPA than
those who submitted scores. These are data
from only one small school, but they are hard
evidence of what colleges can expect if they
adopt a test-optional policy.10
A very different story involves an effort
by the University of Nebraska-Lincoln to
increase its national academic reputation
by raising admissions standards in the early
1990s. In 1990, the average ACT composite
score for entering freshmen was 22.5. The
four- and six-year graduation rates were 14%
and 47%, respectively. By 2005, the average
ACT score had increased to 24.9 and the
graduation rates were 32% and 67%. What
was the effect on enrolment? The numbers of
entering freshmen took an immediate hit and
did not surpass those from 1990 until 2006.
But, as of 2011, the numbers of entering
students had been sustained and average
ACT score was 25.3.
States have been pressuring public
colleges for years to increase both
accessibility (state institutions should truly
serve everyone in the state) and four-year
graduation rates (students should not have to
spend more than 4 years to get a degree – it
is too expensive). Unfortunately, the maths
simply does not work for colleges to serve
two masters (admit everyone and graduate
them quickly). The projection that the production
of high school graduates will be flat over the
next decade (bit.ly/3vkA3RB) will quickly
convert this dilemma into a trilemma (admit
more students, from fewer high school graduates,
and graduate them faster).
Getting rid of admissions tests does not fix
this problem. The only thing it guarantees
is the equivalent of lowering the average
admissions test score for those colleges which
admit more underprepared students. This will,
in turn, result in lowering the graduation rate.
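For a rough sense of scale, the all-students fit from Figure 1 can be plugged in directly. The sketch below treats that cross-college regression as if it applied to a single college’s policy change, which is an assumption made purely for illustration, not a causal estimate.

```python
# Sketch: implied scale, using the all-students fit from Figure 1
# (graduation rate = 3.2 x ACT score - 6.7). Treating this cross-college
# regression as a within-college causal effect is an assumption made
# purely for illustration.
def predicted_grad_rate(act_25th: float) -> float:
    """Predicted six-year graduation rate (%) from 25th-percentile ACT."""
    return 3.2 * act_25th - 6.7

# A hypothetical college whose 25th-percentile ACT slips from 25 to 23:
drop = predicted_grad_rate(25) - predicted_grad_rate(23)
print(f"predicted graduation-rate drop: {drop:.1f} percentage points")  # 6.4
```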
An evaluation plan
Recently, one prominent university – the
Massachusetts Institute of Technology –
indicated that the information obtained
from admissions tests is too valuable to be
elided, and it reinstated the requirement
for applicants to submit SAT or ACT scores
(bit.ly/3LtXGwY). Meanwhile, back at the
University of California, at a November 2021
meeting, it was announced that “UC will
continue to practice test-free admissions now
and into the future” (lat.ms/3KfpdAE).
How well will these students do compared
to previous students who were admitted
using test scores? How can we accurately
evaluate such a policy change? Here is what
we suggest. First, we will need some outcomes
we care about and which might plausibly be
affected by our policy change, such as:
■ graduation rates – four- and six-year
graduation rates for everyone and for
subgroups of interest;
■ retention rates – percentage of new
students who return after their first year;
■ total college enrolment;
■ total college revenue attributed to
enrolment;
■ average debt and loan defaults for students
6 years after admission, delineated by the
demographic characteristics of the students.
Once the outcomes have been chosen, we
could easily compare the numbers for years
prior to the policy change with those from later
years. But there is always a possibility that
changes are due to something else besides
the new policy (e.g., more or fewer graduating
high school seniors, global pandemic). The
gold standard for such data comparison is
a randomised experiment. There are two
different kinds of such experiments:
■ Within-college comparison. Continue to
require admissions tests but, during the
admission process, randomly assign half
the applicants to an experimental group
where the decision is made without using
test scores. For the control group, use
the scores in the admission decision as
before. Then compare both groups on
the dependent variables of interest to see
which one does better, and by how much.
■ Between-college comparison. This one will
take some doing and will need to be handled
by a state system with several colleges (e.g.,
the University of Texas). Randomly assign
half the colleges to use tests in admission
decisions and the other half to not use them.
Compare the two groups of colleges on the
outcomes of interest.
This suggestion would allow the easy
assessment of the value of any proposed
new policy.
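For concreteness, here is a minimal sketch of the within-college design; the file names, columns, and outcome variables are placeholders for whatever records a registrar actually keeps, not a description of any existing system.

```python
# Sketch: the within-college comparison described above. File names,
# columns, and outcomes are placeholders for real registrar records.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2022)

applicants = pd.read_csv("applicants.csv")  # assumed applicant pool

# Randomly assign each applicant to an arm: the experimental arm is
# judged without test scores, the control arm with them, as before.
applicants["arm"] = rng.choice(["no_test", "test"], size=len(applicants))

# ... each arm is then processed under its own admission rule ...

# Years later, merge in follow-up records and compare the arms on the
# outcomes of interest, e.g., six-year graduation and first-year retention.
outcomes = pd.read_csv("outcomes.csv")  # assumed follow-up data
merged = applicants.merge(outcomes, on="applicant_id")
print(merged.groupby("arm")[["graduated_6yr", "retained_yr1"]].mean())
```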
In the three thousand years since 42,000
Ephraimites were slaughtered on the banks
of the Jordan, much has changed. One
important lesson has been that before
adopting any strategy for making decisions
about people, we must rigorously measure
the effect of that strategy as compared to
other alternatives.
In a June 2020 tweet, President Donald
Trump said testing for Covid-19 “makes
us look bad”. At a campaign rally in Tulsa
the same month, he said he had asked his
“people” to “slow the testing down, please”.
Slowing down or getting rid of admissions
tests will not fix anything, any more than not
standing on a scale will help you lose weight.
Reduced testing, whether for Covid-19 or
for academic skills, is likely to make things
worse. Ignorance is not bliss.
Disclosure statement
The authors declare no conflicts of interest.

Daniel Robinson is associate dean of research, College of Education, The University of Texas at Arlington. Howard Wainer is a statistician and author, and a former principal research scientist at the Educational Testing Service. His latest book is A History of Data Visualization & Graphic Communication (with Michael Friendly).
References
1. Wainer, H. (2014) The route to the USMLE: The
Shibboleth of modern medical licensure. Journal of
Medical Regulation, 100(4), 21–28.
2. Sackett, P. R., Borneman, M. J. and Connelly, B. S.
(2008) High-stakes testing in higher education and
employment: Appraising the evidence for validity and
fairness. American Psychologist, 63(4), 215–227.
3. Berry, C. M. and Sackett, P. R. (2009) Individual
differences in course choice result in underestimation of
the validity of college admissions systems. Psychological
Science, 20(7), 822–830.
4. Reardon, S. and Galindo, C. (2009) The Hispanic-
White achievement gap in math and reading in the
elementary grades. American Educational Research
Journal, 46(3), 853–891.
5. Shores, K., Kim, H. E. and Still, M. (2020) Categorical
inequality in Black and White: Linking
disproportionality across multiple educational
outcomes. American Educational Research Journal, 57(5),
2089–2131.
6. Wainer, H. (2012) Waiting for Achilles. Chance, 25(4),
50–51.
7. Holland, P. W. (1986) Statistics and causal
inference. Journal of the American Statistical Association,
81, 945–970.
8. Holland, P. W. and Wainer, H. (1993) Differential Item
Functioning. Hillsdale, NJ: Lawrence Erlbaum Associates.
9. Mulvenon, S. W. and Robinson, D. H. (2014) The
paradox of increasing both enrollments and graduation
rates: Acknowledging elephants in the ivory tower.
International Journal of Higher Education, 3, 66–70.
10. Wainer, H. (2011) Uneducated Guesses: Using
Evidence to Uncover Misguided Education Policies.
Princeton, NJ: Princeton University Press.