UNDERSTANDING AND SUPPORTING STUDENTS WHO EXPERIENCE CULTURAL BIAS IN STANDARDIZED TESTS

Cori M. Bazemore-James, University of Georgia
Thitapa Shinaprayoon, University of Georgia
Jillian Martin, University of Georgia

Abstract

Cultural test bias is well-documented in standardized testing (Heath, 1989; Helms, 1992; Veale & Foreman, 1983). At the core of cultural test bias is Eurocentrism, which situates European values as the standard against which other cultures are compared (Helms, 1989). Thus, many tests have been standardized on predominantly Caucasian samples and give differential favoritism to Caucasian Americans and other groups who naturally use or can develop a similar style of thinking (Helms, 1992; Scarr, 1988). Standardized tests are typically required as admittance, midpoint, and exit exams in higher education, and decisions based on these tests unfairly stereotype minoritized students (Helms, 2006). We introduce differential item functioning (DIF), a statistical tool used to detect cultural bias in standardized tests (Camilli, 2013; Clauser & Mazor, 1998; Sackett, Schmitt, Ellingson, & Kabin, 2001; Santelices & Wilson, 2010). In this paper, we provide an overview of cultural bias, fairness, and methods for controlling these issues in standardized testing. Moreover, we provide implications for higher education institutions in recruiting, admitting, and supporting students who may experience cultural test bias.
Imagine the day you first took your college entrance exam, perhaps the SAT or ACT. Picture yourself as a teenager walking into the exam room and finding your seat with your number 2 pencils and scratch paper in hand. You are nervous, perhaps it is early in the morning so you are still waking up, and there is a chill in the room from the air conditioner. Your heart is pumping because you know the stakes are so high to get into the college of your dreams and you should have spent more time studying. Now imagine looking at all of the other test takers in the room and noticing that no one else looks like you. Your skin color does not match that of the exam proctors either. Now you are unsure if the proctor gave you that stern look because they too are tired or because they do not think you belong there. Suddenly, you feel out of place and not good enough, because somewhere in the back of your mind you believe that people of your skin color are not good at math and are lucky just to get into the local community college. "Time starts now," the proctor says as your hands tremble and you begin the first exam question.
Depending on the messages one grows up with about one's cultural group, one may experience high-stakes testing situations differently, resulting in varying levels of performance. In this paper, we discuss the high-stakes situations of taking standardized tests at multiple stages of a student's educational career. First, we introduce relevant concepts that factor into cultural bias in standardized testing. Next, we review statistical procedures that test developers have used to detect and minimize test bias. Finally, we conclude with a discussion of implications for student affairs practice.
CULTURAL BIAS IN STANDARDIZED TESTS
Standardized tests are those “in which the questions,
the scoring procedures, and the interpretation of
results are consistent and which are administered and
scored in a manner allowing comparisons to be made
across individuals and groups” (Benjamin, Miller,
Rhodes, Banta, Pike, & Davies, 2012, p. 7). They
are implemented throughout a student’s educational
career purportedly to assess generic knowledge and
skills, quality of faculty teaching, and student learning
(Benjamin et al., 2012; Brunn-Bevel & Byrd, 2015).
Admissions counselors, academic and/or testing support services, and employers then make predictions of students' future success based on a myriad of
standardized tests (e.g., IQ tests, statewide tests,
college and graduate program entrance and exit exams,
job aptitude tests, etc.; McMahon, 2015). Because the
predictions from these test scores have a massive impact
on students’ lives, test developers should minimize any
bias and measurement error in standardized tests. Test
evaluators determine if and where students are accepted
into college and graduate school, whether they must
take remedial courses, whether they can remain in or
complete current higher education programs, and if and
where they will be hired post-graduation.
While an ongoing debate ensues as to whether standardized tests are adequate for making these predictions, there is also the issue of varying subgroup (e.g., groups based on race/ethnicity, culture, language, gender, socioeconomic status) mean scores (AERA, APA, & NCME, 1999). In this paper, we focus on a particular discrepancy in test outcomes that occurs based on ethnic/racial group membership. For instance, African Americans tend to score one whole standard deviation below Caucasian Americans on standardized cognitive ability tests (Aiken, 2003; Onwuegbuzie & Daley, 2001; Rushton & Jensen, 2005). In fact, there is a historical rank order of performance on IQ tests based on racial group membership, in which Asian Americans score the highest at about 3 points above Caucasian Americans, who in turn score approximately 15 points above African Americans, while Latinos and Native Americans typically fall somewhere in between Caucasian and African Americans (Onwuegbuzie & Daley, 2001). This discrepancy between test scores is attributed to construct-irrelevant factors such as racial group membership (Beutler, Brown, Crothers, Booker, & Seabrook, 1996; Helms, Jernigan, & Mascher, 2005). This could also indicate that unfairness not only affects the individuals in the lower scoring group but also "that at least some individuals in the higher scoring group benefit from whatever unfairness potentially underlies the racial-group mean differences" (Helms, 2006).
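To illustrate the practical stakes of a gap of this size, the following sketch uses idealized normal score distributions and an invented cut score; the scale and all parameter values are illustrative assumptions, not empirical findings. It shows how a one-standard-deviation difference in group means translates into sharply different selection rates once a fixed cutoff is applied.

```python
# A minimal sketch of what a one-standard-deviation gap in mean scores
# implies once a fixed cut score is applied. The distributions are
# idealized normals and all numbers are invented for illustration.
from statistics import NormalDist

reference = NormalDist(mu=100, sigma=15)  # hypothetical IQ-style scale
focal = NormalDist(mu=85, sigma=15)       # mean shifted down one SD

cut = 110  # hypothetical admission cut score

share_ref_above = 1 - reference.cdf(cut)
share_focal_above = 1 - focal.cdf(cut)

print(f"Reference group above cut: {share_ref_above:.1%}")   # ~25.2%
print(f"Focal group above cut:     {share_focal_above:.1%}")  # ~4.8%

# A one-SD mean difference yields roughly a fivefold difference in
# selection rates at this cut, which is why construct-irrelevant
# sources of the gap carry such high stakes.
```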
The dierence in racial subgroup mean scores
mimics the intended outcomes of the original
standardized IQ tests, with exception to Asian
Americans. Such tests were invented in the 1910s to
demonstrate the superiority of rich, U.S.-born, White
men of northern European descent over non-Whites
and recent immigrants (Gersh, 1987). By developing
an exclusion-inclusion criteria that favored the
aforementioned groups, test developers created a
norm “intelligent” (Gersh, 1987, p. 166) population
“to dierentiate subjects of known superiority from
subjects of known inferiority” (Terman, 1922, p. 656).
While such blatant racism is less common today, a problematic outcome persists: Eurocentrism. Eurocentrism is "a perceptual set in which European and/or European American values, customs, traditions and characteristics are used as exclusive standards against which people and events in the world are evaluated and perceived" (Helms, 1989, p. 643). Although psychometricians are aware of possible cultural biases, tests overwhelmingly favor a Eurocentric cultural way of thinking. White psychometricians, who have been interpersonally and professionally socialized in Eurocentric environments, have created the cognitive ability tests and standardized them on similarly socialized White samples (Helms, 1992). Therefore, such tests give differential favoritism to Whites and other groups who naturally use or can develop a similar style of thinking (Scarr, 1988). This advantage occurs not because Whites are more intelligent than other groups, but because they created an artificial inflation in their favor and a devaluing of the intelligence of other culturally based ideologies (Helms, 1992, 2006; Prilleltensky, 1989). A rigid adherence to this practice in testing "adversely impacts the groups for whom the norm is foreign… [and] it potentially deprives society of the kinds of diversity in intellectual functioning that might lead to a better society" (Helms, 1992, p. 1091).
Although psychometricians have procedures to minimize test bias, these procedures may not screen out biased items if test developers use samples that do not reflect culturally diverse groups. For example, suppose test developers use a mostly Western-culture sample to norm a measure of job performance, removing items that are biased against the majority of test takers in that sample. While biased items have been removed, how applicable could this test be for measuring job performance in Eastern cultures? Items that contain cultural knowledge or norms may not be generalizable across cultures.
Therefore, careful review of test fairness is of great importance. Definitions of test fairness generally include giving all test takers, no matter their group memberships, the exact same test and testing procedure, except in cases of physical or learning disabilities, in which the test taker can be given reasonable accommodations to ensure equity in the testing process (AERA et al., 1999; Camilli, 2013; Kane, 2010). Testing conditions and content should also be free of stereotyping, culturally offensive material, and other negative implications to ensure that a test measures what it is intended to measure (i.e., content validity) across different racial groups (Camilli, 2013). Thus, Camilli (2013) called for the use of a sensitivity review of new testing programs to avoid statistical bias and faulty interpretations of test scores arising from accidental cultural insensitivity. He also suggested that tests should include multiple types of measurement to ensure fairness. In the case of classroom assessment, he proposed that testers consider "the strength of the link between assessment and instruction, opportunity to learn, sensitivity of assessment procedures to cultural and religious differences, and the use of multiple measures" (p. 116).
Regardless of attempts to minimize test bias, item bias often takes place. Item bias occurs when some unintended characteristic of a test item gives an unfair advantage to one subgroup of examinees over another (Clauser & Mazor, 1998). Culturally biased test items "have characteristics that are unrelated to the achievement construct being measured but are sensitive for particular cultural groups and affect their performance" (Banks, 2006, p. 115). For example, this occurs when members of racial subgroups interpret response options on a multiple-choice test in different ways than anticipated by the test developers (Heath, 1989; Veale & Foreman, 1983). An investigation into the fairness of a test would serve the purpose of "sort[ing] out whether the reasons for group differences are due to factors beyond the scope of the test (such as opportunity to learn or level of achievement) or artificially dependent on testing procedures" (Camilli, 2013, p. 108). In the next section, we review a common statistical approach to evaluating test fairness.
METHODS AND STATISTICAL ANALYSES FOR TEST FAIRNESS
To study cultural fairness in standardized testing, item response theory (IRT) allows researchers to examine whether test items measure the underlying trait, or true performance (Camilli, 2013; Penfield & Camilli, 2007). Within IRT, differential item functioning (DIF) analysis compares test takers who are matched on the same underlying trait or ability. If test scores differ because of extraneous variables (e.g., race, culture, socioeconomic status) rather than the ability the test was intended to measure, the test is biased. DIF is often used to evaluate test fairness and can reveal a significant difference in success rates (i.e., the probability that members of different groups answer an item correctly) between subgroups (Camilli, 2013; Clauser & Mazor, 1998).
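The logic of DIF can be made concrete with a small sketch. Under a two-parameter logistic (2PL) IRT model, an item exhibits DIF when two examinees with identical ability have different probabilities of answering it correctly. The parameter values below are hypothetical, chosen only to display the pattern; this is a conceptual sketch, not any operational screening procedure.

```python
# A conceptual sketch of DIF under a two-parameter logistic (2PL)
# IRT model. All parameter values are hypothetical.
import math

def p_correct(theta, a, b):
    """2PL item response function: probability that an examinee with
    ability theta answers correctly, given discrimination a and
    difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

theta = 0.0  # two examinees with identical underlying ability

# For the reference group the item has difficulty b = 0.0 ...
p_reference = p_correct(theta, a=1.2, b=0.0)

# ... but for the focal group the same item behaves as if harder
# (b = 0.5), e.g., because it presupposes culture-specific knowledge.
p_focal = p_correct(theta, a=1.2, b=0.5)

print(f"P(correct | reference) = {p_reference:.3f}")  # ~0.500
print(f"P(correct | focal)     = {p_focal:.3f}")      # ~0.354

# Equal ability but unequal success probabilities on the same item is
# exactly the pattern DIF analyses are designed to flag.
```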
Although DIF can detect cultural bias, it does not necessarily indicate a biased test. DIF analysis also allows test developers to identify bias arising from the ordering of test items (e.g., Çokluk, Gül, & Doğan-Gül, 2016) or from item difficulty. It is often beneficial to use difficult items to differentiate people who understand the material at a deeper level from those who do not. In other words, DIF can occur when different success rates simply indicate that people who deeply understand the material answer difficult items correctly while people who understand it only superficially answer those items incorrectly. However, items are biased if people with the same ability, but from different racial or cultural backgrounds, answer test items at different success rates, because this may indicate that the test does not measure the true ability or performance. Thus, it is necessary for a diverse group of experts to review items flagged with DIF to ensure the cultural fairness of the test (Camilli, 2013; Clauser & Mazor, 1998; Huang & Han, 2012; Penfield & Camilli, 2007; Perrone, 2006).
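One widely used screen in this literature is the Mantel-Haenszel procedure, which compares the odds of answering an item correctly for two groups after stratifying examinees by total test score (the ability proxy). The sketch below, with invented counts, illustrates the basic computation along with the ETS delta rescaling commonly reported with it; thresholds for labeling DIF as large vary by testing program, so a flag is best treated as a prompt for the expert review described above.

```python
# A minimal sketch of the Mantel-Haenszel (MH) DIF screen. The counts
# are invented for illustration; they come from no real test.
import math

# Each stratum holds examinees with (roughly) the same total score.
# Tuple layout: (ref_correct, ref_wrong, focal_correct, focal_wrong)
strata = [
    (90, 30, 60, 40),    # low scorers
    (120, 20, 70, 30),   # middle scorers
    (150, 10, 80, 15),   # high scorers
]

numerator = 0.0    # sum over strata of ref_correct * focal_wrong / n
denominator = 0.0  # sum over strata of ref_wrong * focal_correct / n
for ref_c, ref_w, foc_c, foc_w in strata:
    n = ref_c + ref_w + foc_c + foc_w
    numerator += ref_c * foc_w / n
    denominator += ref_w * foc_c / n

alpha_mh = numerator / denominator     # common odds ratio; 1.0 = no DIF
mh_d_dif = -2.35 * math.log(alpha_mh)  # ETS delta metric; the sign shows
                                       # which group the item favors

print(f"MH common odds ratio: {alpha_mh:.2f}")
print(f"MH D-DIF:             {mh_d_dif:.2f}")
```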
Regardless of expert reviews in test development, various tests still show cultural bias. For example, the Beck Depression Inventory (BDI) was created to screen patients for depression regardless of culture, language, and gender. However, the BDI elicited different responses when it was translated from English to Spanish: Spanish speakers tended to agree or disagree more strongly with some items than English speakers because of cultural differences (Kerr & Kerr Jr., 2001). A translation can cause test bias by changing the meaning of words or phrases (Huang & Han, 2012).
The SAT also contains culturally biased questions in the verbal section, particularly in the sentence completion and reading comprehension sections. Empirical studies have shown that African American and Latino examinees performed better than Caucasian examinees on sentence completion and reading comprehension questions whose content related to their minoritized cultures (Carlton & Harris, 1992; Schmitt & Dorans, 1990). Other research has also shown that African American examinees generally performed worse on difficult verbal and math questions, but better on easy verbal and math questions, in comparison to Caucasian examinees. This difference in performance may be due to differing interpretations of words based on culture and socioeconomic status (Freedle, 2003). On the contrary, some researchers argued that this cultural bias was found on SAT questions used before the ETS implemented its DIF screening procedure to detect biased SAT questions, and that the bias was found with a weaker DIF method (Dorans, 2004; Dorans & Zeller, 2004).
In response to these criticisms, however, a more recent study demonstrated that some cultural bias persisted even after the use of a stronger DIF statistical method and of SAT questions that had been screened by the ETS's DIF procedure (Santelices & Wilson, 2010). Latino and Caucasian test takers performed similarly on the verbal and math tests, and African American and Caucasian test takers performed similarly on the math test. However, African Americans performed better on hard verbal questions, but worse on easy verbal questions, in comparison to Caucasians (Santelices & Wilson, 2010). Similar effects of DIF on hard and easy questions are not limited to the SAT and GRE, but have also appeared on other tests, such as the Civic Education Study (CES), which assesses basic knowledge about the U.S. government and Constitution (Scherbaum & Goldstein, 2008). In response to varying performance among subgroups on standardized tests, test developers can utilize strategies to reduce subgroup differences without compromising test validity. The next section discusses these strategies and their implications.
DISCUSSION
Cultural bias in standardized testing is an important consideration for access and equity in higher education. Because these tests effectively determine students' educational pathways, they affect students' social mobility and individual agency. By continuing to use testing that is culturally biased, institutions perpetuate inequity in education, as these tests remain a barrier to students accessing higher education. We argue that higher education administrators have a moral obligation to adjust their dependence on testing in admissions and placement decisions as part of their efforts to ensure inclusive excellence (Williams, Berger, & McClendon, 2005). Student affairs practitioners, whose expertise centers on students in higher education and whose core competencies include holistic development, student learning, and social justice, are uniquely situated to advocate on these issues (ACPA, 1994; ACPA & NASPA, 2015).
Additionally, the Council for the Advancement of Standards in Higher Education (CAS) is "the pre-eminent force for promoting standards in student affairs, student services, and student development programs" (About CAS, 2016). CAS is made up of member associations from across higher education that, through standards of practice for 45 functional areas, work to ensure student development and learning, supported at many institutions by divisions of student affairs. In 2014, CAS admitted the National College Testing Association (NCTA) and the National College Learning Center Association (NCLCA) as member associations and included standards of practice for practitioners working in these areas in its ninth edition (CAS Staff, 2014; CAS, 2015). CAS recognizes testing and academic support services as critical to institutional operations in higher education. As such, higher education professionals should be aware of cultural bias in standardized testing and implement policies and services for affected students. To this aim, we discuss several implications for practice in higher education and student affairs for understanding and addressing the effects of cultural bias in standardized testing.
Implications for Student Affairs Practice
Central to each of these implications for practice is the acknowledgement that cultural bias is present in standardized testing and creates vulnerability for racially minoritized populations (Stewart & Haynes, 2016; Steele & Aronson, 1995). This vulnerability arises when an individual's "performance suffers when the situation redirects attention needed to perform a task onto some other concern -- in the case of stereotype threat, a concern with the significance of one's performance in light of a devaluing stereotype" (Steele & Aronson, 1995, p. 798). As a result, some racially minoritized populations are disadvantaged by standardized testing relative to their Caucasian counterparts in the exact same setting, even before test administration (Helms, 2006; Steele & Aronson, 1995).
For higher education, three important implications for practice should be considered: professional learning opportunities for administrators regarding cultural bias; consideration of weights and controls for cultural bias in admissions and placement decisions; and implementation of programming to address stereotype threat and provide additional support for populations affected by cultural bias.
Professional learning opportunities about cultural bias. Institutions should provide professional development and learning opportunities for faculty, staff, and administrators that educate them about the effects of cultural bias in standardized testing. These professional learning opportunities should focus on the history of standardized testing, controversies over its use in college admissions and post-graduation decisions, the negative effect of cultural bias in standardized testing on minority students, and how institutions can consider alternatives to standardized testing to control for cultural bias. Further, these opportunities encourage collaboration among divisions of student affairs, offices of human resources, and centers for teaching and learning to create institution-specific solutions in place of standardized testing in admissions decisions.
Admissions and placement decisions: Weights and controls. Some higher education institutions have begun to place less weight on standardized testing in their admissions decisions (e.g., Hampshire College, New York University; Sanchez, 2015). As discussed above, differential item functioning (DIF) is a statistical tool that enhances the understanding of test fairness and bias. Further, Stewart and Haynes (2016) encouraged collaborative efforts among students, teachers, and administrators in primary and secondary schools focused on critical multicultural education. This focus creates a holistic view of education that can be carried into higher education and used in consideration of admissions decisions.
While controversial, race-conscious admissions policies are attempts by universities to ameliorate the effects of historical oppression and underrepresentation of racially minoritized populations in higher education (Stulberg & Chen, 2014). This has led to several court cases regarding the extent to which race can be considered as part of admissions decisions (Grutter v. Bollinger, 2003; Fisher v. University of Texas at Austin, 2016). Similarly, higher education administrators, particularly those who develop admissions policies, should employ weights and controls in their consideration of standardized testing in admissions decisions. These weights and controls are particularly important for populations minoritized by both race and class. By de-emphasizing standardized testing in higher education admissions decisions, institutions can send a message to test developers about the need to reduce cultural bias in standardized tests and create more equitable testing procedures.
Programming and support services. In addition to the above measures, education professionals should ensure that campus-wide programming and support services are available to students who may be adversely affected by cultural bias in standardized testing. By implementing programming throughout the educational pipeline, institutions better enable students to cope with the negative effects of the stereotype threat and cultural bias they experience. In addition to programming, support services should be part of a student's entire educational career.
Implications for the Workplace
Similar strategies from student affairs practice to reduce cultural bias are also applicable to the workplace context. For instance, instead of relying only on standardized testing, employers can include other measures, such as personality tests and assessments of interpersonal skills, motivation, and experience, that may be good indicators of true performance. Moreover, coaching and training programs (e.g., tutoring, mock interviews, workshops) can increase job seekers' familiarity with job-seeking procedures (Sackett, Schmitt, Ellingson, & Kabin, 2001).
CONCLUSION
Cultural bias is an ongoing concern in standardized
tests, which students must encounter throughout their
educational careers. Perhaps the ultimate goal is to find alternative ways to assess student abilities and future performance. In the meantime, as discussed in this paper, there are many ways in which student affairs
professionals and institutions of higher education can
work to alleviate the problem and support students to
create more equitable opportunities.
REFERENCES
Aiken, L. (2003). Psychological testing and assessment (11th ed.). Boston: Allyn & Bacon.
American Educational Research Association, American Psychological Association, & National Council on
Measurement in Education. (AERA, APA, & NCME, 1999). Standards for educational and psychological
testing (2nd ed.). Washington, DC: American Educational Research Association.
American College Personnel Association. (1994). The student learning imperative: Implications for student affairs. Retrieved from http://www.acpa.nche.edu/sli/sli.htm
American College Personnel Association & National Association of Student Personnel Administrators. (2015). Professional competency areas for student affairs educators. Retrieved from http://www.naspa.org/images/uploads/main/ACPA_NASPA_Professional_Competencies_FINAL.pdf
Banks, K. (2006). A comprehensive framework for evaluating hypotheses about cultural bias in educational
testing. Applied Measurement in Education, 19(2), 115–132.
Benjamin, R., Miller, M. A., Rhodes, T. L., Banta, T. W., Pike, G. R., & Davies, G. (2012, September). The seven red herrings about standardized assessments in higher education (NILOA Occasional Paper No. 15). Urbana, IL: University of Illinois and Indiana University, National Institute for Learning Outcomes Assessment. Retrieved from http://learningoutcomesassessment.org/documents/HerringPaperFINAL.pdf
Beutler, L. E., Brown, M. T., Crothers, L., Booker, K., & Seabrook, M. K. (1996). The dilemma of fictitious demographic distinctions in psychological research. Journal of Consulting and Clinical Psychology, 64, 892–902.
Brunn-Bevel, R. J., & Byrd, W. C. (2015). The foundation of racial disparities in the standardized testing era.
Humanity & Society, 39(4), 419-448. doi:10.1177/0160597615603750
Camilli, G. (2013). Ongoing issues in test fairness. Educational Research and Evaluation, 19(2-3), 104–120. doi:10.1080/13803611.2013.767602
Carlton, S., & Harris, A. (1992). Characteristics associated with differential item functioning on the Scholastic Aptitude Test: Gender and majority/minority group comparisons (No. RR-92-64). Princeton, NJ: Educational Testing Service. Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/j.2333-8504.1992.tb01495.x/pdf
CAS Sta. (2014, November 25). CAS welcomes two new member organizations: National College Testing
Association and National College Learning Center Association. Retrieved from http://www.cas.edu/blog_
home.asp?display=35
Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues & Practice, 17(1), 31–44.
Çokluk, Ö., Gül, E., & Doğan-Gül, Ç. (2016). Examining differential item functions of different item ordered test forms according to item difficulty levels. Educational Sciences: Theory & Practice, 16(1), 319–330.
Council for the Advancement of Standards in Higher Education. (2015). CAS professional standards for higher
education (9th Ed.). Washington, DC: Council for the Advancement of Standards in Higher Education.
Council for the Advancement of Standards in Higher Education. (2016). About CAS. Retrieved from http://www.
cas.edu/
Dorans, N. J. (2004). Freedle's table 2: Fact or fiction. Harvard Educational Review, 74(1), 62–72.
Dorans, N. J., & Zeller, K. (2004). Examining Freedle’s claims about bias and his proposed solution: Dated data,
inappropriate measurement, and incorrect and unfair scoring (No. RR-04-26). Princeton, NJ: Educational
Testing Service. Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/j.2333-8504.2004.tb01953.x/pdf
Fisher v. University of Texas, No. 14-981 (U.S. June 23, 2016).
Freedle, R. O. (2003). Correcting the SAT’s ethnic and social-class bias: A method for reestimating SAT scores.
Harvard Educational Review, 73(1), 1–43.
Gersh, D. (1987). The corporate elite and the introduction of IQ testing in American public schools. In M. Schwartz
(Ed.) The Structure of power in America: The corporate elite as a ruling class (pp. 163-184). New York, NY:
Holmes & Meier.
Grutter v. Bollinger, 539 U.S. 306, 123 S. Ct. 2325, 156 L. Ed. 2d 304 (2003).
Heath, S. B. (1989). Oral and literate traditions among Black Americans living in poverty. American Psychologist,
44, 367-373.
Helms, J. E. (1989). Eurocentrism strikes in strange places and in unusual ways. The Counseling Psychologist,
17, 643-647.
Helms, J. E. (1992). Why is there no study of cultural equivalence in standardized cognitive ability testing? American Psychologist, 47(9), 1083–1101. doi:10.1037//0003-066X.47.9.1083
Helms, J. E. (2006). Fairness is not validity or cultural bias in racial-group assessment: A quantitative perspective. The American Psychologist, 61(8), 845–859. doi:10.1037/0003-066X.61.8.845
Helms, J. E., Jernigan, M., & Mascher, J. (2005). The meaning of race in psychology and how to change it: A
methodological perspective. American Psychologist, 60, 27–36.
Huang, J., & Han, T. (2012). Revisiting differential item functioning: Implications for fairness investigation. International Journal of Education, 4(2), 74–86. doi:10.5296/ije.v4i2.1654
Kane, M. (2010). Validity and fairness. Language Testing, 27, 177–182.
Kerr, L. K., & Kerr Jr., L. D. (2001). Screening tools for depression in primary care: The effects of culture, gender, and somatic symptoms on the detection of depression. Western Journal of Medicine, 175(5), 349–352.
Peneld, R. D., & Camilli, G. (2007). Dierential item functioning and item bias. In C. R. Rao & S. Sinharay
(Eds.), Handbook of Statistics (Vol. 26, pp. 125–167). Elsevier Science & Technology. doi:10.1016/S0169-
7161(06)26005-X
Perrone, M. (2006). Dierential item functioning and item bias: Critical considerations in test fairness. TESOL &
Applied Linguistics (Vol. 6). Retrieved from http://www.tc.columbia.edu/tesolalwebjournal
McMahon, M. (2015). Psychometrics. Research Starters: Education (Online Edition).
Onwuegbuzie, a. J., & Daley, C. E. (2001). Racial dierences in IQ revisited: A synthesis of nearly a century of
research. Journal of Black Psychology, 27(2), 209–220. doi:10.1177/0095798401027002004
Prilleltensky, I. (1989). Psychology and the status quo. American Psychologist, 44, 795-802.
Rushton, J. P., & Jensen, A. R. (2005). Thirty years of research on race differences in cognitive ability. Psychology,
Public Policy, and Law, 11(2), 235–294.
Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. B. (2001). High-stakes testing in employment, credentialing, and higher education: Prospects in a post-affirmative-action world. The American Psychologist, 56(4), 302–318.
Sanchez, C. (2015, July 28). Is this the beginning of the end for the SAT and ACT? Retrieved from http://www.npr.org/sections/ed/2015/07/28/427110042/is-this-the-beginning-of-the-end-for-the-sat-and-act
Santelices, M. V., & Wilson, M. (2010). Unfair treatment? The case of Freedle, the SAT, and the standardization approach to differential item functioning. Harvard Educational Review, 80(1), 106–133. Retrieved from https://bearcenter.berkeley.edu/sites/default/files/Wilson #22.pdf
Scarr, S. (1988). Race and gender as psychological variables. American Psychologist, 43, 56-59.
Scherbaum, C. A., & Goldstein, H. W. (2008). Examining the relationship between race-based differential item functioning and item difficulty. Educational and Psychological Measurement, 68(4), 537–553. doi:10.1177/0013164407310129
Schmitt, A. P., & Dorans, N. J. (1990). Differential item functioning for minority examinees on the SAT. Journal of Educational Measurement, 27(1), 67–81. Retrieved from http://www.jstor.org/stable/1434768
Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69(5), 797–811.
Stewart, S., & Haynes, C. (2016). An alternative approach to standardized testing: A model that promotes racial equity and college access. Journal of Critical Scholarship on Higher Education and Student Affairs, 2(1), 122–136.
Stulberg, L. M., & Chen, A. S. (2014). The origins of race-conscious affirmative action in undergraduate admissions: A comparative analysis of institutional change in higher education. Sociology of Education, 87(1), 36–52.
Terman, L. M. (1922). Were we born that way? World’s Work, 44, 655-660.
Veale, J. R., & Foreman, D. I. (1983). Assessing cultural bias using foil response data: Cultural variation. Journal
of Educational Measurement, 20, 249–258.
Williams, D. A., Berger, J. B., & McClendon, S. A. (2005). Toward a model of inclusive excellence and change in
postsecondary institutions. Washington, DC: Association of American Colleges and Universities.
AUTHOR BIOGRAPHIES
Cori M. Bazemore-James (M.S., University of Georgia, 2015) is a doctoral student in the College Student
Aairs Administration program at the University of Georgia and a Southern Regional Education Board Fellow.
Cori hails from the Seneca Nation of Indians and her experience and research interests focus on retention of
Native American college students and Native student identity. Her current research is a qualitative study of
Native American student affairs professionals in Native student support services.
Thitapa Shinaprayoon (M.S., University of Georgia, 2015) is a doctoral candidate at the University of
Georgia. She is interested in the influence of perception and cognitive biases on choice preferences and
behaviors in basic decision making. Her research interests are applicable to decision making under risky
circumstances and consumer behavior.
Jillian A. Martin (M.Ed., University of Georgia, 2009) is a doctoral candidate in the College Student Affairs Administration program at the University of Georgia. Her research interests include student affairs practitioner socialization, professional learning, use of scholarship in practice, and student athlete transition issues. Her dissertation research is a qualitative case study of student affairs practice at an institution of higher education in Ghana.