Brief/Psychometric Reports

Development and Psychometric Report of a Middle-School Mathematics Vocabulary Measure

Elizabeth M. Hughes, PhD (The Pennsylvania State University, University Park, USA), Sarah R. Powell, PhD (University of Texas at Austin, USA), and Joo-Young Lee, MA (The Pennsylvania State University, University Park, USA)

Assessment for Effective Intervention, 1–9. © Hammill Institute on Disabilities 2018. DOI: 10.1177/1534508418820116. aei.sagepub.com

Corresponding Author: Elizabeth M. Hughes, The Pennsylvania State University, 212a Cedar Building, University Park, PA 16801, USA. Email: Emh71@psu.edu

Abstract

Proficiency with mathematics requires an understanding of mathematical language. Students are required to make sense of both spoken and written mathematical terms. An essential component of mathematical language involves the understanding of the vocabulary of mathematics, in which students connect vocabulary terms to mathematical concepts or procedures. In this brief psychometric report, we developed and tested a measure of mathematics vocabulary for students in the late middle-school grades (i.e., Grades 7 and 8) to determine the reliability of such a measure and to learn how students answer questions about mathematics vocabulary terms. The vocabulary terms on the measure were those terms determined as essential by middle-school teachers for success with middle-school mathematical language. Analysis indicates the measure demonstrated high reliability and validity. Student scores were widely distributed, and students, on average, answered only two-thirds of vocabulary terms correctly.

Keywords: mathematics, language, vocabulary, middle school
Mathematics competency requires students to know the
numbers and symbols of mathematics for solving many
types of problems. In addition, it is important for students to
know the words that describe mathematics (Bruner, 1966)
to understand what teachers and peers say and what text-
books and resources mean. Mathematical proficiency,
therefore, requires a knowledge and understanding of math-
ematical language (Schleppegrell, 2007), which is con-
nected to conceptual understanding of content knowledge
and skills (Capraro & Joffrion, 2006). One critical component of mathematical language is vocabulary: students must interpret terms related to mathematical concepts or procedures, as in "a quadrilateral has four sides" or "the point where the x-axis and y-axis intersect is the origin."
In a recent study with fifth graders, Forsyth and Powell
(2017) determined that students with the lowest mathemat-
ics computation scores demonstrated the lowest perfor-
mance on a measure of late elementary mathematics
vocabulary. These fifth-grade students also struggled with
defining mathematics terms introduced in grade levels
before fifth grade (e.g., first- and second-grade terms).
Without an understanding of these mathematics-specific
vocabulary terms (e.g., quadrilateral, four, sides, point,
x-axis, y-axis, intersect, origin), it might be difficult for stu-
dents to engage in rich mathematical discussions, compre-
hend written text, understand spoken language, or perform
adequately on assessments. Mathematics vocabulary may
be especially difficult because students are expected to
learn hundreds of different terms in the elementary grades
(Powell, Driver, Roberts, & Fall, 2017). Mathematics vocabulary may also be challenging because many terms carry both an everyday meaning and a specific mathematical meaning (e.g., a contributing factor vs. a factor in multiplication), and some terms carry two distinct meanings within mathematics (e.g., square as a shape and square as an operation on a number; Rubenstein & Thompson, 2002).
Measures Assessing Mathematics Vocabulary
General mathematics measures, such as the Test of
Mathematical Abilities (TOMA-3; Brown, Cronin, &
Bryant, 2012), the Test of Early Mathematics Ability
(TEMA-3; Ginsburg & Baroody, 2003), or subtests of the
Woodcock-Johnson IV Tests of Achievement (Schrank,
Mather, & McGrew, 2014), survey student performance
across mathematical domains and skills. These assessments
provide critical information regarding students’ strengths
and weaknesses, but they do not offer concen-
trated performance information on targeted language skills
(e.g., mathematics vocabulary) directly connected to the cur-
riculum. To gauge student command of mathematical lan-
guage, educators could rely on observations within the
classroom, and it may also be helpful to assess mathematical
language through targeted assessments.
To understand whether students have difficulty with
mathematics vocabulary, which may influence mathemati-
cal language, researchers have developed several mathe-
matics vocabulary measures across preschool and the
elementary grades. For use with preschool students, Purpura
and Logan (2015) developed a 16-item measure of mathe-
matics language. This measure focused on vocabulary
related to comparison (e.g., more, take away) and spatial
reasoning (e.g., near, far). The authors presented correla-
tions greater than .60 between the mathematics language
measure and measures of general vocabulary and mathe-
matics content. Internal consistency of the measure was
strong (α = .85). For use in first grade, Powell and Nelson
(2017) created a 64-item measure of mathematics vocabu-
lary featuring common mathematics vocabulary terms from
the early elementary grades (e.g., add, cone, equal shares,
outside, rhombus, take away). Cronbach’s alpha was .85
and mathematics vocabulary scores shared strong correla-
tions with general vocabulary (r = .70) and mathematics
fluency (r = .59) scores. Moving beyond first grade, Powell
et al. (2017) designed a measure of mathematics vocabulary
for the late elementary grades. This measure, featuring 133
mathematics vocabulary terms, showed strong reliability at
third grade (α = .92) and fifth grade (α = .96). Similar to
first grade, scores of the measure of mathematics vocabu-
lary were correlated with general vocabulary and mathe-
matics computation scores with all correlations greater than
.60. In fifth grade, students with learning difficulties per-
formed significantly lower on the measure of mathematics
vocabulary, which indicated how lower performance on
measures of mathematics vocabulary may be related to
lower overall mathematics performance (Forsyth & Powell,
2017). A well-developed vocabulary measure may help
educators make sense of students’ mathematics vocabulary
knowledge, which could lead to targeting the mathematical
language needs of students in mathematics classes.
Purpose and Research Questions
Previous efforts support the use of specific mathematics
vocabulary assessments in preschool and the elementary
grades to inform educators and researchers about students’
mathematical language knowledge. This study extends the
existing body of mathematics vocabulary assessment
research to middle grades. In this study, we surveyed the
extensive list of mathematics vocabulary in middle-school
textbooks to develop a measure of mathematics vocabulary
for use in the late middle-school grades. Our research ques-
tion was the following: What is the reliability and validity
of a middle-school mathematics vocabulary measure?
Method
Identification of Potential Vocabulary Terms
To create a measure of middle-school mathematics vocabu-
lary, we followed assessment development steps similar to those of Powell and Nelson (2017) and Powell et al. (2017). We designed the measure with the intent that it be useful across the middle-school grades and curricula. To understand which
mathematics vocabulary terms could be included on an
assessment for middle schoolers, we first identified over
450 distinct mathematics vocabulary terms in seventh-
grade textbook glossaries; the number was slightly over 500
in eighth-grade glossaries. We used glossaries from two
major textbooks (i.e., GoMath! by Houghton Mifflin and
Texas Mathematics by McGraw Hill). We compiled the sev-
enth- and eighth-grade terms into one database, with a total
of 599 terms once duplicate terms were removed.
Our next step was to reasonably reduce the list of vocab-
ulary terms by evaluating the likelihood of students encoun-
tering each term in a mathematics class. To do this, we cross-referenced the terms with the Common Core State Standards (National
Governors Association Center for Best Practices & Council
of Chief State School Officers, 2010) for the two grade lev-
els. We compiled the data, considering how often the term
appeared across both textbooks, both grade levels, and the
Common Core standards, to create a list of 213 middle-
school mathematics vocabulary terms emphasized in sev-
enth and eighth grade.
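The general logic of this winnowing step can be sketched in Python as follows. This is only an illustration of the procedure described above, not the study's actual code; the short term lists are hypothetical stand-ins for the full glossaries and standards.

    # Illustrative sketch of the term-reduction logic (not the study's code).
    # Glossary and standards term lists are hypothetical stand-ins.
    seventh_glossary = {"absolute value", "rate", "circumference", "proportion"}
    eighth_glossary = {"absolute value", "dilation", "irrational numbers", "rate"}
    ccss_terms = {"absolute value", "rate", "dilation", "proportion"}

    # Compile one database of unique terms (the study arrived at 599 terms).
    all_terms = seventh_glossary | eighth_glossary

    # Score each term by how many sources emphasize it; terms appearing across
    # both grade levels and the standards rise to the top of the list.
    sources = [seventh_glossary, eighth_glossary, ccss_terms]
    scores = {term: sum(term in s for s in sources) for term in all_terms}
    for term, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{term}: emphasized in {score} of {len(sources)} sources")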
We invited classroom teachers to have a voice in the design of the measure early in the process. We electronically shared the list of 213 mathematics terms with teachers of middle-school and high-school students, using convenience sampling: we shared the survey link with teachers with whom we had previously worked and asked them to share it with colleagues. The survey direc-
tions asked responders to identify the 50 words they thought
were most important for students to learn to be successful in
mathematics. Responders selected the most important
words without pressure to rank or order the importance of
the vocabulary terms. Overall, 68 teachers from at least
seven states, including the three states where the pilot study
took place, responded to the survey. Survey responses were
coded by the frequency of unique selections as a most
important term. For example, the term expression was iden-
tified by 53 survey responders and given a score of 53.
Words were then ordered from greatest score to least score.
The first two authors evaluated the list and created a revised
list of 69 terms. See Table 1 for specific information regard-
ing presence of terms in grade-level glossaries.
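The survey-coding step reduces to a frequency tally, sketched below in Python. The three hypothetical responses stand in for the 68 actual survey responses; this is an illustration of the scoring rule, not the study's code.

    # Illustrative sketch of the survey coding: a term's score is the number
    # of unique responders selecting it (e.g., expression was selected by 53).
    from collections import Counter

    responses = [  # hypothetical responses; each is one teacher's selections
        {"expression", "equation", "variable"},
        {"expression", "ratio", "variable"},
        {"expression", "equation", "integer"},
    ]
    votes = Counter(term for response in responses for term in response)

    # Order words from greatest score to least score.
    for term, score in votes.most_common():
        print(f"{term}: selected by {score} responders")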
Table 1. Description of Mathematics Vocabulary Items.

Columns (in order): Term; Item #; Answer Choice (whether the definition or the term served as the answer choice); % Correct (7th, 8th, Combined); Rasch Difficulty (7th, 8th, Combined); MNSQ (Infit, Outfit).
Absolute value 9 Definition 72.78 52.29 66.40 −0.36 1.08 0.11 1.18 1.19
Additive inverse 35 Term 47.04 48.37 47.45 1.00 1.27 1.07 0.98 1.00
Angle 12 Definition 61.83 73.20 65.38 0.25 −0.03 0.16 1.07 1.02
Area 52 Term 56.51 69.93 60.69 0.53 0.17 0.41 0.92 0.88
Circle 53 Definition 73.96 83.66 76.99 −0.43 −0.77 −0.53 1.00 0.96
Circumference 21 Term 70.71 87.58 75.97 −0.24 −1.14 −0.46 0.95 0.88
Coefficient 30 Definition 12.13 6.54 10.39 3.24 4.17 3.49 1.30 2.70
Common factor 54 Definition 41.72 45.75 42.97 1.27 1.40 1.30 1.10 1.19
Common multiple 56 Definition 59.17 59.48 59.27 0.39 0.73 0.48 1.01 0.98
Complex fraction 55 Term 55.92 54.90 55.60 0.56 0.95 0.67 0.92 0.90
Congruent 36 Definition 47.93 71.24 55.19 0.96 0.09 0.69 1.07 1.09
Constant 5 Definition 73.96 84.31 77.19 −0.43 −0.83 −0.54 1.06 1.06
Coordinate plane 1 Definition 70.12 73.20 71.08 −0.20 −0.03 −0.16 1.05 1.07
Dependent variable 57 Term 58.28 63.40 59.88 0.44 0.52 0.45 0.96 0.93
Difference 2 Term 86.98 82.35 85.54 −1.43 −0.67 −1.19 0.97 0.91
Dilation 18 Term 36.39 91.50 53.56 1.55 −1.62 0.77 1.16 1.22
Distributive property 29 Term 34.32 33.33 34.01 1.66 2.02 1.76 1.15 1.35
Equation 10 Term 53.55 52.94 53.36 0.68 1.05 0.78 1.10 1.12
Equivalent 67 Term 47.04 62.09 51.73 1.00 0.59 0.86 1.16 1.19
Estimate 40 Term 86.09 87.58 86.56 −1.34 −1.14 −1.29 0.84 0.66
Exponent 39 Definition 81.36 83.66 82.08 −0.94 −0.77 −0.90 0.89 0.69
Expression 6 Definition 73.96 63.40 70.67 −0.43 0.52 −0.13 1.18 1.25
Factor 58 Term 33.14 29.41 31.98 1.72 2.23 1.87 1.10 1.20
Formula 59 Definition 60.95 64.71 62.12 0.30 0.45 0.34 1.03 1.02
Fraction 22 Definition 65.98 61.44 64.56 0.03 0.63 0.21 1.06 1.09
Graph 60 Term 77.81 78.43 78.00 −0.68 −0.37 −0.60 0.91 0.78
Independent variable 43 Definition 47.63 64.05 52.75 0.97 0.49 0.81 1.08 1.10
Inequality 51 Definition 63.31 65.36 63.95 0.18 0.42 0.24 0.89 0.81
Integer 44 Definition 64.20 70.59 66.19 0.13 0.13 0.12 0.89 0.83
Interquartile range 42 Term 73.67 79.74 75.56 −0.41 −0.47 −0.44 0.89 0.80
Irrational numbers 41 Definition 72.19 84.97 76.17 −0.32 −0.89 −0.48 0.89 0.78
Like terms 19 Definition 67.75 71.90 69.04 −0.07 0.05 −0.04 1.07 1.11
Line plot 62 Definition 70.71 77.78 72.91 −0.24 −0.33 −0.27 0.88 0.76
Mean 28 Term 68.05 75.16 70.26 −0.08 −0.15 −0.11 0.94 0.89
Mean absolute deviation 63 Term 58.88 71.90 62.93 0.41 0.05 0.29 0.94 0.91
Median 11 Definition 86.98 90.85 88.19 −1.43 −1.53 −1.46 1.01 0.99
Multiple 17 Definition 79.59 80.39 79.84 −0.81 −0.51 −0.73 1.03 1.12
Order of operations 23 Term 85.21 81.70 84.11 −1.26 −0.61 −1.07 0.90 0.78
Ordered pair 31 Definition 66.57 88.89 73.52 0.00 −1.29 −0.31 0.91 0.79
Origin 45 Term 75.44 75.16 75.36 −0.53 −0.15 −0.42 0.97 1.00
Percent 64 Definition 80.47 87.58 82.69 −0.87 −1.14 −0.95 0.82 0.60
Perimeter 3 Definition 89.64 92.81 90.63 −1.71 −1.82 −1.74 0.89 0.52
Plane 50 Definition 60.36 76.47 65.38 0.33 −0.24 0.16 0.98 0.96
Point 7 Term 74.56 86.93 78.41 −0.47 −1.08 −0.63 1.01 0.89
Polygon 38 Term 47.34 52.94 49.08 0.99 1.05 0.99 0.95 0.94
Probability 65 Term 79.88 83.66 81.06 −0.83 −0.77 −0.82 0.77 0.57
Product 68 Term 38.46 40.52 39.10 1.44 1.65 1.49 1.17 1.25
Property 16 Term 48.52 52.94 49.90 0.93 1.05 0.95 1.25 1.33
Proportion 61 Term 63.61 64.05 63.75 0.16 0.49 0.25 0.94 0.93
Quadrant 66 Definition 78.70 81.70 79.63 −0.75 −0.61 −0.71 0.77 0.57
Quotient 46 Term 79.88 82.35 80.65 −0.83 −0.67 −0.79 0.91 0.81
Range 27 Term 74.56 81.70 76.78 −0.47 −0.61 −0.52 0.95 0.87
Rate 32 Term 30.18 24.84 28.51 1.89 2.49 2.07 1.36 1.77
Rate of change 24 Definition 64.20 60.78 63.14 0.13 0.66 0.28 1.09 1.19
Ratio 26 Definition 66.57 73.20 68.64 0.00 −0.03 −0.02 0.95 0.88
Rational number 13 Term 39.47 50.33 42.86 1.39 1.18 1.30 1.11 1.15
Regular polygon 49 Term 56.80 62.09 58.45 0.51 0.59 0.53 0.96 0.94
Set 37 Term 66.57 74.51 69.04 0.00 −0.11 −0.04 0.95 0.90
Side 20 Definition 76.85 81.05 78.16 −0.62 −0.56 −0.61 0.87 0.70
Statistics 15 Term 74.26 81.05 76.37 −0.45 −0.56 −0.49 1.04 0.98
Sum 33 Definition 80.18 81.05 80.45 −0.85 −0.56 −0.77 1.00 0.93
Three-dimensional figure 4 Term 78.11 82.35 79.43 −0.70 −0.67 −0.70 1.10 1.14
Triangle 34 Definition 84.02 87.58 85.13 −1.15 −1.14 −1.16 0.91 0.82
Unit rate 69 Definition 56.80 58.17 57.23 0.51 0.79 0.59 0.93 0.89
Variable 8 Term 93.49 90.85 92.67 −2.26 −1.53 −2.04 1.02 1.04
Volume 47 Definition 74.85 84.97 78.00 −0.49 −0.89 −0.60 0.82 0.63
x-axis 25 Term 71.60 82.35 74.95 −0.29 −0.67 −0.40 0.90 0.81
x-intercept 48 Definition 63.02 81.05 68.64 0.19 −0.56 −0.02 1.05 1.07
y-axis 14 Definition 86.39 90.20 87.58 −1.37 −1.44 −1.39 0.87 0.59
Note. Items 3, 14, 16, 29, 30, 32, 39, 40, 47, 64, 65, and 66 (identified in the text) were excluded from the revised measure. MNSQ = mean-square values.
Creation of Measure
For each of the 69 terms, the first two authors created two
multiple-choice questions with the vocabulary term in the
prompt for the first question and as one of the answer choices
for the second question. As such, the term difference had two
questions in the original question bank: “The difference is
the result in (a) addition, (b) subtraction, (c) multiplication,
or (d) division,” and “The result in subtraction is: (a) sum,
(b) minuend, (c) subtrahend, or (d) difference.” For the defi-
nitions of each term, we used information from the textbook
glossaries and strived to use student-friendly language. For
the final measure, we randomly selected half of the vocabulary terms to be assessed with the term in the prompt; the other half were assessed with the target vocabulary term as one of the four answer choices. Finally, we placed the questions in
random order. The final terms and questions were vetted by
an expert in the field of secondary mathematics education.
Table 1 provides a listing of the terms, how the vocabulary
term was measured (i.e., whether the definition was the
answer choice or whether the term was the answer choice),
and whether the term was included in sixth-, seventh-, and
eighth-grade textbook glossaries.
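The random assignment of question formats can be sketched as follows. The two-term question bank is a hypothetical stand-in for the 69-term bank, and the truncated prompts are placeholders; this illustrates the assignment logic only.

    # Illustrative sketch: randomly assign half of the terms to the
    # term-in-prompt format and half to the term-as-answer-choice format,
    # then place the selected questions in random order.
    import random

    question_bank = {  # hypothetical two-term stand-in for the 69-term bank
        "difference": ("The difference is the result in ...",
                       "The result in subtraction is: ..."),
        "sum": ("The sum is the result in ...",
                "The result in addition is: ..."),
    }

    terms = list(question_bank)
    random.shuffle(terms)
    half = len(terms) // 2
    measure = [question_bank[t][0] for t in terms[:half]]   # term in prompt
    measure += [question_bank[t][1] for t in terms[half:]]  # term as choice
    random.shuffle(measure)  # final random ordering of questions
    print(measure)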
Pilot Data Collection
Participants and settings. In this study, we worked with 491
students. Among them, 338 were seventh-grade students,
including 172 female students and 166 male students. Of
the 153 eighth-grade students, 88 were female students and
65 were male students. We collected surveys in four schools
across three states. School A was a rural school district
located in a southern state. School demographics indicated
the student population as 65% Caucasian, 21% African
American, 7% Hispanic, 1% Asian, and 6% two or more
races. School B was in a suburban school district in a north-
eastern state. District demographics (school demographics
were unavailable) indicated the student population as 88%
Caucasian, 6% Asian, 3% African American, and 3% His-
panic. School C was a rural school in a northeastern state.
School demographics indicated the student population as
76% Caucasian, 15% Hispanic, 5% African American, 2%
Asian, and approximately 2% with another identification.
School D was a school in a small city in a south-central state.
School demographics specified that 67% of the population
was Caucasian, 25% Hispanic, almost 4% American Indian,
and 2% identified with two or more races and 2% with
another identification.
Procedure. Each teacher participant taught seventh or eighth
grade and received a set of assessment packets for all the
students in a classroom. Teachers were provided with a
detailed letter which included (a) a description page of what
was in each packet (e.g., consent forms, measures, direc-
tions), (b) directions to distribute and collect consent forms
prior to administering the measure, (c) scripted directions to
read to the students, and (d) action items for the teachers to complete after administering the measure in order to return it to the researchers. The teacher read directions to the students that
stated the aim of the measure was to survey middle-school
students’ knowledge of mathematics vocabulary, that stu-
dents may know some words but not others, and that stu-
dents should try their best. Students were assured that the
results of this measure would not impact their grades. Each
student received a packet consisting of a student demo-
graphic cover page, the multiple-choice measure, and a stu-
dent response sheet. Each student packet was coded with a
unique identification number, which appeared in the top
right corner of both the demographic cover page and
response sheet. Students were given unlimited time to com-
plete the measure.
After packets were returned to the first author, items
were entered and recoded as correct/incorrect for each stu-
dent. A separate trained assistant reentered and recoded
33% of the data, which were then compared to the original
data. Data agreement was calculated at 99.97%.
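The agreement check amounts to a simple percentage-match calculation, sketched below with hypothetical entries.

    # Illustrative sketch of the data-agreement check on double-entered data
    # (hypothetical values; the study reported 99.97% agreement on 33% of data).
    original = [1, 0, 1, 1, 0, 1, 1, 0]   # 1 = correct, 0 = incorrect
    reentered = [1, 0, 1, 1, 0, 1, 1, 0]  # second, independent entry

    matches = sum(a == b for a, b in zip(original, reentered))
    print(f"Data agreement: {100 * matches / len(original):.2f}%")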
Data Analysis
WINSTEPS 4.0.1 (Linacre, 2017) was used to conduct
Rasch Analysis to measure item difficulty and validity, and
SPSS 25.0 was used to calculate Cronbach’s alpha for inter-
nal consistency reliability. The Rasch model, based on Item
Response Theory (IRT), is used for dichotomously scored
items (i.e., responses that are marked as either correct or
incorrect) and allows us to determine if item difficulties are
appropriate to person ability levels on the latent trait (Van
Zile-Tamsen, 2017). To identify the difference between two
grade levels (i.e., Grade 7 and Grade 8), item difficulty for
each grade was analyzed. The combined data set was ana-
lyzed to evaluate difficulty for each item regarding overall
grade levels. Items falling excessively outside the typical range (i.e., −3.0 to +3.0 logits) may be considered for exclusion from the measure.
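For readers unfamiliar with the model, the dichotomous Rasch model can be stated compactly: the probability that a person of ability θ answers an item of difficulty b correctly is exp(θ − b) / (1 + exp(θ − b)). The minimal sketch below (illustrative only; the study's estimates came from WINSTEPS, not this code) shows why an item matched to a person's ability yields a 50% chance of success and why items far outside the ability range carry little information.

    # Minimal sketch of the dichotomous Rasch model; the study used WINSTEPS,
    # so this only illustrates the model itself.
    import math

    def p_correct(theta: float, b: float) -> float:
        """Probability a person of ability theta answers an item of
        difficulty b correctly (both in logits)."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    print(p_correct(0.0, 0.0))    # 0.50 when ability equals difficulty
    print(p_correct(0.0, 3.49))   # ~0.03 for the hardest reported item
    print(p_correct(0.0, -2.04))  # ~0.89 for the easiest reported item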
In addition to item difficulty, two types of fit statistics,
infit and outfit, were calculated. Infit statistics reflect unexpected response patterns on items targeted near a respondent's ability level (essentially, what the test is supposed to be measuring), whereas outfit statistics are more sensitive to outliers such as lucky guesses and careless mistakes, as when a test taker misses an item the individual should have answered correctly or vice versa (Linacre, 2014; Runnels, 2012). To determine the items for removal, the
combined data were used to examine the infit and outfit
statistics. Mean-square values (MNSQ) for both infit and
outfit between 0.70 and 1.30 are acceptable (Runnels, 2012).
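Under the Rasch model, both statistics are mean squares of standardized residuals: outfit weights all responses equally, whereas infit weights each response by its information. A minimal sketch follows (illustrative only; the study's fit statistics came from WINSTEPS, and the item data here are hypothetical).

    # Illustrative sketch of infit/outfit MNSQ for one item. For scored
    # responses x (0/1), model probabilities p, and variances w = p(1 - p):
    #   outfit MNSQ = mean of (x - p)^2 / w   (sensitive to outliers)
    #   infit MNSQ  = sum((x - p)^2) / sum(w) (information weighted)
    def fit_mnsq(x, p):
        w = [pi * (1.0 - pi) for pi in p]
        sq = [(xi - pi) ** 2 for xi, pi in zip(x, p)]
        outfit = sum(s / wi for s, wi in zip(sq, w)) / len(x)
        infit = sum(sq) / sum(w)
        return infit, outfit

    # Hypothetical item: five students' responses and model probabilities.
    print(fit_mnsq([1, 1, 0, 1, 0], [0.9, 0.7, 0.5, 0.6, 0.2]))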
Results
Table 1 displays the percentage of students answering each
item correctly. Before revising the 69 items according to the
fit statistics, the overall mean score was 46.15 (SD = 12.30),
with eighth graders (M = 48.73, SD = 11.19) outperforming
seventh graders (M = 44.99, SD = 12.61). The mean score
of the revised version of the measure with 57 items was
38.24 (SD = 10.60), with eighth graders (M = 40.63, SD =
9.66) outperforming seventh graders (M = 37.16, SD =
10.83). All measures were completed in the same academic year. Seventh graders completed the survey at the end of either the fall semester (n = 177, M = 41.96, SD = 11.84) or the spring semester (n = 161, M = 48.31, SD = 12.62); all eighth graders completed the survey at the end of the spring semester.
SPSS 25.0 was used to determine differences between the performance of seventh and eighth graders. Independent-samples t tests were conducted before and after revising the measure. Results of the Welch-Aspin t test with the original 69-item measure indicated a statistically significant difference between grades (t = 3.298, p = .001), and analysis of the revised 57-item measure yielded similar results (t = 3.548, p < .001). Similarly, there was a statistically significant difference between scores from seventh-grade students
who completed the measure at the end of the fall semester
as opposed to the end of the spring semester on the full ver-
sion 69-item measure (t = 4.768, p < .001) and the revised
57-item measure (t = 4.538, p < .001).
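For readers replicating the analysis outside SPSS, the Welch-Aspin test is the unequal-variance form of the independent-samples t test. A sketch using SciPy follows; the scores are simulated to match the reported means and standard deviations and are not the study data.

    # Illustrative sketch of the grade-level comparison via the Welch test.
    # Scores are simulated stand-ins, not the study data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    grade7 = rng.normal(loc=44.99, scale=12.61, size=338)
    grade8 = rng.normal(loc=48.73, scale=11.19, size=153)

    t, p = stats.ttest_ind(grade8, grade7, equal_var=False)  # Welch-Aspin
    print(f"Welch t = {t:.3f}, p = {p:.4f}")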
Validity and Reliability
To determine the degree of generalizability of the measurement to new samples across the grade levels, item separation and person separation indices were computed via WINSTEPS 4.0.1 (Van Zile-Tamsen, 2017). The combined data set was used to evaluate the item separation index, the person separation index, and the appropriateness of individual items. The item separation index indicates the degree to which the item estimates are expected to remain stable in a new sample, and the person separation index indicates the degree to which people in new samples can be classified along the latent trait being measured (Van Zile-Tamsen, 2017). For our mathematics vocabulary measure, item separation was 7.99 (item reliability = .98), and person separation was 3.07 (person reliability = .90). Because the item separation index was greater than 3.0, item difficulty estimates can be expected to remain stable across samples; because the person separation index was greater than 2.0, the measurement can be expected to generalize to new samples (Linacre, 2014).
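Separation and reliability are two expressions of the same quantity: a separation index G corresponds to reliability R = G² / (1 + G²). A short check (illustrative only) confirms that the reported pairs of values are internally consistent.

    # Illustrative sketch: separation G and reliability R are related by
    # G = sqrt(R / (1 - R)), equivalently R = G^2 / (1 + G^2).
    def reliability_from_separation(g: float) -> float:
        return g ** 2 / (1.0 + g ** 2)

    print(round(reliability_from_separation(7.99), 2))  # 0.98 (item)
    print(round(reliability_from_separation(3.07), 2))  # 0.90 (person)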
The overall fit statistics (see Table 1) with 69 items show acceptable mean values for the infit and outfit MNSQ and z-standardized (ZSTD) scores (infit MNSQ = 1.00, ZSTD = 0.10; outfit MNSQ = 0.98, ZSTD = −0.10). Table 1 exhibits the fit statistics and difficulty for each item. Among the 69 items, 12 items (i.e., items 3, 14, 16, 29, 30, 32, 39, 40, 47, 64, 65, 66) were eliminated because their estimates were not within the acceptable range of MNSQ. The most difficult item was coefficient (item 30), answered correctly by only 10% of students; variable (item 8) was the easiest item, answered correctly by 93% of students.
Figure 1 illustrates how the difficulty of items is matched
to the overall level of the latent trait in each respondent
(Bond & Fox, 2012; Van Zile-Tamsen, 2017). The most dif-
ficult item is shown at the top of the figure on the right of
the y-axis, and the student with the highest level of profi-
ciency is the highest on the left side (Runnels, 2012). If a
respondent is plotted at the same level as an item, this means
that the student has a 50% chance of responding correctly to
that item (Runnels, 2012). If the item is above the person,
there is less than a 50% chance of getting that item correct,
as those items are estimated to be beyond ability level
(Runnels, 2012). The majority of students ranged from −2.0 to +3.0 logits, and the item difficulties ranged from −2.04 to 3.49 logits. The figure shows a general pattern of item difficulties spanning within and beyond the abilities of the test-taking population (Runnels, 2012), indicating that this measure is well targeted for middle-school students and is able to measure their mathematics vocabulary proficiency with high validity.
SPSS 25.0 was used to calculate Cronbach’s alpha for
reliability. Both versions of the measure yielded high reli-
ability. The Cronbach’s alpha was .924 for the 69 items
before revision and .912 for 57 items after revision.
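Cronbach's alpha for dichotomously scored items follows the usual formula, alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores). A minimal sketch follows; the matrix of 0/1 scores is hypothetical, and the study's alphas were computed in SPSS.

    # Illustrative sketch of Cronbach's alpha on a persons-by-items matrix of
    # hypothetical 0/1 scores (the study used SPSS for this calculation).
    import numpy as np

    scores = np.array([
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0],
    ])
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    alpha = (k / (k - 1)) * (1 - item_vars / total_var)
    print(f"Cronbach's alpha = {alpha:.3f}")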
Discussion
As only one-third of eighth-grade students meet proficient
levels on national mathematics assessments (U.S.
Department of Education, Institute of Education Sciences,
National Center for Education Statistics, 2015), it is likely
that many students have weak mathematics skills. It is pos-
sible that the demand of vocabulary involved in mathemat-
ics assessment may cause additional difficulty for many
students (Powell et al., 2017), and many students may not
enter the classroom with a comprehensive understanding of
mathematical vocabulary (Simpson & Cole, 2015). Our aim
was to create a high-quality and useful mathematics vocabu-
lary measure that can provide teachers with information
regarding students’ mathematics vocabulary knowledge. In
this brief psychometric report, we reported selected validity
and reliability evidence for our researcher-created measure. The result
is a mathematics vocabulary measure that may be helpful to
evaluate students’ performance on mathematics vocabulary
across grades and curricular content and identify obstacles
that mathematics terms present for student learning.
Throughout the process of creating the measure, teach-
ers’ expertise and experiences were considered in a con-
certed effort to develop a measure that will have social
validity and practical applications. The 69 items assessed in
this pilot study were evaluated for inclusion in the revised
measure, which included 57 items. The difficulty of the
measure appeared to be appropriate for middle-school stu-
dents. The final measure is reliable for use with students in
both seventh and eighth grade, which allows teachers to use the measure to assess mathematics vocabulary knowledge across grade levels. Seventh-grade students who completed the
measure at the end of the academic year performed signifi-
cantly better than seventh graders who completed the mea-
sure several months earlier. Similarly, eighth graders
performed significantly higher than seventh graders. The
differential performance provides initial evidence that the
measure can detect differences in growth of mathematics
vocabulary knowledge. There is potential for the measure to
evaluate performance over time, providing teachers and
researchers with longitudinal data regarding middle-school
mathematics vocabulary performance.
Our analysis also indicated the measure demonstrated
high reliability, both in the original form and in the revised
form. One advantage of the revised form is that it requires less time for students to complete, which is important as we move forward with practical application of the measure, while remaining highly reliable in measuring students' mathematical vocabulary.
We aim for this measure to be a tool with high validity, reli-
ability, and feasibility for the identification of students with
low mathematics vocabulary knowledge, which could
inform teacher instruction about development and support
of mathematics vocabulary learning.
Limitations and Future Directions
Because this was a pilot study, we collected limited data
regarding student demographics. Future research should evaluate the influence of student-level characteristics, such as identification of a disability, on performance on outcome measures. We
also did not collect information on fidelity of administration
protocol or additional performance data besides that from
the mathematics vocabulary measure. This limits our
knowledge of the students and possible influences on math-
ematics vocabulary. Future research should administer
other measures to students (e.g., general vocabulary knowl-
edge, expressive and receptive language, mathematics
problem solving), to understand which academic factors
may influence performance on a mathematics vocabulary
measure and collect data regarding fidelity of administra-
tion. Another limitation is that our student sample did not
reflect the composition of students in schools in the United
States, which should be addressed in future iterations of this work by recruiting a diverse group of students from a range of schools (e.g., urban, suburban, and rural).

Figure 1. Item difficulty versus person ability.
Note. Each # in the left-hand (person) column represents 4 respondents and each "." represents 1 to 3 respondents; a respondent has a 50% chance of answering correctly any item plotted at the same level in the right-hand (item) column.
This measure was designed with the intention to help
inform instruction by identifying student vocabulary
strengths and weaknesses, but future research is necessary
to determine the usefulness of such a measure for informing
classroom practices and improving student mathematics
achievement. As many of the terms were identified in glos-
saries across sixth through eighth grades, extending research
to evaluate performance of students in sixth grade or pre-
algebra and algebra classes may increase the utility of the
measure. In addition, in this study, we measured internal
consistency reliability and item difficulty validity, but more
research is needed to develop a deeper understanding of the
reliability and ongoing validity of this measure (e.g., test–
retest reliability, concurrent and predictive validity, sensi-
tivity of the measure). Future validation warrants additional
analyses, such as confirmatory factor analysis, which will
allow for consideration of additional factors and nestedness
of participants.
Research is needed to identify ways to support teachers’
understanding of students’ mathematics language perfor-
mance and acquisition. The next steps of this research are
to evaluate predictive validity of the measure and deter-
mine how the information obtained from this measure
relates to student performance and can be used to inform
teacher instruction. Future research will determine how
this mathematics vocabulary measure can inform instruc-
tional decisions based on students’ current level of mathe-
matics vocabulary. Similarly, there is currently a dearth of
research on the acquisition of mathematics language (i.e.,
vocabulary) for students who struggle with mathematics.
More research is needed to evaluate not only the broad
assessment of mathematics vocabulary but also the depth
at which students must understand the most important terms in mathematics.
Conclusion
This brief psychometric report contributes to a growing
number of studies on mathematics vocabulary at the ele-
mentary level (e.g., Powell & Driver, 2015; Powell et al.,
2017) and extends the research base to middle school. This
study aimed to establish a reliable vocabulary measure with
classroom utility across middle-school grades and curri-
cula. In this article, we described the process to develop the
measure, including steps to address social validity, and out-
comes when evaluating the measure with students in Grades
7 and 8. Analysis of fit statistics indicated that both the
original (69-item) and revised (57-item) measure yielded
high reliability. The differences found between the performance of seventh graders in fall and spring and between the performance of seventh and eighth graders suggest that the
measure is sensitive enough to evaluate performance over
time, which future research should systematically evaluate.
It is promising that the mathematics vocabulary measure
may be helpful to identify students’ performance on math-
ematics vocabulary and the obstacles that mathematics
terms present for student learning.
Acknowledgments
Thank you to Andrew Marklez, Lauren Cozad, and Paul
Dunklebarger for assisting with data coding. Thank you to Dr. Paul
Riccomini for serving as an expert reviewer.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect
to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support
for the research, authorship, and/or publication of this article: This
research was supported by funding from a Pennsylvania State
University College of Education Research Initiative Grant.
ORCID iDs
Elizabeth M. Hughes https://orcid.org/0000-0002-0895-2562
Sarah R. Powell https://orcid.org/0000-0002-6424-6160
References
Bond, T. G., & Fox, C. M. (2012). Applying the Rasch model:
Fundamental measurement in the human sciences (2nd ed.).
New York, NY: Routledge.
Brown, V. L., Cronin, M. E., & Bryant, D. P. (2012). Test of
Mathematical Abilities (3rd ed.). Austin, TX: Pro-Ed.
Bruner, J. S. (1966). Toward a theory of instruction. Cambridge,
MA: Harvard University Press.
Capraro, M. M., & Joffrion, H. (2006). Algebraic equations: Can
middle-school students meaningfully translate from words to
mathematical symbols? Reading Psychology, 27, 147–164.
Forsyth, S. R., & Powell, S. R. (2017). Differences in the mathe-
matics-vocabulary knowledge of fifth-grade students with and
without learning difficulties. Learning Disabilities Research
& Practice, 32, 231–245. doi:10.1111/ldrp.12144
Ginsburg, H. P., & Baroody, A. J. (2003). Test of Early
Mathematics Ability (3rd ed.). Austin, TX: Pro-Ed.
Linacre, J. M. (2014). A user’s guide to Winsteps Ministeps
Rasch-Model computer programs. Chicago, IL: Author.
Linacre, J. M. (2017). Winsteps Rasch software program (Version
4.0.1). Retrieved from http://www.winsteps.com/index.htm
National Governors Association Center for Best Practices, &
Council of Chief State School Officers. (2010). Common core
state standards mathematics. Washington, DC: Author.
Powell, S. R., & Driver, M. K. (2015). The influence of math-
ematics vocabulary instruction embedded within addition
tutoring for first-grade students with mathematics difficulty.
Learning Disabilities Quarterly, 38, 221–233. doi:10.1177/0731948714564574
Powell, S. R., Driver, M. K., Roberts, G., & Fall, A.-M. (2017).
An analysis of the mathematics vocabulary knowledge of
third- and fifth-grade students: Connections to general vocab-
ulary and mathematics computation. Learning and Individual
Differences, 57, 22–32. doi:10.1016/j.lindif.2017.05.011
Powell, S. R., & Nelson, G. (2017). An investigation of the math-
ematics-vocabulary knowledge of first-grade students. The
Elementary School Journal, 117, 664–686. doi:10.1086/691604
Purpura, D. J., & Logan, J. A. R. (2015). The nonlinear relations
of the approximate number system and mathematical lan-
guage to early mathematics development. Developmental
Psychology, 51, 1717–1724. doi:10.1037/dev0000055
Rubenstein, R. N., & Thompson, D. R. (2002). Understanding and
supporting children’s mathematical vocabulary development.
Teaching Children Mathematics, 9, 107–112.
Runnels, J. (2012). Using the Rasch model to validate a multiple
choice English achievement test. International Journal of
Language Studies, 6, 141–153.
Schleppegrell, M. J. (2007). The linguistic challenges of mathe-
matics teaching and learning: A research review. Reading &
Writing Quarterly, 23, 139–159. doi:10.1080/10573560601158461
Schrank, F. A., Mather, N., & McGrew, K. S. (2014). Woodcock-Johnson IV Tests of Achievement. Rolling Meadows, IL: Riverside.
Simpson, A., & Cole, M. W. (2015). More than words: A litera-
ture review of language of mathematics research. Educational
Review, 67, 369–384. doi:10.1080/00131911.2014.971714
U.S. Department of Education, Institute of Education Sciences,
National Center for Education Statistics. (2015). National
Assessment of Educational Progress (NAEP) mathematics
assessment. Retrieved from https://www.nationsreportcard.gov/reading_math_2015
Van Zile-Tamsen, C. (2017). Using Rasch analysis to inform rat-
ing scale development. Research in Higher Education, 58,
922–933. doi:10.1007/s11162-017-9448-0