JRTE | Vol. 46, No. 2, pp. 129–148 | ©2013 ISTE | iste.org/jrte
A Turn toward Specifying Validity Criteria in the Measurement
of Technological Pedagogical Content Knowledge (TPACK)
Robert F. Cavanagh
Curtin University
Matthew J. Koehler
Michigan State University
Abstract
The impetus for this paper stems from a concern about directions and progress in the measurement of the Technological Pedagogical Content Knowledge (TPACK) framework for effective technology integration. In this paper, we develop the rationale for using a seven-criterion lens, based upon contemporary validity theory, for critiquing empirical investigations and measurements using the TPACK framework. This proposed seven-criterion lens may help researchers map out measurement principles and techniques that ensure reliable and valid measurement in TPACK research. Our critique of existing TPACK research using these criteria as a frame suggests several areas of theorizing and practice that are likely impeding the press for measurement. First are contradictions and confusion about the epistemology of TPACK. Second is the lack of clarity about the purpose of TPACK measurement. Third is the choice and use of measurement models and techniques. This article illustrates these limitations with examples from current TPACK and measurement-based research and discusses directions and guidelines for further research. (Keywords: Technological Pedagogical Content Knowledge framework, TPACK, reliability, validity, measurement, assessment)
Since initial publication in 2006 by Mishra and Koehler, the Technological Pedagogical Content Knowledge (TPACK) framework for effective technology integration (see Figure 1) has had a significant impact on research and practice around educational technology (Koehler, Shin, & Mishra, 2011). Application of the framework by researchers and practitioners to inform design of interventions such as professional development has led to the development of measures to quantify effects and potential gains (Graham, Cox, & Velasquez, 2009; Guzey & Roehrig, 2009). Although this empirical imperative is a powerful rationale for developing measures, measurement is also often viewed as the optimal means of establishing the validity of theoretical frameworks and models. The validation of the framework as a model of technology integration is a second driver of the proliferation of TPACK measures.
The growth in both the number and variety of the TPACK measures being explored warrants a critical look at the quality and validity of the measures being used (Koehler, Shin, & Mishra, 2011). In the sections that follow, we examine these issues through the lens of contemporary validity theory and then propose a multistep approach for examining validity in empirical investigations of TPACK.
This work is grounded in the construct of validity advanced by Messick (1995). According to Messick (1995, p. 741), validity "is an overall judgment of the degree to which evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions on the basis of test scores and other modes of assessment." Messick (1998) was emphatic about this approach being unified, in contrast to the multiple-type conception that previously prevailed. He also reframed these types of validity as forms of evidence and stated:

What is singular in the unified theory is the kind of validity: All validity is of one kind, namely, construct validity. Other so-called separate types of validity—whether labeled content validity, criterion-related validity, consequential validity, or whatever—cannot stand alone in validity arguments. Rather, these so-called validity types refer to complementary forms of evidence to be integrated into an overall judgment of construct validity. (p. 37)
Figure 1. The TPACK framework (reproduced with permission from http://tpack.org)
The current version of the Standards for Educational and Psychological Testing, published by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME), embodies this unified conception: "Validity is a unitary concept. It is the degree to which all the accumulated evidence supports the intended interpretation of test scores for the proposed purpose" (AERA, APA, & NCME, 1999, p. 11). The evidence requires documentation of all aspects of instrument development and administration, from initial theorizing to assessing the consequences of interpreting the results.
Messick (1995) provided a six-criterion framework for the organization of evidence. The criteria were the content, substantive, structural, generalizability, external, and consequential aspects. Wolfe and Smith (2007a) added an additional aspect from the Medical Outcomes Trust Scientific Advisory Committee (1995), evidence of the interpretability aspect. Application of the seven-criterion framework has not been restricted to psychometric test development. Significantly, it has been used in the assaying of phenomenological research that used rating scales, surveys, and observational instruments (Cavanagh, 2011a; Young & Cavanagh, 2011).
The seven criteria are introduced in Table 1, along with examples of how each can be applied. In this paper, we employ these criteria to audit TPACK-based empirical research and the measures employed in this research. The following sections explain the seven aspects of validity evidence in more detail and how these could be manifest in TPACK measurement.

Table 1. Validity Evidence Criteria
1. Content evidence. Description: The relationship between the instrument's content and what the instrument seeks to measure. Examples of application: Specification of research questions, development of a construct model, writing of items, selection of a scaling model.
2. Substantive evidence. Description: Explanation of observed consistencies in the data by reference to a priori theory or hypotheses. Example of application: Comparing TPACK scores of teachers who have completed TPACK training with those who have not.
3. Structural evidence. Description: Confirmation of subconstructs or components in the construct model. Example of application: Conducting confirmatory factor analysis.
4. Generalizability evidence. Description: Individual items are not biased toward particular groups or situations. Example of application: Testing that each item in a test of TPACK elicits similar responses from males and females with the same overall TPACK level.
5. External evidence. Description: Similar results are obtained when different tests are applied to measure the same construct. Example of application: Comparing findings from observational schedules and document analysis.
6. Consequential evidence. Description: Consideration of how results could impact on persons and organizations. Example of application: Discussing findings with stakeholders.
7. Interpretability evidence. Description: Communication of the qualitative meaning of scores. Example of application: Providing a construct map that explains key points on the scale.
The corpora of reports on TPACK measurement used in the study were identified by a literature search in conjunction with the second author's extensive familiarity with the TPACK literature. The theoretical model applied in the study was the Wolfe and Smith (2007a; 2007b) seven-criterion framework. This framework was adopted a priori as a vehicle for identifying validity evidence rather than simply classifying typical features of TPACK research or counting the occurrence of these features in the TPACK literature. We selected studies because they exemplified one or more aspects of validity evidence of relevance to TPACK measurement. However, locating examples of all of the types of evidence was difficult and, for some types of evidence, not successful. For example, specification of scaling models and testing the technical quality of items are very rare and only found in one study (i.e., Jamieson-Proctor, Finger, Albion, Cavanagh, Fitzgerald, Bond, & Grimbeek, 2012), which applied many, but not all, of the AERA, APA, and NCME standards for instrument construction. This potential over-reliance on one study is a limitation of this paper and will hopefully be overcome as advances are made in attention to validity in future TPACK research.
We commence by examining the content aspect of validity, which begins with the reason for measurement. Then we examine a sequence of activities that lead from clarification of the construct of interest to the design of the instrument.
Evidence of the Content Aspect
Purpose
Evidence of the content aspect of validity includes clear statements of the purpose of a study or instrument development process that are made before other activities are attempted. Asking research questions is one widely used method of expressing the intent of an investigation. For example, in a study of TPACK measures, Koehler, Shin, and Mishra (2011, p. 18) made their purpose clear by posing two research questions: "What kinds of measures are used in the TPACK literature?" and "Are those measures reliable and valid?"

Also related to articulating a clear purpose for a study or measure is specifying the domain of inference, the types of inferences, and potential constraints and limitations.
Domain of inference. Specifying the domain(s) of inference situates the anticipated outcomes of an investigation within an established body of theory or knowledge and provides additional evidence of the content. The domains could be curricular (relating to instruction), cognitive (relating to cognitive theory), or criterion-based (knowledge, skills, and behaviors required for success in a particular setting). For example, the domain of inference of TPACK is curricular due to the pedagogical component (Mishra & Koehler, 2006; Koehler & Mishra, 2008) and also criterion-based due to its contextual specificity and situational dependence (Doering, Scharber, Miller, & Veletsianos, 2009).
Types of inferences. The types of inferences delimit the intended conclusions or judgments to be made from a study or instrument. Presumably, TPACK studies or measures could be designed to make inferences about mastery, individual teachers, systems, or groups of teachers. To date, TPACK measurements have primarily sought to measure individual teachers' TPACK (Roblyer & Doering, 2010; Schmidt, Baran, Thompson, Mishra, Koehler, & Shin, 2009), although there have been notable attempts to study groups of teachers as well (e.g., Finger et al., 2012).
There is also an element of mastery underpinning TPACK through the implication that high technology integration results from high levels of, and interaction between, technological, pedagogical, and content knowledge. Schmidt et al. (2009, p. 125) explained, "At the intersection of these three knowledge types is an intuitive understanding of teaching content with appropriate pedagogical methods and technologies."
Potential constraints and limitations. Potential constraints and limitations can also be identified that comment on the logistics, resource issues, or methodological exigencies. For example, Harris, Grandgenett, and Hofer (2010) identified a methodological limitation when they criticized self-report methods in TPACK research. The authors explained that "the challenges inherent in accurately estimating teachers' knowledge via self-reports—in particular, that of inexperienced teachers—are well-documented" (Harris et al., 2010, p. 1).
Instrument Specification
Following the definition of the purpose, a set of instrument specifications is developed. This task involves describing constructs, a construct model, and then a construct map.
Constructs. Wilson (2010) described a construct as "the theoretical object of our interest" (p. 6) and saw it resulting from knowledge about the purpose of designing an instrument and the context in which it is to be used. He also considered a construct to be part of a theoretical model that explains phenomena. Importantly, the construct should sit within a well-established body of knowledge, and one of the purposes of a study is to contribute to extant theory in this domain of inference. The construct model and this theory are a priori considerations that require specification prior to other measure construction activities.

The TPACK framework could be viewed as a representation of one construct, a trait or ability of teachers that is not directly observable but is latent and indicated by observable behaviors. For example, Koehler et al. (2011) explained that the "TPACK framework connects technology to curriculum content and specific pedagogical approaches and describes how teachers' understandings of these three knowledge bases can interact with one another to produce effective discipline-based teaching with educational technologies" (p. 6).
Alternatively, TPACK could be viewed as a composite of the seven constructs of Figure 1, each of which is sufficiently different from the others to warrant separate specification (Schmidt et al., 2009). The seven constructs comprise three types of knowledge—technological knowledge (TK), pedagogical knowledge (PK), and content knowledge (CK); three types of knowledge about the interactions between technology, pedagogy, and content—pedagogical content knowledge (PCK), technological pedagogical knowledge (TPK), and technological content knowledge (TCK); and then the interaction between PCK, TPK, and TCK—technological pedagogical content knowledge (TPACK). Additional complexities are contextual dependency on situational variables (e.g., subject discipline), which needs to be accommodated in both the unified and the multi-component representations, and the possibility of perhaps as few as three components (Archambault & Barnett, 2010) or more than seven components.
Empirical studies that use TPACK to guide research have tended to focus on one specific aspect of TPACK. Angeli and Valanides (2009) researched a strand within an alternative TPACK framework they termed ICT-TPCK; Harris et al. (2010) studied the quality of technology integration; and Jamieson-Proctor et al. (2012) evaluated TPACK confidence and usefulness. In these cases, models supplemented the more general TPACK model utilizing Venn diagrams that altered the focus on the phenomenon of interest.
Construct models. There are many sources of information that can assist in depicting a construct model. Wolfe and Smith (2007a) listed real-world observations, literature reviews of theory, literature reviews of empirical research, reviews of existing instruments, expert and lay viewpoints, and content and task analyses. Constructs can have internal and external models. An internal model typically comprises components, facets, elements or factors, and the hypothesized relations between these components. The TPACK models above are examples of internal models. Another example of an internal model is represented in Table 2 (Jamieson-Proctor et al., 2012, p. 5). The construct model for the Teaching Teachers for the Future (TTF) TPACK Survey has seven components: TPACK, TPK, TCK, confidence to support student learning, confidence to support teaching, usefulness to support student learning, and usefulness to support teaching.

External models represent relations between the target construct and other constructs. Constructs associated with context (e.g., racial identity, learning environment, professional development) and how these relate to TPACK could constitute external models. An early version of the TTF instrument (Jamieson-Proctor, Finger, Albion, Cavanagh, Fitzgerald, Bond, & Grimbeek, 2012) contained a set of items on teacher efficacy. These items were intended to measure what was at the time considered a construct related to TPACK.
Construct maps. The construct map requires qualification of the construct model by providing a coherent and substantive definition of the content of the construct and a proposal of some form of ordering of persons or of the tasks administered to persons (Wilson, 2010). From a content perspective, the extension of Shulman's (1986; 1987) conception of pedagogical content knowledge (PCK) by the addition of technological knowledge (TK) has produced the integrative TPACK model (Graham, 2011). However, the PCK model and associated definitions have been criticized for imprecision and thus being "a barrier to the measurement of PCK" (Graham, 2011, p. 1955). This in turn has led to problems when defining the TPACK construct and the need for ongoing work in this area to resolve these issues (Koehler, Shin, & Mishra, 2011).

The issue of definitional precision is not peculiar to TPACK measurement. Wilson (2010, p. 28) referred to it as the "more complex reality of usage" and suggested some constructs should be conceptualized as multidimensional and represented by several discrete construct maps. He also recommended initial focus on one dimension at a time and development of a simple model on the assumption that complications can be dealt with later. This approach is compatible with the transformative view of TPACK that focuses on change and growth of teachers' knowledge over time rather than on discriminating between different types of TPACK knowledge (Graham, 2011). It is also consistent with the general objectives of measurement—interpersonal comparison of capabilities or dispositions, comparison of an individual's capabilities or dispositions at different times, or comparison of the difficulty the tasks comprising a measure present to persons.
The notion of ordering of persons or of instrument tasks has been successfully applied in construct mapping of TPACK. Harris, Grandgenett, and Hofer (2012) developed a rubric to rate experienced teachers on four forms of technology use when planning instruction. Twelve scorers assessed curriculum goals and technologies, instructional strategies and technologies, technology selection, and fit using a scoring rubric that described four levels of each form of technology use. They rated curriculum goals and technologies as "strongly aligned" (scored 4), "aligned" (scored 3), "partially aligned" (scored 2), and "not aligned" (scored 1). The goal of this exercise was evaluating teachers' TPACK by ordering of persons.
Table 2. The Conceptual Structure of the TTF TPACK Survey
TPACK Framework Dimension | Scale: Confidence to Use ICT to: | Scale: Usefulness of ICT to:
TPACK | Support student learning | Support student learning
TPK, TCK | Support teaching | Support teaching
The ordering of tasks assumes that different tasks present varying degrees of difficulty to the persons attempting the tasks. An example of a task-ordered rubric is the six facets of learning for understanding developed by Wiggins and McTighe (1998; 2005). The facet of explanation was postulated to vary in degree from naïve to sophisticated. Five levels were defined—naïve, intuitive, developed, in-depth, and sophisticated. A naïve understanding was described as "a superficial account; more descriptive than analytic or creative; a fragmentary or sketchy account of facts/ideas or glib generalizations" (Wiggins & McTighe, 1998, p. 76). In contrast, sophisticated understanding could be demonstrated by "an unusually thorough, elegant, and inventive account (model, theory, or explanation)" (Wiggins & McTighe, 1998, p. 76). The facets of a learning rubric describe student behaviors at each level to differentiate between levels as well as to order the levels. Such a system of ordering is important when the construct of interest is hypothesized to be cognitively developmental, with the attainment of lower-level tasks prerequisite to mastering those at higher levels. In the Wiggins and McTighe (1998; 2005) construct map, naïve explanations are easier to provide than intuitive explanations, which in turn are easier to provide than developed explanations (Cavanagh, 2011). This ordering informs theorizing about students learning for understanding. A developmental view of TPACK learning in which teacher cognition progresses through developmental stages would also require the identification of similar sequences of levels for the construct map and then the development of instrument items.
Item development. Item development concerns making choices about item formats such as multiple choice, rating scales, and performance assessments. This can be informed by following the recommendations of item-writing guidelines about content/semantics, formatting, style, stem statements, response scales, and response choices. Regular reviews such as expert reviews, content reviews, and sensitivity (targeting) reviews can be conducted throughout all stages of instrument development. For example, seven TPACK experts reviewed the validity and face value of the rubric developed by Harris et al. (2012) to assess observed evidence of TPACK during classroom instruction.
Scoring model. A detailed construct map with an internal structure that orders persons and tasks informs selection of a scoring model. Significantly, it is the ordering that provides a foundation for the instrument being a measure. A scoring model shows how observations or responses to items are numerically coded. Right or wrong answers provide dichotomous data that could be scored 0, 1. Rating scales produce polytomous data that can be scored using the successive integers 0, 1, 2, and 3. Rating scales can show the degree of agreement of respondents to a stem statement, and while this is related to the overall strength of the trait of interest in persons, it is the ordering within the construct map that constitutes the measure.
The number and labeling of response categories is crucial to the performance of a rating scale instrument (Hawthorne, Mouthaan, Forbes, & Novaco, 2006; Preston & Colman, 2000). Another related issue is use of a "neither disagree or agree" category and the reasons for the selection of this category (Kulas & Stachowski, 2001). The scoring model for the TTF TPACK Survey instrument (Jamieson-Proctor et al., 2012) comprised seven response categories scored 0 (not confident/useful); 1, 2, 3 (moderately confident/useful); 4, 5, 6 (extremely confident/useful); plus an additional "unable to judge" category scored 8 and coded as missing data. We collected data using Qualtrics online survey software.
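To make this scoring model concrete, the minimal sketch below shows one way such a coding scheme could be applied before scaling; the item names and response values are hypothetical and are not drawn from the TTF data set.

```python
import numpy as np
import pandas as pd

# Hypothetical raw responses to three TTF-style items, scored 0-6,
# with 8 meaning "unable to judge" (to be treated as missing data).
raw = pd.DataFrame({
    "item_1": [0, 3, 6, 8, 5],
    "item_2": [2, 8, 4, 1, 6],
    "item_3": [6, 6, 3, 0, 8],
})

# Recode the "unable to judge" category (8) as missing before scaling.
scored = raw.replace(8, np.nan)

print(scored)
```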
Scaling model. The data obtained directly from instrument administration are termed raw data because they require processing by scaling into a meaningful form. Without scaling, the use of raw scores is limited to the presentation of frequencies, and even mathematical operations as basic as estimating a mean score should be undertaken with caution (Doig & Groves, 2006). A scaling model such as the Rasch Model (Rasch, 1980) can be applied to raw scores to calibrate these on a linear scale. The intervals on a linear scale are equal in the same way as the markings on a yardstick. This enables comparison of person scores according to their magnitude and not just their order.
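As a simple illustration of why scaling matters, the sketch below implements the dichotomous Rasch model probability and shows that the expected raw score is a nonlinear (ogive) function of ability in logits, so equal raw-score differences do not represent equal differences in the trait. The item difficulties are hypothetical, and the TTF analysis itself used the Rasch Rating Scale Model for polytomous data.

```python
import numpy as np

def rasch_prob(theta, delta):
    """Probability of success under the dichotomous Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - delta)))

# Hypothetical difficulties (in logits) for a ten-item scale.
deltas = np.linspace(-2.0, 2.0, 10)

# Expected raw score as a function of ability: the relation is an ogive,
# so equal raw-score differences are not equal logit differences.
for theta in [-3.0, -1.5, 0.0, 1.5, 3.0]:
    expected_raw = rasch_prob(theta, deltas).sum()
    print(f"ability {theta:+.1f} logits -> expected raw score {expected_raw:.2f} / 10")
```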
We analyzed the TTF TPACK Survey student scores using the Rasch Rating Scale Model (Andrich, 1978a; Andrich, 1978b; Andrich, 1978c; Bond & Fox, 2007; Jamieson-Proctor, Finger, Albion, Cavanagh, Fitzgerald, Bond, & Grimbeek, 2012). Data from four groups of like-named items (i.e., TPK/TCK Confidence, TPK/TCK Usefulness, TPACK Confidence, TPACK Usefulness) were subject to separate scaling, and then we equated scaled scores on an interval scale (Jamieson-Proctor, Finger, Albion, Cavanagh, Fitzgerald, Bond, & Grimbeek, 2012). The generation of interval data enabled accurate comparison of student responses on four scales between the two occurrences of instrument administration at the national level and also within the 39 universities/higher education providers that participated in the project.
Item technical quality. Evidence of item technical quality can be garnered by testing how well data from individual items meet the requirements of an item-response measurement model. For example, in its simplest form, the Rasch Model requires the probability of a person completing a task to be a function of that person's ability and the difficulty of the task. Persons with high ability are more likely to complete difficult tasks than those with lower ability. Conjointly, easy tasks are likely to be completed by both low- and high-ability persons. Rasch Model computer programs such as RUMM2030 (RUMMLab, 2007) or Winsteps (Linacre, 2009) test how well the responses to an item display this property by estimating fit statistics. Common reasons for items having poor fit to the model include the item not discriminating between persons of different ability and the responses being confounded by an attribute of the persons different to the trait being measured.
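The sketch below shows one common formulation of such fit statistics (unweighted "outfit" and information-weighted "infit" mean squares) for the dichotomous case, assuming person and item estimates are already available; the data are simulated, and dedicated programs such as RUMM2030 or Winsteps report more elaborate versions of these statistics.

```python
import numpy as np

def fit_statistics(responses, thetas, deltas):
    """Outfit (unweighted) and infit (information-weighted) mean-square
    fit statistics for each item under the dichotomous Rasch model.
    responses: persons x items matrix of 0/1 scores."""
    expected = 1.0 / (1.0 + np.exp(-(thetas[:, None] - deltas[None, :])))
    variance = expected * (1.0 - expected)
    z_sq = (responses - expected) ** 2 / variance      # squared standardized residuals
    outfit = z_sq.mean(axis=0)                         # unweighted mean square per item
    infit = ((responses - expected) ** 2).sum(axis=0) / variance.sum(axis=0)
    return outfit, infit

# Hypothetical estimates and simulated responses for 5 persons x 3 items.
rng = np.random.default_rng(0)
thetas = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
deltas = np.array([-0.8, 0.1, 0.9])
p = 1.0 / (1.0 + np.exp(-(thetas[:, None] - deltas[None, :])))
responses = (rng.random((5, 3)) < p).astype(float)

outfit, infit = fit_statistics(responses, thetas, deltas)
print("outfit MNSQ:", np.round(outfit, 2))
print("infit MNSQ:", np.round(infit, 2))
```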
Rasch Model analysis of the TTF TPACK Survey data using the WINSTEPS computer program (Linacre, 2009) identified six items with misfitting data. These were stepwise removed from subsequent analyses until all the remaining items showed adequate fit to the model's requirements for measurement. The items removed and their respective scales were:

Scale TPK/TCK Confidence Combined: Teach strategies to support students from Aboriginal and Torres Strait Islander backgrounds; access, record, manage, and analyze student assessment data

Scale TPK/TCK Usefulness Combined: Teach strategies to support students from Aboriginal and Torres Strait Islander backgrounds; manage challenging student behavior by encouraging the responsible use of ICT

Scale TPACK Confidence Combined: Communicate with others locally and globally

Scale TPACK Usefulness Combined: Communicate with others locally and globally (Jamieson-Proctor, Finger, Albion, Cavanagh, Fitzgerald, Bond, & Grimbeek, 2012, p. 8)
Another consideration in rating scale instruments is the functioning of the rating scale categories. There is a diversity of views on the optimum number of response categories (Hawthorne, Mouthaan, Forbes, & Novaco, 2006; Preston & Colman, 2000). There are also many reasons, which are often unclear, for selecting a "neither disagree or agree," "undecided," or "not sure" category as a middle category (Kulas & Stachowski, 2001). Optimizing the response scale is possible by analysis of pilot and trial data using the Rasch Rating Scale Model (Andrich, 1978a; Andrich, 1978b; Andrich, 1978c). For an item, a Category Probability Curve is produced by plotting the responses to each category in the response scale against the ability of the persons. An ideal pattern of responses would show the more capable respondents choosing the most difficult to affirm categories and the less capable respondents choosing the easier to affirm categories. For the seven-category response scales used in the TTF study, some of the provided response options were not used as intended. Consequently, "adjacent response categories were combined as required to achieve satisfactory Category performance" (Jamieson-Proctor et al., 2012, p. 8).
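Where category probability curves show underused or disordered categories, adjacent categories can be collapsed and rescored before reanalysis. The following minimal sketch illustrates one hypothetical recoding; it does not reproduce the actual collapsing used in the TTF study.

```python
import pandas as pd

# Hypothetical recode: categories 1 and 2 did not function as intended,
# so they are collapsed and the remaining categories are renumbered to
# keep successive integer scoring (0, 1, 2, ...).
collapse_map = {0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5}

responses = pd.Series([0, 1, 2, 3, 4, 5, 6])
recoded = responses.map(collapse_map)
print(recoded.tolist())  # [0, 1, 1, 2, 3, 4, 5]
```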
The preceding section on the content aspect of validity described the key activities in the construction of a measure and methods for ensuring these are implemented as intended. The content activities are sequential and iterative but require implementation in conjunction with the other six aspects of validity evidence. With this in mind, the following six sections examine substantive, structural, generalizability, external, consequential, and interpretability evidence of validity.
Evidence of the Substantive Aspect
The substantive aspect of validity can be evidenced by the extent to which the theoretical framework, an a priori theory, or the hypothesis informing an investigation can explain any observed consistencies among item responses. This section examines each approach.
For example, the literature on student engagement suggests that it is characterized by enjoyable experiences in the classroom and a favorable disposition toward the material being learned and toward the classroom environment (Shernoff, 2010). Students describing their favorite class would be expected to have higher engagement scores than those describing a nonfavorite class. We used RUMM2030 to calculate engagement scores for data from the Survey of Student Engagement in Classroom Learning (Cavanagh, 2012). Figure 2 presents the frequency of scores (person locations measured in logits) for students reporting their favorite subjects and those reporting a nonfavorite subject. The scores for the favorite subjects were statistically significantly higher than those for the nonfavorite subjects (i.e., mean score favorite .93 logits and mean score nonfavorite .01 logits, F = 147.7, p < .001).

Figure 2. Frequency distributions of student engagement scores for favorite and nonfavorite subjects (N = 1743).

A similar approach for gathering substantive evidence could be used with TPACK construct models and data. There are likely particular groups of teachers with attributes anticipated to be associated with high TPACK scores. These could be teachers who have completed postgraduate courses in technology integration, teachers who have received substantial professional development in technology integration, teachers who have been recognized for outstanding technology use in their classroom, teachers who have received awards for innovative technology use in the classroom, and/or teachers selected to mentor or train colleagues in technology integration.
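A minimal sketch of this kind of substantive check is shown below: scaled person measures for two groups are compared with a one-way analysis of variance, mirroring the engagement example above. The group labels, sample sizes, and values are simulated and are not TPACK results.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated person measures (in logits) for two hypothetical groups:
# teachers with and without substantial professional development in
# technology integration.
trained = rng.normal(loc=0.9, scale=1.0, size=200)
untrained = rng.normal(loc=0.0, scale=1.0, size=200)

# Substantive evidence would be supported if the a priori expectation
# (trained > untrained) is borne out in the scaled scores.
f_stat, p_value = stats.f_oneway(trained, untrained)
print(f"mean trained = {trained.mean():.2f}, mean untrained = {untrained.mean():.2f}")
print(f"F = {f_stat:.1f}, p = {p_value:.4f}")
```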
Evidence of the Structural Aspect
The structural aspect of validity concerns the construct model and map, for example, by ascertaining if the requirements of a unidimensional measurement model are met when a unidimensional trait is measured. There are both traditional and contemporary methods for collecting evidence about construct structure. The traditional approach is to conduct a Principal Components Factor Analysis of raw scores (dichotomous or polytomous) to examine correlations and covariance between items by identifying factorial structure in the data. Provided there is sufficient data in relation to the numbers of items in the scale under scrutiny, this method is well accepted in TPACK research. Notwithstanding, smaller data sets and large instruments (many items) have required a multiscale approach. Schmidt et al. (2009) developed a 75-item instrument measuring preservice teachers' self-assessments of the seven TPACK dimensions: 8 TK items, 17 CK items, 10 PK items, 8 PCK items, 8 TCK items, 15 TPK items, and 9 TPACK items. However, the sample included only 124 preservice teachers, which precluded a full exploratory factor analysis of data from all 75 items but did allow separate analyses of the seven dimensions. In this study (Schmidt et al., 2009), factor loadings were estimated, "problematic" items were "eliminated," and Cronbach's alpha reliability coefficient was computed for data from the retained items in each scale. This process provided evidence of the internal structure of the seven dimensions but did not confirm a seven-dimension construct model of TPACK. Similarly, the TTF TPACK Survey data were subject to two exploratory factor analyses: one for the 24 TPK and TCK items and one for the 24 TPACK items. We found two-factor solutions in both cases, with the confidence data loaded on one factor and the usefulness data loaded on the second factor (Jamieson-Proctor, Finger, Albion, Cavanagh, Fitzgerald, Bond, & Grimbeek, 2012). The results provide confirmatory evidence of the construct model in Table 2.
Another approach to garnering evidence of dimensionality uses the Rasch Model. The linear Rasch measure is extracted from the data set after the initial Rasch scaling, and then a Principal Components Factor Analysis of the residuals is conducted. The assumption underlying this process is that variance within the data should be mainly attributable to the Rasch measure and that there will be minimal structure and noise in the residual data. Application of this approach to phenomena that are clearly multivariate requires separate Rasch Model analyses for each variable. This was the case with the TTF TPACK Survey data. We used four Rasch Model analyses and took the sound data-to-model fit in the four scales as evidence of the structure within the four-component construct model presented in Table 2 (Jamieson-Proctor et al., 2012).
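The sketch below illustrates the residual-based approach under simplifying assumptions: standardized residuals are computed from dichotomous Rasch expected values, and a principal components analysis of those residuals is inspected for remaining structure. The person and item estimates are simulated; operational analyses would use the rating scale model and purpose-built software.

```python
import numpy as np
from sklearn.decomposition import PCA

def residual_pca(responses, thetas, deltas, n_components=3):
    """Principal components analysis of standardized Rasch residuals.
    Little structure in the residuals supports a unidimensional measure."""
    expected = 1.0 / (1.0 + np.exp(-(thetas[:, None] - deltas[None, :])))
    std_resid = (responses - expected) / np.sqrt(expected * (1.0 - expected))
    pca = PCA(n_components=n_components)
    pca.fit(std_resid)
    return pca.explained_variance_ratio_

# Hypothetical estimates and simulated responses for 300 persons x 12 items.
rng = np.random.default_rng(2)
thetas = rng.normal(size=300)
deltas = np.linspace(-1.5, 1.5, 12)
p = 1.0 / (1.0 + np.exp(-(thetas[:, None] - deltas[None, :])))
responses = (rng.random((300, 12)) < p).astype(float)

print(np.round(residual_pca(responses, thetas, deltas), 3))
```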
Evidence of the Generalizability Aspect
Wolfe and Smith (2007b) explained that "the generalizability aspect of validity addresses the degree to which measures maintain their meaning across measurement contexts" (p. 215). For example, consider an item for which the success rate does not differ between males and females. A lack of this property of an item is referred to as differential item functioning (DIF). Testing for DIF typically proceeds by generating an Item Characteristic Curve and plotting observed scores for class intervals of groups of persons of interest. Figure 3 displays this information for Item 35 ("My test scores are high") from the Survey of Student Engagement in Classroom Learning (Cavanagh, 2012). When the observed responses of boys and girls with the same engagement level are compared, the more highly engaged boys responded more affirmatively than the more highly engaged girls (F = 15.05, p < .001). The item has functioned differently for males and females.

Figure 3. Item characteristic curve for Item 35 (N = 1745).
A similar approach for gathering generalizability evidence could be used with TPACK models and data. Ideally, there should be no difference in scores for a TPACK item between groups of teachers with the same overall score, such as between groups of male and female teachers, city and rural teachers, or experienced and inexperienced teachers. This does not negate the overall instrument discriminating between different groups; it merely avoids bias at the item level.
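A minimal sketch of such a DIF screen is given below: respondents are stratified on their score on the remaining items, and mean item scores are compared between groups within each stratum. This is a simplified descriptive check, not the ANOVA-based DIF test reported for Figure 3, and all variable names and data are hypothetical.

```python
import numpy as np
import pandas as pd

def dif_screen(item_scores, rest_scores, group, n_strata=5):
    """A simplified differential item functioning (DIF) screen: respondents
    are matched on their score on the remaining items, then mean item scores
    are compared between groups within each stratum. Differences near zero
    suggest the item functions similarly for both groups."""
    df = pd.DataFrame({"item": item_scores, "rest": rest_scores, "group": group})
    df["stratum"] = pd.qcut(df["rest"], q=n_strata, labels=False, duplicates="drop")
    by_stratum = df.groupby(["stratum", "group"])["item"].mean().unstack("group")
    by_stratum["difference"] = by_stratum.iloc[:, 0] - by_stratum.iloc[:, 1]
    return by_stratum

# Hypothetical data: a 0-6 item score, a rest score, and a grouping variable.
rng = np.random.default_rng(3)
rest = rng.integers(0, 60, size=400)
group = np.where(rng.random(400) < 0.5, "male", "female")
item = np.clip(np.round(rest / 10 + rng.normal(0, 1, 400)), 0, 6)

print(dif_screen(item, rest, group))
```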
Evidence of the External Aspect
The relation between a measure and an external measure of a similar construct can show the external aspect of validity. For example, the developers of the TTF TPACK Survey acknowledged the importance of using external measures: "As with all self-report instruments, data collected with this instrument should be complemented with other data collection methodologies to overcome the limitations associated with self-report instruments" (Jamieson-Proctor, Finger, Albion, Cavanagh, Fitzgerald, Bond, & Grimbeek, 2012, p. 9). For similar reasons, Harris et al. (2010; 2012) assessed the quality of TPACK through examination of detailed written lesson plans and also semi-structured interviews of teachers. However, the extent to which a second measure can be independent of the first is difficult to establish, particularly when both measures share a common construct model or measure a similar construct.
Evidence of the Consequential Aspect
The consequential aspect of validity centers on judgments about how the score interpretations might be of consequence. When measures are used in high-stakes testing, the consequences for students, teachers, and schools can be significant and sometimes the source of serious concern. Measuring TPACK is unlikely to have such consequences, but applications that compare teachers against one another or against benchmarks for performance management purposes could be seen as less benign. TPACK researchers should consider potential consequences, and such consideration is further evidence for establishing consequential validity.
Evidence of the Interpretability Aspect
The interpretability aspect of validity concerns the qualitative interpretation of a measure in terms of how well its meaning was communicated. Figures and graphical displays can assist the reader in understanding the meaning of an instrument and the properties of its data. The TTF TPACK Survey was developed to test for change in TPACK in Australian preservice teachers who were provided with six months of specialized instruction in technology integration. The results of this testing were presented as graphics such as Figure 4 (Finger et al., 2012, p. 12). This is an item-by-item display of scores from the first survey administration and of scores from the second survey administration for the confidence items. Rasch Model equating procedures have enabled all the scores to be plotted on the same scale. The improvement in scores for all the items is obvious.
Another useful display is an item map that plots the difficulty of items and the ability of persons on the same scale. Figure 5 is the item map for a scale measuring student engagement and classroom learning environment (Cavanagh, 2012, p. 9). The scale is marked in logits from 3.0 to -3.0. The student scores are located on the scale, and each × indicates 10 students. The students with the most affirmative views are located toward the top of the distribution. The location of an item shows the difficulty students experienced in affirming the item. The items located toward the top of the distribution were more difficult to affirm than those below. The items are numbered according to their labeling in the instrument. Item 41 ("I start work as soon as I enter the room") and Item 48 ("Students do not stop others from working") were the most difficult to affirm, whereas Item 7 ("I make an effort") was easy to affirm. The relation between student scores and item difficulty enables predictions to be made about student responses. Students with locations below 1.0 logits are unlikely to affirm Items 41 and 48. Conversely, those with locations above 1.0 logits are likely to affirm these items.

For TPACK measurement, the calibration of items as illustrated in the item map would enable profiling of TPACK for many teachers at different times and in different situations. It would also accurately show changes in TPACK over time for individual teachers. The scaling of person scores and item difficulty scores is essential for constructing an item map; raw scores are not suitable for this purpose.
Figure 4. Confidence to facilitate student use.
Figure 5. Item map for engagement and learning environment items.
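A rough sketch of how such an item map could be produced from scaled estimates is shown below; the person measures, item difficulties, and labels are hypothetical, and dedicated Rasch software generates much richer displays.

```python
import numpy as np

def text_item_map(person_measures, item_difficulties, item_labels,
                  lo=-3.0, hi=3.0, step=0.5, persons_per_x=10):
    """Print a simple item map: persons (each x = persons_per_x respondents)
    on the left and item labels on the right, located on the same logit scale."""
    edges = np.arange(hi, lo - step, -step)
    for upper in edges:
        lower = upper - step
        n_persons = np.sum((person_measures > lower) & (person_measures <= upper))
        items = [lab for lab, d in zip(item_labels, item_difficulties)
                 if lower < d <= upper]
        left = "x" * int(round(n_persons / persons_per_x))
        print(f"{upper:+5.1f} | {left:<20} | {' '.join(items)}")

# Hypothetical person measures and item difficulties (in logits).
rng = np.random.default_rng(4)
persons = rng.normal(0.3, 1.0, size=500)
difficulties = np.array([-1.8, -0.9, -0.2, 0.4, 1.1, 1.9])
labels = [f"Item {i}" for i in range(1, 7)]

text_item_map(persons, difficulties, labels)
```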
A Checklist for Researchers
The preceding sections have examined seven aspects of validity evidence, and, where possible, examples of TPACK measurement were used to illustrate these aspects and situate them within the epistemology and methodology of TPACK research. Table 3 lists the definitions of the seven aspects to provide a tabular recount of the key considerations in mounting an argument for validity. The table could be used as a checklist for TPACK researchers to assess the validity of their research, either a priori when designing TPACK measures or post hoc to evaluate existing TPACK measures.
The use of the checklist requires comment. First, it is more than a simple seven-item list; the content exemplifies contemporary understandings of validity and validity theory. The underlying approach and its major features have been explained in this paper, but this explication has been limited. Users of the table would likely benefit by consulting some of the original sources referenced in the text. Second, statistics such as correlation coefficients or the results of an exploratory factor analysis are often put forward as proof of validity. Statistical evidence is just one aspect of an argument for validity, and an argument relying on only this form of evidence would be weak. Third, the application of the checklist should focus on the availability of evidence rather than simply whether attention has been given to each particular aspect, although this would be a useful starting point. The notion of validity being an argument requires the provision of evidence to convince others, and the checklist is simply a vehicle for stimulating and organizing evidence collection. Fourth, the availability of extensive evidence of all seven aspects is an optimal situation and, in reality, not attainable in many educational studies. This limitation is methodological and mainly centered on the instrument specification process within the content aspect. The use of measurement models that examine the properties of data at the level of individual items and persons can ensure instrument specification complies with the content evidence requirements. Detailed and persuasive evidence is available when Item Response Theory and Rasch Model methods are used.
While the iterative nature of instrument construction might suggest that the sequencing of the seven aspects could be varied, there are some strong reasons for commencing with the content aspect. The rationale for this view derives from a scientific approach to educational research, including TPACK research, that is very consistent with Messick's (1995) view of validity. In both, primacy is given to substantive theory informing decisions about instrumentation. The research is driven by theory rather than theory being generated from existing data; in terms of validity, specification of the construct model, particularly the construct map, precedes selection of data collection methods and analyses. When the checklist is used post hoc, this matter is more important for principled rather than pragmatic reasons. However, when using the checklist a priori at the commencement of a study, substantive theory and the findings of previous research require clarification before progressing to methodological decisions. In this situation, the order of the seven aspects is important.
e nal consideration in the use of the checklist is that it is neither
exhaustive nor the only way to conceptualize an argument for validity. For
example, in the hard sciences, where causal relations exist between variables,
the dominant form of validity is predictive validity. Notwithstanding, we
believe that an argument is required, and this needs to reect all aspects of
an instrument development process or of an empirical investigation.
Table 3. A Checklist of Validity Evidence
1. Content: The relevance and representativeness of the content upon which the items are based and the technical quality of those items.
   Purpose: domain of inference; types of inferences; potential constraints and limitations.
   Instrument specification: construct selection; construct model; construct map; item development; scoring model; scaling model; item technical quality.
2. Substantive: The degree to which theoretical rationales relating to both item content and processing models adequately explain the observed consistencies among item responses.
3. Structural: The fidelity of the scoring structure to the structure of the construct domain.
4. Generalizability: The degree to which score properties and interpretations generalize to and across population groups, settings, and tasks, as well as the generalization of criterion relationships.
5. External: What has traditionally been termed convergent and discriminant validity; also concerns criterion relevance and the applied utility of the measures.
6. Consequential: The value implications of score interpretation as a basis for action.
7. Interpretability: The degree to which qualitative meaning can be assigned to quantitative measures.
(Wolfe & Smith, 2007a, p. 99)

Conclusion
One purpose of this paper was to stimulate discussion about the validity of TPACK measures and measurement. A second purpose was to use contemporary validity theory as a framework to examine the principles and practices applied when dealing with validity issues in TPACK measurement. The analysis suggests several types of validity evidence that are not characteristic of current TPACK measurement activities, and that identification of these factors could provide the impetus for improvement of TPACK measurement. In particular, the content and substantive aspects of validity evidence are especially challenging.
TPACK theory is still in its infancy, as is the measurement of TPACK. It is timely to consider concerns such as validity from the perspective of mainstream epistemologies and methodologies. Maturation of TPACK research and measurement requires nurture and sustenance from well-established fields of research and methodologies.
Acknowledgment
The authors gratefully acknowledge the assistance of Joshua Rosenberg with the preparation of this manuscript.
Author Note
Robert F. Cavanagh is a professor in the School of Education at Curtin University, Perth, Australia. His research interests focus on the measurement of student, teacher, and classroom attributes conducive to improved learning and instruction. Please address correspondence regarding this article to Rob Cavanagh, School of Education, Curtin University, Kent St., Bentley 6102, Australia. Email: r.cavanagh@curtin.edu.au.

Matthew J. Koehler is a professor in the College of Education at Michigan State University, East Lansing. His research interests focus on the design and assessment of innovative learning environments and the knowledge that teachers need to teach with technology.
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Andrich, D. (1978a). Application of a psychometric rating model to ordered categories which
are scored with successive integers. Applied Psychological Measurement, 2(4), 581–594.
doi:10.1177/014662167800200413
Andrich, D. (1978b). Rating formulation for ordered response categories. Psychometrika,
43(4), 561–573. doi:10.1007/BF02293814
Andrich, D. (1978c). Scaling attitude items constructed and scored in the Likert tradition. Educational and Psychological Measurement, 38(3), 665–680. doi:10.1177/001316447803800308
Angeli, C., & Valanides, N. (2009). Epistemological and methodological issues for the
conceptualization, development, and assessment of ICT-TPCK: Advances in technological
pedagogical content knowledge (TPCK). Computers and Education, 52(1), 154–168.
doi:10.1016/j.compedu.2008.07.006
Archambault, L. M., & Barnett, J. H. (2010). Revisiting technological pedagogical content
knowledge: Exploring the TPACK framework. Computers and Education, 55(4), 1656–1662.
doi:10.1016/j.compedu.2010.07.009
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the
human sciences (2nd ed.). Mahwah, NJ: Erlbaum.
Cavanagh, R. F. (2011a). Establishing the validity of rating scale instrumentation in learning
environment investigations. In R. F. Cavanagh, & R. F. Waugh (Eds.), Applications of Rasch
measurement in learning environments research (pp. 77–100). Rotterdam: Sense Publishers.
Cavanagh, R. F. (2011b). Confirming the conceptual content and structure of a
curriculum framework: A Rasch Rating Scale Model approach. Curriculum Perspectives,
31(1), 42–51.
Cavanagh, R. F. (2012). Associations between the classroom learning environment and student
engagement in learning: A Rasch model approach. Paper presented at the meeting of the
Australian Association for Research in Education: Sydney, Australia.
Doering, A., Scharber, C., Miller, C., & Veletsianos, G. (2009). GeoThentic: Designing and assessing with technology, pedagogy, and content knowledge. Contemporary Issues in Technology and Teacher Education, 9(3), 316–336. Retrieved from http://www.citejournal.org/vol9/iss3/socialstudies/article1.cfm
Doig, B., & Groves, S. (2006). Easier analysis and better reporting: Modeling ordinal data in
mathematics education research. Mathematics Education Review Journal, 18(2), 56–76.
doi:10.1007/BF03217436
Finger, G., Jamieson-Proctor, R., Cavanagh, R., Albion, P., Grimbeek, P., Bond, T., Fitzgerald, R., Romeo, G., & Lloyd, M. (2012). Teaching teachers for the future (TTF) project TPACK survey: Summary of the key findings. Paper presented at ACEC2012: ITs Time Conference, Perth, Australia. Available at: http://bit.ly/ACEC2012_Proceedings
Graham, C. R. (2011). Theoretical considerations for understanding technological pedagogical content knowledge (TPACK). Computers & Education, 57(3), 1953–1960. Retrieved from http://www.sciencedirect.com/science/article/pii/S0360131511000911
Graham, C., Cox, S., & Velasquez, A. (2009). Teaching and measuring TPACK development in two preservice teacher preparation programs. In I. Gibson et al. (Eds.), Proceedings of Society for Information Technology & Teacher Education International Conference 2009 (pp. 4081–4086). Chesapeake, VA: AACE. Retrieved August 19, 2013, from http://www.editlib.org/p/31297
Guzey, S. S., & Roehrig, G. H. (2009). Teaching science with technology: Case studies of
science teachers’ development of Technological Pedagogical Content Knowledge (TPCK).
Contemporary Issues in Technology and Teacher Education, 9(1), 25–45. AACE. Retrieved
August 18, 2013 from http://www.editlib.org/p/29293
Harris, J., Grandgenett, N., & Hofer, M. (2010). Testing a TPACK-Based Technology Integration
Assessment Rubric. In D. Gibson & B. Dodge (Eds.), Proceedings of Society for Information
Technology & Teacher Education International Conference 2010 (pp. 3833–3840). Chesapeake,
VA: AACE. Retrieved August 18, 2013, from http://www.editlib.org/p/33978
Harris, J., Grandgenett, N., & Hofer, M. (2012). Using structured interviews to assess
experienced teachers’ TPACK. In P. Resta (Ed.), Proceedings of Society for Information
Technology & Teacher Education International Conference 2012 (pp. 4696–4703).
Chesapeake, VA: AACE. Retrieved from http://www.editlib.org/p/40351
Hawthorne, G., Mouthaan, J., Forbes, D., & Novaco, R. W. (2006). Response categories and anger
measurement: Do fewer categories result in poorer measurement? Development of the DAR5.
Social Psychiatry Psychiatric Epidemiology, 41(2), 164–172. doi:10.1007/s00127-005-0986-y
Jamieson-Proctor, R., Finger, G., Albion, P., Cavanagh, R., Fitzgerald, R., Bond, T., &
Grimbeek, P. (2012). Teaching Teachers for the Future (TTF) project: Development of the TTF
TPACK survey instrument. Paper presented at ACEC2012: ITs Time Conference, Perth,
Australia. Available at: http://bit.ly/ACEC2012_Proceedings
Koehler, M. J., & Mishra, P. (2008). Introducing TPCK. In AACTE Committee on Technology
and Innovation (Ed.), Handbook of technological pedagogical content knowledge (TPCK) for
educators (pp. 3–29). London: Routledge.
Koehler, M. J., Shin, T. S., & Mishra, P. (2011). How do we measure TPACK? Let me count
the ways. In R. N. Ronau, C. R. Rakes, & M. L. Niess (Eds.), Educational technology, teacher
knowledge, and classroom impact: A research handbook on frameworks and approaches (pp.
16–31). Hershey, PA: Information Science Reference.
Kulas, J. T., & Stachowski, A. A. (2001). Respondent rationale for neither agreeing nor disagreeing: Person and item contributors to middle category endorsement intent on Likert personality indicators. Journal of Research in Personality, 47, 254–262. doi:10.1016/j.jrp.2013.01.014
Linacre, J. M. (2009). Winsteps (Version 3.68) [Computer software]. Beaverton, OR: Winsteps.com.
Medical Outcomes Trust Scientific Advisory Committee. (1995). Instrument review criteria. Medical Outcomes Trust Bulletin, 3, 1–4.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749. doi:10.1037/0003-066X.50.9.741
Messick, S. (1998). Test validity: A matter of consequences. Social Indicators Research, 45(4),
35–44. doi:10.1023/A:1006964925094
Mishra, P., & Koehler, M. J. (2006). Technological pedagogical content knowledge: A
framework for teacher knowledge. Teachers College Record, 108(6), 1017–1054. doi:10.1111/
j.1467-9620.2006.00684.x
Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating
scales: Reliability, validity, discriminating power, and respondent preferences. Acta
Psychologica, 104(1), 1–15. doi:10.1016/S0001-6918(99)00050-5
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago:
MESA Press.
Roblyer, M. D., & Doering, A. H. (2010). Integrating educational technology into teaching (5th
ed.). Boston, MA: Allyn & Bacon.
RUMMLab. (2007). RUMM2020 Rasch Unidimensional Measurement Models. RUMM
Laboratory Pty Ltd.
Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15(2), 4–14. doi:10.3102/0013189X015002004
Shulman, L. S. (1987). Knowledge and teaching: Foundations of the new reform. Harvard
Educational Review, 57(1), 1–22. Retrieved from http://hepg.org/her/abstract/461
Schmidt, D. A., Baran, E., Thompson, A. D., Mishra, P., Koehler, M. J., & Shin, T. S. (2009). Technological pedagogical content knowledge (TPACK): The development and validation of an assessment instrument for preservice teachers. Journal of Research on Technology in Education, 42(2), 123–149.
Sherno, D. J. (2010). e experience of student engagement in high school classrooms.
Saarbrucken, Germany: Lambert Academic Publishing.
Wiggins, G., & McTighe, J. (1998). Understanding by design. Alexandria, VA: Association for Supervision and Curriculum Development.
Wiggins, G., & McTighe, J. (2005). Understanding by design (2nd ed.). Alexandria, VA: Association for Supervision and Curriculum Development.
Wilson, M. (2010). Constructing measures: An item response modeling approach. New York: Routledge.
Wolfe, E. W., & Smith, E. V. (2007a). Instrument development tools and activities for measure validation using Rasch models: Part I–Instrument development tools. Journal of Applied Measurement, 8(1), 97–123. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/17215568
Wolfe, E. W., & Smith, E. V. (2007b). Instrument development tools and activities for measure validation using Rasch models: Part II–Validation activities. Journal of Applied Measurement, 8(2), 204–234. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/17440262
Young, A., & Cavanagh, R. F. (2011). An investigation of differential need for psychological services across learning environments. In R. F. Cavanagh & R. F. Waugh (Eds.), Applications of Rasch measurement in learning environments research (pp. 227–244). Rotterdam: Sense Publishers. ISBN 978-94-6091-491-1.
Manuscript received July 12, 2013 | Initial decision July 30, 2013 | Revised manuscript accepted August 29, 2013
... Developed in the context of ongoing debates about the need to redesign teacher education programs to better facilitate teachers' knowledge development, many hundreds of instruments have been published to measure teachers' TPACK (Koehler et al., 2012). Noting the growing number of tools being developed to collect evidence of teachers' TPACK, Cavanagh and Koehler (2013) highlighted the importance of systematic approaches to validation. ...
... In a paper focused on validating TPACK measurements, Cavanagh and Koehler (2013) noted the opportunity afforded by person-item maps constructed from Rasch modeling to profile the TPACK development of teachers. Wilson (2003) describes how the probabilistic relationship between a person's ability on a Rasch measurement scale and an item's difficulty can be used to provide qualitative descriptions of a person's ability at a given point on the scale. ...
... TPACK Confidence Construct Map. Reproduced from Saubern et al. (2020). To find the ability of an individual teacher, and therefore their location on the scale and construct map, it is necessary to derive a scale score corresponding to each raw score from the survey, because raw scores alone are not suitable for use on Rasch measurement scales (Cavanagh & Koehler, 2013). In the case of the Rasch scale reported in Saubern et al. (2020), the location of each teacher can be calculated using a two-step process. ...
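To make the two-step idea concrete, the short Python sketch below shows one way such a conversion can be carried out for a simplified dichotomous Rasch model: the item responses are summed to a raw score, and the scale location in logits is then found as the ability whose model-expected score equals that raw score. The item difficulties and responses are hypothetical placeholders, not values from Saubern et al. (2020), and the polytomous rating-scale items used in actual TPACK surveys would require the corresponding rating scale model.

import math

def raw_to_measure(raw_score, difficulties, tol=1e-6):
    # Step 2: find the logit location (theta) whose model-expected raw score
    # matches the observed raw score, via Newton-Raphson iteration.
    # Extreme scores (0 or the maximum) have no finite estimate and are
    # handled separately in practice.
    theta = 0.0
    for _ in range(100):
        probs = [1.0 / (1.0 + math.exp(-(theta - d))) for d in difficulties]
        expected = sum(probs)                         # expected raw score at theta
        variance = sum(p * (1.0 - p) for p in probs)  # model variance of the raw score
        step = (raw_score - expected) / variance
        theta += step
        if abs(step) < tol:
            break
    return theta

# Step 1: sum the item responses to obtain the raw score, then convert it to a measure.
difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]  # hypothetical item difficulties in logits
responses = [1, 1, 1, 0, 0]
raw_score = sum(responses)
print(round(raw_to_measure(raw_score, difficulties), 2))

Packages such as Winsteps or RUMM2020 report these person locations directly as part of their standard output; the sketch is only intended to show where the logit values plotted on a person-item or construct map come from.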
Book
Full-text available
This book is a collection of 11 papers that were presented at SITE’s 2024 annual conference in Las Vegas, NV. It also includes a Foreword from SITE President Jake Cohen and an Introduction from the Editors, Drs. Todd Cherner and Rebecca Blankenship.
... Researchers have criticised the TPACK framework for effective technology integration as a means to reimagine new modes of learning (Cavanagh & Koehler, 2013; Pamuk, 2012; Swallow & Olofson, 2017). TPACK has a pedagogical component, but there is confusion and contradiction regarding its use, highlighting its complexity for the effective adoption of technology (Cavanagh & Koehler, 2013). According to Pamuk (2012), educators experience difficulty in explaining interactions among the knowledge bases of the TPACK framework. ...
... Additionally, Cavanagh and Koehler (2013) state that there are concerns about potential constraints on researchers' use of the TPACK framework in terms of logistics, resource issues, or methodological exigencies, and they call for improvement of TPACK measurements. Due to the complexities of TPACK and the lack of focus on understanding how it could facilitate technology adoption for effective teaching and learning, Saubern et al. (2020) opened up discussion of the need to reboot applications of TPACK. ...
Article
Higher education systems are under increasing pressure to embrace technology-enhanced learning as a meaningful step towards the digital transformation of education. Digital technologies in education promise optimal teaching and learning, but at the same time, they put a strain on education systems to adapt pedagogical strategies. Classical pedagogical frameworks such as Dewey, Piaget, and Vygotsky’s theories focused on student agency and are not specific to contemporary education with ubiquitous digital technologies. Hence, there is a need for a novel and innovative pedagogical framework that aligns with these emerging and advanced digital technologies. However, recent guidelines to incorporate emerging digital technologies in education have largely focused on ethical dimensions and assessment practices. The lack of an overarching pedagogical framework for teaching and learning practices in the digital era is a threat to quality education. The current study proposes a digital pedagogy for sustainable educational transformation (DP4SET) framework applicable to the new modes of teaching and learning powered by digital technologies. The DP4SET framework comprises four components that advocate for digital competence for accessing deep learning, evidence-based practice with quality digital resources, learning environments with applicable digital technology, and synergy between human teachers and trustworthy artificial intelligence (AI). A real-world application of the DP4SET framework in Chinese contexts proves that it promotes the effective use of technology and significantly reshapes teaching and learning in and beyond the classroom. The proposed digital pedagogy framework provides a foundation for modern education systems to accommodate advanced digital technologies for sustainable digital transformation of education.
... However, the TPACK framework is frequently employed in a broad manner, resulting in a lack of domain-specific differentiations or elaborations, especially when it comes to using self-report measures, as also emphasized by other researchers [8,18,22]. Furthermore, self-report measures hold the risk of a so-called jingle-jangle fallacy due to the lack of extrinsic validity of the TPACK-construct [1,22], highlighting the potential discrepancy between self-reported TPCK and actual knowledge of technology-enhanced teaching among teachers [6,17]. Because of these problems with the TPACK framework, within physics education, researchers recently focused on a more nuanced and detailed examination of the specific knowledge associated with the use of ET which they called digital-media PCK [11,33]. ...
Article
Full-text available
Educational technology (ET) is playing an increasingly important role in classrooms and has the potential to support student learning. However, teachers need to implement ET in a purposeful way. A necessary –but insufficient –condition for meaningful implementation of ET is the intention to use it in the first place. This study aims to investigate the predictors of pre-service teachers’ intention to use ET, specifically focusing on how technological pedagogical content knowledge (TPCK) interacts with the constructs of the Technology Acceptance Model (TAM). While previous research has frequently employed self-reported TPCK to explore these relationships, our study uses a test-based measure to provide a more objective assessment. We also aim to understand how these relationships evolve over time, particularly during a technology integration seminar in teacher education. Using path analysis including N = 146 preservice teachers, we examined the relationships between test-based TPCK, self-efficacy, and the TAM variables (perceived usefulness, perceived ease of use, and intention to use ET). Our findings indicate that self-efficacy is a strong predictor of perceived usefulness, perceived ease of integration, and intention to use ET, whereas TPCK primarily influences perceived usefulness and indirectly affects intention. Furthermore, we observed that the roles of TPCK and self-efficacy shift over time. This study contributes to a deeper understanding of how objective measures of professional knowledge can reshape interpretations of TAM studies and guide the design of teacher preparation programs.
... Technology can be instrumental in making teachers more efficient. These skills are needed by teachers to be successful in their work (Cavanagh & Koehler, 2013). ...
Article
This study aimed to determine which domain of pedagogical content knowledge best influences teachers' technology proficiency. It used a non-experimental quantitative design with a descriptive technique, involving teachers in Sarangani District, Davao Occidental Division, Philippines, and was conducted in the second semester of school year 2020-2021. Research instruments on pedagogical content knowledge and teachers' technology proficiency were used as the data sources. Using the mean, Pearson r, and regression as statistical tools, the study showed the following results: the level of pedagogical content knowledge is high, the level of teachers' technology proficiency is high, there is a significant relationship between pedagogical content knowledge and teachers' technology proficiency, and the domain of pedagogical content knowledge that best influences teachers' technology proficiency is pedagogical content knowledge.
... The integration of technology in learning environments has the potential to cultivate a more active, creative, and innovative generation (Kara, 2023;Stefan et al., 2020). Technological Pedagogical Content Knowledge (TPACK), formerly known as TPCK, is knowledge about the appropriate use of technology and pedagogy in various subjects to facilitate learners' understanding and assist teachers in thinking creatively (Cavanagh & Koehler, 2013;Schmid et al., 2021). TPACK is the process of advancing PCK with the addition of technological elements. ...
Article
Full-text available
Technological Pedagogical and Content Knowledge (TPACK) is very important in developing professional teacher competencies in the era of Industrial Revolution 4.0. The purpose of this study was to determine the TPACK profile of biology teachers on the topic of the classification of living things. This research is a descriptive study using a qualitative approach. Data were collected using questionnaires, CoRe & TPaP-eRs, and interviews. The sample comprised ten respondents selected through saturated sampling. The results indicated that biology teachers' TPACK skills must be improved. The questionnaire results across the seven TPACK components averaged 70.11%, which was categorized as good. The TK, PK, CK, and PCK components were in the excellent category, while the TCK, TPK, and TPACK components were still in the good category. The results of the CoRe & TPaP-eRs instrument, covering five aspects, showed that teachers' average TPACK reached 41%, in the Growing-TPACK category. The TPACK of biology teachers did not differ much by teaching experience, indicating that length of teaching experience is not directly proportional to the growth of TPACK skills.
... Thus, there is no standardized way of measuring the constructs in question. Different results are therefore not too unexpected and show the need to find a reliable and valid way to measure them (see also Cavanagh and Koehler, 2013). ...
Article
Full-text available
In an increasingly digitalized world, pre-service and in-service teachers need subject-specific didactic competencies to be able to plan their lessons appropriately and use their knowledge to promote digital competencies among students. Building on competency models such as the Technological pedagogical content knowledge (TPACK) framework, this article explores the extent to which specific digital competencies relevant to pre-service teachers can be developed through project work in a pedagogical makerspace and examines the extent to which contextual factors such as technological self-efficacy, motivation and technology acceptance influence the development of pre-service teachers’ TPACK and their intention to use digital media. To this end, 495 pre-service science teachers from both intervention and control groups completed a pre-post digital questionnaire before and after the intervention. The data were used for structural equation modeling. The results show that the level of TPACK before the intervention is an important predictor of TPACK after project work. Furthermore, TPACK before the intervention positively influences pre-service teachers’ intention to use digital media in the future. Also, the perceived usefulness for professional use and the intention to use information and communication technologies (ICT) are strongly influenced by TPACK. Consequently, it appears significant to enable a low-threshold entry point at the beginning of the study to provide a solid foundation upon which more advanced TPACK can be built. Motivation and technology acceptance are strongly correlated. Therefore, teacher training should focus on motivation and acceptance of technology.
... Furthermore, self-report measures hold the risk of a so-called jingle-jangle fallacy due to the lack of extrinsic validity of the TPACK-construct (Backfisch, Schneider, Lachner, Scheiter, & Scherer, 2020). The jingle-fallacy highlights the potential discrepancy between self-reported TPCK and actual knowledge of technology-enhanced teaching among teachers (Cavanagh & Koehler, 2013;Kopcha, Ottenbreit-Leftwich, Jung, & Baser, 2014). In this article, we want to focus on a nuanced and detailed examination of the specific knowledge associated with the use of digital media, particular within the realm of physics education. ...
Article
Full-text available
In today's digital age, incorporating technology in teaching has become increasingly important. To prepare future teachers, understanding the factors that influence the development of teachers' corresponding knowledge base is crucial. However, not only is there a small number of studies investigating the development of pre-service teachers' professional knowledge, but these studies also predominantly rely on cross-sectional and/or self-reported data. This pre-post study investigated the factors associated with the development of digital media Pedagogical Content Knowledge (PCK) in physics teacher education through a seminar-based approach with N = 66 pre-service teachers (PSTs). Investigated factors include PSTs' PCK about students' conceptions (PCK-SC), general motivation to use digital media, interest in digital media, and previous experience with digital media. The findings suggest that a strong foundation in various aspects of PCK is essential for the development of digital media PCK, as PSTs' PCK-SC has proven to be a positive predictor for the development of PTSs' digital media PCK. Surprisingly, the study found that higher motivation to use digital media negatively affects the development of digital media PCK, suggesting that excessive motivation may hinder the acquisition of digital media PCK. Additionally, the study found that PSTs' interest in digital media positively influenced the development of digital media PCK, while previous experience with digital media had a negative impact on its development. The study suggests that future research should focus on investigating the general professional knowledge of PSTs and identifying the proficiency levels of PCK needed for the development of digital media PCK. It also emphasizes the importance of considering both, cognitive and affective aspects of teacher preparation, as seminars that solely focus on motivation may not be adequate in fostering effective technology integration. The study recommends that seminars on digital media integration should be situated at later stages of teacher education or after introductory courses that cover other facets of PCK. Furthermore, integrated seminars addressing multiple facets of PCK alongside digital media PCK could be explored in future research. Overall, this study provides insights into the factors influencing PSTs' development of digital media PCK and suggests avenues for further research to improve the integration of digital media in teaching practices for enhanced student learning outcomes.
... Methods that can be used to measure TPACK are teacher knowledge surveys and assessment of learning planning and implementation documents (Abbitt, 2011;Koehler et al., 2011). Both methods can increase the validity associated with TPACK studies (Abbitt, 2011;Cavanagh & Koehler, 2013;Graham, 2011). ...
Article
Full-text available
The low level of TPACK knowledge among senior-semester students is the main reason for conducting this research, even though this knowledge supports their preparation to become teachers. When the profile is known, the researcher as well as the study program lecturers can prepare for and improve students' TPACK. This is descriptive research with a quantitative approach. The purpose of this study was to analyze the TPACK profile of prospective chemistry teachers in the microteaching class. The research subjects were students of the chemistry education study program at the University of Bengkulu who took the course. The instrument used was a TPACK questionnaire covering the TK, PK, CK, TPK, TCK, PCK, and TPACK aspects and using a Likert scale. The results for each aspect were then grouped into five categories, which are useful for grouping students. The results showed that students of the chemistry education study program who took the microteaching course demonstrated TPACK in the good category. Almost all aspects had an average score in the good category; only one aspect had a sufficient average score.
... To answer the research question: The theoretical constructs of teacher knowledge and their interrelations as described by the TPACK model are valid for the adaptation of the model for constructions of space and social media. Validity here refers to "structural validity" as summarized by Cavanagh and Koehler [60] for the TPACK model and excludes other criteria of validity. ...
... On the other hand, experiences with the use of this model in different investigations related to mathematics education have allowed the definition of various units of analysis for each of the domains and subdomains of the model (e.g., Arévalo, García and Hernández, 2019;Cavanagh and Koehler, 2013;Kirikçilar and Yildiz, 2018;Önal, 2016;Schmidt, Baran, Thompson, Mishra, Koehler and Shin, 2009). According to Lee, Chung, and Wei (2022), "TPACK's core themes from the highly cited articles have been surrounding PCK, teacher education, skill, and pedagogy" (p. ...
Article
Full-text available
Background: One of the current concerns in the face of the changes and challenges imposed by the COVID-19 pandemic is the quality of the education received in non-traditional environments such as virtual or hybrid teaching. Elements associated with this problem include the knowledge and skills that mathematics teachers use to work in these environments and to integrate them into mathematics education. Objectives: This investigation aimed to characterise the levels of technological competence self-perceived by mathematics teaching staff when planning and executing a virtual class. Design: A qualitative framework was used in an exploratory-descriptive approach. Setting and participants: This study is part of doctoral research in which we sought to identify the knowledge demonstrated by three mathematics teachers when incorporating technology into a virtual class with 24 students. The TPACK model (domains and subdomains linked to technology) was used to achieve this. Data collection and analysis: The data were collected through an open-ended interview linked to the video recording of the class, and the analysis used was content analysis. Results: The main conclusion was that the teachers perceived the levels of their technological competence to be very high when implementing an experimental virtual class. Conclusions: It is suggested that their continuous professional development and, especially, having worked together in a team for several years is a possible factor that makes them feel more able to integrate technology in mathematics education.
Article
Full-text available
Although there is ever-increasing emphasis on integrating technology in teaching, there are few well-tested and refined assessments to measure the quality of this integration. The few measures that are available tend to favor constructivist approaches to teaching, and thus do not accurately assess the quality of technology integration across a range of different teaching approaches. We have developed a more "pedagogically inclusive" instrument that reflects key TPACK concepts and that has proven to be both reliable and valid in two successive rounds of testing. The instrument's interrater reliability coefficient (.857) was computed using both Intraclass Correlation and a score agreement (84.1%) procedure. Internal consistency (using Cronbach's Alpha) was .911. Test-retest reliability (score agreement) was 87.0%. Five TPACK experts also confirmed the instrument's construct and face validities. We offer this new rubric to help teacher educators to more accurately assess the quality of technology integration in lesson plans, and suggest exploring its use in project and unit plans.
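As a rough illustration of the kinds of reliability evidence reported for such rubrics (not the authors' actual data or analysis), the short Python sketch below computes a simple score-agreement rate between two raters and Cronbach's alpha across rubric items, using invented ratings.

def percent_agreement(rater_a, rater_b):
    # Share of lesson plans on which the two raters assign identical rubric scores.
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

def cronbach_alpha(item_scores):
    # Internal consistency across rubric items;
    # item_scores[i][j] is person j's score on rubric item i.
    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
    k = len(item_scores)
    n_people = len(item_scores[0])
    sum_item_variances = sum(variance(item) for item in item_scores)
    totals = [sum(item[j] for item in item_scores) for j in range(n_people)]
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))

# Invented ratings for six lesson plans scored by two raters on one rubric item.
rater_a = [3, 4, 2, 4, 3, 1]
rater_b = [3, 4, 2, 3, 3, 1]
print(round(percent_agreement(rater_a, rater_b), 2))  # agreement on 5 of 6 plans

# Invented scores on three rubric items for four lesson plans.
items = [[3, 4, 2, 4], [3, 3, 2, 4], [4, 4, 1, 3]]
print(round(cronbach_alpha(items), 2))

An intraclass correlation, which the rubric's developers also report, would additionally take the magnitude of rating differences into account rather than only exact matches.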
Chapter
Rating scale instruments have been widely used in learning environment research for many decades. Arguments for their sustained use require provision of evidence commensurate with contemporary validity theory. The multiple-type conception of validity (e.g., content, criterion, and construct) that persisted until the 1980s was subsumed into a unified view by Messick, who re-conceptualised types of validity as aspects of evidence for an overall judgment about construct validity. A validity argument relies on multiple forms of evidence, for example the content, substantive, structural, generalisability, external, and consequential aspects of validity evidence. The theoretical framework for the current study comprised these aspects of validity evidence with the addition of interpretability. The utility of this framework as a tool for examining validity issues in rating scale development and application was tested. An investigation into student engagement in classroom learning was used to identify and assess aspects of validity evidence. The engagement investigation utilised a researcher-completed rating scale instrument comprising eleven items and a six-point scoring model. The Rasch Rating Scale model was used for scaling of data from 195 Western Australian secondary school students. Examples of most aspects of validity evidence were found, particularly in the statistical estimations and graphical displays generated by the Rasch model analysis. These are explained in relation to the unified theory of validity. The study is significant in that it exemplifies contemporary validity theory in conjunction with modern measurement theory, and it will be of interest to learning environment researchers using or considering using rating scale instruments.
Chapter
The role of school psychologists in Western Australia has been reviewed a number of times since the establishment of services to schools. Current practice, whereby school psychologists are allocated to a range of schools, continues to rely on student population figures, the school's socioeconomic index, and an appraisal of the school's 'difficulty' level. Psychological services are then allocated accordingly, with the decision-making mechanism based on an ad hoc conception of school need. The research reported in this paper concentrates on establishing what aspects or characteristics of learning environments constitute a greater or lesser level of need for services, and then attempts to measure this need in an objective, evidence-based manner. The various elements of school need for psychological services are posited to cluster around constructs extrapolated from the domains of service reported in the international professional literature. These are characteristics of students, characteristics of schools, and teacher expertise. The three constructs constitute the preliminary conceptual framework for the study upon which the empirical investigation was based. The study was conducted in three phases: first, item development and theoretical framework refinement utilising data collected from a questionnaire; second, development of a pool of appropriate items, piloting, and trialling; and third, utilising the refined linear scale to measure a sample of schools' need for psychological services. Data were obtained from samples of principals, teachers, and school psychologists working in two Department of Education and Training (DET) school districts. Data analysis employed the Rasch Rating Scale Model and analyses of variance. Data fitting the model confirmed that a unidimensional trait was measured. Data-to-model fit was estimated by item difficulty thresholds, individual item fit statistics, the person Separation Index, and principal components factor loadings of residuals. The results demonstrated that the linear scale instrument developed in the research provided an authentic measure of school need and that the measures of the phase three schools differed significantly from each other. The empirical findings of the study are discussed in the context of their application in informing decisions about the level of psychological services that should be provided to schools, congruent with the psychological needs of their students.
Article
In this chapter we reviewed a wide range of approaches to measure Technological Pedagogical Content Knowledge (TPACK). We identified recent empirical studies that utilized TPACK assessments and determined whether they should be included in our analysis using a set of criteria. We then conducted a study-level analysis focusing on empirical studies that met our initial search criteria. In addition, we conducted a measurement-level analysis focusing on individual measures. Based on our measurement-level analysis, we categorized a total of 141 instruments into five types (i.e., self-report measures, open-ended questionnaires, performance assessments, interviews, and observations) and investigated how each measure addressed the issues of validity and reliability. We concluded our review by discussing limitations and implications of our study.
Book
This study investigates the experience of student engagement in high school classrooms: both the influences on engagement and the short-term and long-term educational outcomes resulting from it. The experiences of 526 tenth- and twelfth-grade students enrolled in 13 U.S. high schools during the 1990s were examined. Data were gathered using the Experience Sampling Method (ESM) and student interviews. Students reported higher engagement during individual and group work than while listening to the teacher lecture, watching a video, or taking a test, and also higher engagement during their non-academic classes such as art, computer science, and vocational education than in their traditional academic classes. Engagement also predicted long-term continuing motivation and college performance. Overall, results suggested that activities and classrooms that combine academic intensity with features that provoke a positive emotional response are more likely to engage students in both the short term and the long term.
Book
Constructing Measures introduces a way to understand the advantages and disadvantages of measurement instruments, how to use such instruments, and how to apply these methods to develop new instruments or adapt old ones. The book is organized around the steps taken while constructing an instrument. It opens with a summary of the constructive steps involved. Each step is then expanded on in the next four chapters. These chapters develop the "building blocks" that make up an instrument: the construct map, the design plan for the items, the outcome space, and the statistical measurement model. The next three chapters focus on quality control. They rely heavily on the calibrated construct map and review how to check if scores are operating consistently and how to evaluate the reliability and validity evidence. The book introduces a variety of item formats, including multiple-choice, open-ended, and performance items; projects; portfolios; Likert and Guttman items; behavioral observations; and interview protocols. Each chapter includes an overview of the key concepts, related resources for further investigation, and exercises and activities. Some chapters feature appendices that describe parts of the instrument development process in more detail, numerical manipulations used in the text, and/or data results. A variety of examples from the behavioral and social sciences and education, including achievement and performance testing, attitude measures, health measures, and general sociological scales, demonstrates the application of the material. An accompanying CD features control files, output, and a data set to allow readers to compute the text's exercises and create new analyses and case archives based on the book's examples so the reader can work through the entire development of an instrument. Constructing Measures is an ideal text or supplement in courses on item, test, or instrument development, measurement, item response theory, or Rasch analysis taught in a variety of departments including education and psychology. The book also appeals to those who develop instruments, including industrial/organizational, educational, and school psychologists, health outcomes researchers, program evaluators, and sociological measurers. Knowledge of basic descriptive statistics and elementary regression is recommended.