Mapping out guidelines for the development and use of sign language assessment: Some critical issues, comments and suggestions



The purpose of this chapter is two-fold: to illuminate the importance of assessment in the context of language acquisition of deaf individuals, specifically children, and to provide a contextual foundation for some of the studies discussed in this book. Furthermore, with the prevailing lack of sign language assessment tests in most countries, we offer a set of guidelines for the development/adaptation of such tests for less documented sign languages.
1. Introduction
Throughout our lifespan, we are subjected to a wide variety of assessments,
ranging from developmental screening procedures or diagnostic tests during infancy to
academic achievement and/or placement tests at school and in college, as well as a
number of non-academic assessments, including driving exams, job interviews, and/or
self-assessments. Many of these procedures represent integral parts of our social life and
culture (e.g., Bartram, 1990; Fulcher & Davidson, 2007; McNamara, 2000).
One of the main areas of interest to researchers, psychologists, clinicians and
others in the fields of language assessment and cognitive development is the
measurement of language skills. Like the wide range of assessments mentioned above,
which are used on a day-to-day basis, language tests can serve an equally wide range of
purposes. Examples include tests that form part of admission or job application procedures
in English-speaking countries, such as the Test of English as a Foreign Language (TOEFL; e.g.,
Davies et al., 1999), and tests that monitor children’s language development (Johnston,
2007). Most of these and comparable assessments are readily available for a variety
of spoken languages. In comparison, far fewer tests exist for sign languages, and even
fewer are a) suitable for use in educational settings (Mann, 2007; 2008) and/or
b) commercially available (Haug, 2008). Aside from a few well-documented sign languages,
including American Sign Language (ASL) and British Sign Language (BSL), this area of
research is still young, particularly when compared to the large body of literature on
the acquisition of spoken languages.
The lack of standardized sign language assessments proves to be particularly challenging
for practitioners in need of instruments to evaluate deaf children’s sign language
acquisition against normative developmental milestones. Consequently, decisions about
appropriate educational placements or recommended interventions for deaf children are
generally based on tests of spoken and written language skills, with only impressionistic
assessments being made of sign language skills (Herman, 1998). These shortcomings are
not limited to one country but exist on an international level (Germany: Haug &
Hintermair, 2003; Switzerland: Audeoud & Haug, 2008; United Kingdom: Herman,
1998; United States: Mann & Prinz, 2006).
One of the reasons for the international shortage of sign language assessment
tools lies in the daunting process of developing them. For instance, only a small
number of deaf children have deaf parents (less than 10%; Mitchell & Karchmer, 2004)
and are considered native users of a sign language. These children are critical for both
test developers and researchers because they represent the normative group: children who are
exposed to sign language from early on at home and who develop sign language at a pace
comparable to that of hearing peers acquiring a spoken language. Native signers therefore
constitute the “ideal” population for establishing developmental norms for a sign language
test. In comparison, the majority of deaf children experience language very differently,
partly as a result of the inconsistent input they receive from their hearing parents and/or
professionals, which ranges from sign language to spoken language only. For these reasons,
language outcomes for deaf children of hearing parents have been widely acknowledged as
notably lower than those achieved by native signers (Hermans, Knoors, & Verhoeven, 2009;
Mayer & Leigh, 2010; Spencer, 2004). Given the limited early exposure to sign language
for most of these children, their difficulty in acquiring a language is understandable
(Marschark, 2002).
Another, related reason for the shortage of sign language instruments is that much
of the established knowledge of what constitutes ‘typical’ language acquisition in sign
comes from studies with fairly small numbers of participants (e.g., Morgan & Woll,
2002; Schick, Marschark, & Spencer, 2006). These studies tend to focus on deaf native
signers and/or children of deaf adults (CODA), who are considered hearing native
signers. Because deaf native signers are so few, recruiting large samples of them is
challenging, particularly in smaller countries that may not even have a (residential)
Deaf school or program (see Mann, Roy, & Marshall, 2013, for a discussion of what
constitutes the ideal norming sample).
The limited number of available sign language tests can be linked to two further,
related explanations: the uncertain generalizability of findings from sign language
acquisition research on deaf native signers to the (larger) deaf population, including deaf
children of hearing parents, and the lack of empirical documentation on the process of
language acquisition for many sign languages. Traditionally, most of the available
research has focused on ASL and/or BSL (for ASL: Anderson & Reilly, 2002; Petitto,
1987; Petitto & Marentette, 1991; Reilly, 2006; for BSL: Morgan, Herman, & Woll,
2002; Morgan, Barrière, & Woll, 2003, 2006; Morgan, Herman, Barrière, & Woll, 2008;
Woolfe, Herman, Roy, & Woll, 2010), although more research is becoming available on
other sign languages (Australian Sign Language: De Beuzeville, 2006; Brazilian Sign
Language: Bernardino, 2005; German Sign Language: Hänel, 2003; Italian Sign
Language: Pizzuto, 2002; Sign Language of the Netherlands: Slobin et al., 2003). The
extensive research available on some sign languages (e.g., ASL) makes it tempting to use
these results to draw conclusions about other sign languages. This approach may provide
test developers with a more general understanding of the developmental aspects of a sign
language. However, such “transfer” should be done with caution and not without a
critical analysis (see Haug & Mann, 2008, for a discussion).
Despite researchers’ preference for families with at least one deaf parent in
studies on language development, it would be an oversimplification to treat parental
deafness as the only predictor of a child’s sign language proficiency. This was
demonstrated in a study by Singleton and Newport (2004), which investigated ASL
development in a deaf child whose deaf parents were non-native signers who had acquired
ASL after age 15. The child’s ASL input came from his parents. However, at the age of 7,
the child outperformed both of his parents on an ASL morphology task, indicating that,
despite inconsistent ASL input, he was able to acquire most ASL morphemes at a level
comparable to native signing deaf children (for a detailed discussion, see Singleton &
Newport, 2004). The case also points to another important issue in sign language
acquisition studies, namely variability that depends, in part, on the language
background of the deaf parents. Traditionally, it has been assumed that a deaf child
whose parents are deaf will receive meaningful and consistent linguistic input from early
on. At the same time, the importance of the input provided by hearing parents may have
been underestimated. The frequency of adult-child interaction, rather than the adults’
sign fluency alone, may provide enough meaningful input for young learners to quickly
surpass their language models and become proficient signers, as in the case reported by
Singleton and Newport (2004). This raises the question of whether parental hearing status
alone is sufficient as a measure of language input or whether it needs to be broadened to
“signers at home”. For instance, Mann et al. (2013) used “number of deaf family members”
as one of their predictor variables and found that it helped to explain some of the
variance in deaf children’s sign production.
Because of deaf children’s different language experiences, it is crucial to examine
more closely the extent of variability in the development of sign languages and to confirm
existing findings on larger numbers of children. In this context, it may be premature to
assume that deaf and hearing children in signing families are equivalent in terms of
language acquisition. For instance, Herman and Roy (2000) noted that hearing children in
deaf families are likely to be bilingual from an early age, whereas deaf children start off
as monolingual in sign, at least until they enter school. This generates a need, at least
initially, to establish monolingual norms in sign language as a basis for measuring deaf
children’s progress in language development.
Another issue related to sign language test development is the cultural
acceptability of standardized tests. In the United States, for instance, standardized tests
(e.g., for educational placement or language assessment) play an important role in
measuring students’ academic achievement. This emphasis is not necessarily shared
around the globe, particularly in countries where standardized testing is a less
integrated part of academic assessment (e.g., Sweden; Schönström, Simper-Allen, &
Svartholm, 2003; also see Haug & Mann, 2008); attitudes toward such tests are highly
culturally determined.
Finally, when considering the development of a test for sign language (as for any
other language), the purpose of the test should be clearly defined before the development
begins. An assessment that is developed primarily as a research tool may look different
and be devised in a different way from a test used to carry out large-scale norming.
Similarly, a test to provide clinicians with a diagnostic tool may look very different from
an assessment used by schools to monitor students’ developmental progress, although
some overlap between purposes is possible. Examples of such multi-purpose
assessments include the ASL Sentence Repetition Test (ASL-SRT; Hauser,
Paludnevičienė, Supalla, & Bavelier, 2008) and the ASL/English Vocabulary Tasks
(Mann, in preparation).
2. Types of Sign Language Assessments
Sign language tests generally fall into one of four categories (Haug, 2008):
a) Tests for sign language acquisition:
Primary use: to measure deaf children’s sign language skills, monitor
development, and inform intervention (where appropriate).
Assessment instruments developed for (selected examples): BSL (British Sign
Language Receptive Skills Test, Herman et al., 1999; MacArthur Communicative
Development Inventory for BSL, Woolfe, Herman, Roy, & Woll, 2010; Web-based
Vocabulary Tasks, Mann & Marshall, 2012), DGS (German Sign Language
Receptive Skills Test, Haug, 2011), SLN (Assessment Battery for Sign
Language of the Netherlands, Hermans et al., 2010), and ASL (American Sign
Language Proficiency Assessment, Maller, Supalla, Singleton, & Wix, 1999;
MacArthur Communicative Development Inventory for ASL, Anderson & Reilly, 2002).
b) Tests for educational purposes:
Primary use: to investigate the relationship between deaf children’s sign language
proficiency and their literacy skills.
Assessment instruments developed for: ASL (American Sign Language
Assessment Instrument, Hoffmeister, 1999; Test of American Sign Language,
Prinz, Strong, & Kuntze, 1994), DGS (Computer Test for German Sign
Language, Mann, 2008), and LSF (Test of French Sign Language, Niederberger,
c) Tests for linguistic research:
Primary use: to study specific grammatical features of sign language(s) and
inform our understanding of how sign languages work.
Assessments developed for: ASL (Grammatical Judgment Test for American Sign
Language, Boudreault, 1999; Boudreault & Mayberry, 2000; American Sign
Language Sentence Repetition Test, Hauser, Paludnevičienė, Supalla, & Bavelier,
2008; American Sign Language Morphology and Syntax, Supalla, Newport,
Singleton, S. Supalla, Coulter, & Metlay, 1995, unpublished), BSL (BSL
Nonsense Sign Repetition Task, Mann, Marshall, & Morgan, 2008; Mann,
Marshall, Manson, & Morgan, 2010).
d) Tests for adult second language learners:
Primary use: to assess adults learning sign language as an
additional/foreign language, e.g., professionals who work with deaf people or have
deaf colleagues.
Assessments developed for: DGS (Aachen Test for Basic German Sign Language
Competence – Adult Version, Fehrmann, Huber, Jäger, Sieprath, & Werth, 1995a,
1995b), ASL (Sign Language Proficiency Interview, Caccamise & Newell, 1995).
3. Approaches to the development and/or adaptation of sign language tests
Despite the assessment tests described above, few or no sign language
assessments are available in most countries, particularly outside North America and
Western Europe. Because of this lack, many practitioners and/or researchers looking for
an appropriate tool to investigate the sign language of their country frequently turn
either to existing tests developed for spoken languages (category 1) or to tests
available in other sign languages, which they use as a template (category 2).
Examples of the first category include the MacArthur Communicative Development Inventory (CDI;
Fenson et al., 1993), the Peabody Picture Vocabulary Test (PPVT, Dunn & Dunn, 1997),
and/or the Expressive/Receptive One Word Picture Vocabulary Test (EOWPVT;
Brownell, 2000; ROWPVT, Gardner, 1985). For instance, the MacArthur CDI, a parental
checklist to monitor language development in young children from 8-36 months, has
been adapted for American Sign Language (Anderson & Reilly, 2002) and, more
recently, for British Sign Language (Woolfe et al., 2010). Similar formats to the PPVT
and ROWPVT/EOWPVT were used in tests to assess Sign Language of the Netherlands
(Hermans et al., 2010) and BSL (Mann & Marshall, 2012) skills. Furthermore, the PPVT
has been adapted for American Sign Language (Schick, 2002).
An example of the second category is the BSL Receptive Skills Test (Herman et
al., 1999), which has been adapted for a number of different sign languages, including
American Sign Language (Enns & Herman, 2011), Australian Sign Language (Johnston,
2004), and German Sign Language (Haug, 2011). While this approach seems plausible given
the lack of available tests, it may give the misleading impression that sign languages
across countries are the same or compatible and that linguistic structures (or features)
in one sign language can easily be “transferred” to another. However, comparative sign
linguistic research (e.g., Zeshan, 2004, 2006) has shown that this is not the case.
Consequently, past attempts to measure similar, or identical, features or targets across
different sign languages using an adapted test have not been without complications (see
Haug, 2011, for a review of adapting spoken and sign language tests). The most common
factors influencing test adaptation across sign languages are differences in linguistic
structures and cultural influences. Another issue that requires caution is the transfer of
established psychometric properties, such as reliability and validity, from the source test
to the adapted version. These properties do not carry over automatically: they need to be
established anew for the adapted test, even if the source test has shown strong evidence
of reliability and validity (Hambleton, 1994, 2001, 2005).
Reliability refers to the consistency of a test, i.e., whether it yields stable results
across occasions, items, or raters (Rust & Golombok, 2000). One measure of reliability is
inter-rater reliability, which refers to the level of agreement between two or more raters
on a participant’s performance (Davies et al., 1999). This can be established, for
instance, by video-recording a child’s language production and having different raters
independently score the same grammatical structures. The core question for the validity
of a test is whether it really measures what it claims to measure (Kline, 2000). With
regard to sign language, this could mean asking whether a test of sign language vocabulary
really measures vocabulary knowledge and not, for example, the ability to guess the
meaning of iconic signs (e.g., EAT, SLEEP) or gestures, an ability that may be comparably
high in non-signing hearing children (e.g., White & Tischler, 1999). One of several types
of validity is content validity, which examines whether the test items (and the test as a
whole) represent the linguistic structures to be tested (Davies et al., 1999). One of the
prerequisites for ensuring content validity in a test of sign language skills is close
collaboration with deaf native signers during the development stage (Singleton &
Supalla, 2003).
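To make the notion of inter-rater agreement more concrete, the following Python sketch computes Cohen’s kappa, one common statistic for agreement between two raters that corrects raw agreement for the agreement expected by chance. Kappa is not named in the sources cited above; the rating codes (“C” for correct, “I” for incorrect) and the ratings themselves are hypothetical, standing in for two raters’ independent scores of the same video-recorded classifier constructions.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items with categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty list of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters chose the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, estimated from each rater's own distribution of codes.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1:
        return 1.0  # both raters used one and the same code for every item
    return (observed - expected) / (1 - expected)


# Hypothetical scores for ten classifier constructions: "C" = correct, "I" = incorrect.
rater_a = ["C", "C", "I", "C", "C", "I", "C", "C", "C", "I"]
rater_b = ["C", "C", "I", "C", "I", "I", "C", "C", "C", "C"]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")  # kappa = 0.52
```

In this invented example the raters agree on 8 of 10 items (80%), but because both chose “C” most of the time, much of that agreement could have arisen by chance, so kappa is noticeably lower than the raw percentage.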
In sum, a growing number of sign language assessments have been developed
over the course of the last 10-15 years, specifically for use with deaf children. Yet only
a few of these assessments meet the psychometric criteria (e.g., validity and reliability) that
are required for a “good” test and are also suitable for use in educational settings.
Additional tests, including linguistic/cognitive assessments, are still needed. Moreover,
most of the “available” tests focus on assessing typical sign language development,
which results in a shortage of diagnostic tools. The need for instruments that can
provide professionals with both diagnostic and acquisition information is evident
given the considerable proportion of deaf individuals with additional needs (30-40%;
Cone-Wesson, 2003).
4. Sign language assessments for special needs groups
In addition to the limited research on sign languages and the shortage of available
assessments, the characteristics of the ‘typical’ deaf language user (if such a person
exists) keep changing. Traditionally, deaf language users were distinguished on the basis
of parental hearing status and type of amplification. These criteria have become much
more diverse owing to the additional needs of certain sub-groups within the deaf
population. These sub-groups include (1) deaf individuals from different cultural
and linguistic backgrounds, (2) children with unilateral or bilateral cochlear implants
(CI), (3) individuals with dementia or motor disorders, and (4) individuals with
additional (language) needs, who are one of the foci of this book. While the first two
sub-groups can be identified relatively easily, this is not always the case with children
in the last sub-group, particularly those with language and/or learning problems (e.g.,
Herman et al., this volume).
Given the delay with which many deaf children come to language, distinguishing
language impairment from language delay remains challenging. Because screening
instruments are lacking, the identification of language-impaired children usually
relies on parents or teachers who express concern about their children’s sign
language development (Mason et al., 2010). Even then, clinicians who lack the proper
diagnostic tools to identify particular problems systematically struggle to establish
whether or not these children are progressing normally, given all the complexities of
signed language development (Quinto-Pozos et al., 2011).
Little is known about the extent to which any of the currently available sign language
tests are suitable for use with any of the described sub-groups. This dearth of knowledge
is mainly due to the general exclusion of these individuals from empirical studies that
aim to establish norms or collect average scores and to inform future assessment as part
of the standardization of a test. As a result of this data shortage, we have only limited
knowledge about their language skills, and these sub-groups remain largely
underrepresented in the literature, with few exceptions (e.g., Denmark,
2011; Shields, 2011, on deaf children on the autistic spectrum; Große, 2004; Mann, 2008;
Mahon et al., 2011, on deaf children from migrant backgrounds; Mason et al., 2010;
Morgan, 2005; Morgan et al., 2007; Marshall et al., 2006; Quinto-Pozos et al., 2011, on
deaf children with language and/or communication disorders; see Mann et al., 2013, for a
discussion of the need to include deaf children with additional needs in research).
5. Links to other chapters in the book
The need for assessment instruments for signed languages that go beyond the
testing of “typical” sign language development is highlighted by several of the other
chapters in this book (e.g., Herman et al.; Quinto-Pozos et al.; Shield & Meier). For
instance, Chen Pichler and colleagues (chapter 10) argue that more sign language
assessment instruments are needed to document effectively the bimodal bilingual
development of children acquiring English and ASL. Similarly, Herman and colleagues
(chapter 2) suggest that sign language assessment tests can help identify specific
language impairment (SLI) in deaf children, and such tests are also needed for deaf
children on the autism spectrum (e.g., Shield & Meier, chapter 4). One of the
overarching themes across these chapters is that standardized sign language tests should
not be limited in their use to typically developing deaf children but should also be
suitable for children who require a different form of testing, e.g., to detect SLI.
6. Looking ahead: sign language assessment and Information and Communication Technologies (ICT)
As mentioned before, there have been growing international efforts to address the
need for appropriate assessments for use with deaf children. Many of these efforts have
involved new technologies, whether for instruction, e.g., the use of web-based video
lectures in Slovenian Sign Language for deaf students (Debevc & Peljhan, 2004), for
clinical assessment, e.g., the computer-based psychiatric diagnostic interview in ASL
(Montoya et al., 2004), or for language research (e.g., web-based vocabulary tests for
BSL, Mann & Marshall, 2012, and ASL, Mann, in preparation). Other technologies include a
computer-based platform for delivering content in ASL from kindergarten through grade 12
(K-12; Hooper, Rose, & Miller, 2005; Miller, Hooper, & Rose, 2005) and an interactive,
web-based, multi-sign-language test interface (Haug, in preparation).
ICT offers clear advantages for sign language assessment: its features can be tailored
to the visual, modality-specific needs of sign languages, and functions such as automated
score saving and score reporting simplify test administration. For instance, in a
standardized testing format with a computer- or web-based, interactive interface, videos
can be easily integrated, and responses (e.g., in a multiple-choice test) can be easily
stored and later exported into a statistical program for analysis. In addition, remote
testing via the internet further facilitates the set-up, access, and administration of
tests. However, this way of using ICT for language assessment is not without potential
shortcomings and raises some questions related to data security (e.g., Who has access to
the test data? Are they stored on a secure server?). Also, the success of online testing
depends largely on the availability of compatible equipment at the test site and the
extent to which this equipment meets the required standards, e.g., a high-speed internet
connection, the capacity to play and store videos, and on-site IT support. Nevertheless,
new technologies offer a clear advantage for sign language assessment because of their
highly interactive format, e.g., testing of content/grammar within a narrative context, as
in the Computer Test for German Sign Language (Mann, 2005).
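As an illustration of how responses from a computer-based multiple-choice test might be stored and then exported into a statistical program, the Python sketch below writes response records to CSV, a format most statistics packages can import. The record fields (participant_id, item_id, chosen_option, response_time_ms) are hypothetical, and the sketch deliberately leaves out the data-security measures (access control, secure server storage) raised above.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Response:
    """One answer to one multiple-choice video item (hypothetical record layout)."""
    participant_id: str  # pseudonymised code rather than a name
    item_id: str
    chosen_option: str
    response_time_ms: int


def export_responses(responses, stream):
    """Write response records as CSV: one header row, then one row per response."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(Response)])
    writer.writeheader()
    for response in responses:
        writer.writerow(asdict(response))


# Hypothetical responses from one participant to two video items.
responses = [
    Response("P01", "V01", "A", 3120),
    Response("P01", "V02", "C", 2875),
]
buffer = io.StringIO()
export_responses(responses, buffer)
print(buffer.getvalue())
```

When writing to a file rather than an in-memory buffer, the file should be opened with `newline=""`, as the documentation of Python’s `csv` module recommends.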
7. Taking the first step – how to test sign language development without standardized assessments?
So far, we have highlighted the important role of assessment, specifically
language assessment, in our lives and explained the need for (more) tests that assess sign
language skills. In addition, we presented a number of available sign language tests and
discussed some of the major challenges of test development along with the possible
pitfalls of adapting a test from one sign language to another. Our aim for the remaining
part of this chapter is to offer some guidelines for readers interested in assessment
who may not have access to any of the tests we mentioned. We base these guidelines on
Haug’s (2011) adaptation of McNamara’s (2000) “testing cycle” for use with sign
language¹.
Step 1: Identifying a rationale: Test development should be approached as
a cyclic rather than a linear process. Language tests can be developed for different
reasons, e.g., changes in the school curriculum (e.g., McNamara, 2000) or
practitioners’ need for a sign language test instrument (e.g., Audeoud & Haug, 2008;
Haug & Hintermair, 2003; Herman, 1998b; Mann & Prinz, 2006).
Step 2: Acknowledging possible constraints: Before test developers can start
thinking about test content, they need to consider existing or potential limitations,
including the available budget and timeline (e.g., is the test part of a research
project with limited funding?).
Step 3: Defining the purpose of a test: This step involves defining the purpose of
the test clearly and choosing an appropriate testing method; the purpose always
influences the method. The method can be viewed as two-dimensional: (1) the nature of the
reference, i.e., norm-referenced (a participant’s performance is compared with that of a
normative group) vs. criterion-referenced (performance is compared with a predefined
standard), and (2) the nature of the sample, i.e., spontaneous vs. elicited language
samples. When thinking about the test method, it is most important to consider the
interaction of the test participant with the test materials, most obviously the response
format, i.e., how the test participant needs to respond to the test materials (e.g., videos).
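The difference between norm-referenced and criterion-referenced interpretation can be sketched in a few lines of Python. The normative scores, the child’s raw score, and the cut-off of 20 points are invented for illustration; in practice the normative sample would need to be large and appropriately composed (e.g., including deaf native signers, as discussed earlier), and the percentile-rank convention used here (counting tied scores as half) is only one of several in use.

```python
from statistics import mean, stdev


def norm_referenced(raw_score, norm_scores):
    """Locate a raw score within a normative sample: z-score and percentile rank."""
    z = (raw_score - mean(norm_scores)) / stdev(norm_scores)
    below = sum(score < raw_score for score in norm_scores)
    ties = sum(score == raw_score for score in norm_scores)
    # Percentile rank with tied scores counted as half (one of several conventions).
    percentile_rank = 100 * (below + 0.5 * ties) / len(norm_scores)
    return z, percentile_rank


def criterion_referenced(raw_score, criterion):
    """Compare a raw score with a fixed standard, regardless of how other children scored."""
    return raw_score >= criterion


# Hypothetical raw scores of a (very small) normative sample and of one child.
norm_scores = [18, 20, 21, 22, 22, 23, 24, 25, 26, 29]
child_score = 25

z, percentile_rank = norm_referenced(child_score, norm_scores)
print(f"z = {z:.2f}, percentile rank = {percentile_rank:.1f}")
print("meets criterion:", criterion_referenced(child_score, criterion=20))
```

The same raw score of 25 is thus reported as about 0.6 standard deviations above the normative mean (percentile rank 75) under a norm-referenced interpretation, and simply as meeting the standard under a criterion-referenced one.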
Step 4: Content of the test: Once the purpose of the test has been identified, the
next step is to define the content of the test, namely which aspects of a language should
be tested, e.g., lexical development. Other variables to take into consideration include
the age of the target group and the type of language skills to be tested (e.g.,
comprehension and/or production).
Step 5: Test specification: These specifications represent the ‘blueprint’ of what
needs to be done in which order, e.g., the development of the items; they result from
the design decisions about the purpose, the content, and the test method, and they form
the basis for constructing the test. Developing a new test also involves designing
appropriate testing procedures and test items. Other important issues to consider include
environmental factors of the testing situation (e.g., whether the tester is familiar to the
test participant and whether the test location is familiar or unfamiliar), the management
and secure storage of large amounts of data (e.g., language production), and the need to
report psychometric properties, i.e., validity and reliability, for a new test.
Step 6: Rating/Scoring: Depending on the content and the testing method, it is
important to have appropriate rating procedures (e.g., rating sheets) in place. These
procedures may differ considerably for production skills (e.g., checking for the correct
use of classifier constructions) and comprehension skills (e.g., scoring a multiple-choice format).
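A minimal Python sketch of the two kinds of scoring procedure mentioned in Step 6 might look as follows. The answer key, the item and construction IDs, and the three-level rating scale (2 = correct, 1 = partially correct, 0 = incorrect) are hypothetical; an actual rating sheet for classifier constructions would be defined during test specification and checked with deaf native signers.

```python
# Hypothetical answer key for three comprehension items (video ID -> correct option).
ANSWER_KEY = {"V01": "A", "V02": "C", "V03": "B"}
# Hypothetical rating-sheet scale for production items.
RATING_POINTS = {"correct": 2, "partially correct": 1, "incorrect": 0}


def score_comprehension(answers, key=ANSWER_KEY):
    """One point for each multiple-choice item whose chosen option matches the key."""
    return sum(1 for item, correct in key.items() if answers.get(item) == correct)


def score_production(ratings, points=RATING_POINTS):
    """Convert a rater's judgement for each production item into points and sum them."""
    return sum(points[judgement] for judgement in ratings.values())


comprehension = score_comprehension({"V01": "A", "V02": "B", "V03": "B"})
production = score_production(
    {"CL1": "correct", "CL2": "partially correct", "CL3": "incorrect"}
)
print(comprehension, production)  # 2 3
```

Unanswered comprehension items simply score zero in this sketch; a real test would need an explicit, documented rule for missing responses.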
Step 7: Score report and interpretation of results: It is important to think about
how scores will be reported and how they should be interpreted. By score reporting, we
mean how a test participant learns about the test results. For example, will a test
participant receive a printout of the score report along with some comments? It is also
important to consider how the results are interpreted and perhaps used to inform
decisions related to placement (e.g., foreign language class), intervention (e.g., individual
support sessions), and/or admittance (e.g., university).
Step 8: Pilot and main study: Before a test can be used, for example in schools for
the deaf, it is important to conduct a pilot study and a main study. The pilot study can
result in some revisions of the test items, the testing procedure, and/or the rating/scoring
procedures. The pilot and main studies also make it possible to establish the
psychometric properties of the new test, which then need to be reported.
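One way a pilot study can inform item revision and the reporting of reliability is classical item analysis. The Python sketch below computes item difficulty (the proportion of participants answering an item correctly) and Cronbach’s alpha, a common index of internal-consistency reliability. Neither statistic is prescribed by the testing cycle described here, and the 0/1 pilot data are invented; real pilot samples would be much larger.

```python
from statistics import pvariance


def item_difficulty(matrix):
    """Proportion correct per item; rows are participants, columns are 0/1 item scores."""
    n = len(matrix)
    return [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]


def cronbach_alpha(matrix):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(matrix[0])
    item_variances = [pvariance([row[j] for row in matrix]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical pilot data: five children, four items (1 = correct, 0 = incorrect).
pilot = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(item_difficulty(pilot))           # [0.8, 0.6, 0.6, 0.4]
print(round(cronbach_alpha(pilot), 2))  # 0.7
```

Items that almost everyone or almost no one answers correctly (difficulty close to 1 or 0) provide little information about differences between children and are typical candidates for revision after a pilot.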
Step 9: Training of the tester: One of the key aspects of successful test
development and administration is the training of the testers, i.e., making testers aware
of issues related to testing children, psychometrics (e.g., administering the test to every
child in the same way), and sign language acquisition.
It might appear to some readers that our guidelines make the task of
developing/adapting a sign language test even more daunting. Therefore, we close this
chapter with the following general recommendations, as an encouragement to those
interested in exploring the field of sign language assessment.
1. In order to provide quality assessments for sign language(s), be it a
standardized receptive skills test or a set of standardized checklists, one needs to start
somewhere, no matter how simplistic a first test version might be.
2. At the same time, it is important for the test developer to keep the original
purpose of the test in mind. This is not always an easy task given the heterogeneous
nature of the deaf target population, including various subgroups (deaf children with
hearing parents, deaf children from minority backgrounds, deaf children with/without CI,
etc.). One way of accounting for the heterogeneity may be to collect data over a longer
period of time to set up language profiles of different subgroups.
3. Finally and above all, for any test adaptation/development to be
successful, close collaboration from early on with deaf native signers, researchers,
and/or practitioners/clinicians working in educational settings is one of the most
crucial preconditions.
