Self- and Surrogate-Reported Communication
Functioning in Aphasia
Patrick J. Doyle
Geriatric Research Education and Clinical Center, VA Pittsburgh Healthcare System and
Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA
William D. Hula
Geriatric Research Education and Clinical Center, VA Pittsburgh Healthcare System and
Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA
Shannon N. Austermann Hula
Research Service, VA Pittsburgh Healthcare System, Pittsburgh, PA
Clement A. Stone
Department of Psychology in Education, University of Pittsburgh, Pittsburgh, PA
Julie L. Wambaugh
Research Service, VA Salt Lake City Healthcare System and Department of Communication
Sciences and Disorders, University of Utah, Salt Lake City, UT
Katherine B. Ross
Audiology and Speech Pathology, Phoenix VA Healthcare System, Phoenix, AZ
James G. Schumacher
Audiology and Speech Pathology, VA Pittsburgh Healthcare System, Pittsburgh, PA
This work was supported by VA Rehabilitation Research & Development Merit Review Award C6098R,
Career Development Award 6210M, and the VA Pittsburgh Healthcare System Geriatric Research
Education and Clinical Center. An earlier version of this work was presented at the Clinical Aphasiology
Conference, May 27, 2010, Isle of Palms, SC, USA. The contents of this paper do not represent the views
of the Department of Veterans Affairs or the United States Government.
The original publication is available at www.springerlink.com:
Aphasia is an acquired neurogenic impairment of language performance, usually resulting from
focal brain damage involving the dominant (usually left) hemisphere. In most cases,
communication deficits are present in all input and output modalities (i.e., speaking,
understanding, reading, and writing), and they are disproportionate to any other cognitive
impairments that may be present. The term aphasia specifically excludes motor speech
disorders resulting from muscle weakness or incoordination (e.g., dysarthria), as well as
communication impairments resulting from dementia, delirium, coma, or sensory loss. Stroke
is the most common cause of aphasia, and approximately 20% of stroke survivors have
persisting aphasia. The worldwide incidence and prevalence of aphasia are not known, but
there are currently estimated to be more than 1 million people living with the condition in the
United States. The negative consequences of aphasia include psychosocial difficulties,
reduced functional independence, and diminished vocational opportunities.
The measurement of communication outcomes is critical to the care of patients with
aphasia and to the evaluation of stroke rehabilitation programs. In addition to traditional
performance-based and clinical indicators of communication functioning, increasing emphasis
has been placed on patient-centered assessments. Several patient-reported stroke outcome
assessments include sub-scales of communication functioning [5-7], and additional scales have
been developed specifically for patients with aphasia [8-12].
One issue that has concerned developers and users of these and related scales is the extent
to which stroke survivors in general, and stroke survivors with aphasia specifically, can provide
valid self reports of their own functioning [13-21]. This concern has led to the collection of
proxy1 reports and their direct comparison with patients’ self reports [13-20]. It has also been
noted that proxy reports may constitute a valid perspective in their own right, regardless of their
correspondence with patients’ ratings [22, 23].
Stroke-specific studies that have included participants with aphasia are in agreement with
the more general literature that patient and proxy respondents demonstrate higher agreement on
ratings of more directly observable domains (e.g., physical function vs. energy) and that proxies
tend to rate patients as more limited than patients rate themselves [16-18]. In these studies, the
strength of association between patient and proxy reports, expressed as intraclass correlation
coefficients, has ranged from 0.50 to 0.70 for language and communication scales. Studies
specific to patients with aphasia have produced similar findings [13-15]. Some researchers in this
area have concluded that, in cases where patients with aphasia are unable to give valid self
reports, substitution with proxy reports is appropriate [13, 16]. Others have been more cautious.
One limitation of these patient-proxy comparison studies is that they have not evaluated
whether the scales in question have invariant measurement properties in the two groups.
Investigation of measurement invariance asks whether a scale measures the same construct in the
same way in two different populations. Questions of measurement invariance may be addressed
using latent variable modeling approaches to psychological measurement. Within this framework,
observed responses to test items are taken as indicators of unobserved (latent) constructs that are
1 The term “proxy” has been used with two distinct meanings in the literature. Some authors have used the term to
refer to a person close to the patient who responds as he or she believes that the patient would respond [9, 16]. Others
have used the term to refer to a person close to the patient who provides his or her own assessment, without
considering how the patient might respond [12, 14]. In still other cases, the meaning is not clearly specified.
the actual objects of study. Thus, a model relating the observed scores to the underlying
latent construct is necessary, and when group comparisons are made, it must be shown that this
model is structured similarly for the groups involved [25, 26]. Without demonstration of
invariance, between-group comparisons of means, variances, and covariances may be
confounded [25, 27, 28]. While investigations of measurement invariance in patient-reported
health-status assessment have frequently focused on cultural, ethnic, gender, and age differences
[29-35], the issue is equally applicable to potential differences in how patients and their proxies
use self-report scales.
A related issue concerns the underlying conceptual structure of communication
functioning. In order to evaluate measurement invariance, the structure of the latent variable in
question must first be established within a reference population. Among the many instruments
that have been developed to assess various aspects of functional communication in aphasia [7-12,
22, 36-43], there is a general lack of a unifying conceptual structure and much
variability in how the construct has been operationalized. Some instruments propose
multiple subdomains of communication functioning that may be assessed individually or in
combination [36, 41], others provide only an overall score [22, 39], and still others have chosen
to measure communication as an undifferentiated aspect of general cognition [33, 34].
In this context, we have begun to develop a new self- and surrogate-reported2 instrument
for measuring communication functioning in persons with aphasia: The Aphasia Communication
Outcome Measure (ACOM). Initial steps in developing the ACOM item pool were reported in a
prior paper. In the present study, we asked the following questions: (1) Do items describing
2 We use the term “surrogate” here to specify the second meaning of the word “proxy” discussed above in Footnote
1, i.e., a person close to the patient who provides his or her own assessment, without trying to respond as he or she
thinks that the patient would respond.
self- and surrogate-reported communication functioning in aphasia reflect a single
unidimensional scale? We plan to develop one or more communication functioning item banks
calibrated to an item response theory model. Because the most easily applied item response
theory models assume unidimensionality, the present paper is focused on defining valid single-
factor scales. (2) Do self and surrogate ratings of communication functioning demonstrate
measurement invariance? That is, can they be interpreted and directly compared using a common
scale? (3) To what extent do self and surrogate ratings of communication functioning agree? (4)
Are persons with severe aphasia able to provide meaningful self-reports about their own
communication functioning?
Method
Participants were 133 persons with aphasia (PWAs) and 133 surrogate respondents.
PWAs met the following inclusion criteria: diagnosis of aphasia ≥1 month post-onset;
community dwelling; self-reported normal pre-morbid speech-language function; pre-morbid
literacy with English as a first language; negative self-reported history of progressive
neurological disease, psychopathology, and substance abuse; ≥0.6 delayed/immediate ratio on
the Arizona Battery for Communication Disorders of Dementia Story Retell subtest; ≤5 self-reported
depressive symptoms on the 15-item Geriatric Depression Scale; and a Boston
Diagnostic Aphasia Exam severity rating ≥1. Surrogate (SUR) respondents met similar criteria,
except for the diagnosis of aphasia, and reported weekly or more frequent contact with their
respective PWA both prior to and after aphasia onset. A subset of the PWAs (n = 116) was also
administered the Porch Index of Communicative Ability, a performance-based test of
communication impairment. Demographic and clinical characteristics of the sample are
summarized in Tables 1 and 2.
The initial ACOM item pool comprised 177 items describing
various communication activities. The content of the items is presented in Appendices A and B.
Participants were asked to rate on a 4-point scale (not at all, somewhat, mostly, completely) how
effectively the PWA performs each activity. “Effectively” was defined as “accomplishing what
you want to, without help, and without too much time or effort.” Respondents were also
permitted to indicate that they had no basis for rating a particular item, or that the PWA did not
do the activity in question for some reason other than his/her aphasia, in which cases the
responses were coded as missing data. For example, many surrogates indicated that they had no
basis for rating the item “get help in an emergency” because they had never observed their
partner do this, and many PWAs responded similarly because they had not experienced any
emergencies since the onset of their aphasia.
Responses from PWAs and surrogates were collected separately by trained research staff
using an interviewer-assisted administration format. Each item was displayed on a computer
screen in large font along with the stem “How effectively do you…” (for PWAs) or “How
effectively does your partner…” (for surrogates). The examiner read each item aloud and also
permitted the respondent to read it. The computer screen also displayed a vertical bar
representing the response categories with text labels. Participants were permitted to give their
responses verbally, by pointing to the screen, or a combination. In cases where there was any
uncertainty about the validity of the response, the examiner verified the response by verbally
repeating the item and the response back to the participant and also indicating the chosen
category on the screen.
Analyses and Results
To address our research questions, we took a factor-analytic approach, using Mplus version 5.2
with the weighted least squares mean- and variance-adjusted (WLSMV) estimator. We began the
analysis by collapsing item response categories with < 10 observed responses in either the PWA
or SUR data with adjacent categories. For example, if the response category “completely” was
used for a particular item by fewer than ten PWA, we collapsed “completely” with “mostly” and
treated these two responses as the same for this particular item. Also, we excluded items with ≥
5% missing responses for either the PWA or SUR. Missing data were handled with pairwise
deletion. Items retained in the analyses (n = 101) described below are presented in Online
Resource 1. Items excluded by the missing data criterion (n = 73) are presented in Online
Resource 2.
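As a concrete illustration, the two screening rules described above might be implemented as follows. This is a hypothetical sketch in Python/pandas; the paper's analyses were run in Mplus, the function and variable names are our own, and the paper applied the category-count rule within both the PWA and SUR data, whereas the sketch shows a single group:

```python
import pandas as pd

# Responses coded 0-3 for "not at all" ... "completely"; NaN marks
# "no basis for rating" or "does not do the activity".
def collapse_sparse_categories(item: pd.Series, min_count: int = 10) -> pd.Series:
    """Merge any response category used fewer than min_count times into the
    adjacent (next lower) category, working from the top category down.
    Simplified: a sparse lowest category is left as-is in this sketch."""
    out = item.copy()
    for cat in sorted(out.dropna().unique(), reverse=True):
        if (out == cat).sum() < min_count and cat > out.dropna().min():
            out = out.replace(cat, cat - 1)
    return out

def exclude_missing_items(responses: pd.DataFrame,
                          max_missing: float = 0.05) -> pd.DataFrame:
    """Drop items (columns) with >= 5% missing responses."""
    keep = responses.columns[responses.isna().mean() < max_missing]
    return responses[keep]
```

For example, an item rated "completely" by only five respondents would have those five responses recoded as "mostly", and an item with 10% missing responses would be dropped from the analysis pool.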
An initial attempt to fit the 101 retained items to a single-factor model yielded poor fit for
both the PWA and SUR data (Comparative Fit Index (CFI) < 0.9, Tucker-Lewis Index (TLI) <
0.95, and root mean square error of approximation (RMSEA) > 0.10). Next, we performed
separate exploratory factor analyses on the PWA and SUR data. A three-factor model provided
marginally adequate fit for both PWA (CFI = 0.949, TLI = 0.971, RMSEA = 0.074) and SUR
(CFI = 0.949, TLI = 0.979, RMSEA = 0.081).
The factors identified in these exploratory models defined coherent groupings of item
content and were predominantly consistent across the two sources of report. The item content
and salient loadings (>0.4) are presented in Online Resource 1. For both groups, the items that
loaded onto the first factor were primarily related to verbal expression (talking), with the second
and third factors related to writing (including typing) and comprehension (both auditory and
written), respectively. The factor correlation matrix, presented in Table 3, was similar across the
PWA and SUR samples.
Based on the above analysis, we selected three item subsets, henceforth referred to as
domains, based on the content groupings identified by the three factors that the PWA and SUR
participants had in common: Talking, Comprehension, and Writing. The subsequent analysis
steps were carried out separately for each domain, and included: item reduction, testing of
measurement invariance, and analysis of patient-surrogate agreement.
First, we fit a series of unidimensional confirmatory factor models separately for the
PWA and SUR items within each domain. When a one-factor model demonstrated poor fit, an
exploratory model was estimated and items with non-salient loadings on the primary factor were
excluded until adequate fit to a unidimensional model was achieved. We also inspected the
model modification indices provided by Mplus and excluded items that contributed substantially
to model misfit. We considered a model to have adequate fit when the following criteria were
met: CFI > 0.95, TLI > 0.95, RMSEA < 0.08, and weighted root mean square residual (WRMR)
< 1.0.3 In excluding items based on the factor analysis results, we also attempted to retain
the largest possible groups of items with the most directly related content.
3 The CFI and TLI are measures of incremental or relative fit that compare the tested model to a null model, which
assumes that there are no relationships between any of the observed variables. They both adjust for model
complexity, the CFI with an expression that subtracts the model degrees of freedom from the model chi-square
value, while the TLI is based on the ratio of the chi-square to its degrees of freedom. CFI and TLI values of zero
indicate worst possible fit, while values close to 1 indicate relatively good fit. The RMSEA is a badness-of-fit
measure where a value of zero indicates best possible fit. It is based on the model chi-square, its degrees of freedom,
and the sample size. The WRMR is a newer statistic that measures the weighted average difference between the
observed and model-estimated population variances and covariances.
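For reference, the verbal descriptions above correspond to the standard formulas (a sketch; here χ²₀ and df₀ belong to the null model, χ²M and dfM to the tested model, and N is the sample size; software packages differ in using N versus N − 1 in the RMSEA denominator):

```latex
\mathrm{CFI} = 1 - \frac{\max(\chi^2_M - df_M,\, 0)}{\max(\chi^2_M - df_M,\; \chi^2_0 - df_0,\; 0)}
\qquad
\mathrm{TLI} = \frac{\chi^2_0/df_0 \;-\; \chi^2_M/df_M}{\chi^2_0/df_0 \;-\; 1}
\qquad
\mathrm{RMSEA} = \sqrt{\frac{\max(\chi^2_M - df_M,\, 0)}{df_M \,(N-1)}}
```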
Starting with an initial set of 50 Talking items, we retained 24 items that fit a
unidimensional model for both sources of report. The content of the retained items was primarily
related to verbal conversation and social interaction, e.g., “tell people about yourself” and “start a
conversation with other people.” By contrast, much of the excluded item content related to de-
contextualized verbal performance, e.g., “say the names of clothing items,” and basic
communication, e.g., “say your name.” Item reduction for the Comprehension domain began
with 29 items. Ten items were retained in the final model, all of which described auditory
comprehension activities, e.g., “follow group conversation,” and “follow tv shows.” For the
Writing domain, item reduction began with 18 items. Fourteen items were retained in the final
factor model, including “write down a phone message,” and “write your name.”
To evaluate measurement invariance for each scale, we tested a series of nested
confirmatory factor models [24, 25, 28], using the theta parameterization option in Mplus and the
DIFFTEST option for chi-square difference testing of nested models. Because of the potential
dependency between the PWA and SUR item pairs with identical content, we did not conduct a
traditional multiple group analysis, but instead treated the paired PWA and SUR responses as a
single case. We specified a series of 2-factor models in which the PWA responses loaded on
the first factor and the SUR responses loaded on the second. In order to model the PWA-SUR
dependency, the errors for each item pair were permitted to covary. The first model tested in
each domain evaluated configural invariance, which requires that the items relate to the same
factor(s) in both groups. This model permitted item thresholds, factor loadings, and factor
variances to vary across the two groups. Next, we evaluated weak and strong factorial
invariance in a single step. Weak invariance requires that factor loadings be equal across groups
and permits valid comparisons of estimated factor variances and covariances. Strong invariance
adds the constraint that item thresholds are equal for both groups and supports valid comparison
of estimated group means. In this second step, we tested a model in which the factor
loadings and thresholds for each PWA-SUR item pair were constrained to be equal. Finally, we
evaluated strict factorial invariance, which adds the additional constraint that the residual
variance for each item must be equivalent in the two groups. When strict factorial invariance is
met, observed score variances and covariances may be validly compared, and additional support
for the validity of group mean comparisons is provided as well. In each case, we used chi-square
difference testing to evaluate whether the added model constraints significantly (p < 0.05)
worsened model fit.
As shown in Table 4, the strong invariance model for the Talking scale was rejected.
Modification indices showed that the constraints on the factor loadings for two items, “speak to
family members and friends on the phone” and “ask questions to get information” were the
largest contributors to the significant chi-square difference test. We estimated a model in which
these constraints were relaxed, permitting the loadings for these items to be freely estimated
across patients and surrogates. This partial invariance model [24, 28] was tenable. Table 5
presents the results of measurement invariance testing for the Comprehension scale. The strong
and strict invariance models were both tenable. For the Writing scale, shown in Table 6, the
strong invariance model was rejected. Modification indices showed that the constraints on the
thresholds for the item “dial a telephone number” were the strongest contributors to misfit. A
model that estimated separate PWA and SUR thresholds for this item provided support for partial
strong invariance. A partial strict invariance model that maintained free estimation of the
thresholds for this item also showed adequate fit and a non-significant chi-square difference test.
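The logic of the nested-model comparisons above can be sketched as follows. Caveat: the simple difference of chi-squares shown here is valid only for maximum-likelihood estimation; for the WLSMV estimator used in this study, Mplus's DIFFTEST option applies a required correction. All numeric values are hypothetical:

```python
from scipy.stats import chi2

def chisq_difference_test(chisq_constrained, df_constrained,
                          chisq_free, df_free):
    """Return (delta chi-square, delta df, p-value) for two nested models,
    where the constrained model adds equality constraints to the free one."""
    delta_chisq = chisq_constrained - chisq_free
    delta_df = df_constrained - df_free
    p_value = chi2.sf(delta_chisq, delta_df)
    return delta_chisq, delta_df, p_value

# Hypothetical fit statistics: a strong-invariance (equality-constrained)
# model versus a configural (freely estimated) model.
d_chisq, d_df, p = chisq_difference_test(310.4, 255, 280.1, 251)
# A small p-value indicates that the added equality constraints significantly
# worsen fit, i.e., the stronger invariance model is rejected.
```

When the omnibus test rejects full invariance, relaxing the constraints flagged by the largest modification indices, as done for the Talking and Writing scales, yields the partial invariance models described in the text.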
Having established measurement invariance for the three scales, we evaluated
agreement between self and surrogate reports in three ways. First, we inspected the correlations
between the PWA and SUR factor scores for each scale. The correlations were 0.71, 0.50, and
0.89 for Talking, Comprehension, and Writing, respectively, suggesting moderate-to-strong
relationships between self and surrogate reports in each domain.
Second, we further constrained the restricted invariance factor models described above to
test the equality of the means and variances between self and surrogate reports. For the Talking
and Writing scales, the models specifying equal PWA and SUR means were tenable, but the
models specifying equal variances were not (see Tables 4 and 6). In both cases, the SUR
distribution had higher variance. For the Comprehension scale, there were no significant
differences between PWA and SUR means or variances.
To evaluate the magnitude of individual PWA-SUR differences and their relationship to
overall level of reported functioning, we constructed Bland-Altman plots for each domain.
These plots, displayed in Figure 1, show the PWA-SUR difference as a function of the
average of the PWA and SUR scores, which serves as an estimate of the true level of functioning.
For the Talking and Writing scales, there was a weak, but statistically significant negative
correlation between the PWA-SUR difference and the average. This suggests that for PWA with
lower reported functioning, SUR participants tended to underestimate ability relative to PWA,
and for PWA with higher reported functioning, SUR participants tended to overestimate ability
relative to PWA. We also used the estimated reliability for each scale (Talking: 0.94;
Comprehension: 0.86; Writing: 0.93) to compute 95% confidence intervals under the assumption of a null
difference between individual PWA and SUR score pairs. These confidence intervals are shown
in Figure 1. Cases falling outside these intervals showed statistically significant disagreement at
p < 0.05. Thirty-three percent of PWA-SUR differences were significant on the Talking scale,
26% were significant on the Comprehension scale, and 15% were significant on the Writing
scale.
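One common way to construct such reliability-based limits (an assumption on our part; the paper does not spell out its exact formula) treats each score pair as two measurements whose difference, under the null hypothesis of no true disagreement, has standard error sqrt(2) * SD * sqrt(1 - reliability):

```python
import math

def null_difference_ci(sd: float, reliability: float, z: float = 1.96):
    """95% interval for the difference of two scores that, under the null
    hypothesis, differ only by measurement error. Each score is assumed to
    have standard deviation sd and the given reliability."""
    se_diff = math.sqrt(2.0) * sd * math.sqrt(1.0 - reliability)
    return (-z * se_diff, z * se_diff)

# Reliabilities reported for the three scales, with sd = 1 for illustration:
for name, rxx in [("Talking", 0.94), ("Comprehension", 0.86), ("Writing", 0.93)]:
    lo, hi = null_difference_ci(sd=1.0, reliability=rxx)
    print(f"{name}: ({lo:.2f}, {hi:.2f})")
```

Under this construction, higher scale reliability yields narrower limits, so more PWA-SUR pairs fall outside them and are flagged as significantly disagreeing.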
Effects of Comprehension Impairment on Patient Responses
Finally, in order to evaluate whether comprehension impairment negatively affected
PWAs’ ability to provide meaningful responses, we conducted an additional series of factor
analyses. We included in these analyses only the 116 participants for whom we had PICA scores,
and we began by stratifying this sample into two sub-groups based on
comprehension performance. Specifically, we divided the sample into groups with severe (n =
39), and mild or moderate (n = 77) comprehension impairments based on the average of their
raw scores on the PICA auditory and reading comprehension subtests.
We then evaluated measurement invariance between the severely impaired sub-sample
and the remaining participants, using an approach similar to that described above. This analysis
was motivated by the hypothesis that if comprehension impairment prevented participants with
severe aphasia from understanding and validly responding to the questions, this should be
reflected in non-invariant parameter estimates for the severe group compared to the rest of the
sample. Put differently, if participants with severe aphasia were responding based on incorrect
understanding of the items, the items’ positions relative to one another on the latent trait scale
and the relative strength of their relationships to the latent trait should be affected. The major
difference between the present analyses and the analyses of PWA-SUR invariance described
above was that in this case the subsamples were independent, permitting us to conduct
traditional multiple group analyses in which only one factor for each scale was specified. Also,
for these analyses, we tested only configural, weak, and strong invariance, because tests of strict
invariance are not particularly relevant for this question.
The results of these analyses are presented in Table 7. For the Talking and
Comprehension scales, the chi-square difference tests were not significant, suggesting that
severity of comprehension impairment was not associated with reliable differences in factor
loadings or intercepts. For the Writing scale, the test was significant (p = 0.048). Inspection of
the modification indices revealed that the constrained intercepts for the item “communicate by
email” were the single largest contributor to model misfit. Participants with severe
comprehension impairment found this item to be harder (relative to the other items in the Writing
scale) than did the participants with mild-to-moderate comprehension impairment. With this
constraint relaxed, the chi-square difference test was no longer significant.
Discussion
This is the first investigation of agreement between patient and proxy reports of communication
functioning in aphasia that has demonstrated measurement invariance of the scales in question, a
necessary precondition for making the comparison. The first aim of this study was to evaluate
whether self and surrogate-reported communication functioning can be measured on the same
unidimensional scale. We conducted a series of exploratory and confirmatory factor analyses to
reduce a large initial item pool to form three single-factor scales: Talking, Comprehension, and
Writing. The Comprehension scale demonstrated full strict measurement invariance between self
and surrogate reports. The Talking and Writing scales demonstrated partial strict invariance, after
relaxing cross-group equality constraints on a small number of parameters in each model.
The second aim of this study was to evaluate the level of agreement between self- and
surrogate-reported communication functioning. Correlations between PWA and SUR factor
scores for Talking (0.71) and Comprehension (0.50) were moderately strong, while the
correlation between Writing scores was stronger (0.89). This replicates the previous finding,
noted above [13, 16, 17], that patients and proxies show better agreement on reports of
functioning in more directly observable domains. Finally, we evaluated whether aphasic
comprehension impairment prevented participants with severe aphasia from responding
meaningfully to the items. Factor analyses of the ACOM scales using participant sub-samples
stratified by severity of comprehension impairment suggested that even the participants with the
most severe aphasia understood the questions sufficiently well to provide meaningful and
coherently related responses.
Regarding self- and surrogate agreement, testing of nested confirmatory factor models in
each domain further suggested that there was no average bias for surrogates to over- or under-
report functioning relative to PWA. This finding contrasts with prior reports that proxies are
generally biased to report lower functioning and/or well-being [13, 14, 17]. We also found that
surrogate-reported scores had higher variance than self-reported scores in two domains, Talking
and Writing. The Bland-Altman plots presented in Figure 1 offer perspective on this finding.
They show a weak but significant tendency for surrogates to assign more extreme scores than
PWA in both domains. Thus, for PWA with lower ability in a given domain, SUR reports tended
to result in lower score estimates and for PWA with higher ability, SUR reports tended to result
in higher score estimates.
The plots in Figure 1 also show that, despite the moderate-to-strong relationships
between self and surrogate reports, there was statistically significant disagreement in a
substantial number of individual cases. Although the present analyses do not establish the
clinical meaningfulness of the observed differences, we do note that for the Talking and
Comprehension scales, the standard deviation of the PWA-SUR differences (0.83 and 0.94,
respectively) was comparable to the standard deviation of the PWA scale scores (0.93 and 0.89,
respectively). Thus, despite the lack of overall bias and moderately strong association between
self and surrogate reports, we conclude that substituting the latter for the former is inadvisable.
The construction of invariant scales is necessary for direct comparisons of self and
surrogate reports, and is fundamental to any research directed at understanding the disagreements
between patients and their surrogate raters. However, invariant scales will in most cases be
shorter than scales that are not subject to this requirement. For example, the current Talking scale
contained 24 items demonstrating configural factorial invariance across self and surrogate
reports, out of the initial 50-item pool for the Talking domain. Had we not required invariance,
an additional 9 items would have been retained in the single-factor model for the SUR data.
Likewise, a unidimensional Comprehension scale based solely on the PWA data would have
retained all 29 items identified with that factor in the initial exploratory analysis.
This exclusion of item content in the service of measurement invariance has two potential
negative effects. First, it reduces reliability. On first consideration, this might not seem like a
pressing concern, given the adequate reliability for group measurement of the three invariant
scales reported here. However, the ACOM scales are also intended for clinical use with
individual patients, which requires a minimum reliability of 0.90, or preferably 0.95.
Also, it is our intention to make the ACOM available in a computerized adaptive testing (CAT)
format where larger item banks are desirable. Second, item exclusion may reduce content
validity. This concern is particularly relevant to the ACOM Comprehension scale. Although the
initial exploratory factor analyses suggested that auditory and reading comprehension were
associated with a single factor for both PWA and SUR respondents, exclusion of all reading
comprehension items was necessary to obtain configural invariance. Thus, if the goal is not to
directly compare self and surrogate reports, but instead to measure outcomes for the purpose of
evaluating an intervention or a service delivery model, then the costs of achieving measurement
invariance may not be justified. Self and surrogate reports on non-invariant scales could still be
obtained and used as alternative perspectives on outcome that are not directly comparable to one
another.
One limitation of this study concerns the exploratory nature of the
analyses used to derive the scales. In developing the ACOM, we cautiously proposed that a large
item pool with relatively diverse item content might nevertheless approximate unidimensionality.
This hypothesis was based on prior work with patient- and surrogate-reported scales of
communication functioning in aphasia [54-56] and factor-analytic studies of performance-based
language functioning in aphasia [57, 58]. However, initial analyses of the current data set clearly
disconfirmed this hypothesis. We elected therefore to pursue construction of modality-based
scales for the domains of Talking, Comprehension, and Writing. This inconsistency with prior
results may be due in part to the fact that the present investigation employed more rigorous tests
of dimensionality. In any case, the fact that each of the ACOM domain scales reported here were
constructed from larger initial item pools through exploratory analyses means that their good fit
to the measurement models tested here may have resulted from particular characteristics of the
participant sample and may not generalize to other samples. Thus, it will be important to cross-
validate these results in an independent sample.
Two other limitations concern the question of whether participants with severe aphasia
were able to understand the questions sufficiently well to provide meaningful responses. First, in
order to evaluate this question, it was necessary to split the sample into smaller subsamples,
resulting in increased estimation error for model parameters and lower power for detecting
differences between models. For this reason, our findings related to this question should be taken
as preliminary rather than definitive. Second, while the current sample did include individuals
with severe comprehension impairments, individuals with profound comprehension impairments
(i.e., with BDAE severity ratings of 0, indicating no usable speech or comprehension) were
excluded from the study. Thus, we do not claim based on our results that all persons with aphasia
can provide meaningful self-reports about their own communication functioning, but rather that
significant comprehension impairments do not necessarily prevent persons with aphasia from
responding meaningfully to well-constructed and administered questions.
A final limitation of the present study concerns the heterogeneity of the participant
sample with respect to time post-onset and frequency of contact between patients and surrogates.
Either of these variables could conceivably affect the factor structure of the instrument and/or
PWA-surrogate agreement. However, constraining our participant selection criteria with respect
to these variables would have made it difficult or impossible to obtain the sample size necessary
to address the questions of primary interest. Nevertheless, post-hoc analyses of measurement
invariance with respect to time post-onset (≤ 36 months vs. > 36 months) suggested that factor
loadings and intercepts were consistent for all three scales (all chi-square difference test p-values
> 0.09). As with the analyses of comprehension severity, these results should be interpreted
cautiously because of the small sample size. Also, time post-onset did not correlate significantly
with signed or absolute patient-surrogate agreement for any of the scales (Pearson r’s ranged
from -0.25 to 0.14, all p’s > 0.12). Likewise, separate analysis of the PWA-surrogate pairs
reporting daily or more frequent contact produced results that were not materially different from
the full analyses reported above, and frequency of contact was not significantly correlated with
PWA-surrogate agreement. In any case, these issues remain important avenues for further
research.
Despite these limitations, the current findings have important implications for the
development of patient- and surrogate-reported measures of communication functioning in
aphasia. First, it is clear from the present results that patient and surrogate reports represent
distinct perspectives and are not interchangeable. A second, related conclusion is that attempts to
develop interchangeable scales that are equivalent for patients and surrogates may result in scales
with restricted item content that may fail to capture the full range of relevant behavior and are
too brief to provide reliable measurement. It is therefore likely that future work on the
ACOM will de-emphasize efforts to develop parallel scales for patients and surrogates. Instead,
our focus will be on developing maximally reliable and valid scales for each source of report,
without requiring them to be directly comparable to one another.
Acknowledgements The authors gratefully acknowledge the assistance of Beth Friedman, Jessica Rapier, Mary
Sullivan, Brooke Swoyer, Neil Szuminsky, and Sandra Wright.
1. Darley FL. Aphasia. Philadelphia, PA: W.B. Saunders; 1982.
2. Chapey R, Hallowell B. Introduction to language intervention strategies in adult aphasia. In: Chapey R, ed.
Language intervention strategies in aphasia and related neurogenic communication disorders. Baltimore, MD:
Lippincott Williams and Wilkins; 2001: 3-17.
3. Kauhanen ML, Korpelainen JT, Hiltunen P et al. Aphasia, depression, and non-verbal cognitive impairment in
ischaemic stroke. Cerebrovasc Dis 2000;10:455-461.
4. Code C. Aphasia. In: Damico JS, Muller N, Ball MJ, eds. The handbook of speech and language disorders.
West Sussex, UK: Wiley-Blackwell; 2010: 317-338.
5. Williams LS, Weinberger M, Harris LE et al. Development of a stroke-specific quality of life scale. Stroke
6. Duncan PW, Wallace D, Lai SM et al. The stroke impact scale version 2.0: Evaluation of reliability, validity,
and sensitivity to change. Stroke 1999;30:2131-2140.
7. Doyle PJ, McNeil MR, Mikolic JM et al. The Burden of Stroke Scale (BOSS) provides valid and reliable score
estimates of functioning and well-being in stroke survivors with and without communication disorders. J Clin
8. Hilari K, Byng S, Lamping DL et al. Stroke and Aphasia Quality of Life Scale-39 (SAQOL-39): evaluation of
acceptability, reliability, and validity. Stroke 2003;34:1944-1950.
9. Long A, Hesketh A, Paszek G et al. Development of a reliable self-report outcome measure for pragmatic trials
of communication therapy following stroke: the Communication Outcome after Stroke (COAST) scale. Clin
10. Chue WL, Rose ML. The reliability of the Communication Disability Profile: A patient-reported outcome
measure for aphasia. Aphasiology 2010;24:940-956.
11. Glueckauf RL, Blonder LX, Ecklund-Johnson E et al. Functional outcome questionnaire for aphasia: Overview
and preliminary psychometric evaluation. NeuroRehabilitation 2003;18:281-290.
12. Paul DR, Fratalli CM, Holland AL et al. Quality of communication life scale. Rockville, MD: The American
Speech-Language-Hearing Association; 2004.
13. Hilari K, Owen S, Farrelly SJ. Proxy and self-report agreement on the Stroke and Aphasia Quality of Life
Scale-39. J Neurol Neurosurg Psychiatry 2007;78:1072-1075.
14. Cruice M, Worrall L, Hickson L et al. Measuring quality of life: Comparing family members' and friends'
ratings with those of their aphasic partners. Aphasiology 2005;19:111-129.
15. Hesketh A, Long A, Bowen A. Agreement on outcome: Speaker, carer, and therapist perspectives on functional
communication after stroke. Aphasiology 2010;25:291-308.
16. Duncan PW, Lai SM, Tyler D et al. Evaluation of proxy responses to the Stroke Impact Scale. Stroke
17. Williams LS, Bakas T, Brizendine E et al. How valid are family proxy assessments of stroke patients' health-
related quality of life? Stroke 2006;37:2081-2085.
18. Sneeuw KC, Aaronson NK, de Haan RJ et al. Assessing quality of life after stroke. The value and limitations of
proxy ratings. Stroke 1997;28:1541-1549.
19. Rautakoski P, Korpijaakko-Huuhka A-M, Klippi A. People with severe and moderate aphasia and their partners
as estimators of communicative skills: A client-centred evaluation. Aphasiology 2008;22:1269-1293.
20. Skolarus LE, Sanchez BN, Morgenstern LB et al. Validity of proxies and correction for proxy use when
evaluating social determinants of health in stroke patients. Stroke 2010;41:510-515.
21. Doyle PJ, McNeil MR, Hula WD et al. The Burden of Stroke Scale (BOSS): Validating patient-reported
communication difficulty and associated psychological distress in stroke survivors. Aphasiology 2003;17:291-
22. Lomas J, Pickard L, Bester S et al. The communicative effectiveness index: development and psychometric
evaluation of a functional communication measure for adult aphasia. J Speech Hear Disord 1989;54:113-124.
23. Long A, Hesketh A, Bowen A. Communication outcome after stroke: a new measure of the carer's perspective.
Clin Rehabil 2011;23:846-856.
24. Meredith W, Teresi JA. An essay on measurement and factorial invariance. Med Care 2006;44:S69-S77.
25. Meredith W. Measurement invariance, factor analysis and factorial invariance. Psychometrika 1993;58:525-
26. Borsboom D. The attack of the psychometricians. Psychometrika 2006;71:425-440.
27. Vandenberg RJ, Lance CE. A review and synthesis of the measurement invariance literature: Suggestions,
practices, and recommendations for organizational research. Organizational Research Methods 2000;3:4-70.
28. Gregorich SE. Do Self-Report Instruments Allow Meaningful Comparisons Across Diverse Population
Groups? Testing Measurement Invariance Using the Confirmatory Factor Analysis Framework. Med Care
29. Baas KD, Cramer AO, Koeter MW et al. Measurement invariance with respect to ethnicity of the Patient Health
Questionnaire-9 (PHQ-9). J Affect Disord 2011;129:229-235.
30. Sousa RM, Dewey ME, Acosta D et al. Measuring disability across cultures--the psychometric properties of the
WHODAS II in older people from seven low- and middle-income countries. The 10/66 Dementia Research
Group population-based survey. Int J Methods Psychiatr Res 2010;19:1-17.
31. Rivera-Medina CL, Caraballo JN, Rodriguez-Cordero ER et al. Factor structure of the CES-D and measurement
invariance across gender for low-income Puerto Ricans in a probability sample. J Consult Clin Psychol
32. Heckman BD, Berlin KS, Watakakosol R et al. Psychosocial headache measures in Caucasian and African
American headache patients: psychometric attributes and measurement invariance. Cephalalgia 2011;31:222-
33. Coster WJ, Haley SM, Ludlow LH et al. Development of an applied cognition scale to measure rehabilitation
outcomes. Arch Phys Med Rehabil 2004;85:2030-2035.
34. Haley SM, Coster WJ, Andres PL et al. Activity outcome measurement for postacute care. Med Care
35. Zhang B, Fokkema M, Cuijpers P et al. Measurement invariance of the center for epidemiological studies
depression scale (CES-D) among chinese and dutch elderly. BMC Med Res Methodol 2011;11:74.
36. Taylor ML. A measurement of functional communication in aphasia. Arch Phys Med Rehabil 1965;46:101-107.
37. Blomert L, Kean M-L, Koster C et al. Amsterdam-Nijmegen Everyday Language Test: Construction, reliability
and validity. Aphasiology 1994;8:381-407.
38. Lincoln NB. The speech questionnaire: An assessment of functional language ability. International
Rehabilitation Medicine 1982;4:114-117.
39. Holland AL, Frattali C, Fromm D. Communication activities of daily living. 2nd ed. Austin, TX: Pro-Ed; 1999.
40. Bayles KA, Tomoeda CK. Functional linguistic communication inventory. Phoenix: Canyonlands; 1994.
41. Frattali CM, Thompson CK, Holland AL et al. The American Speech-Language-Hearing Association functional
assessment of communication skills for adults (ASHA FACS). Rockville, MD: American Speech-Language
Hearing Association; 1995.
42. Holland AL. Communicative activities in daily living. Baltimore: University Park Press; 1980.
43. Wirz S, Skinner C, Dean E. Revised Edinburgh Functional Communication Profile. Tucson, AZ:
Communication Skill Builders; 1990.
44. Frattali CM. Functional assessment of communication: Merging public policy with clinical views. Aphasiology
45. Doyle PJ, McNeil MR, Le K et al. Measuring communicative functioning in community dwelling stroke
survivors: Conceptual foundation and item development. Aphasiology 2008;22:718-728.
46. Bayles KA, Tomoeda CK. Arizona Battery for Communication Disorders of Dementia. Tucson, AZ:
Canyonlands Publishing, Inc.; 1993.
47. Sheikh JI, Yesavage JA. Geriatric Depression Scale (GDS): Recent Evidence and Development of a Shorter
Version. In: Brink TL, ed. Clinical Gerontology: A Guide to Assessment and Intervention. New York:
Hawthorn Press; 1986: 165-173.
48. Porch B. Porch index of communicative ability. Albuquerque, NM: PICA Programs; 2001.
49. Muthén LK, Muthén BO. Mplus User's Guide. Fifth ed. Los Angeles, CA: Muthén & Muthén; 2007.
50. South SC, Krueger RF, Iacono WG. Factorial Invariance of the Dyadic Adjustment Scale across Gender.
Psychol Assess 2009;21:622-628.
51. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical
measurement. Lancet 1986;1:307-310.
52. McHorney CA, Fleishman JA. Assessing and understanding measurement equivalence in health outcome
measures. Issues for further quantitative and qualitative inquiry. Med Care 2006;44:S205-S210.
53. Nunnally JC, Bernstein IH. Psychometric theory. 3rd ed. New York: McGraw-Hill; 1994.
54. Doyle PJ, Hula WD, McNeil MR et al. An application of Rasch analysis to the measurement of communicative
functioning. J Speech Lang Hear R 2005;48:1412-1428.
55. Donovan NJ, Rosenbek JC, Ketterson TU et al. Adding meaning to measurement: Initial Rasch analysis of the
ASHA FACS Social Communication Subtest. Aphasiology 2006;20:362-373.
56. Rodriguez A, Donovan NJ, Velozo CA et al. Measurement properties of the functional outcomes questionnaire
for aphasia. Presented at the Clinical Aphasiology Conference, Scottsdale, AZ; 2007.
57. Schuell H, Jenkins JJ, Carrol JB. A factor analysis of the Minnesota Test for the Differential Diagnosis of
Aphasia. J Speech Hear Res 1962;5:350-369.
58. Clark C, Crockett DJ, Klonoff H. Factor analysis of the porch index of communication ability. Brain Lang
Fig. 1 Bland-Altman plots for each ACOM scale. The plots for Talking and Writing demonstrate a weak but
significant tendency for SUR respondents to give more extreme scores than PWA respondents. The dashed lines in
each plot mark the 95% CI about the assumption of null PWA-SUR difference. Points outside these lines indicate
significant disagreement at p < 0.05
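The limits plotted in Fig. 1 follow the standard Bland-Altman construction [51]: the mean PWA-SUR difference plus or minus 1.96 standard deviations of the paired differences. A minimal sketch of that computation, using made-up paired scores (the variable names and data below are illustrative, not the study data):

```python
import numpy as np

def bland_altman_limits(pwa_scores, sur_scores):
    """Mean difference (bias) and 95% limits of agreement for paired
    patient (PWA) and surrogate (SUR) scores, per Bland & Altman."""
    pwa = np.asarray(pwa_scores, dtype=float)
    sur = np.asarray(sur_scores, dtype=float)
    diff = sur - pwa                 # signed SUR-PWA disagreement
    mean_diff = diff.mean()          # systematic bias between raters
    sd_diff = diff.std(ddof=1)       # spread of the disagreements
    # 95% limits of agreement: bias +/- 1.96 SD of the differences
    lower = mean_diff - 1.96 * sd_diff
    upper = mean_diff + 1.96 * sd_diff
    return mean_diff, lower, upper

# Hypothetical paired scores for illustration only
pwa = [52, 47, 60, 55, 49, 63, 58, 51]
sur = [50, 45, 64, 53, 46, 66, 57, 48]
bias, lo, hi = bland_altman_limits(pwa, sur)
```

Points falling outside (lo, hi) would mark PWA-SUR pairs whose disagreement exceeds what is expected under the null assumption of no systematic difference, which is how the flagged points in Fig. 1 are identified.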
Table 1. Demographic characteristics of the study sample.

                                             Persons with Aphasia   Surrogates
Age in Years, mean (sd)                      60 (14)                60 (13)
Gender, % male                               63%                    26%
Race
  African American                           9%
  Native American                            1%
Education
  Primary/Middle School                      0%                     8%
  High School                                24%                    22%
  Some College                               34%                    35%
  College Graduate                           20%                    18%
  Post-Graduate Degree                       14%                    17%
  Missing                                    9%                     0%
Surrogate Relationship with PWA
  Other Relative                             1%
Frequency of Contact with PWA
Length of Relationship in Years, mean (sd)   38.5 (15)
Currently Married or Cohabitating            64%
Divorced or Separated                        22%
Never Married                                11%
Table 2. Clinical characteristics of the participants with aphasia.

Months Post-Onset of Aphasia, median (min-max)    49 (1-507)
PICA Overall Score Percentile, median (min-max)   67 (18-92)
Concomitant Motor Speech Disorder, % of PWA       44%
BDAE Severity Rating
Table 3. Factor correlations from exploratory factor analyses of 101 ACOM items in the PWA and SUR samples.
The PWA correlations are presented below the diagonal and the SUR correlations are presented above the diagonal.

               Talking   Writing   Comprehension
Talking        —         0.48*     0.46*
Writing        0.42*     —         0.23*
Comprehension  0.49*     0.13      —

*p < 0.05
Table 4. Results of factorial invariance testing for the ACOM Talking domain.

Model                                        Free params   DIFFTEST χ2 p   Comparison   CFI     TLI     RMSEA   WRMR
1. Configural: Loadings & thresholds free    165           na              na           0.971   0.988   0.065   0.877
2. Strong: Loadings & thresholds equal       121           0.011           1            0.970   0.988   0.065   0.902
2a. Partial Strong                           126           0.136           1            0.971   0.988   0.064   0.893
3. Strict: Residual variances equal          99            0.150           2a           0.973   0.989   0.062   0.969
4. Equal means                               98            0.913           3            0.974   0.989   0.062   0.969
5. Equal variances                           97            <0.001          4            0.959   0.983   0.081   1.104
Table 5. Results of factorial invariance testing for the ACOM Comprehension domain.

Model                                        Free params   DIFFTEST χ2 p   Comparison   CFI     TLI     RMSEA   WRMR
1. Configural: Loadings & thresholds free    69            na              na           0.992   0.995   0.030   0.715
2. Strong: Loadings & thresholds equal       52            0.263           1            0.991   0.995   0.032   0.770
3. Strict: Residual variances equal          42            0.139           2            0.988   0.993   0.037   0.845
4. Equal means                               41            0.550           3            0.989   0.993   0.035   0.848
5. Equal variances                           40            0.329           4            0.991   0.994   0.034   0.868
Table 6. Results of factorial invariance testing for the ACOM Writing domain.

Model                                        Free params   DIFFTEST χ2 p   Comparison   CFI     TLI     RMSEA   WRMR
1. Configural: Loadings & thresholds free    111           na              na           0.981   0.993   0.070   0.805
2. Strong: Loadings & thresholds equal       79            <0.001          1            0.976   0.992   0.074   0.864
2a. Partial Strong                           84            0.097           1            0.980   0.993   0.069   0.832
3. Partial Strict: Residual variances equal  70            0.241           2a           0.984   0.994   0.065   0.911
4. Equal means                               69            0.072           3            0.983   0.994   0.066   0.920
5. Equal variances                           68            0.040           4            0.981   0.993   0.072   0.970
Table 7. Results of factorial invariance testing across comprehension severity groups.

Scale           Model                                       Free params   DIFFTEST χ2 p   Comparison   CFI     TLI     RMSEA   WRMR
Talking         1. Configural: Loadings & thresholds free   140           na              na           0.965   0.974   0.098   1.194
                2. Strong: Loadings & thresholds equal      96            0.625           1            0.968   0.978   0.091   1.239
Comprehension   1. Configural: Loadings & thresholds free   56            na              na           0.994   0.995   0.032   0.876
                2. Strong: Loadings & thresholds equal      40            0.719           1            1.000   1.001   0.000   0.967
Writing         1. Configural: Loadings & thresholds free   96            na              na           0.980   0.985   0.106   1.109
                2. Strong: Loadings & thresholds equal      64            0.048           1            0.978   0.985   0.106   1.260
                2a. Partial Strong                          66            0.088           2            0.979   0.985   0.103   1.239
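The nested-model comparisons in Tables 4-7 were made with Mplus's DIFFTEST procedure, which adjusts the chi-square difference for WLSMV estimation. As a rough illustration of the underlying logic only, the ordinary maximum-likelihood version of a chi-square difference test can be sketched as follows (the fit statistics in the example are hypothetical, not taken from the study):

```python
from scipy.stats import chi2

def chi_square_difference(chisq_constrained, df_constrained,
                          chisq_free, df_free):
    """Ordinary chi-square difference test between two nested models.

    The constrained model (e.g., strong invariance) must be nested in
    the freely estimated model (e.g., configural invariance). A small
    p-value suggests the added constraints significantly worsen fit.
    Note: this plain difference is valid only for ML chi-square values;
    WLSMV estimation requires the adjusted DIFFTEST procedure instead.
    """
    d_chisq = chisq_constrained - chisq_free
    d_df = df_constrained - df_free
    p_value = chi2.sf(d_chisq, d_df)  # upper-tail probability
    return d_chisq, d_df, p_value

# Hypothetical fit statistics for illustration only
d, ddf, p = chi_square_difference(chisq_constrained=110.0, df_constrained=50,
                                  chisq_free=100.0, df_free=45)
```

A nonsignificant p-value (as in most comparisons in Tables 4-7) supports retaining the more constrained, more parsimonious model.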
Online Resource 1
Article: Self- and Surrogate Reported Communicative Functioning in Aphasia
Journal: Quality of Life Research
Authors: Patrick J. Doyle, William D. Hula, Shannon N. Austermann Hula, Clement A. Stone, Julie L. Wambaugh, Katherine B. Ross,
James G. Schumacher
Corresponding Author: William D. Hula, VA Pittsburgh Healthcare System, email@example.com
Table. Item content and factor loadings for the 101 items included in the exploratory factor analyses described in the Analysis and
Results section. Only loadings ≥ 0.4 are shown. Items retained in the three 1-factor models for the Talking, Comprehension, and
Writing domains are indicated by bolded item content and factor loadings.
Item Content PWA
talk about your health concerns
with family members
talk to someone you don't know 0.847 0.861
start a conversation with other
speak to family members and
friends on the phone
ask questions to get information 0.778 0.802
explain how to do something 0.776 0.574
talk about your day with family or
start a new topic in conversation 0.759 0.917
talk about your hobbies and
talk with a group of people 0.749 0.761
find the words you want to say
make yourself understood when you
speak with strangers
correct yourself when people do not
make yourself understood when you
speak with family or friends
talk if you are stressed or under
ask for information over the phone 0.722 0.682
talk on the telephone 0.721 0.763
share opinions 0.719 0.911
have a conversation with family and
say your name 0.686 0.624
say the names of food items 0.678 0.776
make appointments on the phone 0.674 0.557 0.448
talk about your past (e.g., childhood,
tell people about yourself 0.667 0.862
communicate at family gatherings 0.661 0.710
correct mistakes you make when
tell a story 0.660 0.780
say the names of clothing items 0.651 0.766
talk about current events that you are 0.646 0.655
say “thank you" and "you're
keep a conversation going 0.637 0.954
say the names of body parts 0.628 0.620
talk about your future plans with
family or friends
introduce yourself 0.596 0.734
answer questions about yourself 0.590 0.702
tell people how you feel 0.572 0.642
talk to your closest family member or
read words aloud 0.519 0.440
greet people appropriately (e.g., Hi,
how are you?)
0.499 0.459 0.434
say the names of common objects
(e.g., bed, lamp, pencil)
respond to greetings 0.486 0.421 0.573
call friends by name 0.474 0.691
order food in a restaurant 0.439
tell people what you like and dislike 0.420 0.471
get your point across when you are
upset or angry
understand someone you don't know 0.414 0.528
say your address 0.413 0.603 0.426
say what day of the week it is 0.406 0.591
say what month it is 0.400 0.648
write a simple “to do" list
write a shopping list 0.877 0.778
write simple messages
write messages in greeting cards
write your social security number
write your address
fill out simple forms
write a personal letter
write your phone number 0.711 0.903 0.401
write your name
write down a phone message
fill out complex forms 0.616 0.697
communicate by e-mail
dial a telephone number
buy things at a store 0.479 0.420
use a calendar to plan and keep track
0.454 0.587 0.506
understand medicine labels 0.428 0.522 0.541 0.548
recognize the names of common
objects when someone says them
understand popular sayings (e.g., It's
raining cats and dogs)
recognize your name in print 0.602 0.769 0.562
recognize the names of family
members when someone says
follow TV shows
follow simple spoken requests (e.g.,
pass the salt)
tell time 0.456 0.711 0.627
recognize your name when called 0.407 0.693 0.680
understand price tags 0.450 0.680 0.655
follow spoken instructions
follow conversation about familiar
follow a story someone tells
understand restroom signs 0.580 0.626 0.622
understand what the doctor tells
understand a single written word 0.595 0.669
read traffic signs 0.519 0.579 0.482
understand jokes and funny stories
follow simple written instructions 0.426 0.550 0.470
express agreement or disagreement 0.550
understand your closest family
member or friend when they
talk to you
follow spoken directions 0.512 0.484
communicate your basic needs
(hunger, restroom, pain,
let people know if you understand
follow group conversation
follow conversation about
call family members by name 0.587
ask for clarification when you do not
make your wants and needs known 0.443
understand a fast-paced conversation 0.847
understand people when you are
stressed or under pressure
answer yes/no questions 0.544
recognize when people do not 0.540
Online Resource 2
Article: Self- and Surrogate Reported Communicative Functioning in Aphasia
Journal: Quality of Life Research
Authors: Patrick J. Doyle, William D. Hula, Shannon N. Austermann Hula, Clement A. Stone,
Julie L. Wambaugh, Katherine B. Ross, James G. Schumacher
Corresponding Author: William D. Hula, VA Pittsburgh Healthcare System,
Table. Item content and proportion of cases with missing data for items excluded from the
analyses based on the missing data exclusion criterion of >5%.
Item Content                                               PWA missing   SUR missing
read bus/train schedules 0.58 0.72
talk to your minister, priest, or rabbi about things that
buy things on the internet 0.53 0.62
get help in an emergency 0.53 0.57
give a speech to a group of people 0.44 0.57
use the computer for work/school tasks 0.44 0.47
write reports for work/school 0.41 0.43
take notes during meetings/classes 0.36 0.40
meet the communication needs of your job or school 0.35 0.33
follow written work procedures 0.32 0.37
understand work/school documents 0.32 0.37
follow religious services 0.31 0.38
contribute to class/business discussions 0.31 0.34
have a conversation with coworkers/classmates at work
use the internet to get information 0.29 0.38
use an ATM machine 0.28 0.41
write a business letter 0.27 0.32
understand business/class discussions 0.27 0.27
understand computer icons 0.26 0.38
use a computer at home 0.26 0.33
follow radio news programs 0.23 0.46
pick out greeting cards for different occasions 0.20 0.31
use a road map 0.20 0.44
read names on a ballot to vote 0.20 0.30
read grocery store coupons and flyers 0.17 0.37
understand a TV schedule 0.17 0.21
make business calls 0.17 0.19
look up a number in the telephone book 0.14 0.39
follow an instruction manual 0.14 0.26
talk about books that you have read 0.14 0.23
order something over the phone 0.14 0.12
discuss your medications with the pharmacist 0.13 0.31
pay bills 0.13 0.20
read a book for pleasure 0.12 0.20
talk about politics 0.12 0.11
tell a joke 0.11 0.19
make transactions with a bank teller 0.09 0.26
manage your personal finances 0.08 0.14
read product labels 0.08 0.18
understand legal documents, such as a will or advanced
use a credit/debit card to buy things 0.07 0.17
use cash to buy things 0.07 0.08
ask for help from family or friends 0.07 0.04
read food labels 0.06 0.18
understand medical insurance information 0.06 0.17
understand your bank/credit card statements 0.06 0.14
count change at the store 0.05 0.14
follow TV news programs 0.05 0.09
follow therapy instructions 0.05 0.09
leave a message on an answering machine 0.05 0.07
follow driving directions 0.05 0.17
make small talk with neighbors 0.05 0.06
ask for information from store employees 0.04 0.20
introduce friends by name 0.04 0.11
discuss family matters with your spouse and children 0.04 0.07
follow movies 0.04 0.05
read street name signs 0.03 0.10
talk about current/previous work 0.03 0.08
talk about movies that you have seen 0.03 0.05
have a conversation with strangers 0.03 0.05
spell your whole name out loud 0.02 0.26
understand newspaper headlines 0.02 0.10
understand humor in pictures (e.g., comics, photographs) 0.02 0.09
introduce family members by name 0.02 0.06
read sentences aloud 0.02 0.11
understand warning signs (e.g., slippery floor, "do not
add and subtract 0.02 0.09
read signs in a store to find what you need 0.02 0.08
say your social security number 0.02 0.07
understand conversation in a noisy place (e.g., party,
tell people why you can't talk very well 0.01 0.10
recognize your address when someone says it 0.01 0.05
understand magazine/newspaper articles 0.01 0.05
explain your health concerns to your doctor 0.00 0.09
explain how to get somewhere 0.00 0.05
say your phone number 0.00 0.05