Validity of DSM-IV Axis V Global Assessment of Relational Functioning Scale

The Journal of Nervous and Mental Disease 197(1):50-55 · January 2009
DOI: 10.1097/NMD.0b013e3181923ca1 · Source: PubMed

ORIGINAL ARTICLE
Validity of DSM-IV Axis V Global Assessment of Relational
Functioning Scale
A Multimethod Assessment
Michelle B. Stein, PhD,* Mark Hilsenroth, PhD,† Janet H. Pinsker-Aspen, PhD,†
and Louis Primavera, PhD‡
Abstract: We investigate the convergent validity of the DSM-IV Axis V
Global Assessment of Relational Functioning Scale (GARF; American
Psychiatric Association, 1994). This study included 79 patients at a
university-based outpatient treatment clinic. We examined clinician-rated
GARF scores and their relationship to self-reported (Inventory of
Interpersonal Problems; IIP-C; Horowitz et al. 2000) and free response themes
[Social Cognition and Object Relations Scale: SCORS; Hilsenroth, Stein &
Pinsker, 2004; Westen, 1995] of interpersonal functioning. Clinician ratings
of the GARF scale and SCORS variables were highly reliable and internally
consistent. Convergent validity among the GARF, SCORS, and IIP scores was
examined using a Principal Components Analysis and Confirmatory Factor
Analysis (CFA). Results of the Principal Components Analysis revealed that
the GARF, SCORS, and IIP scores converged on a single factor, although
findings of the CFA did not fully confirm the 1-factor model originally
proposed. Intercorrelations among the GARF, SCORS, and IIP variables were
analyzed and a pattern of significant relationships was found between the
GARF and SCORS variables. This study helps support the convergent validity
of the GARF as a relational functioning measure and is one of the first
investigations to examine this scale multidimensionally.
Key Words: multimethod assessment, relational functioning, SCORS,
GARF, IIP.
(J Nerv Ment Dis 2009;197:50-55)
The Global Assessment of Relational Functioning Scale (GARF)
was originally developed as a way to separate relational functioning
from overall psychologic functioning and symptomatology
as measured by the Global Assessment of Functioning Scale (GAF;
APA, 1994; GAP, 1996; Yingling et al. 1998). In addition, the
GARF was initially designed as a measure of relational health and
dysfunction in couples, as well as in family systems, and not
originally intended to assess individual pathology (Ross & Doherty,
2001). The GARF is a clinician-rated measure that is scored on a 0
to 100 point scale where lower scores are more indicative of
maladaptive relational patterns with family, friends, or significant
others. The GARF allows the clinician to assess the degree to which
a family or other ongoing relational unit meets the affective or
instrumental needs of its members in the areas of Problem Solving
(adaptability to stress, communication skills, negotiating goals),
Organization (distribution of power, control, responsibility, and the
role of interpersonal boundaries), and Emotional Climate (tone as
well as a range of feelings, quality of caring, empathy, attachment,
values, respect, sexual functioning, and role of intimacy; APA,
1994, p. 758).
One of the first studies to examine the construct validity of the
GARF was conducted by the Group for the Advancement of Psychiatry
Committee on the Family (GAP) that rated families and couples on
relational severity and demonstrated that lower, more maladaptive,
GARF scores were associated with families and couples who had more
severe interpersonal concerns (GAP, 1996). Likewise, Dausch et al.
(1996) had experienced master’s-level therapists rate videotaped sessions
of patients with varying mood disorders, as well as the patients’ families,
and found GARF scores to be independent of mood severity.
Thus, both of these initial studies demonstrated that, as intended, the
GARF was measuring something other than overall psychological
functioning or symptomatology.
Ross and Doherty (2001) used the GARF as a treatment
outcome measure for couples and families and found that GARF
change scores were positively correlated with patient and therapist
reported changes and client satisfaction. Additionally, pretreatment
GARF negatively correlated with number of sessions. Additional
research has focused on the reliability of the GARF ratings across
levels of clinician experience and/or education (marriage and family
therapy interns and supervisors; Rosen et al. 1997). Findings
demonstrated that higher (i.e., healthier) supervisor GARF ratings were
associated with fewer sessions and were also able to differentiate between
“at risk” and “nonrisk” groups for dangerousness, violence, and
child abuse. Mottarella et al. (2001) also found that doctoral student
clinicians can reliably rate the GARF across families or couples.
Extending this previous work, Wilkins and White (2001) found that
suitably trained master’s and undergraduate coders achieved excellent
levels of reliability, and that GARF ratings correlated with
self-reported measures of general family functioning, marriage quality,
and the ability to identify behaviors and symptoms.
More recently, clinical psychologists used the GARF to
assess patients seeking individual therapy. Hay et al. (2003) con-
ducted a 2-year follow-up study evaluating outcome, including
patient satisfaction. Results indicated higher GARF ratings were
associated with participants who had regular weekly family contact
since admission and were also reflective of patients who had greater
social support, or more family availability. Hilsenroth et al. (2000)
examined the reliability and convergent/discriminant validity of the
GARF on a sample of community outpatients. High interrater
reliability was established and results indicated that the GARF was
significantly related to clinician ratings of axis II pathology. In a
related study examining treatment outcome, Hilsenroth et al.
(2003) examined GARF ratings of patients diagnosed with de-
pression and found that GARF as well as patient self-report
scores significantly improved during treatment.
From the *Massachusetts General Hospital and Harvard Medical School,
Psychology Assessment Center, 1 Bowdoin Square, 7th Floor, Boston,
Massachusetts; †Derner Institute of Advanced Psychological Studies, Adelphi
University, Garden City, New York; and ‡Graduate School of Psychology, Touro
College, New York, New York.
Send reprint requests to Michelle Stein, PhD, Massachusetts General Hospital and
Harvard Medical School, Psychology Assessment Center, 1 Bowdoin Square,
7th Floor, Boston, Massachusetts, 02114-2919. Email: mstein@partners.org.
Copyright © 2009 by Lippincott Williams & Wilkins
ISSN: 0022-3018/09/19701-0050
DOI: 10.1097/NMD.0b013e3181923ca1
The Journal of Nervous and Mental Disease, Volume 197, Number 1, January 2009
Currently, the convergent validity of the DSM-IV Axis V
scales has been examined using an expanded version of the Social
Cognition and Object Relations Scale (SCORS; Hilsenroth et al.
2004; Westen, 1995). Peters et al. (2006) rated relational narratives
in psychotherapy sessions using the GARF and results indicated a
significant relationship with the SCORS variables. Fowler et al.
(2004) investigated treatment outcome of intensive treatment using
both behavioral and implicit measures. Lower, more maladaptive,
GARF scores were associated with higher, more maladaptive, pri-
mary process aggression, enmeshment and lower health index
scores. Also higher, more adaptive, SCORS ratings were reflective
of higher, more adaptive, GARF ratings and GARF scores changed
significantly between pre- and posttreatment. To summarize, these
studies established the reliability and convergent validity of the
GARF and SCORS as measures of relational functioning rated from
narratives told in patient treatment and demonstrate that the GARF
is a sensitive measure that can be used to assess change in individual
therapy.
The present study intends to further investigate the GARF as
a measure of overall interpersonal functioning. We also aim to
take a multimethod approach to assessing relational functioning, using
clinician-rated (Therapist and External Rated GARF), self-report
(Inventory of Interpersonal Problems; IIP-C; Horowitz et al. 2000), and
free response narrative (SCORS) measures. This study is unique in
2 ways. First, the degree to which a clinician-rated measure of
relational functioning (GARF) is related to self-reported interpersonal
distress (IIP-C) and thematic narratives evoked in early childhood
recollections (SCORS) will be assessed. Second, there has
been limited research focusing on multimethod assessment, specifically
using implicit and explicit measures to assess various facets of
relational functioning.
METHODS
Participants
In regard to any future review or meta-analysis, it is important
to note that although the present study represents different analyses
with a larger sample, the sample in Hilsenroth et al. (2000) represents
a subsample of the data presented here. Participants in this
study included 79 patients at a university-based outpatient treatment
clinic. As can be observed in Table 1, patients were predominantly
female and single. The mean age for the sample was 29.4 (SD = 10.7).
This sample consisted of primarily mood disordered patients with
relational problems manifested in either Axis II or subclinical
traits/features of Axis II. Finally, 3 patients were not included in the
combined analyses among the IIP, GARF, and SCORS as they did
not complete the IIP. However, the GARF and SCORS ratings of
these patients were used in analyses.
Procedure
Patients entering treatment were asked to participate in a
research project and no one was excluded based on particular
diagnosis, comorbidity, etc. Therapists were assigned cases based on
availability, caseload, etc. as is routine practice for an outpatient
clinic. Patients who agreed to participate in this project filled out an
informed consent before engaging in the research study.
The clinicians who conducted the psychologic assessment
and psychotherapy sessions were 18 advanced doctoral students (6
men and 12 women) enrolled in an APA-approved Clinical PhD
program. The clinicians who completed the assessment received a
minimum of 3.5 hours of supervision per week on assessment data,
clinical interventions, organization of feedback session, and weekly
review of videotaped case material. Further details of the methodology
and procedures used in this assessment process are described
more fully elsewhere (Hilsenroth, 2007).
GARF raters were 2 advanced graduate students enrolled in
an APA approved Clinical Psychology doctoral program (or 1
advanced doctoral student and 1 clinical psychologist). Before
completing GARF ratings in the current study, coders underwent
supervised training on the scale. External raters and clinicians rated
the GARF after training took place. The SCORS was used to rate
early childhood recollections from each patient. SCORS raters were
2 advanced graduate students enrolled in an APA approved Clinical
Psychology doctoral program. Details regarding the establishment of
the SCORS interrater reliability obtained in this study are described
elsewhere (Stein et al. 2007).
Measures
GARF ratings were documented on a DSM-IV Multiaxial
Evaluation Report Form (MERF, APA, 1994) and were based on
video-recorded semistructured interviews during the assessment and
feedback session. The patient’s therapist and an external rater scored
the GARF scale. Three GARF ratings were calculated (Therapist,
External Rater, and Combined Score). External Raters, who were
advanced graduate students and/or a PhD clinical supervisor,
watched videotape of psychotherapy sessions and completed the
GARF immediately. They were blind to all clinician ratings. Spe-
cific early childhood narratives (8 in total) were elicited at the end of
the clinical interview (Fowler et al. 1995). Early childhood recol-
lections were obtained from the patient verbally. That is, the clini-
cian was present and prompted the patient for specific queries and
relevant themes. This material was videotaped and transcribed
verbatim. Additionally, the therapist’s supervisor reviewed the vid-
eotape to ensure that the early childhood recollections were being
administered correctly and consistently. The Inventory of Interpersonal
Problems, along with other self-report measures, was given in the session
after the initial intake interview.
TABLE 1. Demographic Information (N = 79)
Variable                          N      %
Gender
  Male                            23     29
  Female                          56     71
Mean age (SD)                     29.4 (10.7)
Marital status
  Single                          49     62
  Married                         15     19
  Divorced                        14     18
  Widowed                         1      1
Primary axis I diagnosis
  Adjustment disorder             9      11
  Anxiety disorder                12     15
  Eating disorder                 2      2
  Mood disorder                   44     56
  Substance-related disorder      1      1
  V Code relational problem       10     13
  None                            1      1
Axis II diagnosis                 46     58
Axis II trait/features            13     16
Psychiatric severity              M      SD
  Intake axis V GAF               59.8   5.9
  SCL-GSI                         1.1    0.6
SCL-GSI indicates Global Severity Index of the Symptom Checklist 90 Revised.
Data Analysis
The interrater reliability of the GARF and SCORS was
calculated using the One-Way Random Effects Model Intraclass
correlation coefficient (ICC), the Spearman Brown Corrected ICC (One-Way
Random Effects Model), and Coefficient Alpha. Shrout and
Fleiss (1979) reported guidelines for interpreting the magnitude of ICC
values, where poor is less than 0.40, fair ranges from 0.40 to 0.59, good
ranges from 0.60 to 0.74, and excellent is above 0.74.
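As a rough illustration of the reliability indices named above, the following Python sketch computes a one-way random effects ICC for a single rater from a subjects-by-raters table, applies the Spearman Brown correction for 2 raters, and labels a value using the cutoffs quoted in the text. The function names and the small ratings table are hypothetical and are not taken from the study data or from the software the authors used.

```python
from statistics import mean


def icc_one_way(ratings):
    """One-way random effects ICC for a single rater, ICC(1).

    `ratings` holds one row per subject; each row holds k ratings.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-subjects and within-subjects mean squares (one-way ANOVA).
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def spearman_brown(icc_single, k=2):
    """Reliability of the mean of k raters, given the single-rater ICC."""
    return k * icc_single / (1 + (k - 1) * icc_single)


def interpret_icc(value):
    """Label an ICC (rounded to 2 decimals) with the cutoffs quoted in the text."""
    v = round(value, 2)
    if v < 0.40:
        return "poor"
    if v <= 0.59:
        return "fair"
    if v <= 0.74:
        return "good"
    return "excellent"


# Hypothetical ratings: 5 patients, each scored by 2 raters on a 0-100 scale.
demo = [[50, 55], [40, 42], [65, 60], [55, 58], [45, 41]]
single = icc_one_way(demo)
print(interpret_icc(single), interpret_icc(spearman_brown(single)))

# The GARF-C ICC (1) of 0.71 in Table 2 corrects to about 0.83 for 2 raters.
print(round(spearman_brown(0.71), 2))
```

Applying `spearman_brown` to the other ICC (1) values in Table 2 gives a quick consistency check on the corrected column.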
A principal components analysis with orthogonal rotation was
conducted to examine whether these 3 different measures of relational
functioning formed a unitary factor. The number of factors
retained was determined by the inspection of eigenvalues, percent
variance, and scree plot. Bryant and Yarnold (1995) describe
variables with factor loading coefficients of at least 0.30 in absolute
value “as loading on the eigenvector” and thus as worthy of consideration
in the interpretation of the meaning of the eigenvector
(p. 106). Additionally, the Pearson r correlation was used to examine
the convergent validity and intercorrelations between each measure.
A Confirmatory Factor Analysis was then calculated to eval-
uate whether the Principal Components (PCA) factor model pro-
vided a good fit to the current data. CFA helps confirm and compare
factor structures from the PCA (Fabrigar et al. 1999; Floyd &
Widamen, 1995).
CFA was conducted with AMOS Version 5.0 using maximum
likelihood estimates derived from the covariance matrix. Several
statistics were used in examining the models as no single index can
adequately assess the goodness of fit of a measurement model alone
(Bollen, 1989; Hoyle & Panter, 1995; Hu & Bentler, 1995; Wilde et
al. 2003). The following model fit statistics were examined: Tucker-
Lewis Index (TLI; Tucker & Lewis, 1973), Comparative Fit Index
(CFI; Bentler, 1990), and the root-mean-square error of approximation
(RMSEA; Steiger, 1990). TLI typically ranges from 0.00 to 1.00, although
as a nonnormed index it can fall outside this range, and larger values
indicate a better fit. TLI and CFI are considered a good fit if scores are
above 0.90 (Bentler & Bonett, 1980). Wilde et al. (2003) posit that RMSEA
assesses “the discrepancies between elements of the model fitted to the
sample and the model fitted to the population covariance matrix” (p. 59).
Values below 0.08 indicate a reasonable model fit. The closer the value is
to zero, the better the model fit.
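To make the fit indices above concrete, here is a minimal Python sketch using the commonly cited chi-square based formulas for CFI, TLI, and RMSEA, together with the cutoffs stated in the text. The function names and the chi-square inputs are hypothetical. The RMSEA divisor of N - 1 is an assumption, since some programs use N instead, and the sketch does not claim to reproduce the exact computations performed by AMOS.

```python
import math


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index from model and baseline chi-square values."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)
    denom = max(d_model, d_base)
    return 1.0 if denom == 0 else 1.0 - d_model / denom


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis Index; nonnormed, so it can exceed 1 or fall below 0."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def rmsea(chi2_model, df_model, n):
    """Root-mean-square error of approximation (assumes an N - 1 divisor)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


def meets_cutoffs(tli_value, cfi_value, rmsea_value):
    """Apply the cutoffs quoted in the text: TLI and CFI > 0.90, RMSEA < 0.08."""
    return tli_value > 0.90 and cfi_value > 0.90 and rmsea_value < 0.08


# Hypothetical chi-square values, for illustration only.
example = dict(chi2_model=10.0, df_model=5, chi2_base=100.0, df_base=10)
print(round(cfi(**example), 3), round(tli(**example), 3))
print(round(rmsea(10.0, 5, n=79), 3))
```

Because each index summarizes misfit differently, `meets_cutoffs` requires all three criteria at once, in line with the point that no single index is sufficient on its own.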
RESULTS
Therapist, External Rated, and Combined (i.e., Therapist and
External Rater) GARF scores reflected a moderate range of pathology
(i.e., 40 to 60) within the sample, consistent with a clinical
outpatient sample (Table 2). The mean SCORS ratings of 3 to 4
reflect a mild to moderate range of pathology within the sample of
early childhood recollections (Table 2).
Interrater Reliability
ICCs were used to calculate reliability for GARF ratings.
Therapist, External Rater, and Combined scores were used in anal-
yses. Intraclass Correlation Coefficient (1) fell in the “good” range
for the GARF scale (Table 2). The Spearman Brown (1, 2) ICC
corrected value fell in the “excellent” range (Table 2). In summary,
the GARF ratings used in this study were highly reliable and
internally consistent.
Intraclass Correlation Coefficients (1) fell in the good to
excellent range for COM, AFF, Emotional Investment in Relationships
(EIR), SC, Experience and Management of Aggressive Impulses
(AGG), SE, and Identity and Coherence of Self (ICS), and in
the “fair” range for Emotional Investment in Values and Moral
Standards (EIM). The Spearman Brown (1, 2) ICC corrected values
fell in the excellent range for all SCORS variables. All Coefficient
Alphas were 0.75 or above (Table 2). The SCORS-Composite
(SCORS-C) score and individual SCORS variables were used in
analyses. SCORS-C is the average of the 8 SCORS variables. In
summary, the ratings of the SCORS variables used in this study were
highly reliable and internally consistent.
Convergent Validity Between GARF-C,
SCORS-C, and IIP-Total
A principal components analysis with orthogonal rotation was
conducted for SCORS-Composite, GARF-Combined (Therapist and
External Rater) and IIP-Total scores to examine whether these 3
different measures of relational functioning were a unitary factor.
Results indicated a 1-factor model that had an eigenvalue of 1.2
accounting for 42% of the variance. The primary loading was the
SCORS ratings of early childhood narratives (SCORS-C = 0.84,
GARF-C = 0.63, and IIP-Total = -0.40). Please note that the
negative IIP value is expected, as increased scores indicate greater
pathology whereas higher GARF and SCORS ratings are reflective
of less pathology. The 3 measures of relational functioning used in
this study converged on a single factor, thus supporting our hypothesis.
Results of intercorrelations demonstrated that GARF-C was
significantly related to SCORS-C (r = 0.23, P = 0.05). That is,
higher, more adaptive, combined GARF scores were reflective of
healthier SCORS narratives (Table 3).
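The first principal component described above can be approximated from the intercorrelations alone. The Python sketch below builds a correlation matrix from the rounded values reported in Table 3 and extracts the largest eigenvalue and its loadings by power iteration. This is only an approximation: the pairwise correlations are rounded and based on slightly different sample sizes (76 and 79), and power iteration is a stand-in chosen for a dependency-free example, not the procedure used by the authors' software. With these inputs the sketch yields an eigenvalue of about 1.26 (roughly 42% of the variance) and loadings close to the reported values.

```python
import math

# Rounded correlations from Table 3; order: GARF-C, SCORS-C, IIP-Total.
LABELS = ["GARF-C", "SCORS-C", "IIP-Total"]
R = [
    [1.00, 0.23, 0.08],
    [0.23, 1.00, -0.18],
    [0.08, -0.18, 1.00],
]


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def first_principal_component(matrix, iterations=2000):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


eigenvalue, vector = first_principal_component(R)
# Component loadings are eigenvector entries scaled by sqrt(eigenvalue).
loadings = [v * math.sqrt(eigenvalue) for v in vector]
explained = eigenvalue / len(R)

print(f"eigenvalue={eigenvalue:.2f}, variance explained={explained:.0%}")
for label, loading in zip(LABELS, loadings):
    print(f"{label}: {loading:+.2f}")
```

Dividing the eigenvalue by the number of variables gives the proportion of variance, because each standardized variable contributes a variance of 1 to the total.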
We then conducted a second analysis by separating GARF-
Combined into GARF-Therapist and GARF-External (GARF-V)
component ratings, as we wanted to examine whether the factor
structure would change by having different observer ratings (i.e.,
patient therapist vs. external rater). A principal components analysis
with orthogonal rotation was calculated between SCORS-C,
GARF-T, GARF-V, and IIP-Total. Factor 1 had an eigenvalue of
1.9 accounting for 47% of the variance and Factor 2 had an eigenvalue
of 1.1 accounting for 29% of the variance. Orthogonal factor
solutions indicated that Factor 1 had primary loadings for GARF-T
and GARF-V (0.95 and 0.91, respectively). SCORS-C also loaded on
this factor (0.31). Factor 2 had a primary loading for IIP-Total (0.83),
and the SCORS-C loading was also high (0.70). Results of intercorrelations
revealed that Therapist Rated GARF (GARF-T) was significantly
related to the SCORS composite score (SCORS-C; r = 0.27, P =
0.02). That is, higher, more adaptive ratings on the GARF were
reflective of healthier SCORS ratings (Table 3).
TABLE 2. Interrater Reliability of the GARF and SCORS
Variables for Early Childhood Recollections (N = 79; 569
Early Childhood Recollections)
Variable     Mean   SD     ICC (1)   ICC (1, 2)   Coefficient Alpha
GARF scales
  GARF-T     52.8   11.6
  GARF-V     49.2   9.3
  GARF-C     51.0   9.9    0.71      0.83         0.87
SCORS variables
  COM        3.2    0.60   0.60      0.75         0.75
  AFF        3.9    0.98   0.84      0.91         0.91
  EIR        3.7    0.91   0.71      0.83         0.83
  EIM        3.7    0.46   0.59      0.74         0.75
  SC         3.5    0.79   0.66      0.80         0.80
  AGG        3.7    0.54   0.72      0.84         0.84
  SE         3.7    0.61   0.63      0.77         0.78
  ICS        4.5    0.62   0.68      0.81         0.82
  MEAN       3.7    0.69   0.68      0.81         0.81
ICC indicates intraclass correlation coefficient; (1), Model 1, one-way random
effect; (1, 2), Model 1, 2 raters, Spearman Brown correction for one-way random
effect; COM, complexity of representations of people; AFF, affective quality of
representations; EIR, emotional investment in relationships; EIM, emotional
investment in moral standards; SC, understanding of social
causality; AGG, experience and management of aggressive impulses; SE, self-esteem;
ICS, identity and coherence of self; GARF-T, therapist GARF ratings; GARF-V,
external rater from video GARF; GARF-C, average of therapist and external rater
GARF ratings.
A Confirmatory Factor Analysis was then calculated to evaluate
whether the Principal Components (PCA) factor model provided
a good fit to the current data. Results of the CFA indicate that
the TLI score was over 1 (1.224), CFI was 0.26, and a RMSEA of
0.21 was obtained, which indicates that the 1-factor model proposed by
the PCA was not fully supported by the current data. Therefore,
contrary to the PCA results, TLI, CFI, and RMSEA scores did not
provide support that the SCORS-C, GARF-C, and IIP converged on
a single factor.
Intercorrelations Between GARF and SCORS
To further explore the prior significant relationships between
the GARF and SCORS ratings, secondary analyses were conducted
between each of the individual GARF ratings (N = 79;
GARF-T, GARF-V, and GARF-C) and SCORS variables. GARF-T
demonstrated a relationship to the following SCORS variables: EIR
(r = 0.23, P = 0.04), AGG (r = 0.21, P = 0.06), SE (r = 0.30, P =
0.01), and ICS (r = 0.30, P = 0.01). That is, higher, more adaptive
SCORS narratives on EIR, AGG, SE, and ICS related to higher,
more adaptive therapist rated GARF. There was also a trend toward
significance between External Rated GARF (GARF-V) and the
SCORS variable, SE (r = 0.21, P = 0.06). Lastly, significant
relationships existed between Combined GARF (GARF-C) and
AGG (r = 0.22, P = 0.05), SE (r = 0.27, P = 0.02) and ICS (r =
0.25, P = 0.03). Again, higher SCORS ratings on AGG, SE, and
ICS were reflective of healthier, more adaptive combined GARF
scores.
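For readers who want to gauge how a correlation of a given size relates to its P value at these sample sizes, the following Python sketch uses the Fisher z transformation with a normal approximation. This is an assumption made for illustration: statistical packages typically report an exact t-based test, and the published correlations are rounded, so the approximate values need not match the reported P values exactly. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_p_value(r, n):
    """Approximate two-tailed P value for H0: rho = 0 (Fisher z, normal approx.)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Rounded correlations taken from Table 3.
pairs = {
    "GARF-T with SCORS-C": (0.27, 79),
    "GARF-C with IIP-Total": (0.08, 76),
}
for label, (r, n) in pairs.items():
    print(f"{label}: r={r:+.2f}, n={n}, approx. P={fisher_z_p_value(r, n):.3f}")
```

The standard error of the transformed correlation depends only on the sample size (1 divided by the square root of n - 3), which is why correlations near 0.2 sit close to the conventional 0.05 threshold in a sample of 79.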
DISCUSSION
The present study assessed the convergent validity of the GARF
with established measures of interpersonal functioning. Additionally,
we sought to study the relationship between measures of relational
functioning (GARF, IIP, and SCORS) across different methods of
assessment (clinician rated, self-report, and free-response). The results
of the first principal components analysis (PCA) revealed that the
GARF-Combined, IIP-Total, and SCORS-Composite scores all con-
verged on a single factor indicating that the 3 tests are measuring
relational functioning. This is notable in several ways. First, this further
supports the usage of the GARF as a measure of relational functioning.
Second, implicit, explicit, and clinician-rated measures are known to
measure different aspects of a patient’s functioning, and it is more
common for these tests to show either no strong association or only
a limited relationship to one another (Bornstein, 2002; Child et al. 1956;
McClelland et al. 1989). Therefore, the results demonstrate that measures
from different assessment modalities share a similar construct,
although the construct may be expressed in different ways based on the
assessment method used. Intercorrelations of the first principal components
analysis revealed that higher, more adaptive ratings on the
GARF-C were reflective of higher, more adaptive SCORS-C scores.
The overall correlation between the GARF and SCORS is larger than
the relationship between the SCORS and IIP or the IIP and GARF.
The results of the second principal components analysis
(separating out GARF-C into GARF-Therapist and GARF Video-
External) revealed 2 factors. SCORS-C loaded on both factors
although its primary loading was on factor 2. The GARF-T,
GARF-V, and SCORS-C all require a clinician to assess a patient’s
relationships whether it is across psychotherapy sessions or early
childhood recollections. These measures seem to examine the clinician’s
judgment of patient generated narratives whether in session or
in early recollections. However, the SCORS also requires patients to
generate early recollections from specific time periods (e.g., first day
of school). In some respects, the patient (not necessarily consciously)
determines which recollections he/she will report/recall to
the examiner. Therefore, the SCORS requires input from both the
patient and clinician, which may be why it loaded on both factors. On
the other hand, the IIP does not require input from a clinician. The
patient responds based on how well set statements fit with his/her
self-perception. Taking this into consideration, we decided that
factor 1 may be reflective of clinician-rated measures (clinician’s
assessment of patient’s relational functioning) and Factor 2 more
indicative of patient generated data (patient’s perception of his/her
relational functioning). Another way to interpret these 2 factors can
be that Factor 1 (GARF and SCORS) measures relationship issues
focused on both the self and the other globally, whereas Factor 2 (IIP
and SCORS) might focus more on the self and one’s interpersonal
problems. The GARF is designed to examine global relational
problems, whereas the IIP asks the patient to report interpersonal
problems that are difficult to engage in or excessive in nature.
TABLE 3. Intercorrelation Matrices Between the Global Assessment of Relational
Functioning Scale (GARF), Inventory of Interpersonal Problems (IIP), and the
Social Cognition and Object Relations Scale (SCORS) Variables for Early
Childhood Recollections
Principal components analysis one
             IIP-Total (N = 76)      SCORS-C (N = 79)
GARF-C       r = 0.08, p = 0.51      r = 0.23, p = 0.05
IIP-Total                            r = -0.18, p = 0.12
Principal components analysis two
             GARF-V (N = 79)         IIP-Total (N = 76)      SCORS-C (N = 79)
GARF-T       r = 0.79, p < 0.0001    r = 0.13, p = 0.26      r = 0.27, p = 0.02
GARF-V                               r = -0.002, p = 0.98    r = 0.15, p = 0.21
IIP-Total                                                    r = -0.18, p = 0.12
GARF-C indicates average of therapist and external GARF ratings; GARF-T,
therapist GARF ratings; GARF-V, external rater GARF; SCORS-C, average of 8
SCORS variables.
Intercorrelations of the second principal components analysis
also demonstrated a significant relationship between GARF-T and
SCORS-C. The therapist’s GARF scores were more strongly associated with
patients’ early childhood recollections than were the external-rated
GARF scores. Higher, more adaptive therapist-rated GARF scores were
associated with higher, more adaptive SCORS ratings. Perhaps the
therapist was able to detect relational nuances in the treatment room,
similar to those the SCORS was able to detect. Specifically, the
GARF is comprised of 3 components (Problem Solving, Organiza-
tion, and Emotional Climate). The therapist might be assessing, in
part, the Emotional Climate of the therapeutic relationship when
rating the patient’s GARF. Perhaps, an External Rater is not as
sensitive to the patients’ emotional climate when rating videotape.
For example, patients might be verbalizing or describing their family
in 1 way; however it might not capture the intensity of what the
patient might be feeling and/or experiencing in the session. This
emotional information might be lost, to some degree, when an
external viewer is rating the session.
Despite these earlier findings, the CFA results did not confirm
the 1 factor model that was originally proposed. There are many
potential explanations as to why there was a discrepancy between
the PCA and CFA. Different analytic techniques are sensitive to
different aspects of the data as shown with the discrepancy between
the PCA and CFA results (Floyd & Widaman, 1995). The PCA
demonstrated that the GARF-C, SCORS-C, and IIP-Total converged
on a single factor, whereas the CFA did not, and as such the results do not
provide convincing support for this 1-factor model. One way to
explain this difference may be that the GARF-C, SCORS-C, and IIP
are distinct measures that examine different aspects of relational
functioning. One method cannot be substituted for another in practice,
as different methods assess different aspects of relational functioning.
It should be noted that CFA of multimethod data often results in
poorly defined solutions (e.g., negative error variances; Marsh,
1989) and nonconvergence (Kenny & Kashy, 1992). Another practical
reason why this discrepancy occurred in the current project
might be the limited number of measures (GARF, IIP, and
SCORS). Increasing the number of methods would help to better
understand discrepant results. In sum, based on the size of the
eigenvalue, the PCA demonstrates that there is overlap in the ability
of these 3 methods to assess relational functioning, yet the CFA
suggests that the overlap is not large enough to
consider these measures as assessing a single unitary construct, but
rather 3 separate facets of a larger relational functioning construct.
Our findings support past research in that the GARF and
SCORS can be reliably rated as well as there being significant
relationships between the GARF and SCORS variables (Table 2).
Higher, more adaptive GARF-T scores were significantly associated
with higher, more adaptive SCORS ratings on EIR, SE, and ICS.
Higher, more adaptive GARF-C scores correlated significantly with
AGG, SE, and ICS. The relationship between GARF scores and
EIR, AGG, SE, and ICS on the SCORS strengthens past research
suggesting that the SCORS affective variables (AFF, EIR, AGG,
SE, and ICS) were more sensitive than the cognitive variables
(COM, EIM, and SC) in assessing relational functioning (Ackerman
et al. 2000; Fowler et al. 2004; Hibbard et al. 1995). That is, more
adaptive relational functioning (GARF-C) was reflective of early
childhood recollections with more positive affective representations
of the self and other (AFF), increased emotional connection and
intimacy (EIR), more adaptive experience and expressions of anger
(AGG), a higher sense of self (SE), and a sense of one’s own identity
(ICS). This is consistent with past research showing that relationships
are often emotionally charged, which can impact interpersonal
functioning (Ackerman et al. 2001). Additionally, this supports the
Peters et al. (2006) study that demonstrated a significant relationship
between the GARF-C and the SCORS variable, AFF using relational
narratives expressed in psychotherapy.
Little research has examined the relationship between thera-
pist-rated versus external-rated GARF. However, there has been
research conducted on outcome, levels of training, and subsequent
GARF scores. Rosen et al. (1997) reported significant findings
between number of sessions and GARF ratings completed by supervisors,
as opposed to ratings completed by therapists. Also,
Wilkins and White (2001) demonstrated that GARF scores corre-
lated differently with self-report measures of symptomology, quality
of marriage, and family functioning measures depending on the type
of rater (undergraduate, extern, or therapist). It would be beneficial
to further examine the relationship between the GARF and SCORS
to see if similar results exist across other clinical and nonclinical
populations (e.g., inpatient vs. outpatient mental health clinics,
counseling centers). This can help better determine whether it is the
level of training, the nature of rating video, or the measures being
used that are contributing to the differences between therapist and
external raters. Past research primarily focused on generating patient
GARF ratings through video (Dausch et al. 1996; Hilsenroth et al.
2000; Hilsenroth et al. 2003; Mottarella et al. 2001; Peters et al.
2006) or case notes/interview (Hay et al. 2003; Rosen et al. 1997;
Ross and Doherty, 2001; Wilkins and White, 2001). The findings of
the present study suggest that GARF-C appears to encompass
GARF-T and GARF-V most effectively and as such should be
considered for future research to represent the GARF as a valid and
reliable measure of relational functioning.
There were no significant relationships between the GARF and
IIP. A potential explanation may be that clinicians are attuned to
different or broader aspects of interpersonal functioning than are patients.
That is, this is a clinical sample seeking treatment because of
relational difficulties with a parent, sibling, friend, loved one, etc., and
patients may not always recognize the subtleties of their struggles, their
origin, and/or the role they and others play within given relationships.
In contrast, clinicians may be more sensitive to these aspects of
functioning that might be out of the patient’s awareness. As is the case
with all self-assessment measures, our IIP data represent these patients’
self-perceptions of their interpersonal problems.
This study contributes to the literature in numerous ways.
First, our use of a naturalistic, treatment-seeking clinical sample
allows our findings to be more readily generalizable to other assessment
and treatment settings. Another strength of the present study is
that it helps support the convergent validity of the GARF as a
relational functioning measure. As noted earlier, this is one of the
first studies to examine GARF-assessed relational functioning
multidimensionally. Also, the present research demonstrates the clinical
utility of taking a multimethod approach toward understanding the
various facets of relational functioning. Specifically, the PCA sug-
gests that the GARF, SCORS, and IIP examined a similar construct
of relational functioning despite varying methods. Although the
CFA did not confirm this 1-factor model, the totality of our findings
would suggest that the GARF, SCORS, and IIP are examining at
least similar facets (i.e., clinician rated, implicit, vs. self-attributed)
of relational functioning. This provides further evidence that the
type of assessment method or perspective used is as important as the
construct being examined and subsequently may affect how well
measures converge/diverge (McClelland et al. 1989). Clinically, the
results of the present study can aid in treatment planning as the
therapist can use his/her relational assessment tools to gain a clearer
picture of the patient’s strengths and/or struggles in a relatively short
period of time. The therapist and patient can then work collabora-
tively to develop the treatment focus, identify potential impedi-
ments, and make appropriate adjustments to therapeutic technique.
ACKNOWLEDGMENTS
The authors thank Robert Bornstein, PhD, Robert Mendel-
sohn, PhD, Marshall Silverstein, PhD, and Joel Weinberger,
PhD, who helped us with earlier versions of this manuscript. An
earlier version of this article was presented at the annual meeting
of the Society for Personality Assessment, Arlington, Virginia,
March 2007.
Stein et al. The Journal of Nervous and Mental Disease Volume 197, Number 1, January 2009
© 2009 Lippincott Williams & Wilkins
REFERENCES
Ackerman S, Hilsenroth MJ, Clemence AJ, Weatherill R, Fowler JC (2000) The
effects of social cognition and object representation on psychotherapy
continuation. Bull Menninger Clin. 64:386–408.
Ackerman SJ, Hilsenroth MJ, Clemence AJ, Weatherill R, Fowler JC (2001)
Convergent validity of Rorschach and TAT scales of object relations.
J Pers Assess. 77(2):295–306.
American Psychiatric Association (1994) Diagnostic and Statistical Manual of
Mental Disorders, (4th ed). Washington (DC): American Psychiatric Associa-
tion.
Bentler PM, Bonett DG (1980) Significance tests and goodness of fit in the
analysis of covariance structures. Psychol Bull. 88:588–606.
Bentler PM (1990) Comparative fit indexes in structural models. Psychol Bull.
107:238–246.
Bollen KA (1989) Structural Equations With Latent Variables. New York: Wiley.
Bornstein R (2002) A process dissociation approach to objective-projective test
score interrelationships. J Pers Assess. 78:47–68.
Bryant FB, Yarnold PR (1995) Principal-components analysis and exploratory
and confirmatory factor analysis. In Grimm L, Yarnold PR (Eds), Reading and
Understanding Multivariate Statistics (pp. 99–136). Washington (DC): APA.
Child I, Frank K, Storm T (1956) Self ratings and TAT: Their relations to each
other and to adulthood background. J Pers. 25:96–114.
Dausch B, Miklowitz D, Richards J (1996) Global assessment of relational
functioning scale (GARF): II. Reliability and validity in a sample of families of
bipolar patients. Fam Process. 35:175–189.
Fabrigar L, Wegener D, MacCallum R, Strahan E (1999) Evaluating the use of
exploratory factor analysis in psychological research. Psychol Methods. 4:272–
299.
Floyd FJ, Widaman KF (1995) Factor analysis in the development and refinement
of clinical assessment instruments. Psychol Assess. 7:286–299.
Fowler C, Hilsenroth M, Handler L (1995) Early Memories: An exploration of
theoretically derived queries and their clinical utility. Bull Menninger Clin.
59:79–98.
Fowler C, Ackerman S, Speanburg S, Bailey A, Blagys M, Conklin AC (2004)
Personality and symptom change in treatment-refractory inpatients: Evaluation
of the phase model of change using Rorschach, TAT, and DSM-IV Axis V. J
Pers Assess. 83:306–322.
Group for the Advancement of Psychiatry Committee on the Family (GAP) (1996)
Global assessment of relational functioning scale (GARF): 1. Background and
Rationale. Fam Process. 35:155–172.
Hay P, Phil D, Katsikitis M, Begg J, DaCosta J, Blumenfeld N (2003) A two-year
follow-up study and prospective evaluation of the DSM-IV Axis V. Psychiatr
Serv. 54:1028–1030.
Hibbard S, Hilsenroth M, Hibbard J, Nash M (1995) A validity study of two
projective representation measures. Psychol Assess. 7:332–339.
Hilsenroth M (2007) A programmatic study of short-term psychodynamic psycho-
therapy: Assessment, process, outcome, and training. Psychother Res. 17:31–45.
Hilsenroth M, Ackerman S, Blagys M, Baity M, Mooney M (2003) Short-term
psychodynamic psychotherapy for depression: An examination of statistical,
clinically significant, and technique-specific change. J Nerv Ment Dis. 191:
349–357.
Hilsenroth M, Ackerman S, Blagys M, Baumann B, Baity M, Smith S, Price J,
Smith C, Heindselman T, Mount M, Holdwick D (2000) Reliability and validity
of DSM-IV Axis V. Am J Psychiatry. 157:1858–1863.
Hilsenroth M, Stein M, Pinsker J (2004) Social cognition and object relations scale:
Global rating method (SCORS-G). Unpublished manuscript, Garden City, NY:
The Derner Institute of Advanced Psychological Studies, Adelphi University.
For a copy of the manual, please email this address: mstein3@partners.org.
Horowitz LM, Alden LE, Wiggins JS, Pincus AL (2000) IIP-C: Inventory of
Interpersonal Problems Manual. Psychological Corporation.
Hoyle R, Panter A (1995) Writing about structural equation models. In Hoyle RH
(Ed), Structural Equation Modeling: Concepts, Issues and Applications (pp.
158–176). Thousand Oaks (CA): Sage.
Hu L, Bentler PM (1995) Evaluating Model Fit. In Hoyle RH (Ed), Structural
Equation Modeling: Concepts, Issues, and Applications (pp. 76–99). Thousand
Oaks (CA): Sage.
Kenny DA, Kashy DA (1992) Analysis of the Multitrait-MultiMethod matrix by
confirmatory factor analysis. Psychol Bull. 112:165–172.
Marsh HW (1989) Confirmatory factor analyses of Multitrait-Multimethod data:
Many problems and a few solutions. Appl Psychol Meas. 13:335–361.
McClelland D, Koestner R, Weinberger J (1989) How do self-attributed and
implicit motives differ? Psychol Rev. 96:690–702.
Mottarella K, Philpot C, Fritzsche B (2001) Don’t Take Out This Appendix!
Generalizability of the global assessment of relational functioning scale. Am J
Fam Ther. 29:271–278.
Peters EJ, Hilsenroth MJ, Eudell-Simmons EM, Blagys MD, Handler L (2006)
Reliability and validity of the social cognition and object relations scale in
clinical use. Psychother Res. 16:617–626.
Rosen K, McCollum E, Middleton K, Locke L, Bird K (1997) Interrater reliability
and validity of the global assessment of relational functioning (GARF) scale in
a clinical setting: A preliminary study. Am J Fam Ther. 25:357–360.
Ross N, Doherty W (2001) Validity of the global assessment of relational
functioning (GARF) when used by community-based therapists. Am J Fam
Ther. 29:239–253.
Shrout PE, Fleiss JL (1979) Intraclass correlations: Uses in assessing rater
reliability. Psychol Bull. 86:420–428.
Steiger JH (1990) Structural model evaluation and modifications: An interval
estimation approach. Multivariate Behav Res. 25:173–180.
Stein MB, Pinsker JH, Hilsenroth MJ (2007) Borderline pathology and the
personality assessment inventory (PAI): An evaluation of criterion and
concurrent validity. J Pers Assess. 88:81–89.
Tucker LR, Lewis C (1973) A reliability coefficient for maximum likelihood
factor analysis. Psychometrika. 38:1–10.
Wilde NJ, Strauss E, Chelune G, Hermann B, Hunter M, Loring D, Martin R,
Sherman (2003) Confirmatory factor analysis of the WMS-III in patients with
temporal lobe epilepsy. Psychol Assess. 15:56–63.
Wilkins L, White M (2001) Interrater reliability and concurrent validity of the
global assessment of relational functioning (GARF) scale using a card sort
method: A pilot study. Fam Ther. 28:157–170.
Yingling L, Miller W, McDonald A, Galewaler S (1998) GARF Assessment
Sourcebook: Using the DSM-IV Global Assessment of Relational Functioning.
Washington (DC): Taylor and Francis.