Are All Scales Created Equal?
Response Format and the Validity of Managerial Ratings
Robert B. Kaiser
Robert E. Kaplan
Presented at the 21st Annual SIOP Conference
Dallas, TX
April 2006
Download
• Handouts: www.kaplandevries.com
• More on the new rating scale: www.versatileleader.com
The point of feedback
• Change behavior to increase organizational performance
[Figure: Leader Personality → Leadership Style → Employee Attitudes → Team Functioning → Organizational Performance; Feedback also shown]
Hogan & Kaiser (2005). What we know about leadership. Review of General Psychology.
Factors affecting validity
• Linguistic properties of items: short, simple, specific, in context (Brutus & Facteau, 2003; Kaiser & Craig, 2005)
• Response scale format?
• Moratorium on rating scale research (Landy & Farr, 1980)
– All scales equally ineffective at reducing rating errors
– But no direct comparisons of the predictive validity of alternative formats
– Premature moratorium?
Common response formats
• Frequency: how often, to what extent
– By far the most common (21 of 24 instruments in Feedback to Managers)
– “Descriptive”
• Evaluative: how effective, how good
– Can be absolute (poor/outstanding) or relative (below/above average)
– “Judgmental”
Example
Frequency response scale
How often does this manager do the following?
1. Direct—tells people when he is dissatisfied with their work.
Response options: never, rarely, sometimes, often, always

Evaluation response scale
How effective is this manager at the following?
1. Direct—tells people when he is dissatisfied with their work.
Response options: ineffective, adequate, effective, very effective, extraordinarily effective
“Too little/too much” response scale
1. Direct—tells people when he is dissatisfied with their work.
Response options: -4 to +4, with 0 = The right amount; negative values = Too little (from Much too little to Barely too little); positive values = Too much (from Barely too much to Much too much)
Kaiser & Kaplan (2005). Overlooking overkill. Human Resource Planning.
Confounds in typical scales?
[Figure: the frequency scale (Never to Always) and the evaluation scale (Poor to Outstanding), each set against the -4 to +4 “too little/too much” continuum]
• Frequency scale
– Confound: “a lot” and “too much”
– Confuses activity with effectiveness
• Evaluation scale
– Confound: two different kinds of ineffectiveness (deficiency and overkill)
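The two confounds can be made concrete with a toy model. The sketch below assumes, purely for illustration, that a manager's true standing on a behavior is a point on the -4 to +4 “too little/too much” continuum, that a 1-to-5 frequency rating rises steadily with the amount of behavior, and that a 1-to-5 evaluation rating peaks at the right amount (0) and falls off equally in both directions. Both mappings and the function names `frequency_rating` and `evaluation_rating` are hypothetical, not taken from the study.

```python
# Toy model of the confounds in frequency and evaluation scales.
# ASSUMPTION: both mappings below are hypothetical illustrations, not the
# study's scoring. Input is a point on the -4..+4 "too little/too much"
# continuum, where 0 is "the right amount".

def frequency_rating(amount: int) -> int:
    """Hypothetical 1-5 frequency rating: more behavior, higher rating."""
    return 1 + (amount + 4) // 2   # -4 -> 1, 0 -> 3, +4 -> 5

def evaluation_rating(amount: int) -> int:
    """Hypothetical 1-5 evaluation rating: best at the right amount (0)."""
    return 5 - abs(amount)         # 0 -> 5, -4 or +4 -> 1

# Confound 1: "much too much" (+4) earns the top frequency rating, higher than
# "the right amount" (0), so activity is mistaken for effectiveness.
print(frequency_rating(0), frequency_rating(4))     # 3 5

# Confound 2: deficiency (-2) and overkill (+2) get the same evaluation
# rating, so two different kinds of ineffectiveness look identical.
print(evaluation_rating(-2), evaluation_rating(2))  # 3 3
```

A “too little/too much” rating keeps -2 and +2 apart, which is exactly the distinction the two typical formats lose under these assumptions.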
Validity
• Nomological: relationships with theoretically relevant external variables (criteria)
• Structural: relationships among dimensions within the model
Cronbach & Meehl (1955); Loevinger (1957); Binning & Barrett (1987); Messick (1995).
How should task- and people-oriented dimensions be related?
Kaiser & Kaplan (2005). On the folly of linear rating scales. Performance Appraisals: A Critical View.
Bakeoff Study
1. Relative validity of three distinct response scales in predicting leadership criteria
2. Does the new “too little/too much” scale tease apart the confounds in typical scales?
3. Relationship between “opposite” dimensions
Experimental Design
• Within-subjects survey study
• N = 79 employed MBA students
• Participants rated their current boss three times on the same dimensions, once in each format: Freq, Eval, TL/TM (randomized presentation order)
• Criteria: participants also rated their own organizational attitudes, their team, and their boss’s overall effectiveness
Response scales
Freq: 5 = Frequently, if not always; 4 = Fairly often; 3 = Sometimes; 2 = Once in a while; 1 = Not at all
Eval: 5 = Outstanding; 4 = Very strong; 3 = Competent; 2 = Underdeveloped; 1 = Not developed
TL/TM: “too little/too much” format
Dimensions
• Consideration and Initiating Structure (LBDQ-XII; Stogdill, 1963; 10 items each)
• Enabling and Forceful Leadership (LVI 2.0; Kaplan & Kaiser, 2006; 9 items each)
– Conceptually related to Consideration and Initiating Structure
– Designed on the “too little/too much” premise
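Criteria (7-point “Agree/Disagree” scale)
Criterion                     α      k     M       SD
Job satisfaction              .83    5     5.09    1.16
Psych empowerment             .87    12    5.47    .92
Team efficacy                 .90    8     5.28    1.05
Boss’s overall effectiveness  .79    3     4.46    1.52
α = internal-consistency reliability (coefficient alpha); k = number of items
[Figure: the Leader Personality → Leadership Style → Employee Attitudes → Team Functioning → Organizational Performance model, shown again alongside the criteria]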
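The α column is coefficient (Cronbach's) alpha, the internal-consistency reliability of each criterion scale, and k is its number of items. To show what α summarizes, here is a minimal sketch of the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores), using only Python's `statistics` module. The function name `cronbach_alpha` and the toy data are illustrative assumptions, not the study's data or code.

```python
from statistics import variance

def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores[i][j] is respondent i's rating on item j; k = number of items.
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total scale score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical two-item scale answered by five respondents (toy data only).
toy = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
print(round(cronbach_alpha(toy), 3))  # 0.889
```

Alpha rises as items covary more strongly relative to their individual variances; perfectly redundant items give α = 1.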
Results
Compare by response scale
Nomological validity (predicting criteria)
• Bivariate validity (correlation)
• Multivariate validity (multiple regression)
• Incremental validity (hierarchical regression)
Structural validity
• Correlations between opposites
Bivariate validity
Correlations between each performance dimension and each criterion, by response format
Format / dimension          Average   Team Efficacy   Psych. Empower.   Job Satisf.   Overall Effect.
"Too little/Too much"*
  Enabling                  .29       .17             .18               .30           .50
  Forceful                  .34       .31             .24               .25           .57
  Consideration             .41       .31             .29               .36           .68
  Initiating Structure      .39       .38             .32               .33           .51
  Format average            .36
Frequency
  Enabling                  .41       .29             .34               .40           .62
  Forceful                  .22       .24             .15               .11           .37
  Consideration             .45       .36             .32               .43           .68
  Initiating Structure      .33       .31             .26               .25           .51
  Format average            .35
Evaluation
  Enabling                  .47       .31             .39               .48           .72
  Forceful                  .35       .31             .26               .28           .56
  Consideration             .49       .38             .38               .49           .73
  Initiating Structure      .41       .37             .32               .30           .66
  Format average            .43
Multivariate validity
Summary of multiple regressions using all dimensions to predict criteria (multivariate R²)
Response format             Average   Team Efficacy   Psych. Empower.   Job Satisf.   Overall Effect.
"Too little/Too much"       .23       .17             .11               .16           .47
Frequency                   .22       .14             .12               .17           .47
Evaluation                  .28       .15             .16               .24           .58
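Each multivariate R² comes from regressing one criterion on all four dimensions (Enabling, Forceful, Consideration, Initiating Structure) as rated in one format; the Average column is the simple mean of the four criterion values (for Evaluation, (.15 + .16 + .24 + .58)/4 ≈ .28). A minimal ordinary-least-squares sketch of that R² computation follows, assuming NumPy. The helper `r_squared` is hypothetical and the data are simulated, not the study's.

```python
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 from an OLS regression of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)  # least-squares coefficients
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)               # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())         # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical data: N = 79 raters, four dimension scores from one format
# (Enabling, Forceful, Consideration, Initiating Structure) and one criterion.
rng = np.random.default_rng(0)
dims = rng.normal(size=(79, 4))
criterion = dims @ np.array([0.3, 0.1, 0.4, 0.2]) + rng.normal(size=79)
print(round(r_squared(dims, criterion), 2))
```

Running this once per format and criterion would fill one cell of the table above; with real ratings, `dims` would hold a single format's four dimension scores.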
Incremental validity
Multivariate ΔR²
Round-robin: run all pairwise hierarchical regression models to determine the incremental validity of each scale vis-à-vis the other two, then compute the average ΔR² across both comparisons.
Response format             Average   Team Efficacy   Psych. Empower.   Job Satisf.   Overall Effect.
"Too little/Too much"       .06       .03             .05               .04           .10
Frequency                   .04       .02             .03               .03           .07
Evaluation                  .10       .04             .07               .08           .19
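The round-robin procedure can be sketched as follows, assuming that each hierarchical model enters one format's four dimension scores in step 1 and another format's four scores in step 2, with ΔR² being the gain in R² at step 2. Each format is added on top of each of the other two, and its two ΔR² values are averaged. The helpers `r_squared` and `round_robin_delta_r2`, the format keys, and the simulated data are hypothetical; this is one reading of the slide, not the authors' analysis code.

```python
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 from an OLS regression of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return 1.0 - float(residuals @ residuals) / float(((y - y.mean()) ** 2).sum())

def round_robin_delta_r2(formats: dict[str, np.ndarray], y: np.ndarray) -> dict[str, float]:
    """Average incremental R^2 of each format over each of the other formats.

    formats maps a format name to an (n, 4) array of that format's dimension scores.
    """
    result = {}
    for name, X_new in formats.items():
        gains = []
        for other, X_base in formats.items():
            if other == name:
                continue
            # Step 1: the other format's dimensions alone.
            r2_base = r_squared(X_base, y)
            # Step 2: add this format's dimensions; the gain is its Delta R^2.
            r2_full = r_squared(np.column_stack([X_base, X_new]), y)
            gains.append(r2_full - r2_base)
        # With three formats, each one has two comparisons to average.
        result[name] = float(np.mean(gains))
    return result

# Hypothetical data: three formats, N = 79, four dimensions each; the simulated
# criterion depends only on the "Eval" scores.
rng = np.random.default_rng(0)
formats = {name: rng.normal(size=(79, 4)) for name in ("TL/TM", "Freq", "Eval")}
y = formats["Eval"] @ np.array([0.8, 0.4, 0.8, 0.6]) + rng.normal(size=79)
print(round_robin_delta_r2(formats, y))
```

Because each step-2 model nests its step-1 model, every ΔR² is non-negative, and a format that carries unique criterion-relevant information shows the largest average gain.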
Structural Validity
Correlation between “opposites”
Response format          r(Cons., Init. Struct.)   r(Forceful, Enabling)
Evaluation               +.67                      +.49
Frequency                +.57                      +.12
Too Little/Too Much      +.11                      -.41
Upshot
• “Evaluation” format is the most valid, by a wide margin
• “Frequency” format is the least valid, but the most common in practice
• “Too little/Too much” format may give a truer read on dimensional performance
• Consider a combination: Evaluation plus “Do More/Do Less”
More…
• Handouts: www.kaplandevries.com
• The new rating scale: www.versatileleader.com
• Email me: rkaiser@kaplandevries.com