
Are All Scales Created Equal? Response Format and the Validity of Managerial Ratings

Authors:
  • Kaiser Leadership Solutions

Abstract

This study compared the validity of ratings of managerial behavior made with two types of 5-point rating scales, one measuring the frequency of behavior and one evaluating the effectiveness of behavior, and a recently introduced "too little/too much" rating scale that distinguishes shortcomings from strengths and from strengths overused. We tested hypotheses about how the "too little/too much" scale might remove confounds in frequency scales (by distinguishing doing a lot of a behavior from doing too much of it) and in evaluation scales (by distinguishing ineffectiveness due to doing too little of a behavior from ineffectiveness due to doing too much of it). To assess the predictive validity of the three rating scale formats, we compared the statistical associations between four common performance dimensions, each measured three ways (once with each rating scale), and four effectiveness criteria. We also compared the incremental validity of each rating format relative to the other two. Results offer partial support for the new "too little/too much" scale as a method for uniquely measuring variance associated with "too much" of certain behaviors.
Are All Scales Created Equal?
Response Format and the Validity of Managerial Ratings
Robert B. Kaiser
Robert E. Kaplan
Presented at the 21st Annual SIOP Conference
Dallas, TX
April 2006
Download
• Handouts: www.kaplandevries.com
• More on the new rating scale: www.versatileleader.com
The point of feedback
Change behavior to increase organizational performance
[Diagram: Leader Personality, Leadership Style, Employee Attitudes, Team Functioning, Organizational Performance; Feedback]
Hogan & Kaiser (2005). What we know about leadership. Review of General Psychology.
Factors affecting validity
Linguistic properties of items: short, simple, specific,
in context (Brutus & Facteau, 2003; Kaiser & Craig, 2005)
Response scale format?
Moratorium on rating scale research (Landy & Farr, 1980)
All scales equally ineffective at reducing errors
But no direct comparisons of predictive
validity for alternative formats
Premature moratorium?
Common response formats
Frequency: how often, to what extent
By far most common (21 of 24 in Feedback to Managers)
“Descriptive”
Evaluative: how effective, how good
Can be absolute (poor/outstanding) or relative (below/above avg)
“Judgmental”
Example
Frequency response scale
How often does this manager do the following?
1. Direct: tells people when he is dissatisfied with their work.
Response options (one is marked with an X): never, rarely, sometimes, often, always

Evaluation response scale
How effective is this manager at the following?
1. Direct: tells people when he is dissatisfied with their work.
Response options (one is marked with an X): ineffective, adequate, effective, very effective, extraordinarily effective
“Too little/too much” response scale
1. Direct: tells people when he is dissatisfied with their work.
Response options (one is marked with an X): -4 -3 -2 -1 0 +1 +2 +3 +4
  Too little: from -4 (much too little) to -1 (barely too little)
  0: the right amount
  Too much: from +1 (barely too much) to +4 (much too much)
Kaiser & Kaplan (2005). Overlooking overkill. Human Resources Planning.
Confounds in typical scales?
Frequency scale (Never to Always), set against the -4 to +4 “too little/too much” scale
Confound: a lot and too much. Confuses activity with effectiveness.
Evaluation scale (Poor to Outstanding), set against the -4 to +4 “too little/too much” scale
Confound: two different kinds of ineffectiveness (deficiency and overkill)
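
The two confounds can be made concrete with a small Python sketch. The mappings below are hypothetical and are not the scoring used in the study: they assume that effectiveness falls off with distance from 0 ("the right amount") on the -4 to +4 scale, and that a frequency rating rises steadily with the amount of the behavior.

```python
# Hypothetical illustration only: the two mappings below are assumptions
# made for demonstration, not scoring rules taken from the study.

def evaluation_from_tltm(tltm: int) -> int:
    """Assumed effectiveness score: best at 0 (the right amount), worse with distance."""
    if not -4 <= tltm <= 4:
        raise ValueError("rating must be between -4 and +4")
    return 4 - abs(tltm)


def frequency_from_tltm(tltm: int) -> int:
    """Assumed frequency score: rises steadily with the amount of the behavior."""
    if not -4 <= tltm <= 4:
        raise ValueError("rating must be between -4 and +4")
    return tltm + 4


deficient, right_amount, overkill = -3, 0, 3

# Evaluation confound: too little and too much earn the same low score.
same_evaluation = evaluation_from_tltm(deficient) == evaluation_from_tltm(overkill)

# Frequency confound: overdoing the behavior scores higher than the right amount.
overkill_ranks_higher = frequency_from_tltm(overkill) > frequency_from_tltm(right_amount)

print(same_evaluation, overkill_ranks_higher)
```

Under these assumed mappings, an evaluation score cannot tell deficiency from overkill, and a frequency score rewards overkill over the right amount, which is the pattern the slide describes.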
Validity
Nomological: relationships with theoretically
relevant external variables (criteria)
Structural: relationships among dimensions
within the model
Cronbach & Meehl (1955); Loevinger (1957); Binning &
Barrett (1987); Messick (1995).
How should Task- and People-oriented dimensions be related?
Kaiser & Kaplan (2005). On the folly of linear rating scales.
Performance Appraisals: A Critical View.
Bakeoff Study
1. Relative validity of three distinct response
scales in predicting leadership criteria
2. Does new “too little/too much” scale tease
apart confounds in typical scales?
3. Relationship between “opposites”
Experimental Design
Within-subjects survey study
N = 79 employed MBA students
Rated current boss three times on the same
dimensions: Freq, Eval, TL/TM formats
(randomized presentation order)
Criteria: rated their organizational attitudes,
their team, and boss’ overall effectiveness.
Response scales
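Freq: 5 = Frequently, if not always; 4 = Fairly often; 3 = Sometimes; 2 = Once in a while; 1 = Not at all
Eval: 5 = Outstanding; 4 = Very Strong; 3 = Competent; 2 = Underdeveloped; 1 = Not developed
TL/TM: -4 (much too little) to +4 (much too much); 0 = the right amount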
Dimensions
Consideration and Initiating Structure
(LBDQ—XII; Stogdill, 1963; 10 items each)
Enabling and Forceful Leadership (LVI2.0;
Kaplan & Kaiser, 2006; 9 items each)
– Conceptually related to C & IS
– Designed on “too much/too little” premise
Criteria
7-point “Agree/Disagree” scale

Criterion                       α     k     M      SD
Job satisfaction               .83    5    5.09   1.16
Psych empowerment              .87   12    5.47    .92
Team efficacy                  .90    8    5.28   1.05
Boss’ overall effectiveness    .79    3    4.46   1.52

[Diagram repeated: Leader Personality, Leadership Style, Employee Attitudes, Team Functioning, Organizational Performance]
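
The α column reports internal-consistency reliability for each k-item criterion scale. As a minimal sketch, assuming α here is Cronbach's alpha computed from sample variances, it could be calculated as follows in Python. The ratings in `example` are made up for illustration and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for `responses`: one list of k item ratings per respondent."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents (sample variance, n - 1).
    item_variances = [variance(item) for item in zip(*responses)]
    # Variance of each respondent's total score across the k items.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up ratings on a 7-point scale: 5 respondents x 3 items.
example = [
    [6, 5, 6],
    [4, 4, 5],
    [7, 6, 7],
    [3, 4, 3],
    [5, 5, 6],
]
alpha = cronbach_alpha(example)
print(round(alpha, 2))
```

Alpha rises when items covary strongly relative to their individual variances, which is why a scale with as few as three items can still reach an acceptable value.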
Results
Compare by response scale
Nomological validity (predicting criteria)
Bi-variate validity (correlation)
Multivariate validity (multiple regression)
Incremental validity (hierarchical regression)
Structural validity
Correlations between opposites
Bivariate validity

Response format               Overall    Job           Psych.      Team        Average
  Performance Dimension       Effect.    Satisfact’n   Empower.    Efficacy

Evaluation                                                                       .43
  Initiating Structure          .66        .30           .32         .37         .41
  Consideration                 .73        .49           .38         .38         .49
  Forceful                      .56        .28           .26         .31         .35
  Enabling                      .72        .48           .39         .31         .47
Frequency                                                                        .35
  Initiating Structure          .51        .25           .26         .31         .33
  Consideration                 .68        .43           .32         .36         .45
  Forceful                      .37        .11           .15         .24         .22
  Enabling                      .62        .40           .34         .29         .41
"Too little/Too much" *                                                          .36
  Initiating Structure          .51        .33           .32         .38         .39
  Consideration                 .68        .36           .29         .31         .41
  Forceful                      .57        .25           .24         .31         .34
  Enabling                      .50        .30           .18         .17         .29
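
The Average column can be re-derived from the individual correlations. The Python sketch below transcribes the correlations reported above and takes simple arithmetic means, first across the four criteria for each dimension and then across the four dimensions for each response format. Plain averaging of correlations (rather than, say, averaging Fisher z-transformed values) is an assumption about how the slide's averages were computed, although it appears to agree with the reported figures to within rounding.

```python
# Correlations transcribed from the bivariate validity results, in the order
# (Overall Effect., Job Satisfact'n, Psych. Empower., Team Efficacy).
correlations = {
    "Evaluation": {
        "Initiating Structure": (0.66, 0.30, 0.32, 0.37),
        "Consideration": (0.73, 0.49, 0.38, 0.38),
        "Forceful": (0.56, 0.28, 0.26, 0.31),
        "Enabling": (0.72, 0.48, 0.39, 0.31),
    },
    "Frequency": {
        "Initiating Structure": (0.51, 0.25, 0.26, 0.31),
        "Consideration": (0.68, 0.43, 0.32, 0.36),
        "Forceful": (0.37, 0.11, 0.15, 0.24),
        "Enabling": (0.62, 0.40, 0.34, 0.29),
    },
    "Too little/Too much": {
        "Initiating Structure": (0.51, 0.33, 0.32, 0.38),
        "Consideration": (0.68, 0.36, 0.29, 0.31),
        "Forceful": (0.57, 0.25, 0.24, 0.31),
        "Enabling": (0.50, 0.30, 0.18, 0.17),
    },
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Average across the four criteria for each dimension within each format.
dimension_means = {
    fmt: {dim: mean(rs) for dim, rs in dims.items()}
    for fmt, dims in correlations.items()
}

# Average across the four dimensions for each response format.
format_means = {fmt: mean(dims.values()) for fmt, dims in dimension_means.items()}

for fmt, value in format_means.items():
    print(f"{fmt}: {value:.3f}")
```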
Multivariate validity
Summary of multiple regressions using all dimensions to predict criteria

Multivariate R²
Response format           Overall    Job        Psych.     Team        Average
                          Effect.    Satisf’n   Empow’t    Efficacy
Evaluation                  .58        .24        .16        .15         .28
Frequency                   .47        .17        .12        .14         .22
"Too little/Too much"       .47        .16        .11        .17         .23
Incremental validity
Round-robin: run all pair-wise hierarchical regression models to determine the incremental validity of each scale vis-à-vis the other two, then compute the average R² across both comparisons.

Multivariate R²
Response format           Overall    Job        Psych.     Team        Average
                          Effect.    Satisf’n   Empow’t    Efficacy
Evaluation                  .19        .08        .07        .04         .10
Frequency                   .07        .03        .03        .02         .04
"Too little/Too much"       .10        .04        .05        .03         .06
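
The round-robin procedure can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' analysis code: it assumes that "incremental validity" means the gain in R² when one format's ratings are added to a regression that already contains another format's ratings, and it uses a small hand-written least-squares solver. The function names and the data are hypothetical; the study used four dimensions per format, while this toy example uses one predictor per format.

```python
def solve(A, c):
    """Solve A x = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    M = [row[:] + [ci] for row, ci in zip(A, c)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def r_squared(X, y):
    """R² from an ordinary least-squares fit of y on the columns of X plus an intercept."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Normal equations: (X'X) b = X'y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(A, c)
    predicted = [sum(bi * ri for bi, ri in zip(b, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def incremental_r2(base, added, y):
    """Gain in R² when the `added` predictors enter after the `base` predictors."""
    combined = [list(b) + list(a) for b, a in zip(base, added)]
    return r_squared(combined, y) - r_squared(base, y)


def round_robin(formats, y):
    """Average incremental R² of each format over each of the other formats."""
    result = {}
    for name, predictors in formats.items():
        gains = [
            incremental_r2(other, predictors, y)
            for other_name, other in formats.items()
            if other_name != name
        ]
        result[name] = sum(gains) / len(gains)
    return result


# Made-up ratings for eight raters, one predictor per format, plus one criterion.
formats = {
    "Evaluation": [[1], [2], [3], [4], [5], [3], [2], [4]],
    "Frequency": [[2], [1], [4], [3], [5], [2], [3], [4]],
    "Too little/Too much": [[0], [-1], [1], [-2], [2], [0], [1], [-1]],
}
criterion = [2.0, 2.5, 3.5, 3.0, 5.0, 3.0, 2.5, 4.5]

average_gain = round_robin(formats, criterion)
for name, gain in average_gain.items():
    print(f"{name}: {gain:.3f}")
```

Because each comparison adds predictors to a nested model, every gain is zero or positive; the interest lies in which format adds the most over the others.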
Structural Validity
Correlation between “opposites”

Response format          r (Cons., Init. Struct.)    r (Forceful, Enabling)
Evaluation                       +.67                        +.49
Frequency                        +.57                        +.12
Too Little/Too Much              +.11                        -.41
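
Each entry above is a Pearson correlation between ratings on two opposing dimensions. A minimal Python sketch of that computation follows. The ratings are made up to show how a negative correlation can arise on the -4 to +4 scale when managers rated as doing too much on one side tend to be rated as doing too little on the other; they are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least two ratings")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    return covariance / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Made-up "too little/too much" ratings (-4 to +4) for six managers.
forceful = [3, 2, 0, -1, -2, 1]
enabling = [-2, -1, 0, 1, 3, -1]

r = pearson_r(forceful, enabling)
print(round(r, 2))
```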
Upshot
“Evaluation” format is the most valid, by a wide margin
“Frequency” format is least valid, but most common in practice
“Too little/Too much” format may give a truer read on dimensional performance
Consider a combo: Evaluation and “Do More/Do Less”
More…
• Handouts
www.kaplandevries.com
The new rating scale
www.versatileleader.com
Email me
rkaiser@kaplandevries.com