
A Method for Capturing Context in the Assessment of Leaders: The “Too Little/Too Much” Rating Scale


A method for capturing context in the assessment of leaders:
The “too little/too much” rating scale
Jasmine Vergauwe1*, Robert B. Kaiser2, Bart Wille3, Filip De Fruyt1 and Joeri Hofmans4
1Ghent University, Belgium
2Kaiser Leadership Solutions, USA
3University of Antwerp, Belgium
4Vrije Universiteit Brussel, Belgium
Accepted for publication in: Industrial and Organizational Psychology: Perspectives on
Science and Practice
Commentary on the focal article: Reynolds, D. H., McCauley, C. D., Tsacoumis, S., & the
Jeanneret Symposium Participants (in press). A critical evaluation of the state of
assessment and development for senior leaders. Industrial and Organizational
Psychology: Perspectives on Science and Practice, 11(4).
*Address correspondence to: Jasmine Vergauwe, Department of Developmental, Personality,
and Social Psychology, Ghent University. H. Dunantlaan 2, B-9000 Gent. Belgium. Tel.: +32 9 264 64 29
In their focal article, Reynolds, McCauley, Tsacoumis, and the Jeanneret Symposium
Participants (in press) stress the importance of context in leadership assessment. For instance,
they argue that senior executives work in a different context compared to lower-level
managers and that this should be taken into account. A simple example is that the competency
of strategic thinking is critical for executive performance but much less so, if at all, for front-
line supervisors. The claim that context matters in leadership and in the assessment of leaders
is easy to grasp, but difficult to apply in practice.
Although recent advances have been made in specifying leadership context (e.g.,
SHL’s Leader Edge that was celebrated with the 2018 M. Scott Myers Award), a big part of
the challenge is the complexity of context, both as a general feature of situations and as
specifically applied to leadership. Context can be considered simultaneously from several
levels of analysis including the level of the broader external landscape (e.g., economic trends),
the organizational level (e.g., culture, stage in life cycle), the team level (e.g., member
stability, cohesion), the leader-member dyad level (e.g., interpersonal trust), and the
individual level (e.g., leader tenure). Context is also multifaceted, being defined by what is
happening, where it is taking place, when it is occurring, and who is involved (Pervin, 1978;
Parrigon, Woo, Tay, & Wang, 2017). To complicate matters further, different leaders might
perceive the same objective situation in a different way (i.e., the psychological situation;
Parrigon et al., 2017).
In sum, although context is an important consideration when assessing leaders, its
multilevel, multifaceted, and dynamic nature stands in the way of a straightforward
implementation into the assessment process. The authors of the focal article seem well aware
of the challenge when they wondered, “What’s the best way to capture ever-changing
[organizational] context?” (Reynolds et al., in press, p. 11). First we consider the downsides
to a common strategy, and then we recommend a simple methodological innovation for
integrating contextual information in leader assessment.
Making Matters Worse?
One way in which Reynolds et al. (in press) suggest that context can be taken into
account is by drawing on recently developed situational taxonomies, like the CAPTION
(Parrigon et al., 2017) and DIAMONDS (Rauthmann et al., 2014) frameworks. Unfortunately,
these situational taxonomies are broad and generic, meant to apply to most situations in
general and not for leadership in particular. Further, taxonomies of situational variables
specific to leadership have been narrowly defined and fragmented, for example, with an
isolated focus on follower characteristics (Hersey & Blanchard, 1977) or decision urgency,
quality, and buy-in (Vroom & Yetton, 1973). In other words, generic situational taxonomies
are probably too broad, whereas those developed for leadership are too narrow.
Even if sufficiently representative yet practically useful taxonomies of leadership
contexts existed (for a promising start, see Porter & McLaughlin, 2006), there is a question of
how to apply them in assessment. At one extreme would be an algorithm to decide what
dispositions, behaviors, competencies, processes, and outcomes to measure for various
combinations of contextual factors. One could even think about different norms or
interpretation guidelines for each of those particular combinations or “situations.” In
principle, such a comprehensive approach could be taken. But it may be impractical and
unrealistic in all but very large-scale projects that have the required resources available for
doing the legwork.
A New Rating Scale
Another, simpler and more straightforward way to incorporate contextual
considerations in the assessment of leaders involves an innovation in measurement
methodology. Specifically, the new too little/too much (TLTM) rating scale provides
assessments of leader behavior and competencies not in the abstract or in a vacuum, but
relative to the salient features of the situation. This rating scale format is presented in Figure
1. It ranges from -4 (much too little), to 0 (the right amount), to +4 (much too much) and was
specifically developed to measure leader behaviors from a multi-source perspective (Kaiser &
Kaplan, 2005a; Kaiser, Overfield, & Kaplan, 2010; Vergauwe, Wille, Hofmans, Kaiser, & De
Fruyt, 2017).
Figure 1. The too little/too much (TLTM) rating scale. Reproduced from R. B. Kaiser, D. V.
Overfield, and R. E. Kaplan, Authors, 2010, Leadership Versatility Index® version 3.0:
Facilitator’s Guide, Greensboro, NC: Kaplan DeVries Inc. Copyright 2010 by Kaplan DeVries
Inc. Used with permission from the publisher.
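To make the scale format concrete, the following is a minimal Python sketch of how TLTM ratings might be represented and summarized. Only the range (-4 to +4) and the three anchors come from the text above; the function names, the per-source averaging, and the example ratings are illustrative assumptions, not part of the Leadership Versatility Index or its scoring procedure.

```python
from statistics import mean

TLTM_MIN, TLTM_MAX = -4, 4  # scale endpoints given in the text

# Only the three anchors named in the text are labeled here.
ANCHORS = {-4: "much too little", 0: "the right amount", 4: "much too much"}


def classify(rating: int) -> str:
    """Return the side of the TLTM scale that a single rating falls on."""
    if not isinstance(rating, int) or not TLTM_MIN <= rating <= TLTM_MAX:
        raise ValueError(f"TLTM ratings are integers from {TLTM_MIN} to {TLTM_MAX}")
    if rating < 0:
        return "too little"
    if rating > 0:
        return "too much"
    return "the right amount"


def summarize(ratings_by_source: dict[str, list[int]]) -> dict[str, float]:
    """Average ratings within each rater source (a hypothetical aggregation)."""
    for ratings in ratings_by_source.values():
        for rating in ratings:
            classify(rating)  # raises ValueError for out-of-range ratings
    return {source: mean(ratings) for source, ratings in ratings_by_source.items()}


# Made-up ratings for a single item, for illustration only.
example = {"superior": [1], "peers": [2, 1, 3], "subordinates": [2, 2]}
for source, average in summarize(example).items():
    side = "too much" if average > 0 else "too little" if average < 0 else "the right amount"
    print(f"{source}: mean {average} ({side})")
```

Unlike a Likert score, the sign of a TLTM rating carries the direction of the problem, so a summary of this kind distinguishes a shortcoming from a strength overused.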
The scale was originally designed as a way to identify strengths that become
weaknesses through overuse, a key dynamic identified in the original derailment studies at the
Center for Creative Leadership (McCall & Lombardo, 1983). Research confirms that raters
are able to distinguish shortcomings (a skill gap) from strengths overused (a skill excessively
applied) with this rating scale format (Kaiser & Kaplan, 2009; Kaplan & Kaiser, 2003).
However, a by-product of this scale is that it encourages raters to think not just about the
performance behavior they have observed but also about the situational appropriateness (and
effectiveness) of that behavior (Kaiser & Kaplan, 2005b).
An example might help illustrate the point. Early studies of how the TLTM scale
functions differently from typical five-point Likert-type rating scales used protocol analysis:
raters were asked to think out loud as they rated a leader they knew well twice on the same
set of leader behaviors, once using a five-point Likert-type scale and again using the TLTM
scale (Kaiser & Kaplan, 2005a). This allowed for the analysis of the
cognitive processes involved in using each type of rating scale. One study participant read the
item, “Takes charge—is in control of her area of responsibility.” With the five-point Likert-
type scale, the rater said, “Oh yes, definitely a take-charge type. She shows great initiative. A
five.” But when using the TLTM scale, the same rater said, “Well, clearly in control. And this
worked well when she was a director. Her team was less experienced and needed that
guidance. But in her current role as a VP, some of her people know more about the business
than she does. She would often be better served to step back and let the teams hash things out.
I’d say +2, too controlling.” It is clear that with the TLTM scale the rater was not just
evaluating take-charge behavior, but the impact of how that behavior was used given the
context in which the leader was operating, in this case with reference to the needs of the
people being led.
The three most common contextual factors raters mentioned in these studies concerned
culture (e.g., “we don’t confront each other that directly,” “not enough detail and data for our
leaders”), the business situation (e.g., “not enough attention to repositioning the business
since deregulation,” “he is intense, but it is an appropriate sense of urgency given the crisis
we were facing”), and the needs of the people being led (as in the example above). However,
although not as frequently, raters referred to several other nuances in the operating
environment (e.g., “that worked for his last manager, but the new person had very different
expectations,” “the sort of attention to detail you expect from a functional lead, but that is lost
in the weeds for the head of business unit”). Raters referred to a host of possible contextual
factors affecting the assessments, but they homed in on what seemed to be most salient for the
focal leader and the particular behavior in question. To that point, it was not uncommon for
raters to refer to different contextual factors in their assessment of different behaviors.
In this methodology, it is left up to the rater to determine which aspects of context are
most relevant in the assessment of how each behavior, skill, or competency is demonstrated.
In that sense, it is left to the wisdom of the crowd (Surowiecki, 2004) to decide the situational
appropriateness and effectiveness of the behavior. This is as opposed to having a concrete
definition of the context, which can be useful for the assessment designer when selecting
contextual factors to build into the assessment process. In the absence of such specification,
the assessment results can only be interpreted against the situational variability that others
deem relevant at that time. Indeed, using the TLTM scale, the tradeoff seems to be less
systematic control and explicit consideration of all possible situational variables but higher
fidelity and relevance to the present situation, at least as socially constructed. In the event that
contextual specification and explication is required, one might consider asking raters to
expressly clarify the contextual information they took into account when rating the leader
(Kaiser & Kaplan, 2005a).
Additional Benefits
Recent research comparing the TLTM scale to traditional Likert-type scales has shown
that the TLTM scale captures unique information that is not caught by Likert-type scales. In
two studies, Vergauwe et al. (2017) asked subordinates to first rate their respective leaders’
performance, and to then rate the leader twice on four leader behaviors (i.e., forceful,
enabling, strategic, operational): once using a five- (Study 1) or nine-point (Study 2) Likert
scale ranging from totally disagree to totally agree, and once using the nine-point TLTM
scale. Results of both studies indicated strong positive correlations between the too little side
of the TLTM scale (the -4 to 0 range) and the Likert scale scores, whereas there was no
relation between the Likert scale ratings and the too much side of the TLTM scale (0 to +4
range). These findings indicate that Likert ratings predominantly cover the low end of the
TLTM scale (i.e., from “too little” to “the right amount”), whereas they fail to systematically
capture variance at the high end of the TLTM scale (i.e., from “the right amount” to “too
much”). Further, incremental validity analyses showed that the TLTM ratings added
significantly to the prediction of leader performance beyond Likert scale measures of leader
behaviors, and that the unique predictive value was exclusively situated on the “overdoing”
part of the TLTM scale. Thus, the TLTM scale, by implicitly asking raters to take into
account the context in which the leader operates, is able to capture both deficient and
excessive leader behaviors, or leader behaviors that are too weak or too strong for the
situation. Likert-type scales, because they make no reference to context, cannot provide such
In sum, it should come as no surprise that the TLTM rating scale solicits ratings that
take contextual information into account. After all, in the original derailment research, McCall
and Lombardo (1983, p. 11) explained, “Executives derail for reasons… all connected to the
fact that situations change.” Further, few would disagree that “the right amount” of a
particular behavior depends on the situation. Systematic research as well as first-hand
experience using the TLTM scale in practice has revealed the subtle way in which the scale
encourages coworkers to consider the context to determine which of the many factors are
most relevant and then evaluate behavior against those pivotal factors.
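The comparison between Likert ratings and the two sides of the TLTM scale reported above can be illustrated with a small Python sketch. It relates Likert ratings to TLTM ratings separately within the too little range (-4 to 0) and the too much range (0 to +4). This is a simplified stand-in for the segmented regressions used in the original studies, and the ratings are made up for illustration; they are not data from Vergauwe et al. (2017). All function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def range_correlation(pairs, low, high):
    """Correlate TLTM and Likert ratings for pairs whose TLTM rating is in [low, high]."""
    subset = [(tltm, likert) for tltm, likert in pairs if low <= tltm <= high]
    tltm_scores = [tltm for tltm, _ in subset]
    likert_scores = [likert for _, likert in subset]
    return pearson(tltm_scores, likert_scores)


# Made-up (TLTM, five-point Likert) ratings of eight leaders on one behavior.
pairs = list(zip([-3, -2, -1, 0, 0, 1, 2, 3], [1, 2, 3, 4, 5, 4, 5, 4]))

r_too_little = range_correlation(pairs, -4, 0)  # "too little" to "the right amount"
r_too_much = range_correlation(pairs, 0, 4)     # "the right amount" to "too much"

print(f"too little range: r = {r_too_little:.2f}")
print(f"too much range:   r = {r_too_much:.2f}")
```

In this toy data set the Likert ratings rise with TLTM ratings across the too little range but flatten out at the top of the Likert scale once behavior is overdone, which is the pattern the studies describe.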
Although context should be taken into account when assessing leaders, doing so in a
systematic manner remains a major challenge. We suggest that this can be done by asking
raters to rate the appropriateness of leader behaviors/competencies for a particular context
using the TLTM scale. Apart from its simplicity, a key advantage of this way of integrating
context into leader assessment is the clear connection with leadership development. In 360
degree feedback, for instance, one can identify under- or overdoing of certain leader
behaviors, with straightforward implications for change (e.g., “to do more”, “do less”, or
“keep it up with more of the same”). As such, the TLTM scale not only allows integrating
context into leadership assessment, but by indicating whether a certain behavior is used too
little, the right amount, or too much, it also takes the guesswork out of how to act on the
References
Hersey, P., & Blanchard, K. H. (1977). Management of Organizational Behavior: Utilizing
Human Resources (3rd ed.). Englewood Cliffs, New Jersey: Prentice Hall.
Kaiser, R. B., & Kaplan, R. E. (2005a). Overlooking overkill? Beyond the 1-to-5 rating scale.
Human Resources Planning, 28(3), 7-11.
Kaiser, R. B., & Kaplan, R. E. (2005b). On the folly of linear rating scales for a non-linear
world. In S. Reddy (Ed.), Performance Appraisals: A Critical View (Ch. 12, pp. 170-
197). Nagarjuna Hills, Hyderabad, India: ICFAI University Press.
Kaiser, R. B., & Kaplan, R. E. (2009). When strengths run amok. In R. B. Kaiser (Ed.), The
perils of accentuating the positives (pp. 57-76). Tulsa, OK: Hogan Press.
Kaiser, R. B., Overfield, D. V., & Kaplan, R. E. (2010). Leadership Versatility Index version
3.0 Facilitator's Guide. Greensboro, NC: Kaplan DeVries Inc.
Kaplan, R. E., & Kaiser, R. B. (2003). Rethinking a classic distinction in leadership:
Implications for the assessment and development of executives. Consulting Psychology
Journal: Research and Practice, 55, 15-25.
McCall, M. W., Jr., & Lombardo, M. M. (1983). Off the track: Why and how successful
executives get derailed. Greensboro, NC: Center for Creative Leadership.
Parrigon, S., Woo, S. E., Tay, L., & Wang, T. (2017). CAPTION-ing the situation: A
lexically-derived taxonomy of psychological situation characteristics. Journal of
Personality and Social Psychology, 112(4), 642-681. DOI: 10.1037/pspp0000111
Pervin, L. (1978). Definitions, measurements, and classifications of stimuli, situations, and
environments. Human Ecology, 6(1), 71-105.
Porter, L. W., & McLaughlin, G. B. (2006). Leadership and the organizational context: Like
the weather? The Leadership Quarterly, 17, 559-576.
Rauthmann, J. F., Gallardo-Pujol, D., Guillaume, E. M., Todd, E., Nave, C. S., Sherman, R.
A., Ziegler, M., Jones, A. B., & Funder, D. C. (2014). The Situational Eight DIAMONDS: A
taxonomy of major dimensions of situation characteristics. Journal of Personality and Social
Psychology, 107(4), 677-718. DOI: 10.1037/a0037250
Reynolds, D. H., McCauley, C. D., Tsacoumis, S., & the Jeanneret Symposium Participants
(in press). A critical evaluation of the state of assessment and development for senior
leaders. Industrial and Organizational Psychology: Perspectives on Science and
Practice, 11(4).
Surowiecki, J. (2004). The wisdom of crowds. New York: Doubleday.
Vergauwe, J., Wille, B., Hofmans, J., Kaiser, R. B., & De Fruyt, F. (2017). The "too little/too
much" scale: A new rating format for detecting curvilinear effects. Organizational
Research Methods, 20, 518-544. DOI: 10.1177/1094428117706534
Vroom, V. H., & Yetton, P. W. (1973). Leadership and Decision-Making. Pittsburgh:
University of Pittsburgh Press.