ORIGINAL PAPER
Seven Trade-offs in Measuring Nonprofit Performance
and Effectiveness
Jurgen Willems Silke Boenigk Marc Jegers
Published online: 6 March 2014
© International Society for Third-Sector Research and The Johns Hopkins University 2014
Abstract To complement contemporary nonprofit literature, which mainly offers theory-driven recommendations for measuring nonprofit effectiveness, performance, or related concepts, this article presents seven trade-offs for researchers and practitioners to consider before engaging in a nonprofit effectiveness measurement project. For each trade-off, we offer examples and suggestions to clarify the advantages and disadvantages of methodological choices that take various contextual elements into account. In particular, we address the differences between formative and reflective approaches, as well as the differences between unit of interest, unit of data collection, and unit of analysis. These topics require more in-depth attention in the nonprofit effectiveness literature to avoid misinterpretations and measurement biases. Finally, this article concludes with five avenues for further research to help address key challenges that remain in this research area.
Résumé Afin de compléter la littérature contemporaine portant sur les organismes à but non lucratif, qui propose principalement des recommandations basées sur la théorie visant à évaluer leur efficacité, leurs performances ou des concepts connexes, cet article présente sept compromis que les chercheurs et les professionnels pourront prendre en considération avant de s'engager dans un projet d'évaluation de l'efficacité d'un organisme à but non lucratif. Pour chaque compromis, nous donnons des exemples et des suggestions mettant en lumière les avantages et les inconvénients de choix méthodologiques qui tiennent compte de divers éléments contextuels. En particulier, nous traitons des différences qui existent entre les approches formative et réflexive, ainsi qu'entre unité d'intérêt, unité de collecte de données et unité d'analyse. Ces sujets exigent d'être approfondis dans la littérature portant sur l'efficacité des organismes à but non lucratif pour éviter les interprétations erronées et les biais d'évaluation. Enfin, cet article conclut avec cinq pistes à explorer dans l'objectif de relever les défis importants qui demeurent dans ce secteur de recherche.

J. Willems (✉) · S. Boenigk
Department of Nonprofit & Public Management, University of Hamburg, Von-Melle-Park 5, 20146 Hamburg, Germany
e-mail: jurgen.willems@wiso.uni-hamburg.de

J. Willems · M. Jegers
Department of Applied Economics, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel, Belgium

Voluntas (2014) 25:1648–1670
DOI 10.1007/s11266-014-9446-1
Zusammenfassung Die Realisierung von empirischer Nonprofit Forschung hat in den letzten Jahren, sowohl in der akademischen Forschung als auch in der Nonprofit Praxis, stark zugenommen. Während jedoch in anderen Disziplinen und Fachzeitschriften die methodische Diskussion und Fragen der Messung sehr intensiv und teilweise kritisch diskutiert werden, hat dieser Diskurs in der Nonprofit Forschung noch nicht begonnen. Dieser Beitrag verfolgt daher das allgemeine Ziel, den methodischen Diskurs unter den Nonprofit Forschenden und Praktikern anzuregen, um mittelfristig einen Beitrag zur Erhöhung der Messqualität von empirischen Nonprofit Studien zu leisten. Zu diesem Zweck werden sieben Entscheidungsbereiche vorgestellt und vertiefend diskutiert, die bei der Realisierung von empirischen Studien im Nonprofit Management, insbesondere bei der Messung von Nonprofit Erfolg und Effektivität, vermehrt beachtet werden sollten. Im Einzelnen sind dies folgende Entscheidungstatbestände: (1) Uni- versus Multidimensionalität, (2) Formative versus reflektive Messung, (3) Individual- versus Gruppenmeinung, (4) Interne versus externe Messung, (5) Leading versus Lagging, (6) Distinkte versus überlappende Messung sowie letztlich (7) Additiver versus multiplikativer Messansatz. Jeder Entscheidungsbereich wird mit seinen Vor- und Nachteilen erläutert und exemplarisch diskutiert, um Handlungsempfehlungen für die Nonprofit Community abzuleiten, in welchen Studiensituationen welcher Messansatz zu bevorzugen ist.
Resumen Para complementar el material publicado contemporáneo sobre organizaciones sin ánimo de lucro, que ofrece principalmente recomendaciones impulsadas por la teoría para medir la efectividad, el rendimiento o conceptos relacionados de las organizaciones sin ánimo de lucro, este artículo presenta siete compromisos o términos medios a considerar por los investigadores y profesionales antes de implicarse en un proyecto de medición de la efectividad de las organizaciones sin ánimo de lucro. Para cada compromiso ofrecemos ejemplos y sugerencias para clarificar las ventajas y desventajas de las elecciones metodológicas que toman en cuenta diversos elementos contextuales. En particular, abordamos las diferencias entre enfoques formativos y reflexivos, así como también la diferencia entre unidad de interés, unidad de recopilación de datos y unidad de análisis. Estos temas requieren más atención en profundidad en el material publicado sobre la efectividad de las organizaciones sin ánimo de lucro para evitar malas interpretaciones y sesgos de medición. Finalmente, el presente artículo concluye con cinco vías de investigación adicional para ayudar a abordar los desafíos claves que siguen existiendo en esta área de investigación.
Keywords Nonprofit performance · Nonprofit effectiveness · Measurement · Formative or reflective specifications · Measurement biases
Introduction
Measuring nonprofit performance and effectiveness has remained a topic of
substantial debate for more than half a century. Nonprofit organizations are defined
solely by their lack of profit distributions (Hansmann 1987), so they can have various
goals (Perrow 1961; DiMaggio 2001). Therefore, a one-size-fits-all solution to
measure the achievement of these goals is unlikely. As Lecy et al. (2012) and Jun and
Shiau (2012) note in their extensive historical overviews of nonprofit effectiveness
literature, multiple studies acknowledge the complexity and the plethora of theoretical
elements that constitute the very concept of effectiveness (Osborne and Tricker 1995;
Plantz et al. 1997; Rojas 2000; DiMaggio 2001; Sawhill and Williamson 2001;Sowa
et al. 2004; Herman and Renz 2008; Carman and Fredericks 2010).
Although previous research has used the terms nonprofit performance and
nonprofit effectiveness synonymously, we clarify their usage for this study. In line
with nonprofit research and organizational performance science (e.g., Richard et al.
2009), nonprofit performance encompasses four concrete areas of interest:
(a) financial performance (e.g., donations raised in a year, state funding),
(b) stakeholder performance (e.g., volunteer satisfaction, donor loyalty, stakeholder
identification), (c) market performance (e.g., nonprofit image, nonprofit brand
reputation, service quality), and (d) mission performance (achieving the mission of
the organization). Nonprofit effectiveness is closely related but broader than
nonprofit performance, in that it focuses more on the balanced input and output
achieved through the combination of processes, projects, and programs imple-
mented by the nonprofit organization to reach its predefined goals. These nonprofit
goals are framed by the organization’s mission, which generally focuses on creating
particular effects for various stakeholder groups (Frumkin 2002).
Herman and Renz (1997,1999,2000,2008) offer an intensive discussion of
nonprofit effectiveness, from which they derive nine general theses. Some of these
theses regard important consequences of how to measure nonprofit effectiveness.
However, as pointed out by Lecy et al. (2012) and Jun and Shiau (2012), substantial challenges remain with respect to the empirical verification of nonprofit performance and effectiveness insights. In particular, they note the paradoxical
observation that proposed guidelines for measuring nonprofit effectiveness are
seldom all met in a single study.
Therefore, this study takes a more empirically driven perspective on nonprofit effectiveness measurement, and we add an important but missing methodological component to the mainly theory-driven recommendations available in contemporary literature (as summarized by Jun and Shiau 2012; Lecy et al. 2012). We focus on
how the contextual factors of a measurement project, from both research and
practitioner perspectives, can be addressed more clearly as a means to reach
appropriate nonprofit effectiveness measurements. Specifically, we aim to provide
researchers and practitioners with a detailed overview of criteria to be taken into
account when measuring nonprofit effectiveness. Furthermore, we formulate
concrete avenues for research that can enhance both the methodological robustness
and the theoretical depth of the ongoing discussion about nonprofit effectiveness.
As we strongly want to avoid adding another list of theory-driven requirements to
the broad literature that already exists, we purposefully adopt a distinct approach.
We identify from our literature review seven trade-offs that researchers and
practitioners should consider when they initiate a project to measure nonprofit
effectiveness. For each trade-off, we note the advantages and disadvantages of each
choice, devoting special attention to the context in which the measurements would
be used (e.g., target group, unit of analysis, type of research question, etc.). From
this perspective, we define a trade-off in the context of this study as an evaluation of
several context-dependent advantages and disadvantages that results in a choice
between two methodological options, or a well-considered combination of these
options. Discussing trade-offs, rather than presenting a supposed one-size-fits-all list
of recommendations, enables us to focus on the importance of the contextual
elements on which decisions could be based in order to come to valuable and
adjusted measurement designs. We build on empirical studies in the domain of
nonprofit effectiveness, which provide examples of advantages and disadvantages of
the various choices. In addition, we use methodological contributions from outside
the traditional nonprofit research domain, such as organizational research,
marketing, psychology, and/or sociology, which offer more substantial experience
in terms of quantifying complex concepts.
Trade-off Decisions in Measuring Effectiveness
Before discussing each trade-off, we stress two points. First, we structure our
discussion in accordance with these seven trade-offs, because the classification
allows us to address stand-alone elements. That is, each element has to be
considered by a researcher or practitioner before engaging in a nonprofit
effectiveness measurement project. Furthermore, this classification enables us to
single out elements from the overall discussion of nonprofit effectiveness, on which
we base our recommendations for further research. We consider each of these trade-offs equally important for a measurement project; however, our explanations of them differ in length, largely because some aspects have not previously been discussed to the extent they require.
Second, because we rely on broader literature to frame our methodological
considerations (i.e., organizational research, marketing, psychology, and sociology),
our focal topics may be relevant in other research areas too (e.g., corporate social
responsibility in a for-profit or public setting). Nevertheless, to maintain a clear
focus and provide a targeted contribution, beyond the theoretical overviews already
offered by Jun and Shiau (2012) and Lecy et al. (2012), we use mainly nonprofit-related examples and theoretical insights from the nonprofit research domain. The
seven trade-offs are summarized in Table 1, and explained in detail in the following
sections.
Table 1 Seven trade-offs when measuring nonprofit effectiveness and performance (advantages and disadvantages)

1. Uni- versus multidimensional measurement

Unidimensional measurement (one item, criterion, or dimension to measure effectiveness)
+ Can be applied in heterogeneous samples
+ Suited to testing relationships in the context of a particular theory, yet applicable across many different settings (higher relevance for specific research projects)
+ Easier to find larger samples for which a unidimensional measurement is relevant for all sample entities (e.g., different organizations, stakeholders)
+ Less cost/effort needed to obtain measurements
– Focus on a single aspect of effectiveness
– Conclusions might remain general
– Lower potential content validity and/or reliability of measurements

Multidimensional measurement (multiple criteria and/or dimensions)
+ Gives more detailed insight into the true complexity of nonprofit effectiveness
+ Can be used to investigate complex relationships of elements with other concepts (detailed analyses)
+ Higher practical relevance
+ Conforms with contemporary expectations in the nonprofit effectiveness literature
– Suitable only for particular contexts (more homogeneous samples)
– Potential data samples might be naturally constrained (statistical power to test relationships of interest might be low)
– More complex analysis methods might be necessary

2. Formative versus reflective measurements

Formative measurement (criteria define nonprofit effectiveness)
+ Conforms with a normative approach to effectiveness measurement (dominant rationale in contemporary literature)
+ Can support theoretical claims that different dimensions of effectiveness have unique impacts
+ Can investigate complex relationships of elements of effectiveness with other concepts (detailed analyses)
+ Appropriate for action-oriented assessments of effectiveness
– When wrongly specified, tested, or validated, might result in severe misinterpretations

Reflective measurement (criteria summarize nonprofit effectiveness)
+ Conforms with a social constructionist approach to effectiveness measurement
+ Can be used to improve reliability of measurements
+ Appropriate for perception-based evaluations of effectiveness
+ Validation methods are better known and more widely used in contemporary literature
– When wrongly specified, tested, or validated, might result in severe misinterpretations

3. Individual versus group measurements

Individual measurement (a single source is consulted to measure effectiveness)
+ Larger samples can be composed at the level of the unit of interest
+ A broad range of criteria can be probed from the source with the most access to information
– When the unit of interest differs from the unit of data collection or analysis, substantial measurement biases might exist due to (a) social desirability, (b) different backgrounds of raters, or (c) different personal reference frameworks
– Interpretation of results should remain at the atomistic level (i.e., the level of data collection rather than the level of interest)

Group measurement (multiple sources are consulted to measure effectiveness)
+ More reliable measurements might be obtained (countering measurement biases)
+ A more holistic view from different perspectives can be acquired
– Data collection might be costly and complex
– Assumptions are necessary to aggregate data, which might result in few data points at the unit of interest
– Control variables at the individual (source) level might be necessary

4. Internal versus external measurements

Internal measurement (internal sources)
+ More detailed information is available
– Measurement biases might exist due to social desirability and ivory-tower judgments

External measurement (external sources, such as customers, other organizations, donors)
+ Information of higher relevance might be obtained (real world)
– Comparability of samples across organizations must be ensured
– Differences across sources in each organization should be accounted for (e.g., control variables, weights) to avoid biases due to (a) different backgrounds of raters or (b) different personal reference frameworks

5. Leading versus lagging measurements

Leading measurement (focused on actions and their direct outcomes)
+ More objectively observable
+ Generalizable across more heterogeneous samples of organizations
– Assumptions regarding the effects associated with these measurements should be realistic

Lagging measurement (focused on the effects of organizational actions and outcomes)
+ Focus on the elements that are of real interest (effects: higher practical relevance)
– More subjective and dependent on personal backgrounds and reference frameworks
– Less generalizable across organizations with various missions
– Less generalizable across stakeholder groups with various interests and needs

6. Distinct versus overlapping measurements

Distinct concepts measured (to investigate causes and effects of nonprofit effectiveness)
+ To investigate factors and conditions that determine differences in nonprofit effectiveness
+ To identify management actions that should be in place or improved to obtain high effectiveness
+ Enables strong theorization
– Extensive pretesting necessary to determine the distinctness of concepts
– More complex and costly data collection processes

Overlapping concepts measured (to investigate mental models of nonprofit effectiveness)
+ To investigate how concepts are perceptually related and/or how mental models are constituted among managers or stakeholders (e.g., research on managerial sense making)
– Risk of over-interpretation of nonexistent relations (Type I errors)

7. Additive versus multiplicative measurements

Additive measurement (various criteria aggregated additively, by adding or averaging)
+ To reach a more reliable measurement (reflective criteria) or to combine different sources (e.g., inter-rater agreement)
+ To obtain evaluative measurements when criteria can compensate for one another
– Unique contributions of separate criteria to the overall effectiveness measurement are not clearly observable

Multiplicative measurement (various criteria aggregated multiplicatively)
+ Makes the conditionality of particular criteria more prominent
+ Helps identify criteria, dimensions, or stakeholder groups that require the most urgent management actions
+ Useful for selecting highly effective organizations (e.g., for qualitative research analysis)
– Less consistent with contemporary uses of variables and quantification, so assumptions about distributions, variances, and ranges might be violated (i.e., not suitable for commonly used scientific research methods)

Uni- Versus Multidimensional Measurements

The first trade-off is between a unidimensional or a multidimensional measurement approach. In their extensive chronological literature review, Jun and Shiau (2012) distinguish between unidimensional and framework-based approaches.
sional approaches, typical of the first generation of nonprofit effectiveness studies,
focus on a single dimension of nonprofit performance and are commonly applied
according to a particular theory. Framework-based approaches, or the second
generation of studies, apply an additional classification between multidimensional
and multi-constituency approaches. Multidimensional approaches take more than
one dimension of effectiveness into account; for example, Sowa et al. (2004)
differentiate management effectiveness from program effectiveness. A multi-constituency approach instead addresses the different interests and perspectives of
separate stakeholder groups.
We consider the advantages and disadvantages of unidimensional versus
multidimensional measurements in this section. The reasons listed in contemporary
literature for using a multidimensional effectiveness assessment reflect mainly
theoretical considerations (Lecy et al. 2012). Most nonprofit organizations (1) have
multiple goals, (2) share goals with other organizations, (3) pursue subjective
outcomes, and (4) involve various stakeholders with different interests, so a single
criterion likely could not quantify an organization’s effectiveness sufficiently
(Perrow 1961; DiMaggio 2001; Kaplan 2001; Herman and Renz 2008; Moxham
2009). Therefore, considering various criteria would help researchers gain a more
holistic view of effectiveness, such that they could make better inferences about the
complexity of the ‘‘real world’’ (Micheli and Kennerley 2005; Gordon et al. 2010).
As another important benefit, a multidimensional scale can investigate the
differential impacts of various factors on different dimensions (Jackson and
Holland 1998; Brown 2005; Dart 2010).
However, at least three considerations involving multidimensional measurements
highlight their potential disadvantages. The first relates to the generalizability of the
findings. Because of the many differences that exist across nonprofit organizations,
it is hard, or even impossible, to find a set of performance dimensions that would be
broadly applicable to many different types of organizations and their varied
stakeholders (DiMaggio 2001; Micheli and Kennerley 2005; Baruch and Ramalho
2006; Eckerd and Moulton 2011). To ensure that the criteria adopted in a
multidimensional approach are relevant for all organizations and stakeholders
addressed, the data collection would need to be constrained to a particular context of
similar organizations and/or stakeholders (i.e., homogeneous sample). When
researchers seek generalizable findings or if practitioners want to make an overall
assessment across stakeholder groups, they might prefer measurements with a single
dimension or criterion, relevant for all the different entities. Thus they could
investigate more heterogeneous samples, in which the organizations or stakeholders
differ widely. A useful suggestion in this context comes from Baruch and Ramalho
(2006), who propose including both context-specific and generalist measurements
(items and/or dimensions) of effectiveness in every study. Such an approach could
offer a detailed analysis of a particular research question (context-specific
measures), while also framing and comparing results within a broader investigation
of effectiveness (general measurements).
A second consideration is the challenges that multidimensional measures create
for data analysis. Despite the growing availability of various statistical applications
(Sowa et al. 2004), to date, few contributions apply them to the study of
multidimensional measurements of effectiveness (Lynn et al. 2000; Lecy et al.
2012). These methods require substantial sets of observations (Maas and Hox 2005;
Herman and Renz 2008), but in most cases, either the populations are naturally
constrained (i.e., there are no unlimited contexts in which all measurement
dimensions make sense), or the groups of target respondents are too heterogeneous
(i.e., requiring many control variables, reducing the statistical power for testing the
real relationships of interest). The use of multidimensional measurements thus
seems more appropriate for testing the potentially complex relatedness of a few
variables in a controlled environment. In contrast, a unidimensional measure could
test a more generalizable relationship, across more heterogeneous contexts, based
on a general theory (Jun and Shiau 2012).
Finally, from a scientific perspective, some nonprofit researchers pursue high
measurement quality and internal consistency by using items and criteria that are
very similar in nature. Despite their high internal reliability (e.g., high Cronbach’s
alpha values) and good model fit, the second or third item in such multi-item
constructs often contributes very little beyond the information obtained from the
first item (Drolet and Morrison 2001). Nor do multiple-item measurements,
compared with single-item measures, necessarily have better predictive validity
(Bergkvist and Rossiter 2007). When several similar criteria are combined, efforts
and costs increase for data collection, yet less relevant information might emerge,
due to the measurement of redundant items and criteria. Therefore, nonprofit
effectiveness could be measured with a single-item indicator if (1) it is used for
measuring an overall personal perception, (2) a consistent methodological approach
is applied, and (3) the interpretation of results sufficiently takes individual
respondents’ characteristics into account (Pandey et al. 2007; Stazyk and Goerdely
2010). Furthermore, a perception-based reflective effectiveness item can be useful
for explaining individual behavior, because it relates to multiple, more objective
organizational effectiveness criteria, such as giving donations, being committed to the organization, or volunteering for the organization (Forbes 1998; Padanyi and Gainer
2003; Yoo and Brooks 2005; Daellenbach et al. 2006; Sarstedt and Schloderer 2010;
Mews and Boenigk 2012). For example, such an item might ask, "On an overall basis, rank the effectiveness of your agency in accomplishing its core mission (0 = not effective at all; 10 = extremely effective)" (Pandey et al. 2007, p. 406).
Given these three considerations, and potentially in contrast with a dominant rationale in previous literature, we propose that a unidimensional or even single-criterion perspective on effectiveness deserves more attention, especially if the aim
is to (1) improve generalizability across contexts, (2) enhance analytical robustness,
(3) deal with naturally constrained data availability, and/or (4) reduce costs or
efforts because they are not worth the minimal extra information provided by an
additional item, criterion, or dimension.
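The redundancy argument above (Drolet and Morrison 2001) can be illustrated numerically. The following sketch uses simulated data, not results from any cited study: three near-identical reflective items yield a high Cronbach's alpha, yet averaging them recovers the underlying perception only marginally better than the first item alone.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

rng = np.random.default_rng(42)
perception = rng.normal(size=500)  # hypothetical underlying "perceived effectiveness"
# Three near-redundant items: the same perception plus small item-specific noise.
items = perception[:, None] + 0.3 * rng.normal(size=(500, 3))

alpha = cronbach_alpha(items)  # high internal consistency
r_first_item = np.corrcoef(items[:, 0], perception)[0, 1]
r_composite = np.corrcoef(items.mean(axis=1), perception)[0, 1]
# The 3-item composite improves on the single item only marginally.
```

With the noise level assumed here, alpha exceeds 0.9 while the composite's correlation with the simulated perception improves on the single item by only a few hundredths, which is precisely the redundancy trade-off described above.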
Formative Versus Reflective Measurements
Another trade-off pertains to whether to apply a formative or reflective approach.
Each approach inherently encompasses very different assumptions, and when wrongly applied, either one could have severe consequences in terms of Type I errors (Diamantopoulos 1999). These errors occur when the results suggest that a relationship exists between two concepts although in reality no such relationship exists, or, for example, when findings recommend that practitioners invest in particular practices even though doing so produces no returns.
A reflective approach assumes that the latent variable (i.e., the concept of
interest, which would be nonprofit effectiveness in our context) causes the different
indicators being measured (DeVellis 2003; Sarstedt and Schloderer 2010; Boenigk
et al. 2011). For example, if a donor is satisfied with an organization (which would
mean that the latent variable is ‘‘donor satisfaction’’), the reflective approach
includes the assumption that the donor will evaluate the overall service delivered by
the organization positively and believes that his or her personal expectations have
been fulfilled. As a result, reflective indicators of donor satisfaction might be "I am happy with the service delivered by this organization" or "My expectations are met by this organization."
A formative approach instead takes the opposite assumption: All the indicators
together define or conceptually cause the latent construct (Bollen and Lennox 1991).
An example in the nonprofit context comes from Willems et al. (2012a). For the
latent variable ‘‘nonprofit governance quality,’’ whether organizational governance
quality is considered high depends on whether several distinct criteria are high. For
example, to consider an organization well governed, stakeholders would need to be
sufficiently involved in the organization’s decision processes, its internal structures
and procedures must be well developed, recurrent evaluations should review prior achievements and outputs, and so on (together, all formative criteria for "governance quality"; Willems et al. 2012a).
For most constructs, the choice between a formative and a reflective approach should be obvious (Petter et al. 2007; Sarstedt and Schloderer 2010). Yet the different perspectives applied to nonprofit effectiveness in the literature make the choice less obvious. Errors might result when methodological specifications are inconsistent with theoretical claims and interpretations, so this second trade-off deserves substantially more attention in the nonprofit literature. We address two perspectives that appear in contemporary nonprofit literature and that each support either a formative or a reflective approach: the social construction and the normative nature of nonprofit effectiveness. Depending on the context of their particular study, researchers and practitioners can choose a formative, a reflective, or a combined approach.
If effectiveness is seen as a social construction for a particular group of people—
that is, they hold a shared, common perception of the concept nonprofit
effectiveness (Herman and Renz 1997; Forbes 1998; Liao-Troth and Dunn
1999)—a reflective approach is appropriate. If people consider an organization "effective," they might designate it as a good example for other organizations or cite it as a best practice. As a result, the reflective items in a scale measuring overall effectiveness might include "This organization is a good example for other organizations" or "I would talk about this organization to illustrate good practices." From a reliability perspective, several such items,
preferably with high common variance, could be combined, such that the items
together measure a single latent variable (DeVellis 2003). Therefore, a reflective
approach is appropriate when the measurements focus on perceptions of effective-
ness or on reputational effectiveness (Smith and Shen 1996; Forbes 1998; Liao et al.
2001).
In contrast, from a normative perspective, nonprofit performance or effectiveness requires a formative measurement approach (Petter et al. 2007; Willems et al. 2012a). This means that the theoretical framework requires a combined consideration of multiple items or dimensions to provide a comprehensive assessment of the organization's effectiveness (e.g., Kaplan 2001).
criteria thus leads to the conclusion about whether the nonprofit organization is
effective (i.e., causal relationship from items to latent variable). In this case, the
common variance of the items is less important, while their unique variances
become critical. A formative construct is useful if little common variance exists
among its indicators, as each indicator then has a relevant contribution to the overall
concept (which is usually assessed by the level of multicollinearity) (Diamanto-
poulos and Siguaw 2006). Specialized literature offers a more extensive overview of
the steps necessary to develop and test the appropriateness of formative constructs
(Jöreskog and Goldberger 1975; Bollen and Lennox 1991; Diamantopoulos and
Winklhofer 2001; Jarvis et al. 2003; Petter et al. 2007; Sarstedt and Schloderer
2010).
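The multicollinearity assessment mentioned above can be sketched with variance inflation factors (VIFs), one per formative indicator. The function and data below are a hedged illustration, not a full validation procedure; low VIFs suggest each indicator contributes unique variance to the overall concept.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor per column of X (n_obs, n_indicators).
    VIF_j = 1 / (1 - R2_j), where R2_j regresses indicator j on the others."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])     # intercept + other indicators
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - resid.var() / y.var()
        out[j] = 1 / (1 - r2)
    return out

# Hypothetical scores on three nearly independent formative criteria
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
print(vif(X))  # values close to 1 indicate little multicollinearity
```

High VIFs (common cutoffs are 5 or 10) would instead signal redundant indicators, pointing back toward a reflective specification.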
Arguments in existing literature that suggest the need to measure nonprofit
effectiveness with multiple dimensions also tend to take a normative perspective
(Herman and Renz 1999, 2008; Rojas 2000; DiMaggio 2001; Sowa et al. 2004;
Shilbury and Moore 2006). This rationale states that researchers and practitioners
should investigate multiple, distinct dimensions, because a single dimension cannot
sufficiently represent what effectiveness is. This argument inherently endorses the
idea that each dimension contains a piece of relevant information that cannot be
captured by other dimensions. Thus, a formative measurement model approach is
appropriate for a normative perspective. Unfortunately, several (frequently cited)
examples in nonprofit literature assert the need for multiple dimensions and propose
multiple criteria or dimensions, yet the results they report suggest that the
dimensions proposed are not multiple or distinct effectiveness criteria. For example,
principal component analysis (PCA), validation reporting (e.g., Cronbach’s alpha),
and interpretations all build on high common variances among items or dimensions
(e.g., Gill et al. 2005; Sowa et al. 2004; Shilbury and Moore 2006). Because such
approaches thus contradict the theoretical claims, they create the potential for
Type I errors (Diamantopoulos and Siguaw 2006). Therefore, we suggest that the
items proposed in such seemingly multidimensional scales could be used in further
research to quantify a single, unidimensional "social constructionist" latent
variable that quantifies overall effectiveness (rather than measuring distinct
dimensions), or that new criteria and items are proposed that truly measure a
unique element of overall nonprofit effectiveness. We also suggest avoiding
substantive explanations of the superficial relatedness of arbitrarily chosen
dimensions (with one another or other concepts). Such efforts grant distinct
interpretations to relations between concepts that are not inherently different, which
again creates Type I errors (Diamantopoulos and Siguaw 2006). Furthermore, we
suggest re-validations, using formative approaches, of existing scales that initially
were developed to measure distinct dimensions but that led to high internal
correlations. These new evaluations also could clarify the extent to which the
proposed dimensions actually measure distinct aspects. If no such proof is
forthcoming, new scale development should be undertaken to uncover multiple,
unrelated dimensions that accord with the stated theoretical claims.
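One simple diagnostic for such a re-validation is the share of total variance captured by the first principal component of the proposed dimension scores: a dominant first component is consistent with a single overall latent variable rather than distinct dimensions. The data below are simulated for illustration under the assumption that all "dimensions" track one latent factor.

```python
import numpy as np

def first_component_share(X: np.ndarray) -> float:
    """Share of total variance captured by the first principal component."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)  # ascending order
    return eigvals[-1] / eigvals.sum()

# Hypothetical scores on four 'dimensions' that all track one latent factor
rng = np.random.default_rng(1)
latent = rng.normal(size=200)
X = np.column_stack([latent + 0.3 * rng.normal(size=200) for _ in range(4)])
print(round(first_component_share(X), 2))  # close to 1 suggests unidimensionality
```

A share near 1/k (for k dimensions) would instead support the claim of multiple distinct effectiveness criteria.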
Individual Versus Group Measurements
Although virtually no empirical tests address the extent to which we can rely on a
single rater’s opinion to assess an organization’s effectiveness (Lynn et al. 2000),
opinions among individuals can differ substantially (Green and Griesinger 1996;
Herman and Renz 2008; Willems et al. 2012b). First, raters engage in various types
of relationships with the organization, so they judge the usefulness of the
organizational outcomes differently depending on their personal needs (Krashinsky
1997; Balser and McClusky 2005; Mistry 2007; Babiak 2009). Second, a
comparative view is inherent to assessments of concepts that are difficult to
quantify objectively (Herman and Renz 1999, 2008; DiMaggio 2001), so the
personal reference frameworks of raters play important roles. That is, people rely on
their personal experiences with other organizations to judge the effectiveness of an
organization (Herman and Renz 1999, 2008). Third, the social desirability of the
reported output might result in (unsystematically) biased assessments that differ
across stakeholder types (Green and Griesinger 1996). For example, a fundraiser
might present organizational effectiveness as higher than it actually is, because she or
he hopes to create a positive image of the organization and thereby obtain more
funding.
Considering these three sources of potential measurement biases in individual
assessments of organizational characteristics, it might be useful to distinguish
between the (1) unit of interest, (2) unit of data collection, and (3) unit of analysis
(though all three often are referred to with the common term "unit of analysis").
The unit of interest pertains to the level on which the theoretical considerations are
focused. In nonprofit effectiveness literature, the unit of interest is often the
organization (e.g., "Do particular practices in an organization result in better effects
for its stakeholders?"). The unit of data collection is the level at which data are
collected. For example, raters might be asked to judge the effectiveness of an
organization, or researchers might review various annual reports by the organiza-
tion. In these cases, the respective units of data collection would be persons in the
organizations and years in which the organization was active. These data can be
analyzed in their original form, aggregated, or related across levels on the basis of
existing assumptions. These choices in turn determine the unit of analysis. If the unit
of interest (e.g., organization) differs from the unit of data collection or unit of
analysis (e.g., individual opinions; change in reported donors), important method-
ological considerations are necessary with respect to robustness, reliability, the
research design, and the contribution.
From a reliability perspective, different opinions or measurements can be
gathered and aggregated for a single organization, which requires the use of
inter-rater reliability measures (Boyer and Verma 2000). However, this method
requires an extensive data collection, which might result in high costs and reduce
the number of organizations in the sample, leading to less statistical power for
analyzing the unit of interest (Groves and Heeringa 2006; Herman and Renz 2008).
Thoughtful decisions also are necessary with respect to how to aggregate the data
(especially when the numbers of respondents differ across organizations). It should
be possible to find comparable groups of respondents across organizations (e.g.,
board members are likely more comparable across organizations than the
beneficiaries of different types of organizations). In contrast, researchers could
use the opinions of single respondents with comparable positions across organi-
zations to gather comparable answers from a relatively large sample of organiza-
tions, such as the chair of the board or the general manager (Brown 2005; Gill et al.
2005; LeRoux and Wright 2010). In this way, larger samples at the level of the unit
of interest can be composed, but differing backgrounds, social desirability, and
subjective measurements can introduce biases and induce misinterpretations of
the results, particularly for studies that rely on individual perceptions (see the
"Leading versus lagging measurements" and "Distinct versus overlapping effec-
tiveness measurements" sections). Therefore, as long as the analysis also includes
individual control variables or provides a more atomistic (i.e., individual-level)
interpretation of the findings, a single person’s assessment can be a valid substitute
to approximate an organization’s characteristics.
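The inter-rater reliability check referenced above (Boyer and Verma 2000) can be illustrated with a one-way intraclass correlation, ICC(1), which indicates whether individual ratings may reasonably be aggregated per organization. The score matrix below is hypothetical, and this formula is one common ICC variant among several.

```python
import numpy as np

def icc1(scores: np.ndarray) -> float:
    """ICC(1) from a one-way ANOVA: rows = organizations, cols = raters.
    Values near 1 justify aggregating ratings to the organizational level."""
    n_orgs, k = scores.shape
    grand = scores.mean()
    # between-organization mean square
    msb = k * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_orgs - 1)
    # within-organization mean square
    msw = ((scores - scores.mean(axis=1, keepdims=True)) ** 2).sum() / (n_orgs * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical effectiveness ratings: 4 organizations x 3 raters each
scores = np.array([
    [6, 6, 5],
    [3, 2, 3],
    [7, 6, 7],
    [4, 4, 5],
])
print(round(icc1(scores), 2))
```

A low ICC would caution against treating the organizational mean as a reliable score, echoing the trade-off between rating depth per organization and sample size across organizations.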
Rather than controlling for potential biases, it is also possible to consider more
complex research questions, in which both individuals and organizations are units of
interest. We know little about the factors and effects of the unique perceptions of
individual raters versus the shared perceptions among raters in the same
organization; substantial improvement of our understanding of nonprofit effective-
ness might result from a closer examination of such multilevel data structures (Lynn
et al. 2000; Yoo and Brooks 2005; Hitt et al. 2007). Thus, the challenges regarding
existing biases become research questions, with both high academic and practical
relevance (Lynn et al. 2000; Sowa et al. 2004; Willems et al. 2012a).
Internal Versus External Measurements
Closely related to the previous trade-off, some additional advantages and
disadvantages refer to internal assessments, being the opinions of people inside
the organization (e.g., CEO, board members, staff, volunteers) versus external
assessments by people outside the organization (e.g., customer, funders, beneficia-
ries) (Van Puyvelde et al. 2012). As Green and Griesinger (1996) note, relying too
much on inside judgments can result in biased assessments due to social desirability
or ivory tower judgments, which arise when people in leadership positions lack
insight into actual operational performance. Yet the availability of information to
insiders means that substantially more and detailed information can be gathered
from these internal respondents (Brown 2005).
Using assessments by donors, beneficiaries, other organizations in the field, or
other external stakeholders offers the advantage of greater relevance, in terms of
real effects (Smith and Shen 1996; Forbes 1998; Liao et al. 2001). The perceptions
held by these external stakeholders determine (implying a reflective operation-
alization; see the "Formative versus reflective measurements" section), for example,
donation levels, subsidies, volunteering intentions, and whether their interests are
sufficiently met by the organizational outcomes (Forbes 1998; Padanyi and Gainer
2003; Yoo and Brooks 2005; Daellenbach et al. 2006; Sarstedt and Schloderer 2010;
Mews and Boenigk 2012). Yet their perceptions also might reflect biases, including
those mentioned in the previous trade-off (i.e., different stakeholder needs or
different personal reference frameworks). The decisions about whom to rely on
(‘Internal versus external measurements’ section) and how many sources to include
(‘Individual versus group measurements’’ section) thus might be combined, to deal
with the overall biases that can result from differences between the unit of interest
and the unit of data collection and/or analysis.
Leading Versus Lagging Measurements
The distinction between leading and lagging measurements, as noted in practice-
oriented business literature (Brewer and Speh 2000; Epstein and Wisner 2001;
Bremser and Barsky 2004), refers to the causal sequence of organizational actions
and direct outputs on one hand, and the effects for various stakeholders on the other.
From a practical perspective, lagging indicators focus on the overall, mission-
related effects of an organization for a certain period of time, following from the
investments made and actions performed by an organization (e.g., number of
healthy children after a vaccination program, change in literacy rate due to a
deployed reading program). Leading indicators instead forecast and quantify the
organization’s actions or direct outcomes (e.g., number of vaccinations given,
number of people educated in the reading programs).
Leading indicators tend to be more objectively observable. For example, whether
particular board practices are in place is easier to observe than the extent to which
public opinion has changed after a campaign. A focus on leading indicators might
help avoid biases due to social desirability, personal background, or strong
individual reference frameworks. Furthermore, leading indicators often can be
generalized across heterogeneous samples. For example, regardless of the mission
or strategy of an organization, specific board or management practices likely need
to be in place at various types of organizations, and can therefore more easily be
compared across organizations. Assessments of their achieved mission or strategy
instead are much more context dependent (Sawhill and Williamson 2001; Moxham
2009).
In contrast, lagging indicators have greater relevance, in the sense that they
reflect changes in society, achievement of mission, or impact on stakeholders. Yet
they also tend to be more subjective and dependent on personal reference
frameworks. To deal with these issues, we refer to prior suggestions regarding the
need to consider multiple external assessments, and control and test for contextual
and individual characteristics.
In this context, we note an interesting contribution by Packard (2010), who
describes a complex, conceptual sequence of relevant indicators: input indicators,
throughput indicators, management capacity, program capacity, and outputs. These
combined dimensions reveal the organization's overall performance as an "over-
riding concept, including throughputs in terms of program and management
operations; and results including outputs, quality, efficiency, and effectiveness"
(Packard 2010, p. 976). With a survey of 52 staff members in 14 nonprofit
programs, Packard finds that the respondents consider goal accomplishment of great
importance. These lagging indicators also are of stronger interest from a theoretical
perspective and they invoke generally high ratings when people are asked to judge
their own organization. Perhaps social desirability influences their answers,
especially those for indicators that are less objectively measurable. However,
respondents seem to rate leading indicators, such as customer and employee
satisfaction, as better indicators of nonprofit effectiveness, because such indicators
can more actively be managed, as well as compared across organizations.
Distinct Versus Overlapping Effectiveness Measurements
While the previous trade-off deals with the causal sequence of distinct concepts like
organizational actions, outcomes, and effects, here we address potentially overlap-
ping effectiveness-related concepts. The literature shows different logical sequences
regarding overall effectiveness, in which for example (1) particular board practices
lead to board performance (or governance quality), (2) board performance leads to
organizational performance, (3) organizational performance leads to reputational
effectiveness, and (4) reputational effectiveness leads to higher donations (stake-
holder performance) (Bradshaw et al. 1992; Green and Griesinger 1996; Kushner
and Poole 1996; Siciliano 1997; Smith and Shen 1996; Herman et al. 1997; Herman
and Renz 2000; Forbes 1998; Padanyi and Gainer 2003; Radbourne 2003; Brown
2005; Gill et al. 2005; Yoo and Brooks 2005; Daellenbach et al. 2006; Carman and
Fredericks 2010; Sarstedt and Schloderer 2010; Mews and Boenigk 2012; Helmig
et al. 2013).
When multiple of these effectiveness-related concepts appear in a single study,
regardless of whether they are formative or reflective, or leading or lagging, they
might be completely distinct, partially overlapping, or even completely overlapping
(Tacq 1984). Consider for example the unique impact of multiple management
practices and their quality on organizational effectiveness, tested in a regression
analysis. An inherent assumption is that they are distinct concepts in a causal
sequence. However, when the assessment of all these concepts (management
practices and effectiveness) relies on perceptions, the indicators could be related or
partially overlapping, as they contain very similar elements. The overlap even could
be so extensive that the assumed cause equals the hypothesized effect (i.e., two
conceptual names are used, but both refer to a single and general concept, or causa
aequat effectum, Tacq 1984, p. 146). If the researcher posits that the concepts are
distinct when they are not (or only partially), substantive causal interpretations
could emerge for non-existing relations (i.e., Type I error). In trade-off 2, we
described similar errors, which resulted from assigning different interpretations to
seemingly distinct dimensions that were not different in reality. Here, the errors
result from assuming causal relations among concepts that are, at least from a
measurement perspective, not different.
Thus, when multiple effectiveness-related concepts combine in a single study,
with causal relations hypothesized and tested across these concepts, we suggest an
initial, thorough analysis of whether the measured concepts actually may be
considered distinct. The discriminant and convergent validity of the concepts could
be tested (Fornell and Larcker 1981; Cohen 1988; Wilson et al. 2007; Farrell and
Rudd 2009; Farrell 2010; Shiu et al. 2011). For example, Boenigk and Helmig
(2013) assess whether donor–nonprofit identification and donor identity salience are
distinct constructs, with separate and direct effects on donation behavior. This
assessment revealed that the two identification constructs were distinct, even though
the indicators used for the two measurements seemed similar. To avoid Type I
errors, more careful interpretations are needed regarding whether different concepts
truly are being measured and if they causally relate, especially if single informants
provide the judgments for several effectiveness-related concepts, such as measures
of professionalism, aspects of leadership, or governance effectiveness (e.g., 260
CEOs judging their own organizations, LeRoux and Wright 2010), as well as
perceptions of chairs (e.g., Harrison et al. 2012 surveyed different people close to
board chairs). When extremely high standardized coefficients or correlations (up to
0.86) are reported as causal relationships between perception-based concepts, we
might suspect that the measurements combine leading and lagging indicators,
quantifying a single overall latent construct that the respondents perceive similarly,
rather than distinct concepts that are causally related.
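The discriminant validity test cited above (Fornell and Larcker 1981) can be sketched by comparing each construct's average variance extracted (AVE) with the squared inter-construct correlation. The loadings below are hypothetical; the correlation of 0.86 mirrors the high reported coefficients discussed in the text.

```python
import numpy as np

def ave(loadings: np.ndarray) -> float:
    """Average variance extracted from standardized indicator loadings."""
    return float((loadings ** 2).mean())

# Hypothetical standardized loadings for two perception-based constructs
# (e.g., 'board performance' and 'organizational performance').
ave_a = ave(np.array([0.80, 0.75, 0.82]))
ave_b = ave(np.array([0.78, 0.84, 0.71]))
r_ab = 0.86  # inter-construct correlation, as high as those reported above

# Fornell-Larcker criterion: each AVE should exceed the squared correlation.
print(ave_a > r_ab ** 2, ave_b > r_ab ** 2)
```

Here both checks fail, which would suggest the two "concepts" are not empirically distinct and that a causal interpretation of their relationship risks a Type I error.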
In contrast, studying overlapping concepts can help in better understanding how
mental models are shaped by (groups of) practitioners. Insight into how
individuals separately, or groups through mutual interaction, construct a mental
framework of how their actions and decisions potentially result in outcomes can
improve our understanding of individual and collective sense-making processes.
Because these processes underlie actual managerial and governance decisions, they
can clarify what information is mainly taken into account and for what reasons
(Mitchell 2013).
Additive Versus Multiplicative Measurements
Researchers and practitioners might contemplate assumptions they make with
respect to aggregating data, which can reflect different criteria, dimensions, and
items (see the "Uni- versus multidimensional measurements" and "Formative versus
reflective measurements" sections), or different sources (see the "Individual versus
group measurements" section). In most cases, if multiple items refer to a single
dimension, an average score is used. In combination with the high common variance
of reflective constructs, such a quantification should give a reliable indication of an
existing latent concept. The combination of multiple sources, together with high
inter-rater reliability (Boyer and Verma 2000), also offers a reliable assessment. For
formative items, weights (comparable to regression coefficients) might be used to
extract the unique effects of the indicators. Similar considerations apply to
combinations of dimensions or of assessments by different stakeholder groups. In
these cases, the assumption is that the measurement model is additive, such that the
(weighted) sum of multiple assessments constitutes the score for the overall
concepts. From a practitioner perspective, a multiplicative approach may be more
useful for assessing and benchmarking organizations. This approach stresses the
conditionality of each dimension or data source. For example, if an observer reviews
several dimensions to make a holistic assessment of an organization’s effectiveness,
separate scores are available for each dimension. An organization might score high
on all except one dimension and extremely low on this single dimension. An
additive approach would produce a fairly positive outcome, because the negative
outcome for the one dimension would be compensated for by the other dimensions.
A multiplicative approach instead would lead to a lower value for the overall
concept, because in this view, an organization is effective only if it performs well on
every dimension. The latter approach thus could be valuable for selecting highly
effective organizations for further qualitative exploration (Balser and McClusky
2005; Herman and Renz 2000). Finally, a multiplicative approach can reveal the
causes and effects of a particular and obvious shortcoming, rather than of averaged
achievements across multiple dimensions or sources (Willems et al. 2012a).
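The compensation effect described above can be made concrete with a small numeric sketch, using hypothetical dimension scores on a 0-1 scale; the geometric mean serves here as one simple multiplicative aggregation.

```python
import numpy as np

# Hypothetical dimension scores: high on four dimensions, extremely low on one.
scores = np.array([0.9, 0.9, 0.8, 0.9, 0.1])

additive = scores.mean()                             # compensatory: low score offset
multiplicative = scores.prod() ** (1 / len(scores))  # conditional: low score penalized

print(round(additive, 2), round(multiplicative, 2))
```

The additive score stays fairly high despite the weak dimension, whereas the multiplicative score drops markedly, which is why the latter better identifies organizations that perform well on every dimension.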
Conclusions and Avenues for Further Research
Rather than striving for a "best" way that is applicable to any kind of nonprofit
organization, we have sought to present both the advantages and the disadvantages
of a variety of choices that can be made, taking into account the contextual elements
of an effectiveness measurement project. As our first contribution, we offer an
overview of these advantages and disadvantages, as summarized in Table 1. This
overview offers decision guidelines for researchers and practitioners who engage in
nonprofit effectiveness measurement projects.
As a second main contribution, we identify from the seven trade-offs five
research avenues that could extend and enhance the ongoing nonprofit effectiveness
discussion.
First, our approach started with existing theoretical insights, examples, and
literature overviews from the broad nonprofit research domain. Starting from the
seven trade-offs identified, a next step for research could be to pursue a more
standardized, abstracted approach for identifying general decision rules for
nonprofit effectiveness measurement projects. Our literature review enabled us to
identify the separate trade-offs, but it was outside the scope of this project to
investigate actual, concrete decisions that have been considered (or ignored) by the
authors of the studies that we cite. This limitation actually is inherent to the
academic publication process, which focuses on reporting detailed research findings
rather than on the trade-offs that the researchers or practitioners considered in reaching
these findings. It would be even harder to gain insights into all the contextual factors
that determined their choices. Therefore, we invite researchers and practitioners to
discuss and evaluate their own project-related decisions, using the trade-offs that we
identified (e.g., in the methodology section of their articles). In addition to clarifying
the purpose and decision processes underlying research projects, this could also
result in a more critical evaluation and elaboration of the trade-offs postulated
herein. Ultimately, this could also provide more standardized insights into how
contextual settings can determine optimal decisions for nonprofit effectiveness
measures.
Second, considering the misinterpretations that can result from misspecifications
or over-investigation of seemingly distinct but overlapping concepts (see the
"Formative versus reflective measurements" and "Distinct versus overlapping effectiveness
measurements" sections), we hope further research digs into the question of where
to draw boundaries between dimensions and/or related effectiveness concepts. In
contrast with the divergent evolution in the previous literature, typified by the
introduction of new classifications, dimensions, conditions, factors, and effects, we
consider a converging phase in nonprofit effectiveness research more relevant.
When new concepts are introduced, properly clarifying and testing how they
are similar to or different from existing ones is important for true
theory advancement. In particular, when concepts are quantified for further analysis,
we strongly suggest reusing existing measurements. As Baruch and Ramalho (2006)
recommend, new studies on nonprofit effectiveness might reuse basic, generalizable
concepts across contexts and for various research questions. Depending on the
particular context, additional measurement criteria might be introduced. Such
(partial) reuse of dimensions, criteria, or scales offers two major advantages for an
overall understanding of nonprofit effectiveness. It provides a continuous verifica-
tion of existing concepts, dimensions, and measurements, both theoretically and
methodologically, and it enables a more critical evaluation of the generalizability of
prior research findings across various contexts.
Third, we used the distinction between units of interest, data collection, and
analysis mainly to discuss potential measurement biases. However, the differences
among these units also imply strong opportunities for developing new theoretical
insights into nonprofit effectiveness. Several contributions already assert that
nonprofit effectiveness is a social construction (Herman and Renz 1997; Forbes
1998; Liao-Troth and Dunn 1999), focusing mainly on the fact that people
individually develop an understanding of what nonprofit effectiveness means to
them. However, through their social interactions they develop a certain degree of
common understanding. The social constructionist perspective thus far has served
mainly to argue how people differ in their assessments of organizations. An opposite
perspective, regarding the extent to which they agree, might deserve more attention.
That is, we know little about how and why groups of people come to agree about the
meaning of nonprofit effectiveness. Yet the perceived effectiveness of an organization
has significant impacts on the behavior of individuals and stakeholder groups
toward it (Yoo and Brooks 2005; Daellenbach et al. 2006;
Sarstedt and Schloderer 2010; Mews and Boenigk 2012), so research should address
in-depth the social interaction factors and processes that shape shared versus unique
perceptions of an organization’s effectiveness. This analysis might provide
important new insights into how social processes among individuals can be
managed to steer collective behavior toward an organization.
Fourth, for conciseness, we focused only on nonprofit-related examples and
literature reviews. However, challenges to measure effectiveness-related concepts
go far beyond nonprofit realms. In addition to the supplementary suggestions that
we have made already, we recommend that continued contributions contemplate
and investigate similarities and dissimilarities with other contexts, such as public
and profit sectors (Rojas 2000; Micheli and Kennerley 2005). Such a comparison
could enhance our understanding of the extent to which these seven trade-offs are
unique to the nonprofit sector or generalizable to other areas, or whether these trade-
offs are dealt with differently, or result in other decisions for other research
domains.
Fifth, we note the challenges that remain for bridging the gap between theoretical
insights and practitioner recommendations for measuring nonprofit effectiveness.
We have postulated seven trade-offs, largely based on academic literature. Yet the
basic decisions in each trade-off are highly relevant for practitioners too. For
example, practitioners seek high-quality, reliable measures of their actions, outputs,
and effects, and they hope to obtain these measures at reasonable costs.
Accordingly, they face very similar decisions in terms of which criteria to measure,
what sources to use, how many sources to gather, how to aggregate data, how to
quantify various measurement types, and so on. Thus, research projects should focus
not only on the type of measurements used by nonprofit practitioners or their
usefulness but also on why they have been chosen and in which contexts. Such
studies could provide some verification for whether the seven trade-offs we propose
are truly relevant beyond the academic context. More important, they could reveal
new insights into the practical usefulness of various types of effectiveness
measurements, depending on the real-world context.
As nonprofit organizations are increasingly confronted with competition to attract
resources and with continuous internal and external accountability obligations
(Ashley and Faulk 2010; Faulk et al. 2012), the trade-offs described in this article
could help as guidelines in professionalizing their measurement-based management.
As pointed out by Mitchell (2013), nonprofit practitioners often rely on personal and
intuitive effectiveness assessments to make managerial decisions. In addition, the
content of these assessments unfortunately seems not to be fully captured by the
effectiveness focus taken by academic scholars. Therefore, nonprofit practitioners
and consultants could move forward in setting up their own measurement systems that
support their particular needs and provide sufficiently useful information
at a reasonable cost. In this context, organizations could continuously experiment
with different types of measurements and effectiveness reporting, in order to find
out how sufficient managerial benefits can be obtained [e.g. more efficient practices,
more (social) return on investment, more impact on stakeholders, etc.]. When
selecting these performance indicators, special attention could be devoted to finding an
optimal balance among the many advantages and disadvantages discussed in the
trade-offs. For example, practitioners could define for their own organization a set
of broad and general performance indicators that allow comparison with other
organizations. In doing so, organizations can benchmark themselves and learn from
each other when aiming to improve their practices and performance indicators. In
addition, specific indicators for the particular context of an organization and its
stakeholders can complement the core set of general indicators, aiming at better
understanding how particular actions cause improvements in serving stakeholder
interests and in achieving the organizational mission. In doing so, they can verify or
adjust their own mental models on how their actions potentially result in
performance, and they can incrementally improve their understanding of the true
impact of their decisions and actions.
References
Ashley, S., & Faulk, L. (2010). Nonprofit competition in the grants marketplace: Exploring the
relationship between nonprofit financial ratios and grant amount. Nonprofit Management and
Leadership, 21(1), 43–57.
Babiak, K. M. (2009). Criteria of effectiveness in multiple cross-sectoral interorganizational relation-
ships. Evaluation & Program Planning, 32(1), 1–12.
Balser, D., & McClusky, J. (2005). Managing stakeholder relationships and nonprofit organization
effectiveness. Nonprofit Management & Leadership, 15(3), 295–315.
Baruch, Y., & Ramalho, N. (2006). Communalities and distinctions in the measurement of organizational
performance and effectiveness across for-profit and nonprofit sectors. Nonprofit and Voluntary
Sector Quarterly, 35(1), 39–65.
Bergkvist, L., & Rossiter, J. R. (2007). The predictive validity of multiple-item versus single-item
measures of the same constructs. Journal of Marketing Research, 44(2), 175–184.
Boenigk, S., & Helmig, B. (2013). Why do donors donate? Examining the effects of organizational
identification and identity salience on the relationships among satisfaction, loyalty, and donation
behavior. Journal of Service Research, 16, 533–548.
Boenigk, S., Leipnitz, S., & Scherhag, C. (2011). Altruistic values, satisfaction and loyalty among first-
time blood donors. Nonprofit and Voluntary Sector Marketing, 16(4), 356–370.
Bollen, K., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation
perspective. Psychological Bulletin, 110(2), 305–314.
Boyer, K. K., & Verma, R. (2000). Multiple raters in survey-based operations management research:
A review and tutorial. Production and Operations Management, 9(2), 128–140.
Bradshaw, P., Murray, V., & Wolpin, J. (1992). Do nonprofit boards make a difference? An exploration of
the relationships among board structure, process and effectiveness. Nonprofit and Voluntary Sector
Quarterly, 21(3), 227–249.
Bremser, W. G., & Barsky, N. (2004). Utilizing the balanced scorecard for R&D performance
measurement. R&D Management, 34(3), 229–238.
Brewer, P. C., & Speh, T. W. (2000). Using the balanced scorecard to measure supply chain performance.
Journal of Business Logistics, 21(1), 75–93.
Brown, W. A. (2005). Exploring the association between board and organizational performance in
nonprofit organizations. Nonprofit Management & Leadership, 15(3), 317–339.
Carman, J. G., & Fredericks, K. A. (2010). Evaluation capacity and nonprofit organizations: Is the glass
half-empty or half-full? American Journal of Evaluation, 31(1), 84–104.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum.
Daellenbach, K., Davies, J., & Ashill, N. J. (2006). Understanding sponsorship and sponsorship
relationships—Multiple frames and multiple perspectives. International Journal of Nonprofit and
Voluntary Sector Marketing, 11(1), 73–87.
Dart, R. (2010). A grounded qualitative study of the meanings of effectiveness in Canadian ‘results-
focused’ environmental organizations. VOLUNTAS: International Journal of Voluntary and
Nonprofit Organizations, 21, 202–219.
DeVellis, R. F. (2003). Scale development: Theory and applications (2nd ed., Vol. 26). Applied Social
Research Methods Series. London: Sage Publications.
Diamantopoulos, A. (1999). Viewpoint–export performance measurement: Reflective versus formative
indicators. International Marketing Review, 16(6), 444–457.
Diamantopoulos, A., & Siguaw, J. A. (2006). Formative versus reflective indicators in organizational
measure development: A comparison and empirical illustration. British Journal of Management,
17(4), 263–282.
Diamantopoulos, A., & Winklhofer, H. M. (2001). Index construction with formative indicators:
An alternative to scale development. Journal of Marketing Research, 38(2), 269–277.
DiMaggio, P. (2001). Measuring the impact of the nonprofit sector on society is probably impossible but
possibly useful: A sociological perspective. In P. Flynn & V. Hodgkinson (Eds.), Measuring the
impact of the nonprofit sector (pp. 249–272). New York: Kluwer Academic/Plenum Publishers.
Drolet, A. L., & Morrison, D. G. (2001). Do we really need multiple-item measures in service research?
Journal of Service Research, 3(3), 196–204.
Eckerd, A., & Moulton, S. (2011). Heterogeneous roles and heterogeneous practices: Understanding the
adoption and uses of nonprofit performance evaluations. American Journal of Evaluation, 32(1),
98–117.
Epstein, M. J., & Wisner, P. S. (2001). Using a balanced scorecard to implement sustainability.
Environmental Quality Management, 11(2), 1–10.
Farrell, A. M. (2010). Insufficient discriminant validity: A comment on Bove, Pervan, Beatty and Shiu
(2009). Journal of Business Research, 63(5), 324–327.
Farrell, A. M., & Rudd, J. M. (2009). Factor analysis and discriminant validity: A brief review of some
practical issues. Australia-New Zealand Marketing Academy Conference (ANZMAC), December,
Melbourne, Australia.
Faulk, L., Lecy, J. D., & McGinnis, J. (2012). Nonprofit competitive advantage in grant markets:
Implications of network embeddedness. Andrew Young School of Policy Studies Research Paper
Series No. 13-07.
Forbes, D. P. (1998). Measuring the unmeasurable: Empirical studies of nonprofit organizations
effectiveness from 1977 to 1997. Nonprofit and Voluntary Sector Quarterly, 27(2), 183–202.
Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables
and measurement error. Journal of Marketing Research, 18(1), 39–50.
Frumkin, P. (2002). On being nonprofit: A conceptual policy primer. Cambridge, MA: Harvard
University Press.
Gill, M., Flynn, R. J., & Raising, E. (2005). The governance self-assessment checklist: An instrument for
assessing board effectiveness. Nonprofit Management & Leadership, 15(3), 271–294.
Gordon, T. P., Khumawala, S. B., Kraut, M., & Neely, D. G. (2010). Five dimensions of effectiveness for
nonprofit annual reports. Nonprofit Management & Leadership, 21(2), 209–228.
Green, J. C., & Griesinger, D. W. (1996). Board performance and organizational effectiveness in
nonprofit social services organizations. Nonprofit Management & Leadership, 6(4), 381–402.
Groves, R. M., & Heeringa, S. G. (2006). Responsive design for household surveys: Tools for actively
controlling survey errors and costs. Journal of the Royal Statistical Society Series A, 169(3),
439–457.
Hansmann, H. (1987). Economic theories of nonprofit organizations. In W. W. Powell (Ed.), The
nonprofit sector: A research handbook (pp. 27–42). New Haven, CT: Yale University Press.
Harrison, Y., Murray, V., & Cornforth, C. (2013). Perceptions of board chair leadership effectiveness in
nonprofit and voluntary sector organizations. VOLUNTAS: International Journal of Voluntary and
Nonprofit Organizations, 24(3), 688–712. doi:10.1007/s11266-012-9274-0.
Helmig, B., Ingerfurth, S., & Pinz, A. (2013). Success and failure of nonprofit organizations: Theoretical
foundations, empirical evidence, and future research. VOLUNTAS: International Journal of Voluntary
and Nonprofit Organizations. doi:10.1007/s11266-013-9402-5.
Herman, R. D., & Renz, D. O. (1997). Multiple constituencies and the social construction of nonprofit
organization effectiveness. Nonprofit and Voluntary Sector Quarterly, 26(2), 185–206.
Herman, R. D., & Renz, D. O. (1999). Theses on nonprofit organizational effectiveness. Nonprofit and
Voluntary Sector Quarterly, 28(2), 107–126.
Herman, R. D., & Renz, D. O. (2000). Board practices of especially effective and less effective local
nonprofit organizations. American Review of Public Administration, 30(2), 146–160.
Herman, R. D., & Renz, D. O. (2008). Advancing nonprofit organizational effectiveness research and
theory: Nine theses. Nonprofit Management & Leadership, 18(4), 399–415.
Herman, R. D., Renz, D. O., & Heimovics, R. D. (1997). Board practices and board effectiveness in local
nonprofit organizations. Nonprofit Management & Leadership, 7(4), 373–385.
Hitt, M. A., Beamish, P. W., Jackson, S. E., & Mathieu, J. E. (2007). Building theoretical and empirical
bridges across levels: Multilevel research in management. Academy of Management Journal, 50(6),
1385–1399.
Jackson, D. K., & Holland, T. P. (1998). Measuring effectiveness of nonprofit boards. Nonprofit and
Voluntary Sector Quarterly, 27(2), 159–182.
Jarvis, C. B., MacKenzie, S. B., & Podsakoff, P. M. (2003). A critical review of construct indicators and
measurement model misspecification in marketing and consumer research. Journal of Consumer
Research, 30(2), 199–218.
Jöreskog, K. G., & Goldberger, A. S. (1975). Estimation of a model with multiple indicators and multiple
causes of a single latent variable. Journal of the American Statistical Association, 70(351), 631–639.
Jun, K.-N., & Shiau, E. (2012). How are we doing? A multiple constituency approach to civic association
effectiveness. Nonprofit and Voluntary Sector Quarterly, 41(4), 632–655.
Kaplan, R. S. (2001). Strategic performance measurement and management in nonprofit organizations.
Nonprofit Management & Leadership, 11(3), 353–370.
Krashinsky, M. (1997). Stakeholder theories of the non-profit sector: One cut at the economic literature.
VOLUNTAS: International Journal of Voluntary and Nonprofit Organizations, 8(2), 149–161.
Kushner, R. J., & Poole, P. P. (1996). Exploring structure–effectiveness relationships in nonprofit arts
organizations. Nonprofit Management & Leadership, 7(2), 119–136.
Lecy, J. D., Schmitz, H. P., & Swedlund, H. (2012). Non-governmental and not-for-profit organizational
effectiveness: A modern synthesis. VOLUNTAS: International Journal of Voluntary and Nonprofit
Organizations, 23(2), 434–457.
LeRoux, K., & Wright, N. S. (2010). Does performance measurement improve strategic decision making?
Findings from a national survey of nonprofit social service agencies. Nonprofit and Voluntary Sector
Quarterly, 39(4), 571–587.
Liao, M., Foreman, S., & Sargeant, A. (2001). Market versus societal orientation in the nonprofit context.
International Journal of Nonprofit and Voluntary Sector Marketing, 6(3), 254–268.
Liao-Troth, M., & Dunn, C. P. (1999). Social constructs and human service: Managerial sensemaking of
volunteer motivation. Voluntas, 10(4), 345–361.
Lynn, L. E., Jr., Heinrich, C. J., & Hill, C. J. (2000). Studying governance and public management:
Challenges and prospects. Journal of Public Administration Research and Theory, 10(2), 233–261.
Maas, C. J. M., & Hox, J. (2005). Sufficient sample sizes for multilevel modeling. Methodology, 1(3),
86–92.
Mews, M., & Boenigk, S. (2012). Does organizational reputation influence the willingness to donate
blood? International Review on Public and Nonprofit Marketing. doi:10.1007/s12208-012-0090-4.
Micheli, P., & Kennerley, M. (2005). Performance measurement frameworks in public and non-profit
sectors. Production Planning & Control, 16(2), 125–134.
Mistry, S. (2007). How does one voluntary organization engage with multiple stakeholder views of
effectiveness? Voluntary Sector Working Paper: Number 7, London School of Economics and
Political Science.
Mitchell, G. E. (2013). The construct of organizational effectiveness: Perspectives from leaders of
international nonprofits in the United States. Nonprofit and Voluntary Sector Quarterly, 42(2),
324–345.
Moxham, C. (2009). Performance measurement: Examining the applicability of the existing body of
knowledge to nonprofit organisations. International Journal of Operations & Production
Management, 29(7), 740–763.
Osborne, S. P., & Tricker, M. (1995). Researching non-profit organisational effectiveness: A comment on
Herman and Heimovics. VOLUNTAS: International Journal of Voluntary and Nonprofit Organi-
zations, 6(1), 85–92.
Packard, T. (2010). Staff perceptions of variables affecting performance in human service organizations.
Nonprofit and Voluntary Sector Quarterly, 39(6), 971–990.
Padanyi, P., & Gainer, B. (2003). Peer reputation in the nonprofit sector: Its role in nonprofit sector
management. Corporate Reputation Review, 6(3), 252–265.
Pandey, S. K., Coursey, D. H., & Moynihan, D. P. (2007). Organizational effectiveness and bureaucratic
red tape: A multimethod study. Public Performance & Management Review, 30(3), 398–425.
Perrow, C. (1961). The analysis of goals in complex organizations. American Sociological Review, 26(6),
854–866.
Petter, S., Straub, D., & Rai, A. (2007). Specifying formative constructs in information systems research.
Management Information Systems Quarterly, 31(4), 623–656.
Plantz, M. C., Greenway, M. T., & Hendricks, M. (1997). Outcome measurement: Showing results in the
nonprofit sector. New Directions for Evaluation, 75, 15–30.
Radbourne, J. (2003). Performing on board: The link between governance and corporate reputation in
nonprofit arts boards. Corporate Reputation Review, 6(3), 212–222.
Richard, P. J., Devinney, T. M., Yip, G. S., & Johnson, G. (2009). Measuring organizational performance:
Towards methodological best practice. Journal of Management, 35(3), 718–804.
Rojas, R. R. (2000). A review of models for measuring organizational effectiveness among for-profit and
nonprofit organizations. Nonprofit Management & Leadership, 11(1), 97–104.
Sarstedt, M., & Schloderer, M. P. (2010). Developing a measurement approach for reputation of non-
profit organizations. International Journal of Nonprofit and Voluntary Sector Marketing, 15(3),
276–299.
Sawhill, J. C., & Williamson, D. (2001). Mission impossible? Measuring success in nonprofit
organizations. Nonprofit Management & Leadership, 11(3), 371–386.
Shilbury, D., & Moore, K. A. (2006). A study of organizational effectiveness for national Olympic
sporting organizations. Nonprofit and Voluntary Sector Quarterly, 35(1), 5–38.
Shiu, E., Pervan, S. J., Bove, L. L., & Beatty, S. E. (2011). Reflections on discriminant validity:
Reexamining the Bove et al. (2009) findings. Journal of Business Research, 64(3), 497–500.
Siciliano, J. I. (1997). The relationship between formal planning and performance in nonprofit
organizations. Nonprofit Management & Leadership, 7(4), 387–403.
Smith, D. H., & Shen, C. (1996). Factors characterizing the most effective nonprofits managed by
volunteers. Nonprofit Management & Leadership, 6(3), 271–289.
Sowa, J. E., Selden, S. C., & Sandfort, J. R. (2004). No longer unmeasurable? A multidimensional
integrated model of nonprofit organizational effectiveness. Nonprofit and Voluntary Sector
Quarterly, 33(4), 711–728.
Stazyk, E. C., & Goerdely, H. T. (2010). The benefits of bureaucracy: Public managers’ perceptions of
political support, goal ambiguity, and organizational effectiveness. Journal of Public Administration
Research and Theory, 21(4), 645–672.
Tacq, J. J. A. (1984). Causaliteit in sociologisch onderzoek: Een beoordeling van causale analysetechnieken
in het licht van wijsgerige opvattingen over causaliteit [Causality in sociological research: An assessment
of causal analysis techniques in the light of philosophical views on causality]. Deventer: Van Loghum Slaterus.
Van Puyvelde, S., Caers, R., Du Bois, C., & Jegers, M. (2012). The governance of nonprofit
organizations: Integrating agency theory with stakeholder and stewardship theories. Nonprofit and
Voluntary Sector Quarterly, 41(3), 431–451.
Willems, J., Huybrechts, G., Jegers, M., Weijters, B., Vantilborgh, T., Bidee, J., et al. (2012a). Nonprofit
governance quality: Concept and measurement. Journal of Social Service Research, 38(4), 561–578.
Willems, J., Van den Bergh, J., & Deschoolmeester, D. (2012b). Analyzing employee agreement on
maturity assessment tools for organizations. Knowledge and Process Management, 19(3), 142–147.
Wilson, B. C., Ringle, C. M., & Henseler, J. (2007). Exploring causal path directionality for a
marketing model using Cohen's path method. In H. Martens, T. Naes, & M. Martens (Eds.),
Causalities explored by indirect observation: Proceedings of the 5th International Symposium on
PLS and Related Methods (PLS'07) (pp. 57–61). MATFORSK.
Yoo, J., & Brooks, D. (2005). The role of organizational variables in predicting service effectiveness:
An analysis of a multilevel model. Research on Social Work Practice, 15(4), 267–277.
... As subjectivity and social constructionism are inherent to the evaluation of CSO performance, reputation and perceptions are in many cases substantially more important than true performance (Herman and Renz, 1999;Willems et al., 2014). Acknowledging this has substantial impact on the transparency and accountability debate for CSOs, as this inherent subjectivity shows that accountability is not only about being transparent about certain elements of the organization (what) and to which stakeholder (to whom), but also about the way reporting is conducted (how). ...
... Need for clear transparency and hard accountability CSO goals and their achievements are often multidimensional, subjective and difficult to quantify (Lecy et al., 2012;Sowa et al., 2004;Willems et al., 2014). As a result, there is a broad range of choices regarding how to report resources, processes, outputs and outcomes of CSOs (Cabedo et al., 2018). ...
Chapter
Despite the vast repertoire of practitioner and scientific literature since the early 1990s on how civil society organizations (CSOs) should be governed, we continue to regularly hear stories of severe organizational crises. Even well-respected, internationally active CSOs sometimes find themselves in the middle of a media storm (Archambeault and Webber, 2018; Cordery and Baskerville, 2011; Harris et al., 2018; Willems, 2016; Willems and Faulk 2019). It is naïve to assume that such events will cease in the future or at least stop endangering the sustainability and continuity of CSOs. Nevertheless, an explicit evaluation of how crisis situations can be avoided and how their devastatingly negative effects can be mitigated through CSO governance processes makes it necessary to focus on CSO accountability and transparency. As a result, the clarification and elaboration of the concepts of accountability and transparency can strengthen theoretical and practical insights as to how CSOs can become more crisis-resistant and resilient (Brown, 2005; Helmig et al., 2014). In addition, insight into the inherent trade-offs that CSO leadership teams need to consider in their governance decisions can help both practitioners and researchers to (1) avoid more CSO crisis situations in the future, (2) more effectively overcome such crises when they occur and (3) identify the contextual and organizational factors affecting leaders' governance decisions. Against this background, the aim of this chapter is threefold: 1. Provide an elaborated definition of CSO transparency and accountability that takes into account the nature and role of CSOs in contemporary societies. After highlighting the uniquely defining characteristics of CSOs, the chapter identifies from the inter-disciplinary literature a set of circumstances that underpin the need for a multidimensional elaboration of transparency and accountability specific to CSOs. 2. 
Document governance responsibilities that CSOs have with respect to transparency and accountability. The chapter explains why transparency and accountability are necessary elements of the CSO governance function. 3. Develop propositions for further scientific elaboration and validation of how CSO governance practices encompass but also support and lead to CSO transparency and accountability. The output of the first two research aims is juxtaposed with five dimensions of a governance quality index, highlighting how governance quality dimensions include and relate to various aspects of CSO transparency and accountability.
... How to measure the performance of governments (Lee and Whitford 2009;Brewer et al. 2007), international organisations (Tallberg et al. 2016;Gutner and Thompson 2010), or pressure groups (Grant 2005;Willems et al. 2014), has been discussed extensively. ...
Article
Full-text available
How the EU interacts with its Eastern neighbours has been researched extensively. How it has performed in this region however has been systemically researched on far fewer occasions. This gap is even more glaring when straying away from policy areas such as trade, rule of law, or democratisation. More than 10 years after the Eastern Partnership’s inauguration, this paper therefore investigates the performance of the EU’s Common Agricultural Policy, with a specific focus on its endogenous rural development programme LEADER in Georgia. It finds a mixed picture for the performance attributes relevance, effectiveness, and impact, with differences in performance found between policy instruments and between actors on the local and state level. Therefore, it suggests analysing performance not only across its constituent attributes, but also with a view to specific policy instruments and actors beyond the central government.
... personeelstevredenheid) of het activiteitenniveau (bv. kwaliteitsvolle dienstverlening)(Sowa, Selden & Sandfort 2004;Willems, Boenigk & Jegers 2014). In deze studie kiezen we voor een perceptuele maatstaf, nl. ...
Article
Full-text available
Veel non-profitorganisaties in Vlaanderen worden gesubsidieerd door de overheid. Om te controleren of publiek geld 'goed' besteed wordt, vraagt de overheid aan deze organisaties om op geregelde basis te rapporteren over hun prestaties. Waar critici deze dynamiek reduceren tot een bureaucratische oefening die tijd wegneemt van de kernactiviteiten, stellen management-onderzoekers dat publieke verantwoording organisatorisch leren in de hand kan werken. Immers, externe verantwoording stimuleert het gebruik van managementtools, nodig om prestaties te meten. Dit geeft inzicht in de eigen prestaties en-belangrijker-mogelijke verbeterpunten. Op basis van surveydata verzameld bij Vlaamse non-profitorganisaties (N = 496) wordt in deze bijdrage getest in welke mate en hoe publieke verantwoording kan bijdragen aan 'betere' non-profitprestaties. De resultaten leren dat (a) publieke verantwoording samenhangt met een positievere inschatting van non-profitprestaties door de respondenten, en (b) deze dynamiek primair verloopt via het gebruik van managementtools om pres-taties te meten.
... Because of the potential for undue influence by groups with greater financial and social capital as discussed above, this focus on individual projects often results in an inequitable distribution of benefits from the agency-foundation relationship. The need for careful long-term planning and evaluation is compounded by the complex, multidimensional nature of nonprofit effectiveness; not only do NPOs need to seek financial success, they must also balance their mission, the stakeholder groups they serve, and their position relative to other organizations (Richard et al., 2009;Willems et al., 2014). ...
Article
Full-text available
In the face of persistent funding shortfalls, local park and recreation agencies oftentimes engage in neoliberal conservation strategies, including partnerships with nonprofit park foundations. Although such partnerships have been extensively studied in other contexts, there exists a lack of research examining these partnerships and their potential impact on equitable outcomes. The purpose of this study was to extend the existing body of research surrounding this neoliberal conservation strategy and its relationship to equity in the context of local park and recreation services. We analyzed in-depth interviews with leaders from agencies and foundations using an inductive qualitative method. Our results describe several roles for inequity in the agency-foundation relationship: as a call to action, as an outcome, as a process, and as an unresolved challenge. Based on these results, we discuss potential implications and provide a series of recommendations for advancing social justice in the delivery of park and recreation services.
... As Cronbach's Alpha is based on the number of items and as measuring role models by only five single variables is quite complex the low reliability of both indices is not a major impediment to its use. However, neither the RRM nor ARM roles are the expression of a "shared, common perception of the concept" (Willems et al., 2014(Willems et al., , p. 1656) as in reflexive constructs. SCS might well diverge in their understanding of the duties a SCS conventionally has to cover when fulfilling either a supportive, reactive role, or an active role. ...
Article
Full-text available
The influence of senior civil servants’ (SCS) tasks on their role perceptions has been widely ignored in the past research on the administrative élite. This paper presents new survey data on SCS in German federal ministries to test this relation by categorizing SCS into three task-related groups: strategists, policy specialists and administrators. Regression analyses reveal that SCS’s tasks do not influence their (strong) identification with reactive (supportive) roles but have a significant impact on their identification with active, more politically entrepreneurial roles. This entails two important findings: First, SCS’s tasks matter for their appreciation of different roles. Second, active and reactive role models are not irreconcilable (as it is often argued in the literature on bureaucratic politicization), but complementary.
... Literature provides models to measure the performance of NPOs (Cestari et al., 2018;Willems et al., 2014) using indicators (Kim, 2017;Righi & Andreoni, 2014). However, non-profit transparency measurement systems in particular have received limited attention. ...
Article
Full-text available
We are currently witnessing the development of a set of organizations that have been entrusted with meeting the very diverse needs of citizens. As a result, they receive funds, in order to ensure they are managed appropriately. The transparency of the information revealed by Non-profit Organizations (NPOs) has become of increasing interest to public authorities and research. However, very few studies empirically measure the extent of transparency in NPOs. Only a handful checked the compliance of various indicators, lacking agreement on which ones to include and their weighting. To address this issue, this study empirically validates the weighting of the indicators from the alliance between the Platform for Social Action NGO and the Spanish Coordinator for Development NGO (CONGDE) document with experts in NPOs’ opinions. We use the Best-Worst Method (BWM) to optimally assign weights to multi-criteria decision making situations. Our results show interesting differences in the level of importance given to the indicators by public authorities and experts, suggesting the need for a revision of the importance proposed.
Article
The past two decades have witnessed massive growth in the amount of quantitative research in nonprofit studies. Despite the large number of studies, findings from these studies have not always been consistent and cumulative. The diverse and competing findings constitute a barrier to offering clear, coherent knowledge for both research and practice. To further advance nonprofit studies, some have called for meta-analysis to synthesize inconsistent findings. Although meta-analysis has been increasingly used in nonprofit studies in the past decade, many researchers are still not familiar with the method. This article thus introduces meta-analysis to nonprofit scholars and, through an example demonstration, provides general guidelines for nonprofit scholars with background in statistical methods to conduct meta-analyses, with a focus on various judgement calls throughout the research process. This article could help nonprofit scholars who are interested in using meta-analysis to address some unsolved research questions in the nonprofit literature.
Article
This study examines the effect of transformational leadership style on the use of management control systems, specifically Simons’ (1995) levers of control, in human service not-for-profit organizations. Unlike prior studies in which transformational leadership style has been treated as an aggregate construct, we decompose the construct and examine its effect at the dimensional level. Using matched survey and archival data for 271 organizations in Australia, we find two dimensions of transformational leadership, articulating a vision and facilitating acceptance of group goals, to be dominant in their influence on the use of the levers of control. This finding is consistent with the context of human service not-for-profit organizations that are heavily dependent on communication, cooperation, and collaboration across multidisciplinary teams for effective service delivery.
Article
Full-text available
Cognitive computing is ushering in the fourth industrial revolution through its promises of improved accuracy, scalability and personalisation. Therefore, business-to-business (B2B) organisations are wavering in the decision for adoption into their digital marketing initiatives. However, embracing moral rules and/or moral judgments in their digital marketing innovation can be challenging, since making mistakes could damage reputations. Therefore, this study applies the ethical principles of cognitive computing in B2B digital marketing business-centric ethical challenges. An integrated theoretical framework grounded on multidisciplinary studies is proposed. The primary data were collected from 300 respondents within B2B businesses. The results of this research led to the conclusion that good ethical practices are essential for the improvement of both organisational effectiveness and organisational reputation. Increased organisational reputation delivers a competitive edge in fast-growing marketplaces. B2B businesses need to look for proactive ways to achieve continuous improvement.
Chapter
The diversity of social impact measurement approaches creates confusion for organisations on the purpose and rationale for measuring social impact. This chapter draws on insights from the field of impact investment to unpack how impact investors explain their rationale for engaging in impact measurement practices. Using discourse analysis, the chapter explores the competing discourses of impact measurement forwarded by impact investment funds in the United Kingdom. The analysis examines the ways in which impact investors justify and explain their use of impact measurement practices on their websites and in their annual report and impact reports. It highlights the need for organisations to appreciate the multi-disciplinary nature of impact measurement as they strive to address the Sustainability Development Goals [Relevant SDGs: SDG17: Partnerships for the Goals].
Article
Full-text available
Increased citizen participation in policy processes through voluntary civic associations warrants an analysis of their effectiveness, which this article undertakes using a multiple constituency framework. We find a gap in the literature on nonprofit effectiveness where theoretical and empirical studies have mainly focused on organizations that directly provide tangible goods and services. We propose a multiple constituency approach to evaluate and understand the implications for assessing the organizational effectiveness of community-based advisory civic associations. We empirically analyze the evaluation of Los Angeles neighborhood councils by three different constituency groups-citizen participants, street-level bureaucrats, and city council staffs. We find that the effectiveness ratings of the constituency groups are dissimilar on different dimensions of effectiveness. These findings suggest that the multiple constituency framework holds theoretical and practical value for understanding the organizational effectiveness of voluntary associations, where the different goals of various stakeholders lead to different views on effectiveness.
Article
The statistical tests used in the analysis of structural equation models with unobservable variables and measurement error are examined. A drawback of the commonly applied chi square test, in addition to the known problems related to sample size and power, is that it may indicate an increasing correspondence between the hypothesized model and the observed data as both the measurement properties and the relationship between constructs decline. Further, and contrary to common assertion, the risk of making a Type II error can be substantial even when the sample size is large. Moreover, the present testing methods are unable to assess a model's explanatory power. To overcome these problems, the authors develop and apply a testing system based on measures of shared variance within the structural model, measurement model, and overall model.
Article
With an empirical study in two nonprofit industries (a money-collecting and blood-collecting organization), the authors investigate how organizational identification and identity salience together function in relation to satisfaction, loyalty, and behavior. They develop and test a model that best represents relationships featuring donor-nonprofit identification and donor identity salience in existing satisfaction-loyalty studies. Overall, the study empirically confirms that donor-nonprofit identification and donor identity salience are distinct constructs and that both have direct positive effects on loyalty, but not that much on donations. Within the money donation context, both identification constructs have stronger total effects on donor loyalty than donor satisfaction, whereas in the blood donation context, donor satisfaction has a stronger effect on loyalty. In testing the causal direction between donor-nonprofit identification and donor satisfaction, the authors also find that the path should be conceptualized from satisfaction to identification. The study contributes to the theory of organizational identification and identity salience by highlighting the advantages of taking a combined theoretical approach. Finally, the study suggests several means to implement donor identification management, including group activities, development of online communities, donor events, and more long-term-oriented tactics, all of which treat the donor as a cocreator of value.
Article
Although we often contrast the cool scientific temperament of modernity to the superstition, ignorance, and magical thought of premodern eras, modern culture is notable less for the degree to which ritual and myth are important than for the kinds of myths and rituals that have been substituted for the religious and nationalist orthodoxies of the past (Meyer, 1988). As Weber foresaw in The Protestant Ethic, first published in 1904, the most powerful of modern icons is "rationality," the deliberative, instrumental orientation that Weber warned would confine humanity within "an iron cage" until "the last ton of fossilized coal is burnt" (Weber, 1958, p. 181). Impelled by economic competition, sponsored by state bureaucrats and policy elites, supported by an industry of consultants, techniques aimed at maximizing instrumental rationality (by which I mean the systematic attempt to understand and act on one's understanding of systems of cause and effect) have proliferated (Meyer & Gupta, 1994). Today, the United States is a society in which cost-benefit analysis, performance assessment, and the pursuit of efficiency represent a cultural system that is as much taken for granted and as tightly linked to society's most powerful institutions as was Buddhism in medieval Japan or the divine right of kings in early modern France (Meyer, Boli, & Thomas, 1987; Dobbin, 1994).
Article
R&D programs are critical for many firms to achieve and sustain competitive advantage. Yet measuring R&D performance over time can be quite complex due to inherent uncertainty. This paper responds to calls in the R&D literature to explore integrated performance measurement systems that capture both financial and nonfinancial performance. We integrate the Stage-Gate approach to R&D management with the Balanced Scorecard to present a framework showing how firms can link resource commitments to these activities and to the firm's strategic objectives. In this paper, we provide specific examples of how firms can apply this integrated performance measurement system to the R&D function.
Article
An understanding of organizational behavior requires close examination of the goals of the organization reflected in operating policies. To reach a first approximation of operative goals, a scheme is proposed which links technology and growth stages to major task areas--capital, legitimization, skills, and coordination--which predict to power structure and thence to limits and range of operative goals. The major illustration of the utility of the scheme is provided by voluntary general hospitals; other voluntary and non-voluntary service organizations are discussed, in these terms, as well as profit-making organizations.