
Thanks Coefficient Alpha, We’ll Take it From Here

Daniel McNeish

Utrecht University &

University of North Carolina, Chapel Hill

ACCEPTED FOR PUBLICATION IN PSYCHOLOGICAL METHODS

FEBRUARY 20TH, 2017

AUTHOR NOTES:

Daniel McNeish is now at the University of North Carolina, Chapel Hill; 100 E. Franklin

Street Suite 200, Chapel Hill, NC, USA, 27599. Email: dmcneish@email.unc.edu. He wrote

the first version of this paper while an assistant professor in the Department of Methodology

and Statistics, Utrecht University, the Netherlands. All subsequent revisions were completed

at UNC. The information in this paper has not been previously disseminated at a conference

or electronically.

I am indebted to Denis Dumas, Greg Hancock, Katherine Muenks, and Kathryn Wentzel for

conversations that inspired the motivation for this paper. I especially would like to thank

Gjalt-Jorn Peters for his expertise and assistance with the R code included in this paper.

Abstract

Empirical studies in psychology commonly report Cronbach's alpha as a measure of internal

consistency reliability despite the fact that many methodological studies have shown that

Cronbach's alpha is riddled with problems stemming from unrealistic assumptions. In many

circumstances, violating these assumptions yields estimates of reliability that are too small,

making measures look less reliable than they actually are. Although methodological critiques

of Cronbach's alpha are being cited with increasing frequency in empirical studies, in this

tutorial we discuss how the trend is not necessarily improving methodology used in the

literature. That is, many studies continue to use Cronbach's alpha without regard for its

assumptions or merely cite methodological papers advising against its use to rationalize

unfavorable Cronbach's alpha estimates. This tutorial first provides evidence that

recommendations against Cronbach’s alpha have not appreciably changed how empirical

studies report reliability. Then, we summarize the drawbacks of Cronbach's alpha

conceptually without relying on mathematical or simulation-based arguments so that these

arguments are accessible to a broad audience. We continue by discussing several alternative

measures that make less rigid assumptions and thereby provide justifiably higher estimates of

reliability compared to Cronbach’s alpha. We conclude with empirical examples to illustrate

advantages of alternative measures of reliability including omega total, Revelle’s omega

total, the greatest lower bound, and Coefficient H. A detailed software appendix is also

provided to help researchers implement alternative methods.

ALTERNATIVES TO CRONBACH’S ALPHA 1

Thanks Coefficient Alpha, We’ll Take it From Here

In many areas of psychology and in the behavioral sciences more broadly, variables that

are of interest (e.g., motivation, depression, cognitive abilities) are not directly observable and

are therefore measured with scales or instruments composed of a set of items. These items indirectly

measure the variable of interest by inferring that some underlying construct manifests itself

through these items. For example, an MRI study cannot directly measure the amount of

extraversion present in a person’s brain. Rather, items are created and administered to an

individual. If the individual has high extraversion, this trait manifests itself through certain

responses to the items.

Because most measurement in psychology is done through the use of indirect

measurement tools, researchers often report a measure of reliability to demonstrate that the items

composing the measure are reliable, meaning that the scores based on the items are reasonably

consistent, the responses to the scale are reproducible, and that responses are not simply

comprised of random noise. Put another way, a reliability analysis provides evidence that the

scale is consistently measuring the same thing (although, this is distinct from concluding that the

scale is measuring the intended construct – that is a question of scale validity).

In psychology studies, the most commonly used reliability index, by a wide margin, is

Cronbach’s alpha. In a review of reliability reporting practices conducted by Hogan, Benjamin,

and Brezinski (2000), about two-thirds (66%) of studies reporting a reliability measure selected

Cronbach’s alpha. Of those reporting a type of reliability that requires only a single

administration (e.g., not test-retest or interrater reliability), 87% (548 out of 633) reported

Cronbach’s alpha (or the KR-20, which is a special case of alpha where all items are binary;

Crocker & Algina, 2008). Indeed, Cronbach’s alpha can be universally found in the pages of


psychology journals in any subfield. As of October 2014, the seminal Cronbach (1951) paper

that first introduced Cronbach’s alpha was the 64th most cited English language research paper

on Google Scholar in any field and, within psychology, is only surpassed by the paper of Baron

and Kenny (1986) on mediation and moderation and the seminal paper of Bandura (1977) on

self-efficacy (van Noorden, Maher, & Nuzzo, 2014). In the last 20 years, however, many

methodological articles have appeared which question how Cronbach’s alpha is applied (Bentler,

2007; Crutzen, 2007; Crutzen & Peters, 2015; Cortina, 1993; Dunn, Baguley, & Brunsden, 2014;

Geldhof, Preacher, & Zyphur, 2014; Graham, 2006; Green & Hershberger, 2000; Green & Yang,

2009a, 2009b; Peters, 2014; Raykov, 1997a, 1997b, 1998, 2004; Raykov & Shrout, 2002; Revelle

& Zinbarg, 2009; Schmitt, 1996; Sijtsma, 2009; Teo & Fan, 2013; Yang & Green, 2011;

Zinbarg, Yovel, Revelle, & McDonald, 2006; Zinbarg, Revelle, Yovel, & Li, 2005). These

articles argue that the assumptions made by Cronbach’s alpha are commonly violated in types of

data and models with which psychological researchers work. These arguments have led to the

development of alternative reliability measures whose assumptions are more in-line with

psychological data (Hancock & Mueller, 2001; Jackson & Agunwamba, 1977; McDonald, 1970,

1999; Revelle, 1979). Software routines for calculating these measures are also available in R

packages such as MBESS (Kelley, 2007), psych (Revelle, 2008), or the scaleStructure

function in the userfriendlyscience package (Peters, 2014).

The articles to which we referred in the previous paragraphs are actually fairly well-

known, even among non-methodological researchers. For instance, based on Google Scholar

citation counts, Sijtsma (2009) has over 800 citations, Zinbarg et al. (2005) over 450, Hancock

and Mueller (2001) almost 400, Yang and Green (2011) over 125, and Dunn et al. (2014) over

100 as of October 2016. Although such seemingly high awareness of issues with Cronbach’s


alpha appears reassuring, it does not appear that there have been substantial changes in the use of

Cronbach's alpha.

To provide evidence for this claim and to show the enduring status of Cronbach’s alpha,

we reviewed articles in three flagship APA journals from educational psychology (the Journal of

Educational Psychology; JEP), social psychology (the Journal of Personality and Social

Psychology; JPSP), and clinical psychology (the Journal of Abnormal Psychology, JAP) from

January 2014 until October 2016. We located studies through Google Scholar by searching for

the string “reliability” within these journals. This resulted in 369 total studies (131 from JEP,

118 from JPSP, and 120 from JAP). We filtered out studies that reported types of reliability that

are not of interest to this paper (e.g., interrater reliability), studies where “reliability” only

appeared in the references, or where reliability was not used in a psychometric sense. This

netted 118 total studies (52 from JEP, 31 from JPSP, and 35 from JAP). Of these 118 studies,

109 (92%) solely used Cronbach’s alpha to assess reliability of the scales used in their study

while 9 (8%) reported an alternative reliability measure either by itself or in addition to

Cronbach’s alpha. Despite the large number of citations of articles calling for alternative

reliability measures, reliability reporting in these flagship APA journals (which have stringent

methodological requirements) appears unchanged from the results reported in the Hogan et al.

(2000) review. In fact, the aforementioned studies advising against Cronbach’s alpha were nearly

invisible in these APA journals. For example, none of the five aforementioned, highly-cited

papers which advocate for alternative measures were cited more than once each in the 118

reviewed papers.

This evidence suggests that researchers continue to almost exclusively rely on

Cronbach’s alpha as a measure of scale reliability. The pattern that methodological studies are


well-cited but do not appear in flagship journals may suggest that researchers are aware of the

issues with Cronbach’s alpha but are reluctant to adopt new methods because these methods are

not as widely known or accepted, that reviewers may not be familiar with the alternative

methods, that the editorial process does not require more rigorous methods so researchers do not

invest time to learn them, or that researchers are unsure how to obtain estimates of alternative

measures for their data because many are not offered in popular general software packages like

SPSS, SAS, or Stata. This also suggests that the more rigorous methodological work advising

against Cronbach’s alpha has not impacted psychologists as much as it has psychometricians or

statisticians working in psychological domains. Sijtsma (2009) aptly summarizes this by stating,

“while much of Cronbach’s paper was and still is accessible to many psychologists, the work by

Lord, Novick, and Lewis and many others since may have gone unnoticed by most

psychologists. This is truly an example of the gap that has grown between psychometrics and

psychology and that prevents new and interesting psychometric results” (p. 115).

Though it appears promising that methodological papers are highly cited, there is limited

evidence that the findings, conclusions, and recommendations are being incorporated in

empirical studies. This may be taken to suggest that these studies are either being misinterpreted

or not being read in their entirety, possibly because many appear in journals that are aimed at

methodologists and statisticians and therefore may be written at too technical a level for empirical

researchers with less quantitative training to fully benefit from the arguments being presented.

Consistent with recent recommendations from Sharpe (2013) concerning bridging innovations in

the use of statistical methods in psychology to empirical researchers, the aim of this tutorial

paper is to state as plainly and succinctly as possible why Cronbach’s alpha is often

inappropriate in empirical contexts and why researchers would benefit from abandoning


Cronbach’s alpha in favor of alternative measures. Though there are many resources for readers

capable of following mathematically-based arguments, far fewer resources exist for the large

number of psychological researchers operating below such a level of mathematical

sophistication. As such, the scope of this paper is intended to be very broad to elucidate the

general idea that widespread adoption and continued use of Cronbach’s alpha is detrimental. We

heavily cite previous work in this area that can provide additional technical or nuanced detail on

the issues discussed herein.

To outline this paper, we first discuss the basics behind Cronbach’s alpha including the

restrictive assumptions that often obviate its use. We then overview some of the more

conceptually clear, leading alternatives that can be employed to yield better estimates of

reliability than Cronbach’s alpha. This is followed by a brief comparison of scenarios in which

these alternatives have specific advantages and disadvantages. Rather than lay out mathematical

or logical arguments for why Cronbach’s alpha should not be used as has been the primary

method of previous papers on the topic, we demonstrate some of the issues with Cronbach’s

alpha using example analyses from publicly available datasets. We end with a discussion of why

prolonged use of Cronbach’s alpha is detrimental and how alternative measures are better suited

to accomplish the same goal, often to researchers’ benefit. We provide a heavily annotated

software appendix to help readers employ these methods in their own research so that they can

abandon Cronbach’s alpha in favor of better alternatives.

Basics of Reliability and Cronbach’s Alpha

From a theoretical standpoint, some observed score X for a trait or construct is considered

to have two latent components: the true component T and an error component E, such that

X = T + E. From a classical test theory perspective (Novick & Lewis, 1967), reliability is

considered to be greater when the variance of the true score component accounts for a higher

proportion of variance in the observed scores relative to the variance attributable to the error

component. More formally, reliability is defined by the ratio of the true score variance to the

observed score variance, ρ_XX′ = Var(T) / Var(X). Under this more formal definition, reliability

can also be interpreted as the correlation between scores on two consecutive administrations,

assuming the respondent does not recall their answers from the first administration (hence the

choice of ρ_XX′ as the symbol for reliability).
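This definition can be sketched numerically. The following minimal simulation (not from the original paper; the true-score and error variances are arbitrary, hypothetical values) generates two parallel administrations of the same true scores and shows that their correlation approximates the ratio Var(T)/Var(X):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
var_t, var_e = 4.0, 1.0                     # hypothetical true-score and error variances

t = rng.normal(0, np.sqrt(var_t), n)        # true scores T
x1 = t + rng.normal(0, np.sqrt(var_e), n)   # first administration:  X = T + E
x2 = t + rng.normal(0, np.sqrt(var_e), n)   # second administration with fresh errors

rho = var_t / (var_t + var_e)               # definitional reliability: Var(T)/Var(X)
r12 = np.corrcoef(x1, x2)[0, 1]             # correlation between the two administrations

print(round(rho, 3), round(r12, 3))         # the two values agree closely
```

With these variance values the definitional reliability is 4/(4+1) = 0.8, and the simulated test-retest correlation lands very near it.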

Although the definition of reliability is relatively straightforward, obtaining an estimate

of reliability is not always so easy. Historically, many methods for assessing reliability (parallel

forms, test-retest, test-retest with parallel forms; Crocker & Algina, 2008) required multiple test

administrations which were then correlated to form an estimate of reliability. Due to logistical

issues of multiple administrations, the ability to calculate reliability from a single test

administration was highly desirable. Cronbach (1951) addressed this in his seminal paper on

internal consistency reliability, the type of reliability on which this paper focuses. Rather than

inspecting the correlation between separate administrations, internal consistency reliability

inspects the relation of each item to all other items from a single administration. If respondents

provide similar answers to a set of items, then their responses would reasonably generalize to

other items from a similar domain, and the set of items would be considered to have high internal

consistency reliability (Crocker & Algina, 2008).

Cronbach’s Alpha


Cronbach’s alpha (Cronbach, 1951) is by far the most common measure of internal

consistency reliability.¹ Cronbach’s alpha is calculated by

α = [k / (k − 1)] × (1 − Σᵢ sᵢ² / s_X²) (1)

where k is the number of items, sᵢ² is the variance of individual item i where i = 1, …, k, and s_X²

is the variance for all items on the scale. This formula is often reported in reduced form as

α = k² s̄ᵢⱼ / s_X², where s̄ᵢⱼ is the mean covariance between all pairs of items on the scale (Geldhof

et al., 2014). One can interpret the value of Cronbach’s alpha in one of many different ways:

1. Cronbach’s alpha is the correlation of the scale of interest with another scale of the same

length that intends to measure the same construct, with different items, taken from the

same hypothetical pool of items (Kline, 1986).

2. The square root of Cronbach’s alpha is an estimate of the correlation between observed

scores and true scores (Nunnally & Bernstein, 1994).

3. Cronbach’s alpha is the proportion of the variance of the scale that can be attributed to a

common source (DeVellis, 1991).

4. Cronbach’s alpha is the average of all possible split-half reliabilities from the set of items

(Pedhazur & Schmelkin, 1991).
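To make Equation 1 concrete, the following sketch (the response matrix is invented purely for demonstration) computes Cronbach’s alpha from Equation 1 and from the reduced form based on the mean inter-item covariance; the two routes give identical values:

```python
import numpy as np

# Hypothetical responses: 6 respondents x 4 items (any persons-by-items data works)
x = np.array([[3, 4, 3, 4],
              [2, 2, 3, 2],
              [4, 5, 4, 4],
              [1, 2, 2, 1],
              [3, 3, 4, 3],
              [5, 4, 5, 5]], dtype=float)

k = x.shape[1]
item_var = x.var(axis=0, ddof=1)            # s_i^2 for each item
total_var = x.sum(axis=1).var(ddof=1)       # s_X^2, variance of the scale total

alpha = k / (k - 1) * (1 - item_var.sum() / total_var)   # Equation 1

# Reduced form: alpha = k^2 * (mean inter-item covariance) / s_X^2
cov = np.cov(x, rowvar=False)
mean_cov = (cov.sum() - np.trace(cov)) / (k * (k - 1))   # mean off-diagonal covariance
alpha_reduced = k**2 * mean_cov / total_var

print(round(alpha, 3), round(alpha_reduced, 3))          # identical values
```

The agreement is exact because the variance of a sum equals the sum of all elements of the covariance matrix, so the two formulas are algebraically the same quantity.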

Under certain assumptions, Cronbach’s alpha is a consistent estimate of the population

internal consistency; however, these assumptions are quite rigid and are precisely why

methodologists have argued against the use of Cronbach’s alpha (Gignac et al., 2007; Graham,

2006; Novick & Lewis, 1967; Revelle & Zinbarg, 2009; Yang & Green, 2011).

¹ Readers should note that there are several criticisms of Cronbach’s alpha regarding the degree to which it truly
measures internal consistency (e.g., Revelle & Zinbarg, 2009; Sijtsma, 2009). These arguments can become rather
abstract and theoretical so, given the intent of this paper, we will not delve into the specifics and will use
“internal consistency” as a simplification of what Cronbach’s alpha intends to measure. Do note, however, that
Cronbach’s alpha being a true measure of internal consistency has been called into question on multiple occasions.

The assumptions of Cronbach’s alpha are:

1. The scale adheres to tau equivalence

2. Scale items are on a continuous scale and normally distributed

3. The errors of the items do not covary

4. The scale is unidimensional

These assumptions have been stated in other locations (e.g., Green & Yang, 2009a; Yang &

Green, 2011) and demonstrated mathematically (e.g., Bentler, 2009; Sijtsma, 2009) but their

importance (and rigidity) may not necessarily be understood or appreciated in empirical work.

The following subsections will expound these assumptions.

Assumption 1: Tau equivalence. Tau equivalence is the statistically precise way to state

that each item on a scale contributes equally to the total scale score. To put this assumption

into perspective, imagine that an exploratory factor analysis is run on the scale and a single

factor is extracted (as a researcher would desire). For the tau equivalence assumption to be

upheld, the standardized factor loadings for each item would need to be nearly identical to all

other items on the scale. Figure 1 below shows what hypothetical SPSS output would look like

for a five item scale that does meet tau-equivalence (left panel) and a scale that does not meet tau

equivalence (right panel).

Item

Std.

Loading

Item

Std.

Loading

Q1

0.711

Q1

0.806

Q2

0.714

Q2

0.790

Q3

0.716

Q3

0.725

Q4

0.709

Q4

0.578

Q5

0.721

Q5

0.523

ALTERNATIVES TO CRONBACH’S ALPHA 9

Figure 1. Hypothetical SPSS exploratory factor analysis output for standardized factor loadings

of a 5 item scale that meets tau equivalence (left) and that does not meet tau equivalence (right)

Tau-equivalence tends to be unlikely for most scales that are used in empirical research –

some items strongly relate to the construct while some are more weakly related. Furthermore, if a

scale captures only a single construct, it is unlikely that all the items devised by researchers

capture the construct to an equal degree (Cortina, 1993; Yang & Green, 2011). Put more

technically, most scales are congeneric (Geldhof et al., 2014; Graham, 2006; Peterson & Kim,

2013) which means that the items measure the same construct, but they do so with different

degrees of precision (Raykov, 1997a). Such disparities between the quality of the individual

items do not mean that the weaker items necessarily need to be removed, but it does violate the

assumptions made by Cronbach’s alpha with the result being that Cronbach’s alpha will be too

low (Miller, 1995).

In the likely event that the assumption of tau equivalence is violated, Cronbach’s alpha

becomes a lower-bound estimate of internal consistency rather than a true estimate, provided that

errors are reasonably uncorrelated (Graham, 2006; Sijtsma, 2009; Yang & Green, 2011). This

results in Cronbach’s alpha estimates that can vastly underestimate the actual value of reliability

– even if just a single item on the scale is responsible for the violation of tau equivalence

(Raykov, 1997b). A simulation by Green and Yang (2009) found that Cronbach’s alpha may

underestimate the true reliability by as much as 20% when tau equivalence is violated (e.g., if the

true reliability is 0.70, Cronbach’s alpha would estimate reliability in the mid-0.50s).

Furthermore, the degree of underestimation is greatest when scales have a fairly small number of

items (e.g., less than 10), which is often the case in empirical psychological research (Graham,

2006).
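The underestimation can be illustrated at the population level with a small sketch (the loading values below are hypothetical, not taken from any study in this paper). It compares the true reliability of the unit-weighted composite, defined earlier as Var(T)/Var(X), with the population Cronbach’s alpha implied by the same congeneric one-factor model:

```python
import numpy as np

# Hypothetical congeneric one-factor scale: unequal loadings violate tau equivalence
loadings = np.array([0.9, 0.8, 0.7, 0.5, 0.4])
k = len(loadings)

# Model-implied covariance matrix for standardized items: lambda*lambda' off-diagonal,
# unit variances on the diagonal (loading^2 + error variance = 1)
sigma = np.outer(loadings, loadings)
np.fill_diagonal(sigma, 1.0)

total_var = sigma.sum()                      # Var(X) of the unit-weighted composite
true_rel = loadings.sum()**2 / total_var     # Var(T)/Var(X) for the composite

# Population Cronbach's alpha from the same covariance matrix (Equation 1)
alpha = k / (k - 1) * (1 - np.trace(sigma) / total_var)

print(round(true_rel, 3), round(alpha, 3))   # alpha falls below the true reliability
```

With these loadings the composite’s true reliability is about 0.80 while Cronbach’s alpha is about 0.79; with more extreme loading disparities or fewer items the gap widens, consistent with the simulation findings cited above.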


Assumption 2: Continuous Items with Normal Distributions. As noted in discussions

of Equation 1, Cronbach’s alpha is largely based on the observed covariances (or correlations)

between items. In most software implementations of Cronbach’s alpha (such as in SAS and

SPSS), these item covariances are calculated using a Pearson covariance matrix (Gadermann,

Guhn, & Zumbo, 2012). A well-known assumption of Pearson covariance matrices is that all

variables are continuous in nature. Otherwise, the elements of the matrix can be substantially

biased downward (i.e., the magnitudes will be closer to 0 than they should be; Flora & Curran,

2004). However, it is particularly common for psychological scales to contain items that are

discrete (e.g., Likert or binary response scales), which violates this assumption. If discrete items

are treated as continuous, the covariance estimates will be attenuated, which ultimately results in

underestimation of Cronbach’s alpha because the relations between items will appear smaller

than they actually are.²

To accommodate items that are not on a continuous scale, the covariances between items

can instead be estimated with a polychoric covariance (or correlation) matrix rather than with a

Pearson covariance matrix. Polychoric covariance matrices assume that there is an underlying

normal distribution to discrete responses. For instance, imagine a three category Likert item

whose response choices consist of Agree, Neutral, and Disagree. A polychoric covariance matrix

first assumes that these response choices map onto a normal distribution whereby there are no

longer three distinct categories but rather a continuous range of “agreement”. Then thresholds are

estimated which can conceptually be thought of as cut-points on the continuous agreement scale

that separate the response categories. So, respondents at the 40th percentile or below on the

² Likert scales with many response options can often be treated as continuous without any adverse effects. The
definition of how many response options constitutes “many” has been debated in the methodological literature. In
latent variable models broadly, Rhemtulla, Brosseau-Liard, and Savalei (2012) recommend 5. In the specific context of
Cronbach’s alpha, Gadermann et al. (2012) recommended 7 response options.


hypothetical agreement continuum may be considered in the “Disagree” category, respondents

between the 40th and 80th percentile on the hypothetical agreement continuum would correspond

to the “Neutral” category, and respondents above the 80th percentile would correspond to the

“Agree” category (the percentile cut-points are estimated and would change for each item).

Provided that it is reasonable to assume that a normal distribution underlies the discrete options,

the polychoric covariance estimates correct the attenuation that occurs when discrete items are

treated as continuous (Carroll, 1961). Gadermann et al. (2012) demonstrate how using a

polychoric covariance matrix with Cronbach’s alpha can address underestimation of reliability

attributable to discrete items.
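The attenuation from treating discrete items as continuous can be sketched as follows (the true correlation of 0.6 is an arbitrary assumption; the 40th/80th percentile cut-points echo the three-category example above):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Two correlated continuous "agreement" variables (hypothetical true correlation 0.6)
cov = [[1.0, 0.6], [0.6, 1.0]]
z = rng.multivariate_normal([0, 0], cov, size=n)

# Discretize each into 3 Likert-style categories (Disagree / Neutral / Agree)
# using cut-points at the 40th and 80th percentiles, as in the text's example
cuts = np.quantile(z, [0.4, 0.8], axis=0)
likert = np.stack([np.searchsorted(cuts[:, j], z[:, j]) for j in range(2)], axis=1)

r_continuous = np.corrcoef(z[:, 0], z[:, 1])[0, 1]
r_discrete = np.corrcoef(likert[:, 0], likert[:, 1])[0, 1]

print(round(r_continuous, 3), round(r_discrete, 3))  # Pearson r shrinks after discretizing
```

The Pearson correlation of the three-category versions is noticeably smaller than the correlation of the underlying continuous variables, which is the downward bias that a polychoric matrix is designed to undo.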

Another related and less commonly considered assumption is that both the true scores

and the errors are normally distributed (e.g., van Zyl, Neudecker, & Nel, 2000; Zimmerman,

Zumbo, & LaLonde, 1993). Studies investigating the effect of non-normal distributions on

Cronbach’s alpha have been mixed. Zimmerman et al. (1993) generally conclude that

Cronbach’s alpha is fairly robust to deviation from normality. On the other hand, Sheng and

Sheng (2012) reported that leptokurtic distributions lead to negative bias (i.e., reliability

estimates are too low) while platykurtic distributions lead to positive bias (i.e., reliability

estimates are too high). In the simulation in Sheng and Sheng (2012), these biases dissipated as

sample size and the magnitude of the true reliability increased.

Assumption 3: Uncorrelated errors. Although frequently overlooked (Zumbo & Rupp,

2004), the assumption that errors are uncorrelated is also required when utilizing Cronbach’s

alpha. Correlated errors occur when sources other than the construct being measured cause item

responses to be related to one another. Correlated errors between items may arise for a variety of

reasons including the order of the items on the scale (Cronbach & Shavelson, 2004; Green &


Hershberger, 2000), speeded tests (Rozeboom, 1966), transient responses where feelings or

opinions may change over the course of the scale (Becker, 2000; Green 2003), or unmodeled

multidimensionality of a scale (Steinberg & Thissen, 1996). Unlike the tau equivalence

assumption, the impact of correlated errors does not necessarily bias Cronbach’s alpha estimates

in a predictable direction, meaning that violations can lead to either overestimates or

underestimates of reliability. When errors are correlated, the correlations are often positive which

will result in Cronbach’s alpha overestimating the reliability (Bentler, 2009; Green &

Hershberger, 2000; Green & Yang, 2009b). When correlated errors are not accounted for in the

calculation of reliability, Cronbach’s alpha can be overestimated by as much as 20% (Gessaroli

& Folske, 2002).

Some reasons for error covariances are innocuous while others are much more

problematic. For instance, if error covariances are necessary because of item order effects, error

covariances can be incorporated to yield appropriate estimates. On the other hand, if the error

covariances are needed due to unmodeled dimensions in the scale, this eliminates nearly all

support for using the scale (i.e., the assumption of unidimensionality is violated – this

assumption is discussed next). Unfortunately, it is difficult to determine empirically which of these

mechanisms is responsible for the covariances. It is difficult to test whether

error covariances are non-null because there are often not sufficient degrees of freedom to

include many error covariances in the model. Possible solutions to such a violation are

discussed in subsequent sections.

Assumption 4: Unidimensionality. Though Cronbach’s alpha is sometimes thought to

be a measure of unidimensionality because its colloquial definition is that it measures “how well

items stick together”, unidimensionality is an assumption that needs to be verified prior to


calculating Cronbach’s alpha rather than being the focus of what Cronbach’s alpha measures

(Cortina, 1993; Crutzen & Peters, 2015; Green, Lissitz, & Mulaik, 1977; Schmitt, 1996).

Although the terminology is not universally accepted (cf. Sijtsma, 2009), Schmitt (1996) makes

the distinction between unidimensionality and internal consistency. He defines internal

consistency as the interrelatedness of a set of items while unidimensionality is the degree to

which the items all measure the same underlying construct.

Green et al. (1977) note that internal consistency is necessary for unidimensionality but

that internal consistency is not sufficient for demonstrating unidimensionality. That is, items that

measure different things can still have a high degree of interrelatedness, so a large Cronbach’s

alpha value does not necessarily guarantee that the scale measures a single construct. As a result,

violations of unidimensionality do not necessarily bias estimates of Cronbach’s alpha. In the

presence of a multidimensional scale, Cronbach’s alpha may still estimate the interrelatedness of

the items accurately and the interrelatedness of multidimensional items can in fact be quite high

(Cortina, 1993; Schmitt, 1996; Sijtsma, 2009).
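This point can be sketched with a hypothetical population example (all loading and factor-correlation values below are invented for illustration): six standardized items that load on two distinct, only modestly correlated factors still produce a respectable Cronbach’s alpha.

```python
import numpy as np

# Population covariance for a 6-item scale that is NOT unidimensional:
# items 1-3 load 0.7 on factor A, items 4-6 load 0.7 on factor B,
# and the factors correlate only 0.4
lam = np.zeros((6, 2))
lam[:3, 0] = 0.7
lam[3:, 1] = 0.7
phi = np.array([[1.0, 0.4], [0.4, 1.0]])   # factor correlation matrix

sigma = lam @ phi @ lam.T                   # implied inter-item covariances
np.fill_diagonal(sigma, 1.0)                # standardized items have unit variance

k = 6
alpha = k / (k - 1) * (1 - np.trace(sigma) / sigma.sum())
print(round(alpha, 3))                      # roughly 0.73 despite two distinct factors
```

An alpha above 0.7 here could easily be read as support for a single construct, which is exactly why dimensionality needs to be checked separately before alpha (or any internal consistency index) is interpreted.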

Many papers (e.g., Crutzen & Peters, 2015; Schmitt, 1996; Green & Yang, 2009)

recommend beginning any reliability analysis with an inspection of the factor structure of the

scale, specifically examining whether a one-factor model fits well via inferential tests like the

minimum fit function chi-square statistic or via fit index values. Though vitally important to the

interpretation of scales, a review by Crutzen and Peters (2015) found that only 2.4% of health

psychology studies reported any information about the dimensionality of the scale beyond

assessments of reliability. Many leading alternatives to Cronbach’s alpha (discussed in detail in

the next section) make explicit use of the factor analytic approach to reliability, facilitating the

presentation of dimensionality and reliability side-by-side.


Alternatives to Cronbach’s Alpha

There are many methods available to assess the reliability of scales. Hattie (1985) reviews

about 30 such methods and there are undoubtedly many additional methods that have been

developed in the 30+ years since this review was published. Our intention is not to update Hattie

(1985) by providing a broad overview of all the possible alternatives to Cronbach’s alpha that are

available. Instead, we focus on three particular methods: omega coefficients, Coefficient H, and

the greatest lower bound. These three alternatives are selected because (1) they have been shown

to perform well in previous studies, (2) they make less strict assumptions than Cronbach’s

alpha, and (3) they are conceptually similar to Cronbach’s alpha, so the idea of each should be

relatively familiar if one understands Cronbach’s alpha.

Omega and Composite Reliability

Composite reliability is conceptually related to Cronbach’s alpha in that it assesses

reliability via a ratio of the variability explained by items compared to the total variance of the

entire scale (Bentler, 2007; Geldhof et al., 2014; Raykov, 1997a, 1997b, 1998). Omega

(McDonald, 1970, 1999) is a commonly recommended measure of composite reliability that is

available in multiple software programs. Omega is designed for congeneric scales, where the

items vary in how strongly they are related to the construct being measured (i.e., in a factor

analysis setting, the loadings would not be assumed to be equal); in other words, tau

equivalence is not assumed. Composite reliability is appropriate when the items from a scale are

unit-weighted to form the total scale score but the scale itself is congeneric (Bentler, 2007;

Geldhof et al., 2014). A unit-weighted scale means that the total score of the scale is calculated

by adding up the raw scores (or reverse coded raw scores, if appropriate) of the individual items:

each item is weighted equally.


There are multiple variations of omega including omega hierarchical, omega total, and

what we will refer to as “Revelle’s omega total”. Omega hierarchical is useful for scales that

may not be truly unidimensional and may contain additional minor dimensions (Zinbarg et al.,

2006). Omega hierarchical attempts to parse out the variability attributable to sub-factors and

calculates reliability for a general factor that applies to all items. Although highly advantageous,

omega hierarchical differs from Cronbach’s alpha conceptually, so we will only provide a broad

overview here (although we do recommend its use if researchers believe that the items in the

scale are organized in hierarchical factors).

Omega total, on the other hand, assumes that the scale is unidimensional and estimates

the reliability for the composite of items on the scale (which is conceptually similar to

Cronbach’s alpha). In the R software environment, two packages (MBESS and psych)

calculate versions of omega total. However, they yield different results because MBESS uses a

different specification which generally tends to be more conservative and yields estimates closer

to Cronbach’s alpha (Peters, 2014; Revelle & Zinbarg, 2009; Revelle, 2016). We overview the

properties and formulas for each version of omega total in the next subsections. Though both

versions are typically referred to as "omega total", we assign different names to each version to

help keep them distinct. We refer to the omega total value based on the psych R package

specification as "Revelle's omega total". We use "omega total" to refer to the version calculated

by the MBESS R package (and as presented in many other sources).

Omega total. Under the assumption that the construct variance is constrained to 1 and

that there are no error covariances, omega total is calculated from factor analysis estimates such

that

$$\omega_{Total} = \frac{\left(\sum_{i=1}^{k}\lambda_i\right)^2}{\left(\sum_{i=1}^{k}\lambda_i\right)^2 + \sum_{i=1}^{k}\theta_{ii}} \qquad (2)$$

where $\lambda_i$ is the factor loading (not necessarily standardized) for the ith item on the scale, $\theta_{ii}$ is the

error variance for the ith item, and k is the number of items on the scale. Omega total can only be

calculated if the scale is first factor analyzed to obtain the factor loadings and error variances.

This is necessary because tau equivalence is no longer assumed and the potentially differential

contribution of each item to the scale must be assessed.
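As an illustration of Equation 2, the sketch below (Python rather than the paper's R code; the loadings are invented, not estimates from any real scale) computes omega total from the loadings and error variances of a standardized unidimensional solution:

```python
# Omega total (Equation 2) from a standardized unidimensional factor solution.
# Loadings are hypothetical; with a unit construct variance, the error
# variance of each item is 1 - lambda_i^2.
loadings = [0.8, 0.7, 0.6, 0.5]                # lambda_i for each item
error_vars = [1 - l**2 for l in loadings]      # theta_ii in a standardized solution

true_score_var = sum(loadings) ** 2            # (sum of lambda_i)^2
omega_total = true_score_var / (true_score_var + sum(error_vars))
```

With these values, omega total is about .75. Adding twice the sum of the error covariances to the denominator would give the correlated-errors version shown in Equation 3.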

Although perhaps not immediately intuitive, Equation 2 is identical to the Cronbach's

alpha formula in Equation 1 under the condition of tau equivalence (Geldhof et al., 2014). The

condensed equation for Cronbach's alpha that appears under Equation 1 can alternatively be

written as $k^2\bar{\sigma}_{ij}/s_X^2$, where $\bar{\sigma}_{ij}$ is the average covariance between pairs of items. From factor

analysis path tracing rules, the model-implied covariance for a pair of items (with no error

covariances) that load on the same factor is equal to the product of their loadings (times the

factor variance, which is assumed to be equal to 1). Under tau equivalence, all the loadings are

equal, so the total true score variance is equal to the item covariance for a single pair of items,

repeated k² times. In both Equation 1 and Equation 2, this variance is divided by the total

variance of the scale. The denominator in Equation 2 is the factor analysis representation of $s_X^2$

from Equation 1. As such, omega total is a more general version of Cronbach's alpha and

actually subsumes Cronbach's alpha as a special case. More simply, if tau equivalence is met,

omega total will yield the same result as Cronbach's alpha, but omega total has the flexibility to

accommodate congeneric scales, unlike Cronbach's alpha.
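The reduction of omega total to Cronbach's alpha under tau equivalence can be checked numerically. In this sketch (Python rather than R; all values hypothetical), alpha computed from the model-implied covariances of a tau-equivalent scale matches Equation 2 exactly:

```python
# Tau equivalence: equal loadings and, here, equal error variances.
# With a unit factor variance, every pair of items covaries by lam^2 and
# every item has variance lam^2 + theta.
k, lam, theta = 5, 0.6, 0.5

item_var = lam**2 + theta
pair_cov = lam**2
total_var = k * item_var + k * (k - 1) * pair_cov      # s_X^2, variance of the sum score

alpha = (k / (k - 1)) * (1 - (k * item_var) / total_var)  # Cronbach's alpha (Equation 1)
omega = (k * lam) ** 2 / ((k * lam) ** 2 + k * theta)     # omega total (Equation 2)
# alpha and omega agree (up to floating-point error) under tau equivalence
```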


Similar to Cronbach's alpha, omega total overestimates reliability if errors have a positive

covariance. The omega total formula in Equation 2 assumes that errors are uncorrelated, though

it can be generalized to cases where this assumption is violated by altering the denominator

to account for the error covariances such that³

$$\omega_{TCov} = \frac{\left(\sum_{i=1}^{k}\lambda_i\right)^2}{\left(\sum_{i=1}^{k}\lambda_i\right)^2 + \sum_{i=1}^{k}\theta_{ii} + 2\sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\theta_{ij}} \qquad (3)$$

If the residual covariances may be attributable to additional minor dimensions, then omega

hierarchical will yield a more accurate estimate of the reliability of the scale (Zinbarg et al.,

2006). Extensions of omega total are also available for cases where the factor variance is not

assumed to be 1 (Raykov, 2004) and when the data contain multiple groups (Zinbarg, Revelle, &

Yovel, 2007). These extensions, however, are outside the scope of this introduction and will not

be discussed further.

³ Note that, although the inclusion of the error covariances in the denominator appropriately takes the extra source of variation into account, it does not address the broader issue of why there is error covariance: that is, whether the error covariance is attributable to a model misspecification in which an important factor has been omitted (Green & Hershberger, 2000) or whether design-driven aspects of the scale led to the correlated errors (e.g., speeded tests; Cole, Ciesla, & Steiger, 2007). Bentler (2009) nicely summarizes this issue by stating "It would seem that the question of whether to consider correlated errors as factors and hence part of the common factor space, or as residual covariances and hence as part of the unique space, should be left up to the goals of the investigator" (p. 139).

Revelle's omega total. Though similar in name and idea, Revelle's omega total can yield

quite different (and typically larger) estimates of reliability than omega total. This is due to a

different, more sophisticated variance decomposition that is used. In Revelle's omega total, a

factor model is estimated as with omega total. However, the solution is then transformed with a

Schmid-Leiman rotation (Schmid & Leiman, 1957). Though we will not go into full detail

regarding this rotation because it is rather technical and full detail is outside the scope of this


paper (for full details, see Mansolf & Reise, 2016 or Wolff & Preising, 2005), the general idea is

to rotate the factor solution to a bifactor model where there is one general factor and several

minor factors. More specifically, each item will load on the single general factor (g), one or more

group factors (f), and an item-specific factor (s). The communality is then calculated by squaring

the loadings of the general factor and the group factor(s) but not the item-specific factors

(Revelle, 2016).

The formula for Revelle's omega total is essentially the same as Equation 2; however, it

is more complex to account for the differential variance decomposition and additional minor

factors. Namely, Revelle's omega total is equal to

$$\omega_{RT} = \frac{\left(\sum_{i=1}^{k}\lambda_{gi}\right)^2 + \sum_{f=1}^{F}\left(\sum_{i=1}^{k_f}\lambda_{fi}\right)^2}{V_X} \qquad (4)$$

where $\lambda_{gi}$ is the loading of the ith item on the general factor, $\lambda_{fi}$ is the standardized loading of

the ith item on the fth group factor, k is the total number of items, F is the total number of group

factors, and $k_f$ is the number of items that load on the fth group factor. $V_X$ is the total variance

after rotation, which is equal to the sum of each element of the sample Pearson (or polychoric)

correlation matrix (in matrix notation, this can be succinctly written as $\mathbf{1}^{T}\mathbf{R}\mathbf{1}$, where R is the

sample correlation matrix).

Omega hierarchical is based on the exact same Schmid-Leiman transformation except

that it only considers contributions of the general factor, disregarding the loadings of both the

group factors and the item-specific factors:

$$\omega_{H} = \frac{\left(\sum_{i=1}^{k}\lambda_{gi}\right)^2}{V_X} \qquad (5)$$


For interested readers, Kelley and Pornprasertmanit (2016) provide a highly readable description

of omega hierarchical and when it should be used. Readers looking for complete details on

omega hierarchical are referred to Zinbarg et al. (2005).

Though the formulas may look intimidating, the idea is quite straightforward because

software will handle the rotation and complexities of the formula. Explanations of how these

values are extracted from the data are provided in the software appendix.
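The two formulas above can be sketched numerically. The following is Python rather than the psych package, and the Schmid-Leiman solution (one general factor, two group factors, six items) is entirely invented for illustration; in practice, software estimates and rotates the loadings:

```python
# Revelle's omega total (Eq. 4) and omega hierarchical (Eq. 5) from a
# hypothetical Schmid-Leiman solution: 6 items, general factor g, and
# two group factors (items 0-2 and items 3-5). Factors are orthogonal.
g = [0.6, 0.6, 0.5, 0.5, 0.4, 0.4]             # general-factor loadings
groups = {0: [0, 1, 2], 1: [3, 4, 5]}          # items loading on each group factor
f = [0.3, 0.3, 0.4, 0.4, 0.5, 0.5]             # group-factor loadings

k = len(g)
# Model-implied correlation matrix: unit diagonal; off-diagonal entries sum
# the general-factor contribution and (for items sharing a group factor)
# the group-factor contribution.
R = [[1.0 if i == j else
      g[i] * g[j] + (f[i] * f[j] if any(i in m and j in m for m in groups.values()) else 0.0)
      for j in range(k)] for i in range(k)]
V_X = sum(sum(row) for row in R)               # 1'R1, total variance after rotation

num_rt = sum(g) ** 2 + sum(sum(f[i] for i in m) ** 2 for m in groups.values())
omega_rt = num_rt / V_X                        # Equation 4
omega_h = sum(g) ** 2 / V_X                    # Equation 5
```

As expected, omega hierarchical is smaller than Revelle's omega total because it credits only the general factor.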

Coefficient H and Maximal Reliability

Should researchers want to use the information present in the factor loadings to create

a scale that is optimally weighted, where each item contributes a different amount of information

to the overall scale score (instead of each item being given the same weight with unit weighting),

then maximal reliability is a more appropriate measure of the scale's reliability (Bentler, 2007;

Hancock & Mueller, 2001; Raykov, 2004).⁴ Hancock and Mueller (2001) derived Coefficient H

as a measure of maximal reliability for an optimally-weighted scale. Similar to the form of

omega total presented in Equation 2, Coefficient H requires the (standardized) factor loadings

from a unidimensional factor analysis of the scale (or from unidimensional subscales).

Coefficient H is calculated by

$$H = \left[1 + \left(\sum_{i=1}^{k}\frac{\lambda_i^2}{1-\lambda_i^2}\right)^{-1}\right]^{-1} \qquad (6)$$

where k is again the number of items on the scale and $\lambda_i$ is the standardized factor loading for

the ith item. Unlike Equation 2, notice that the squaring of the factor loadings occurs prior to

summing over the items. Both Cronbach's alpha and omega (all versions) are

adversely affected by items with negative loadings, whereas Coefficient H squares the loadings

first so that magnitude (and not sign) is the only important feature. This means that negatively

worded items do not need to be reverse coded with Coefficient H.

⁴ When using optimal weighting, the contribution of each item to the scale score is based on the magnitude of its standardized factor loading. For example, an item with a standardized loading of 0.90 would have a much larger impact on the scale score than an item with a standardized loading of 0.50.

There are several other features of Coefficient H that differentiate it from omega total.

First, error variances are not included in the denominator of the equation. This means that items

with weak factor loadings do not negatively affect Coefficient H as they do in the computation of

omega total. In Equation 2, an item with a weak loading will necessarily have a large error

variance (i.e., the underlying construct accounts for a small percentage of the variance, so the

remaining variance must be attributable to error). In Coefficient H, the scale is not penalized for

featuring weaker items because its intended use is for optimally-weighted scales. For example,

whereas adding an item completely unrelated to the construct of interest to a scale reduces

reliability for Cronbach’s alpha and omega (which are appropriate for unit-weighted scales), with

optimal-weighted scales, an unrelated item’s factor loading will essentially be 0 and the

information from this item would not affect the scale scores. Put another way, in unit-weighted

scales, every item receives equal treatment so an unrelated item hurts the scale; in optimally-

weighted scales, items are differentially weighted so an unrelated item does not hurt reliability

because the item simply receives very little or zero consideration when scoring the scale.

Another property exclusive to Coefficient H is that the reliability of the scale cannot be less than

the squared loading (the definition of reliability in factor analytic models) of the single best item

(Geldhof et al., 2014).
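The contrast between unit and optimal weighting described above can be seen by computing omega total and Coefficient H on the same loadings with and without a nearly unrelated item. This is a Python sketch with invented standardized loadings:

```python
# Coefficient H (Eq. 6) vs. omega total (Eq. 2) when one item is nearly
# unrelated to the construct. Standardized loadings are hypothetical.
def coefficient_h(loadings):
    info = sum(l**2 / (1 - l**2) for l in loadings)
    return 1 / (1 + 1 / info)

def omega_total(loadings):
    true_var = sum(loadings) ** 2
    return true_var / (true_var + sum(1 - l**2 for l in loadings))

strong = [0.7, 0.7, 0.7, 0.7]
with_weak = strong + [0.05]   # add an essentially unrelated item

# Omega total (unit weighting) drops when the weak item is added;
# Coefficient H (optimal weighting) barely moves and cannot decrease.
```

Note that with equal loadings the two functions agree, consistent with the point that all of these indices reduce to Cronbach's alpha under tau equivalence (given the other assumptions).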

Greatest Lower Bound

The greatest lower bound (GLB) is a class of methods for assessing reliability which are

all based on the same conceptual idea. First introduced by Jackson and Agunwamba (1977), the


GLB is based on the classical test theory approach to reliability. First, the GLB extends the

classical test theory formula from $X = T + E$ to $Cov(\mathbf{X}) = Cov(\mathbf{T}) + Cov(\mathbf{E})$; that is, the covariance

matrix of all observed scores X is equal to the covariance matrix of all true scores T plus the

covariance matrix of all the errors E (Shapiro & ten Berge, 2000; ten Berge & Sočan, 2004).

Conceptually, Jackson and Agunwamba (1977) argued that the greatest lower bound for

reliability could be calculated from the estimate of the covariance matrix of E with the largest

trace that is consistent with the data (provided that Cov(T) and Cov(E) are non-negative

definite).⁵ Once the estimated covariance matrix for E with the largest trace is found, GLB

reliability is calculated by

$$GLB = 1 - \frac{\mathrm{trace}[\widehat{Cov}(\mathbf{E})]}{s_X^2} \qquad (7)$$

where $s_X^2$ is the variance of the observed scale scores. More simply, the goal is to determine the

maximal values for the error component of the observed scores that are consistent with the data,

because reliability calculated with these maximum errors will yield the lowest possible value for

reliability (Sočan, 2000). Jackson and Agunwamba (1977) showed that Cronbach's alpha and

other single-administration measures like split-half reliability are based on the same principle as

the GLB, with the exception that they inefficiently estimate Cov(E) and therefore do not exceed

the theoretical GLB value.

⁵ The trace of a matrix is computed by adding up all of the diagonal elements; non-negative definite means that the eigenvalues of the matrix are 0 or larger.

Though appealing theoretically, a major challenge for GLB reliability is its computation.

The difficulty stems from finding the estimate of Cov(E) that maximizes the trace. In fact, a

simple analytical solution is generally impossible, so several iterative methods have been

proposed to determine this matrix, with leading candidates being the minimum rank factor


analysis (MRFA) approach of ten Berge and Kiers (1991) and the GLB algebraic solution from

Moltner and Revelle (2015) (both of which can be implemented in R). An additional limitation

of GLB reliability is that it tends to overestimate reliability with smaller sample sizes (e.g., bias

is rather large with a sample size of 100 but is reasonable with a sample size of 500; Shapiro &

ten Berge, 2000; Trizano-Hermosilla & Alvarado, 2016).
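The trace-maximization principle behind Equation 7 can be illustrated with a deliberately crude Python sketch. This is not the MRFA or algebraic GLB algorithm, and the covariance values are invented: it restricts Cov(E) to equal diagonal entries, in which case the largest admissible error variance is the smallest eigenvalue of Cov(X), available in closed form for a compound-symmetric matrix:

```python
# A crude lower bound in the spirit of the GLB (Equation 7): choose
# Cov(E) = theta * I with the largest theta that keeps
# Cov(T) = Cov(X) - theta * I non-negative definite. For a compound-symmetric
# Cov(X) (item variance a, inter-item covariance b > 0), the smallest
# eigenvalue of Cov(X) is a - b, so theta can be at most a - b.
k, a, b = 4, 1.0, 0.4                    # hypothetical variance and covariance
theta_max = a - b                        # largest admissible equal error variance
s_X2 = k * a + k * (k - 1) * b           # variance of the sum score (sum of Cov(X))
glb_bound = 1 - (k * theta_max) / s_X2   # Equation 7 with trace = k * theta_max
```

For this tau-equivalent toy matrix the restricted bound coincides with Cronbach's alpha, echoing the point that alpha rests on the same principle; the true GLB searches a much larger family of Cov(E) matrices and can therefore exceed it.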

Practical Comparison of Methods

Table 1 compares the six aforementioned methods (Cronbach’s alpha, omega total,

Revelle’s omega total, omega hierarchical, Coefficient H, and the GLB) based on practical

considerations. That is, because adopting new statistical approaches often entails a steep learning

curve, Table 1 does not compare strict statistical properties or asymptotic behavior but rather

overviews which software can compute each method, whether the method is calculable by hand,

notable conceptual advantages, and notable conceptual disadvantages. Alternatives to

Cronbach’s alpha tend to have very little support in general software, so the easiest measures to

report are omega total or Coefficient H because they can be calculated using a simple

spreadsheet. More computationally intensive measures are only currently supported in R. We

realize that R is not the first-choice software for many psychologists, so extensive annotated R

code is provided in an appendix to assist in calculating measures that require more computational

resources (e.g., Schmid-Leiman transformation, MRFA).

TABLE 1 ABOUT HERE

Empirical Examples

In this section, we provide example analyses to demonstrate the shortcomings of

Cronbach’s alpha. The first example dataset is based on a subsample of the Early Childhood

Longitudinal Study – Kindergarten (ECLS-K) from the United States’ National Center for


Educational Statistics. The data include 21,054 students and thousands of variables, such as

direct cognitive assessments of students, teacher reports of students, parental reports of students,

and detailed information about demographics and students' home life across seven time

points. The data are publicly available from the United States' National Center for Educational

Statistics (https://nces.ed.gov/ecls) and are intended to allow researchers to answer research

questions pertaining to child development, school readiness, and experiences in schools. We

used a subsample consisting of 1,977 students who had complete math and reading scores at all

seven waves of the study. Socioeconomic status is not captured by a single variable in ECLS-K;

therefore, researchers have argued and demonstrated that it is more fruitful to form a scale for

socioeconomic status using variables that capture its different aspects

(Curran & Kellogg, 2016; Lubienski & Crane, 2010). In this example, we use 9 variables:

Mother’s Education, Father’s Education, Household income (in dollars), parents’ expectation of

child’s eventual education level, the number of books the child has, whether the child qualifies

for free or reduced lunch, whether the parent volunteers at school, whether there is a computer in

the house (these data were collected in the late 1990s when home computers were not

ubiquitous), and whether the child is enrolled in music lessons. These variables were collected

during the fall semester of the child’s kindergarten year. The first example primarily

demonstrates how the assumption of tau equivalence adversely affects Cronbach’s alpha in ways

that do not affect other measures. Differences between reliability for optimally-weighted and

unit-weighted scales are also shown.

The second example contains responses to 25 Likert items from the Big Five Inventory

for personality traits. The data contain responses from 2800 people and were collected as part of

the Synthetic Aperture Personality Assessment (SAPA) project (Revelle, Wilt, & Rosenthal,


2010). The data are freely available in the psych R package as the “bfi” data. This example

shows how the various measures are similar when tau equivalence is approximately met and how

the measures diverge when scales are congeneric. The data in this example are based on Likert

items, so the example also shows how reliability is attenuated if discrete responses are treated as

continuous and how discreteness similarly affects the alternative measures.

Although we previously listed other assumptions earlier in the text, these examples

primarily focus on violations of the tau equivalence and continuous-item assumptions. This is

intentional because these are the most frequently violated assumptions of Cronbach's

alpha and the simplest to relax.

ECLS-K Example

To demonstrate the large violation of tau equivalence in these data, we first perform a

likelihood ratio test comparing a model with constrained standardized loadings across all items

to a model with standardized loadings freely estimated for all items. We reverse coded the Free

or Reduced Lunch variable because its loading was negative, which would adversely affect fit.

With all loadings constrained, $\chi^2(35) = 625.33$, SRMR = .12, McDonald Centrality = .83,⁶ and

the standardized loading for all items was estimated to be 0.48. When loadings were allowed to

be unconstrained, $\chi^2(27) = 160.52$, SRMR = .05, McDonald Centrality = .96.

⁶ Hu and Bentler (1999) recommend McDonald's Centrality > .90 and SRMR < .09 as a combinational rule that minimizes the sum of Type I and Type II errors (p. 26), while McDonald's Centrality > .93 and SRMR < .06 also worked fairly well but tended to over-reject true models. We use these criteria to establish goodness of fit throughout these examples because factor models for scales with few items tend to have few degrees of freedom, for which RMSEA vastly over-rejects well-fitting models (Kenny, Kaniskan, & McCoach, 2014), and because the sample size in both models is rather large, which may render the chi-square test overpowered (e.g., Hu & Bentler, 1998). Note that there has been a steady wave of criticism against generalizing the Hu and Bentler cut-offs (e.g., Marsh, Hau, & Wen, 2004; Hancock & Mueller, 2011), although our examples fall fairly close to their original simulation design (a factor model with 5 items per factor and standardized loadings near 0.70).

A likelihood ratio test of these two models results in a value of $\chi^2(8) = 464.81$, which is clearly significant (the


0.05 cut-off is 15.51) and indicates that the model with constrained loadings fits significantly

worse. The standardized loadings for the unconstrained model are presented in Table 2 and

clearly show a wide range of standardized factor loadings (range: 0.21 to 0.76). The fit indices

also provide evidence that the scale is unidimensional because a one-factor solution fits the data

reasonably well. Table 2 also provides the reliability estimates using Cronbach's alpha, omega total,

Revelle's omega total, the GLB, and Coefficient H. If Cronbach's alpha is used, the value is in

the mid .70s, which would result in the scale being seen as "acceptable" using common

guidelines from Kline (1986) and DeVellis (1991). However, recall that the loadings in this

example are highly discrepant and that this negatively biases Cronbach's alpha estimates. Using

an alternative measure of reliability results in noticeable increases in reliability estimates, as high

as 10% with Coefficient H.

Although many researchers would consider removing the Music Lessons variable due to

its low loading, we have retained it to demonstrate the difference in reliability estimates for unit-

weighted and optimally-weighted scales. For Cronbach’s alpha, both omega totals, and the GLB,

a weakly related item decreases reliability because each item receives equal consideration when

computing scale scores. However, optimally weighted scales (for which Coefficient H is

appropriate) differentially weight each item based on its factor loading. As a result, Coefficient H

in this case is higher (5% higher than Revelle’s omega total) because the Music Lessons variable

is heavily down-weighted and the other, more reliable items would be weighted much more

heavily when scale scores are computed. As a reminder, even though it may be appealing to

report Coefficient H in such a case because it is higher, it is only appropriate if the scale score is

calculated using optimal weights.

Big Five Inventory Example


Unlike the previous example, where tau equivalence was badly violated, this example

features five subscales with varying degrees of (possible) violation of tau equivalence. Table

3 shows the standardized factor loadings based on the Pearson and polychoric correlation

matrices. Both sets of results were obtained in R using the psych package and the

scaleStructure wrapper from the userfriendlyscience package (details are

provided in the appendix). Each subscale in this dataset contains five items that are intended to

be unidimensional (i.e., each item only measures a single construct). To assess the

unidimensionality of these subscales, SRMR and McDonald’s Centrality are provided for each

subscale; the values for each subscale meet the suggested guidelines and we continue under the

assumption that unidimensionality for each sub-scale is preserved.

Upon initial inspection of Table 3, the various subscales adhere to tau equivalence to

varying degrees. The loadings for the Conscientiousness subscale are rather close to one another

(magnitude range: 0.55 to 0.67 using a Pearson covariance matrix, 0.58 to 0.72 using a

polychoric covariance matrix). On the other hand, the loadings for the Agreeableness subscale

are quite variable (Range: 0.37 to 0.76 using a Pearson covariance matrix, 0.43 to 0.80 using a

polychoric covariance matrix). To more rigorously demonstrate the similarity of the loadings on

the Conscientiousness subscale, we constrained the standardized loadings to be equal and

compared the fit to a model where all loadings were freely estimated. The likelihood ratio test was

significant, $\chi^2(4) = 28.17$, $p < .01$, but the changes in the SRMR ($\Delta$SRMR = .0125) and

McDonald's Centrality ($\Delta$McDonald = .0048) were rather small.⁷

⁷ When sample size is large, some studies have recommended using change in fit indices instead of the likelihood ratio test (e.g., Cheung & Rensvold, 2002; F. F. Chen, 2007). Although the field has not uniformly accepted this approach (e.g., Barrett, 2007), these changes in fit indices between models are below the recommended cut-offs (less than .025 for SRMR when testing loadings, greater than -.005 for McDonald's Centrality; Chen, 2007).

We proceed by allowing the


loadings to be freely estimated, but we treat the Conscientiousness subscale as an exemplar of

the behavior of the various reliability measures when tau equivalence is roughly appropriate.

Table 4 shows the estimated reliability using Cronbach's alpha, omega total, Revelle's

omega total, the GLB (using the MRFA approach), and Coefficient H, using both a Pearson

covariance matrix and a polychoric covariance matrix. First, notice that when the subscale is

very nearly tau equivalent (as with the Conscientiousness subscale), there are only small differences

between the various reliability measures.⁸ However, the difference between the estimates grows

larger as the subscales deviate from tau equivalence, with relative percentage increases over

Cronbach's alpha ranging from 5 to 12% across subscales.

This example also shows the effect of treating truly discrete items as continuous when

calculating reliability, which is an assumption of all methods because each uses the inter-item

covariance matrix in some form in its calculation. Even though item responses are on a six-point

Likert scale, the reliability estimates using the polychoric covariance matrix are noticeably

larger because treating the items as continuous attenuates the covariances. Across each subscale,

the estimates based on the polychoric covariance matrix are between .02 and .11 points higher for

the same measure than if the Pearson covariance matrix is used. Regardless of which method is

used, when assessing reliability it is important to consider the scale of the responses.

Among the various alternatives to Cronbach’s alpha, the expected trends can be seen in

this example. First, Cronbach’s alpha consistently yields the lowest estimate of reliability. This is

expected because Cronbach’s alpha is the only method making the tau equivalence assumption

which is rarely tenable and is inappropriate for at least four of the five subscales in this example.

⁸ When a scale is perfectly tau equivalent, omega total and Coefficient H will be identical to Cronbach's alpha, provided that all other assumptions are met. With tau equivalence, there is no difference between unit weighting and optimal weighting because, with optimal weighting and tau equivalence, each item receives the same weight. The GLB will not necessarily be equal to Cronbach's alpha, even if a scale is tau equivalent (Sočan, 2000).

Second, when a subscale has an item that is noticeably poorer than the other items

(e.g., Item1 on Agreeableness, Item4 on Openness), Coefficient H tends to provide larger

reliability estimates than omega total, the GLB, and sometimes Revelle's omega total

the poor item). When subscales have factor loadings in the same general vicinity (but not

necessarily close enough to be considered approximately tau equivalent), the GLB and Revelle’s

omega total yield higher estimates than Coefficient H. In the case of approximate tau

equivalence, Coefficient H converges to Cronbach’s alpha whereas the GLB is known to exceed

Cronbach’s alpha in such instances (e.g., Sočan, 2000). When there is moderate separation

between the loadings of the various items (as on the Neuroticism subscale), Coefficient H and the

GLB are approximately equal.

Take-Home Message

The take-home message of these examples is that there is a vast discrepancy in the

reliability estimates when applying the conventional Cronbach’s alpha compared to employing

alternative methods. In the Big Five Inventory example, Cronbach’s alpha for the Openness

subscale using a Pearson covariance matrix is .61, which would be classified as borderline poor

(Kline, 1986, and DeVellis, 1991, designate the "poor" classification as below .60) and would likely

need to be defended if a manuscript were submitted for publication. However, by appropriately

accounting for the discreteness of the responses and using a method that does not mandate tau

equivalence, Revelle’s omega total, the GLB, and Coefficient H estimate the reliability to be

well above .70. The GLB yields the highest estimate at .76, 25% higher than the Cronbach’s

alpha estimate based on the Pearson covariance matrix.


Discussion

Although Cronbach’s alpha is familiar, commonly reported, and easy to obtain in

software, it is rarely an appropriate measure of reliability: its assumptions are overly rigid and

almost always violated. Worse yet, under the near-ubiquitous violation of tau equivalence,

Cronbach’s alpha estimates make scales appear much less reliable than they are in actuality.

Moreover, even if all assumptions are met, Cronbach's alpha is a special case of the alternative

measures overviewed in this paper, meaning that, even when Cronbach's alpha is appropriate,

some methods will yield the exact same value while others (Revelle's omega total and the GLB) have

been shown to routinely exceed Cronbach's alpha. Quite plainly, there is no situation in which

Cronbach's alpha is the optimal method for assessing reliability.

Despite a steady stream of criticism against Cronbach’s alpha, researchers continue to

report it in flagship APA journals, as reviewed in the introduction. A common tactic when

reporting unfavorable values of Cronbach’s alpha is to appeal to the weakness of the method.

This approach, while well-intended, is highly problematic for the scientific process because it

impedes the ability to identify scales with less desirable properties. That is, if a scale has a

Cronbach’s alpha value of 0.40, the value could be low because (1) the scale is not reliable or (2)

the scale is sufficiently reliable but assumption violations led to downwardly biased estimates of

Cronbach’s alpha. This uncertainty leads towards a dichotomy where either (1) the use of the

scale is supported because reliability is sufficiently high (e.g., 0.70 or greater) or (2) Cronbach’s

alpha should be higher but was underestimated because assumptions were violated and the scale

is still usable. Such a dichotomy hides a third option which is simply that the scale is not reliable.

In the long run, it does the field little good to use faulty methods whose results may subsequently


be disregarded; the process of scale validation at such a point becomes highly subjective and not

readily falsifiable, eroding the credibility of psychometric analysis.

Given that many psychologists employ latent variable methods (item response theory,

confirmatory factor analysis, or exploratory factor analysis) to explore their scales rather than

classical test theory, it is difficult to excuse the continued use of Cronbach’s alpha. Specifically,

the vital assumption of tau equivalence is quite easy to inspect by examining the similarity of the

factor loadings. Even the classic eyeball test can be an effective approximation in many cases.

For instance, in the ECLS-K example, formal tests are not likely necessary to determine that

standardized loadings of 0.21 and 0.76 are not approximately equal. If the factor loadings are not

equivalent for all items on the scale, then Cronbach’s alpha is not appropriate and its use will

adversely affect results by making reliability appear lower than it actually is. Other measures are

susceptible to other assumption violations, but we remind readers that there are ways in which

these could be addressed such as omega hierarchical for the presence of minor dimensions,

including error covariances between items for design-driven reasons, or basing estimates on a

polychoric rather than Pearson covariance matrix if item responses are discrete rather than

continuous. We would like to note that Likert items, even with many categories, attenuate the item covariances used in all of the methods we discuss in this paper, which results in downwardly biased estimates of reliability. Therefore, it tends to be in researchers’ best interest to acknowledge the potential discreteness of items.
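To make the effect of violating tau equivalence concrete, the following base R sketch (our illustration, not code from the paper's appendix) builds the model-implied correlation matrix for a congeneric one-factor model with deliberately unequal, hypothetical loadings and compares Cronbach's alpha to omega total.

```r
# Illustration: when loadings are unequal (tau equivalence violated),
# Cronbach's alpha falls below the model-implied composite reliability.
lambda <- c(0.21, 0.76, 0.50, 0.50)  # hypothetical, non-tau-equivalent loadings
k <- length(lambda)

# Model-implied correlation matrix: lambda lambda' plus diagonal error variances
Sigma <- tcrossprod(lambda) + diag(1 - lambda^2)

# Cronbach's alpha from a covariance matrix: k/(k-1) * (1 - trace/total)
alpha <- (k / (k - 1)) * (1 - sum(diag(Sigma)) / sum(Sigma))

# Omega total from standardized loadings
omega <- sum(lambda)^2 / (sum(lambda)^2 + sum(1 - lambda^2))

round(c(alpha = alpha, omega = omega), 3)  # alpha = 0.544, omega = 0.574
```

Even with no sampling error at all, alpha understates the reliability of the unit-weighted composite here, and the gap grows as the loadings become more dissimilar.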

Although there have been previous calls to abandon Cronbach’s alpha, Revelle and

Zinbarg (2009) noted that software for other methods was somewhat limited and that empirical

researchers may be hesitant because of the undoubted attraction to methods that have simple

software applications. Although the GLB and Revelle’s omega total are best estimated in R


because of some computational complexities, omega total and Coefficient H are fairly

straightforward to compute manually or with spreadsheets and do not require sophisticated or

iterative processes. In the Appendix, we provide annotated R code that can be used to estimate

these alternative measures. Some of the functionality included in these packages may require

additional analyses in R, which we realize may not be helpful to users who are unfamiliar with or

who dislike using R (though the scaleStructure function can eliminate the need for these

additional analyses for most of the alternative measures). In an attempt to make these measures

more accessible, we provide an Excel spreadsheet on the author’s personal website and on

the Open Science Framework that allows researchers to compute Coefficient H and omega total

using only the standardized factor loadings. Guidance for using this spreadsheet is also provided

in the appendix.
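To illustrate how simple these manual calculations are, the following base R sketch applies the standardized-loading formulas for omega total and Coefficient H to the ECLS-K loadings reported in Table 2 (with the free/reduced lunch loading reverse-coded to be positive). This is our sketch of the spreadsheet's arithmetic, not the spreadsheet itself.

```r
# Standardized ECLS-K loadings from Table 2 (FR Lunch reverse-coded to +0.52)
lambda <- c(0.52, 0.73, 0.76, 0.60, 0.35, 0.40, 0.21, 0.44, 0.39)

# Omega total: squared sum of loadings over (itself plus total error variance)
omega_total <- sum(lambda)^2 / (sum(lambda)^2 + sum(1 - lambda^2))

# Coefficient H: a function of each item's lambda^2 / (1 - lambda^2)
h_part <- sum(lambda^2 / (1 - lambda^2))
coefficient_H <- h_part / (1 + h_part)

round(omega_total, 2)    # 0.75, matching Table 2
round(coefficient_H, 2)  # 0.81, matching Table 2
```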

This paper is not intended to fully cover all the nuances and issues associated with

Cronbach’s alpha or calculating and reporting scale reliability as this literature is rather

extensive. Other researchers have provided more technical information on this topic for those

seeking a deeper understanding of the issues surrounding reliability. Geldhof et al. (2014)

provide further guidance on calculating reliability with Cronbach’s alpha, omega total, and

Coefficient H when data come from a multilevel structure. Kelley and colleagues have several

recent papers discussing the importance of confidence intervals around reliability estimates and

discuss how to compute such intervals for many measures which have been included in their

MBESS R package (e.g., Kelley & Cheng, 2012; Kelley & Pornprasertmanit, 2016; Terry &

Kelley, 2012). Zhang and Yuan (2016) discuss robust methods to compute Cronbach’s alpha and

omega total with non-normal or missing data and also provide the R package

coefficientalpha. We presented only a few of the possible alternatives to Cronbach’s


alpha. Bentler’s rho (Bentler, 1968) has also been recommended and is easy to compute in the

EQS software, while Sijtsma (2009) has vouched for the explained common variance (ECV)

method. We focused on unidimensional scales, although there is a growing trend in the literature

to assess the reliability of multidimensional scales. Bifactor and hierarchical models (where there

is a single general factor and several subscale factors) are more appropriate for these types of

scales and alternative reliability measures are available for them (Reise, 2012; Reise, Bonifay, & Haviland, 2013; Reise,

Morizot, & Hays, 2007).

In conclusion, we hope that we have sufficiently demonstrated why Cronbach’s alpha is

obsolete and that it is time for the field to move on to better, more general alternatives. As seen

in the empirical examples, the practical differences among the competing alternatives tend to be rather small; the examples showed that the GLB, Revelle’s omega total, and Coefficient H tend

to provide the highest estimates of reliability. We realize that readers may be hoping for

guidance on which of the aforementioned methods should be the “successor” to Cronbach’s

alpha.9 Although some of these comparisons have been noted in the literature and some general

relations are known (such as those presented in Table 1), these results should not be taken as

rigorous and comprehensive since they are anecdotal and not based on analytic derivations or

simulation results (though such comparisons would undoubtedly be a fruitful avenue of future

research). The common theme we hope to espouse is that Cronbach’s alpha is outperformed by

all of these methods. We believe that the most important message empirical researchers receive

from this article is that using any of the alternatives is preferable to continued use of Cronbach’s

alpha. Cronbach’s alpha had a good run and was able to hold down the fort for the field for over

50 years, but methodological reinforcements have indeed arrived.

9

This phrase was used by a reviewer; we adopted it because we thought it aptly described the current state of affairs.


References

Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological

Review, 84, 191-215.

Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social

psychological research: Conceptual, strategic, and statistical considerations. Journal of

Personality and Social Psychology, 51, 1173-1182.

Barrett, P. (2007). Structural equation modelling: Adjudging model fit. Personality and

Individual Differences, 42, 815-824.

Becker, G. (2000). How important is transient error in estimating reliability? Going beyond

simulation studies. Psychological Methods, 5, 370-379.

Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency

reliability. Psychometrika, 74, 137-143.

Bentler, P. M. (2007). Covariance structure models for maximal reliability of unit-weighted

composites. In S. Lee (Ed.), Handbook of computing and statistics with applications: Vol. 1.

Handbook of latent variable and related models (pp. 1–19). New York: Elsevier.

Bentler, P. M. (1968). Alpha-maximized factor analysis (Alphamax): Its relation to alpha and

canonical factor analysis. Psychometrika, 33, 335-345.

Carroll, J. B. (1961). The nature of the data, or how to choose a correlation

coefficient. Psychometrika, 26, 347-372.

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing

measurement invariance. Structural Equation Modeling, 9, 233-255.

Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement

invariance. Structural Equation Modeling, 14, 464-504.

Cole, D. A., Ciesla, J. A., & Steiger, J. H. (2007). The insidious effects of failing to include

design-driven correlated residuals in latent-variable covariance structure analysis. Psychological

Methods, 12, 381-398.

Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and

applications. Journal of Applied Psychology, 78, 98-104.

Crocker, L., & Algina, J. (2008). Introduction to classical and modern test theory. New York:

Holt.

Cronbach, L.J., & Shavelson, R.J. (2004). My current thoughts on coefficient alpha and

successor procedures. Educational and Psychological Measurement, 64, 391–418.


Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16,

297-334.

Crutzen, R. (2014). Time is a jailer: What do alpha and its alternatives tell us about reliability? European Health Psychologist, 16, 70-74.

Crutzen, R., & Peters, G. J. Y. (2015). Scale quality: Alpha is an inadequate estimate and factor-analytic evidence is needed first of all. Health Psychology Review. Advance online publication. doi:10.1080/17437199.2015.1124240

Curran, F. C., & Kellogg, A. T. (2016). Understanding science achievement gaps by

race/ethnicity and gender in kindergarten and first grade. Educational Researcher, 45, 273-282.

DeVellis, R. F. (1991). Scale Development. London: Sage.

Dunn, T. J., Baguley, T., & Brunsden, V. (2014). From alpha to omega: A practical solution to

the pervasive problem of internal consistency estimation. British Journal of Psychology, 105,

399-412.

Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of

estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9, 466-491.

Gadermann, A. M., Guhn, M., & Zumbo, B. D. (2012). Estimating ordinal reliability for Likert-

type and ordinal item response data: A conceptual, empirical, and practical guide. Practical

Assessment, Research & Evaluation, 17, 1-13.

Geldhof, G. J., Preacher, K. J., & Zyphur, M. J. (2014). Reliability estimation in a multilevel

confirmatory factor analysis framework. Psychological Methods, 19, 72-91.

Gessaroli, M. E., & Folske, J. C. (2002). Generalizing the reliability of tests comprised of

testlets. International Journal of Testing, 2, 277-295.

Gignac, G. E., Bates, T. C., & Lang, K. (2007a). Implications relevant to CFA model misfit,

reliability, and the five factor model as measured by the NEO–FFI. Personality and Individual

Differences, 43, 1051–1062.

Graham, J. M. (2006). Congeneric and (essentially) tau-equivalent estimates of score reliability

what they are and how to use them. Educational and Psychological Measurement, 66, 930-944.

Green, S. B. (2003). A coefficient alpha for test-retest data. Psychological Methods, 8, 88-101.

Green, S. B., Lissitz, R. W., & Mulaik, S. A. (1977). Limitations of coefficient alpha as an index

of test unidimensionality. Educational and Psychological Measurement, 37, 827-838.


Green, S. B., & Hershberger, S. L. (2000). Correlated errors in true score models and their effect

on coefficient alpha. Structural Equation Modeling, 7, 251-270.

Green, S. B., & Yang, Y. (2009a). Commentary on coefficient alpha: A cautionary

tale. Psychometrika, 74, 121-135.

Green, S. B., & Yang, Y. (2009b). Reliability of summed item scores using structural equation

modeling: An alternative to coefficient alpha. Psychometrika, 74, 155-167.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255-282.

Hancock, G. R., & Mueller, R. O. (2011). The reliability paradox in assessing structural relations

within covariance structure models. Educational and Psychological Measurement, 71, 306-324.

Hancock, G. R., & Mueller, R. O. (2001). Rethinking construct reliability within latent variable

systems. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation modeling: Present

and future—A festschrift in honor of Karl Jöreskog (pp. 195–216). Lincolnwood, IL: Scientific

Software International.

Hattie, J. (1985). Methodology review: assessing unidimensionality of tests and items. Applied

Psychological Measurement, 9, 139-164.

Hogan, T. P., Benjamin, A., & Brezinski, K. L. (2000). Reliability methods: A note on the

frequency of use of various types. Educational and Psychological Measurement, 60, 523-531.

Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis:

Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.

Hu, L. T., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to

underparameterized model misspecification. Psychological Methods, 3, 424-453.

Jackson, P. H., & Agunwamba, C. C. (1977). Lower bounds for the reliability of the total score

on a test composed of non-homogeneous items: I: Algebraic lower bounds. Psychometrika, 42,

567-578.

Kelley, K. (2007). Methods for the behavioral, educational, and social sciences: An R

package. Behavior Research Methods, 39, 979-984.

Kelley, K., & Cheng, Y. (2012). Estimation of and confidence interval formation for reliability

coefficients of homogeneous measurement instruments. Methodology, 8, 39-50.

Kelley, K., & Pornprasertmanit, S. (2016). Confidence intervals for population reliability

coefficients: Evaluation of methods, recommendations, and software for composite

measures. Psychological Methods, 21, 69-92.


Kenny, D. A., Kaniskan, B., & McCoach, D. B. (2015). The performance of RMSEA in models

with small degrees of freedom. Sociological Methods & Research, 44, 486-507.

Kline, P. (1986). A handbook of test construction: Introduction to psychometric design. London:

Methuen.

Lubienski, S., & Crane, C. C. (2010). Beyond free lunch: Which family background measures

matter? Education Policy Analysis Archives, 18, 11.

Mansolf, M. & Reise, S.P. (2016) Exploratory bifactor analysis: The Schmid-Leiman

orthogonalization and Jennrich-Bentler analytic rotations. Multivariate Behavioral Research, 51,

698-717.

Marsh, H. W., Hau, K. T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-

testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and

Bentler's (1999) findings. Structural Equation Modeling, 11, 320-341.

McDonald, R. P. (1999). Test theory: A unified approach. Mahwah, NJ: Erlbaum.

McDonald, R. P. (1970). The theoretical foundations of principal factor analysis, canonical

factor analysis, and alpha factor analysis. British Journal of Mathematical and Statistical

Psychology, 23, 1-21.

Miller, M. B. (1995). Coefficient alpha: A basic introduction from the perspectives of classical

test theory and structural equation modeling. Structural Equation Modeling, 2, 255-273.

Moltner, A., & Revelle, W. (2015). Find the greatest lower bound to reliability. Available online

at: http://personality-project.org/r/psych/help/glb.algebraic.html

Novick, M. R., & Lewis, C. (1967). Coefficient alpha and the reliability of composite

measurements. Psychometrika, 32, 1-13.

Nunnally, J. C., & Bernstein, I. H. (1994). The assessment of reliability. In J. C. Nunnally & I. H. Bernstein (Eds.), Psychometric theory (pp. 248-292). New York: McGraw-Hill.

Pedhazur‚ E. J.‚ and Schmelkin‚ L. P. (1991). Measurement‚ design‚ and analysis: An integrated

approach. Hillsdale‚ NJ: Erlbaum.

Peters, G. J. Y. (2014). The alpha and the omega of scale reliability and validity: why and how to

abandon Cronbach’s alpha and the route towards more comprehensive assessment of scale

quality. European Health Psychologist, 16, 56-69.

Peterson, R. A., & Kim, Y. (2013). On the relationship between coefficient alpha and composite

reliability. Journal of Applied Psychology, 98, 194-198.


Raykov, T. (1997a). Estimation of composite reliability for congeneric measures. Applied

Psychological Measurement, 21, 173-184.

Raykov, T. (1997b). Scale reliability, Cronbach's coefficient alpha, and violations of essential

tau-equivalence with fixed congeneric components. Multivariate Behavioral Research, 32, 329-

353.

Raykov, T. (1998). Coefficient alpha and composite reliability with interrelated

nonhomogeneous items. Applied Psychological Measurement, 22, 375-385.

Raykov, T. (2004). Behavioral scale reliability and measurement invariance evaluation using

latent variable modeling. Behavior Therapy, 35, 299-331.

Raykov, T., & Shrout, P. E. (2002). Reliability of scales with general structure: Point and

interval estimation using a structural equation modeling approach. Structural Equation

Modeling, 9, 195-212.

Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral

Research, 47, 667-696.

Reise, S. P., Bonifay, W. E., & Haviland, M. G. (2013). Scoring and modeling psychological

measures in the presence of multidimensionality. Journal of Personality Assessment, 95, 129-

140.

Reise, S. P., Morizot, J., & Hays, R. D. (2007). The role of the bifactor model in resolving

dimensionality issues in health outcomes measures. Quality of Life Research, 16, 19-31.

Revelle, W. (2016, May). Using R and the psych package to find ω. Accessed October 19, 2016

from http://personality-project.org/r/psych/HowTo/omega.pdf.

Revelle, W. (2008). psych: Procedures for personality and psychological research (R package

version 1.0-51).

Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate

Behavioral Research, 14, 57-74.

Revelle, W., Wilt, J., & Rosenthal, A. (2010). Individual differences in cognition: New methods

for examining the personality-cognition link. In A. Gruszka, G. Matthews, & B. Szymura (Eds.),

Handbook of individual differences in cognition: Attention, memory and executive control (pp.

27–49). New York: Springer.

Revelle, W., & Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: Comments

on Sijtsma. Psychometrika, 74, 145-154.


Rhemtulla, M., Brosseau-Liard, P. É., & Savalei, V. (2012). When can categorical variables be

treated as continuous? A comparison of robust continuous and categorical SEM estimation

methods under suboptimal conditions. Psychological Methods, 17, 354-373.

Rozeboom, W. W. (1966). Scaling theory and the nature of measurement. Synthese, 16, 170-233.

Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor

solutions. Psychometrika, 22, 53-61.

Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8, 350-353.

Shapiro, A., & Ten Berge, J. M. (2000). The asymptotic bias of minimum trace factor analysis,

with applications to the greatest lower bound to reliability. Psychometrika, 65, 413-425.

Sheng, Y., & Sheng, Z. (2012). Is coefficient alpha robust to non-normal data? Frontiers in

Psychology, 3, 34.

Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s

alpha. Psychometrika, 74, 107-120.

Sočan, G. (2000). Assessment of reliability when test items are not essentially τ-

equivalent. Developments in Survey Methodology, 15, 23-35.

Steinberg, L., & Thissen, D. (1996). Uses of item response theory and the testlet concept in the

measurement of psychopathology. Psychological Methods, 1, 81-97.

ten Berge, J. M., & Sočan, G. (2004). The greatest lower bound to the reliability of a test and the

hypothesis of unidimensionality. Psychometrika, 69, 613-625.

ten Berge, J. M., & Kiers, H. A. (1991). A numerical approach to the approximate and the exact

minimum rank of a covariance matrix. Psychometrika, 56, 309-315.

Terry, L., & Kelley, K. (2012). Sample size planning for composite reliability coefficients:

Accuracy in parameter estimation via narrow confidence intervals. British Journal of

Mathematical and Statistical Psychology, 65, 371-401.

Teo, T., & Fan, X. (2013). Coefficient Alpha and beyond: Issues and alternatives for educational

research. The Asia-Pacific Education Researcher, 22, 209-213.

Trizano-Hermosilla, I., & Alvarado, J. M. (2016). Best alternatives to Cronbach's alpha reliability in realistic conditions: Congeneric and asymmetrical measurements. Frontiers in

Psychology, 7, 769.

van Noorden, R., Maher, B., & Nuzzo, R. (2014). The top 100 papers. Nature, 514, 550-553.


van Zyl, J. M., Neudecker, H., & Nel, D. G. (2000). On the distribution of the maximum

likelihood estimator of Cronbach's alpha. Psychometrika, 65, 271-280.

Wolff, H. G., & Preising, K. (2005). Exploring item and higher order factor structure with the

Schmid-Leiman solution: Syntax codes for SPSS and SAS. Behavior Research Methods, 37, 48-

58.

Yang, Y., & Green, S. B. (2011). Coefficient alpha: A reliability coefficient for the 21st

century? Journal of Psychoeducational Assessment, 29, 377-392.

Zhang, Z., & Yuan, K. H. (2016). Robust coefficients alpha and omega and confidence intervals

with outlying observations and missing data: Methods and software. Educational and

Psychological Measurement, 76, 387-411.

Zimmerman, D. W., Zumbo, B. D., & Lalonde, C. (1993). Coefficient alpha as an estimate of

test reliability under violation of two assumptions. Educational and Psychological

Measurement, 53, 33-49.

Zinbarg, R. E., Yovel, I., Revelle, W., & McDonald, R. P. (2006). Estimating generalizability to

a latent variable common to all of a scale's indicators: A comparison of estimators for

ωh. Applied Psychological Measurement, 30, 121-144.

Zinbarg, R. E., Revelle, W., & Yovel, I. (2007). Estimating ωh for structures containing two

group factors: Perils and prospects. Applied Psychological Measurement, 31, 135-157.

Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach’s α, Revelle’s β, and

McDonald’s ωH: Their relations with each other and two alternative conceptualizations of

reliability. Psychometrika, 70, 123-133.

Zumbo, B.D., & Rupp, A.A. (2004). Responsible modeling of measurement data for appropriate

inferences: Important advances in reliability and validity theory. In D. Kaplan (Ed.), The SAGE

handbook of quantitative methodology for the social sciences (pp. 73–92). Thousand Oaks: Sage.

Zumbo, B. D., Gadermann, A. M., & Zeisser, C. (2007). Ordinal versions of coefficients alpha

and theta for Likert rating scales. Journal of Modern Applied Statistical Methods, 6, 4.


Table 1

Comparison of practical considerations for six different methods

Measure: Cronbach's Alpha
  Ease of implementation: Ubiquitous in general software (e.g., SPSS, SAS, Stata, R).
  Notable advantages: Familiar to readers and reviewers.
  Notable disadvantages: Underestimates reliability; requires tau equivalence.

Measure: Omega Total
  Ease of implementation: Available in the MBESS R package via the ci.reliability function or via the scaleStructure function in the userfriendlyscience package. Also calculable with a spreadsheet (provided in the appendix). No built-in option for computing a polychoric covariance matrix, though factor analysis procedures do, which does not affect ease of manual calculation.
  Notable advantages: Most conceptually related to Cronbach's alpha (Cronbach's alpha is a special case). Formula can be extended to take design-driven error covariances into account.
  Notable disadvantages: Tends to yield conservative estimates compared to other alternative methods.

Measure: Revelle's Omega Total
  Ease of implementation: Available in the psych R package via the omega function or via the scaleStructure function in the userfriendlyscience package. Not calculable manually.
  Notable advantages: Tends to justifiably exceed Omega Total and often exceeds the GLB.
  Notable disadvantages: Proportionality assumption of the Schmid-Leiman transformation must be met.

Measure: Omega Hierarchical
  Ease of implementation: Available in the psych R package via the omega function or via the scaleStructure function in the userfriendlyscience package. Not calculable manually. Includes a built-in option for internally computing and using a polychoric covariance matrix.
  Notable advantages: Accounts for and excludes effects of minor dimensions.
  Notable disadvantages: Most conceptually distant from traditional Cronbach's alpha. Also dependent on Schmid-Leiman assumptions.

Measure: GLB
  Ease of implementation: Available in the psych R package via the glb.fa or glb.algebraic function. Available via the scaleStructure function in the userfriendlyscience package. Not calculable manually. No built-in option for computing a polychoric covariance matrix.
  Notable advantages: Exceeds Cronbach's alpha, even if all assumptions are met.
  Notable disadvantages: No analytic solution; current software does not offer a polychoric option.

Measure: Coefficient H
  Ease of implementation: Very simple to calculate in a spreadsheet (provided in the appendix); calculated by default in the scaleStructure function in the userfriendlyscience package for continuous items.
  Notable advantages: Designed for optimally weighted scales; not affected by the addition of poor items.
  Notable disadvantages: Misleading if the scale is scored with unit weighting.


Table 2

ECLS-K example standardized factor loadings, estimated reliability using different methods, and

model fit indices

Variable             Std. Loading    Measure                   Estimate   % Increase
FR Lunch                -0.52        Cronbach’s Alpha            0.74        ---
Mom Education            0.73        Omega Total                 0.75        1.4%
Dad Education            0.76        Revelle’s Omega Total       0.77        4.1%
Household Income         0.60        Greatest Lower Bound        0.80        8.1%
Expect Education         0.35        Coefficient H               0.81        9.5%
Number of Books          0.40
Music Lessons            0.21        Fit
Computer at Home         0.44        SRMR                        0.05
Parent Volunteers        0.39        McDonald’s Centrality       0.96

Note: SRMR = standardized root mean squared residual, % Increase = the percent relative

increase of reliability compared to Cronbach’s alpha. The Free or Reduced Lunch variable was

reverse coded when calculating Cronbach’s alpha and both Omega Totals so that all covariances

would be positive.


Table 3

Standardized factor loadings for Big Five example, treating the items as continuous with a

Pearson covariance matrix and discrete with a polychoric covariance matrix

Pearson Covariance Matrix
Subscale            Item1   Item2   Item3   Item4   Item5   SRMR   MC
Agreeableness       -.37     .66     .76     .48     .63    .04    .98
Conscientiousness    .55     .61     .55    -.67    -.59    .05    .97
Extraversion        -.61    -.73     .58     .69     .52    .04    .99
Neuroticism          .82     .80     .72     .55     .50    .07    .93
Openness             .55    -.44     .65     .30    -.51    .04    .99

Polychoric Covariance Matrix
Subscale            Item1   Item2   Item3   Item4   Item5   SRMR   MC
Agreeableness       -.43     .71     .80     .52     .67    .04    .99
Conscientiousness    .59     .64     .58    -.72    -.62    .06    .97
Extraversion        -.64    -.77     .60     .74     .54    .04    .99
Neuroticism          .86     .84     .74     .57     .52    .08    .93
Openness             .60    -.48     .69     .37    -.58    .05    .99

Note: SRMR = standardized root mean squared residual, MC = McDonald Centrality


Table 4

Comparison of subscale reliabilities for model in Big Five Inventory example using Cronbach’s

Alpha, both versions of omega total, the GLB, and Coefficient H

Pearson Covariance Matrix
Subscale            Cronbach’s   Omega   Omega     Greatest      Coefficient
                    Alpha        Total   Revelle   Lower Bound   H
Agreeableness       .71          .71     .77       .75           .77
Conscientiousness   .73          .73     .77       .77           .74
Extraversion        .76          .77     .80       .82           .78
Neuroticism         .81          .82     .88       .85           .85
Openness            .61          .62     .68       .65           .65

Polychoric Covariance Matrix
Subscale            Cronbach’s   Omega   Omega     Greatest      Coefficient
                    Alpha        Total   Revelle   Lower Bound   H
Agreeableness       .76          .77     .83       .79           .81
Conscientiousness   .77          .77     .81       .81           .78
Extraversion        .79          .80     .83       .84           .81
Neuroticism         .84          .84     .90       .87           .88
Openness            .67          .68     .73       .76           .71

Note: Omega Revelle= Revelle’s Omega Total from psych R package. Items with negative

loadings were recoded when calculating Cronbach’s alpha and both omega totals so that all

covariances would be positive.


Appendix

Software Code and Associated Screenshots for Obtaining Alternative Estimates of Reliability

Using R

Basics and Installing Packages

Because R is open source, new statistical packages are being added almost daily. In R, a

“package” is a set of procedures that can be used to perform certain statistical analyses. This is

equivalent to the “Proc” commands in SAS, procedures in SPSS, or commands in Stata. For

example, to fit a linear multilevel model, SAS uses the Proc Mixed procedure, SPSS uses the

MIXED procedure, Stata uses the xtmixed command, and R would use the lme4 package.

In R, not all packages are available by default upon opening the program (in fact, only very basic

packages are available). The packages needed to calculate scale reliability (psych, MBESS, and

userfriendlyscience) are not included and must be installed. This is done with the

following code:

install.packages("psych")

install.packages("MBESS")

install.packages("userfriendlyscience")

Note that code in R is case-sensitive so capitalization is important. After running this code, you

will likely be prompted to select a “mirror site” which is the location from where these packages

are downloaded. A list of geographic locations may appear; it makes little difference which is

selected and they all contain the same information. These packages may take a few minutes to

install. Installing packages only needs to be done once per machine. Once the packages are

installed, they do not need to be installed again.

Loading the Data

Undoubtedly, one of the most difficult tasks when working with new software is to successfully load the desired dataset. In this appendix, we use the data from the Big Five Inventory example because it is included as a built-in example within the psych package.

After installing the psych package, the Big Five Inventory dataset can be loaded with the

following code:

data(bfi, package="psych")

In general, there are multiple ways to load data into R. Although the pathway to the file can be

explicitly stated, it is often easier to find the desired file from a dialog menu. The following code

shows how to input datafiles into R that are saved in either the .csv, .sav (SPSS), .dta (Stata), or

permanent SAS data set formats.

install.packages("foreign")


require(foreign)  # after installing a package, require() tells R to load it

dat <- read.csv(file.choose())                        # CSV
dat <- read.spss(file.choose(), to.data.frame = TRUE) # SPSS (as a data frame)
dat <- read.dta(file.choose())                        # Stata
# SAS permanent data sets: read.ssd() takes a directory and member name
# (and requires a local SAS installation), e.g.,
# dat <- read.ssd("C:/mydata", "mydataset")

If the userfriendlyscience package is already installed, then one can use the getDat()

function to import data. This function determines the appropriate format and will automatically

import the data and assign it the name “dat”.

To simplify the analysis, we will break the full data set into 5 separate datasets such that each of the 5 subscales is contained within its own data set.

agre<-bfi[,1:5]

cons<-bfi[,6:10]

extr<-bfi[,11:15]

neur<-bfi[,16:20]

open<-bfi[,21:25]

The name on the left side of the arrow is the new dataset name. On the right side of the arrow is the old dataset (called bfi here because that is the default name for these data when loaded from R) and a set of brackets. Within these brackets, users specify which parts of the data matrix to use. The first entry is blank because we want all the rows (people). The second entry corresponds to the columns in the data. So, for the Agreeableness dataset (agre), we want the first 5 columns of the bfi data; the Conscientiousness dataset (cons) is composed of the 6th through 10th columns of the bfi, and so on.

Reverse Scoring

As is common in psychometric scales, some items may need to be reverse scored (this is required

for appropriate calculation of some reliability coefficients like Cronbach’s alpha). This can be

done with the invertItems function that is part of the userfriendlyscience package.

agreRev <- invertItems(agre, 1)
consRev <- invertItems(cons, c(4, 5))
extrRev <- invertItems(extr, c(1, 2))
openRev <- invertItems(open, c(2, 5))

This code creates a new R object (agreRev, consRev, extrRev, openRev) from the original R

data. After the invertItems function, the first value within the parentheses is the data set to

reverse score. After the comma, the numbers listed are the columns in the data that should be

reverse scored. The “c” indicates that a list will follow and is needed if multiple items are reverse

scored. So, the Agreeableness scale will reverse score Item 1, the Conscientiousness scale will


reverse score Items 4 and 5, and so on. The Neuroticism scale does not contain any items that

need to be reverse scored.
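For readers who prefer not to depend on the userfriendlyscience package, the same reverse scoring can be done in base R by subtracting each response from the sum of the scale's minimum and maximum. The sketch below assumes 1-6 response options, as in the bfi items, and the function name is our own.

```r
# Base-R reverse scoring: reversed = (low + high) - response
reverseScore <- function(x, low = 1, high = 6) (low + high) - x

reverseScore(c(1, 2, 6))  # returns 6 5 1
```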

Cronbach’s Alpha

Cronbach’s alpha can be calculated as part of many different functions. The simplest is to use the

alpha function from the psych R package. If relevant items are reverse scored as discussed

previously, then the only argument of the alpha function is the dataset.

alpha(agreRev)

alpha(consRev)

alpha(extrRev)

alpha(neur)

alpha(openRev)

The output for the Agreeableness scale is as follows. The estimate of Cronbach’s alpha can be

found in the first row of the output under std.alpha.


Omega Total

To calculate the measure that we call omega total (not Revelle’s omega total), one must go

outside of the psych package to the MBESS package.

In the MBESS package, the ci.reliability function will estimate omega total as well as its

confidence interval.

require(MBESS) # loads the package; needed once per R session

ci.reliability(agreRev)

This yields the following output.

The estimate of omega total is the first value, which appears beneath $est. On the Agreeableness subscale, omega total is estimated to be 0.71 with a 95% confidence interval of [.69, .73].

Revelle’s Omega Total

Revelle’s omega total is calculated with the omega function in the psych package. The omega function also outputs Cronbach’s alpha, so it can be used in lieu of the alpha function. Again, the only argument needed in the function to obtain Revelle’s omega total using a Pearson covariance matrix is the data set.

omega(agreRev)

omega(consRev)

omega(extrRev)

omega(neur)

omega(openRev)

The output from this function for the Agreeableness subscale is as follows:

The Alpha row shows Cronbach’s alpha, which matches the output from the alpha function. Revelle’s omega total is the last value in the first set of values and is listed as 0.77. Notice that this value is not the same as omega total because Revelle’s version uses a variance decomposition based on a Schmid-Leiman transformation (the details of which are provided below the output).

A convenient option in the omega function is that a polychoric covariance matrix can be estimated and used internally, which requires only one additional argument in the code.

omega(agreRev, poly=TRUE)

omega(consRev, poly=TRUE)

omega(extrRev, poly=TRUE)

omega(neur, poly=TRUE)

omega(openRev, poly=TRUE)

The output from the omega function with the polychoric option for the Agreeableness subscale is

as follows:

Notice that the Alpha and (Revelle’s) Omega Total values are much higher than in the previous

output. The alpha function does not feature this poly option, so Cronbach’s alpha with a

polychoric covariance matrix is best run through the omega function.

The computation of Revelle’s omega total is somewhat involved, and few sources outside the documentation for the psych R package describe this version of the omega coefficient. For the remainder of this section, we outline where Revelle’s omega total comes from to elucidate what it calculates. In Equation 4 of the main text, we defined Revelle’s omega total as

$$\omega_{RT} = \frac{\left(\sum_{i=1}^{k}\lambda_{gi}\right)^{2} + \sum_{f=1}^{F}\left(\sum_{i=1}^{k_f}\lambda_{fi}\right)^{2}}{V_X}$$

Revelle (2016) notes the numerator of this formula is equal to the communality of each item, $h_i^2$, so the formula can be rewritten as

$$\omega_{RT} = 1 - \frac{\sum_{i=1}^{K}\left(1 - h_i^{2}\right)}{V_X} \qquad \text{(A1)}$$

This can be simplified to

$$\omega_{RT} = 1 - \frac{\sum_{i=1}^{K} u_i^{2}}{V_X} \qquad \text{(A2)}$$

where $u_i^2$ is the uniqueness of the ith item (a.k.a. the error variance).

Using the polychoric covariance analysis of the Agreeableness subscale above, the communalities appear in the “h2” column and the uniquenesses appear in the “u2” column. The sum of the uniquenesses is equal to

$$.81 + .01 + .13 + .68 + .57 = 2.20,$$

which is the numerator of the fraction in Equation A2.

Unfortunately, the denominator $V_X$ does not appear in the output. Fortunately, this value is quite simple to calculate in R. Recall that $V_X$ is equal to the sum of all elements of the sample correlation matrix. The polychoric correlation matrix in R can be saved as an object with the following code,

mat<-polychoric(agreRev)

agrepoly<-mat$rho

The sum function can then be used to add all the individual elements,

sum(agrepoly)

which yields a value of 12.73. Therefore,

$$\omega_{RT} = 1 - \frac{2.20}{12.73} = 0.8272 \approx 0.83,$$

matching the output above.
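This arithmetic is straightforward to verify in code as well; a short Python check of Equation A2 using the uniquenesses and the value of $V_X$ reported above:

```python
# Revelle's omega total via Equation A2: 1 - (sum of uniquenesses) / V_X.
u2 = [0.81, 0.01, 0.13, 0.68, 0.57]  # "u2" column for the Agreeableness items
V_X = 12.73                          # sum of all elements of the polychoric matrix

omega_rt = 1 - sum(u2) / V_X
print(round(omega_rt, 2))  # -> 0.83
```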

Omega hierarchical is similar except that its numerator is equal only to the variance explained by the general (common) factor. This can be found by adding up all the values in the “g” column and squaring the sum (be sure to add first and then square the sum; do not square first and then add the squares). In the polychoric Agreeableness example,

$$(.34 + .70 + .79 + .52 + .62)^{2} = 8.82.$$

$V_X$ is still equal to the same value (12.73), so hierarchical omega is equal to $8.82 / 12.73 = 0.69$.
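The same style of check works for hierarchical omega; a short Python sketch using the “g” column values from the output above:

```python
# Hierarchical omega: square the SUM of the general-factor loadings
# (add first, then square), then divide by V_X.
g = [0.34, 0.70, 0.79, 0.52, 0.62]  # "g" column for the Agreeableness items
V_X = 12.73

omega_h = sum(g) ** 2 / V_X
print(round(omega_h, 2))  # -> 0.69
```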

Greatest Lower Bound

The glb.fa function in the psych package estimates the greatest lower bound. Similar to

other methods in the psych package, the only necessary argument of the function is the data

name.

glb.fa(agreRev)

glb.fa(consRev)

glb.fa(extrRev)

glb.fa(neur)

glb.fa(openRev)

The output for the Agreeableness subscale is as follows.

The greatest lower bound estimate appears as the first item in the output, after $glb.

Unfortunately, the glb.fa function does not offer the option to use a polychoric covariance matrix internally and therefore uses a Pearson covariance matrix. This can be circumvented by separately estimating a polychoric covariance or correlation matrix and using that as the input instead of the raw data, although it can be a bit tricky to save a polychoric correlation matrix as a data frame in R.

First, the polychoric matrix is estimated with the polychoric function from the psych package.

Rather than immediately outputting the results, the output is saved to an object (called “mat” in

the code below). The output contains both the polychoric correlation matrix and thresholds; the

thresholds are not needed, so we want to exclude them and only save the matrix. In doing so, we

also must convert the object to a data frame. The R code for doing so for the Agreeableness

subscale is as follows,

mat<-polychoric(agreRev)

agre.poly<-as.data.frame(mat$rho)

The glb.fa function can accept a correlation matrix as input, so we can use the saved

polychoric correlation matrix as the input of the function.

glb.fa(agre.poly)

This will provide the desired output.

scaleStructure Function

Although the above analyses are not difficult to perform because the commands are quite

straightforward, for inexperienced or reluctant R users, the scaleStructure can estimate

these quantities in a single pass and summarizes the output.

scaleStructure(dat=agreRev, ci=FALSE)

scaleStructure(dat=consRev, ci=FALSE)

scaleStructure(dat=extrRev, ci=FALSE)

scaleStructure(dat=neur, ci=FALSE)

scaleStructure(dat=openRev, ci=FALSE)

ci=FALSE indicates that we do not want the confidence interval for the estimate (although best practice suggests that confidence intervals are helpful to report).

The output for the Agreeableness subscale from this function is as follows.

The function goes through the previously outlined methods, estimates reliability, saves the output, and summarizes the results in one window. The first set of output shows the results assuming a Pearson covariance matrix, followed by results that use a polychoric covariance matrix. The function also differentiates between omega total and Revelle’s omega total, and userfriendlyscience is the only R package of which the author is aware that provides estimates of Coefficient H.

Using Excel

Although R is the best available software option for estimating alternatives to Cronbach’s alpha (and it is open source), we realize that some users may be hesitant to adopt a new software program, especially to use methods with which they are unfamiliar. In an attempt to make these methods as broadly accessible as possible, we have included two Excel spreadsheets for calculating omega total and Coefficient H using only the standardized loadings from a factor analysis. These loadings can be obtained from any software program of the user’s choosing, so the spreadsheets do not require learning any new software.

The provided Excel spreadsheet has two tabs, one for Coefficient H and one for omega total. The spreadsheet allows for up to 36 items. A factor analysis must be conducted to obtain the factor loadings; this can be done in any program of the user’s choosing. These loadings are then placed into Column B of the spreadsheet. For omega total, the spreadsheet is set up to automatically calculate the uniqueness terms based on the standardized loadings. Column G for Coefficient H and Column F for omega total will reveal the estimates of these measures.
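For readers who prefer code to spreadsheets, the same quantities can be computed directly from the standardized loadings. Below is a Python sketch of the standard one-factor formulas for omega total and for Hancock and Mueller’s Coefficient H; the loadings shown are hypothetical placeholders, and the spreadsheet may organize the computation differently:

```python
def omega_total(loadings):
    """One-factor omega total from standardized loadings:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of uniquenesses),
    where each uniqueness is 1 - loading^2."""
    common = sum(loadings) ** 2
    unique = sum(1 - l ** 2 for l in loadings)
    return common / (common + unique)

def coefficient_h(loadings):
    """Coefficient H from standardized loadings:
    H = 1 / (1 + 1 / sum(l^2 / (1 - l^2)))."""
    s = sum(l ** 2 / (1 - l ** 2) for l in loadings)
    return 1 / (1 + 1 / s)

# Hypothetical standardized loadings for a 5-item scale
lam = [0.5, 0.6, 0.7, 0.4, 0.5]
print(round(omega_total(lam), 3))    # -> 0.676
print(round(coefficient_h(lam), 3))  # -> 0.704
```

Note that both formulas assume a unidimensional model with standardized loadings, mirroring the single factor the spreadsheet expects.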

Using the Agreeableness subscale example from the previous section, we first obtain the standardized factor loadings using maximum likelihood estimation via the fa function from the psych package. These loadings need not come from R and can be estimated in any program of the user’s choice (e.g., Mplus, SPSS, SAS, Stata).

fa(agre, nfactors=1, fm="ml")

The output of this analysis is as follows.

The “ML1” column contains the standardized factor loadings for this scale (these correspond to those provided in Table 3 of the main text). Taking these loadings and entering them into the Excel spreadsheet for Coefficient H and omega total yields the corresponding estimates.
