Fixing a Broken Clock: A Historical Review of the Originators of Reliability Coefficients
Professor, School of Business Administration
Kwangwoon University, Republic of Korea
Associate Professor, Department of Business Administration
Dankook University, Republic of Korea
Cho, E. & Chun, S. (2018). Fixing a broken clock: A historical review of the originators of
reliability coefficients including Cronbach’s alpha. Survey Research, 19(2), 23-54.
The names of commonly used reliability coefficients, such as Cronbach's alpha, give the
impression that we are expressing respect for the first developers of the formulas. However, few
studies have investigated the identity of each person who first discovered each reliability
coefficient from a neutral point of view. This study examines the history of reliability coefficients
and presents conclusions regarding who should be credited for developing each reliability
coefficient. For example, this study claims that credit for inventing the alpha formula should be
awarded to Kuder and Richardson (1937) and that the merit of developing a reliability coefficient
based on a unidimensional confirmatory factor analysis model should be returned to Jöreskog
(1971). This study criticizes the existing names of reliability coefficients as pseudo-historical (i.e.,
not actually but having the appearance of being historical), suggesting the use of ahistorical (i.e.,
without concern for history) names instead.
Keywords: Cronbach’s alpha, Coefficient alpha, the Spearman-Brown formula, Composite
reliability, McDonald’s omega
Psychological studies routinely report reliability coefficients of test scores. For example, readers
are likely familiar with at least some of the names of reliability coefficients, such as Cronbach's
alpha, standardized alpha, the Spearman-Brown formula, composite reliability, and McDonald's
omega. These conventional names give us the impression that we are expressing appreciation for
the scholars who first developed the reliability coefficients. This study originates from the
questions of whether the conventional names are historically legitimate and, if not, whether the
practical benefits of continuing to use the names outweigh the lack of historical evidence.
Let us take the name Cronbach’s alpha as an example. This name itself does not contain
any information that might help psychologists use the formula. For individuals without
background knowledge, the name Cronbach’s alpha does not yield a clue regarding its meaning
and function. The only possible conjecture based on the name is that Cronbach must have first
proposed it. Therefore, this study aims to confirm two issues: (1) whether Cronbach was the
researcher who first discovered this formula (and if not, who should be given the most credit for
developing the formula) and (2) (if not) whether this name should continue to be used for a
To answer the above questions, this study identifies the originator of each reliability
coefficient. Few studies have raised this issue. Cronbach's (1951) and McDonald's (1999) studies
have had a huge impact on how people name and use reliability coefficients. Past research that has
had a great effect needs to be reviewed from various perspectives. However, few detailed studies
have investigated the history of reliability coefficients, with the notable exceptions of Cronbach
and Shavelson’s (2004) own explanation of the history of alpha and Sijtsma’s congratulatory
comments (Heiser et al., 2016) on Cronbach’s (1951) record number of citations. This study
provides a comprehensive review of and a third-party perspective on the history of reliability
coefficients, with a substantial component dedicated to a discussion of Cronbach’s alpha, the
most commonly used reliability coefficient.
This study is divided into two components. The first section argues that the current
practice of recognizing the originator of alpha as Cronbach (1951) is incorrect. The second
section explains the history of four other reliability coefficients, namely, the Spearman-Brown
formula, the Flanagan-Rulon formula, standardized alpha, and McDonald’s omega.
2 Who First Developed Alpha?
Before proceeding with a discussion of alpha’s history, it should be clarified that Cronbach
himself (Cronbach & Shavelson, 2004) declared that the expression Cronbach's alpha was inappropriate and
stated that Kuder and Richardson (1937) had published a formula commonly called KR-20 and
that alpha was "an easily calculated translation" (Cronbach & Shavelson, 2004, p. 397) of KR-20.
Despite his rejection, Cronbach's alpha remains the most common name used to refer to this
coefficient.
A reasonable explanation for this phenomenon is that while Cronbach's (1951)
contribution to the alpha formula is well recognized, the contributions of studies that published
the same formula before Cronbach (1951) are not well documented. Most textbooks describe
Cronbach (1951) as the first to create the alpha formula. Cronbach (1951) and Cronbach and
Shavelson (2004) vaguely explained previous studies other than that conducted by Kuder and
Richardson (1937) to the extent that readers who are unfamiliar with the history of reliability
coefficients might think that Cronbach (1951) was the first to publish a general formula of KR-20.
For example, Cronbach noted the following: "So far as I recall, there was no one to offer the version
that I offered in 1951, except for the Kuder-Richardson report, which did not give a general
formula" (Cronbach & Shavelson, 2004, p. 416). This study aims to help readers achieve a
balanced view of alpha’s history through a detailed review of pre-Cronbach (1951) studies.
2.1 Cronbach (1951) and Its Previous Studies
This study is not the first to argue that Cronbach (1951) did not first publish the alpha formula.
McDonald (1999) states that Guttman (1945) published the formula for alpha before Cronbach
(1951). Cho and Kim (2015) and Sijtsma (2009) assert that Hoyt (1941b) preceded both studies in
discovering alpha. However, previous studies did not address alpha's history as an important topic
and did not specify the commonalities and differences of the formulas proposed by Cronbach
(1951), Guttman (1945), and Hoyt (1941b).
This study asserts that Cronbach (1951) is the sixth (not second or third) study to have
discovered the general expression of KR-20. This study excavates three additional pre-Cronbach
(1951) studies (Edgerton & Thomson, 1942; Gulliksen, 1950; Jackson & Ferguson, 1941) that
contain the general expression of KR-20. In addition, this paper will explain the specific versions
of the formula presented by both Kuder and Richardson (1937) and the papers that followed.
Kuder and Richardson (1937) developed various reliability formulas, each with different
assumptions; however, they did not propose a special name for each reliability coefficient. They
believed that the twentieth and twenty-first formulas would be the most useful. Subsequent
studies referred to these formulas as Kuder-Richardson Formula 20 and 21, or KR-20 and 21 for
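short. Kuder and Richardson (1937) address conditions in which the test had $k$ dichotomously
scored items (e.g., correct or incorrect). The test score $X$ is the sum of the observed scores of
the items (i.e., $X = X_1 + X_2 + \cdots + X_k$). Let $\sigma_X^2$ denote the test score variance, $p_i$ denote the percentage
of correct responses for item $i$, $q_i$ denote the percentage of incorrect responses for item $i$
(i.e., $q_i = 1 - p_i$), and $\bar{p}$ and $\bar{q}$ denote the averages of $p_i$ and $q_i$ over the $k$ items. The original
formulas for KR-20 and KR-21 are presented as follows.

$$\rho_{KR\text{-}20(\mathrm{Original})} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2}\right) \qquad (1)$$

$$\rho_{KR\text{-}21} = \frac{k}{k-1}\left(1 - \frac{k\,\bar{p}\,\bar{q}}{\sigma_X^2}\right) \qquad (2)$$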
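The two original formulas can be illustrated with a minimal sketch. The function names `kr20` and `kr21` and the response table are hypothetical and serve only as an illustration; the sketch assumes population variances (dividing by the number of persons), which is the convention under which the variance of a 0/1 item equals $p_i q_i$.

```python
from statistics import pvariance

def kr20(scores):
    """Original KR-20 (Equation 1) for a persons-by-items table of 0/1 scores."""
    n, k = len(scores), len(scores[0])
    var_x = pvariance([sum(row) for row in scores])            # test score variance
    p = [sum(row[i] for row in scores) / n for i in range(k)]  # proportion correct per item
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    return k / (k - 1) * (1 - sum_pq / var_x)

def kr21(scores):
    """KR-21 (Equation 2): replaces the item-level p_i q_i with the averages p-bar and q-bar."""
    n, k = len(scores), len(scores[0])
    totals = [sum(row) for row in scores]
    var_x = pvariance(totals)
    p_bar = sum(totals) / (n * k)                              # average proportion correct
    return k / (k - 1) * (1 - k * p_bar * (1 - p_bar) / var_x)

# Hypothetical data: 6 persons (rows) answering 4 true/false items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
kr20_value = kr20(responses)
kr21_value = kr21(responses)
print(round(kr20_value, 4), round(kr21_value, 4))
```

KR-21 needs only the mean and variance of the test scores, which is why it was attractive for hand calculation; it can never exceed KR-20 because the sum of the $p_i q_i$ terms is at most $k\,\bar{p}\,\bar{q}$.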
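The general expression of KR-20 does not place limitations on the score of $X_i$. In the
original expression, $X_i$ may have a value of either 0 or 1. In the general expression, it may
have all real number values (e.g., 2.47). Let $\sigma_i^2$ denote the variance of item $i$. The general
expression of KR-20 is as follows.

$$\rho_{JF} = \rho_{ET} = \lambda_3 = \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right) \qquad (3)$$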
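A short sketch may help show how the general expression relates to the original one. The function name `coefficient_alpha` and both data tables are hypothetical; population variances are assumed throughout. For 0/1 items the sum of the item variances coincides with the sum of the $p_i q_i$ terms, so the general expression returns the KR-20 value, while for graded scores only the general expression applies.

```python
from statistics import pvariance

def coefficient_alpha(scores):
    """General expression of KR-20 (alpha) for a persons-by-items score table."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def sum_pq(scores):
    """Sum of p_i * q_i over items; meaningful only for 0/1 scores."""
    n, k = len(scores), len(scores[0])
    p = [sum(row[i] for row in scores) / n for i in range(k)]
    return sum(p_i * (1 - p_i) for p_i in p)

# Hypothetical dichotomous data (6 persons, 4 items).
binary = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
# Hypothetical graded data (5 persons, 3 items) with real-valued scores.
graded = [
    [2.47, 3.0, 2.0],
    [4.0, 4.5, 3.5],
    [1.0, 2.0, 1.5],
    [3.5, 3.0, 4.0],
    [5.0, 4.0, 4.5],
]

sum_item_vars_binary = sum(pvariance([row[i] for row in binary]) for i in range(4))
alpha_binary = coefficient_alpha(binary)   # same value as the original KR-20
alpha_graded = coefficient_alpha(graded)   # the original KR-20 is not defined here
print(round(alpha_binary, 4), round(alpha_graded, 4))
```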
Hoyt (1941b) is the first to present the general formula, describing an idea to derive the
KR-20 formula using analysis of variance (ANOVA), a method that generates exactly the same
result as alpha. However, Hoyt (1941b) does not present Equation 3, instead explaining the entire
process of the computation to explicate his method.
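The equivalence between the ANOVA route and the general expression can be checked numerically. The sketch below is not Hoyt's (1941b) own worked computation; it assumes the usual two-way persons-by-items layout without replication and takes the reliability estimate as one minus the ratio of the residual mean square to the between-persons mean square. The function names and the data are hypothetical.

```python
from statistics import mean, pvariance

def anova_reliability(scores):
    """Reliability from a persons-by-items ANOVA: 1 - MS_residual / MS_persons."""
    n, k = len(scores), len(scores[0])
    grand = mean([x for row in scores for x in row])
    person_means = [mean(row) for row in scores]
    item_means = [mean([row[i] for row in scores]) for i in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_resid = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return 1 - ms_resid / ms_persons

def coefficient_alpha(scores):
    """General expression of KR-20, used here only as the comparison value."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical graded data (5 persons, 3 items).
graded = [
    [2.47, 3.0, 2.0],
    [4.0, 4.5, 3.5],
    [1.0, 2.0, 1.5],
    [3.5, 3.0, 4.0],
    [5.0, 4.0, 4.5],
]
anova_value = anova_reliability(graded)
alpha_value = coefficient_alpha(graded)
print(round(anova_value, 6), round(alpha_value, 6))
```

The two printed values agree up to floating-point rounding, which is the sense in which the ANOVA method "generates exactly the same result as alpha".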
The second line of research to suggest the general expression is that of Jackson and
Ferguson (1941). Because Hoyt (1941b) was included in the third issue of a quarterly academic
journal, we assume it was published in July, August, or September of that year. Jackson and
Ferguson (1941) was published in October. In contrast to Hoyt (1941b), Jackson and Ferguson
(1941) clearly express Equation 3 (i.e., $\rho_{JF}$), making it the first paper to explicitly propose the
current version of the alpha formula.
The third study that featured the general expression (i.e., $\rho_{ET}$) is Edgerton and Thomson
(1942); however, it did not propose a new way of deriving KR-20 as the other studies introduced
Guttman (1945) is the fourth researcher to have published the general expression (i.e.,
$\lambda_3$). Based on the assumption that measurement errors are independent of each other, he deduces
six reliability estimators, designating them $\lambda_1$ through $\lambda_6$. Guttman (1945) proves that these
estimators are always equal to or smaller than the reliability, introducing the term lower bounds
to describe this quality. He also offers mathematical proof that $\lambda_2$ is always a more accurate
reliability estimator than $\lambda_3$ but notes that the calculation of $\lambda_2$ is more complex than that of
$\lambda_3$ and that $\lambda_3$ can be used instead of $\lambda_2$ if the covariances are not significantly different
(i.e., being tau-equivalent in modern terms).
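The ordering of Guttman's first three lower bounds can be illustrated with a small sketch. It uses the standard textbook forms of $\lambda_1$, $\lambda_2$, and $\lambda_3$ computed from the item covariance matrix, which are assumed here rather than quoted from Guttman (1945); the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean

def covariance_matrix(scores):
    """Population covariance matrix of the items (columns) of a score table."""
    n, k = len(scores), len(scores[0])
    cols = [[row[i] for row in scores] for i in range(k)]
    means = [mean(c) for c in cols]
    return [
        [sum((cols[i][p] - means[i]) * (cols[j][p] - means[j]) for p in range(n)) / n
         for j in range(k)]
        for i in range(k)
    ]

def guttman_lambdas(scores):
    """Return (lambda_1, lambda_2, lambda_3) in their standard textbook forms."""
    c = covariance_matrix(scores)
    k = len(c)
    total_var = sum(sum(row) for row in c)                 # variance of the test score
    sum_item_var = sum(c[i][i] for i in range(k))
    sum_sq_cov = sum(c[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
    lam1 = 1 - sum_item_var / total_var
    lam2 = lam1 + sqrt(k / (k - 1) * sum_sq_cov) / total_var
    lam3 = k / (k - 1) * lam1                              # identical to alpha
    return lam1, lam2, lam3

# Hypothetical graded data (5 persons, 3 items).
graded = [
    [2.47, 3.0, 2.0],
    [4.0, 4.5, 3.5],
    [1.0, 2.0, 1.5],
    [3.5, 3.0, 4.0],
    [5.0, 4.0, 4.5],
]
lam1, lam2, lam3 = guttman_lambdas(graded)
print(round(lam1, 4), round(lam2, 4), round(lam3, 4))
```

$\lambda_2$ requires every squared covariance, whereas $\lambda_3$ needs only the item and test score variances; when all covariances are equal, the two coincide.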
The fifth study to have presented the general expression (Equation 3) is Gulliksen (1950),
which proposes a new way to derive KR-20 based on “[t]he simplest and most direct
assumption” (p. 223). In contemporary terms, his assumption is the same as the condition of
being essentially tau-equivalent (Lord & Novick, 1968).
Cronbach (1951), the sixth study to present the general expression (i.e., $\alpha$), sparked the
popular use of this reliability coefficient by eliminating concerns that made users hesitate to use it
(Heiser et al., 2016). First, his proof of the relationship between alpha and split-half reliability
drew a strong response. Several reliability coefficients already existed at that time, but there was
no clear conclusion as to which coefficient to use. Cronbach (1951) proved that alpha equals the
average of split-half reliability ($\lambda_4$: Guttman, 1945) values obtained from all possible split-halves.
This proof is not significant given the study by Guttman (1945), which proved that alpha (i.e., $\lambda_3$) and
split-half reliability (i.e., $\lambda_4$) are not reliability coefficients in the strict sense but lower bounds of the reliability.
However, the concept of lower bounds was not fully understood at the time (Heiser et al., 2016),
and Cronbach's (1951) proof had the advantage of being intuitively easy to understand. This proof
led alpha to be recognized as the representative reliability coefficient and not just one of several methods.
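The relationship between alpha and the split-half values can be verified by brute force on a small test. The sketch below assumes an even number of items, splits into halves of equal size, and the $\lambda_4$ form $2\left(1 - (\sigma_A^2 + \sigma_B^2)/\sigma_X^2\right)$ for each split; the function names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance

def coefficient_alpha(scores):
    """General expression of KR-20 (alpha)."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def lambda4(scores, half_a):
    """Split-half coefficient for one split; half_a lists the item indices of one half."""
    k = len(scores[0])
    half_b = [i for i in range(k) if i not in half_a]
    a = [sum(row[i] for i in half_a) for row in scores]
    b = [sum(row[i] for i in half_b) for row in scores]
    total = [x + y for x, y in zip(a, b)]
    return 2 * (1 - (pvariance(a) + pvariance(b)) / pvariance(total))

def mean_split_half(scores):
    """Average lambda_4 over every split into two halves of equal size."""
    k = len(scores[0])
    # Keeping item 0 in the first half counts each split exactly once.
    splits = [h for h in combinations(range(k), k // 2) if 0 in h]
    return mean([lambda4(scores, h) for h in splits])

# Hypothetical graded data (6 persons, 4 items).
ratings = [
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 3],
]
alpha_value = coefficient_alpha(ratings)
average_lambda4 = mean_split_half(ratings)
largest_lambda4 = max(lambda4(ratings, h) for h in combinations(range(4), 2))
print(round(alpha_value, 6), round(average_lambda4, 6), round(largest_lambda4, 6))
```

With four items there are only three distinct splits, but the number grows quickly with test length, which is consistent with the later remark that maximizing $\lambda_4$ was impractical before computers.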
Second, Cronbach (1951) presented a comprehensive and “encyclopedic” (Cronbach &
Shavelson, 2004, p. 396) explanation for the interpretation and use of alpha. The length of this
paper is 38 pages, making it not only the longest of all papers published in Psychometrika in 1951
but three times as long as the average paper. Most notable was Cronbach’s (1951) assertion
that a high value of alpha indicates the internal consistency or homogeneity of the data. In other
words, alpha has been explained to be useful for indicating not only the reliability but also the
unidimensionality of the data (Heiser et al., 2016; Sijtsma, 2009).
Third, Cronbach (1951) adopted a different approach to alpha’s prerequisites from previous
studies. Pre-Cronbach (1951) studies focused on the mathematical proof of the assumptions of the
alpha formula. However, because too-strict restrictions were needed to derive the formula, the
concern that alpha’s assumptions could not easily be met by real-world data was raised. For
example, Cronbach (1943) criticized KR-20's assumption of unidimensionality as unrealistic,
stating the following: "The basic assumption of the Kuder-Richardson method ... that the items
measure only one general variable plus specific factors, is manifestly untrue for most achievement
tests" (p. 486). Cronbach (1951) took the opposite approach from the previous studies. In fact, he
focused his attention on its interpretation, assuming that the alpha formula had already been
provided. Users were thus convinced that alpha could be used without regard to whether the data
satisfied the assumptions of the alpha formula. What changed was his attitude toward the
assumption of alpha, not the assumption itself.
Fourth, Cronbach (1951) suggested that the degree of alpha’s underestimation was not
worse than expected. Kuder and Richardson (1937) and Hoyt (1941a) regarded it as a major
advantage of KR-20 that it does not overestimate the reliability. In contrast, Cronbach (1943)
opposed the universal use of KR-20, criticizing it as producing “excessively conservative estimates
of reliability” (p. 488) that are sometimes less than zero. In addition, Cronbach (1943) lamented
that it was important to know the degree of underestimation of KR-20, but little information was
available. Cronbach’s (1951) proof that alpha is the mean of the split-half reliability values
obtained from all split-halves seemingly gave clues to his own question. In other words, it was
possible to conclude that alpha's tendency for underestimation is not very serious because alpha
provides a value greater than approximately half of the split-half reliability estimates. Considering
that the reference point of the comparison is the values of the split-half reliability coefficient, not
other competitive alternatives such as
, it is difficult to agree with this interpretation from a
2.2 KR-20 and Alpha were Considered Identical
Studies before Cronbach (1951) described the original and general expressions as the same
formula. Hoyt (1941b) states, “It may be interesting to some who are familiar with the work of
Kuder and Richardson that the foregoing method of estimating the coefficient of reliability gives
precisely the same result as formula (20) of their paper. This fact can be easily verified
algebraically” (p. 156). Jackson and Ferguson (1941) state that Equation 3 (i.e., $\rho_{JF}$) is
“identical with the Kuder-Richardson formula (20)” (p. 74). Guttman (1945) indicates that “$\lambda_3$
resembles a formula developed separately by Kuder and Richardson and Hoyt. In fact, [$\lambda_3$ is]
algebraically identical to this formula (which is formula (20) in Kuder and Richardson’s paper)”
(pp. 274-275). Gulliksen (1950) also emphasizes that the formula presented in his paper is
“identical” (p. 224) to the formula proposed in Kuder and Richardson (1937), Jackson and
Ferguson (1941), and Guttman (1945). None of the studies discussed here describe the two
expressions as different formulas.
It is common practice among scholars to attempt to differentiate their research by
emphasizing its difference from previous studies. However, Hoyt (1941a) uses the fact that he
derived KR-20 based on a different approach (Hoyt, 1941b) from Kuder and Richardson (1937)
to compliment the authors: “The theoretical soundness of the Kuder-Richardson derivation is
indicated by the fact that analysis of variance techniques applied to this problem produce an
identical formula” (p. 93). He does not boast of trivial differences from the previous literature as
2.3 Kuder and Richardson (1937) are Likely to have Chosen the Original Version
Current textbooks appear to indicate that the general expression overcomes the important
limitations of the original expression of KR-20. Readers who are accustomed to this
interpretation may experience difficulty understanding why pre-Cronbach (1951) studies
described the original and general expressions as being identical. Indeed, the two expressions are
different in only a minor fashion. The concept that $\sigma_i^2 = p_i q_i$ is an easy
relationship that is known to individuals familiar with basic statistics. From today’s perspective,
the general expression is more useful than the original expression because whereas the original
formula may be applied to only dichotomously scored items (that is, 0 or 1), the general
expression may be used for other general data. Furthermore, current users typically analyze data
not measured as dichotomously scored items. Why, then, did Kuder and Richardson (1937)
propose the original version?
There is a strong possibility that Kuder and Richardson (1937) deliberately chose the
original expression. It was not that they could not derive the general expression: If one follows
the logic with which they derived the original formula, one can easily understand that mere
modifications will also easily derive the general expression. For example, the authors referred to
$\sum p_i q_i$ as “the sum of the variances of the items” (p. 154) to explicitly describe the
relationship $\sigma_i^2 = p_i q_i$. Kuder and Richardson (1937) likely proposed the original
expression because, at the time of publication, that expression was more helpful to users than
was the general expression. To understand this reasoning, we must understand the conditions of
the past, which differed from current conditions.
First, the data processed by the formula users of the time were measured with
dichotomously scored items. The “reliability of persons, over items, on a single trial” is typically
referred to as test score reliability, which is derived from the finding that the pioneers of
reliability research were primarily interested in students’ test scores. Unlike today, scoring and
calculating the results of a test once required many hours. To simplify the scoring process,
school tests at the time were configured as true or false (Vehkalahti, 2000). The International
Business Machine Counting Sorter, which made scoring and calculation four to eight times faster
than manual processing, began to be used in 1937 (Bedell, 1940). The IBM Counting Sorter also
classified answers as only true or false; thus, when Kuder and Richardson (1937) was published,
there was little need to propose the general version in place of the original version.
Second, the ease of calculation was thought to be the most important consideration. Today’s
widespread use of statistical software packages enables us to obtain reliability coefficient values
without having to understand the formula; in the past, however, because reliability coefficients
had to be calculated by paper and pencil by users, the ease of calculation was considered critical.
Thus, the academic community (1) preferred the formula for which the calculation was easier if
the resulting value did not substantially differ and (2) preferred the more easily calculated
version between two algebraically equivalent formulas.
The importance placed on the ease of calculation in the first sense is indicated by the fact
that Kuder and Richardson (1937) proposed both KR-20 and KR-21 together. Because KR-21
produces less precise reliability estimates than KR-20, it is mathematically inferior, and from the
contemporary perspective, KR-21 would not be deemed sufficiently valuable to merit
presentation. However, the calculation of KR-21 is easier and simpler, and in most cases the
resulting values of the two formulas are not very different. KR-21 had high usability in an era
when computer-based computations were practically impossible.
Kuder and Richardson (1937) likely proposed the original expression instead of the
general expression because of the ease of calculation in the second sense. If the general
expression had been proposed, users who did not understand the relationship $\sigma_i^2 = p_i q_i$
would have experienced difficulty in calculation. In a situation in which most users analyzed
dichotomously scored items, there was no specific need for the authors to suggest the general
expression.
In those days, there were no arguments that the general expression is more useful than or
superior to the original expression. Although many subsequent studies discuss Kuder and
Richardson’s (1937) work (Cronbach, 1943, 1947; Hoyt, 1941a; Kelley, 1942; Tucker, 1949; Wherry &
Gaylord, 1943), no authors have described the fact that the original formula may be applied to
only dichotomously scored items as a limitation. Ferguson (1951) argued that the original
formula can also be expanded to general situations through the following statement:
Hitherto the Kuder-Richardson [formula 20] has been largely used to provide a descriptive
index of the internal consistency of tests constructed of items which permit only two
categories of response, a pass or a fail, to which the values 1 and 0 are assigned,
respectively. The use of this formula may, however, be legitimately extended to provide
indices of the internal consistency of responses on personality inventories, attitude scales,
and other types of tests which permit more than two categories of response. (p. 614)
2.4 Evaluation of the Achievements of Cronbach (1951) and Kuder and Richardson (1937)
Although Cronbach's (1951) historical achievements should be respected, the fact that his
interpretation of alpha still directly shapes present practice is undesirable. Names affect our perception. The
name Cronbach's alpha gives the misleading impression that Cronbach (1951) is the most
authoritative source of this reliability coefficient rather than only one of the many studies on this
reliability coefficient. Perception determines our behavior. Cronbach (1951) is still the most
influential source of this reliability coefficient. According to Google Scholar, nearly 3,000
studies per year cite Cronbach (1951). Numerous textbooks still illustrate Cronbach’s (1951)
mathematical proof and terminology (e.g., internal consistency) to explain the usefulness of alpha.
The public perception of alpha stands at the level of 1951, like a broken clock.
The pace of scientific progress is rapid. For example, the paper by Watson and Crick (1953),
which first identified the structure of deoxyribonucleic acid, is a great achievement, but its content
is only at a basic level from the standpoint of modern biology. Cronbach’s (1951) arguments and
approaches have been criticized as ineffective or proven to be inaccurate (Bentler, 2009; Cho &
Kim, 2015; Cortina, 1993; Green, Lissitz, & Mulaik, 1977; Green & Yang, 2009; Hunt & Bentler,
2015; McDonald, 1981; Osburn, 2000; Revelle & Zinbarg, 2009; Sijtsma, 2009, 2015; van der
Ark, van der Palm, & Sijtsma, 2011; Yang & Green, 2011). Cronbach (1951) should be recognized
as having historical value, but his claims should not be taken as still valid today. In other
words, one should refer to the latest research on alpha, not Cronbach’s (1951), to find an accurate
description of alpha.
This study acknowledges the contribution of Cronbach’s (1951) article. However, at least
some of the studies that published the alpha formula earlier than Cronbach (1951) should be
recognized for having greater contributions than Cronbach (1951). Among them, Kuder and
Richardson’s (1937) work is the most decisive achievement.
Kuder and Richardson (1937) resolved an important and difficult problem that had long
been a tangle. During the period in which the study was published, the only approach used to
estimate the reliability of a test score was to artificially split the items in half and apply the
formula proposed by Brown (1910) and Spearman (1910). The method was problematic in that
the manner in which the items were split produced varying values of reliability for the same data
set; however, no one identified a better approach for more than two decades before Kuder and
Richardson (1937). For example, Kelley (1924) describes this situation as follows:
“I know of no better simple way of securing an estimate of reliability of a college
entrance test than to split it into halves and use the Spearman-Brown formula and though
there are hazards in doing this I certainly think that such an estimate is very much better
than none at all” (p. 200).
Kuder and Richardson (1937) proposed an innovative technique that opened the new era for
3 Who First Developed Other Reliability Coefficients?
3.1 The Spearman-Brown or Brown-Spearman Formula
The name Spearman-Brown does not indicate cooperation between the two scholars. Brown
(1910) and Spearman (1910) simultaneously published algebraically equivalent formulas in the
British Journal of Psychology. If these two individuals were alive today, they would be
sensitive to the issue of whose name comes before the other because they were not on amicable
terms. Charles Spearman was hostile to Karl Pearson, a renowned statistician who taught at the
same school, the University of London, and the two continued to publish articles that criticized
and ridiculed each other (Cowles, 2005). William Brown was Pearson’s student. Brown's
doctoral dissertation, which was later published as a book (Brown, 1911), devoted most of the
space to criticism of Spearman (1904). Decades ago, the name Brown-Spearman formula was
used in some cases; however, in recent times, most studies refer to it as the Spearman-Brown
(prophecy or prediction) formula. This study delves into the issue of which name is more valid.
It is difficult to rationalize why Spearman’s name should appear before Brown’s. One
seemingly fair explanation is that Spearman is a better-known scholar than Brown. Spearman left
huge marks on the field of research methods by developing rank correlation and pioneering a
statistical analysis technique known as factor analysis. In particular, Spearman (1904) developed
a formulaic definition of reliability to open new doors to the history of reliability research.
Cronbach, Rajaratnam, and Gleser (1963) described him as “the father of the classical reliability
theory in psychology” (p. 138). Considering only the studies in question, however, and setting aside
each scholar’s prestige, Brown’s name must precede that of Spearman.
First, Brown (1910) presented the version of the formula that is currently used. The two
studies both developed a formula that may predict the reliability of a test that has the length of
$q$, when it is known that the reliability of a test with the length $p$ is $\rho_p$ and
the ratio of the two lengths is $n = q/p$. Most textbooks express this formula in Brown’s (1910) version (i.e.,
Equation 5) instead of the version of Spearman (1910; i.e., Equation 4). This formula is often
used to calculate the split-half reliability; however, only Brown (1910) suggests applying a
formula in case $n = 2$ (i.e., Equation 6). Let $\rho_{12}$ denote the Pearson product-moment
correlation between the split-halves:

$$\rho_q = \frac{q\,\rho_p}{p + (q - p)\,\rho_p}, \qquad (4)$$

$$\rho_n = \frac{n\,\rho_p}{1 + (n - 1)\,\rho_p} \text{, and} \qquad (5)$$

$$\rho_2 = \frac{2\rho_{12}}{1 + \rho_{12}}. \qquad (6)$$
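The prediction formula and its split-half special case can be sketched in a few lines. The function names and the input values are hypothetical and are used only for illustration; `n` is the ratio of the new test length to the original length.

```python
def brown_spearman(rho, n):
    """Predicted reliability when a test with reliability rho is lengthened n-fold."""
    return n * rho / (1 + (n - 1) * rho)

def split_half_reliability(r12):
    """Special case n = 2: step the correlation between two halves up to full length."""
    return brown_spearman(r12, 2)

# Hypothetical values: two halves that correlate 0.60.
full_length = split_half_reliability(0.60)
# Halving the full-length test (n = 0.5) should recover the half-test value.
back_to_half = brown_spearman(full_length, 0.5)
# Lengthening a test with reliability 0.60 to three times its length.
tripled = brown_spearman(0.60, 3)
print(round(full_length, 4), round(back_to_half, 4), round(tripled, 4))
```

Using a single function for both purposes reflects the point that the split-half use is only one application of the general prediction formula.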
Second, Brown’s (1910) proof is superior to that of his competitor. Traub (1997) made an
assessment that “Brown’s proof of the formula is the more elegant” (p. 10). Compared with
Spearman’s (1910) proof that includes two pages, Brown’s proof (1910) is simpler and more
Third, there is a high likelihood that Brown (1910) was written before Spearman (1910).
Brown (1910) is a part of the author’s doctoral dissertation, and when the paper was published,
Brown had already obtained a doctoral degree from the University of London. Spearman (1910)
criticized Brown (1910), which indicates that Spearman was well aware of the contents of Brown
(1910). However, Brown (1910) criticized only Spearman (1904), not Spearman (1910). It is
unlikely a coincidence that the two rivals who belonged to the same university published the
same formula in the same journal at the same time; it is likely that Brown (1910) influenced
Spearman (1910).
Finally, Brown comes before Spearman in alphabetical order. Determining whose
research achievements are superior or whose proof is more elaborate may depend on subjective
judgments, which makes it necessary to rely on objective principles for a delicate determination
such as the current issue. According to the criteria set by the American Psychological
Association, two or more researchers should be listed in alphabetical order. The Brown-
Spearman formula is the name that meets this principle.
3.2 The Flanagan-Rulon Formula and Guttman’s $\lambda_4$
The history of split-half reliability, which was presented after the Brown-Spearman formula, is
also not well known. The assumption required to use the Brown-Spearman formula as a split-half
reliability coefficient is that the variances of the two split halves are equal. It has been explained
that Flanagan (1937), Guttman (1945; $\lambda_4$), Rulon (1939), and Mosier (1941) independently
developed reliability coefficients that may be used if the variances of the two halves are
unequal (Cho, 2016; Cronbach, 1951; Raju & Guttman, 1965). However, the manner in which
Flanagan and Rulon contributed to the development of this formula has not been described in
detail.
The manner in which this formula was first developed and publicized is unique: Rulon
(1939) published the formula first developed by Flanagan. It is difficult to recognize Flanagan
(1937) as the first researcher to present this formula because the study did not explicitly state the
reliability formula or explain the calculation process prior to presenting several reliability
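estimates. Rulon (1939) is the first study to propose the formula; it presented two
formulas, with an indication that the second formula is easier to calculate. Let $\sigma_1^2$, $\sigma_2^2$, $\sigma_X^2$, and $\sigma_D^2$
denote the variances of $X_1$, $X_2$, $X = X_1 + X_2$, and $D = X_1 - X_2$, respectively. That is,

$$\rho_{FR} = \frac{4\rho_{12}\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 + 2\rho_{12}\sigma_1\sigma_2} \text{, and} \qquad (7)$$

$$\rho_{FR} = 1 - \frac{\sigma_D^2}{\sigma_X^2}. \qquad (8)$$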
However, Rulon (1939) specified that Flanagan personally explained both formulas to him.
While writing the paper, Rulon briefly went on sabbatical to work with Flanagan. In sum,
Flanagan published the formula he developed in a paper published by his colleague, not in his
own.
Guttman’s (1945) $\lambda_4$ is one of the six lower bounds the author proposed. Although the
study suggested the utility of maximal $\lambda_4$ with the following statement, it was difficult at the time to
push the idea further given the lack of computer technology: “It is desirable, of
course, to try to split the test in such a manner as to maximize [$\lambda_4$]” (p. 260).
The split-half reliability formula is referred to differently depending on how it is used.
The three formulas previously described are algebraically equivalent (Cho, 2016). The name $\lambda_4$
is mainly used when Guttman’s (1945) lower bound concept is used to obtain many split-half
reliability values to choose from rather than calculating only one split-half reliability (e.g., Hunt
& Bentler, 2015; Osburn, 2000). Formulas used for other objectives invoke the names Flanagan-
Rulon or Rulon (e.g., Cortina, 1993; Green, 2003; Miller, 1995). Thus, the same formula is
referred to differently in different situations.
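The algebraic equivalence of the split-half formulas can be checked numerically. The sketch below assumes three commonly cited forms: the correlation-based form, the difference-score form, and the $\lambda_4$ form $2\left(1 - (\sigma_1^2 + \sigma_2^2)/\sigma_X^2\right)$. It also computes the Brown-Spearman value, which agrees with the others only when the two halves have equal variances. The function name and the half-test scores are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance

def split_half_forms(half1, half2):
    """Return the three equivalent split-half forms plus the Brown-Spearman value."""
    v1, v2 = pvariance(half1), pvariance(half2)
    vx = pvariance([a + b for a, b in zip(half1, half2)])   # variance of the sum
    vd = pvariance([a - b for a, b in zip(half1, half2)])   # variance of the difference
    m1, m2 = mean(half1), mean(half2)
    cov12 = sum((a - m1) * (b - m2) for a, b in zip(half1, half2)) / len(half1)
    r12 = cov12 / sqrt(v1 * v2)
    s1s2 = sqrt(v1 * v2)
    correlation_form = 4 * r12 * s1s2 / (v1 + v2 + 2 * r12 * s1s2)
    difference_form = 1 - vd / vx
    lambda4_form = 2 * (1 - (v1 + v2) / vx)
    brown_spearman = 2 * r12 / (1 + r12)
    return correlation_form, difference_form, lambda4_form, brown_spearman

# Hypothetical half-test scores for 6 persons; the halves have unequal variances.
half_a = [9, 5, 10, 5, 3, 8]
half_b = [7, 4, 9, 7, 3, 8]
corr_form, diff_form, lam4_form, bs_value = split_half_forms(half_a, half_b)
print(round(corr_form, 6), round(diff_form, 6), round(lam4_form, 6), round(bs_value, 6))
```

The first three values agree up to rounding, while the Brown-Spearman value is slightly larger because it ignores the difference between the two half variances.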
3.3 Standardized Alpha
The name standardized alpha results in a misconception with regard to the features of the
coefficient. First, the name gives the impression that this reliability coefficient is a type of alpha.
Previous studies have not strictly distinguished between alpha and standardized alpha in their
use. For example, Cho and Kim (2015), who provide examples to explain the features of alpha,
use the formula for standardized alpha instead of alpha.
Second, the name alpha induces users to prefer standardized alpha to alpha. Previous
studies explain that there are two types of alpha. For example, Yu (2001) suggests that raw alpha
and standardized alpha are two components of Cronbach’s alpha. The word standardized has a
more positive association than terms such as unstandardized or raw; thus, users without
background knowledge may prefer standardized alpha to the other alpha.
If the name alpha had not been included in the formula in question, this confusion and
misunderstanding might not have occurred. Considering the characteristics of the formula, there is
no reason to include the word alpha in the name of the coefficient. The relationship between
standardized alpha and alpha is analogous to the relationship between the Brown-Spearman
formula (i.e., Equation 6) and the Flanagan-Rulon formula (i.e., Equation 7 or 8). Thus, the two
formulas are independent of each other. Historical evidence also suggests that there is no reason
to include the Greek letter in the name. Cronbach (1951) did not use the term “standardized
alpha” or recommend the use of the formula. The term “standardized alpha” fits neither the
characteristics nor the history of this reliability coefficient; the historical record must be reviewed to
understand this mislabeling.
Few previous studies delineate the history of standardized alpha, and studies that address
the reliability coefficient (Falk & Savalei, 2011; Hayashi & Kamata, 2005) do not mention the
origin of the formula. Unlike other reliability coefficients, standardized alpha has no
undisputed developer on record, which makes its genesis unique.
SPSS (currently owned by IBM) contributed to the popularity of standardized alpha.
SPSS was first developed for non-commercial use; however, it changed directions to the
commercial world with the establishment of SPSS Inc. in 1975. A search on Google Scholar
does not identify studies that used the term “standardized alpha” prior to 1975, which is when the
number of papers that reported standardized alpha values increased. The common source cited by
these papers is SPSS User’s Guide (Specht, 1975). SPSS not only named but also raised the level
of utility of this formula, which previously had been little used.
This formula was rediscovered by SPSS; however, it would not be prudent to declare that
it was first developed by a private company. The formula of standardized alpha has a similar
form to the Brown-Spearman formula. If we assume that the reliability of the previous test
is the same as the average of the Pearson correlation coefficients among the $k$ items ($\bar{r}$)
and that the test is lengthened to $k$ items in
Equation 5, the result will be the standardized alpha formula subsequently presented. The
difference between the two lies not in the formula itself but in the interpretation of the formula.
The Brown-Spearman formula has customarily been used to estimate the split-half reliability;
it has not been used as an independent reliability coefficient.
Considering the form of the formula, the first developers of standardized alpha are Brown (1910)
and Spearman (1910). McDonald (1999) also refers to standardized alpha as the Spearman-Brown
formula.
$$\text{standardized alpha} = \frac{k\bar{r}}{1 + (k - 1)\bar{r}}$$
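As a rough illustration of the relationship described above, the following Python sketch computes standardized alpha by plugging the average inter-item correlation into the general Spearman-Brown form, with the number of items as the length factor. The function names and the small score matrix are hypothetical and are not taken from the original study; NumPy is assumed to be available.

```python
import numpy as np


def spearman_brown(r, n):
    """Reliability of a test lengthened n times, given reliability r of the original."""
    return n * r / (1 + (n - 1) * r)


def mean_interitem_correlation(data):
    """Average of the off-diagonal Pearson correlations among the item columns."""
    data = np.asarray(data, dtype=float)
    k = data.shape[1]
    corr = np.corrcoef(data, rowvar=False)
    return (corr.sum() - np.trace(corr)) / (k * (k - 1))


def standardized_alpha(data):
    """Spearman-Brown form applied to the mean inter-item correlation."""
    data = np.asarray(data, dtype=float)
    k = data.shape[1]
    return spearman_brown(mean_interitem_correlation(data), k)


def coefficient_alpha(data):
    """Alpha computed from the item covariance matrix."""
    data = np.asarray(data, dtype=float)
    k = data.shape[1]
    cov = np.cov(data, rowvar=False)
    return k / (k - 1) * (1 - np.trace(cov) / cov.sum())


# Hypothetical scores: 5 respondents (rows) by 3 items (columns).
scores = np.array(
    [[1, 2, 3], [2, 1, 4], [3, 4, 5], [4, 3, 7], [5, 6, 6]], dtype=float
)
# z-score each item so that its sample variance is exactly 1.
z_scores = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)

print("alpha on raw scores:      ", coefficient_alpha(scores))
print("standardized alpha:       ", standardized_alpha(scores))
print("alpha on z-scored items:  ", coefficient_alpha(z_scores))
```

In this sketch, alpha computed on z-scored items coincides with standardized alpha, which is one way to see why the two coefficients are easily conflated even though their formulas are distinct.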
3.4 Composite Reliability and McDonald’s Omega
Before beginning the discussion, a congeneric measurement model (Jöreskog, 1971) is
explained. The test score $X$ is the weighted sum of the observed scores $X_i$ from items
$i$ ($i = 1, \ldots, k$). Each $X_i$ is separated into the sum of two uncorrelated
unobserved components: the true score $T_i$ and the error score $e_i$. Similar to Jöreskog (1971),
this study assumes that there is no specific factor. A congeneric model has a true score
$T_i = \mu_i + \lambda_i F$, which, as such, is $X_i = \mu_i + \lambda_i F + e_i$. This study assumes that the
errors among items are uncorrelated with each other (i.e., $\mathrm{Cov}(e_i, e_j) = 0$ for $i \neq j$) and the
variance of the latent variable $F$ is 1.0 (i.e., $\mathrm{Var}(F) = 1$), whereas the expected value of
$F$ is zero (i.e., $E(F) = 0$). $\lambda_i$ is referred to as the factor loading of item $i$.
The reliability coefficient based on a congeneric model was first presented by Jöreskog
(1971). Along with Jöreskog’s (1971) original version, this study presents a non-matrix
version. Typical users use a unit-weighted sum (i.e., $w_i = 1$) and are unfamiliar with
matrix algebra. The version that most textbooks feature, shown below, was first proposed by Werts,
Linn, and Jöreskog (1974). The two studies described in this paragraph do not specifically label
the formula. To express gratitude for the scholars who first proposed the formula, it should be
named Jöreskog’s formula or the Jöreskog-Werts formula; however, it is referred to entirely
differently.
$$\rho = \frac{\left(\sum_{i=1}^{k} \lambda_i\right)^2}{\left(\sum_{i=1}^{k} \lambda_i\right)^2 + \sum_{i=1}^{k} \sigma_{e_i}^2}$$
Here $\lambda_i$ is the factor loading and $\sigma_{e_i}^2$ is the error variance of item $i$.
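A minimal sketch of the unit-weighted formula, assuming the congeneric model described above with a latent variance of 1 and uncorrelated errors. The optional `weights` argument follows from the same assumptions (the variance of a weighted sum) and is not a transcription of the authors' non-matrix equation; the function name and the example loadings and error variances are hypothetical.

```python
import numpy as np


def congeneric_reliability(loadings, error_variances, weights=None):
    """Reliability of a (weighted) sum score under a one-factor congeneric model.

    Assumes Var(F) = 1 and mutually uncorrelated errors, so that
    Var(sum w_i X_i) = (sum w_i * lambda_i)^2 + sum w_i^2 * error_variance_i.
    """
    lam = np.asarray(loadings, dtype=float)
    theta = np.asarray(error_variances, dtype=float)
    # Default to a unit-weighted sum, the case most users work with.
    w = np.ones_like(lam) if weights is None else np.asarray(weights, dtype=float)
    true_variance = (w @ lam) ** 2
    error_variance = np.sum(w ** 2 * theta)
    return true_variance / (true_variance + error_variance)


# Hypothetical parameter values for three items.
example_loadings = [0.7, 0.8, 0.6]
example_errors = [0.51, 0.36, 0.64]
print(congeneric_reliability(example_loadings, example_errors))
```

In practice the loadings and error variances would come from a fitted confirmatory factor analysis model; the sketch takes them as given.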
This coefficient answers to different names depending on the characteristics of the
research. Substantive studies typically refer to it as the composite reliability, and methodological
studies most commonly refer to it as the omega coefficient or McDonald’s omega. Composite
reliability is shorthand for the reliability of composite scores and is an inappropriate name for a
specific reliability coefficient (Cho & Kim, 2015). Because of these problems, an increasing
number of studies use the name omega. This study provides a criticism of the utility and
historical basis of the term omega.
A name’s utility originates from increased precision and efficiency of communication;
however, the term omega results in confusion. In the reliability literature, the
omega coefficient refers to a wide variety of reliability coefficients. The omega of Heise and
Bohrnstedt (1970) and McDonald’s omega share common features; however, they are different
formulas. McDonald (1978, 1985, 1999) referred to various unidimensional and
multidimensional reliability coefficients based on exploratory factor analysis (EFA) and
confirmatory factor analysis (CFA) all as omega. Using the term omega coefficient without
explaining the context prevents users from communicating the exact formula they are
attempting to use.
To determine the historical basis for the omega coefficient, McDonald (1970, 1985) must
be reviewed. McDonald (1970) included a reliability formula denoted as theta in the appendix of
the paper. Its original version and non-matrix version are as follows:
[Equations only partly recoverable: the surviving denominators are $w'Cw$ (original version) and $\sum_{i}\sum_{j} w_i w_j \mathrm{Cov}(X_i, X_j)$ (non-matrix version).]
McDonald (1985) referred to the formula that is algebraically equivalent to Equation 12 as
omega and declared that McDonald’s (1970) theta would be renamed omega. When
[condition not recoverable], McDonald’s (1985) formula is indicated as follows:
[Equation not recoverable.]
McDonald (1999) explicitly stated that his omega coefficient was first suggested in McDonald
(1970). McDonald (1985, 1999) did not cite Jöreskog (1971) or Werts et al. (1974). He implied
that the first study on this reliability coefficient is not Jöreskog (1971) but McDonald (1970), and
this is the reason that this coefficient is referred to as McDonald’s omega. The following sections
contain a review of this assertion.
The formulas suggested by Jöreskog (1971) and McDonald (1970) appear similar;
however, they mean different things, considering the context and historical background in which
the two formulas were presented. In this regard, three pieces of evidence are proposed.
First, McDonald (1970) proposed the formula in the context of EFA, not CFA. The title
“the theoretical foundations of principal factor analysis, canonical factor analysis, and alpha
factor analysis” is telling of the characteristics of this paper. Bentler (1968) and Heise and
Bohrnstedt (1970) also discussed reliability in terms of EFA. If McDonald’s (1970) omega can
be considered the general expression of Equation 11, other previous studies may be subject to the
same line of reasoning.
Second, Jöreskog (1971) answered a more central question. In contrast to McDonald (1970), the
author explained how to produce reliability estimates. Equation 13
appeared only in the appendix of McDonald (1970), without related comments in the body. If
this formula had substantially stood out compared with previous achievements in the
field, it would not have been presented in such a minor manner. Coming up with a reliability
coefficient was relatively easy; the important technical obstacle at the time was
the estimation of the parameters of the formula. To resolve this problem, Jöreskog
addressed the issue in multiple studies (e.g., Jöreskog, 1969, 1970, 1971).
Third, the denominators of the formulas are different. In Jöreskog’s (1971) formula, the
denominator expresses fitted covariances. From a contemporary perspective, the denominator of
McDonald’s (1970) formula may be understood to be a general expression that may express both
observed covariances and fitted covariances. However, the early 1970s was a time in which
knowledge regarding parameter estimation of CFA was not sufficient. Heise and Bohrnstedt
(1970), who expressed the denominator in a similar approach to that of McDonald (1970),
interpreted it in terms of observed covariances. The denominator in McDonald’s (1970) style
must be understood as indicating observed covariances.
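The distinction between fitted and observed covariances in the denominator can be sketched numerically. The example below assumes, purely for illustration, that both formulas share the unit-weighted numerator (the squared sum of loadings) and differ only in the denominator; the function names and all parameter values are hypothetical.

```python
import numpy as np


def reliability_fitted(loadings, error_variances):
    """Denominator built from model-implied (fitted) covariances."""
    lam = np.asarray(loadings, dtype=float)
    theta = np.asarray(error_variances, dtype=float)
    fitted_cov = np.outer(lam, lam) + np.diag(theta)
    return lam.sum() ** 2 / fitted_cov.sum()


def reliability_observed(loadings, observed_cov):
    """Denominator built from the observed covariance matrix."""
    lam = np.asarray(loadings, dtype=float)
    return lam.sum() ** 2 / np.asarray(observed_cov, dtype=float).sum()


# Hypothetical one-factor solution for three items.
lam = np.array([0.7, 0.8, 0.6])
theta = np.array([0.51, 0.36, 0.64])
fitted = np.outer(lam, lam) + np.diag(theta)

# Hypothetical observed matrix: items 1 and 2 covary more than the model implies.
observed = fitted.copy()
observed[0, 1] += 0.1
observed[1, 0] += 0.1

print("fitted denominator:  ", reliability_fitted(lam, theta))
print("observed denominator:", reliability_observed(lam, observed))
```

When the model reproduces the observed covariances exactly, the two values coincide; they diverge only to the extent that the model misfits the data.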
We add a comment to prevent misunderstandings about McDonald. The discussion so far
on whose merit is greater is limited to the reliability coefficient based on a congeneric
measurement model, or a unidimensional CFA model. McDonald (1999) pioneered reliability
coefficients based on multidimensional CFA models, and his contribution and originality cannot
be overemphasized. It is highly likely that he referred to the various reliability formulas as
omega coefficients to help readers easily understand his book through consistent expression. His
reader-friendly explanations were very effective, as can be observed from the high impact that
his book has had on the field of psychometrics.
The ideal name of a tool is informative and consistent. For example, iron clubs in golf are named
from one to nine, with a difference of one number indicating a difference of ten yards in driving distance.
Under this system, remembering the driving distance of one club will enable the user to easily
predict the driving distance of other irons. Iron clubs did not originally have this systematic
naming system: until the 1920s, they had irregular names that did not indicate (at least not to
individuals without background knowledge) each club’s characteristics (e.g., Mashie-Niblick).
Once the contemporary naming system was created, golf equipment companies did not
hesitate to abandon the conventional system in their quest to attract new customers. If
industry has succeeded through name changes, could academia also benefit from change?
Reliability coefficients are also a type of tool. The goals of researchers who investigate
tools do not stop short of developing good devices; instead, they extend far beyond to help users
correctly utilize pre-existing tools. Users tend not to understand the mathematical formula that
underlies the reliability coefficients; thus, our goal as researchers of the tool should be not only
to help users understand the formulas but also to lead them to choose the correct reliability
coefficient without a deep understanding of the formulas. The names of reliability coefficients
should be considered not as a given constraint that cannot be changed but as a research topic that
should be investigated.
We have looked at the history of reliability coefficients. We examined this history
to show that the current names are pseudo-historical. At first glance, they seem to be based on
history, but they actually contradict historical facts. We do not claim that the names
of reliability coefficients should be historical. Knowing the history of each coefficient and the
names of the originators does not help us to use the reliability coefficient. Our argument is that the
names should be ahistorical (i.e., without concern for history). To keep the analogy of the golf
club, it is not at all important to the user who originally invented the 7 iron. However, a naming
system that gives information about when to pick a 7 iron is most helpful.
Table 1 shows the systematic nomenclature proposed by Cho (2016). It answers the
question of under what conditions the formula should be used and follows the consistent format of
“(data feature) + reliability”. For example, the prerequisite for alpha to equal the reliability is that
the data are tau-equivalent, so the name tau-equivalent reliability was proposed. The use of
ahistorical names will encourage users to correctly implement reliability coefficients. Most users
use alpha automatically for all data sets regardless of assumptions such as tau-equivalency. Despite
criticism from many previous studies (e.g., Green & Yang, 2009), this old habit has barely changed.
As long as we continue to use the name alpha, it is difficult to expect fundamental changes in
practice. If we begin to use the term tau-equivalent reliability instead of alpha, users will be able
to clearly understand which conditions necessitate this coefficient.
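The condition behind the name tau-equivalent reliability can be checked with a short sketch. Assuming a one-factor model with a latent variance of 1 and uncorrelated errors, the code below compares alpha, computed from the model-implied covariance matrix, with the model-based reliability for equal and unequal loadings. All function names and parameter values are hypothetical.

```python
import numpy as np


def implied_covariance(loadings, error_variances):
    """Covariance matrix implied by a one-factor model with Var(F) = 1."""
    lam = np.asarray(loadings, dtype=float)
    return np.outer(lam, lam) + np.diag(np.asarray(error_variances, dtype=float))


def alpha_from_covariance(cov):
    """Alpha computed from an item covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    return k / (k - 1) * (1 - np.trace(cov) / cov.sum())


def model_reliability(loadings, error_variances):
    """Reliability of the unit-weighted sum under the one-factor model."""
    lam = np.asarray(loadings, dtype=float)
    true_variance = lam.sum() ** 2
    return true_variance / (true_variance + np.sum(error_variances))


errors = [0.5, 0.3, 0.6]
tau_equivalent = [0.7, 0.7, 0.7]  # equal loadings
congeneric = [0.9, 0.7, 0.5]      # unequal loadings

for label, loadings in [("tau-equivalent", tau_equivalent), ("congeneric", congeneric)]:
    cov = implied_covariance(loadings, errors)
    print(label, alpha_from_covariance(cov), model_reliability(loadings, errors))
```

With equal loadings the two values agree; with unequal loadings alpha falls below the model-based reliability in this example, which is the situation the proposed name is meant to flag.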
Insert Table 1 about here
References
Bedell, R. (1940). Scoring weighted multiple keyed tests on the IBM counting sorter.
Psychometrika, 5, 195-201. doi:10.1007/BF02288565.
Bentler, P. M. (1968). Alpha-maximized factor analysis (alphamax): Its relation to alpha and
canonical factor analysis. Psychometrika, 33, 335-345. doi:10.1007/BF02289328.
Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency reliability.
Psychometrika, 74, 137–143. doi:10.1007/s11336-008-9100-1.
Brown, W. (1910). Some experimental results in the correlation of mental abilities. British
Journal of Psychology, 3, 296-322.
Brown, W. (1911). The essentials of mental measurement. London: Cambridge University Press.
Cho, E. (2016). Making reliability reliable: A systematic approach to reliability coefficients.
Organizational Research Methods, 19, 651-682. doi:10.1177/1094428116656239.
Cho, E., & Kim, S. (2015). Cronbach’s coefficient alpha: Well known but poorly understood.
Organizational Research Methods, 18, 207-230. doi:10.1177/1094428114555994.
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications.
Journal of Applied Psychology, 78, 98-104. doi:10.1037/0021-9010.78.1.98.
Cowles, M. (2005). Statistics in psychology: An historical perspective. New York: Psychology
Cronbach, L. J. (1943). On estimates of test reliability. Journal of Educational Psychology, 34,
Cronbach, L. J. (1947). Test reliability; its meaning and determination. Psychometrika, 12, 1-16.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16,
Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. (1963). Theory of generalizability: A
liberalization of reliability theory. British Journal of Statistical Psychology, 16, 137-163.
Cronbach, L. J., & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor
procedures. Educational and Psychological Measurement, 64, 391-418.
Edgerton, H. A., & Thomson, K. F. (1942). Test scores examined with the Lexis ratio.
Psychometrika, 7, 281-288. doi:10.1007/BF02288629.
Falk, C. F., & Savalei, V. (2011). The relationship between unstandardized and standardized
alpha, true reliability, and the underlying measurement model. Journal of Personality
Assessment, 93, 445-453. doi:10.1080/00223891.2011.594129.
Ferguson, G. A. (1951). A note on the Kuder-Richardson formula. Educational and
Psychological Measurement, 11, 612-615. doi:10.1177/001316445101100409.
Flanagan, J. C. (1937). A proposed procedure for increasing the efficiency of objective tests.
Journal of Educational Psychology, 28, 17-21. doi:10.1037/h0057430.
Green, S. B. (2003). A coefficient alpha for test-retest data. Psychological Methods, 8, 88-101.
Green, S. B., Lissitz, R. W., & Mulaik, S. A. (1977). Limitations of coefficient alpha as an index
of test unidimensionality. Educational and Psychological Measurement, 37, 827–838.
Green, S. B., & Yang, Y. (2009). Commentary on coefficient alpha: A cautionary tale.
Psychometrika, 74, 121–135. doi:10.1007/s11336-008-9098-4.
Gulliksen, H. (1950). Theory of mental tests. New York, NY: John Wiley & Sons.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255-282.
Hayashi, K., & Kamata, A. (2005). A note on the estimator of the alpha coefficient for
standardized variables under normality. Psychometrika, 70, 579-586.
Heise, D. R., & Bohrnstedt, G. W. (1970). Validity, invalidity, and reliability. Sociological
Methodology, 2, 104-129. doi:10.2307/270785.
Heiser, W., Hubert, L., Kiers, H., Köhn, H.-F., Lewis, C., Meulman, J., . . . Takane, Y. (2016).
Commentaries on the ten most highly cited Psychometrika articles from 1936 to the
present. Psychometrika, 81, 1177–1211. doi:10.1007/s11336-016-9540-y.
Hoyt, C. J. (1941a). Note on a simplified method of computing test reliability. Educational and
Psychological Measurement, 1, 93-95.
Hoyt, C. J. (1941b). Test reliability estimated by analysis of variance. Psychometrika, 6, 153-
Hunt, T. D., & Bentler, P. M. (2015). Quantile lower bounds to reliability based on locally
optimal splits. Psychometrika, 80, 182-195. doi:10.1007/s11336-013-9393-6.
Jackson, R. W., & Ferguson, G. A. (1941). Studies on the reliability of tests. University of
Toronto Department of Educational Research Bulletin, 12, 132.
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis.
Psychometrika, 34, 183-202. doi:10.1007/BF02289343.
Jöreskog, K. G. (1970). A general method for analysis of covariance structures. Biometrika, 57,
Jöreskog, K. G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109-
Kelley, T. L. (1924). Note on the Reliability of a Test: A reply to Dr. Crum’s criticism. Journal
of Educational Psychology, 15, 193–204. doi:10.1037/h0072471.
Kelley, T. L. (1942). The reliability coefficient. Psychometrika, 7, 75-83.
Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability.
Psychometrika, 2, 151-160. doi:10.1007/BF02288391.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA:
McDonald, R. P. (1970). The theoretical foundations of principal factor analysis,
canonical factor analysis, and alpha factor analysis. British Journal of Mathematical and
Statistical Psychology, 23, 1-21. doi:10.1111/j.2044-8317.1970.tb00432.x.
McDonald, R. P. (1978). Generalizability in factorable domains: “Domain validity and
generalizability”. Educational and Psychological Measurement, 38, 75-79.
McDonald, R. P. (1981). The dimensionality of tests and items. British Journal of Mathematical
and Statistical Psychology, 34, 100–117. doi:10.1111/j.2044-8317.1981.tb00621.x.
McDonald, R. P. (1985). Factor analysis and related methods. Hillsdale, NJ: Lawrence Erlbaum.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum.
Miller, M. B. (1995). Coefficient alpha: A basic introduction from the perspectives of classical
test theory and structural equation modeling. Structural Equation Modeling: A
Multidisciplinary Journal, 2, 255-273. doi:10.1080/10705519509540013.
Mosier, C. I. (1941). A short cut in the estimation of split-halves coefficients. Educational and
Psychological Measurement, 1, 407–427. doi:10.1177/001316444100100133.
Osburn, H. G. (2000). Coefficient alpha and related internal consistency reliability coefficients.
Psychological Methods, 5, 343-355. doi:10.1037/1082-989X.5.3.343.
Raju, N. S., & Guttman, I. (1965). A new working formula for the split-half reliability model.
Educational and Psychological Measurement, 25, 963-967.
Revelle, W., & Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: Comments
on Sijtsma. Psychometrika, 74, 145–154. doi:10.1007/s11336-008-9102-z.
Rulon, P. J. (1939). A simplified procedure for determining the reliability of a test by split-
halves. Harvard Educational Review, 9, 99-103.
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha.
Psychometrika, 74, 107-120. doi:10.1007/s11336-008-9101-0.
Sijtsma, K. (2015). Delimiting Coefficient α from Internal Consistency and Unidimensionality.
Educational Measurement: Issues and Practice, 34, 10–13. doi:10.1111/emip.12099.
Spearman, C. (1904). The proof and measurement of association between two things. American
Journal of Psychology, 15, 72-101.
Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology,
1904-1920, 3, 271-295. doi:10.1111/j.2044-8295.1910.tb00206.x.
Specht, D. A. (1975). SPSS: Statistical package for the social sciences, version 6: Users guide to
subprogram reliability and repeated measures analysis of variance. Ames, IA: Iowa
Traub, R. E. (1997). Classical test theory in historical perspective. Educational Measurement:
Issues and Practice, 16, 8-14. doi:10.1111/j.1745-3992.1997.tb00603.x.
Tucker, L. R. (1949). A note on the estimation of test reliability by the Kuder-Richardson
formula (20). Psychometrika, 14, 117-119. doi:10.1007/BF02289147.
van der Ark, L. A., van der Palm, D. W., & Sijtsma, K. (2011). A latent class approach to
estimating test-score reliability. Applied Psychological Measurement, 35, 380–392.
Vehkalahti, K. (2000). Reliability of measurement scales: Tarkkonen’s general method
supersedes Cronbach’s alpha (Statistical Research Reports, Vol. 17). Helsinki, Finland:
Finnish Statistical Society.
Watson, J. D., & Crick, F. H. C. (1953). Molecular structure of nucleic acids; A structure for
deoxyribose nucleic acid. Nature, 171, 737–738. doi:10.1038/171737a0.
Werts, C. E., Linn, R. L., & Jöreskog, K. G. (1974). Intraclass reliability estimates: Testing
structural assumptions. Educational and Psychological Measurement, 34, 25-33.
Wherry, R. J., & Gaylord, R. H. (1943). The concept of test and item reliability in relation to
factor pattern. Psychometrika, 8, 247-264. doi:10.1007/BF02288707.
Yang, Y., & Green, S. B. (2011). Coefficient alpha: A reliability coefficient for the 21st century?
Journal of Psychoeducational Assessment, 29, 377–392.
Yu, C. H. (2001). An introduction to computing and interpreting Cronbach coefficient alpha in
SAS. In Proceedings of the Twenty-Sixth Annual SAS Users Group International
Conference, Paper 246. Cary, NC: SAS Institute, Inc.
Table 1. Conventional and proposed names of reliability coefficients
Split-half parallel reliability