History and development of the Schmidt-Hunter meta-analysis methods
Frank L. Schmidt*

Department of Management and Organizations, Tippie College of Business, University of Iowa, Iowa City, IA 52242, USA

*Correspondence to: Frank Schmidt, Department of Management and Organizations, Tippie College of Business, University of Iowa, Iowa City, IA 52242, USA. E-mail: frank-schmidt@uiowa.edu
In this article, I provide answers to the questions posed by Will Shadish about the history and development
of the Schmidt-Hunter methods of meta-analysis. In the 1970s, I headed a research program on personnel
selection at the US Office of Personnel Management (OPM). After our research showed that validity studies
have low statistical power, OPM felt a need for a better way to demonstrate test validity, especially in light
of court cases challenging selection methods. In response, we created our method of meta-analysis
(initially called validity generalization). Results showed that most of the variability of validity estimates
from study to study was because of sampling error and other research artifacts such as variations in range
restriction and measurement error. Corrections for these artifacts in our research and in replications by
others showed that the predictive validity of most tests was high and generalizable. This conclusion
challenged long-standing beliefs and so provoked resistance, which over time was overcome. The 1982
book that we published extending these methods to research areas beyond personnel selection was
positively received and was followed by expanded books in 1990, 2004, and 2014. Today, these methods
are being applied in a wide variety of areas. Copyright © 2015 John Wiley & Sons, Ltd.
Keywords: meta-analysis; measurement error; history of meta-analysis; sampling error
Question 1, WS: How did the idea first occur to you? Did you think of it as meta-analysis (because Gene Glass had
coined the term)? Did you think of it under some other rubric?
Answer, FS: That is a long story. After obtaining my industrial-organizational psychology (I-O) PhD from Purdue
in 1970, my first job was in the Michigan State University psychology department, an excellent and stimulating
department. However, my degree was in an applied area, and I felt a little like a fraud because I was teaching what
I had little experience applying myself. I obtained tenure early but still left in 1974 for the US Ofce of Personnel
Management (OPM) in Washington, hoping to obtain real-world experience. OPM is responsible for the methods
used to hire people for the Federal workforce, and my training was in personnel selection, psychological
measurement, and statistics. This was a time of turmoil in personnel selection because court challenges to hiring
procedures under the 1964 Civil Rights Act were at an all-time high (One OPM hiring test went all the way to the
Supreme Court, where it was upheld.). As a result, it was very important for OPM to have research evidence to
defend its selection tests in court. I was asked to conduct and direct such research.
Legal defense of hiring tests consisted mostly of criterion-related validity studies (Criterion-related validity is
the correlation between scores on a hiring procedure and later performance on the job.). One of the first
publications from our OPM research effort showed that the average statistical power of such studies was only
about .50 (Schmidt et al., 1976). This meant that it was very risky for an employer to conduct such a study because
if the validity was not statistically significant, the employer's own study could be used against him or her in court.
Much larger sample sizes were needed to overcome this problem, and other methods of showing test validity
(e.g., content validity) were needed.
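To give a sense of the arithmetic behind that power estimate, the short Python sketch below computes the approximate power of a one-tailed significance test on an observed correlation using the Fisher z approximation. The specific inputs (an attenuated observed validity of about .20 and a sample size of 68) are illustrative assumptions, not the values analyzed by Schmidt et al. (1976).

```python
import numpy as np
from scipy.stats import norm

def power_for_r(rho_obs, n, alpha=0.05, two_tailed=False):
    """Approximate power of a significance test on a Pearson correlation,
    using the Fisher z transformation (a standard large-sample approximation)."""
    z_effect = np.arctanh(rho_obs)      # Fisher z of the expected observed correlation
    se = 1.0 / np.sqrt(n - 3)           # standard error of Fisher z
    z_crit = norm.ppf(1 - alpha / (2 if two_tailed else 1))
    return 1.0 - norm.cdf(z_crit - z_effect / se)

# Assumed illustrative values: observed validity of .20 after attenuation, N = 68,
# one-tailed test at alpha = .05 (not the actual figures from Schmidt et al., 1976)
print(round(power_for_r(0.20, 68), 2))   # prints roughly 0.5
```

With inputs of roughly this size, the computed power lands near .50, the order of magnitude reported in the article.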
Since about 1920, the belief in I-O psychology had been that test validity was situationally specific, meaning
that separate validity studies had to be conducted in each organization and for each job. Two pieces of evidence
supported this belief: the fact that significant validity was found in only about half of all studies and the fact that
the actual magnitude of validity estimates (regardless of significance levels) varied widely across studies, even
when the jobs in question appeared to be similar or identical. Our statistical power article showed why the first
of these was not valid evidence for the situational specificity theory. However, we still had to address the second
evidential basis for the theory. This challenge led to the development of our meta-analysis methods.
In addition to my OPM job, I had an appointment at George Washington University (GWU). One day in 1975,
sitting in my GWU office and pondering several highly variable distributions of validity coefficients, each from a
different job, I remembered an experience that I had had in graduate school. As a PhD student at Purdue, I took
courses from Joseph Tiffin and Hubert Brogden. Tiffin had performed, and was then doing, validity studies in
private industry, and he reported in class that in his experience, validity findings were highly variable and seemed
to depend on peculiarities of the local situation. This was also the position taken in textbooks and articles.
Brogden had been the Technical Director of the US Army Research Institute (ARI). The validity results that he
reported in class were quite stable and were very similar from study to study. So, I asked Brogden why there
was this difference. He said that the answer was sampling error. The army had very large samples (often 10,000
or more), so there was little sampling error in the results, while in industry, the Ns were small, often 50, 70, or
100, and there was a lot of sampling error.
I completely forgot about that until I faced the problem of the "bouncing validity" at OPM. I had been
indoctrinated with the idea of situational specificity of validity, but when I was looking at all the variability in
validity, I remembered Brogden's comment and decided to see if I could figure out a way to quantify the amount
of sampling error in a distribution of validity, that is, a way to calculate the amount of variance expected from
sampling error and then subtract that amount from the observed variance of the validity coefficients. I found that
most of the variance, 70 to 80%, was sampling error. After subtracting out sampling error variance and applying
the usual corrections for measurement error in the job performance measures and for range restriction on test
scores, the typical distribution of validity had a substantial mean of about .50 or so and an SD of around .10 to
.15. This meant that virtually all of the values were in the positive range, and therefore, it demonstrated that
validity values were generalizable across situations. When I called Jack Hunter about my results, he was positive
and enthusiastic, which was reassuring to me because I was not sure that I had not made some mistake. Together,
we then began refining these procedures into general meta-analysis methods to be used in personnel selection. In
late 1975, we wrote this work up but did not immediately submit it for publication because we wanted to enter it
into a research contest sponsored by the I-O psychology division of the American Psychological Association (APA,
Division 14). One requirement was that entries could not already be published. We won that award, but as a result,
our first meta-analysis article was not published until 1977 (Schmidt and Hunter, 1977). Gene Glass published his
first article on meta-analysis in 1976. We were not aware of Glass' work at that time, but we later realized that had
we not delayed publication for the research award, we could have tied Glass for the first published application of
meta-analysis.
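In its simplest "bare bones" form, the subtraction described above can be written as follows. This is a minimal sketch using the standard large-sample approximation for the sampling error variance of a correlation, not the full procedure given in the books.

```latex
% S_r^2  : observed variance of the validity coefficients across studies
% \bar{r}: mean observed validity;   \bar{N}: average study sample size
\hat{\sigma}^{2}_{e} = \frac{\left(1 - \bar{r}^{\,2}\right)^{2}}{\bar{N} - 1}
\qquad
\hat{\sigma}^{2}_{\rho} = S^{2}_{r} - \hat{\sigma}^{2}_{e}
```

Corrections for measurement error in the criterion and for range restriction are then applied to the mean and the residual standard deviation to estimate the distribution of operational (true) validities.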
At about that time, I was asked to be the OPM liaison to a National Research Council committee on selection
testing issues. Lee J. Cronbach was on the committee, and when I showed him our results, he told us about the
work of Gene Glass. He also stated that we should expand application of our meta-analysis methods beyond
personnel selection to psychological literatures in general, and to that end, we should write a book describing
our methods. That book was published in 1982 (Hunter et al., 1982). At this time, we also adopted Glass' term
"meta-analysis" to describe these procedures; we had previously used only the term "validity generalization".
However, again, Glass got there first. The Glass, McGaw, and Smith meta-analysis book came out in 1981 (As an
aside, a couple of years later, we had Barry McGaw, the middle author, give a talk at OPM. I remember that Barry
made the point that he had "come between Gene Glass and his wife." At that time, the third author, Mary Lee
Smith, was Glass' wife.).
Question 2, WS: Did you think it would take on such a major role in science from the start or was it a surprise to
you how quickly it developed?
Answer, FS: We initially thought of our meta-analysis methods as just a solution to the problem of validity
generalization. However, early on, Lee J. Cronbach told us that we should be thinking of wider applications,
and we realized that the methods had many potential applications in other areas of I-O psychology and in many
areas of general psychology. However, we did not at that time realize how widely the methods would eventually
be applied or that they would come to be used in areas beyond psychology. By the mid-1980s, however, this had
become apparent. The methods were being applied widely in a variety of different areas in I-O psychology,
management research, human resources, marketing, information sciences, and other areas. An overview of the
impact in I-O psychology up to 2010 is presented in DeGeest and Schmidt (2011). Ironically, the one area in which
there was strong resistance was in personnel selection, the area in which the method had originated. It appeared
that the method was not only accepted but also enthusiastically embraced in research on all relationships except
the relationship between hiring procedures and job performance. I discuss this anomaly further later.
Question 3, WS: What were the obstacles that you encountered?
Answer, FS: The one big obstacle was resistance in the area of personnel selection, both from practitioners and
from the government agencies responsible for administration of the 1964 Civil Rights Act. Many I-O practitioners
made a good living conducting validity studies for employers. The implication of our meta-analytic validity
generalization studies was that these expensive situationally specific validity studies were not necessary. Our
findings had shown that validity could safely be generalized without such studies. The so-called enforcement
agencies (the Equal Employment Opportunity Commission, the Office of Federal Contract Compliance, and the
Civil Rights Division of the Department of Justice) also resisted these new findings, apparently on grounds that
they made it too easy for employers to demonstrate validity for selection methods on which minorities tended
to do less well than others (despite the large body of research showing that these measures predict performance
just as accurately for minorities as for others). To this day, these agencies have refused to revise and update the
government Uniform Guidelines on Employee Selection Procedures (UGLs) that were issued in 1978, before the
newer research findings were available, despite appeals from the Society for Industrial and Organizational
Psychology (SIOP) and other groups stating that revision and updating were sorely needed. The UGLs are not only
out of date but also inconsistent with current professional standards. This is a much longer term example of what
we have today with the global warming deniers and evolution deniers: If you do not like the research evidence,
just pretend that it does not exist. Some I-O practitioners have been happy with the agencies' position because
it meant that they could continue to get paid for conducting unneeded validity studies.
Although the UGLs have not been updated, professional standards have been. The Standards for Educational and
Psychological Testing, published by the American Educational Research Association (AERA), the APA, and the National
Council on Measurement in Education, incorporated new meta-analytic findings on validity starting in the 1990s and
continuing in the most recent edition (AERA et al., 2014). The same is true for the Principles for Employee Selection,
published by SIOP (Society for Industrial and Organizational Psychology, 2003). There had initially been some
resistance from professional societies to meta-analytic findings, but this changed as the supporting findings
increased. In addition to the cumulative impact of the increasing number of meta-analytic validity studies being
published, three other factors contributed to eventual acceptance. First, the illogic of a situation in which our
meta-analysis methods were embraced and lauded for application in all areas of research except the one area of
personnel selection became apparent. People saw that this did not make sense. The second big factor was the
publication of the article "Forty Questions about Validity Generalization and Meta-Analysis" (Schmidt et al.,
1985). Jack Hunter and I, along with Hannah Hirsh (now Rothstein), Kenneth Pearlman, Jim Caplan, Michael
McDaniel, and others at OPM who were involved in this work, had collected a long list of criticisms of meta-
analysis that we had heard expressed and had written responses to each [as an example, one criticism contended
that sampling error was only a hypothesis, not a fact (!)]. Milton Hakel, then the editor of Personnel Psychology, a
top-tier journal, invited us to publish these 40 questions and answers. He also invited a group of eminent
psychologists to respond to our answers, and we to their responses. The result was the longest article that has
ever appeared in that journal, over 100 pages. This article apparently successfully addressed most or all of the
reservations that people had about validity generalization and meta-analysis because the level of opposition
dropped dramatically after that.
The last turnaround factor occurred in 1994, when the American Psychological Association awarded Jack
Hunter and me the Distinguished Scientific Contribution Award for our meta-analytic work. This is probably the
most prestigious research award in psychology (and far outweighs the award that we obtained from SIOP for this
work the following year). After that, there was really no signicant opposition. This process of acceptance took
nearly 20 years.
Question 4, WS: What did you see as the significant events or ideas that shaped your work in meta-analysis?
Answer, FS: Many of the events and ideas that shaped our work in meta-analysis are described in answers to
previous questions. However, there are a couple of additional ones. The first shaping event that I would list here
was the fact that Jack Hunter and I were trained in measurement. Many people who became involved in meta-
analysis were PhDs in statistics but had no training in measurement. Jack Hunter and I were psychological
researchers and so were trained not only in statistics but also in psychological measurement. The principles
and methods of measurement play a critical role in psychological research (actually in all research). It is critical
to understand measurement error and other research artifacts such as range restriction and dichotomization of
continuous measures and to be able to correct for the (often large) biases that they create. In personnel selection,
the area in which we started our work, it was accepted that in primary studies, one should correct observed
validity values for measurement error in the job performance measure and for range restriction on the test or
other selection method. Moreover, it was apparent that in theoretical research, correction should also be made
for measurement error in the independent variable measure because in that type of research, it is construct-level
relationships that are of scientic interest [as noted by Rubin (1990)]. We incorporated these corrections for bias
into our meta-analysis methods, improving and refining them over time and evaluating the accuracy of each
change via computer simulation studies. We evolved two general meta-analytic approaches: methods for
correcting each correlation or d value individually prior to meta-analysis and methods for making these
corrections in meta-analysis based on distributions of artifact values (reliability values and range restriction ratios).
These methods are described in our various meta-analysis books, most recently in Schmidt and Hunter (2014). A
little after our initial work, similar methods were developed by Callender and Osburn (1980) and Raju and his
associates (Raju and Burke, 1983; Raju et al., 1991). These methods produced results virtually identical to our
methods, which helped to buttress our ndings and conclusions.
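As an illustration of the first approach (correcting each coefficient individually before cumulation), the sketch below applies two common corrections to a single observed validity: disattenuation for measurement error in the criterion and the classical correction for direct range restriction. The numeric inputs and the simplified two-step order are assumptions for illustration only; the full individual-correction and artifact-distribution procedures are given in the books cited above.

```python
import numpy as np

def correct_validity(r_obs, r_yy, u):
    """Simplified sketch of correcting one observed validity coefficient.

    r_obs : observed correlation in the restricted (incumbent) sample
    r_yy  : reliability of the job performance (criterion) measure
    u     : ratio of unrestricted to restricted predictor SDs (>= 1)
    """
    r_c = r_obs / np.sqrt(r_yy)                           # disattenuate for criterion measurement error
    r_c = u * r_c / np.sqrt((u**2 - 1.0) * r_c**2 + 1.0)  # classical correction for direct range restriction
    return r_c

# Assumed illustrative inputs: r_obs = .25, criterion reliability = .60, SD ratio = 1.5
print(round(correct_validity(0.25, 0.60, 1.5), 2))        # about 0.46
```

In the artifact-distribution variant, analogous adjustments are made to the meta-analytic mean and variance using distributions of reliability values and range restriction ratios rather than study-by-study corrections.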
Measurement error is in many ways the most important artifact because it is the only one (beyond sampling
error) that is always present in all data. My colleagues and I believe that the arc in research methods always bends
in the direction of increased accuracy. For this reason, we believe that other approaches to meta-analysis will
eventually come to include corrections for the biasing effects of measurement error (and perhaps other research
artifacts too). Hedges (2009) and Matt and Cook (2009) have acknowledged the need to correct for measurement
error.
The second shaping influence here was the decision to use a subtractive model of meta-analysis. We faced the
need in personnel selection to estimate the distribution of true validity population parameters. This led to a model
in which variance as a result of sampling error and other artifacts is subtracted from the observed variance of
correlations or d values, leaving an estimate of the variability of the population parameters, which was required
for validity generalization purposes. One consequence of this was that all of our methods were random-effects
models from the beginning. We never had any fixed-effects (FE) models. This turned out to be important because
the FE models are now rejected as unrealistic by most meta-analysts because of their assumption that all studies
in the meta-analysis are estimating exactly the same population parameter (cf., Schmidt et al., 2009).
Question 5, WS: Do you think you got anything wrong? What would you do differently now with 20/20
hindsight? What would you keep the same?
Answer, FS: There were at least a couple of things that we did wrong. First, we had an error in our first
published application of meta-analysis in 1977. John Callender and Bart Osburn informed us that their simulation
studies showed that there was an error, but they did not know what was causing it. This error led to
underestimation of the SD of the population parameters (i.e., true validity values). We found the cause of this error
and corrected it in a subsequent application of our meta-analysis method (Schmidt et al., 1979). When we
corrected the means of the validity distributions for measurement error and range restriction, those corrections
increased the SDs. We had previously neglected to increase the SDs appropriately.
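A simplified way to see why the SDs must also be rescaled is to treat the artifacts as a single constant attenuation factor, an assumption made here purely for illustration rather than as a description of the 1977 model:

```latex
% Assume each observed validity is attenuated by the same factor A < 1:
%   r_i = A \rho_i   (sampling error ignored here for simplicity)
\bar{\rho} = \frac{\bar{r}}{A}, \qquad SD_{\rho} = \frac{SD_{r}}{A}
% Dividing only the mean by A while leaving the SD unscaled understates the
% SD of the corrected (true validity) distribution by the factor A.
```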
The other thing that we did wrong was underestimating the potential seriousness of publication bias and other
forms of unrepresentativeness in study sets. This occurred because we were able to show by a variety of analyses
that there was strong evidence that this was not a problem in the personnel selection literature (Hunter and
Schmidt, 1990, chapter 4, 2004, chapter 4). The error was the assumption that this was probably also the case
in most or all other areas. There is now a lot of evidence that publication bias and other availability issues are a
serious problem in some areas, especially in biomedical areas and in social psychology lab experiments. In the
most recent edition of our meta-analysis book, these problems are explored in detail (Schmidt and Hunter,
2014, chapter 13).
Some have argued that we made a mistake in not more strongly emphasizing the detection of moderators in
meta-analysis. In validity generalization research, one need only show that all or nearly all validity values are above
some practically useful value. Variation that exists above this point does not have to be addressed for purposes of
practical application. Moreover, in any event, that variation is usually too small to be consistent with moderators of
any size. However, for applications in other research areas, we emphasized moderator detection via subgrouping
of studies. The other alternative is meta-regression, which is widely abused in published meta-analyses today.
Statistical power is typically quite low, and capitalization on sampling error is substantial (especially that as a result
of ex post facto selection of moderator candidates). The result is often the deadly combination of a high Type I error
rate along with a high Type II error rate. Five additional problems in the use of meta-regression are discussed in
the current edition of our meta-analysis book (Schmidt and Hunter, 2014, chapter 9). The accurate detection of
moderators in primary research and in meta-analysis is much more difficult than many have been led to believe
(Schmidt and Hunter, 1978).
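For completeness, the validity generalization check mentioned at the start of this answer is usually expressed as a credibility interval around the estimated mean true validity. The form below is the commonly used 80% interval and assumes an approximately normal distribution of true validities.

```latex
% 80% credibility interval for the true validity, from the meta-analytic estimates
% \hat{\bar{\rho}} (mean true validity) and \hat{SD}_{\rho} (true-validity SD):
\hat{\bar{\rho}} \;\pm\; 1.28\,\hat{SD}_{\rho}
% Validity is said to generalize when the lower bound of this interval exceeds
% some practically useful value (e.g., a value clearly above zero).
```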
Question 6, WS: How do you view meta-analysis today from your vantage point in history?
Answer, FS: Looking back today, I view meta-analysis as an inevitable development. By the 1970s, research
literatures across almost all areas were becoming unmanageable in size. Frustration with the task of making sense
of large conflicting literatures was growing. The time was just ripe; a new tool was needed. I think that this
explains why meta-analysis was invented independently several times in different areas: by Chalmers in
biomedical research in the USA, by Peto in biomedical research in the UK, by Rosenthal in social psychology, by
Glass in education, and by Jack Hunter and me in I-O psychology. Also, at about this same time, Hedges and Olkin
published major advances in meta-analytic methods. There are other examples of nearly simultaneous
developments in science, one example being the independent invention of calculus by both Newton and Leibniz
at about the same time.
I think it was clear that by the 1970s, the alternative to meta-analysis was epistemological despairnot an
attractive alternative given the scientic ideal of cumulative knowledge. Moreover, such despair was becoming
common and even mainstream. Even as eminent a methodologist as Lee J. Cronbach appears to have succumbed
to the temptation of epistemological despair. Cronbach (1975) argued that social and psychological reality
changes so fast and is so ephemeral that cumulative scientic knowledge is impossible in psychology and the
social sciences. In social psychology, Gergen (1982) was a well-known proponent of epistemological despair.
Meta-analysis lifted this veil of despair. It showed that cumulative knowledge was in fact possible.
Question 7, WS: What colleagues or events influenced you to elevate the field of meta-analysis? How did those
colleagues or events inuence you?
Answer, FS: As I indicated earlier, the encouragement from Lee J. Cronbach was important. However, we also
had support from other equally eminent psychologists. Anne Anastasi praised our methods and findings and
featured them in the successive editions of her widely used textbook, Psychological Testing. Lloyd Humphreys at
the University of Illinois was very supportive, as were Paul Meehl, Marvin Dunnette, and Tom Bouchard at the
University of Minnesota. I believe that these eminent figures were instrumental in the APA Distinguished Scientific
Contributions award that Jack Hunter and I received in 1994. I guess you could say that we were supported from
the commanding heights by some of the top brass in psychology, while at the same time being criticized and
rejected by many of the troops on the ground. Many personnel specialists and I-O practitioners were hostile to
our work because it refuted the 80-year-old belief in situational specificity of test validity and obviated the need
for the kinds of validity studies that they conducted.
Our meta-analysis methods also received great support from our colleagues at OPM, who promoted the
methods and who conducted and published many studies using these methods. These include Hannah Rothstein
(then Hannah Hirsh), Michael McDaniel, Kenneth Pearlman, Marvin Trattner, Lois Northrup, Ilene Gast, Murray
Mack, Deborah Whetzel, Guy Shane, Brian Stern at the ARI, and others. Later, at the University of Iowa, this list
included Deniz Ones, Vish Viswesvaran, Kenneth Law, Crissie Fry, Michael Judiesch, Kuh Yoon, Huy Le, Kevin
Carlson, Marc Orlitzky, In-Sue Oh, Jon Shaffer, Ben Postlethwaite, and others. The death of Jack Hunter in 2002
was a terrible loss. As befitting his eminent contributions, his obituary was published in The American Psychologist
(Schmidt, 2003).
Question 8, WS: How has your teaching of meta-analysis evolved over the years?
What would you like to say about your former students and their contributions to the field of meta-analysis?
Answer, FS: One change in my teaching over time was the increase in emphasis that I placed on applications of
meta-analysis in diverse areas, beyond just personnel selection. In fact, these other areas came to dominate my
teaching. This extended into meta-analysis of experimental studies. One reason for this change was the greater
acceptance of the methods in those other areas. Another was the fact that the PhD students in my meta-analysis
course came from a wide variety of areas: marketing, clinical psychology, engineering, sociology, human
resources, organizational behavior, nursing, and education. Almost all of these students submitted their required
class meta-analyses for publication, and almost all of these were published. This was also true of those students
whose dissertations were based on meta-analysis. Another change in my teaching is that the range of
methodological topics that I covered in my PhD meta-analysis course increased. One example is the introduction
of coverage of second-order sampling error. Meta-analysis greatly reduces sampling error in comparison to
individual studies but does not completely eliminate it. The remaining sampling error is second-order sampling
error. This emphasis eventually led to a new method of conducting second-order meta-analysis (Schmidt and
Oh, 2013), making possible the meta-analysis of meta-analyses.
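The logic of second-order meta-analysis parallels the first-order subtraction described earlier. The expression below is only a rough sketch of that logic, assuming m independent first-order meta-analyses of the same relation; it is not the full estimator developed in Schmidt and Oh (2013).

```latex
% Given m first-order meta-analytic means \hat{\bar{\rho}}_j, each with an estimated
% sampling-error variance Var(\hat{\bar{\rho}}_j) (second-order sampling error):
\hat{\sigma}^{2}_{\bar{\rho}} \;=\; S^{2}_{\hat{\bar{\rho}}}
\;-\; \frac{1}{m}\sum_{j=1}^{m} \widehat{\mathrm{Var}}\!\left(\hat{\bar{\rho}}_j\right)
% where S^2 over the m means is their observed variance; a value near zero suggests
% the meta-analytic means differ only because of second-order sampling error.
```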
The answer to the second part of this question can be found in my responses to questions 7 and 9.
Question 9, WS: What is your favorite (or among your favorites) application of meta-analysis and why?
Answer, FS: This is a difficult question because there have been so many excellent applications of our meta-
analysis methods. I will just select some of the larger ones that have had a big impact. The rst is the large
meta-analysis that Jack Hunter performed for the US Employment Service in the US Department of Labor (Hunter,
1983). This included a huge database (515 studies) on the validity of the general intelligence measure of the General
Aptitude Test Battery, and it demonstrated generalizable validity essentially for all jobs in the US economy, with
the magnitude of validity depending on the complexity of the job family. The main findings were later presented
in an article in the Psychological Bulletin (Hunter and Hunter, 1984).
Another is the Pearlman et al. (1980) meta-analysis of selection test validity values for clerical workers. This
study included over 650 validity studies focusing on a variety of test types and spanning a time period from
1920 to 1979. The largest meta-analysis in this study included 882 validity coefficients. This study is still a widely
cited classic. A recently published meta-analysis of studies conducted after the 1979 cutoff of Pearlman et al.
found essentially identical results (Whetzel et al., 2011), indicating stability of validity values over a period of
90 years despite the many changes in clerical duties and tasks that had occurred.
Still another is the McDaniel et al. (1994) meta-analysis of employment interview validity values, also a citation
classic. Next, there is the Ones et al. (1993) meta-analysis of the validity of integrity tests used in hiring. This study
is probably the largest of all in terms of the sheer amount of data that it included. It has also been cited many
times.
Then, there is the Orlitzky et al. (2003) meta-analysis of the relation between corporate social responsibility and
corporate financial outcomes. This one not only is a citation classic but also has been reprinted in three books, and
it won the 2004 Moskowitz Award for Finance Research.
Finally, there is the Harter et al. (2002) meta-analysis of the relation between average level of employee job
engagement and the business outcomes of revenue, profit, customer satisfaction, and low employee turnover.
This one has also been highly cited.
All of these studies required a great deal of effort over an extended period of time. Meta-analysis does not
provide an easy road to fame and fortune.
Question 10, WS: What is your favorite (or among your favorites) methodology in meta-analysis and why?
Answer, FS: As you might expect, my favorite methodologies are the methods presented in the four books on
meta-analysis that I coauthored: Hunter et al. (1982), Hunter and Schmidt (1990, 2004), and Schmidt and Hunter
(2014). Jack Hunter took the lead in the 1982 and 1990 books; they could not have been written without his
contributions. A major advantage and unique feature of the methods in these books is that they simultaneously
take into account both sampling error and measurement error, the two distorting factors that are present in all
data sets and in all studies. They also allow for the biasing effects of other research artifacts such as range
restriction and dichotomization of continuous measures to be corrected when they are present. Other approaches
to meta-analysis do not do this.
Question 11, WS: Did you see your work on meta-analysis mainly as a statistical exercise or mainly as a review of
the evidence?
Answer, FS: Jack Hunter and I (and our colleagues too) did not see meta-analysis as a mere statistical exercise.
We saw it as a path to improved epistemology in researcha successful and superior way to attain cumulative
knowledge and establish general scientic principles in spite of the variability in individual study ndings and
in spite of the confusion created by the nearly universal reliance on statistical significance testing in the analysis
of research data. This was a continuation of the view that inspired our earlier pre-meta-analytic work.
The basic question was always what do data really mean and how can we extract reliable knowledge from data
(cf., Schmidt, 1992, 2010). This included our work promoting the use of confidence intervals over significance tests
(Schmidt, 1996; Schmidt and Hunter, 1997), the detection and calibration of moderator variables (Schmidt and
Hunter, 1978), the problem of instability of regression weights in data analysis (Schmidt, 1971, 1972), the chance
frequency of racial differences in test validity (Schmidt et al., 1973; Hunter et al., 1979), and other work in this vein.
So, our view of meta-analysis was that it was a continuation of this epistemological quest.
Question 12, WS: Can you reflect on the relative roles of the broader field of systematic review versus
meta-analysis now compared with when you started?
Answer, FS: This distinction appears to be a matter of terminology. Some use the term meta-analysis to
designate only the quantitative procedures of data analysis in meta-analysis, and they view the term systematic
review as a broader term that includes the search for studies, the coding of studies, and the interpretation and
presentation of the results. However, in my field, the term meta-analysis includes all of these things, not just
the quantitative data analysis procedures. My impression is that this is the usage favored today by most people
concerned with meta-analysis. It is possible that the term systematic review originated in biomedical research,
and not in other areas that make use of meta-analysis. It is my impression that in the biomedical area, there
existed a concept of systematic review before meta-analysis developed, and when meta-analysis methods came
along, they were added as the quantitative component of these systematic reviews.
References
American Educational Research Association, American Psychological Association, National Council on
Measurement in Education. 2014. Standards for educational and psychological testing. Author: Washington, DC.
Callender JC, Osburn HG. 1980. Development and test of a new model for validity generalization. Journal of Applied Psychology 65: 543–558.
Cronbach LJ. 1975. Beyond the two disciplines of scientific psychology revisited. American Psychologist 30: 116–127.
DeGeest D, Schmidt FL. 2011. The impact of research synthesis methods on Industrial/Organizational Psychology: The road from pessimism to optimism about cumulative knowledge. Research Synthesis Methods 1: 185–197.
Gergen KJ. 1982. Toward transformation in social knowledge. Springer-Verlag: New York.
Glass GV, McGaw B, Smith ML. 1981. Meta-analysis in social research. Sage: Beverly Hills, CA.
Harter JK, Schmidt FL, Hayes TL. 2002. Business unit level relationships between employee satisfaction/engagement and business outcomes: A meta-analysis. Journal of Applied Psychology 87: 268–279.
Hedges LV. 2009. Statistical considerations. In Cooper H, Hedges LV, Valentine JC (eds). Handbook of research synthesis and meta-analysis, 2nd edn. Russell Sage: New York, 37–48.
Hunter JE. 1983. Test validation for 12,000 jobs: An application of job classification and validity generalization to the General Aptitude Test Battery (GATB). Test Research Report No. 45, U.S. Department of Labor, U.S. Employment Service, Washington, DC.
Hunter JE, Hunter RF. 1984. Validity and utility of alternative predictors of job performance. Psychological Bulletin 96: 72–98.
Hunter JE, Schmidt FL. 1990. Methods of meta-analysis: Correcting error and bias in research findings. Sage: Thousand Oaks, CA.
Hunter JE, Schmidt FL. 2004. Methods of meta-analysis: Correcting error and bias in research findings, 2nd edn. Sage: Thousand Oaks, CA.
Hunter JE, Schmidt FL, Hunter RF. 1979. Differential validity of employment tests by race: A comprehensive review and analysis. Psychological Bulletin 86: 721–735.
Hunter JE, Schmidt FL, Jackson GB. 1982. Meta-analysis: Cumulating research findings across studies. Sage: Beverly Hills, CA.
Matt GE, Cook TD. 2009. Threats to the validity of generalized inferences. In Cooper H, Hedges LV, Valentine JC (eds). Handbook of research synthesis and meta-analysis, 2nd edn. Russell Sage: New York, 537–560.
McDaniel MA, Whetzel DL, Schmidt FL, Maurer SD. 1994. The validity of employment interviews: A comprehensive review and meta-analysis. Journal of Applied Psychology 79: 599–616.
Ones DS, Viswesvaran C, Schmidt FL. 1993. Comprehensive meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance. Journal of Applied Psychology Monograph 78: 679–703.
Orlitzky M, Schmidt FL, Rynes SL. 2003. Corporate social and financial performance: A meta-analysis. Organization Studies 24: 403–441.
Pearlman K, Schmidt FL, Hunter JE. 1980. Validity generalization results for tests used to predict job proficiency and training criteria in clerical occupations. Journal of Applied Psychology 65: 373–407.
Raju NS, Burke MJ. 1983. Two new procedures for studying validity generalization. Journal of Applied Psychology 68: 382–395.
Raju NS, Burke MJ, Normand J, Langlois GM. 1991. A new meta-analysis approach. Journal of Applied Psychology 76: 432–446.
Rubin D. 1990. A new perspective on meta-analysis. In Wachter KW, Straf ML (eds). The future of meta-analysis. Russell Sage: New York, 155–166.
Schmidt FL. 1971. The relative efficiency of regression and simple unit predictor weights in applied differential psychology. Educational and Psychological Measurement 31: 699–714.
Schmidt FL. 1972. The reliability of differences between linear regression weights in applied differential psychology. Educational and Psychological Measurement 32: 879–886.
Schmidt FL. 1992. What do data really mean? Research findings, meta-analysis, and cumulative knowledge in psychology. American Psychologist 47: 1173–1181.
Schmidt FL. 1996. Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods 1: 115–129.
Schmidt FL. 2003. John E. Hunter, 1939–2002. American Psychologist 58: 238.
Schmidt FL. 2010. Detecting and correcting the lies that data tell. Perspectives on Psychological Science 5: 233–242.
Schmidt FL, Hunter JE. 1977. Development of a general solution to the problem of validity generalization. Journal of Applied Psychology 62: 529–540.
Schmidt FL, Hunter JE. 1978. Moderator research and the law of small numbers. Personnel Psychology 31: 215–232.
Schmidt FL, Hunter JE. 1997. Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In Harlow L, Mulaik S, Steiger J (eds). What if there were no significance tests? Lawrence Erlbaum: Mahwah, NJ, 37–64.
Schmidt FL, Hunter JE. 2014. Methods of meta-analysis: Correcting error and bias in research findings, 3rd edn. Sage: Thousand Oaks, CA.
Schmidt FL, Oh I-S. 2013. Methods for second order meta-analysis and illustrative applications. Organizational Behavior and Human Decision Processes 121: 204–218.
Schmidt FL, Berner JG, Hunter JE. 1973. Racial differences in validity of employment tests: Reality or illusion? Journal of Applied Psychology 58: 5–9.
Schmidt FL, Hunter JE, Pearlman K, Hirsh HR. 1985. Forty questions about validity generalization and meta-analysis. Personnel Psychology 38: 697–798.
Schmidt FL, Hunter JE, Urry VE. 1976. Statistical power in criterion-related validation studies. Journal of Applied Psychology 61: 473–485.
Schmidt FL, Hunter JE, Pearlman K, Shane GS. 1979. Further tests of the Schmidt-Hunter Bayesian validity generalization model. Personnel Psychology 32: 257–281.
Schmidt FL, Oh I-S, Hayes TL. 2009. Fixed versus random models in meta-analysis: Model properties and comparison of differences in results. British Journal of Mathematical and Statistical Psychology 62: 97–128.
Society for Industrial and Organizational Psychology. 2003. Principles for the validation and use of personnel
selection procedures, 4th edn. Author: Bowling Green, OH.
Whetzel DL, McCloy RA, Hooper A, Russell TL, Waters SD, Campbell WJ, Ramos RA. 2011. Meta-analysis of clerical performance predictors: Still stable after all these years. International Journal of Selection and Assessment 19: 41–50.
If meta-analysis is to deal with generalization understood as both representation and extrapolation, we need ways of using a particular database to justify reasonable conclusions about what the available samples represent and how they can be used to extrapolate to other kinds of persons, settings, times, causes, and effects. This chapter is not the first to propose that a framework of validity threats allows us to probe the validity of research inferences when a fundamental statistical assumption has been violated. Donald Campbell introduced his internal validity threats for instances when primary studies lack random assignment, creating quasi-experimental design as a legitimate extension of the thinking R. A. Fisher had begun (Campbell 1957; Campbell and Stanley 1963). Similarly, this chapter seeks to identify threats to valid inferences about generalization that arise in metaanalyses, particularly those that follow from infrequent random sampling. Of course, Donald Campbell and Julian Stanley also had a list of threats to external validity, and these also have to do with generalization (1963). But their list was far from complete and was developed more with primary studies in mind than with research syntheses. The question this chapter asks is how one can proceed to justify claims about the generality of an association when the within-study selection of persons, settings, times, and measures is almost never random and when it is also not even reasonable to assume that the available sample of studies is itself unbiased. This chapter proposes a threats-to-validity approach rooted in a theory of construct validity as one way to throw provisional light on how to justify general inferences.