History and development of the
Schmidt–Hunter meta-analysis methods
Frank L. Schmidt*
Department of Management and Organizations, Tippie College of Business, University of Iowa, Iowa City, IA 52242, USA
*Correspondence to: Frank Schmidt, Department of Management and Organizations, Tippie College of Business, University of Iowa, Iowa City, IA 52242, USA. E-mail: frank-schmidt@uiowa.edu
Received 27 October 2014; revised 10 November 2014; accepted 20 November 2014. DOI: 10.1002/jrsm.1134
In this article, I provide answers to the questions posed by Will Shadish about the history and development
of the Schmidt–Hunter methods of meta-analysis. In the 1970s, I headed a research program on personnel
selection at the US Office of Personnel Management (OPM). After our research showed that validity studies
have low statistical power, OPM felt a need for a better way to demonstrate test validity, especially in light
of court cases challenging selection methods. In response, we created our method of meta-analysis
(initially called validity generalization). Results showed that most of the variability of validity estimates
from study to study was because of sampling error and other research artifacts such as variations in range
restriction and measurement error. Corrections for these artifacts in our research and in replications by
others showed that the predictive validity of most tests was high and generalizable. This conclusion
challenged long-standing beliefs and so provoked resistance, which over time was overcome. The 1982
book that we published extending these methods to research areas beyond personnel selection was
positively received and was followed by expanded books in 1990, 2004, and 2014. Today, these methods
are being applied in a wide variety of areas. Copyright © 2015 John Wiley & Sons, Ltd.
Keywords: meta-analysis; measurement error; history of meta-analysis; sampling error
Question 1, WS: How did the idea first occur to you? Did you think of it as meta-analysis (because Gene Glass had
coined the term)? Did you think of it under some other rubric?
Answer, FS: That is a long story. After obtaining my industrial–organizational psychology (I-O) PhD from Purdue
in 1970, my first job was in the Michigan State University psychology department, an excellent and stimulating
department. However, my degree was in an applied area, and I felt a little like a fraud because I was teaching what
I had little experience applying myself. I obtained tenure early but still left in 1974 for the US Office of Personnel
Management (OPM) in Washington, hoping to obtain real-world experience. OPM is responsible for the methods
used to hire people for the Federal workforce, and my training was in personnel selection, psychological
measurement, and statistics. This was a time of turmoil in personnel selection because court challenges to hiring procedures under the 1964 Civil Rights Act were at an all-time high (one OPM hiring test went all the way to the Supreme Court, where it was upheld). As a result, it was very important for OPM to have research evidence to
defend its selection tests in court. I was asked to conduct and direct such research.
Legal defense of hiring tests consisted mostly of criterion-related validity studies (Criterion-related validity is
the correlation between scores on a hiring procedure and later performance on the job.). One of the first
publications from our OPM research effort showed that the average statistical power of such studies was only
about .50 (Schmidt et al., 1976). This meant that it was very risky for an employer to conduct such a study because
if the validity was not statistically significant, the employer’s own study could be used against him or her in court.
Much larger sample sizes were needed to overcome this problem, and other methods of showing test validity (e.g., content validity) were needed.
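To illustrate the point, consider the power of a one-tailed .05 significance test when the observed-level population validity is about .20 and the sample size is 70. These are illustrative values chosen here for the example, not figures taken from Schmidt et al. (1976). A quick calculation using the Fisher z approximation gives power of roughly .50:

```python
# Illustrative power calculation for a criterion-related validity study,
# using the Fisher z approximation for the correlation coefficient.
# The inputs (observed-level validity of .20, N = 70, one-tailed alpha = .05)
# are hypothetical examples, not the figures from Schmidt et al. (1976).
from math import atanh, sqrt
from statistics import NormalDist

rho_obs = 0.20   # attenuated (observed-level) population validity
N = 70           # study sample size
alpha = 0.05     # one-tailed significance level

z_rho = atanh(rho_obs)                    # Fisher z of the population correlation
se = 1 / sqrt(N - 3)                      # standard error of Fisher z
z_crit = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value

power = 1 - NormalDist().cdf(z_crit - z_rho / se)
print(f"power is about {power:.2f}")      # roughly 0.50 for these inputs
```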
Since about 1920, the belief in I-O psychology had been that test validity was situationally specific, meaning
that separate validity studies had to be conducted in each organization and for each job. Two pieces of evidence
supported this belief: the fact that significant validity was found in only about half of all studies and the fact that
the actual magnitude of validity estimates (regardless of significance levels) varied widely across studies—even
when the jobs in question appeared to be similar or identical. Our statistical power article showed why the first
of these was not valid evidence for the situational specificity theory. However, we still had to address the second
evidential basis for the theory. This challenge led to the development of our meta-analysis methods.
In addition to my OPM job, I had an appointment at George Washington University (GWU). One day in 1975,
sitting in my GWU office and pondering several highly variable distributions of validity coefficients, each from a
different job, I remembered an experience that I had had in graduate school. As a PhD student at Purdue, I took
courses from Joseph Tiffin and Hubert Brogden. Tiffin had conducted, and was still conducting, validity studies in
private industry, and he reported in class that in his experience, validity findings were highly variable and seemed
to depend on peculiarities of the local situation. This was also the position taken in textbooks and articles.
Brogden had been the Technical Director of the US Army Research Institute (ARI). The validity results that he
reported in class were quite stable and were very similar from study to study. So, I asked Brogden why there
was this difference. He said that the answer was sampling error. The army had very large samples (often 10,000
or more), so there was little sampling error in the results, while in industry, the Ns were small—often 50, 70, or
100—and there was a lot of sampling error.
I completely forgot about that until I faced the problem of the “bouncing validity” at OPM. I had been indoctrinated with the idea of situational specificity of validity, but when I was looking at all the variability in validity, I remembered Brogden’s comment and decided to see if I could figure out a way to quantify the amount of sampling error in a distribution of validity coefficients—that is, a way to calculate the amount of variance expected from sampling error and then subtract that amount from the observed variance of the validity coefficients. I found that most of the variance—70 to 80%—was sampling error. After subtracting out sampling error variance and applying the usual corrections for measurement error in the job performance measures and for range restriction on test scores, the typical distribution of validity had a substantial mean of about .50 and an SD of around .10 to .15. This meant that virtually all of the values were in the positive range and therefore that validity was generalizable across situations. When I called Jack Hunter about my results, he was positive
and enthusiastic, which was reassuring to me because I was not sure that I had not made some mistake. Together,
we then began refining these procedures into general meta-analysis methods to be used in personnel selection. In
late 1975, we wrote this work up but did not immediately submit it for publication because we wanted to enter it into a research contest sponsored by the I-O psychology division of the American Psychological Association (APA,
Division 14). One requirement was that entries could not already be published. We won that award, but as a result,
our first meta-analysis article was not published until 1977 (Schmidt and Hunter, 1977). Gene Glass published his
first article on meta-analysis in 1976. We were not aware of Glass’ work at that time, but we later realized that had
we not delayed publication for the research award, we could have tied Glass for the first published application of
meta-analysis.
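To make the subtraction step concrete, here is a minimal numerical sketch of the bare-bones logic described above. The validity coefficients and sample sizes are hypothetical, and the sketch stops short of the corrections for measurement error and range restriction; the expected sampling-error variance formula for correlations, (1 − r̄²)²/(N − 1), is the standard one used in the bare-bones step.

```python
# Minimal "bare-bones" sketch of the subtraction step described above.
# The validity coefficients and sample sizes are made up for illustration;
# the expected sampling-error variance for correlations, (1 - rbar^2)^2 / (N - 1),
# is the standard formula used in the bare-bones step.
import numpy as np

r = np.array([0.10, 0.36, 0.08, 0.48, 0.28, 0.21, 0.38, 0.09])  # observed validities
n = np.array([68, 54, 80, 45, 120, 75, 60, 90])                 # study sample sizes

r_bar = np.average(r, weights=n)                    # weighted mean observed validity
var_obs = np.average((r - r_bar) ** 2, weights=n)   # observed variance of r

# Variance expected from sampling error alone, averaged across studies
var_e = np.average((1 - r_bar ** 2) ** 2 / (n - 1), weights=n)

var_res = max(var_obs - var_e, 0.0)                 # residual variance estimate
print(f"mean observed r = {r_bar:.2f}")
print(f"share of variance due to sampling error = {var_e / var_obs:.0%}")  # ~70% here
print(f"residual SD before artifact corrections = {var_res ** 0.5:.3f}")
```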
At about that time, I was asked to be the OPM liaison to a National Research Council committee on selection
testing issues. Lee J. Cronbach was on the committee, and when I showed him our results, he told us about the
work of Gene Glass. He also stated that we should expand application of our meta-analysis methods beyond
personnel selection to psychological literatures in general, and to that end, we should write a book describing
our methods. That book was published in 1982 (Hunter et al., 1982). At this time, we also adopted Glass’ term “meta-analysis” to describe these procedures; we had previously used only the term “validity generalization.” However, again, Glass got there first. The Glass, McGaw, and Smith meta-analysis book came out in 1981. (As an aside, a couple of years later, we had Barry McGaw, the middle author, give a talk at OPM. I remember that Barry made the point that he “had come between Gene Glass and his wife.” At that time, the third author, Mary Lee Smith, was Glass’ wife.)
Question 2, WS: Did you think it would take on such a major role in science from the start or was it a surprise to
you how quickly it developed?
Answer, FS: We initially thought of our meta-analysis methods as just a solution to the problem of validity
generalization. However, early on, Lee J. Cronbach told us that we should be thinking of wider applications,
and we realized that the methods had many potential applications in other areas of I-O psychology and in many
areas of general psychology. However, we did not at that time realize how widely the methods would eventually
be applied or that they would come to be used in areas beyond psychology. By the mid-1980s, however, this had
become apparent. The methods were being applied widely in a variety of different areas in I-O psychology,
management research, human resources, marketing, information sciences, and other areas. An overview of the
impact in I-O psychology up to 2010 is presented in DeGeest and Schmidt (2011). Ironically, the one area in which
there was strong resistance was personnel selection, the area in which the method had originated. It appeared that the method was not only accepted but also enthusiastically embraced in research on all relationships except the relationship between hiring procedures and job performance. I discuss this anomaly further below.
Question 3, WS: What were the obstacles that you encountered?
Answer, FS: The one big obstacle was resistance in the area of personnel selection, both from practitioners and
from the government agencies responsible for administration of the 1964 Civil Rights Act. Many I-O practitioners
made a good living conducting validity studies for employers. The implication of our meta-analytic validity
generalization studies was that these expensive situationally specific validity studies were not necessary. Our
findings had shown that validity could safely be generalized without such studies. The so-called enforcement
agencies—the Equal Employment Opportunity Commission, the Office of Federal Contract Compliance, and the
Civil Rights Division of the Department of Justice—also resisted these new findings, apparently on grounds that
they made it too easy for employers to demonstrate validity for selection methods on which minorities tended
to do less well than others (despite the large body of research showing that these measures predict performance
just as accurately for minorities as for others). To this day, these agencies have refused to revise and update the
government Uniform Guidelines on Employee Selection Procedures (UGLs) that were issued in 1978, before the
newer research findings were available, despite appeals from the Society for Industrial and Organizational
Psychology (SIOP) and other groups stating that revision and updating were sorely needed. The UGLs are not only
out of date but also inconsistent with current professional standards. This is a much longer-term example of what
we have today with the global warming deniers and evolution deniers: If you do not like the research evidence,
just pretend that it does not exist. Some I-O practitioners have been happy with the agencies’position because
it meant that they could continue to get paid for conducting unneeded validity studies.
Although the UGLs have not been updated, professional standards have been. The Standards for Educational and Psychological Testing, published by the American Educational Research Association (AERA), the APA, and the National Council on
Measurement in Education, incorporated new meta-analytic findings on validity starting in the 1990s and
continuing in the most recent edition (AERA et al., 2014). The same is true for the Principles for Employee Selection,
published by SIOP (Society for Industrial and Organizational Psychology, 2003). There had initially been some
resistance from professional societies to meta-analytic findings, but this changed as the supporting findings
increased. In addition to the cumulative impact of the increasing number of meta-analytic validity studies being
published, three other factors contributed to eventual acceptance. First, the illogic of a situation in which our
meta-analysis methods were embraced and lauded for application in all areas of research except the one area of
personnel selection became apparent. People saw that this did not make sense. The second big factor was the
publication of the article “Forty Questions about Validity Generalization and Meta-Analysis” (Schmidt et al.,
1985). Jack Hunter and I, along with Hannah Hirsh (now Rothstein), Kenneth Pearlman, Jim Caplan, Michael
McDaniel, and others at OPM who were involved in this work, had collected a long list of criticisms of meta-
analysis that we had heard expressed and had written responses to each [as an example, one criticism contended that sampling error was only a hypothesis, not a fact (!)]. Milton Hakel, then the editor of Personnel Psychology, a
top-tier journal, invited us to publish these 40 questions and answers. He also invited a group of eminent
psychologists to respond to our answers, and we to their responses. The result was the longest article that has
ever appeared in that journal—over 100 pages. This article apparently successfully addressed most or all the
reservations that people had about validity generalization and meta-analysis because the level of opposition
dropped dramatically after that.
The last turnaround factor occurred in 1994, when the American Psychological Association awarded Jack
Hunter and me the Distinguished Scientific Contribution Award for our meta-analytic work. This is probably the
most prestigious research award in psychology (and far outweighs the award that we obtained from SIOP for this
work the following year). After that, there was really no significant opposition. This process of acceptance took
nearly 20 years.
Question 4, WS: What did you see as the significant events or ideas that shaped your work in meta-analysis?
Answer, FS: Many of the events and ideas that shaped our work in meta-analysis are described in answers to
previous questions. However, there are a couple of additional ones. The first shaping event that I would list here
was the fact that Jack Hunter and I were trained in measurement. Many people who became involved in meta-
analysis were PhDs in statistics but had no training in measurement. Jack Hunter and I were psychological
researchers and so were trained not only in statistics but also in psychological measurement. The principles
and methods of measurement play a critical role in psychological research (actually in all research). It is essential to understand measurement error and other research artifacts such as range restriction and dichotomization of
continuous measures and to be able to correct for the (often large) biases that they create. In personnel selection,
the area in which we started our work, it was accepted that in primary studies, one should correct observed
validity values for measurement error in the job performance measure and for range restriction on the test or
other selection method. Moreover, it was apparent that in theoretical research, correction should also be made
for measurement error in the independent variable measure because in that type of research, it is construct-level
relationships that are of scientific interest [as noted by Rubin (1990)]. We incorporated these corrections for bias
into our meta-analysis methods, improving and refining them over time and evaluating the accuracy of each
change via computer simulation studies. We evolved two general meta-analytic approaches: methods for
correcting each correlation or d value individually prior to meta-analysis and methods for making these
corrections in meta-analysis based on distributions of artifact values (reliability values and range restriction ratios).
These methods are described in our various meta-analysis books, most recently in Schmidt and Hunter (2014). A
little after our initial work, similar methods were developed by Callender and Osburn (1980) and Raju and his
associates (Raju and Burke, 1983; Raju et al., 1991). These methods produced results virtually identical to our
methods, which helped to buttress our findings and conclusions.
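For readers unfamiliar with the individual-correction approach, the following sketch shows the two most common corrections applied to a single observed coefficient: classical disattenuation for criterion unreliability and Thorndike's Case II correction for direct range restriction. The numbers are hypothetical, and the order in which the corrections are combined, as well as the treatment of indirect range restriction, involves details covered in the books and omitted here.

```python
# Illustrative corrections applied to a single observed validity coefficient:
# classical disattenuation for criterion unreliability and Thorndike's Case II
# correction for direct range restriction. The numbers are hypothetical, and the
# order in which corrections are combined (and the treatment of indirect range
# restriction) involves details omitted here.
from math import sqrt

r_obs = 0.25   # observed test-criterion correlation
r_yy = 0.52    # reliability of the job performance (criterion) measure
u_x = 0.65     # ratio of restricted to unrestricted SD of test scores

# Correction for measurement error in the criterion (disattenuation)
r_disattenuated = r_obs / sqrt(r_yy)

# Case II correction for direct range restriction on the predictor
U = 1 / u_x
r_unrestricted = (U * r_obs) / sqrt((U ** 2 - 1) * r_obs ** 2 + 1)

print(f"corrected for criterion unreliability only: {r_disattenuated:.2f}")  # ~0.35
print(f"corrected for range restriction only:       {r_unrestricted:.2f}")   # ~0.37
```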
Measurement error is in many ways the most important artifact because it is the only one (beyond sampling
error) that is always present in all data. My colleagues and I believe that the arc in research methods always bends
in the direction of increased accuracy. For this reason, we believe that other approaches to meta-analysis will
eventually come to include corrections for the biasing effects of measurement error (and perhaps other research
artifacts too). Hedges (2009) and Matt and Cook (2009) have acknowledged the need to correct for measurement
error.
The second shaping influence here was the decision to use a subtractive model of meta-analysis. We faced the
need in personnel selection to estimate the distribution of true validity population parameters. This led to a model
in which variance as a result of sampling error and other artifacts is subtracted from the observed variance of
correlations or d values, leaving an estimate of the variability of the population parameters, which was required
for validity generalization purposes. One consequence of this was that all of our methods were random-effects
models from the beginning. We never had any fixed-effects (FE) models. This turned out to be important because
the FE models are now rejected as unrealistic by most meta-analysts because of their assumption that all studies
in the meta-analysis are estimating exactly the same population parameter (cf., Schmidt et al., 2009).
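In notation, the bare-bones core of this subtractive model (sampling error only; the full model also subtracts variance attributable to the other artifacts) can be written as

$$\hat{\sigma}^2_{\rho} = S^2_{r} - \hat{\sigma}^2_{e}, \qquad \hat{\sigma}^2_{e} = \frac{\sum_i N_i \,(1-\bar{r}^2)^2/(N_i-1)}{\sum_i N_i},$$

where $S^2_r$ is the sample-size-weighted variance of the observed correlations, $\hat{\sigma}^2_e$ is the variance expected from sampling error (shown here in one common sample-size-weighted form), and $\hat{\sigma}^2_{\rho}$ estimates the variance of the population correlations prior to the corrections for measurement error and range restriction.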
Question 5, WS: Do you think you got anything wrong? What would you do differently now with 20–20
hindsight? What would you keep the same?
Answer, FS: There were at least a couple of things that we did wrong. First, we had an error in our first
published application of meta-analysis in 1977. John Callender and Bart Osburn informed us that their simulation
studies showed that there was an error, but they did not know what was causing it. This error led to
underestimation of the SD of the population parameters (i.e., true validity values). We found the cause of this error
and corrected it in a subsequent application of our meta-analysis method (Schmidt et al., 1979). When we
corrected the means of the validity distributions for measurement error and range restriction, those corrections
increased the SDs. We had previously neglected to increase the SDs appropriately.
The other thing that we did wrong was underestimating the potential seriousness of publication bias and other
forms of unrepresentativeness in study sets. This occurred because we were able to show by a variety of analyses
that there was strong evidence that this was not a problem in the personnel selection literature (Hunter and
Schmidt, 1990, chapter 4, 2004, chapter 4). The error was the assumption that this was probably also the case
in most or all other areas. There is now a lot of evidence that publication bias and other availability issues are a
serious problem in some areas, especially in biomedical areas and in social psychology lab experiments. In the
most recent edition of our meta-analysis book, these problems are explored in detail (Schmidt and Hunter,
2014, chapter 13).
Some have argued that we made a mistake in not more strongly emphasizing the detection of moderators in
meta-analysis. In validity generalization research, one need only show that all or nearly all validity values are above
some practically useful value. Variation that exists above this point does not have to be addressed for purposes of
practical application. In any event, that variation is usually too small to be consistent with moderators of
any size. However, for applications in other research areas, we emphasized moderator detection via subgrouping
of studies. The other alternative is meta-regression, which is widely abused in published meta-analyses today.
Statistical power is typically quite low, and capitalization on sampling error is substantial (especially that resulting from ex post facto selection of moderator candidates). The result is often the deadly combination of a high Type I error
rate along with a high Type II error rate. Five additional problems in the use of meta-regression are discussed in
the current edition of our meta-analysis book (Schmidt and Hunter, 2014, chapter 9). The accurate detection of
moderators in primary research and in meta-analysis is much more difficult than many have been led to believe
(Schmidt and Hunter, 1978).
Question 6, WS: How do you view meta-analysis today from your vantage point in history?
Answer, FS: Looking back today, I view meta-analysis as an inevitable development. By the 1970s, research
literatures across almost all areas were becoming unmanageable in size. Frustration with the task of making sense
of large conflicting literatures was growing. The time was just ripe; a new tool was needed. I think that this
explains why meta-analysis was invented independently several times in different areas: by Chalmers in
biomedical research in the USA, by Peto in biomedical research in the UK, by Rosenthal in social psychology, by
Glass in education, and by Jack Hunter and me in I-O psychology. Also, at about this same time, Hedges and Olkin
published major advances in meta-analytic methods. There are other examples of nearly simultaneous
developments in science—one example being the independent invention of calculus by both Newton and Leibniz
at about the same time.
I think it was clear that by the 1970s, the alternative to meta-analysis was epistemological despair—not an
attractive alternative given the scientific ideal of cumulative knowledge. Moreover, such despair was becoming
common and even mainstream. Even as eminent a methodologist as Lee J. Cronbach appears to have succumbed
to the temptation of epistemological despair. Cronbach (1975) argued that social and psychological reality
changes so fast and is so ephemeral that cumulative scientific knowledge is impossible in psychology and the
social sciences. In social psychology, Gergen (1982) was a well-known proponent of epistemological despair.
Meta-analysis lifted this veil of despair. It showed that cumulative knowledge was in fact possible.
Question 7, WS: What colleagues or events influenced you to elevate the field of meta-analysis? How did those
colleagues or events influence you?
Answer, FS: As I indicated earlier, the encouragement from Lee J. Cronbach was important. However, we also
had support from other equally eminent psychologists. Anne Anastasi praised our methods and findings and
featured them in the successive editions of her widely used textbook, Psychological Testing. Lloyd Humphreys at
the University of Illinois was very supportive, as were Paul Meehl, Marvin Dunnette, and Tom Bouchard at the
University of Minnesota. I believe that these eminent figures were instrumental in the APA Distinguished Scientific
Contributions award that Jack Hunter and I received in 1994. I guess you could say that we were supported from
the commanding heights by some of the top brass in psychology, while at the same time being criticized and
rejected by many of the troops on the ground. Many personnel specialists and I-O practitioners were hostile to
our work because it refuted the 80-year-old belief in situational specificity of test validity and obviated the need
for the kinds of validity studies that they conducted.
Our meta-analysis methods also received great support from our colleagues at OPM, who promoted the
methods and who conducted and published many studies using these methods. These include Hannah Rothstein
(then Hannah Hirsh), Michael McDaniel, Kenneth Pearlman, Marvin Trattner, Lois Northrup, Ilene Gast, Murray
Mack, Deborah Whetzel, Guy Shane, Brian Stern at the ARI, and others. Later, at the University of Iowa, this list
included Deniz Ones, Vish Viswesvaran, Kenneth Law, Crissie Fry, Michael Judiesch, Kuh Yoon, Huy Le, Kevin
Carlson, Marc Orlitzky, In-Sue Oh, Jon Shaffer, Ben Postlethwaite, and others. The death of Jack Hunter in 2002
was a terrible loss. As befitted his eminent contributions, his obituary was published in the American Psychologist
(Schmidt, 2003).
Question 8, WS: How has your teaching of meta-analysis evolved over the years?
What would you like to say about your former students and their contributions to the field of meta-analysis?
Answer, FS: One change in my teaching over time was the increase in emphasis that I placed on applications of
meta-analysis in diverse areas—beyond just personnel selection. In fact, these other areas came to dominate my
teaching. This extended into meta-analysis of experimental studies. One reason for this change was the greater
acceptance of the methods in those other areas. Another was the fact that the PhD students in my meta-analysis
course came from a wide variety of areas—marketing, clinical psychology, engineering, sociology, human
resources, organizational behavior, nursing, and education. Almost all of these students submitted their required
class meta-analyses for publication, and almost all of these were published. This was also true of those students
whose dissertations were based on meta-analysis. Another change in my teaching is that the range of
methodological topics that I covered in my PhD meta-analysis course increased. One example is the introduction
of coverage of second-order sampling error. Meta-analysis greatly reduces sampling error in comparison to
individual studies but does not completely eliminate it. The remaining sampling error is second-order sampling
error. This emphasis eventually led to a new method of conducting second-order meta-analysis (Schmidt and
Oh, 2013), making possible the meta-analysis of meta-analyses.
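The general logic parallels the first-order case: the observed variance of the mean effect sizes across a set of independent meta-analyses is reduced by the variance expected from second-order sampling error, that is, the average squared standard error of each meta-analytic mean. The sketch below illustrates only this general idea with hypothetical inputs; it is not the exact estimator developed in Schmidt and Oh (2013).

```python
# Sketch of the general logic of second-order meta-analysis: the observed variance
# of mean effect sizes across several independent first-order meta-analyses is
# reduced by the expected second-order sampling error variance, i.e., the average
# squared standard error of each meta-analytic mean. This illustrates the idea only;
# it is not the exact Schmidt-Oh (2013) estimator, and the inputs are hypothetical.
import numpy as np

mean_r = np.array([0.44, 0.51, 0.48, 0.55])   # mean correlations from four meta-analyses
se_mean = np.array([0.03, 0.04, 0.05, 0.03])  # standard errors of those means

grand_mean = mean_r.mean()
var_obs = mean_r.var()                  # observed variance of the meta-analytic means
var_2nd = (se_mean ** 2).mean()         # expected second-order sampling error variance

var_residual = max(var_obs - var_2nd, 0.0)
print(f"grand mean = {grand_mean:.3f}")
print(f"SD across meta-analyses after removing second-order error = {var_residual ** 0.5:.3f}")
```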
The answer to the second part of this question can be found in my responses to questions 7 and 9.
Question 9, WS: What is your favorite (or among your favorites) application of meta-analysis and why?
Answer, FS: This is a difficult question because there have been so many excellent applications of our meta-
analysis methods. I will just select some of the larger ones that have had a big impact. The first is the large
meta-analysis that Jack Hunter performed for the US Employment Service in the US Department of Labor (Hunter,
1983). This included a huge database (515 studies) on the validity of the general intelligence measure of the General Aptitude Test Battery (GATB), and it demonstrated generalizable validity for essentially all jobs in the US economy, with
the magnitude of validity depending on the complexity of the job family. The main findings were later presented
in an article in the Psychological Bulletin (Hunter and Hunter, 1984).
Another is the Pearlman et al. (1980) meta-analysis of selection test validity values for clerical workers. This
study included over 650 validity studies focusing on a variety of test types and spanning a time period from
1920 to 1979. The largest meta-analysis in this study included 882 validity coefficients. This study is still a widely
cited classic. A recently published meta-analysis of studies conducted after the 1979 cutoff of Pearlman et al.
found essentially identical results (Whetzel et al., 2011), indicating stability of validity values over a period of
90 years despite the many changes in clerical duties and tasks that had occurred.
Still another is the McDaniel et al. (1994) meta-analysis of employment interview validity values, also a citation
classic. Next, there is the Ones et al. (1993) meta-analysis of the validity of integrity tests used in hiring. This study
is probably the largest of all in terms of the sheer amount of data that it included. It has also been cited many
times.
Then, there is the Orlitzky et al. (2003) meta-analysis of the relation between corporate social responsibility and
corporate financial outcomes. This one not only is a citation classic but also has been reprinted in three books, and
it won the 2004 Moskowitz Award for Finance Research.
Finally, there is the Harter et al. (2002) meta-analysis of the relation between average level of employee job
engagement and the business outcomes of revenue, profit, customer satisfaction, and low employee turnover.
This one has also been highly cited.
All of these studies required a great deal of effort over an extended period of time. Meta-analysis does not
provide an easy road to fame and fortune.
Question 10, WS: What is your favorite (or among your favorites) methodology in meta-analysis and why?
Answer, FS: As you might expect, my favorite methodologies are the methods presented in the four books on
meta-analysis that I coauthored: Hunter et al. (1982), Hunter and Schmidt (1990, 2004), and Schmidt and Hunter
(2014). Jack Hunter took the lead in the 1982 and 1990 books; they could not have been written without his
contributions. A major advantage and unique feature of the methods in these books is that they simultaneously
take into account both sampling error and measurement error, the two distorting factors that are present in all
data sets and in all studies. They also allow for the biasing effects of other research artifacts such as range
restriction and dichotomization of continuous measures to be corrected when they are present. Other approaches
to meta-analysis do not do this.
Question 11, WS: Did you see your work on meta-analysis mainly as a statistical exercise or mainly as a review of
the evidence?
Answer, FS: Jack Hunter and I (and our colleagues too) did not see meta-analysis as a mere statistical exercise.
We saw it as a path to improved epistemology in research—a successful and superior way to attain cumulative
knowledge and establish general scientific principles in spite of the variability in individual study findings and
in spite of the confusion created by the nearly universal reliance on statistical significance testing in the analysis
of research data. This was a continuation of the view that inspired our earlier pre-meta-analytic work.
The basic question was always what do data really mean and how can we extract reliable knowledge from data
(cf., Schmidt, 1992, 2010). This included our work promoting the use of confidence intervals over significance tests
(Schmidt, 1996; Schmidt and Hunter, 1997), the detection and calibration of moderator variables (Schmidt and
Hunter, 1978), the problem of instability of regression weights in data analysis (Schmidt, 1971, 1972), the chance
frequency of racial differences in test validity (Schmidt et al., 1973; Hunter et al., 1979), and other work in this vein.
So, our view of meta-analysis was that it was a continuation of this epistemological quest.
Question 12, WS: Can you reflect on the relative roles of the broader field of systematic review versus meta-
analysis now compared with when you started?
Answer, FS: This distinction appears to be a matter of terminology. Some use the term meta-analysis to
designate only the quantitative procedures of data analysis in meta-analysis, and they view the term systematic
review as a broader term that includes the search for studies, the coding of studies, and the interpretation and
presentation of the results. However, in my field, the term meta-analysis includes all of these things, not just
the quantitative data analysis procedures. My impression is that this is the usage favored today by most people
concerned with meta-analysis. It is possible that the term systematic review originated in biomedical research,
and not in other areas that make use of meta-analysis. It is my impression that in the biomedical area, there
existed a concept of systematic review before meta-analysis developed, and when meta-analysis methods came
along, they were added as the quantitative component of these systematic reviews.
References
American Educational Research Association, American Psychological Association, National Council on
Measurement in Education. 2014. Standards for educational and psychological testing. Author: Washington, DC.
Callender JC, Osburn HG. 1980. Development and test of a new model for validity generalization. Journal of Applied
Psychology 65: 543–558.
Cronbach LJ. 1975. Beyond the two disciplines of scientific psychology revisited. American Psychologist 30:
116–127.
DeGeest D, Schmidt FL. 2011. The impact of research synthesis methods on Industrial/Organizational Psychology:
The road from pessimism to optimism about cumulative knowledge. Research Synthesis Methods 1: 185–197.
Gergen KJ. 1982. Toward transformation in social knowledge. Springer-Verlag: New York.
Glass GV, McGaw B, Smith ML. 1981. Meta-analysis in social research. Sage: Beverly Hills, CA.
Harter JK, Schmidt FL, Hayes TL. 2002. Business unit level relationships between employee satisfaction/
engagement and business outcomes: A meta-analysis. Journal of Applied Psychology 87: 268–279.
Hedges LV. 2009. Statistical considerations. In Cooper H, Hedges LV, Valentine JC (eds). Handbook of research
synthesis and meta-analysis, 2nd edn. Russell Sage: New York, 37–48.
Hunter JE. 1983. Test validation for 12,000 jobs: An application of job classification and validity generalization to
the General Aptitude Test Battery (GATB). Test Research Report No. 45, U.S. Department of Labor, U.S.
Employment Service, Washington, DC.
Hunter JE, Hunter RF. 1984. Validity and utility of alternative predictors of job performance. Psychological Bulletin
96: 72–98.
Hunter JE, Schmidt FL. 1990. Methods of meta-analysis: Correcting error and bias in research findings. Sage:
Thousand Oaks, CA.
Hunter JE, Schmidt FL. 2004. Methods of meta-analysis: Correcting error and bias in research findings, 2nd edn.
Sage: Thousand Oaks, CA.
Hunter JE, Schmidt FL, Hunter RF. 1979. Differential validity of employment tests by race: A comprehensive review
and analysis. Psychological Bulletin 86: 721–735.
Hunter JE, Schmidt FL, Jackson GB. 1982. Meta-analysis: Cumulating research findings across studies. Sage: Beverly
Hills, CA.
Matt GE, Cook TD. 2009. Threats to the validity of generalized inferences. In Cooper H, Hedges LV, Valentine JC
(eds). Handbook of research synthesis and meta-analysis, 2nd edn. Russell Sage: New York, 537–560.
McDaniel MA, Whetzel DL, Schmidt FL, Maurer SD. 1994. The validity of employment interviews: A comprehensive
review and meta-analysis. Journal of Applied Psychology 79: 599–616.
Ones DS, Viswesvaran C, Schmidt FL. 1993. Comprehensive meta-analysis of integrity test validities: Findings and
implications for personnel selection and theories of job performance. Journal of Applied Psychology Monograph
78: 679–703.
Orlitzky M, Schmidt FL, Rynes SL. 2003. Corporate social and financial performance: A meta-analysis.
Organization Studies 24: 403–441.
Pearlman K, Schmidt FL, Hunter JE. 1980. Validity generalization results for tests used to predict job proficiency
and training criteria in clerical occupations. Journal of Applied Psychology 65: 373–407.
Raju NS, Burke MJ. 1983. Two new procedures for studying validity generalization. Journal of Applied Psychology
68: 382–395.
Raju NS, Burke MJ, Normand J, Langlois GM. 1991. A new meta-analysis approach. Journal of Applied Psychology 76:
432–446.
Rubin D. 1990. A new perspective on meta-analysis. In Wachter KW, Straf ML (eds). The future of meta-analysis.
Russell Sage: New York, 155–166.
Schmidt FL. 1971. The relative efficiency of regression and simple unit predictor weights in applied differential
psychology. Educational and Psychological Measurement 31: 699–714.
Schmidt FL. 1972. The reliability of differences between linear regression weights in applied differential
psychology. Educational and Psychological Measurement 32: 879–886.
Schmidt FL. 1992. What do data really mean? Research findings, meta-analysis, and cumulative knowledge in
psychology. American Psychologist 47: 1173–1181.
Schmidt FL. 1996. Statistical significance testing and cumulative knowledge in psychology: Implications for the
training of researchers. Psychological Methods 1: 115–129.
Schmidt FL. 2003. John E. Hunter, 1939–2002. American Psychologist 58: 238.
Schmidt FL. 2010. Detecting and correcting the lies that data tell. Perspectives on Psychological Science 5: 233–242.
Schmidt FL, Hunter JE. 1977. Development of a general solution to the problem of validity generalization. Journal
of Applied Psychology 62: 529–540.
Schmidt FL, Hunter JE. 1978. Moderator research and the law of small numbers. Personnel Psychology 31: 215–232.
Schmidt FL, Hunter JE. 1997. Eight common but false objections to the discontinuation of significance testing in
the analysis of research data. In Harlow L, Muliak S, Steiger J (eds). What if there were no significance tests?
Lawrence Erlbaum: Mahwah, NJ, 37–64.
Schmidt FL, Hunter JE. 2014. Methods of meta-analysis: Correcting error and bias in research findings, 3rd edn.
Sage: Thousand Oaks, CA.
Schmidt FL, Oh I-S. 2013. Methods for second order meta-analysis and illustrative applications. Organizational
Behavior and Human Decision Processes 121: 204–218.
Schmidt FL, Berner JG, Hunter JE. 1973. Racial differences in validity of employment tests: Reality or illusion?
Journal of Applied Psychology 58: 5–9.
Schmidt FL, Hunter JE, Pearlman K, Hirsh HR. 1985. Forty questions about validity generalization and meta-analysis.
Personnel Psychology 38: 697–798.
Schmidt FL, Hunter JE, Urry VE. 1976. Statistical power in criterion-related validation studies. Journal of Applied
Psychology 61: 473–485.
Schmidt FL, Hunter JE, Pearlman K, Shane GS. 1979. Further tests of the Schmidt-Hunter Bayesian validity
generalization model. Personnel Psychology 32: 257–281.
Schmidt FL, Oh I-S, Hayes TL. 2009. Fixed versus random models in meta-analysis: Model properties and
comparison of differences in results. British Journal of Mathematical and Statistical Psychology 62: 97–128.
Society for Industrial and Organizational Psychology. 2003. Principles for the validation and use of personnel
selection procedures, 4th edn. Author: Bowling Green, OH.
Whetzel DL, McCloy RA, Hooper A, Russell TL, Waters SD, Campbell WJ, Ramos RA. 2011. Meta-analysis of clerical
performance predictors: Still stable after all these years. International Journal of Selection and Assessment 19:
41–50.