History and development of the Schmidt–Hunter meta-analysis methods
Frank L. Schmidt*
In this article, I provide answers to the questions posed by Will Shadish about the history and development of the Schmidt–Hunter methods of meta-analysis. In the 1970s, I headed a research program on personnel selection at the US Office of Personnel Management (OPM). After our research showed that validity studies have low statistical power, OPM felt a need for a better way to demonstrate test validity, especially in light of court cases challenging selection methods. In response, we created our method of meta-analysis (initially called validity generalization). Results showed that most of the variability of validity estimates from study to study was because of sampling error and other research artifacts such as variations in range restriction and measurement error. Corrections for these artifacts in our research and in replications by others showed that the predictive validity of most tests was high and generalizable. This conclusion challenged long-standing beliefs and so provoked resistance, which over time was overcome. The 1982 book that we published extending these methods to research areas beyond personnel selection was positively received and was followed by expanded books in 1990, 2004, and 2014. Today, these methods are being applied in a wide variety of areas. Copyright © 2015 John Wiley & Sons, Ltd.
Keywords: meta-analysis; measurement error; history of meta-analysis; sampling error
Question 1, WS: How did the idea first occur to you? Did you think of it as meta-analysis (because Gene Glass had coined the term)? Did you think of it under some other rubric?
Answer, FS: That is a long story. After obtaining my industrial-organizational psychology (I-O) PhD from Purdue in 1970, my first job was in the Michigan State University psychology department, an excellent and stimulating department. However, my degree was in an applied area, and I felt a little like a fraud because I was teaching what I had little experience applying myself. I obtained tenure early but still left in 1974 for the US Office of Personnel Management (OPM) in Washington, hoping to obtain real-world experience. OPM is responsible for the methods used to hire people for the Federal workforce, and my training was in personnel selection, psychological measurement, and statistics. This was a time of turmoil in personnel selection because court challenges to hiring procedures under the 1964 Civil Rights Act were at an all-time high (One OPM hiring test went all the way to the Supreme Court, where it was upheld.). As a result, it was very important for OPM to have research evidence to defend its selection tests in court. I was asked to conduct and direct such research.
Legal defense of hiring tests consisted mostly of criterion-related validity studies (Criterion-related validity is the correlation between scores on a hiring procedure and later performance on the job.). One of the first publications from our OPM research effort showed that the average statistical power of such studies was only about .50 (Schmidt et al., 1976). This meant that it was very risky for an employer to conduct such a study because if the validity was not statistically significant, the employer's own study could be used against him or her in court. Much larger sample sizes were needed to overcome this problem, and other methods of showing test validity (e.g., content validity) were needed.
Since about 1920, the belief in I-O psychology had been that test validity was situationally specific, meaning that separate validity studies had to be conducted in each organization and for each job. Two pieces of evidence supported this belief: the fact that significant validity was found in only about half of all studies and the fact that the actual magnitude of validity estimates (regardless of significance levels) varied widely across studies, even when the jobs in question appeared to be similar or identical. Our statistical power article showed why the first of these was not valid evidence for the situational specificity theory. However, we still had to address the second evidential basis for the theory. This challenge led to the development of our meta-analysis methods.
Department of Management and Organizations, Tippie College of Business, University of Iowa, Iowa City, IA 52242, USA
*Correspondence to: Frank Schmidt, Department of Management and Organizations, Tippie College of Business, University of Iowa, Iowa City, IA 52242, USA.
E-mail: frank-schmidt@uiowa.edu
Copyright © 2015 John Wiley & Sons, Ltd. Res. Syn. Meth. 2015
Special Issue Paper
Received 27 October 2014, Revised 10 November 2014, Accepted 20 November 2014. Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/jrsm.1134
In addition to my OPM job, I had an appointment at George Washington University (GWU). One day in 1975, sitting in my GWU office and pondering several highly variable distributions of validity coefficients, each from a different job, I remembered an experience that I had had in graduate school. As a PhD student at Purdue, I took courses from Joseph Tiffin and Hubert Brogden. Tiffin had performed, and was then doing, validity studies in private industry, and he reported in class that in his experience, validity findings were highly variable and seemed to depend on peculiarities of the local situation. This was also the position taken in textbooks and articles. Brogden had been the Technical Director of the US Army Research Institute (ARI). The validity results that he reported in class were quite stable and were very similar from study to study. So, I asked Brogden why there was this difference. He said that the answer was sampling error. The army had very large samples (often 10,000 or more), so there was little sampling error in the results, while in industry, the Ns were small, often 50, 70, or 100, and there was a lot of sampling error.
I completely forgot about that until I faced the problem of the "bouncing validity" at OPM. I had been indoctrinated with the idea of situational specificity of validity, but when I was looking at all the variability in validity, I remembered Brogden's comment and decided to see if I could figure out a way to quantify the amount of sampling error in a distribution of validity, that is, a way to calculate the amount of variance expected from sampling error and then subtract that amount from the observed variance of the validity coefficients. I found that most of the variance, 70 to 80%, was sampling error. After subtracting out sampling error variance and applying the usual corrections for measurement error in the job performance measures and for range restriction on test scores, the typical distribution of validity had a substantial mean of about .50 or so and an SD of around .10 to .15. This meant that virtually all of the values were in the positive range, and therefore, it demonstrated that validity values were generalizable across situations. When I called Jack Hunter about my results, he was positive and enthusiastic, which was reassuring to me because I was not sure that I had not made some mistake. Together, we then began refining these procedures into general meta-analysis methods to be used in personnel selection. In late 1975, we wrote up this work but did not immediately submit it for publication because we wanted to enter it into a research contest sponsored by the I-O psychology division of the American Psychological Association (APA, Division 14). One requirement was that entries could not already be published. We won that award, but as a result, our first meta-analysis article was not published until 1977 (Schmidt and Hunter, 1977). Gene Glass published his first article on meta-analysis in 1976. We were not aware of Glass's work at that time, but we later realized that had we not delayed publication for the research award, we could have tied Glass for the first published application of meta-analysis.
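The variance-subtraction step described above can be sketched in a few lines of code. This is a minimal illustration of the "bare-bones" idea only, not the full Schmidt–Hunter method (which adds the corrections for measurement error and range restriction); the function and variable names are mine, not from the article.

```python
# Bare-bones sketch: how much of the observed variance in validity
# coefficients is expected from sampling error alone? Illustrative only;
# the full Schmidt-Hunter method adds artifact corrections.

def bare_bones(rs, ns):
    """rs: observed validity coefficients; ns: their sample sizes."""
    total_n = sum(ns)
    # Sample-size-weighted mean correlation
    r_bar = sum(r * n for r, n in zip(rs, ns)) / total_n
    # Sample-size-weighted observed variance of the correlations
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    # Variance expected from sampling error for a correlation of size
    # r_bar at the average sample size: (1 - r_bar^2)^2 / (N_avg - 1)
    n_avg = total_n / len(ns)
    var_err = (1 - r_bar ** 2) ** 2 / (n_avg - 1)
    # Residual variance attributed to real variation across studies
    var_rho = max(var_obs - var_err, 0.0)
    pct_sampling_error = min(var_err / var_obs, 1.0) if var_obs > 0 else 1.0
    return r_bar, var_rho, pct_sampling_error

# Hypothetical example: five small-N studies with "bouncing" validities
r_bar, var_rho, pct = bare_bones([.10, .35, .22, .48, .15],
                                 [68, 50, 90, 70, 100])
```

With small-N studies like these, well over half of the observed variance turns out to be expected from sampling error alone, which is the pattern the article reports at OPM.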
At about that time, I was asked to be the OPM liaison to a National Research Council committee on selection testing issues. Lee J. Cronbach was on the committee, and when I showed him our results, he told us about the work of Gene Glass. He also stated that we should expand application of our meta-analysis methods beyond personnel selection to psychological literatures in general, and to that end, we should write a book describing our methods. That book was published in 1982 (Hunter et al., 1982). At this time, we also adopted Glass's term "meta-analysis" to describe these procedures; we had previously used only the term validity generalization. However, again, Glass got there first. The Glass, McGaw, and Smith meta-analysis book came out in 1981 (As an aside, a couple of years later, we had Barry McGaw, the middle author, give a talk at OPM. I remember that Barry made the point that he had "come between Gene Glass and his wife." At that time, the third author, Mary Lee Smith, was Glass's wife.).
Question 2, WS: Did you think it would take on such a major role in science from the start or was it a surprise to
you how quickly it developed?
Answer, FS: We initially thought of our meta-analysis methods as just a solution to the problem of validity
generalization. However, early on, Lee J. Cronbach told us that we should be thinking of wider applications,
and we realized that the methods had many potential applications in other areas of I-O psychology and in many
areas of general psychology. However, we did not at that time realize how widely the methods would eventually
be applied or that they would come to be used in areas beyond psychology. However, by the mid 1980s, this had
become apparent. The methods were being applied widely in a variety of different areas in I-O psychology,
management research, human resources, marketing, information sciences, and other areas. An overview of the
impact in I-O psychology up to 2010 is presented in DeGeest and Schmidt (2011). Ironically, the one area in which
there was strong resistance was in personnel selection, the area in which the method had originated. It appeared
that the method was not only accepted but also enthusiastically embraced in research on all relationships except
the relationship between hiring procedures and job performance. I discuss this anomaly further later.
Question 3, WS: What were the obstacles that you encountered?
Answer, FS: The one big obstacle was resistance in the area of personnel selection, both from practitioners and
from the government agencies responsible for administration of the 1964 Civil Rights Act. Many I-O practitioners
made a good living conducting validity studies for employers. The implication of our meta-analytic validity generalization studies was that these expensive situationally specific validity studies were not necessary. Our findings had shown that validity could safely be generalized without such studies. The so-called enforcement agencies (the Equal Employment Opportunity Commission, the Office of Federal Contract Compliance, and the Civil Rights Division of the Department of Justice) also resisted these new findings, apparently on grounds that they made it too easy for employers to demonstrate validity for selection methods on which minorities tended to do less well than others (despite the large body of research showing that these measures predict performance just as accurately for minorities as for others). To this day, these agencies have refused to revise and update the government Uniform Guidelines on Employee Selection Procedures (UGLs) that were issued in 1978, before the newer research findings were available, despite appeals from the Society for Industrial and Organizational Psychology (SIOP) and other groups stating that revision and updating were sorely needed. The UGLs are not only out of date but also inconsistent with current professional standards. This is a much longer-term example of what we have today with the global warming deniers and evolution deniers: If you do not like the research evidence, just pretend that it does not exist. Some I-O practitioners have been happy with the agencies' position because it meant that they could continue to get paid for conducting unneeded validity studies.
Although the UGLs have not been updated, professional standards have been. The Standards for Psychological Testing, published by the American Educational Research Association (AERA), the APA, and the National Council on Measurement in Education, incorporated new meta-analytic findings on validity starting in the 1990s and continuing in the most recent edition (AERA et al., 2014). The same is true for the Principles for Employee Selection, published by SIOP (Society for Industrial and Organizational Psychology, 2003). There had initially been some resistance from professional societies to meta-analytic findings, but this changed as the supporting findings increased. In addition to the cumulative impact of the increasing number of meta-analytic validity studies being published, three other factors contributed to eventual acceptance. First, the illogic of a situation in which our meta-analysis methods were embraced and lauded for application in all areas of research except the one area of personnel selection became apparent. People saw that this did not make sense. The second big factor was the publication of the article "Forty Questions about Validity Generalization and Meta-Analysis" (Schmidt et al., 1985). Jack Hunter and I, along with Hannah Hirsh (now Rothstein), Kenneth Pearlman, Jim Caplan, Michael McDaniel, and others at OPM who were involved in this work, had collected a long list of criticisms of meta-analysis that we had heard expressed and had written responses to each [As an example, one criticism contended that sampling error was only a hypothesis, not a fact (!).]. Milton Hakel, then the editor of Personnel Psychology, a top-tier journal, invited us to publish these 40 questions and answers. He also invited a group of eminent psychologists to respond to our answers, and we to their responses. The result was the longest article that has ever appeared in that journal, over 100 pages. This article apparently successfully addressed most or all of the reservations that people had about validity generalization and meta-analysis because the level of opposition dropped dramatically after that.
The last turnaround factor occurred in 1994, when the American Psychological Association awarded Jack Hunter and me the Distinguished Scientific Contribution Award for our meta-analytic work. This is probably the most prestigious research award in psychology (and far outweighs the award that we obtained from SIOP for this work the following year). After that, there was really no significant opposition. This process of acceptance took nearly 20 years.
Question 4, WS: What did you see as the significant events or ideas that shaped your work in meta-analysis?
Answer, FS: Many of the events and ideas that shaped our work in meta-analysis are described in answers to previous questions. However, there are a couple of additional ones. The first shaping event that I would list here was the fact that Jack Hunter and I were trained in measurement. Many people who became involved in meta-analysis were PhDs in statistics but had no training in measurement. Jack Hunter and I were psychological researchers and so were trained not only in statistics but also in psychological measurement. The principles and methods of measurement play a critical role in psychological research (actually in all research). It is critical to understand measurement error and other research artifacts such as range restriction and dichotomization of continuous measures and to be able to correct for the (often large) biases that they create. In personnel selection, the area in which we started our work, it was accepted that in primary studies, one should correct observed validity values for measurement error in the job performance measure and for range restriction on the test or other selection method. Moreover, it was apparent that in theoretical research, correction should also be made for measurement error in the independent variable measure because in that type of research, it is construct-level relationships that are of scientific interest [as noted by Rubin (1990)]. We incorporated these corrections for bias into our meta-analysis methods, improving and refining them over time and evaluating the accuracy of each change via computer simulation studies. We evolved two general meta-analytic approaches: methods for correcting each correlation or d value individually prior to meta-analysis and methods for making these corrections in meta-analysis based on distributions of artifact values (reliability values and range restriction ratios). These methods are described in our various meta-analysis books, most recently in Schmidt and Hunter (2014). A little after our initial work, similar methods were developed by Callender and Osburn (1980) and Raju and his
associates (Raju and Burke, 1983; Raju et al., 1991). These methods produced results virtually identical to our methods, which helped to buttress our findings and conclusions.
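The individual-correction approach mentioned above can be illustrated with the two classical psychometric formulas named in the text: the correction for attenuation due to criterion unreliability, and Thorndike's Case II correction for direct range restriction. This is a hedged sketch using standard textbook formulas; the order of corrections and which reliability estimate applies in a given data set are subtleties covered in the books, and the function names and example numbers here are mine.

```python
import math

def correct_for_attenuation(r, ryy):
    """Disattenuate an observed validity r for unreliability (ryy) in the
    criterion (job performance) measure: r / sqrt(ryy)."""
    return r / math.sqrt(ryy)

def correct_for_range_restriction(r, u):
    """Thorndike Case II correction for direct range restriction.
    u = SD of test scores in the applicant pool / SD in the selected
    (restricted) sample, so u >= 1 when restriction has occurred."""
    return (u * r) / math.sqrt(1 + (u ** 2 - 1) * r ** 2)

# Hypothetical example: an observed validity of .30, criterion
# reliability of .60, and a range-restriction ratio of 1.5
r_obs = 0.30
r_unrestricted = correct_for_range_restriction(r_obs, 1.5)
r_true = correct_for_attenuation(r_unrestricted, 0.60)
```

With these illustrative inputs, a seemingly modest observed validity of .30 corrects to roughly .55, which is why applying such corrections produced the substantial mean validities of about .50 reported earlier in the article.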
Measurement error is in many ways the most important artifact because it is the only one (beyond sampling
error) that is always present in all data. My colleagues and I believe that the arc in research methods always bends
in the direction of increased accuracy. For this reason, we believe that other approaches to meta-analysis will
eventually come to include corrections for the biasing effects of measurement error (and perhaps other research
artifacts too). Hedges (2009) and Matt and Cook (2009) have acknowledged the need to correct for measurement
error.
The second shaping influence here was the decision to use a subtractive model of meta-analysis. We faced the need in personnel selection to estimate the distribution of true validity population parameters. This led to a model in which variance as a result of sampling error and other artifacts is subtracted from the observed variance of correlations or d values, leaving an estimate of the variability of the population parameters, which was required for validity generalization purposes. One consequence of this was that all of our methods were random-effects models from the beginning. We never had any fixed-effects (FE) models. This turned out to be important because FE models are now rejected as unrealistic by most meta-analysts because of their assumption that all studies in the meta-analysis are estimating exactly the same population parameter (cf., Schmidt et al., 2009).
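In symbols, the subtractive model can be written compactly; the notation below is standard Hunter–Schmidt-style notation rather than taken verbatim from this article:

```latex
\hat{\sigma}^2_{\rho} \;=\; \sigma^2_{r} \;-\; \sigma^2_{e},
\qquad
\sigma^2_{e} \;\approx\; \frac{\left(1 - \bar{r}^{2}\right)^{2}}{\bar{N} - 1}
```

Here, \(\sigma^2_{r}\) is the observed variance of the study correlations, \(\sigma^2_{e}\) is the variance expected from sampling error alone (approximated at the mean correlation \(\bar{r}\) and mean sample size \(\bar{N}\)), and the residual \(\hat{\sigma}^2_{\rho}\) estimates the real variability of the population parameters. Because \(\hat{\sigma}^2_{\rho}\) is free to be nonzero, the model is random-effects by construction, which is the point made in the text.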
Question 5, WS: Do you think you got anything wrong? What would you do differently now with 20/20 hindsight? What would you keep the same?
Answer, FS: There were at least a couple of things that we did wrong. First, we had an error in our first published application of meta-analysis in 1977. John Callender and Bart Osburn informed us that their simulation
studies showed that there was an error, but they did not know what was causing it. This error led to
underestimation of the SD of the population parameters (i.e., true validity values). We found the cause of this error
and corrected it in a subsequent application of our meta-analysis method (Schmidt et al., 1979). When we
corrected the means of the validity distributions for measurement error and range restriction, those corrections
increased the SDs. We had previously neglected to increase the SDs appropriately.
The other thing that we did wrong was underestimating the potential seriousness of publication bias and other
forms of unrepresentativeness in study sets. This occurred because we were able to show by a variety of analyses
that there was strong evidence that this was not a problem in the personnel selection literature (Hunter and
Schmidt, 1990, chapter 4, 2004, chapter 4). The error was the assumption that this was probably also the case
in most or all other areas. There is now a lot of evidence that publication bias and other availability issues are a
serious problem in some areas, especially in biomedical areas and in social psychology lab experiments. In the
most recent edition of our meta-analysis book, these problems are explored in detail (Schmidt and Hunter,
2014, chapter 13).
Some have argued that we made a mistake in not more strongly emphasizing the detection of moderators in
meta-analysis. In validity generalization research, one need only show that all or nearly all validity values are above
some practically useful value. Variation that exists above this point does not have to be addressed for purposes of
practical application. Moreover, in any event, that variation is usually too small to be consistent with moderators of
any size. However, for applications in other research areas, we emphasized moderator detection via subgrouping
of studies. The other alternative is meta-regression, which is widely abused in published meta-analyses today.
Statistical power is typically quite low, and capitalization on sampling error is substantial (especially that resulting from ex post facto selection of moderator candidates). The result is often the deadly combination of a high Type I error rate along with a high Type II error rate. Five additional problems in the use of meta-regression are discussed in the current edition of our meta-analysis book (Schmidt and Hunter, 2014, chapter 9). The accurate detection of moderators in primary research and in meta-analysis is much more difficult than many have been led to believe (Schmidt and Hunter, 1978).
Question 6, WS: How do you view meta-analysis today from your vantage point in history?
Answer, FS: Looking back today, I view meta-analysis as an inevitable development. By the 1970s, research literatures across almost all areas were becoming unmanageable in size. Frustration with the task of making sense of large conflicting literatures was growing. The time was just ripe; a new tool was needed. I think that this explains why meta-analysis was invented independently several times in different areas: by Chalmers in biomedical research in the USA, by Peto in biomedical research in the UK, by Rosenthal in social psychology, by Glass in education, and by Jack Hunter and me in I-O psychology. Also, at about this same time, Hedges and Olkin published major advances in meta-analytic methods. There are other examples of nearly simultaneous developments in science, one being the independent invention of calculus by both Newton and Leibniz at about the same time.
I think it was clear that by the 1970s, the alternative to meta-analysis was epistemological despair, not an attractive alternative given the scientific ideal of cumulative knowledge. Moreover, such despair was becoming common and even mainstream. Even as eminent a methodologist as Lee J. Cronbach appears to have succumbed
to the temptation of epistemological despair. Cronbach (1975) argued that social and psychological reality
changes so fast and is so ephemeral that cumulative scientic knowledge is impossible in psychology and the
social sciences. In social psychology, Gergen (1982) was a well-known proponent of epistemological despair.
Meta-analysis lifted this veil of despair. It showed that cumulative knowledge was in fact possible.
Question 7, WS: What colleagues or events influenced you to elevate the field of meta-analysis? How did those colleagues or events influence you?
Answer, FS: As I indicated earlier, the encouragement from Lee J. Cronbach was important. However, we also had support from other equally eminent psychologists. Anne Anastasi praised our methods and findings and featured them in the successive editions of her widely used textbook, Psychological Testing. Lloyd Humphreys at the University of Illinois was very supportive, as were Paul Meehl, Marvin Dunnette, and Tom Bouchard at the University of Minnesota. I believe that these eminent figures were instrumental in the APA Distinguished Scientific Contributions award that Jack Hunter and I received in 1994. I guess you could say that we were supported from the commanding heights by some of the top brass in psychology, while at the same time being criticized and rejected by many of the troops on the ground. Many personnel specialists and I-O practitioners were hostile to our work because it refuted the 80-year-old belief in situational specificity of test validity and obviated the need for the kinds of validity studies that they conducted.
Our meta-analysis methods also received great support from our colleagues at OPM, who promoted the methods and who conducted and published many studies using these methods. These include Hannah Rothstein (then Hannah Hirsh), Michael McDaniel, Kenneth Pearlman, Marvin Trattner, Lois Northrup, Ilene Gast, Murray Mack, Deborah Whetzel, Guy Shane, Brian Stern at the ARI, and others. Later, at the University of Iowa, this list included Deniz Ones, Vish Viswesvaran, Kenneth Law, Crissie Fry, Michael Judiesch, Kuh Yoon, Huy Le, Kevin Carlson, Marc Orlitzky, In-Sue Oh, Jon Shaffer, Ben Postlethwaite, and others. The death of Jack Hunter in 2002 was a terrible loss. As befitting his eminent contributions, his obituary was published in The American Psychologist (Schmidt, 2003).
Question 8, WS: How has your teaching of meta-analysis evolved over the years?
What would you like to say about your former students and their contributions to the field of meta-analysis?
Answer, FS: One change in my teaching over time was the increase in emphasis that I placed on applications of meta-analysis in diverse areas beyond just personnel selection. In fact, these other areas came to dominate my teaching. This extended into meta-analysis of experimental studies. One reason for this change was the greater acceptance of the methods in those other areas. Another was the fact that the PhD students in my meta-analysis course came from a wide variety of areas: marketing, clinical psychology, engineering, sociology, human resources, organizational behavior, nursing, and education. Almost all of these students submitted their required class meta-analyses for publication, and almost all of these were published. This was also true of those students whose dissertations were based on meta-analysis. Another change in my teaching is that the range of methodological topics that I covered in my PhD meta-analysis course increased. One example is the introduction of coverage of second-order sampling error. Meta-analysis greatly reduces sampling error in comparison to individual studies but does not completely eliminate it. The remaining sampling error is second-order sampling error. This emphasis eventually led to a new method of conducting second-order meta-analysis (Schmidt and Oh, 2013), making possible the meta-analysis of meta-analyses.
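The notion of second-order sampling error can be made concrete with a simple calculation: a meta-analytic mean is itself an estimate, and its uncertainty shrinks with the number of studies k but never reaches zero for finite k. This is my own simplified illustration of the general concept, not the Schmidt and Oh (2013) procedure; all names and numbers are hypothetical.

```python
# Illustration of second-order sampling error: the meta-analytic mean
# correlation carries a standard error of roughly SD_observed / sqrt(k)
# across k studies. A simplified sketch of the concept only.
import math

def meta_mean_and_se(rs, ns):
    """First-order meta-analysis: weighted mean r, plus a rough
    second-order standard error of that mean across k studies."""
    k = len(rs)
    total_n = sum(ns)
    r_bar = sum(r * n for r, n in zip(rs, ns)) / total_n
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    # The mean itself is subject to (second-order) sampling error that
    # decreases with the number of studies k.
    se_mean = math.sqrt(var_obs / k)
    return r_bar, se_mean

# Hypothetical set of six studies
rs = [0.21, 0.35, 0.28, 0.44, 0.19, 0.31]
ns = [80, 60, 120, 70, 90, 100]
r_bar, se = meta_mean_and_se(rs, ns)
```

Second-order meta-analysis applies the same subtractive logic one level up: variability among meta-analytic means is compared against the variability expected from such standard errors alone.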
The answer to the second part of this question can be found in my responses to questions 7 and 9.
Question 9, WS: What is your favorite (or among your favorites) application of meta-analysis and why?
Answer, FS: This is a difficult question because there have been so many excellent applications of our meta-analysis methods. I will just select some of the larger ones that have had a big impact. The first is the large meta-analysis that Jack Hunter performed for the US Employment Service in the US Department of Labor (Hunter, 1983). This included a huge database (515 studies) on the validity of the general intelligence measure of the General Aptitude Test Battery, and it demonstrated generalizable validity for essentially all jobs in the US economy, with the magnitude of validity depending on the complexity of the job family. The main findings were later presented in an article in the Psychological Bulletin (Hunter and Hunter, 1984).
Another is the Pearlman et al. (1980) meta-analysis of selection test validity values for clerical workers. This study included over 650 validity studies focusing on a variety of test types and spanning a time period from 1920 to 1979. The largest meta-analysis in this study included 882 validity coefficients. This study is still a widely cited classic. A recently published meta-analysis of studies conducted after the 1979 cutoff of Pearlman et al. found essentially identical results (Whetzel et al., 2011), indicating stability of validity values over a period of 90 years despite the many changes in clerical duties and tasks that had occurred.
Still another is the McDaniel et al. (1994) meta-analysis of employment interview validity values, also a citation
classic. Next, there is the Ones et al. (1993) meta-analysis of the validity of integrity tests used in hiring. This study
is probably the largest of all in terms of the sheer amount of data that it included. It has also been cited many
times.
Then, there is the Orlitzky et al. (2003) meta-analysis of the relation between corporate social responsibility and corporate financial outcomes. This one not only is a citation classic but also has been reprinted in three books, and
it won the 2004 Moskowitz Award for Finance Research.
Finally, there is the Harter et al. (2002) meta-analysis of the relation between average level of employee job
engagement and the business outcomes of revenue, profit, customer satisfaction, and low employee turnover.
This one has also been highly cited.
All of these studies required a great deal of effort over an extended period of time. Meta-analysis does not
provide an easy road to fame and fortune.
Question 10, WS: What is your favorite (or among your favorites) methodology in meta-analysis and why?
Answer, FS: As you might expect, my favorite methodologies are the methods presented in the four books on
meta-analysis that I coauthored: Hunter et al. (1982), Hunter and Schmidt (1990, 2004), and Schmidt and Hunter
(2014). Jack Hunter took the lead in the 1982 and 1990 books; they could not have been written without his
contributions. A major advantage and unique feature of the methods in these books are that they simultaneously
take into account both sampling error and measurement error, the two distorting factors that are present in all
data sets and in all studies. They also allow for the biasing effects of other research artifacts such as range
restriction and dichotomization of continuous measures to be corrected when they are present. Other approaches
to meta-analysis do not do this.
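As a rough illustration of what these simultaneous corrections look like in practice, the following is a minimal Python sketch of a "bare-bones" sampling-error analysis combined with the classical correction for attenuation. The study correlations, sample sizes, reliability values, and function names here are invented for illustration; they are not taken from the books.

```python
import math

def correct_for_attenuation(r, rxx, ryy):
    """Disattenuate an observed correlation for measurement error in
    both variables (classical psychometric correction formula)."""
    return r / math.sqrt(rxx * ryy)

def sampling_error_variance(mean_r, mean_n):
    """Variance expected among observed correlations from sampling
    error alone, given the mean correlation and mean sample size."""
    return (1 - mean_r ** 2) ** 2 / (mean_n - 1)

# Hypothetical validity studies: (observed r, sample size).
studies = [(0.25, 68), (0.31, 120), (0.18, 45), (0.40, 200), (0.22, 90)]

total_n = sum(n for _, n in studies)
mean_r = sum(r * n for r, n in studies) / total_n            # N-weighted mean r
obs_var = sum(n * (r - mean_r) ** 2 for r, n in studies) / total_n
mean_n = total_n / len(studies)

se_var = sampling_error_variance(mean_r, mean_n)
residual_var = max(obs_var - se_var, 0.0)  # variance left after sampling error

print(f"mean r = {mean_r:.3f}")
print(f"observed variance = {obs_var:.5f}, sampling error variance = {se_var:.5f}")
print(f"residual variance = {residual_var:.5f}")
print(f"mean r corrected for unreliability (rxx = ryy = .80): "
      f"{correct_for_attenuation(mean_r, 0.80, 0.80):.3f}")
```

In this toy data set the sampling error variance alone equals or exceeds the observed variance across studies, so the residual variance is zero, which is the pattern the validity generalization work described above repeatedly found in real data.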
Question 11, WS: Did you see your work on meta-analysis mainly as a statistical exercise or mainly as a review of
the evidence?
Answer, FS: Jack Hunter and I (and our colleagues too) did not see meta-analysis as a mere statistical exercise.
We saw it as a path to improved epistemology in research: a successful and superior way to attain cumulative
knowledge and establish general scientific principles in spite of the variability in individual study findings and
in spite of the confusion created by the nearly universal reliance on statistical significance testing in the analysis
of research data. This was a continuation of the view that inspired our earlier pre-meta-analytic work.
The basic question was always what data really mean and how we can extract reliable knowledge from them
(cf. Schmidt, 1992, 2010). This included our work promoting the use of confidence intervals over significance tests
(Schmidt, 1996; Schmidt and Hunter, 1997), the detection and calibration of moderator variables (Schmidt and
Hunter, 1978), the problem of instability of regression weights in data analysis (Schmidt, 1971, 1972), the chance
frequency of racial differences in test validity (Schmidt et al., 1973; Hunter et al., 1979), and other work in this vein.
So, our view of meta-analysis was that it was a continuation of this epistemological quest.
Question 12, WS: Can you reflect on the relative roles of the broader field of systematic review versus meta-
analysis now compared with when you started?
Answer, FS: This distinction appears to be a matter of terminology. Some use the term meta-analysis to
designate only the quantitative procedures of data analysis in meta-analysis, and they view the term systematic
review as a broader term that includes the search for studies, the coding of studies, and the interpretation and
presentation of the results. However, in my field, the term meta-analysis includes all of these things, not just
the quantitative data analysis procedures. My impression is that this is the usage favored today by most people
concerned with meta-analysis. It is possible that the term systematic review originated in biomedical research,
and not in other areas that make use of meta-analysis. It is my impression that in the biomedical area, there
existed a concept of systematic review before meta-analysis developed, and when meta-analysis methods came
along, they were added as the quantitative component of these systematic reviews.
References
American Educational Research Association, American Psychological Association, National Council on
Measurement in Education. 2014. Standards for educational and psychological testing. Author: Washington, DC.
Callender JC, Osburn HG. 1980. Development and test of a new model for validity generalization. Journal of Applied
Psychology 65: 543-558.
Cronbach LJ. 1975. Beyond the two disciplines of scientific psychology revisited. American Psychologist 30:
116-127.
DeGeest D, Schmidt FL. 2011. The impact of research synthesis methods on Industrial/Organizational Psychology:
The road from pessimism to optimism about cumulative knowledge. Research Synthesis Methods 1: 185-197.
Gergen KJ. 1982. Toward transformation in social knowledge. Springer-Verlag: New York.
Glass GV, McGaw B, Smith ML. 1981. Meta-analysis in social research. Sage: Beverly Hills, CA.
Harter JK, Schmidt FL, Hayes TL. 2002. Business unit level relationships between employee satisfaction/
engagement and business outcomes: A meta-analysis. Journal of Applied Psychology 87: 268-279.
Hedges LV. 2009. Statistical considerations. In Cooper H, Hedges LV, Valentine JC (eds). Handbook of research
synthesis and meta-analysis, 2nd edn. Russell Sage: New York, 37-48.
Hunter JE. 1983. Test validation for 12,000 jobs: An application of job classification and validity generalization to
the General Aptitude Test Battery (GATB). Test Research Report No. 45, U.S. Department of Labor, U.S.
Employment Service, Washington, DC.
Hunter JE, Hunter RF. 1984. Validity and utility of alternative predictors of job performance. Psychological Bulletin
96: 72-98.
Hunter JE, Schmidt FL. 1990. Methods of meta-analysis: Correcting error and bias in research findings. Sage:
Thousand Oaks, CA.
Hunter JE, Schmidt FL. 2004. Methods of meta-analysis: Correcting error and bias in research findings, 2nd edn.
Sage: Thousand Oaks, CA.
Hunter JE, Schmidt FL, Hunter RF. 1979. Differential validity of employment tests by race: A comprehensive review
and analysis. Psychological Bulletin 86: 721-735.
Hunter JE, Schmidt FL, Jackson GB. 1982. Meta-analysis: Cumulating research findings across studies. Sage: Beverly
Hills, CA.
Matt GE, Cook TD. 2009. Threats to the validity of generalized inferences. In Cooper H, Hedges LV, Valentine JC
(eds). Handbook of research synthesis and meta-analysis, 2nd edn. Russell Sage: New York, 537-560.
McDaniel MA, Whetzel DL, Schmidt FL, Maurer S. 1994. The validity of employment interviews: A comprehensive
review and meta-analysis. Journal of Applied Psychology 79: 599-616.
Ones DS, Viswesvaran C, Schmidt FL. 1993. Comprehensive meta-analysis of integrity test validities: Findings and
implications for personnel selection and theories of job performance. Journal of Applied Psychology Monograph
78: 679-703.
Orlitzky M, Schmidt FL, Rynes SL. 2003. Corporate social and financial performance: A meta-analysis.
Organization Studies 24: 403-441.
Pearlman K, Schmidt FL, Hunter JE. 1980. Validity generalization results for tests used to predict job proficiency
and training criteria in clerical occupations. Journal of Applied Psychology 65: 373-407.
Raju NS, Burke MJ. 1983. Two new procedures for studying validity generalization. Journal of Applied Psychology
68: 382-395.
Raju NS, Burke MJ, Normand J, Langlois GM. 1991. A new meta-analysis approach. Journal of Applied Psychology 76:
432-446.
Rubin D. 1990. A new perspective on meta-analysis. In Wachter KW, Straf ML (eds). The future of meta-analysis.
Russell Sage: New York, 155-166.
Schmidt FL. 1971. The relative efficiency of regression and simple unit predictor weights in applied differential
psychology. Educational and Psychological Measurement 31: 699-714.
Schmidt FL. 1972. The reliability of differences between linear regression weights in applied differential
psychology. Educational and Psychological Measurement 32: 879-886.
Schmidt FL. 1992. What do data really mean? Research findings, meta-analysis, and cumulative knowledge in
psychology. American Psychologist 47: 1173-1181.
Schmidt FL. 1996. Statistical significance testing and cumulative knowledge in psychology: Implications for the
training of researchers. Psychological Methods 1: 115-129.
Schmidt FL. 2003. John E. Hunter, 1939-2002. American Psychologist 58: 238.
Schmidt FL. 2010. Detecting and correcting the lies that data tell. Perspectives on Psychological Science 5: 233-242.
Schmidt FL, Hunter JE. 1977. Development of a general solution to the problem of validity generalization. Journal
of Applied Psychology 62: 529-540.
Schmidt FL, Hunter JE. 1978. Moderator research and the law of small numbers. Personnel Psychology 31: 215-232.
Schmidt FL, Hunter JE. 1997. Eight common but false objections to the discontinuation of significance testing in
the analysis of research data. In Harlow L, Mulaik S, Steiger J (eds). What if there were no significance tests?
Lawrence Erlbaum: Mahwah, NJ, 37-64.
Schmidt FL, Hunter JE. 2014. Methods of meta-analysis: Correcting error and bias in research findings, 3rd edn.
Sage: Thousand Oaks, CA.
Schmidt FL, Oh I-S. 2013. Methods for second order meta-analysis and illustrative applications. Organizational
Behavior and Human Decision Processes 121: 204-218.
Schmidt FL, Berner JG, Hunter JE. 1973. Racial differences in validity of employment tests: Reality or illusion?
Journal of Applied Psychology 58: 5-9.
Schmidt FL, Hunter JE, Pearlman K, Hirsh HR. 1985. Forty questions about validity generalization and meta-analysis.
Personnel Psychology 38: 697-798.
Schmidt FL, Hunter JE, Urry VE. 1976. Statistical power in criterion-related validation studies. Journal of Applied
Psychology 61: 473-485.
Schmidt FL, Hunter JE, Pearlman K, Shane GS. 1979. Further tests of the Schmidt-Hunter Bayesian validity
generalization model. Personnel Psychology 32: 257-281.
Schmidt FL, Oh I-S, Hayes TL. 2009. Fixed versus random models in meta-analysis: Model properties and
comparison of differences in results. British Journal of Mathematical and Statistical Psychology 62: 97-128.
Society for Industrial and Organizational Psychology. 2003. Principles for the validation and use of personnel
selection procedures, 4th edn. Author: Bowling Green, OH.
Whetzel DL, McCloy RA, Hooper A, Russell TL, Waters SD, Campbell WJ, Ramos RA. 2011. Meta-analysis of clerical
performance predictors: Still stable after all these years. International Journal of Selection and Assessment 19:
41-50.
If meta-analysis is to deal with generalization understood as both representation and extrapolation, we need ways of using a particular database to justify reasonable conclusions about what the available samples represent and how they can be used to extrapolate to other kinds of persons, settings, times, causes, and effects. This chapter is not the first to propose that a framework of validity threats allows us to probe the validity of research inferences when a fundamental statistical assumption has been violated. Donald Campbell introduced his internal validity threats for instances when primary studies lack random assignment, creating quasi-experimental design as a legitimate extension of the thinking R. A. Fisher had begun (Campbell 1957; Campbell and Stanley 1963). Similarly, this chapter seeks to identify threats to valid inferences about generalization that arise in metaanalyses, particularly those that follow from infrequent random sampling. Of course, Donald Campbell and Julian Stanley also had a list of threats to external validity, and these also have to do with generalization (1963). But their list was far from complete and was developed more with primary studies in mind than with research syntheses. The question this chapter asks is how one can proceed to justify claims about the generality of an association when the within-study selection of persons, settings, times, and measures is almost never random and when it is also not even reasonable to assume that the available sample of studies is itself unbiased. This chapter proposes a threats-to-validity approach rooted in a theory of construct validity as one way to throw provisional light on how to justify general inferences.