Guidelines for Science: Evidence and Checklists
J. Scott Armstrong
The Wharton School, University of Pennsylvania, Philadelphia, PA, and Ehrenberg-Bass Institute,
University of South Australia, Adelaide, SA, Australia. armstrong@wharton.upenn.edu
Kesten C. Green
University of South Australia Business School and Ehrenberg-Bass Institute,
University of South Australia, Adelaide, SA, Australia. kesten.green@unisa.edu.au
December 8, 2016-R Working Paper Version 364
Abstract
Problem: The scientific method is unrivalled as a basis for generating useful knowledge, yet
research papers published in the management and social sciences and applied economics fields often
violate scientific principles. What can be done to increase the publication of useful papers?
Methods: Evidence on researchers’ compliance with scientific principles was examined.
Guidelines aimed at reducing violations were then derived from established definitions of the
scientific method.
Findings: Violations of the principles of science are encouraged by: (a) funding for advocacy
research; (b) regulations that limit what research is permitted, how it must be designed and reported;
(c) political suppression of scientists’ speech; (d) universities’ use of invalid criteria to evaluate
research, such as grant money and counting of unscientific publications; and (e) journals’ use of invalid
criteria, such as tests of statistical significance, for deciding which papers to publish.
Solutions: We created a checklist of 25 evidence-based operational guidelines to help researchers
follow scientific principles. For users, funders, courts, legislators and regulators, employers, journal
editors, reviewers and other stakeholders, we derived a checklist of seven criteria to evaluate whether
a research paper complies with science.
Originality: This paper provides the first comprehensive evidence-based checklists of guidelines
for conducting scientific research and for evaluating the scientific quality of research efforts.
Usefulness: Journals could increase the publication of useful papers by including a section
committed to publishing all useful papers that comply with science. By using the Criteria for Science
checklist, acceptance decisions could be made quickly and more objectively than is currently the case.
Other stakeholders, too, could contribute to increasing the production of useful research by using the
checklist.
Keywords: advocacy; big data; checklists; experiments; incentives; multiple hypotheses;
objectivity; regression analysis; regulation; replication; statistical significance
Acknowledgements: We thank our reviewers Dennis Ahlburg, Hal Arkes, Jeff Cai, Rui Du, Robert
Fildes, Lew Goldberg, Anne-Wil Harzing, Ray Hubbard, Gary Lilien, Edwin Locke, Nick Lee, Byron
Sharp, Malcolm Wright, and one anonymous person. Our thanks should not be taken to imply that any
individual reviewer agrees with our recommendations for improving the practice of science. In addition,
Mustafa Akben, Len Braitman, Heiner Evanschitsky, Bent Flyvbjerg, Andreas Graefe, Jay Koehler, Don
Peters, Paul Sherman, William H. Starbuck, and Arch Woodside provided useful suggestions. Hester
Green, Esther Park, and Lynn Selhat edited the paper. Scheherbano Rafay helped with the development of the
software and with editing.
Authors’ notes: (1) Each paper we cite has been read by one or both of us. (2) To ensure that we
describe the findings accurately, we are attempting to contact all authors whose research we cited as
evidence. (3) We declare that we did our best to provide full disclosure and objective findings. (4)
Estimated reading time for a typical reader is about an hour.
Voluntary disclosure: We received no external funding for this paper.
Introduction
We first present a working definition of science. We use that definition along with our review
of evidence on compliance with science by papers published in leading journals to develop
operational guidelines for implementing scientific principles. We then develop a checklist to
help researchers follow the guidelines, and another to help reviewers (those who fund, publish,
or use research) assess whether a paper complies with scientific principles.
While the scientific principles underlying our guidelines are well-established, our
presentation of them in the form of comprehensive checklists of operational guidance for science
is novel. We present evidence that in the absence of such an aid, researchers and research
stakeholders will often fail to observe scientific principles.
Defining Useful Science
We relied on well-accepted definitions of science. The definitions, which apply to science in
all fields, are consistent with one another.
Benjamin Franklin, the founder of the University of Pennsylvania, called for the university to
be involved in the discovery and dissemination of useful knowledge (Franklin, 1743). He did so
because he thought that universities around the world were failing in that regard. We propose that
useful knowledge is obtained by applying scientific principles to the study of important problems.
The value of scientific knowledge is commonly regarded as being based on its objectivity
(see, e.g., Reiss and Sprenger’s 2014 “scientific objectivity” entry in the Stanford Encyclopedia
of Philosophy).
In his 1620 Novum Organum, Sir Francis Bacon suggested that the scientific method involves
induction from systematic observation and experimentation. In the third edition of his
Philosophiae Naturalis Principia Mathematica, first published in 1726, Newton described four
“Rules of Reasoning in Philosophy.” The fourth rule reads, “In experimental philosophy we are to
look upon propositions collected by general induction from phænomena as accurately or very
nearly true, notwithstanding any contrary hypotheses that may be imagined, till such time as other
phænomena occur, by which they may either be made more accurate, or liable to exceptions”.
Berelson and Steiner’s (1964, pp.16-17) research on the scientific method provided six
guidelines that are consistent with the above definitions, and identified prediction as one of the
primary purposes of science.
The Oxford English Dictionary (2014) offers the following in its definition of “scientific
method”: “It is now commonly represented as ideally comprising some or all of (a) systematic
observation, measurement, and experimentation, (b) induction and the formulation of hypotheses,
(c) the making of deductions from the hypotheses, (d) the experimental testing of the
deductions…”
Given Franklin’s injunction and the preceding definitions, we define useful science as…
An objective process of studying important problems by comparing multiple
hypotheses using experiments (designed, natural, or quasi). The process
uses cumulative scientific knowledge and systematic measurement to obtain
valid and reliable data, valid and simple methods for analysis, logical deduction
that does not go beyond the evidence, tests of predictive validity, and disclosure
of all information needed for replication.
Our definition, in common with the OED’s, does not allow that theorizing or observation
and measurement, while critically important, can on their own amount to useful science.
Following Popper (see Thornton, 2016), we reason that without empirical testing against other
hypotheses, the usefulness or otherwise of a theory remains unknown and so cannot contribute to
scientific knowledge. We concede that the findings of research that are not immediately useful
can turn out to be useful in future, but suggest that scientists should be able to distinguish
between projects that will provide findings that could be useful in practice and those that could
not. We consider that our definition self-evidently applies to obtaining useful knowledge in any
domain, including the social, managerial, and economic domains.
Advocacy Research, Incentives, and the Practice of Science
Funding for researchers is often provided to gain support for a favored hypothesis.
Researchers are also rewarded for finding evidence that supports hypotheses favored by senior
colleagues. These incentives often lead to what we call “advocacy research,” an approach that is
contrary to the definition of science given the need for objectivity. In addition, university
researchers are typically rewarded with selection and promotion on the basis of their performance
against measures that have the effect of distracting them from doing useful scientific research.
These problems with the practice of science have long been noted in the management and
social sciences, and led Armstrong (1982) to propose “the author’s formula”: to improve their
chances of getting their papers published, researchers should avoid examining important
problems, challenging existing beliefs, obtaining surprising findings, using simple methods,
providing full disclosure, and writing clearly.
Advocacy Research
“The human understanding when it has once adopted an opinion draws all things else to
support and agree with it. And though there be a greater number and weight of instances to
be found on the other side, yet these it either neglects and despises, or else by some distinction
sets aside and rejects, in order that by this great and pernicious predetermination the
authority of its former conclusion may remain inviolate.”
Francis Bacon (XLVI, 1620)
“When men want to construct or support a theory, how they torture facts into their service!”
Mackay (Ch.10, para. 168, 1852)
Advocacy research can be the product of a genuine belief that one’s preferred hypothesis
must be true, thus blinding the researcher to alternatives. The single-minded pursuit of support for
a favored hypothesis has also been referred to as “confirmation bias” (see, e.g., Nickerson, 1998,
for a history of confirmation bias).
Mitroff (1969, 1972a, 1972b) interviewed 40 eminent space scientists. He found that the
scientists with the highest prestige did not live up to the scientific standard of objectivity. Instead,
they were advocates for their hypotheses and resisted disconfirming evidence. Rather than
viewing advocacy research as harmful to the pursuit of useful knowledge, Mitroff considered it to
be a legitimate way to do science. Armstrong (1980a) disagreed and used advocacy research to
prove that Mitroff was a fictitious name for a group of scientists who wished to demonstrate that
papers that violated scientific principles could be published in a scientific journal. In doing so,
Armstrong avoided mentioning disconfirming evidence: that he knew Ian Mitroff.
Furthermore, journal reviewers often act as advocates by recommending the rejection of
papers that challenge popular theories. Mahoney (1977) asked Journal of Applied Behavior
Analysis reviewers to review a contrived paper. One version described findings that supported the
accepted hypothesis in the field represented by the journal, while the other paper, with the same
methods, reversed the findings. The ten reviewers who rated the paper that supported the common
belief gave an average rating of 4.2 on a 6-point scale for quality of methodology, while the 14
who rated the paper that challenged the common belief gave an average rating of 2.4. Reviewers’
recommendations on whether to publish, or not, were mostly consistent with their ratings on the
methodology. Abramowitz, Gomes, and Abramowitz (1975), Goodstein and Brazis (1970),
Koehler (1993), and Smart (1964) reported similar findings in psychology. Young, Ioannidis, and
Al-Ubaydli (2008) describe the same problem of biased reviewing in biomedical research.
Advocacy research is common in the management sciences. An audit of 120 empirical papers
published in Management Science from 1955 to 1976 found that 64 percent advocated for a
preferred hypothesis (Armstrong, 1979). An audit of 1,700 empirical papers in six leading
marketing journals from 1984 to 1999 found that 74 percent used advocacy (Armstrong, Brodie,
and Parsons, 2001).
From their audit of research findings from 3,500 studies in 87 areas of empirical economics,
Doucouliagos and Stanley (2013) concluded that for topics about which there is a consensus,
findings that challenge that consensus were less often published than would be expected by
chance alone. The bias against publishing findings that are unsupportive of the consensus (or,
alternatively, against submitting papers with anti-consensus findings or against testing anti-
consensus hypotheses) surely undermines policy decisions. Indeed, in a survey of 6,700
empirical economics studies covering 159 areas providing 64,076 effect size estimates, Ioannidis,
Stanley, and Doucouliagos (2015) found that estimates from the minority of studies with adequate
statistical power were typically no more than one-half the size of the average effect estimates. For
example, the value of a statistical life calculated as the simple average of the estimates from all 39
studies examined by the authors was $9.5 million, whereas the average from the 10 adequately
powered studies alone was $1.5 million.
Distracting Incentives
Researchers in universities and many other organizations are typically subject to incentives
that are unrelated to or detrimental to Franklin’s call for useful research. In particular, university
administrators reward researchers primarily for publishing papers in high-status journals, and for
obtaining grants.
Research income from grants to universities and other organizations
Grants are often awarded with an explicit or implicit requirement to conduct advocacy
research, and thus to do research that is unscientific. If you do succeed in obtaining funding, you
are likely to lose some freedom over what to study and how to do so. In addition, the overall costs
of the project become much higher due to overheads charged by universities. Finally, grants may
lead researchers away from what they consider to be the most important problems that they could
address.
Publication counts
The number of papers published in academic journals is a poor measure of useful scientific
output. Many papers address trivial problems and, as we will show, few papers comply with
scientific principles.
Publication counts of useful scientific studies are relevant. However, counts of papers that
violate scientific principles are useless at best. In particular, advocacy research promoting
commonly held beliefs attracts many citations by those who share the beliefs.
In addition, the incentive of paper counting encourages strategies such as multiple authorship,
breaking papers into small pieces for publication, and publishing papers regardless of value. In
sum, publication counts encourage quantity over quality.
In turn, prestigious journals in the social sciences typically insist that empirical papers include
statistically significant findings. The requirement has been increasing over the past century to the
extent that it is now a dominant influence on whether an empirical paper will be published
(Hubbard, 2016, Chapter 2). Hubbard showed that by 2007, statistical significance testing was
included in 98.6 percent of published empirical studies in accounting, and over 90 percent in
political science, economics, finance, management, and marketing.
Unfortunately, this reliance on significance tests persists despite the absence of evidence to support their validity (see, e.g.,
Hunter, 1997; Schmidt and Hunter, 1997; and Armstrong, 2007a, 2007b). Hubbard (2016, pp.
232-234) lists 19 well-cited books and articles published since 1960 describing why such tests are
invalid.
Readers have difficulty understanding statistical significance. Even leading academic
econometricians have that problem, as experiments by Soyer and Hogarth (2012) found. Their
subjects made errors when asked to interpret standard statistical summaries of regression
analyses.
McShane and Gal (2015) described an experiment with 261 subjects who were recruited from
among researchers who had published in the American Journal of Epidemiology. They were
presented with the findings of a comparative drug test, and asked which of the two drugs they
would recommend for a patient. More than 90 percent of subjects presented with statistically
significant drug test findings (p < 0.05) recommended the more effective drug, while fewer than
half of those who were presented with results that were not statistically significant advised the use
of the more effective drug.
Such significance testing fills no need. If one is concerned about uncertainty, confidence
interval estimates are easier to understand and are less likely to confuse researchers and
practitioners.
Moreover, the pressure to obtain statistically significant findings leads researchers to dubious
practices. For example, Hubbard and Armstrong (1992) analyzed 32 randomly selected issues of each
of the Journal of Marketing, Journal of Marketing Research, and Journal of Consumer Research
for the period 1974 through 1989. Of the 692 papers using tests of statistical significance, 92
percent rejected the null hypothesis. Hubbard and Armstrong (1997) noted similar proportions in
accounting, marketing, medicine, psychology, and sociology. Many studies have provided
evidence on the seriousness of the problem over recent decades (Hubbard, 2016, pp. 43-47).
In addition, Bedeian, Taylor, and Miller’s (2010) survey of management faculty found that 92
percent claimed to know of researchers who, within the previous year, had developed hypotheses
after they analyzed the data. Given these pressures on researchers, it would not be surprising if
many engaged in questionable research practices. In fact, 35 percent of the respondents in a
survey of over 2,000 psychologists by John, Loewenstein, and Prelec (2012) admitted to “reporting
an unexpected finding as having been predicted from the start.” Moreover, 43 percent decided to
“exclude data after looking at the impact of doing so on the results.”
Alternatively, support for a preferred hypothesis, in the form of a statistically significant
difference from a senseless null hypothesis, is easily obtained by applying multiple regression
and similar methods to non-experimental data. This practice has been used increasingly since the
1960s. A recent term for it is “p-hacking.”
The problems are especially serious when one has data on many variables (big data). For
example, advocates have been able to analyze non-experimental data to support their hypothesis
that competitor-oriented objectives, such as market share, lead to higher profits. In contrast,
analyses of experimental studies, with their control over relevant variables, have shown that
market share objectives are detrimental to the profitability and survival of firms (Armstrong and
Collopy, 1996; and Armstrong and Green, 2007). Similarly, analyses of non-experimental data by
economists have supported the hypothesis that high payments for top corporate executives are
beneficial to stockholders, whereas experimental studies by organizational behavior researchers
lead to the conclusion that CEOs are typically overpaid, with the result that the profitability of firms
is reduced (Jacquart and Armstrong, 2013).
Effects on Science
In a paper titled “Why most published research findings are false,” Ioannidis (2005)
demonstrated how incentives, flexibility in research methods, the use of statistical significance
testing, and advocacy of a favored hypothesis will typically lead to the publication of incorrect
findings. Hubbard’s (2016) meta-analysis of 804 replication outcomes in 16 studies in seven areas
of management science (pp. 140-141) found that the authors of the replications reported conflicts
with the original study for an average of 46 percent of the replications. Moreover, the Open
Science Collaboration (OSC 2015) study of 100 direct replications reported that only 36 percent of
replication attempts succeeded. Even allowing for the likelihood that many of the replications were
identified as failed by the researchers due to their inappropriate use of statistical significance as a
criterion, the findings raise concerns.
Armstrong and Hubbard (1991) conducted a survey of editors of American Psychological
Association (APA) journals that asked: “To the best of your memory, during the last two years of
your tenure as editor of an APA journal, did your journal publish one or more papers that were
considered to be both controversial and empirical? (That is, papers that presented empirical
evidence contradicting the prevailing wisdom.)” Sixteen of the 20 editors replied: seven could
recall none, four said there was one, three said there was at least one, and two said they had
published several such papers.
Fortunately, it occurs to some researchers and to some research organizations that their
proper objective is to produce useful scientific findings. As a result, one can look in almost any
area and find useful scientific research. Our concern in this paper is not the absence of important
papers, but rather their infrequency. That concern is related to what Holub, Tappeiner, and
Eberharter (1991), referring to the field of economics, called the Iron Law of Important Papers:
Rapid increases in government funding have increased the number of papers published but seem
to have had little effect on the number of papers with useful scientific findings. Consider the
problem of forecasting. The number of papers on forecasting increased enormously over the latter
half of the 20th Century, yet the number of useful papers averaged only one per month over that
period. Moreover, that pace increased only moderately despite attempts to improve it, including
two new journals and an annual conference devoted to forecasting methods (Armstrong and
Pagell 2003).
On the Value of Checklists
We propose the use of evidence-based checklists of operational guidelines to increase the
output of useful research. Checklists are used in many fields to help ensure that the proper
procedures are used. In the fields of engineering, aeronautics, and medicine, failures to follow
checklists of evidence-based guidelines can be used in court cases to assign blame for bad
outcomes.
Checklists are based on the method of decomposition, whereby a complex problem is
analyzed in parts that can be solved more easily than the whole. MacGregor’s (2001) review
provides experimental evidence on the usefulness of judgmental decomposition. Arkes, Shaffer,
and Dawes (2006) found decomposed ratings to be more reliable than holistic ratings as a procedure
for selecting research proposals to be funded by the National Institutes of Health. In three
experiments on job selection and college selection, Arkes et al. (2010) found that decomposition
improved judgments compared to holistic ratings.
In their review of 15 experimental studies on the use of checklists in healthcare, Hales and
Pronovost (2006) found that checklists led to substantial improvements in outcomes in all studies.
For example, one experiment examined the application of a 19-item checklist for surgical
procedures on thousands of patients in eight hospitals around the world. Use of the checklist
reduced death rates at those hospitals by half (Haynes et al. 2009).
Checklists can help even when the users are aware of proper procedures. For example, an
experiment on avoiding infection in the intensive care units of 103 Michigan-based hospitals
required physicians to follow five guidelines that they were already familiar with when they were
inserting catheters. Following the simple checklist reduced the infection rate from 2.7 per 1,000
patients to zero after three months.
Checklists are expected to be most effective when users know little about the relevant
evidence-based principles. This was shown in Armstrong et al. (2016), where advertising novices
were asked to use a checklist to rate the compliance of 96 pairs of ads with 195 evidence-based
persuasion principles. By using the checklist, they made 44 percent fewer errors in predicting
which ad was more effective than did the novices who were unaided by the checklist.
Checklists must be based on evidence, or on logical principles that are obvious. In this paper,
we take the principles of science, which have proven their worth over centuries, as obvious.
Checklists can be harmful if the guidelines lack a basis in evidence or logic and thereby lead
users to follow invalid guidelines more consistently.
Many checklists for management science, such as Porter’s five forces framework (Porter
1980), are based on opinions rather than experimental evidence. Another example is the Boston
Consulting Group’s (BCG) matrix for portfolio planning, which experiments by Armstrong and
Brodie (1994) showed to be invalid. Despite the lack of evidence on their predictive validity,
those checklists continue to be widely taught in business schools: Porter’s paper had almost
43,000 hits on Google Scholar by September 2016, and the BCG checklist had over 4,700.
Operational Guidelines for Scientists
We were unable to find a comprehensive evidence-based checklist of operational guidelines
for conducting scientific research. We did find advice in the Operations Research Society of
America report, “Guidelines by the Ad Hoc Committee on Professional Standards” (1971), and
the CONSORT 2010 checklist (Schulz, Altman, and Moher, 2010; Moher et al., 2010). As far as
we were able to determine, those guidelines are the product of a consensus of expert opinions.
Nevertheless, they were useful in helping to formulate our operational guidelines.
The primary bases for our guidelines were the established definitions of science we described
above. To determine whether there is a need for operational guidelines, we searched the literature
on the practice of science for evidence on whether papers published in scientific journals comply
with principles identified in the standard definitions of science. This search is described below.
Having established a need, we developed guidelines as simple operational steps for complying
with scientific principles. We did not challenge the established understanding of the scientific
method.
Searching for Evidence on the Practice of Science
In this paper we examined evidence on the extent to which researchers comply with science.
The first author has been researching the issue since the early 1970s.
While we searched the Internet, our primary sources were references from key books and
articles. For example, Hubbard (2016) provided a review of 900 relevant studies, 94 percent of
which were published during the past half-century.
To ensure that our interpretations of others’ findings are correct, we are contacting all
researchers whose findings we cite substantively asking them if our summary of their findings is
correct. We also ask what papers we might have overlooked, stressing that we are looking for
papers with evidence that would challenge our findings. At the time of writing, this survey is still
ongoing.
Figure 1 presents the checklist of 25 guidelines for scientists that we developed. We describe
the guidelines under the six broad headings shown in Figure 1: Selecting a problem, Designing a
study, Collecting data, Analyzing data, Writing a scientific paper, and Disseminating the findings.
Figure 1: Guidelines for Scientists
Selecting a problem
1. Seek an important problem
2. Be skeptical about findings, theories, policies, methods, and data, especially absent experimental evidence
3. Consider replications and extensions of useful papers that examine experimental evidence
4. Ensure that you can address the problem objectively
5. If you need funding, ensure that you will nevertheless have control over all aspects of your study
Designing a study
6. Acquire existing knowledge about the problem
7. Develop multiple reasonable hypotheses
8. Design experiments with specified conditions to test hypotheses in new situations
Collecting data
9. Obtain valid data
10. Ensure that the data are reliable
Analyzing data
11. Use validated methods
12. Use simple methods
13. Use methods that incorporate cumulative knowledge
14. Estimate effect sizes and confidence intervals
15. Draw logical conclusions on the practical implications of findings from the tests of hypotheses
Writing a scientific paper
16. Disclose research hypotheses, procedures, and data
17. Cite all relevant scientific papers when presenting evidence
18. Ensure summaries of prior findings that you cite are correct
19. Explain why your findings are useful
20. Write clearly and succinctly for the audience for whom the findings might be useful
21. Obtain extensive peer review and editing before submitting a paper for publication
Disseminating the findings
22. Provide thorough responses to journal reviewers, including reasons for not following suggestions
23. Challenge rejection, but only if your case is strong
24. Consider alternative ways to publish your findings
25. Inform those who can use your findings
J. Scott Armstrong and Kesten C. Green, December 8, 2016
Selecting a Problem
Research will not lead to useful findings when the topic of the research is unimportant. An
important research problem is one for which new knowledge could be used to make changes that
substantively improve forecasting or decision-making. Such problems can involve developing
and improving useful techniques, identifying and estimating causal relationships, or developing
principles.
1. Seek an important problem
In our literature searches in various fields, we have found it easy to agree on which papers
address important problems by simply looking at their titles, abstracts, tables of results, and
conclusions. Sometimes the title alone is sufficient to make a judgment. If you are not convinced,
try the procedure yourself.
Creativity is essential for identifying important problems. There is much evidence that
working in groups depresses creativity and productivity, especially if the group meets face-to-
face (Armstrong, 2006). We suggest that you work on your own to seek important problems.
Common sense suggests that you eliminate distractions.
Make a list of problems that affect people in important ways. Who might benefit from the
study? Show it to people who face the problems that you propose to study. Would they regard the
findings that might arise from the research as useful in any practical sense? Notice that the
question is not whether the findings would be interesting, clever, or entertaining.
It helps if you work in an organization that has important and obvious problems that need to
be solved. Gordon and Marquis (1966), in their analysis of 245 research projects, found that
academic researchers in social science departmentswhere there are relatively few obvious
problems to addressproduced less innovative research than those in organizations closer to a
problem area, such as researchers in hospitals.
The statement of the problem limits the search for solutions. To avoid that, state the problem in
many different ways prior to searching for solutions, a technique known as “problem storming.” Then
search for solutions for each statement of the problem. For example, politicians who are concerned that
higher education is not effective usually state the problem as “how can we improve teaching?” An
alternative is, “how can we improve learning?” The latter approach yields recommendations that are
different from those of the first, as was shown in Armstrong (2012a).
Ask others to review your problem statement. One way to present your problem statement is
to write a press release that describes the possible findings from your study, assuming that
everything turns out the best you can imagine. Show it to people who might benefit and ask them
how they could use your findings.
Hal Arkes, who has a history of important discoveries in the management sciences, uses his
“Aunt Mary test.” At Thanksgiving each year, his Aunt Mary would ask him to tell her about his
important new research. When Aunt Mary was skeptical about a research idea, he said, “I didn't
always abandon it, but I always reevaluated it, usually resulting in some kind of modification of
the idea to make it simpler or more practical” (Arkes, personal communication, 2016).
There is little reason to believe that committees of officials in governments, corporations, or
foundations can and do identify projects that would lead to useful scientific findings better than
individual researchers can and do. Creativity is an individual activity, and designing research
projects is best left to scientists.
2. Be skeptical about findings, theories, policies, methods, and data, especially absent
experimental evidence
“I would rather have questions that can’t be answered than
answers that can’t be questioned.” Attributed to Richard Feynman
Skepticism drives progress in science. Unfortunately, skepticism can also annoy other
researchers and thus reduce opportunities for employment, funding, publication, and citations.
Researchers in universities go to considerable lengths to ensure a common core of beliefs.
That tendency is witnessed by the fact that over the past half century, political conservatives have
become rare in social science departments at leading U.S. universities (Duarte et al., 2015, and
Langbert, Quain, and Klein 2016). An important consequence has been a loss of skepticism
toward fashionable beliefs.
Research is more likely to be useful if it addresses problems that have been the subject of
few, if any, experimental studies. It seems to us that there are many important problems that lack
credible experimental evidence.
Ignaz Semmelweis’s experiments provide a classic example of a researcher taking a skeptical
approach to then-current beliefs and practices, which had not been tested before, and making a
life-saving discovery as a result. He found that when doctors washed their hands after dissecting
cadavers and before examining patients in the maternity ward, deaths in that ward fell from 14 percent in
1846 to 1.3 percent in 1848 (Routh, 1849).
Some important methods are supported only by expert judgment. For example, game theory
is often recommended as a way to forecast what will happen in conflict situations. However,
experimental tests have found that leading game theorists’ forecasts of decisions in conflicts were
no more accurate than unaided guesses by naïve subjects. In addition, the methods of structured
analogies and simulated interaction provided forecasts that were much more accurate (Green,
2002 and 2005, and Green and Armstrong, 2007 and 2011).
3. Consider replications and extensions of useful papers that examine experimental evidence
Replications, both direct replications and extensions, of scientific studies that influence policies
and decisions are important regardless of whether they support or conflict with the original study.
Direct replications of studies that did not lead to useful findings are of no value.
Direct replications are helpful when there are reasons to be suspicious about findings relating
to an important problem, such as with claims of “cold fusion.” Otherwise, extensions are more
important as they provide evidence about the conditions under which the findings apply.
Unfortunately, replications are often difficult to conduct due to a lack of sufficient disclosure
in published papers and uncooperative authors (Hubbard, 2016, p.149; Iqbal et al., 2016). In
addition, a replication that fails to support the original findings might not be welcomed by editors
and reviewers at the journal that published the original paper (Hubbard, 2016, section 5.5.8,
summarizes evidence on that issue).
There is reason for optimism, however, as some journals have recently adopted policies
encouraging replications, and some have published special issues of replications. Further, if you
are engaged in an important and well-designed study, consider conducting an extension of your
own study, as is recommended in some psychology journals.
One example of how replications can play an important role is Iyengar and Lepper’s (2000)
experiment. When shoppers in their study were offered a choice of 24 exotic jams, fewer than 3
percent made a purchase, whereas 30 percent of those offered a choice of six jams did so. The
initial conclusion was that customers should not be offered too many choices. As it happened, an
attempt to replicate the jam study failed, and a meta-analysis of 50 related empirical studies failed
to find the “too-many-choices” effect (Scheibehenne, Greifeneder, and Todd, 2010). The
extensions found that the number of choices that consumers prefer is affected by various
conditions (Armstrong, 2010, pp. 35-39).
Another example of the importance of replications is Hirschman’s “hiding hand”
endorsement of ignorance in planning public projects. Based on a study of 11 large development
projects financed by the World Bank, Hirschman (1967) concluded that while planners
underestimate costs, they underestimate benefits as well so that, on balance, such projects are
beneficial. That finding has apparently been influential in garnering support for big development
projects. Flyvbjerg (2016) replicated Hirschman’s study using a sample of 2,062 projects
involving eight types of infrastructure in 104 countries on six continents during the period 1927
to 2013. In contrast to Hirschman, he found that, on average, costs overran by 39 percent and
benefits fell short by 10 percent.
4. Ensure that you can address the problem objectively
Once you have a list of important problems, choose those that you could address without bias.
Aversion to disconfirming evidence is a common human trait. This effect was shown in Festinger,
Riecken, and Schachter's (1956) paper about a cult that predicted the end of the world. When it did
not end, the cult members became more confident in their belief that they could predict the end of
the world. In a related experiment, when subjects who believed that Jesus Christ was God were
given what they believed to be authentic evidence that he was not God, they increased their belief
that Christ was God (Batson, 1975). In other words, stronger evidence led to increased resistance
to change.
It does little good simply to resolve to be as objective as possible; that resolution is too vague.
Instead, ask yourself what information would cause you to conclude that your favored hypothesis
was inferior to other hypotheses. Laboratory experiments by Koriat, Lichtenstein, and Fischhoff
(1980) and Lord, Lepper, and Preston (1984) found that this approach helped.
If you cannot think of any information that would threaten belief in your preferred
hypothesis, work on a different problem.
5. If you need funding, ensure that you will nevertheless have control over all aspects of your
study
Researchers are responsible for all aspects of the design of their research. This includes
ensuring that the study is important, free of bias, truthful, cost effective, and ethical. Who else
could reasonably share that responsibility? Include a declaration of the authors’ responsibility for
the design of the research in the paper or in an “authors’ notes” section.
Some universities and departments, such as ours, provide faculty members with research
budgets to be allocated as they see fit. That arrangement reduces the pressure to obtain findings
that please a funder. If you require external funding to complete the research, explain to potential
funders that you must retain responsibility for the design of the research, and accept funding only
if you have the final say. Failure to do so could lead to ethical breaches as was shown by
Milgram’s (1969) iconic study in which “experimenters” believed that they were killing
subjects, when the responsibility for their actions was in the hands of a higher authority. They
clearly acted in ways they would not have done had they regarded themselves as being
responsible for ethical treatment. This study was extended by many researchers (see Armstrong
1977).
Scientists who prefer to research problems of their own choosing and to design proper
research might consider working for organizations that do not receive U.S. federal research funds,
such as privately funded foundations. Alternatively, by conducting secondary analyses of
experimental data obtained by others, researchers can be subject to fewer restrictions on what
they can choose to study. Not-for-profit organizations might also provide environments in which
researchers are less subject to government intrusion. Scientists might also consider employment
in a university in a country whose government does not try to control their research.
Designing a Study
The next three guidelines describe how to design experiments so as to help ensure objectivity.
6. Acquire existing knowledge about the problem
“If I have seen further, it is by standing on the shoulders of giants.” Isaac Newton
To contribute to useful scientific knowledge, researchers must first become knowledgeable
about what is already known, a process often referred to as an a priori analysis. As Lund et al.
(2016) write in their manifesto for evidence-based research, “embarking on research without
reviewing systematically what is already known… is unethical, unscientific, and wasteful” (p. 4).
This requires a meta-analysis of the scientific literature. Meta-analyses have been shown to be
more valid and objective than traditional (narrative) reviews (Beaman 1991).
Use pre-specified search criteria to help ensure that the search will be comprehensive and
objective. Use findings only from scientific papers or insights logically deduced from prior
experimental findings or from what all scientists would agree to be self-evident knowledge. Thus,
ignore findings from advocacy research papers.
The most effective way to find relevant publications is to ask leading scientists in the area of
investigation to suggest relevant papers with experimental findings. Use the citations in those
papers to find additional papers.
The ready availability of regression analysis software has led many researchers to skip the
use of cumulative knowledge and instead to choose variables on the basis of statistically
significant correlations in the available data. Armstrong (1970) showed how easy it is to get
statistically significant findings from random numbers by using stepwise regression with standard
search rules. As shown in the review in Armstrong (2012b), variables should not be selected on
that basis. Unfortunately, Ziliak and McCloskey (2004) found that 32 percent of papers published
in the American Economic Review in the 1980s used statistical significance tests to select causal
variables. By the 1990s, the proportion had increased to 74 percent.
In recent years, researchers have turned to Internet searches to find prior knowledge. Given
the enormous pool of academic works available online, it is common to find many that seem
promising. In our many reviews, however, we have found that few of them provide useful
evidence. Moreover, relevant papers are overlooked. For example, Armstrong and Pagell (2003),
in a search for studies on forecasting methods, found that Google Scholar searches identified only
one-sixth of the papers that were eventually cited in their review.
Authors should help other researchers discover their papers by giving them descriptive titles,
rather than clever, complex, or mysterious ones. Most importantly, provide an abstract that
summarizes your paper. Abstracts should describe the findings, how they were obtained, and why
they are important. Armstrong and Pagell’s (2003) examination of abstracts in 69 papers in the
International Journal of Forecasting and 68 in the Journal of Forecasting, found that only 13
percent provided that information in their abstracts, even though the journals’ instructions to
authors specified that these should be included.
Advocacy research leads to biased searches for prior knowledge. For example, Gigerenzer
(2015) showed that the literature referred to by those urging governments to “nudge” citizens to
adopt preferred behaviors, such as requiring people to actively opt out of an alternative that the
government has chosen for them, overlooked an extensive body of evidence that conflicts with
that recommendation.
7. Develop multiple reasonable hypotheses
In 1620, Francis Bacon advised researchers to consider “any contrary hypotheses that may be
imagined.” In 1890, Chamberlin observed that the fields of science that made the most progress
were those that tested all reasonable hypotheses. In 1964, Platt argued for more attention to
Chamberlin’s method.
Ask others to suggest alternative hypotheses relevant to your problem. Seek out people who
have ideas and knowledge that differ from yours. Horwitz and Horwitz (2007), in their meta-
analysis, found that task-related diversity improves the number and quality of solutions. On the
other hand, they found that bio-demographic diversity had small detrimental effects.
Investigate which hypothesis provides the most cost-effective solution. If you pick an
important problem, any scientific finding from tests of alternative reasonable hypotheses will be
useful and deserves to be published.
Kealey’s (1996, pp. 47-89) review of natural experiments supports Chamberlin’s conclusions
about the importance of multiple hypotheses. For example, agriculture showed little progress for
centuries. That changed in the early 1700s, when English landowners began to conduct
experiments to compare the effects of alternative ways of growing crops.
An audit of 120 empirical papers published in Management Science from 1955 to 1976 found
that only 22 percent used the method of multiple reasonable hypotheses (Armstrong, 1979).
Armstrong, Brodie, and Parsons (2001) found that while leading marketing scientists were of the
opinion that the method of multiple hypotheses is the best approach, their audit of 1,700 empirical
papers published in six leading marketing journals from 1984 to 1999 found that only 13
percent used multiple competing hypotheses. Of those that did, only 11 percent included
conditions. Thus, only one or two percent of the papers published in leading marketing journals
complied with these two aspects of the scientific method. Moreover, in some of the studies, the
hypotheses failed to encompass all reasonable hypotheses and some of the studies were likely to
have violated other scientific principles. Finally, some studies failed to address important
problems. Because of these violations, we expect that useful scientific studies amounted to only a
small fraction of one percent of the papers published in scientific journals in management during
that time period.
8. Design experiments with specified conditions to test hypotheses using observations from new
situations.
Experiments provide the only valid way to gain knowledge about causal factors. Non-
experimental data are akin to data from a poorly designed experiment, or one for which the
conditions are poorly known. There is no way to recover valid data from a badly designed
experiment, although hope springs eternal among statisticians.
Predictive validity is the strongest test when comparing different hypotheses. It requires that
the alternative hypotheses are tested using data other than those used to develop the hypotheses;
in other words, “out-of-sample testing.” Milton Friedman recommended such testing as an
important part of the scientific method (Friedman, 1953).
Experiments can take the form of laboratory or field studies. The latter may be either
controlled experiments or natural experiments. While laboratory experiments allow for better
control over the conditions and field experiments are more realistic, Locke’s (1986) comparative
study of laboratory and field experiments in 14 areas of organizational behavior concluded that
the findings were, in general, quite similar.
Natural experiments are strong on validity, but weak on reliability: a natural experiment
with sample size of one, for example, should be assessed skeptically. They are most likely to
provide valid data when the change in a commonly accepted causal variable was due to factors
other than attempts to deliberately change the dependent variable. For example, the economic
freedoms of East Germans (the citizens of the German Democratic Republic) and North Koreans
were severely constrained by their governments relative to those of their former compatriots in
the West and South, respectively. Because their governments constrained their freedoms for
reasons other than economic growth policy, comparisons with their much freer neighbors provide
evidence on the effect that freedom has on economic growth.
As with natural experiments, quasi-experiments are characterized by control over most, but
not all key variables. Armstrong and Patnaik (2009) examined the directional consistency of the
effects of proposed causal variables for persuasion principles estimated from quasi-experimental
data when compared with estimates from controlled experiments. The sample sizes of the quasi-
experimental studies were small for each principle, ranging from 6 to 118 with an average of 31.
The directions of causal effects from quasi-experimental analyses were consistent with those from
field experiments for all seven principles for which such comparisons were possible as well as for
all 26 principles for which comparisons with laboratory experiments were available. They were
consistent with the directions of effects from meta-analyses for seven principles. In contrast,
directional findings from non-experimental analyses of the persuasion principles were
inconsistent with one-third of the experimental findings.
Crucially, your experiments must be capable of disproving your hypotheses. Specify the
conditions for each experiment. Doing so will allow people to determine when the findings apply.
It also helps other researchers to test how the findings stand up under different conditions.
Findings obtained from experimental and non-experimental data on important issues often
differ. For example, non-experimental research suggests that consumer satisfaction surveys
improve consumers’ satisfaction. In contrast, a series of well-designed experiments by Ofir and
Simonson (2001) showed that such surveys harm customer satisfaction. In education, they harm
satisfaction and reduce learning (Armstrong, 2012b).
Analyses of natural experiments have found that regulation is harmful (Winston, 1993). That
conclusion is supported by experimental evidence even for situations where regulation might
seem likely to increase the general welfare, such as government programs to encourage
“corporate social responsibility” (Armstrong and Green, 2013).
Collecting Data
Scientists should ensure that their data are valid and reliable. They should inform readers
about any problems with the data. Furthermore, they should use all data that have been shown to
be valid and reliable, and nothing more. We stress “nothing more” because with the increasing
power of computers, analysts have been turning to data mining with “big data,” which is an
unscientific practice.
9. Obtain valid data
Validity is the extent to which the data measure the concept that they purport to measure.
Many economic disputes arise due to differences in how to measure concepts. For example, what
is the best way to measure “economic inequality”?
Explain how you searched for and obtained data, and why you chose the data that you used.
Include all relevant data in your analysis and explain the strengths and weaknesses of each. When
there is more than one set of valid data, use them all.
Be skeptical of measurement procedures. Consider the measurement of global mean
temperature. The dominant measure consists of an average of daily high and low temperatures for
selected locations around the world from 1850. What would a skeptical scientist suggest? For
example, Las Vegas experienced an increasing trend in unadjusted daily average temperatures
from 1937 to 2012. When the maximum and minimum temperature series are analyzed
separately, however, the trend in the daily maximum temperature was downward over the period,
while the trend in the daily minimum temperature was upward. One explanation is that buildings absorb heat
during the day and air conditioners release it at night (Watts, 2014).
10. Ensure that the data are reliable
Given valid data, the next issue is how to ensure that the data are reliable: Do repeated
measures produce the same results? For example, if the measure is based on expert judgments,
are the judgments similar across judges and across time? Have the measuring instruments
changed over time? Are the measurement instruments in good working order? Have any
unexplained revisions been made in the data? Measurement issues have led to substantial
differences among researchers on the issue of climate change, as described by Ball (2014).
Analyzing Data
Scientists are responsible for ensuring that they know and use proper methods for analyzing
their data. Describe the procedures you will use to analyze the data before you start your analysis,
and record any changes in the data or procedures as the project develops.
11. Use validated methods
Scientists are responsible for providing evidence that their methods have been validated for
the purpose for which they have been used, unless their validity is obvious. Many studies are
published without evidence on the validity of the methods used by the researchers, thereby
leaving readers unable to judge the value of the findings.
In particular, the statistical fit of a model to a set of data, a commonly used test, should be
avoided. For example, Pant and Starbuck (1990) compared the fit and out-of-sample predictions
of 21 models. They found negative rank order correlations between model fit and the accuracy of
predictions from the models: r = -.11 for one-period-ahead and r = -.08 for six-period-ahead forecasts. Five other
comparative studies were found, and they also reached the conclusion that fit to the data is not a
valid criterion (Armstrong 2000).
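A minimal sketch with hypothetical data (Python with numpy assumed) illustrates the distinction: polynomial degree stands in for model complexity, and a closer fit to the estimation sample does not imply more accurate out-of-sample predictions.

```python
# Minimal sketch (hypothetical data): a model that fits the estimation sample
# better can predict worse out of sample.
import numpy as np

rng = np.random.default_rng(1)
x_fit = np.linspace(0, 1, 15)
x_new = np.linspace(0, 1, 200)
true = lambda x: 2.0 * x + 1.0                        # assumed true relationship
y_fit = true(x_fit) + rng.normal(scale=0.3, size=x_fit.size)
y_new = true(x_new) + rng.normal(scale=0.3, size=x_new.size)

for degree in (1, 3, 9):                              # degree = model complexity
    coefs = np.polyfit(x_fit, y_fit, degree)
    in_sample_rmse = np.sqrt(np.mean((np.polyval(coefs, x_fit) - y_fit) ** 2))
    out_sample_rmse = np.sqrt(np.mean((np.polyval(coefs, x_new) - y_new) ** 2))
    print(f"degree {degree}: fit RMSE = {in_sample_rmse:.3f}, "
          f"out-of-sample RMSE = {out_sample_rmse:.3f}")
# In-sample fit improves with complexity; out-of-sample accuracy typically does not.
```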
Data mining is a technique that generally ignores prior evidence, relies on tests of statistical
significance, and includes irrelevant variables. The technique has been gaining adherents
over recent decades. The first author of Keogh and Kasetty’s (2003) review of research stated in
personal correspondence to us in 2015 that, “although I read every paper on time-series data
mining, I have never seen a paper that convinced me that they were doing anything better than
random guessing for prediction. Maybe there is such a paper out there, but I doubt it.”
12. Use simple methods
“There is, perhaps, no beguilement more insidious and dangerous than
an elaborate and elegant mathematical process built upon unfortified premises.”
Chamberlin (1899, p. 890)
The call for simplicity in science goes back at least to Aristotle, but the 14th century
formulation, Occam’s razor, is more familiar (Charlesworth, 1956). The use of complex methods
reduces the ability of potential users and other researchers to understand what was done and
therefore to detect mistakes and assess uncertainty.
Green and Armstrong (2015) tested the value of simplicity by searching for published
forecasting studies that compared the out-of-sample accuracy of forecasts from simple methods
with those from methods that are more complex. That paper defines a simple method as one about
which forecast users understand the (1) procedures, (2) representation of prior knowledge in
models, (3) relationships among the model elements, and (4) relationships among models,
forecasts, and decisions. Simplicity improved forecast accuracy in all of the 32 papers
encompassing 97 comparisons; on average, it decreased forecast errors by 21 percent for the 25
papers that provided quantitative comparisons.
13. Use methods that incorporate cumulative knowledge
Prior knowledge can be used to identify causal factors and the directions of their effects, and
in some cases, to estimate the likely ranges of the effect sizes. Two ways to incorporate prior
knowledge on causal variables into a model of the problem being studied are to specify an index
model, or to decompose the problem into segments based on causal forces.
The index method was inspired by an approach to decision-making that Benjamin Franklin
used. It involves identifying all of the important evidence-based (or logically obvious) causal
variables, then examining which hypothesis does best on each variable, and then summing across
variables to determine which hypothesis is superior. The review of evidence in Armstrong, Du,
Green, and Graefe (2016) showed that equal weights provide more accurate out-of-sample
predictions than are obtained using regression weights. Gains in the out-of-sample
predictive validity of the index method are greatest when all variables that are important are
included, which is typically far more variables than can be included in a model estimated using
regression analysis.
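As an illustration only, here is a minimal sketch (synthetic data, hypothetical variable counts) of the equal-weights idea: when many variables are known to matter and their directions are known, unit weights can predict new cases as accurately as, or more accurately than, weights estimated by regression from a modest sample.

```python
# Minimal sketch with synthetic data: equal (unit) weights versus regression
# weights for out-of-sample prediction. The setup assumes, as the index method
# does, that the known causal variables all matter and their directions are known.
import numpy as np

rng = np.random.default_rng(1)
k, n_train, n_test = 10, 40, 500
X_train = rng.normal(size=(n_train, k))
X_test = rng.normal(size=(n_test, k))
y_train = X_train.sum(axis=1) + rng.normal(scale=3.0, size=n_train)
y_test = X_test.sum(axis=1) + rng.normal(scale=3.0, size=n_test)

beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)   # regression weights
mae_regression = np.mean(np.abs(X_test @ beta - y_test))
mae_equal = np.mean(np.abs(X_test.sum(axis=1) - y_test))   # equal-weights index
print(f"out-of-sample MAE: regression {mae_regression:.2f}, equal weights {mae_equal:.2f}")
```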
Segmentation models can be developed by decomposing the data on the basis of causal
priorities. Differing causal effects can then be accounted for in each segment. The approach
makes effective use of enormous samples and it avoids problems with inter-correlations and
interactions (Armstrong, 1985, Chapter 9).
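A minimal sketch of the segmentation idea follows, with hypothetical data and variable names: observations are split on a causal variable and the effect of interest is estimated separately within each segment, rather than by one regression across all observations.

```python
# Minimal sketch with hypothetical data: decompose on a causal variable
# (price sensitivity) and estimate the effect of a discount within each segment.
import numpy as np

rng = np.random.default_rng(3)
n = 3000                                          # segmentation suits large samples
price_sensitive = rng.integers(0, 2, size=n)      # causal segmenting variable
discount = rng.uniform(0.0, 0.3, size=n)
sales = (100.0
         + np.where(price_sensitive == 1, 400.0, 50.0) * discount   # effect differs by segment
         + rng.normal(scale=10.0, size=n))

for segment in (0, 1):
    mask = price_sensitive == segment
    slope = np.polyfit(discount[mask], sales[mask], 1)[0]           # per-segment effect estimate
    print(f"segment {segment}: estimated sales response to discount {slope:6.1f}")
```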
14. Estimate effect sizes and confidence
When considering policy changes, researchers should estimate the size of the effects in causal
relationships. For example, how many people would be killed if right-turn-on-red traffic signals
were introduced versus not making that change? Hauer (2004) describes how tests of statistical
significance led decision makers to ignore the evidence that more people were killed with the
right-turn-on-red rule.
Despite the utility of effect sizes, Ziliak and McCloskey (2008, Chapter 7) found in their
audit of empirical papers published in the American Economic Review in the 1980s that only 30
percent of papers distinguished statistical significance from economic significance. In the 1990s,
the figure had decreased to 21 percent.
Given the causal variables and their expected direction of effects, regression analysis can be
used to estimate effect sizes. However, one should test the estimated effect sizes against the use
of equal weights in out-of-sample predictions. Regression analyses are also helpful in estimating
confidence intervals.
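For readers who want the mechanics, here is a minimal sketch (synthetic data, a single hypothetical causal variable) of estimating an effect size with least squares and reporting an approximate 95 percent confidence interval rather than a significance test.

```python
# Minimal sketch with synthetic data: report the estimated effect size and an
# approximate 95% confidence interval instead of a significance test.
import numpy as np

rng = np.random.default_rng(2)
n = 120
exposure = rng.normal(size=n)                              # hypothetical causal variable
outcome = 2.0 * exposure + rng.normal(scale=3.0, size=n)   # true effect size is 2.0

X = np.column_stack([np.ones(n), exposure])                # intercept plus exposure
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
residuals = outcome - X @ beta
sigma2 = residuals @ residuals / (n - X.shape[1])          # residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))[1]  # standard error of the effect
low, high = beta[1] - 1.96 * se, beta[1] + 1.96 * se
print(f"estimated effect size {beta[1]:.2f}, approximate 95% CI [{low:.2f}, {high:.2f}]")
```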
15. Draw logical conclusions on the practical implications of findings from the tests of the
hypotheses
The conclusions should follow logically from the evidence provided by your findings and
cumulative knowledge. Consider rewriting your conclusions using symbols in order to check the
logic without emotion. For example, following Beardsley (1950, pp. 374-375), the argument “if
P, then Q. Not P, therefore not Q" is easily recognized as the logical fallacy of "denying the
antecedent," but the error is hard to recognize when emotional terms are used instead of letters.
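The check can even be mechanical. The short sketch below, offered only as an illustration of the logical point, enumerates the possible truth values of P and Q and finds the case that makes the premises of "denying the antecedent" true while its conclusion is false.

```python
# Minimal sketch: show by enumeration that "if P then Q; not P; therefore not Q"
# is invalid, because a case exists where the premises hold but the conclusion fails.
from itertools import product

def implies(p: bool, q: bool) -> bool:
    # Material implication: false only when P is true and Q is false.
    return (not p) or q

for p, q in product([True, False], repeat=2):
    premises_hold = implies(p, q) and (not p)
    conclusion_holds = not q
    if premises_hold and not conclusion_holds:
        print(f"Counterexample: P={p}, Q={q}; the premises are true but the conclusion is false.")
```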
Writing a Scientific Paper
Document your prior knowledge and hypotheses by keeping electronic copies of your drafts.
The drafts should provide a record of how your hypotheses changed, such as changes prompted by
the discovery of previously overlooked research that affects the conditions under which the
various hypotheses apply.
16. Disclose research hypotheses, procedures, and data
The responsibility for deciding what is relevant rests with the researchers, who best
understand what information should be included.
Iqbal et al.’s (2016) review of publication practices in medical journals found that many
papers failed to provide full disclosure of the method and data. The problem of incomplete
disclosure is also common in applied economics and the social sciences (Hubbard, 2016, pp.147-
153). Direct replication is impossible without full disclosure of the methods and data. Journals
should require this as a condition for publication as a scientific paper.
Describe how you searched for cumulative knowledge, designed your experiment (e.g., how
you ensured that you tested alternative hypotheses that others would consider reasonable),
analyzed the findings using validated methods, and so on. Describe the steps you took to find
evidence that might conflict with your preferred hypothesis. Address any issues in your paper that
might cause concern to a reader.
Include in your submission letter to a journal or in your paper an “oath” that you have
followed proper scientific methods. When people are mindful of their own standards, they try to
live up to them. Armstrong (2010, pp. 89-94) summarized the evidence. For example, in one
experiment, subjects were paid according to the number of correct answers on a task involving a series of
puzzles. The task included the opportunity to falsify their reports of how many puzzles they solved. Most of
those in the control groups cheated, but none of the subjects who had been asked, just before taking the test,
to write down as many of the Ten Commandments as they could remember did so (Mazar, Amir, and Ariely,
2008).
Researchers should not include information that would be useless, harmful, confusing, or
misleading. For example, the insistence by journals on disclosure of all sources of funding,
while presumably intended to improve the reporting of science, is in practice expected to be
harmful. In their review of experimental studies on mandatory disclosures, Ben-Shahar and
Schneider (2014) found that such disclosures confuse the people they are intended to benefit and
harm their decision-making. Think of a scientist who needs funding to run experiments to assess
the net benefit of a government policy compared to feasible alternatives. Donors might be willing
to help, but not if doing so would lead to them being subjected to censure, boycotts of their
businesses, attacks on their websites, and demands from politicians for prosecution of the scientists.
Scientists should design their study to avoid bias, such as by using the method of multiple
hypotheses. If the scientist is confident that the procedures control for bias, there is no need to
mention the source of funding.
It is rare for scientists to behave in an unethical manner because the vast majority of them
want to contribute to science. In addition, if unethical behavior is discovered, they will probably
need to find a new career. Science has an effective alternative to mandatory disclosures. If you
are skeptical of a study's findings for whatever reason, conduct a direct replication. If the
scientists responsible for the original study fail to provide the necessary materials, report that as
a failure to follow a required scientific procedure, but do not publicly accuse them of unethical
behavior: the omission could be due to an unintended error or to a misunderstanding on your
part, the accusation might expose you to a libel case, and there are other such reasons described
in Armstrong (1986).
In practice, hypotheses are often developed after analyzing the data. Again, science provides
a solution—keep a log to track important changes in your hypotheses or procedures. Doing so
may also resolve disputes as to who made a discovery. For example, Alexander Graham Bell's
log for his telephone experiments had a two-week gap. The log resumed with a proposal for a
new approach that was almost identical to one in an application filed with the U.S. patent office
by another inventor on the same day. That inventor sued, but the courts concluded there was not
sufficient evidence that Bell had stolen the patent. Shulman (2008)
eventually discovered the missing pages from Bell’s log, and concluded that Bell had stolen the
patent.
17. Cite all relevant scientific papers when presenting evidence
Citations in scientific papers often imply evidence. Give readers an indication of what
evidence they would find in the cited work.
Do not cite advocacy research as scientific evidence. Kabat (2008) concluded that the use of
the advocacy method in studies on health risks is harmful to science as such studies find many
false relationships and thereby mislead researchers, doctors, patients, and the public.
If a cited paper provides only opinions, make that clear to readers. Limit the space given to
opinions. By doing so, you will be able to shorten your paper and add force to your findings.
18. Ensure summaries of prior findings that you cite are correct
Include a statement in your paper verifying that at least one author has read each of the works
cited. Contact authors of papers that you cite in a substantive way. Send them your paper and ask
if you have described their findings correctly, if your citation and reference are correct, and
whether you have overlooked any relevant papers.
Why? Because authors often make mistakes in referencing and provide incorrect summaries
of other researchers’ findings. We have been following the practice we describe for years. Many
researchers reply with important corrections or suggestions for improvements, and often with
references for relevant studies. And they thank us for checking with them.
Evans, Nadjari, and Burchell (1990) found a 48 percent error rate in the references in three
medical journals. They wrote, “a detailed analysis of quotation errors raises doubts in many cases
that the original reference was read by the authors.” In addition, Eichorn and Yankauer (1987)
found that authors’ descriptions of cited studies differed from the original authors’ interpretations
for 30 percent of the papers and half of those descriptions were unrelated to the authors’
contentions.
Wright and Armstrong’s (2008) audit found that 98 percent of a sample of 50 papers citing
Armstrong and Overton (1977) did so incorrectly. Only one of the thousands of researchers who
cited the 1977 paper had asked the authors if they had used the paper’s findings correctly.
Harzing (2002) provided 12 guidelines for referencing papers that few researchers would
disagree with: Reproduce the correct reference, refer to the correct publication, do not use “empty
references” (i.e., those that contain no evidence), use reliable sources, use generalizable sources
for generalized statements, do not misrepresent the content of the reference, make clear which
references support which conclusions, do not copy someone else’s references, do not cite out-of-
date references, do not be impressed by top journals, do not try to reconcile conflicting evidence,
and actively search for counter-evidence. Harzing’s analysis in one research area found that many
of these guidelines were violated.
19. Explain why your findings are useful
Authors must try to convince readers that their findings are a useful addition to existing
knowledge. In other words, answer the "so what?" question. Be specific about how the findings
can be used to improve understanding of causal factors, prediction, decision-making, policy, or
methods and procedures, compared with what was already known.
For a scientific finding to be useful, the problem must be important (see Guideline 1). Some
problems jump off the page as being important, such as Milgram’s question on whether people
might act irresponsibly if an authority takes responsibility from individuals. We think there are
many important problems, such as "Is there any scientific evidence that government regulations
have provided better long-term outcomes than a free market?" or "What is the best level of CEO
remuneration to maximize a firm's long-term profitability?"
The usefulness of findings may also depend on the size of the effect. For example, Milgram’s
obedience experiments (Milgram 1969) provided evidence that the size of the obedience to
authority effect is large. Knowing the effect size means that evidence-based cost-benefit analyses
are possible.
Milgram’s (1969) study also shows the importance of surprise as a way to demonstrate
usefulness. Show the design of your experiment to people who make decisions that might be
affected by your findings and ask them to predict your findings. If their predictions are wrong, the
findings should be useful to them. However, do not ask decision makers if they are surprised after
you have told them your findings: Three experiments by Slovic and Fischhoff (1977) showed that
people will seldom express surprise, no matter what the findings are.
In an attempt to demonstrate the value of academic research on consumer behavior,
Armstrong (1991) used 20 empirical papers from the Journal of Consumer Research (JCR). Each
paper described how the researchers’ hypotheses were based on their reviews of prior research.
All of their favored hypotheses were supported by statistically significant findings. Descriptions
of a sample of hypotheses from the 20 studies were presented to 16 academics randomly sampled
from the membership of the Association for Consumer Research, 12 marketing practitioners, and
43 high-school students. In all, they made 1,736 directional predictions about the studies'
findings on 105 hypotheses. The practitioners were correct on 58 percent of the hypotheses,
students on 57 percent, and academics on 51 percent. That finding failed to support the
hypothesis that the 20 JCR papers contributed to useful knowledge about consumer behavior.
The findings were surprising to the author, to many academics, and, in the opinion of the Wall
Street Journal, to the public.
Emphasize the importance of the paper in the title. Most important, describe how your
findings are useful in the abstract. To do so, put yourself in the role of a busy potential reader
who, if the title has caught his attention, will read only the abstract to determine whether the
paper is useful to him. Finally, discuss the importance of the findings in the conclusions section
of your paper, by explaining who can use them to improve on their current practices and how.
If you cannot show that the paper is useful, do not publish it. When the first author started his
career, his first submission involved sophisticated statistical analyses of a large data set. It was
accepted by the leading journal in the field. However, in the time from submission to acceptance,
he became skeptical that the analyses, while technically correct, were of any use. As a result, he
withdrew his name from the paper. The paper was published and it proved to be of no apparent
value.
20. Write clearly and succinctly for the widest audience for whom the findings might be useful
Scientists should seek a large audience of those who might be able to use the findings of their
research. Clear writing helps. Evidence-based principles for writing persuasive reports are
provided in the Persuasive Reports Checklist, available from the advertisingprinciples.com
website.
Write objectively. Take care in using adjectives and adverbs to ensure that they are necessary
and that they avoid an evaluative tone.
Use common words. Use short sentences. Avoid complex mathematical notation. Round
numbers to make them easier to read and remember, and to avoid implying a high degree of
precision. Reduce the number of words without eliminating important content: Haslam (2010)
found that shorter papers get more citations per page.
Revise often in order to reduce errors and length, and to improve clarity. Typically, we revise
our papers more than 100 times: the more important the paper, the more revisions. For example,
over a three-year period, we worked through 456 revisions of our paper, the “Golden Rule of
Forecasting” (Armstrong, Green, and Graefe, 2015).
Use editors to improve clarity and reduce length. We typically use several copy editors for
each paper.
When revising your own paper, print it out and then edit the printed copy. Doing so is more
effective than is editing the document on a computer screen. While we have found no direct
testing of that approach for editing papers, many experiments have found that comprehension and
retention are superior for print compared to on-screen reading (Jeong, 2012; Mangen, Walgermo
and Bronnick, 2013). Copy editors that we use tell us that editing printed copy is common
practice.
Unfortunately for researchers who wish to communicate their findings widely, complexity
impresses journal reviewers. Armstrong (1980b) found that academic reviewers rated the authors
of abstracts from published journal papers that were altered to be more complex as more
competent than the authors of abstracts altered to be simpler. In another experiment, reviewers
rated papers that included irrelevant complex mathematics more highly (Eriksson 2012). In an
experiment by Weisberg et al. (2008), readers found papers that included irrelevant words related
to neuroscience to be more convincing. While such complexity impresses reviewers, it is of no value to those
who would like to use your research findings. In other words, when you have useful scientific
findings and want people to benefit, write clearly.
21. Obtain extensive peer review and editing before submitting a paper for publication
Contact reviewers individually. Mass appeals, such as posting a working paper on the
Internet, have rarely led to useful reviews for us, but direct requests have often been successful.
People tend to respond to requests if you are dealing with a problem that they believe to be
important. Ask for specific suggestions on ways to improve the paper. People asked to be in a
helping role respond differently than those asked to evaluate a paper. In our experience,
anonymous journal reviewers seldom offer useful advice, and what advice they provide is rarely
supported by evidence. On the other hand, when we ask for help from those we know to be
knowledgeable in the area, we get useful suggestions presented in a helpful manner.
While peer review helps to reduce errors, reviewers miss many important errors, as we
discuss later in the paper, so you will need to use many competent reviewers to reduce errors. We
suggest at least ten reviewers. If that seems excessive, consider that Frey (2003) acknowledged
the help of 50 reviewers.
Make an effort to find reviewers who are likely to dispute your findings. Ask them to direct
you to experimental evidence that refutes your conclusions. Grade yourself on how many of a
reviewer’s suggestions you were able to use.
Before submitting the paper, show the final version to the reviewers and ask if they would
like to be acknowledged. Acknowledging reviewers in your paper reassures readers. Also thank
all those who provided useful support, especially funders. If the funders wish to remain
anonymous, simply say "anonymous donors." If a reader is concerned that funding, whether the
source is identified or not, might have biased the research, the scientific response for the reader
is to conduct a replication.
Disseminating the Findings
As Benjamin Franklin implied, scientists should communicate their findings. Note, however,
that Franklin was referring to useful scientific findings.
Publishing in an academic journal is, on its own, insufficient. Try also to communicate your
findings to those who could use them. Do so in sufficient detail that users are able to understand
the procedures and the conditions under which the findings apply.
Scientists, and their funders, must be patient. The lead times for adoption of useful findings
are typically long in the social sciences. Paul Meehl’s (1950) finding that models are superior to
expert judgment for personnel selection was widely cited, confirmed by many other studies, and
recognized as one of the most useful findings in management, but almost half a century elapsed
before it gained acceptance in sports, where it was shown to be much more accurate than expert
judgments. Armstrong (2012c) describes the path to implementation in baseball. As far as we are
aware, universities, some of which teach Meehl’s findings, still seldom use them; nor do
corporations or governments.
22. Provide thorough responses to journal reviewers, including reasons for not following
suggestions
Provide detailed point-by-point responses to journal reviewers’ comments and suggestions. If
you believe that journal reviewers were wrong in their assessment of your paper, state your
objections to the journal editor in a calm manner. For example, run new experiments to test the
reviewer’s opinions. The first author’s experience has been that while doing so tends to upset
reviewers, some editors find it convincing.
Frey (2003) suggests that scientists should not make changes suggested by reviewers if they
believe the changes would be harmful. In a survey of authors of papers in psychology journals by
Bradley (1981), 73 percent of the authors reported encountering invalid criticisms. To their credit,
92 percent of those authors did not make changes they knew to be wrong.
23. Challenge rejection, but only if your case is strong
Some research suggests that journal editors almost always go along with the journal
reviewers’ recommendations. Nevertheless, objecting has worked well for the first author, whose
most useful and surprising papers have almost all been initially rejected by journals. Despite the
large number of rejections, all of his important papers were eventually published in respectable
journals. Those experiences demonstrate the importance of persistence in challenging poor
reviews and submitting to other journals, and of good luck in finding editors who do not blindly
follow reviewers' recommendations.
24. Consider alternative ways to publish your findings
If you have a paper that you consider to be useful, send it to the relevant editor of the journal
of your choice and ask her if she would invite your paper without deferring to reviewers’
recommendations on whether to publish. By following this approach, you have not formally
"submitted" your paper; thus you could make the offer to a number of journals at the same time,
but you should inform the editors that you have done so.
Why might the strategy work? Because, as described by Frey (2003), unlike journal
reviewers, journal editors have some interest in publishing useful papers.
If the editor agrees to your proposal, it is your responsibility to obtain reviews. Tell your
reviewers that the paper has been accepted, and ask them to send you suggestions for improving
the paper.
The journal ranking system creates long lead times for publishing in “top” journals in the
social and management sciences, and low probabilities for acceptance. Paul Meehl was reportedly
asked why he published in an obscure journal without a strong reputation as “peer reviewed.
Meehl responded that “it was late in his career, and he did not have the time nor patience to deal
with picky reviewers who were often poorly informed” (Lee Sechrest, as quoted in Gelman,
2015).
Consider writing a chapter in an edited book, a paper in an open access Internet journal, or a
monograph. Scientific books offer an opportunity to provide a complete review of research on a
given problem along with full disclosure and without the need to satisfy reviewers who wish to
enforce their views.
Books can provide readers with convenient access to the cumulative scientific knowledge on
a topic. On the negative side, scientific books are time-consuming for authors. The lead author
has published three books and, on average, each took about nine years to complete.
Avoid pop-management books. Typically, they are a reflection of the authors’ opinions and
experiences, rather than summaries of scientific knowledge. In their efforts to provide simple
messages, pop-management authors often fail to properly explain the conditions or the evidence
behind their conclusions. Armstrong (2011) found that students who had read relevant pop-
management books provided fewer correct answers than did those who had not read such books.
25. Inform those who can use your findings
The primary responsibility for disseminating research findings falls upon the researcher. You
have the copyright to the working paper that you submit to a journal, and so you have the right to
post it on your website and on repositories such as SSRN, ResearchGate, RePEc, and Scholarly
Commons. Send copies to colleagues, researchers you cited in important ways, people who
helped on the paper, reviewers, and those who do research on the topic.
Make the paper easy to obtain. Consider journals that support Green Open Access policies,
whereby the final accepted version of the paper may be posted after an embargo period. Or pay
for Gold Open Access, whereby the paper can be freely downloaded from the journal.
Seek coverage for your useful findings in the mass media. When doing so, be objective in
your description of the findings.
When presenting findings that call for substantial changes to current procedures, it is more
persuasive to present the findings softly and to let the data speak for you. Direct conclusions lead
people to defend their current beliefs and practices. For evidence on that point, see the chapter on
Resistance” in Armstrong (2010).
Citations of useful papers provide a good measure of dissemination. However, do not despair
when your most useful papers are cited less often than your other papers. A survey of 123 of the
most-cited biomedical scientists revealed that their most innovative and surprising papers had
lower citation rates than did their other papers (Ioannidis, 2014).
Criteria for Useful Scientific Findings
Much of the responsibility for creating an environment in which science can advance belongs
to the stakeholders who review the work of scientists with the purpose of funding, hiring,
promoting, or firing them. We have discussed the importance of eliminating incentives that drive
researchers to do unscientific and unimportant research. Here we are concerned with the positive
role of stakeholders in setting objectives and standards for conducting scientific research. We
propose providing them with the "Guidelines for Scientists" checklist (Figure 1).
Stakeholders also need to review or audit research outputs in order to determine whether they
meet the criteria for useful scientific findings. For that purpose, we developed the "Criteria for
Useful Scientific Findings" checklist for stakeholders. It consists of criteria for useful scientific
findings that can be observed from the paper.
Researchers must convince raters that they have met the criteria. Raters should not give the
benefit of the doubt to a paper that lacks clarity or sufficient information.
All identifying information about the researchers should be removed from the paper before it
is provided to raters in an effort to avoid bias. The rater then uses the criteria checklist. If the
purpose of the audit is to improve the paper or to help the researcher, the ratings would be
provided to the researcher.
A software version of the "Criteria for Useful Scientific Findings" checklist, with supporting
material from the Guidelines for Scientists for raters and provision for raters to make suggestions
on individual criteria, will be provided at guidelinesforscientists.com.
Stakeholders may vary in their assessments of the weights that each criterion should be given
when auditing different papers in different situations. In order to encourage fair ratings of
compliance with individual criteria, regardless of the situation, the software will not provide an
overall compliance score. The summary page of ratings and comments from the software should
be made available with any paper that is published or otherwise used so that other stakeholders
can decide for themselves whether the ratings indicate adequate compliance for their own
situation. Figure 2 provides an outline of the checklist.
Figure 2
Outline Checklist of Criteria for Useful Scientific Findings

Paper title: ____________________   Reviewer: ____________________   Date: ___ / ___ / ___

Clear descriptions of the research process, findings, and conclusions are critical. The researcher must
convince you, the rater. If you are unsure from the description whether the research complies, check False.
(The section of this paper on "Writing a Scientific Paper" provides guidance for researchers, who should
rate their own paper using this checklist before submitting.)

Criteria for Useful Scientific Findings (A-G): for each item, mark Complies as True or False

A. Findings appear useful?
   1. Importance of research problem evidenced by title, abstract, result tables, or conclusions
   2. Improved understanding of causality, prediction, decision-making, policy, or methods clearly described
B. Comprehensive prior knowledge sought and used?
   1. Unbiased search for cumulative knowledge clearly described
   2. Use of cumulative knowledge about the problem in the research clearly described
C. Hypotheses, data, and procedures fully disclosed?
   1. Hypotheses disclosed
   2. Procedures disclosed
   3. Data disclosed
   4. Any other relevant information needed to allow replication disclosed
D. Data are valid and reliable?
   1. Unbiased search for all relevant and valid data clearly described
   2. Reliability of the data that were used described
E. Methods are valid and simple?
   1. Evidence that methods were validated for the purpose used, unless obvious
   2. Methods used all relevant and valid data
   3. Methods used were sufficiently simple for potential users of the findings to understand
F. Uses experimental evidence to test all reasonable alternative hypotheses?
   1. Objectivity demonstrated by a research design that allowed the favored hypothesis to be found inferior
   2. Experimental evidence used to compare all reasonable hypotheses under stated conditions
   3. Predictive validity of hypotheses tested using out-of-sample data
G. Conclusions are consistent with the evidence and logic?
   1. Conclusions drawn logically from the evidence

J. Scott Armstrong and Kesten C. Green, December 6, 2016, v18
To improve the reliability of the ratings, we suggest using three raters, and up to five if there
is little agreement. We estimate from our limited testing to date that it takes less than an hour to
learn to use the Guidelines for Science checklist and about 20 minutes to rate a paper. The time to
complete the evaluation of a paper will typically be about four hours, which is similar to the time
that academics spend on traditional reviews (Armstrong, 2003). Here, however, it is not necessary
to use academics for making the ratings.
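As an illustration of how multiple raters' checklist responses might be reconciled, here is a minimal sketch with hypothetical ratings and item labels; it simply flags items on which the initial three raters disagree so that additional raters (up to five) can be brought in.

```python
# Minimal sketch with hypothetical True/False ratings from three raters on a few
# checklist items; items without consensus are flagged for additional raters.
ratings = {
    "A1 importance of the problem evidenced": [True, True, True],
    "C3 data disclosed":                      [True, False, True],
    "F3 out-of-sample test of hypotheses":    [False, False, True],
}

for item, votes in ratings.items():
    majority = max(votes.count(True), votes.count(False))
    agreement = majority / len(votes)
    action = "consensus" if agreement == 1.0 else "add raters (up to five) or discuss"
    print(f"{item}: {votes.count(True)} of {len(votes)} True; agreement {agreement:.2f}; {action}")
```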
Using the Criteria for Useful Scientific Findings
In this section, we make suggestions on how the primary stakeholders (researchers, governments,
universities and other research funders, and scientific journals) can help to better achieve the
objective of discovering and disseminating useful scientific findings.
Researchers
Researchers in the management and social sciences can take action by demonstrating that
they are submitting papers that conform to scientific criteria. They can do that by submitting their
ratings from the Checklist of Criteria for Useful Scientific Findings along with their paper even if
the journal has no section for useful scientific findings.
To address stakeholder concerns about potential harm to research subjects, researchers should
consider providing a signed statement to the effect that they have taken responsibility for
protecting participants in their experiments. For higher risk situations, researchers might want to
consider providing insurance against harm. If funding has been provided, researchers should
explain that they have retained the responsibility for compliance with science.
In our work as expert witnesses, for example, we accept only cases where we believe that the
client has the ethical position. We then give the client our ethical principles, stating that we will
comply with scientific procedures and testify only on what we find. (A high percentage do not
ask us to represent them at that point, but some firms view this as a good policy.)
Governments
Adam Smith wondered why Scotland’s relatively few academics were responsible for many
scientific advances during the Industrial Revolution, while England's larger number of academics
contributed little. He concluded that because the government provided them with generous
support, academics in England had little motivation to do useful research (Kealey 1996, pp. 60-
89). Modern universities around the world tend to be more like those of 18th century England than
those of 18th century Scotland. Should we expect different results?
Natural experiments have shown governments to be less effective than private enterprises at
delivering services, with the eventual consequence that privatizations became commonplace in
the late-20th and early-21st Centuries (Poole, 2008).
Governments are inclined to support advocacy, which is not consistent with the role of
representative government. Consider environmental alarms. A search identified 26 such alarms
over a period of two hundred years; dangerous global cooling and forests dying due to acid rain
are two examples. None of the 26 alarms was the product of scientific forecasting procedures.
Governments chose to support 23 of the alarms to the extent that they led to substantive
government taxes, spending, or regulation. In all cases, the alarming predictions were wrong. The
government actions were harmful in 20 cases, and of no benefit in any (Green and Armstrong,
2014).
Further, government-supported advocacy leads to the suppression of speech by scientists. The
response to Galileo’s calculations of the movement of planets is perhaps the best-known example.
In modern times, the U.S.S.R. government’s endorsement of Lysenko’s politically congenial but
flawed theories about plant breeding led to the persecution of scientists with contrary views
(Miller, 1996). Currently, some scientists whose findings conflict with the U.S. government’s
position on the global warming alarm have been threatened, harassed, fired from government and
university positions, subjected to hacking of their websites, and threatened with prosecution under
racketeering (RICO) laws (see, e.g., Curry, 2015).
For centuries, scientists have been concerned about designing research studies that would
avoid harming participants in experiments. They know that ignoring the natural concern for the
welfare of others would lead to disgrace as a scientist and to exposure to lawsuits brought by
those who were harmed. We suggest that individual scientists are themselves best able to design
safeguards for their subjects. In contrast, regulators are not constrained in the same direct and
personal way. Moreover, if harm occurs they can claim that more regulation is needed. In other
words, failure can lead, perversely, to increased demand for the regulators’ services.
Starting in the mid-1960s, the U.S. government required “Institutional Review Boards”
(IRBs) to license and monitor research using human subjects in the health sciences. We fail to
understand why the government thought that research ethics would be improved by removing
responsibility from researchers. It runs counter to the findings in the many "blind obedience"
studies initiated by Milgram. As Milgram’s (1969) experiments showed, while individuals are
reluctant to harm others, they can be induced to do so if they are directed by an authority.
Schrag (2010) concluded from his review of the process used to create IRBs that there was no
scientific evidence of any need for them in the social sciences, nor was there any analysis of
whether the regulations would improve upon the long-standing arrangement of holding
researchers responsible. Indeed, several surveys have failed to find evidence of substantive harm
from scientific research projects conducted without IRB involvement (Schrag, 2010, pp. 63-67).
For example, in a study of 2,039 non-biomedical projects, only three reported a breach of
confidentiality that harmed or embarrassed a subject. Schrag (2010) and Schneider (2015) both
show that the costs of IRBs are enormous, and neither of them was able to obtain evidence that
IRBs have had beneficial effects on long-term welfare. Schrag continues to describe the effects of
IRBs on his Institutional Review Blog.
Instead of evidence and analysis, the regulators relied on examples to justify IRB regulations.
Among the most unethical examples were the U.S. government experiments known as the
Tuskegee Syphilis Study; a radiation study where prisoners, mentally handicapped teenagers, and
newborn babies were injected with plutonium; and Germany’s biomedical experiments in Nazi
internment camps. Are there examples of experiments by individual scientists that involved
research that nearly all people would regard as highly unethical? We find it difficult to believe
that scientists would conduct such unethical projects without the support and protection of
government.
The regulations expanded until, from the late-1990s, nearly all researchers in institutions that
receive federal funding have had to obtain permission from an IRB or a member of its staff if the
study involves human subjects. In effect, researchers must get permission to study a particular topic,
and obtain approval for the design and reporting of their study. The IRB requirements apply even
when the researcher does not receive government funding (Schneider, 2015, p. xix; Schrag, 2010,
187-192). While it does attract headlines, cheating is rare among scientists. For example, the rate
of journal retractions in medical research was around one in 10,000 from the 1970s to the year
2000, and only some of those retractions were due to cheating (Brembs, et al. 2013).
The application of IRB regulations to the social sciences seems especially odd. Schrag
(2010, p. 189) found that the extension of regulations that were written for medical science to the social
sciences was “so flawed that no one has been willing to take responsibility for it.”
Since the IRB regulations have been in place, little has been done by IRB administrators to
monitor their effect. For example, might the regulations have been responsible for the twentyfold
increase in journal retractions between 2000 and 2011 (Brembs et al., 2013)?
One effect of the IRB regulations has been to suppress scientists’ speech. For example,
Citizen A can ask Citizen B about her opinions, and a journalist can ask people about their
opinions, but a researcher from a university must receive permission from the government in
order to ask people questions in a survey.
Kealey's (1996) studies of natural experiments over history suggest that government-directed
research not only changes and displaces research that would otherwise be conducted by the
private sector, but is also more expensive and less successful. The findings of systematic analyses are
consistent with that conclusion. For example, Karpoff (2001) compared 35 government and 57
private missions conducted during the great age of Arctic exploration in the 19th Century. The
privately run expeditions were substantially safer, more successful, and less expensive than the
government-sponsored ones. In short, governments could best help to advance scientific
knowledge by eliminating funding for advocacy research, removing regulatory impediments, and
protecting scientists’ freedom of speech.
Universities and Other Funding Agencies
While we have seen no systematic research on the sources of useful scientific findings, we
have done research in many areas involving management and public policy. In each of those
areas, we have found much useful research. In our searches for evidence, we have found that
researchers in universities were responsible for the overwhelming majority of the useful scientific
research on management and in the social sciences.
We argue that universities should take the lead in the quest to increase the discovery and
dissemination of useful scientific knowledge. One way is to reject government funding tied to
advocacy research. Instead, provide funding that allows researchers to work on the problems to
which they judge they can best contribute useful knowledge. Some universities do this, such as the
ones that employ us. For example, the first author has published over 200 papers and the second
author 34 papers that have been cited at least once on Google Scholar. Neither of us has received
grants for our research.
Researchers should be told that their task is to contribute to the discovery and dissemination
of useful scientific findings, as Benjamin Franklin advised. Explicit criteria for useful scientific
findings, such as our Figure 2, should be provided. Distracting incentives, such as rewards for
grants, for publications that do not report useful scientific findings and citations of such
publications, and for media coverage that is unrelated to the scientific effort, should be
abandoned. In addition, researchers should not be rewarded for publishing in highly ranked
journals: Brembs et al.'s (2013) review found that papers in higher-ranked journals were more
likely to overestimate effect sizes and more likely to be found to be fraudulent.
The Guidelines for Science checklist can help universities and other organizations identify
and hire researchers who have done scientific research, and reward those who continue to do so.
Funders of scientific research programs should specify that researchers engaged on the program
must use a checklist to ensure that they follow scientific procedures.
Scientific Journals
We believe that the most important function of scientific journals is to communicate useful
scientific findings. Communication requires clarity of thought and expression. Yet we have
shown that complex writing by experts has been used to persuade reviewers and readers that a
paper must have merit, in the manner of the weavers in the Hans Christian Andersen story of
“The Emperor’s New Clothes”. The strategy has often been used by those doing advocacy
research. Moreover, over the past half-century, that objective of communication seems to have
become subordinate to the use of journals to “certify” academic papers, and thereby to signal that
the researchers who wrote them are fit to obtain research grants and jobs as university professors.
According to Burnham (1990), mandatory journal peer review was not common until
sometime after World War II. Did mandatory journal peer review prove to be better than the
earlier procedure of editors making decisions and seeking advice from people they thought might
be able to help? He did not find evidence that the prior system was faulty. The change seemed to
be due to the increase in the number of papers submitted. In addition, since the reviews confer
certification, they must be "fair" to all. Thus, researchers who have established an excellent
reputation for their research should not be identified, lest they gain an advantage. That is an
unusual approach, given that most of us choose doctors and plumbers, for example, at least partly
on the basis of their reputations.
How good are academics as journal reviewers? Baxt, Waeckerie, Berlin, and Callaham
(1998) sent a fictitious paper with 10 major and 13 minor errors to 262 reviewers. Of that
number, 199 submitted reviews. On average, the reviewers identified only 23 percent of the
errors. They missed some big errors; for example, 68 percent of the reviewers did not realize that
the results did not support the conclusions.
In a similar study, Schroter et al. (2008) gave “journal reviewers” papers containing nine
major intentional errors. The typical reviewer found only 2.6 (29 percent) of the errors. But most
concerning is the study's evidence that reviewers seldom systematically assess whether submitted
papers present useful scientific findings.
In an infamous real-life case, John Darsee, a highly regarded medical researcher at Emory and
then Harvard, admitted to fabricating data in a paper that he published. A committee was formed
to investigate. It concluded that he had fabricated data in 109 publications (conducted with 47
other researchers who were apparently unaware of the fabrications). The papers were published in
leading peer-reviewed journals. Many of the fabrications were preposterous, such as a paper
using data on a 17-year-old father who had four children, ages 8, 7, 5, and 4.
Reviewers are often unaware of the research in the areas of the papers they are asked to
review. For example, Peters and Ceci (1982) resubmitted 12 papers to the same prestigious
psychology journals that had published them a few years earlier. They reported that (a) 25 percent
of the papers were detected as having been previously published; (b) 89 percent of those not
detected were rejected; and (c) none of the rejections were for failing to add anything new.
A computer program (SCIgen) was created to randomly select complex words commonly
used in a topic area and to then use grammar rules to produce "academic papers." This was done
to test whether reviewers would accept complex but senseless papers for conferences. The title of one
such paper was “Simulating Flip-flop Gates Using Peer-to-peer Methodologies.” Interestingly,
some were accepted. Later, some researchers used the program to pad their resumes by
submitting SCIgen papers to scientific journals. At least 120 SCIgen papers were published in
established peer-reviewed scientific journals before the practice was discovered (Lott, 2014).
PLOS ONE, an online journal, resolves some problems for those doing scientific research. The
journal offers to publish all papers that meet its explicit criteria, and to do so rapidly. Its
acceptance rate of 70 percent is high relative to that of prestigious journals in the social and
management sciences. Five years after its introduction, almost 14,000 articles were published in
PLOS ONE, which made it the largest journal in the world (Wikipedia). Many of its papers have
been important and widely read. The journal does well compared to established journals on the
basis of citations.
Acceptance by PLOS ONE is based on "soundness," which includes criteria that overlap with our
guidelines. For example, "provide enough detail to allow suitably skilled investigators to fully
replicate your study" and "the article is presented in an intelligible fashion and is written in
standard English" (PLoS ONE, 2016a & b). Most importantly, PLOS ONE does not impose obvious
barriers to publication of useful scientific findings. Its soundness requirements do not,
however, directly assess objectivity and usefulness. Thus, sound but useless papers might be
published, as might advocacy research.
We suggest that traditional high-status scientific journals add a section devoted to "useful
scientific findings." The aim would be to publish all relevant papers that comply with science and
to do so rapidly on the Internet. Doing so would help journals by increasing the number of useful
scientific papers published and by reducing the cost of processing each paper.
To increase attention paid to papers presenting useful scientific findings, publishers could
insist that authors provide a structured abstract. The management journal publisher Emerald, for
example, does that.
Editors responsible for the useful scientific findings section of a journal could provide a
trained staff for rating papers on their compliance with science. Their reviews would require less
time than is needed for traditional reviewing. Raters should provide suggestions for improving a
paper's compliance. For papers that fall short, authors could be invited to revise their papers in
response to the raters' suggestions in order to achieve compliance.
If a journal wishes to publish a paper that is objective and provides full disclosure, but is
questionable with respect to other aspects of compliance with science, the paper could be
published with its compliance-with-science scores and descriptions of its deficiencies. Readers
would then be well informed to judge how much confidence they should place in the findings.
To encourage useful scientific papers about important problems, editors can invite papers that
address specific problems. By “inviting” we mean that the paper will be published when the
researchers say it is ready. Inviting papers is a way for journal editors to publish more important
papers than would otherwise be the case, and to do so less expensively in that the authors are
responsible for obtaining reviews. In Armstrong and Pagell's (2003) audit of 545 papers, invited
papers were 20 times more important (based on the papers with useful scientific findings that
were published and the number of citations to those papers) than those submitted in the
traditional manner. Largely by relying on that strategy when the Journal of Forecasting was
introduced in 1982, the journal achieved an impact factor for 1982 to 1983 that was seventh
among all journals in business, management, and planning.
After publication, ongoing peer review could be provided for each paper on a moderated
website requiring civil discourse along with the reviewer's name, contact information, and access
to a resume. Reviewers should be required to verify that they read the paper in question and to
avoid emotional and ad hominem arguments, opinions, and inappropriate language. These
reviews should be easy to locate, as is done with Amazon.com reviews.
Journals should ask authors to obtain their own reviews. To the extent that journals use
external reviewers, they should be asked how a paper might be improved, rather than whether it
should be published.
While we are concerned in this paper with the publishing of papers that provide useful
scientific findings, leading journals should continue to have sections for other contributions such
as exploratory studies, expositions of theories, applications, opinions, editorials, obituaries,
tutorials, commentaries, book reviews, ethical issues, logical application of existing scientific
knowledge, corrections, think papers, announcements, identification of new problems that call for
research, and so on. The leading journals provide a forum for the cooperative effort to improve
science and to provide repositories for the cumulative knowledge base. These are all important
functions that cannot easily be replaced by new Internet journals.
Discussion
Identification of overlooked evidence, correction of any errors, and future research findings
will likely help improve the guidelines presented in this paper. In addition, researchers may find
there is a need to modify the operational procedures depending on the particular area of research.
That said, the scientific principles that underlie the guidelines apply to all areas of science.
The cost of using the checklists would be low in absolute terms, and low relative to current
methods for designing and evaluating research.
Getting evidence-based checklists adopted can be easy. In a previous study, we had no
trouble convincing people to use a 195-item checklist when we paid them a small fee for doing so
(Armstrong et al., 2016). That experience suggests that researchers would comply with science
guidelines if universities, journals, and funders asked them to do so in clear operational terms.
The Criteria for Useful Scientific Findings checklist could be used by university departments
and public policy research organizations to demonstrate that they produce useful scientific
research; by courts to assess the quality of evidence; by researchers applying for jobs; by
governments proposing or revising policies; and by professional societies developing standards.
As with flying an airplane, checklists only work if you provide a response for each item.
Summary and Conclusions
“The prospect of domination of the nation's scholars by Federal employment, project
allocations, and the power of money is ever present and is gravely to be regarded. Yet,
in holding scientific research and discovery in respect, as we should, we must also be
alert to the… danger that public policy could itself become
the captive of a scientific-technological elite.”
Dwight D. Eisenhower (1961)
No changes are needed for science as defined in this paper. The problem lies in the practice of
science.
Research practice would improve substantially by eliminating funding of advocacy research.
Additional benefits would follow from the elimination of regulations that constrain scientists'
responsibilities. Incentives on scientists should be consistent with the discovery of useful
scientific research.
Universities could improve scientific progress by discarding incentives that distract
researchers from the discovery of useful scientific knowledge. They could also reduce their
reliance on government funding of advocacy research, and seek funding that would allow
researchers to select useful problems that are suited to their interests and skills.
The checklist of “Guidelines for Scientists” can help scientists increase the supply of useful
research. Individual scientists can take immediate action. We expect that many journals would
welcome papers that follow scientific principles. Moreover, thanks to the Internet, opportunities
for publishing useful scientific findings are expanding.
The "Criteria for Useful Scientific Findings" checklist can be used by funders, courts, journals,
government policy makers, and managers to identify papers that provide useful scientific
findings. To promote advances in scientific knowledge, journal editors could devote a section to
papers with useful scientific findings, as judged by their compliance with science. Given the
Internet, journals could publish all compliant papers that are relevant to their field.
References
Abramowitz, S. I., Gomes, B., & Abramowitz, C. V. (1975). Publish or politic: Referee bias in manuscript
review. Journal of Applied Social Psychology, 5(3), 187-200.
Arkes, H. R., Gonzalez-Vallejo, C., Bonham, A. J., Kung, Y-H., & Bailey N. (2010). Assessing the merits
and faults of holistic and disaggregated judgments. Journal of Behavioral Decision Making, 23,
250-270.
Arkes, H. R., Shaffer, V. A., & Dawes, R. M. (2006). Comparing holistic and disaggregated ratings in the
evaluation of scientific presentations. Journal of Behavioral Decision Making, 19, 429-439.
Armstrong, J. S. (1970). How to avoid exploratory research, Journal of Advertising Research, 10(4), 27-30.
Armstrong, J. S. (1977). Social irresponsibility in management. Journal of Business Research 5, (3), 185-
213.
Armstrong, J. S. (1979). Advocacy and objectivity in science, Management Science, 25(5), 423-428.
Armstrong, J. S. (1980a). Advocacy as a scientific strategy: The Mitroff myth, Academy of Management
Review, 5, 509-511.
Armstrong, J. S. (1980b). Unintelligible management research and academic prestige, Interfaces, 10, 80-86.
Armstrong, J. S. (1982), Barriers to scientific contributions: The author's formula, The Behavioral and
Brain Sciences, 5, 197-199.
Armstrong, J. S. (1983). Cheating in management science, Interfaces, 13(4), 20-29.
Armstrong, J. S. (1985). Long-range forecasting: From crystal ball to computer, New York: John Wiley &
Sons.
Armstrong, J. S. (1991). Prediction of consumer behavior by experts and novices, Journal of Consumer
Research, 18 (September), 251-256.
Armstrong, J. S. (2000). Principles of Forecasting. Kluwer Academic Publishers, Boston.
Armstrong, J. S. (2003). Discovery and communication of important marketing findings, Journal of
Business Research, 56, 69-84.
Armstrong, J. S. (2006). How to make better forecasts and decisions: Avoid face-to-face meetings,
Foresight: The International Journal of Applied Forecasting, 5(Fall), 3-15.
Armstrong, J. S. (2007a). Significance tests harm progress in forecasting. International Journal of
Forecasting, 23, 321-327.
Armstrong, J. S. (2007b). Statistical significance tests are unnecessary even when properly done, International
Journal of Forecasting, 23, 335-336.
Armstrong, J. S., (2010). Persuasive Advertising. Palgrave Macmillan, Hampshire, UK.
Armstrong, J. S. (2011). Evidence-based advertising: An application to persuasion, International Journal of
Advertising, 30(5), 743-767.
Armstrong, J. S. (2012a). Natural learning in higher education, Encyclopedia of the Sciences of Learning,
ed. by N. Seal, 2426-2433.
Armstrong, J. S., (2012b). Illusions in regression analysis. International Journal of Forecasting, 28, 689-
694.
Armstrong, J. S. (2012c). Predicting job performance: The money ball factor, Foresight, 25, 31-34.
Armstrong, J. S., & Brodie, R. J. (1994). Effects of portfolio planning methods on decision making:
Experimental results, International Journal of Research in Marketing, 11, 73-84.
Armstrong, J. S., Brodie, R., & Parsons, A. (2001). Hypotheses in marketing science: Literature review
and publication audit, Marketing Letters, 12, 171-187.
Armstrong, J. S., & Collopy, F. (1996). Competitor orientation: Effects of objectives and information on
managerial decisions and profitability, Journal of Marketing Research, 33, 188-199.
Armstrong, J. S., Du, R., Green, K. C. & Graefe, A. (2016). Predictive validity of evidence-based
persuasion principles. European Journal of Marketing, 50, 276-293 (followed by Commentaries,
pp. 294-316).
Armstrong, J. S., & Green, K. C. (2007). Competitor-oriented objectives: The myth of market share,
International Journal of Business, 12, 117 -136.
Armstrong, J. S., & Green, K. C. (2013). Effects of corporate social responsibility and irresponsibility
policies: Conclusions from evidence-based research, Journal of Business Research, 66, 1922 -
1927.
Armstrong, J. S., Green, K. C., & Graefe, A. (2015). Golden rule of forecasting: Be conservative. Journal
of Business Research, 68, 1717-1731.
Armstrong, J. S., & Hubbard, R. (1991). Does the need for agreement among reviewers inhibit the
publication of controversial findings? Behavioral and Brain Sciences, 14, 136-137.
Armstrong, J. S., & Overton, T. S. (1977). Estimating nonresponse bias in mail surveys, Journal of
Marketing Research, 14, 396-402.
Armstrong, J. S., & Pagell, R. (2003). Reaping benefits from management research: Lessons from the
forecasting principles project, Interfaces, 33(6), 89-111.
Armstrong, J. S., & Patnaik, S. (2009). Using quasi-experimental data to develop principles for persuasive
advertising, Journal of Advertising Research, 49(2), 170-175.
Bacon, F. (1620). The New Organon: Or The True Directions Concerning the Interpretation of Nature.
Retrieved from http://www.constitution.org/bacon/nov_org.htm
Ball, T. (2014). The Deliberate Corruption of Climate Science. Seattle: Stairway Press.
Batson, C. D. (1975). Rational processing or rationalization? The effect of disconfirming information on a
stated religious belief. Journal of Personality and Social Psychology, 32(1), 176-184.
Baxt, W. G., Waeckerle, J. F., Berlin, J. A., & Callaham, M. L. (1998). Who reviews reviewers? Feasibility of using a fictitious manuscript to evaluate peer reviewer performance. Annals of Emergency Medicine, 32, 310-317.
Beaman, A. L. (1991). An empirical comparison of meta-analytic and traditional reviews, Personality and Social Psychology Bulletin, 17, 252-257.
Beardsley, M. C. (1950). Practical Logic. New York: Prentice Hall.
Bedeian, A. G., Taylor, S. G., & Miller, A. L. (2010). Management science on the credibility bubble:
Cardinal sins and various misdemeanors. Academy of Management Learning & Education, 9, 715-
725.
Ben-Shahar, O. & Schneider, C. E. (2014). More than you wanted to know: The failure of mandated
disclosure. Princeton: Princeton University Press.
Berelson, B., & Steiner, G.A. (1964). Human Behavior: An Inventory of Scientific Findings, New York:
Harcourt, Brace & World.
Bradley, J. V. (1981). Pernicious publication practices. Bulletin of the Psychonomic Society, 18, 31-34.
Brembs, B., Button, K., & Munafo, M. (2013). Deep impact: Unintended consequences of journal rank. Frontiers in Human Neuroscience, 7, 1-12.
Brush, S. G. (1974). The prayer test: The proposal of a scientific experiment to determine the power of prayer kindled a raging debate between Victorian men of science and theologians, American Scientist, 62(5), 561-563.
Burnham, J. C. (1990). The evolution of editorial peer review. Journal of the American Medical Association, 263, 1323-1329.
Chamberlin, T. C. (1890). The method of multiple working hypotheses. Reprinted in 1965 in Science, 148,
754-759.
Chamberlin, T. C. (1899). Lord Kelvin’s address on the age of the Earth as an abode fitted for life. Science,
9 (235), 889-901.
Charlesworth, M. J. (1956). Aristotle’s razor. Philosophical Studies, 6, 105-112.
Curry, J. (2015). A new low in science: criminalizing climate change skeptics. FoxNews.com, September
28. Retrieved from http://www.foxnews.com/opinion/2015/09/28/new-low-in-science-
criminalizing-climate-change-skeptics.html
Doucouliagos, C., & Stanley, T. D. (2013). Are all economic facts greatly exaggerated? Theory
competition and selectivity. Journal of Economic Surveys, 27(2), 316-339.
Duarte, J. L., Crawford, J. T., Stern, C., Haidt, J., Jussim, L., & Tetlock, P.E. (2015). Political diversity will
improve social psychological science. Behavioral and Brain Sciences, 38. DOI:
http://dx.doi.org/10.1017/S0140525X14000430
Eichorn, P., & Yankauer, A. (1987). Do authors check their references? A survey of accuracy of references
in three public health journals. American Journal of Public Health, 77, 1011-1012.
Eisenhower, D. D. (1961). Farewell address to the nation. Retrieved from
http://mcadams.posc.mu.edu/ike.htm
Eriksson, K. (2012). The nonsense math effect. Judgment and Decision Making, 7(6), 746-749.
Evans, J. T., Nadjari, H. I., & Burchell, S. A. (1990). Quotational and reference accuracy in surgical
journals: A continuing peer review problem. JAMA, 263(10), 1353-1354.
Festinger, L., Riecken, H. W., & Schachter, S. (1956). When Prophecy Fails: A Social and Psychological Study of a Modern Group that Predicted the Destruction of the World. Minneapolis, MN: University of Minnesota Press.
Flyvbjerg, B. (2016). The fallacy of beneficial ignorance: A test of Hirschman’s hiding hand. World
Development, 84, 176-189.
Franklin, B. (1743). A Proposal for Promoting Useful Knowledge. Founders Online, National Archives
(http://founders.archives.gov/documents/Franklin/01-02-02-0092 [last update: 2016-03-28]).
Source: The Papers of Benjamin Franklin, vol. 2, January 1, 1735, through December 31, 1744,
ed. L. W. Labaree. New Haven: Yale University Press, 1961, pp. 378-383.
Frey, B. S. (2003). Publishing as prostitution. Public Choice, 116, 205-223.
Friedman, M. (1953). The methodology of positive economics, from Essays in Positive Economics, reprinted in Hausman, D. M. (Ed.), The Philosophy of Economics: An Anthology (3rd ed.), Cambridge: Cambridge University Press, 145-178.
Gelman, A. (2015). Paul Meehl continues to be the boss. Statistical Modeling, Causal Inference, and Social
Science, blog at andrewgelman.com, 23 March, 10:00 a.m.
Gigerenzer, G. (2015). On the supposed evidence for libertarian paternalism. Review of Philosophy and
Psychology, 6(3), 361-383.
Goodstein, L. D., & Brazis, K. L. (1970). Psychology of the scientist: XXX. Credibility of psychologists: An empirical study. Psychological Reports, 27, 835-838.
Gordon, G., & Marquis, S. (1966). Freedom, visibility of consequences and scientific innovation. American Journal of Sociology, 72, 195-202.
Green, K. C. (2002). Forecasting decisions in conflict situations: a comparison of game theory, role-
playing, and unaided judgement. International Journal of Forecasting, 18, 321-344.
Green, K. C. (2005). Game theory, simulated interaction, and unaided judgement for forecasting decisions
in conflicts: Further evidence. International Journal of Forecasting, 21, 463-472.
Green, K. C., & Armstrong, J. S. (2007). The value of expertise for forecasting decisions in conflicts.
Interfaces, 37, 287-299.
Green, K. C., & Armstrong, J. S. (2011). Role thinking: Standing in other people’s shoes to forecast
decisions in conflicts. International Journal of Forecasting, 27, 69-80.
Green, K. C., & Armstrong, J. S. (2014). Forecasting global climate change, In A. Moran (Ed.), Climate
change: The facts 2014, pp. 170-186, Melbourne: Institute of Public Affairs.
Green, K. C., & Armstrong, J. S. (2015). Simple versus complex forecasting: The evidence. Journal of
Business Research, 68, 1678-1685.
Hales, B. M., & Pronovost, P. J. (2006). The checklist – a tool for error management and performance
improvement. Journal of Critical Care, 21, 231-235.
Harzing, A. (2002). Are our referencing errors undermining our scholarship and credibility? The case of
expatriate failure rates. Journal of Organizational Behavior, 23, 127-148.
Haslam, N. (2010). Bite-size science: Relative impact of short article formats. Perspectives on
Psychological Science, 5, 263-264.
Hauer, E. (2004). The harm done by tests of significance. Accident Analysis and Prevention, 36, 495-500.
Haynes, A. B., Weiser, T. G., Berry, W. R., Lipsitz, S. R., Breizat, A.-H. S., Dellinger, E. P., Herbosa, T.,
Joseph, S., Kibatala, P. L., Lapitan, M. C. M., Merry, A. F., Moorthy, K., Reznick, R. K., Taylor,
B., & Gawande, A. A. (2009). A surgical safety checklist to reduce morbidity and mortality in a
global population. New England Journal of Medicine, 360(5), 491-499.
Hirschman, A. O. (1967). The principle of the hiding hand. The Public Interest, 6(Winter), 1-23.
Holub, H. W., Tappeiner, G., & Eberharter, V. (1991). The iron law of important articles. Southern
Economic Journal, 58, 317-328.
Horwitz, S. K., & Horwitz, I. B. (2007). The effects of team diversity on team outcomes: A meta-analytic
review of team demography. Journal of Management, 33, 987-1015.
Hubbard, R. (2016). Corrupt Research: The Case for Reconceptualizing Empirical Management and Social
Science. New York: Sage.
Hubbard, R., & Armstrong, J. S. (1992). Are null results becoming an endangered species in marketing?
Marketing Letters, 3, 127-136.
Hubbard, R., & Armstrong, J. S. (1997). Publication bias against null results. Psychological Reports, 80,
337-338.
Hunter, J. E. (1997). Needed: A ban on the significance test. Psychological Science, 8(1), 1-20.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8): e124. doi:
10.1371/journal.pmed.0020124
Ioannidis, J. P. A. (2014). Is your most cited work your best? Nature, 514, 561-562.
Ioannidis, J. P. A., Stanley, T. D., & Doucouliagos, H. (2015). The power of bias in economics research.
Deakin Economics Series working paper SWP 2016/1.
Iqbal, S. A., Wallach, J. D., Khoury, M. J., Schully, S. D., & Ioannidis, J. P. A. (2016). Reproducible research practices and transparency across the biomedical literature. PLOS Biology, 14(1). doi:10.1371/journal.pbio.1002333
Iyengar, S. S., & Lepper, M. R. (2000). When choice is demotivating: Can one desire too much of a good
thing? Journal of Personality and Social Psychology, 79, 995-1006.
Jacquart, P., & Armstrong, J. S. (2013). Are top executives paid enough? An evidence-based review.
Interfaces, 43, 580-589.
Jeong, H. (2012). A comparison of the influence of electronic books and paper books on reading
comprehension, eye fatigue, and perception. The Electronic Library, 30, 390-408.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Matching the prevalence of questionable research
practices with incentives for truth telling. Psychological Science, 23, 524-532.
Kabat, G. C. (2008). Hyping Health Risks: Environmental Hazards in Daily Life and the Science of
Epidemiology. New York: Columbia University Press.
Karpoff, J. M. (2001). Private versus public initiative in Arctic exploration: The effects of incentives and
organizational structure. Journal of Political Economy, 109, 38-78.
Kealey, T. (1996). The Economic Laws of Scientific Research. London: Macmillan.
Keogh, E., & Kasetty, S. (2003). On the need for time series data mining benchmarks: A survey and empirical demonstration. Data Mining and Knowledge Discovery, 7, 349-371.
Koehler, J. J. (1993). The influence of prior beliefs on scientific judgments of evidence quality. Organizational Behavior and Human Decision Processes, 56, 28-55.
Koehler, J. J. (1996). The base rate fallacy reconsidered: Descriptive, normative, and methodological challenges. Behavioral and Brain Sciences, 19, 1-53.
Koriat, A., Lichtenstein, S., & Fischhoff, B. (1980). Reasons for confidence. Journal of Experimental Psychology: Human Learning and Memory, 6, 107-118.
Langbert, M., Quain, A. J., & Klein, D. B. (2016). Faculty voter registration in economics, history, journalism, law,
and psychology. Econ Journal Watch, 13, 422-451.
Locke, E. A. (1986). Generalizing from Laboratory to Field Settings. Lexington, MA: Lexington Books.
Lord, C. G., Lepper, M. R., & Preston, E. (1984). Considering the opposite: A corrective strategy for social judgment.
Journal of Personality and Social Psychology, 47, 1231-1243.
Lott, J. R., Jr. (2010). More Guns, Less Crime: Understanding Crime and Gun Control Laws. 3rd ed. Chicago:
University of Chicago Press.
Lott, M. (2014). Over 100 published science journal articles just gibberish. FoxNews.com, March 01.
MacGregor, D. G. (2001). Decomposition for judgmental forecasting and estimation, in J. S. Armstrong, Principles of
Forecasting. London: Kluwer Academic Publishers, pp. 107-123.
Mackay, C. (1852). Memoirs of Extraordinary Popular Delusions & the Madness of Crowds. London: Office of the National Illustrated Library (reprinted New York: Three Rivers Press).
Mahoney, M. J. (1977). Publication prejudices: An experimental study of confirmatory bias in the peer
review system. Cognitive Therapy and Research, 1, 161-175.
Mangen, A., Walgermo, B.R., & Bronnick, K. (2013). Reading linear texts on paper versus computer
screen: Effects on reading comprehension. International Journal of Educational Research, 58, 61-
68.
Mazar, N., Amir O., & Ariely, D. (2008). The dishonesty of honest people: A theory of self-concept
management. Journal of Marketing Research, 45, 633-644.
McShane, B. B., & Gal, D. (2015). Blinding us to the obvious? The effect of statistical training on the
evaluation of evidence. Management Science, 62, 1707-1718.
Meehl, P. E. (1950). Clinical versus statistical prediction: A theoretical analysis and a review of the
evidence. Minneapolis: University of Minnesota Press.
Milgram, S. (1969). Obedience to Authority. New York: Harper & Row.
Miller, H. I. (1996). When politics drives science: Lysenko, Gore, and U.S. Biotechnology Policy. Social Philosophy and Policy, 13, 96-112. doi:10.1017/S0265052500003472
Mitroff, I. (1969). Fundamental issues in the simulation of human behavior: A case study of the strategy of
behavioral science. Management Science, 15, B635-B649.
Mitroff, I. (1972a), The myth of objectivity, or why science needs a new psychology of science.
Management Science, 18, B613-B618.
Mitroff, I. (1972b) The mythology of methodology: An essay on the nature of a feeling science. Theory &
Decision, 2, 274-290.
Moher, D., Hopewell, S., Schulz, K. F., et al. (2010). CONSORT 2010 explanation and elaboration: Updated guidelines for reporting parallel group randomised trials. British Medical Journal, 340, c869. doi: 10.1136/bmj.c869
Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General
Psychology, 2(2), 175-220.
Ofir, C., & Simonson, I. (2001). In search of negative customer feedback: the effect of expecting to
evaluate on satisfaction evaluations. Journal of Marketing Research, 38, 170-182.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (2016). Realizing the full potential of psychometric meta-
analysis for a cumulative science and practice of human resource management. Human Resource
Management Review, http://dx.doi.org/10.1016/j.hrmr.2016.09.011
Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349,
aac4716. DOI: 10.1126/science.aac4716
ORSA Committee on Professional Standards (1971). Guidelines for the practice of operations research.
Operations Research, 19(5), 1123-1258.
Pant, P. N., & Starbuck, W. H. (1990). Innocents in the forest: Forecasting and research methods. Journal
of Management, 16, 433-460.
Peters, D. P., & Ceci, S. J. (1982a). Peer-review practices of psychological journals: The fate of published
articles, submitted again. Behavioral and Brain Sciences, 5, 187-195.
Platt, J. R. (1964). Strong Inference. Science, 146(3642), 347-353.
PLOS ONE (2016a). Submission guidelines. Available at http://journals.plos.org/plosone/s/submission-
guidelines#loc-style-and-format
PLOS ONE (2016b). Criteria for publication. Available at http://journals.plos.org/plosone/s/criteria-for-
publication
Poole, R. W., Jr. (2008). Privatization. The Concise Encyclopedia of Economics. Library of Economics and Liberty [Online]. Available from http://www.econlib.org/library/Enc/Privatization.html; accessed 29 September 2016.
Porter, M. (1980). Competitive Strategy: Techniques for Analyzing Industries and Competitors. New York:
Free Press.
Reiss, J., & Sprenger, J. (2014). Scientific objectivity. The Stanford Encyclopedia of Philosophy (Summer 2016 Edition), Edward N. Zalta (ed.). Viewed 17 October 2016, at
http://plato.stanford.edu/archives/sum2016/entries/scientific-objectivity/
Routh, C. H. F. (1849). On the causes of the endemic puerperal fever of Vienna. Medico-Chirurgical
Transactions, 32, 27-40.
Scheibehenne, B., Greifeneder, R., & Todd, P. M. (2010). Can there ever be too many options? A meta-
analytic review of choice overload. Journal of Consumer Research, 37, 409-425.
Schmidt, F. L., & Hunter, J. E. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data, in Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (Eds.), What if There Were No Significance Tests? London: Lawrence Erlbaum.
Schneider, C. E. (2015). The Censor’s Hand: The Misregulation of Human Subject Research. Cambridge,
Mass: The MIT Press.
Schrag, Z. M. (2010). Ethical Imperialism: Institutional Review Boards and the Social Sciences, 1965-2009. Baltimore, MD: The Johns Hopkins University Press.
Schroter, S., Black, N., Evans, S., Godlee, F., Osorio, L., & Smith, R. (2008). What errors do peer reviewers
detect, and does training improve their ability to detect them? Journal of the Royal Society of
Medicine, 101, 507-514. doi: 10.1258/jrsm.2008.080062
Schulz, K. F., Altman, D. G., & Moher, D., CONSORT Group (2010). CONSORT 2010 Statement:
Updated Guidelines for Reporting Parallel Group Randomised Trials. PLOS Medicine, 7(3),
e1000251. doi:10.1371/journal.pmed.1000251
Scientific method, n. (2014). In Oxford English Dictionary, Third Edition, Online version September 2016,
Oxford University Press. Viewed 17 October 2016.
Shulman, S. (2008). The Telephone Gambit: Chasing Alexander Graham Bell's Secret. New York: W.W.
Norton & Company.
Slovic, P., & Fischhoff, B. (1977). On the psychology of experimental surprises. Journal of Experimental
Psychology: Human Perception and Performance, 3, 544-551.
Smart, R. G. (1964). The importance of negative results in psychological research. Canadian Psychologist, 5, 225-232.
Soyer, E., & Hogarth, R. M. (2012). The illusion of predictability: How regression statistics mislead experts. International Journal of Forecasting, 28(3), 695-711.
Stewart, W. W., & Feder, N. (1987). The integrity of the scientific literature, Nature, 325 (January 15), 207-214.
Thornton, S. (2016). Karl Popper. The Stanford Encyclopedia of Philosophy (Winter Edition), Zalta, E. N.
(ed.), http://plato.stanford.edu/entries/popper/
Watts, A. (2014). Why would climate skeptics hold a conference in HOT Las Vegas? WUWT: Watts Up
With That? Retrieved from https://wattsupwiththat.com/2014/06/24/why-would-climate-skeptics-
hold-a-conference-in-hot-las-vegas/
Weisberg, D. S., Keil, F. C., Goodstein, J., Rawson, E., & Gray, J. R. (2008). The seductive allure of
neuroscience explanations. Journal of Cognitive Neuroscience, 20, 470-477.
Winston, C. (1993). Economic deregulation: Days of reckoning for microeconomists. Journal of Economic
Literature, 31, 1263-1289.
Wright, M., & Armstrong, J. S. (2008). Verification of citations: Fawlty towers of knowledge. Interfaces,
38, 125-139.
Young, N. S., Ioannidis, J. P. A., & Al-Ubaydli, O. (2008). Why current publication practices may distort
science. PLOS Medicine, 5(10), e201. doi:10.1371/journal.pmed.0050201
Ziliak, S. T., & McCloskey D. N. (2004). Size matters: The standard error of regressions in the American
Economic Review. The Journal of Socio-Economics, 33, 527-546.
Ziliak, S. T., & McCloskey, D. N. (2008). The Cult of Statistical Significance. Ann Arbor: University of Michigan Press.