Article

Balanced versus Randomized Field Experiments in Economics: Why W. S. Gosset aka "Student" Matters

Author: Stephen T. Ziliak

Abstract

Over the past decade randomized field experiments have gained prominence in the toolkit of empirical economics and policy making. In an article titled "Field Experiments in Economics: The Past, the Present, and the Future," Levitt and List (2009) make three main claims about the history, philosophy, and future of field experiments in economics. (1) They claim that field experiments in economics began in the 1920s and 1930s in agricultural work by Neyman and Fisher. (2) They claim that artificial randomization is essential for good experimental design because, they claim, randomization is the only valid justification for Student's test of significance. (3) They claim that decision-making in private sector firms will be advanced by partnering with economists doing randomized experiments. Several areas of research have been influenced by the article despite the absence of historical and methodological review. This paper seeks to fill that gap in the literature. The power and efficiency of balanced over random designs — discovered by William S. Gosset aka Student, and confirmed by Pearson, Neyman, Jeffreys, and others adopting a balanced, decision-theoretic and/or Bayesian approach to experiments — is not mentioned in the Levitt and List article. Neglect of Student is regrettable. A body of evidence descending from Student (1911) and extending to Heckman and Vytlacil (2007) suggests that artificial randomization is neither necessary nor sufficient for improving efficiency, identifying causal relationships, and discovering economically significant differences. Three easy ways to improve field experiments are proposed and briefly illustrated.
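The efficiency claim at the heart of the abstract can be made concrete with a toy Monte Carlo. The sketch below is illustrative only and not drawn from the paper: the plot counts, the simulated fertility gradient, and the ABBA-style pairing are assumptions chosen to show why a balanced (paired) layout can estimate a treatment effect with less sampling variability than complete randomization.

```python
# Illustrative sketch (not from Ziliak 2014): variance of the estimated
# treatment effect under complete randomization vs a balanced, paired layout.
import numpy as np

rng = np.random.default_rng(0)
n_pairs, tau, n_sims = 8, 1.0, 5000      # 16 plots, hypothetical effect size

def outcome(x, treated):
    # outcome = fertility gradient + treatment effect + noise (all simulated)
    return x + tau * treated + rng.normal(0, 1, size=x.shape)

est_random, est_balanced = [], []
for _ in range(n_sims):
    x = np.sort(rng.normal(0, 2, size=2 * n_pairs))   # strong fertility gradient

    # Complete randomization: half the plots treated, ignoring x
    t = rng.permutation(np.r_[np.ones(n_pairs), np.zeros(n_pairs)]).astype(bool)
    y = outcome(x, t)
    est_random.append(y[t].mean() - y[~t].mean())

    # Balanced design: adjacent plots form a pair, one of each pair treated
    flips = rng.integers(0, 2, size=n_pairs).astype(bool)
    t_bal = np.zeros(2 * n_pairs, dtype=bool)
    t_bal[0::2], t_bal[1::2] = flips, ~flips
    y = outcome(x, t_bal)
    est_balanced.append(y[t_bal].mean() - y[~t_bal].mean())

print("SD of estimate, complete randomization:", np.std(est_random))
print("SD of estimate, balanced pairs:        ", np.std(est_balanced))
# With a pronounced gradient, the balanced estimator's sampling variability
# is typically much smaller, i.e. the design is more efficient.
```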


... We reviewed one of them in Section 3.2, which is that LFEs, a type of field experiment, have a historical and methodological origin distinct from RFEs. Economists have elaborated other reasons, including statistical reasons (Ziliak, 2014; Heckman, 1992; Heckman and Smith, 1995; Heckman, 2010) and more general considerations about the limited control afforded by RFEs (Harrison, 2005, 2014; Ortmann, 2005), and the different research purposes that different experimental designs should serve (Harrison, 2005; List, 2007a). ...
... Ziliak (2014) provides a nuanced account of such a pre-history highlighting the role of Student. According to Ziliak (2014) this role is too often neglected, which Heckman's (1992) work continues. Both Ziliak and Heckman highlight the absence of randomization in some social evaluations. ...
... 9 See Grossman and Mackenzie (2005), Cartwright (2007), Ravallion (2009a, b, 2012), Rodrik (2009), Barrett and Carter (2010), Deaton (2010a), Keane (2010), Baele (2013), Basu (2014), Mulligan (2014), Pritchett and Sandefur (2015), Favereau (2016), Ziliak and Teather-Posadas (2016), Hammer (2017), Deaton and Cartwright (2018), Gibson (2019), Pritchett (2020) and Young (2019). ...
... For an academic research group to get that far in ... [fn. 30] On the history of RCTs in US social policy see the discussions in Burtless (1995) and List and Rasul (2011). Other commentaries on the history of RCTs more generally can be found in Ziliak (2014) and Leigh (2018). [fn. 31] Banerjee et al. (2019) claim that the use of RCTs in development was "kick-started" in 1994 in a study by one of the authors (Kremer). ...
Article
Full-text available
In October 2019, Abhijit Banerjee, Esther Duflo, and Michael Kremer jointly won the 51st Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel "for their experimental approach to alleviating global poverty." But what is the exact scope of their experimental method, known as randomized control trials (RCTs)? Which sorts of questions are RCTs able to address and which do they fail to answer? This book provides answers to these questions, explaining how RCTs work, what they can achieve, why they sometimes fail, how they can be improved and why other methods are both useful and necessary. Chapters contributed by leading specialists in the field present a full and coherent picture of the main strengths and weaknesses of RCTs in the field of development. Looking beyond the epistemological, political, and ethical differences underlying many of the disagreements surrounding RCTs, it explores the implementation of RCTs on the ground, outside of their ideal theoretical conditions, and reveals some unsuspected uses and effects, their disruptive potential, but also their political uses. The contributions uncover the implicit worldview that many RCTs draw on and disseminate, and probe the gap between the method's narrow scope and its success, while also proposing improvements and alternatives. This book warns against the potential dangers of their excessive use, arguing that the best use for RCTs is not necessarily that which immediately springs to mind, and offering an opportunity to come to an informed and reasoned judgement on RCTs and what they can bring to development.
... One of these conditions is knowledge: if one doesn't know enough to implement the other methods of control, randomization is a viable alternative. But if one has this knowledge, other methods might yield better results (Savage, 1962; Ziliak, 2014). Further conditions are sample size and whether one can repeat the assignment process. ...
Article
Full-text available
A lot of philosophy taught to science students consists of scientific methodology. But many philosophy of science textbooks have a fraught relationship with methodology, presenting it either as a system of universal principles or as entirely permeated by contingent factors not subject to normative assessment. In this paper, I argue for an alternative, heuristic perspective for teaching methodology: as fallible, purpose- and context-dependent, subject to cost-effectiveness considerations and systematically biased, but nevertheless subject to normative assessment. My pedagogical conclusion from this perspective is that philosophers should aim to teach science students heuristic reasoning: strategies of normative method choice appraisal that are sensitive to purposes, contexts, biases and cost-effectiveness considerations; and that we should do so by teaching them exemplars of such reasoning. I illustrate this proposal using three such exemplars, showing how they help students to appreciate the heuristic nature of both methods and methodology, and to normatively assess method choice in such circumstances.
... The sampling protocol described by [11] leveraged a manually collected dataset of approximately 800 fruit cluster sunlight exposure values, as determined by enhanced point quadrat analysis [12], to demonstrate that whole-block sample sizes could be reduced by up to 60% compared to random sampling. Optimization of sampling operations using known spatial patterns can lead to a lower information cost [13]. ...
Article
Full-text available
Vineyards are sampled on multiple occasions during the growing season for a range of purposes, particularly to assess fruit maturation. The objective of this work was to determine if satellite normalized difference vegetation index (NDVI) vineyard images could be used to compute optimal spatially explicit sampling protocols for determining fruit maturation and quality, and minimize the number of locations physically sampled in a vineyard. An algorithm was designed to process Landsat images to locate three consecutive pixels that best represent the three quantile means representing the left tail, center, and right tail of the NDVI pixel population of a vineyard block. This new method (NDVI3) was compared to a commonly used method (CM8) and random sampling (R20) in 13 and 16 vineyard blocks in 2016 and 2017, respectively, in the Central Valley of California. Both NDVI3 and CM8 were highly correlated with R20 in pairwise comparisons of soluble sugars, pH, titratable acidity, and total anthocyanins. Kolmogorov-Smirnov tests indicated that NDVI pixels sampled via the NDVI3 method generally better represented the block population than pixels selected by CM8 or R20. Analysis of 24 blocks over a 3-year period indicated that sampling solutions were temporally stable.
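As a rough sketch of the quantile-based selection idea described above: the snippet below picks the run of three consecutive pixels whose sorted values lie closest to the means of the lower, central, and upper thirds of a block's NDVI distribution. The tercile split, the row-wise notion of "consecutive," and all names are assumptions for illustration, not the authors' published algorithm.

```python
# Hypothetical sketch in the spirit of the NDVI3 idea; details are assumed.
import numpy as np

def pick_three_consecutive(ndvi_row):
    """ndvi_row: 1-D array of NDVI values along one pixel row of a block."""
    q1, q2 = np.quantile(ndvi_row, [1 / 3, 2 / 3])
    targets = np.array([
        ndvi_row[ndvi_row <= q1].mean(),                      # left-tail mean
        ndvi_row[(ndvi_row > q1) & (ndvi_row <= q2)].mean(),  # central mean
        ndvi_row[ndvi_row > q2].mean(),                       # right-tail mean
    ])
    best_start, best_cost = 0, np.inf
    for i in range(len(ndvi_row) - 2):
        window = np.sort(ndvi_row[i:i + 3])   # compare low/mid/high in order
        cost = np.sum((window - targets) ** 2)
        if cost < best_cost:
            best_start, best_cost = i, cost
    return best_start, ndvi_row[best_start:best_start + 3]

row = np.random.default_rng(1).uniform(0.2, 0.8, size=50)  # fake NDVI values
print(pick_three_consecutive(row))
```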
... The literature on the precision of ATEs estimated from RCTs goes back to the very beginning. Gosset (writing as 'Student') never accepted Fisher's arguments for randomization in agricultural field trials and argued convincingly that his own nonrandom designs for the placement of treatment and controls yielded more precise estimates of treatment effects (see Student (1938) and Ziliak (2014)). Gosset worked for Guinness where inefficiency meant lost revenue, so he had reasons to care, as should we. ...
Article
Full-text available
Randomized Controlled Trials (RCTs) are increasingly popular in the social sciences, not only in medicine. We argue that the lay public, and sometimes researchers, put too much trust in RCTs over other methods of investigation. Contrary to frequent claims in the applied literature, randomization does not equalize everything other than the treatment in the treatment and control groups, it does not automatically deliver a precise estimate of the average treatment effect (ATE), and it does not relieve us of the need to think about (observed or unobserved) covariates. Finding out whether an estimate was generated by chance is more difficult than commonly believed. At best, an RCT yields an unbiased estimate, but this property is of limited practical value. Even then, estimates apply only to the sample selected for the trial, often no more than a convenience sample, and justification is required to extend the results to other groups, including any population to which the trial sample belongs, or to any individual, including an individual in the trial. Demanding 'external validity' is unhelpful because it expects too much of an RCT while undervaluing its potential contribution. RCTs do indeed require minimal assumptions and can operate with little prior knowledge. This is an advantage when persuading distrustful audiences, but it is a disadvantage for cumulative scientific progress, where prior knowledge should be built upon, not discarded. RCTs can play a role in building scientific knowledge and useful predictions but they can only do so as part of a cumulative program, combining with other methods, including conceptual and theoretical development, to discover not 'what works', but 'why things work'.
... Yet it should be noted that this "randomization principle" is questionable. As has been argued by Stephen Ziliak, randomization is not necessarily the most adequate method to conduct experiments, and it is very likely to have less statistical power than "balanced" designs (Ziliak et al., 2014; Ziliak and Teather-Posadas, 2016). Once again, these questions are usually not discussed in EE's literature. ...
... Lachin et al. (1988) showed that for small studies (i.e., n < 100 overall or within any principal group), imbalances that might affect power are more likely with complete or simple randomization. Balanced designs discovered by William S. Gosset aka "Student", on the other hand, are known to be more powerful and hence more efficient than randomized designs (Ziliak, 2014). A balanced design tries to control for factors that may be confounded with the outcome of interest, hence leading to more valid inferences than those from a completely randomized design. ...
Article
Full-text available
Background: Batch effects in DNA methylation microarray experiments can lead to spurious results if not properly handled during the plating of samples. Methods: Two pilot studies examining the association of DNA methylation patterns across the genome with obesity in Samoan men were investigated for chip- and row-specific batch effects. For each study, the DNA of 46 obese men and 46 lean men were assayed using Illumina's Infinium HumanMethylation450 BeadChip. In the first study (Sample One), samples from obese and lean subjects were examined on separate chips. In the second study (Sample Two), the samples were balanced on the chips by lean/obese status, age group, and census region. We used methylumi, watermelon, and limma R packages, as well as ComBat, to analyze the data. Principal component analysis and linear regression were, respectively, employed to identify the top principal components and to test for their association with the batches and lean/obese status. To identify differentially methylated positions (DMPs) between obese and lean males at each locus, we used a moderated t-test. Results: Chip effects were effectively removed from Sample Two but not Sample One. In addition, dramatic differences were observed between the two sets of DMP results. After “removing” batch effects with ComBat, Sample One had 94,191 probes differentially methylated at a q-value threshold of 0.05 while Sample Two had zero differentially methylated probes. The disparate results from Sample One and Sample Two likely arise due to the confounding of lean/obese status with chip and row batch effects. Conclusion: Even the best possible statistical adjustments for batch effects may not completely remove them. Proper study design is vital for guarding against spurious findings due to such effects.
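The batch-check logic in this abstract (project the beta-values onto leading principal components, then test whether component scores line up with chip assignment) can be sketched generically. The snippet below is not the authors' R pipeline (methylumi, watermelon, limma, ComBat); it is a Python stand-in on simulated data, with sample counts and effect sizes invented for illustration.

```python
# Generic PCA-based batch-association check on simulated methylation data.
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_samples, n_probes, n_chips = 92, 2000, 8
chip = np.repeat(np.arange(n_chips), 12)[:n_samples]   # chip label per sample

# Simulated beta-values plus an artificial chip-specific shift (a "batch effect")
betas = rng.normal(0.5, 0.1, size=(n_samples, n_probes))
betas += 0.05 * chip[:, None] * rng.normal(0, 1, size=(1, n_probes))

scores = PCA(n_components=5).fit_transform(betas)
for k in range(scores.shape[1]):
    groups = [scores[chip == c, k] for c in range(n_chips)]
    f, p = stats.f_oneway(*groups)   # one-way ANOVA of PC k across chips
    print(f"PC{k + 1}: F = {f:.1f}, p = {p:.2g}")
# Small p-values on the leading components flag a chip (batch) effect that the
# plating design, or a correction such as ComBat, would need to address.
```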
Book
Amidst concerns about replicability, but also thanks to the professionalisation of labs, the rise of pre-registration, the switch to online experiments, and enhanced computational power, experimental economics is undergoing rapid changes. These changes all call for efficient designs and data analysis; that is, they require that, given the constraints on participants' time, experiments provide information that is as rich as possible. In this Element the authors explore some ways in which this goal may be reached.
Article
Statistical research on correlation with spatial data dates at least to Student's (W. S. Gosset's) 1914 paper on “the elimination of spurious correlation due to position in time and space.” Since 1968, much of this work has been organized around the concept of spatial autocorrelation (SA). A growing statistical literature is now organized around the concept of “spatial confounding” (SC) but is estranged from, and often at odds with, the SA literature and its history. The SC literature is producing new, sometimes flawed, statistical techniques such as Restricted Spatial Regression (RSR). This article brings the SC literature into conversation with the SA literature and provides a theoretically grounded review of the history of research on correlation with spatial data, explaining some of its implications for the SC literature. The article builds upon principles of plausible inference to synthesize a guiding theoretical thread that runs throughout the SA literature. This leads to a concise theoretical critique of RSR and a clarification of the logic behind standard spatial‐statistical models.
Article
Full-text available
A crisis of validity has emerged from three related crises of science, that is, the crises of statistical significance and complete randomization, of replication, and of reproducibility. Guinnessometrics takes commonplace assumptions and methods of statistical science and stands them on their head, from little p-values to unstructured Big Data. Guinnessometrics focuses instead on the substantive significance which emerges from a small series of independent and economical yet balanced and repeated experiments. Originally developed and market-tested by William S. Gosset aka “Student” in his job as Head Experimental Brewer at the Guinness Brewery in Dublin, Gosset’s economic and common sense approach to statistical inference and scientific method has been unwisely neglected. In many areas of science and life, the 10 principles of Guinnessometrics or G-values outlined here can help. Other things equal, the larger the G-values, the better the science and judgment. By now a colleague, neighbor, or YouTube junkie has probably shown you one of those wacky psychology experiments in a video involving a gorilla, and testing the limits of human cognition. In one video, a person wearing a gorilla suit suddenly appears on the scene among humans, who are themselves engaged in some ordinary, mundane activity such as passing a basketball. The funny thing is, prankster researchers have discovered, when observers are asked to think about the mundane activity (such as by counting the number of observed passes of a basketball), the unexpected gorilla is frequently unseen (for discussion see Kahneman 2011). The gorilla is invisible. People don’t see it.
Chapter
Here are highlighted only a few of the most egregious and common mistakes made in modeling. Particular models are not emphasized so much as how model results should be communicated. The goal of probability models is to quantify uncertainty in an observable Y given assumptions or observations X. That and nothing more. This, and only this, form of model result should be presented. Regression is of paramount importance. The horrors to thought and clear reasoning committed in its name are legion. Scarcely any user of regression knows its limitations, mainly because of the fallacies of hypothesis testing and the over-certainty of parameter-based reporting. The Deadly Sin of Reification is detailed. The map is not the territory, though this fictional land is unfortunately where many choose to live. When the data do not match a theory, it is often the data that are suspected, not the theory. Models should never take the place of actual data, though they often do, particularly in time series. Risk is nearly always exaggerated. The fallacious belief that we can quantify the unquantifiable is responsible for scientism. “Smoothed” data is often given pride of place over actual observations. Over-certainty rampages across the land and leads to irreproducible results.
Article
Field trials and quasi-experiments are comparative tests in which we assess the effects of one intervention (or a set thereof) on a group of subjects as compared to another intervention on another group of similar characteristics. The main difference between field trials and quasi-experiments is in the way the interventions are assigned to the groups: in the former the allocation is randomized whereas in the latter it is not. We are going to see first the different roles played by randomization in medical experiments. Then we will discuss how controlled field trials, originating in psychology, spread to the social sciences throughout the twentieth century. Finally, we will show how the idea of a quasi-experiment appeared around a debate on what constitutes a valid test and what sort of controls guarantee it.
Article
Full-text available
Recent Monte Carlo work (Lusk and Norwood, 2005) on how to choose an experimental design for a discrete choice experiment appears to greatly simplify this issue for applied researchers by showing that a number of commonly used designs generated unbiased estimates for models with both main effects only and main effects plus higher order terms, and that random designs were more efficient than main effects designs. We show that these results are very specific to the particular utility functions examined and do not generalize well, and that inferences drawn about random designs are based on an implementation that is infeasible in the field. We show that parameter estimates obtained when one follows the recommendations made can be unidentified (although technically often estimable), inconsistent, and/or substantially inefficient.
Article
This paper is a practical guide (a toolkit) for researchers, students and practitioners wishing to introduce randomization as part of a research design in the field. It first covers the rationale for the use of randomization, as a solution to selection bias and a partial solution to publication biases. Second, it discusses various ways in which randomization can be practically introduced in a field setting. Third, it discusses design issues such as sample size requirements, stratification, level of randomization and data collection methods. Fourth, it discusses how to analyze data from randomized evaluations when there are departures from the basic framework. It reviews in particular how to handle imperfect compliance and externalities. Finally, it discusses some of the issues involved in drawing general conclusions from randomized evaluations, including the necessary use of theory as a guide when designing evaluations and interpreting results.
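One technical point such toolkits cover, imperfect compliance, reduces to a simple calculation: scale the intention-to-treat difference by the difference in take-up rates between the assigned groups (the Wald/IV estimator). A minimal sketch on simulated data, with the compliance rate and effect size invented for illustration:

```python
# Wald / IV estimator for imperfect compliance in a randomized evaluation.
import numpy as np

rng = np.random.default_rng(0)
n, true_effect = 10_000, 2.0

z = rng.integers(0, 2, size=n)                   # random assignment (instrument)
complier = rng.random(n) < 0.6                   # 60% of subjects comply
d = np.where(complier, z, 0)                     # treatment actually received
y = 1.0 + true_effect * d + rng.normal(0, 1, n)  # outcome

itt = y[z == 1].mean() - y[z == 0].mean()        # intention-to-treat effect
take_up = d[z == 1].mean() - d[z == 0].mean()    # first stage: take-up difference
print("ITT:", round(itt, 3), " LATE (Wald):", round(itt / take_up, 3))
```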
Chapter
The development of the analysis of variance and many of its applications is one of the main evidences of Fisher’s genius. In this lecture I have described some of Fisher’s papers on analysis of variance that particularly interested me. The first paper on this topic (with W.A. Mackenzie) appeared in 1923 [CP 32]. Two aspects of this paper are of historical interest. At that time Fisher did not fully understand the rules of the analysis of variance — his analysis is wrong — nor the role of randomization. Secondly, although the analysis of variance is closely tied to additive models, Fisher rejects the additive model in his first analysis of variance, proceeding to a multiplicative model as more reasonable.
Article
Sir Ronald Fisher is rightly regarded as the founder of the modern methods of design and analysis of experiments. It would be wrong, however, to imagine that there had been no development of experimental design before Fisher. In agricultural field trials, as in other experimental work, replication was often used to increase accuracy, and to give some indication of the reliability of the results. Various types of layout for replicated trials, some of which served their objective of further increasing accuracy reasonably well, had been devised from commonsense considerations; arrangement of the separate replicates in blocks was customary, and although some experimenters were in the habit of assigning the treatments systematically in the same order in each block, others adopted more sophisticated and ingenious arrangements designed to eliminate the effects of fertility gradients, etc. Furthermore, some statistically-minded agronomists had been making investigations on uniformity trial data to study the nature and magnitude of the errors in field trials. What was almost completely lacking was any coherent theory on the estimation of errors from the results themselves, except in the simple case of a comparison of two treatments only. It was generally recognised that an estimate of error could then be obtained from the variability of the treatment differences in the different replicates; indeed 'Student' [1908] had derived what is now known as the t-test for testing the significance of a mean or mean difference based on a few replicates, though as Fisher commented in his obituary of 'Student' [1939] this was received by the Pearsonian school with 'weighty apathy'.
Article
Student's exacting theory of errors, both random and real, marked a significant advance over ambiguous reports of plant life and fermentation asserted by chemists from Priestley and Lavoisier down to Pasteur and Johannsen, working at the Carlsberg Laboratory. One reason seems to be that William Sealy Gosset (1876–1937) aka “Student” – he of Student's t-table and test of statistical significance – rejected artificial rules about sample size, experimental design, and the level of significance, and took instead an economic approach to the logic of decisions made under uncertainty. In his job as Apprentice Brewer, Head Experimental Brewer, and finally Head Brewer of Guinness, Student produced small samples of experimental barley, malt, and hops, seeking guidance for industrial quality control and maximum expected profit at the large scale brewery. In the process Student invented or inspired half of modern statistics. This article draws on original archival evidence, shedding light on several core yet neglected aspects of Student's methods, that is, Guinnessometrics, not discussed by Ronald A. Fisher (1890–1962). The focus is on Student's small sample, economic approach to real error minimization, particularly in field and laboratory experiments he conducted on barley and malt, 1904 to 1937. Balanced designs of experiments, he found, are more efficient than random and have higher power to detect large and real treatment differences in a series of repeated and independent experiments. Student's world-class achievement poses a challenge to every science. Should statistical methods – such as the choice of sample size, experimental design, and level of significance – follow the purpose of the experiment, rather than the other way around? (JEL classification codes: C10, C90, C93, L66)
Article
Randomised control trials have become popular tools in development economics. The key idea is to exploit deliberate or naturally occurring randomisation of treatments in order to make causal inferences about "what works" to promote some development objective. The expression "what works" is crucial: the emphasis is on evidence-based conclusions that will have immediate policy use. No room for good intentions, wishful thinking, ideological biases, Washington Consensus, cost-benefit calculations or even parametric stochastic assumptions. A valuable byproduct has been the identification of questions that other methods might answer, or that subsequent randomised evaluations might address. An unattractive byproduct has been the dumbing down of econometric practice, the omission of any cost-benefit analytics and an arrogance towards other methodologies. Fortunately, the latter are gratuitous, and the former point towards important complementarities in methods to help address knotty, substantive issues in development economics.
Article
Biometrics has done damage with levels of R or p or Student's t. The damage widened with R. A. Fisher's victory in the 1920s and 1930s in devising mechanical methods of "testing," against methods of common sense and scientific impact, "oomph." The scale along which one would measure oomph is particularly clear in bio-medical sciences: life or death. Cardiovascular epidemiology, to take one example, combines with gusto the "fallacy of the transposed conditional" and what we call the "sizeless stare" of statistical significance. Some medical editors have battled against the 5% philosophy, as did for example Kenneth Rothman, the founder of Epidemiology. And decades ago a sensible few in education, ecology, and sociology initiated a "significance test controversy." But grantors, journal referees, and tenure committees in the statistical sciences had faith that probability spaces can substitute for scientific judgment. A finding of p < .05 is deemed to be "better" for variable X than p < .11 for variable Y. It is not. It depends on the oomph of X and Y—the effect size, size judged in the light of how much it matters for scientific or clinical purposes. In 1995 a Cancer Trialists' Collaborative Group, for example, came to a rare consensus on effect size: ten different studies had agreed that a certain drug for treating prostate cancer can increase patient survival by 12%. An eleventh study published in the New England Journal in 1998 dismissed the drug. The dismissal was based on a t test, not on what William Gosset (the "Student" of Student's t) had called, against R. A. Fisher's machinery, "real" error.
Article
Researchers who study punishment and social control, like those who study other social phenomena, typically seek to generalize their findings from the data they have to some larger context: in statistical jargon, they generalize from a sample to a population. Generalizations are one important product of empirical inquiry. Of course, the process by which the data are selected introduces uncertainty. Indeed, any given dataset is but one of many that could have been studied. If the dataset had been different, the statistical summaries would have been different, and so would the conclusions, at least by a little. How do we calibrate the uncertainty introduced by data collection? Nowadays, this question has become quite salient, and it is routinely answered using well-known methods of statistical inference, with standard errors, t-tests, and P-values, culminating in the "tabular asterisks" of Meehl (1978). These conventional answers, however, turn out to depend critically on certain rather restrictive assumptions, for instance, random sampling. When the data are generated by random sampling from a clearly defined population, and when the goal is to estimate population parameters from sample statistics, statistical inference can be relatively straightforward. The usual textbook formulas apply; tests of statistical significance and confidence intervals follow. If the random-sampling assumptions do not apply, or the parameters are not clearly defined, or the inferences are to a population that is only vaguely defined, the calibration of uncertainty offered by contemporary statistical technique is in turn rather questionable. Thus, investigators who use conventional statistical technique ...
Article
Owing to the work of the International Statistical Institute, and perhaps still more to personal achievements of Professor A.L. Bowley, the theory and the possibility of practical applications of the representative method has attracted the attention of many statisticians in different countries. Very probably this popularity of the representative method is also partly due to the general crisis, to the scarcity of money and to the necessity of carrying out statistical investigations connected with social life in a somewhat hasty way. The results are wanted in some few months, sometimes in a few weeks after the beginning of the work, and there is neither time nor money for an exhaustive research.
Article
A number of colleagues have made helpful criticism and comments. They certainly do not uniformly agree with my judgments and emphases, but my warm appreciation goes to Keith Baker, Albert Biderman, Richard Brown, K. Alexander Brownlee, Donald T. Campbell, William G. Cochran, Lee J. Cronbach, Cuthbert Daniel, F.N. David, Arthur P. Dempster, Churchill Eisenhart, Stephen E. Fienberg, David Finney, Milton Friedman, I.J. Good, Bernard G. Greenberg, N.T. Gridgeman, William Jaffé, Oscar Kempthorne, Erich L. Lehmann, Richard C. Lewontin, Donald MacKenzie, William G. Madow, Margaret E. Martin, Frederick Mosteller, Jerzy Neyman, John W. Pratt, Donald B. Rubin, I.R. Savage, Hilary L. Seal, Hanan Selvin, Oscar B. Sheynin, David L. Sills, Theodor D. Sterling, George Stigler, Stephen M. Stigler, Fred L. Strodtbeck, Alan Stuart, Judith M. Tanur, Ronald Thisted, Howard Wainer, Frank Yates, Arnold Zellner and Harriet Zuckerman. Joan Fisher Box's biography of R.A. Fisher presents a lively and detailed description of his life and scientific work, both in statistics and genetics. The book's greatest contribution is the background and motivation it provides in studying Fisher's ideas. The book's major weaknesses are its failure to confront fundamental paradoxes in Fisher's thought and its lack of precision at some points of exposition. The review includes discussions of randomization, significance testing, and eugenics.
Article
The present position of the art of field experimentation is one of rather special interest. For more than fifteen years the attention of agriculturalists has been turned to the errors of field experiments. During this period, experiments of the uniformity trial type have demonstrated the magnitude and ubiquity of that class of error which cannot be ascribed to carelessness in measuring the land or weighing the produce, and which is consequently described as due to “soil heterogeneity”; much ingenuity has been expended in devising plans for the proper arrangement of the plots; and not without result, for there can be little doubt that the standard of accuracy has been materially, though very irregularly, raised. What makes the present position interesting is that it is now possible to demonstrate (a) that the actual position of the problem is very much more intricate than was till recently imagined, but that realising this (b) the problem itself becomes much more definite and (c) its solution correspondingly more rigorous.
Article
This paper traces major features of the development of methods of field experimentation in agriculture from the work of the English agronomist Arthur Young in the 1760s to that of R.A. Fisher in the 1920s. Issues recognized by Young were the necessity that experiments be comparative, the importance of replication, the human tendency to bias, and the difficulties in making inferences beyond the results of an individual series of experiments. (Modified author abstract)
Book
"Billions of government dollars, and thousands of charitable organizations and NGOs, are dedicated to helping the world's poor. But much of the work they do is based on assumptions that are untested generalizations at best, flat out harmful misperceptions at worst. Banerjee and Duflo have pioneered the use of randomized control trials in development economics. Work based on these principles, supervised by the Poverty Action Lab at MIT, is being carried out in dozens of countries. Their work transforms certain presumptions: that microfinance is a cure-all, that schooling equals learning, that poverty at the level of 99 cents a day is just a more extreme version of the experience any of us have when our income falls uncomfortably low. Throughout, the authors emphasize that life for the poor is simply not like life for everyone else: it is a much more perilous adventure, denied many of the cushions and advantages that are routinely provided to the more affluent"--
Article
The magnitude of experimental error attaching to one or more field plots is a question of extreme importance in Agricultural Science, because upon its proper recognition depends the degree of confidence which may be attached to the results obtained in field work. A very cursory examination of the results of any set of field trials will serve to show that a pair of plots similarly treated may be expected to yield considerably different results, even when the soil appears to be uniform and the conditions under which the experiment is conducted are carefully designed to reduce errors in weighing and measurement.
Article
It is also suggested that accurate results may be obtained by employing large numbers of very small plots, even as small as one square yard. This method is useful for nursery work in testing the cropping power of new varieties of cereals where very little seed is available.
Article
It is not infrequently assumed that varieties of cultivated plants differ not only in their suitability to different climatic and soil conditions, but in their response to different manures. Since the experimental error of field experiments is often underestimated, this supposition affords a means of explaining discrepancies between the results of manurial experiments conducted with different varieties; in the absence of experimental evidence adequate to prove or disprove the supposed differences between varieties in their response to manures, such explanations cannot be definitely set aside, although we very often suspect that the discrepancies are in reality due to the normal errors of field experiments.
Article
Background and Aims: Environmental variables within vineyards are spatially correlated, impacting the economic efficiency of cultural practices and accuracy of viticultural studies that utilise random sampling. This study aimed to test the performance of non-random sampling protocols that account for known spatial structures (‘spatially explicit protocols’) in reducing sampling requirements versus random sampling. Methods and Results: Canopy microclimate data were collected across multiple sites/seasons/training systems. Autocorrelation was found in all systems, with a periodicity generally corresponding to vine spacing. Three spatially explicit sampling models were developed to optimise the balance between minimum sample sizes and maximum fit to a known probability density function. A globally optimised explicit sampling (GOES) model, which performed multivariate optimisation to determine best-case sampling locations for measuring fruit exposure, reduced fruit cluster sample size requirements versus random sampling by up to 60%. Two stratified sampling protocols were derived from GOES solutions. Spatially weighted template sampling (STS) reduced sampling requirements up to 24% when based on probabilistic panel weighting (PW), and up to 21% when preferentially selecting specific locations within canopy architecture (AW). Conclusions: GOES, PW STS and AW STS each reduced required sample size versus random sampling. Comparative analyses suggested that optimal sampling strategies should simultaneously account for spatial variability at multiple scales. Significance of the Study: This study demonstrates that dynamically optimised sampling can decrease sample sizes required by researchers and/or wineries.
Article
“Matrixx's argument rests on the premise that statistical significance is the only reliable indication of causation. This premise is flawed.” So ruled the US Supreme Court this year in a judgment of huge significance. As a drug company learns that it must disclose even statistically insignificant side effects, Stephen T. Ziliak congratulates the judges, and two giants of statistics fight on from their graves.
Book
Based on two lectures presented as part of The Stone Lectures in Economics series, Arnold Zellner describes the structural econometric time series analysis (SEMTSA) approach to statistical and econometric modeling. Developed by Zellner and Franz Palm, the SEMTSA approach produces an understanding of the relationship of univariate and multivariate time series forecasting models and dynamic, time series structural econometric models. As scientists and decision-makers in industry and government world-wide adopt the Bayesian approach to scientific inference, decision-making and forecasting, Zellner offers an in-depth analysis and appreciation of this important paradigm shift. Finally Zellner discusses the alternative approaches to model building and looks at how the use and development of the SEMTSA approach has led to the production of a Marshallian Macroeconomic Model that will prove valuable to many. Written by one of the foremost practitioners of econometrics, this book will have wide academic and professional appeal.
Article
Agricultural economics, which long played a prime role in economics, has more recently been losing its importance. The future of field experimental methods can be importantly shaped by agricultural and resource economists. The field experimenter does not exert the same degree of control over real markets as the scientist does in the lab. Agricultural economists have a comparative advantage in several factors instrumental in conducting cutting-edge field experimental research. Scholars working on agricultural issues once held a prominent role in conceptualizing key empirical approaches that impacted the economics profession more generally. Since the defeat of the Farm Management Movement, it is time theorists combined with extension specialists to lend insights into both normative and positive issues of the day. Much remains to be done, whether field experiments revolve around exploring optimal incentive schemes for farm laborers in Europe or measuring demand elasticities for Bt cotton in the South.
Article
This special issue highlights an empirical approach that has increasingly grown in prominence in the last decade--field experiments. While field experiments can be used quite generally in economics to test theories' predictions, to measure key parameters, and to provide insights into the generalizability of empirical results, this special issue focuses on using field experiments to explore questions within the economics of charity. The issue contains six distinct field experimental studies that investigate various aspects associated with the economics of charitable giving. The issue also includes a fitting tribute to one of the earliest experimenters to depart from traditional lab methods, Peter Bohm, who curiously has not received deep credit or broad acclaim. Hopefully this issue will begin to rectify this oversight.
Article
Spatial heterogeneity in fields may affect the outcome of experiments. The conventional randomized allocation of treatments to plots may cause bias and variable precision in the presence of trends (including periodicity) and spatial autocorrelation. Agricultural scientists appear to mostly use conventional experimental designs that are susceptible to adverse effects from field variability. The objectives of this research were to (i) quantify the use of different experimental designs in agronomic field experiments, and (ii) develop spatially-balanced designs that are insensitive to the effects of both trends and spatial autocorrelation. A review was performed of all research efforts reported in Volumes 93–95 of the Agronomy Journal and the frequency of various experimental designs was determined. It showed that the vast majority (96.7%) of agronomic field experiments are implemented through Randomized Complete Block (RCB) designs. The method of simulated annealing was used to develop Spatially-Balanced Complete Block (SBCB) designs based on two objective functions: promoting spatial balance among treatment contrasts, and disallowing treatments to occur in the same position in different blocks, when possible. SBCB designs were successfully developed for designs up to 15 treatments and 15 replications. Square SBCB designs were realized as Latin Squares, and perfect spatial balance was obtained when feasible. SBCB designs are simple to implement, are analyzed through conventional ANOVAs, and provide protection against the adverse effects of spatial heterogeneity, while randomized allocation of treatments still ensures against user bias.
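The simulated-annealing search described in this abstract can be illustrated with a heavily simplified stand-in. The sketch below is not the authors' procedure or objective function: it swaps treatments within randomly chosen blocks and accepts or rejects swaps by an annealing rule, using an invented cost that penalizes a treatment recurring in the same within-block position and uneven average positions across treatments.

```python
# Simplified simulated-annealing search for a spatially balanced block layout.
import numpy as np

rng = np.random.default_rng(0)
n_trt, n_blocks = 6, 4

def cost(layout):
    # layout[b, pos] = treatment planted at position pos of block b
    same_pos = 0
    for pos in range(n_trt):
        _, counts = np.unique(layout[:, pos], return_counts=True)
        same_pos += np.sum(counts - 1)            # repeats in the same column
    mean_pos = [np.mean(np.where(layout == t)[1]) for t in range(n_trt)]
    return same_pos + np.var(mean_pos)            # plus uneven average positions

layout = np.array([rng.permutation(n_trt) for _ in range(n_blocks)])
current, temp = cost(layout), 2.0
for _ in range(20_000):
    b = rng.integers(n_blocks)
    i, j = rng.choice(n_trt, size=2, replace=False)
    layout[b, [i, j]] = layout[b, [j, i]]         # propose a within-block swap
    new = cost(layout)
    if new <= current or rng.random() < np.exp((current - new) / temp):
        current = new                             # accept the move
    else:
        layout[b, [i, j]] = layout[b, [j, i]]     # reject: undo the swap
    temp *= 0.9997                                # cool the temperature

print("final cost:", current)
print(layout)
```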