Article

Confusion of empirical and statistical aspects that lead to controversy

Abstract

Discusses the possibility of confusion between the statistical and empirical domains within behavioral science. Although the 2 domains influence one another, they are separate and obey different sets of rules. Examples of empirical–statistical confusion are encountered in the issue of 1- vs 2-tailed tests and the issue involved in the measurement-statistics controversy. (PsycINFO Database Record (c) 2012 APA, all rights reserved)

Article
For a wide range of tests of single-df hypotheses, the sample size needed to achieve 50% power is readily approximated by setting N such that a significance test conducted on data that fit one's assumptions perfectly just barely achieves statistical significance at one's chosen alpha level. If the effect size assumed in establishing one's N is the minimally important effect size (i.e., that effect size such that population differences or correlations smaller than that are not of any practical or theoretical significance, whether statistically significant or not), then 50% power is optimal, because the probability of rejecting the null hypothesis should be greater than .5 when the population difference is of practical or theoretical significance but lower than .5 when it is not. Moreover, the actual power of the test in this case will be considerably higher than .5, exceeding .95 for a population difference two or more times as large as the minimally important difference (MID). This minimally important difference significant (MIDS) criterion extends naturally to specific comparisons following (or substituting for) overall tests such as the ANOVA F and chi-square for contingency tables, although the power of the overall test (i.e., the probability of finding some statistically significant specific comparison) is considerably greater than .5 when the MIDS criterion is applied to the overall test. However, the proper focus for power computations is one or more specific comparisons (rather than the omnibus test), and the MIDS criterion is well suited to setting sample size on this basis. Whereas N_MIDS (the sample size specified by the MIDS criterion) is much too small for the case in which we wish to prove the modified H0 that there is no important population effect, it nonetheless provides a useful metric for specifying the necessary sample size. In particular, the sample size needed to have a 1 - α probability that the (1 - α)-level confidence interval around one's population parameter includes no important departure from H0 is four times N_MIDS when H0 is true and approximately [4/(1 - b)^2]·N_MIDS when b (the ratio between the actual population difference and the minimally important difference) is between zero and unity. The MIDS criterion for sample size provides a useful alternative to the methods currently most commonly employed and taught.
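The MIDS rule described in this abstract can be illustrated with a minimal sketch. It assumes a two-sample z test with equal group sizes and a known common standard deviation, so that effects are expressed as standardized differences; the function names and the MID value of 0.5 are hypothetical choices for illustration, not taken from the article.

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def n_mids_per_group(mid_d, alpha=0.05):
    """Per-group n at which an observed standardized difference equal to
    mid_d just reaches two-tailed significance (two-sample z test assumed)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # z = d * sqrt(n / 2) for two equal groups; solve z = z_crit for n.
    return 2 * (z_crit / mid_d) ** 2


def power_two_tailed(true_d, n_per_group, alpha=0.05):
    """Power of the two-tailed z test for a given true standardized difference."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_d * (n_per_group / 2) ** 0.5
    upper = 1 - STD_NORMAL.cdf(z_crit - shift)
    lower = STD_NORMAL.cdf(-z_crit - shift)
    return upper + lower


def n_for_no_important_effect(n_mids, b=0.0):
    """Sample size from the abstract's rule [4 / (1 - b)^2] * N_MIDS, 0 <= b < 1."""
    return 4 * n_mids / (1 - b) ** 2


mid = 0.5  # hypothetical minimally important standardized difference
n_mids = n_mids_per_group(mid)
power_at_mid = power_two_tailed(mid, n_mids)
power_at_double_mid = power_two_tailed(2 * mid, n_mids)

print("N_MIDS per group (rounded up):", ceil(n_mids))
print("power at the MID:", round(power_at_mid, 3))
print("power at twice the MID:", round(power_at_double_mid, 3))
print("n per group to show no important effect (b = 0):",
      ceil(n_for_no_important_effect(n_mids)))
```

Under these assumptions the power at the MID comes out at about .5 and the power at twice the MID exceeds .95, in line with the figures quoted in the abstract.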
Article
Stevens' (1946) distinctions among nominal, ordinal, interval and ratio measurements have had a marked influence on researchers of social, psychological and statistical sciences. This paper reviews those distinctions and includes some comments concerning the applicability of different statistical techniques to each scale. A detailed bibliography is also provided.
Article
Comments on an article by J. Gaito (see record 1980-22405-001) in which Gaito assailed the proposition that fundamental measurement theory has any relevance to statistical theory and procedures. The present authors contend that Gaito's arguments contain nothing that constitutes a logical or empirical refutation of the tenets of fundamental measurement theory or its implications for statistical analyses of data. Examples of statistical pitfalls to be met in ignorance of measurement considerations are given. (9 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
S. S. Stevens (1946) stated that there is a relationship between psychological measurement scales and statistical procedures such that parametric techniques require the presence of at least interval scale data. This idea was incorporated into numerous statistics books, but has been attacked and shown to be fallacious. This problem is reviewed, and measurement scales and statistical aspects are considered. The misconception was previously and is presently based on a confusion between measurement theory and statistical theory. For statistical tests of null hypothesis, "the numbers do not know where they came from." (19 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
"The failure, among psychologists, to utilize the one-tailed statistical test, where it is appropriate, very likely is due to the propagation of the two-tailed model by writers of text-books in psychological statistics. It is typical, in such texts, to find little or no attention given to one-tailed tests. Since the test of the null hypothesis against a one-sided alternative is the most powerful test for all directional hypotheses, it is strongly recommended that the one-tailed model be adopted wherever its use is appropriate." (PsycINFO Database Record (c) 2012 APA, all rights reserved)
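The power claim in this quotation can be checked numerically. The following is a minimal sketch, assuming a z test with known standard error; the `shift` argument (true effect divided by its standard error) and the function names are illustrative assumptions, not part of the quoted article.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_one_tailed(shift, alpha=0.05):
    """Power of an upper-tailed z test; shift = true effect / standard error."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_crit - shift)


def power_two_tailed(shift, alpha=0.05):
    """Power of a two-tailed z test at the same alpha level."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


# Effects in the predicted direction: the one-tailed test is more powerful.
for shift in (1.0, 2.0, 3.0):
    print(shift, round(power_one_tailed(shift), 3), round(power_two_tailed(shift), 3))

# An effect of the same size in the unpredicted direction.
print(-3.0, power_one_tailed(-3.0), round(power_two_tailed(-3.0), 3))
```

The last line shows the price of the added power: against a large difference in the unpredicted direction, the one-tailed test has essentially no chance of rejecting the null hypothesis, whereas the two-tailed test usually does.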
Article
Tells a story about Professor X who was given an assignment to distribute "football numbers" to his university. Professor X liked dealing with cardinal numbers not ordinal numbers because cardinal numbers had rules that could be followed and obeyed. After a dispute about the sample and distribution of numbers he realizes that "football numbers" obey the same laws of sampling as they would if they were real honest-to-God cardinal numbers. The next year, he thinks, he will arrange things so that the population distribution of his "football numbers" is approximately normal. Then the means and standard deviations that he calculates from these numbers will obey the usual mathematical relations that have been proven to be applicable to random samples from any normal population. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
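The point of the story, that numbers used purely as labels still obey the ordinary laws of sampling, can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of jersey numbers 1 to 99, the sample size, the number of trials, and the seed are all hypothetical choices, and sampling is done with replacement.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

population = list(range(1, 100))  # hypothetical "football numbers" 1..99
sample_size = 25
trials = 20000

# Draw many random samples (with replacement) and record each sample mean.
sample_means = [
    fmean(rng.choices(population, k=sample_size)) for _ in range(trials)
]

population_mean = fmean(population)
theoretical_se = pstdev(population) / sample_size ** 0.5
empirical_se = pstdev(sample_means)

print("population mean:", population_mean)
print("mean of sample means:", round(fmean(sample_means), 2))
print("theoretical standard error:", round(theoretical_se, 3))
print("empirical standard error:", round(empirical_se, 3))
```

The mean of the sample means should sit close to the population mean, and their spread close to the population standard deviation divided by the square root of the sample size, even though the numbers carry no quantitative meaning.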
Article
Discusses some problems generated by S. S. Stevens's (1946) pronouncement that measurement scales (nominal, ordinal, interval, ratio) determine specific statistical procedures. It is argued that the Stevens approach may lead to the introduction of irrelevant empirical considerations within conclusions emanating from a statistical analysis. Such pronouncements are faced with certain logical inconsistencies. Within a statistical analysis there are different contexts or levels of number analysis of different scale nature; yet these differences are not considered in the Stevens approach. The Stevens admonitions can impede progress with theoretical and/or empirical problems, as illustrated by an example from intelligence measurement research. (French abstract) (21 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Issue is taken with Burke's position (see 28: 5211) that the user of a one-tailed test be willing to publicly defend the proposition of no difference when there is actually a large difference in a reverse direction to that predicted. This is not necessary, since the hypothesis correctly stated tests the null (zero or negative differences) against the alternative positive difference. The remainder of Burke's points are pragmatic and valid only if it is assumed that every application of one-tailed tests is an abuse of experimental methodology.
Article
Statistical methods begin and end with numbers. The use of sample mean and standard deviation does no violence to data, whatever the properties of the measurement scale.
Article
Electric threshold (occurrence of phosphenes) as a function of photic stimulation constitutes the bulk of the data. The results of Motokawa and his collaborators are organized under the headings: (1) excitability (sensitization, adaptation level, time, position, intensity); (2) color discrimination (deficiency, wave length, intensity, inhibition, microstimulation); (3) summation, contrast and optical illusions; (4) the stimulus strength-frequency relationships; (5) a new measure of general fatigue. There is critical discussion of techniques, measurements and sampling. 88-item bibliography.
Article
The wide-spread use of one-tailed tests as advocated by Jones (see 27: 35) and Marks (see 26: 634) can lead to serious abuses. Although their position is technically correct, indiscriminate application of one-tailed tests will lead to barren controversy; the discovery of new psychological phenomena will be hindered if the acceptance of the null hypothesis is automatic when very large differences in a direction opposite from the predicted one occur. (PsycINFO Database Record (c) 2006 APA, all rights reserved).
Article
The following criteria are proposed as temporary guideposts for the use of one-tailed tests: "1. Use a one-tailed test when a difference in the unpredicted direction, while possible, would be psychologically meaningless. 2. Use a one-tailed test when results in the unpredicted direction will, under no conditions, be used to determine a course of behavior different in any way from that determined by no difference at all. 3. Use a one-tailed test when a directional hypothesis is deducible from psychological theory but results in the opposite direction are not deducible from coexisting psychological theory."
Article
Notes that there has been much controversy regarding the use of the one-tailed test of significance. The important question debated is not whether it should be used, but rather when it should be used. Kimmel (1957) has recently attempted to resolve the controversy by suggesting criteria for the use of one-tailed tests. He maintains that one-tailed tests may be used when results in the opposite direction: (a) will not be used to determine any new course of behavior, (b) will be psychologically meaningless, or (c) cannot be deduced by any psychological theory, while an outcome in the expected direction can. It is these last two instances that will be dealt with in this paper. The current author suggests that the criteria of theoretical predictability and psychological meaninglessness are not as decisive as they may appear to be. He argues that the decision to use a one-tailed test should be made in light of the difficulties with which the investigator is confronted when the results occur in the "unexpected" direction.
Article
Time and value are related concepts that influence human behaviour. Although classical topics in human thinking throughout the ages, few environmental economic non-market valuation studies have attempted to link the two concepts. Economists have estimated non-market environmental values in monetary terms for over 30 years. This history of valuation provides an opportunity to compare value estimates and how valuation techniques have changed over time. This research aims to compare value estimates of benefits of a protected natural area. In 1978, Nadgee Nature Reserve on the far south coast of New South Wales was the focus of the first application of the contingent valuation method in Australia. This research aims to replicate that study using both the original 1978 contingent valuation method questionnaire and sampling technique, as well as state of the art non-market valuation tools. This replication will provide insights into the extent and direction of changes in environmental values over time. It will also highlight the impact on value estimates of methodological evolution. These insights will help make allocating resources more efficient.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.