Article

Testing multiple statistical hypotheses resulted in spurious associations: A study of astrological signs and health

Authors:
Austin, Mamdani, Juurlink, and Hux

Abstract

To illustrate how multiple hypothesis testing can produce associations with no clinical plausibility. We conducted a study of all 10,674,945 residents of Ontario aged between 18 and 100 years in 2000. Residents were randomly assigned to equally sized derivation and validation cohorts and classified according to their astrological sign. Using the derivation cohort, we searched through 223 of the most common diagnoses for hospitalization until we identified two for which subjects born under one astrological sign had a significantly higher probability of hospitalization compared to subjects born under the remaining signs combined (P<0.05). We tested these associations in the independent validation cohort. Residents born under Leo had a higher probability of gastrointestinal hemorrhage (P=0.0447), while Sagittarians had a higher probability of humerus fracture (P=0.0123) compared to all other signs combined. After adjusting the significance level to account for multiple comparisons, none of the identified associations remained significant in either the derivation or validation cohort. Our analyses illustrate how the testing of multiple, non-prespecified hypotheses increases the likelihood of detecting implausible associations. Our findings have important implications for the analysis and interpretation of clinical studies.
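For readers who want to see the mechanism at work, the following Python sketch mimics the design described in the abstract under stated assumptions: the 223 diagnoses, 12 signs, equally sized derivation and validation cohorts, and the 0.05 threshold come from the abstract, while the cohort size, the baseline hospitalization rate, and the choice of a chi-square test are illustrative assumptions (and, unlike the original search, which stopped after two findings, the sketch simply screens every sign-diagnosis pair). Because sign and diagnosis are independent by construction, every "significant" association it flags is spurious; a handful nonetheless replicate in the validation cohort, and a Bonferroni-adjusted threshold removes essentially all of them.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Toy re-creation of the design in the abstract, scaled down from 10.7 million
# residents. The 223 diagnoses, 12 signs and alpha = 0.05 come from the abstract;
# cohort size, baseline hospitalization rate and the test choice are assumptions.
rng = np.random.default_rng(1)
n_per_sign, n_dx, n_signs, alpha, base_rate = 40_000, 223, 12, 0.05, 0.01

def cohort_counts():
    # Hospitalization counts per (diagnosis, sign); independent of sign by construction.
    return rng.binomial(n_per_sign, base_rate, size=(n_dx, n_signs))

derivation, validation = cohort_counts(), cohort_counts()

def p_value(counts, d, s):
    """Chi-square test of one sign vs. the remaining 11 signs combined, for diagnosis d."""
    cases_sign = counts[d, s]
    cases_rest = counts[d].sum() - cases_sign
    table = [[cases_sign, n_per_sign - cases_sign],
             [cases_rest, (n_signs - 1) * n_per_sign - cases_rest]]
    return chi2_contingency(table)[1]

hits = [(d, s) for d in range(n_dx) for s in range(n_signs)
        if p_value(derivation, d, s) < alpha]            # "significant" in derivation
replicated = [h for h in hits if p_value(validation, *h) < alpha]
survive = [h for h in hits if p_value(validation, *h) < alpha / (n_dx * n_signs)]

print(f"{len(hits)} spurious sign-diagnosis associations flagged in the derivation cohort")
print(f"{len(replicated)} of them also reach P < 0.05 in the validation cohort")
print(f"{len(survive)} survive a Bonferroni-adjusted threshold")
```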


... Two works published by their authors specifically to show readers the dangers of making multiple comparisons and of grouping variables post hoc provide very instructive examples of how non-existent relationships can apparently be demonstrated [6,7]. ...
... In the first one, the relationships between astrological zodiac signs and the 223 most frequent reasons for hospitalisation of the inhabitants of Canada were evaluated [6]. A group of over 10 million people was randomly divided into two groups: a cohort in which possible relationships were tested and an independent validation cohort. ...
... Among them, we would like to mention Judea Pearl, who championed the probabilistic approach in AI, developed Bayesian networks (Pearl 1999), and created a theory of causal and counterfactual inference based on structural models. Data mining is a crucial discipline in the twenty-first century, but it must be run with solid techniques. Austin et al. (2006) have shown how intensive data mining can produce invalid results, like the false relationships they obtained by causally associating astrological signs and health. The nontrivial extraction of implicit, previously unknown, and potentially useful information from data is not free from error. ...
Chapter
The response to the subjective probabilities of the Bayesian approach was frequentism, that is, the analysis of long-run series of frequencies of an event, from which statistical conclusions could be extracted. Frequentism became the dominant view in scientific practice during most of the twentieth century. This academic view was espoused by several authors, such as Pearson, Fisher, Gosset, and Neyman–Pearson, not all of whom agreed on the best ways to perform this statistical approach. The main ideas and internal debates are analyzed here.
... In a poll from 2008, it was found that 31% of Americans believed in astrology, and it is notable that not many more (47%) believed in Darwin's theory of evolution. 5 Associations between the zodiac signs and specific diagnoses have previously been reported, 6 and the zodiac sign Pisces (Latin, meaning fish) was found to be strongly associated with the diagnosis of heart failure (P = 0.001). Given the association between zodiac signs and diseases, we hypothesised, a long time after finishing a trial, that specific zodiac signs would be associated with differences in survival in an RCT. ...
... 16 Surprisingly, we could not confirm the previously reported association between history of heart failure and being born under the sign of Pisces (Table 1). 6 We propose two plausible explanations for this. First, it could be due to a vast difference in power; Austin and colleagues included more than 10 million residents, 6 but we had data on 798 patients, which implies that we would need more than 12,000 similar fluid trials to achieve the same power. ...
Article
Full-text available
OBJECTIVE: To maximise the yield of existing data by assessing the effect on mortality of being born under the zodiac sign Pisces in a trial of intravenous (IV) fluids. DESIGN, SETTING AND PARTICIPANTS: A retrospective observational study, with no predefined hypothesis or statistical analysis plan, of 26 Scandinavian intensive care units between 2009 and 2011. Patients aged 18 years or older with severe sepsis and in need of fluid resuscitation, randomised in the Scandinavian Starch for Severe Sepsis/Septic Shock (6S) trial. MAIN OUTCOME MEASURE: Ninety-day mortality. RESULTS: We included all 798 randomised patients in our study; 70 (9%) were born under the sign of Pisces. The primary outcome (death within 90 days after randomisation) occurred in 25 patients (35.7%) in the Pisces group, compared with 348 patients (48%) in the non-Pisces group (relative risk, 0.75; 95% CI, 0.54-1.03; one-sided P = 0.03). CONCLUSIONS: In a multicentre randomised clinical trial of IV fluids, being born under the sign of Pisces was associated with a decreased risk of death. Our study shows that with convenient use of statistics and an enticing explanatory hypothesis, it is possible to achieve significant findings in post-hoc analyses of data from large trials.
... A (1 − 0.95^48) × 100 ≈ 91.47% probability of a false positive. It has been empirically shown that multiple post-hoc analyses often result in spurious and potentially misleading findings [5], especially due to their observational nature and the fact that they're not based on randomized comparisons. ...
... Although the heterogeneity was explained by some variables, it remained significant in certain subgroups. Besides, the overall estimates in our meta-analysis were obtained using random-effects analysis that takes between-study variations into account [5]. It should be noted that high heterogeneity between studies is a common issue among meta-analyses [6,7]. ...
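The 91.47% figure quoted in the first of the two excerpts above follows from the standard family-wise error calculation: with k independent tests, each at level alpha, the chance of at least one false positive is 1 − (1 − alpha)^k, and k = 48 at alpha = 0.05 gives roughly 91.47%. A one-line check (k and alpha taken from the excerpt):

```python
alpha, k = 0.05, 48
# Probability of at least one false positive across k independent tests: ~91.47%
print(f"{(1 - (1 - alpha) ** k) * 100:.2f}%")
```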
Chapter
Rev. Bayes and his friend Richard Price created a new way to deal with the philosophical and theological problems of induction as explained by Hume. This mathematical formula included the notion of subjective probabilities and, consequently, opened a debate on its validity. The French mathematician Pierre-Simon Laplace applied it successfully to astronomical calculations before starting to change his mind about the correctness of Bayes' formula. Several objections and practical challenges made the general implementation of Bayes' ideas impossible.
Chapter
The emergence of a new discipline, epidemiology, contributes to the understanding of the evolution of scientific attitudes and ideas about causality. After an initial, ancient belief in single causes, supported by classical philosophers and nineteenth-century physicians like Koch and expressible as a monocausality view, the complexity of real medical and toxicological problems forced researchers to embrace the notion of multicausality and similar approaches (web of causes, chain of events). All these debates fed the evolution of the statistical methodologies employed and led to a new way to understand causality within complex systems or contexts.
... We could do with much less of this research, particularly when it has almost always been conducted with an unknown selection of both independent and dependent variables examined in an unknown number of analyses with an unknown basis for selecting and publishing results. Under these conditions, one can even demonstrate that astrological signs have comparable associations with physical health outcomes [31]. ...
Article
Full-text available
Replication initiatives in psychology continue to gather considerable attention from far outside the field, as well as controversy from within. Some accomplishments of these initiatives are noted, but this article focuses on why they do not provide a general solution for what ails psychology. There are inherent limitations to mass replications ever being conducted in many areas of psychology, both in terms of their practicality and their prospects for improving the science. Unnecessary compromises were built into the ground rules for design and publication of the Open Science Collaboration: Psychology that undermine its effectiveness. Some ground rules could actually be flipped into guidance for how not to conduct replications. Greater adherence to best publication practices, transparency in the design and publishing of research, strengthening of independent post-publication peer review and firmer enforcement of rules about data sharing and declarations of conflict of interest would make many replications unnecessary. Yet, it has been difficult to move beyond simple endorsement of these measures to consistent implementation. Given the strong institutional support for questionable publication practices, progress will depend on effective individual and collective use of social media to expose lapses and demand reform. Some recent incidents highlight the necessity of this.
... Lacking any direct means of taking account of plausibility, the NHST paradigm is notoriously capable of lending support to patently spurious claims (e.g., Hines 1998; Austin et al. 2006; Bennett, Miller, and Wolford 2009). While often blamed on practices such as data-dredging, failure to take account of plausibility is more pernicious as it can undermine entire areas of research (e.g., Bracken 2009; Ioannidis 2013). ...
Article
Full-text available
It is now widely accepted that the techniques of null hypothesis significance testing (NHST) are routinely misused and misinterpreted by researchers seeking insight from data. There is, however, no consensus on acceptable alternatives, leaving researchers with little choice but to continue using NHST, regardless of its failings. I examine the potential for the Analysis of Credibility (AnCred) to resolve this impasse. Using real-life examples, I assess the ability of AnCred to provide researchers with a simple but robust framework for assessing study findings that goes beyond the standard dichotomy of statistical significance/nonsignificance. By extracting more insight from standard summary statistics while offering more protection against inferential fallacies, AnCred may encourage researchers to move toward the post p < 0.05 era.
... For instance, in a survey of published work in neuroimaging journals for the year 2008, Bennett and colleagues [20] found that between 15% and 40% of studies failed to include adjustments for multiple testing. Austin and colleagues [21] have also shown how failure to adjust for testing multiple exploratory hypotheses can result in implausible practical results. In another survey of 800 pathology papers published in 2003, it was found that of the 37 studies that had performed multiple comparisons, 56% failed to account for the inflated Type I error rate [22]. ...
Article
The 'p' value statistic remains a ubiquitous indicator of the verisimilitude of experimental hypotheses. However, testing multiple hypotheses poses a problem, as the Type I error rate is inflated. Despite known solutions, this problem remains largely neglected for two reasons: 1) most data analysis tools offer limited options for multiple-test correction; 2) the learning curve of existing tools requires a hefty time investment. To address these concerns, we present a free, easy-to-use and convenient piece of software, built around established Python libraries, that allows users to apply a MUltiple tests correction and get the results in a readily understood, FOrmatted table (MUFOS) – 'https://github.com/nikbpetrov/mufos'.
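MUFOS itself is linked above and its exact interface is not reproduced here; the sketch below instead uses statsmodels, one of the established Python libraries such a tool can be built around, to show the kind of correction being discussed. The p-values are invented for illustration, with the two values reported in the Austin et al. abstract included among them.

```python
from statsmodels.stats.multitest import multipletests

# Invented p-values from ten hypothetical exploratory tests; 0.0447 and 0.0123
# are the two values reported in the Austin et al. abstract.
pvals = [0.0447, 0.0123, 0.003, 0.21, 0.049, 0.62, 0.04, 0.18, 0.80, 0.011]

for method in ("bonferroni", "holm", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{method:10s} adjusted: {[round(p, 3) for p in p_adj]}  rejected: {int(reject.sum())}")
```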
... Collecting data also is very much in harmony with the modern "big data" approach to solving problems. Big data, and data mining are somewhat controversial (Austin and Goldwasser 2008, Austin et al. 2006, Dye 2007, Frické 2015. The worry is that collecting data blind is suspect methodologically. ...
Article
It is argued that data should be defined as information on properties of units of analysis. Epistemologically it is important to establish that what is considered data by somebody need not be data for somebody else. This article considers the nature of data and big data and the relation between data, information, knowledge and documents. It is common to all these concepts that they are about phenomena produced in specific contexts for specific purposes and may be represented in documents, including as representations in databases. In that process, they are taken out of their original contexts and put into new ones, and thereby data lose some or all of their meaning due to the principle of semantic holism. Some of this lost meaning should be reestablished in the databases, and the representation of data/documents cannot be understood as a neutral activity, but as an activity supporting the overall goal implicit in establishing the database. To utilize (big) data (as is the case with utilizing information, knowledge and documents) demands first of all the identification of the potentials of these data for relevant purposes. The most fruitful theoretical frame for knowledge organization and data science is the social epistemology suggested by Shera (1951). One important aspect of big data is that they are often unintentional traces we leave during all kinds of activities. Their potential to inform somebody about something is therefore less direct compared to data that have been produced intentionally, as, for example, scientific databases. © 2019 International Society for Knowledge Organization. All Rights Reserved.
... Second, many of the studies which have demonstrated the presence of a weekend or weekday effect suffer from the problem of multiple comparisons increasing the likelihood of spurious association [36]. Third, in many of the studies, there is the possibility of inaccurate coding of diagnoses at admission. ...
Article
Full-text available
Background Increased 30-day mortality rates have been reported in patients undergoing elective surgery later compared with earlier in the week. However, these reports have been conflicting for esophageal surgery. We conducted a study to assess the differences in outcomes of patients undergoing surgery for esophageal cancer earlier in the week (Tuesday) versus later (Friday). Methods This retrospective analysis of a prospectively maintained database included patients with esophageal cancer who underwent esophageal resection in a tertiary cancer center between 1 January 2005 and 31 December 2017. We compared patients operated on Tuesdays versus Fridays. The primary outcome was a composite of major morbidity (defined as Clavien-Dindo grade 3 or more) and/or mortality. Secondary outcomes included duration of post-operative ventilation, and length of ICU and hospital stay. Results Among 1300 patients included, 733 were operated on a Tuesday and 567 on a Friday. Patient and surgery characteristics were similar in the two groups. The primary outcome (composite of major morbidity and mortality) was 23.6% in the Tuesday group versus 26.3% in the Friday group. Mortality was similar in the two groups (6.0%). Multivariable logistic regression analysis showed that the day of surgery was not a predictor of major morbidity or mortality. Conclusions In patients undergoing esophagectomy at tertiary care high volume cancer center, there was no difference in major morbidity and mortality whether the surgery was performed early in the week (Tuesday) or closer to the weekend (Friday).
... Such a scenario arises frequently when a statistical analysis is applied to individual items in a Likert scale [11]. In 2006, [3] conducted a study investigating whether individuals born under a certain astrological sign were more likely to be hospitalized for a certain diagnosis. The authors tested for over 200 diseases and found that Leos had a statistically higher probability of being hospitalized for gastrointestinal hemorrhage and Sagittarians had a statistically higher probability of a fractured humerus. ...
Preprint
As robots become more prevalent, the importance of the field of human-robot interaction (HRI) grows accordingly. As such, we should endeavor to employ the best statistical practices. Likert scales are commonly used metrics in HRI to measure perceptions and attitudes. Due to misinformation or honest mistakes, most HRI researchers do not adopt best practices when analyzing Likert data. We conduct a review of psychometric literature to determine the current standard for Likert scale design and analysis. Next, we conduct a survey of four years of the International Conference on Human-Robot Interaction (2016 through 2019) and report on incorrect statistical practices and design of Likert scales. During these years, only 3 of the 110 papers applied proper statistical testing to correctly-designed Likert scales. Our analysis suggests there are areas for meaningful improvement in the design and testing of Likert scales. Lastly, we provide recommendations to improve the accuracy of conclusions drawn from Likert data.
... The investigators searched 223 of the most common diagnoses for hospitalization in the medical records of the participants and found that 24 were statistically significant in the first cohort based on individuals' astrological signs. Two of these associations remained statistically significant in the second cohort with relative risks of 1.15 and 1.38, numbers in the same range as the relative risk of processed meat and colon cancer (Austin et al., 2006). ...
Article
Full-text available
Red meat is a nutrient dense food providing important amounts of protein, essential amino acids, vitamins, and minerals that are the most common nutrient shortages in the world, including vitamin A, iron, and zinc. Despite claims by the World Health Organization (WHO) that eating processed meat causes colon cancer and red meat probably causes cancer, the observational data used to support the claims are weak, confounded by multiple unmeasured factors, and not supported by other types of research needed for such a conclusion. Although intervention studies are designed to test the validity of associations found in observational studies, two interventions of low-fat, low-meat diets in volunteers that failed to find a benefit on cancer were not considered in the WHO decision. It is likely that the association of red-meat consumption with colon cancer is explained either by an inability of epidemiology to detect such a small risk or by combinations of other factors such as greater overweight, less exercise, lower vegetable or dietary fiber intake, and perhaps other habits that differentiate those who eat the most meat from those who eat the least.
... The Pearson correlation coefficient was used to determine the extent of association of each predictor variable with each canonical discriminant function. In addition, Bonferroni multiple pairwise comparison analyses were performed to test for differences between groups (Austin et al. 2006). Furthermore, the accuracy of the model was assessed from the classification (or confusion) matrix, which gives the percentage of correctly classified instances. ...
Article
Full-text available
Incursion of water hyacinth, Eichhornia crassipes, has been a potential threat to Lake Tana and its ecosystem services. Its expansion is currently managed by abstraction (removal by hand); nonetheless, the disposal of mats and the formation of pools remain problematic. This study aimed to assess the potential effects of water hyacinth and its management on water quality and human health. Biotic and abiotic data were collected in open water, water hyacinth covered, and water hyacinth cleared out habitats. A total of 3673 invertebrates belonging to twenty-one families were collected from 45 sites. Culicidae was the most abundant family (37.2%), followed by Unionoidae (19.4%) and Sphaeriidae (8.1%). Abundances of anopheline and culicine larvae were significantly higher in water hyacinth cleared out habitats (p < 0.05). Water conductivity and total dissolved solids were significantly higher in habitats covered with water hyacinth (p < 0.05). In conclusion, water hyacinth infestation had a negative impact on water quality and biotic communities. The physical abstraction of water hyacinth provided a very good habitat for the proliferation of mosquito larvae. Therefore, integrating water hyacinth management practices with a mosquito larva control strategy could help to abate the potential risk of a malaria outbreak in the region. In addition, developing watershed-scale nutrient management systems could make a vital contribution to managing water hyacinth invasion in the study area.
... Furthermore, to avoid any possible problem of discriminant validity affecting the tests of hypotheses, we conducted multicollinearity analyses and found that multicollinearity did not influence the relationships between the variables presented in our models (Shiu, Pervan, Bove and Beatty, 2011). Additionally, to avoid any risk of spurious relationships, we drew on the recommendations of Austin, Mamdani, Juurlink and Hux (2006) and Picard and Berk (1990). We randomly split the sample into two subsamples, and ran all the linear regressions on both. ...
Article
This paper investigates the implications of perceived Socio-Ideological Organizational Controls (SIOC) dimensions for actors' lived experiences in the workplace. We explored whether emotions mediated the control-resistance dyad. Data were collected from 385 participants via a self-administered questionnaire framed as part of a cross-sectional survey design. Our findings suggest that the SIOC dimension related to the promotion of values is an important predictor of experiencing higher positive emotions and lower negative emotions at work. The positive emotions, in turn, predict higher organisational citizenship levels and lower resistance behaviours. Based on these findings, we discuss the role and effectiveness of organisational controls inspired by discursive practices.
... 10,11 This is of particular concern in veterinary medicine because these errors are much more likely to occur when sample sizes are small. False positive results are very likely to occur when researchers comb the data for statistically significant findings, 12,13 as often occurs in retrospective and other observational study designs that compose much of the veterinary literature. This is called the multiple comparisons problem and means that the risk of type I errors increases as more tests are performed. ...
Article
Full-text available
Clinical research attempts to answer questions about patient populations by studying small samples of patients drawn from those populations. Statistics are used to describe the data collected in a study and to make inferences about the larger populations. Practitioners of evidence-based practice need a basic understanding of these principles to critically appraise the results of research studies. The main paradigm for statistical inference in medicine is called hypothesis testing, which involves generating a null hypothesis and examining the strength of evidence against it.
... First, preregistration does not prevent researchers from making theoretically or biologically implausible hypotheses or predictions. For example, there is no mechanism in place to prevent an ardent astrologer from predicting that zodiac signs influence athletic performance [43]. No matter where they are hosted, preregistrations are not typically reviewed by peers prior to data collection and analysis possibly harming the quality of the final publication [44]. ...
Preprint
The primary means for disseminating sport and exercise science research is currently through journal articles. However, not all studies, especially those with null findings, make it to formal publication. This publication bias towards positive findings may contribute to questionable research practices. Preregistration is a solution to prevent the publication of distorted evidence resulting from this system. This process asks authors to register their hypotheses and methods before data collection on a publicly available repository or by submitting a Registered Report. In the Registered Reports format, authors submit a Stage 1 manuscript to a participating journal that includes an introduction, methods, and any pilot data indicating the exploratory or confirmatory nature of the study. After a Stage 1 peer review, the manuscript can then be offered in-principle acceptance, rejected, or sent back for revisions to improve the quality of the study. If accepted, the project is guaranteed publication, assuming the authors follow the data collection and analysis protocol. After data collection, authors re-submit a Stage 2 manuscript that includes the results and discussion, and the study is evaluated on clarity and conformity with the planned analysis. In its final form, Registered Reports appear almost identical to a typical publication, but give readers confidence that the hypotheses and main analyses are less susceptible to bias from questionable research practices. From this perspective, we argue that inclusion of Registered Reports by researchers and journals will improve the transparency, replicability, and trust in sport and exercise science research. To view the full-text please follow this link: https://osf.io/preprints/sportrxiv/fxe7a
... 36 Yet, analyses of observational data often yield spurious associations, a problem that may be increased by orders of magnitude with big data because they allow researchers readily to test multiple hypotheses, increasing the likelihood of finding one or more associations to be "positive" simply because of the play of chance (that is, false positives). 37 In a study involving 17,275 patients and 835 deaths that compared treatment effects on mortality using routinely collected data and subsequent randomized controlled trials, the authors reported that "real world data" analyses "showed significantly more favorable mortality benefits by 31% than subsequent trials (summary relative odds ratio 1.31 (95% confidence interval 1.03 to 1.65; I² = 0%))." The difference was apparent even with statistical attempts to reduce confounding bias in each of the observational studies. ...
... First, preregistration does not prevent researchers from making theoretically or biologically implausible hypotheses or predictions. For example, there is no mechanism in place to prevent an ardent astrologer from predicting that zodiac signs influence athletic performance [49]. No matter where they are hosted, preregistrations are not typically reviewed by peers prior to data collection and analysis, possibly harming the quality of the final publication [50]. ...
Article
Full-text available
The primary means of disseminating sport and exercise science research is currently through journal articles. However, not all studies, especially those with null findings, make it to formal publication. This publication bias towards positive findings may contribute to questionable research practices. Preregistration is a solution to prevent the publication of distorted evidence resulting from this system. This process asks authors to register their hypotheses and methods before data collection on a publicly available repository or by submitting a Registered Report. In the Registered Report format, authors submit a stage 1 manuscript to a participating journal that includes an introduction, methods, and any pilot data indicating the exploratory or confirmatory nature of the study. After a stage 1 peer review, the manuscript can then be offered in-principle acceptance, rejected, or sent back for revisions to improve the quality of the study. If accepted, the project is guaranteed publication, assuming the authors follow the data collection and analysis protocol. After data collection, authors re-submit a stage 2 manuscript that includes the results and discussion, and the study is evaluated on clarity and conformity with the planned analysis. In its final form, Registered Reports appear almost identical to a typical publication, but give readers confidence that the hypotheses and main analyses are less susceptible to bias from questionable research practices. From this perspective, we argue that inclusion of Registered Reports by researchers and journals will improve the transparency, replicability, and trust in sport and exercise science research. The preprint version of this work is available on SportRxiv: https://osf.io/preprints/sportrxiv/fxe7a/. Free access to the published version: https://rdcu.be/b1jfo
... For additional examples, see Fry (2019) and Sleight (2000). For a replication of sorts, see Austin, Mamdani, Juurlink, and Hux (2006), who set out explicitly to debunk statistical significance hunting with a study of correlations between various maladies and astrological signs. Their large statistical analysis revealed that "residents born under Leo had a higher probability of gastrointestinal hemorrhage (P=0.0447), while Sagittarians had a higher probability of humerus fracture (P=0.0123)" ...
Article
This essay focuses on a particular subset of educational research, questions that are researchable and that, ultimately, will make a positive difference for the educational enterprise. It argues that to be researchable, tasks need to be addressable in operational terms – that questions that hinge on values (e.g. “are small classes better than large classes?”) are inherently unaddressable until the values undergirding the questions are specified in clear and meaningful ways. Key claims of this paper are that in addition to being addressable, to advance the field, research questions should be meaningful and generative. What is meant by meaningful is that the answer to the questions posed should matter to either practice or theory in some important way. What is meant by generative is that working on the problem, whether it is “solved” or not, is likely to provide valuable insights. Interestingly, time is not necessarily an issue – a problem that takes a century to solve may still be extremely valuable – if the subproblems it spawns are rich, interesting, and addressable. From this author’s perspective, for a problem to be researchable (in effect, answerable) is only a first step. What makes problems valuable to the field is their potential importance and their fruitfulness in producing ongoing insights and meaningful questions. A range of examples including design experiments, assessments, and the evolution of an extended research program are discussed as illustrations of the above claims.
... Large data sets cause impressive p-values with minor differences in biology. Are they clinically relevant [13,22,26,27]? Despite substantially different study populations and sample sizes, dramatically different p-values for two validated outcomes (CHD and hypertension) are noteworthy: p-values<0.001 ...
Article
Full-text available
In the modern era, with high-throughput technology and large data size, associational studies are actively being generated. Some have statistical and clinical validity and utility, or at least have biologically plausible relationships, while others may not. Recently, the potential effect of birth month on lifetime disease risks has been studied in a phenome-wide model. We evaluated the associations between birth month and 5 cardiovascular disease-related outcomes in an independent registry of 8,346 patients from Ontario, Canada in 1977-2014. We used descriptive statistics and logistic regression, along with model-fit and discrimination statistics. Hypertension and coronary heart disease (of primary interest) were most prevalent in those who were born in January and April, respectively, as observed in the previous study. Other outcomes showed weak or opposite associations. Ancillary analyses (based on raw blood pressures and subgroup analyses by sex) demonstrated inconsistent patterns and high randomness. Our study was based on a high risk population and could not provide scientific explanations. As scientific values and clinical implications can be different, readers are encouraged to read the original and our papers together for more objective interpretations of the potential impact of birth month on individual and public health as well as toward cumulative/total evidence in general.
... From the pragmatic point of view, it is easier to investigate the association between two medical conditions than to study the effect of an intervention on a medical condition, in particular when the intervention, such as a medication, is time-dependent. Testing multiple hypotheses at the same time definitely increases the likelihood of finding an association [31]. Therefore, conducting studies investigating the association between two medical conditions within a large database appears to be a shortcut to increase research output. ...
Article
Full-text available
Background: Studies using Taiwan's National Health Insurance (NHI) claims data have expanded rapidly both in quantity and quality during the first decade following the first study published in 2000. However, some of these studies were criticized for being merely data-dredging studies rather than hypothesis-driven. In addition, the use of claims data without the explicit authorization from individual patients has incurred litigation. Objective: This study aimed to investigate whether the research output during the second decade after the release of the NHI claims database continues growing, to explore how the emergence of open access mega journals (OAMJs) and the lawsuit against the use of this database affected the research topics and publication volume, and to discuss the underlying reasons. Methods: PubMed was used to locate publications based on NHI claims data between 1996 and 2017. Concept extraction using MetaMap was employed to mine research topics from article titles. Research trends were analyzed from various aspects, including publication volume, journals, research topics and types, and cooperation between authors. Results: A total of 4473 articles were identified. A rapid growth in publications was witnessed from 2000 to 2015, followed by a plateau. Diabetes, stroke, and dementia were the top 3 most popular research topics, whereas statin therapy, metformin, and Chinese herbal medicine were the most investigated interventions. Approximately one-third of the articles were published in open access journals. Studies with two or more medical conditions, but without any intervention, were the most common study type. Studies of this type tended to be contributed by prolific authors and published in OAMJs. Conclusions: The growth in publication volume during the second decade after the release of the NHI claims database was different from that during the first decade. OAMJs appeared to provide fertile soil for the rapid growth of research based on NHI claims data, in particular for those studies with two or more medical conditions in the article title. A halt in the growth of publication volume was observed after the use of NHI claims data for research purposes had been restricted in response to legal controversy. More efforts are needed to improve the impact of knowledge gained from NHI claims data on medical decisions and policy making.
... To generate hypotheses for drug repurposing candidates, many researchers prefer to use nominal P-values to protect against type II errors of missing an opportunity (25). However, there are also concerns about finding spurious associations due to multiple testing (26). As such, we also provide false-discovery rate (FDR) adjusted P-values under the advanced options tab. ...
Article
Full-text available
The process of discovering new drugs has been extremely costly and slow in the last decades despite enormous investment in pharmaceutical research. Drug repurposing enables researchers to speed up the process of discovering other conditions that existing drugs can effectively treat, with low cost and fast FDA approval. Here, we introduce ‘RE:fine Drugs’, a freely available interactive website for integrated search and discovery of drug repurposing candidates from GWAS and PheWAS repurposing datasets constructed using previously reported methods in Nature Biotechnology. ‘RE:fine Drugs’ demonstrates the possibilities to identify and prioritize novelty of candidates for drug repurposing based on the theory of transitive Drug–Gene–Disease triads. This public website provides a starting point for research, industry, clinical and regulatory communities to accelerate the investigation and validation of new therapeutic use of old drugs. Database URL: http://drug-repurposing.nationwidechildrens.org
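The false-discovery-rate adjustment mentioned in the citation context above is typically the Benjamini-Hochberg procedure. The sketch below is an illustrative re-implementation (not the website's actual code) applied to an arbitrary list of nominal p-values; statsmodels' multipletests(..., method='fdr_bh') produces the same adjusted values.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values (illustrative re-implementation)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)            # p_(i) * m / i
    monotone = np.minimum.accumulate(scaled[::-1])[::-1]   # enforce monotonicity from the largest p down
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(monotone, 1.0)
    return adjusted

# Arbitrary nominal p-values from a hypothetical screen of candidate associations.
print(bh_adjust([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]))
```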
... In pharmacoepidemiology, a prior specification of the research question (and study population, study design, and data analysis plan) in the format of a study protocol is recommended to minimize the risk of "cherry-picking" interesting findings and a related issue of observing spurious findings because of multiple hypothesis testing (Austin et al. 2006). The rationale for the study should be explicitly stated, along with what a new study can add to existing knowledge. ...
Chapter
Qualitative research within pharmacy practice is concerned with understanding the behavior and underlying motives, perceptions, and ideas of actors such as pharmacy staff, pharmacy owners, patients, other health care professionals, and politicians to explore various types of existing practices and beliefs in order to improve them. As qualitative research attempts to answer the “why” questions, it is useful for describing, in rich detail, complex phenomena that are situated and embedded in local contexts. Typical methods include interviews, observation, documentary analysis, netnography, and visual methods. Qualitative research has to live up to a set of quality criteria of research conduct in order to provide trustworthy results that contribute to the further development of the area.
... Concerns about a lack of confirmation have been raised in pharmacovigilance [149], '-omics' research [150], studies of clinical biomarkers [151], psychology [152], ecology [153], neuroscience [154], and economics [155]. Although there is abundant commentary on the subject [5,7,9,148,[156][157][158][159][160][161][162], relatively few studies have offered empirical assessments of multiplicity [163][164][165], and to our knowledge no studies have systematically collected data on the extent and influence of both multiplicity and incomplete reporting from the perspective of a systematic review. ...
Article
Full-text available
Background Assessments of scientific evidence often involve a systematic comparison of findings within and across studies. An important consideration in systematic reviews is multiplicity, which can arise from simultaneous consideration of multiple independent and dependent variables, use of multiple statistical models, or multiple subgroup analyses. Multiplicity can affect the results of evidence assessment, if accompanied by incomplete or selective reporting. Birth cohort studies investigating prenatal/neonatal exposure to polychlorinated biphenyls (PCBs) and their relation to neurodevelopmental measures during follow up offer an interesting opportunity for assessing multiplicity and completeness of reporting because the literature on the subject is voluminous, and the data typically allow considerable flexibility in terms of choice of analysis and reporting of findings. Methods Following a systematic search, each relevant publication was characterized with respect to its methods of exposure assessment, outcome characterization, analysis and reporting. Based on the total number of unique exposure and outcome categories we calculated the number of possible exposure-outcome associations that could have been examined within and across cohorts. Each association was categorized as “reported” or “not reported/not evaluated”, and the number of studies that reported each association was ascertained. Results A total of 208 prenatal/neonatal exposures and 461 outcomes were measured across 34 cohorts and 111 publications. Only 29 associations were presented in at least three studies; of those, only 21 associations were reported within the same age group allowing a meaningful side-by-side comparison. Percentage of within-cohort associations that were reported among all those that could have been reported based on the available data ranged from 6% to 100%. Conclusions The literature on PCBs and neurodevelopment exemplifies a situation wherein despite large numbers of published studies, inconsistent or incomplete reporting of multiple results impede systematic reviews.
... Further, χ² tests are carried out (presented on page 175), but not on all possible combinations (just a sub-set), and no correction is made for multiple comparisons across the same data-set. The multiple-comparisons problem is also an issue in the series of χ² tests reported on page 355; see Winter (2020) for a discussion of this problem, and Austin et al. (2006) for illustration. ...
Chapter
Full-text available
Computer sciences have completely changed the way scientific and social research is performed nowadays. This chapter analyzes the role of Bayesianism and frequentism into the emergence of e-science, artificial intelligence, and robotics, the generation of expert systems, and the overwhelming problem of how to analyze Big Data, a process called “data mining.” This review of the main systems and ideas will show us how Bayesianism is acquiring a determinant position among worldwide users of statistical tools.
... Epidemiologic data can be useful to assess food safety, that is, to identify anomalies and outbreaks related to food supply, and to evaluate the efforts toward their prevention. Still, the purpose of correlating the whole set of molecular food exposures to food-borne diseases raises formidable methodological challenges [70]. While always remaining within the food exposome, food intake depends on individual processes and lifestyle preferences as well as on global factors, such as economic influences and social changes. ...
... This result, according to the authors, contributes to the growing body of statistical literature that demonstrates that data-based analysis methods can lead to misleading inferences (Austin & Goldwasser, 2008). Dealing specifically with information systems, Austin et al. (2006) and Frické (2009) consider that the conclusions obtained from data mining deserve a degree of skepticism. ...
Article
Full-text available
The Information Pyramid has been used in technical and academic texts for a long time. Its origin is still uncertain, and it is likely to remain so, but the structure established by Russell Ackoff in 1989 has been the basis for most of the representations found in articles and books. This pyramid has been the subject of criticism from several authors in different research fields. In this theoretical essay, some of the pyramid's development trails are retrieved. Different expressions of the pyramid are discussed, comparing and contrasting their elements, assumptions and implications, in search of a more comprehensive understanding of these elements and their intertwining. To make the exposition more fluid, the reviews were grouped into categories; these, however, should not be taken in isolation, since the focus of attention is the representation, its premises and its implications. It is concluded that hierarchical representations of the relationships between data, information, knowledge and other elements are unable to adequately represent, even in a simplified way, the complex processes they intend to subsume. However, this representation can still be an instrument of learning, provided it is used critically, supporting discussions about the complexity and circularity of the phenomena it expresses.
... As displayed in Tables 1 and 2, the F-value significantly predicts the dependent variable (p < 0.05) in 25 out of 36 regression models, thus reporting an overall good fit for the data analyzed. In order to increase the reliability of the results, and to avoid any risk of spurious relations, we followed the recommendation of Austin, Mamdani, Juurlink, and Hux (2006), and of Picard and Berk (1990), randomly splitting the sample into two subsamples, and running all the linear regressions for the two subsamples. We repeated this procedure three times. ...
Article
Countries' image is a multifaceted construct. Its symbolic dimensions have been shown to play an important role both in consumer behavior and in the attraction that organizations can exert during the recruitment process. This paper offers a comprehensive model of international mobility decisions encompassing the antecedents and consequences of perceptions about emerging economies, proposing that country image depends on individuals' backgrounds and social identities. In this context, evaluations of countries can play a major role in influencing the willingness to accept expatriate job offers. We used a within-subject design asking for opinions about hypothetical job offers in six particular host countries: Algeria, Democratic Republic of Congo, Argentina, Chile, Angola and Mozambique. Survey results from more than 500 engineers (125 French nationals, 121 Spanish, 131 Portuguese, with the remaining 138 coming from 42 different countries, yet working in one of the three above-mentioned European countries) show that language proficiency influences the evaluation of specific expatriate locations. Our results also convey the critical role of the perceived level of safety and cultural attraction in predicting the willingness to accept expatriate job offers. We conclude by discussing the theoretical and practical implications for human resource management.
... This problem increases with the length of the time period, that is, monthly versus yearly data. Since none of our techniques eliminate all of the significant championship variables, we echo the advice that is in Austin et al. (2006) who warn against 'the hazards of testing multiple, non-prespecified hypotheses' (p. 968). ...
Article
Full-text available
Modern technologies in neuroscience – possibilities, limitations and development perspectives. Keywords: cognitive neuroscience, neuroimaging, statistical inference. Introduction: Cognitive neuroscience is an interdisciplinary field of science that emerged from the combined work of, among others, psychologists, biologists, engineers and computer scientists. Its subject of study is the neuronal mechanisms underlying human cognitive functions. This is possible thanks to the use of neuroimaging devices such as electroencephalography (EEG), functional and diffusion magnetic resonance imaging (fMRI, dMRI), and eye tracking (oculography). Each of these methods measures different parameters of brain activity, which means that each has a limited range of applications. Since the complexity of brain structure and of neuronal communication pathways can be observed and understood only by taking multiple dimensions into account (e.g., spatial, temporal, electrochemical), combining different neuroimaging methods appears to be a promising solution. Examples of this currently developing trend include the FRP (fixation-related potentials) method, i.e., the study of event-related brain potentials linked to the trajectory of eye movements, and magnetoencephalography (MEG), which supplements traditional EEG with the localization of active brain areas during the performance of a cognitive task.
Chapter
A famous article by Chris Anderson, published in Wired in 2008, was titled: “The data deluge makes the scientific method obsolete.” Its basic argument asserts that many researchers currently fall for the lure of massive amounts of data combined with applied mathematics, which in effect replace every other tool that might be useful for making sound predictions in science. Following the latter orientation, theories of human behavior are apparently abandoned as a result, no matter whether such theories are grounded in psychology, sociology, or economics. From Anderson’s perspective, it is simply considered not to be interesting anymore to understand the driving forces of why people do what they do. In contrast, observation of human behavior is often (yet wrongly) assumed to be sufficient, also because one can easily track and measure such behavior and record data on it. Then, given myriads of “reliable” data, the numbers are supposed to speak for themselves.
Article
Full-text available
Many biomechanics studies have small sample sizes and incorrect statistical analyses, so reporting of inaccurate inferences and inflated magnitude of effects are common in the field. This review examines these issues in biomechanics research and summarises potential solutions from research in other fields to increase the confidence in the experimental effects reported in biomechanics. Authors, reviewers and editors of biomechanics research reports are encouraged to improve sample sizes and the resulting statistical power, improve reporting transparency, improve the rigour of statistical analyses used, and increase the acceptance of replication studies to improve the validity of inferences from data in biomechanics research. The application of sports biomechanics research results would also improve if a larger percentage of unbiased effects and their uncertainty were reported in the literature.
Chapter
For Germany alone, official statistics record around 2.5 million companies, which generate a gross value added of just under 1.6 trillion euros. These range from large multinational corporations with several hundred thousand employees to micro-enterprises, and from long-established companies with a corporate history of more than 100 years to start-ups that are just beginning to implement their business ideas.
Chapter
Pharmacoepidemiology studies the utilization patterns of medicines—also known as drug utilization research—which is an important component of pharmacy practice research. Pharmacoepidemiology also studies the relationship between medicines or other medical treatments and outcomes in large populations under nonexperimental situations. Providing an introduction to pharmacoepidemiology, this chapter describes frequently used metrics to understand drug utilization and medication adherence. This chapter also covers the key concepts involved in studying the association between medical or surgical treatments and outcomes. These concepts include forming a research question, selecting sources of data, defining the study population, and defining drug exposures, covariates, and outcomes. The chapter also discusses a range of study designs used in pharmacoepidemiologic research, including, but not limited to, cohort studies, case-control studies, within-subject studies, cross-sectional studies, ecological studies, and quasi-experimental designs. Finally, the chapter draws on key challenges such as confounding bias as well as commonly used analytical techniques to overcome these challenges.
Chapter
Palaeopathology is an evidence-based guide to the principal types of pathological lesions often found in human remains and how to diagnose them. Tony Waldron presents an innovative method of arriving at a diagnosis in the skeleton by applying what he refers to as 'operational definitions'. The method ensures that those who study bones will use the same criteria for diagnosing disease, thereby enabling valid comparisons to be made between studies. Waldron's book is based on modern clinical knowledge and provides background information on the natural history of bone disease. In addition, the volume demonstrates how results from studies should be analysed, methods of determining the frequency of disease, and other types of epidemiological analysis. This edition includes new chapters on the development of palaeopathology, basic concepts, health and disease, diagnosis, and spinal pathology. Chapters on analysis and interpretation have been thoroughly revised and enlarged.
Article
Objective ‘Improvement science’ is used to describe specific quality improvement methods (including tests of change and statistical process control). The approach is spreading from clinical settings to population-wide interventions and is being extended from supporting the adoption of proven interventions to making generalisable claims about new interventions. The objective of this narrative review is to evaluate the strengths and risks of current improvement science practice, particularly in relation to how they might be used in population health. Methods A purposive sampling of published studies to identify how improvement science methods are being used and for what purpose. The setting was Scotland and studies that focused on health and wellbeing outcomes. Results We have identified a range of improvement science approaches which provide practitioners with accessible tools to assess small-scale changes in policy and practice. The strengths of such approaches are that they facilitate consistent implementation of interventions already known to be effective and motivate and empower staff to make local improvements. However, we also identified a number of potential risks. In particular, their use to assess the effectiveness of new interventions often seems to pay insufficient attention to random variation, measurement bias, confounding and ethical issues. Conclusions The use of current improvement science methods to generate evidence of effectiveness for population-wide interventions is problematic and risks unjustified claims of effectiveness, inefficient resource use and harm to those not offered alternative effective interventions. Newer methodological approaches offer alternatives and should be more widely considered.
Book
Full-text available
Data and its economic impact permeate all sectors of the economy. The data economy is not a new sector but rather a challenge for all firms to compete and innovate as part of a new wave of economic value creation. With data playing an increasingly important role across all sectors of the economy, the results of this report point European policymakers toward promoting the development and adoption of unified reference architectures. These architectures constitute a technology-neutral and cross-sectoral approach that will enable companies small and large to compete and to innovate, unlocking the economic potential of data capture in an increasingly digitized world. Data access appears to be less of a hindrance to a thriving data economy due to the net increase in capabilities in data capture, elevation, and analysis. What does prove difficult for firms is discovering existing datasets and establishing their suitability for achieving their economic objectives. Reference architectures can facilitate this process as they provide a framework to locate potential providers of relevant datasets and carry sufficient additional information (metadata) about datasets to enable firms to understand whether a particular dataset, or parts of it, fits their purpose. Whether third-party data access is suitable to solve a specific business task in the first place ought to be a decision at the discretion of the economic actors involved. As our report underscores, data captured in one context with a specific purpose may not be fit for another context or another purpose. Consequently, a firm has to evaluate case-by-case whether first-party data capture, third-party data access, or a mixed approach is the best solution. This evaluation will naturally depend on whether there is any other firm capturing data suitable for the task that is willing to negotiate conditions for third-party access to this data. Unified data architectures may also lower the barriers for a firm capturing suitable data to engage in negotiations, since their adoption will lower the costs of making the data ready for a successful exchange. Such architectures may further integrate licensing provisions ensuring that data, once exchanged, is not used beyond the agreed purpose. They can also bring in functions that improve the discoverability of potential data providers.
Article
An exploratory analysis of registry data from 2437 patients with advanced gastric cancer revealed a surprising association between astrological birth sign and overall survival (OS) with p = 0.01. After dichotomizing or changing the reference sign, p-values <0.05 were observed for several birth signs following adjustments for multiple comparisons. Bayesian models with moderately skeptical priors still pointed to these associations. A more plausible causal model, justified by contextual knowledge, revealed that these associations arose from the astrological sign association with seasonality. This case study illustrates how causal considerations can guide analyses through what would otherwise be a hopeless maze of statistical possibilities.
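The seasonality explanation generalises: because astrological sign is a deterministic function of birth date, any seasonal risk mechanism can induce an apparent sign association. The toy simulation below is not the registry analysis; it crudely equates one sign with December births and assumes a purely seasonal risk.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical mechanism: risk depends only on season of birth.
n = 200_000
birth_month = rng.integers(1, 13, size=n)
winter_born = np.isin(birth_month, [12, 1, 2])
event = rng.random(n) < np.where(winter_born, 0.014, 0.010)

# "Sign" crudely approximated as December births versus all other months.
sign = birth_month == 12
table = np.array([[np.sum(event & sign), np.sum(~event & sign)],
                  [np.sum(event & ~sign), np.sum(~event & ~sign)]])
chi2, p, _, _ = stats.chi2_contingency(table)
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])

# The association is genuine in the data, but the causal driver is season,
# not the astrological sign itself.
print(f"apparent sign association: OR = {odds_ratio:.2f}, p = {p:.4g}")
```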
Article
We discuss inference after data exploration, with a particular focus on inference after model or variable selection. We review three popular approaches to this problem: sample splitting, simultaneous inference, and conditional selective inference. We explain how each approach works and highlight its advantages and disadvantages. We also provide an illustration of these post-selection inference approaches.
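To make the first of these approaches concrete, the following sketch applies sample splitting to synthetic noise data (an illustration under stated assumptions, not the article's own example): the variable that looks most significant on one half of the data is re-tested on the held-out half, where its p-value is honest.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Synthetic data: 200 candidate binary exposures, outcome is pure noise.
n, p = 1000, 200
X = rng.integers(0, 2, size=(n, p))
y = rng.normal(size=n)

def pvalue(x, outcome):
    """Two-sample t-test p-value comparing the outcome between x == 1 and x == 0."""
    return stats.ttest_ind(outcome[x == 1], outcome[x == 0]).pvalue

# Selection half: the smallest of 200 null p-values is usually well below 0.05.
half = n // 2
p_select = np.array([pvalue(X[:half, j], y[:half]) for j in range(p)])
winner = int(np.argmin(p_select))
print(f"selection half: variable {winner}, p = {p_select[winner]:.4f}")

# Validation half: re-testing only the selected variable gives an honest
# p-value, which for pure noise is typically unremarkable.
print(f"validation half: variable {winner}, p = {pvalue(X[half:, winner], y[half:]):.4f}")
```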
Article
Full-text available
A key principle for interpretation of subgroup results is that quantitative interactions (differences in degree) are much more likely than qualitative interactions (differences in kind). Quantitative interactions are likely to be truly present whether or not they are apparent, whereas apparent qualitative interactions should generally be disbelieved as they have usually not been replicated consistently. Therefore, the overall trial result is usually a better guide to the direction of effect in subgroups than the apparent effect observed within a subgroup. Failure to specify prior hypotheses, to account for multiple comparisons, or to correct P values increases the chance of finding spurious subgroup effects. Conversely, inadequate sample size, classification of patients into the wrong subgroup, and low power of tests of interaction make finding true subgroup effects difficult. We recommend examining the architecture of the entire set of subgroups within a trial, analyzing similar subgroups across independent trials, and interpreting the evidence in the context of known biologic mechanisms and patient prognosis.
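One standard way to ask whether an apparent subgroup difference is quantitative rather than qualitative, and whether it exceeds chance, is a test of interaction on the subgroup-specific effect estimates. The sketch below uses hypothetical counts, not data from the paper.

```python
import numpy as np
from scipy import stats

def log_or(events_t, n_t, events_c, n_c):
    """Log odds ratio (treatment vs control) and its standard error."""
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    return np.log((a * d) / (b * c)), np.sqrt(1/a + 1/b + 1/c + 1/d)

# Hypothetical subgroup results: (events in treated, treated n, events in control, control n).
lor_a, se_a = log_or(30, 500, 45, 500)   # subgroup A
lor_b, se_b = log_or(20, 300, 22, 300)   # subgroup B

# Test of interaction: do the two subgroup effects differ by more than chance?
z = (lor_a - lor_b) / np.sqrt(se_a**2 + se_b**2)
p_interaction = 2 * stats.norm.sf(abs(z))
print(f"log ORs {lor_a:.2f} vs {lor_b:.2f}, interaction p = {p_interaction:.2f}")
```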
Article
Full-text available
Previous studies have shown that calcium-channel blockers increase morbidity and mortality in patients with chronic heart failure. We studied the effect of a new calcium-channel blocker, amlodipine, in patients with severe chronic heart failure. We randomly assigned 1153 patients with severe chronic heart failure and ejection fractions of less than 30 percent to double-blind treatment with either placebo (582 patients) or amlodipine (571 patients) for 6 to 33 months, while their usual therapy was continued. The randomization was stratified on the basis of whether patients had ischemic or nonischemic causes of heart failure. The primary end point of the study was death from any cause and hospitalization for major cardiovascular events. Primary end points were reached in 42 percent of the placebo group and 39 percent of the amlodipine group, representing a 9 percent reduction in the combined risk of fatal and nonfatal events with amlodipine (95 percent confidence interval, 24 percent reduction to 10 percent increase; P=0.31). A total of 38 percent of the patients in the placebo group died, as compared with 33 percent of those in the amlodipine group, representing a 16 percent reduction in the risk of death with amlodipine (95 percent confidence interval, 31 percent reduction to 2 percent increase; P=0.07). Among patients with ischemic heart disease, there was no difference between the amlodipine and placebo groups in the occurrence of either end point. In contrast, among patients with nonischemic cardiomyopathy, amlodipine reduced the combined risk of fatal and nonfatal events by 31 percent (P=0.04) and decreased the risk of death by 46 percent (P<0.001). Amlodipine did not increase cardiovascular morbidity or mortality in patients with severe heart failure. The possibility that amlodipine prolongs survival in patients with nonischemic dilated cardiomyopathy requires further study.
Article
Full-text available
Analysis of subgroup results in a clinical trial is surprisingly unreliable, even in a large trial. This is the result of a combination of reduced statistical power, increased variance and the play of chance. Reliance on such analyses is likely to be more erroneous, and hence harmful, than application of the overall proportional (or relative) result in the whole trial to the estimate of absolute risk in that subgroup. Plausible explanations can usually be found for effects that are, in reality, simply due to the play of chance. When clinicians believe such subgroup analyses, there is a real danger of harm to the individual patient.
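A small simulation makes the play of chance concrete: in repeated trials with no true treatment effect anywhere, splitting each trial into twelve subgroups produces at least one nominally 'significant' subgroup effect far more often than the nominal 5%. The numbers below are illustrative assumptions only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Repeated null trials: identical 10% event risk in both arms and in every
# subgroup, so any "significant" subgroup effect is a false alarm.
n_trials, n_per_arm, n_subgroups = 1000, 1200, 12
false_alarm_trials = 0
for _ in range(n_trials):
    treated = np.repeat([1, 0], n_per_arm)
    subgroup = rng.integers(0, n_subgroups, size=2 * n_per_arm)
    outcome = rng.random(2 * n_per_arm) < 0.10
    for g in range(n_subgroups):
        m = subgroup == g
        table = [[np.sum(outcome[m & (treated == 1)]), np.sum(~outcome[m & (treated == 1)])],
                 [np.sum(outcome[m & (treated == 0)]), np.sum(~outcome[m & (treated == 0)])]]
        _, p_value, _, _ = stats.chi2_contingency(table)
        if p_value < 0.05:
            false_alarm_trials += 1
            break

print(f"null trials with at least one 'significant' subgroup: {false_alarm_trials / n_trials:.0%}")
```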
Article
Confirmatory clinical trials often classify clinical response variables into primary and secondary endpoints. The presence of two or more primary endpoints in a clinical trial usually means that some adjustments of the observed p-values for multiplicity of tests may be required for the control of the type I error rate. In this paper, we discuss statistical concerns associated with some commonly used multiple endpoint adjustment procedures. We also present limited Monte Carlo simulation results to demonstrate the performance of selected p-value-based methods in protecting the type I error rate. © 1997 by John Wiley & Sons, Ltd.
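As a concrete example of a p-value-based adjustment, here is a minimal implementation of the Holm step-down procedure on hypothetical endpoint p-values; the paper evaluates a selection of such methods, and this particular sketch is only illustrative.

```python
import numpy as np

def holm_adjust(p_values):
    """Holm step-down adjusted p-values; controls the family-wise type I
    error rate and is uniformly more powerful than plain Bonferroni."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, (m - rank) * p[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

raw = [0.012, 0.045, 0.20, 0.34, 0.51]   # hypothetical endpoint p-values
print(holm_adjust(raw))                   # none remains below 0.05 after adjustment
```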
Article
Altruism and trust lie at the heart of research on human subjects. Altruistic individuals volunteer for research because they trust that their participation will contribute to improved health for others and that researchers will minimize risks to participants. In return for the altruism and trust that make clinical research possible, the research enterprise has an obligation to conduct research ethically and to report it honestly. Honest reporting begins with revealing the existence of all clinical studies, even those that reflect unfavorably on a research sponsor's product.
Article
Data splitting is the act of partitioning available data into two portions, usually for cross-validatory purposes. One portion of the data is used to develop a predictive model and the other to evaluate the model's performance. This article reviews data splitting in the context of regression. Guidelines for splitting are described, and the merits of predictive assessments derived from data splitting relative to those derived from alternative approaches are discussed.
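A minimal regression sketch of the splitting idea on synthetic data (an illustration, not an example from the article): fit a model on one randomly chosen half and assess its predictive performance on the other half.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data: 1,000 subjects, 5 predictors, continuous outcome.
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=1000)

# Random 50/50 split into development and evaluation portions.
perm = rng.permutation(len(y))
develop, evaluate = perm[:500], perm[500:]

# Develop: ordinary least squares fitted on the development portion only.
X_dev = np.column_stack([np.ones(len(develop)), X[develop]])
beta, *_ = np.linalg.lstsq(X_dev, y[develop], rcond=None)

# Evaluate: predictive performance assessed on data never used for fitting.
X_eval = np.column_stack([np.ones(len(evaluate)), X[evaluate]])
rmse = np.sqrt(np.mean((y[evaluate] - X_eval @ beta) ** 2))
print(f"held-out RMSE: {rmse:.2f}")
```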
Article
Background: Zodiac signs are believed to influence personality and health. Those born under the sun sign Leo are alleged to be big-hearted and at risk for heart disease. Methods: We compared the demographic and exercise variables among a cohort of 32,386 patients undergoing stress testing at a large academic center. Further, we studied the association of Leo sign with long-term mortality. Results: There were only minor differences in the baseline or exercise variables among Leos and the remainder of the cohort. There was a total of 2586 deaths over a period of 5 years. There was a slight excess of deaths among the Leos (9.6% vs. 8.7%). This relationship acquired a borderline significance after multivariate adjustment (hazard ratio, 1.17; 95% confidence interval [CI], 1.03 to 1.33, p=0.019). However, no significant difference in risk of mortality was seen in a propensity-adjusted model (adjusted hazard ratio, 1.10; 95% CI, 0.92 to 1.31; p=0.30). Conclusion: Being a Leo is not associated with any adverse cardiac risk. Our findings should provide reassurance to the large population of Leos that together make up approximately one twelfth of mankind.
Article
The extent to which a clinician should believe and act on the results of subgroup analyses of data from randomized trials or meta-analyses is controversial. Guidelines are provided in this paper for making these decisions. The strength of inference regarding a proposed difference in treatment effect among subgroups is dependent on the magnitude of the difference, the statistical significance of the difference, whether the hypothesis preceded or followed the analysis, whether the subgroup analysis was one of a small number of hypotheses tested, whether the difference was suggested by comparisons within or between studies, the consistency of the difference, and the existence of indirect evidence that supports the difference. Application of these guidelines will assist clinicians in making decisions regarding whether to base a treatment decision on overall results or on the results of a subgroup analysis.
Article
We examined the deaths of 28,169 adult Chinese-Americans, and 412,632 randomly selected, matched controls coded "white" on the death certificate. Chinese-Americans, but not whites, die significantly earlier than normal (1.3-4.9 yr) if they have a combination of disease and birthyear which Chinese astrology and medicine consider ill-fated. The more strongly a group is attached to Chinese traditions, the more years of life are lost. Our results hold for nearly all major causes of death studied. The reduction in survival cannot be completely explained by a change in the behaviour of the Chinese patient, doctor, or death-registrar, but seems to result at least partly from psychosomatic processes.
Article
To determine whether specific angiotensin II receptor blockade with losartan offers safety and efficacy advantages in the treatment of heart failure over angiotensin-converting-enzyme (ACE) inhibition with captopril, the ELITE study compared losartan with captopril in older heart-failure patients. We randomly assigned 722 ACE-inhibitor-naive patients (aged 65 years or more) with New York Heart Association (NYHA) class II-IV heart failure and ejection fractions of 40% or less to double-blind losartan (n = 352) titrated to 50 mg once daily or captopril (n = 370) titrated to 50 mg three times daily, for 48 weeks. The primary endpoint was the tolerability measure of a persisting increase in serum creatinine of 26.5 μmol/L or more (≥0.3 mg/dL) on therapy; the secondary endpoint was the composite of death and/or hospital admission for heart failure; and other efficacy measures were total mortality, admission for heart failure, NYHA class, and admission for myocardial infarction or unstable angina. The frequency of persisting increases in serum creatinine was the same in both groups (10.5%). Fewer losartan patients discontinued therapy for adverse experiences (12.2% vs 20.8% for captopril, p = 0.002). No losartan-treated patients discontinued due to cough compared with 14 in the captopril group. Death and/or hospital admission for heart failure was recorded in 9.4% of the losartan and 13.2% of the captopril patients (risk reduction 32% [95% CI -4% to +55%], p = 0.075). This risk reduction was primarily due to a decrease in all-cause mortality (4.8% vs 8.7%; risk reduction 46% [95% CI 5-69%], p = 0.035). Admissions with heart failure were the same in both groups (5.7%), as was improvement in NYHA functional class from baseline. Admission to hospital for any reason was less frequent with losartan than with captopril treatment (22.2% vs 29.7%). In this study of elderly heart-failure patients, treatment with losartan was associated with an unexpectedly lower mortality than that found with captopril. Although there was no difference in renal dysfunction, losartan was generally better tolerated than captopril and fewer patients discontinued losartan therapy. A further trial, evaluating the effects of losartan and captopril on mortality and morbidity in a larger number of patients with heart failure, is in progress.
Article
The ELITE study showed an association between the angiotensin II antagonist losartan and an unexpected survival benefit in elderly heart-failure patients, compared with captopril, an angiotensin-converting-enzyme (ACE) inhibitor. We did the ELITE II Losartan Heart Failure Survival Study to confirm whether losartan is superior to captopril in improving survival and is better tolerated. We undertook a double-blind, randomised, controlled trial of 3,152 patients aged 60 years or older with New York Heart Association class II-IV heart failure and ejection fraction of 40% or less. Patients, stratified for beta-blocker use, were randomly assigned losartan (n=1,578) titrated to 50 mg once daily or captopril (n=1,574) titrated to 50 mg three times daily. The primary and secondary endpoints were all-cause mortality, and sudden death or resuscitated arrest. We assessed safety and tolerability. Analysis was by intention to treat. Median follow-up was 555 days. There were no significant differences in all-cause mortality (11.7 vs 10.4% average annual mortality rate) or sudden death or resuscitated arrests (9.0 vs 7.3%) between the two treatment groups (hazard ratios 1.13 [95.7% CI 0.95-1.35], p=0.16 and 1.25 [95% CI 0.98-1.60], p=0.08). Significantly fewer patients in the losartan group (excluding those who died) discontinued study treatment because of adverse effects (9.7 vs 14.7%, p<0.001), including cough (0.3 vs 2.7%).
Article
This is a summary of reports of presentations made at the American College of Cardiology 49th Scientific Sessions, Anaheim, 12-15 March 2000. Studies of particular interest to heart failure physicians have been reviewed. OPTIME-CHF: Outcomes of a Prospective Trial of Intravenous Milrinone for Exacerbations of Chronic Heart Failure. OPTIME-CHF was a randomised controlled trial comparing a 48-h infusion of milrinone with standard therapy in 951 patients recruited over a 2-year period. Patients were excluded if the investigator believed their clinical condition mandated inotropic therapy. Patients were randomised within 48 h of admission for an acute exacerbation of chronic heart failure to receive milrinone or placebo infusion for 48 h. Of the patients, 43% were diabetic, 70% were receiving an angiotensin-converting-enzyme inhibitor, 25% were already on a beta-blocker, and 34% had atrial fibrillation. There was no significant difference between the two groups in length of hospital stay during the index admission, subsequent readmissions, or days in hospital over the following 60 days. Subjective clinical assessment scores were also no different. There was an average admission rate over the next year of one per patient in both groups. However, there was a significant increase in the incidence of sustained hypotension in the milrinone group, which accounted for all of the increased adverse event rates for the active therapy. The 60-day mortality was 10% in both groups. This and previous trials of the oral formulation of milrinone have now clearly demonstrated a lack of benefit with milrinone either during acute exacerbations of chronic heart failure or in stable severe chronic heart failure [Packer M, Carver JR, Rodeheffer RJ, et al. Effect of oral milrinone on mortality in severe chronic heart failure. N Engl J Med 1991;325:1468-1475.]. Medium-sized studies of milrinone in patients with milder severities of heart failure also suggested an adverse impact on prognosis in the presence or absence of digoxin [DiBianco R, Shabetai R, Kostuk W, Moran J, Schlant RC, Wright R. A comparison of oral milrinone, digoxin, and their combination in the treatment of patients with chronic heart failure. N Engl J Med 1989;320:677-683.]. Whether milrinone even has a role in the management of a haemodynamic crisis requiring inotropic therapy must also be questioned.
Article
Multiplicity of data, hypotheses, and analyses is a common problem in biomedical and epidemiological research. Multiple testing theory provides a framework for defining and controlling appropriate error rates in order to protect against wrong conclusions. However, the corresponding multiple test procedures are underutilized in biomedical and epidemiological research. In this article, the existing multiple test procedures are summarized for the most important multiplicity situations. It is emphasized that adjustments for multiple testing are required in confirmatory studies whenever results from multiple tests have to be combined in one final conclusion and decision. In the case of multiple significance tests, a note stating which error rate will be controlled is desirable.
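To illustrate why a note on the controlled error rate matters, the sketch below contrasts a family-wise error rate adjustment (Bonferroni) with a false discovery rate adjustment (Benjamini-Hochberg) on the same hypothetical p-values; it is a generic illustration rather than a procedure recommended by the article.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg step-up adjusted p-values; controls the false
    discovery rate (FDR) rather than the family-wise error rate (FWER)."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adjusted = np.empty(m)
    running_min = 1.0
    for rank in range(m - 1, -1, -1):          # step up from the largest p-value
        idx = order[rank]
        running_min = min(running_min, p[idx] * m / (rank + 1))
        adjusted[idx] = running_min
    return adjusted

p = [0.001, 0.008, 0.039, 0.041, 0.30]         # hypothetical p-values
print("Bonferroni (controls FWER):", np.minimum(1.0, np.array(p) * len(p)))
print("Benjamini-Hochberg (controls FDR):", benjamini_hochberg(p))
```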
Article
Impressive results for secondary outcomes or subgroup analyses pose problems for those trying to value the benefits observed in clinical trials. In the prospective randomised amlodipine survival evaluation study, comparing amlodipine with placebo in patients with severe heart failure, a prospectively defined subgroup of patients with non-ischaemic heart failure showed a 46% reduction in the risk of death (95% confidence interval 21% to 63%).1 This was achieved alongside a non-significant reduction in death from any cause or admission to hospital for major cardiovascular events (P=0.31), the prospectively defined primary outcome measure, and no observed benefits in the ischaemic group. The authors of the report commented: “Although this benefit was seen only in a subgroup of patients, it is likely that it reflects a true effect of amlodipine, since the randomisation procedure was stratified according to the cause of heart failure and a significant difference between the ischaemic and non-ischaemic strata was noted for both the primary and secondary end points of the study.”1 This article examines the interpretation that may be placed on the results of secondary end points and subgroup analyses in the context of clinical practice and health policy. With regard to health policy, it emphasises the need for discipline in interpreting clinical trials. Summary points: impressive results in subgroup analyses and secondary outcomes can be hard to interpret; for individual patients, subgroup analyses and secondary end points can provide the best guide for clinical intervention; health policy decisions such as those taken by NICE aim to guide the treatment of future patients and will be difficult to change; health policy should be protected from undue inference by considering the results of predetermined primary outcomes. Randomised trials commonly include a range of patients with a particular disorder and estimate the average effect of the intervention being studied. Clinicians …
Article
Altruistic motives and trust are central to scientific investigations involving people. These prompt volunteers to participate in clinical trials. However, publication bias and other causes of the failure to report trial results may lead to an overly positive view of medical interventions in the published evidence available. Registration of randomised controlled trials right from the start is therefore warranted. The International Committee of Medical Journal Editors has issued a statement to the effect that the 11 journals represented in the Committee will not consider publication of the results of trials that have not been registered in a publicly accessible register such as www.clinicaltrials.gov. Patients who voluntarily participate in clinical trials need to know that their contribution to better human healthcare is available for decision making in clinical practice.
Article
Large pragmatic trials provide the most reliable data about the effects of treatments, but should be designed, analysed, and reported to enable the most effective use of treatments in routine practice. Subgroup analyses are important if there are potentially large differences between groups in the risk of a poor outcome with or without treatment, if there is potential heterogeneity of treatment effect in relation to pathophysiology, if there are practical questions about when to treat, or if there are doubts about benefit in specific groups, such as elderly people, which are leading to potentially inappropriate undertreatment. Analyses must be predefined, carefully justified, and limited to a few clinically important questions, and post-hoc observations should be treated with scepticism irrespective of their statistical significance. If important subgroup effects are anticipated, trials should either be powered to detect them reliably or pooled analyses of several trials should be undertaken. Formal rules for the planning, analysis, and reporting of subgroup analyses are proposed.
Topol EJ, Califf RM, Van de Werf F, Simoons M, Hampton J, Lee KL, et al. Perspectives on large-scale cardiovascular clinical trials for the new millennium. Circulation 1997;95:1072–82.
Sleight P. Debate: subgroup analyses in clinical trials - fun to look at, but don't believe them? Curr Control Trials Cardiovasc Med 2000;1:25–7.
Picard RR, Berk KN. Data splitting. Am Stat 1990;44:140–7.
Packer M, O'Connor CM, Ghali JK, Pressler ML, Carson PE, Belkin RN, et al. Effect of amlodipine on morbidity and mortality in severe chronic heart failure. N Engl J Med 1996;335:1107–14.
Available at http://www.rgrossman.com/dm.htm; 2005. Accessed December 14.
Sankoh, et al. Some comments on frequently used multiple endpoint adjustment methods in clinical trials.