Article

Addressing Publication Bias in Meta-Analysis: Empirical Findings From Community-Augmented Meta-Analyses of Infant Language Development

Authors: Tsuji, Cristia, Frank, and Bergmann
  • Osnabrück University of Applied Sciences

Abstract

Meta-analyses are an indispensable research synthesis tool for characterizing bodies of literature and advancing theories. One important open question concerns the inclusion of unpublished data into meta-analyses. Finding such studies can be effortful, but their exclusion potentially leads to consequential biases like overestimation of a literature’s mean effect. We address two questions about unpublished data using MetaLab, a collection of community-augmented meta-analyses focused on developmental psychology. First, we assess to what extent MetaLab datasets include gray literature, and by what search strategies they are unearthed. We find that an average of 11% of datapoints are from unpublished literature; standard search strategies like database searches, complemented with individualized approaches like including authors’ own data, contribute the majority of this literature. Second, we analyze the effect of including versus excluding unpublished literature on estimates of effect size and publication bias, and find this decision does not affect outcomes. We discuss lessons learned and implications.
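The comparison described in the abstract can be illustrated with a small, hedged sketch in R using the metafor package. This is not the paper's own analysis script: the dataset, the column names, and the peer_reviewed flag below are placeholders standing in for a MetaLab-style dataset. The idea is simply to fit a random-effects model and an Egger-type asymmetry test once on all datapoints and once on the peer-reviewed subset, then compare.

    # Hedged sketch, not the authors' code: placeholder data standing in for a
    # MetaLab dataset with a peer-reviewed vs. gray-literature indicator.
    library(metafor)
    dat <- data.frame(
      yi = c(0.40, 0.25, 0.55, 0.10, 0.05, 0.30, 0.48, 0.02),   # effect sizes
      vi = c(0.02, 0.04, 0.03, 0.05, 0.06, 0.02, 0.04, 0.05),   # sampling variances
      peer_reviewed = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
    )
    res_all <- rma(yi, vi, data = dat)                          # all datapoints
    res_pub <- rma(yi, vi, data = dat, subset = peer_reviewed)  # published only
    res_all; res_pub       # compare pooled effect size estimates
    regtest(res_all)       # Egger-type asymmetry test (more informative with many studies)
    regtest(res_pub)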


... The results of empirical tests on the extent and prevalence of publication bias in certain fields are widely available (Fanelli, Costas, & Ioannidis, 2017; Kvarven, Stromland, & Johannesson, 2020; Polanin et al., 2016; Tsuji, Cristia, Frank, & Bergmann, 2020; van Aert et al., 2019). Overall, these results have indicated that publication bias is present in all areas of research, but the extent to which this bias is present varies across disciplines (Fanelli et al., 2017). ...
... Overall, these results have indicated that publication bias is present in all areas of research, but the extent to which this bias is present varies across disciplines (Fanelli et al., 2017). Moreover, empirical evidence suggests that the degree of attention to the publication bias issue, both in the techniques used to assess and correct publication bias and in the frequency with which the analyses are conducted, depends on the field (Ding et al., 2020; Kepes et al., 2012; Renkewitz & Keiner, 2019; Tsuji et al., 2020). ...
... Unlike in the psychology and medical science fields (Ding et al., 2020; Tsuji et al., 2020), no attempt has been made to examine whether and how meta-analytic research studies published in the leading CCJ journals address publication bias. The current content analysis aimed to address this limitation. ...
Article
Objectives: The aims of the current study were twofold. This study examined recent meta-analyses in the criminology and criminal justice (CCJ) field to capture a snapshot of the current state of publication bias practices and provided practical guidelines and recommendations on how to address the issues of publication bias in meta-analytic reviews. Methods: The content analysis reviewed 64 meta-analyses published in top-tier journals in the CCJ field in 2019–2020. A narrative review of previous simulation studies in the medical science and psychology fields was performed to synthesize practical guidelines on publication bias in meta-analyses. Results: Recent CCJ meta-analytic studies have at least partially addressed the issue of publication bias by employing systematic searches and statistical methods. However, the current state of CCJ meta-analyses does not meet the expectation, established in medical science and psychology, that all meta-analytic reviews report the range of effect size estimates across multiple publication bias detection and correction tests. The statistical methods commonly used for assessing publication bias are applied without testing and interpreting assumptions about the missing studies. Conclusions: There is a need to continue monitoring the quality of meta-analyses to gain a comprehensive picture of how bias leaves a potential imprint in CCJ research.
... or we directly searched the journal's website for papers with "meta-analysis" in the title or abstract. For Metalab, we used Table 1 from Tsuji et al. 35 to screen 10 existing Metalab meta-analyses using our inclusion criterion for the number of point estimates. ...
... This is consistent with previous findings suggesting that the inclusion of studies from the grey literature did not consistently reduce publication bias in these meta-analyses. 35 ...
Article
Selective publication and reporting in individual papers compromise the scientific record, but are meta‐analyses as compromised as their constituent studies? We systematically sampled 63 meta‐analyses, each comprising at least 40 studies, in PLOS One, top medical journals, top psychology journals, and Metalab, an online, open‐data database of developmental psychology meta‐analyses. We empirically estimated publication bias in each, including only the peer‐reviewed studies in each. Across all meta‐analyses, we estimated that “statistically significant” results in the expected direction were only 1.17 times more likely to be published than “nonsignificant” results or those in the unexpected direction (95% CI: [0.93, 1.47]), with a confidence interval substantially overlapping the null. Comparable estimates were 0.83 for meta‐analyses in PLOS One, 1.02 for top medical journals, 1.54 for top psychology journals, and 4.70 for Metalab. The severity of publication bias did differ across individual meta‐analyses; in a small minority (10%; 95% CI: [2%, 21%]), publication bias appeared to favor “significant” results in the expected direction by more than 3‐fold. We estimated that for 89% of meta‐analyses, the amount of publication bias that would be required to attenuate the point estimate to the null exceeded the amount of publication bias estimated to be actually present in the vast majority of meta‐analyses from the relevant scientific discipline (exceeding the 95th percentile of publication bias). Study‐level measures (“statistical significance” with a point estimate in the expected direction and point estimate size) did not indicate more publication bias in higher‐tier versus lower‐tier journals, nor in the earliest studies published on a topic versus later studies. Overall, the mere act of performing a meta‐analysis with a large number of studies (at least 40) and that includes non‐headline results may largely mitigate publication bias in meta‐analyses, suggesting optimism about the validity of meta‐analytic results.
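The publication-probability ratio described above comes from the authors' own sensitivity-analysis framework, which is not reproduced here. As a loose analogue, a step-function selection model in metafor estimates how much less likely results that miss a significance cutoff are to be observed. A minimal sketch on metafor's built-in BCG dataset (an assumption for illustration; not the data analyzed above):

    library(metafor)
    dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg,
                  data = dat.bcg)                     # built-in example data
    res <- rma(yi, vi, data = dat, method = "ML")     # ML fit required for selection models
    # One cutpoint at a one-tailed p of .025; effects in this dataset run in the
    # negative direction (log risk ratios below 0), hence alternative = "less".
    sel <- selmodel(res, type = "stepfun", steps = 0.025, alternative = "less")
    sel   # the estimated delta for the second interval reflects the relative
          # likelihood of observing "nonsignificant" results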
... This means a meta-analysis can be updated as new studies emerge or null results are dredged from the file drawer. In this paper we aim to instruct how to conduct a meta-analysis that is ready to become a CAMA on MetaLab or a similar platform (Burgard et al., 2021). There are various publications that used MetaLab data beyond the initial meta-analysis, demonstrating the added benefit of making such data available (e.g., Bergmann et al., 2017, 2018; Tsuji et al., 2020). ...
Article
Full-text available
Meta-analyses provide researchers with an overview of the body of evidence in a topic, with quantified estimates of effect sizes and the role of moderators, and weighting studies according to their precision. We provide a guide for conducting a transparent and reproducible meta-analysis in the field of developmental psychology within the framework of the MetaLab platform, in 10 steps: 1) Choose a topic for your meta-analysis, 2) Formulate your research question and specify inclusion criteria, 3) Preregister and document all stages of your meta-analysis, 4) Conduct the literature search, 5) Collect and screen records, 6) Extract data from eligible studies, 7) Read the data into analysis software and compute effect sizes, 8) Visualize your data, 9) Create meta-analytic models to assess the strength of the effect and investigate possible moderators, 10) Write up and promote your meta-analysis. Meta-analyses can inform future studies, through power calculations, by identifying robust methods and exposing research gaps. By adding a new meta-analysis to MetaLab, datasets across multiple topics of developmental psychology can be synthesized, and the dataset can be maintained as a living, community-augmented meta-analysis to which researchers add new data, allowing for a cumulative approach to evidence synthesis.
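Steps 6 through 9 of the workflow above (extract data, compute effect sizes, model, visualize) can be sketched with metafor. The numbers and column names below are hypothetical placeholders, and a between-group standardized mean difference is only one of several effect size options used in practice.

    library(metafor)
    extracted <- data.frame(                 # hypothetical extracted study data
      study = c("A", "B", "C"),
      m1i = c(10.2, 9.8, 11.0), sd1i = c(2.1, 2.4, 1.9), n1i = c(18, 24, 20),
      m2i = c(9.1, 9.5, 10.1),  sd2i = c(2.0, 2.2, 2.1), n2i = c(18, 24, 20)
    )
    dat <- escalc(measure = "SMD",           # step 7: compute standardized mean differences
                  m1i = m1i, sd1i = sd1i, n1i = n1i,
                  m2i = m2i, sd2i = sd2i, n2i = n2i, data = extracted)
    res <- rma(yi, vi, data = dat)           # step 9: random-effects model
    forest(res)                              # step 8: visualize the effect sizes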
... In this paper we aim to instruct how to conduct a meta-analysis that is ready to become a CAMA on MetaLab or a similar platform (Burgard et al., 2021). There are various publications that used MetaLab data beyond the initial meta-analysis, demonstrating the added benefit of making such data available (e.g., Bergmann et al., 2017, 2018; Tsuji et al., 2020). ...
Preprint
Full-text available
Meta-analyses provide researchers with an overview of the body of evidence in a topic, with quantified estimates of effect sizes and the role of moderators, and weighting studies according to their precision. We provide a guide for conducting a transparent and reproducible meta-analysis in the field of developmental psychology within the framework of the MetaLab platform, in 10 steps: 1) Choose a topic for your meta-analysis, 2) Formulate your research question and specify inclusion criteria, 3) Preregister and document all stages of your meta-analysis, 4) Conduct the literature search, 5) Collect and screen records, 6) Extract data from eligible studies, 7) Read the data into analysis software and compute effect sizes, 8) Visualize your data, 9) Create meta-analytic models to assess the strength of the effect and investigate possible moderators, 10) Write up and promote your meta-analysis. Meta-analyses can inform future studies, through power calculations, by identifying robust methods and exposing research gaps. By adding a new meta-analysis to MetaLab, datasets across multiple topics of developmental psychology can be synthesized, and the dataset can be maintained as a living, community-augmented meta-analysis to which researchers add new data, allowing for a cumulative approach to evidence synthesis.
... In general, and most importantly, the ability of a meta-analysis to perform a reliable calculation is dependent upon the quality of the studies that have been combined. For this reason, to move forward we do not simply need to adopt a meta-analytic mindset, but we also need to start combining unbiased results, e.g., results of registered reports that likely are not as biased as non-preregistered studies, or at least, in the meta-analytic process, weigh differently results that are likely biased and results that are likely not biased (Tsuji, Cristia, Frank, & Bergmann, 2020). ...
Article
Full-text available
Among infant researchers there is growing concern regarding the widespread practice of undertaking studies that have small sample sizes and employ tests with low statistical power (to detect a wide range of possible effects). For many researchers, issues of confidence may be partially resolved by relying on replications. Here, we bring further evidence that the classical logic of confirmation, according to which the result of a replication study confirms the original finding when it reaches statistical significance, could be usefully abandoned. With real examples taken from the infant literature and Monte Carlo simulations, we show that a very wide range of possible replication results would in a formal statistical sense constitute confirmation as they can be explained simply due to sampling error. Thus, often no useful conclusion can be derived from a single or small number of replication studies. We suggest that, in order to accumulate and generate new knowledge, the dichotomous view of replication as confirmatory/disconfirmatory can be replaced by an approach that emphasizes the estimation of effect sizes via meta-analysis. Moreover, we discuss possible solutions for reducing problems affecting the validity of conclusions drawn from meta-analyses in infant research.
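The point about sampling error can be illustrated with a small Monte Carlo sketch (not the authors' simulations; the true effect size and sample size below are assumptions): even with a genuine effect, simulated "replications" scatter widely in both estimated effect size and statistical significance.

    set.seed(1)
    true_d <- 0.3    # assumed true standardized effect
    n      <- 20     # assumed sample size per replication (one-sample design)
    n_sim  <- 10000
    sims <- replicate(n_sim, {
      x <- rnorm(n, mean = true_d, sd = 1)       # simulated participant-level scores
      c(d_hat = mean(x) / sd(x), p = t.test(x)$p.value)
    })
    mean(sims["p", ] < 0.05)                     # share of "successful" replications
    quantile(sims["d_hat", ], c(0.025, 0.975))   # spread of estimated effect sizes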
... First applications of the sunset funnel plot to two published meta-analyses from medicine and psychology are presented, and software to create this variation of the funnel plot is provided via a tailored R function. Tsuji, Cristia, Frank, and Bergmann (2020) addressed two questions about unpublished data using MetaLab, a collection of community-augmented meta-analyses focused on developmental psychology. First, the authors assessed to what extent MetaLab datasets include gray literature, and by what search strategies they are unearthed. ...
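The tailored R function mentioned above is not reproduced here; as an assumed alternative, the metaviz package offers a sunset (power-enhanced) funnel plot via viz_sunset(), sketched below on hypothetical effect sizes and standard errors.

    library(metaviz)
    d <- data.frame(   # hypothetical effect sizes (column 1) and standard errors (column 2)
      es = c(0.35, 0.12, 0.50, 0.04, 0.27, 0.41, -0.05, 0.22),
      se = c(0.12, 0.20, 0.15, 0.25, 0.18, 0.10, 0.22, 0.16)
    )
    viz_sunset(d)      # funnel plot with the background color-coded by statistical power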
Article
This editorial gives a brief introduction to the six articles included in the fourth “Hotspots in Psychology” issue of the Zeitschrift für Psychologie. The format is devoted to systematic reviews and meta-analyses in research-active fields that have generated a considerable number of primary studies. The common denominator is the research synthesis nature of the included articles, and not a specific psychological topic or theme that all articles have to address. Moreover, methodological advances in research synthesis methods relevant for any subfield of psychology are addressed. Comprehensive supplemental material to the articles can be found in PsychArchives (https://www.psycharchives.org).
... However, note the comparatively low rates in the anonymous self-reports in Eason et al., 2017, and the large number of null results and little evidence for publication bias observed in a collection of meta-analyses on infant data at metalab.stanford.edu (Bergmann et al., 2018; Tsuji, Cristia, Frank, & Bergmann, 2019). Another source of false positives is p-hacking (see Table 1). Preregistration of exclusion criteria can protect against one type of p-hacking (or, again, the appearance thereof), where analyses are run on the full sample, and again after excluding a subset of participants (see Figure 1). ...
Preprint
Full-text available
Preregistration, the act of specifying a research plan in advance, is becoming a central step in the way science is conducted. Preregistration for infant researchers might differ from that in other fields, due to the specific challenges of testing infants. Infants are a hard-to-reach population, usually yielding small sample sizes; they have a low attention span, which usually limits the number of trials; and they can be excluded based on hard-to-predict complications (e.g., parental interference, fussiness). In addition, as effects themselves potentially change with age and population, it is hard to calculate an a priori effect size. At the same time, these very factors make preregistration in infant studies a valuable tool. A priori examination of the planned study, including the hypotheses, sample size, and resulting statistical power, increases the credibility of single studies and thus adds value to the field. It might arguably also improve explicit decision-making to create better studies. We present an in-depth discussion of the issues uniquely relevant to infant researchers, and ways to contend with them in preregistration and study planning. We provide recommendations to researchers interested in following current best practices.
... In general, and most importantly, the ability of a meta-analysis to perform a reliable calculation is dependent upon the quality of the studies that have been combined. For this reason, to move forward we do not simply need to adopt a meta-analytic mindset, but we also need to start combining unbiased results, e.g., results of registered reports that likely are not as biased as non-preregistered studies, or at least, in the meta-analytic process, weigh differently results that are likely biased and results that are likely not biased (Tsuji, Cristia, Frank, & Bergmann, 2020). ...
Preprint
Full-text available
Infant research is making considerable progress. However, among infant researchers there is growing concern regarding the widespread habit of undertaking studies that have small sample sizes and employ tests with low statistical power (to detect a wide range of possible effects). For many researchers, issues of confidence may be partially resolved by relying on replications. Here, we bring further evidence that the classical logic of confirmation, according to which the result of a replication study confirms the original finding when it reaches statistical significance, could be usefully abandoned. With real examples taken from the infant literature and Monte Carlo simulations, we show that a very wide range of possible replication results would in a formal statistical sense constitute confirmation as they can be explained simply due to sampling error. Thus, often no useful conclusion can be derived from a single or small number of replication studies. We suggest that, in order to accumulate and generate new knowledge, the dichotomous view of replication as confirmatory/disconfirmatory can be replaced by an approach that emphasizes the estimation of effect sizes via meta-analysis. Moreover, we discuss possible solutions for reducing problems affecting the validity of conclusions drawn from meta-analyses in infant research.
Article
Video game training can effectively improve the cognition of older adults. However, whether video game types and game devices influence the training effects of video games remains controversial. This meta-analysis aimed to assess and evaluate the effects of video game types and game devices in video game training on the cognition of older adults. Interestingly, results indicated that the mouse/keyboard was superior to other video game devices for perceptual–motor function. The effect size (Hedges' g) for perceptual–motor function decreased by 1.777 and 1.722 when the video game training device changed from mouse/keyboard to driving simulator and motion controller, respectively. The effects of cognitive training games and conventional video games were moderated by session length. More well-designed studies are required to clarify the unique efficacy of video game types and devices for older adults in video game training.
Article
Full-text available
Infants are able to use the contexts in which familiar words appear to guide their inferences about the syntactic category of novel words (e.g., “This is a” + “dax” ‐> dax = object). The current study examined whether 18‐month‐old infants can rapidly adapt these expectations by tracking the distribution of syntactic structures in their input. In French, la petite can be followed by both nouns (la petite balle, “the little ball”) and verbs (la petite mange, “the little one is eating”). Infants were habituated to a novel word, as well as to familiar nouns or verbs (depending on the experimental group), all appearing after la petite. The familiar words served to create an expectation that la petite would be followed by either nouns or verbs. If infants can utilize their knowledge of a few frequent words to adjust their expectations, then they could use this information to infer the syntactic category of a novel word – and be surprised when the novel word is used in a context that is incongruent with their expectations. However, infants in both groups did not show a difference between noun and verb test trials. Thus, no evidence for adaptation‐based learning was found. We propose that infants have to entertain strong expectations about syntactic contexts before they can adapt these expectations based on recent input.
Article
In 2015, the American Psychological Association (APA) released a task-force technical report on video-game violence with a concurrent resolution statement linking violent games to aggression but not violent crime. The task-force report has proven to be controversial; many scholars have criticized language implying conclusive evidence linking violent games to aggression as well as technical concerns regarding the meta-analysis that formed the basis of the technical report and resolution statement. In the current article, we attempt a reevaluation of the 2015 technical report meta-analysis. The intent of this reevaluation was to examine whether the data foundations behind the APA’s resolution on video-game violence were sound. Reproducing the original meta-analysis proved difficult because some studies were included that did not appear to have relevant data, and many other available studies were not included. The current analysis revealed negligible relationships between violent games and aggressive or prosocial behavior, small relationships with aggressive affect and cognitions, and stronger relationships with desensitization. However, effect sizes appeared to be elevated because of non-best-practices and researcher-expectancy effects, particularly for experimental studies. It is concluded that evidence warrants a more cautious interpretation of the effects of violent games on aggression than provided by the APA technical report or resolution statement.
Article
Full-text available
Preregistration, the act of specifying a research plan in advance, is becoming more common in scientific research. Infant researchers contend with unique problems that might make preregistration particularly challenging. Infants are a hard‐to‐reach population, usually yielding small sample sizes, they can only complete a limited number of trials, and they can be excluded based on hard‐to‐predict complications (e.g., parental interference, fussiness). In addition, as effects themselves potentially change with age and population, it is hard to calculate an a priori effect size. At the same time, these very factors make preregistration in infant studies a valuable tool. A priori examination of the planned study, including the hypotheses, sample size, and resulting statistical power, increases the credibility of single studies and adds value to the field. Preregistration might also improve explicit decision making to create better studies. We present an in‐depth discussion of the issues uniquely relevant to infant researchers, and ways to contend with them in preregistration and study planning. We provide recommendations to researchers interested in following current best practices.
Article
Full-text available
Associative word learning, the ability to pair a concept to a word, is an essential mechanism for early language development. One common method by which researchers measure this ability is the Switch task (Werker, Cohen, Lloyd, Casasola, & Stager, 1998), wherein infants are habituated to 2 word-object pairings and then tested on their ability to notice a switch in those pairings. In this comprehensive meta-analysis, we summarized 141 Switch task studies involving 2,723 infants of 12 to 20 months to estimate an average effect size for the task (random-effect model) and to explore how key experimental factors affect infants' performance (fixed-effect model). The average effect size was low to moderate in size, Cohen's d = 0.32. The use of language-typical and dissimilar-sounding words as well as the presence of additional facilitative cues aided performance, particularly for younger infants. Infants learning 2 languages at home outperformed those learning 1, indicating a bilingual advantage in learning word-object associations. Together, these findings support the Processing Rich Information from Multidimensional Interactive Representations (PRIMIR) theoretical framework of infant speech perception and word learning (e.g., Werker & Curtin, 2005), but invite further theoretical work to account for the observed bilingual advantage. Lastly, some of our analyses raised the possibility of questionable research practices in this literature. Therefore, we conclude with suggestions (e.g., preregistration, transparent data peeking, and alternate statistical approaches) for how to address this important issue.
Article
Full-text available
Everyone agrees that infants possess general mechanisms for learning about the world, but the existence and operation of more specialized mechanisms is controversial. One mechanism—rule learning—has been proposed as potentially specific to speech, based on findings that 7‐month‐olds can learn abstract repetition rules from spoken syllables (e.g. ABB patterns: wo‐fe‐fe, ga‐tu‐tu…) but not from closely matched stimuli, such as tones. Subsequent work has shown that learning of abstract patterns is not simply specific to speech. However, we still lack a parsimonious explanation to tie together the diverse, messy, and occasionally contradictory findings in that literature. We took two routes to creating a new profile of rule learning: meta‐analysis of 20 prior reports on infants’ learning of abstract repetition rules (including 1,318 infants in 63 experiments total), and an experiment on learning of such rules from a natural, non‐speech communicative signal. These complementary approaches revealed that infants were most likely to learn abstract patterns from meaningful stimuli. We argue that the ability to detect and generalize simple patterns supports learning across domains in infancy but chiefly when the signal is meaningfully relevant to infants’ experience with sounds, objects, language, and people.
Preprint
Full-text available
Meta-analyses are an important tool to evaluate the literature. It is essential that meta-analyses can easily be reproduced to allow researchers to evaluate the impact of subjective choices on meta-analytic effect sizes, but also to update meta-analyses as new data comes in, or as novel statistical techniques (for example to correct for publication bias) are developed. Research in medicine has revealed meta-analyses often cannot be reproduced. In this project, we examined the reproducibility of meta-analyses in psychology by reproducing twenty published meta-analyses. Reproducing published meta-analyses was surprisingly difficult. 96% of meta-analyses published in 2013-2014 did not adhere to reporting guidelines. A third of these meta-analyses did not contain a table specifying all individual effect sizes. Five of the 20 randomly selected meta-analyses we attempted to reproduce could not be reproduced at all due to lack of access to raw data, no details about the effect sizes extracted from each study, or a lack of information about how effect sizes were coded. In the remaining meta-analyses, differences between the reported and reproduced effect size or sample size were common. We discuss a range of possible improvements, such as more clearly indicating which data were used to calculate an effect size, specifying all individual effect sizes, adding detailed information about equations that are used, and how multiple effect size estimates from the same study are combined, but also sharing raw data retrieved from original authors, or unpublished research reports. This project clearly illustrates there is a lot of room for improvement when it comes to the transparency and reproducibility of published meta-analyses.
Article
Full-text available
Meta-analyses are used to make educational decisions in policy and practice. Publication bias refers to the extent to which published literature is more likely to have statistically significant results and larger sample sizes than studies that do not make it through the publication process. The purpose of the present study is to estimate the extent to which publication bias is present in a broad set of education and special education journals. We reviewed 222 meta-analyses to describe the prevalence of publication bias tests, and further identified 29 that met inclusion criteria for effect size extraction. Descriptive data reveal that 58% of meta-analyses (n = 128) documented no effort to test for possible publication bias, and analyses of 72 difference statistics revealed that published studies were associated with significantly larger effect sizes than unpublished studies (d = 0.64). Exploratory moderator analyses revealed that the effect size metric was a significant predictor of the difference between published and unpublished studies.
Article
Full-text available
Adults and toddlers systematically associate pseudowords such as ‘bouba’ and ‘kiki’ with round and spiky shapes respectively, a sound symbolic phenomenon known as the “bouba-kiki effect”. To date, whether this sound symbolic effect is a property of the infant brain present at birth or is a learned aspect of language perception remains unknown. Yet, solving this question is fundamental for our understanding of early language acquisition. Indeed, an early sensitivity to such sound symbolic associations could provide a powerful mechanism for language learning, playing a bootstrapping role in the establishment of novel sound-meaning associations. The aim of the present meta-analysis (SymBouKi) is to provide a quantitative overview of the emergence of the bouba-kiki effect in infancy and early childhood. It allows a high-powered assessment of the true sound symbolic effect size by pooling over the entire set of 11 extant studies (6 published, 5 unpublished), entailing data from 425 participants between 4-38 months of age. The quantitative data provide statistical support for a moderate, but significant sound symbolic effect. Further analysis found a greater sensitivity to sound symbolism for bouba-type pseudowords (i.e., round sound-shape correspondences) than for kiki-type pseudowords (i.e., spiky sound-shape correspondences). For the kiki-type pseudowords, the effect emerged with age. Such discrepancy challenges the view that sensitivity to sound symbolism is an innate language mechanism rooted in an exuberant interconnected brain. We propose alternative hypotheses where both innate and learned mechanisms are at play in the emergence of sensitivity to sound symbolic relationships.
Article
Full-text available
In the last decade, numerous studies reported that infants prefer prosocial agents (those who provide help, comfort or fairness in distributive actions) to antisocial agents (those who harm others or distribute goods unfairly). We meta-analyzed the results of published and unpublished studies on infants aged 4-32 months and estimated that approximately two infants out of three, when given a choice between a prosocial and an antisocial agent, choose the former. This preference was not significantly affected by age and other factors such as the type of dependent variable (selective reaching or helping) and the modality of stimulus presentation (cartoons or real events). Effect size was affected by the type of familiarization events: giving/taking actions increased its magnitude compared to helping/hindering actions. There was evidence of a publication bias, suggesting that the effect size in published studies is likely to be inflated. Also, the distribution of children who chose the prosocial agent in experiments with N = 16 suggested a file drawer problem.
Article
Full-text available
Background: We explore whether the number of null results in large National Heart, Lung, and Blood Institute (NHLBI)-funded trials has increased over time. Methods: We identified all large NHLBI-supported RCTs between 1970 and 2012 evaluating drugs or dietary supplements for the treatment or prevention of cardiovascular disease. Trials were included if direct costs were >$500,000/year, participants were adult humans, and the primary outcome was cardiovascular risk, disease, or death. The 55 trials meeting these criteria were coded for whether they were published prior to or after the year 2000, whether they were registered in clinicaltrials.gov prior to publication, whether they used an active or placebo comparator, and whether or not the trial had industry co-sponsorship. We tabulated whether the study reported a positive, negative, or null result on the primary outcome variable and for total mortality. Results: 17 of 30 studies (57%) published prior to 2000 showed a significant benefit of intervention on the primary outcome, in comparison to only 2 among the 25 (8%) trials published after 2000 (χ² = 12.2, df = 1, p = 0.0005). There has been no change in the proportion of trials that compared treatment to placebo versus an active comparator. Industry co-sponsorship was unrelated to the probability of reporting a significant benefit. Pre-registration in clinicaltrials.gov was strongly associated with the trend toward null findings. Conclusions: The number of NHLBI trials reporting positive results declined after the year 2000. Prospective declaration of outcomes in RCTs, and the adoption of transparent reporting standards, as required by clinicaltrials.gov, may have contributed to the trend toward null findings.
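The chi-square comparison reported above (17/30 trials with a significant benefit before 2000 vs. 2/25 after) can be checked in base R; with the default continuity correction this should come out near the reported χ² = 12.2.

    tab <- matrix(c(17, 30 - 17,
                     2, 25 - 2),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(period = c("pre-2000", "post-2000"),
                                  result = c("benefit", "no benefit")))
    chisq.test(tab)   # approximately chi-square = 12.2, df = 1, p = 0.0005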
Article
Full-text available
We present the concept of a community-augmented meta-analysis (CAMA), a simple yet novel tool that significantly facilitates the accumulation and evaluation of previous studies within a specific scientific field. A CAMA is a combination of a meta-analysis and an open repository. Like a meta-analysis, it is centered around a psychologically relevant topic and includes methodological details and standardized effect sizes. As in a repository, data do not remain undisclosed and static after publication but can be used and extended by the research community, as anyone can download all information and can add new data via simple forms. Based on our experiences with building three CAMAs, we illustrate the concept and explain how CAMAs can facilitate improving our research practices via the integration of past research, the accumulation of knowledge, and the documentation of file-drawer studies. © The Author(s) 2014.
Article
Full-text available
Background: In searches for clinical trials and systematic reviews, it is said that Google Scholar (GS) should never be used in isolation, but in addition to PubMed, Cochrane, and other trusted sources of information. We therefore performed a study to assess the coverage of GS specifically for the studies included in systematic reviews and to evaluate whether GS is sensitive enough to be used alone for systematic reviews. Methods: All the original studies included in 29 systematic reviews published in the Cochrane Database of Systematic Reviews or in JAMA in 2009 were gathered in a gold standard database. GS was searched for all these studies one by one to assess the percentage of studies which could have been identified by searching only GS. Results: All 738 original studies included in the gold standard database were retrieved in GS (100%). Conclusion: The coverage of GS for the studies included in the systematic reviews is 100%. If the authors of the 29 systematic reviews had used only GS, no reference would have been missed. With some improvement in the research options, to increase its precision, GS could become the leading bibliographic database in medicine and could be used alone for systematic reviews.
Article
Full-text available
If science were a game, a dominant rule would probably be to collect results that are statistically significant. Several reviews of the psychological literature have shown that around 96% of papers involving the use of null hypothesis significance testing report significant outcomes for their main results but that the typical studies are insufficiently powerful for such a track record. We explain this paradox by showing that the use of several small underpowered samples often represents a more efficient research strategy (in terms of finding p < .05) than does the use of one larger (more powerful) sample. Publication bias and the most efficient strategy lead to inflated effects and high rates of false positives, especially when researchers also resorted to questionable research practices, such as adding participants after intermediate testing. We provide simulations that highlight the severity of such biases in meta-analyses. We consider 13 meta-analyses covering 281 primary studies in various fields of psychology and find indications of biases and/or an excess of significant results in seven. These results highlight the need for sufficiently powerful replications and changes in journal policies. © The Author(s) 2012.
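A rough simulation sketch (assumed parameters, not the authors' code) illustrates the efficiency argument: with the same total number of participants, running several small underpowered studies yields at least one p < .05 about as often as running one larger, better-powered study.

    set.seed(1)
    true_d <- 0.2; n_sim <- 5000
    one_big <- replicate(n_sim, t.test(rnorm(100, mean = true_d))$p.value < 0.05)
    five_small <- replicate(n_sim, {
      p <- replicate(5, t.test(rnorm(20, mean = true_d))$p.value)
      any(p < 0.05)
    })
    mean(one_big)      # chance of a "hit" with one larger sample (n = 100)
    mean(five_small)   # chance of at least one "hit" across five small samples (n = 20 each)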
Article
Full-text available
We assessed the adequacy of randomized controlled trial (RCT) registration, changes to registration data, and reporting completeness for articles in ICMJE journals during the 2.5 years after the registration requirement policy. For a set of 149 reports of 152 RCTs with a ClinicalTrials.gov registration number, published from September 2005 to April 2008, we evaluated the completeness of 9 items from the WHO 20-item Minimum Data Set relevant for assessing trial quality. We also assessed changes to the registration elements at the Archive site of ClinicalTrials.gov and compared published and registry data. RCTs were mostly registered before the 13 September 2005 deadline (n = 101, 66.4%); 118 (77.6%) started recruitment before and 31 (20.4%) after registration. At the time of registration, 152 RCTs had a total of 224 missing registry fields, most commonly 'Key secondary outcomes' (44.1% of RCTs) and 'Primary outcome' (38.8%). More RCTs with post-registration recruitment had missing Minimum Data Set items than RCTs with pre-registration recruitment: 24/31 (77.4%) vs. 57/118 (48.3%) (χ²(1) = 7.255, P = 0.007). Major changes in the data entries were found for 31 (25.2%) RCTs. The number of RCTs with differences between registered and published data ranged from 21 (13.8%) for Study type to 118 (77.6%) for Target sample size. ICMJE journals published RCTs with proper registration, but the registration data were often not adequate, underwent substantial changes in the registry over time, and differed between registry and publication.
Article
Full-text available
The metafor package provides functions for conducting meta-analyses in R. The package includes functions for fitting the meta-analytic fixed- and random-effects models and allows for the inclusion of moderator variables (study-level covariates) in these models. Meta-regression analyses with continuous and categorical moderators can be conducted in this way. Functions for the Mantel-Haenszel and Peto's one-step method for meta-analyses of 2 x 2 table data are also available. Finally, the package provides various plot functions (for example, for forest, funnel, and radial plots) and functions for assessing the model fit, for obtaining case diagnostics, and for tests of publication bias.
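A minimal usage sketch of the functions described above, run on metafor's built-in BCG vaccine dataset (the standard package example, assumed here purely for illustration): compute effect sizes, fit random- and mixed-effects models with moderators, and draw forest and funnel plots.

    library(metafor)
    dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg,
                  data = dat.bcg)                       # log risk ratios + sampling variances
    res <- rma(yi, vi, data = dat)                      # random-effects model
    res_mod <- rma(yi, vi, mods = ~ ablat + year,       # mixed-effects meta-regression
                   data = dat)                          # with study-level moderators
    forest(res)                                         # forest plot
    funnel(res)                                         # funnel plot
    res_mod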
Article
Full-text available
Funnel plots (plots of effect estimates against sample size) may be useful to detect bias in meta-analyses that were later contradicted by large trials. We examined whether a simple test of asymmetry of funnel plots predicts discordance of results when meta-analyses are compared to large trials, and we assessed the prevalence of bias in published meta-analyses. Medline search to identify pairs consisting of a meta-analysis and a single large trial (concordance of results was assumed if effects were in the same direction and the meta-analytic estimate was within 30% of the trial); analysis of funnel plots from 37 meta-analyses identified from a hand search of four leading general medicine journals 1993-6 and 38 meta-analyses from the second 1996 issue of the Cochrane Database of Systematic Reviews. Degree of funnel plot asymmetry as measured by the intercept from regression of standard normal deviates against precision. In the eight pairs of meta-analysis and large trial that were identified (five from cardiovascular medicine, one from diabetic medicine, one from geriatric medicine, one from perinatal medicine) there were four concordant and four discordant pairs. In all cases discordance was due to meta-analyses showing larger effects. Funnel plot asymmetry was present in three out of four discordant pairs but in none of concordant pairs. In 14 (38%) journal meta-analyses and 5 (13%) Cochrane reviews, funnel plot asymmetry indicated that there was bias. A simple analysis of funnel plots provides a useful test for the likely presence of bias in meta-analyses, but as the capacity to detect bias will be limited when meta-analyses are based on a limited number of small trials the results from such analyses should be treated with considerable caution.
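The asymmetry measure described above (the intercept from regressing standard normal deviates on precision) can be written out directly in base R; the effect sizes and standard errors below are hypothetical placeholders.

    es  <- c(0.35, 0.12, 0.50, 0.04, 0.27, 0.41, -0.05, 0.22)  # effect estimates
    se  <- c(0.12, 0.20, 0.15, 0.25, 0.18, 0.10, 0.22, 0.16)   # standard errors
    snd  <- es / se      # standard normal deviates
    prec <- 1 / se       # precision
    fit <- lm(snd ~ prec)
    summary(fit)$coefficients["(Intercept)", ]   # intercept far from 0 suggests asymmetry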
Article
Full-text available
Evidence based medicine insists on rigorous standards to appraise clinical interventions. Failure to apply the same rules to its own tools could be equally damaging. The advent of evidence based medicine has generated considerable interest in developing and applying methods that can improve the appraisal and synthesis of data from diverse studies. Some methods have become an integral part of systematic reviews and meta-analyses, with reviewers, editors, instructional handbooks, and guidelines encouraging their routine inclusion. However, the evidence for using these methods is sometimes lacking, as the reliance on funnel plots shows. The funnel plot is a scatter plot of the component studies in a meta-analysis, with the treatment effect on the horizontal axis and some measure of weight, such as the inverse variance, the standard error, or the sample size, on the vertical axis. Light and Pillemer proposed in 1984: “If all studies come from a single underlying population, this graph should look like a funnel, with the effect sizes homing in on the true underlying value as n increases. [If there is publication bias] there should be a bite out of the funnel.”1 Many meta-analyses show funnel plots or perform various tests that examine whether there is asymmetry in the funnel plot and directly interpret the results as showing evidence for or against the presence of publication bias. The plot's wide popularity followed an article published in the BMJ in 1997.2 That pivotal article has already received over 800 citations (as of December 2005) in the Web of Science. With two exceptions, this is more citations than for any other paper published by the BMJ in the past decade. The authors were careful to state many reasons why funnel plot asymmetry may not necessarily reflect publication bias. However, apparently many readers did not go beyond the …
Article
Before infants become mature speakers of their native language, they must acquire a robust word-recognition system which allows them to strike the balance between allowing some variation (mood, voice, accent) and recognizing variability that potentially changes meaning (e.g. cat vs hat). The current meta-analysis quantifies how the latter, termed mispronunciation sensitivity, changes over infants' first three years, testing competing predictions of mainstream language acquisition theories. Our results show that infants were sensitive to mispronunciations, but accepted them as labels for target objects. Interestingly, and in contrast to predictions of mainstream theories, mispronunciation sensitivity was not modulated by infant age, suggesting that a sufficiently flexible understanding of native language phonology is in place at a young age.
Article
Background: Sharing individual participant data (IPD) among researchers, upon request, is an ethical and responsible practice. Despite numerous calls for this practice to be standard, however, research indicates that primary study authors are often unwilling to share IPD, even for use in a meta-analysis. Objectives: This study sought to examine researchers' reservations about data sharing and to evaluate the impact of sending a data-sharing agreement on researchers' attitudes toward sharing IPD. Methods: To investigate these questions, we conducted a randomized controlled trial in conjunction with a Web-based survey. We searched for and invited primary study authors of studies included in recent meta-analyses. We emailed more than 1,200 individuals, and 247 participated. The survey asked individuals about their transparent research practices, general concerns about sharing data, attitudes toward sharing data for inclusion in a meta-analysis, and concerns about sharing data in the context of a meta-analysis. We hypothesized that participants who were randomly assigned to receive a data-sharing agreement would be more willing to share their primary study's IPD. Results: Results indicated that participants who received a data-sharing agreement were more willing to share their dataset, compared with control participants, even after controlling for demographics and pretest values (d = 0.65, 95% CI[0.39, 0.90]). A member of the control group is 24 percent more likely to share her dataset should she receive the data-sharing agreement. Conclusions: These findings shed light on data-sharing practices, attitudes, and concerns and can be used to inform future meta-analysis projects seeking to collect IPD, as well as the field at large.
Article
Previous work suggests key factors for replicability, a necessary feature for theory building, include statistical power and appropriate research planning. These factors are examined by analyzing a collection of 12 standardized meta-analyses on language development between birth and 5 years. With a median effect size of Cohen's d = 0.45 and typical sample size of 18 participants, most research is underpowered (range: 6%-99%; median 44%); and calculating power based on seminal publications is not a suitable strategy. Method choice can be improved, as shown in analyses on exclusion rates and effect size as a function of method. The article ends with a discussion on how to increase replicability in both language acquisition studies specifically and developmental research more generally.
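The underpowering claim can be illustrated with base R's power.t.test(), assuming a one-sample (within-participant) design at the reported median values (d = 0.45, n = 18); the resulting power lands near the reported median of roughly 44%.

    # Power at the reported median effect size and sample size (one-sample design assumed)
    power.t.test(n = 18, delta = 0.45, sd = 1, sig.level = 0.05, type = "one.sample")
    # Sample size needed to reach 80% power for the same effect size
    power.t.test(power = 0.80, delta = 0.45, sd = 1, sig.level = 0.05, type = "one.sample")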
Article
Two of the key tasks facing the language-learning infant lie at the level of phonology: establishing which sounds are contrastive in the native inventory, and determining what their possible syllabic positions and permissible combinations (phonotactics) are. In 2002-2003, two theoretical proposals, one bearing on how infants can learn sounds (Maye, Werker, & Gerken, 2002) and the other on phonotactics (Chambers, Onishi, & Fisher, 2003), were put forward on the pages of Cognition, each supported by two laboratory experiments, wherein a group of infants was briefly exposed to a set of pseudo-words, and plausible phonological generalizations were tested subsequently. These two papers have received considerable attention from the general scientific community, and inspired a flurry of follow-up work. In the context of questions regarding the replicability of psychological science, the present work uses a meta-analytic approach to appraise extant empirical evidence for infant phonological learning in the laboratory. It is found that neither seminal finding (on learning sounds and learning phonotactics) holds up when close methodological replications are integrated, although less close methodological replications do provide some evidence in favor of the sound learning strand of work. Implications for authors and readers of this literature are drawn out. It would be desirable that additional mechanisms for phonological learning be explored, and that future infant laboratory work employ paradigms that rely on constrained and unambiguous links between experimental exposure and measured infant behavior.
Conference Paper
Theories of language acquisition and perceptual learning increasingly rely on statistical learning mechanisms. The current meta-analysis aims to clarify the robustness of this capacity in infancy within the word segmentation literature. Our analysis reveals a significant, small effect size for conceptual replications of Saffran, Aslin, & Newport (1996), and a nonsignificant effect across all studies that incorporate transitional probabilities to segment words. In both conceptual replications and the broader literature, however, statistical learning is moderated by whether stimuli are naturally produced or synthesized. These findings invite deeper questions about the complex factors that influence statistical learning, and the role of statistical learning in language acquisition.
Article
Infants start learning words, the building blocks of language, at least by 6 months. To do so, they must be able to extract the phonological form of words from running speech. A rich literature has investigated this process, termed word segmentation. We addressed the fundamental question of how infants of different ages segment words from their native language using a meta-analytic approach. Based on previous popular theoretical and experimental work, we expected infants to display familiarity preferences early on, with a switch to novelty preferences as infants become more proficient at processing and segmenting native speech. We also considered the possibility that this switch may occur at different points in time as a function of infants' native language and took into account the impact of various task- and stimulus-related factors that might affect difficulty. The combined results from 168 experiments reporting on data gathered from 3774 infants revealed a persistent familiarity preference across all ages. There was no significant effect of additional factors, including native language and experiment design. Further analyses revealed no sign of selective data collection or reporting. We conclude that models of infant information processing that are frequently cited in this domain may not, in fact, apply in the case of segmenting words from native speech.
Article
Practitioners and policymakers rely on meta-analyses to inform decision making around the allocation of resources to individuals and organizations. It is therefore paramount to consider the validity of these results. A well-documented threat to the validity of research synthesis results is the presence of publication bias, a phenomenon where studies with large and/or statistically significant effects, relative to studies with small or null effects, are more likely to be published. We investigated this phenomenon empirically by reviewing meta-analyses published in top-tier journals between 1986 and 2013 that quantified the difference between effect sizes from published and unpublished research. We reviewed 383 meta-analyses, of which 81 had sufficient information to calculate an effect size. Results indicated that published studies yielded larger effect sizes than those from unpublished studies (d = 0.18, 95% confidence interval [0.10, 0.25]). Moderator analyses revealed that the difference was larger in meta-analyses that included a wide range of unpublished literature. We conclude that intervention researchers require continued support to publish null findings and that meta-analyses should include unpublished studies to mitigate the potential bias from publication status.
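The published-versus-unpublished contrast described above amounts to a moderator analysis. A hedged metafor sketch on placeholder effect sizes (not the review's data) is below; the coefficient for the publication indicator estimates the published-minus-unpublished difference.

    library(metafor)
    dat <- data.frame(   # placeholder effect sizes with a publication-status indicator
      yi = c(0.45, 0.38, 0.52, 0.30, 0.41, 0.15, 0.08, 0.22),
      vi = c(0.02, 0.03, 0.04, 0.03, 0.02, 0.05, 0.04, 0.06),
      published = c(1, 1, 1, 1, 1, 0, 0, 0)
    )
    rma(yi, vi, mods = ~ published, data = dat)   # slope = published minus unpublished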
Article
Although the majority of evidence on perceptual narrowing in speech sounds is based on consonants, most models of infant speech perception generalize these findings to vowels, assuming that vowel perception improves for vowel sounds that are present in the infant's native language within the first year of life, and deteriorates for non-native vowel sounds over the same period of time. The present meta-analysis contributes to assessing to what extent these descriptions are accurate in the first comprehensive quantitative meta-analysis of perceptual narrowing in infant vowel discrimination, including results from behavioral, electrophysiological, and neuroimaging methods applied to infants 0-14 months of age. An analysis of effect sizes for native and non-native vowel discrimination over the first year of life revealed that they changed with age in opposite directions, being significant by about 6 months of age. © 2013 Wiley Periodicals, Inc. Dev Psychobiol 2013.
Article
Publication bias remains a controversial issue in psychological science. The tendency of psychological science to avoid publishing null results produces a situation that limits the replicability assumption of science, as replication cannot be meaningful without the potential acknowledgment of failed replications. We argue that the field often constructs arguments to block the publication and interpretation of null results and that null results may be further extinguished through questionable researcher practices. Given that science is dependent on the process of falsification, we argue that these problems reduce psychological science's capability to have a proper mechanism for theory falsification, thus resulting in the promulgation of numerous "undead" theories that are ideologically popular but have little basis in fact. © The Author(s) 2012.
Article
Meta-analytic methods have been widely applied to education, medicine, and the social sciences. Much meta-analytic data are hierarchically structured because effect size estimates are nested within studies, and in turn, studies can be nested within level-3 units such as laboratories or investigators, and so forth. Thus, multilevel models are a natural framework for analyzing meta-analytic data. This paper discusses the application of a Fisher scoring method in two-level and three-level meta-analysis that takes into account random variation at the second and third levels. The usefulness of the model is demonstrated using data that provide information about school calendar types. SAS PROC MIXED and HLM can be used to compute the estimates of fixed effects and variance components. Copyright © 2011 John Wiley & Sons, Ltd.
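A three-level structure of the kind described above can be fitted in R with metafor's rma.mv() (REML estimation rather than the Fisher scoring algorithm discussed in the article, and with placeholder data): sampling error at level 1, effect sizes within studies at level 2, and between-study variation at level 3.

    library(metafor)
    dat <- data.frame(                      # placeholder: two effect sizes per study
      study = c(1, 1, 2, 2, 3, 3, 4, 4),
      esid  = 1:8,
      yi    = c(0.30, 0.25, 0.10, 0.05, 0.45, 0.50, 0.20, 0.15),
      vi    = c(0.02, 0.02, 0.03, 0.03, 0.04, 0.04, 0.02, 0.02)
    )
    res <- rma.mv(yi, vi, random = ~ 1 | study/esid, data = dat)  # three-level model
    summary(res)   # variance components at the study and effect-size levels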
Article
In the GRADE approach, randomized trials start as high-quality evidence and observational studies as low-quality evidence, but both can be rated down if a body of evidence is associated with a high risk of publication bias. Even when individual studies included in best-evidence summaries have a low risk of bias, publication bias can result in substantial overestimates of effect. Authors should suspect publication bias when available evidence comes from a number of small studies, most of which have been commercially funded. A number of approaches based on examination of the pattern of data are available to help assess publication bias. The most popular of these is the funnel plot; all, however, have substantial limitations. Publication bias is likely frequent, and caution in the face of early results, particularly with small sample size and number of events, is warranted.
Article
The issue of publication bias in psychological science is one that has remained difficult to address despite decades of discussion and debate. The current article examines a sample of 91 recent meta-analyses published in American Psychological Association and Association for Psychological Science journals and the methods used in these analyses to identify and control for publication bias. Of the 91 studies analyzed, 64 (70%) made some effort to analyze publication bias, and 26 (41%) reported finding evidence of bias. Approaches to controlling publication bias were heterogeneous among studies. Of these studies, 57 (63%) attempted to find unpublished studies to control for publication bias. Nonetheless, those studies that included unpublished studies were just as likely to find evidence for publication bias as those that did not. Furthermore, authors of meta-analyses themselves were overrepresented in unpublished studies acquired, as compared with published studies, suggesting that searches for unpublished studies may increase rather than decrease some sources of bias. A subset of 48 meta-analyses for which study sample sizes and effect sizes were available was further analyzed with a conservative and newly developed tandem procedure of assessing publication bias. Results indicated that publication bias was worrisome in about 25% of meta-analyses. Meta-analyses that included unpublished studies were more likely to show bias than those that did not, likely due to selection bias in unpublished literature searches. Sources of publication bias and implications for the use of meta-analysis are discussed.
Article
Meta-analyses are subject to bias for many reasons, including publication bias. Asymmetry in a funnel plot of study size against treatment effect is often used to identify such bias. We compare the performance of three simple methods of testing for bias: the rank correlation method; a simple linear regression of the standardized estimate of treatment effect on the precision of the estimate; and a regression of the treatment effect on sample size. The tests are applied to simulated meta-analyses in the presence and absence of publication bias. Both one-sided and two-sided censoring of studies based on statistical significance was used. The results indicate that none of the tests performs consistently well. Test performance varied with the magnitude of the true treatment effect, the distribution of study size, and whether a one- or two-tailed significance test was employed. Overall, the power of the tests was low when the number of studies per meta-analysis was close to that often observed in practice. Tests that showed the highest power also had type I error rates higher than the nominal level. Based on the empirical type I error rates, a regression of treatment effect on sample size, weighted by the inverse of the variance of the logit of the pooled proportion (using the marginal total), is the preferred method.
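The three tests compared above have close metafor analogues, sketched below on the package's built-in BCG data (an assumption for illustration; the specific weighting scheme preferred in the abstract is not reproduced exactly): the rank correlation test, the Egger-type regression on the standard error, and a regression of effect size on sample size.

    library(metafor)
    dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg,
                  data = dat.bcg)
    dat$ni <- with(dat.bcg, tpos + tneg + cpos + cneg)    # total sample sizes
    res <- rma(yi, vi, ni = ni, data = dat)               # random-effects model
    ranktest(res)                                         # rank correlation test
    regtest(res, predictor = "sei")                       # regression on standard error
    regtest(res, predictor = "ni")                        # regression on sample size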
Article
To summarize the evidence concerning bias and confounding in conducting systematic reviews (SRs). Literature was identified through searching the Cochrane Library, MEDLINE, PsycINFO until November 2006, and the authors' files. Studies were included if they were SRs of bias that can occur while conducting a SR. Risk of bias in the SRs was appraised using the Oxman and Guyatt index. Ten SRs were included. All examined biases related to searching for evidence (e.g., publication bias). One also reported bias associated with obtaining data from included studies (e.g., outcome reporting bias). To minimize bias, data suggest including unpublished material, hand searching for additional material, searching multiple databases, assessing for publication bias, and periodically updating SRs. No SRs were found examining biases related to choosing studies for inclusion or combining studies. There is little evidence from SRs to support commonly practiced methods for conducting SRs. No SRs summarized studies with prospective designs and most had moderate or minimal risk of bias. Future research should examine bias that can occur during the selection of studies for inclusion and the synthesis of studies, as well as systematically review the existing empirical evidence.