This book serves as an accessible introduction to conducting meta-analyses in R. Essential steps of a meta-analysis are covered, including pooling of outcome measures, forest plots, heterogeneity diagnostics, subgroup analyses, meta-regression, methods to control for publication bias, risk of bias assessments, and plotting tools. Advanced but highly relevant topics, such as network meta-analysis, multi-/three-level meta-analyses, Bayesian meta-analysis approaches, and SEM meta-analysis, are also covered. The programming and statistical background covered in the book is kept at a non-expert level. A print version of this book has been published with Chapman & Hall/CRC Press (Taylor & Francis). The complete book can be accessed online:
Doing Meta-Analysis
with R
A Hands-On Guide
Mathias Harrer
Pim Cuijpers
Toshi A. Furukawa
David D. Ebert
Dear colleague,
As of September 2021, a fully revised and extended version of “Doing Meta-Analysis with R: A
Hands-On Guide” has been published with CRC Press/Chapman & Hall (Taylor & Francis). A
physical copy of the book can be purchased, for example, via the Routledge Online Shop.
Unfortunately, it is therefore not possible to provide an openly accessible PDF Version of the
guide on the Internet any longer.
We strongly believe that everyone should have access to academic publications, regardless of
their background or financial resources. Thus, we are very grateful that CRC allows us to
maintain a full online version of the guide, which you can find here. On this website, the entire
contents of the book can be accessed for free. The online version even contains some additional
content which did not make it into the printed version.
You are still free to use and adapt contents of the book, for example for your own teaching. If
you need more or other material for yourself or your students, feel free to contact Mathias.
We hope you are not too disappointed that the PDF ends here, and that you find reading the
guide useful, either in its printed form or on the web.
Kind regards,
Mathias, Pim, Toshi, David
June 2021
... All statistical tests within this article were performed in jamovi (The jamovi project, 2021). We used the MAJOR (Hamilton, 2021) jamovi module to perform a correlation coefficients meta-analysis, following recommendations by Harrer, Cuijpers, Furukawa, and Ebert (2021). The correlation coefficients of the associations between perceived body odour and facial attractiveness and body odour and vocal attractiveness were converted with Fisher's r-to-z transformation and accompanied by their 95% CI. ...
... The correlation coefficients of the associations between perceived body odour and facial attractiveness and body odour and vocal attractiveness were converted with Fisher's r-to-z transformation and accompanied by their 95% CI. Fisher's r-to-z transform is the recommended procedure for correcting for bias in studies with small sample sizes (Harrer et al., 2021). Separate meta-analyses were performed for correlations between each pair of stimuli (body odour–facial attractiveness and body odour–vocal attractiveness). ...
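The transformation mentioned in the excerpt above can be illustrated with a minimal Python sketch. It is not the code used in the cited study, which reports using jamovi with the MAJOR module. The function names and the input values (r = 0.30, n = 50) are hypothetical. The sketch converts a correlation to Fisher's z, builds a 95% CI on the z scale using the standard error 1 / sqrt(n - 3), and transforms the bounds back to the r scale.

```python
import math

def fisher_z(r):
    """Fisher's r-to-z transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def z_to_r(z):
    """Back-transform a Fisher z value to a correlation coefficient."""
    return math.tanh(z)

def correlation_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation, computed on the z scale.

    The standard error of Fisher's z depends only on the sample size:
    se = 1 / sqrt(n - 3).
    """
    z = fisher_z(r)
    se = 1.0 / math.sqrt(n - 3)
    return z_to_r(z - z_crit * se), z_to_r(z + z_crit * se)

# Hypothetical example values, chosen only for illustration.
lo, hi = correlation_ci(r=0.30, n=50)
print(round(lo, 3), round(hi, 3))
```

Working on the z scale keeps the interval inside the admissible range of -1 to 1 after back-transformation, which is one reason the transformation is commonly used before pooling correlations.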
... We assumed that variation in effect sizes between studies was due to sampling error of true effect sizes or because of other (e.g., methodological) differences between studies. Therefore, we used the random-effects model with a restricted maximum-likelihood estimator (Harrer, Cuijpers, Furukawa, & Ebert, 2021) for heterogeneity statistics (Tau²). Heterogeneity examines whether variation in the observed correlations results from sampling error. ...
Assessing the attractiveness of potential mating partners typically involves multiple sensory modalities, including the integration of olfactory, visual, and auditory cues. However, predictions diverge on how the individual modalities should relate to each other. According to the backup signals hypothesis, multimodal cues provide redundant information, whereas the multiple messages hypothesis suggests that different modalities provide independent and distinct information about an individual's mating-related quality. The backup signals hypothesis predicts a positive association between assessments based on different modalities, whereas no substantial correlation across modalities is expected under the multiple messages hypothesis. Previous studies testing the two hypotheses have provided mixed results, and a systematic evaluation is currently missing. We performed a systematic review and a meta-analysis of published and unpublished studies to examine the congruence in assessments between human body odour and facial attractiveness, and between body odour and vocal attractiveness. We found positive but weak associations between ratings of body odours and faces (r = 0.1, k = 25), and between body odours and voices (r = 0.1, k = 9). No sex differences were observed in the magnitude of effects. Compared to judgments of facial and vocal attractiveness, our results suggest that assessment of body odour provides independent and non-redundant information about human mating-related quality. Our findings thus provide little support for the backup signals hypothesis and may be better explained by the multiple messages hypothesis.
... Analyses were carried out in R (www.r-project.org), using the package metafor [40] and the code provided by Assink and Wibbelink [41] and Harrer et al. [42]. First, Pearson correlation coefficients were calculated for all studies which reported effect sizes other than Pearson's r, using the formulas described by Lipsey and Wilson [43]. ...
Child maltreatment can negatively impact not only survivors but also survivors’ children. However, research on the intergenerational effect of maternal childhood maltreatment on child externalizing behaviour has yielded contradictory results and has not yet been systematically synthesised. The current three-level meta-analysis and systematic review aimed to provide a quantitative estimate of the strength of the association between maternal childhood maltreatment and child externalizing behaviour and to summarise research on potential mediating factors of this association. PsycINFO, PubMed, and Embase were searched and 39 studies with 82 effect sizes were included in the meta-analysis. Results revealed a small significant association between maternal childhood maltreatment and child externalizing behaviour (r = 0.16; 95% CI 0.12–0.19; publication bias-adjusted effect size: r = 0.12, 95% CI 0.08–0.16). Maternal mental health, particularly depressive symptoms, maternal parenting and children’s maltreatment exposure were the most frequently examined mediators of this association, with relatively robust mediating effects for children’s maltreatment exposure and maternal depressive symptoms, but mixed evidence for the mediating role of maternal parenting. This meta-analysis provides evidence for a small but significant association between maternal childhood maltreatment and children’s externalizing behaviour, emphasizing the need to develop effective preventive and intervention strategies to minimise the effects of childhood maltreatment on the next generation.
... [33] Meta-analyses calculating dissociation scores used a random-effects model, with restricted maximum likelihood estimation [34] to calculate heterogeneity variance τ², Hartung-Knapp adjustments [35] and Hedges' g effect size metric, and were run on Windows 10 using the meta R package [36] with guidance. [37] Funnel plots were generated to assess the risk of bias due to missing results. Meta-regressions incorporating risk of bias categories were used to assess confidence in the body of evidence for each outcome. ...
Background Studies have reported elevated rates of dissociative symptoms and comorbid dissociative disorders in functional neurological disorder (FND); however, a comprehensive review is lacking. Aims To systematically review the severity of dissociative symptoms and prevalence of comorbid dissociative disorders in FND and summarise their biological and clinical associations. Method We searched Embase, PsycInfo and MEDLINE up to June 2021, combining terms for FND and dissociation. Studies were eligible if reporting dissociative symptom scores or rates of comorbid dissociative disorder in FND samples. Risk of bias was appraised using modified Newcastle–Ottawa criteria. The findings were synthesised qualitatively and dissociative symptom scores were included in a meta-analysis (PROSPERO CRD42020173263). Results Seventy-five studies were eligible (FND n = 3940; control n = 3073), most commonly prospective case–control studies (k = 54). Dissociative disorders were frequently comorbid in FND. Psychoform dissociation was elevated in FND compared with healthy (g = 0.90, 95% CI 0.66–1.14, I² = 70%) and neurological controls (g = 0.56, 95% CI 0.19–0.92, I² = 67%). Greater psychoform dissociation was observed in FND samples with seizure symptoms versus healthy controls (g = 0.94, 95% CI 0.65–1.22, I² = 42%) and FND samples with motor symptoms (g = 0.40, 95% CI −0.18 to 1.00, I² = 54%). Somatoform dissociation was elevated in FND versus healthy controls (g = 1.80, 95% CI 1.25–2.34, I² = 75%). Dissociation in FND was associated with more severe functional symptoms, worse quality of life and brain alterations. Conclusions Our findings highlight the potential clinical utility of assessing patients with FND for dissociative symptomatology. However, fewer studies investigated FND samples with motor symptoms and heterogeneity between studies and risk of bias were high. Rigorous investigation of the prevalence, features
... Given that between-study heterogeneity can result from one or more studies with extreme effect sizes, and that such outliers might even have distorted the overall effect, outliers with extreme effect sizes will be detected and excluded to obtain a new pooled effect estimate. A study will be regarded as an outlier if its confidence interval (CI) does not overlap with the CI of the pooled effect (Harrer et al., 2021). Influence analyses were also conducted based on the leave-one-out method to detect studies that influence the overall estimate the most and have the potential to distort the pooled effect (Viechtbauer & Cheung, 2010). ...
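The outlier rule quoted in this excerpt (a study's CI does not overlap with the CI of the pooled effect) can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation used by the cited authors. The function name `find_outliers`, the study labels, and all interval values are hypothetical.

```python
def find_outliers(studies, pooled_ci):
    """Return the names of studies whose CI does not overlap the pooled CI.

    studies:   list of (name, ci_lower, ci_upper) tuples
    pooled_ci: (ci_lower, ci_upper) of the pooled effect
    """
    pooled_lo, pooled_hi = pooled_ci
    return [
        name
        for name, lo, hi in studies
        # No overlap: the study interval lies entirely below or above the pooled one.
        if hi < pooled_lo or lo > pooled_hi
    ]

# Hypothetical data, for illustration only.
studies = [
    ("Study A", 0.10, 0.50),
    ("Study B", 0.85, 1.40),
    ("Study C", -0.60, -0.05),
    ("Study D", 0.05, 0.40),
]
pooled_ci = (0.15, 0.45)

outliers = find_outliers(studies, pooled_ci)
print(outliers)
```

In practice the pooled effect would then be re-estimated without the flagged studies, and the result compared with the original estimate.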
Purpose: Pitch plays an important role in auditory perception of music and language. This study provides a systematic review with meta-analysis to investigate whether individuals with autism spectrum disorder (ASD) have enhanced pitch processing ability and to identify the potential factors associated with processing differences between ASD and neurotypicals. Method: We conducted a systematic search through six major electronic databases focusing on the studies that used nonspeech stimuli to provide a qualitative and quantitative assessment across existing studies on pitch perception in autism. We identified potential participant- and methodology-related moderators and conducted metaregression analyses using mixed-effects models. Results: On the basis of 22 studies with a total of 464 participants with ASD, we obtained a small-to-medium positive effect size (g = 0.26) in support of enhanced pitch perception in ASD. Moreover, the mean age and nonverbal IQ of participants were found to significantly moderate the between-studies heterogeneity. Conclusions: Our study provides the first meta-analysis on auditory pitch perception in ASD and demonstrates the existence of different developmental trajectories between autistic individuals and neurotypicals. In addition to age, nonverbal ability is found to be a significant contributor to the lower level/local processing bias in ASD. We highlight the need for further investigation of pitch perception in ASD under challenging listening conditions. Future neurophysiological and brain imaging studies with a longitudinal design are also needed to better understand the underlying neural mechanisms of atypical pitch processing in ASD and to help guide auditory-based interventions for improving language and social functioning.
... As the outcomes of interest were MD and SD, the outcomes presented in the median range or median-quartile range were transformed using methods proposed by Luo et al. [37], Shi et al. [38,39], and Wan et al. [40]. Prior to conducting NMA, pairwise meta-analyses using a random effect model were carried out to evaluate the between-study heterogeneity of all directly compared interventions, using dmetar and meta packages [41] in RStudio (version 1.4.1106) [42]. ...
This review aimed to evaluate the effectiveness of systemic antibiotics as adjunctive treatment to subgingival debridement in patients with periodontitis. Randomized controlled trials were included that assessed the effectiveness of systemic antibiotics in improving periodontal status, indicated by clinical attachment gain level, probable pocket depth reduction, and bleeding on probing reduction of patients with any form of periodontitis at any follow-up time. A network meta-analysis with a frequentist random-effects model was employed to synthesize the data. The relative effects were reported as mean differences with 95% confidence intervals. Subsequently, all treatments were ranked based on their P-scores. A total of 30 randomized controlled trials were included in this network meta-analysis. Minimally important clinical differences were observed following the adjunctive use of satranidazole, metronidazole, and clindamycin for clinical attachment gain level and probable pocket depth reduction. For bleeding on probing reduction, minimally important clinical differences were observed following the adjunctive use of metronidazole and a combination of amoxycillin and metronidazole. However, the network estimates were supported by evidence with certainty ranging from very low to high. Therefore, the findings of this network meta-analysis should be interpreted with caution. Moreover, the use of these antibiotics as an adjunct to subgingival debridement should be weighed against possible harm to avoid overuse and inappropriate use of these antibiotics in patients with periodontitis.
... Data analyses were carried out using R (R Core Team 2021) and meta (Balduzzi et al. 2019), metafor (Viechtbauer 2010) and dmetar (Harrer et al. 2021; Appendix D). Hedges' g effect size was calculated for the behavioural measures of interest, chosen for its robustness to small sample sizes in comparison to Cohen's d. ...
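The relation between Cohen's d and Hedges' g mentioned in this excerpt can be made concrete with a short Python sketch. It is not the cited authors' code, who report using R. The function names and the group statistics are hypothetical, and the sketch uses the common approximate small-sample correction factor J = 1 - 3 / (4 * df - 1) with df = n1 + n2 - 2.

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d multiplied by the approximate small-sample correction factor J."""
    d = cohens_d(m1, sd1, n1, m2, sd2, n2)
    df = n1 + n2 - 2
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # J < 1, so g shrinks d towards zero
    return d * j

# Hypothetical group statistics, for illustration only.
d = cohens_d(10.0, 2.0, 10, 8.0, 2.0, 10)
g = hedges_g(10.0, 2.0, 10, 8.0, 2.0, 10)
print(round(d, 4), round(g, 4))
```

The correction matters most when groups are small: with 10 observations per group the factor is about 0.96, while with several hundred per group it is practically 1.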
Rationale Unconditioned tasks in rodents have been the mainstay of behavioural assessment for decades, but their validity and sensitivity to detect the behavioural consequences of early life stress (ELS) remain contentious and highly variable. Objectives In the present study, we carried out a meta-analysis to investigate whether persistent behavioural effects, as assessed using unconditioned procedures in rats, are a reliable consequence of early repeated maternal separation, a commonly used procedure in rodents to study ELS. Methods A literature search identified 100 studies involving maternally separated rats and the following unconditioned procedures: the elevated plus maze (EPM); open field test (OFT); sucrose preference test (SPT) and forced swim task (FST). Studies were included for analysis if the separation of offspring from the dam was at least 60 min every day during the pre-weaning period prior to the start of adolescence. Results Our findings show that unconditioned tasks are generally poor at consistently demonstrating differences between control and separated groups with pooled effect sizes that were either small or non-existent (EPM: Hedges’ g = − 0.35, p = 0.01, OFT: Hedges’ g = − 0.32, p = 0.05, SPT: Hedges’ g = − 0.33, p = 0.21, FST: Hedges’ g = 0.99, p = 0.0001). Despite considerable procedural variability between studies, heterogeneity statistics were low, indicating that the lack of standardization in the maternal separation protocol was not the cause of these inconsistent effects. Conclusions Our findings indicate that in general, unconditioned tests of depression and anxiety are not sufficient to reveal the full behavioural repertoire of maternal separation stress and should not be relied upon in isolation. We argue that more objective tasks that sensitively detect specific cognitive processes are better suited for translational research on stress-related disorders such as depression.
Osteochondral lesions of the femoral head are rare. For the treatment of these lesions, various joint-preserving procedures, particularly in young, active patients, have been developed. Mosaicplasty is a well-established surgical procedure for the knee. However, there is little evidence that this method can also be used to treat osteochondral lesions in the hip. The indication for cartilage procedures continues to evolve for the knee, and a similar strategy may be adopted for the hip joint. Due to limited evidence and a lack of experience, mosaicplasty treatment of these lesions remains challenging, especially in young patients. This study shows that open and arthroscopic management using the knee and femoral head as donor sites yielded good to excellent short- to mid-term outcomes. For osteochondral lesions of the femoral head, mosaicplasty may be a new alternative treatment option, although this needs to be proven with longer follow-ups and in a larger sample of patients.
Topic To evaluate the prognostic association between preoperative features seen on optical coherence tomography (OCT) imaging and postoperative visual acuity (VA) outcomes in rhegmatogenous retinal detachments (RRD). Clinical Relevance Currently, there is limited literature on the prognostic value of preoperative RRD OCT features. Methods A literature search was conducted on Ovid MEDLINE, EMBASE and Cochrane CENTRAL from inception through September 15, 2022. A meta-analysis was performed using a random-effects model. Quality of studies and evidence was assessed using the JBI tools and GRADE framework, respectively. Results A total of 1,671 eyes of 1,670 patients from 29 observational studies were included. Eighty-nine percent of eyes had a macula-off RRD at presentation. The mean duration of detachment was 15±10 days. The majority of eyes (62%) underwent pars plana vitrectomy. Six preoperative OCT features were analyzed: height of retinal detachment (HRD) at the fovea, central macular thickness (CMT), disruption of the ellipsoid zone (EZ) and/or external limiting membrane (ELM), intraretinal cystic cavities (ICCs), outer retinal corrugations (ORCs) and macular detachment. A greater HRD was weakly associated with postoperative VA (Pearson’s correlation r=0.35, 95% CI=[0.20–0.48], p<0.01), and there was no change in this association throughout the postoperative follow-up period. The CMT was not associated with postoperative VA. Eyes with disruption of the EZ and/or ELM had a postoperative VA worse by 0.35 logMAR (95% CI=[0.15–0.54], p<0.01) or three Snellen lines. Eyes with ICCs had a postoperative VA worse by 0.14 logMAR (95% CI=[0.01–0.26], p<0.01) or two Snellen lines. Eyes with ORCs did not have a significantly different postoperative VA than eyes without. Eyes with macular detachment had a postoperative VA worse by 0.15 logMAR (95% CI=[-0.31–0.00], p=0.02) or two Snellen lines. Overall, the quality of studies ranged from moderate to good (73–100%).
All associations had a low quality of evidence, with CMT being very low quality. Conclusions Despite the low quality of evidence, a greater HRD, disruption of the EZ and/or ELM, presence of ICCs and macular detachment were associated with a poor postoperative VA. We propose a standardized nomenclature for consistency and accuracy in reporting preoperative RRD OCT features for future studies.
Background This study aimed to investigate the application of deep learning (DL) models for the detection of subdural hematoma (SDH). Methods We conducted a comprehensive search using relevant keywords. Articles extracted were original studies in which sensitivity and/or specificity were reported. Two different approaches of frequentist and Bayesian inference were applied. For quality and risk of bias assessment we used Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2). Results We analyzed 22 articles that included 1,997,749 patients. In the first step, the frequentist method showed a pooled sensitivity of 88.8% (95% confidence interval (CI): 83.9% to 92.4%) and a specificity of 97.2% (95% CI 94.6% to 98.6%). In the second step, using Bayesian methods including 11 studies that reported sensitivity and specificity, a sensitivity rate of 86.8% (95% CI: 77.6% to 92.9%) at a specificity level of 86.9% (95% CI: 60.9% to 97.2%) was achieved. The risk of bias assessment was not remarkable using QUADAS-2. Conclusion DL models might be an appropriate tool for detecting SDHs with a reasonably high sensitivity and specificity.
Although learning disorders (LD) and developmental language disorder (DLD) can be linked to overlapping psychological and behavioral deficits, such as phonological, morphological, orthographic, semantic, and syntactic deficits, as well as academic (e.g., reading) difficulties, they are currently separate diagnoses in the DSM-5 with explicit phenotypic differences. At a neural level, it is yet to be determined to what extent they have overlapping or distinct signatures. The identification of such neural markers/endophenotypes could be important for the development of physiological diagnostic tools, as well as an understanding of disorders across different dimensions, as recommended by the Research Domain Criteria Initiative (RDoC). The current systematic review and meta-analysis examined whether the two disorders can be differentiated based on the auditory brainstem response (ABR). Even though both diagnoses require hearing problems to be ruled out, a number of articles have demonstrated associations of these disorders with the auditory brainstem response. We demonstrated that both LD and DLD are associated with longer latencies in ABR Waves III, V, and A, as well as reduced amplitude in Waves V and A. However, multilevel subgroup analyses revealed that LD and DLD do not significantly differ for any of these ABR waves. Results suggest that less efficient early auditory processing is a shared mechanism underlying both LD and DLD.
The objective of this study is to describe the general approaches to network meta-analysis that are available for quantitative data synthesis using R software. We conducted a network meta-analysis using two approaches: Bayesian and frequentist methods. The corresponding R packages were "gemtc" for the Bayesian approach and "netmeta" for the frequentist approach. In estimating a network meta-analysis model using a Bayesian framework, the "rjags" package is a common tool. "rjags" implements Markov chain Monte Carlo simulation with a graphical output. The estimated overall effect sizes, test for heterogeneity, moderator effects, and publication bias were reported using R software. The authors focus on two flexible models, Bayesian and frequentist, to determine overall effect sizes in network meta-analysis. This study focused on the practical methods of network meta-analysis rather than theoretical concepts, making the material easy to understand for Korean researchers who did not major in statistics. The authors hope that this study will help many Korean researchers to perform network meta-analyses and conduct related research more easily with R software.
We introduce the statistical concept known as likelihood and discuss how it underlies common Frequentist and Bayesian statistical methods. This article is suitable for researchers interested in understanding the basis of their statistical tools, and is also ideal for teachers to use in their classrooms to introduce the topic to students at a conceptual level.
Cambridge Core - Statistics for Life Sciences, Medicine and Health - Foundations of Agnostic Statistics - by Peter M. Aronow
Journals tend to publish only statistically significant evidence, creating a scientific record that markedly overstates the size of effects. We provide a new tool that corrects for this bias without requiring access to nonsignificant results. It capitalizes on the fact that the distribution of significant p values, p-curve, is a function of the true underlying effect. Researchers armed only with sample sizes and test results of the published findings can correct for publication bias. We validate the technique with simulations and by reanalyzing data from the Many-Labs Replication project. We demonstrate that p-curve can arrive at conclusions opposite that of existing tools by reanalyzing the meta-analysis of the “choice overload” literature.