GRADE Working Group: Systems for grading the quality of evidence and the strength of recommendations II: Pilot study of a new system

Center for Practice and Technology Assessment, Agency for Healthcare Research and Quality, 540 Gaither Rd., Rockville, MD 20852, USA.
BMC Health Services Research (Impact Factor: 1.71). 04/2005; 5(1):25. DOI: 10.1186/1472-6963-5-25
Source: PubMed


Systems that are used by different organisations to grade the quality of evidence and the strength of recommendations vary. They have different strengths and weaknesses. The GRADE Working Group has developed an approach that addresses key shortcomings in these systems. The aim of this study was to pilot test and further develop the GRADE approach to grading evidence and recommendations.
A GRADE evidence profile consists of two tables: a quality assessment and a summary of findings. Twelve evidence profiles were used in this pilot study. Each evidence profile was based on information available in a systematic review. Seventeen people were given instructions and independently graded the level of evidence and strength of recommendation for each of the twelve evidence profiles. For each example, judgements were collected, summarised and discussed in the group with the aim of improving the proposed grading system. Kappas were calculated as a measure of chance-corrected agreement for the quality of evidence for each outcome for each of the twelve evidence profiles. The seventeen judges were also asked about the ease of understanding and the sensibility of the approach. All of the judgements were recorded and disagreements discussed.
Agreement on the quality of evidence for the outcomes relating to each of the twelve questions varied (kappa coefficients for agreement beyond chance ranged from 0 to 0.82). There was fair agreement about the relative importance of each outcome, but poor agreement about the balance of benefits and harms and about recommendations. Most of the disagreements were easily resolved through discussion. In general we found the GRADE approach to be clear, understandable and sensible. Some modifications were made to the approach, and it was agreed that more information was needed in the evidence profiles.
Judgements about evidence and recommendations are complex. Some subjectivity, especially regarding recommendations, is unavoidable. We believe our system for guiding these complex judgements appropriately balances the need for simplicity with the need for full and transparent consideration of all important issues.
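
The abstract reports kappas as a measure of chance-corrected agreement among seventeen judges but does not say which kappa statistic was used. As an illustration only, the sketch below computes Fleiss' kappa, one common choice when more than two raters grade each item. The function name `fleiss_kappa` and the input layout are assumptions made for this example, not the authors' method.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for multiple raters (illustrative sketch).

    counts[i][j] is the number of raters who placed item i (for example,
    one outcome) in category j (for example, one quality grade).
    Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")

    n_categories = len(counts[0])

    # Observed agreement: for each item, the share of rater pairs that agree.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement: based on how often each category is used overall.
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    proportions = [t / (n_items * n_raters) for t in totals]
    expected = sum(p * p for p in proportions)

    if expected == 1.0:
        # All ratings fall in a single category, so kappa is undefined.
        raise ValueError("kappa is undefined when only one category is used")

    return (observed - expected) / (1.0 - expected)
```

A value of 1 means complete agreement, and a value near 0 means agreement no better than expected by chance, which is how the reported range of 0 to 0.82 can be read.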


Available from: Signe Agnes Flottorp
  • Source
    • "All 14 papers were assessed for quality of evidence. There are many tools available to evaluate quality, such as the Grading of Recommendations Assessment, Development and Evaluation (GRADE) guidelines (Atkins et al. 2005) or the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al. 2009). However, given the small scale of this review, these tools were overly complex and exclusive. "
    ABSTRACT: There is evidence that group cognitive behavioural therapy for psychosis (CBTp) is an effective treatment, but much of this research has been conducted with outpatient populations. The aim of this review was to determine the utility of group CBTp for inpatients. We systematically searched Scopus, Web of Science and EBSCO electronic databases to identify relevant research. We reviewed the resulting articles and included those which had been conducted with inpatients, with symptoms of psychosis, using cognitive behaviour therapy, delivered in a group format. Fourteen articles relating to ten studies were identified. Two were randomized controlled trials; two were cohort studies and the rest were pre-/post-intervention studies. There was considerable heterogeneity between the studies and all had methodological limitations. The findings suggest positive trends towards the reduction of distress associated with psychotic symptoms, increased knowledge of symptoms, decreased affective symptoms and reduced readmissions over several years. However, there is currently not enough evidence to draw any strong conclusions regarding the utility of group CBTp for inpatients due to the small number of studies and limitations in quality and generalizability. Therefore, this review indicates the need for further research, particularly large, methodologically rigorous, randomized controlled trials.
    The Cognitive Behaviour Therapist 01/2015; 8. DOI:10.1017/S1754470X15000021
  • Source
    • "Evidence-based practice guidelines are often distinguished from consensus-based practice guidelines or advisories, as the former systematically review all available research on the specified topic, and then grade the level of evidence to make a clinical recommendation [203, 204]. However, not all research designs are given equal weight; standards of evidence for guidelines have evolved to place greater emphasis on RCTs [205]. The reason for this is that these designs are the most replicable, have the fewest sources of bias, and all else being equal, have the greatest power to detect evidence that a screening practice or treatment results in a net benefit or net harm [206]. "
    ABSTRACT: There are exciting findings in the field of depression and coronary heart disease. Whether diagnosed or simply self-reported, depression continues to mark very high risk for a recurrent acute coronary syndrome or for death in patients with coronary heart disease. Many intriguing mechanisms have been posited to be implicated in the association between depression and heart disease, and randomized controlled trials of depression treatment are beginning to delineate the types of depression management strategies that may benefit the many coronary heart disease patients with depression.
    11/2012; 2012(123):743813. DOI:10.5402/2012/743813
  • Source
    • "Despite numerous publications about the GRADE tool (including some criticisms [16]), there have been few empirical and systematic evaluations of applying the tool in practice [17]. As grading the evidence is now a recommended step in Cochrane systematic reviews and in other evidence synthesis initiatives outside The Cochrane Collaboration [15], [18], new users with variable levels of training are likely to be applying the tool. "
    ABSTRACT: GRADE was developed to address shortcomings of tools to rate the quality of a body of evidence. While much has been published about GRADE, there are few empirical and systematic evaluations. To assess GRADE for systematic reviews (SRs) in terms of inter-rater agreement and identify areas of uncertainty. Cross-sectional, descriptive study. We applied GRADE to three SRs (n = 48, 66, and 75 studies, respectively) with 29 comparisons and 12 outcomes overall. Two reviewers graded evidence independently for outcomes deemed clinically important a priori. Inter-rater reliability was assessed using kappas for four main domains (risk of bias, consistency, directness, and precision) and overall quality of evidence. For the first review, reliability was: κ = 0.41 for risk of bias; 0.84 consistency; 0.18 precision; and 0.44 overall quality. Kappa could not be calculated for directness as one rater assessed all items as direct; assessors agreed in 41% of cases. For the second review reliability was: 0.37 consistency and 0.19 precision. Kappa could not be assessed for other items; assessors agreed in 33% of cases for risk of bias; 100% directness; and 58% overall quality. For the third review, reliability was: 0.06 risk of bias; 0.79 consistency; 0.21 precision; and 0.18 overall quality. Assessors agreed in 100% of cases for directness. Precision created the most uncertainty due to difficulties in identifying "optimal" information size and "clinical decision threshold", as well as making assessments when there was no meta-analysis. The risk of bias domain created uncertainty, particularly for nonrandomized studies. As researchers with varied levels of training and experience use GRADE, there is risk for variability in interpretation and application. This study shows variable agreement across the GRADE domains, reflecting areas where further guidance is required.
    PLoS ONE 04/2012; 7(4):e34697. DOI:10.1371/journal.pone.0034697 · 3.23 Impact Factor