Schunemann HJ, Oxman AD, Brozek J, et al. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies

Department of Epidemiology, Italian National Cancer Institute Regina Elena, 00144 Rome, Italy.
BMJ (online) (Impact Factor: 17.45). 06/2008; 336(7653):1106-10. DOI: 10.1136/bmj.39500.677199.AE
Source: PubMed

ABSTRACT The GRADE system can be used to grade the quality of evidence and strength of recommendations for diagnostic tests or strategies. This article explains how patient-important outcomes are taken into account in this process.

Summary points
  • As for other interventions, the GRADE approach to grading the quality of evidence and strength of recommendations for diagnostic tests or strategies provides a comprehensive and transparent approach for developing recommendations.
  • Cross sectional or cohort studies can provide high quality evidence of test accuracy.
  • However, test accuracy is a surrogate for patient-important outcomes, so such studies often provide low quality evidence for recommendations about diagnostic tests, even when the studies do not have serious limitations.
  • Inferring from data on accuracy that a diagnostic test or strategy improves patient-important outcomes will require the availability of effective treatment, reduction of test related adverse effects or anxiety, or improvement of patients’ wellbeing from prognostic information.
  • Judgments are thus needed to assess the directness of test results in relation to consequences of diagnostic recommendations that are important to patients.

In this fourth article of the five part series, we describe how guideline developers are using GRADE to rate the quality of evidence and move from evidence to a recommendation for diagnostic tests and strategies. Although recommendations on diagnostic testing share the fundamental logic of recommendations on treatment, they present unique challenges. We will describe why guideline panels should be cautious when they use evidence of the accuracy of tests (“test accuracy”) as the basis for recommendations and why evidence of test accuracy often provides low quality evidence for making recommendations.

Testing makes a variety of contributions to patient care
Clinicians use tests that are usually referred to as “diagnostic” (including signs and symptoms, imaging, biochemistry, pathology, and psychological testing) for various purposes [1]. These purposes include identifying physiological derangements, establishing prognosis, monitoring illness and response to treatment, and diagnosis. This article …
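
As a rough illustration of why test accuracy is only a surrogate, the sketch below converts sensitivity, specificity, and prevalence into the expected numbers of true and false results per 1000 patients tested. These counts are what a panel would still have to link to patient-important consequences (for example, treatment received by true positives or anxiety caused by false positives). The function name and all input values are hypothetical assumptions chosen for illustration; none of them come from the article.

```python
def consequences_per_1000(sensitivity, specificity, prevalence, n=1000):
    """Expected 2x2 counts among n tested patients (hypothetical helper)."""
    diseased = prevalence * n          # patients who truly have the condition
    healthy = n - diseased             # patients who truly do not
    tp = sensitivity * diseased        # true positives
    fn = diseased - tp                 # false negatives (missed cases)
    tn = specificity * healthy         # true negatives
    fp = healthy - tn                  # false positives (false alarms)
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Illustrative inputs only (assumed, not taken from the article).
counts = consequences_per_1000(sensitivity=0.90, specificity=0.80, prevalence=0.10)

# Predictive values follow directly from the expected counts.
ppv = counts["TP"] / (counts["TP"] + counts["FP"])
npv = counts["TN"] / (counts["TN"] + counts["FN"])

for label, value in counts.items():
    print(f"{label}: {value:.0f} per 1000")
print(f"PPV: {ppv:.2f}, NPV: {npv:.2f}")
```

Under these assumed inputs most positive results are false positives even though the test looks accurate, which is one way to see why accuracy data alone may give only indirect evidence about benefit to patients.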

Available from: Paul Glasziou, Sep 26, 2015
  • Source
    • "In step four, a generalized evidence grading based on GRADE is provided to rate the quality of the bodies of evidence. In this step, approaches previously discussed and proposed by the GRADE Working Group [4] [5] or WHO [6] [7] are applied. The latter is used by the WHO Strategic Advisory Group of Experts (SAGE) for the development of vaccination recommendations and includes a modification of the GRADE methodology which allows uprating of evidence quality in the presence of "consistency across investigators, study designs and settings" [7]."
    ABSTRACT: The Project on a Framework for Rating Evidence in Public Health (PRECEPT) is an international collaboration of public health institutes and universities which has been funded by the European Centre for Disease Prevention and Control (ECDC) since 2012. The main objective is to define a framework for evaluating and grading evidence in the field of public health, with particular focus on infectious disease prevention and control. As part of the peer review process, an international expert meeting was held on 13-14 June 2013 in Berlin. Participants were members of the PRECEPT team and selected experts from national public health institutes, the World Health Organization (WHO), and academic institutions. The aim of the meeting was to discuss the draft framework and its application to two examples from infectious disease prevention and control. This article introduces the draft PRECEPT framework and reports on the meeting, its structure, most relevant discussions and major conclusions.
    Health Policy 03/2015; 119(6). DOI:10.1016/j.healthpol.2015.02.010 · 1.91 Impact Factor
  • Source
    • "The most conclusive evidence regarding patient outcomes can be derived from diagnostic randomized controlled trials (D-RCTs), in which participants are randomized to have a new diagnostic test vs. a control or no test [5–7]. Performing such trials is challenging [8] [9], but randomized controlled trials (RCTs) represent a rigorous approach to diagnostic test evaluation [10]. D-RCTs evaluating patient outcomes are relatively uncommon [11], but their examination can offer useful insights."
    ABSTRACT: Objectives: To evaluate the effects of diagnostic testing on patient outcomes in a large sample of diagnostic randomized controlled trials (D-RCTs) and to examine whether the effects for patient outcomes correlate with the effects on management and with diagnostic accuracy. Study Design and Setting: We considered D-RCTs that evaluated diagnostic interventions for any condition and reported effectiveness data on one or more patient outcomes. We calculated odds ratios for patient outcomes and outcomes pertaining to the use of further diagnostic and therapeutic interventions and the diagnostic odds ratio (DOR) for the accuracy of experimental tests. Results: One hundred forty trials (153 comparisons) were eligible. Patient outcomes were significantly improved in 28 comparisons (18%). There was no concordance in significance and direction of effects between the patient outcome and outcomes for use of further diagnostic or therapeutic interventions (weighted κ 0.02 and 0.09, respectively). The effect size for the patient outcome did not correlate with the effect sizes for use of further diagnostic (r = 0.05; P = 0.78) or therapeutic interventions (r = 0.18; P = 0.08) or the experimental intervention DOR in the same trial (r = −0.24; P = 0.51). Conclusion: Few tests have well-documented benefits on patient outcomes. Diagnostic performance or the effects on management decisions are not necessarily indicative of patient benefits.
    Journal of clinical epidemiology 06/2014; 67(6). DOI:10.1016/j.jclinepi.2013.12.008 · 3.42 Impact Factor
  • Source
    • "The literature on diagnostic test evaluation has centered on estimation of sensitivity and specificity, measures that do not directly convey the clinical impact of a given test [1-3]. The added value of a test will depend on how much information is already available from the diagnostic work-up and whether the test result actually changes clinical decisions."
    ABSTRACT: Background: The absence of a gold standard, i.e., a diagnostic reference standard having perfect sensitivity and specificity, is a common problem in clinical practice and in diagnostic research studies. There is a need for methods to estimate the incremental value of a new, imperfect test in this context. Methods: We use a Bayesian approach to estimate the probability of the unknown disease status via a latent class model and extend two commonly-used measures of incremental value based on predictive values [difference in the area under the ROC curve (AUC) and integrated discrimination improvement (IDI)] to the context where no gold standard exists. The methods are illustrated using simulated data and applied to the problem of estimating the incremental value of a novel interferon-gamma release assay (IGRA) over the tuberculin skin test (TST) for latent tuberculosis (TB) screening. We also show how to estimate the incremental value of IGRAs when decisions are based on observed test results rather than predictive values. Results: We showed that the incremental value is greatest when both sensitivity and specificity of the new test are better and that conditional dependence between the tests reduces the incremental value. The incremental value of the IGRA depends on the sensitivity and specificity of the TST, as well as the prevalence of latent TB, and may thus vary in different populations. Conclusions: Even in the absence of a gold standard, incremental value statistics may be estimated and can aid decisions about the practical value of a new diagnostic test.
    BMC Medical Research Methodology 05/2014; 14(1):67. DOI:10.1186/1471-2288-14-67 · 2.27 Impact Factor