The parents of a healthy, asymptomatic 5-year-old boy are anxious about his health and ask about the appropriateness of undergoing a screening examination with urinalysis. You search for existing recommendations on this topic and find the book Putting Prevention Into Practice.1 In it you find the 2 statements outlined below.

American Academy of Family Physicians and US Preventive Services Task Force: Routine screening of males and most females for asymptomatic bacteriuria is not recommended. The Canadian Task Force on the Periodic Health Examination and the US Preventive Services Task Force recommend against screening for asymptomatic bacteriuria with urinalysis in infants, children, and adolescents.

American Academy of Pediatrics: Urinalysis should be performed once at 5 years of age. Also, dipstick leukocyte esterase testing to screen for sexually transmitted diseases should be performed once in adolescence, preferably at 14 years of age.

This clinical scenario raises a number of important questions:

1. What sort of evidence has been used to come to these different conclusions?
2. How have the 2 committees looked at and appraised the evidence?
3. Have they used an explicit approach to classify the quality of existing studies?
4. If they have indeed used an explicit approach, which elements, such as study design, study conduct, or relevance of the outcome measures, have they considered?

Explicit recommendations for clinical practice, such as guidelines or diagnostic and therapeutic protocols, are published frequently, but many contain conflicting recommendations. To decide which guidelines we should follow, we need common criteria to assess the quality of the available evidence. Although it is generally agreed that practice guidelines should explicitly assess the quality of the evidence that supports different statements, this practice is still uncommon.2

Historically, the Canadian Task Force was the first to attempt to classify levels of evidence supporting clinical recommendations.
It did this by reviewing the indications for preventive interventions and producing recommendations with an explicit grading of the supporting evidence.3 These grades were subsequently adopted by the US Preventive Services Task Force.4 The original approach used by the Canadian Task Force classified randomized controlled trials (RCTs) as the highest level of evidence, followed by controlled trials without randomization; then cohort and case-control studies (representing fair evidence); then comparisons among times and places with or without the intervention; and, at the lowest level, “expert opinion.” This approach is simple to understand and easy to apply, but it implicitly assumes that RCTs, no matter their size or how properly they were conducted, always produce better evidence than nonexperimental studies such as cohort or case-control studies. The approach also ignores the issue of heterogeneity and thus offers no guidance when results vary across several RCTs, or across several nonexperimental studies.

Other scales proposed since that of the Canadian Task Force still rely on the methodologic design of primary studies as the main criterion. These scales have incorporated systematic reviews and meta-analyses, which are placed above RCTs in the “hierarchy of evidence.” Although this allows for a possibly more refined grading of levels of evidence, it suffers from the same limitation: attention is given only to the a priori validity of the methods used.
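The ordinal hierarchy described above, and its central weakness, can be made concrete with a minimal sketch. This is an illustration only, not any task force's actual instrument: the class, member names, and helper function below are our own inventions, and real grading involves judgments that a single ordinal scale cannot capture.

```python
from enum import IntEnum

class EvidenceLevel(IntEnum):
    """Illustrative ordinal model of the original Canadian Task Force hierarchy.

    Names are invented for this sketch; higher values rank as stronger evidence.
    """
    EXPERT_OPINION = 1         # lowest: opinions of respected authorities
    TIME_PLACE_COMPARISON = 2  # comparisons among times/places with or without the intervention
    OBSERVATIONAL = 3          # cohort and case-control studies ("fair" evidence)
    NONRANDOMIZED_TRIAL = 4    # controlled trials without randomization
    RANDOMIZED_TRIAL = 5       # highest: randomized controlled trials (RCTs)

def best_available(levels):
    """Return the highest-ranked design among the studies supporting a recommendation."""
    return max(levels)
```

Note how the sketch encodes exactly the limitation discussed in the text: `best_available` ranks any RCT above any cohort study, regardless of size, conduct, or consistency of results, because the model sees only study design.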
More recently, scales assessing the quality of study conduct and the consistency of results across different studies have been proposed.

The aims of this article are as follows:

1. to review existing scales aimed at assessing the quality of evidence supporting treatment recommendations;
2. to discuss the need to go beyond the assessment of methodologic quality, whether measured a priori by looking at study design or a posteriori by looking at study conduct, to include an explicit assessment of the epidemiologic and clinical relevance of the evidence; and
3. to suggest which direction research in this area should take.

We will not address how the strength of recommendations has been assessed. This is a complex concept that involves value judgments as well as an explicit methodologic assessment of the available studies. As recently suggested (A Oxman, S Flottorp, J Cooper, et al, “Levels of Evidence and Strength of Recommendations,” unpublished data, 1999), “strength of recommendations” is a construct that should go beyond levels of evidence to incorporate more subjective considerations, such as patient- or setting-specific applicability; tradeoffs among risks, benefits, and costs; and the like.