SPINE Volume 34, Number 18, pp 1929–1941
©2009, Lippincott Williams & Wilkins
2009 Updated Method Guidelines for Systematic
Reviews in the Cochrane Back Review Group
Andrea D. Furlan, MD, PhD,*†‡ Victoria Pennick, RN, MHSc,*†
Claire Bombardier, MD, FRCP,*† and Maurits van Tulder, PhD,§
from the Editorial Board of the Cochrane Back Review Group
Study Design. Method guidelines for systematic re-
views of trials of treatments for neck and back pain.
Objective. To help review authors design, conduct and
report systematic reviews of trials in this field.
Summary of Background Data. In 1997, the Cochrane
Back Review Group published Method Guidelines for Sys-
tematic Reviews, which was updated in 2003. Since then,
new methodologic evidence has emerged and standards
have changed. Coupled with the upcoming revisions to
the software and methods required by The Cochrane
Collaboration, it was clear that revisions were needed to the
guidelines.
Methods. The Cochrane Back Review Group editorial
and advisory boards met in June 2006 to review the rel-
evant new methodologic evidence and determine how it
should be incorporated. Based on the discussion, the
guidelines were revised and circulated for comment. As
sections of the new Cochrane Handbook for Systematic
Reviews of Interventions were made available, the guide-
lines were checked for consistency. A working draft was
made available to review authors in The Cochrane Library
2008, issue 3.
Results. The final recommendations are divided into 7
categories: objectives, literature search, inclusion criteria,
risk of bias assessment, data extraction, data analysis,
and updating your review. Each recommendation is clas-
sified into minimum criteria (mandatory) and further
guidance (optional). Instead of recommending Levels of
Evidence, this update adopts the GRADE approach to
determine the overall quality of the evidence for impor-
tant patient-centered outcomes across studies and in-
cludes a new section on updating reviews.
Conclusion. Citations of previous versions of the
method guidelines in published scientific articles (1997:
254 citations; 2003: 209 citations, searched February 10,
2009) suggest that others may find these guidelines use-
ful to plan, conduct, or evaluate systematic reviews in the
field of spinal disorders.
Key words: systematic reviews, meta-analysis, Co-
chrane Collaboration, method guidelines, back pain, neck
pain. Spine 2009;34:1929–1941
The current interest in evidence-based health care has led
to an extensive increase in the publication of systematic
reviews. In 1999, the QUOROM statement was developed
to improve the standards for the report of systematic
reviews.1 Several leading medical journals (e.g.,
BMJ, JAMA, Lancet) have adopted the QUOROM rec-
ommendations for the reporting of abstract, introduc-
tion, methods, results, and discussion sections of system-
atic reviews. However, it has been shown that many
reviews in the field of back and neck pain are of low
methodologic quality and that their reports often lack
In 1997, the Cochrane Back Review Group (CBRG)
Editorial Board published method guidelines for systematic
reviews in the field of spinal disorders.5 These guidelines
were updated in 2003 (reference 6) and addressed the main
inclusion criteria, methodologic quality, data extraction,
and data analysis. The purpose of the method guidelines
was to offer guidance to researchers preparing, conduct-
ing, or reporting a systematic review and to readers eval-
uating these reviews. The guidelines were operational-
ized specifically for the field of back and neck pain. They
included certain minimum criteria for which either em-
pirical evidence existed that confirmed they were associ-
ated with bias in systematic reviews, or there was con-
sensus among the CBRG Editorial Board that they were
likely to be associated with bias. Further guidance was
presented to enhance the quality of systematic reviews.
The CBRG was established in 1996. Forty-six systematic
reviews and 8 protocols for reviews of various treatments
for spinal disorders are published in “The Co-
copublished in Spine (more information available at:
www.cochrane.iwh.on.ca). Because new evidence on review
methodology has emerged since 2003, new guidance
was introduced in the February 2008 version of the
Cochrane Handbook for Systematic Reviews of Interventions7
and the CBRG has acquired more experience
in preparing, conducting, and updating systematic Cochrane
reviews, the Editorial Board felt it was time to
update the 2003 method guidelines.
From the *Institute for Work and Health, Toronto, Ontario, Canada;
†University of Toronto, Toronto, Ontario, Canada; ‡Toronto Rehabilitation
Institute, Toronto, Ontario, Canada; and §VU University,
Amsterdam, the Netherlands.
The manuscript submitted does not contain information about medical
No funds were received in support of this work. No benefits in any
form have been or will be received from a commercial party related
directly or indirectly to the subject of this manuscript.
Canadian Institutes of Health Research (CIHR), Canadian Agency for
Drugs and Technologies in Health to Cochrane Back Review Group
These guidelines expand on the methodology outlined in: Bombardier
C, van Tulder MW, Pennick V, Bronfort G, Corbin T, Deyo RA, de Bie
R, Furlan AD, Guillemin F, Malmivaara A, Peul W, Schoene M, Shekelle
PG, Tomlinson G. Cochrane Back Group. About The Cochrane
Collaboration (Cochrane Review Groups (CRGs)) 2008, Issue 3. Art.
No.: BACK. Copyright Cochrane Collaboration, reproduced with permission.
The following are the editorial board members of the Cochrane Back
Review Group: Co-editors: Claire Bombardier and Maurits van Tulder;
Managing editor: Victoria Pennick; Editors: Gert Brønfort, Rob
deBie, Terry Corbin, Rick Deyo, Andrea Furlan, Francis Guillemin,
Antti Malmivaara, Wilco Peul, Mark Schoene, Paul Shekelle, George
Address correspondence and reprint requests to Andrea D. Furlan,
Institute for Work & Health, 481 University Av, Suite 800, Toronto,
Ontario, Canada; E-mail: firstname.lastname@example.org
It should be emphasized that these guidelines are not a
“gold standard” but merely an indication of the current
These guidelines build on the information provided in The Cochrane
Handbook for Systematic Reviews of Interventions7
available at: http://www.cochrane.org/resources/
handbook/index.htm (accessed September 17, 2008),
rather than replace it. They are useful to plan, conduct,
or evaluate systematic reviews in the field of back and
neck pain within and outside the framework of the
CBRG. The usefulness of the 1997 and 2003 method
guidelines is reflected in the number of citations in pub-
lished scientific articles: 254 citations and 209, respec-
tively (ISI web of science cited reference searched Febru-
ary 10th, 2009). Please note that since the Cochrane
Handbook for Systematic Reviews of Interventions is
Materials and Methods
In June 2006, the editorial and advisory boards of the CBRG
met in Amsterdam at the VIII International Forum for Primary
Care Research on Low-Back Pain to discuss the update. They
recognized that some challenging topics in the 2003 method
guidelines needed revision (e.g., levels of evidence, clinical rel-
evance of the results, and recommendations for updates).
circulated among the editors. Each editor was given a chance to
comment on additions, deletions, or other changes that were
process. Feedback was incorporated into a second draft of the
guidelines and circulated among all CBRG editors and advisory
board members for comments. The second draft was presented
and discussed at the IX International Forum for Primary Care
Research on Low-Back Pain in Palma de Mallorca, Spain, in October
2007. A working draft was made available to review authors
in The Cochrane Library 2008, issue 3. The
final version was delayed to be able to incorporate the new “Cochrane
Handbook for Systematic Reviews of Interventions,”
which covered the new GRADE approach and the new Review
Manager 5 and GRADEprofiler software.
Review Objective. Reviews with the Cochrane Back Review
Group start with a clinically relevant question that is clearly
defined in the objectives. The objectives should outline the in-
tervention and participants. The Editorial Board recommends
that reviews focus specifically on (sub)acute or chronic back or
neck pain. It is also recommended that reviews focus separately
on nonspecific back or neck pain, sciatica or radicular symp-
toms, or specific causes (e.g., spinal stenosis, scoliosis). In ad-
dition, review authors should outline the comparisons that will
be evaluated in the review (Figure 1).
systematic review is to include all available evidence. There-
fore, once the research question has been defined, the literature
search is the next, very important step in conducting a system-
atic review. The starting point for the literature search is to
relevant trials as possible are identified. The search strategy
should relate directly to the research question(s) of the review
at issue and should be based on the inclusion criteria with
respect to study design, participants, interventions, and out-
comes (see Inclusion Criteria section). Searching only MEDLINE
is clearly insufficient since it has been shown that in general,
if MEDLINE is the only database searched.8 It has been suggested
that at least MEDLINE and EMBASE must be used to
ensure a comprehensive literature search, because overlap between
these databases is small.9–11 Especially in the field of low
back pain, EMBASE has been shown to retrieve more clinical
trials than MEDLINE.12
Therefore, we recommend the following as a minimum:
1. A computer-aided search of the MEDLINE and EM-
BASE databases since their inception for new reviews
and since the date of the previous search for updates of
reviews.7,8 The highly sensitive search strategies for retrieval
of reports of controlled trials should be run in conjunction
with a specific search for spinal disorders and the
intervention at issue (Appendix 1, Supplemental Digital
Content 1, available at: http://links.lww.com/BRS/A373
at: http://links.lww.com/BRS/A374). It has been demon-
strated that simple search strategies (i.e., strategies with a
few terms) are not adequate for systematic reviews.13
2. A search of the Cochrane Central Register of Controlled
Trials (CENTRAL) that is included in the most recent
issue of The Cochrane Library.
3. A search of the CBRG Trials Register by contacting the
editorial base of the Cochrane Back Review Group.
4. Screening references listed in relevant systematic reviews
and identified RCTs.
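The combination described in item 1 (a trial filter run in conjunction with searches for the spinal disorder and the intervention) can be sketched as a Boolean query. This is only an illustration: the function name `build_search` and the example terms are hypothetical and are not the highly sensitive strategies given in the appendices, and real database syntax (field tags, truncation, subject headings) differs between MEDLINE and EMBASE.

```python
def build_search(trial_filter, condition_terms, intervention_terms):
    """Join three blocks of search terms: OR within a block, AND between blocks."""
    def or_block(terms):
        return "(" + " OR ".join(terms) + ")"

    blocks = (trial_filter, condition_terms, intervention_terms)
    return " AND ".join(or_block(block) for block in blocks)


# Illustrative terms only (assumptions, not the published strategies).
trial_filter = ["randomized controlled trial", "controlled clinical trial"]
condition_terms = ["back pain", "neck pain", "sciatica"]
intervention_terms = ["traction"]

query = build_search(trial_filter, condition_terms, intervention_terms)
print(query)
```

Keeping the three blocks separate makes it easy to rerun the same trial filter and condition block with a different intervention block for another review.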
The search strategy should not be limited by language.
Unless they have easy access to a health sciences librarian who
is experienced in searching electronic databases, we suggest that
review authors contact the CBRG (Cochrane@iwh.on.ca) for as-
sistance in developing and conducting the literature search. We
recommend that 2 review authors independently apply the inclu-
sion criteria to select the potentially relevant trials from the titles,
abstracts, and keywords of the references retrieved by the literature
search. Articles selected in this first round, articles for which
disagreement exists, and articles for which title, abstract, and keywords
provide insufficient information for a decision should be
obtained so that the final decision about whether they meet the
inclusion criteria is based on the full paper. A consensus method
should be used to select the potentially relevant trials at both
steps. If disagreements persist, a third review author should be consulted.
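The first screening round can be sketched as a simple rule: a full paper is obtained unless both review authors exclude the reference on title, abstract, and keywords. The function name and the three vote labels below are assumptions made for illustration; they are not part of the guidelines.

```python
VALID_VOTES = {"include", "exclude", "unclear"}


def needs_full_text(vote_a, vote_b):
    """Return True when the full paper should be obtained.

    Votes come from two independent review authors. 'unclear' stands for
    insufficient information in the title, abstract, and keywords.
    """
    if vote_a not in VALID_VOTES or vote_b not in VALID_VOTES:
        raise ValueError("votes must be 'include', 'exclude', or 'unclear'")
    # Only a joint exclusion stops here; selection, disagreement, and
    # insufficient information all lead to retrieval of the full paper.
    return not (vote_a == "exclude" and vote_b == "exclude")


print(needs_full_text("include", "exclude"))
```

The final inclusion decision is still made by consensus on the full paper; this rule only decides which papers to retrieve.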
Reviews should be submitted within a year of the latest search
date. Because some reviews can take longer than a year to com-
plete, the CBRG recommends that the authors update the search
been published since the last search. The authors may contact the
CBRG Trials Search Coordinator for assistance. The review au-
thors can decide if it is feasible to include newly identified trials in
the current review, or a future update.
If one of the review authors is a (co-)author of one of the
potentially relevant trials, this person should not be involved in
any decisions about inclusion of the trial at issue.
Further Guidance. Depending on the intervention at issue, and
if available, specific databases should be searched, for example:
● Mantis (Manual Alternative and Natural Therapy Index
System) for chiropractic interventions (http://www.
● Complementary and Alternative Medicine Specialist Library
(from the National Library of Health, UK) for complementary
medicine interventions (http://www.library.nh-
● PsycINFO for psychological interventions (http://
● PEDro (Physiotherapy Evidence Database) for physio-
therapy interventions (http://www.pedro.fhs.usyd.edu.au/)
● Cumulative Index of Nursing and Allied Health (CINAHL)
for allied health interventions (http://www.
● Index to Chiropractic Literature (http://www.chiroindex.
Other search strategies are recommended, but are not essential,
● Identification of ongoing trials. The CBRG Trials Search
Coordinator (TSC) will identify ongoing trials that are reg-
istered on the WHO International Clinical Trials Registry
Platform (http://www.who.int/ictrp/en/); these should be in-
cluded in the reference section of the review (ongoing stud-
ies) in addition to those identified by the review authors
through their own contacts.
● Personal communication with content experts in the field
and with authors of identified RCTs.14 It is up to the discretion
of the review authors to identify who the experts are on
a specific topic and to describe the process and results of the
contact in the review.
● Citation tracking of the identified RCTs.15 The value of
using citation tracking has not yet been established, but it
may be especially useful to identify additional studies of
topics that are poorly indexed in MEDLINE and EMBASE.
● The Editorial Board recommends using the search strategy
suggested by Golder et al16 to find reports of adverse
events of their interventions, and the search strategy suggested
by Furlan et al, if review authors plan to include
observational studies.17 Contact a health sciences librarian
or the CBRG Trials Search Coordinator for help in developing
these search strategies.
Study Design. RCTs with clearly reported and appropriate
randomization should be included. If the article only reports
that the trial is a randomized trial or that the participants were
randomly allocated to the intervention groups without a clear
description of the method of randomization, the authors
should be contacted for further information. Examples of ap-
propriate randomization techniques are: computer-generated
random sequence, sequentially-ordered vials, telephone call to
a central office, and preordered list of treatment assignments.
Participants. Participants of trials that will be included in
the systematic review should be defined explicitly in terms of
age, gender, type, duration, localization and severity of symp-
toms, setting, and recruitment procedure. It is particularly im-
portant to report if participants with acute (less than 6 weeks),
neck pain are included. It is also important to report if the partic-
If there is a reason to collapse the duration of symptoms, the
categories should be (sub)acute (less than 12 weeks) and chronic
with both back and neck pain.
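The duration categories can be expressed as a small classification rule. The sketch below takes the thresholds stated in the text (acute: less than 6 weeks; collapsed (sub)acute: less than 12 weeks; chronic otherwise). Treating "subacute" as 6 to less than 12 weeks is an inference from those two thresholds, and the function name is hypothetical.

```python
def classify_duration(weeks, collapse=False):
    """Classify symptom duration in weeks.

    collapse=True merges acute and subacute into '(sub)acute'.
    The subacute range (6 to <12 weeks) is inferred, not quoted.
    """
    if weeks < 0:
        raise ValueError("duration cannot be negative")
    if weeks >= 12:
        return "chronic"
    if collapse:
        return "(sub)acute"
    return "acute" if weeks < 6 else "subacute"


print(classify_duration(8), classify_duration(8, collapse=True))
```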
Interventions. It is recommended that a definition and po-
tential mechanism of action related to prevention or treatment
of back or neck pain of the intervention under study is included
and referenced in the review. The type, intensity, dosage, fre-
quency and duration of both the index and comparison inter-
ventions to be included in the review should be explicitly de-
scribed. If appropriate for the intervention, the skills, training
and experience of the provider should also be included.
Comparisons should include a clear contrast for the index
intervention, so that the independent effects of the intervention
can be assessed. For example, a comparison of traction plus
exercises versus the same exercises alone is a relevant compar-
ison in a review on traction, while a comparison of traction
plus exercises versus spinal manipulation is not a relevant com-
parison in the same review.
be included in the review should be explicitly described. Impor-
as: symptoms (e.g., pain), overall improvement or satisfaction
with treatment, back-specific functional status (e.g., measured
with the Roland Morris Questionnaire, Oswestry Disability
Index), well-being (e.g., quality of life, measured with the SF-36,
SF-12, EuroQol), and disability (e.g., ability to perform
activities of daily living, return-to-work status, work absentee-
ism). Adverse events (intended and nonintended) should al-
ways be included in a systematic review of back or neck pain if
they are reported in the original trials. If explicit adverse events
are to be investigated, observational studies reporting on these
adverse events should also be included. Depending on the in-
tervention, specific outcomes may be relevant, for example:
depression for a review of antidepressants, knowledge gain for
a review of patient education, or radiologic outcomes for a
review of surgical intervention.
The timing of measuring outcomes should be explicitly de-
term (closest to 4 weeks) and long-term (closest to 1 year).
Language. The empirical evidence on excluding trials published
in languages other than English is conflicting.19–24 The
Editorial Board recommends including studies published in
languages other than English, for example, by finding native
speakers and meeting with them to assess the risk of bias and
extract the data together. However, we acknowledge that it
may not always be feasible and may depend on the time and
resources available. Potential articles retrieved in languages
outside the linguistic skills of the review team (or their local
sources) should be brought to the attention of the CBRG edi-
torial staff, who will try to find translators. If trials published in
other languages are excluded from the review, these trials
should be listed in the section on excluded trials. We strongly
recommend having an international group of (co-) authors
with different language skills involved in a systematic review to
enable the inclusion of trials in languages other than English.
This is particularly recommended for topics where there are
likely to be a significant number of non-English language pub-
lications (e.g., the Asian literature on acupuncture), in which
case, we suggest including review authors with the relevant language skills.
Study Design. Authors wishing to include studies besides
RCTs (with appropriate and clearly reported randomization
study designs that are acceptable are:
RCTs that do not clearly report the method of randomiza-
Quasi-RCTs—Quasi-RCTs may be included if there are
fewer than 5 RCTs. Quasi-RCTs are controlled clinical tri-
als that use methods of allocation that are not random, and
therefore, may be open to bias. Examples of quasi-
insurance/security number, date in which they are invited to
participate in the study, and hospital registration number.
Studies without a control group and publications that are
only expert opinion should not be included (Table 1).25
Outcomes. Outcomes of physical examination (range of
motion, spinal flexibility, degrees of straight leg raising, or
muscle strength), care-provider-centered outcomes (e.g., out-
come assessor’s global improvement), and other outcomes
(medication use, healthcare utilization) may be included as sec-
ondary outcomes where appropriate, depending on the aim of
the intervention at issue.
Assessing Risk of Bias
Minimum Criteria. Risk of bias in the studies should be inde-
pendently assessed by at least 2 review authors. Currently,
there is empirical evidence that inadequate concealment of
treatment allocation,26,27 inadequate double-blinding (of participants
and outcome assessors), and a high drop-out rate, or
differences in number or reasons for dropping out between
in fields other than back and neck pain.
We recommend assessing the risk of bias in the studies by
using the criteria presented in Table 2 and the instructions
presented in Table 3. These instructions are adapted from van
Tulder,6 Boutron et al (CLEAR NPT),32 and the Cochrane
Handbook for Systematic Reviews of Interventions.7 Of these criteria, 11
have already been used in 26 (65%) and 10 have been used in 7
(18%) systematic reviews within the CBRG (The Cochrane
Library 2008, issue 4). These criteria are also considered important
by others who study nonpharmaceutical interventions.32,33
Internal validity criteria refer to characteristics of
7), and detection bias (criteria 5, 12). Each criterion should be
scored as yes, unclear, or no, where yes indicates the criterion
has been met and therefore suggests a low risk of bias. The
Cochrane Handbook for Systematic Reviews of Interventions7
recommends that review authors assess at least 6 issues associated
with risk of bias: sequence generation, allocation concealment,
blinding of participants, personnel and outcome assessors,
incomplete outcome data, selective outcome reporting,
and other potential threats to validity not already identified.
The criteria recommended by the CBRG are aligned with the
new “Handbook,” except for “selective reporting of outcomes.”
We suggest adding this item as the 12th internal validity criterion.
We recommend that the studies are rated as having a “low
risk of bias” when at least 6 of the 12 CBRG criteria have been
1 group). Studies with serious flaws, or those in which fewer than
6 of the criteria are met should be rated as having a “high risk of
ducted with data from the CBRG that a compliance threshold of
less than 50% of the criteria is associated with bias.34
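The rating rule can be written out explicitly. The sketch below assumes each of the 12 criteria is scored as "yes", "unclear", or "no", as described earlier, and that the judgment about serious flaws is made separately by the review authors and passed in as a flag. The function name is hypothetical.

```python
def rate_risk_of_bias(scores, serious_flaw=False):
    """Rate a study as 'low' or 'high' risk of bias.

    scores: 12 entries, each 'yes', 'unclear', or 'no'.
    Only 'yes' counts as a criterion that has been met.
    """
    if len(scores) != 12:
        raise ValueError("expected scores for all 12 criteria")
    criteria_met = sum(1 for score in scores if score == "yes")
    # Low risk needs at least 6 of 12 criteria met and no serious flaw.
    if criteria_met >= 6 and not serious_flaw:
        return "low"
    return "high"


example = ["yes"] * 6 + ["unclear"] * 3 + ["no"] * 3
print(rate_risk_of_bias(example))
```

Note that "unclear" is treated the same as "no" when counting, so poor reporting lowers the count just as a documented flaw does.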
The results of the assessment, including the rationale for the
decision, should be presented in the “Risk of Bias” table, which
is included with the “Characteristics of included studies” table
in Review Manager 5. If one of the review authors is an author
or coauthor of one of the included trials, this person should not
be involved in any decisions regarding the risk of bias assess-
ment of the trial at issue.
Further Guidance. Some empirical evidence suggests that
blinded risk of bias assessment, that is, removing the names of
authors, institution, and journals from the articles when assess-
ing the risk of bias, resulted in more consistent and higher
rating of bias than open assessment.35 However, 2 other studies
did not find an association between blinded assessment of
studies and bias.33,36 It is difficult to achieve true blinding,
because experts are usually involved in the risk of bias assess-
ment of the studies. Therefore, the CBRG leaves it to the dis-
cretion of the review authors to decide whether or not to per-
form a blinded risk of bias assessment. Because assessment by
content experts may be biased by prior opinions, it may be
(but with a methodologic background) assess the risk of bias in
the studies. In systematic reviews where there is likely to be a
conflict of interest (e.g., chiropractors or manual therapists re-
viewing spinal manipulation, or physiotherapists reviewing ex-
ercise therapy), it may be desirable to also mask the studies for
results and conclusions, or to include someone who has no
potential conflict of interest in the risk of bias assessment.
We recommend that review authors pilot-test the risk of
bias assessment on some similar articles (regarding another
It is important for review authors to agree on a common inter-
pretation of the items and their operationalization.
We recommend using a consensus method to discuss and
solve disagreements between the review authors. If disagree-
ment persists, another independent person should be consulted
who is an expert in review methodology. The initial interob-
server reliability (e.g., Kappa) of the risk of bias assessment
should be evaluated and reported.
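As one way to report the initial interobserver reliability, an unweighted Cohen's kappa can be computed from the two review authors' independent scores, before consensus. The guidelines name Kappa only as an example, so the choice of the unweighted two-rater form is an assumption here, and the ratings shown are invented for illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented scores for one criterion across four trials.
author_1 = ["yes", "yes", "no", "no"]
author_2 = ["yes", "no", "no", "no"]
kappa = cohen_kappa(author_1, author_2)
print(round(kappa, 2))  # 0.5
```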
Table 1. Taxonomy of Study Design of Studies Assessing the Effects of Health-Care Interventions
Experimental studies with control group (“clinical trials” or “trials”): The investigator has control over the decision concerning the allocation of participants to different intervention groups.
  Randomized controlled trial (RCT)
    (A) Reported method of randomization and the method is adequate (see text for examples).
    (B) Did not report methods of randomization. Only the phrase “randomized study,” “random allocation” or other similar expression is reported.
  Controlled clinical trial (CCT)
    (A) Reported the method of allocation and this method is inadequate (see text for examples). Synonym: quasi-randomized controlled trial (q-RCT).
    (B) Did not report the method of allocation and there is no phrase or expression indicating that the allocation was randomized.
Observational studies with control group: The investigator’s intention is to observe and not to interfere with routine care.
  Cohort study
    Synonyms: “Longitudinal study” (emphasizing that people are followed over time); “Prospective study” (implying the forward direction of the research question); “Incidence study” (calling attention to the basic measure of new diseases events over time).
    The cases are selected based on exposure to the interventions. It involves measuring the occurrence of disease within 1 or more group of individuals who are followed or traced over a period of time. A true cohort study is an inception cohort study, to differentiate from “survival cohorts.”
    Cohort studies can be prospective or retrospective with regards to the data collection: “prospective” means that the study is planned before any data is collected; and “retrospective” means that when the study is planned, all (or part of) the data is already
  Survival cohort study
    Synonym: “available patients cohort.”
    People are included in a study because they both have a disease and are currently available—perhaps they are being seen in a specialized clinic. Survival cohorts are misleading if they are presented as true cohorts. In a survival cohort, people are assembled at various times in the course of their disease, rather than at the beginning as in a true cohort study. Their clinical course is then described by going back in time and seeing how they have fared up to the
  Case control study
    The cases are selected based on the outcomes. A research design in which all group selection, pretest data, and posttest data are collected after completion of the treatment. The evaluator is thus not involved in the selection or placement of individuals into comparison or control groups. All evaluation decisions are made retrospectively. Individuals are matched on variables thought to be critical in determining the outcome, therefore the groups are equivalent except for the interventions.
  Cross-sectional study
    All of the information refers to the same point in time. There is no follow-up. They are usually conducted by collecting data from administrative databases (census, hospital discharges and workers’ compensation
Uncontrolled studies (without a separate control group): can be experimental or observational
  The participants are described as a group
    (A) “Case study”: A single group is studied only once, subsequent to some agent or treatment presumed to
    (B) “Before and after”: a single group is studied before and after some agent or treatment presumed to
  (A) Case reports: the participants are described
  (B) “N-of-1 randomized trial”: the patient undergoes pairs of treatment periods randomized so that 1 period involves the use of experimental treatment and the other involves the use of an alternate or
Reprinted with permission from J Clin Epidemiol.25 Copyright 2008, Elsevier.
A study in the field of rheumatology showed that some trials
that inadequately reported the method of randomization and
allocation concealment had actually performed them adequately.37
more of) the internal validity criteria, the authors may be con-
tacted for additional information. If the authors cannot be con-
tacted or if the information is no longer available, the criteria
should be scored as “unclear,” with an explanation.
Different risks of bias may explain the variation in the re-
sults of the studies included in a systematic review and can
result in over- or underestimation of the effectiveness of the
the use of risk of bias assessment in systematic reviews. In
general, we recommend choosing one of the options listed be-
low and clearly describe the logic behind the choice.38,39
First, based on 1 or more domains, the risk of bias can be
used as an additional inclusion criterion for studies in the re-
view (e.g., only include adequately randomized RCTs or dou-
ble-blinded RCTs) or based on the number of criteria met (e.g.,
only include studies that adequately fulfill six of the 12 validity
criteria and have no serious flaws). Second, a stratified analysis
can be performed in which the results are separately presented
for different strata of studies (e.g., studies that meet specific
criteria, or studies with a low or high risk of bias). Third, a
sensitivity analysis can be performed to determine whether
the overall results are the same when studies with different
definitions of low or high risk of bias are analyzed. Fourth,
weights can be applied in the analysis to studies according to
the risk of bias, so that studies with a lower risk of bias have
more impact on the overall results. Obviously, choosing
weights involves additional arbitrary decisions. Fifth, a cu-
mulative meta-analysis can be performed by examining the
impact on the overall results as studies with increasing risk
of bias are included one at a time. And last, a meta-
regression can be performed to explore the relation between
criteria met and the magnitude of effect across outcomes and
studies. The first 3 options are also available when statistical
pooling is not feasible; the last 3 apply specifically to statistical
pooling.
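The stratified and sensitivity options can be sketched together: studies are split into strata by the number of criteria met, and the split is repeated under a different definition of low risk of bias to see whether conclusions change. The study names, counts, and the alternative threshold of 7 are invented for illustration; only the threshold of 6 of 12 comes from these guidelines.

```python
def stratify_by_risk(criteria_met_by_study, threshold=6):
    """Group study names into 'low' and 'high' risk-of-bias strata."""
    strata = {"low": [], "high": []}
    for study, criteria_met in criteria_met_by_study.items():
        label = "low" if criteria_met >= threshold else "high"
        strata[label].append(study)
    return strata


# Hypothetical studies with the number of the 12 criteria each one met.
criteria_met_by_study = {"Trial A": 9, "Trial B": 6, "Trial C": 4}

primary = stratify_by_risk(criteria_met_by_study)                 # 6 of 12
stricter = stratify_by_risk(criteria_met_by_study, threshold=7)   # sensitivity check
print(primary)
print(stricter)
```

Results would then be presented separately for each stratum, and compared across the two definitions.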
The Editorial Board refers the reader to Chapter 8 in the
Cochrane Handbook for Systematic Reviews of Interventions7
for further details on assessing risk of bias.
Minimum Criteria. At least 2 review authors should indepen-
dently extract the data. Data describing study characteristics
that include characteristics of participants, interventions, com-
parisons, outcomes, analysis, results, and study sponsorship
should be extracted and presented in a table (see inclusion
should be described in as much detail as possible to enable
If one of the review authors is an author or coauthor of one
of the included trials, this person should not be involved in any
decisions regarding the data extraction of the trial at issue.
The CBRG recommends that authors use a standardized form
for data extraction that will facilitate the comparison process.
It is advisable to pilot test the data extraction form to minimize
misinterpretations or later disagreements. If there are disagree-
ments, consensus should be achieved by discussion among the
should be consulted. If the article does not contain sufficient
information, the authors may be contacted.
Data extraction forms will vary across different systematic
reviews, but there will also be similarities among the forms
needed for reviews on back and neck pain. Because designing a
data extraction form is time-consuming, and given the impor-
tant function of data extraction forms, it may be helpful to
profit and learn from experiences of others. Examples of data
extraction forms used in other reviews can be obtained from
the CBRG website: www.cochrane.iwh.on.ca.
Minimum Criteria. Regardless of whether the authors use a
quantitative analysis (meta-analysis) or not, the results from
studies should only be combined when they are judged to be
sufficiently clinically similar to yield meaningful results. This
means review authors should avoid combining studies that are
clinically heterogeneous for populations, interventions, com-
parisons, or outcomes. A meta-analysis should be conducted
whenever trials measuring a specific outcome at similar fol-
low-up (short-term and/or long-term) report sufficient data to
do so. When a meta-analysis is performed with only a subset of
trials, review authors should assess whether the results of the
studies not reported quantitatively are consistent with the
meta-analysis. The analysis should include an explicit descrip-
tion of the comparisons (Figure 1).
Short-term follow-up refers to outcomes that are measured
closest to 4 weeks after randomization; it could be as short as 7
days in a trial of analgesics and as long as 12 weeks in a trial of
exercise therapy. Intermediate follow-up refers to measures taken
closest to 1 year. Long-term surgical outcomes should be mea-
be measured after the treatment is completed.
The Editorial Board refers the reader to Chapter 9 of the
Cochrane Handbook for Systematic Reviews of Interventions7
for further guidance on data analysis.
Table 2. Sources of Risk of Bias
  1. Was the method of randomization adequate?
  2. Was the treatment allocation concealed?
Was knowledge of the allocated interventions adequately prevented during the study?
  3. Was the patient blinded to the intervention?
  4. Was the care provider blinded to the intervention?
  5. Was the outcome assessor blinded to the intervention?
Were incomplete outcome data adequately addressed?
  6. Was the drop-out rate described and acceptable?
  7. Were all randomized participants analysed in the group to which they were allocated?
  8. Are reports of the study free of suggestion of selective outcome reporting?
Other sources of potential bias:
  9. Were the groups similar at baseline regarding the most important prognostic indicators?
  10. Were co-interventions avoided or similar?
  11. Was the compliance acceptable in all groups?
  12. Was the timing of the outcome assessment similar in all groups?

results from RCTs (Table 1). If review authors include designs
other than RCTs, the data should be analyzed separately and
contrasted with the results from the primary analysis.
If one of the review authors is an author or coauthor of one
of the included trials, this person should not be involved in any
data analysis that includes the trial at issue.
Quantitative Analysis. If it is clinically relevant and statisti-
cally justified to combine the results, statistical pooling should
be performed that provides an overall estimate of effect, with a
95% confidence interval for each outcome.40,41 The Editorial
Board recommends contacting a statistician before performing
a quantitative analysis. A meta-analysis should start by exam-
ining potential publication and other biases with a funnel plot
to explore asymmetry among trial results.42 If asymmetry is
plots may be misleading and should be interpreted cautiously.43
Formal statistical tests also exist, but there is no consensus
regarding the strengths and weaknesses of these tests.44–46
For the meta-analysis of dichotomous outcomes, the relative
risk, risk difference, or odds ratio can be used to summarize the
effect. Empirical evidence from 125 meta-analyses showed that
summary odds ratios and risk differences usually lead to similar
conclusions about treatment effect, but that risk differences are
substantially more heterogeneous.47 For continuous outcomes,
mean differences from each trial can be combined. If the continu-
ous outcomes are not directly combinable—that is, if different
instruments are used for the same outcome measurements—
time-to-event data (e.g., return-to-work), survival analysis is the
If data are not presented in a way that can be easily included in a
meta-analysis, review authors should try to calculate effect sizes.
For example, for trials that report a mean outcome but no stan-
dard deviation, one could estimate the standard deviation by tak-
ing the mean standard deviation weighted by the relevant treat-
ment group’s sample size across all other trials that reported
standard deviations for the same outcome.
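The imputation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `impute_sd` and the (sd, n) input layout are hypothetical, and the numbers are invented, not trial data.

```python
from typing import Sequence, Tuple


def impute_sd(reported: Sequence[Tuple[float, int]]) -> float:
    """Sample-size-weighted mean of standard deviations from other trials.

    `reported` holds (sd, n) pairs for the relevant treatment group in each
    trial that did report a standard deviation for the same outcome.
    """
    total_n = sum(n for _, n in reported)
    if total_n <= 0:
        raise ValueError("need at least one trial with a positive sample size")
    return sum(sd * n for sd, n in reported) / total_n


# Hypothetical (sd, n) pairs for illustration only.
other_trials = [(20.0, 50), (25.0, 100), (30.0, 50)]
imputed = impute_sd(other_trials)  # (1000 + 2500 + 1500) / 200
```

An imputed value of this kind is an approximation, so a sensitivity analysis that excludes the affected trial may be worth considering.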
There are 2 statistical models for combining data in a meta-
analysis: the fixed-effect model and the random-effects model.40
Although there are arguments favoring each model, in general,
the clinical heterogeneity of the back and neck pain literature
suggests that the assumptions underlying the random-effects
model are better suited to statistical combination of trials in
this field. However, the random-effects model does not account
for the heterogeneity, does not explain it, and does not take it
away. Careful analysis of heterogeneity, that is, of study
characteristics that might explain differences among the results,
is always important.49 The characteristics of participants, types
of interventions, and the exact outcome values should be clearly
articulated for each group of study results that are combined.
Sensitivity analyses should be performed to examine the impact of
variation in risk of bias or individual validity criteria (refer
to the “Assessing Risk of Bias” section).

Table 3. Criteria for a Judgment of “Yes” for the Sources of Risk of Bias
1. A random (unpredictable) assignment sequence. Examples of adequate methods are coin toss (for studies with 2 groups), rolling a die
   (for studies with 2 or more groups), drawing of balls of different colors, drawing of ballots with the study group labels from a dark
   bag, computer-generated random sequence, pre-ordered sealed envelopes, sequentially-ordered vials, telephone call to a central
   office, and pre-ordered list of treatment assignments. Examples of inadequate methods are alternation, birth date, social insurance/
   security number, date on which they are invited to participate in the study, and hospital registration number.
2. Assignment generated by an independent person not responsible for determining the eligibility of the patients. This person has no
   information about the persons included in the trial and has no influence on the assignment sequence or on the decision about
   eligibility of the patient.
3. This item should be scored “yes” if the index and control groups are indistinguishable for the patients or if the success of blinding was
   tested among the patients and it was successful.
4. This item should be scored “yes” if the index and control groups are indistinguishable for the care providers or if the success of
   blinding was tested among the care providers and it was successful.
5. Adequacy of blinding should be assessed for the primary outcomes. This item should be scored “yes” if the success of blinding was
   tested among the outcome assessors and it was successful or:
   –for patient-reported outcomes in which the patient is the outcome assessor (e.g., pain, disability): the blinding procedure is
    adequate for outcome assessors if participant blinding is scored “yes”
   –for outcome criteria assessed during scheduled visit and that supposes a contact between participants and outcome assessors
    (e.g., clinical examination): the blinding procedure is adequate if patients are blinded, and the treatment or adverse effects of the
    treatment cannot be noticed during clinical examination
   –for outcome criteria that do not suppose a contact with participants (e.g., radiography, magnetic resonance imaging): the blinding
    procedure is adequate if the treatment or adverse effects of the treatment cannot be noticed when assessing the main outcome
   –for outcome criteria that are clinical or therapeutic events that will be determined by the interaction between patients and care
    providers (e.g., co-interventions, hospitalization length, treatment failure), in which the care provider is the outcome assessor: the
    blinding procedure is adequate for outcome assessors if item “4” (caregivers) is scored “yes”
   –for outcome criteria that are assessed from data of the medical forms: the blinding procedure is adequate if the treatment or
    adverse effects of the treatment cannot be noticed on the extracted data
6. The number of participants who were included in the study but did not complete the observation period or were not included in the
   analysis must be described and reasons given. If the percentage of withdrawals and drop-outs does not exceed 20% for short-term
   follow-up and 30% for long-term follow-up and does not lead to substantial bias, a “yes” is scored. (N.B. these percentages
   are arbitrary, not supported by literature).
7. All randomized patients are reported/analyzed in the group they were allocated to by randomization for the most important moments of
   effect measurement (minus missing values) irrespective of non-compliance and co-interventions.
8. In order to receive a “yes”, the review author determines if all the results from all pre-specified outcomes have been adequately
   reported in the published report of the trial. This information is either obtained by comparing the protocol and the report, or in the
   absence of the protocol, assessing that the published report includes enough information to make this judgment.
9. In order to receive a “yes”, groups have to be similar at baseline regarding demographic factors, duration and severity of complaints,
   percentage of patients with neurological symptoms, and value of main outcome measure(s).
10. This item should be scored “yes” if there were no co-interventions or they were similar between the index and control groups.
11. The reviewer determines if the compliance with the interventions is acceptable, based on the reported intensity, duration, number and
    frequency of sessions for both the index intervention and control intervention(s). For example, physiotherapy treatment is usually
    administered over several sessions; therefore it is necessary to assess how many sessions each patient attended. For
    single-session interventions (e.g., surgery), this item is irrelevant.
12. Timing of outcome assessment should be identical for all intervention groups and for all important outcome assessments.
Sometimes it may be difficult for review authors to decide
whether it is clinically relevant to combine the results from a
group of studies in a meta-analysis—for example, studies of
participants with different types of treatments, different com-
parison groups, or different clinical characteristics. There are
no simple answers here, and review authors must be explicit
about their decisions so that others may judge for themselves
whether their choices were clinically sensible.
A related but separate issue concerns statistical homogene-
ity. A test for the statistical homogeneity of studies may be
performed to evaluate whether the differences among the re-
sults of the studies are greater than those that would be found
by chance alone. However, the test is not very powerful, and
failure to reject the hypothesis of homogeneity is not proof that
the studies are homogeneous. If the hypothesis of homogeneity
is rejected, or if the review team decides, on clinical grounds,
that the studies are too heterogeneous to support statistical
combinations, then the potential sources of heterogeneity
should be examined, because the observed differences might be
caused by factors other than chance, such as different risks of
bias, characteristics of participants, interventions, control
groups, or outcomes. If the heterogeneity can be explained,
review authors should present the results of each relevant sub-
group separately. Subgroup analyses should be kept to a min-
imum and should be defined a priori, because subgroup analy-
ses can be informative but also misleading.50
Readers are referred to Chapters 9 and 10 in the Cochrane
Handbook for Systematic Reviews of Interventions7 for more
details on data analysis.
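To make the distinction between the two models and the homogeneity test concrete, the following is a minimal sketch, not a substitute for Review Manager or for advice from a statistician. It assumes inverse-variance weighting and the DerSimonian and Laird moment estimator for the between-trial variance; these guidelines do not prescribe a particular estimator, and the function name `pool_effects` is hypothetical.

```python
import math
from typing import Dict, Sequence


def pool_effects(effects: Sequence[float],
                 variances: Sequence[float]) -> Dict[str, float]:
    """Pool per-trial effect estimates (e.g., mean differences or log odds ratios)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least 2 trials with one variance per effect")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Fixed-effect model: weight each trial by the inverse of its variance.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: the chi-squared statistic for homogeneity, with k - 1 df.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = k - 1

    # Between-trial variance (assumed estimator: DerSimonian and Laird).
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects model: add tau2 to every trial's variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    se = math.sqrt(1.0 / sum_re)

    return {
        "fixed": fixed, "q": q, "df": df, "tau2": tau2,
        "random": pooled,
        "ci_low": pooled - 1.96 * se,   # 95% confidence interval
        "ci_high": pooled + 1.96 * se,
    }
```

When Q is no larger than its degrees of freedom, the estimated between-trial variance is zero and both models give the same pooled estimate. When Q is large, the random-effects interval widens, which reflects the heterogeneity without explaining it.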
Grading the Quality of Evidence and Strength
The Cochrane Handbook of Systematic Reviews of Interven-
tions (see Chapter 12)7and the CBRG Editorial Board recom-
of quantitative analyses and rate the quality of the evidence for
each important patient-centered outcome. To help readers use
this new approach, the CBRG has adapted the GRADE approach for
back and neck pain reviews. The quality of the evidence on a
specific outcome is based on 5 domains: limitations of the study
design, inconsistency of results, indirectness (inability to
generalize), imprecision (insufficient or imprecise data), and
publication bias across all studies that measure that particular
outcome.51 (See Appendix 3, Supplemental Digital Content 3, for
two examples extracted from the Cochrane reviews of
“Rehabilitation after lumbar disc surgery”52 and “Massage for
low back pain,”53 available at: http://links.lww.com/BRS/A375.)
The most important step is to choose which outcomes are
relevant for inclusion in the GRADE Evidence Profile. This is
based on the choice of “primary outcome measures,” selected a
priori in the protocol stage (see section “inclusion criteria: out-
come measures”). For each outcome, all applicable RCTs (i.e.,
those that measured the outcome) are noted in the first column,
regardless of whether they have sufficient data to be combined
in a meta-analysis. Only RCTs included in the primary analysis
of the review should be included in the GRADE Evidence Pro-
file (see section “inclusion criteria: study design”).
Population 1: Acute low-back pain with neurological symptoms.
    Comparison 1.1: traction vs. placebo/sham/no treatment
        Outcome 1.1.1: pain intensity
        Outcome 1.1.2: functional status
        Outcome 1.1.3 ………..
    Comparison 1.2: traction vs. exercise therapy
        Outcome 1.2.1: pain intensity
        Outcome 1.2.2: functional status
        Outcome 1.2.3 ………..
Population 2: Acute low back pain without neurological symptoms.
    Comparison 2.1: traction vs. placebo/sham/no treatment
        Outcome 2.1.1: pain intensity
            Follow up: ……….
Population 3: Chronic low back pain with neurological symptoms.
Figure 1. Example of an analysis for a systematic review on traction for low-back pain.
Limitations of the studies refer to the results of the risk of
bias assessment of the studies identified in column 1, using the
12 criteria recommended above. For example, studies may have a
high risk of bias (fewer than six criteria met, a fatal flaw that
puts the validity in question, or both) or a low risk of bias
(six or more criteria met, with no fatal flaws). Flaws or unmet
criteria should be explained in a footnote of the GRADE Evidence
Profile and “Summary of Findings” table.
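The high/low threshold described above can be expressed as a small helper. This is a sketch under stated assumptions: the function name and input layout are hypothetical, and an "unsure" judgment is assumed to count as a criterion that is not met, which the guidelines do not state explicitly.

```python
from typing import Mapping


def risk_of_bias(criteria_met: Mapping[int, bool],
                 fatal_flaw: bool = False) -> str:
    """Classify one trial as 'low' or 'high' risk of bias.

    `criteria_met` maps each of the 12 criteria (numbered 1 to 12) to True
    when the criterion is judged "yes". "Unsure" should be passed as False
    (an assumption of this sketch).
    """
    if set(criteria_met) != set(range(1, 13)):
        raise ValueError("expected a judgment for each of the 12 criteria")
    n_met = sum(1 for met in criteria_met.values() if met)
    # Low risk: six or more criteria met and no fatal flaw.
    return "low" if n_met >= 6 and not fatal_flaw else "high"


# Hypothetical trial meeting criteria 1 to 7 only.
example = {i: i <= 7 for i in range(1, 13)}
label = risk_of_bias(example)
```

A trial that meets many criteria is still classified as high risk when the review authors identify a fatal flaw, so the flag is deliberately kept separate from the count.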
“Inconsistency” refers to the lack of similarity of estimates
of treatment effects for the outcome across studies. Study re-
sults are considered consistent when direction, effect size, and
of the studies showing either a benefit or no benefit. In the case of
studies showing a clinically important or unimportant effect (see
section on clinical relevance). Consistency in statistical signifi-
cance is defined by the Chi squared test for heterogeneity.
to which the people, interventions and outcomes in the trials are
not comparable to those defined in the inclusion criteria of the
review. If the authors decide that there is uncertainty about gen-
Authors may suggest that their results are more applicable to a
specific population, (e.g., the effects of using insoles for young,
that the results are based on an indirect comparison (e.g., there is
strong evidence that discectomy is more effective than chemo-
nucleolysis and that chemonucleolysis is more effective than pla-
cebo: ergo, discectomy is more effective than placebo).55
“Imprecision” refers to the number of participants and events
and the width of the confidence interval for each outcome, espe-
cially when the confidence interval is sufficiently wide so that the
estimate could either support or refute the effectiveness of the
index intervention. The CBRG Editorial Group further recommends
that data be considered imprecise when only 1 study reports an
outcome, regardless of the sample size or the confidence
interval, and when fewer than 75% of the studies present data
that can be included in a meta-analysis. A footnote should
explain the exact reason why data were judged to be sparse or
imprecise.
“Publication bias” refers to the probability of selective pub-
lication of trials and outcomes. This bias might be considered if
full results for planned outcomes identified in a protocol or the
trial report are not provided in the results section. If the review
authors decide there is publication bias, they should support
their decision in a footnote.
The overall “quality of the evidence” for each outcome is
the result of the combination of the assessments in all domains.
High quality evidence = at least 75% of the RCTs with no
limitations of study design have consistent findings, direct
and precise data and no known or suspected publication
bias.
Moderate quality evidence = 1 of the domains is not met.
Low quality evidence = 2 of the domains are not met.
Very low quality evidence = 3 of the domains are not met.
The CBRG recommends adding another level:
No evidence = no RCTs were identified that addressed this
outcome.
GRADEprofiler software is available to develop the GRADE
Evidence Profiles by importing data from Review Manager 5.
See the Cochrane Handbook for Systematic Reviews of
Interventions,7 chapter 12, for more details on grading the evidence.
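The mapping from unmet domains to an overall quality level can be sketched as follows. The domain labels and function name are hypothetical, and the treatment of 4 or 5 unmet domains as "very low" is an assumption, because the levels above are defined only up to 3 unmet domains.

```python
from typing import Iterable

# Hypothetical labels for the 5 GRADE domains used in these guidelines.
DOMAINS = ("limitations", "inconsistency", "indirectness",
           "imprecision", "publication_bias")


def quality_of_evidence(n_rcts: int, domains_not_met: Iterable[str]) -> str:
    """Return the overall quality level for one outcome."""
    if n_rcts == 0:
        return "no evidence"  # additional level recommended by the CBRG
    not_met = set(domains_not_met)
    unknown = not_met - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    levels = {0: "high", 1: "moderate", 2: "low"}
    # Assumption: more than 3 unmet domains is also rated "very low".
    return levels.get(len(not_met), "very low")
```

For example, an outcome measured in 4 RCTs with imprecise data and no other concerns would be rated `quality_of_evidence(4, ["imprecision"])`, which gives "moderate".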
Further Guidance. The CBRG recommends including an as-
sessment of clinical relevance of study results in systematic
reviews. The conclusions about the effectiveness of the inter-
enable users to make a decision about the applicability of the
results to their population. The clinical relevance of the studies
should be independently assessed by at least 2 review authors.
In the 2003 Updated Method Guidelines, the Editorial
Board recommended 5 questions to assess the clinical relevance
of each included study.56,57 In 2006, Malmivaara et al, in
consultation with the Editorial Board, reviewed the set of 5
questions and articulated the details in the evaluation of
applicability and clinical relevance of results of RCTs. The final
consensus consisted of 40 items. For the most part, these items
are characteristics of the population, interventions, compari-
sons, analysis, and results that review authors are advised to
the 5 questions (Table 4). For more details and examples on
how to assess each item, review authors are encouraged to read
the original study by Malmivaara et al.58 There is ongoing
research examining how to determine important clinical
differences in pain reduction and functional improvement. At
present, there is consensus regarding minimal clinically
important changes for pain and function in back pain.59 Authors
are advised to consult the literature, which also includes key
references on neck pain,59–64 and to include both statistical and
clinical importance in their discussion (Table 4).59–64

Table 4. Questions to Determine if Results Are Clinically Relevant
Based on the data provided, can you determine if the results will be clinically relevant?
1. Are the patients described in detail so that you can decide whether they are comparable to those that you see in your practice?
2. Are the interventions and treatment settings described well enough so that you can provide the same for your patients?
3. Were all clinically relevant outcomes measured and reported?
4. Is the size of the effect clinically important?*
5. Are the likely treatment benefits worth the potential harms?
*For low-back pain, consider 30% on VAS/NRS for pain as clinically significant,59,62 and 2 to 3 points (or 8 to 12%) on the Roland-Morris Disability Questionnaire
*For neck pain, consider 3.5 to 5 U on the 50-U Neck Pain Disability Index or 7 to 10% change63,64 for function and 2.5 on a 10-U NRS (25% change) for pain.63
*For effect size, most authors use Cohen’s 3 levels.61
Small: WMD less than 10% of the scale (e.g., <10 mm on a 100 mm VAS); SMD or “d” scores <0.5; relative risk, <1.25 or >0.8 (depending on whether it reports
risk of benefit or risk of harm).
Medium: WMD 10 to 20% of the scale; SMD or “d” scores from 0.5 to <0.8; relative risk between 1.25 and 2.0, or 0.5 and 0.8.
Large: WMD >20% of the scale; SMD or “d” scores ≥0.8; relative risks >2.0 or <0.5.
VAS indicates Visual Analog Scale; NRS, Numerical Rating Scale; SMD, standardized mean difference; WMD, weighted mean difference.
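The effect size levels in Table 4 can be turned into a simple classifier. This is a sketch only: the comparison signs in the table footnote did not survive extraction, so the cut-offs below (small below 0.5, medium from 0.5 to below 0.8, large at 0.8 or above for SMD; below 10%, 10 to 20%, and above 20% of the scale for WMD) are a reading of that footnote and should be treated as assumptions. The function names are hypothetical.

```python
def classify_smd(smd: float) -> str:
    """Classify a standardized mean difference ("d" score) by magnitude."""
    d = abs(smd)
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"


def classify_wmd(wmd: float, scale_range: float) -> str:
    """Classify a weighted mean difference as a percentage of the scale."""
    if scale_range <= 0:
        raise ValueError("scale_range must be positive")
    pct = abs(wmd) * 100.0 / scale_range
    if pct < 10.0:
        return "small"
    if pct <= 20.0:
        return "medium"
    return "large"
```

For instance, a difference of 15 mm on a 100 mm VAS would be `classify_wmd(15, 100)`, a medium effect under these assumed cut-offs. Statistical magnitude is only one part of the judgment; the clinical importance thresholds in the same table still need to be considered.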
The answers to these questions should be used to inform the
discussion of the final results and conclusions; for example, in
the discussion section, clinical relevance could be included as
follows: There was high quality evidence from 10 RCTs (2000
participants) that intervention A is more effective than no treat-
ment for reducing pain in the long-term for individuals with
chronic low back pain. However, since none of the trials de-
scribed the program in detail, it is difficult to determine how to
provide this treatment to your patients and which types of
exercise healthcare providers should provide to patients (this
example is not based on real data).
Results should be listed in the same order as the compari-
consistency, the text should contain the following items
of participants), results of quantitative analysis (effect size
plus confidence interval), results of qualitative analysis (di-
rection of the effect [more/less effective, no difference]), the
ment (specifically stated), the outcome measured, and the
timing (short-term or long-term) of the outcome measure.
Example 1: There is high quality evidence from seven
trials (1268 people) that behavioral treatment is more ef-
fective than no treatment for individuals with chronic back
pain without neurologic symptoms for short-term pain re-
lief (SMD: 0.62, 95% CI: 0.25 to 0.98) and short-term
data only pooled from 5 trials).
Example 2: There is moderate quality evidence (4 trials;
354 people) that there is no statistically significant
difference in short-term pain relief between individuals
with chronic back pain with or without neurologic
symptoms who received acupuncture and those who
received placebo or sham acupuncture.

No significant difference between index and comparison group(s)
There is (high/moderate/low/very low) quality evidence from (X) trials (no. of people)
that there is no statistically significant difference in (short-term/long-term) follow-up
for (outcome Z) (RR 1.1, 95% CI 0.8 to 1.4), between individuals with
(acute/subacute/chronic) (back/neck) pain (with/without) neurological symptoms who
received (index) and those who received (comparison).
There is (high/moderate/low/very low) quality evidence from (X) trials (no. of people)
that there is no significant difference in (short-term/long-term) follow-up for (outcome
Z), between individuals with (acute/subacute/chronic) (back/neck) pain (with/without)
neurological symptoms who received (index) and those who received (comparison).

Index is more/less effective than comparison group(s)
There is (high/moderate/low/very low) quality evidence from (X) trials (no. of people)
that (index intervention) is (more/less) effective than (comparison intervention) for
individuals with (acute/subacute/chronic) (back/neck) pain (with/without) neurologic
symptoms for (outcome A) at (short-term/long-term) follow-up with RR 4.0 (95% CI
3.0 to 5.0) and (outcome B) at (short-term/long-term) follow-up with RR 4.0 (95% CI
3.0 to 5.0).
There is (high/moderate/low/very low) quality evidence from (X) trials (no. of people)
that (index intervention) is (more/less) effective than (comparison intervention) for
individuals with (acute/subacute/chronic) (back/neck) pain (with/without) neurologic
symptoms for (outcome A, B and C) in the (short-term/long-term).

Contradictory findings across trials
There is conflicting evidence from (X) trials (no. of people) about whether (index
intervention) is more/less effective than (comparison intervention) for individuals with
(acute/subacute/chronic) (back/neck) pain (with/without) neurological symptoms for
(outcome A, B and C) in the (short-term/long-term).

There were no RCTs identified that examined the effects of (index intervention) for
individuals with (acute/subacute/chronic) (back/neck) pain (with/without) neurological
* The intervention for the comparison group should be explicitly described: placebo, no
treatment, waiting list controls, or treatment B (where treatment B is specifically
authors’ conclusions in system-
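Review teams that report many comparisons may find it convenient to generate these standardized sentences from the extracted data. The sketch below fills in the "index is more/less effective" wording; the function name and parameters are hypothetical, and the exact sentence it produces is one plausible rendering of the template rather than a required format.

```python
def effectiveness_statement(quality: str, n_trials: int, n_people: int,
                            index: str, comparison: str, direction: str,
                            population: str, outcome: str, timing: str) -> str:
    """Build a standardized conclusion sentence for one comparison."""
    if quality not in {"high", "moderate", "low", "very low"}:
        raise ValueError("quality must be high, moderate, low or very low")
    if direction not in {"more", "less"}:
        raise ValueError("direction must be 'more' or 'less'")
    if timing not in {"short-term", "long-term"}:
        raise ValueError("timing must be 'short-term' or 'long-term'")
    return (
        f"There is {quality} quality evidence from {n_trials} trials "
        f"({n_people} people) that {index} is {direction} effective than "
        f"{comparison} for individuals with {population} for {outcome} "
        f"in the {timing}."
    )


# Values taken from Example 1 in the text, without the effect estimate.
sentence = effectiveness_statement(
    "high", 7, 1268, "behavioral treatment", "no treatment", "more",
    "chronic back pain without neurologic symptoms", "pain relief",
    "short-term",
)
```

Generating the sentences this way keeps the order of elements identical across comparisons; the effect size and confidence interval would still need to be appended by hand or by an extended version of the function.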
The Cochrane Handbook for Systematic Reviews of Interventions,7
chapter 11, recommends that reviews include a
tion on the quality of evidence, the magnitude of effect of
the interventions examined, and the sum of available data
on the main outcomes. The information is imported from
review. Main outcomes should be determined a priori, in
the protocol. Because the information is still new at the time
of writing, review authors are directed to the Handbook for
more detailed information. As they are developed, we will add
examples from the neck and back pain field to the Cochrane
Back Review Group website (www.cochrane.iwh.on.ca).
best current evidence on the effects of healthcare inter-
ventions. This is accomplished by updating published
reviews as new evidence becomes available. The CBRG
Trial Search coordinator updates the literature searches
at least every 2 years and more frequently if important
new evidence is published and notifies the lead author of
the results. If the lead author is unable to complete the
the right to assume responsibility for the review. This
may include finding a new lead author or a totally new
The results of the updated literature search determine
the amount of work involved in updating the review. This
may range from the editorial office staff updating the liter-
was updated, in the event no new studies are identified, to
rewriting most of the review. Depending on when the orig-
inal review was published, expectations of The Cochrane
have changed (e.g., based on the general direction of The
Cochrane Collaboration, this update of the method guide-
lines recommends using a GRADE approach rather than
Levels of Evidence for the final summary of results). Authors
should explore the CBRG (www.cochrane.iwh.on.ca) and Cochrane
Collaboration (www.cochrane.org) websites and contact the
Managing Editor of the CBRG for current information on
updating their review.
Before starting an update of his or her review, the lead
review author should consider the following issues:
● Is the current review team still willing and able to
update the review?
● Are the inclusion criteria for studies, search strate-
gies, risk of bias assessment criteria, analyses and
summary methods still appropriate?
Existing reviews may have included a combination of
(sub)acute or chronic back or neck pain. The Editorial
Board recommends that updates of reviews focus specif-
ically on (sub)acute or chronic back or neck pain. It is
also recommended that reviews focus separately on non-
specific back or neck pain, sciatica or radicular symp-
toms, or specific causes (e.g., spinal stenosis, scoliosis).
This means that some reviews will need to be split into
discussed with the Managing Editor of the CBRG.
The Editorial Board believes that systematic reviews repre-
sent one of the key advances in medical science in the past
15 years and offer a real opportunity for change in medical
systematic reviews. Some initiatives have been developed
that try to make systematic reviews more easily available
for clinicians in daily practice. Recently published Euro-
pean and North American clinical guidelines on the man-
agement of low back pain have used the evidence from
systematic reviews as the basis for their recommenda-
tions.65–69The BMJ Publishing Group publishes “Clinical
Evidence,” which is a summary of the current state of
knowledge based on Cochrane and other systematic re-
views on the prevention and treatment of a wide range of
ical Knowledge Summaries from the UK are reliable
sources of evidence-based information (based in part on
Cochrane reviews) and practical “know how” about the
common conditions managed in primary care
(http://cks.library.nhs.uk/home; accessed September 19, 2008).
inform clinical decisions that use systematic reviews as the
change is multifaceted, whether these and other implemen-
tation efforts indeed result in a change in clinicians’ behav-
ior and in improved patient outcomes remains unclear.
must meet high methodologic standards. The objective of
not intended to set a gold standard or to discourage people
from doing a systematic review. On the contrary, we encourage
people to undertake a systematic review in collaboration with
others. The Cochrane Collaboration has just released a new
version of the Cochrane Handbook for Systematic Reviews of
Interventions (February 2008) and Review Manager 5 (March
2008), the software used to produce Cochrane reviews. The CBRG
will post back and neck-related examples on our website.
Therefore, for more guidance on systematic reviews of back and
neck pain, we refer readers to the Cochrane Handbook for
Systematic Reviews of Interventions
(http://www.cochrane.org/resources/handbook/index.htm), the
Review Manager website (http://www.cc-ims.net/RevMan), the
Cochrane Back Review Group, Institute for Work &
Health, Toronto, Ontario, Canada, M5G 2E9. Telephone:
(416) 927-2027, fax: (416) 927-4167.
● Many reviews of therapeutic interventions for
spinal disorders have been published. It is impor-
tant that these reviews use adequate systematic
methods to minimize bias.
● Previous method guidelines for systematic re-
views in the field of spinal disorders were updated.
● These method guidelines include recommenda-
tions that are mandatory (minimum criteria) and op-
tional (further guidance) for review authors conduct-
● The Cochrane Back Review Group now recom-
mends using the GRADE approach to determine
the overall quality of the evidence for important
patient-centered outcomes across studies.
● The method guidelines include a new section on
● Others may find these guidelines useful to plan,
conduct, or evaluate systematic reviews in the field
of spinal disorders.
Supplemental digital content is available for this article.
on the journal’s Web site (www.spinejournal.com).
1. Moher D, Cook DJ, Eastwood S, et al. Improving the quality of reports of
meta-analyses of randomised controlled trials: the QUOROM statement.
Quality of Reporting of Meta-analyses. Lancet 1999;354:1896–900.
2. Assendelft WJ, Koes BW, Knipschild PG, et al. The relationship between
methodological quality and conclusions in reviews of spinal manipulation.
3. Furlan AD, Clarke J, Esmail R, et al. A critical review of reviews on the
treatment of chronic low back pain. Spine 2001;26:E155–62.
4. Hoving JL, Gross AR, Gasner D, et al. A critical appraisal of review articles
on the effectiveness of conservative treatment for neck pain. Spine 2001;26:
5. van Tulder MW, Assendelft WJ, Koes BW, et al. Method guidelines for
systematic reviews in the Cochrane collaboration back review group for
spinal disorders. Spine 1997;22:2323–30.
6. van Tulder M, Furlan A, Bombardier C, et al. Updated method guidelines for
systematic reviews in the Cochrane collaboration back review group. Spine
7. Higgins J, Green S, eds. Cochrane Handbook for Systematic Reviews of
Interventions Version 5.0.0 [updated February 2008].The Cochrane Collab-
8. Glanville JM, Lefebvre C, Miles JN, et al. How to identify randomized
controlled trials in MEDLINE: ten years on. J Med Libr Assoc 2006;94:
9. Minozzi S, Pistotti V, Forni M. Searching for rehabilitation articles on MED-
LINE and EMBASE: an example with cross-over design. Arch Phys Med
10. Sampson M, Barrowman NJ, Moher D, et al. Should meta-analysts search
Embase in addition to Medline? J Clin Epidemiol 2003;56:943–55.
11. Woods D, Trewheellar K. Medline and Embase complement each other in
literature searches. BMJ 1998;316:1166.
12. Suarez-Almazor ME, Belseck E, Homik J, et al. Identifying clinical trials in
the medical literature with electronic databases: MEDLINE alone is not
enough. Control Clin Trials 2000;21:476–87.
13. Day D, Furlan A, Irvin E, et al. Simplified search strategies were effective in
identifying clinical trials of pharmaceuticals and physical modalities. J Clin
14. Avenell A, Handoll HH, Grant AM. Lessons for search strategies from a
systematic review, in The Cochrane Library, of nutritional supplementation
trials in patients after hip fracture. Am J Clin Nutr 2001;73:505–10.
15. Bakkalbasi N, Bauer K, Glover J, et al. Three options for citation tracking:
Google Scholar, Scopus and Web of Science. Biomed Digit Libr 2006;3:7.
16. Golder S, McIntosh HM, Duffy S, et al. Developing efficient search strategies
Libr J 2006;23:3–12.
17. Furlan AD, Irvin E, Bombardier C. Limited search strategies were effective in
18. Deyo RA, Battie M, Beurskens AJ, et al. Outcome measures for low back
pain research: a proposal for standardized use. Spine 1998;23:2003–13.
19. Egger M, Zellweger-Zahner T, Schneider M, et al. Language bias in random-
ised controlled trials published in English and German. Lancet 1997;350:
20. Egger M, Ebrahim S, Smith GD. Where now for meta-analysis? Int J Epide-
21. Gregoire G, Derderian F, Le Lorier J. Selecting the language of the publica-
tions included in a meta-analysis: is there a Tower of Babel bias? J Clin
22. Juni P, Holenstein F, Sterne J, et al. Direction and impact of language bias in
meta-analyses of controlled trials: empirical study. Int J Epidemiol 2002;31:
24. Pham B, Klassen TP, Lawson ML, et al. Language of publication restrictions
in systematic reviews gave different results depending on whether the inter-
vention was conventional or complementary. J Clin Epidemiol 2005;58:
25. Furlan AD, Tomlinson G, Jadad AA, et al. Methodological quality and
homogeneity influenced agreement between randomized trials and nonran-
domized studies of the same intervention for back pain. J Clin Epidemiol
26. Colditz GA, Miller JN, Mosteller F. How study design affects outcomes in
comparisons of therapy. I: Medical. Stat Med 1989;8:441–54.
27. Kunz R, Oxman AD. The unpredictability paradox: review of empirical
comparisons of randomised and non-randomised clinical trials. BMJ 1998;
28. Chalmers TC, Celano P, Sacks HS, et al. Bias in treatment assignment in
controlled clinical trials. N Engl J Med 1983;309:1358–61.
29. Miller JN, Colditz GA, Mosteller F. How study design affects outcomes in
comparisons of therapy. II: Surgical. Stat Med 1989;8:455–66.
30. Schulz KF, Chalmers I, Hayes RJ, et al. Empirical evidence of bias. Dimen-
in controlled trials. JAMA 1995;273:408–12.
Int J Epidemiol 2005;34:79–87.
32. Boutron I, Moher D, Tugwell P, et al. A checklist to evaluate a report of a
nonpharmacological trial (CLEAR NPT) was developed using consensus.
J Clin Epidemiol 2005;58:1233–40.
33. Verhagen AP, de Vet HC, de Bie RA, et al. Balneotherapy and quality assess-
ment: interobserver reliability of the Maastricht criteria list and the need for
blinded quality assessment. J Clin Epidemiol 1998;51:335–41.
34. van Tulder MW, Suttorp M, Morton S, et al. Empirical evidence of an
trials of low back pain. Spine 2009;34:1685–92.
35. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of
randomized clinical trials: is blinding necessary? Control Clin Trials 1996;
36. Berlin JA. Does blinding of readers affect the results of meta-analyses? Uni-
versity of Pennsylvania Meta-analysis Blinding Study Group. Lancet 1997;
37. Hill CL, LaValley MP, Felson DT. Discrepancy between published report
and actual conduct of randomized clinical trials. J Clin Epidemiol 2002;55:
38. Detsky AS, Naylor CD, O’Rourke K, et al. Incorporating variations in the
quality of individual randomized trials into meta-analysis. J Clin Epidemiol
39. Verhagen AP, de Vet HC, de Bie RA, et al. The art of quality assessment of
RCTs included in systematic reviews. J Clin Epidemiol 2001;54:651–4.
40. Normand SL. Meta-analysis: formulating, evaluating, combining, and re-
porting. Stat Med 1999;18:321–59.
41. Whitehead A, Whitehead J. A general parametric approach to the meta-
analysis of randomized clinical trials. Stat Med 1991;10:1665–77.
42. Sterne JA, Egger M, Smith GD. Systematic reviews in health care: investigat-
ing and dealing with publication and other biases in meta-analysis. BMJ
43. Tang JL, Liu JL. Misleading funnel plot for detection of bias in meta-
analysis. J Clin Epidemiol 2000;53:477–84.
44. Begg CB, Mazumdar M. Operating characteristics of a rank correlation test
for publication bias. Biometrics 1994;50:1088–101.
45. Egger M, Davey SG, Schneider M, et al. Bias in meta-analysis detected by a
simple, graphical test. BMJ 1997;315:629–34.
46. Sterne JA, Egger M. Funnel plots for detecting bias in meta-analysis: guide-
lines on choice of axis. J Clin Epidemiol 2001;54:1046–55.
47. Engels EA, Schmid CH, Terrin N, et al. Heterogeneity and statistical signif-
icance in meta-analysis: an empirical study of 125 meta-analyses. Stat Med
48. Williamson PR, Smith CT, Hutton JL, et al. Aggregate data meta-analysis
with time-to-event outcomes. Stat Med 2002;21:3337–51.
49. Poole C, Greenland S. Random-effects meta-analyses are not always conser-
vative. Am J Epidemiol 1999;150:469–75.
50. Hahn S, Williamson PR, Hutton JL, et al. Assessing the potential for bias in
meta-analysis due to selective reporting of subgroup analyses within studies.
Stat Med 2000;19:3325–36.
51. Atkins D, Best D, Briss PA, et al; GRADE Working Group. Grading quality
of evidence and strength of recommendations. BMJ 2004;328:1490.
52. Ostelo RW, Costa LO, Maher CG, et al. Rehabilitation after lumbar disc
surgery. Cochrane Database Syst Rev. 2008:CD003007.
53. Furlan AD, Imamura M, Dryden T, et al. Massage for low-back pain. Co-
chrane Database Syst Rev. 2008:CD001929.
54. Sahar T, Cohen M, Ne’eman V, et al. Insoles for prevention and treatment of
back pain. Cochrane Database Syst Rev. 2007:CD005275.
55. Gibson JNA, Waddell G. Surgical interventions for lumbar disc prolapse.
Cochrane Database Syst Rev. 2007:CD001350.
56. Guyatt GH, Sackett DL, Cook DJ. Users’ guides to the medical literature. II.
How to use an article about therapy or prevention. B. What were the results
and will they help me in caring for my patients? Evidence-Based Medicine
Working Group. JAMA 1994;271:59–63.
57. Shekelle PG, Andersson G, Bombardier C, et al. A brief introduction to the
critical reading of the clinical literature. Spine 1994;19:2028S–31S.
58. Malmivaara A, Koes BW, Bouter LM, et al. Applicability and clinical rele-
vance of results in randomized controlled trials: the Cochrane review on
exercise therapy for low back pain as an example. Spine 2006;31:1405–9.
59. Ostelo RW, Deyo RA, Stratford P, et al. Interpreting change scores for pain
and functional status in low back pain: towards international consensus
regarding minimal important change. Spine 2008;33:90–4.
60. Bombardier C, Hayden J, Beaton DE. Minimal clinically important differ-
ence. Low back pain: outcome measures. J Rheumatol 2001;28:431–8.
61. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 1st ed. New
York, San Francisco, London: Academic Press; 1988:1–474.
62. Farrar JT, Young JP Jr, LaMoreaux L, et al. Clinical importance of changes
in chronic pain intensity measured on an 11-point numerical pain rating
scale. Pain 2001;94:149–58.
63. Pool JJ, Ostelo RW, Hoving JL, et al. Minimal clinically important change of
the Neck Disability Index and the Numerical Rating Scale for patients with
neck pain. Spine 2007;32:3047–51.
64. Stratford PW, Riddle DL, Binkley JM, et al. Using the Neck Disability Index
to make decisions concerning individual patients. Physiother Can 1999;
the management of chronic nonspecific low back pain. Eur Spine J 2006;
66. van Tulder M, Becker A, Bekkering T, et al. Chapter 3. European guidelines
for the management of acute nonspecific low back pain in primary care.
Eur Spine J 2006;15(suppl 2):S169–91.
67. Burton AK, Balague F, Cardon G, et al. Chapter 2. European guidelines for
prevention in low back pain: November 2004. Eur Spine J 2006;15(suppl
68. Chou R, Huffman LH. Nonpharmacologic therapies for acute and chronic
low back pain: a review of the evidence for an American Pain Society/
American College of Physicians clinical practice guideline. Ann Intern Med
69. Chou R, Huffman LH. Medications for acute and chronic low back pain: a
review of the evidence for an American Pain Society/American College of
Physicians clinical practice guideline. Ann Intern Med 2007;147:505–14.