Assessing the Methodological Quality of Systematic Reviews
THE DEVELOPMENT OF AMSTAR
Beverley Julia Shea
to obtain the degree of Doctor at
the Vrije Universiteit Amsterdam,
by authority of the rector magnificus
prof.dr. L.M. Bouter,
to be defended in public
before the doctoral committee
of the Faculty of Medicine
on 8 October 2008 at 13:45
in the auditorium of the university,
De Boelelaan 1105
by
Beverley Julia Shea
born in St. John’s, Newfoundland and Labrador, Canada
supervisors (promotoren): prof.dr. L.M. Bouter
prof.dr. M. Boers
co-supervisor (copromotor): prof.dr. J.M. Grimshaw
The study presented in this thesis was performed at the Institute for Research in Extramural Medicine
(EMGO Institute) of the VU University Medical Center (VUmc), the Netherlands and the Department
of Clinical Epidemiology and Biostatistics (KEB) of the VU University Medical Center, the Netherlands.
The EMGO Institute participates in the Netherlands School of Primary Care Research (CaRe), which
was re-acknowledged in 2006 by the Royal Netherlands Academy of Arts and Sciences (KNAW).
Assessing the quality of reports of systematic reviews: The QUOROM statement compared to other tools
Scope for improvement in the reporting quality of systematic reviews from the Cochrane musculoskeletal group
Does updating improve the methodological and reporting quality of review quality and the reporting quality of Cochrane reviews?
Development of AMSTAR: a measurement tool to assess methodological quality of systematic reviews
Internal validation of AMSTAR: a measurement tool to assess systematic reviews
External validation of a measurement tool to assess systematic reviews (AMSTAR)
About the author
A systematic review is a comprehensive assessment of the medical literature on a topic of interest using
a priori specified rules for the search, identification and eligibility of the pertinent studies, and for the
abstraction of relevant data [1]. The systematic nature of the process, which is carried out according to
clear-cut rules, differentiates a systematic review from a traditional review authored by experts without
the self-imposed discipline of specified rules. Because of the explosion of biomedical publishing in the
latter half of the 20th century (perhaps 30,000 journals and upwards of two million primary research
articles a year), keeping up with primary research is an impossible feat [2]. Systematic reviews have become
essential tools of social and medical study. Patients and clinicians are making increasing use of systematic
reviews in making evidence-based decisions about treatments [3]. Those who rely on systematic reviews
assume that the methodologies of organizations such as the Cochrane Collaboration and the Campbell
Collaboration are rigorously developed and that the quality of their reviews is therefore the best it can
possibly be [4,5]. But how can the users of systematic reviews know whether their confidence is justified?
The Cochrane Collaboration, an international multi-disciplinary organization established in
1993, has the avowed task of preparing, maintaining, and disseminating systematic, up-to-date reviews
of health care [4]. The purpose of conducting systematic reviews is to gain valid and reliable information that
will guide evidence-based decisions. The issues reported are often complex, but when the evidence
gathered is strong, and its implications clear, it is hoped that the review will influence decision making
and help to shape health policy.
The production of comprehensive and accessible pre-appraised resources supports an evidence-based
approach to decision making. Systematic reviews have added valuable information to the pool of
resources [6,7]. The Cochrane Library now contains over 3,500 well-designed systematic reviews covering a
variety of problems [8]. If systematic reviews are to be useful, serious consideration must be given to how
they are conducted and reported.
The ultimate test of a systematic review is whether its report justifies confidence that it is evidence-based
and that it accurately reflects the process followed during its various stages. One way to assess the merits
of a systematic review is to examine the validity of its report. However, a systematic review may not
reflect the manner in which its authors conducted their review so much as it does their ability to write
comprehensively.
Before and during the research for this thesis, several authors documented
considerable variation in the quality of published reviews. Silagy surveyed 28 systematic reviews
published in primary care journals during 1991 using 8 methodological criteria understood to be
important in the reporting of systematic reviews [10]. Each criterion had a maximum score of 2, for a total
score of 16. Silagy reported that only 25% of these systematic reviews obtained a total score higher than 8.
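The scoring arithmetic behind these figures (eight criteria, each rated from 0 to 2, giving a maximum total of 16) can be illustrated with a minimal Python sketch. The ratings below are invented for illustration and are not Silagy's data. The function names are our own assumptions, not part of the original study.

```python
# Illustrative sketch only: ratings are hypothetical, not Silagy's data.
MAX_PER_CRITERION = 2   # each criterion is rated 0, 1, or 2
N_CRITERIA = 8          # eight methodological criteria
MAX_TOTAL = MAX_PER_CRITERION * N_CRITERIA  # 16


def total_score(ratings):
    """Sum the per-criterion ratings of one review into a total out of 16."""
    if len(ratings) != N_CRITERIA:
        raise ValueError(f"expected {N_CRITERIA} ratings, got {len(ratings)}")
    if any(r not in range(MAX_PER_CRITERION + 1) for r in ratings):
        raise ValueError("each rating must be 0, 1, or 2")
    return sum(ratings)


def share_above(totals, threshold=8):
    """Fraction of reviews whose total score is strictly above the threshold."""
    return sum(t > threshold for t in totals) / len(totals)


# Four hypothetical reviews, each rated on the eight criteria.
reviews = [
    [2, 2, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 2, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
]
totals = [total_score(r) for r in reviews]
print(totals, share_above(totals))
```

Note that a review scoring exactly 8 does not count as "higher than 8", which is why the comparison is strict.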
Recently, Moja et al. assessed 965 systematic reviews published between 1995 and 2002 in the Cochrane
Library and in paper-based journals. They concluded that the reviews failed to take several
important factors into account in their interpretation of results; that methods for the assessment of
methodological quality by systematic reviews are still in their infancy; and that there is substantial room
for improvement.
Silagy et al. studied the distinction between methodological quality and reporting quality. They found
evidence that researchers deviated from their original study protocol during the execution of the review
process without clearly documenting these deviations in the final study report [12]. Liberati et al. found
little distinction between methodological quality and reporting quality [13]. Findings such as those
reported above have led to studies assessing methodological quality and to the development
of instruments such as the Quality of Reporting of Meta-analyses (QUOROM) statement. Such initiatives
should encourage improvement in the reporting quality of systematic reviews [14,15].
As we experienced in the course of the studies described in this thesis, there are important differences
between assessing the methodological quality of systematic reviews and assessing their reporting
quality [16-18]. The first, methodological quality, considers how well the systematic review was conducted
(literature searching, pooling of data, etc.). The second, reporting quality, considers how well systematic
reviewers have reported their methodology and findings.
In summary, systematic reviews have become an integral part of scholarly practice. However, methods
to assess their quality are not yet fully developed.
This thesis comprises two sets of three related research projects, each guided by its own objective
and reported in its own Chapter.
Objective 1: To review the current status of instruments used to assess the reporting quality of systematic
reviews (Chapter 2)
To select the most appropriate instrument for further use, we conducted a study to compile and appraise
a complete list of all available tools for the assessment of systematic reviews. We improved the descriptors
of the instrument that performed best (overview quality assessment questionnaire, OQAQ).
Objective 2: To assess the reporting quality of a complete subset of electronic systematic reviews, by applying
OQAQ and QUOROM (Chapter 3)
We applied the enhanced OQAQ and QUOROM to all 57 Cochrane Musculoskeletal Group (CMSG)
systematic reviews published in the Cochrane Database of Systematic Reviews of the Cochrane Library,
Issue 4, 2002.
Objective 3: To determine the impact of updating on the methodological quality and reporting quality of a
subset of systematic reviews (Chapter 4)
Under this objective we assessed a newly selected sample of updated systematic reviews before and after
their updating using the same two instruments. The sample covered a wide variety of health topics
published in the Cochrane Library. This exercise provided a second test of the applicability of the
instruments used under objective 2.
We concluded there was room for a new instrument focused on methodological quality (rather than
reporting quality) of systematic reviews, with improved content and feasibility. The next 3 Chapters
describe the development and validation of this new instrument, termed AMSTAR, an acronym for
“A MeaSurement Tool to Assess Systematic Reviews”.
Objective 4: To develop a valid and reliable quality assessment instrument for systematic reviews (Chapter 5)
AMSTAR was developed from a comprehensive set of possible items retrieved from the OQAQ and a
comprehensive list drawn up by Sacks, to which three new items/dimensions were added. All items were
scored in a large dataset of reviews, and these scores were subjected to factor analysis. An international
panel of experts appraised the resulting domains and selected the best item per domain in a nominal
group consensus process.
Objective 5: To test the validity and reliability of AMSTAR in the source dataset (Chapter 6)
We tested the new instrument by having two assessors apply it and the two original instruments to a
random sample of 30 systematic reviews (out of the 151 selected under objective 4). The purpose of this
exercise was to compare the validity, reliability and feasibility of the new instrument with those of the
two existing instruments.
Objective 6: To externally test the validity and reliability of AMSTAR (Chapter 7)
We conducted a second validation study to test the reliability of AMSTAR using a separate set of
reviews. External assessors naive to AMSTAR applied the instrument to a set of 42 reviews assessing the
use of proton pump inhibitors for gastroesophageal reflux disease, dyspepsia and peptic ulcer disease. The
following table summarizes the various measurement instruments used in the course of our research, the
sets of systematic reviews we used in this study, and the systematic reviews to which we applied each
instrument (Table 1).
Table 1: Measurement instruments and reviews studied
[Table body not recoverable from the source; the surviving cell fragments refer to sets of reviews derived from a dataset of 151 reviews.]
1. Enhanced overview quality assessment questionnaire (OQAQ)
2. Quality of reporting of meta-analysis (QUOROM)
3. Sacks (developed by Henry Sacks et al.)
4. A measurement tool to assess systematic reviews (AMSTAR)
5. Cochrane musculoskeletal group (CMSG)
Research done on each of the objectives is described in an article that has been published in, or
submitted to, a scientific journal. Consequently, each Chapter can be read independently. Because of the
inter-related nature of the objectives, some degree of overlap in introduction and methods sections could
not be avoided. In some instances, an addendum has been added to accommodate insights gained after
publication of these Chapters.
A general discussion of our overall project is provided following the seven main Chapters of this thesis.
The discussion describes the major findings for each of the objectives listed in this introduction,
outlines some of the challenges faced in our research, and provides recommendations for future
application and further research (Chapter 8).
We invite you to follow us as we describe the journey that led to the development of AMSTAR.
1. Chalmers I, Altman DG. Systematic Reviews. London: BMJ Publications 1995.
2. Davies HT, Crombie IK. What is a systematic review? Hayward Medical Communications 2003.
3. Tugwell P, Shea B, Boers M, Brooks P, Simon L, Strand V, Wells G. Evidence Based Rheumatology.
BMJ Books 2004.
4. Bero L, Rennie D. The Cochrane Collaboration: preparing, maintaining, and disseminating reviews
of the effects of health care. JAMA 1995 Dec; 274(24): 1935-38.
5. The Campbell Collaboration. http://www.campbellcollaboration.org/
6. Cook DJ, Mulrow CD, Haynes RB. Systematic Review: Synthesis of Best Evidence for Clinical
Decisions. Ann Intern Med 1997 Mar; 126(5): 376-80.
7. Mulrow CD, Cook DJ, Davidoff F. Systematic Review: Critical Links in the Great Chain of Evidence.
Ann Intern Med 1997 Mar; 126(5): 389.
8. The Cochrane Library, Issue 3, 2007. Chichester, UK: John Wiley & Sons, Ltd.
9. Moher D, Cook DJ, Jadad AR, Tugwell P, Moher M, Jones A, Pham B, Klassen TP. Assessing the
quality of randomized controlled trials: implications for the conduct of meta-analyses. Health
Technol Assess 1999; 3(12): i-iv, 1-98.
10. Silagy C. An analysis of review articles published in primary care journals. Fam Pract 1993 Sept;
11. Moja L, Telaro E, D'Amico R, Moschetti I, Coe L, Liberati A and on behalf of the Metaquality
Study Group. Assessment of methodological quality of primary studies by systematic reviews:
results of the metaquality cross sectional study. BMJ 2005 May; 330(7499): 1053.
12. Silagy C, Middleton P, Hopewell S. Publishing protocols of systematic reviews: Comparing what
was done to what was planned. JAMA 2002 Jun; 287(21): 2831-34.
13. Liberati A, Himel HN, Chalmers TC. A quality assessment of randomized control trials of primary
treatment of breast cancer. J Clin Oncol 1986 Jun; 4(6): 942-51.
14. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports
of meta-analyses of randomized controlled trials: the QUOROM statement. Lancet 1999 Nov;
15. Shea B, Dubé C, Moher D. Assessing the quality of reports of systematic reviews: the QUOROM
statement compared to other tools. In: Egger M, Smith GD, Altman DG, eds. Systematic Reviews in
Health Care. Meta-analysis in context. BMJ Books 2001: 122-39.
16. The AGREE Collaboration. Writing Group: Cluzeau FA, Burgers JS, Brouwers M, Grol R, Mäkelä
M, Littlejohns P, Grimshaw J, Hunt C. Development and validation of an international appraisal
instrument for assessing the quality of clinical practice guidelines: the AGREE project. Quality and
Safety in Health Care 2003; 12(1): 18-23.
17. Smidt N, Rutjes A, van der Windt D, Ostelo R, Reitsma J, Bossuyt P, Bouter LM, de Vet H.
Quality of Reporting of Diagnostic Accuracy Studies. Radiology 2005 May; 235(2): 347-53.
18. Moher D, Soeken K, Sampson M, Campbell K, Ben Perot L, Berman B. Assessing the quality of
reports of systematic reviews in pediatric complementary and alternative medicine. BMC Pediatr
ASSESSING THE QUALITY OF REPORTS OF SYSTEMATIC REVIEWS:
THE QUOROM STATEMENT COMPARED TO OTHER TOOLS
Systematic reviews within health care are conducted retrospectively, which makes them susceptible to
potential sources of bias. In the last few years, steps have been taken to develop evidence-based methods
to help improve the reporting quality of randomized trials in the hope of reducing bias when trials are
included in meta-analyses. Similar efforts are now underway for reports of systematic reviews.
This paper describes the development of the QUOROM statement and compares it to other instruments
identified through a systematic review. There are many checklists and scales available to be used as
evaluation tools, but most are missing important evidence based items when compared against the
A pilot study suggests considerable room for improvement in the quality of reports of systematic reviews
using four different instruments. It is hoped that journals will support the QUOROM statement in a
similar manner to the CONSORT statement.
Shea B, Dubé C, Moher D. Assessing the quality of reports of systematic reviews: The QUOROM statement
compared to other tools. In: Egger M, Smith GD, Altman DG, eds. Systematic Reviews in Health Care.
Meta-analysis in context. BMJ Books 2001:122-139.
Approximately 17,000 biomedical books are published every year, along with 30,000 biomedical
journals, with an annual increase of 7% [1]. This makes it very difficult for health care professionals to stay
abreast of the most recent advances and research in their respective fields, as they would need to
read an average of 17 original articles each day [2].
To make this task slightly more manageable, health care providers and other decision-makers now have
among their information resources access to a form of clinical report called the systematic review. This is
a review in which bias has been reduced by the systematic identification, appraisal, synthesis, and if
relevant, statistical aggregation of all relevant studies on a specific topic according to a predetermined and
explicit methodology. Theoretically, such reviews can effectively summarize the accumulated research on
a topic, promote new questions on the matter, and channel the stream of clinical research towards
relevant horizons. Consequently, systematic reviews can also be important to health policy planners and
others involved in planning effective health care.
If the results of systematic reviews are to be used by health care providers and health care consumers,
they must be as free of bias (i.e., systematic error) as possible. One way to assess the
merits of a systematic review is to assess the quality of its report. It is possible that a scientific report may
reflect not how the investigators conducted their review but rather their ability to write comprehensively.
Although the data addressing this point are sparse, it appears that a scientific report is a reasonable
marker of how the project was conducted. In an assessment of the quality of 63 randomized trials in
breast cancer, Liberati and colleagues [3] reported that the average quality of reports was 50% (95% CI:
46% to 54%). Following these assessments, the investigators interviewed 62 of the corresponding authors
to ascertain whether information in the manuscripts submitted for publication had been
removed prior to publication. The authors reported that with the additional information obtained
from the interviews, the quality scores increased only marginally, to an average score of 57%. These data
come from clinical trials. We are unaware of comparable data for systematic reviews.
Choosing an appropriate evaluation tool for critically appraising the report of a systematic review is as
difficult as the assessment of the quality of reports of randomized trials. A systematic review [4] designed to
identify and appraise instruments that assess the quality of reports of randomized trials found
twenty-five scales and nine checklists. The scales differed considerably from one another in a variety of
areas, including how they defined quality, the scientific rigor with which they were developed, the number
of items they used, and the time required to use them. When six of the scales were used
to assess the same randomized trials, divergent scores and rankings were reported.
To promote consistency in the assessment of reporting quality, this Chapter identifies and
appraises instruments developed to assess the quality of reports of systematic reviews. It also evaluates
whether different instruments assessing the same meta-analysis would provide similar evidence regarding
A systematic review of published checklists and scales
A literature search was performed to take an inventory of published checklists and scales. Potentially
relevant articles were selected and the tools they described were reviewed. The consistency of quality
assessments across a convenience sample of instruments was then tested using four randomly chosen
systematic reviews. A more detailed description of this process can be found in Box 2: Methodology.
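One of the two quality measures listed in Box 2, the proportion of a tool's items that a review reports, can be illustrated with a minimal Python sketch. The instrument names, item counts and yes/no judgements below are hypothetical and are not taken from the study. They only show how a proportion makes tools of different length comparable.

```python
# Illustrative sketch only: instruments and judgements are hypothetical.
def proportion_reported(item_flags):
    """Share of a tool's items that the review report addresses (0.0 to 1.0)."""
    if not item_flags:
        raise ValueError("instrument has no items")
    return sum(item_flags) / len(item_flags)


# One hypothetical review assessed with two instruments of different length.
# True means the item was judged to be reported in the review.
assessments = {
    "instrument_A": [True, True, False, True],
    "instrument_B": [True, False, False, True, True,
                     False, True, False, True, True],
}

proportions = {
    name: proportion_reported(flags) for name, flags in assessments.items()
}
print(proportions)
```

Expressing the result as a proportion rather than a raw count keeps a short checklist and a long scale on the same 0 to 1 range.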
• MEDLINE: January 1966 to February 1999
• three independent searches with keywords: meta-analysis, review literature,
systematic or quantitative or methodologic review, overview, review,
information synthesis, integrative research review, guideline, checklist, tool,
scoring, scale, clinimetric, quality, critical reading, methodology
• PubMed “related articles” function to find others
Identification and selection
• initial screening to identify relevance
• potentially relevant articles reviewed independently by each author
• article eligible regardless of language
• article has to be a scale or checklist (defined in Annex I) designed to assess quality
of systematic reviews and meta-analyses
• checklists and scales assessed for: 1) number of items included in tool,
2) aspects of quality assessed, 3) whether or not article included explicit
statement regarding purpose of tool, and 4) time for completion of tool
• data extraction was completed in a group and a consensus was reached
• compared items in each quality assessment instrument against
• three checklists and one scale were conveniently selected to compare stability
of quality assessments across instruments
• randomly selected four systematic reviews (from pool of 400 systematic
reviews) to be used for quality assessment based on four selected instruments
• quality assessments completed as a group
• quality assessment established in two ways: 1) a quantitative estimate based
on one item from a validated scale and 2) the proportion of items reported
(as a function of the number of items included in tool) in the systematic