Guidance on the Conduct of Narrative Synthesis in
Systematic Reviews
A Product from the ESRC Methods Programme
Jennie Popay
Helen Roberts
Amanda Sowden
Mark Petticrew
Lisa Arai
Mark Rodgers
Nicky Britten
Katrina Roen and Steven Duffy
Institute for Health Research, Lancaster University
Child Health Research and Policy Unit, City University
Centre for Reviews and Dissemination, University of York
MRC Social and Public Health Sciences Unit, University of Glasgow: Mark Petticrew is funded by the
Chief Scientist Office of the Scottish Executive Department of Health
Peninsula Medical School, Universities of Exeter and Plymouth
Version 1: April 2006
Contents
Chapter 1: Introduction and the purpose of the guidance
Chapter 2: The main elements in a systematic review process
Chapter 3: Guidance on narrative synthesis – an overview
Chapter 4: Applying the guidance 1: effectiveness studies
Chapter 5: Applying the guidance 2: implementation studies
Chapter 6: Implications for future research
References
Appendix 1: Methods used in the production of the guidance
Appendix 2: Bibliography of methodological texts used in the production of the guidance
The research reported here was funded by the ESRC (Grant reference number H33250019) within
the ESRC Methods Programme. There are many people who have contributed to this work along the
way to whom we would like to extend our thanks. We are grateful to Professor Angela Dale, the
Director of the programme for her support throughout the project. We are also grateful to the
international panel of experts in research synthesis including in particular: Professor David DuBois,
University of Illinois; Dr Jeanne Daly, Mother and Child Health Research, La Trobe University;
Professor Mike Fisher, Social Care Institute for Excellence; Angela Harden, The EPPI Centre,
Institute of Education, London; Professor Cindy Mulrow, University of Texas; Dr Pierre Pluye, McGill
University; Professor Helen Thomas, McMaster University; Dr Carl Thompson, University of York; Dr
Heather McIntosh, NHS Quality Improvement Scotland; and Ros Collins, Dr Catriona McDaid and Dr
Nerys Woolacott at CRD University of York. The extensive and invaluable comments from panel
members helped us to improve on earlier versions of this guidance but the responsibility for the final
product is, of course, entirely our own. Finally, we would like to thank the individuals in our various
administrative teams and research support offices who have supported the work in many ways.
Thanks are due in particular to Erja Nikander, Yvonne Moorhouse, and Vicki Bell in Lancaster and
Mel Bryan at City who helped at various points with producing documents, travel arrangements,
organising meetings, managing budgets and to Lisa Stirk at CRD in York and Vicki Bell at Lancaster
who developed and maintain our website.
We would also like to pay tribute to Professor Sally Baldwin who died as a result of an accident in
October 2003. Sally developed the proposal for this work with us. There have been many tributes to
her extraordinary qualities as a researcher, teacher, colleague and friend. We will not repeat these
here – suffice it to say that her stimulating and good humoured contributions to our intellectual
endeavours have been badly missed.
Chapter 1: Introduction and the purpose of the guidance

Do domestic smoke alarms save lives? Can young offenders be 'scared straight' through tough penal
measures? What factors should be considered when designing and implementing a multi-sectoral
injury prevention programme in a local area? Making sense of large bodies of evidence drawn from
research using a range of methods is a challenge. Ensuring that the product of this synthesis process
can be trusted is important for policy makers, for practitioners and for the people research is intended
to benefit. There are a number of ways in which research evidence can be brought together to give
an overall picture of current knowledge that can be used to inform policy and practice decisions.
However, the trustworthiness of some of these methods remains problematic.
The guidance we set out here focuses on a particular approach - narrative synthesis. Variants of
this approach are widely used in work on evidence synthesis, including Cochrane reviews, but there is
currently no consensus on the constituent elements of narrative synthesis and the conditions for
establishing trustworthiness – notably a systematic and transparent approach to the synthesis
process with safeguards in place to avoid bias resulting from the undue emphasis on one study
relative to another – are frequently absent. This guidance therefore aims to contribute to improving
the quality of narrative approaches to evidence synthesis.
1.1 Telling stories – the nature of narrative synthesis
Narrative synthesis is sometimes viewed as a ‘second best’ approach for the synthesis of findings
from multiple studies, only to be used when statistical meta-analysis or another specialist form of
synthesis (such as meta-ethnography for qualitative studies) is not feasible. In fact, even when
specialist methods are used to synthesise findings from multiple studies, those who want to increase
the chances of a scientific synthesis being used in policy and practice are likely to find a narrative
synthesis helpful in the initial stages of a review. Recognising this, the guidance on undertaking
systematic reviews produced by The Centre for Reviews and Dissemination at the University of York
suggests that reviewers should first undertake a narrative synthesis of the results of the included
studies to help them decide what other methods are appropriate.
Narrative synthesis is a form of story telling. We are part of a story telling culture, and bringing
together evidence in a way that tells a convincing story of why something needs to be done, or needs
to be stopped, or why we have no idea whether a long established policy or practice makes a positive
difference is one of the ways in which the gap between research, policy and practice can start to be
bridged. Telling a trustworthy story is at the heart of narrative synthesis.
1.2 Narrative synthesis, narrative reviews and evidence synthesis
‘Narrative synthesis’ refers to an approach to the systematic review and synthesis of findings from
multiple studies that relies primarily on the use of words and text to summarise and explain the
findings of the synthesis. Whilst narrative synthesis can involve the manipulation of statistical data,
the defining characteristic is that it adopts a textual approach to the process of synthesis to ‘tell the
story’ of the findings from the included studies. As used here ‘narrative synthesis’ refers to a process
of synthesis that can be used in systematic reviews focusing on a wide range of questions, not only
those relating to the effectiveness of a particular intervention.
Narrative review is a phrase some commentators have used to describe more traditional literature
reviews, which are typically not systematic or transparent in their approach to synthesis. Narrative
synthesis - the focus of this guidance - in contrast, is part of a larger review process that includes a
systematic approach to searching for and quality appraising research based evidence as well as the
synthesis of this evidence. A narrative review can also be another name for a description, and the
term is used in fields as diverse as the performance review of staff and the assessment of familial
patterns in colorectal cancer. Narrative reviews in the sense of traditional literature reviews can
therefore be distinguished from narrative synthesis, as the latter refers to a specific approach to that
part of a systematic review process concerned with combining the findings of multiple studies.
Evidence synthesis includes, but is not restricted to, systematic reviews. Findings from research
using a wide range of designs including randomised controlled trials, observational studies, designs
that produce economic and qualitative data may all need to be combined to inform judgements on the
effectiveness, cost-effectiveness, appropriateness and feasibility of a wide range of interventions and
policies. Evidence syntheses may also address many other types of questions including, for
example, questions about the current state of knowledge on the causes of particular health or social
problems. They are also undertaken in diverse fields from health services research and sociology to
engineering and urban planning.
1.3 Why this guidance has been produced
The Cochrane Collaboration, established in 1993, is an international, non-profit, independent
organisation dedicated to making up-to-date, accurate information about the effects of healthcare
readily available worldwide. It produces and disseminates systematic reviews of healthcare
interventions and promotes the search for evidence in the form of clinical trials and other studies of
interventions.
Since its inception, there have been major developments in methods for the systematic review of
research evidence which have increased the reliability of the evidence about effectiveness available
to decision makers by combining findings from good quality studies which evaluate policies, specific
interventions or professional practices. However, even in reviews focusing on effectiveness, meta-
analysis is often an inappropriate approach to synthesis. Additionally, there has been increasing
recognition of the need for review and synthesis of evidence to answer questions other than those
focusing on effectiveness, particularly those relating to the local implementation of interventions
shown to be effective in experimental contexts. Methods for the synthesis of evidence on
effectiveness when meta-analysis is not appropriate or for the synthesis of more diverse evidence
are, however, not well developed.
Unlike meta-analysis, narrative synthesis does not rest on an authoritative body of knowledge or on
reliable and rigorous techniques developed and tested over time. In the absence of such a body of
knowledge there is, as the Cochrane Handbook argues, ‘a possibility that systematic reviews adopting
a narrative approach to synthesis will be prone to bias, and may generate unsound conclusions
leading to harmful decisions’.
This problem is not confined to narrative synthesis - statistical techniques have produced misleading
results in the past (and continue to do so from time to time). However, given the widespread use of
narrative synthesis in systematic reviews there is a pressing need for the methodological foundation
of this approach to be strengthened, if systematic reviews produced to inform the choice and
implementation of interventions are to be credible. This is the aim of this guidance.
1.4 What the guidance is about
The guidance provides advice on the conduct of narrative synthesis in the context of systematic
reviews of research evidence and describes some specific tools and techniques that can be used in
the synthesis. The product of the synthesis, at a minimum, is a summary of the current state of knowledge
in relation to a particular review question. This question might relate to effectiveness or cost
effectiveness, to issues of efficacy, appropriateness (to need), feasibility of implementation, or to
some or all of these.
We recognise that narrative synthesis can be utilised in reviews addressing a wide range of
questions. However, for practical reasons, we have focused this guidance on the conduct of the
narrative synthesis of research evidence in the context of two types of systematic review which have
particular salience for those who want their work to inform policy and practice: those addressing
questions concerned with the effects of interventions and those concerned with the implementation
of interventions shown to be effective in experimental settings.
1.5 Who the guidance is for
The guidance is intended to be accessible to a range of people involved in systematic reviewing.
However, whilst users of the guidance will not need to be systematic review experts, they will need a
reasonable level of research literacy and we would advise anybody without experience of systematic
review work to collaborate with more experienced colleagues.
The phrase evidence synthesis can be used to mean many different things. At its most simple,
synthesis will involve the juxtaposition of findings from multiple studies, perhaps with some analysis of
common themes or findings across studies. More sophisticated approaches to synthesis involve the
integration or interpretation of results from multiple studies, with the aim of producing new
knowledge/findings. It has been suggested that different types of evidence synthesis can be located
along a continuum from quantitative approaches, which involve the pooling of findings from multiple
studies (e.g. meta-analysis), to qualitative approaches, which involve an interpretative approach (e.g.
meta-ethnography). The guidance provided here lies between these two. Narrative synthesis will
always involve the ‘simple’ juxtaposition of findings from the studies that have been included in the
review. However, it may also involve some element of integration and/or interpretation, depending on
the type of evidence included. These methods necessarily require some familiarity with research
processes if they are to be done well.
1.6 When might the guidance be used?
The process of evidence synthesis is not linear, so reviewers may use a number of different
approaches to synthesis in an iterative way. Narrative synthesis might be used:
Before undertaking a specialist synthesis approach such as statistical meta-analysis or meta-ethnography
Instead of a specialist synthesis approach, because the studies included are insufficiently
similar to allow for this
Where the review question dictates the inclusion of a wide range of research designs,
producing qualitative and/or quantitative findings for which other approaches to synthesis are
not appropriate
1.7 Developing the guidance
The methods used in the development of the guidance are described in detail in the appendix and
summarised here. The process began with a systematic search of the methodological literature in an
attempt to identify existing guidance on the conduct of narrative synthesis and any specific tools and
techniques that could potentially be used in the narrative synthesis process. The search process and
results are shown in Figure 1.
The search included three elements: i) a database search, ii) a search of internet sites and iii)
identification of relevant text by members of the research team. This generated 1,309 items. On the
basis of an initial review of titles and, where available, abstracts by at least two members of the
research team 264 of these items were retrieved and read in full by at least two members of the
research team. This process resulted in 69 articles, reports and/or books being included in the
methodological review. None related specifically to narrative synthesis, although elements of
guidance on established methodologies such as meta-ethnography and the ‘case survey’ method
were judged relevant to the conduct of narrative synthesis.
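The staged screening just described can be sketched as simple, auditable bookkeeping. The sketch below is not part of the guidance itself: the function name is illustrative, and only the counts reported in the text (1,309 identified, 264 retrieved, 69 included) are taken from the source.

```python
# Illustrative sketch: tracking the screening funnel described in the text.
# Only the three counts come from the source; the structure is an assumption.

def screening_funnel(identified, retrieved, included):
    """Return each screening stage with the number excluded at that stage."""
    stages = [("identified", identified),
              ("retrieved in full", retrieved),
              ("included in review", included)]
    funnel = []
    previous = None
    for name, n in stages:
        excluded = (previous - n) if previous is not None else 0
        funnel.append({"stage": name, "n": n, "excluded_at_stage": excluded})
        previous = n
    return funnel

for row in screening_funnel(identified=1309, retrieved=264, included=69):
    print(row)
```

Recording the number excluded at each stage, rather than only the final total, is what makes the funnel reportable in a flow diagram such as Figure 1.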
Methodological guidance on the conduct of various approaches to review and synthesis was
used to identify common generic elements of an evidence synthesis process. Other texts provided
‘tips’ on aspects of the evidence review process in general, such as how to structure results and/or
present data and described a number of specific tools and techniques for the management,
manipulation and presentation of quantitative and/or qualitative data. This material formed the basis
of an initial draft of the guidance on narrative synthesis. The guidance was then applied to two
‘demonstration’ syntheses: one focusing on the effectiveness of intervention(s); the other on the
implementation of intervention(s). These demonstration syntheses have been incorporated into the
final version of the guidance to illustrate how the guidance may be used to inform decisions about
which specific tools and techniques to use in the context of a particular review.
Figure 1. Search process and results (flow diagram: 1,309 items identified through the three search
elements (n = 54, n = 1,145 and n = 110); titles and abstracts screened; 264 items retrieved and read
in full; 69 texts included in the methodological review)
1.8 What the guidance does not do
The guidance does not describe a new approach to the synthesis of qualitative or mixed method
research. Instead this guidance seeks to provide an over-arching framework to guide the conduct of
a narrative synthesis and suggests ways in which current approaches to narrative synthesis may be
further enhanced and developed. Similarly, the guidance is not intended as a source of detailed
methodological advice on the systematic review process as a whole. Whilst there is some limited
discussion, for example, of search strategies and study quality appraisal, the guidance does not
provide details of specific methods for these. We include references to detailed methodological
advice in these and other areas in Appendix 2.
Chapter 2: The main elements in a systematic review process

The process of undertaking a systematic review has been well documented and there is broad
agreement about the main elements involved. Six main elements are identified here including the
process of synthesis, the focus of this guidance. The other five elements of a systematic review are
not described in detail. References to detailed methodological advice on systematic reviewing are
included in Appendix 2. This chapter provides a framework to aid understanding of where the
synthesis occurs in the systematic review process.
2.1 Identifying the review focus, searching for and mapping the available evidence
Getting the question(s) ‘right’ is critical to the success of the systematic review process overall. The
review question has to be both relevant to potential users of the review and, in theory at least,
answerable. In some instances the question is clearly formulated at an early stage. More often,
however, whilst an initial focus for the review is identified, a ‘mapping’ of the available relevant
evidence needs to be carried out before the specific question(s) for the review can be clearly defined.
The mapping exercise can be used to assess the need for a systematic review and/or to guide and
refine the scope of the review. It is especially useful in situations where a broad question is of
interest, such as “how effective are interventions to prevent unintentional injuries?” By mapping the
available literature addressing this topic it is possible to:
Describe the types of interventions that have been evaluated
Describe the sorts of study designs used in these evaluations and
Assess the volume of potentially relevant literature.
Based on this initial mapping the scope of the review can be refined, so that the questions to be
addressed are both answerable and relevant. The search for studies should be comprehensive and
appropriate to the question posed so a mapping exercise may also help to refine a search strategy.
2.2 Specifying the review question
It will take time to get the review question right. In the context of reviews of the effectiveness of
interventions, there is general agreement that a well-formulated question involves three key
components: the people (or participants) who are the focus of the interventions, the interventions, and
the outcomes. Sometimes a fourth component that relates to type of study design is also included. If
the review intends to focus on the factors shaping the implementation of an intervention then the
question will also have to include components related to this, such as aspects of the context in which
the intervention was implemented.
2.3 Identifying studies to include in the review
Once the precise review question has been agreed, the key components of the question form the
basis of specific selection criteria, each of which any given study must meet in order to be included in
the review. It is usually necessary to elaborate on the key components of the review question so as
to aid the process of identifying studies to include in the review and to make sure that decisions made are
transparent to users of the review. These might include, for example, being more precise about the
age groups of participants to be included in the review or about aspects of the intervention design.
2.4 Data extraction and study quality appraisal
Once studies are selected for inclusion a process of study quality appraisal and data extraction takes
place. Decisions about which data should be extracted from individual studies should also be guided
by the review question. In the context of a systematic review addressing a question about the effect
of a particular intervention, for example, the data to be extracted should include details of: the
participants, the interventions, the outcomes and, where used, the study design. For reviews focusing
on implementation, it would be important to extract detailed data on the design of the intervention, the
context in which it was introduced and on the factors and/or processes identified as impacting on
implementation. The specific data and/or information to be extracted and recorded are usually those
which could affect the interpretation of the study results or which may be helpful in assessing how
applicable the results are to different population groups or other settings. This may be referred to as
applicability, generalisability or external validity.
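The extraction fields listed above can be thought of as a single structured record per study. The sketch below shows one minimal way to hold them; the class and field names are our own illustration, not a form prescribed by the guidance.

```python
# Illustrative sketch: a minimal per-study extraction record covering the
# fields the text says should be captured. Names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ExtractionRecord:
    study_id: str
    participants: str          # who was studied (ages, setting, numbers)
    intervention: str          # what was done and how it was designed
    outcomes: dict             # outcome name -> reported result
    study_design: str = ""     # e.g. "RCT", "controlled before-after"
    context: str = ""          # setting details, for implementation reviews
    implementation_factors: list = field(default_factory=list)  # barriers/facilitators

rec = ExtractionRecord(study_id="S1",
                       participants="children under 16, urban households",
                       intervention="smoke alarm give-away",
                       outcomes={"injury rate": "reduced"})
```

Extracting the same named fields from every study is what later makes tabulation across studies, and judgements about applicability, straightforward.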
Study appraisal - also called validity assessment, assessment of study quality and critical appraisal -
usually refers to a process of assessing the methodological quality of individual studies. This is
important as it may affect both the results of the individual studies and ultimately the conclusions
reached from the body of studies - although ‘quality’ in general and validity in particular are defined
differently in relation to different types of study designs. In the context of effectiveness reviews study
quality is often used as a criterion on which to base decisions about including or excluding particular
studies, although this does depend on the approach taken by the reviewers. Whatever the focus of
the review, some reviewers may choose to exclude studies from the synthesis on grounds of
methodological quality; others may opt to include all studies, but in this case it is important to
differentiate clearly between more and less robust studies. There are many different appraisal tools
available for use in relation to both quantitative and qualitative study designs and details of how to get
information about some of these are provided in Appendix 2.
2.5 The synthesis
The key element of a systematic review is the synthesis: that is, the process that brings together the
findings from the set of included studies in order to draw conclusions based on the body of evidence.
The two main approaches are quantitative (statistical pooling) and narrative, and sometimes both
approaches are used to synthesise the same set of data. One approach - narrative synthesis - is the
focus of detailed attention in this guidance.
2.6 Reporting the results of the review and dissemination
Once the review is complete the findings need to be disseminated to potential users, although
communication needs to be considered from the start of, and throughout, the review process, often
with the involvement of policy, practice and end users. We have included some useful references to
the ‘art’ of dissemination - an often neglected component of the systematic review process - in
Appendix 2.
Chapter 3: Guidance on narrative synthesis – an overview

As we have noted, this guidance focuses on the conduct of narrative synthesis in systematic reviews
of research-based evidence on:
The effects of interventions and/or
The factors shaping the implementation of interventions.
Although we have restricted our focus in this way, the guidance may also be helpful for people
focusing on other types of review questions, for example, about the needs and/or preferences of
particular population groups or the causes of particular social and/or health problems.
Our aim is to provide broad guidance on ways in which the process of narrative synthesis can be
made more systematic and transparent and on how bias introduced by the evidence itself (as a result
of methodological shortcomings in the included studies) and/or by decisions made by reviewers (for
example, through the process of inclusion and exclusion) can be minimised. The guidance does not
provide a set of definitive prescriptive rules on the conduct of narrative synthesis. In our experience
the most appropriate approach and the selection of specific tools and techniques for data
management and manipulation depends on the nature of the particular review being conducted.
In this chapter we describe a generic framework that identifies four elements of the narrative
synthesis process and various tools and techniques that can be used to manage data, manipulate
and synthesise findings from multiple studies and present the results of the synthesis. In the following
two chapters we describe in detail the practical application of the guidance and particular tools and
techniques to the synthesis of two bodies of research-based evidence: one concerned with the effects
of an intervention, the other concerned with factors influencing the implementation of an intervention.
3.1 A general framework for narrative synthesis
For the purpose of this guidance we have identified four main elements to a narrative synthesis process:
Developing a theory of how the intervention works, why and for whom
Developing a preliminary synthesis of findings of included studies
Exploring relationships in the data
Assessing the robustness of the synthesis
Figure 2 describes the purpose of each of these four elements of a synthesis in relation to a
systematic review focusing on (1) the effects and (2) the factors impacting on the implementation of
an intervention/programme.
We are not suggesting that narrative synthesis should proceed in a linear fashion with these elements
being undertaken sequentially. In practice, reviewers will move in an iterative manner among the
activities we have suggested make up these four elements. We have separated them out and
presented them sequentially simply to provide a structure to the guidance. In the following sections
we focus on these elements in turn in order to explain the aims of each in more detail. We then
provide brief descriptions of tools and/or techniques that may be utilised in the conduct of a narrative
synthesis before moving on in the subsequent chapters to demonstrate the practical application of the
narrative synthesis framework and the specific tools and techniques.
The main elements of synthesis, and their purposes in effectiveness reviews and implementation
reviews respectively, are as follows:

1. Developing a theoretical model of how the interventions work, why and for whom
Effectiveness reviews: to inform decisions about the review question and what types of studies to
review; to contribute to the interpretation of the review’s findings; to assess how widely applicable
those findings may be.
Implementation reviews: to inform decisions about the review question and what types of studies to
review; to contribute to the interpretation of the review’s findings; to assess how widely applicable
those findings may be.

2. Developing a preliminary synthesis
Effectiveness reviews: to organise findings from included studies to describe patterns across the
studies in terms of: the direction of effects; the size of effects.
Implementation reviews: to organise findings from included studies in order to: identify and list the
facilitators and barriers to implementation reported; explore the relationship between reported
facilitators and barriers.

3. Exploring relationships in the data
Effectiveness reviews: to consider the factors that might explain any differences in direction and size
of effect across the included studies.
Implementation reviews: to consider the factors that might explain any differences in the facilitators
and/or barriers to successful implementation across included studies; to understand how and why
interventions have an effect.

4. Assessing the robustness of the synthesis product
Effectiveness reviews: to provide an assessment of the strength of the evidence for: drawing
conclusions about the likely size and direction of effect; generalising conclusions on effect size to
different population groups and/or contexts.
Implementation reviews: to provide an assessment of the strength of the evidence for drawing
conclusions about the facilitators and/or barriers to implementation identified in the synthesis, and for
generalising the product of the synthesis to different population groups and/or contexts.

Figure 2. The main elements in a narrative synthesis
Element 1: The role of theory in evidence synthesis
Although not all reviewers may choose to do this, it can be useful to develop a model of what Weiss
refers to as an intervention’s “theory of change” to inform a systematic review. The “theory of change”
describes “the chain of causal assumptions that link programme resources, activities, intermediate
outcomes and ultimate goals”. It is concerned with how the intervention works, why, and for whom.
Reviewers would normally develop their theory of change at an early stage of a review before the
synthesis proper begins. If done early enough an understanding of the theory behind the intervention
can inform decisions about the review question and the types of studies to include. In terms of the
narrative synthesis, a “theory of change” can contribute to the interpretation of the review’s findings
and will be valuable in assessing how widely applicable those findings may be. Information on
programme theory may come from explicit statements in study reports on the goals of the intervention
(who it is intended to affect, in what way and how) and from other reviews. The theory can be
presented in narrative form or as a diagram like the one reproduced below in Figure 3.
Theory building and theory testing is a neglected aspect of systematic reviews. Shadish (1996) has
pointed out that meta-analysis, for example, has focused too much on descriptive causation (simply
describing the size of an effect) and too little on the development of explanatory theories. Yet
systematic reviews - whether of qualitative or quantitative research - are likely to be much more
powerful than single studies for these purposes. In turn, systematic reviews can contribute to
developing and testing the limits of theories, by examining how contextual or temporal variables
moderate outcomes. Theories themselves can also be the subject of systematic reviews.
Note: the notion of ‘effects’ should not be taken for granted. In some reviews the synthesis process
will involve the reviewers in a process intended to help to understand what the effects of a particular
intervention or programme are. This is particularly the case when the effects are presented in
narrative form rather than in numerical form or derived from structured questionnaires/indicators.
[Flow diagram: an increase in teachers’ salaries leads, via improved teacher morale, the recruitment
and retention of abler teachers, greater effort and lesson preparation, more congenial teacher-student
relationships and a greater variety of pedagogical strategies, to teachers teaching more effectively
and students working harder, and ultimately to increased student achievement.]
Figure 3. Example of a Programme Theory model: mechanisms by which higher teachers’ pay
may be linked to increased student achievement (from Weiss, 1998)
Element 2: Developing a preliminary synthesis
Whatever the focus of the review, the purpose of the preliminary synthesis is to develop an initial
description of the results of included studies. It is important to remember that the product of this initial
process will only be preliminary, rather than an end in itself. It will always be necessary to
interrogate the preliminary synthesis to identify factors that have influenced the results reported in
included studies i.e. to begin to construct an explanation of how and why a particular intervention had
the effects reported; of how and why particular factors/processes impinged on implementation, and to
test the robustness of the results of the synthesis. This is the purpose of other elements of the
synthesis process described below.
During the preliminary synthesis, reviewers focusing on the effects of an intervention will need to
organise the results of the included studies so they are able to describe patterns across them in terms
of both the direction and size of the effects reported. In relation to a review on implementation, the
studies need to be organised so that patterns in the factors/processes that are reported as impacting
in some way on the implementation of an intervention can be identified across the studies. Assuming
that study quality appraisal has been carried out at the same time as data extraction, these details will
be available throughout the synthesis process, although in our demonstration synthesis, reported later,
quality was not examined until near the end of the synthesis.
Element 3: Exploring relationships within and between studies
As patterns across study results begin to emerge from preliminary attempts at a synthesis reviewers
should begin to subject these to rigorous interrogation in order to:
Identify any factors that might explain differences in the direction and size of effects across the
included studies, or in the types of facilitators and/or barriers to successful implementation
Understand how and why interventions have, or do not have, an effect, or why particular
barriers and/or enablers to implementation operate
At this point in the synthesis the reviewers move beyond identifying, listing, tabulating and/or counting
results to exploring relationships within and across the included studies. The relationships of interest
are of two broad types:
Those between characteristics of individual studies and their reported findings
Those between the findings of different studies
Some of the studies included in a review may have reported information about relationships between
study characteristics and reported findings, in which case the job of reviewers is to compare and
contrast the ways in which the relationships have been identified and analysed across the studies. In
other cases little attention may have been paid to these relationships. The practical work involves
using data previously extracted from primary studies to look at the relationships between study results
and key aspects of the primary studies, and comparing and contrasting these relationships across the
studies. This element of a narrative synthesis can be very time consuming but it is critical to the
quality of the process as a whole.
Exploring the influence of heterogeneity is important at this stage of the synthesis process. We have
already noted that a primary reason for choosing a narrative approach to synthesis in a systematic
review about the effects of an intervention is because there is considerable heterogeneity in the
included studies in terms of methods, participants, interventions and other, unknown sources.
There are also likely to be differences between studies in terms of their findings – whether quantitative
or qualitative. This too may be due to known differences between the studies, including
methodological differences, and differences in the baseline characteristics of the populations being
studied. Narrative methods have long been recognised as useful for investigating heterogeneity
across primary studies, developing an understanding of which aspects of an intervention may be
responsible for its success, or investigating the possibility that study variation is attributable to
theoretical variables.
Many social or behavioural interventions are complex because of the characteristics of the
interventions, study population/s, outcomes, or other methodological issues relating to the conduct of
the primary studies.
Further complexity is introduced because any effects of the interventions may
be modified by context, and the intervention itself may vary when it is being implemented.
Because of these variations, reviewers of complex interventions may expect considerable
heterogeneity across studies and need to consider this when synthesising results.
“Social” heterogeneity may incorporate not only socio-demographic and individual differences, but
also historical, cultural, spatial and other differences that may affect both the delivery and impact of
the interventions being reviewed. Some of the main sources of variability that reviewers need to
consider when ‘testing’ the robustness of the patterns emerging from the included studies are outlined
below (adapted from guidance produced by the Cochrane Health Promotion and Public Health Field).
Variability in outcomes
In systematic reviews of clinical interventions, variation in outcomes is termed clinical heterogeneity.
Variation also exists in social research; however, given the longer causal chains of many social
interventions (including public health interventions), proximal/immediate, intermediate, and distal/long-
term outcomes may all be reported. Whilst the synthesis would ideally address all of these
outcomes, in practice it is often not feasible to do so.
Variability in study designs
Methodological diversity is common in systematic reviews of social interventions. Where the main
potential sources of variation are known, heterogeneity between effects can be explored by means of
subgroup analysis, based for example on theories about how the intervention works, and for which
groups. For many social and public health interventions, theories about mechanisms and interactions
may be under-developed and the exploration and interpretation of heterogeneity complex. It may
therefore be difficult to anticipate the main sources of heterogeneity a priori.
Variability in study populations, interventions and settings
The content of complex social interventions may vary between specific settings or populations. Some
of the variability may be intentional as interventions are tailored to local needs (including
characteristics which may influence the outcomes of interest such as race, gender, and socio-
economic position).
As noted earlier, an understanding of the intervention’s ‘theory of change’ will be particularly valuable
when exploring the influence of heterogeneity especially when interpreting differences between
subgroups of studies (post-hoc sub group analyses). The findings of individual studies will vary with
study characteristics such as intervention type, quality and extent of implementation, and the study
setting, and may vary between different subgroups of participants. Developing plausible explanations
for these differences (some of which will be due to chance) is difficult but sub-group findings that are
supported by an a priori rationale (that is, which have been described in the programme theory) are
more plausible than those which are not.
The extent to which reviewers are able to consider the impact of context in systematic reviews
evaluating the effects of interventions or factors impacting on implementation will depend on the
availability of relevant information in the included studies. Typically, reviews focusing on effects do
not consider the context in which an intervention is implemented in great depth. Given that
implementation studies are focusing specifically on how dimensions of context (alongside other
factors) impinge on implementation, the data available in these studies should be much richer.
However, research has suggested that there may be a particular problem with inadequate reporting of
research methods in these studies.
The dimensions of context which might be relevant to exploring
differences in the reported results of included studies will depend on the nature of the intervention
with which the review is concerned.
Other factors to be considered in this exploration of what mediates the impact of an intervention, or
of how or why it has a particular impact, may not be extractable from studies as ‘data’. These
include information about the general approach taken by the researchers, in terms of both theory
and methods.
Element 4: Assessing the robustness of the synthesis
The notion of robustness in relation to evidence synthesis is complex. Most straightforwardly
robustness can be used to refer to the methodological quality of the primary studies included in the
review and/or the trustworthiness of the product of the synthesis process. Obviously, these are
related. The trustworthiness of a synthesis will depend on both the quality and the quantity of the
evidence base it is built on. If primary studies of poor methodological quality are included in the
review in an uncritical manner then this will affect the trustworthiness of the synthesis.
The trustworthiness of the synthesis will also depend on the methods used in the synthesis itself, and
in particular on the measures taken to minimise bias: ensuring, for example, that studies judged to be of
equal technical quality are given equal weight or, if they are not, providing a sound justification for the difference.
Another less straightforward aspect of robustness that can impact on the trustworthiness of the
synthesis is the extent to which reviewers have enough information to judge that individual studies
meet the criteria for inclusion. This can be a significant problem with reviews of complex
interventions. Authors of primary studies often fail to provide adequate information on the intervention
they are focusing on and there can be inconsistency between studies in the definition of what
constitutes a particular intervention. It is particularly important that reviewers give detailed information
about the interventions they plan to include and exclude from a review: for example, stating that
‘psychological interventions are eligible’ is unlikely to be adequate.
Towards the end of the synthesis process, therefore, the analysis of relationships within and between
studies described above should lead into an overall assessment of the strength of the evidence
available for drawing conclusions on the basis of a narrative synthesis. This should include
systematic attention to all three elements of robustness discussed above.
It is particularly important that the results of any appraisal of the methodological quality of included
studies be considered in a systematic manner. Whilst there are well-established methods for
assessing the quality of intervention studies, this is not the case in relation to studies of
implementation processes, qualitative research or mixed methods research in general so there are no
approaches to quality assessment that can be recommended in these situations. Additionally, the
results of the appraisal process may or may not have been used to exclude some studies on
methodological grounds. Whatever approach to quality appraisal is adopted, (probably at an earlier
stage of the review process) this information should inform the assessment of the strength or weight
of the evidence available to support conclusions drawn on the basis of the synthesis process.
3.2 Tools and techniques for narrative synthesis
In this section we provide brief descriptions of the tools and techniques we have identified which can
be used in the process of narrative synthesis. We have divided these into those which appear to be
most appropriate for use in each of the three analytical elements of the synthesis.
At the beginning of each sub-section below the main tools and techniques are listed in a table. As we
have noted, decisions about which of these are appropriately used in a specific synthesis will be
determined by the nature of the evidence being synthesised as will be illustrated in the practical
applications of the guidance.
Before describing the tools and techniques a general comment about the visual representation of data
from included studies is warranted. Many of the specific tools and techniques described involve
visual representation and this can be invaluable at all stages of a synthesis. However, it is important
to recognise that visual representation of data is not sufficient in itself as a synthesis. As Evans
argued, for example, tabulation and other visual representations of data tend to reduce studies to their
key characteristics neglecting aspects that could be important in understanding the patterns revealed.
He draws a distinction between ‘descriptive synthesis’ and ‘interpretive synthesis’ and is critical of the
heavy reliance placed by some reviewers on synthesis by tabulation. For commentators such as
Evans, the relationship between the visual representation of data (the descriptive synthesis) and the
narrative elaboration of the patterns identified (the interpretative synthesis) is critical to the quality of a
narrative synthesis.
Element 1: Tools and techniques for developing a theory of change
We have not identified specific tools or techniques for use in the development of a theory of change
although some of those described for use at other points in the synthesis process may also inform
theory development and elaboration - as highlighted in the practical application of the guidance in
chapters four and five.
Element 2: Tools and techniques for developing a preliminary synthesis
1. Textual descriptions of studies
2. Groupings and clusters
3. Tabulation
4. Transforming data into a common rubric
5. Vote counting as a descriptive tool
6. Translating data: thematic analysis
7. Translating data: content analysis
Textual descriptions
A simple starting point in a preliminary synthesis is to produce a descriptive paragraph on each
included study - it may also be useful for recording purposes to do this for all excluded studies as well.
In many reviews this will have been completed at an early stage in the review process and it can be
done for any type of study. It is important that these narrative descriptions are produced in a
systematic way, including the same information for all studies if possible and in the same order.
Some reviewers have suggested that studies considered more important in terms of what they offer
the review may be discussed at greater length, while briefer discussion may be afforded to less
central or informative studies.
In theory this is a way of giving more weight to higher quality or larger
studies within a narrative synthesis. However, it is difficult to determine how much “weight” in terms of
description/discussion should be allotted to individual studies and how this should vary with
methodological quality, for example. Additionally, if textual descriptions are produced at an early
stage of the review process it will not be possible to give more weight to one study over another and
hence a fuller description because methodological quality and other aspects of relevance will not yet
have been assessed. Whilst textual descriptions are a useful way for reviewers to become familiar
with the included studies and to begin to compare and contrast findings across studies, it can be very
difficult to discern patterns across studies from these textual descriptions, particularly when there are
a large number of studies.
Groupings and clusters
There can be considerable variation in the number of studies included in systematic reviews. Some
Cochrane reviews, for example, conduct the synthesis on a very small number of studies, often
because of very tightly defined inclusion/exclusion criteria and/or a paucity of research addressing
the question of interest. Other reviews include large numbers of studies in the pool to be
synthesised. In most cases the number of studies included will be determined by the size and quality
of the existing literature. Whilst including findings from large numbers of studies can be labour
intensive, the analytical process involved in statistical meta-analyses can readily manage large
numbers. This is not the case with narrative synthesis. Usually therefore, a process of narrative
synthesis will involve organising the included studies into smaller groups to make the process more
manageable. Although the reviewers may start to group the included studies at an early stage of the
review, it may be necessary to refine these initial groups as the synthesis develops.
Organising the included studies into groups can also be a useful way of aiding the process of
description and analysis and looking for patterns within and across these groups. It is important to
use the review question(s) to inform decisions about how to group the included studies. Studies can
be grouped according to one or a combination of the following: the type of intervention being studied;
the setting or context for the intervention (school or community based interventions for example); the
group at whom it is being directed (different age groups, for example); the study design; and/or the
nature of the results being reported (different outcome measures for example, or different types of
factors impacting on implementation).
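In practice, grouping amounts to clustering the extracted study records on one or more characteristics. A minimal sketch in Python may make this concrete; the study records and field names below are invented for illustration (reviewers would more typically do this in a table or spreadsheet):

```python
from collections import defaultdict

# Hypothetical records standing in for data extracted from included studies;
# the field names and values are illustrative, not taken from the guidance.
studies = [
    {"id": "A", "setting": "school", "design": "RCT"},
    {"id": "B", "setting": "community", "design": "cohort"},
    {"id": "C", "setting": "school", "design": "cohort"},
]

def group_by(records, key):
    """Cluster study records on a single characteristic (e.g. setting)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["id"])
    return dict(groups)

print(group_by(studies, "setting"))  # {'school': ['A', 'C'], 'community': ['B']}
print(group_by(studies, "design"))   # {'RCT': ['A'], 'cohort': ['B', 'C']}
```

Grouping on combinations of characteristics simply means grouping on a tuple of keys rather than a single one.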
Tabulation
Tabulation is a common approach used in all types of systematic review to represent both quantitative
and/or qualitative data visually - indeed many of the examples of approaches to description included
in this guidance are presented in tabular form. Tabulation can be useful at any stage of the
preliminary synthesis process according to the preference of reviewers. It can be particularly useful in
helping to develop an initial description of the included studies and to begin to identify patterns across
studies. Tables are typically used to provide details of study design, results of study quality
assessment, outcome measures and other results. These data may be presented in different
columns in the same table or in different tables. Used thoughtfully, tabulation can be a valuable tool
in the preliminary synthesis of results across studies and can provide important building blocks for
future elements of the synthesis process.
Some authors stress the need to take care with the layout of tables, arguing that the way in which
data are tabulated may affect readers’ impression of the relationships between studies. For example,
‘placing a results column adjacent to any of the characteristics or quality columns could invite
speculation about correlation and association’. These notes of caution point to the importance of
reviewers attempting some narrative interpretation of tabulated data.
Transforming data: Constructing a common rubric across quantitative studies
The results of studies included in a review may take different numerical and/or statistical forms. In
these situations reviewers need to transform results into a common numerical/statistical rubric if
possible. When extracting data from quantitative studies, it is standard practice to extract the raw or
summary data from included studies wherever possible, so a common statistic can be calculated for
each study, e.g. converting dichotomous data into odds ratios or relative risks and continuous data (if
from different measurement scales) into standardised mean differences (SMD). In a review of
effectiveness which incorporates a statistical meta-analysis these results would be pooled to provide
a single estimate of effect. In a narrative synthesis study results will not be pooled statistically, so the
process cannot provide a new single estimate of effect. However, transforming study results into a
common rubric will allow reviewers to develop a meaningful summary of study results and a more
robust assessment of the range of effects that would be anticipated from a particular intervention.
(The distinction being made here, between numerical and statistical, relates to the possibility that figures
provided as percentages, for example, would not accurately be described as statistics.)
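The two standard conversions mentioned above are simple formula work. As a sketch, assuming invented summary data from two hypothetical studies (the odds ratio from a 2x2 table, and Cohen's d as one common form of standardised mean difference):

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table:
    a, b = events / non-events in the intervention group;
    c, d = events / non-events in the control group."""
    return (a * d) / (b * c)

def standardised_mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d: difference in group means over the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Invented data: 20/100 events vs 10/100 events for the dichotomous outcome;
# means 12 vs 10 (sd 4, n 50 per arm) for the continuous outcome.
print(odds_ratio(20, 80, 10, 90))                                  # 2.25
print(standardised_mean_difference(12.0, 4.0, 50, 10.0, 4.0, 50))  # 0.5
```

Once every study is expressed on the same scale, results can be compared and described side by side even though they are not pooled statistically.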
Vote-counting as a descriptive tool
Although some commentators have argued strongly against ‘vote counting’, calculating the
frequency of different types of results across included studies can be a useful way of producing an
initial description of patterns across the included studies. Indeed, it could be argued to be an
intrinsic element of the preliminary stages of any narrative synthesis. In the case of reviews
evaluating the effects of an intervention, a simple approach to vote-counting might involve the
tabulation of statistically significant and non-significant findings. Some reviewers have developed
more complex approaches to vote counting, both in terms of the categories used and by assigning
different weights or scores to different categories.
The interpretation of the results of any vote counting exercise is a complex task. According to some
methodologists writing about vote counting, the category with the most studies “wins”. Similarly, in
the context of reviews of effects, some commentators argue that the statistical significance category
‘containing the largest number of studies represents the direction of the true relationship’. However,
it has also been argued that this approach to synthesis “tends to give equal weight to studies with
different sample sizes and effect sizes at varying significance levels, resulting in misleading
conclusions”. There are examples where vote counting has been compared with other methods of
synthesis and major differences in findings have been reported. So, whilst vote counting can be a
useful step in a preliminary synthesis, the interpretation of the results must be approached with caution,
and they should be subjected to further exploration of relationships between data/findings within and
across the included studies.
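The simple tabulation described above can be sketched in a few lines of Python. The per-study results here are invented, and the sketch deliberately makes the known weakness visible: the tally ignores sample size and effect magnitude, which is exactly why the counts must not be read as a pooled estimate of effect:

```python
from collections import Counter

# Illustrative per-study results: direction of effect and whether the
# finding was statistically significant. Invented, not from any real review.
results = [
    ("positive", True), ("positive", False), ("positive", True),
    ("negative", False), ("no effect", False),
]

def vote_count(results):
    """Descriptive tally of result categories across included studies."""
    tally = Counter()
    for direction, significant in results:
        tally[direction + (" (sig.)" if significant else " (n.s.)")] += 1
    return dict(tally)

print(vote_count(results))
# {'positive (sig.)': 2, 'positive (n.s.)': 1, 'negative (n.s.)': 1, 'no effect (n.s.)': 1}
```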
Translating data: thematic and content analysis
Where results are presented in the form of themes or concepts, as is the case in qualitative research
or some surveys, studies focusing on similar topics may have conceptual overlaps, even if these are
not apparent from the way the results are reported. Alternatively, apparently similar concepts in
different studies may actually be referring to different phenomena. In this context a process of
‘translation’ of primary themes or concepts reported across studies can be used to explore similarities
and/or differences between different studies.
Where studies involve both qualitative and quantitative
data, reviewers may decide to construct a common rubric for the synthesis – this could involve
transforming qualitative findings into quantitative form or vice versa. Both thematic analysis and
content analysis can help in this process of ‘translation’ or ‘interpretation’, as it is sometimes referred to.
Thematic analysis
Thematic analysis, a common technique used in the analysis of qualitative data in primary research,
can be used to identify systematically the main, recurrent and/or most important (based on the review
question) themes and/or concepts across multiple studies. Although usually used with qualitative
data, some have argued that it could also be used with studies involving quantitative data or data
from mixed method studies. For example, the variable labels included in survey research may be
extracted as ‘themes’ in the same way as conceptual themes are extracted from qualitative
research. Thematic analysis provides a means of organising and summarising the findings from
large, diverse bodies of research. The analysis would typically, but not invariably, be developed in an
inductive manner; i.e. without a complete set of a priori themes to guide data extraction and analysis
from the outset. Thematic analysis tends to work with, and reflect directly, the main ideas and
conclusions across studies, rather than developing new knowledge, although this is possible.
There are problems with thematic analysis from the perspective of a systematic review. The process
can, for example, be associated with a lack of transparency – it can be difficult to understand how and
at what stage themes were identified. The results of the synthesis might look very different if an
entirely a priori, theoretically-driven approach had been used as against an inductive approach. In
this context it is important that reviewers give as much detail as possible about how a thematic
analysis was conducted.
Content analysis
Content analysis was developed as an analytical approach for primary research, but it is readily
applied to the synthesis of findings from multiple studies. Content analysis has been defined as ‘a
systematic, replicable technique for compressing many words of text into fewer content categories
based on explicit rules of coding.’
Unlike thematic analysis, it is essentially a quantitative method,
since all the data are eventually converted into frequencies, though qualitative skills and knowledge of
underlying theory may be needed to identify and characterise the categories into which findings are to
be fitted.
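The frequency-counting core of content analysis can be sketched briefly. The coding rules themselves are the hard (and qualitative) part; the sketch below assumes findings have already been coded into categories, and the categories and data are invented for illustration:

```python
from collections import Counter

# Hypothetical findings from three studies, each already coded into content
# categories under explicit coding rules. Category names are invented.
coded_findings = [
    ["staff training", "funding"],
    ["funding", "leadership"],
    ["staff training", "funding", "leadership"],
]

def category_frequencies(coded):
    """Compress coded findings into frequencies per content category,
    ordered from most to least frequently reported."""
    counts = Counter()
    for study in coded:
        counts.update(study)
    return counts.most_common()

print(category_frequencies(coded_findings))
# [('funding', 3), ('staff training', 2), ('leadership', 2)]
```

The resulting frequencies are themselves only descriptive and, like vote counts, need narrative interpretation.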
Element 3: Tools and techniques for exploring relationships
1. Graphs, frequency distributions, funnel plots, forest plots and L’Abbe plots
2. Moderator variables and sub-group analyses
3. Idea webbing and conceptual mapping
4. Translation: reciprocal and refutational
5. Qualitative case descriptions
6. Investigator/methodological triangulation
7. Conceptual triangulation
Graphs, frequency distributions, funnel plots, forest plots and L’Abbe plots.
There are several visual or graphical tools that can help reviewers explore relationships within and
between studies, although these are typically only useful in the context of quantitative data. They
include: presenting results in graphical form; plotting findings (e.g. effect size or factors impacting on
implementation) against study characteristics; plotting confidence intervals; and/or plotting outcome
measures. Frequency distributions, funnel plots, forest plots, and L’Abbé plots are other possibilities.
These tools do not in themselves provide an overall interpretative synthesis of the data presented in
the plot. There may be good reasons for reviewers not to provide such a synthesis of data presented
graphically, but it is normally good practice to do so, and where it is not done it is important that
reviewers explain their reasons.
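The data preparation behind a forest-style display is straightforward to sketch. Assuming invented effect estimates with standard errors, and the usual normal approximation (1.96 standard errors either side for a 95% interval):

```python
def ci95(effect, se):
    """95% confidence interval under the usual normal approximation."""
    half = 1.96 * se
    return (round(effect - half, 2), round(effect + half, 2))

# Invented effect estimates (e.g. standardised mean differences) with
# standard errors for three hypothetical studies.
studies = {"Study 1": (0.30, 0.10), "Study 2": (0.10, 0.20), "Study 3": (0.50, 0.15)}

for name, (effect, se) in studies.items():
    lo, hi = ci95(effect, se)
    # One text row per study: point estimate with its interval - the
    # information a forest plot displays graphically.
    print(f"{name}: {effect:+.2f} [{lo:+.2f}, {hi:+.2f}]")
```

Laying the intervals side by side, whether as text or as a plot, makes it easy to see which studies are consistent with one another and which intervals cross zero.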
Moderator variables and subgroup analyses
There is a growing consensus that when evaluating the impacts of interventions the important
questions are “what works, for whom, and in what circumstances”. One approach to answering these
questions when findings are quantitative is by means of analysing moderator variables – variables
which can be expected to moderate the main effects being examined by the review. This can be
done at the study level, by examining characteristics that vary between studies (such as study quality,
study design or study setting) or by analysing characteristics of the sample (such as groups of
outcomes, or participants), based on some underlying theory as to the effects of those variables on
outcomes. An analysis of moderator variables can be guided by questions such as:
What are the moderators that the authors of the primary studies identify?
What are the contributing factors that appear to recur across the studies even if they have
not been explicitly identified by authors as moderators?
How much difference do the likely moderators appear to make to the study results?
What possible relationships are there among the moderators?
One approach currently used to explore moderators is to examine the effects of interventions across
different social groups. Systematic reviewers have argued for some years for the importance of
exploring moderator effects in systematic reviews.
Methodological groups working within the
Cochrane Collaboration have also contributed extensive empirical and other work on these issues.
For example the Cochrane Methods Group in Subgroup Analysis has demonstrated some of the
methodological and epistemological pitfalls. A new Joint Cochrane Campbell Methods Group has
also been formed focusing on equity issues in systematic reviews; exploring the effects of socio-
economic moderators will be an important focus for this group. Explorations of effects in subgroups
can also play an important role in testing and developing theory in systematic reviews. They can be
an important tool for assessing the strength of relationships, for testing the limits of theoretical
concepts and explanations, and can contribute to the development of new theories.
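A study-level moderator analysis of the kind described above can be sketched simply: group the studies by the candidate moderator and compare effects across the levels. The data below are invented, and a real analysis would of course also weight studies and test the difference formally:

```python
from statistics import mean

# Illustrative study-level data: an effect size plus one candidate
# moderator (delivery setting). Values are made up for the sketch.
studies = [
    {"effect": 0.40, "setting": "school"},
    {"effect": 0.55, "setting": "school"},
    {"effect": 0.10, "setting": "community"},
    {"effect": 0.15, "setting": "community"},
]

def subgroup_means(records, moderator):
    """Mean effect per level of a study-level moderator variable."""
    levels = {}
    for r in records:
        levels.setdefault(r[moderator], []).append(r["effect"])
    return {level: round(mean(vals), 3) for level, vals in levels.items()}

print(subgroup_means(studies, "setting"))
# {'school': 0.475, 'community': 0.125}
```

As the text stresses, such a contrast is only plausible as an explanation when the moderator was specified a priori in the programme theory rather than discovered by trawling the data.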
Developing conceptual models
There are a number of approaches to exploring relationships within and across the studies included in
a systematic review that can be broadly described as conceptual models. The basic idea
underpinning these approaches is (i) to group findings that reviewers decide are empirically and/or
conceptually similar and (ii) to identify (again on the basis of empirical evidence and/or
conceptual/theoretical arguments) relationships between these groupings. The approaches often
involve visual methods to help to construct groupings and relationships and to represent the final
product of this process. Three specific approaches were identified in the methodological literature
review conducted to support the production of this guidance: idea webbing, conceptual mapping and
conceptual triangulation. Although we describe them separately below, they are very similar, as we
discuss in the demonstration syntheses reported in chapters 4 and 5. It is perhaps worth noting that
these tools can also be used to develop review questions and to begin to identify moderator variables
to be explored in more detail before the synthesis begins, but we do not discuss these uses in this guidance.
Ideas webbing
Ideas webbing was suggested by Clinkenbeard as a method for conceptualising and exploring
connections among the findings reported by the studies included in a review. This approach uses
spider diagrams to develop a visual picture of possible relationships across study results.
Concept mapping
Mulrow, Langhorne & Grimshaw
describe a similar process which we refer to as concept mapping.
Their approach involves linking multiple pieces of evidence extracted from across individual studies
included in a review to construct a model highlighting key concepts or issues relevant to the review
question and representing the relationships between these. This approach uses diagrams and flow
charts to visually represent the relationships being explored. The notion of conceptual triangulation
described by Foster appears to be very similar in that it is concerned to explore relationships between
data drawn from within and between studies.
Foster argues that this approach alleviates ‘concerns
about combining numbers and text because both qualitative and quantitative results can be portrayed
conceptually’. The approach relies heavily on tables to facilitate the analysis and produces a number
of possible models through which the phenomenon of interest may be better understood on the basis
of the diverse sources of evidence synthesised.
Translation as an approach to exploring relationships
Translation as a process for synthesis is typically associated with the work of Noblit & Hare on meta-
ethnography. It is a way of using qualitative research techniques to synthesise findings from
multiple studies. The term ‘meta’ in this context refers to the translation of studies into one another.
Although developed for use with qualitative research, the approach could be used with a mixture of
qualitative and quantitative evidence. Translation focuses on seeking a common rubric for salient
categories of meaning, rather than the literal translation of words or phrases. Noblit and Hare identify
two different types of ‘translation’:
1. Reciprocal translation (accounts are directly comparable)
2. Refutational translation (the accounts are oppositional)
In practice there are few examples of refutational translation in the literature. Having translated the
studies into one another, they suggest that reviewers should develop a ‘line of argument’ drawing
inferences from the results of the translation. The line of argument is developed by examining
similarities and differences between cases to integrate them in a new interpretation that ‘fits’ all the
studies. Meta-ethnography is a specialist approach to synthesis (akin to statistical meta-analysis with
quantitative studies) and not therefore an approach to be utilised in full in the context of a narrative
synthesis. However, the translational process may be of value as a way of exploring relationships
across studies. The inductive nature of the process means it is emergent, the initial question or area
of interest may be adapted or redirected, and there are numerous judgement calls along the way. Of
course the same can be argued for other types of synthesis.
Qualitative case descriptions
As Light and Pillemer note, formal statistical procedures may be able to detect subtle differences in
effectiveness but they do not necessarily explain them.
These authors argue that ‘qualitative case
descriptions’ are particularly valuable in helping with the interpretation of statistical findings. However,
they give relatively little practical advice about how one would go about doing this type of case
description. In general terms qualitative case description would seem to include any process in which
descriptive data from studies included in a systematic review are used to try to explain differences in
statistical findings, such as why one intervention outperforms another (ostensibly similar) intervention
or why some studies are statistical outliers. As an example of this process they suggest that in a
review of the effectiveness of educational programmes the reviewers might use a range of information
from the included studies to seek to answer questions such as:
What are the characteristics of successful implementations?
How were the teachers trained?
How were parents involved?
What were the details of the educational programme?
This kind of descriptive information may or may not be reported in the original study reports. The
textual descriptions of studies described earlier would be a potential resource for this type of work.
Investigator triangulation and methodological triangulation
Approaches to triangulation focus on the methodological and theoretical approaches adopted by the
researchers undertaking the primary studies included in a systematic review. Consideration of how
these differ across the included studies may be helpful in exploring the nature and impact of
moderators in quantitative research or broader relationships in qualitative research. Some authors
argue that by working with a number of different triangulation approaches reviewers can develop a
better understanding of how the various factors involved in the intervention and its evaluation may
have impacted on the results reported in included studies.
Investigator triangulation was developed by Begley to explore the extent to which heterogeneity in
study results may be attributable to the diverse approaches taken by different researchers. This
approach involves analysing the data in relation to the context in which they were produced, notably
the disciplinary perspectives and expertise of the researchers producing the data.
Begley is
focusing on primary research but this approach could be valuable for evidence synthesis too. It works
from the understanding that each disciplinary approach may have produced different kinds of findings.
Considering what kinds of evidence and what kinds of outcomes emerge from studies conducted by
researchers from particular disciplinary and epistemological positions is potentially an illuminating way
to think about possible sources of heterogeneity. This approach will be easier if the review is being
undertaken by a multidisciplinary research team “allowing data to be subjected to a range of
disciplinary gazes”.
Methodological triangulation was developed by Maggs-Rapport and offers a broadly similar
approach. Both of these approaches serve as a reminder that the evidence being synthesised in a
systematic review does not offer a series of discrete ‘answers’ to a specific question. Rather, each
‘piece’ of evidence offers a partial picture of the phenomenon of interest. The product of the
systematic review, particularly in the case of narrative synthesis, may not be a ‘meta-answer’ to the
review question, but a theoretical insight and/or a new model that informs understanding about the
mechanisms underlying the results reported.
Element 4: Tools and techniques for assessing robustness of the synthesis
1. Weight of Evidence – e.g. the EPPI approach
2. Best Evidence Synthesis
3. Use of validity assessment – e.g. the CDC approach
4. Reflecting critically on the synthesis process
5. Checking the synthesis with authors of primary studies
Weight of Evidence – the EPPI approach
The Weight of Evidence approach developed by staff of the EPPI-Centre is used in many EPPI-
Centre reviews.
In the EPPI approach, relevance criteria are set for a particular review and studies
are then assessed against these criteria. Those judged to be relevant are then assessed
for methodological quality.
Best Evidence Synthesis (BES)
BES addresses robustness in terms of the methodological quality of included studies through the
application of inclusion criteria. This is based on an approach described by the educational
researcher Robert Slavin.46, 47 In BES, only studies that meet minimal standards of methodological
adequacy and relevance to the review are included, and information is extracted in a common
standard format from each study, with a systematic approach to the assessment of study quality and
study relevance. This approach is not prescriptive about the study designs which can be included
in a review – this can vary, depending on the review question. BES aims to identify and synthesise
sources of evidence no matter how diverse. It has been suggested, however, that BES is simply an
example of good systematic review practice, albeit with some problems. Suri, for example, suggests
that in extracting data from the primary studies BES tends towards calculating the median effect size,
rather than calculating a weighted mean effect size, as is standard meta-analytic practice.
Although BES accounts cover the whole review process the approach focuses in particular on the
selection of studies into a systematic review rather than focusing on the synthesis, thus emphasising
that decisions about study quality should be taken early in the review process to ensure that the
review is based on robust evidence. The decision about “strength of evidence” is therefore made
early in the review process, and its practical application can be seen in the inclusion and exclusion
criteria. For this reason the demonstrations of the application of the narrative synthesis guidance
reported in the next two chapters were not able to utilise the approach to check the robustness of the
synthesis findings.
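The contrast Suri draws between the two ways of aggregating effect sizes can be shown with a small sketch (in Python, which is not part of the original guidance; the effect sizes and variances below are invented purely for illustration):

```python
import statistics

# Hypothetical log odds ratios and their variances from five studies
# (illustrative numbers only, not drawn from any real review).
effects   = [0.10, 0.25, 0.05, 0.60, 0.15]
variances = [0.02, 0.05, 0.01, 0.20, 0.04]

# Standard meta-analytic practice: inverse-variance weighted mean,
# so more precise studies count for more than imprecise ones.
weights = [1 / v for v in variances]
weighted_mean = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# The BES tendency noted by Suri: a simple median effect size,
# which ignores the relative precision of the studies.
median_effect = statistics.median(effects)

print(round(weighted_mean, 3), median_effect)
# → 0.109 0.15
```

Here the imprecise outlying study (0.60) pulls the median upwards more than the weighted mean, illustrating why the choice of summary matters.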
Use of validity assessment – Centers for Disease Control and Prevention (CDC) approach
Other approaches to assessing the strength of evidence included in evidence synthesis have been
developed. For example, specific rules may be used to define explicitly what is meant by “weak”,
“moderate” or “good” evidence. There are numerous examples of this form of synthesis but few from
the social sciences. One recent example from healthcare comes from the CDC Community Guide to
Preventive Services.
In this approach, the reasons for determining that the evidence is insufficient
are: A. Insufficient designs or executions, B. Too few studies, C. Inconsistent, D. Effect size too small,
E. Expert opinion not used. The categories are not mutually exclusive. While the criteria can be
debated, the grounds on which the decision about strength of evidence is made are at least explicit.
Many other healthcare evidence grading systems use a similar approach.
Reflecting critically on the synthesis process
Busse et al
recommend that in reporting the results of a systematic review a summary discussion
section should be provided, including the following:
• Methodology of the synthesis used (especially focusing on its limitations and their influence on the results)
• Evidence used (quality, validity, generalisability) – with emphasis on the possible sources of bias from the sources of evidence used and their potential influence on the results of the synthesis
• Assumptions made
• Discrepancies and uncertainties identified (the way that any discrepancies in findings between included evidence were dealt with in the synthesis should be discussed and, wherever the evidence is weak or non-existent, areas where future research is needed can be highlighted)
• Expected changes in technology or evidence (e.g. identified ongoing studies)
• Aspects that may have an influence on the implementation of the technology and its effectiveness in real settings
Such a summary would enable the analysis of robustness to temper the synthesis of
evidence as well as indicating how generalisable the synthesis might be.
Checking the synthesis with authors of primary studies
In the context of their meta-ethnography of qualitative research Britten et al suggest consulting the
authors of included primary studies in order to test the validity of the interpretations developed during
the synthesis and the extent to which they are supported by the primary data.
This is most likely to
be useful where the number of primary studies is small but the authors of the primary studies may
have useful insights into the possible accuracy and generalisability of the synthesis.
3.3 Conclusion
In this chapter we have provided an overview of the four main elements of the narrative synthesis
process that we have identified and briefly described various tools and techniques that can be used at
different points in the synthesis process. In the next two chapters we describe in detail the practical
application of the guidance, including the use of particular tools and techniques, to the synthesis of
two bodies of research evidence. Chapter four focuses on a narrative synthesis of the findings of the
11 RCTs included in the Cochrane systematic review of interventions for promoting smoke alarm
ownership and function.
The original Cochrane review involved a meta-analysis, which means we
are able to compare the results/conclusions of the two approaches to synthesis. Chapter five focuses
on the narrative synthesis of studies of the implementation of domestic smoke alarm promotion
interventions. This is linked to an earlier pilot review21, 52 and some comparisons with the outcomes
of this earlier work are made.
Chapter 4: Applying the guidance 1: effectiveness studies
4.1 Introduction
The aims of this chapter are to:
• Illustrate in practical terms the decision-making processes involved in the application of the guidance to a specific narrative synthesis
• Identify factors that should inform choices about the use of particular tools and techniques in the context of a specific synthesis
• Provide examples of how particular tools and techniques can be used in the synthesis of evidence on effectiveness
• Demonstrate the type of outcomes achieved by a narrative synthesis
• Compare the outcomes of a narrative synthesis of the effect of an intervention with those produced by meta-analysis.
The review selected for comparison was a Cochrane review investigating the effects of interventions
for promoting smoke alarm ownership and function.
This review was selected because it was
methodologically sound, had incorporated a meta-analysis, had analysed a ‘manageable’ number of
studies (11 RCTs) and because it complemented the systematic review of the implementation of
smoke alarm promotion interventions that was also being resynthesised in a concurrent ‘testing’ of the
guidance. The guidance is organised around four main elements:
• Developing a theory of how the intervention works, why and for whom
• Developing a preliminary synthesis
• Exploring relationships within and between studies
• Assessing the robustness of the synthesis
Within each of these sections, the guidance presents a number of related tools and techniques that
can be used to complete the various stages of the synthesis. To apply this guidance to the narrative
synthesis, each of the sections was read through in sequential order, and for each element the tools
and/or techniques that appeared to be useful and relevant to the synthesis at hand were selected.
The reasons for selecting or rejecting a tool or technique are given within each section. Where
possible, the tools and techniques employed were used to derive conclusions about the effects of
interventions for promoting smoke alarm ownership and function. Where tools or techniques proved
to be less useful, this is discussed. A flow chart summarising the synthesis process is presented in
figure 4.
Figure 4: Synthesis process
[Flow chart: the 11 RCTs of interventions to promote smoke alarm ownership feed into four stages – developing a theory; developing a preliminary synthesis; exploring relationships within and between studies; assessing the robustness of the synthesis – with the relevant tools and techniques listed for each stage, movement between stages, and conclusions and recommendations as the end point.]
4.2 Developing a theory
The majority of studies aimed to increase smoke alarm ownership and function through the use of
educational interventions with or without the addition of free or discounted smoke alarms for
participants. The primary studies did not clearly describe the theoretical basis of the evaluated
interventions, but the implicit theory underlying most educational interventions was that education can
increase knowledge of potential fire/burns risks, change risk perceptions and lead to behaviour
change (i.e. acquisition of smoke alarms). The use of discounted or free smoke alarms as an
intervention to increase ownership and function (usually in lower income families) suggests that
authors consider cost to be a barrier to smoke alarm acquisition.
4.3 Developing a preliminary synthesis
It is stated in the guidance that “how a reviewer approaches the preliminary synthesis... will depend in
part on whether the evidence to be synthesised is quantitative, qualitative or both”. In the case of this
example, the data to be synthesised were anticipated to be predominantly quantitative and, more
specifically, derived entirely from randomised controlled trials. With this in mind each of the tools and
techniques presented in the ‘preliminary synthesis’ section of chapter three were evaluated as to
whether they would be relevant for the synthesis at hand (see table 1 below).
Table 1: Selection of tools and techniques in developing a preliminary synthesis

Name of tool/technique | Thoughts/ideas/comments in relation to current synthesis | Should this be applied?
Textual descriptions | Need to determine which aspects of each study will be drawn from the reports. These might be the same as the table headings | Possibly, but not necessarily as a first step
Groupings and clusters | If possible, organise studies by intervention type, context, target population, study design, outcomes. Maybe have ‘primary clusters’ (e.g. intervention type, population) and ‘secondary clusters’ (e.g. study design, context) within these | Yes
Transforming data: constructing a common rubric | Odds ratios or relative risks for dichotomous data, weighted or standardised mean difference for continuous data | Yes
Translating data | Inappropriate given predominantly quantitative data and the effectiveness focus of this review | No
Tabulation | Describe study characteristics and results. Will quality be assessed here? How? Predefined categories or just voicing methodological concerns that occur when reading the studies? Present these in text, tables, or both? Perhaps use the text descriptions to highlight any important aspects about individual studies that might not be apparent from the tables (issues across studies are more likely to fit into the next section on ‘exploring relationships’) | Yes
Vote-counting as a descriptive tool | Would be possible here if all data had been converted to odds ratios/relative risks/mean differences | Yes
Consequently, five of the six tools/techniques described in the guidance were applied to the synthesis
and were carried out in the order described below.
Tabulating the data
It was decided that extracting data from the primary studies in tabular form might be the most natural
starting point for the synthesis. This was done by using the same format as the Cochrane review’s
‘characteristics of included studies’ table (participants, interventions, outcomes, notes) and adding
further information, including country of origin, duration and provider of the intervention, number of
participants in each group, context in which intervention was delivered, and results (see Table 2).
Study validity/quality is not addressed in detail in this section of the guidance. However, the
Cochrane review did report some aspects of study validity (e.g. concealment of allocation) in the data
extraction tables. It seemed sensible at this stage of the narrative synthesis (where the papers were
being read in detail and some broad judgements about their content are starting to be made) to
consider study quality. Consequently, a column including data on methods/quality was included in the
table and structured comments were included regarding individual papers, based on Jadad et al’s
scale for evaluating RCTs.
It became apparent that there were some discrepancies between the outcomes extracted for
tabulation in the narrative synthesis and those in the Cochrane review. However, upon contacting the
Cochrane review’s authors, all these discrepancies could be explained by the inclusion of unpublished
data or statistical adjustments for clustering. In these cases, to ensure comparability, data from the
Cochrane review were used in the narrative synthesis.
At this stage, it became clear that the majority of studies were concerned with child safety, and that
most included some measure of smoke alarm ownership/function as a main outcome. Only two
studies54, 55 looked at injuries as an outcome, but neither of these presented separate data on
fire/smoke/burn related injuries.
Textual descriptions
It was not entirely clear what these might add to the preliminary synthesis over and above the
information presented in the data extraction tables. Immediately after having constructed the data
extraction tables, this seemed like an unnecessary duplication of effort, though it was considered that
‘textual descriptions’ might actually be useful for describing the interventions in more depth than can
be usefully given in the tables. Consequently, the use of this technique was delayed until a later
stage of the synthesis process.
Groupings and clusters
The presence of natural groups or clusters of studies was investigated, primarily to determine whether
studies could be clustered according to the characteristics in the data-extraction tables (such as
intervention, participants, setting, outcomes etc). The most obvious difference between studies in
terms of the populations included is that all the studies deal with children and/or their families, with the
exception of the Ploeg study that includes only participants aged 65+ years.
This study was
therefore excluded from later comparisons. Secondly, studies could clearly be grouped according
to which of the four smoke alarm ownership/function outcomes (specified a priori in the Cochrane
review) they measured.
Developing a common rubric
As mentioned previously, data were only available for the four smoke alarm ownership/function
outcomes. As these data were dichotomous, odds ratios and relative risks were calculated. Absolute
risk differences and percentage smoke alarm ownership in the control group were also calculated for
each smoke alarm ownership outcome and tabulated (an example for the ‘final smoke alarm
ownership’ outcome is shown in table 3).
These tables showed that the effects of most interventions were generally fairly small for most smoke
alarm ownership and function outcomes (absolute differences ranged from 0% to 12.4%). However,
they generally favoured intervention over control (only two of the 10 studies that measured final
smoke alarm ownership were negative for this outcome, and one of the four studies reported a very
small negative finding (absolute difference –0.1%) for ‘smoke alarms acquired’).
Smoke alarm ownership in the control groups of each study was generally quite high, with one clear
exception (Kelly et al, 11%). As might be expected, there was a greater range of odds ratios than
corresponding relative risks for each outcome, as odds ratios are frequently more extreme (i.e. further
from 1) than relative risks.
This approach proved a useful first step in comparing the effects observed across the included
studies.
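As a concrete illustration of this common rubric, the measures for one row of the outcome tables can be recomputed directly from the reported counts. The following is a minimal sketch (in Python, which is not part of the original guidance), assuming the usual Wald confidence intervals on the log scale; the counts are Clamp (1998)’s final smoke alarm ownership figures:

```python
import math

def effect_measures(a, n1, c, n2, z=1.96):
    """Common-rubric measures for a dichotomous outcome:
    a/n1 = events/total in the intervention group,
    c/n2 = events/total in the control group."""
    b, d = n1 - a, n2 - c
    rr = (a / n1) / (c / n2)        # relative risk
    odds_ratio = (a * d) / (b * c)  # odds ratio
    ard = a / n1 - c / n2           # absolute risk difference
    se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n2)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)

    def ci(est, se):  # Wald 95% confidence interval on the log scale
        return (math.exp(math.log(est) - z * se),
                math.exp(math.log(est) + z * se))

    return {"rr": rr, "rr_ci": ci(rr, se_log_rr),
            "or": odds_ratio, "or_ci": ci(odds_ratio, se_log_or),
            "ard": ard}

# Clamp (1998), final smoke alarm ownership: I = 82/83, C = 71/82
m = effect_measures(82, 83, 71, 82)
print(round(m["rr"], 2), round(m["or"], 1), round(100 * m["ard"], 1))
# → 1.14 12.7 12.2
```

These reproduce the tabulated relative risk of 1.14 (1.04, 1.25), odds ratio of 12.7 and absolute difference of 12.2% for that study, and also illustrate why the odds ratio is so much more extreme than the relative risk when event rates are high.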
Table 2: Characteristics of included studies

Barone (1988)
Intervention: I: usual safety education, plus slides and handouts on burn prevention, motor vehicle safety education and video; bath water thermometer; hot water gauge. C: usual safety education (n=…). 4 x 2h weekly meetings. Delivered by: …
Participants: Couples or … “Parenting the …”
Setting/context: Classes conducted at suburban hospital; family … inspection 6 months after …
Outcomes: 1) Final smoke alarm ownership; 2) Final functioning smoke alarms
Results: 1) Final smoke alarm ownership: I = 32/34, C = 26/29; 2) Final functioning smoke alarms: I = 39/41, C = 34/38. No significant difference between groups.
Methods/quality: Allocation by coin toss within paired classes. Outcome assessment not blinded.
Other: 27% of parents attending randomised classes did not enrol in trial.

Clamp (1998)
Intervention: I: safety advice, leaflets, discount safety devices for low income families (n=83 families). C: routine child health surveillance and routine consultations without intervention (n=82 families). Delivered by: health visitors/practice nurses.
Participants: Families of children <5 yrs on GP list.
Setting/context: Delivered during child health …, during other consultations, or the family was asked to make an … specifically for the …; mail survey 6 weeks after …
Outcomes: 1) Smoke alarms acquired; 2) Functioning smoke alarms acquired; 3) Final smoke alarm ownership; 4) Final functioning smoke alarms
Results: 1) Smoke alarms acquired: I = 8/83, C = 0/82; 2) Functioning smoke alarms acquired: I = 7/83, C = 4/82; 3) Final smoke alarm ownership: I = 82/83, C = 71/82; 4) Final functioning smoke alarms: I = 80/83, C = 71/82.
Methods/quality: Allocation by random numbers table numbered 1–165; the first 83 numbers on the list were allocated to the intervention group. Allocation was done by a researcher blinded to the number given to each family at the time of … Outcome assessment not blinded.

Davis (1987)
Intervention: I: fire safety lessons with workbook, demonstrations, teacher training, materials, take home materials for parents. C: usual lessons (n=418). 6 x 1-hour lessons. Delivered by: …
Participants: Children in grade 4–6.
Setting/context: School; in school after class.
Outcomes: 1) Final smoke alarm ownership
Results: Final smoke alarm ownership: I = 221/314, C = 195/299 (all randomised: I = 309/439, C = 272/418).
Methods/quality: Method of random allocation unclear. Outcome assessment not blinded. I = 1%, C = 0%.
Other: The study …

Jenkins (1996)
Intervention: I: discharge teaching book about burn care and prevention; routine discharge teaching (n=62 families). C: routine discharge teaching (n=61 families). One session. Delivered by: physical therapist, occupational therapist or nurse.
Participants: Families of children <17 years in burn …
Setting/context: Delivered at discharge from burn unit; interview in clinic at first follow-up visit (time since …).
Outcomes: Final smoke alarm ownership
Results: Final smoke alarm ownership: I = 45/62, C = 46/61.
Methods/quality: Allocation by random numbers table read by independent person. Outcome assessment … 13% overall (unclear for each group).
Other: 48% of children in the study were of …; … were less likely to have safety devices, and less likely to … English as a …

Kelly (1987)
Intervention: I: developmentally oriented child safety education, hazard assessment and handout at 6, 9 and 12-month well child visits (n=55 families). C: usual 6, 9 and 12-month well child visits (n=54 families). Each visit approx 15 mins. Delivered by: I = principal investigator; C = primary caretaker (paediatric resident, fellow, faculty member, or nurse practitioner).
Participants: Families of children aged 6 months seen for well child …
Setting/context: Family home.
Outcomes: 1) Final smoke alarm ownership (from home inspection, 1 month after 12-month visit); 2) Accidents (from hospital record review)
Results: 1) Final smoke alarm ownership: I = 8/55, C = 6/54; no significant difference between groups. 2) ER/primary care visits for accidents: I = 15/55, C = 11/54; accidents requiring …: I = 3/55, C = 4/54; hospitalisations for …: I = 1/55, C = 1/54.
Methods/quality: Method of random allocation unclear. Outcome assessment … I = 35%, C = 37%.

Kendrick (1999)
Intervention: I: age specific advice, cheap safety equipment for low income families, home safety checks, first aid training; checklists, information sheets and literature provided throughout (18 centres randomised, n=1124). C: usual care (no further description) (18 centres randomised, n=1028). Delivered by: health visitors and practice nurses.
Participants: Children aged 3–12 months.
Setting/context: Community.
Outcomes: a) Record review of …; b) postal survey of safety practices at 25 month follow-up: 1) Smoke alarms acquired; 2) Functioning smoke alarms acquired; 3) Final smoke alarm ownership; 4) Final functioning smoke alarms
Results: 1) Smoke alarms acquired: I = 15/274, C = 11/277; 2) Functioning smoke alarms acquired: I = 20/274, C = 14/277; 3) Final smoke alarm ownership: I = 254/274, C = 248/277; 4) Final functioning smoke alarms: I = 243/274, C = 241/277.
Methods/quality: Allocation by random numbers table by investigator blind to the identity of the practices. Outcomes assessment … I = 67%, C = 64%.
Other: Not all received all aspects of …

King (2001)
Intervention: I: home safety inspection and tailored education, safety device coupons; reinforcement (by telephone) at 4 and 8 months, plus a letter from the local project director (n=482 families). C: home safety inspection and general safety pamphlet only (n=469 families). Delivered by: “home visitor”.
Participants: Families of children aged <8 years hospitalised for …
Setting/context: Family home; home inspection at 1 year follow-up.
Outcomes: 1) Smoke alarms acquired; 2) Functioning smoke alarms acquired; 3) Final smoke alarm ownership; 4) Final functioning smoke alarms
Results: 1) Smoke alarms acquired: I = 14/476, C = 14/464; 2) Functioning smoke alarms acquired: I = 44/440, C = 36/435; 3) Final smoke alarm ownership: I = 460/479, C = 454/465, 1.45 (0.94, 2.22); 4) Final functioning smoke alarms: I = 412/459, C = 401/447, 1.01 (0.79, 1.30).
Methods/quality: Allocation by opening sealed, serially numbered, opaque envelopes. Outcome assessment … I = 20%, C = 18%.
Other: … not given after home …; informed if alarms were …

Mathews (1988)
Intervention: I: home safety inspection, video, handouts, modelling re: safety and managing dangerous child behaviour; hot water thermometers; choke tube (n=12 families). C: home visit with video, handouts, modelling on language simulation (n=12 families). Home visits 1.5–2 hours, intervention 45–60 mins. Delivered by: …
Participants: Mothers of toddlers (12–14 months at first contact) from clinics, day care …
Setting/context: Family home; home inspection 2 weeks after home visit.
Outcomes: 1) Smoke alarms acquired; 2) Functioning smoke alarms acquired; 3) Final smoke alarm ownership; 4) Final functioning smoke alarms
Results: 1) Smoke alarms acquired: I = 0/12, C = 0/12; 2) Functioning smoke alarms acquired: I = 0/12, C = 0/12; 3) Final smoke alarm ownership: I = 10/12, C = 9/12; 4) Final functioning smoke alarms: I = 6/12, C = 6/12. There were no significant differences between groups or trials on these outcomes.
Methods/quality: First eight participants allocated in odd-even manner, remainder using open random numbers table. Blinding unclear. 8% in total.

Ploeg (1994)
Intervention: I: safety behaviour promotion: a safety checklist developed from the injury prevention literature, used with clients to discuss personal, home and community safety and to address strategies to improve safety (n=148). C: influenza immunisation promotion (n=211). One visit, duration unclear. Delivered by: public health nurses.
Participants: Public health clients aged 65 or over; mean age 77.2 years, 67% …
Setting/context: Delivered during a visit to the client’s …; survey after 2–3 months.
Outcomes: Smoke alarms …
Results: I = 3/146, C = 1/197.
Methods/quality: Allocation by random numbers table read by independent person. Outcome assessment … I = 1%, C = 7%.

Thomas (1984)
Intervention: I: well-baby classes with standard safety information plus burn prevention education lecture, pamphlet, handouts and discount coupon for smoke alarm purchase (9 classes: n=29). C: well-baby classes with standard safety information (6 classes: n=26). I/C: 1 x 90 min session. Delivered by: paediatric nurse practitioners.
Participants: Parents of … enrolled with a single HMO; no further …
Setting/context: … (conference room); inspection 4–6 weeks after class.
Outcomes: Final smoke alarm ownership
Results: Final smoke alarm ownership: I = 27/28, C = 21/25.
Methods/quality: Randomised using coin toss. Blinding unclear. No withdrawals.
Other: Smoke alarm ownership was very high in both groups; … numbers not given for C.

Williams (1988)
Intervention: I: usual safety education plus 1 hour lecture, handouts on burn prevention; motor vehicle safety education and video (n=40). C: usual safety education plus 1 hour lecture, handouts on infant stimulation and feeding (n=35). Delivered by: …
Participants: New mothers identified while …
Setting/context: Unclear; home inspection 4–70 weeks …
Outcomes: 1) Final smoke alarm ownership
Results: Outcome data not reported. The authors state that there was no difference between I and C groups, with both groups showing usage rates for smoke alarms of over 77%.
Methods/quality: Allocation by random numbers table by independent statistician. Outcome assessment not blinded. 55% of women attending randomised classes did not enrol in the trial.

I = Intervention group
C = Control group
Table 3: Final smoke alarm ownership (common rubric and vote count)

Key to table colour coding (colours not reproduced here): significantly favours intervention; trend towards intervention; no difference; trend towards control; significantly favours control. Ticks mark results that significantly favour the intervention.

Reference | Absolute difference (%) | Relative risk (95% CI) | Odds ratio (95% CI) | Vote count RR | Vote count OR | % smoke alarm ownership in control group
Barone (1988) | 4.5 | 1.05 (0.90, 1.22) | 1.85 (0.29, 11.89) | | | 90
Clamp (1998) | 12.2 | 1.14 (1.04, 1.25) | 12.7 (1.6, 100.85) | ✓ | ✓ | 87
Davis (1987) | 5.2 | 1.08 (0.97, 1.20) | 1.27 (0.9, 1.78) | | | 65
Jenkins (1996) | –2.8 | 0.96 (0.78, 1.19) | 0.86 (0.39, 1.93) | | | 75
Kelly (1987) | 3.4 | 1.31 (0.49, 3.52) | 1.36 (0.44, 4.23) | | | 11
Kendrick (1999) | 3.2 | 1.04 (0.98, 1.09) | 1.49 (0.82, 2.7) | | | 90
King (2001) | –1.6 | 0.98 (0.96, 1.01) | 0.59 (0.28, 1.25) | | | 98
Mathews (1988) | 8.3 | 1.11 (0.74, 1.68) | 1.67 (0.22, 12.35) | | | 75
Thomas (1984) | 12.4 | 1.15 (0.95, 1.38) | 5.14 (0.53, 49.5) | | | 84
Williams (1988) | No stats | No stats | No stats | No stats | No stats | >77
Vote counting as a descriptive tool
Tables showing two approaches to vote counting were developed: (i) only using ticks where the effect
of the intervention was positive and statistically significant; (ii) using colours (superimposed on the
rows of the table) to grade both the direction and statistical significance of each outcome (see table 3
for an example showing the ‘final smoke alarm ownership’ outcome).
In terms of the vote-count there were no differences between the relative risks and odds ratios
calculated previously. The study by Williams
reported that there was no statistically significant
difference between the experimental and control groups but did not provide data to calculate the
measures in this table. For the subsequent steps, the relative risk and the more “informative” (colour
coded) vote count were both used.
The vote count supported the observations previously made by looking across the absolute risk values. Where several studies report the same outcome, the majority show a tendency to favour the intervention over the control, though the effects are usually small. Only one study reported any statistically significant differences between intervention and control groups: Clamp reported statistically significant positive effects of the intervention on final smoke alarm ownership and final functioning smoke alarms.
In this case, the colour-coded descriptive vote-count allows the reader to see the outcome data as
either a simple vote-count or as a statistical value, depending upon the ‘focus’ they adopt when
examining the outcome table.
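The five-level colour grading described above can be assigned mechanically from each study's effect estimate and confidence interval, taking values above 1 as favouring the intervention. A minimal sketch, not part of the original guidance:

```python
def vote_category(estimate, ci_low, ci_high):
    """Grade one study's ratio estimate (RR or OR, null value = 1)
    into the five-level vote-count rubric used in the tables."""
    if ci_low > 1:
        return "significantly favours intervention"
    if ci_high < 1:
        return "significantly favours control"
    if estimate > 1:
        return "trend towards intervention"
    if estimate < 1:
        return "trend towards control"
    return "no difference"

# Examples using relative risks reported in table 3
print(vote_category(1.14, 1.04, 1.25))  # Clamp: significantly favours intervention
print(vote_category(0.96, 0.78, 1.19))  # Jenkins: trend towards control
```

Applied to every row, a function like this reproduces the colour coding while leaving the underlying statistics visible, which is what allows the reader to shift 'focus' between the two.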
4.4 Exploring relationships within and between studies
The tools and techniques covered in this section of the guidance are summarised in the table below.
Table 4: Selection of tools and techniques for exploring relationships between studies

Name of tool/technique | Thoughts/ideas/comments in relation to the current review | Should this be applied?
Moderator variables and subgroup analyses | The most likely sources of potential moderator variables are variations in intervention, population or possibly setting. | Yes
Conceptual models/idea webbing/concept mapping | This may help structure the investigation of moderator variables. | Yes
Conceptual triangulation | This approach would be more appropriate to a synthesis of implementation studies, in which more qualitative information is likely to be available and there is greater scope for model development. | No
Reciprocal/refutational translation | Insufficient qualitative evidence in this review. | No
Qualitative case descriptions | This appears to be essentially the same as the 'textual descriptions' described earlier. However, here the approach is presented in the context of investigating differences between, rather than simply describing, the studies. It might be worthwhile to revisit the studies and extract detailed data from them, with an eye to any potential moderator variables. | Yes
Visual representation of relationship between study characteristics and results | This is possible given the quantitative data available for each study. | Yes
Investigator and methodological triangulation | More applicable to qualitative studies. As all the studies here were RCTs, there should not be any systematic difference in results between authors from different disciplines (if there were, bias would be a very serious concern). Data on the disciplinary perspective/expertise of the investigators were not available for all studies. | No
The four main tools and techniques for exploring relationships within and between studies were
conducted in the order described below.
Moderator variables and subgroup analyses
It would be useful to identify any variables that might moderate the main effects being examined by the review. Two further types of table were drawn up to help investigate whether there were any clear moderators of effect. The first table was constructed to show the various components that make up the intervention in each study, and the overlap between the different interventions in terms of these components (table 5).
The table indicates that there is little overlap between the studies in terms of the specific components employed within the interventions they evaluate. Seven of the ten studies concerned with children and/or their families used handouts, and four used 'burn education', money-off coupons or discounted devices, and home safety inspections. However, this lack of overlap is possibly due to the fact that the studies were, on the whole, very poorly described. Even when sufficient information was reported to allow extraction, there was still variation in the terms and definitions used by different authors, making direct comparisons even more difficult.
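One way to make the overlap between interventions explicit is to treat each intervention as a set of components and compare the sets pairwise. A sketch using hypothetical study names and component labels (illustrative only, not data extracted from the included studies):

```python
# Hypothetical intervention-component sets for three studies
components = {
    "Study A": {"handouts", "burn education", "discount devices"},
    "Study B": {"handouts", "home safety inspection"},
    "Study C": {"safety advice", "discount devices", "home safety inspection"},
}

def pairwise_overlap(comp):
    """Return the shared components for every pair of studies."""
    names = sorted(comp)
    return {
        (a, b): comp[a] & comp[b]
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

for pair, shared in pairwise_overlap(components).items():
    print(pair, sorted(shared) or "no overlap")
```

Standardising component labels before the comparison is essential; as noted above, variation in the terms used by different authors is what makes this step difficult in practice.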
The second set of tables is an adaptation of the outcomes/vote count table, with further information
taken from the data extraction table and the intervention components table described above (table 6
gives an example for the ‘final smoke alarm ownership’ outcome). Intervention, population and
setting columns were included to identify potential subgroups/moderators. These are described as briefly as possible (1-5 words) to allow visual comparison across the table. The description of the intervention is broken into three separate cells to facilitate such visual comparisons for the more complex interventions.
Looking at the outcome of 'final smoke alarm ownership' (for which the majority of studies provide data), four studies stand out from the majority of positive but statistically non-significant findings: Williams (no difference), Clamp (significantly positive), and Jenkins and King (both non-significantly negative). [59, 60]
Williams reports that "there were no differences between experimental and control groups", though whether this means there was truly no difference between the groups, or that any observed differences were not statistically significant, is unclear. Either way, it is difficult to determine why the studied intervention had little or no effect on the basis of this one study alone. The intervention studied by Clamp included safety advice, discounted safety devices and handouts, and resulted in a significant increase in final smoke alarm ownership and function. However, these particular intervention components were common to other studies that differed from Clamp's study in terms of both the magnitude and the statistical significance of effect. The two negative studies on the ownership outcome (Jenkins and King) evaluate two different interventional approaches. [59, 60] However, these studies do share a common characteristic that is not present in the 'positive' studies: the intervention was delivered to the families of children who had previously been hospitalised for an injury.
Qualitative case reports/textual descriptions
The two 'tools', 'textual descriptions' and 'qualitative case descriptions', would seem to be very similar.
It was decided that writing a short summary of each study at this stage of the synthesis (i.e. having
already organised, described and examined them) would provide an opportunity to check the previous
stages for accuracy, and allow the reviewer to draw out in detail any aspects of individual studies that
may not have seemed relevant at the start of the synthesis, but have become of interest during the
subsequent stages of describing and exploring the study data. These summaries were structured
such that they provided details of the setting, participants, intervention, comparison, and outcomes,
along with any other factors of interest (for an example of one such description see Box 1).
Table 5: Table showing various components of the evaluated interventions*

Component columns: slides; handouts; safety inspections; reinforcement; video; modelling; free thermometer/choke tube.
[The per-study tick marks in this table are not reproduced here; the components of each intervention are listed study by study in table 6.]

*The above studies relate to children/families; Ploeg included only participants aged >65 yrs.
Table 6: Final smoke alarm ownership (potential moderator variables)

Key to colour coding:
Significantly favours intervention
Trend towards intervention
No difference
Trend towards control
Significantly favours control

Reference | Intervention | Population | Setting | Absolute difference (%) | % smoke alarm ownership in control group
Barone (1988) | Burn education; slides; handouts | Parents of toddlers | Hospital, family home | 4.5 | 90
Clamp (1998) | Safety advice; discount devices; handouts | Parents of children | Family home | 12.2 | 87
Davis (1987) | Fire safety lessons; take-home material | Children | School | 5.2 | 65
Jenkins (1996) | Discharge teaching book on burn care/prevention | Children <17 yrs | Hospital burn unit | -2.8 | 75
Kelly (1987) | Child safety education; home safety inspection | Families of babies | Family home | 3.4 | 11
Kendrick (1999) | Safety advice; first aid training; discount devices; home safety inspection | Families of babies | Community | 3.2 | 90
King (2001) | Tailored education; discount coupons; home safety inspection | Families of hospitalized children | Family home | -1.6 | 98
Mathews (1988) | Modelling re safety; free thermometers and choke tube; home safety inspection | Mothers of toddlers | Family home | 8.3 | 75
Thomas (1984) | Well-baby classes plus burn prevention education; discount smoke alarm coupon | Parents of infants | Hospital(?) | 12.4 | 84
Williams (1988) | Burn prevention lecture; handouts | Pregnant women (last trimester) | Unclear | No stats | >77
A number of questions arose from the process of writing these summaries:
Does the immediate on-site availability of smoke alarms in the intervention setting
increase uptake?
Are lower income families more likely than higher income families to respond to
interventions incorporating discounted smoke alarms?
Does having experienced a child injury prior to intervention increase uptake of the
recommendations given in the intervention?
Do interventions that focus on burn injuries/fire prevention have different effects to
interventions that relate to safety more generally?
Does advice being age-specific alter outcomes? Would advice regarding fire safety
always be the same, independent of child age?
Does attrition have an effect?
Is length of follow-up an important factor?
Is sample size important? Studies may be powered to detect differences on other outcomes.
install smoke alarms. Is there a relationship between intervention effectiveness and
amount of active effort required?
Box 1 – Example of textual descriptions/qualitative case descriptions of included studies
Barone (1988)
Setting: US suburban hospital.
Participants: Individuals or couples attending a continuing-education series on “Parenting the Toddler”.
Couples were predominantly of middle- and upper-middle class socioeconomic status and generally
well educated.
Intervention: Parenting information, with specific information and materials on burn prevention and
child restraints. Included a slide presentation on falls, strangulations, drownings, poisonings, and fire
hazards, plus additional slides on the hazard of hot tap water, use of smoke detectors and the
advantages of child car seats. 4 weekly sessions, each of 2 hours duration. 41 participants.
Comparison: Parenting information, with general child safety information. Included a slide presentation
on falls, strangulations, drownings, poisonings, and fire hazards. 38 participants.
Outcomes: A researcher inspecting participants’ homes looked for and tested any smoke alarms, 6
months after the classes.
The protocol for this intervention is very similar to that described by Williams (both are
from the same University in the same year).
The author suggests that the very high rate of smoke alarm ownership might be due to
previous health promotion efforts.
The author also suggests that it would have been possible for participants in the control
group to be ‘warned’ in advance what the researchers were looking for and testing during
home inspections by other participants whose homes had already been inspected.
This suggests that the production of summaries can be a helpful prelude to identifying and assessing the impact of moderator variables, building on data extraction and developing conceptual models.
Developing conceptual models/idea webbing/concept mapping
These three tools/techniques are also very similar although the implementation narrative synthesis
illustrates differences between them. The aim of using these techniques in this example was to make
transparent the logic behind the subgroup analyses/investigation of moderator variables (see figure
5). In working through this process it became apparent that it also incorporates aspects of grouping
and clustering. The resulting figure is also in part a way to link the previously described processes
and the resulting issues/ideas together in order to structure the synthesis. It represents the product of
a process whereby variables or patterns are identified in one of the previously described tables or
documents and then re-examined from the viewpoint of the remaining tables/documents. For
example, the characteristic most fully explored in the figure is that of the included study population, as
described in the table of potential moderator variables and in the textual descriptions. Studies of
children/families were grouped by age of the included children according to the moderator tables.
Within these groups, further participant variables such as socioeconomic status were identified from
the textual descriptions.
The 'outcomes' and 'quality' nodes are connected to one another via 'loss to follow-up'. The withdrawal rates vary substantially across this group of studies, from 0% to 67%. Where high dropout rates are discussed in these studies, they are attributed to non-attendance over time or to the unavailability of participants at final follow-up.
Though identified as potential moderators, no clear or consistent effect on smoke alarm ownership
could be seen across studies for intervention variables such as the use of home inspections or
free/discounted devices, or for fire/burn-specific education alone versus general safety information
that incorporates fire/burn material.
Initially, idea webbing was a very useful approach to guiding the synthesis. However, the ad hoc use of the approach led to a natural impulse to seek out any association, no matter how spurious. Given this, it may be better to use these types of approaches early in the synthesis (or even protocol development) process, to identify a priori the characteristics to be investigated and to structure the synthesis before seeing the data.
Alternatively, it might be useful to employ this approach at both points in the review process (protocol
development and exploring relationships in the final synthesis), placing more “weight” on
investigations from the a priori idea web (i.e. using it to help develop conclusions about effects and
moderators), and using the idea web constructed after interrogating the data purely for suggesting
areas in which further research might be worthwhile.
[Figure 5: idea web linking the effectiveness of interventions on final smoke alarm ownership to study characteristics, including intervention (fire/burn education versus general safety education), setting (family home), population (including participants aged >65), baseline or control group ownership, and loss to follow-up.]