Article

Case selection for robust generalisation: lessons from QuIP impact evaluation studies


Abstract

What wider lessons can be drawn from a single impact evaluation study? This article examines how case study and source selection contribute to useful generalisation. Practical suggestions for making these decisions are drawn from a set of qualitative impact studies. Generalising about impact is a deliberative process of building, testing and refining useful theories about how change happens. To serve this goal, purposive selection can support more credible generalisation than random selection by systematically and transparently drawing upon prior knowledge of variation in actions, contexts, and outcomes to test theory against diverse, deviant and anomalous cases.
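By way of illustration only (this sketch is not part of the article), purposive selection from project monitoring records might look something like the following; the case records, field names and cut-off rule are hypothetical.

```python
# Minimal sketch (hypothetical data and field names): purposive selection of
# diverse and deviant cases using prior knowledge of context and outcomes.
from statistics import mean, pstdev

cases = [
    # one record per candidate case: context stratum, observed outcome,
    # and the outcome the working theory of change would predict
    {"id": "A", "context": "remote", "outcome": 2.1, "expected": 3.0},
    {"id": "B", "context": "remote", "outcome": 4.8, "expected": 3.1},
    {"id": "C", "context": "peri-urban", "outcome": 3.2, "expected": 3.2},
    {"id": "D", "context": "peri-urban", "outcome": 0.9, "expected": 2.9},
]

# Diverse cases: cover every known context stratum at least once.
diverse = list({c["context"]: c for c in cases}.values())

# Deviant cases: the largest gaps between observed and theory-predicted outcomes.
gaps = [abs(c["outcome"] - c["expected"]) for c in cases]
cutoff = mean(gaps) + pstdev(gaps)
deviant = [c for c, g in zip(cases, gaps) if g >= cutoff]

print("diverse:", [c["id"] for c in diverse])
print("deviant:", [c["id"] for c in deviant])
```

Random selection would ignore the information held in the context and expected columns; purposive rules of this kind use it deliberately and transparently.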


... We see the job of the causal mapper as being primarily to collect and accurately visualise evidence from different sources, often leaving it to others (or to themselves wearing a different hat) to draw conclusions about what doing so reveals about the real world. This second interpretative step goes beyond causal mapping per se (Copestake, 2021; Copestake et al., 2019a; Powell et al., 2023). ...
Article
Full-text available
Evaluators are interested in capturing how things causally influence one another. They are also interested in capturing how stakeholders think things causally influence one another. Causal mapping – the collection, coding and visualisation of interconnected causal claims – has been used widely for several decades across many disciplines for this purpose. It makes the provenance or source of such claims explicit and provides tools for gathering and dealing with this kind of data and for managing its Janus-like double-life: on the one hand, providing information about what people believe causes what, and on the other hand, preparing this information for possible evaluative judgements about what causes what. Specific reference to causal mapping in the evaluation literature is sparse, which we aim to redress here. In particular, the authors address the Janus dilemma by suggesting that causal maps can be understood neither as models of beliefs about causal pathways nor as models of causal pathways per se but as repositories of evidence for those pathways.
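To make the 'repository of evidence' framing concrete, here is a minimal sketch (ours, not the authors') of a causal map stored as sourced claims; the respondent identifiers and quotes are invented.

```python
# Minimal sketch (invented data): a causal map as a repository of sourced
# claims about what causes what, rather than a verdict on what causes what.

# Each claim keeps its provenance: (source_id, cause, effect, supporting quote).
claims = [
    ("resp01", "training", "higher yields", "after the training my maize did better"),
    ("resp02", "training", "higher yields", "the new methods raised my harvest"),
    ("resp02", "higher yields", "school fees paid", "I could pay the children's fees"),
]

# The map itself is just the set of links, each pointing back to its evidence.
causal_map = {}
for source, cause, effect, quote in claims:
    causal_map.setdefault((cause, effect), []).append({"source": source, "quote": quote})

for (cause, effect), evidence in causal_map.items():
    print(f"{cause} -> {effect}: cited by {[e['source'] for e in evidence]}")
```

Any evaluative judgement about whether training really raised yields is a separate, later step taken on top of this evidence store.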
... Each interview (whether with an individual, household, or group) constitutes one source, although the narratives presented in each may not be completely independent of one another. To make credible generalizations across a wider population of intended beneficiaries, care is needed both in selecting whom to interview (Copestake, 2020) and in weighing up carefully how the evidence obtained from multiple sources can be aggregated to sustain wider generalization. Construction of causal maps incorporates simple frequency counts of how often a particular causal claim (or set of causal claims) is cited by the same source and across multiple sources. ...
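A small follow-on sketch (again with invented data) of the frequency counting described above, separating repeat mentions by a single source from corroboration across independent sources.

```python
# Minimal sketch (invented data): within-source versus cross-source counts
# for each causal claim, as an input to judgements about wider generalization.
from collections import Counter

claims = [  # (source_id, cause, effect)
    ("hh01", "savings group", "bought farm inputs"),
    ("hh01", "savings group", "bought farm inputs"),   # repeated by the same household
    ("hh02", "savings group", "bought farm inputs"),
    ("hh03", "drought", "lower yields"),
]

mentions = Counter((cause, effect) for _, cause, effect in claims)
sources = Counter(link[1:] for link in {(s, c, e) for s, c, e in claims})

for link, n in mentions.items():
    print(f"{link}: {n} mention(s), {sources[link]} distinct source(s)")
```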
Chapter
Full-text available
What do the intended beneficiaries of international development programmes think about the causal drivers of change in their livelihoods and lives? Do their perceptions match up with the theories of change constructed by organizations trying to support them? This case study looks at an entrepreneurship programme aiming to economically empower rural women smallholders in Ghana. The programme provided a combination of financial services, training and peer support to improve the women’s productivity and their purchase and sale options. It was implemented by two Ghanaian savings and credit organizations, Opportunity International Savings and Loans, and Sinapi Aba Savings and Loans, with support from the development organization Opportunity International UK (OIUK). We report on a mid-term qualitative evaluation of the programme that used the Qualitative Impact Protocol (QuIP) to gather stories of change directly from the programme participants. These stories were coded, analysed and visualized using a web application called Causal Map.
Article
Full-text available
As a theory-driven evaluation design grounded in the philosophy of critical realism, realist evaluation relies on eliciting stakeholders’ assumptions about how programmes are expected to produce outcomes. Little is known about key principles for involving stakeholders in realist evaluations. In this scoping review of realist evaluation studies, we explored how stakeholders were involved in the conduct of realist evaluation. A total of 26 ‘participatory’ realist evaluation studies were included. We analysed the approach to the involvement of stakeholders using an analytical framework rooted in existing theories and models of participatory research and evaluation, following a ‘Best Fit Framework Synthesis’ methodology. We found a wide range of approaches to stakeholder involvement, relating to the types of stakeholders involved, their depth of involvement and the means by which researchers and evaluators fostered that involvement. We discuss key lessons learnt and aim to foster more transparent and meaningful involvement of stakeholders in realist evaluation.
Article
Full-text available
Archaeology and heritage projects can have profound social, economic, environmental and cultural impacts on the development of communities. Yet, their impacts are rarely articulated or measured in development terms, to the detriment of their accountability, sustainability and replicability. This article explores the potential for a more systematic evaluation of these impacts through the case study of the Sustainable Preservation Initiative (SPI) and their evaluation strategies in Peru. Informed by an evaluability assessment framework, this study highlights the practical challenges in evaluating small-scale projects in the Global South and the scope for overcoming them, appraising how SPI’s contribution to local development can be measured in practice. Development evaluation methods are weighed against the practical concerns expressed by project staff and participants. The article reflects on the importance of evaluating the wide-ranging development impacts of archaeology and heritage projects and concludes with practical suggestions for documenting these multifaceted impacts and for further comparative research.
Conference Paper
Full-text available
The paper reflects on action research into the use of a qualitative impact protocol (the QuIP) to conduct commissioned evaluations of the social impact of development interventions in complex contexts. Unusually, the QuIP unbundles the tasks of data collection and analysis. This can enhance the transparency and auditability of the evaluative process, and hence its credibility to users, but also accentuates the importance of reflection on the analyst’s positionality. With sufficient safeguards, we argue that the approach opens up new opportunities for generating qualitative evidence to influence development practice. The paper first describes the QuIP and its approach to coding and analysis. It then reflects on the challenges analysts face, emphasising that positionality relates not only to their personal characteristics but also to how their role is structured in relation to that of other stakeholders.
Keywords: coding; impact evaluation; positionality; qualitative data analysis
Book
Full-text available
The Centre for Development Studies and Bath SDR have authored a book with Practical Action Publishing presenting the experiences of designing and executing eight different QuIP studies, from the perspective of both the independent evaluators and the commissioners. Attributing Development Impact is based on studies undertaken by Bath SDR in the first two years following its launch as a social enterprise and illustrates the potential flexibility of the QuIP and its continued evolution as we learned from each project. The book contains detailed methodological reflections and guidelines on the approach. It is available in hard copy, and also as a free download by clicking on ‘e-book’, thanks to the generous support of the University of Bath Alumni Fund. https://bathsdr.org/about-the-quip/quip-casebook-attributing-development-impact/
Article
Full-text available
Who does what and when during an impact evaluation has an important influence on the credibility and usefulness of the evidence generated. We explore such choreography from technical, political and ethical perspectives by reflecting on a case study that entailed collaborative design of a qualitative impact evaluation protocol (‘the QuIP’) and its pilot use in Ethiopia and Malawi. Double-blind interviewing was employed to reduce project-specific confirmation bias, followed by staged ‘unblindfolding’ as a form of triangulation. We argue that these steps can enhance the credibility of evidence, and that ethical concerns associated with them can be addressed by being open with stakeholders about the process. The case study illustrates scope for better use of qualitative impact evaluation methods in complex international development contexts.
Article
Full-text available
Randomized Controlled Trials (RCTs) are increasingly popular in the social sciences, not only in medicine. We argue that the lay public, and sometimes researchers, put too much trust in RCTs over other methods of investigation. Contrary to frequent claims in the applied literature, randomization does not equalize everything other than the treatment in the treatment and control groups, it does not automatically deliver a precise estimate of the average treatment effect (ATE), and it does not relieve us of the need to think about (observed or unobserved) covariates. Finding out whether an estimate was generated by chance is more difficult than commonly believed. At best, an RCT yields an unbiased estimate, but this property is of limited practical value. Even then, estimates apply only to the sample selected for the trial, often no more than a convenience sample, and justification is required to extend the results to other groups, including any population to which the trial sample belongs, or to any individual, including an individual in the trial. Demanding 'external validity' is unhelpful because it expects too much of an RCT while undervaluing its potential contribution. RCTs do indeed require minimal assumptions and can operate with little prior knowledge. This is an advantage when persuading distrustful audiences, but it is a disadvantage for cumulative scientific progress, where prior knowledge should be built upon, not discarded. RCTs can play a role in building scientific knowledge and useful predictions but they can only do so as part of a cumulative program, combining with other methods, including conceptual and theoretical development, to discover not 'what works', but 'why things work'.
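To pin down the distinction between unbiasedness and balance in any single trial, a standard textbook decomposition (not taken from the article itself), written under a simple assumed linear outcome model:

```latex
% Difference-in-means estimator of the average treatment effect (ATE):
\hat{\tau} = \bar{Y}_{T} - \bar{Y}_{C}, \qquad \mathbb{E}[\hat{\tau}] = \tau .
% Under an assumed linear outcome model Y_i = \tau D_i + x_i^{\top}\beta + \varepsilon_i,
% the error in any single randomisation includes the realised covariate imbalance:
\hat{\tau} - \tau = (\bar{x}_{T} - \bar{x}_{C})^{\top}\beta
                  + (\bar{\varepsilon}_{T} - \bar{\varepsilon}_{C}) .
```

Unbiasedness is a property of the average over hypothetical re-randomisations; it says nothing about how large the imbalance term happens to be in the one trial actually run.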
Article
Full-text available
One of us (VC) was having a conversation with a student recently about the origins and history of thematic analysis (TA). The student had read Qualitative Research in Counselling and Psychotherapy (McLeod, 2011), a text which presents TA as a variant of grounded theory. Victoria commented that she thought that TA evolved from content analysis, and therefore predated grounded theory, and discussed her recent discovery of the use of a variant of TA in psychotherapy research in the 1930s-1950s. The student let out a heavy sigh and slumped in her chair, bemoaning her ability to ever fully grasp qualitative research in all its complexity. This reaction is not uncommon. Students learning and implementing qualitative research at times find it bewildering and challenging; simple models of ‘how to do things’ can appear to offer reassuring certainty. But simplified models, especially if based in confidently-presented-yet-partial accounts of the field or an approach, at best obfuscate and at worst lead to poor quality research. In our discipline (psychology), students typically learn about qualitative research only after they have been fully immersed in the norms, values and methods of scientific psychology. Many find it difficult to let go of what we call a ‘quantitative sensibility’. For such students, and others not well versed in a qualitative sensibility, Fugard and Potts’ (2015) tool for determining sample sizes in TA research has great intuitive appeal; it provides a life-raft to cling to in the sea of uncertainty that is qualitative research. Thus, we share Hammersley’s (2015) concerns that their tool will be used by funding bodies and others (e.g. editors, reviewers) to determine and evaluate sample sizes in TA research. We fear it will result in further confusion about, and further distortion of, the assumptions and procedures of qualitative research. We here build on concerns expressed by others (Byrne, 2015; Emmel, 2015; Hammersley, 2015) to briefly highlight why this quantitative model for qualitative sampling in TA is problematic, based on flawed assumptions about TA, and steeped in a quantitative logic at odds with the exploratory and qualitative ethos of much TA research.
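For readers unfamiliar with the tool being criticised, quantitative sample-size calculators of this kind typically rest on a binomial argument of roughly the following form (our paraphrase of the general logic, not the authors' notation):

```latex
% Probability that a theme with assumed population prevalence \theta is
% observed at least k times in n independent interviews:
P(X \ge k) = 1 - \sum_{i=0}^{k-1} \binom{n}{i}\,\theta^{i}(1-\theta)^{n-i},
\qquad X \sim \mathrm{Binomial}(n,\theta),
% with n then chosen so that this probability exceeds a desired level.
```

The objection summarised above is that themes in thematic analysis are not fixed-prevalence events sampled independently, so it is the model's premises rather than its arithmetic that fail.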
Article
Full-text available
This article examines five common misunderstandings about case-study research: (1) Theoretical knowledge is more valuable than practical knowledge; (2) One cannot generalize from a single case, therefore the single case study cannot contribute to scientific development; (3) The case study is most useful for generating hypotheses, while other methods are more suitable for hypotheses testing and theory building; (4) The case study contains a bias toward verification; and (5) It is often difficult to summarize specific case studies. The article explains and corrects these misunderstandings one by one and concludes with the Kuhnian insight that a scientific discipline without a large number of thoroughly executed case studies is a discipline without systematic production of exemplars, and that a discipline without exemplars is an ineffective one. Social science may be strengthened by the execution of more good case studies.
Book
Full-text available
Access the book via: https://www.alnap.org/help-library/qualitative-research-for-development-a-guide-for-practitioners
Article
Full-text available
Debate continues over how best international development agencies can evaluate the impact of actions intended to reduce poverty, insecurity and vulnerability in diverse and complex contexts. There are strong ethical grounds for simply asking those intended to benefit what happened to them, but it is not obvious how to do so in a way that is sufficiently free from bias in favour of confirming what is expected. This article considers scope for addressing this problem by minimizing the prior knowledge participants have of what is being evaluated. The tensions between more confirmatory and exploratory methodological approaches are reviewed in the light of experience of designing and piloting a qualitative impact assessment protocol for evaluating NGO interventions in complex rural livelihood transformations. The article concludes that resolving these tensions entails using mixed methodologies, and that the importance attached to exploratory (nested within confirmatory) approaches depends on contextual complexity, the type of evidence sought and the level of trust between stakeholders.
Article
Full-text available
Rising standards for accurately inferring the impact of development projects have not been matched by equivalently rigorous procedures for guiding decisions about whether and how similar results might be expected elsewhere. These ‘external validity’ concerns are especially pressing for ‘complex’ development interventions, in which the explicit purpose is often to adapt projects to local contextual realities and where high-quality implementation is paramount to success. A basic analytical framework is provided for assessing the external validity of complex development interventions. It argues for deploying case studies to better identify the conditions under which diverse outcomes are observed, focusing in particular on the salience of contextual idiosyncrasies, implementation capabilities and trajectories of change. Upholding the canonical methodological principle that questions should guide methods, not vice versa, is required if a truly rigorous basis for generalizing claims about likely impact across time, groups, contexts and scales of operation is to be discerned for different kinds of development interventions.
Article
Full-text available
Two prominent theory-based approaches to evaluation that have found favour in the UK in recent years are Theories of Change and Realistic Evaluation. In this article we share our evolving views on the points of connection and divergence between the approaches, based on our reading of the theory-based evaluation literature and our practice experience. We provide a background to the two approaches that emphasizes the importance of programme context in understanding how complex programmes lead to changes in outcomes. We then explore some of the differences in how ‘theory’ is conceptualized and used within the two approaches and consider how knowledge is generated and cumulated in subtly different ways depending on the approach that is taken. Finally, we offer our thoughts on what this means for evaluators on the ground seeking an appropriate framework for their practice.
Chapter
The Oxford Handbook of Philosophy of Political Science contains twenty-seven freshly written chapters to give the reader a panoramic introduction to philosophical issues in the practice of political science. Simultaneously, it advances the field of Philosophy of Political Science by creating a fruitful meeting place where both philosophers and practicing political scientists contribute and discuss. These philosophical discussions are close to and informed by actual developments in political science, making philosophy of science continuous with the sciences, another aspiration that motivates this volume. The chapters fall under four headings: (1) evaluating theoretical frameworks in political science; (2) methodological challenges and reconciliations; (3) the purposes and uses of political science; and (4) the interactions between political science and society. Specific topics discussed include the biology of political attitudes, intra-agent mechanisms, rational choice explanations, theories of collective action, explaining institutional change, conceptualizing and measuring democracy, process tracing, qualitative comparative analysis, interpretivism and positivism, mixed methods, within-case causal inference, evidential pluralism, lab and field experiments, external validity, contextualization, prediction, expertise, clientelism, feminism, values, and progress in political science.
Article
There is much debate over the number of interviews needed to reach data saturation for themes and metathemes in qualitative research. The primary purpose of this study is to determine the number of interviews needed to reach data saturation for metathemes in multisited and cross-cultural research. The analysis is based on a cross-cultural study on water issues conducted with 132 respondents in four different sites. Analysis of the data yielded 240 site-specific themes and nine cross-cultural metathemes. We found that 16 or fewer interviews were enough to identify common themes from sites with relatively homogeneous groups. Yet our research reveals that larger sample sizes—ranging from 20 to 40 interviews—were needed to reach data saturation for metathemes that cut across all sites. Our findings may be helpful in estimating sample sizes for each site in multisited or cross-cultural studies in which metathematic comparisons are part of the research design.
Article
Qualitative and multimethod scholars face a wide and often confusing array of alternatives for case selection using the results of a prior regression analysis. Methodologists have recommended alternatives including selection of typical cases, deviant cases, extreme cases on the independent variable, extreme cases on the dependent variable, influential cases, most similar cases, most different cases, pathway cases, and randomly sampled cases, among others. Yet this literature leaves it substantially unclear which of these approaches is best for any particular goal. Via statistical modeling and simulation, I argue that the rarely considered approach of selecting cases with extreme values on the main independent variable, as well as the more commonly discussed deviant case design, are the best alternatives for a broad range of discovery-related goals. By contrast, the widely discussed and advocated typical case, extreme-on-Y, and most similar cases approaches to case selection are much less valuable than scholars in the qualitative and multimethods research traditions have recognized to date.
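As a minimal sketch of the two designs the article favours (deviant and extreme-on-the-independent-variable), using ordinary least squares on simulated data; all variable names and values are hypothetical.

```python
# Minimal sketch (simulated data): deviant and extreme-on-X case selection
# from the results of a prior regression analysis.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)              # main independent variable
z = rng.normal(size=n)              # a background covariate
y = 0.5 * x + 0.3 * z + rng.normal(size=n)

# Fit y ~ 1 + x + z by ordinary least squares and take residuals.
X = np.column_stack([np.ones(n), x, z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Deviant cases: worst explained by the cross-case model (largest |residual|).
deviant = np.argsort(-np.abs(residuals))[:3]

# Extreme-on-X cases: furthest from the mean of the main independent variable.
extreme_on_x = np.argsort(-np.abs(x - x.mean()))[:3]

print("deviant cases:", deviant, "extreme-on-X cases:", extreme_on_x)
```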
Book
Over the last twenty or so years, it has become standard to require policy makers to base their recommendations on evidence. That is now uncontroversial to the point of triviality—of course, policy should be based on the facts. But are the methods that policy makers rely on to gather and analyze evidence the right ones? In Evidence-Based Policy, Nancy Cartwright, an eminent scholar, and Jeremy Hardie, who has had a long and successful career in both business and the economy, explain that the dominant methods which are in use now—broadly speaking, methods that imitate standard practices in medicine like randomized control trials—do not work. They fail, Cartwright and Hardie contend, because they do not enhance our ability to predict if policies will be effective. The prevailing methods fall short not just because social science, which operates within the domain of real-world politics and deals with people, differs so much from the natural science milieu of the lab. Rather, there are principled reasons why the advice for crafting and implementing policy now on offer will lead to bad results. Current guides in use tend to rank scientific methods according to the degree of trustworthiness of the evidence they produce. That is valuable in certain respects, but such approaches offer little advice about how to think about putting such evidence to use. Evidence-Based Policy focuses on showing policymakers how to effectively use evidence. It also explains what types of information are most necessary for making reliable policy, and offers lessons on how to organize that information.
Article
Storytelling has long been recognized as central to human cognition and communication. Here we explore a more active role of stories in social science research, not merely to illustrate concepts but also to develop new ideas and evaluate hypotheses, for example, in deciding that a research method is effective. We see stories as central to engagement with the development and evaluation of theories, and we argue that for a story to be useful in this way, it should be anomalous (representing aspects of life that are not well explained by existing models) and immutable (with details that are well-enough established that they have the potential to indicate problems with a new model). We develop these ideas through considering two well-known examples from the work of Karl Weick and Robert Axelrod, and we discuss why transparent sourcing (in the case of Axelrod) makes a story a more effective research tool, whereas improper sourcing (in the case of Weick) interferes with the key useful roles of stories in the scientific process.
Article
Validity and generalization continue to be challenging aspects in designing and conducting case study evaluations, especially when the number of cases being studied is highly limited (even limited to a single case). To address the challenge, this article highlights current knowledge regarding the use of: (1) rival explanations, triangulation, and logic models in strengthening validity, and (2) analytic generalization and the role of theory in seeking to generalize from case studies. To ground the discussion, the article cites specific practices and examples from the existing literature as well as from the six preceding articles assembled in this special issue. Throughout, the article emphasizes that current knowledge may still be regarded as being at its early stage of development, still leaving room for more learning. The article concludes by pointing to three topics worthy of future methodological inquiry, including: (1) examining the connection between the way that initial evaluation questions are posed and the selection of the appropriate evaluation method in an ensuing evaluation, (2) the importance of operationally defining the ‘complexity’ of an intervention, and (3) raising awareness about case study evaluation methods more generally.
Article
There is an inherent tension between implementing organizations — which have specific objectives and narrow missions and mandates — and executive organizations — which provide resources to multiple implementing organizations. Ministries of finance/planning/budgeting allocate across ministries and projects/programs within ministries, development organizations allocate across sectors (and countries), foundations or philanthropies allocate across programs/grantees. Implementing organizations typically try to do the best they can with the funds they have and attract more resources, while executive organizations have to decide what and who to fund. Monitoring and Evaluation (M&E) has always been an element of the accountability of implementing organizations to their funders. There has been a recent trend towards much greater rigor in evaluations to isolate causal impacts of projects and programs and more ‘evidence-based’ approaches to accountability and budget allocations. Here we extend the basic idea of rigorous impact evaluation — the use of a valid counterfactual to make judgments about causality — to emphasize that the techniques of impact evaluation can be directly useful to implementing organizations (as opposed to impact evaluation being seen by implementing organizations as only an external threat to their funding). We introduce structured experiential learning (which we add to M&E to get MeE) which allows implementing agencies to actively and rigorously search across alternative project designs using the monitoring data that provides real-time performance information with direct feedback into the decision loops of project design and implementation. Our argument is that within-project variations in design can serve as their own counterfactual and this dramatically reduces the incremental cost of evaluation and increases the direct usefulness of evaluation to implementing agencies. The right combination of M, e, and E provides the right space for innovation and organizational capability building while at the same time providing accountability and an evidence base for funding agencies.
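A toy sketch of the 'e' in MeE as we read it (not the authors' code): design variants run within one project are compared against one another using routine monitoring data; the variant names and figures are invented.

```python
# Toy sketch (invented figures): within-project design variants acting as
# one another's comparison group, using routine monitoring data.
from statistics import mean

monitoring = [  # (design_variant, outcome recorded for one site)
    ("weekly_visits", 4.1), ("weekly_visits", 3.8), ("weekly_visits", 4.4),
    ("monthly_visits", 3.2), ("monthly_visits", 3.5), ("monthly_visits", 3.0),
]

by_variant = {}
for variant, outcome in monitoring:
    by_variant.setdefault(variant, []).append(outcome)

variant_means = {v: mean(vals) for v, vals in by_variant.items()}
for variant, m in variant_means.items():
    others = mean(m2 for v2, m2 in variant_means.items() if v2 != variant)
    print(f"{variant}: mean {m:.2f} (other variants {others:.2f}, gap {m - others:+.2f})")
```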
Article
How can scholars select cases from a large universe for in-depth case study analysis? Random sampling is not typically a viable approach when the total number of cases to be selected is small. Hence attention to purposive modes of sampling is needed. Yet, while the existing qualitative literature on case selection offers a wide range of suggestions for case selection, most techniques discussed require in-depth familiarity with each case. Seven case selection procedures are considered, each of which facilitates a different strategy for within-case analysis. The case selection procedures considered focus on typical, diverse, extreme, deviant, influential, most similar, and most different cases. For each case selection procedure, quantitative approaches are discussed that meet the goals of the approach, while still requiring information that can reasonably be gathered for a large number of cases.
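Complementing the earlier regression-based sketch, two more of the listed procedures (typical and diverse) can likewise be implemented with information available for every case; again the data are simulated and the names hypothetical.

```python
# Minimal sketch (simulated data): 'typical' and 'diverse' case selection
# using only quantities computable for every case in a large universe.
import numpy as np

rng = np.random.default_rng(1)
n = 150
x = rng.uniform(0, 10, size=n)
y = 1.2 * x + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Typical case: best explained by the cross-case relationship (smallest |residual|).
typical = int(np.argmin(np.abs(residuals)))

# Diverse cases: one case nearest each quartile of x, spanning its full range.
diverse = [int(np.argmin(np.abs(x - q))) for q in np.percentile(x, [0, 25, 50, 75, 100])]

print("typical case:", typical, "diverse cases:", diverse)
```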
Article
Comparative case studies often rely on the analysis of a few cases selected on the dependent variable. This approach has been criticized in the methodological literature. I offer a qualified defense of the method. Selecting on the dependent variable is appropriate when necessary conditions are being evaluated. Works that have been criticized for selecting on the dependent variable make appropriate claims about necessary conditions. Evaluation of the inferential problems suggests guidelines for case selection and analysis. In particular, a simple Bayesian model can question whether comparative case studies necessarily suffer from a "small n" problem. Necessary conditions are an important, if undervalued, tool of political analysis.
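The 'simple Bayesian model' mentioned above can be illustrated with a generic calculation of our own (not necessarily the author's formulation): let p be the prior probability that condition X is necessary for outcome Y, and q the probability that a case with outcome Y would display X even if X were not necessary. Observing n cases of Y that all display X then gives:

```latex
P(\text{X is necessary} \mid n \text{ confirming cases})
  = \frac{p}{p + (1 - p)\, q^{\,n}} .
```

Because q^n shrinks quickly when q is well below one, even a handful of cases selected on the dependent variable can shift belief substantially, which is the sense in which the 'small n' objection loses force for necessary-condition claims.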
  • Vogel, I. London: Department for International Development.
  • Bond. Impact Evaluation: A Guide for Commissioners and Managers. London: Bond.