Evaluation

Published by SAGE Publications
Print ISSN: 1356-3890
Publications
The use of evaluation results is at the core of evaluation theory and practice. Major debates in the field have emphasized the importance of both the evaluator's role and the evaluation process itself in fostering evaluation use. A recent systematic review of interventions aimed at influencing policy-making or organizational behavior through knowledge exchange offers a new perspective on evaluation use. We propose here a framework for better understanding the embedded relations between evaluation context, choice of an evaluation model and use of results. The article argues that the evaluation context presents conditions that affect both the appropriateness of the evaluation model implemented and the use of results.
 
Implementation evaluations, also called process evaluations, involve studying the development of programmes, and identifying and understanding their strengths and weaknesses. Undertaking an implementation evaluation offers insights into evaluation objectives, but does not help the researcher develop a research strategy. During the implementation analysis of the UNAIDS drug access initiative in Chile, the strategic analysis model developed by Crozier and Friedberg was used. However, a major incompatibility was noted between the procedure put forward by Crozier and Friedberg and the specific characteristics of the programme being evaluated. In this article, an adapted strategic analysis model for programme evaluation is proposed.
 
Models that shift more responsibility onto researchers for the process of incorporating research results into decision-making have greatly gained in popularity during the past two decades. This shift has created a new area of research to identify the best ways to transfer academic results into the organizational and political arenas. However, evaluating the utilization of information coming out of a knowledge transfer (KT) initiative remains an enormous challenge. This article demonstrates how logic analysis has proven to be a useful evaluation method to assess the utilization potential of KT initiatives. We present the case of the evaluation of the Research Collective on the Organization of Primary Care Services, an innovative experiment in knowledge synthesis and transfer. The conclusions focus not only on the utilization potential of results coming out of the Research Collective, but also on the theoretical framework used, in order to facilitate its application to the evaluation of other knowledge transfer initiatives.
 
Table: Rankings of health system responsiveness across all domains
International comparison of performance has become an influential lever for change in the provision of public services. For health care, patients’ views and opinions are increasingly being recognized as legitimate means for assessing the provision of services, to stimulate quality improvements and, more recently, to evaluate system performance. This has shifted the focus of analyses towards the use of individual-level surveys of performance from the perspective of the user, and raises the issue of how to compare self-reported data appropriately across institutional settings and population groups. This represents a major challenge for all public services, the fundamental problem being that comparative evaluation needs to take account of variations in social and cultural expectations and norms when relying on self-reported information. Using data on health system responsiveness across 18 OECD countries contained within the World Health Survey, this paper outlines the issues that arise in comparative inference that relies on respondent self-reports. The problem of reporting bias is described and illustrated, together with potential solutions brought about through the use of anchoring vignettes. The utility of vignettes in aiding cross-country analyses and their implications for comparative inference about health system performance are discussed.
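As a concrete illustration of how anchoring vignettes can correct for reporting bias, the sketch below applies the common non-parametric recoding (after King et al., 2004), in which a respondent's self-rating is re-expressed relative to that same respondent's ratings of shared hypothetical vignettes. This is a generic illustration, not necessarily the paper's own procedure; the function name and example values are hypothetical, and the sketch handles ties and ordering violations only crudely.

    def vignette_adjusted_rating(self_rating, vignette_ratings):
        """Recode a self-rating relative to the same respondent's ordered
        vignette ratings (non-parametric anchoring, after King et al. 2004).

        self_rating      -- ordinal self-report, e.g. 1 (worst) .. 5 (best)
        vignette_ratings -- the respondent's ratings of J shared vignettes,
                            expected to run from the 'worst' to the 'best' vignette

        Returns a value on a 1 .. 2J+1 scale: odd values fall between vignettes,
        even values tie with a vignette, so ratings become comparable across
        respondents with different response styles.
        """
        c = 1
        for z in sorted(vignette_ratings):   # crude handling of order violations
            if self_rating < z:
                return c                     # below this vignette
            if self_rating == z:
                return c + 1                 # tied with this vignette
            c += 2                           # above it; move past the tie slot
        return c                             # above all vignettes

    # Hypothetical respondent: rates own mobility 3, and the two vignettes 2 and 4.
    print(vignette_adjusted_rating(3, [2, 4]))   # -> 3 (between the two vignettes)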
 
This paper will report on the evaluation experience in two SOCRATES projects (SOCRATES being a European Union funding mechanism designed to support innovation in teaching and learning) focused on change in Higher Education. The projects were international in scope, involving six countries and ten institutions over the last four years. The paper reflects specifically on change in institutions, especially change introduced by the use of ICT (Information and Communication Technologies), and suggests the hypothesis that in such a phase of transition, where new rules are not yet established, a state of anomie can occur at the level of courses, departments and institutions. What happens in educational institutions in which rules and practices are well established and validated when a new event radically changes or challenges the traditional practices? Instead of the psycho-social notion of 'resistance to change', we think that the theory of Durkheim and his followers, which analyses human responses in times of social change, may be of use in interpreting situations in which change, or the will to change, creates conflicting systems of rules and practices. The paper will argue for a crucial role for evaluation in negotiating such periods of change.
 
Supporting value judgements about policies and programmes is a central task in evaluation. There is, however, little consensus on how evaluators are to accomplish this task. The traditional cost-benefit approaches were found wanting, and yet valuation as promoted by checklists or qualitative stakeholder interviews is not anchored in an economic theory and thus inspires little confidence. While no single methodology is likely to be accepted by all, recent developments in economic theory support a new interpretation. This proposed approach is a variant of social cost-benefit analysis (SCBA); it retains the representation of stakeholder values while avoiding the more dogmatic, and even mechanical, underpinnings of traditional economic analysis. In this article we trace the development of this new ‘options-based’ approach and chart the path for further research. It warrants, we believe, a voice in the dialogue on economic evaluation.
 
Presents the 1st in a series of 4 reports on specific approaches to need assessment available to community mental health centers. It describes the assessment of needs and utilization of services within the framework of a multistage epidemiologic model for comprehensive evaluation research in a single county.
 
Describes goal attainment scaling (GAS) in the evaluation of individual treatment programs. GAS consists of (a) a set of dimensions selected for each patient, (b) a schedule for assigning weights to dimensions, (c) a list of expected outcomes for each dimension, (d) a follow-up assessment of these outcomes, and (e) a total score summarizing outcomes across all dimensions. Intake scores are compared with follow-up scores to estimate change during treatment. GAS data for 170 patients indicate test-retest reliabilities of .70 for outcome scores and .88 for content, and interscorer reliability of .70.
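To show how such a weighted total score is conventionally computed, the following sketch implements the widely cited Kiresuk-Sherman summary (T-like) score for goal attainment scaling. The abstract does not give the formula, so the constants here (the -2..+2 attainment scale, the 0.30 inter-scale correlation, the 50/10 scaling) are the customary defaults rather than the paper's exact values, and the function and variable names are illustrative.

    from math import sqrt

    def gas_summary_score(attainment, weights, rho=0.30):
        """Goal attainment scaling summary score in the Kiresuk-Sherman form.

        attainment -- per-dimension outcome scores, conventionally -2 .. +2,
                      where 0 is the expected level of outcome
        weights    -- importance weight assigned to each dimension
        rho        -- assumed correlation between goal scales (0.30 by convention)

        Yields a score centred on 50 (expected outcomes attained) with a
        standard deviation of roughly 10.
        """
        numerator = 10 * sum(w * x for w, x in zip(weights, attainment))
        denominator = sqrt((1 - rho) * sum(w * w for w in weights)
                           + rho * sum(weights) ** 2)
        return 50 + numerator / denominator

    # Hypothetical patient with three equally weighted goals: two met at the
    # expected level, one somewhat better than expected.
    print(round(gas_summary_score([0, 0, 1], [1, 1, 1]), 1))   # ~54.6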
 
Examines factors related to innovations in mental health-care practices. Results of a questionnaire survey covering 162 institutions providing psychiatric services support the following hypotheses: (a) practitioners depend primarily on personal contacts for new information; (b) innovators rely on professional conferences for information and stimulation; (c) lack of funds often limits their attendance at conferences; (d) experience is regarded as more important than formal research in contributing to innovation; (e) the degree of innovation relates to the amount of encouragement by administrators; (f) innovations are more likely to succeed when supported by persons who are to implement them; and (g) there is little interaction between innovators and researchers in institutions. A case history is presented to illustrate utilization of information at each stage in designing and implementing new programs.
 
This article describes private-sector evaluation in the European Bank for Reconstruction and Development (EBRD) from the beginning of the institution in 1991 until the end of 2010. It sets out the approach to ex-post project evaluation adopted by the EBRD’s Board during that period, against the background of a region that experienced severe ups and downs as a result of different financial crises. It describes how the EBRD’s theory-based evaluation system, anchored in a comprehensive evaluation policy, evolved over time, being reviewed and amended approximately every five years. At the heart of this evolving system was a progressively more independent evaluation function that followed the good practice standards of the Evaluation Cooperation Group (ECG), the working group for cooperation among multilateral development bank (MDB) evaluators. The article portrays the dual objective of evaluation: generating lessons and serving as a Board accountability tool. It poses the question, ‘who evaluates the evaluators?’ and shows the importance of applying an established ECG peer-review system to the evaluation functions of MDBs.
 
The Debates, Notes and Queries section provides an opportunity for evaluation-related issues to be debated as well as for more general interchange. Debates can take the form of sustained arguments by the advocates of different approaches or of briefer thoughts or notes. Contributors may also wish to comment and raise questions about material that has previously appeared in the journal or simply use the section to draw readers' attention to relevant issues, ongoing research, evaluation activities and other events.
 
This article identifies pathways through which impacts from the 2014 Commonwealth Games might arise. It also assesses the likelihood of positive impacts and considers how best to evaluate the games. The pathways identified are: economic growth; increased sports participation; increased pride and sense of identity; volunteering; improved environment; and legacy programmes. There is little or no evidence from previous major multi-sports events to suggest that any of these pathways are likely to generate meaningful positive outcomes although there is an absence of evidence for some. The available evidence could be improved if the 2014 Games were to be evaluated using: retrospective cohort analysis for discrete interventions; theory-based comparative cohort analyses, which includes an assessment of opportunity costs for effects that are intrinsic to hosting the event; and a realist evaluation of ‘catalytic impacts’.
 
Table: Main stakeholders, their definitions of problems, evaluation criteria and perceived forms of integration
Investment subsidies are the most popular means of public support for enterprises. However, evaluation studies measuring their net effect suggest that their effectiveness is highly debatable. The article investigates the social mechanism of investment subsidies with a flexible, abductive methodological approach. Both methodological and data-source triangulation were applied: qualitative and quantitative methods were deployed, and the viewpoints of manifold groups (policy-makers, beneficiaries, journalists) were reconstructed. The article goes beyond previous findings indicating the small net effects of intervention by investigating the social mechanism accounting for the size of the effects. It also indicates that the permanency of the programme may be explained by analysing the programme theories of the stakeholders involved in implementation.
 
On a regular basis, members of the journal's editorial team will visit centres of evaluation practice as well as centres that are important in shaping contemporary evaluation. This section carries forward the journal's intention as stated in its aims ‘to advance theoretical and methodological understandings of evaluation in the context of evaluation policy and practice’. We have published a number of articles in the journal over recent years on evaluation and review processes in higher education. Many of these articles have contained specific details of national HE evaluation arrangements. Most have been based in Europe, although in this issue we include one article describing arrangements in Hong Kong. We thought it would be interesting for readers to extend the comparison of these models with the NZ approach to these matters and were therefore pleased to receive the following description of the NZ system, from the director of the NZ Universities Academic Audit Unit.
 
This article describes an attempt to synthesize results from four major multi-partner evaluations by applying a realist approach. The article argues that synthesizing the results of these four major evaluations proved possible by applying (part of) a realist approach, and it also reveals some of the limitations of the realist model as experienced in a very complex synthesis exercise. The researchers identify the concepts and approaches of the realist model that provided a supportive framework for synthesizing a very broad and diverse range of evidence, and which helped convert the findings into accessible outputs to support policy decisions. Lessons learned are extracted in the hope of supporting others facing similar tasks.
 
Evaluation of public policies in France has been late arriving but its subsequent development has been markedly sustained. Although the methods used are often either rudimentary or excessively rigid, evaluation is now an integral part of public action. Having first presented the approaches behind this trend, this article then follows the institutionalization of evaluation through its adoption by state and then local bureaucracies, a process heavily encouraged by the European Commission. Two distinctive features, both involving social scientists, mark the French evaluation scene and influence the article's final section. First, although some continue to actually carry out evaluations, today a growing number of academics are involved as key members of committees advising actors responsible for the evaluation of public policies. Second, these committees serve as interfaces between evaluators and actors. Together these developments could be seen as prefiguring a new relationship between knowledge and power.
 
This article analyses the current Portuguese Schools’ Evaluation Programme, implemented since 2006 in all state schools, as a social construction. The article focuses on a particular topic of the external evaluation, the participation of social actors in school life, as well as on school principals’ perceptions of the process. Our research is based on a content analysis of schools’ evaluation reports conducted in three different regions (Lisbon and Tagus Valley, Alentejo and Algarve) and a series of semi-directive interviews with the principals and chairpersons of the General Councils of 20 schools. While it is important to consider the evaluation programme in the light of international political tendencies (e.g. New Public Management), it is also relevant to understand the impact of such policy on schools. We highlight the contributions of this approach to a wider reflection on evaluation processes.
 
This article examines the nature of policy evaluation with particular reference to the twin concepts of deadweight and additionality. Two different perspectives on evaluation are presented, namely: (a) a 'control' model based on assessing the value for money of a policy intervention where emphasis is often placed on the measurement of deadweight and additionality and (b) a 'helping' model where the emphasis is on providing feedback on the policy or program in question, thereby leading to a mutual learning process. A critical analysis is presented of the way in which the concepts of deadweight and additionality are treated in evaluations, with evidence drawn from the evaluation of various industrial support schemes in Northern Ireland.
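To make the measurement issue concrete, the sketch below shows a common way of operationalizing additionality in appraisal practice: deadweight is the share of the gross outcome that would have occurred without the intervention, and displacement is activity merely diverted from elsewhere. This is a generic illustration rather than the article's own method, and the figures and function names are hypothetical, not drawn from the Northern Ireland schemes it discusses.

    def net_additional_outcome(gross_outcome, deadweight_share, displacement_share=0.0):
        """Illustrative additionality arithmetic.

        gross_outcome      -- total observed outcome attributed to the scheme
        deadweight_share   -- proportion that would have happened anyway (0..1)
        displacement_share -- proportion merely displaced from elsewhere (0..1)
        """
        return gross_outcome * (1 - deadweight_share) * (1 - displacement_share)

    # Hypothetical industrial support scheme: 200 gross jobs, 40% deadweight,
    # 10% displacement -> 108 net additional jobs.
    print(net_additional_outcome(200, 0.40, 0.10))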
 
Speeches and Addresses is an occasional feature, not in article format, carrying presentations at conferences and other public gatherings that are seen as likely to be of interest to a wider audience. Sometimes these speeches and addresses will undoubtedly be contentious, in which case responses and counter-arguments are to be expected and are welcome. Contributions to this section are intended to make relevant material accessible to academic, policy-making and practitioner audiences.
 
This article argues that Qualitative Comparative Analysis (QCA) can be a useful method in case-based evaluations for two reasons: (a) it is aimed at causal inference and explanation, leading to theory development; and (b) it is strong on external validity and generalization, allowing for theory testing and refinement. After a brief introduction to QCA, the specific type of causality handled by QCA is discussed. QCA is shown to offer improvements over Mill’s methods by handling asymmetric and multiple-conjunctural causality in addition to counterfactual reasoning. It thereby allows the explicitly separate analysis of necessity and sufficiency, recognizing the relevance of causal packages as well as single causes, and of multiple causal paths leading to the same outcome (equifinality). It is argued that QCA can generalize findings to a small, medium and large number of cases.
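As an illustration of the set-theoretic logic described above, the following minimal sketch computes Ragin-style consistency measures for sufficiency and necessity, treating a causal 'package' as the intersection (minimum) of its conditions. The data and names are invented for illustration; full QCA additionally involves truth-table construction and Boolean minimization, which this sketch does not attempt.

    def sufficiency_consistency(x, y):
        """How consistently cases with condition X also show outcome Y
        (crisp 0/1 or fuzzy set memberships in [0, 1])."""
        return sum(min(xi, yi) for xi, yi in zip(x, y)) / sum(x)

    def necessity_consistency(x, y):
        """How consistently cases with outcome Y also show condition X."""
        return sum(min(xi, yi) for xi, yi in zip(x, y)) / sum(y)

    # Invented crisp-set data for five cases and the causal package A AND B.
    A = [1, 1, 0, 1, 0]
    B = [1, 0, 1, 1, 0]
    Y = [1, 0, 0, 1, 1]
    AB = [min(a, b) for a, b in zip(A, B)]      # conjunction of conditions

    print(sufficiency_consistency(AB, Y))        # 1.0 -> A*B looks sufficient here
    print(necessity_consistency(AB, Y))          # ~0.67 -> not necessary: other paths to Y exist (equifinality)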
 
Figures and tables from the article: Six common drivers of RI performance improvement, linked with country-specific drivers; Analytical framework for ARISE case studies; Overview of ARISE in-depth study analyses; Common drivers related to the ARISE analytical framework; Final synthesized findings on drivers of improved RI performance (ARISE), showing clustering and ordering.
This article considers the challenges of generalizability related to case studies, and specifically for the in-depth case studies of the Africa Routine Immunization System Essentials (ARISE) project. The article describes how these challenges were addressed by developing a Theory of Change to frame case selection strategies, data collection, and analysis, including synthesis of findings across multiple cases. The authors then consider the importance of grounding generalizability in theory, balancing within- and cross-case analyses for synthesis, and using theory-based case selection as ways to support generalizability of the case study findings. Multiple case studies should sequence analysis as: 1) within-case analysis; 2) identification of replicated findings and implementation variation across cases; and 3) synthesis across cases, pooling the data. Case selection should be a stand-alone, formative part of case study research. The lessons from the ARISE case studies suggest that these are important ways in which case study methodology can be strengthened.
 
The past two decades have seen a drastic increase in the role and expectations of civil society organizations (CSOs) in international development, placing greater demands on the evaluation of CSOs and associated programs. While there is much potential in CSOs, it is important to recognize the distinctly political and economic realities in which they operate. CSOs face formidable challenges, the foremost being dependency on donor funding and the resultant threat to CSO autonomy and performance. Does donor-required monitoring and evaluation (M&E) enhance or compromise CSO performance? Drawing upon research from the United Nations Office of the Special Coordinator for Africa and the Least Developed Countries (UN/OSCAL) and the United Nations Office of the Special Adviser on Africa (UN-OSAA), this article presents an overview of African CSOs, exploring key issues that can inform the international M&E community.
 
The article explores the dilemma facing evaluators of development aid (as indeed of most other topics), namely whether to give priority to accountability or to lesson-learning as the main objective. It reviews the strengths and weaknesses of donor evaluation practices, and finds that a strength of aid evaluation has been its role in improving agency performance in terms of the achievement of development targets (i.e. as distinct from the efficiency of aid delivery), exemplified by the current trend towards more impact evaluation. Another strength has been the forging of a closer relationship between evaluation and project-cycle management, through the use of the Logical Framework sequence. The main weaknesses are inadequate feedback and a failure to involve the stakeholders in evaluations. The article recommends that there should be more flexible use of the Logical Framework, involving the beneficiaries at all stages, and that specific measures should be taken to ensure feedback at the policy level.
 
The evaluation of foreign aid is thoroughly integrated into the work of aid agencies. It is argued that evaluations contribute to organizational learning and are used to support change in policies and operations. How the use of evaluation is understood, however, depends on the organization perspective applied. Various organization perspectives are relevant. This article discusses the different approaches and looks at organizational practices and a number of case studies to address the question of how evaluation is used. The findings suggest that the use of evaluation for learning in agencies may be less important than other inputs, and evaluation results only partly support policy and operational changes. One single perspective on an organization cannot explain its evaluation processes and use. Different elements of the evaluation appear to be dominated by specific organization perspectives.
 
Although the European Commission has had a rigorous Impact Assessment process in place as part of policy proposal development since 2000, to date very little practice of ex post evaluation of policy and legislation exists. In 2007, the European Commission firmly committed itself to developing ex post evaluation of these non-spending activities to match the importance of its Impact Assessment process. However, early experience of evaluating legislation, and EU Directives in particular, has highlighted problems in applying existing evaluation practices. Finding little by way of specific guidance that could be used for evaluating EU legislation, DG Internal Market and Services has developed its own methodology. This contribution aims to provide some insight for evaluation practitioners into this EU DG’s proposed approach to evaluating EU legislation, which readers may consider with a view to transferring it to similar political and organizational environments.
 
Democratic and participatory evaluation raises questions of power. Power lies not only in agenda setting and problem definition but also in formulating alternatives. The latter seems to have been forgotten in the literature on democratic evaluation. This may be partly due to the general neglect of assessing alternatives in evaluations, whether ex ante or ex post. The article distinguishes different forms of democracy and specifies a model of evaluation that is intended to fit within representative democracy. The model is exemplified by an evaluation of a large infrastructure programme in Stockholm, in which environmental organizations took part.
 
Drawing on their experience of managing and evaluating a major international education project in Belize, Central America (as well as other international projects), the authors seek to draw attention to potential conflicts that arise when Western assumptions are applied in evaluating major educational reform programs in small states. In particular, they draw attention to the political dimensions that underlie such projects; to the issues of State vs Church; to cultural differences; and to the issue of absorptive capacity. In doing so, they question whether characteristics which have been attributed uniquely to small states are, in fact, not more generally to be found in all societies where resources are scarce. They conclude with a checklist of approaches that seek to address these issues.
 
This article reports on a mid-term evaluation of the impact of a recent anticorruption program developed by the World Bank's Economic Development Institute (EDI). Central to EDI's approach is helping to develop and/or reinvigorate a country's National Integrity System (NIS). Integrity pillars include, amongst others, administrative reforms, watchdog agencies, Parliament, civil society, public awareness, the judiciary, the media and political will. The evaluation focuses on two African countries. It describes the goals and instruments of the EDI approach and puts these in an institutional context. The underlying 'program logic' is reconstructed. This reconstructed 'logic' is confronted with findings from a literature review, document analysis and on-site interviews in Uganda and Tanzania. This (realist evaluation) approach highlights the importance for evaluators of unravelling the (behavioural and social) mechanisms that underlie programs. Conclusions are drawn about the program and its delivery, including participants' assessment of workshops and the likely wider impact in the societies concerned.
 
Although evaluators have devised a range of ethical and methodological approaches to evaluation, most have assumed a Weberian construct regarding organizations. Evaluators have tended to avoid judgemental data on the use of power and authority in organizations and have taken refuge in various relativistic discourses. Attempts to deal with these problems, such as 'portrayal of persons' and case study, are considered.
 
Using ‘evidence’ to falsify rather than verify patterns in data, and searching for alternative explanations, enables a better understanding of the circumstances that explain why and how a social programme works or does not work. An analysis of the extent to which a programme is meeting its aims and objectives, in order to find out whether it provides a solution to the policy problem, is more rigorous. The roles researchers adopt influence the quality of an evaluation: facilitating a better understanding of the theories embodied in programmes enhances an evaluation, while being a ‘broker of compromise’ can limit access to information. Researchers have a valuable role in promoting learning. A robust evaluation framework that integrates strategies for generalizing at the outset and identifies mechanisms of change, or causal mechanisms, is a way forward. Examples are taken from recent evaluations conducted by the author and colleagues to illustrate the arguments.
 
Figure: Separate interpretive arguments for a reaching-down-reaching-up evaluation with two distinct uses
To maximize the impact of research on programs, this article proposes a ‘reaching-down-reaching-up’ perspective in evaluation design, whereby it serves two functions simultaneously: the program improvement function, reaching down, and the knowledge development function, reaching up. This proposal frames applied research as a particular species of evaluation. As validity is a fundamental assessment of how well a design supports particular inferences related to specific uses, the article subsequently summarizes and integrates disparate validity perspectives to develop an argument-based approach to validity in a ‘reaching-down-reaching-up’ evaluation model case study. Lastly, the location(s) of validation is considered in order to highlight evaluator responsibilities.
 
But what experience and history teach us is this, that nations and governments have never learned anything from history. (G. W. F. Hegel, 1837, cited in Feyerabend 1978)

Contemporary literature on policy evaluation challenges the 'traditional', rational-objectivist model of policy evaluation. Instead, an argumentative-subjectivist approach is forwarded, conceiving of policy-making as an ongoing dialogue, in which both governmental and societal actors contest their views on policy issues by exchanging arguments. It is argued that, through constructive argumentation, policy actors, networks or advocacy coalitions may arrive at moral judgements on policy issues and, hopefully, at 'better' policies and ways of delivering those policies. The paradigm shift from the rational-objectivist model to the argumentative-subjectivist approach has implications for the way policy evaluation is studied as a means of institutionalized policy-oriented learning: the searching process of improving and perfecting public policy and its underlying normative assumptions through the detection and correction of perceived imperfections. Policy-oriented learning can be studied from a cybernetic control, a cognitive development and a social-constructivist perspective. Within policy-oriented argumentation and negotiation, the discursive processes that constitute the roots of policy-oriented learning, there may (still) be a need for methodologically sound assessments of the cost-effectiveness or 'quality' of policy measures. Accepting this premise, we advance an integrated learning strategy, in which the 'traditional', rational-objectivist role of evaluating institutions may well serve to complement more argumentative-oriented perspectives.
 
Over the past two decades, all major industrial western societies have been plagued with increasing unemployment. The member states of the European Union have been particularly hit by this phenomenon. Corresponding re-employment strategies, notably in the White Book of the European Commission in 1993, were formulated. But these proposals are still contested, partly because of a lack of knowledge about the impact of existing policies. Thus, there is reason enough to draw closer attention to the evaluation of labour market policy. What do we know about the effectiveness and efficiency of labour market policies? Which approaches to measuring impact and cost-effectiveness exist? How do policymakers, agencies and their officials administering the policies and programmes use the results of evaluation researchers? The aim of this paper is to consider the progress made in evaluation research on labour market policy. The emphasis is more on methodological questions of evaluation and less on substantive questions of labour market policy. Section 1 outlines the rationale for a target-oriented approach to evaluating labour market policy; section 2 addresses general methodological issues of assessing policy impacts; section 3 discusses why experiments are so little used in labour market policy evaluation, particularly in Europe; section 4 turns to practical conclusions related to monitoring as a necessary complement to sophisticated evaluation research; and section 5 summarizes the arguments.
 
The article reviews the state of M&E in South Asia and makes a case for building capacity in the field, without which the discipline will remain underdeveloped and accountability for government spending will not improve. We suggest that almost none of the South Asian countries have yet managed to develop an M&E ‘system’, though all have mechanisms in place. The article discusses the MIS and evaluation systems and tools in use by governments in South Asia, budgets for evaluation, and the role of civil society organizations. It highlights two cases in South Asia (India and Sri Lanka) where the concept of Performance Management has gathered momentum, even though outcome budgeting has barely taken off in any country of the sub-region. It examines plans for improving the implementation of evaluation, but finds many areas suffering from capacity weaknesses. The article closes by recommending ways to address these capacity weaknesses.
 
Table: Mean Strength of Logical Links of Interventions in the Hoenderloogroep and the Kolkemate with Theoretical Elements of the Threshold Set X1, …, X4
Impact studies of prevention programmes, in particular meta-analyses, usually interpret outcome and impact statistics as tests of an underlying theory of prevention. However, these programmes usually combine various interventions linked to different theoretical perspectives. Consequently, the effects of the programme can easily be misinterpreted. This article introduces an interpretation method that acknowledges the eclecticism of prevention practice and is also a feasible instrument to enhance the quality of meta-studies. First, the interventions of the programme are identified and analysed separately. Second, an assessment is made of the arguments that link interventions and the espoused theories. Third, each intervention is represented by a set of scores indicating the types of links and core theoretical assumptions. These scores are aggregated to programme scores and included as independent variables in meta-analyses. The method is illustrated by an evaluation of practices in two Dutch crime prevention institutions. The evaluation demonstrated the theoretical value of (eclectic) practices, even when they differ substantially from the official programme theories. The approach also highlighted an 'interpretation error' in assessing the impact of one of the programmes and suggests a correction.
 
Figures and tables from the article: Basic Factors in the Context-Degrees of Openness; Incorporating Incidental Outcomes in an Abstract CMO Configuration.
This article builds on an earlier experiment in applying Realist Evaluation (RE) techniques to a set of Best Value Reviews (BVRs) undertaken in a single English local authority. That experiment used a range of assumptions regarding context (C), mechanisms (M) and outcomes (O) that restricted the possible pathways in the resulting CMO causal loop. They were ‘heroic’ in nature and left largely implicit. The article subjects those assumptions to rigorous criticism. Five hypotheses are tested concerning the nature of the context within which the BVRs occurred; the potentially skewed nature of the review mechanisms chosen; the impact of process outcomes and goals-setting problems on BVR outcomes; the scope for strong linkages to be formed between context, mechanism and outcomes, such that deterministic effects ensue; and the need to define the boundaries of the evaluand (the BVR). Realist evaluations have typically focused on individual services or programmes. It is contended that RE methods need to be adapted to address cumulative impacts on policy and organizational culture that are inherently political in nature.
 
This article deals with the question of whether health technology assessment (HTA) should be regarded as a kind of evaluation. Following Michael Scriven, we define evaluation as the determination of value – value covering the merit or worth for all those affected – by use of valid methods. Mainstream HTA entails scientific research into the effects and associated costs of health technologies. It shows a tendency towards judging rather than improving technology; employs a positivist rather than a constructivist scientific paradigm; and features a strong emphasis on internal validity. If HTA is regarded as a kind of evaluation, it has limited scope. Although we agree that information on costs and effects is important for policy making in the field of healthcare, our view is that HTA as it is commonly practised is a goal-based tool rather than a type of evaluation. To ameliorate this problem, commissioners of HTA should take more risks in financing research that is still experimental, but has the potential of revitalizing HTA as a science of valuing. In this respect, social constructivism may have something to offer.
 
This article examines the relationships of the National Audit Office (NAO) — the state audit institution (SAI) of the United Kingdom — with a range of third parties that shape the performance audit work the NAO undertakes. In particular, it considers from a practitioner perspective how the NAO has sought to balance its independence with the desire to be responsive to the expectations of others. It concludes that, while independence remains crucial to the credibility of the value for money auditor, examining the connections made by the NAO in its value for money work also helps to explain the hybrid discipline that performance auditing has become in recent years.
 
This article draws on research into evaluation and on the tacit practices used in an evaluation agency to develop an approach to initiating new evaluators into evaluation planning processes. Using these two sources as a base, this article suggests that it is possible to conceptualize evaluation as a series of knowledge-based practices. These knowledge-based practices form the resources of ‘communities of practice’, i.e. groups of practising evaluators. Because this conceptualization applies to any job, work or occupation, beginning to be an evaluator, just like beginning any job or work, requires the ‘novice’ to be inducted or socialized into the ‘community of practice’. Understanding evaluation activity in this way should provide the basis for some enabling ‘tools’ for thinking about an evaluation design. Learning as an outcome of ‘process use’ is, in fact, the way we might prompt access to a reservoir of experiential and other knowledge so that evaluations can be carried out by new evaluators within the normative frame of a group of evaluators. In essence, it involves a process of reflexive questioning during which key procedural dimensions of an evaluation are addressed, leading to an accelerated induction into key aspects of evaluation design. It enables initial planning to occur and an evaluation to ‘get off the ground’. RUFDATA is the acronym given to the questions which consolidate this reflexive process; to that extent, the approach is a ‘meta-evaluative’ tool. The article outlines RUFDATA as an example of such an approach and demonstrates its use through a ‘mini case study’.
 
Tables: Criteria for Evaluating Public Participation Exercises; The Benefits of Participation and how they Might be Measured.
Among parliamentary democracies there is a widespread belief that above and beyond the occasional opportunity to vote, citizens should be allowed to participate in decisions that affect them. Governments at all levels are now going further and supporting more active forms of citizenship in which various decision processes are open to more public participation. While this principle may be widely accepted, the practice has remained remarkably free from empirical scrutiny. For something that is held to deliver a myriad of benefits, we still know little of the extent to which these are in fact delivered. This article addresses this gap by developing a framework for conducting more robust empirical scrutiny of participatory exercises. It does so at three levels: first by proposing a conceptual clarification of the perceived benefits of greater participation, second by considering some of the methodological challenges in designing more robust evaluative studies and finally by reviewing measures that might be used in practice to quantify benefits.
 
The lack of transparency in the process of qualitative data analysis has led to the suspicion that findings may not be robust enough to be used as evidence. Much qualitative data analysis software (QDAS) contains functions that facilitate the demonstration of reliability and validity, although the process is not straightforward. Given the scale and complexity of many evaluations, QDAS is a powerful tool that can be used to manage and analyse vast amounts of qualitative data. Its use in an evaluative context, however, can be fundamentally different from using it for other forms of qualitative analysis. This article explores the use of NUD*IST version 6 (N6), a widely used QDAS, to manage data from the qualitative aspects of a large-scale multi-component evaluation involving a large, multi-site team. It demonstrates that research and logistical imperatives can have concrete impact on the way technology is harnessed to produce qualitative findings, hence inevitably influencing resultant evidence.
 
This article addresses the evaluative dimension of development effectiveness. It argues that, with some refinements, the effectiveness criteria forged in the international-development field would be equally serviceable in other domains of evaluative practice. Specifically, the criteria and tools in widespread use within the development-evaluation arena reflect hard-won lessons of experience. They have demonstrated their worth in diverse operating environments. They have proved resilient to shifts in policy doctrines. They are equally applicable to project, programme and policy evaluations. Finally, having supplanted measures focused on inputs and outputs (rather than outcomes and impacts), they are well aligned with the evidence-based and results-oriented stance currently favoured by policy makers in rich and poor countries alike.
 
Participatory evaluation methodologies are considered to produce many positive and empowering impacts. However, given the complex power, knowledge and discursive issues involved and other factors, use of these methodologies can have contradictory effects. This article presents results from the implementation of a process that aimed to build the capacities of people in two Australian rural communities to evaluate their local communication and information technology (C&IT) initiatives. The ‘LEARNERS’ process used participatory action research and participatory evaluation methods, and took an inclusive ‘whole of community’ approach. The process aimed to enhance community development and to facilitate community empowerment, participation and leadership, particularly for women. Rigorous analysis of the impacts of the project found that it was effective in producing various degrees of social, technological, political and psychological empowerment. However, some corresponding disempowering impacts were also identified. The strengths and limitations of this evaluation capacity-building process and the lessons learned are considered.
 
Most developing countries now have monitoring and evaluation systems in place. However, most systems are concerned with the progress of implementation, rather than assessing the social, economic and environmental impacts of projects. Also, there seem to be no systems that assess the impact of policy interventions emerging from recent macro-level measures, such as liberalization, privatization, and the preservation of women's rights. In developing countries, donor agencies have played a role in planning, implementing and financing various socio-economic development programs and projects. In many cases, the outcomes of these interventions do not match the intended objectives. It has been argued that due to the lack of ongoing evaluation many governments fail to learn, in time, the way a project is unfolding and the manner in which it is generating benefits. There are also many who simply do not see the benefits of evaluation and consider it to be a donor-driven activity of no management use. Those donors who do see evaluation as an important tool to improve investment quality are now initiating evaluation capacity building activities. The success of these initiatives seems to have been constrained, among other things, by the lack of a unified approach; inadequate appreciation and analysis of governmental culture; confusion about concepts and methodologies; lack of long-term commitment; and lack of either interest or resources, or both, from the recipient governments. Future evaluation capacity building work will need to make a careful analysis of these constraints and approach the subject with far greater sensitivity and technical knowledge.
 
Mental Health Link - a facilitated programme - aimed to develop systems within primary care and links with specialists to improve care for patients with long-term mental illness. A process evaluation based on Pawson and Tilley’s Realistic Evaluation complemented a randomized controlled trial. This article describes the method developed for this ‘realistic evaluation’, the mechanisms behind the integration of linked specialist workers and discusses practical and theoretical issues arising from the use of the realistic evaluation framework as a way of explaining the results of trials and service development. Retrospective interviews identified the important outcomes and were used to construct ‘Context-Mechanism-Outcome’ configurations. The 12 case studies represented what had happened. A second-level analysis using analytic induction developed ‘middle range theories’ designed to be of value to those developing care elsewhere. The intervention was successful in stimulating productive joint working, through case discussions, but often failed to ensure a review of progress.
 
Programme evaluation has become a widely applied mode of systematic inquiry for making judgements about public policies. Although evaluation, as a form of systematic inquiry, has provided feedback information for policy makers, it still too often produces banal answers to complex and multi-dimensional societal problems. In this article, we take a close look at the ontological premises, conceptions of causality, and relationships to rational theories of action of different programme evaluation paradigms. There is a paradigm crisis in evaluation resulting from differences over assumptions about causality. Evaluation paradigms clearly provide research strategies, but more particularly they map causal links in contrasting ways. Traditional cause-and-effect logic disregards the fact that programme effects are always brought about by real actors rather than constructed ideal actors. A new interpretation of causes and effects is needed, which would strengthen the core ideas that lie behind the now widely applied and consolidated realistic evaluation tradition.
 
Figure: Three steps in using case studies to evaluate development impacts of value chain partnerships.
Partnerships between companies and non-governmental organizations that aim to incorporate smallholder farmers into value chains are increasingly being promoted as a way of pursuing development goals. This article investigates two case studies of such partnerships and the outcomes they achieved in order to refine the rationale underlying such interventions. In two case studies in Uganda and Rwanda, we documented the sequences of events within such partnership interventions, their context, and the intermediate outcomes, identified as the new rules and practices that generate institutional change. By portraying both the configuration of events within a partnership intervention and the contextual factors, these case studies reveal how the interventions produced outcomes that were situated in changing contexts, such as changes in market demand, government policy or business strategy. The research approach made it possible to disentangle partnership interventions and contextual processes, and to give participants a firmer idea of the potential and limitations of value chain partnerships to achieve developmental targets.
 
This article aims to inform evaluators about issues involved in using and integrating administrative databases from public agencies. With the growing focus on monitoring and oversight of public programs for health, mental health and substance abuse problems, existing data sets in public agencies became an important source of evaluation and planning information. The focus of this article is on the methods used to find, integrate and analyze multiple existing databases. Primary challenges that confront the evaluator in identifying and accessing data sources and in addressing the technical issues involved are discussed.
 
Gender equality was introduced into international-development evaluation two decades ago. Over the years, there have been different experiences in incorporating gender issues into the diverse phases of the evaluative process. This article reviews the practices of international-development agencies based on meta-evaluation studies and the most relevant material published by international organizations. The article also explores what it means to carry out a gender-sensitive evaluation, basing it on gender and feminist contributions and different methodological options. Finally, the article describes the key challenges of incorporating the gender perspective into the whole evaluative process.
 
Top-cited authors
Christopher Pollitt
Patricia Rogers (BetterEvaluation)
Ana Manzano (University of Leeds)
Nicoletta Stame (Sapienza University of Rome)
John Mayne