The Metric Tide
Report of the Independent Review of the Role of Metrics
in Research Assessment and Management
James Wilsdon, Liz Allen, Eleonora Belfiore, Philip Campbell, Stephen Curry,
Steven Hill, Richard Jones, Roger Kain, Simon Kerridge, Mike Thelwall, Jane Tinkler,
Ian Viney, Paul Wouters, Jude Hill, Ben Johnson.
Wilsdon, J., et al. (2015). The Metric Tide: Report of the Independent Review of the Role of
Metrics in Research Assessment and Management. DOI: 10.13140/RG.2.1.4929.1363
© HEFCE 2015, except where indicated.
Cover image © JL-Pfeifer / Shutterstock.com.
The parts of this work that are © HEFCE are available under the Open Government Licence 2.0.
Foreword ................................................................................................................................................ iii
Acknowledgments .................................................................................................................................. iv
Steering group and secretariat ................................................................................................................. v
Executive summary ............................................................................................................................... vii
1. Measuring up ................................................................................................................................... 1
2. The rising tide ............................................................................................................................... 12
3. Rough indications .......................................................................................................................... 30
4. Disciplinary dilemmas .................................................................................................................. 50
5. Judgement and peer review ........................................................................................................... 59
6. Management by metrics ................................................................................................................ 68
7. Cultures of counting ...................................................................................................................... 79
8. Sciences in transition ..................................................................................................................... 96
9. Reflections on REF ..................................................................................................................... 117
10. Responsible metrics .................................................................................................................... 134
Annex of tables ................................................................................................................................... 148
List of abbreviations and glossary ...................................................................................................... 161
The Metric Tide: Literature Review (Supplementary Report I to the Independent
Review of the Role of Metrics in Research Assessment and Management)
The Metric Tide: Correlation analysis of REF2014 scores and metrics
(Supplementary Report II to the Independent Review of the Role of Metrics in
Research Assessment and Management)
Metrics evoke a mixed reaction from the research community.
A commitment to using data and evidence to inform decisions makes
many of us sympathetic, even enthusiastic, about the prospect of granular,
real-time analysis of our own activities. If we as a sector can’t take full
advantage of the possibilities of big data, then who can?
Yet we only have to look around us, at the blunt use of metrics such as
journal impact factors, h-indices and grant income targets to be reminded
of the pitfalls. Some of the most precious qualities of academic culture
resist simple quantification, and individual indicators can struggle to do justice to the richness and
plurality of our research. Too often, poorly designed evaluation criteria are “dominating minds,
distorting behaviour and determining careers.”
At their worst, metrics can contribute to what Rowan
Williams, the former Archbishop of Canterbury, calls a “new barbarity” in our universities.
The tragic case of Stefan Grimm, whose suicide in September 2014 led Imperial College to launch a
review of its use of performance metrics, is a jolting reminder that what is at stake in these debates is
more than just the design of effective management systems.
Metrics hold real power: they are
constitutive of values, identities and livelihoods.
How to exercise that power to positive ends is the focus of this report. Based on fifteen months of
evidence-gathering, analysis and consultation, we propose here a framework for responsible metrics,
and make a series of targeted recommendations. Together these are designed to ensure that indicators
and underlying data infrastructure develop in ways that support the diverse qualities and impacts of
UK research. Looking to the future, we show how responsible metrics can be applied in research
management, by funders, and in the next cycle of the Research Excellence Framework.
The metric tide is certainly rising. Unlike King Canute, we have the agency and opportunity – and in
this report, a serious body of evidence – to influence how it washes through higher education and
research. Let me end on a note of personal thanks to my steering group colleagues, to the team at
HEFCE, and to all those across the community who have contributed to our deliberations.
James Wilsdon, Chair
Lawrence, P.A. (2007) ‘The mismeasurement of science’. Current Biology Vol.17, Issue 15, pR583–R585.
Annual Lecture to the Council for the Defence of British Universities, January 2015.
metrics/2019381.article. Retrieved 22 June 2015.
The steering group would like to extend its sincere thanks to the numerous organisations and
individuals who have informed the work of the review. Metrics can be a contentious topic, but the
expertise, insight, challenge and open engagement that so many across the higher education and
research community have brought to this process has made it both enjoyable and instructive.
Space unfortunately limits us from mentioning everyone by name. But particular thanks to David
Willetts for commissioning the review and provoking us at the outset to frame it more expansively,
and to his ministerial successors Greg Clark and Jo Johnson for the interest they have shown in its
progress and findings. Thanks also to Dr Carolyn Reeve at BIS for ensuring close government
engagement with the project.
The review would not have been possible without the outstanding support that we have received from
the research policy team at HEFCE at every stage of research, evidence-gathering and report drafting;
notably Jude Hill, Ben Johnson, Alex Herbert, Kate Turton, Tamsin Rott and Sophie Melton-Bradley.
Thanks also to David Sweeney at HEFCE for his advice and insights.
We are indebted to all those who responded to our call for evidence; attended, participated in and
spoke at our workshops and focus groups; and contributed to online discussions. Thanks also to those
organisations who hosted events linked to the review, including the Universities of Oxford, Sheffield,
Sussex, UCL and Warwick, the Higher Education Policy Institute and the Scottish Funding Council.
The review has hugely benefited from the quality and breadth of these contributions. Any errors or
omissions are entirely our own.
Steering group and secretariat
The review was chaired by James Wilsdon FAcSS, Professor of Science and Democracy at the
Science Policy Research Unit (SPRU), University of Sussex (orcid.org/0000-0002-5395-5949;
Professor Wilsdon was supported by an independent steering group with the following members:
Dr Liz Allen, Head of Evaluation, Wellcome Trust (orcid.org/0000-0002-9298-3168;
Dr Eleonora Belfiore, Associate Professor in Cultural Policy, Centre for Cultural Policy
Studies, University of Warwick (orcid.org/0000-0001-7825-4615; @elebelfiore);
Sir Philip Campbell, Editor-in-Chief, Nature (orcid.org/0000-0002-8917-1740;
Professor Stephen Curry, Department of Life Sciences, Imperial College London
Dr Steven Hill, Head of Research Policy, HEFCE (orcid.org/0000-0003-1799-1915;
Professor Richard Jones FRS, Pro-Vice-Chancellor for Research and Innovation,
University of Sheffield (orcid.org/0000-0001-5400-6369; @RichardALJones)
(representing the Royal Society);
Professor Roger Kain FBA, Dean and Chief Executive, School of Advanced Study,
University of London (orcid.org/0000-0003-1971-7338; @kain_SAS) (representing the
Dr Simon Kerridge, Director of Research Services, University of Kent, and Chair of the
Board of the Association of Research Managers and Administrators (orcid.org/0000-
Professor Mike Thelwall, Statistical Cybermetrics Research Group, University of
Wolverhampton (orcid.org/0000-0001-6065-205X; @mikethelwall);
Jane Tinkler, Social Science Adviser, Parliamentary Office of Science and Technology
Dr Ian Viney, MRC Director of Strategic Evaluation and Impact, Medical Research
Council head office, London (orcid.org/0000-0002-9943-4989, @MRCEval);
Paul Wouters, Professor of Scientometrics & Director, Centre for Science and
Technology Studies (CWTS), Leiden University (orcid.org/0000-0002-4324-5732,
The following members of HEFCE’s research policy team provided the secretariat for the steering
group and supported the review process throughout: Jude Hill, Ben Johnson, Alex Herbert, Kate
Turton, Tamsin Rott and Sophie Melton-Bradley. Hannah White and Mark Gittoes from HEFCE’s
Analytical Services Directorate also contributed, particularly to the REF2014 correlation exercise (see
Supplementary Report II). Vicky Jones from the REF team also provided advice.
This report presents the findings and recommendations of the Independent Review of the Role of
Metrics in Research Assessment and Management. The review was chaired by Professor James
Wilsdon, supported by an independent and multidisciplinary group of experts in scientometrics,
research funding, research policy, publishing, university management and administration.
Scope of the review
This review has gone beyond earlier studies to take a deeper look at potential uses and limitations of
research metrics and indicators. It has explored the use of metrics across different disciplines, and
assessed their potential contribution to the development of research excellence and impact. It has
analysed their role in processes of research assessment, including the next cycle of the Research
Excellence Framework (REF). It has considered the changing ways in which universities are using
quantitative indicators in their management systems, and the growing power of league tables and
rankings. And it has considered the negative or unintended effects of metrics on various aspects of research culture.
Our report starts by tracing the history of metrics in research management and assessment, in the UK
and internationally. It looks at the applicability of metrics within different research cultures, compares
the peer review system with metric-based alternatives, and considers what balance might be struck
between the two. It charts the development of research management systems within institutions, and
examines the effects of the growing use of quantitative indicators on different aspects of research
culture, including performance management, equality, diversity, interdisciplinarity, and the ‘gaming’
of assessment systems. The review looks at how different funders are using quantitative indicators,
and considers their potential role in research and innovation policy. Finally, it examines the role that
metrics played in REF2014, and outlines scenarios for their contribution to future exercises.
The review has drawn on a diverse evidence base to develop its findings and conclusions. These
include: a formal call for evidence; a comprehensive review of the literature (Supplementary Report
I); and extensive consultation with stakeholders at focus groups, workshops, and via traditional and social media.
The review has also drawn on HEFCE’s recent evaluations of REF2014, and commissioned its own
detailed analysis of the correlation between REF2014 scores and a basket of metrics (Supplementary Report II).
There are powerful currents whipping up the metric tide. These include growing pressures for
audit and evaluation of public spending on higher education and research; demands by policymakers
for more strategic intelligence on research quality and impact; the need for institutions to manage and
develop their strategies for research; competition within and between institutions for prestige,
students, staff and resources; and increases in the availability of real-time ‘big data’ on research
uptake, and the capacity of tools for analysing them.
Across the research community, the description, production and consumption of ‘metrics’
remains contested and open to misunderstandings. In a positive sense, wider use of quantitative
indicators, and the emergence of alternative metrics for societal impact, could support the transition to
a more open, accountable and outward-facing research system. But placing too much emphasis on
narrow, poorly-designed indicators – such as journal impact factors (JIFs) – can have negative
consequences, as reflected by the 2013 San Francisco Declaration on Research Assessment (DORA),
which now has over 570 organisational and 12,300 individual signatories.
Responses to this review
reflect these possibilities and pitfalls. The majority of those who submitted evidence, or engaged in
other ways, are sceptical about moves to increase the role of metrics in research management.
However, a significant minority are more supportive of the use of metrics, particularly if appropriate
care is exercised in their design and application, and the data infrastructure can be improved.
Peer review, despite its flaws and limitations, continues to command widespread support across
disciplines. Metrics should support, not supplant, expert judgement. Peer review is not perfect, but it
is the least worst form of academic governance we have, and should remain the primary basis for
assessing research papers, proposals and individuals, and for national assessment exercises like the
REF. However, carefully selected and applied quantitative indicators can be a useful complement to
other forms of evaluation and decision-making. One size is unlikely to fit all: a mature research
system needs a variable geometry of expert judgement, quantitative and qualitative indicators.
Research assessment needs to be undertaken with due regard for context and disciplinary diversity.
Academic quality is highly context-specific, and it is sensible to think in terms of research qualities,
rather than striving for a single definition or measure of quality.
Inappropriate indicators create perverse incentives. There is legitimate concern that some
quantitative indicators can be gamed, or can lead to unintended consequences; journal impact factors
and citation counts are two prominent examples. These consequences need to be identified,
acknowledged and addressed. Linked to this, there is a need for greater transparency in the
construction and use of indicators, particularly for university rankings and league tables. Those
involved in research assessment and management should behave responsibly, considering and
pre-empting negative consequences wherever possible, particularly in terms of equality and diversity.
These are presented in greater detail in Section 10.1 of the main report.
www.ascb.org/dora. As of July 2015, only three UK universities are DORA signatories: Manchester, Sussex and UCL.
Indicators can only meet their potential if they are underpinned by an open and interoperable
data infrastructure. How underlying data are collected and processed – and the extent to which they
remain open to interrogation – is crucial. Without the right identifiers, standards and semantics, we
risk developing metrics that are not contextually robust or properly understood. The systems used by
higher education institutions (HEIs), funders and publishers need to interoperate better, and
definitions of research-related concepts need to be harmonised. Information about research –
particularly about funding inputs – remains fragmented. Unique identifiers for individuals and
research works will gradually improve the robustness of metrics and reduce administrative burden.
At present, further use of quantitative indicators in research assessment and management
cannot be relied on to reduce costs or administrative burden. Unless existing processes, such
as peer review, are reduced as additional metrics are added, there will be an overall increase in
burden. However, as the underlying data infrastructure is improved and metrics become more
robust and trusted by the community, it is likely that the additional burden of collecting and
assessing metrics could be outweighed by the reduction of peer review effort in some areas –
and indeed by other uses for the data. Evidence of a robust relationship between newer metrics
and research quality remains very limited, and more experimentation is needed. Indicators
such as patent citations and clinical guideline citations may have potential in some fields for
quantifying impact and progression.
Our correlation analysis of the REF2014 results at output-by-author level (Supplementary
Report II) has shown that individual metrics give significantly different outcomes from the REF
peer review process, and therefore cannot provide a like-for-like replacement for REF peer
review. Publication year was a significant factor in the calculation of correlation with REF scores,
with all but two metrics showing significant decreases in correlation for more recent outputs. There is
large variation in the coverage of metrics across the REF submission, with particular issues with
coverage in units of assessment (UOAs) in REF Main Panel D (mainly arts & humanities). There is
also evidence to suggest statistically significant differences in the correlation with REF scores for
early-career researchers and women in a small number of UOAs.
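The kind of rank correlation underpinning this analysis can be sketched in a few lines. This is an illustrative toy only: the scores and citation counts below are hypothetical, and the actual methodology of Supplementary Report II (output-by-author matching against a basket of commercial bibliometric indicators) is considerably more involved.

```python
def ranks(values):
    """Assign 1-based average ranks, with tied values sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # group tied values together
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical example: peer-review scores (REF 0-4 scale) vs. citation counts
ref_scores = [4, 3, 3, 2, 1, 4, 2]
citations = [120, 45, 60, 10, 2, 80, 30]
print(round(spearman(ref_scores, citations), 3))
```

Even a high correlation on such aggregate data would not imply that a metric can substitute for peer review at the level of individual outputs, which is precisely the finding reported above.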
Within the REF, it is not currently feasible to assess the quality of UOAs using quantitative
indicators alone. In REF2014, while some indicators (citation counts, and supporting text to
highlight significance or quality in other ways) were supplied to some panels to help inform their
judgements, existing bibliographic databases do not cover all disciplines adequately, so such
indicators must be applied with caution. Even if technical problems of coverage and bias can be overcome, no set of numbers,
however broad, is likely to be able to capture the multifaceted and nuanced judgements on the quality
of research outputs that the REF process currently provides.
Similarly, for the impact component of the REF, it is not currently feasible to use quantitative
indicators in place of narrative impact case studies, or the impact template. There is a danger that
the concept of impact might narrow and become too specifically defined by the ready availability of
indicators for some types of impact and not for others. For an exercise like the REF, where HEIs are
competing for funds, defining impact through quantitative indicators is likely to constrain thinking
around which impact stories have greatest currency and should be submitted, potentially constraining
the diversity of the UK’s research base. For the environment component of the REF, there is scope
to enhance the use of quantitative data in the next assessment cycle, provided they are used with
sufficient context to enable their interpretation.
There is a need for more research on research. The study of research systems – sometimes
called the ‘science of science policy’ – is poorly funded in the UK. The evidence to address the
questions that we have been exploring throughout this review remains too limited; but the questions
being asked by funders and HEIs – ‘What should we fund?’ ‘How best should we fund?’ ‘Who should
we hire/promote/invest in?’ – are far from new and can only become more pressing. More investment
is needed as part of a coordinated UK effort to improve the evidence base in this area. Linked to this,
there is potential for the scientometrics community to play a more strategic role in informing how
quantitative indicators are used across the research system, and by policymakers.
In recent years, the concept of ‘responsible research and innovation’ (RRI) has gained currency as a
framework for research governance. Building on this, we propose the notion of responsible metrics
as a way of framing appropriate uses of quantitative indicators in the governance, management and
assessment of research. Responsible metrics can be understood in terms of the following dimensions:
Robustness: basing metrics on the best possible data in terms of accuracy and scope;
Humility: recognising that quantitative evaluation should support – but not supplant
– qualitative, expert assessment;
Transparency: keeping data collection and analytical processes open and
transparent, so that those being evaluated can test and verify the results;
Diversity: accounting for variation by field, and using a range of indicators to reflect
and support a plurality of research and researcher career paths across the system;
Reflexivity: recognising and anticipating the systemic and potential effects of
indicators, and updating them in response.
This review has identified 20 specific recommendations for further work and action by stakeholders
across the UK research system. These draw on the evidence we have gathered, and should be seen as part
of broader attempts to strengthen research governance, management and assessment which have been
gathering momentum, and where the UK is well positioned to play a leading role internationally. The
recommendations are listed below, with targeted recipients in brackets:
Supporting the effective leadership, governance and management of research
1 The research community should develop a more sophisticated and nuanced approach to the
contribution and limitations of quantitative indicators. Greater care with language and terminology is
needed. The term ‘metrics’ is often unhelpful; the preferred term ‘indicators’ reflects a recognition that
data may lack specific relevance, even if they are useful overall. (HEIs, funders, managers, researchers)
2 At an institutional level, HEI leaders should develop a clear statement of principles on their
approach to research management and assessment, including the role of quantitative indicators. On
the basis of these principles, they should carefully select quantitative indicators that are appropriate to
their institutional aims and context. Where institutions are making use of league tables and ranking
measures, they should explain why they are using these as a means to achieve particular ends. Where
possible, alternative indicators that support equality and diversity should be identified and included. Clear
communication of the rationale for selecting particular indicators, and how they will be used as a
management tool, is paramount. As part of this process, HEIs should consider signing up to DORA, or
drawing on its principles and tailoring them to their institutional contexts. (Heads of institutions, heads of
research, HEI governors)
3 Research managers and administrators should champion these principles and the use of
responsible metrics within their institutions. They should pay due attention to the equality and
diversity implications of research assessment choices; engage with external experts such as those at the
Equality Challenge Unit; help to facilitate a more open and transparent data infrastructure; advocate the
use of unique identifiers such as ORCID iDs; work with funders and publishers on data interoperability;
explore indicators for aspects of research that they wish to assess rather than using existing indicators
because they are readily available; advise senior leaders on metrics that are meaningful for their
institutional or departmental context; and exchange best practice through sector bodies such as ARMA.
(Managers, research administrators, ARMA)
4 HR managers and recruitment or promotion panels in HEIs should be explicit about the criteria
used for academic appointment and promotion decisions. These criteria should be founded in expert
judgement and may reflect both the academic quality of outputs and wider contributions to policy,
industry or society. Judgements may sometimes usefully be guided by metrics, if they are relevant to the
criteria in question and used responsibly; article-level citation metrics, for instance, might be useful
indicators of academic impact, as long as they are interpreted in the light of disciplinary norms and with
due regard to their limitations. Journal-level metrics, such as the JIF, should not be used. (HR managers,
recruitment and promotion panels, UUK)
5 Individual researchers should be mindful of the limitations of particular indicators in the way they
present their own CVs and evaluate the work of colleagues. When standard indicators are inadequate,
individual researchers should look for a range of data sources to document and support claims about the
impact of their work. (All researchers)
6 Like HEIs, research funders should develop their own context-specific principles for the use of
quantitative indicators in research assessment and management and ensure that these are well
communicated, easy to locate and understand. They should pursue approaches to data collection that are
transparent, accessible, and allow for greater interoperability across a diversity of platforms. (UK HE
Funding Bodies, Research Councils, other research funders)
7 Data providers, analysts and producers of university rankings and league tables should strive for
greater transparency and interoperability between different measurement systems. Some, such as
the Times Higher Education (THE) university rankings, have taken commendable steps to be more open
about their choice of indicators and the weightings given to these, but other rankings remain ‘black-
boxed’. (Data providers, analysts and producers of university rankings and league tables)
8 Publishers should reduce emphasis on journal impact factors as a promotional tool, and only use
them in the context of a variety of journal-based metrics that provide a richer view of performance.
As suggested by DORA, this broader indicator set could include 5-year impact factor, EigenFactor,
SCImago, editorial and publication times. Publishers, with the aid of the Committee on Publication
Ethics (COPE), should encourage responsible authorship practices and the provision of more detailed
information about the specific contributions of each author. Publishers should also make available a range
of article-level metrics to encourage a shift toward assessment based on the academic quality of an article
rather than JIFs. (Publishers)
Improving the data infrastructure that supports research information management
9 There is a need for greater transparency and openness in research data infrastructure. A set of
principles should be developed for technologies, practices and cultures that can support open,
trustworthy research information management. These principles should be adopted by funders, data
providers, administrators and researchers as a foundation for further work. (UK HE Funding Bodies,
RCUK, Jisc, data providers, managers, administrators)
10 The UK research system should take full advantage of ORCID as its preferred system of unique
identifiers. ORCID iDs should be mandatory for all researchers in the next REF. Funders and HEIs
should utilise ORCID for grant applications, management and reporting platforms, and the benefits of
ORCID need to be better communicated to researchers. (HEIs, UK HE Funding Bodies, funders,
managers, UUK, HESA)
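For readers unfamiliar with the mechanics: an ORCID iD guards against transcription errors by ending in a check character computed with the ISO 7064 MOD 11-2 algorithm. A minimal validation sketch follows; the sample iD is the one used in ORCID's own documentation.

```python
def orcid_check_digit(base_digits):
    """Compute the final check character from the first 15 digits (ISO 7064 MOD 11-2)."""
    total = 0
    for d in base_digits:
        total = (total + int(d)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    """Validate a formatted iD such as '0000-0002-1825-0097'."""
    digits = orcid.replace("-", "")
    if len(digits) != 16:
        return False
    return orcid_check_digit(digits[:15]) == digits[15]

print(is_valid_orcid("0000-0002-1825-0097"))  # ORCID's documented sample iD
```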
11 Identifiers are also needed for institutions, and the most likely candidate for a global solution is the
ISNI, which already has good coverage of publishers, funders and research organisations. The use
of ISNIs should therefore be extended to cover all institutions referenced in future REF submissions, and
used more widely in internal HEI and funder management processes. One component of the solution will
be to map the various organisational identifier systems against ISNI to allow the various existing systems
to interoperate. (UK HE Funding Bodies, HEIs, funders, publishers, UUK, HESA)
12 Publishers should mandate ORCID iDs and ISNIs and funder grant references for article
submission, and retain this metadata throughout the publication lifecycle. This will facilitate
exchange of information on research activity, and help deliver data and metrics at minimal burden to
researchers and administrators. (Publishers and data providers)
13 The use of digital object identifiers (DOIs) should be extended to cover all research outputs. This
should include all outputs submitted to a future REF for which DOIs are suitable, and DOIs should also
be more widely adopted in internal HEI and research funder processes. DOIs already predominate in the
journal publishing sphere – they should be extended to cover other outputs where no identifier system
exists, such as book chapters and datasets. (UK HE Funding Bodies, HEIs, funders, UUK)
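Syntactically, every DOI begins with the directory indicator "10.", followed by a numeric registrant code, a slash, and an opaque suffix. A loose well-formedness check is sketched below; this is an illustrative pattern only, not a full implementation of the handle-system grammar.

```python
import re

# Loose DOI shape: "10." + registrant code + "/" + non-empty suffix
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(s):
    """Return True if the string has the basic shape of a DOI."""
    return bool(DOI_PATTERN.match(s))

print(looks_like_doi("10.13140/RG.2.1.4929.1363"))  # the DOI of this report
```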
14 Further investment in research information infrastructure is required. Funders and Jisc should
explore opportunities for additional strategic investments, particularly to improve the interoperability of
research management systems. (HM Treasury, BIS, RCUK, UK HE Funding Bodies, Jisc, ARMA)
Increasing the usefulness of existing data and information sources
15 HEFCE, funders, HEIs and Jisc should explore how to leverage data held in existing platforms to
support the REF process, and vice versa. Further debate is also required about the merits of local
collection within HEIs and data collection at the national level. (HEFCE, RCUK, HEIs, Jisc, HESA,
16 BIS should identify ways of linking data gathered from research-related platforms (including
Gateway to Research, Researchfish and the REF) more directly to policy processes in BIS and
other departments, especially around foresight, horizon scanning and research prioritisation. (BIS, other
government departments, UK HE Funding Bodies, RCUK)
Using metrics in the next REF
17 For the next REF cycle, we make some specific recommendations to HEFCE and the other HE
Funding Bodies, as follows. (UK HE Funding Bodies)
a. In assessing outputs, we recommend that quantitative data – particularly around published
outputs – continue to have a place in informing peer review judgements of research quality.
This approach has been used successfully in REF2014, and we recommend that it be continued and
enhanced in future exercises.
b. In assessing impact, we recommend that HEFCE and the UK HE Funding Bodies build on the
analysis of the impact case studies from REF2014 to develop clear guidelines for the use of
quantitative indicators in future impact case studies. While not being prescriptive, these
guidelines should provide suggested data to evidence specific types of impact. They should include
standards for the collection of metadata to ensure the characteristics of the research being described
are captured systematically; for example, by using consistent monetary units.
c. In assessing the research environment, we recommend that there is scope for enhancing the
use of quantitative data, but that these data need to be provided with sufficient context to
enable their interpretation. At a minimum this needs to include information on the total size of the
UOA to which the data refer. In some cases, the collection of data specifically relating to staff
submitted to the exercise may be preferable, albeit more costly. In addition, data on the structure and
use of digital information systems to support research (or research and teaching) may be crucial to
further develop excellent research environments.
Coordinating activity and building evidence
18 The UK research community needs a mechanism to carry forward the agenda set out in this report.
We propose the establishment of a Forum for Responsible Metrics, which would bring together
research funders, HEIs and their representative bodies, publishers, data providers and others to
work on issues of data standards, interoperability, openness and transparency. UK HE Funding
Bodies, UUK and Jisc should coordinate this forum, drawing in support and expertise from other funders
and sector bodies as appropriate. The forum should have preparations for the future REF within its remit,
but should also look more broadly at the use of metrics in HEI management and by other funders. This
forum might also seek to coordinate UK responses to the many initiatives in this area across Europe and
internationally – and those that may yet emerge – around research metrics, standards and data
infrastructure. It can ensure that the UK system stays ahead of the curve and continues to make real
progress on this issue, supporting research in the most intelligent and coordinated way, influencing
debates in Europe and the standards that other countries will eventually follow. (UK HE Funding Bodies,
UUK, Jisc, ARMA)
19 Research funders need to increase investment in the science of science policy. There is a need for
greater research and innovation in this area, to develop and apply insights from computing, statistics,
social science and economics to better understand the relationship between research, its qualities and
wider impacts. (Research funders)
20 One positive aspect of this review has been the debate it has generated. As a legacy initiative, the
steering group is setting up a blog (www.ResponsibleMetrics.org) as a forum for ongoing discussion
of the issues raised by this report. The site will celebrate responsible practices, but also name and
shame bad practices when they occur. Researchers will be encouraged to send in examples of good or bad
design and application of metrics across the research system. Adapting the approach taken by the Literary
Review’s “Bad Sex in Fiction” award, every year we will award a “Bad Metric” prize to the most
egregious example of an inappropriate use of quantitative indicators in research management. (Review steering group)
1. Measuring up
“ The standing of British science, and the individuals and institutions that comprise
it, is rooted firmly in excellence… Much of the confidence in standards of
excellence promoted comes from decisions being informed by peer-review: leading
experts assessing the quality of proposals and work.”
Our Plan for Growth: science and innovation, HM Treasury/BIS, December 2014
“ We have more top ranking universities in London than in any other city in the
world. With 4 universities in the global top 10, we rank second only to the US.”
Jo Johnson MP, Minister for Universities and Science, 1 June 2015
Citations, journal impact factors, h-indices, even tweets and Facebook likes – there is no end of
quantitative measures that can now be used to try to assess the quality and wider impacts of research.
But how robust and reliable are such metrics, and what weight – if any – should we give them in the
future management of research systems at the national or institutional level?
These are questions that have been explored over the past year by the Independent Review of the Role
of Metrics in Research Assessment. The review was announced by David Willetts, then Minister for
Universities and Science, in April 2014, and has been supported by the Higher Education Funding
Council for England (HEFCE).
As the 2014 BIS/HM Treasury science and innovation strategy reminds us, the UK has a remarkable
breadth of excellent research across the sciences, engineering, social sciences, arts and humanities.
These strengths are often expressed in metric shorthand: “with just 3% of global research spending,
0.9% of global population and 4.1% of the world’s researchers, the UK produces 9.5% of article
downloads, 11.6% of citations and 15.9% of the world’s most highly-cited articles”.
The quality and productivity of our research base is, at least in part, the result of smart management of
the dual-support system of research funding. Since the introduction of the Research Assessment
Exercise (RAE) in 1986, the UK has been through six cycles of evaluation and assessment, the latest
of which was the 2014 Research Excellence Framework (REF2014). Processes to ensure and improve
research quality, and more recently its wider impacts, are also used by the UK Research Councils, by
other funders such as the Wellcome Trust, and by universities themselves.
Speech to ‘Going Global’ 2015 conference https://www.gov.uk/government/speeches/international-higher-education
Elsevier. (2013). International Comparative Performance of the UK Research Base – 2013; A report prepared by Elsevier
for the UK’s Department of Business, Innovation and Skills (BIS), p2.
comparative-performance-of-the-UK-research-base-2013.pdf. Retrieved 1 May 2015.
The quality and diverse impacts of research have traditionally been assessed using a combination of
peer review and a variety of quantitative indicators. Peer review has long been the most widely used
method, and underpins the academic system in the UK and around the world. The use of metrics is a
newer approach, but has developed rapidly over the past 20 years as a potential method of measuring
research quality and impact in some fields. How best to do this remains the subject of considerable debate.
There are powerful currents whipping up the metric tide. These include growing pressures for audit
and evaluation of public spending on higher education and research; demands by policymakers for
more strategic intelligence on research quality and impact; the need for institutions to manage and
develop their strategies for research; competition within and between institutions for prestige,
students, staff and resources; and increases in the availability of real-time ‘big data’ on research
uptake, and the capacity of tools for analysing them.
In a positive sense, wider use of quantitative indicators, and the emergence of alternative metrics for
societal impact, can be seen as part of the transition to a more open, accountable and outward-facing
research system. But this has been accompanied by a backlash against the inappropriate weight being
placed on particular indicators – such as journal impact factors (JIFs) – within the research system, as
reflected by the 2013 San Francisco Declaration on Research Assessment (DORA), which now has
over 570 organisational and 12,300 individual signatories.
As DORA argues, “The outputs from
scientific research are many and varied…Funding agencies, institutions that employ scientists, and
scientists themselves, all have a desire, and need, to assess the quality and impact of scientific outputs.
It is thus imperative that scientific output is measured accurately and evaluated wisely.”
1.1. Our terms of reference
Our work builds on an earlier pilot exercise in 2008 and 2009, which tested the potential for using
bibliometric indicators of research quality in REF2014. At that time, it was concluded that citation
information was insufficiently robust to be used formulaically or as a primary indicator of quality, but
that there might be scope for it to enhance processes of expert review.
Royal Society. (2012). Science as an Open Enterprise. The Royal Society Science Policy Centre report 02/12
https://royalsociety.org/~/media/policy/projects/sape/2012-06-20-saoe.pdf. Retrieved 1 June 2015.
www.ascb.org/dora. As of June 2015, only three UK universities are DORA signatories: Manchester, Sussex and UCL.
This review has gone beyond the earlier pilot study to take a deeper and broader look at the potential
uses and limitations of research metrics and indicators. It has explored the use of metrics across
different disciplines, and assessed their potential contribution to the development of research
excellence and impact within higher education. It has also analysed their role in processes of research
assessment, including the next cycle of the REF. And it has considered the changing ways in which
universities are using metrics, particularly the growing power of league tables and rankings. Finally, it
has considered the relationship between the use of indicators and issues of equality and diversity, and
the potential for ‘gaming’ that can arise from the use of particular indicators in systems of funding and assessment.
To give structure and focus to our efforts, clear terms of reference were established at the outset. The
review was asked to examine:
The relative merits of different metrics in assessing the academic qualities and
diverse impacts of research;
The advantages and disadvantages of using metrics, compared with peer review, in
creating an environment that enables and encourages excellent research and diverse
impact, including fostering inter- and multidisciplinary research;
How metrics-based research assessment fits within the missions of universities and
research institutes, and the value that they place on published research outputs in
relation to the portfolio of other activities undertaken by their staff, including
training and education;
The appropriate balance between peer review and metrics in research assessment,
and the consequences of shifting that balance for administrative burden and research
cultures across different disciplines;
What is not, or cannot be, measured by quantitative metrics;
The differential impacts of metrics-based assessment on individual researchers,
including the implications for early-career researchers, equality and diversity;
Ethical considerations, and guidance on how to reduce the unintended effects and
inappropriate use of metrics and university league-tables, including the impact of
metrics-based assessment on research culture;
The extent to which metrics could be used in novel ways by higher education
institutions (HEIs) and research funders to support the assessment and management of research;
The potential contribution of metrics to other aspects of research assessment, such as
the matching of reviewers to proposals, or research portfolio analysis;
The use of metrics in broader aspects of government science, innovation and research policy.
Reflecting the evidence we received, this report focuses in greater depth on some aspects of these
terms of reference than others (notably, the use of metrics in the REF, by other funders and in HEI
management). However, we hope that the report provides a clear framework for thinking about the
broader role of metrics, data and indicators within research management, and lays helpful foundations
for further work to be carried out by HEFCE, the Research Councils and others.
The review has been conducted in an open and consultative manner, with the aim of drawing in
evidence, views and perspectives from across the higher education and research system. There has
been a strong emphasis on transparency and plurality throughout the project, and the make-up of the
review’s steering group itself reflects a diversity of disciplines and perspectives. In addition, the group
has engaged actively with stakeholders from across the research community through numerous
workshops, meetings, talks and other channels, including the review’s website and social media.
Papers from steering group meetings have been made publicly available at every stage, as have other
resources, including evidence received and slides presented at workshops.
1.2. Definitions and terminology
The research assessment landscape is contested, contentious and complex. Researchers, funders and
managers face an ever-expanding menu of indicators, metrics and assessment methods in operation,
many of which are explored in this review. Some are founded on peer review, others on quantitative
indicators such as citation counts, or measures of input, such as research funding or student numbers.
The term ‘metric’ is itself open to misunderstanding, because something can be a metric in one
context but not in another. For example, the number of citations received by a researcher’s
publications is a citation metric but not an impact metric because it does not directly measure the
impact of that researcher’s work. In other words, it can imply ‘measurement’ of a quantity or quality
which has not in fact been measured. The term ‘indicator’ is preferable in contexts in which there is the
potential for confusion. To reduce the scope of possible misunderstanding, this report will adopt the
following definitions and terminology throughout.
All of this material is available at the review’s website: https://www.hefce.ac.uk/rsrch/metrics/
Indicator
A measurable quantity that ‘stands in’ or substitutes for something
less readily measurable and is presumed to associate with it without
directly measuring it. For example, citation counts could be used as
indicators for the scientific impact of journal articles even though
scientific impacts can occur in ways that do not generate citations.
Similarly, counts of online syllabi mentioning a particular book
might be used as an indicator of its educational impact.
Bibliometrics focuses on the quantitative analysis of scientific and
scholarly publications, including patents. Bibliometrics is part of the
field of scientometrics: the measurement of all aspects of science and
technology, which may encompass information about any kind of
research output (data, reagents, software, researcher interactions,
funding, research commercialisation, and other outputs).
The most widely exploited bibliometric relies on counts of citations.
Citation counts are sometimes used as an indicator of academic
impact in the sense that citations from other documents suggest that
the cited work has influenced the citing work in some way.
Bibliometric indicators might normalise these citation counts by
research field and by year, to take into account the very different
citation behaviours between disciplines and the increase in citations
over time. It has to be emphasised that as bibliometrics often do not
distinguish between negative or positive citation, highly cited
literature might attract attention due to controversy or even error.
High numbers of citations might also result from a range of different
contributions to a field, e.g. papers that establish new
methodologies or systematically review the field, as well as primary research.
Alternative or altmetrics
Altmetrics are non-traditional metrics that cover not just citation
counts but also downloads, social media shares and other measures of
impact of research outputs. The term is variously used to mean
‘alternative metrics’ or ‘article level metrics’, and it encompasses
webometrics, or cybermetrics, which measure the features and
relationships of online items, such as websites and log files. The rise
of new social media has created an additional stream of work under
the label altmetrics. These are indicators derived from social
websites, such as Twitter, Academia.edu, Mendeley, and
ResearchGate, with data that can be gathered automatically.
Definitions adapted from Encyclopedia of Science Technology and Ethics, 2nd Edition (2014). Macmillan.
Peer review
A process of research assessment based on the use of expert
deliberation and judgement.
Academic or scholarly impact
Academic or scholarly impact is a recorded or otherwise auditable
occasion of influence from academic research on another researcher,
university organisation or academic author. Academic impacts are
most objectively demonstrated by citation indicators in those fields
that publish in international journals.
Wider impact
As for academic or scholarly impact, though where the effect or
influence reaches beyond scholarly research, e.g. on education,
society, culture or the economy.
Research has a societal impact when auditable or recorded influence
is achieved upon non-academic organisation(s) or actor(s) in a sector
outside the university sector itself – for instance, by being used by
one or more business corporations, government bodies, civil society
organisations, media or specialist/professional media organisations or
in public debate. As is the case with academic impacts, societal
impacts need to be demonstrated rather than assumed. Evidence of
external impacts can take the form of references to, citations of or
discussion of a person, their work or research results.
For the purposes of the REF2014, impact was defined as an effect
on, change or benefit to the economy, society, culture, public policy
or services, health, the environment or quality of life, beyond
academia. REF2014 impact includes, but was not limited to, an effect
on, change or benefit to:
Adapted from: Council of Canadian Academies. (2012). Informing Research Choices: Indicators and Judgment, p11.
ance/scienceperformance_fullreport_en_web.pdf. Retrieved 6 December 2014.
Taken from LSE Public Policy Group (2011) Maximising the Impacts of Your Research: A Handbook for Social Scientists.
London: PPG. http://blogs.lse.ac.uk/impactofsocialsciences/the-handbook/.
REF 02. 2011. Assessment framework and guidance on submissions, p26, para 141.
df. Retrieved 2 April 2015.
the activity, attitude, awareness, behaviour, capacity,
opportunity, performance, policy, practice, process or
understanding
of an audience, beneficiary, community, constituency,
organisation or individuals
in any geographic location whether locally, regionally, nationally or
internationally.
Within REF2014, the research environment was assessed in terms of
its ‘vitality and sustainability’, including its contribution to the
vitality and sustainability of the wider discipline or research base.
Within REF2014, panels assessed the quality of submitted research
outputs in terms of their ‘originality, significance and rigour’, with
reference to international research quality standards.
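The field- and year-normalisation of citation counts described in the definitions above can be sketched in a few lines. This is an illustration only, not the method of any particular data provider (real implementations also handle document types, fractional counting and other refinements); the sample data are invented.

```python
from collections import defaultdict

def normalised_citations(papers):
    """Field- and year-normalise raw citation counts.

    Each paper's score is its citation count divided by the mean
    citation count of all papers in the same field and year, so a
    score of 1.0 means 'cited exactly as often as the average paper
    in its field and year'.
    """
    totals = defaultdict(lambda: [0, 0])  # (field, year) -> [sum, count]
    for p in papers:
        key = (p["field"], p["year"])
        totals[key][0] += p["citations"]
        totals[key][1] += 1
    scores = {}
    for p in papers:
        total, count = totals[(p["field"], p["year"])]
        mean = total / count
        scores[p["id"]] = p["citations"] / mean if mean else 0.0
    return scores

# Invented sample data: two fields with very different citation norms.
papers = [
    {"id": "bio1", "field": "biology", "year": 2014, "citations": 40},
    {"id": "bio2", "field": "biology", "year": 2014, "citations": 20},
    {"id": "hum1", "field": "history", "year": 2014, "citations": 4},
    {"id": "hum2", "field": "history", "year": 2014, "citations": 2},
]
scores = normalised_citations(papers)
# bio1 and hum1 each sit a third above their own field's mean,
# even though their raw citation counts differ tenfold.
```

The point of the normalisation is visible in the sample data: without it, every history paper would look weaker than every biology paper simply because of disciplinary citation culture.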
1.3. Data collection and analysis
The review drew on an extensive range of evidence sources, including:
A formal call for evidence
A call for evidence was launched on 1 May 2014, with a response deadline of 30 June 2014. The
steering group appealed for evidence from a wide range of sources, including written summaries or
published research. Respondents were asked to focus on four key themes and associated questions, as
follows:
A Identifying useful metrics for research assessment.
B How metrics should be used in research assessment.
C ‘Gaming’ and strategic use of metrics.
D International perspective.
Ibid, p23, para 118, notes that permitted ‘types’ of outputs included: Books (or parts of books); Journal articles and
conference contributions; Physical artefacts; Exhibitions and performances; Other documents; Digital artefacts (including
web content); Other.
The call for evidence letter is available at:
In total, 153 responses were received to the call for evidence: 67 from HEIs, 42 from individuals, 27
from learned societies, 11 from publishers and data providers, three from HE mission groups, and
three from other respondents. An analysis of the evidence received can be found at
A literature review
Two members of the Steering Group, Paul Wouters and Michael Thelwall, researched and wrote a
comprehensive literature review to inform the review’s work. The findings of the literature review
have been incorporated into this report at appropriate points, and the full review is available as
Supplementary Report I.
Community and stakeholder engagement
The review team engaged actively with stakeholders across the higher education and research
community. These activities included a series of six workshops, organised by the steering group, on
specific aspects of the review, such as the role of metrics within the arts and humanities, and links to
equality and diversity. Members of the steering group also gave talks and presentations about the
work of the review at around 30 conferences, roundtables and workshops. Findings and insights from
these events have been incorporated into the report wherever appropriate. A full itinerary of events
linked to the review can be found in the annex of tables at the end of this report (Table 2).
Media and social media
Over the course of the review, the steering group sought to encourage wider discussion of these issues
in the sector press (particularly Times Higher Education and Research Fortnight) and through social
media. There was extensive use of the #HEFCEmetrics hashtag on Twitter. Members of the steering
group, including Stephen Curry,
also wrote blog posts on issues relating to the review, and a number
of other blog posts and articles were written in response to the review.
Wouters, P., et al. (2015). Literature Review: Supplementary Report to the Independent Review of the Role of Metrics in
Research Assessment and Management. HEFCE. DOI: 10.13140/RG.2.1.5066.3520.
Curry, S. (2014). Debating the role of metrics in research assessment. Blog posted at
http://occamstypewriter.org/scurry/2014/10/07/debating-the-role-of-metrics-in-research-assessment/. Retrieved 1 June 2015.
Numerous blog posts, including contributions from steering group members, have been featured at
http://blogs.lse.ac.uk/impactofsocialsciences/2014/04/03/reading-list-for-hefcemetrics/. Retrieved 1 June 2015. We have
referred to some of these posts within this report. Others discussing the review through blog posts include: David
Retrieved 1 June 2015. Also see contributors to: http://thedisorderofthings.com/tag/metrics/. Retrieved 1 June 2015.
Focus groups with REF2014 panel members
The steering group participated in a series of focus group sessions for REF2014 panel members,
organised by HEFCE, to allow panellists to reflect on their experience, and wider strengths and
weaknesses of the exercise. Specific sessions explored the pros and cons of any uses of metrics within
REF2014, and their potential role in future assessment exercises.
Where relevant, the steering group also engaged with and analysed findings from HEFCE’s portfolio
of REF2014 evaluation projects, including:
The nature, scale and beneficiaries of research impact: an initial analysis of
REF2014 case studies;
Preparing impact submissions for REF2014;
Assessing impact submissions for REF2014;
Evaluating the 2014 REF: Feedback from participating institutions;
REF Manager’s report;
REF panel overview reports;
REF Accountability Review: costs, benefits and burden project report.
King’s College London and Digital Science. (2015). The nature, scale and beneficiaries of research impact: An initial
analysis of Research Excellence Framework (REF ) 2014 impact case studies.
www.hefce.ac.uk/pubs/rereports/Year/2015/analysisREFimpact/. Retrieved 1 June 2015.
Manville, C., Morgan Jones, M, Frearson, M., Castle-Clarke, S., Henham, M., Gunashekar, S. and Grant, J. (2015).
Preparing impact submissions for REF 2014: Findings and Observations. Santa Monica, Calif.: RAND Corporation. RR-
Manville, C., Guthrie, S., Henham, M., Garrod, B., Sousa, S., Kirtley, A., Castle-Clark, S. and Ling, T. (2015). Assessing
impact submissions for REF2014: An Evaluation. Santa Monica, Calif. RAND Corp.
HEFCE. (2015). Evaluating the 2014 REF: Feedback from Participating Institutions.
HEFCE. (2015). Research Excellence Framework 2014: Manager’s report.
www.ref.ac.uk/media/ref/content/pub/REF_managers_report.pdf. Retrieved 25 May 2015
HEFCE’s Panel overview reports can be downloaded from www.ref.ac.uk/panels/paneloverviewreports/
Relating REF2014 outcomes to indicators
A final element of our evidence gathering was designed to assess the extent to which the outcome of
the REF2014 assessment correlated with 15 metrics-based indicators of research performance. For the
first time, we were able to associate anonymised REF scores, by paper output, with a selection of metric
indicators, including ten bibliometric indicators and five alternative metric indicators. Previous
research in this area has been restricted to specific subject areas and departmental level metrics, as the
detailed level of data required for this analysis was destroyed before publication of the REF2014
results. This work is summarised in Chapter 9, and presented in detail in Supplementary Report II.
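Correlation analyses of this kind typically rely on rank-based statistics such as Spearman's rho, which compares the orderings two indicators induce rather than their raw values. The sketch below is purely illustrative: the figures are invented and this is not the method or data of Supplementary Report II.

```python
def rank(values):
    """1-based average ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented example: hypothetical quality scores vs. a citation indicator.
ref_scores = [4, 3, 3, 2, 1]
citation_indicator = [120, 80, 95, 30, 10]
rho = spearman(ref_scores, citation_indicator)
```

A rho near +1 indicates that the two indicators rank the same outputs highly; a rho near 0 indicates that one carries little information about the other.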
1.4. The structure of this report
This opening chapter has provided a summary of the aims and working methods of the review, and
the range of evidence sources on which this final report draws.
Chapter 2 (The rising tide) gives a brief history of the role of metrics in research management, and
the evolution of data infrastructure and standards to underpin more complex and varied uses of
quantitative indicators. It also surveys the main features of research assessment systems in a handful
of countries: Australia, Denmark, Italy, the Netherlands, New Zealand and the United States.
Chapter 3 (Rough indications) looks in greater detail at the development, uses and occasional abuses
of four categories of quantitative indicators: bibliometric indicators of research quality; alternative
indicators of quality; input indicators; and indicators of impact.
Chapter 4 (Disciplinary dilemmas) maps the diversity in types of research output, publication
practices and citation cultures across different disciplines, and the implications these have for any
attempts to develop standardised indicators across the entire research base. It also considers the extent
to which quantitative indicators can be used to support or suppress multi- or interdisciplinary research.
Chapter 5 (Judgement and peer review) compares the strengths and weaknesses of the peer review
system with metric-based alternatives, and asks how we strike an appropriate balance between
quantitative indicators and expert judgement.
Chapter 6 (Management by metrics) charts the rise of more formal systems of research management
within HEIs, and the growing significance that is being placed on quantitative indicators, both within
HEFCE. (2015). Correlation analysis of REF2014 scores and metrics: Supplementary Report II to the Independent Review
of the Role of Metrics in Research Assessment and Management. HEFCE. DOI: 10.13140/RG.2.1.3362.4162.
institutions and as a way of benchmarking performance against others. It looks specifically at
university rankings and league tables as a visible manifestation of these trends, and considers how
these might be applied in more responsible ways across the sector.
Chapter 7 (Cultures of counting) assesses the wider effects a heightened emphasis on quantitative
indicators may have on cultures and practices of research, including concerns over systems for
performance management, and negative effects on interdisciplinarity, equality and diversity. It also
considers the extent to which metrics exacerbate problems of gaming and strategic approaches to research assessment.
Chapter 8 (Sciences in transition) looks beyond HEIs to examine changes in the way key institutions
in the wider research funding system are using quantitative indicators, including the Research
Councils, research charities such as the Wellcome Trust, and the national academies. It also looks to
developments at the European level, within Horizon2020. Finally, it considers how government could
make greater use of available quantitative data sources to inform horizon scanning and policies for
research and innovation.
Chapter 9 (Reflections on REF) provides a detailed analysis of the modest role that quantitative
indicators played in REF2014, and considers a range of scenarios for their use in future assessment
exercises. It also outlines the results of our own quantitative analysis, which correlated the actual
outcomes of REF2014 against 15 metrics-based indicators of research performance.
Finally, Chapter 10 (Responsible metrics) summarises our headline findings, and makes a set of
targeted recommendations to HEIs, research funders (including HEFCE), publishers and data
providers, government and the wider research community. Within a framework of responsible
metrics, the report concludes with clear guidance on how quantitative indicators can be used
intelligently and appropriately to further strengthen the quality and impacts of UK research.
2. The rising tide
“ The institutionalization of the citation is the culmination of a decades-long process
starting with the creation of the Science Citation Index. The impact of this
emergence of a new social institution in science and scholarship is often underestimated.”
“ A timid, bureaucratic spirit has come to suffuse every aspect of intellectual life.
More often than not, it comes cloaked in the language of creativity, initiative and entrepreneurialism.”
The quantitative analysis of scientific papers and scholarly articles has been evolving since the early
20th century. Lotka’s Law, dating back to 1926, first highlighted that within a defined area over a
specific period, a low number of authors accounted for a large percentage of publications. From this
point, the field of scientometrics developed rapidly, especially after the creation of the Science
Citation Index (SCI), and over time we have seen a proliferation of quantitative indicators for
research. This chapter provides a brief history of the use of metrics in research management and
assessment, focusing on bibliometrics, alternative metrics and the role of data providers and data
infrastructure. We then offer a brief outline of research assessment approaches from six countries.
The SCI was created in 1961 by Eugene Garfield.
Initially, it was mainly used by scientometric
experts, rather than by the wider research community. In this early stage of scientometrics, data were
generally used to describe the development and direction of scientific research, rather than to evaluate
Wouters, P. (2014). The Citation: From Culture to Infrastructure. In Cronin, B. and Sugimoto, C. R. (eds.) Beyond
Bibliometrics: Harnessing Multidimensional Indicators of Scholarly Impact. MIT Press.
Graeber, D. (2015) The Utopia of Rules: On Technology, Stupidity, and the Secret Joys of Bureaucracy. London: Melville
Elsevier (2007). Scientometrics from Past to Present. Research Trends, 1, September 2007.
www.researchtrends.com/issue1-september-2007/sciomentrics-from-past-to-present/. Retrieved 1 March 2015.
“Scientometric research [is] the quantitative mathematical study of science and technology, encompassing both
bibliometric and economic analysis.” Ibid.
Garfield founded the Institute for Scientific Information (ISI), which is now part of Thomson Reuters.
In the 1980s, new approaches to public management, particularly in the UK and US, led to a growing
emphasis on measurable indicators of the value of research. The 1990s gave rise to increasingly
strategic forms of research policy and management, accompanied by greater use of bibliometric
indicators, including JIF scores. These were developed in 1955 by Eugene Garfield, and became
available through Journal Citation Reports from 1975, but were initially used quite infrequently,
and have only seen a real explosion in usage since the 1990s.
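For reference, the JIF for a journal in year Y is the number of citations received in Y by items the journal published in Y-1 and Y-2, divided by the number of citable items it published in those two years. A minimal sketch, with invented figures:

```python
def journal_impact_factor(cites_to_prev_two_years, citable_items_prev_two_years):
    """JIF for year Y: citations received in Y by items published in
    Y-1 and Y-2, divided by citable items published in Y-1 and Y-2."""
    return cites_to_prev_two_years / citable_items_prev_two_years

# Invented example: 1,200 citations in 2014 to articles the journal
# published in 2012-13, which numbered 400 citable items.
jif = journal_impact_factor(1200, 400)  # 3.0
```

Note that this is a journal-level average: it says nothing about the citation performance of any individual article, which is the core of the critique voiced by DORA.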
Citation analysis has been much more readily available since 2001, when the Web of Science (WoS)
became easily accessible to all, followed by Scopus in 2003 and Google Scholar (GS) in 2004. J.E.
Hirsch invented the Hirsch or h-index in 2005, and this led to a surge of interest in individual-level indicators.
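The h-index itself is simple to state: a researcher has index h if h of their outputs have received at least h citations each. A minimal illustrative sketch (the citation counts below are invented):

```python
def h_index(citations):
    """Return the largest h such that h papers have >= h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# A researcher with these citation counts has h = 3: three papers are
# cited at least 3 times each, but only three papers reach 4 citations.
example = h_index([10, 8, 5, 3, 0])
```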
2.2. Alternative metrics
From the mid-1990s, as advances in information technology created new ways for researchers to
network, write and publish, interest grew in novel indicators better suited to electronic communication
and to capturing impacts of different kinds.
These alternative metrics include web citations in digitised scholarly documents (e.g. eprints, books,
science blogs or clinical guidelines) and, more recently, altmetrics derived from social media (e.g.
social bookmarks, comments, ratings and tweets). Scholars may also produce and use non-refereed
academic outputs, such as blog posts, datasets and software, where usage-based indicators are still in
the early stages of development. Significant developments in this area include the establishment of
F1000Prime in 2002, Mendeley in 2008 and Altmetric.com in 2011.
2.3. Approaches to evaluation
Research assessment has traditionally focused on input and output indicators, evaluating academic
impact through bibliometric measures such as citation counts. However, there is now far greater focus
on the wider impacts, outcomes and benefits of research, as reflected in exercises such as REF2014.
The measurement of societal impact, with robust indicators and accurate, comparable data, is still in
its relative infancy.
Garfield, E. (2006). The history and meaning of the journal impact factor. Journal of the American Medical Association,
295 (1), 90-93.
Ingwersen, P. (1998). The calculation of Web impact factors. Journal of Documentation, 54 (2), 236-243; Borgman, C.,
and Furner, J. (2002). Scholarly communication and bibliometrics. Annual Review of Information Science and Technology,
36. Medford, NJ: Information Today Inc., pp. 3-72; Priem, J., Taraborelli, D., Groth, P. and Neylon, C. (2010). Altmetrics:
A manifesto, 26 October 2010. http://altmetrics.org/manifesto. Retrieved 1 June 2015.
Neither research quality nor its impacts are straightforward concepts to pin down or assess. Differing
views on what they are, and how they can be measured, lie at the heart of debates over research
assessment. In this report, we take research quality to include all scholarly impacts. But what
constitutes quality remains contested.
As PLOS noted in its submission to this review, “it is unclear
whether any unique quality of research influence or impact is sufficiently general to be measured”.
In the context of research evaluation, quality typically denotes the overall calibre of research based on
the values, criteria or standards inherent in an academic community.
However, those values and
standards are highly dependent on context: for instance, views vary enormously across and indeed
within certain disciplines, as a result of different research cultures, practices and philosophical
approaches. It is more productive to think in terms of research qualities, rather than striving for a single definition of quality.
2.4. Data providers
As scientometrics has developed, and evaluation systems have become more sophisticated, so the
range of data providers and analysts has grown.
Those now engaged with the production of
quantitative data and indicators include government agencies at the international, national and local
level, HEIs, research groups, and a wide range of commercial data providers, publishers and analysts.
Funding agencies in the US, France, UK and the Netherlands were pioneers in using bibliometrics for
research evaluation and monitoring, and the Organisation for Economic Co-operation and
Development (OECD) set global standards for national science and technology indicators in its Frascati family of measurement manuals.
Today, leading universities around the world have adopted, or are in the process of developing, comprehensive research information systems in which statistical and qualitative evidence of performance in research, teaching, impact and other services can be recorded. These are supported by benchmarking tools such as SciVal and InCites, management systems such as PURE and Converis, and data consultancy from companies such as Academic Analytics, iFQ, Sciencemetrix and CWTS.
Halevi, G. and Colledge, L. (2014). Standardizing research metrics and indicators: perspectives and approaches. Research Trends, 39, December 2014. www.researchtrends.com/issue-39-december-2014/standardizing-research-metrics-and-indicators/. Retrieved 4 January 2015.
Council of Canadian Academies. (2012), p. 43.
Whitley, R. (2010). Reconfiguring the public sciences: the impact of governance changes on authority and innovation in public science systems, in Reconfiguring Knowledge Production: Changing Authority Relationships in the Sciences and their Consequences for Intellectual Innovation, edited by R. Whitley et al. Oxford: Oxford University Press.
Assisted by reference linking services like CrossRef, these enable users to link sophisticated
bibliometric and other indicator-based analyses with their information infrastructure at all levels, to
monitor institutional, departmental and individual performance. Research funders, such as RCUK, are
also adopting new systems like Researchfish, which gather new information about research progress,
while other funders are using systems such as UberResearch which aggregate existing information
and add value to it.
2.5. Data infrastructure
Systems for data collection and analysis have developed organically and proliferated over the past
decade. In response to this review, many HEIs noted the burden associated with populating and
updating multiple systems, and the need for more uniform standards and identifiers that could work
across all of them. Others raised concerns that underpinning systems may become overly controlled by private providers, whose long-term interests may not align with those of the wider research community.
Underpinning infrastructure has to be fit for the purpose of producing robust and trustworthy indicators. Wherever possible, data systems also need to be open and transparent, in line with proposed principles for 'open' scholarly infrastructures.
To produce indicators that can be shared across platforms, there are a number of prerequisites: unique identifiers; defined data standards; agreed data semantics; and open data processing methods. These are discussed in turn below. In addition, the infrastructure must be able to present the relevant suites of indicators to optimise forms of assessment that are sensitive to specific research missions and contexts. It should not 'black-box' particular indicators or present them as relevant for all fields and purposes.
DINI AG Research Information Systems (2015). Research information systems at universities and research institutions - Position Paper of DINI AG FIS. https://zenodo.org/record/17491/files/DINI_AG-FIS_Position_Paper_english.pdf. Retrieved 1 July 2015.
Jacso, P. (2006). Deflated, inflated and phantom citation counts. Online Information Review, 30 (3), 297-309; Abramo, G. and D'Angelo, C. A. (2011). Evaluating research: from informed peer review to bibliometrics. Scientometrics, 87, 499-514.
Bilder, G., Lin, J. and Neylon, C. (2015). Principles for Open Scholarly Infrastructure, v1. http://cameronneylon.net/blog/principles-for-open-scholarly-infrastructures/. Retrieved 1 June 2015.
Royal Society. (2012). Science as an Open Enterprise. The Royal Society Science Policy Centre report 02/12. https://royalsociety.org/~/media/policy/projects/sape/2012-06-20-saoe.pdf. Retrieved 1 June 2015.
Some key players in research information
Converis (owned by Thomson Reuters) is an integrated research information system. It provides support for
universities, other research institutions and funding offices in collecting and managing data through the research lifecycle.
mandate is to be the citation linking backbone for all scholarly information in electronic form. It holds no full text
content, but effects linkages through CrossRef Digital Object Identifiers (CrossRef DOI), which are tagged to article
metadata supplied by the participating publishers. www.crossref.org/
Elements (owned by Symplectic) is designed to gather research information to reduce the administrative burden
placed on researchers, and to support research organisation librarians and administrators. http://symplectic.co.uk/
InCites (owned by Thomson Reuters) is a customised, web-based research evaluation tool that allows users to
analyse institutional productivity and benchmark output against peers worldwide, through access to customised citation
data, global metrics, and profiles on leading research institutions. http://researchanalytics.thomsonreuters.com/incites/
PURE (owned by Elsevier) is a research information system. It accesses and aggregates internal and external sources,
and offers analysis, reporting and benchmarking functions. www.elsevier.com/solutions/pure
Researchfish is an online database of outputs reported by researchers linked to awards, now widely used by UK
funding agencies and being taken up by funders in Denmark and Canada. It aims to provide a structured approach to
prospectively capturing outputs and outcomes from as soon as funding starts, potentially to long after awards have
finished. The information is used by funders to track the progress, productivity and quality of funded research, and as a
way of finding examples of impact. https://www.researchfish.com/
SciVal (owned by Elsevier) provides information on the research performance of research institutions across the globe.
This can be used for analysis and benchmarking of performance. www.elsevier.com/solutions/scival
UberResearch provides services aimed at science funders, including information tools based on natural language processing.
2.5.1. Unique identifiers
In order for an indicator to be reliable, it is important to be able to collect as much as possible of the
underlying data that the indicator purports to represent. For example, if we consider citations to
academic outputs, it is clear that the main databases do not include all possible citations, and that
numbers of citations within them can vary. As PLOS noted in its response to our call for evidence,
‘there are no adequate sources of bibliometric data that are publicly accessible, useable and auditable’.
In order to correctly count the number of citations that an article has, all other articles must be
checked to see if they cite the article in question. This can be achieved through manual processes, but
is subject to error. With unique identifiers for articles, the process can be automated (reducing sources
of error to original mis-citation by the author).
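Where every article carries a unique identifier, citation counting reduces to exact matching of identifiers rather than error-prone matching of titles and author names. A minimal sketch, using a toy corpus with invented DOIs:

```python
from collections import Counter

# Toy corpus: each record maps an article's DOI to the DOIs it cites.
# All DOIs here are invented for illustration.
articles = {
    "10.1000/a1": ["10.1000/b2", "10.1000/c3"],
    "10.1000/b2": ["10.1000/c3"],
    "10.1000/c3": [],
}

def citation_counts(corpus):
    """Count citations by exact DOI match: with unique identifiers,
    no fuzzy matching of titles or author names is needed."""
    counts = Counter()
    for refs in corpus.values():
        counts.update(refs)
    return counts

counts = citation_counts(articles)
# "10.1000/c3" is cited twice; "10.1000/b2" once; "10.1000/a1" not at all.
```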
The most commonly used identifier is the Digital Object Identifier (DOI).
While still not universal,
DOIs have gained considerable traction across the sector. For instance, looking at the 191,080 outputs
submitted to REF2014, 149,670 of these were submitted with DOIs (see Supplementary Report II,
Table 1). Use of DOIs varies by discipline, and is still less common in the arts and humanities than in the sciences.
DOIs in themselves are not sufficient for robust metrics. As well as article identifiers, a robust
management and evaluation system needs unique identifiers for journals, publishers, authors and
institutions. This would enable answers to more sophisticated questions, such as: How many articles
has a particular author produced with citations above the average for the journal in question?
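With identifiers for authors and journals in place, such a question becomes a simple query over the data. A sketch using toy records, with ORCID-style author identifiers and ISSN-style journal identifiers (all values invented):

```python
# Hypothetical records: each article carries unique identifiers for
# its author and journal, plus a citation count (all values invented).
articles = [
    {"author": "0000-0001-0000-0001", "journal": "1234-5678", "cites": 12},
    {"author": "0000-0001-0000-0001", "journal": "1234-5678", "cites": 2},
    {"author": "0000-0002-0000-0002", "journal": "1234-5678", "cites": 4},
]

def above_journal_average(author_id, journal_id, records):
    """Count an author's articles cited above the journal's average."""
    in_journal = [r for r in records if r["journal"] == journal_id]
    avg = sum(r["cites"] for r in in_journal) / len(in_journal)
    return sum(1 for r in in_journal
               if r["author"] == author_id and r["cites"] > avg)

# Journal average is (12 + 2 + 4) / 3 = 6; the first author has one
# article above it.
n = above_journal_average("0000-0001-0000-0001", "1234-5678", articles)
```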
Journals have, in general, adopted the International Standard Serial Number (ISSN). However, there is still a small proportion that have not. Journals which appear in more than one
format (e.g. print and online) will have an ISSN for each media type, but one is the master (ISSN-L),
to which the other ISSNs link.
Publisher and institutional identifiers are more problematic. There are various options for uniquely
identifying organisations. One 2013 study found 22 organisational identifiers currently in use in the
higher education sector in the UK.
But while none of these is wholly authoritative, both the International Standard Name Identifier (ISNI) and the UK Provider Reference Number (UKPRN) have gained traction. The former is international, and the latter is more UK-centric and does not include funders; so it would seem that ISNI is the preferred route for developing an authoritative list of publishers.
Hammond, M. and Curtis, G. (2013). Landscape study for CASRAI-UK Organisational ID. http://casrai.org/423
Author identifiers are particularly important, as a particular scholar’s contributions to the scientific
literature can be hard to recognise, as personal names are rarely unique, can change (e.g. through
marriage), and may have cultural differences in name order or abbreviations. Several types of author
identifiers exist, and a detailed analysis of the pros and cons of these was undertaken in 2012 by the Jisc Researcher Identifier Task and Finish Group. The ORCID system is widely regarded as the best, and uptake of ORCID is now growing rapidly in the UK and internationally. The same analysis recommended that the UK adopt ORCID, and many of the key players in the UK research system endorsed this proposal in a joint statement. A similar initiative in the US, funded by the Alfred P. Sloan Foundation, highlighted
the importance of advocacy and improved data quality. A recent Jisc-ARMA initiative has successfully piloted the adoption of ORCID in a number of UK HEIs, and an agreement negotiated by Jisc Collections will enable UK HEIs to benefit from reduced ORCID membership costs and enhanced technical support.
JISC Researcher Identifier Task and Finish Group. (2012). Researcher Identifier Recommendations – Sector Validation. www.serohe.co.uk/wp-content/uploads/2013/10/Clax-for-JISC-rID-validation-report-final.pdf. Retrieved 1 June 2015.
Signatories to this joint statement include ARMA, HEFCE, HESA, RCUK, UCISA, Wellcome Trust and Jisc.
Brown, J., Oyler, C. and Haak, L. (2015). Final Report: Sloan ORCID Adoption and Integration Program 2013-2014. http://dx.doi.org/10.6084/m9.figshare.1290632. Retrieved 25 May 2015.
ORCID (Open Researcher and Contributor ID)
ORCID is a non-proprietary alphanumeric code used to uniquely identify academic authors. Its stated aim is to aid "the transition from science to e-Science, wherein scholarly publications can be mined to spot links and ideas hidden in the ever-growing volume of scholarly literature". ORCID provides a persistent identity for humans, similar to that created for content-related entities on digital networks by DOIs.
ORCID launched its registry services and started issuing user identifiers on 16 October 2012. It is now an independent non-profit organisation, and is freely usable and fully interoperable with other ID systems. ORCID is also a subset of the International Standard Name Identifier (ISNI). The two organisations are cooperating: ISNI has reserved a block of identifiers for use by ORCID, so it is now possible for an individual to have both an ISNI and an ORCID.
By the end of 2013 ORCID had 111 member organisations and over 460,000 registrants. As of 1 June 2015, the number of registered accounts reported by ORCID was 1,370,195. Its organisational members include publishers, such as Elsevier, Springer, Wiley and Nature Publishing Group, funders, learned societies and universities.
UK uptake will also be driven by the Wellcome Trust’s decision to
make ORCID iDs a mandatory requirement for funding applications from August 2015,
and by the
strong support shown by Research Councils UK. ORCID also recently announced an agreement with
ANVUR (National Agency for the Evaluation of University and Research Institutes) and CRUI
(Conference of Italian University Rectors) to implement ORCID on a national scale in Italy.
For outputs other than journal articles, ISBNs (International Standard Book Numbers)
for books are
analogous to ISSNs for journals. A longstanding issue here is that different editions (e.g. hardback
and paperback) have different ISBNs, but retailers such as Amazon have made progress in
disambiguating this information.
Funder references are important unique identifiers for contracts between research-performing and
research-funding organisations. This information is required by most funders to be included in
acknowledgement sections within manuscripts submitted for publication. However, despite efforts to
encourage standard forms for this acknowledgement,
there is a need for authoritative sources for
funder names (as with institutional names above), and for authenticating the funding references
(although Europe PubMed Central provides a post-publication grant lookup tool populated by those
agencies that fund it).
Increasingly, other forms of output, such as datasets and conference proceedings, are issued with
DOIs, or DOIs can be obtained retrospectively, for example through platforms such as ResearchGate.
Similarly, DOIs can also resolve to ISBNs.
http://orcidpilot.jiscinvolve.org/wp/. ORCID is also discussed in Anstey, A. (2014). How can we be certain who authors
really are? Why ORCID is important to the British Journal of Dermatology. British Journal of Dermatology. 171 (4), 679-
680. DOI 10.1111/bjd.13381. Also Butler, D. (2012) Scientists: your number is up. Nature, 485, 564, DOI:
Retrieved 28 June 2015.
https://orcid.org/blog/2015/06/19/italy-launches-national-orcid-implementation. Retrieved 28 June 2015.
www.rin.ac.uk/our-work/research-funding-policy-and-guidance/acknowledgement-funders-journal-articles Retrieved 1
http://europepmc.org/GrantLookup/. Retrieved 1 June 2015.
Other systems of unique identifiers have been proposed to support the sharing of research equipment
and to improve the citation of research resources.
2.5.2. Defined data standards
Once unique and disambiguated identifiers for objects in the research information arena have been
agreed, the next issue is how to represent them and their associated metadata. Various standards for
data structure and metadata have been proposed over time. Across Europe, one standard for research
information management, the Common European Research Information Format (CERIF), has been widely adopted. In 1991 the European Commission recommended CERIF to the member states, and in 2002
handed stewardship of the standard to euroCRIS.
There have been a number of iterations of the standard since then.
In 2009, Jisc commissioned a report, Exchanging Research Information in the UK, which recommended the use of CERIF as the UK standard for research information exchange. This was followed by several Jisc-funded initiatives and a further report, Adoption of CERIF in Higher Education Institutions in the UK, which noted progress but a lack of UK expertise. The majority of off-the-shelf
research information management systems used in UK HEIs today are CERIF-compliant and able to
exchange data in the agreed format. To date the CERIF standard covers around 300 entities and 2000
attributes, including: people, organisations (and sub units), projects, publications, products,
equipment, funders, programmes, locations, events and prizes, although fully describing research
qualities in this way is an ongoing task.
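The value of a model like CERIF lies in typed, dated links between entities. The sketch below is plain Python, not actual CERIF syntax; the identifiers and link types are invented purely to illustrate the linked-entity idea:

```python
# Plain-Python sketch of a CERIF-style linked-entity model (NOT actual
# CERIF XML); all identifiers and link types below are invented.
entities = {
    "person-001": {"kind": "person", "name": "A. Researcher"},
    "org-001": {"kind": "organisation", "name": "Example University"},
    "proj-001": {"kind": "project", "title": "Example Project"},
    "pub-001": {"kind": "publication", "doi": "10.1000/xyz123"},
}

# The model records typed, dated relationships between entities.
links = [
    {"from": "person-001", "to": "org-001", "type": "employed-by",
     "start": "2014-01-01"},
    {"from": "person-001", "to": "proj-001", "type": "investigator-on",
     "start": "2014-06-01"},
    {"from": "pub-001", "to": "proj-001", "type": "output-of",
     "start": "2015-03-01"},
]

def outputs_of(project_id):
    """Follow 'output-of' links back to a project's publications."""
    return [l["from"] for l in links
            if l["to"] == project_id and l["type"] == "output-of"]
```

Because the relationships are typed and dated, the same records can answer questions from several perspectives (a person's projects, a project's outputs, an organisation's staff) without duplicating data.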
For example, see the N8 Shared Equipment Inventory System www.n8equipment.org.uk/. Retrieved 1 June 2015.
Bandrowski, A., Brush, M., Grethe, J.S. et al. The Resource Identification Initiative: A cultural shift in publishing [v1; ref
status: awaiting peer review, http://f1000r.es/5fj] F1000Research 2015, 4:134 (DOI:10.12688/f1000research.6555.1).
Retrieved 1 June 2015.
EuroCRIS is a not-for-profit association with offices in The Hague, The Netherlands, that brings together experts on
research information in general and research information systems (CRIS) in particular. The organisation has 200+ members,
mainly coming from Europe, but also from some countries outside of Europe. www.eurocris.org/
Rogers, N., Huxley, L. and Ferguson, N. (2009). Exchanging Research Information in the UK.
http://repository.jisc.ac.uk/448/1/exri_final_v2.pdf. Retrieved 1 June 2015.
Russell, R. (2011). Research Information Management in the UK: Current initiatives using CERIF.
www.ukoln.ac.uk/rim/dissemination/2011/rim-cerif-uk.pdf. Retrieved 1 June 2015.
Russell, R. (2012). Adoption of CERIF in Higher Education Institutions in the UK: A landscape study.
www.ukoln.ac.uk/isc/reports/cerif-landscape-study-2012/CERIF-UK-landscape-report-v1.0.pdf. Retrieved 1 June 2015.
2.5.3. Agreed data semantics
An agreed approach to the semantics of data elements is
required to ensure that everyone interprets data in the
same way. One example is the titles used for academic
staff. In the UK, it might be possible to agree on a
standard scale of lecturer, senior lecturer, reader and
professor, but this does not translate to other countries
where other titles like ‘associate professor’ are
commonly used and ‘readers’ are unknown. Clearly the
context is important to the semantics. In order to
compare research items from different databases, we
need to have a standard vocabulary that we can match
to, ideally at the international level, or else on a country
basis. The Consortia Advancing Standards in Research Administration Information (CASRAI) is an international non-profit organisation that constructs such dictionaries, working closely with other standards bodies.
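A shared vocabulary of this kind amounts to a crosswalk from local terms to standard ones. The sketch below illustrates the idea for staff titles; the mappings are invented for illustration and are not drawn from any actual CASRAI dictionary:

```python
# Hypothetical crosswalk from local staff titles to a shared vocabulary.
# The mappings below are invented for illustration, not taken from CASRAI.
TITLE_CROSSWALK = {
    ("UK", "Lecturer"): "assistant-professor-equivalent",
    ("UK", "Senior Lecturer"): "associate-professor-equivalent",
    ("UK", "Reader"): "associate-professor-equivalent",
    ("UK", "Professor"): "full-professor-equivalent",
    ("US", "Assistant Professor"): "assistant-professor-equivalent",
    ("US", "Associate Professor"): "associate-professor-equivalent",
    ("US", "Professor"): "full-professor-equivalent",
}

def standardise_title(country, title):
    """Map a local title to the shared vocabulary; flag what cannot map."""
    return TITLE_CROSSWALK.get((country, title), "unmapped")
```

Anything the dictionary cannot map is flagged rather than silently guessed, which is what makes cross-database comparison trustworthy.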
2.5.4. More than pure semantics
Once all these elements are in place, it is possible to build
robust indicators and metrics. But here again, agreed
definitions are key. Take the example of proposal success
rates. If an institution has submitted ten proposals for
funding and three have been funded, it may claim to have a
30% success rate. This indicator could be benchmarked
against other institutions. However, if two of those
proposals were yet to be reviewed, a three in eight or 37.5%
success rate could also be claimed. Alternatively, the
success rate might be calculated based on the financial value
of applications and awards rather than the number
submitted, each definition producing potentially different
‘success rates’ from the same data.
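The three readings of 'success rate' described above can be made precise. A sketch with invented figures matching the example (ten proposals, three funded, two still under review):

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    value: float    # amount requested (invented figures)
    decided: bool   # has the funder reached a decision?
    funded: bool    # was the proposal awarded?

def success_rates(proposals):
    """Compute the 'success rate' under three common definitions."""
    funded = [p for p in proposals if p.funded]
    decided = [p for p in proposals if p.decided]
    return {
        # funded / all submitted: 3 of 10 -> 30%
        "by_count_submitted": len(funded) / len(proposals),
        # funded / decided only: 3 of 8 -> 37.5%
        "by_count_decided": len(funded) / len(decided),
        # value awarded / value requested
        "by_value": sum(p.value for p in funded) / sum(p.value for p in proposals),
    }

proposals = (
    [Proposal(100_000, True, True)] * 3      # funded
    + [Proposal(50_000, True, False)] * 5    # rejected
    + [Proposal(200_000, False, False)] * 2  # still under review
)
rates = success_rates(proposals)
# by_count_submitted = 0.30, by_count_decided = 0.375, by_value ~ 0.32
```

Each definition is defensible; the point is that a benchmark is only meaningful if all parties agree on which one is in use.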
Kerridge, S. (2015). Questions of identity. Research Fortnight. 27 May 2015.
https://www.researchprofessional.com/0/rr/news/uk/views-of-the-uk/2015/5/Questions-of-identity.html. Retrieved 1 June
The Consortia Advancing Standards in Research
Administration Information (CASRAI) is an
international non-profit organisation dedicated to
reducing the administrative burden on
researchers and improving business intelligence
capacity of research institutions and funders.
CASRAI works by partnering with funders,
universities, suppliers and sector bodies to define
a dictionary and catalogue of exchangeable
business ‘data profiles’. These create an
interoperable ‘drawbridge’ between collaborating
organisations and individuals. http://casrai.org/
Snowball Metrics is a bottom-up academia-
industry initiative. The universities involved aim
to agree on methodologies that are robustly and
clearly defined, so that the metrics they describe
enable the confident comparison of apples with
apples. These metrics (described by recipes) are
data source- and system-agnostic, meaning that
they are not tied to any particular provider of data
or tools. The resulting benchmarks between
research-intensive universities provide reliable
information to help understand research strengths,
and thus to establish and monitor institutional strategies.
The semantics of any metrics must also be clear and transparent. Progress in this area has been made
by the UK-led Snowball Metrics consortium, which has specified 24 metrics ‘recipes’ to date, in areas
such as publications and citations, research grants, collaboration, and societal impact. Snowball is also
gaining some traction in the USA and Australia.
2.6. International perspectives
Although this review has focused on the UK, we have taken a keen interest in how other countries
approach these issues. At several of our workshops and steering group meetings, we heard
presentations and considered questions from international perspectives.
A handful of the responses
to our call for evidence came from overseas, and our schedule of stakeholder events included
meetings or presentations in Paris, Melbourne, Barcelona and Doha (see Table 2 in the annex).
Dialogue, learning and exchange across different systems are important, and any moves that the UK
makes in respect of greater use of metrics are likely to be watched closely. The UK system continues
to attract the attention of research leaders, managers and policymakers worldwide – particularly since
the introduction of the impact element for REF2014.
Here we offer a brief outline of some of the
striking features of research assessment in a handful of other countries – Australia, Denmark, Italy,
the Netherlands, New Zealand and the United States – chosen to reflect the diversity of systems in operation.
2.6.1. Australia
The Australian Research Council administers Excellence in Research for Australia (ERA), which
aims to identify and promote excellence in research across Australian HEIs. There is no funding
attached to its outcomes. The first full round of ERA (in 2010-11) was the first time a nationwide stocktake of disciplinary strengths had been conducted in Australia. Data submitted by 41 HEIs covered all eligible researchers and their research outputs.
For relevant discussion, see US Research Universities Futures Consortium. (2013). The current state and recommendations for meaningful academic research metrics among American research universities. Retrieved 1 March 2015.
For example, Clare Donovan presented insights from her research in Australia and elsewhere at our Arts and Humanities workshop hosted by Warwick University; www.hefce.ac.uk/media/hefce/content/news/Events/2015/HEFCE,metrics,workshop,Warwick/Donovan.pdf. Donovan also contributed to one of the Review group's early steering group meetings. Academic Analytics, who presented at our workshops in Sheffield and Sussex, discussed their approach and use of data in US and UK contexts. Several of the organisations invited to our Sussex workshop operate at the global level, these being Academic Analytics, Altmetric, PLOS, Snowball Metrics, Elsevier and The Conversation, Plum Analytics and Thomson Reuters.
See relevant discussion on internationalising the REF: www.researchresearch.com/index.php?option=com_news&template=rr_2col&view=article&articleId=1342955. Retrieved 1
ERA is based upon the principle of expert review informed by citation-based analysis, with the
precise mix depending on discipline; citations are used for most science, engineering and medical
disciplines, and peer review for others. It aims to be “a dynamic and flexible research assessment
system that combines the objectivity of multiple quantitative indicators with the holistic assessment
provided by expert review….”
ERA evaluations were informed by four broad categories of indicators:
Of research quality: publishing profile, citation analysis, ERA peer review and peer
reviewed research income;
Of research volume and activity: total research outputs, research income and other
items within the profile of eligible researchers;
Of research application: commercialisation income and other applied measures;
Of recognition: based on a range of esteem measures.
Evaluation of the data submitted was undertaken by eight evaluation committees, representing
different disciplinary clusters. The next ERA round will take place in 2015.
2.6.2. Denmark
Danish public university funding is allocated according to four parameters: education based on study
credits earned by the institution (45%); research activities measured by external funding (20%);
research activities measured by the ‘BFI’, a metrics-based evaluation system (25%)
; and number of
PhD graduates (10%). The current system was gradually implemented from 2010 to 2012 following
agreement in 2009 to follow a new model. It is primarily a distribution model, based on the Danish
Agency for Science, Technology and Innovation’s count of peer reviewed research publications. The
goal was to allocate an increasing proportion of the available research funding according to the
outcomes of the national research assessment exercise. Given the methodology employed, the BFI has been described as a primarily quantitative distribution system, as opposed to a quality measurement system.
www.arc.gov.au/era/faq.htm#Q6. Retrieved 1 June 2015.
Submission guidelines are provided in Australian Research Council (2014). ERA 2015 Submission Guidelines. www.arc.gov.au/pdf/ERA15/ERA%202015%20Submission%20Guidelines.pdf. These include changes to the process since 2012, outlined on pp. 7-9.
Veterager Pedersen, C. (2010). The Danish bibliometric research indicator, BFI: Research publications, research assessment, university funding. ScieCom Info, 4, 1-4.
Due to the limitations of existing publications databases (see Chapter 3), the Danish government
decided to create its own. This enables the BFI to be defined by Danish researchers, with 67 expert
groups of academics involved in selecting items for inclusion in two authority lists, one of series
(journals, book series or conference series) and one of publishers. These are then ranked each year by
the panels, and this is then used as the basis of a points system for researchers.
The scoring system includes monographs, articles in series and anthologies, doctoral theses and
patents. Peer review is a prerequisite for inclusion on an authoritative list. These lists determine which publishers and journals are recognised as worth publishing in, and at what level – Level 1 or Level 2. Level 2 channels generate more points. These lists effectively decide which publication channels contain serious research. All eligible research outputs can be attributed BFI points as they are entered into the system. Different weights are applied for different sorts of output and publication channel, so the system aims to assess performance and not just volume.
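A points system of this kind is essentially a weighted lookup by output type and channel level. The sketch below uses invented weights, not the official BFI values:

```python
# Illustrative BFI-style points table, keyed by (output type, channel
# level). The weights below are invented for illustration; they are
# NOT the official Danish BFI values.
POINTS = {
    ("article", 1): 1.0,
    ("article", 2): 3.0,
    ("monograph", 1): 5.0,
    ("monograph", 2): 8.0,
}

def bfi_points(outputs):
    """Sum points for a list of (output_type, channel_level) pairs, so
    that the total reflects weighted quality, not just output volume."""
    return sum(POINTS.get(o, 0.0) for o in outputs)

# Two Level-1 articles and one Level-2 monograph:
total = bfi_points([("article", 1), ("article", 1), ("monograph", 2)])  # 10.0
```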
2.6.3. Italy
In 2013, Italy’s National Agency for the Evaluation of the University and Research Systems
(ANVUR) completed its largest ever evaluation initiative, known as the ‘eValuation of the Quality of
Research’ (VQR), across 95 universities, 12 public research bodies and 16 voluntary organisations.
The aim was to construct a national ranking of universities and institutes, based on key indicators,
including: research outcomes obtained from 2004 to 2010; ability to attract funding; number of
international collaborations; patents registered; spin-offs; and other third-party activities.
The results of the VQR are being used by the education and research ministry to award €540 million
in ‘prize funds’ from the government’s university budget. The process included the evaluation of
approximately 195,000 publications, using a hybrid approach of two methodologies:
Bibliometric analysis: based on the impact factor (IF) of the journal (the number of citations received in a year, divided by articles published);
Peer review: assigned to around 14,000 external reviewers, more than 4,000 of whom were from outside Italy.
A useful analysis of the VQR is provided by Abramo, G. and D’Angelo, C.A. (2015). The VQR, Italy’s Second National Research Assessment: Methodological Failures and Ranking Distortions. Journal of the Association for Information Science and Technology.
For those indexed in Web of Science, or the SCImago Journal Rank for those indexed in Scopus.
Bibliometric analysis was used in the natural sciences and engineering; whereas for social sciences
and humanities (Panels 10-14), only peer review was used. The overall evaluation of institutions was
based on a weighted sum of various indicators: 50% for the quality of the research products submitted
(for faculty members, the maximum number of products was three); and the remaining 50% based on
a composite score from six indicators. These are: capacity to attract resources (10%); mobility of
research staff (10%); internationalisation (10%); PhD programmes (10%); ability to attract research
funds (5%); and overall improvement from the last VQR (5%). ANVUR used 14 panels to undertake
evaluations, divided by disciplinary area.
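The institutional score is a weighted sum of these indicators. A sketch using the percentages given above (the indicator key names are our own shorthand, not ANVUR's):

```python
# VQR institutional score: 50% for research product quality plus six
# indicators totalling 50%, using the weights stated in the text.
# The indicator key names are our own shorthand, not ANVUR's.
WEIGHTS = {
    "product_quality": 0.50,
    "resource_attraction": 0.10,
    "staff_mobility": 0.10,
    "internationalisation": 0.10,
    "phd_programmes": 0.10,
    "research_funds": 0.05,
    "improvement": 0.05,
}

def vqr_score(indicator_scores):
    """Weighted sum of normalised (0-1) indicator scores."""
    return sum(WEIGHTS[k] * indicator_scores.get(k, 0.0) for k in WEIGHTS)

# The weights sum to 1, so an institution scoring 0.8 on every
# indicator scores 0.8 overall.
score = vqr_score({k: 0.8 for k in WEIGHTS})
```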
2.6.4. The Netherlands
Since 2003, it has been the responsibility of individual Dutch university boards and faculties to
organise research assessment on a six-yearly cycle, in line with a ‘Standard Evaluation Protocol’ (SEP). Assessments are made by expert committees, which may use qualitative and quantitative
indicators to score research groups or programmes on a scale. The distribution of government research
funds is not explicitly linked to this assessment process.
From 2015 onwards, the assessment involves three criteria: quality, societal relevance, and viability.
Productivity was previously a criterion, but has now been removed as a goal in itself (and subsumed
under the quality criterion) to put less emphasis on the number of publications and more on their
quality. The review also looks at the quality of PhD training, and management of research integrity
(including how an institution has dealt with any cases of research misconduct). The research unit’s
own strategy and targets are guiding principles for the evaluation. In addition, the evaluation should
provide feedback to the evaluated research institutes and groups on their research agendas for the near future.
The Standard Evaluation Protocol (SEP) was jointly developed by the Royal Netherlands Academy of Arts and Sciences
(KNAW), The Association of Universities in the Netherlands (VSNU) and the Netherlands Organisation for Scientific
Research (NWO). The goal of the SEP is to provide common guidelines for the evaluation and improvement of research and
research policy to be used by university boards, institutes and the expert evaluation committees.
Key Perspectives Ltd. (2009). A Comparative Review of Research Assessment Regimes in Five Countries and the Role of
Libraries in the Research Assessment Process: Report Commissioned by OCLC Research.
2.6.5. New Zealand
New Zealand’s evaluation system is known as the ‘Performance-Based Research Fund’ (PBRF), and
is used to assess the performance of all Tertiary Education Organisations (TEOs).
Its four objectives
are: to increase the quality of basic and applied research at degree-granting TEOs; to support world-
leading teaching and learning at degree and postgraduate levels; to assist TEOs to maintain and lift
their competitive rankings relative to their international peers; and to provide robust public
information to stakeholders about research performance within and across TEOs.
The PBRF is carried out every six years; most recently in 2012,
when 27 institutions participated
(eight universities, ten institutes of technology and polytechnics, one wānanga, and eight private
training establishments). The amount of funding that a participating institution receives is based on
three elements: quality evaluation (55%); research degree completions (25%); and external research income (20%).
The quality element of the process rests on the submission and evaluation of evidence portfolios.
Twelve specialist peer-review panels assess and evaluate these portfolios with additional advice from
expert advisory groups and specialists as needed.
The PBRF is unusual in that it takes the individual
(rather than the department or school) as the unit of assessment, so provides very detailed
performance information that can inform strategic planning and resource allocation within
institutions. It does not systematically measure research impacts outside academia.
2.6.6. United States
The US does not have a centralised national assessment system for its universities and research
institutes; however, in recent years, it has actively supported projects including STAR METRICS
(Science and Technology for America’s Reinvestment: Measuring the Effect of Research on Innovation,
Competitiveness and Science).
This was launched in 2010 and is led by the National Institutes of
Health (NIH), the National Science Foundation (NSF), and the Office of Science and Technology
Policy (OSTP). It aims to create a repository of data and tools to help assess the impact of federal investments in science.
www.tec.govt.nz/Funding/Fund-finder/Performance-Based-Research-Fund-PBRF-/. Retrieved 30 March 2015.
Details of the 2012 exercises can be downloaded from www.tec.govt.nz/Funding/Fund-finder/Performance-Based-
In the New Zealand education system, a wānanga is a publicly-owned tertiary education organisation that provides
education in a Māori cultural context.
PBRF Quality evaluation guidance 2012 is provided at www.tec.govt.nz/Documents/Publications/PBRF-Quality-
STAR METRICS focuses on two different levels:
Level I: Developing uniform, auditable and standardised measures of the impact of
science spending on job creation, using data from research institutions’ existing administrative records.
Level II: Developing measures of the impact of federal science investment on
scientific knowledge (using metrics such as publications and citations), social
outcomes (e.g. health outcomes measures and environmental impact factors),
workforce outcomes (e.g. student mobility and employment), and economic growth
(e.g. tracing patents, new company start-ups and other measures). This is achieved
through the Federal RePORTER
tool, thus developing an open and automated data
infrastructure that will enable the documentation and analysis of a subset of the
inputs, outputs, and outcomes resulting from federal investments in science.
The STAR METRICS project involves a broad consortium of federal R&D funding agencies with a
shared vision of developing data infrastructures and products to support evidence-based analyses of
the impact of research investment.
It aims to utilise existing administrative data from federal
agencies and their grantees, and match them with existing research databases of economic, scientific
and social outcomes. It has recently been announced that from 2016 onwards resources will be
redirected away from STAR METRICS data scraping to focus on the RePORTER tool, which has
similarities to the UK Gateway to Research approach.
2.7. Adding it up
As these snapshots reveal, the ways in which metrics and indicators are conceived and used vary by
country, often significantly. The nature of the assessment approach, and the choice, use and relative
importance of particular indicators, reflect particular policies, and usually involve compromises
around fairness across disciplines, robustness, administrative and/or cost burdens, and sector buy-in.
STAR METRICS will be discontinuing Level I activities as of 1 January 2016.
But not all funders are involved, e.g. the National Endowment for the Humanities.
Two recent studies provide further discussion of how national approaches differ:
A 2012 report by the Council of Canadian Academies looks at the systems used in
10 different countries.
It emphasises the importance of national research context in
defining a given research assessment, underlining that no single set of indicators for
assessment will be ideal in all circumstances. The report also highlights a global
trend towards national research assessment models that incorporate both quantitative
indicators and expert judgment.
A 2014 study by Technopolis examined 12 EU member states and Norway. It
includes a comparative consideration of systems using performance-based research
funding (PRF systems). The report shows that the Czech Republic is the only country
that limits the indicators used to the output of research (even though it is the PRF
system that covers research and innovation-related outputs in the most detailed and
comprehensive manner). In a second group of countries – Denmark, Finland,
Norway (PRI), Belgium/FL (BOF), Norway (HEI) and Sweden – the PRFs include
both output and systemic indicators; (in Denmark, Finland and Norway this includes
indicators related to innovation-oriented activities). Only a few countries also
examine research impacts: Italy, UK (REF), France (AERES) and Belgium/FL
(IOF). While the PRFs in France and Belgium focus on impacts in the spheres of
research and innovation, Italy and the UK also consider societal impacts.
It is valuable to learn from the approaches being used by different countries, particularly as research
and data infrastructure are increasingly global. However, context is also crucial to good assessment,
and there will be elements that are specific to the design, operation and objectives of the UK system.
Overall though, we are likely to see greater harmonisation of approaches, certainly across EU member
states. Recent initiatives, such as the 2014 ‘Science 2.0’ White Paper from the European Commission,
point towards a more integrated architecture for research funding, communication, dissemination and
impact. The UK has been at the forefront of these debates since the 1980s, and over that same period
its research system, across many indicators, has grown significantly in strength. Ensuring that the UK
is positioned well for the next wave of change in how the research system operates – in terms of data
infrastructure, standards and systems of assessment – is a vital part of our overall leadership in
research. Moves by HEFCE to explore the potentially increased internationalisation of research
assessment are to be welcomed, although such steps are not without strategic and operational
challenges. Proceeding cautiously, in an exploratory way, seems an appropriate approach.
Council of Canadian Academies (2012) work included analysis of research assessment systems employed in ten countries,
including Australia, China, Finland, Germany, the Netherlands, Norway, Singapore, South Korea, the USA and the UK.
Technopolis. (2014). Measuring scientific performance for improved policy making. Published for the European
Parliamentary Research Service. This examined Norway, Sweden, the UK, Spain, France, Belgium/FL, Italy, the Czech
Republic, Denmark, the Netherlands, Slovakia, Austria and Finland. A third report, published in 2010 by the expert
group on assessment in university-based research (AUBR), provided case studies of 16 different countries, which again
represent a breadth of approaches and objectives: http://ec.europa.eu/research/science-
3. Rough indications
“ ‘The answer to the Great Question of Life, the Universe and Everything is… forty-
two’, said Deep Thought, with infinite majesty and calm.”
Douglas Adams, The Hitchhiker’s Guide to the Galaxy
Having charted the development of research metrics and indicators and their usage internationally,
this chapter turns the focus on their application. It looks in detail at the current development, uses and
occasional abuses of four broad categories of indicator: bibliometric indicators of quality (3.1);
alternative indicators (3.2); input indicators (3.3); and indicators of impact (3.4).
3.1. Bibliometric indicators of quality
The most common approaches to measuring research quality involve bibliometric methods, notably
weighted publication counts; and citation-based indicators, such as the JIF or h-index. As the
Council of Canadian Academies report states: “Bibliometric indicators are the paradigmatic
quantitative indicators with respect to measurement of scientific research.”
This section gives a brief overview of the technical possibilities of bibliometric indicators. Many
points raised here are addressed in greater detail in our literature review (Supplementary Report I),
reflecting the breadth of existing literature on citation impact indicators, the use of scientometric
indicators in research evaluation, and the measurement of the performance of individual researchers.
Several considerations need to be borne in mind when working with bibliometric analyses, including:
differences between academic subjects/disciplines; coverage of sources within databases; the selection
of the appropriate unit of analysis for the indicator in question; the question of credit allocation where
outputs may include multiple authors; and accounting for self-citations.
3.1.1. Bibliographic databases
The three most important multidisciplinary bibliographic databases are Web of Science, Scopus, and
Google Scholar. Scopus has a broader coverage of the scholarly literature than Web of Science. Some
studies report that journals covered by Scopus but not by Web of Science tend to have a low citation
impact and tend to be more nationally oriented, suggesting that the most important international
Council of Canadian Academies. (2012), pp53-54.
Vinkler, P. (2010). The evaluation of research by scientometric indicators. Oxford, Chandos Publishing.
Wildgaard, L., Schneider, J. W. and Larsen, B. (2014). A review of the characteristics of 108 author-level bibliometric
indicators. Scientometrics. 101 (1), 125-158.
academic journals are usually covered by both databases. Certain disciplines, especially the Social
Sciences and Humanities (SSH), create special challenges for bibliometric analyses. Google Scholar
is generally found to outperform both Web of Science and Scopus in terms of its coverage of the
literature. However, there are a few fields, mainly in the natural sciences, in which some studies have
reported the coverage of Google Scholar to be worse than the coverage of Web of Science and
Scopus. On the other hand, the coverage of Google Scholar has been improving over time, so it is not
clear that the same still applies today.
3.1.2. Basic citation impact indicators
A large number of citation impact indicators have been proposed in the literature. Most of these
indicators can be seen as variants or extensions of a limited set of basic indicators: the total and the
average number of citations of the publications of a research unit (e.g. of an individual researcher, a
research group, or a research institution); the number and the proportion of highly cited publications
of a research unit; and a research unit’s h-index. There is criticism in the literature of the use of
indicators based on total or average citation counts. Citation distributions tend to be highly skewed,
and therefore the total or the average number of citations of a set of publications may be strongly
influenced by one or a few highly cited publications (‘outliers’). This is often considered undesirable.
Indicators based on the idea of counting highly cited publications are suggested as a more robust
alternative to indicators based on total or average citation counts.
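The basic indicators above can be sketched in a few lines. This is a minimal illustration for a hypothetical research unit's publication list, not any database provider's implementation; the threshold for "highly cited" is an arbitrary example value.

```python
# Sketch of the basic citation impact indicators described above, computed
# over a hypothetical unit's per-publication citation counts.

def total_citations(cites):
    return sum(cites)

def mean_citations(cites):
    return sum(cites) / len(cites) if cites else 0.0

def highly_cited(cites, threshold):
    """Number of publications at or above a citation threshold -- more
    robust to skewed distributions than totals or averages."""
    return sum(1 for c in cites if c >= threshold)

def h_index(cites):
    """Largest h such that h publications each have at least h citations."""
    ranked = sorted(cites, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

unit = [120, 15, 9, 7, 4, 3, 1, 0]  # one 'outlier' dominates the total/mean
print(total_citations(unit))        # 159
print(mean_citations(unit))         # 19.875 -- pulled up by the outlier
print(highly_cited(unit, 10))       # 2
print(h_index(unit))                # 4
```

The example shows the skew problem directly: a single highly cited paper drags the mean far above what is typical for the unit, while the highly-cited count and h-index are unaffected by the outlier's exact magnitude.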
3.1.3. Exclusion of specific types of publications and citations
When undertaking bibliometric analyses, one needs to decide which types of publications and
citations are included and which are not. In Web of Science and Scopus, each publication has a
document type. It is clear that research articles, which simply have the document type ‘article’, should
be included in bibliometric analyses. However, publications of other document types, such as
‘editorial material’, ‘letter’, and ‘review’ may be either included or excluded.
Most bibliometric researchers prefer to exclude author self-citations from bibliometric analyses. There
is no full agreement in the literature on the importance of excluding these citations. In some
bibliometric analyses, the effect of author self-citations is very small, suggesting that there is no need
to exclude these citations. In general, however, it is suggested that author self-citations should
preferably be excluded, at least in analyses at low aggregation levels, for instance at the level of
individual researchers. However, as self-citation is a common and acceptable practice in some
disciplines but frowned upon in others, choosing to exclude them will affect some subject areas more
than others.
See Sections 1.1 and 1.4.1 of the literature review (Supplementary Report I).
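One common operationalisation, sketched below, treats a citation as a self-citation when the citing and cited publications share at least one author. The author lists are hypothetical, and real analyses must also handle author-name disambiguation, which this sketch ignores.

```python
# Sketch of author self-citation exclusion: a citation counts as a
# self-citation if the citing and cited papers share any author.

def is_self_citation(citing_authors, cited_authors):
    return bool(set(citing_authors) & set(cited_authors))

def non_self_citation_count(cited_authors, citing_papers):
    """Count citations whose author lists do not overlap the cited paper's."""
    return sum(
        1 for authors in citing_papers
        if not is_self_citation(authors, cited_authors)
    )

paper_authors = ["A. Smith", "B. Jones"]
citing = [
    ["A. Smith", "C. Lee"],    # self-citation (shared author)
    ["D. Patel"],              # independent citation
    ["B. Jones"],              # self-citation
    ["E. Garcia", "F. Chen"],  # independent citation
]
print(non_self_citation_count(paper_authors, citing))  # 2
```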
3.1.4. Normalisation of citation impact indicators
In research assessment contexts, there is often a requirement to make comparisons between
publications from different fields. There is agreement in the literature that citation counts of
publications from different fields should not be directly compared with each other, because there are
large differences among fields in the average number of citations per publication. Researchers have
proposed various approaches to normalise citation impact indicators for differences between fields,
between older and more recent publications, and between publications of different types.
Most attention in the literature has been paid to normalised indicators based on average citation
counts. Recent discussions focus on various technical issues in the calculation of these indicators, for
instance whether highly cited publication indicators count the proportion of the publications of a
research unit that belong to the top 10% or the top 1% of their field, and more sophisticated variants
thereof, including the position of publications within the citation distribution of their field.
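The two families of normalised indicator discussed here can be sketched side by side: dividing each paper's citations by its field average, and counting the share of a unit's papers above a field's top-10% threshold. The field baselines and paper data below are entirely hypothetical.

```python
# Sketch of field-normalised indicators (hypothetical baselines):
# (i) mean of citations / field average, where 1.0 is the field norm;
# (ii) proportion of papers at or above the field's top-10% threshold.

field_average = {"chemistry": 12.0, "history": 2.0}    # mean citations/paper
field_top10_threshold = {"chemistry": 30, "history": 6}

papers = [  # (field, citations) for one research unit
    ("chemistry", 24), ("chemistry", 6), ("history", 4), ("history", 8),
]

def mean_normalised_citation_score(papers):
    """Average of citations / field mean."""
    ratios = [c / field_average[f] for f, c in papers]
    return sum(ratios) / len(ratios)

def top10_proportion(papers):
    top = sum(1 for f, c in papers if c >= field_top10_threshold[f])
    return top / len(papers)

print(mean_normalised_citation_score(papers))  # (2.0+0.5+2.0+4.0)/4 = 2.125
print(top10_proportion(papers))                # 1 of 4 papers = 0.25
```

Note how the history paper with 8 citations outscores the chemistry paper with 24 once field baselines are applied, which is exactly the comparison that raw counts get wrong.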
A key issue in the calculation of normalised citation impact indicators is the way in which the concept
of a research field is operationalised. The most common approach is to work with the predefined
fields in a database such as Web of Science, but this approach is heavily criticised. Some researchers
argue that fields may be defined at different levels of aggregation and that each aggregation level
offers a legitimate but different viewpoint on the citation impact of publications. Other researchers
suggest the use of disciplinary classification systems (e.g. Medical Subject Headings or Chemical
Abstracts sections) or sophisticated computer algorithms to define fields, typically at a relatively low
level of aggregation. Another approach is to calculate normalised citation impact indicators without
defining fields in an explicit way. This idea is implemented in so-called ‘citing-side normalisation’
approaches, which represent a recent development in the literature.
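A minimal sketch of the citing-side idea, under the simplifying assumption that each citation is weighted by the length of the citing paper's reference list (one common variant; real implementations such as SNIP are more elaborate). The reference-list lengths are hypothetical.

```python
# Sketch of citing-side normalisation: each citation contributes
# 1 / (number of references in the citing paper), so citations from
# fields with long reference lists count for less -- no explicit field
# definition is needed.

def citing_side_score(citing_reference_counts):
    return sum(1.0 / n for n in citing_reference_counts)

# Four citations from a long-reference-list field (e.g. biomedicine)...
print(citing_side_score([50, 40, 50, 60]))  # ~0.082
# ...versus four citations from a short-reference-list field.
print(citing_side_score([10, 8, 12, 10]))   # ~0.408
```

The same number of citations yields a much smaller normalised score when the citing field cites densely, which is how this approach corrects for field differences without a field classification.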
3.1.5. Considerations of author position on scholarly published work
In the absence of other reliable indicators of research contribution or value, the contribution of a
particular researcher to a piece of scholarly published work has been estimated by consideration of the
inclusion of a researcher as a listed author on published work – and the relative position in the list.
However, the average number of authors of publications in the scholarly literature continues to
increase, partly due to the pressure to publish to indicate research progression and also due to a trend,
in many disciplines, toward greater collaboration and ‘team science’. Research in many disciplines
is increasingly collaborative, and original research papers with a single author are – particularly in the
natural sciences – becoming rarer.
An extreme case is the recent physics paper with more than 5,000 authors, as discussed at: www.nature.com/news/physics-paper-sets-record-with-more-than-5-000-authors-1.17567. Retrieved 1 June 2015.
This trend makes it increasingly difficult to determine who did what, and who had a particularly
pivotal role or contribution, to scholarly published work. It is currently difficult to decipher individual
contributions by consulting the author lists, acknowledgements or contributions sections of most
journals; and the unstructured information is difficult to text-mine.
There has been a mixture of approaches to identifying the contributions of ‘authors’. One approach,
known as ‘full counting’, assumes that any listing as an author is equally valuable. The citations to a
multi-author publication are counted multiple times, once for each of the authors, even for authors
who have made only a small contribution. Because the same citations are counted more than once, the
full counting approach has a certain inflationary effect, which is sometimes considered undesirable. A
number of alternative credit allocation approaches have therefore been proposed, including the
fractional counting approach, where the credit for a publication is divided equally among all authors.
Another approach frequently used as short-hand is to assume that the first and/or last authors in a list
have played the most pivotal role in the production of the scholarly outputs. However this does not
apply across disciplines (e.g. economics and high energy physics where author-listing protocols are
frequently alphabetical). An alternative possibility is to fully allocate the credits of a publication to the
corresponding author instead of the first author. A final approach discussed in the literature is to
allocate the credits of a publication to the individual authors in a weighted manner, with the first
author receiving the largest share of the credits, the second author receiving the second-largest share,
and so on.
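The counting schemes above can be sketched concretely. The harmonic weighting shown is one of several weighted variants in the literature, chosen here only for illustration; the author names are placeholders.

```python
# Sketch of credit allocation schemes for a multi-author paper:
# full counting, fractional counting, and a weighted (harmonic) scheme
# where earlier authors receive larger shares.

def full_counting(authors):
    """Every author gets full credit -- inflationary across the system."""
    return {a: 1.0 for a in authors}

def fractional_counting(authors):
    """Credit divided equally among all authors."""
    share = 1.0 / len(authors)
    return {a: share for a in authors}

def harmonic_counting(authors):
    """Author i (1-indexed) gets (1/i) / sum_j(1/j) of the credit."""
    denom = sum(1.0 / i for i in range(1, len(authors) + 1))
    return {a: (1.0 / i) / denom for i, a in enumerate(authors, start=1)}

authors = ["First", "Second", "Third"]
print(full_counting(authors))        # everyone gets 1.0
print(fractional_counting(authors))  # each gets 1/3
print(harmonic_counting(authors))    # shares ~0.545, ~0.273, ~0.182
```

As the text notes, any position-weighted scheme embeds a disciplinary assumption: in fields with alphabetical author ordering, the "First" author's larger share is meaningless.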
Developments in digital technology present opportunities to address the challenge of deriving
contributions to published work. A collaboration between the Evaluation team at the Wellcome Trust,
Harvard University and Digital Science has made steps to address this challenge by working across
the research community to develop a simply structured taxonomy of contributions to scholarly
published work which captures what has traditionally been masked as ‘authorship’. The taxonomy is
currently being trialled within publishers’ manuscript submission systems and by several
organisations interested in helping researchers gain more visibility for the work that they do. The
Academy of Medical Sciences is also exploring how enabling greater visibility and credit around
contributions to research might help to incentivise, encourage and sustain ‘team science’ in disciplines
where this is highly valuable.
For researchers, the ability to better describe what they contributed would be a more useful currency
than being listed as a specific ‘author number’. Researchers could draw attention to their specific
contributions to published work to distinguish their skills from those of collaborators or competitors,
for example during a grant-application process or when seeking an academic appointment. This could
benefit junior researchers in particular, for whom the opportunities to be a ‘key’ author on a paper can
prove somewhat elusive. Methodological innovators would also stand to benefit from clarified roles –
their contributions are not reliably apparent in a conventional author list. It could also facilitate
collaboration and data sharing by allowing others to seek out the person who provided, for example,
an important piece of data or statistical analysis.
Through the endorsement of individuals’ contributions, researchers could move beyond ‘authorship’
as the dominant measure of esteem. For funding agencies, better information about the contributions
of grant applicants would aid the decision-making process. Greater precision could also enable
automated analysis of the role and potential outputs of those being funded, especially if those
contributions were linked to an open and persistent researcher profile or identifier. It would also help
those looking for the most apt peer reviewers. For institutions, understanding a researcher’s
contribution is fundamental to the academic appointment and promotion process.
3.1.6. Indicators of the citation impact of journals
The best-known indicator of the citation impact of journals is the JIF. This is an annual calculation of
the mean number of citations to articles published in any given journal in the two preceding years.
There is a lot of debate about the JIF, both regarding the way in which it is calculated (which skews
the JIF towards a minority of well-cited papers)
and the way in which it is used in research
assessment contexts (as discussed more in Chapter 7).
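The two-year calculation described above can be sketched directly. The journal data below is hypothetical; real JIF values also depend on which document types the provider counts as "citable items", a known source of debate.

```python
# Sketch of the two-year JIF: citations received in year Y to items
# published in Y-1 and Y-2, divided by the citable items from those years.

def journal_impact_factor(citations_in_year, citable_items, year):
    """citations_in_year[y]: citations in y to items from y-1 and y-2;
    citable_items[y]: citable items published in year y."""
    numerator = citations_in_year[year]
    denominator = citable_items[year - 1] + citable_items[year - 2]
    return numerator / denominator

citations_in_year = {2014: 450}          # citations in 2014 to 2012-13 items
citable_items = {2012: 100, 2013: 125}   # articles/reviews published
print(journal_impact_factor(citations_in_year, citable_items, 2014))  # 2.0
```

Because the numerator is a mean over a skewed distribution, a handful of very highly cited papers can dominate it, which is the skew problem raised in the preceding paragraph.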
Various improvements of and alternatives to the JIF have been proposed in the literature. It has been
suggested, for instance, to take into account citations over a longer time period, possibly adjusted to
the specific citation characteristics of a journal, or to consider the median instead of the average
number of citations of the publications in a journal. Another suggestion is to calculate an h-index
for journals as an alternative or complement to the JIF.
Curry, S. (2012). Sick of impact factors. Post on Reciprocal Space blog.
http://occamstypewriter.org/scurry/2012/08/13/sick-of-impact-factors/. Retrieved 1 June 2015.
Seglen, P. (1992). The skewness of science. Journal of the American Society for Information Science. 43, 628–638.
DOI: 10.1002/(SICI)1097-4571(199210)43:9<628::AID-ASI5>3.0.CO;2-0.
Researchers also argue that citation impact indicators for journals need to be normalised for
differences in citation characteristics among fields. A number of normalisation approaches have been
suggested, such as the SNIP indicator available in Scopus.
Another idea proposed in the literature is that in the calculation of citation impact indicators for
journals more weight should be given to citations from high-impact sources, such as citations from
Nature and Science, than to citations from low-impact sources, for instance from a relatively unknown
national journal that receives hardly any citations itself. This principle is implemented in the
EigenFactor and article influence indicators reported, along with the JIF, in the Journal Citation
Reports. The same idea is also used in the SJR indicator included in Scopus.
The JIF and other citation impact indicators for journals are often used not only in the assessment of
journals as a whole but also in the assessment of individual publications in a journal. Journal-level
indicators then serve as a substitute for publication-level citation statistics. The use of journal-level
indicators for assessing individual publications is rejected by many bibliometricians. It is argued that
the distribution of citations over the publications in a journal is highly skewed, which means that the
JIF and other journal-level indicators are not representative of the citation impact of a typical
publication in a journal. However, some bibliometricians agree with the use of journal-level indicators
in the assessment of very recent publications. In the case of these publications, citation statistics at the
level of the publication itself provide hardly any information.
3.1.7. Future developments
RCUK has extended its bibliometric analysis beyond an examination of citation counts, having an
interest in the qualities of the literature that cites RCUK-funded research and the qualities of the
literature cited by RCUK-funded research. RCUK has obtained this data from Thomson Reuters,
drawn from Web of Science. Using this approach, it is possible to analyse the body of knowledge that
authors draw upon, and also the diversity of research fields that subsequently draws on these results.
This quantification of the ‘diffusion of ideas’ and mapping of the distance between research subject
areas has been pioneered by Rafols et al. and has contributed to the discussion of how to measure
interdisciplinarity.
According to its provider, “SNIP corrects for differences in citation practices between scientific fields, thereby allowing
for more accurate between-field comparisons of citation impact.” www.journalindicators.com/
Another area that is developing fast is analysis of the influence of a given work within a particular
network. In the area of citations this is exemplified by the EigenFactor. In social media analyses the
concept of reach or page impressions can be more informative than simple counts. Within this
network conception of the spread of knowledge and ideas it is also possible to use knowledge of the
types of connections. Once again this is illustrated by citations, in analyses that categorise citations
by both function (citing an idea or data) and sentiment (agreement or disagreement). These much richer
indicators will make it possible to track and understand the way that research outputs spread and
influence activities ranging from further research to public discussion of policy.
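The principle behind influence measures such as the EigenFactor is that citations from influential sources count for more, which can be illustrated with a simplified PageRank-style iteration on a tiny hypothetical journal citation graph. This is not the published EigenFactor algorithm (which, among other things, excludes journal self-citations and weights by article counts); it is a sketch of the underlying idea.

```python
# Simplified PageRank-style sketch of network influence on a hypothetical
# three-journal citation graph: citations from influential journals
# propagate more weight to the journals they cite.

links = {  # citing journal -> journals it cites
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

def influence_scores(links, damping=0.85, iterations=50):
    journals = list(links)
    n = len(journals)
    score = {j: 1.0 / n for j in journals}
    for _ in range(iterations):
        new = {j: (1 - damping) / n for j in journals}
        for citing, cited_list in links.items():
            share = score[citing] / len(cited_list)
            for cited in cited_list:
                new[cited] += damping * share
        score = new
    return score

scores = influence_scores(links)
# C is cited by both A and B, and A in turn receives C's full weight,
# so both A and C end up more influential than B.
for j, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(j, round(s, 3))
```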
3.2. Alternative indicators
Here we consider the more influential of the alternative indicators now in circulation, many of which
are discussed in more detail in Section 3 of our literature review (Supplementary Report I).
Throughout this section, we generally treat alternative indicators in relation to their potential to
indicate scholarly impacts, though in some cases we also cover wider impacts. Table 3 in the
annex provides a summary of key alternative indicators.
The most common method to help assess the value of altmetrics is to investigate their correlation with
citations, despite the hope that they may indicate different aspects of scholarly impact. This is because
it would be strange for two valid impact indicators, no matter how different, to be completely unrelated.
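This validation approach can be sketched with a Spearman rank correlation, the usual choice here because both citation and altmetric distributions are highly skewed. The counts below are hypothetical.

```python
# Sketch of validating an altmetric against citations via Spearman rank
# correlation (rank-based, so robust to the skew of both distributions).

def ranks(values):
    """Average 1-based ranks, with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

citations = [50, 12, 3, 0, 25, 7]   # hypothetical per-article counts
tweets =    [40, 10, 5, 1, 18, 2]
print(round(spearman(citations, tweets), 3))  # 0.943
```

A strong positive correlation like this would suggest the altmetric partly tracks the same underlying quality signal as citations; a near-zero correlation would be grounds for suspicion rather than evidence of a distinct kind of impact.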
3.2.1. Open access scholarly databases
The internet now contains a range of websites hosting free general scholarly databases, such as
Google Scholar (discussed above in 3.1.1) and Google Books, as well as institutional and subject
repositories, some of which form new sources of citation or usage data. These inherit many of the
strengths and limitations of traditional bibliometric databases, but with some important differences.
Rafols, I., Porter, A.L. and Leydesdorff, L. (2010). Science overlay maps: A new tool for research policy and library
management. Journal of the American Society for Information Science and Technology. 61(9), 871–887.
Shotton, D., Portwin, K., Klyne, G. and Miles, A. (2009). Adventures in Semantic Publishing: Exemplar Semantic
Enhancements of a Research Article. PLOS Comput Biol. 5(4). e1000361. DOI:10.1371/journal.pcbi.1000361; Shotton, D.
(2010). Introducing the Semantic Publishing and Referencing (SPAR) Ontologies. Post on Open Citations and Related Work
blog. Retrieved 1 June 2015; Moed, H. and Halevi, G. (2014). Research assessment: Review of methodologies and approaches.
Research Trends. 36, March 2014. www.researchtrends.com/issue-36-march-2014/research-assessment/. Retrieved 1 June 2015.
Although Google Scholar was not primarily developed to rival conventional citation indexes, many
studies have now compared it with them for research assessment, as covered by Appendix A of the
literature review (Supplementary Report I).
3.2.2. Usage indicators from scholarly databases
Usage data is a logical choice to supplement citation counts, since digital readership information can be
easily and routinely collected (unlike the usage of paper copies of articles). Bibliometric indicators do
not show the usage of a published work by non-authors, such as students, some academics, and
non-academic users who do not usually publish but may read scholarly publications. Usage-based
statistics for academic publications may therefore help to give a better understanding of the usage
patterns of documents and can be more recent than bibliometric indicators.
Many studies have found correlations between usage and bibliometric indicators for articles, and usage
data can be extracted from different sources such as publishers, aggregator services, digital
libraries and academic social websites. Nonetheless, usage statistics can be inflated or
manipulated, and some articles may be downloaded or printed but not read, or may be read offline or
via different websites such as authors’ CVs and digital repositories.
Integrated usage statistics from
different sources such as publishers’ websites, repositories and academic social websites would be
optimal for global usage data if they are not manipulated in advance. However, this does not seem to
be practical at present because of differences in how they are collected and categorised.
3.2.3. Citations and links from the general web
It is possible to extract information from the web in order to identify citations to publications, hence
using the web as a huge and uncontrolled de-facto citation database. This data collection can be
automated, such as through the Bing API, making the web a practical source of this type of citation
data. Web and URL citations to publications can be located by commercial search engines (Google
manually and Bing automatically) from almost any type of online document, including blog posts,
presentations, clinical guidelines, technical reports or document files (e.g. PDF files) and there is
evidence (although not recent) that they can be indicators of research impact. In theory, then, web and
URL citations could be used to gather evidence about the scholarly impact of research if they were
filtered to remove non-scholarly sources. In contrast, unfiltered web or URL citation counts are easy
to manipulate and many citations are created for navigation, self-publicity or current awareness and so
it does not seem likely that they would genuinely reflect the wider impacts of research, without time-
consuming manual filtering out of irrelevant sources.
Thelwall, M. (2012). Journal impact evaluation: A webometric perspective. Scientometrics, 92(2), 429-441.
In addition to searching for citations from the general web, citations can be counted from specific
parts of the web, including types of website and types of document. This information can be extracted
from appropriate searches in commercial search engines and automated, for example via the Bing
API. The discussions below cover online presentations, syllabi and science blogs, although there is
some evidence that mentions in news websites and discussion forums may also be useful.
Citations from online ‘grey’ literature seem to be an additional useful source of evidence of the wider
impact of research,
but there do not seem to be any systematic studies of these.
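To illustrate the filtering step described above, here is a minimal sketch in Python of counting only web citations that come from scholarly-looking sources. The exact-phrase query format, the domain heuristics and the sample results are illustrative assumptions, not details taken from the review.

```python
# Sketch: given web-search results for an exact-phrase query on a publication's
# title, discard hits from domains unlikely to be scholarly before counting
# them as "web citations". Domain lists and sample URLs are invented.

from urllib.parse import urlparse

# Crude allow-list heuristic: treat academic and repository domains as scholarly.
SCHOLARLY_SUFFIXES = (".ac.uk", ".edu", "arxiv.org")

def build_query(title):
    """Exact-phrase query of the kind submitted to a search engine API."""
    return f'"{title}"'

def filtered_citation_count(result_urls):
    """Count only results whose host looks scholarly."""
    count = 0
    for url in result_urls:
        host = urlparse(url).netloc.lower()
        if any(host.endswith(s) or s in host for s in SCHOLARLY_SUFFIXES):
            count += 1
    return count

sample_results = [
    "https://eprints.soton.ac.uk/12345/",     # institutional repository
    "https://arxiv.org/abs/1401.4321",        # preprint server
    "https://www.exampleblog.com/my-post",    # blog: excluded
    "https://cs.stanford.edu/~someone/refs",  # university page
]
print(build_query("The Metric Tide"))           # "The Metric Tide"
print(filtered_citation_count(sample_results))  # 3
```

In practice the hard part is the allow-list itself: as the text notes, distinguishing scholarly from non-scholarly sources reliably still requires manual effort.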
Statistics about the uptake of academic publications in academic syllabi may be useful in teaching-
oriented and book-based fields, where the main scholarly outputs of teaching staff are articles or
monographs for which students are an important part of the audience, or textbooks. It is practical to
harvest such data from the minority of syllabi that have been published online in the open web and
indexed by search engines, but it seems that such syllabus mentions may be useful primarily to
identify publications with a particularly high educational impact rather than for the systematic
assessment of the educational impact of research. Syllabus mentions have most potential for the
humanities and social sciences, where they are most common and where educational impact may be
particularly important.
Research may be cited and discussed in blogs by academics or non-academics in order to debate with
or inform other academics or a wider audience. Blog citations can perhaps be considered as evidence
of a combination of academic interest and a potential wider social interest, even if the bloggers
themselves tend to be academics. In addition, the evidence that more blogged articles are likely to
receive more formal citations shows that blog citations could be used for early impact evidence.
Nevertheless, blog citations can be easy to manipulate, and are not straightforward to collect, so may
need to be provided by specialist altmetric software or organisations.
In addition to the types of web citations discussed above, preliminary research is evaluating online
clinical guidelines, government documents and encyclopaedias. Online clinical guidelines could be
useful for medical research funders to help them to assess the societal impact of individual studies.
Costas, R., Zahedi, Z. and Wouters, P. (2014). Do altmetrics correlate with citations? Extensive comparison of altmetric
indicators with citations from a multidisciplinary perspective. arXiv preprint arXiv:1401.4321; Thelwall, M., Haustein, S.,
Larivière, V. and Sugimoto, C. (2013). Do altmetrics work? Twitter and ten other candidates. PLOS ONE, 8(5), e64841.
Wilkinson, D., Sud, P. and Thelwall, M. (2014). Substance without citation: Evaluating the online impact of grey
literature. Scientometrics. 98(2), 797-806.
For a discussion of issues see Manchikanti, L., Benyamin, R., Falco, F., Caraway, D., Datta, S. and Hirsch, J. (2012).
Guidelines warfare over interventional techniques: is there a lack of discourse or straw man? Pain Physician, 15, E1-E26;
also Kryl, D., Allen, L., Dolby, K., Sherbon, B., and Viney, I. (2012). Tracking the impact of research on policy and
practice: investigating the feasibility of using citations in clinical guidelines for research evaluation. BMJ Open, 2(2),
In support of this, one study extracted 6,128 cited references from 327 documents produced by the
National Institute for Health and Clinical Excellence (NICE) in the UK, finding that articles cited in
guidelines tend to be more highly cited than comparable articles.
3.2.4. Altmetrics: citations, links, downloads and likes from the social web
The advent of the social web has seen an explosion both in the range of indicators that can be
calculated and in the ease with which relevant data can be collected (even in comparison to web
impact metrics). Of particular interest are comments, ratings, social bookmarks, and microblogging,
although there have been many concerns about validity and the quality of altmetric indicators due to
the ease with which they can be manipulated.
Elsevier (via Scopus), Springer, Wiley, BioMed
Central, PLOS and Nature Publishing Group have all added article-level altmetrics to their journals,
and uptake is rising among other publishers.
Although the term ‘altmetrics’ refers to indicators for research assessment derived from the social
web, the term alternative metrics seems to be gaining currency as a catch-all for web-based metrics.
A range of altmetrics have been shown to correlate significantly and positively with bibliometric
indicators for individual articles,
giving evidence that, despite the uncontrolled nature of the social
web, altmetrics may be related to scholarly activities in some way.
This is perhaps most evident
Thelwall, M., and Maflahi, N. (in press). Guideline references and academic citations as evidence of the clinical value of
health research. Journal of the Association for Information Science and Technology.
For instance: Taraborelli, D. (2008). Soft peer review: Social software and distributed scientific evaluation. Proceedings
of the Eighth International Conference on the Design of Cooperative Systems. Carry–Le–Rouet, 20–23
May. http://nitens.org/docs/spr_coop08.pdf; Neylon, C. and Wu. S. (2009). Article-level metrics and the evolution of
scientific impact. PLOS Biol 7(11). DOI: 10.1371/journal.pbio.1000242; Priem, J., and Hemminger, B. M. (2010).
Scientometrics 2.0: Toward new metrics of scholarly impact on the social web. First Monday, 15(7).
Birkholz, J., and Wang, S. (2011). Who are we talking about?: the validity of online metrics for commenting on science.
Paper presented in: altmetrics11: Tracking scholarly impact on the social Web. An ACM Web Science Conference 2011
Workshop, Koblenz (Germany), 14-15. http://altmetrics.org/workshop2011/birkholz-v0; Rasmussen, P. G., and Andersen,
J.P. (2013). Altmetrics: An alternate perspective on research evaluation. Sciecom Info. 9(2).
Priem, J., Taraborelli, D., Groth, P., and Neylon, C. (2010). Altmetrics: A manifesto. Retrieved from
Priem, J., Piwowar, H., and Hemminger, B. (2012). Altmetrics in the wild: Using social media to explore scholarly
impact. Retrieved from http://arXiv.org/html/1203.4745v1; Thelwall, M., Haustein, S., Larivière, V., and Sugimoto, C.
(2013). Do altmetrics work? Twitter and ten other candidates. PLOS ONE. 8(5), e64841.
DOI:10.1371/journal.pone.0064841; Costas, R., Zahedi, Z., and Wouters, P. (2014).
However, recent research suggests that some factors driving social media and citations are quite different: (1) while
editorials and news items are seldom cited, these types of document are most popular on Twitter; (2) longer papers typically
attract more citations, but the converse is true of social media platforms; (3) SSH papers are most common on social media
platforms, the opposite to citations. Haustein, S., Costas, R., and Larivière, V. (2015). Characterizing Social Media Metrics
when the altmetrics are aggregated to entire journals
rather than to individual articles. Social usage
impact can be extracted from a range of social websites that allow users to upload, or register
information about, academic publications, such as Mendeley, Twitter, Academia and ResearchGate.
These sites can be used for assessing an aspect of the usage of publications based on numbers of
downloads, views or registered readers. Fuller information on the following are included within the
literature review (Supplementary Report I): Faculty of 1000 Web Recommendations; Mendeley and
other Online Reference Managers; Twitter and microblog citations.
3.2.5. Book-based indicators
Research evaluation in book-oriented fields is more challenging than for article-based subject areas
because counts of citations from articles, which dominate traditional citation indexes, seem
insufficient to assess the impact of books. The Book Citation Index within Web of Science is a recent
response to this issue
since journal citations on their own might miss about half of the citations to books. However, some
academic books are primarily written for teaching (e.g. textbooks) or cultural purposes (e.g. novels
and poetry), and citation counts of any kind may be wholly inappropriate for them.
In REF2014, books (authored books, edited books, scholarly editions and book chapters) were more
frequently submitted to Main Panels C and D (29.4%) than to Main Panels A and B (0.4%), and many
of these books (art, music and literary works) may have merits that are not reflected by conventional
bibliometric methods (see Table 3 in the annex for the full distribution of results in REF2014).
Moreover, the main sources of citations to humanities books are other books.
Even today, the
Thomson Reuters Book Citation Index and Scopus index a relatively small number of books (as of
September 2014) and this may cause problems for
of Scholarly Papers: The Effect of Document Properties and Collaboration Patterns. PLOS ONE. 10(3): e0120495.
Alhoori, H. and Furuta, R. (2014). Do altmetrics follow the crowd or does the crowd follow altmetrics? In: Proceedings
of the IEEE/ACM Joint Conference on Digital Libraries (JCDL 2014). Los Alamitos: IEEE Press.
http://people.tamu.edu/~alhoori/publications/alhoori2014jcdl.pdf; Haustein, S. and Siebenlist, T. (2011). Applying social
bookmarking data to evaluate journal usage. Journal of Informetrics. 5(3), 446-457.
Previously noted in Garfield, E. (1996). Citation indexes for retrieval and research evaluation. Consensus Conference on
the Theory and Practice of Research Assessment, Capri.
Hicks, D. (1999). The difficulty of achieving full coverage of international social science literature and the bibliometric
consequences. Scientometrics. 44 (2), 193-215.
Thompson, J. W. (2002). The death of the scholarly monograph in the humanities? Citation patterns in literary
scholarship. Libri. 52(3), 121-136; Kousha, K., and Thelwall, M. (2014). An automatic method for extracting citations from
Google Books. Journal of the Association for Information Science and Technology. DOI: 10.1002/asi.23170.
bibliometric analyses of books.
Expert peer judgment of books seems to be by far the best method, but it is even more time-consuming
and expensive than article peer assessment because books are
generally much longer.
In response, alternative sources have been investigated for book impact
assessment, including syllabus mentions, library holding counts, book reviews and publisher prestige.
Many of the indicators discussed elsewhere in the full literature review (Supplementary Report I) can
also be used for books but have not yet been evaluated for this purpose. However, since academic
books are still mainly read in print form, download indicators are not yet so relevant.
Google Books contains a large number of academic and non-academic books based upon digitising
the collections of over 40 libraries around the world as well as partnerships with publishers.
Several studies have shown that the coverage of Google Books is quite comprehensive, but, due to copyright
considerations, Google Books does not always reveal the full text of the books that it has indexed.
Although Google Books is not a citation index and provides no citation statistics of any kind, it is
possible to manually search it for academic publications and hence identify citations to these
publications from digitised books.
Google Books could be useful because citations from books have
been largely invisible in traditional citation indexes and the current book citation search facilities in
Scopus and Web of Science cover relatively few books that are predominantly in English and from a
small number of publishers, which is problematic for citation impact assessment in book-based fields.
For example: Gorraiz, J., Purnell, P. J., and Glänzel, W. (2013). Opportunities for and limitations of the book citation
index. Journal of the American Society for Information Science and Technology, 64(7), 1388-1398; Torres-Salinas, D.,
Robinson-García, N., Jiménez-Contreras, E., and Delgado López-Cózar, E. (2012). Towards a ‘Book Publishers Citation
Reports’. First approach using the ‘Book Citation Index’. Revista Española de Documentación Científica. 35(4), 615-620;
Torres-Salinas, D., Rodríguez-Sánchez, R., Robinson-García, N., Fdez-Valdivia, J., and García, J. A. (2013). Mapping
citation patterns of book chapters in the Book Citation Index. Journal of Informetrics, 7(2), 412-424.
See Weller, A. C. (2001). Editorial peer review: Its strengths and weaknesses. Medford, N.J: Information Today.
Chen, X. (2012). Google Books and WorldCat: A comparison of their content. Online Information Review. 36 (4), 507-
516.; Weiss, A., and James, R. (2013). Assessing the coverage of Hawaiian and pacific books in the Google Books
digitization project. OCLC Systems and Services, 29(1), 13-21.; Weiss, A., and James, R. (2013a). An examination of
massive digital libraries’ coverage of Spanish language materials: Issues of multi-lingual accessibility in a decentralized,
mass-digitized world. Paper presented at the Proceedings – 2013 International Conference on Culture and Computing,
Culture and Computing 2013. 10-14.
Kousha, K., and Thelwall, M. (2009). Google Book Search: Citation analysis for social science and the humanities.
Journal of the American Society for Information Science and Technology. 60(8), 1537-1549; Kousha, K., Thelwall, M., and
Rezaie, S. (2011). Assessing the citation impact of books: The role of Google Books, Google Scholar, and Scopus. Journal
of the American Society for Information Science and Technology, 62(11), 2147-2164.
Gorraiz, J., Purnell, P., and Glänzel, W. (2013); Torres-Salinas et al. (2012), (2013).
National or international library holdings statistics can indicate library interest in books and seem to
reflect a different type of impact to that of citations, perhaps including educational and cultural
impacts. These statistics are relatively simple to collect automatically from the OCLC WorldCat
library holding catalogue,
with more than 2.2 billion items from over 72,000 libraries in 170
countries. These data, which are based upon book holdings and hence would be costly to manipulate,
seem promising for assessing the wider influence of books in SSH based on the information needs of
users, teaching staff and researchers. While more detailed borrowing statistics might be even more
useful, these data do not seem to be currently available.
Publisher prestige, reputational surveys, libcitation and citation indicators can also help to identify
prestigious scholarly publishers. A combination of all of the above may be more useful for rating
(rather than ranking) academic publishers of books or monographs as long as other factors, such as
geographical, language and disciplinary differences, are taken into consideration when they are used.
3.2.6. Varieties of outputs
While much of this discussion tends to focus on text-based outputs in peer-reviewed publications, it is
common for scholars across all disciplines to produce a wider variety of outputs from their research
processes. These range from research datasets, software, images, videos and patents, through to
exhibitions, compositions, performances, presentations and non-refereed publications (such as policy
documents or ‘grey’ literature). For some of these there may be plausible indicators of impact, such as
audience size, art gallery prestige, composition commissioner prestige, art sales or sales prices. In
most cases, however, it is likely that the contributions of individual works are so varied that any data
presented to support an impact case would not be directly comparable with other available data,
although they could be presented as evidence to support a specific argument about the contribution of
a particular output.
3.2.7. How robust are alternative quality metrics?
There is empirical evidence that a wide range of indicators derived from the web for scholars or their
outputs are related to scholarly activities in some way because they correlate positively and
significantly with citation counts. In many cases these metrics can also be harvested on a large scale
in an automated way with a high degree of accuracy (see Appendix B of the literature review,
Supplementary Report I, for methods to obtain alternative metric data). Nevertheless, most are easy to
and nearly all are susceptible to spam to some extent. Thus, alternative metrics do not
For example: Dullaart, C. (2014). High Retention, Slow Delivery. (Art piece: 2.5 million Instagram followers bought and
distributed to artists. See e.g. http://jeudepaume.espacevirtuel.org/, http://dismagazine.com/dystopia/67039/constant-dullaart-
seem to be suitable as a management tool with any kind of objective to measure, evaluate or manage
researchers. Even if no manipulation took place, which seems unlikely, the results would be suspected
of having been manipulated; in the worst case, the results would be extensively manipulated and
researchers would waste their time and money on this manipulation.
In our call for evidence, 19 respondents (of which 15 were HEIs) proposed that altmetrics could be
used as a research assessment tool, while 12 responses (of which eight were HEIs) argued that
altmetrics are not reliable enough to be used as a measure of research quality. This reflects the
uncertainties often associated with these indicators which are at an early stage of development. For an
altmetric to be taken seriously, empirical evidence of its value is needed in addition to evidence of a
reasonable degree of robustness against accidental or malicious spam.
3.3. Input indicators
In some contexts, there is support for the measurement of research quality through the use of proxy
indicators including: external research income (recognising that organisations are in competition for
these funds, so success is a marker of quality); research student enrolments; and research student
completion data. These were all mentioned by a number of respondents to our call for evidence as
potential measures of quality, but more often as a useful means to measure the ‘environment’ or
‘vitality’ of the research base, along the lines of the REF’s environment component.
In UK HEIs, the maturity of current research information systems (CRISs) varies markedly between
institutions. Some HEIs have fully fledged systems that are completely integrated with other core
systems, others have stand-alone systems, and some rely on non-specific systems such as generic
databases and spreadsheets. Some UK HEIs capture or wish to capture data associated with all of the
above items (and more) for internal or external purposes. Publication information is most commonly
collected in central systems. Grant information, commercialisation, and PhD numbers and
completions tend to be collected centrally and most can produce information by staff member/FTE.
On the other hand, prizes, editorships, other esteem indicators and international visitors might more
commonly only be collected locally within departments. Information on research infrastructure is
perhaps the most variable and, anecdotally at least, least likely to be comprehensive (though
Wouters, P., and Costas, R. (2012). Users, narcissism and control: Tracking the impact of scholarly publications in the
21st century. SURFfoundation. Retrieved November 29, 2014, from https://www.surf.nl/kennis-en-
innovatie/kennisbank/2012/rapport-users-narcissism-and-control.html [In Dutch].
For instance, see the University of Durham’s response to our call for evidence, available at
initiatives like equipment.data.ac.uk and the work of sharing consortiums like N8 show that
infrastructure can be established and well-utilised).
3.4. Indicators of impact
Attempting to measure and capture broader societal or external impacts of academic work is a
relatively new concern in the UK system. Originally emphasised by the UK Research Councils as a
means of enhancing the external reach of their grant awards, impact became an established part of the
UK’s research assessment culture when it was introduced into the REF in 2011.
Impact is still a
contested term, with a variety of definitions and understandings of its implications. The ways in which
it can be assessed and measured are equally varied. Some definitions of impact highlight the
importance of being able to evidence its reach and significance: “Research has an external impact
when an auditable or recorded influence is achieved upon a non-academic organization or actor in a
sector outside the university sector itself ... external impacts need to be demonstrated rather than assumed.”
One problem associated with the creation of impact indicators is that the model of impact seemingly
supported by some definitions can be rather linear, when dissemination is in fact more broadly
interspersed through the research cycle.
For example findings from the recent HEFCE review of
monographs and open access show that arts and humanities scholars don’t first research and then
disseminate in a neat two-stage process.
This means that developing metrics for the ways in which impact is created in these disciplines is
harder, and that ad-hoc data, contextualised by interpretation of their meaning, may be more suitable.
Others also argue that academics who create
impact by building long-lasting partnerships with groups and organisations will not be able to
demonstrate the depth and detail of impacts through metrics or data alone. A key concern from some
critics is that impact metrics focus on what is measurable at the expense of what is important.
Evidence of external impacts can take a number of forms – references to, citations of or discussion of
an academic or their work in a practitioner or commercial document; in media or specialist media
See www.ref.ac.uk/pubs/2011-01/. Arguably, the REF impact pilots (concluded 2010) marked the formal introduction of
the impact element.
LSE Public Policy Group. (2011).
A recent report on whether altmetrics are useful measures of broader impact is: Adams, J. and Loach, T. (2015). Altmetric
mentions and the communication of medical science: Disseminating research outcomes outside academia. Digital Science.
Retrieved 1 June 2015.
Crossick, G. (2015). Monographs and Open Access: A report to HEFCE.
Thelwall, M. and Delgado, M. (2015, in press). Arts and humanities research evaluation: No metrics please, just data.
Journal of Documentation. 71 (4).
outlets; in the records of meetings, conferences, seminars, working groups and other interchanges; in
the speeches or statements of authoritative actors; or via inclusions or referencing or weblinks to
research documents in an external organisation’s websites or intranets; in the funding, commissioning
or contracting of research or research-based consultancy from university teams or academics; and in
the direct involvement of academics in decision-making in government agencies, government or
professional advisory committees, business corporations or interest groups, and trade unions, charities
or other civil society organisations.
Journal articles and books are seen to be less impact relevant than other forms of publications such as
research reports, briefing notes and conference papers. However, research from Talbot and Talbot
found that journal articles were identified as the third most used route to find academic work by
policymakers.
However, even where government documents, for example, quote academic work,
these references are not citations in the traditional sense and are therefore not picked up by
bibliometric analysis. Grey literature produced by academics tends to be more used by policymakers
but its impact is difficult to capture. Firstly, citations are not made in the usual way; secondly,
academics have been slow to realise the importance of using tagging information such as DOIs in
order to allow these references to be tracked.
The development of a range of alternative indicators has created the potential to diversify away from a
reliance upon counting citations to journal articles. Nevertheless, although a few alternative indicators
are promising and do have the potential to enable a new view of the impact and reach of research, the
‘science’ is in its infancy and most of the alternative metrics can be easily gamed. Some of the most
promising indicators are relevant to a much narrower range of research than are citations to journal
articles (F1000 ratings, patent citations, syllabus mentions, citations from Google Books). Hence, the
systematic use of alternative indicators as pure indicators of academic quality seems unlikely at the
current time, though they have the potential to provide an alternative perspective on research
dissemination, reach and ‘impact’ in its broadest sense.
The variety of evidence needed to build case studies of impact is such that, unlike scholarly impact, it
is difficult to reach a consensus about which indicators to use to highlight particular kinds of impact.
From the almost 7,000 impact case studies submitted to REF2014, there is little consistency in the
LSE Public Policy Group. (2011).
Talbot, C. and Talbot, C. (2014). Sir Humphrey and the professors: What does Whitehall want from academics?
Retrieved 1 June 2015.
See Ernesto Priego’s contribution to our Warwick workshop on this point
indicators that case study authors used to evidence the impact of their research. An analysis of the
impact case studies commissioned by HEFCE from Digital Science and King’s College London
concluded as follows:
“ The quantitative evidence supporting claims for impact was diverse and
inconsistent, suggesting that the development of robust impact metrics is unlikely.
There was a large amount of numerical data (ie, c.170,000 items, or c.70,000 with
dates removed) that was inconsistent in its use and expression and could not be
synthesized. In order for impact metrics to be developed, such information would
need to be expressed in a consistent way, using standard units. However, as noted
above, the strength of the impact case studies is that they allow authors to select the
appropriate data to evidence their impact. Given this, and based on our analysis of
the impact case studies, we would reiterate…impact indicators are not sufficiently
developed and tested to be used to make funding decisions.”
Although the potential for a small subset of quantitative data to represent a diverse array of impacts is
limited, we did receive a wealth of views on how the narrative elements of impact case studies could
be enhanced. For example, the work carried out by King’s College London examined the use of
Quality Adjusted Life Years (QALYs) as a measure of health impact in the case studies, and
concluded that in the future, where such data were available, they could allow better comparability
between case studies.
There are likely to be numerous additional examples of indicators that could be used in this way, but
they are usually specific to certain types of impact and need to be interpreted in context. Sometimes
these indicators may be measures of dissemination (e.g. webpage visits or YouTube views) that need
to be considered alongside other evidence of impact. The reports from RAND Europe, which analysed
impact in the REF, also provide useful evidence, noting that HEIs could develop their own impact
metrics and should be encouraged to do so in future, but any effort to define impact indicators up front
risks unnecessary limitations on the exercise, as has been found to be the case in other pilot impact
exercises.
Furthermore, the same metrics may not be applicable across main panels and might not
work for all disciplines. For example, a subset of research users were concerned about measures when
King’s College London and Digital Science. (2015), p72.
See for example Ovseiko, P., Oancea, A. and Buchan, A. (2012). Assessing research impact in academic clinical
medicine: a study using Research Excellence Framework pilot impact indicators, BMC Health Services Research 2012,
12:478. www.biomedcentral.com/1472-6963/12/478. Retrieved 1 June 2015.
claiming an impact involving interaction with the public. As one panellist asked: “What is the right
number of website hits to become 4-star?”
Figure 1: Examples of types of impact metrics tracking how research has been used
3.5. Indicating ways ahead
There are widespread concerns that quantitative indicators, such as citation-based data, cannot provide
sufficiently nuanced or robust measures of quality when used in isolation. Bibliometricians generally
see citation rates as a proxy measure of academic impact or of impact on the relevant academic
communities. But this is only one of the dimensions of academic quality. Quality needs to be seen as a
multidimensional concept that cannot be captured by any one indicator, and which dimension of
quality should be prioritised may vary by field and mission.
During the process of our own review, we have found greater support for the use of (carefully chosen)
indicators as a complement to other forms of assessment (in particular peer review), than as a means
to assess research quality by themselves (this is discussed further in Chapter 5). Many recent studies
Manville, C., Guthrie, S., Henham, M., Garrod, B., Sousa, S., Kirtley, A., Castle-Clark, S. and Ling, T. (2015). p34.
This figure was designed by Jane Tinkler for the review.
also recommend opting for a combination of strategies, and it is crucial that these are tailored to the
specific context in question.
It is crucial to consider what is best suited to the scale and focus of assessment. Concern over the
application of indicators at inappropriate scales features prominently in recent statements, such as
DORA and the Leiden Manifesto. Too often, managers and evaluators continue to rely on metrics that
are recognised as unsuitable as measures of individual performance, such as journal-level indicators.
Using carefully chosen ‘baskets’ of (qualitative and quantitative) indicators is often deemed to
provide the best way forward. “A single indicator cannot fully capture and represent the diversity and
complexity of the relationships within a research funding ecosystem. Quantitative measures are a
conduit of information that represents only very specific aspects of that ecosystem.”
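One way to operationalise such a ‘basket’ without collapsing it into a single score is to report each indicator side by side as a field-relative percentile. The sketch below illustrates this; the indicator names, the field baselines and the unit’s figures are invented for illustration, not drawn from the review.

```python
# Sketch: a 'basket of indicators' profile. Several indicators are reported
# side by side as field-relative percentiles rather than combined into one
# composite score, which the text above warns against. All data are invented.

def percentile(value, field_values):
    """Percentage of the field sample that this value exceeds."""
    below = sum(1 for v in field_values if v < value)
    return 100.0 * below / len(field_values)

# Hypothetical field baselines (e.g. all units in the same discipline).
field = {
    "citations_per_paper": [2.0, 4.5, 1.0, 8.0, 3.5, 6.0],
    "mendeley_readers":    [10, 40, 5, 80, 25, 60],
    "grey_lit_mentions":   [0, 2, 1, 5, 3, 4],
}

# One unit's values for the same indicators.
unit = {"citations_per_paper": 6.0, "mendeley_readers": 60, "grey_lit_mentions": 1}

profile = {name: percentile(unit[name], field[name]) for name in unit}
for name, pct in profile.items():
    print(f"{name}: {pct:.0f}th percentile")  # e.g. citations_per_paper: 67th percentile
```

Keeping the indicators separate preserves exactly the field- and mission-dependent information that a single composite score would discard.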
It is also important to emphasise that high-quality bibliometric data are expensive; but high-quality
data are essential, otherwise the analysis is not worth the effort.
Turning to indicators for impact, there is an increasing body of literature on how scholarly impact
relates to broader external impact. Work by Altmetric, for example, has shown that citations and
altmetric indicators seem to be measuring different but possibly related aspects.
And early analysis
from REF2014 highlights that the same units tend to score well on both outputs and impact, suggesting
that these aspects may be mutually constitutive.
Views that the impact agenda is problematic are found across all disciplines, but are perhaps strongest
in the arts and humanities, where it is felt to be impossible to show the variety and depth of impact
of the work in those fields. However, these disciplines have experience in developing
possible indicators for, or data about, the impacts of arts and humanities research, particularly in
REF2014. It could be that experience from cultural organisations could be used in order to
further develop impact metrics that are relevant for the outputs produced by arts and humanities
researchers.
As noted elsewhere, the way that impact is assessed in the REF is through case studies alongside a
broader narrative (see Section 9.3.2). These narrative-based outputs allow academics to outline in
detail how their work has created impact, and can therefore be crafted to take appropriate account of
the context.
Council of Canadian Academies. (2012). p42.
Also see Adams, J. and Loach, T. (2015). This compares altmetric mentions and the communication of medical research.
The authors call for more work to better understand the relationship between the content of biomedical research papers
and the frequency with which they are mentioned in social media contexts.
Thelwall, M. and Delgado, M. (2015, in press). Arts and humanities research evaluation: No metrics please, just data.
Journal of Documentation, 71(4).
As with peer review, case studies allow expert judgement to be used in determining
successful research impact. The REF made use of external research users as part of assessing impact;
they were actively involved in providing context to impact claims made by academics. For some, case
studies are the only viable route to assessing impact: they offer the potential to present complex
information, and these advocates warn against a greater focus on quantitative metrics within impact
case studies. Others, however, see case studies as “fairy tales of influence” and argue for a more
consistent toolkit of impact metrics that can be compared more easily across and between cases.
In sum, while some alternative metrics seem to reflect types of impact that differ from those captured
by traditional citations, only Google patent citations and clinical guideline citations can yet be shown
to reflect wider societal impact. In addition, as the range of impact metrics is so wide – rightly so, to
be able to show the range of impacts taking place – many of them would be too rare to help distinguish
between the impacts of typical publications. But they could be useful as evidence of the impact of the
small minority of high-impact articles. Overall, then, despite the considerable body of mostly positive
empirical evidence reviewed above, while alternative metrics do seem to give indications of where
research is having wider societal impact, they do not yet seem robust enough to be used routinely for
evaluations in which it is in the interest of stakeholders to manipulate the results.
Recent work on the REF impact case studies indicates that interdisciplinary research is more likely to
achieve greater impact, and it is to (inter)disciplinary differences and dilemmas that we now turn.
Dunleavy, P. (2012). REF Advice Note 1: Understanding HEFCE’s definition of Impact, LSE Impact, 22 October.
King’s College London and Digital Science (2015). p24.
4. Disciplinary dilemmas
“Metrics have to be intertwined with the context of the discipline in question.”
Martin Eve, Arts and Humanities workshop, University of Warwick
It is well known that practices of output production and research outlet selection vary significantly
across disciplinary and subdisciplinary fields. These diverse practices are bound up with specific
philosophical and methodological histories and practices, though the propensity to choose a particular
type of output or outlet over others may also be influenced by other factors, such as specific
university or wider policy environments.
This diversity affects how universally useful particular metrics can be for some disciplines, not least
because of their limited coverage, as already discussed, but also because differences in research
practices across disciplines have deeper implications for the applicability of metrics.
The emphasis of existing metrics (at least as far as bibliometrics are concerned) has been on
counting and analysing research outputs published predominantly in journals, and for that reason the
debate around metrics is often characterised by its identification of an ‘arts and humanities problem’,
where practices differ considerably from this pattern. It is perhaps inevitable that the focus of this
chapter is on how arts and humanities research might be distinctive, but we would sound a note of
caution here. Research is diverse, right across the academy. Disciplinary differences are often broadly
and unhelpfully characterised, but can in reality be quite subtle and entirely valid where they occur.
Metrics should not become the ‘tail that wags the dog’ of research practice in all disciplines. Rather, it
is incumbent on those who design and use metrics to take a fuller account of the existing diversity of
research, and design sensitive and meaningful metrics to reflect and support this.
4.1. Variations in research outputs
Researchers produce a variety of research outputs across the range of disciplines. Submissions to
REF2014 revealed a wealth of types of output submitted across all units of assessment: journal
articles, books, datasets, performances, compositions, artefacts, software, patents, exhibitions,
installations, designs, digital media – the list goes on. RCUK’s use of Researchfish has compiled a
large structured dataset of outputs linked to research funded since 2006, and this demonstrates that a
diverse range of outputs are produced across the breadth of research disciplines, and that certain
outputs are not exclusive to particular disciplines.
For instance, although rare, it is not unheard of
for life science researchers to write a play or devise a work inspired by their research, and
interdisciplinary research often results in unusual or multimodal outputs. It is therefore important not
to oversimplify. However, some general trends for output production, across broad disciplinary areas,
can be noted as follows:
- Journal articles are the primary output for many disciplines (on average, half of all
output reports captured in Researchfish), but their importance varies, tending to play
a less predominant role in the arts and humanities and some areas of the social
sciences;
- Monographs and book chapters are particularly important for many disciplines
within the humanities, and for some within the social sciences;
- Conference contributions are particularly important for computer scientists and
engineers;
- Products and prototypes are important outputs for some academics, particularly in
the engineering sciences;
- Art works, artefacts and practice-based outputs are more likely to play an important
role for arts-based disciplines.
While not entirely representative of output production across subject areas,
some idea of the trends
in output production can be gained by looking at the spread of output types submitted to REF2014
across the 36 panels (see Table 3 in the annex).
As noted in Section 3.1 of the previous chapter, such trends are noteworthy as some outputs are less
likely to be included within bibliographic databases, which are central to the formulation of many
bibliometric indicators and analyses. In particular, book publications and publications in “niche” or
locally important journals play an important role in SSH, but these publications are often not indexed
in bibliographic databases. Bibliometric analyses in computer science and engineering involve similar
difficulties, as many computer science and engineering publications appear in conference proceedings,
but such literature is often less well covered by bibliographic databases, especially by Web of Science
and Scopus, in comparison to journal articles, perhaps because of the costs and complexity of
monitoring conference proceedings.
While the REF and RCUK have noted this diversity, this doesn’t necessarily reflect what researchers actually do – instead
it reflects what is considered important to submit to the various assessment or reporting systems.
Other outputs might include audio-visual recordings, technical drawings, website content, software, designs or working
models, exhibitions, patents, plant breeding rights, working papers, policy documents, research reports, legal cases, and
translations, amongst others. Furthermore, the diversity of outputs being produced by academics is becoming increasingly
broad due to digital and web-based technological developments.
www.mrc.ac.uk/documents/pdf/Introduction/; also note the above point in Footnote 144.
For a useful discussion of monographs, see Crossick, G. (2015).
Indeed, see Adams, J. and Gurney, K. (2014). Evidence for excellence: Has the signal taken over the substance? An
analysis of journal articles submitted to the RAE2008. Digital Science. www.digital-science.com/resources/digital-research-
report-evidence-for-excellence-has-the-signal-overtaken-the-substance/. Retrieved 1 April 2015.
Publication patterns and practices also vary across disciplines. For instance, in some areas academics
might publish several articles per year, while in disciplines where monographs are a favoured output,
producing one book every few years might be seen as appropriate. The language in
which outputs are likely to be produced can also vary; for some areas of SSH outputs are more likely
to be produced in the relevant national language, which may not be English. Outputs that are not
produced in English are less likely to be included in certain bibliographic databases, and may be less
likely to be captured in bibliometric indicators.
The number of authors per publication also varies to some extent by subject area. For instance,
outputs produced in the humanities are more likely to be single-authored, whereas publications in
some areas of science, such as medicine and biology, often have several authors. This will affect
requirements for the allocation of credit, which can be relevant to bibliometric and citation indicators.
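One common bibliometric convention for handling this is fractional counting, in which each output's citations are divided equally among its co-authors. The sketch below is purely illustrative – the author names, citation counts and function name are our own invention, not drawn from the report – but it shows how credit allocation changes between a sole-authored monograph and a multi-authored article:

```python
# Illustrative sketch of fractional citation counting: each paper's
# citations are split equally among its co-authors, so a sole-authored
# humanities monograph credits its author in full, while a heavily
# co-authored science paper credits each author with a small share.

def fractional_credit(papers, author):
    """Sum the author's fractional share of citations across papers.

    `papers` is a list of dicts with 'authors' and 'citations' keys.
    """
    return sum(
        p["citations"] / len(p["authors"])
        for p in papers
        if author in p["authors"]
    )

papers = [
    {"authors": ["Smith"], "citations": 12},                  # sole-authored monograph
    {"authors": ["Smith", "Jones", "Lee"], "citations": 30},  # co-authored article
]

print(fractional_credit(papers, "Smith"))  # 12/1 + 30/3 = 22.0
print(fractional_credit(papers, "Jones"))  # 30/3 = 10.0
```

Whole counting (every co-author gets full credit) would instead give Smith 42 citations, which illustrates why the choice of counting method matters when comparing fields with very different authorship norms.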
4.2. Variations in citation practices
There is significant disciplinary diversity in terms of citation practices, and linked to this, the use and
acceptance of indicators for the assessment of research outputs. Such variations influence the
interpretation of indicators such as the JIF. For instance, top-ranked journals in mathematics have a
JIF of three, versus 30 for cell biology.
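For readers unfamiliar with the indicator, a journal's JIF for a given year is conventionally the citations received that year to its items from the previous two years, divided by the number of citable items it published in those two years. The sketch below, with entirely invented figures, shows why the raw number is field-dependent: it simply inherits whatever citation density a field has.

```python
# Minimal sketch of the two-year Journal Impact Factor calculation.
# All figures are invented for illustration; they are not from the report.

def impact_factor(cites_to_prev_two_years, citable_items_prev_two_years):
    """JIF(year) = citations in `year` to items published in the two
    preceding years / citable items published in those two years."""
    return cites_to_prev_two_years / citable_items_prev_two_years

# A field with sparse citation practices (e.g. mathematics)...
maths_journal = impact_factor(cites_to_prev_two_years=300,
                              citable_items_prev_two_years=100)    # 3.0
# ...versus a densely citing field (e.g. cell biology).
biology_journal = impact_factor(cites_to_prev_two_years=3000,
                                citable_items_prev_two_years=100)  # 30.0
print(maths_journal, biology_journal)
```

Both hypothetical journals could sit at the top of their respective fields, yet a naive cross-field comparison of the raw JIFs would rank one tenfold above the other.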
Some subject areas are more likely to rank their journals and publishers according to their JIFs, while
in other areas, such as SSH, this practice is less common. Thus, bibliometric measures such as JIFs
are more accepted and embedded within certain disciplines than in others.
Many humanities disciplines are characterised by internal debate, such that an output may be just as
likely to be cited for its stance on a particular issue, or its place in a broader debate, as for the quality
of the thinking or research it describes.
Social sciences and humanities in particular have larger numbers of ‘national’ or ‘niche’ journals which are not indexed
in bibliometric databases. Yet these may still have transnational contributions and readership, and despite a smaller audience
may be highly significant for that specialist (sub)discipline.
With some exceptions, such as Business and Management, where the practice is more commonplace, as per the ABS journal guide.
In STEM disciplines, methods papers may also attract large
numbers of citations. Practices of citation are therefore more complex than is visible in the simplicity
of citation numbers.
The time span over which a piece of research is deemed to be relevant can also vary by discipline, as
some subject areas move faster than others. In general terms, research in SSH tends to remain relevant
for longer periods than in the natural sciences, as noted by recent analysis of journal usage half-lives
by the British Academy.
This will affect citation practices and therefore the relevance of certain
indicators in particular contexts.
There are widespread concerns that indicators are less likely to capture the value of academic outputs
from less popular or more obscure fields of work, as these are cited less often, or of works published
in languages other than English. There are also concerns that fields of enquiry based on more
theoretical, as well as more applied, outputs may fall foul of certain indicator-based assessment
strategies (whatever the discipline).
4.3. Differing disciplinary perspectives
There is an extensive literature on these and related issues, and the review has captured detailed
debate and commentary on these points, particularly in response to our call for evidence. Thirty-five
out of 153 respondents were concerned that indicators, in particular citation-based metrics, could
unfairly disadvantage some disciplines, especially in the arts, humanities and social sciences. Some
felt that in certain disciplines, including law, English literature, nursing and criminology, the use of
such indicators would never be plausible. A number of respondents made the point that variations are
often considerable within as well as between disciplines.
Throughout the course of the review, we have heard that assessment regimes must take different
cultures of output and citation into account to ensure that diverse research practices and cultures are
supported and captured appropriately.
Darley, R., Reynolds, D. and Wickham, C. (2014). Open access journals in Humanities and Social Science: A British
Academy Research Project. www.britac.ac.uk/templates/asset-relay.cfm?frmAssetFileID=13584. Retrieved 1 June 2015.
This was certainly the case within discussions of the (potential) use of citation-based data within the
REF during our focus groups with REF panellists (see Chapter 9). At our roundtable review workshop
hosted by UCL in July 2014, a diversity of opinions on the potential use of (typically citation-based)
data was aired, with considerable variation in views across disciplines, as summarised below:
- Area studies: Capturing metrics data for both outputs and impacts has proved very
difficult in area studies;
- Biological sciences: Citation metrics can be helpful as a last resort to inform
borderline decisions, but are not currently seen as widely useful;
- Built environment: Some disciplines are more inclined to use quantitative data, but
they are in a minority. The use of metrics for the assessment of architecture is flawed
– most outputs are buildings, design projects, books, etc., which don’t fit into metrics;
- Computer science: There are significant problems relating to the coverage of citations
by providers, for instance in indexing conference proceedings. Other computer science
outputs include software, which is poorly captured. Downloads might be one option,
but it is unclear what these say about the excellence of research;
- Education: It was suggested that some quantitative measures in research assessment
are appropriate, but there was a risk that reviewers might use metrics
disproportionately within the peer review process;
- Performing arts: There is no formalised process for outputs, so a metrics-based
approach based on this assumption would be unsuitable. More discursive elements of
assessment would be welcome in these disciplines;
- Physics and epidemiology: Very large author groups can be an issue. Currently,
‘team science’ and collaborative research is not well rewarded. It would be worth
exploring whether metrics could address this. Current metrics and methods of
assessment can create tensions in research practices for some disciplines;
- Psychosocial studies: There is an important question about why papers are cited and
how to interpret the meaning of high citation counts – for example, something
written provocatively can be cited many times despite being well known to be a poor
paper. There are also issues about the use of metrics in people’s individual references,
when these are not necessarily comparable and produce certain kinds of gaming.
During the course of the review, we have found that the most serious concerns about certain
bibliometric and citation-based indicators tended to be voiced by academics working in arts and
humanities disciplines. Colleagues from the arts and humanities are not alone in their distrust of
certain research assessment indicators; however, the diversity of their research practice and their
attendant methodologies and research cultures – which shape views and attitudes about what
epitomises quality, and thus how best to evaluate research – perhaps makes them well placed to
articulate these concerns. Notwithstanding this, as pointed out in the literature review (Supplementary
Report I), research conducted within the arts and humanities (and parts of the social sciences) often
differs from much of the research conducted in the sciences in a number of fundamental ways; for
example:
- It has a stronger national and regional orientation; for instance, more publications are
likely to be written in languages other than English;
- It is often published in books and other outputs which are harder to measure
quantitatively (e.g. objects, films and ephemeral works);
- It can have a different configuration of theoretical development that operates at a
different pace; it is difficult to introduce quantitative metrics for incremental work
that is undertaken over a long period of time and is slow to develop;
- It depends on scholars working alone as well as in collaborative teams, so is
sometimes less collaborative;
- It may be directed more at a non-scholarly public.
Therefore, we contend that the specificities of different disciplines and sub-disciplines – including,
but not limited to, the arts and humanities – need to be accounted for within research assessment. We
agree with the assertion by PLOS in their response to our call for evidence: “It is entirely appropriate
that various research communities seek to articulate the value of their work in different ways.”
4.4. Tailored approaches
As noted above, metrics or indicators are not discipline-specific, as such, but can be more or less
relevant (or the data more complete) for particular forms of communication, interaction or re-use.
In research assessment contexts, comparison between publications from research fields is often a
requirement, but this is a challenging task due to differing cultures of output production and citation
practice. For outputs, attempts to address these difficulties, through processes or normalisation, are
often seen as crucial (see the literature review (Supplementary Report I) and Section 3.1.4 of this
report). Indeed, the call to normalise indicators across fields is one of the ten guiding principles for
research evaluation listed in the Leiden Manifesto for research metrics.
Concerns raised in the literature review (Supplementary Report I) echo several of the concerns and points raised by
respondents to our call for evidence, as well as contributors to our REF panel focus groups (from Main Panels C and D),
who were largely sceptical about the use of citation indicators as a means to assess research quality.
Also see the report produced by the BA and AHRC in 2006: Use of research metrics in the arts and humanities: Report
of the expert group set up jointly by the Arts and Humanities Research Council and HEFCE.
However, processes of
normalisation are not always straightforward and do not necessarily remove problems across all
fields, including but not limited to those related to the coverage of metrics. Relatedly, many of the
standard indicators currently on offer cannot adequately capture outputs from the arts, especially
practice-based subjects, but also written products such as poetry and novels.
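One widely used normalisation of this kind is the mean-normalised citation score: an output's citation count divided by the average for outputs of the same field and year, so that a score of 1.0 means 'cited at the field average' regardless of discipline. The sketch below is a hedged illustration – the field averages and function name are invented, not taken from the report or any database:

```python
# Illustrative sketch of field normalisation: divide each output's
# citation count by the average for its field and publication year,
# making scores comparable across fields. Averages here are invented.

FIELD_YEAR_AVERAGE = {
    ("mathematics", 2012): 4.0,
    ("cell biology", 2012): 40.0,
}

def normalised_citation_score(citations, field, year):
    """Citations relative to the field-year average (1.0 = average)."""
    return citations / FIELD_YEAR_AVERAGE[(field, year)]

# Eight citations is twice the average in mathematics...
print(normalised_citation_score(8, "mathematics", 2012))   # 2.0
# ...but well below average in cell biology.
print(normalised_citation_score(8, "cell biology", 2012))  # 0.2
```

Even this simple scheme presupposes that every output can be assigned a field and that the field has enough indexed, cited outputs to yield a meaningful average – precisely the conditions that fail for much arts and humanities work, as the surrounding text notes.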
Attempts are being made to find alternative and altmetric solutions as a means to provide other routes
forward, as discussed at our Sussex and Warwick workshops.
However, while increasingly
sophisticated, these are not yet ready for widespread application in evaluations. Furthermore,
difficulties in terms of their potential application across and between different disciplines would
remain. For some disciplines, there is more (or less) of a culture of citing outputs online in ways that
will be captured. For instance, at the arts and humanities workshop in Warwick, some concerns were
raised that unless larger proportions of the arts and humanities community increase their use and
understanding of social media, for instance to capture and circulate DOIs, then such altmetric projects
are unlikely to succeed. As Ernesto Priego suggests, systems will not function unless communities
buy into them and use associated platforms in effective ways.
There are also residual concerns in relation to books, though improvements are being made (see
Section 3.2.5). Perhaps further investment is required in these fields into the development of more
suitable indicators. However, the challenge of making meaningful comparisons across subjects is
likely to remain, and sufficient sensitivity to context would still be paramount.
One option would be to develop a ‘basket’ of appropriate metrics perhaps used alongside other forms
of assessment such as peer review, and tailored to the community in question. For instance, at
Warwick we also heard that alternative measures or indicators may need to be found, given the
different nature of research quality within the arts and humanities, in order to avoid the risk of using
“inappropriate proxies, and bringing in unsuitable goals and objectives” (Jonathan Adams).
Most of the international case studies discussed in Section 2.6 tailor their approaches to research
assessment to account for some disciplinary variations, as is also the case with the REF, in terms of
citation data provision.
Hicks, D., Wouters, P., Waltman, L., de Rijcke, S. and Rafols, I. (2015). Bibliometrics: The Leiden Manifesto for
research metrics. Nature, 520, 429-431. www.nature.com/news/bibliometrics-the-leiden-manifesto-for-research-metrics-
1.17351. Retrieved 1 May 2015.
Slides available at: %20In%20Metrics%20we%20Trust.pdf. Retrieved 1 February 2015;
www.hefce.ac.uk/media/hefce/content/news/Events/2015/HEFCE,metrics,workshop,Warwick/Thelwall.pdf. Retrieved 1
https://epriego.wordpress.com/2015/01/16/hefcemetrics-more-on-metrics-for-the-arts-and-humanities/. Retrieved 1 June
However, the degree to which such systems have been adapted varies
enormously, and primarily focuses on attempts to mitigate certain biases of bibliometric analyses.
Snowball Metrics also attempts to provide clarity to the definition and use of metrics, such that
disciplinary differences could be more readily accounted for.
However, use of tailored and varied approaches raises additional complexities in terms of cost and
administration, has implications for interdisciplinary research (as discussed further in the next
section), and could lead to potential disquiet between groups. Claire Donovan, who spoke at our
Warwick arts and humanities workshop, discussed the use of different research assessment methods
within the Australian system, where some subject areas just employed metrics and others used a mix;
she warned that this was not without its attendant problems, and has the potential to lead to perceived
hierarchies which may cause significant tension between disciplinary groups.
4.5. Indicators of interdisciplinarity
In recent years, there has been an increasing emphasis on the importance of interdisciplinary research,
but also a recognition that this isn’t always easy to undertake effectively, or to support. Our literature
review (Supplementary Report I, Section 126.96.36.199) highlights past attempts to establish whether certain
modes of research assessment, including the 1996 RAE , have helped or hindered interdisciplinary
Throughout the review, a number of contributors have emphasised the need to pay due attention to
supporting interdisciplinary working. For instance, a small number of responses to the call for
evidence (eight in total) expressed concern that the use of discipline-led metrics could unfairly
disadvantage interdisciplinary research. At the Sussex review workshop, several contributors
suggested ways to encourage interdisciplinarity, including:
- Taking plurality seriously, calling for metrics to open up the debate rather than
closing down the range of outputs within disciplines. It was argued that we need to
resist strong demands to keep systems simple, given that the research system may be
irreducibly complex. ‘Baskets’ of metrics that include qualitative and quantitative
indicators are more likely to give a better picture of how systems work;
- Adopting new indicators that ask meaningful questions about the research enterprise
and appreciate the multiplicity of qualities underpinning research. These would give
due consideration to the type of research being carried out, who is using and doing
the research, and the networks involved;
- Linking evaluation processes more closely to the creative process. There is a need to
create further novel research experiences, which could include work across and
between disciplines. Some of the work that Academic Analytics has been undertaking
to find indicators for discovery, for instance highlighting the interdisciplinary work of
US academics through network analysis, may be useful in this context.
Slides available at
www.hefce.ac.uk/media/hefce/content/news/Events/2015/HEFCE,metrics,workshop,Warwick/Donovan.pdf. Retrieved 1
4.6. Resolving our differences
It is clear that research across disciplines, and within them, is diverse in practice and output. Variation
in citation practices is the most obvious and striking example of this diversity, but the differences run
deeper, drawing in questions of method, debate, epistemology, value, quality and documentation. The
research system clearly displays a degree of complexity that is difficult to reduce to simple numbers,
but approaches that take account of local practice within disciplines and sub-disciplines may prevent
unhelpful or misleading comparisons being made between different types and modes of research and
encourage – and perhaps even nurture – diversity. These approaches would carry significant costs,
though, and may hamper interdisciplinary research and creative approaches more broadly. There are
no quick fixes here, but the greatest potential for recognising differences may come in the form of
‘baskets’ of indicators – qualitative and quantitative – that can capture the valuable aspects of research
practice, output and impact within all disciplines, however they are configured.