The Metric Tide
Report of the Independent Review of the Role of Metrics
in Research Assessment and Management
James Wilsdon, Liz Allen, Eleonora Belfiore, Philip Campbell, Stephen Curry,
Steven Hill, Richard Jones, Roger Kain, Simon Kerridge, Mike Thelwall, Jane Tinkler,
Ian Viney, Paul Wouters, Jude Hill, Ben Johnson.
Wilsdon, J., et al. (2015). The Metric Tide: Report of the Independent Review of the Role of
Metrics in Research Assessment and Management. DOI: 10.13140/RG.2.1.4929.1363
© HEFCE 2015, except where indicated.
Cover image © JL-Pfeifer / Shutterstock.com.
The parts of this work that are © HEFCE are available under the Open Government Licence 2.0.
Foreword
Acknowledgments
Steering group and secretariat
Executive summary
1. Measuring up
2. The rising tide
3. Rough indications
4. Disciplinary dilemmas
5. Judgement and peer review
6. Management by metrics
7. Cultures of counting
8. Sciences in transition
9. Reflections on REF
10. Responsible metrics
Annex of tables
List of abbreviations and glossary
The Metric Tide: Literature Review (Supplementary Report I to the Independent
Review of the Role of Metrics in Research Assessment and Management)
The Metric Tide: Correlation analysis of REF2014 scores and metrics
(Supplementary Report II to the Independent Review of the Role of Metrics in
Research Assessment and Management)
Foreword
Metrics evoke a mixed reaction from the research community.
A commitment to using data and evidence to inform decisions makes
many of us sympathetic, even enthusiastic, about the prospect of granular,
real-time analysis of our own activities. If we as a sector can’t take full
advantage of the possibilities of big data, then who can?
Yet we only have to look around us, at the blunt use of metrics such as
journal impact factors, h-indices and grant income targets, to be reminded
of the pitfalls. Some of the most precious qualities of academic culture
resist simple quantification, and individual indicators can struggle to do justice to the richness and
plurality of our research. Too often, poorly designed evaluation criteria are “dominating minds,
distorting behaviour and determining careers.” At their worst, metrics can contribute to what Rowan
Williams, the former Archbishop of Canterbury, calls a “new barbarity” in our universities. The
tragic case of Stefan Grimm, whose suicide in September 2014 led Imperial College to launch a
review of its use of performance metrics, is a jolting reminder that what's at stake in these debates is
more than just the design of effective management systems. Metrics hold real power: they are
constitutive of values, identities and livelihoods.
How to exercise that power to positive ends is the focus of this report. Based on fifteen months of
evidence-gathering, analysis and consultation, we propose here a framework for responsible metrics,
and make a series of targeted recommendations. Together these are designed to ensure that indicators
and underlying data infrastructure develop in ways that support the diverse qualities and impacts of
UK research. Looking to the future, we show how responsible metrics can be applied in research
management, by funders, and in the next cycle of the Research Excellence Framework.
The metric tide is certainly rising. Unlike King Canute, we have the agency and opportunity – and in
this report, a serious body of evidence – to influence how it washes through higher education and
research. Let me end on a note of personal thanks to my steering group colleagues, to the team at
HEFCE, and to all those across the community who have contributed to our deliberations.
James Wilsdon, Chair
Lawrence, P.A. (2007) ‘The mismeasurement of science’. Current Biology, Vol. 17, Issue 15, pp. R583–R585.
Annual Lecture to the Council for the Defence of British Universities, January 2015.
metrics/2019381.article. Retrieved 22 June 2015.
Acknowledgments
The steering group would like to extend its sincere thanks to the numerous organisations and
individuals who have informed the work of the review. Metrics can be a contentious topic, but the
expertise, insight, challenge and open engagement that so many across the higher education and
research community have brought to this process have made it both enjoyable and instructive.
Space unfortunately prevents us from mentioning everyone by name. But particular thanks go to David
Willetts for commissioning the review and provoking us at the outset to frame it more expansively,
and to his ministerial successors Greg Clark and Jo Johnson for the interest they have shown in its
progress and findings. Thanks also to Dr Carolyn Reeve at BIS for ensuring close government
engagement with the project.
The review would not have been possible without the outstanding support that we have received from
the research policy team at HEFCE at every stage of research, evidence-gathering and report drafting;
notably Jude Hill, Ben Johnson, Alex Herbert, Kate Turton, Tamsin Rott and Sophie Melton-Bradley.
Thanks also to David Sweeney at HEFCE for his advice and insights.
We are indebted to all those who responded to our call for evidence; attended, participated in and
spoke at our workshops and focus groups; and contributed to online discussions. Thanks also to those
organisations who hosted events linked to the review, including the Universities of Oxford, Sheffield,
Sussex, UCL and Warwick, the Higher Education Policy Institute and the Scottish Funding Council.
The review has hugely benefited from the quality and breadth of these contributions. Any errors or
omissions are entirely our own.
Steering group and secretariat
The review was chaired by James Wilsdon FAcSS, Professor of Science and Democracy at the
Science Policy Research Unit (SPRU), University of Sussex (orcid.org/0000-0002-5395-5949).
Professor Wilsdon was supported by an independent steering group with the following members:
Dr Liz Allen, Head of Evaluation, Wellcome Trust (orcid.org/0000-0002-9298-3168);
Dr Eleonora Belfiore, Associate Professor in Cultural Policy, Centre for Cultural Policy
Studies, University of Warwick (orcid.org/0000-0001-7825-4615; @elebelfiore);
Sir Philip Campbell, Editor-in-Chief, Nature (orcid.org/0000-0002-8917-1740);
Professor Stephen Curry, Department of Life Sciences, Imperial College London;
Dr Steven Hill, Head of Research Policy, HEFCE (orcid.org/0000-0003-1799-1915);
Professor Richard Jones FRS, Pro-Vice-Chancellor for Research and Innovation,
University of Sheffield (orcid.org/0000-0001-5400-6369; @RichardALJones)
(representing the Royal Society);
Professor Roger Kain FBA, Dean and Chief Executive, School of Advanced Study,
University of London (orcid.org/0000-0003-1971-7338; @kain_SAS) (representing the
British Academy);
Dr Simon Kerridge, Director of Research Services, University of Kent, and Chair of the
Board of the Association of Research Managers and Administrators (orcid.org/0000-
Professor Mike Thelwall, Statistical Cybermetrics Research Group, University of
Wolverhampton (orcid.org/0000-0001-6065-205X; @mikethelwall);
Jane Tinkler, Social Science Adviser, Parliamentary Office of Science and Technology;
Dr Ian Viney, MRC Director of Strategic Evaluation and Impact, Medical Research
Council head office, London (orcid.org/0000-0002-9943-4989, @MRCEval);
Paul Wouters, Professor of Scientometrics & Director, Centre for Science and
Technology Studies (CWTS), Leiden University (orcid.org/0000-0002-4324-5732).
The following members of HEFCE’s research policy team provided the secretariat for the steering
group and supported the review process throughout: Jude Hill, Ben Johnson, Alex Herbert, Kate
Turton, Tamsin Rott and Sophie Melton-Bradley. Hannah White and Mark Gittoes from HEFCE’s
Analytical Services Directorate also contributed, particularly to the REF2014 correlation exercise (see
Supplementary Report II). Vicky Jones from the REF team also provided advice.
Executive summary
This report presents the findings and recommendations of the Independent Review of the Role of
Metrics in Research Assessment and Management. The review was chaired by Professor James
Wilsdon, supported by an independent and multidisciplinary group of experts in scientometrics,
research funding, research policy, publishing, university management and administration.
Scope of the review
This review has gone beyond earlier studies to take a deeper look at potential uses and limitations of
research metrics and indicators. It has explored the use of metrics across different disciplines, and
assessed their potential contribution to the development of research excellence and impact. It has
analysed their role in processes of research assessment, including the next cycle of the Research
Excellence Framework (REF). It has considered the changing ways in which universities are using
quantitative indicators in their management systems, and the growing power of league tables and
rankings. And it has considered the negative or unintended effects of metrics on various aspects of
research culture.
Our report starts by tracing the history of metrics in research management and assessment, in the UK
and internationally. It looks at the applicability of metrics within different research cultures, compares
the peer review system with metric-based alternatives, and considers what balance might be struck
between the two. It charts the development of research management systems within institutions, and
examines the effects of the growing use of quantitative indicators on different aspects of research
culture, including performance management, equality, diversity, interdisciplinarity, and the ‘gaming’
of assessment systems. The review looks at how different funders are using quantitative indicators,
and considers their potential role in research and innovation policy. Finally, it examines the role that
metrics played in REF2014, and outlines scenarios for their contribution to future exercises.
The review has drawn on a diverse evidence base to develop its findings and conclusions. These
include: a formal call for evidence; a comprehensive review of the literature (Supplementary Report
I); and extensive consultation with stakeholders at focus groups, workshops, and via traditional and
new media.
The review has also drawn on HEFCE’s recent evaluations of REF2014, and commissioned its own
detailed analysis of the correlation between REF2014 scores and a basket of metrics (Supplementary
Report II).
There are powerful currents whipping up the metric tide. These include growing pressures for
audit and evaluation of public spending on higher education and research; demands by policymakers
for more strategic intelligence on research quality and impact; the need for institutions to manage and
develop their strategies for research; competition within and between institutions for prestige,
students, staff and resources; and increases in the availability of real-time ‘big data’ on research
uptake, and the capacity of tools for analysing them.
Across the research community, the description, production and consumption of ‘metrics’
remains contested and open to misunderstandings. In a positive sense, wider use of quantitative
indicators, and the emergence of alternative metrics for societal impact, could support the transition to
a more open, accountable and outward-facing research system. But placing too much emphasis on
narrow, poorly-designed indicators – such as journal impact factors (JIFs) – can have negative
consequences, as reflected by the 2013 San Francisco Declaration on Research Assessment (DORA),
which now has over 570 organisational and 12,300 individual signatories. Responses to this review
reflect these possibilities and pitfalls. The majority of those who submitted evidence, or engaged in
other ways, are sceptical about moves to increase the role of metrics in research management.
However, a significant minority are more supportive of the use of metrics, particularly if appropriate
care is exercised in their design and application, and the data infrastructure can be improved.
Peer review, despite its flaws and limitations, continues to command widespread support across
disciplines. Metrics should support, not supplant, expert judgement. Peer review is not perfect, but it
is the least worst form of academic governance we have, and should remain the primary basis for
assessing research papers, proposals and individuals, and for national assessment exercises like the
REF. However, carefully selected and applied quantitative indicators can be a useful complement to
other forms of evaluation and decision-making. One size is unlikely to fit all: a mature research
system needs a variable geometry of expert judgement, quantitative and qualitative indicators.
Research assessment needs to be undertaken with due regard for context and disciplinary diversity.
Academic quality is highly context-specific, and it is sensible to think in terms of research qualities,
rather than striving for a single definition or measure of quality.
Inappropriate indicators create perverse incentives. There is legitimate concern that some
quantitative indicators can be gamed, or can lead to unintended consequences; journal impact factors
and citation counts are two prominent examples. These consequences need to be identified,
acknowledged and addressed. Linked to this, there is a need for greater transparency in the
construction and use of indicators, particularly for university rankings and league tables. Those
involved in research assessment and management should behave responsibly, considering and
pre-empting negative consequences wherever possible, particularly in terms of equality and diversity.
The findings summarised here are presented in greater detail in Section 10.1 of the main report.
www.ascb.org/dora. As of July 2015, only three UK universities are DORA signatories: Manchester, Sussex and UCL.
Indicators can only meet their potential if they are underpinned by an open and interoperable
data infrastructure. How underlying data are collected and processed – and the extent to which they
remain open to interrogation – is crucial. Without the right identifiers, standards and semantics, we
risk developing metrics that are not contextually robust or properly understood. The systems used by
higher education institutions (HEIs), funders and publishers need to interoperate better, and
definitions of research-related concepts need to be harmonised. Information about research –
particularly about funding inputs – remains fragmented. Unique identifiers for individuals and
research works will gradually improve the robustness of metrics and reduce administrative burden.
At present, further use of quantitative indicators in research assessment and management
cannot be relied on to reduce costs or administrative burden. Unless existing processes, such
as peer review, are reduced as additional metrics are added, there will be an overall increase in
burden. However, as the underlying data infrastructure is improved and metrics become more
robust and trusted by the community, it is likely that the additional burden of collecting and
assessing metrics could be outweighed by the reduction of peer review effort in some areas –
and indeed by other uses for the data. Evidence of a robust relationship between newer metrics
and research quality remains very limited, and more experimentation is needed. Indicators
such as patent citations and clinical guideline citations may have potential in some fields for
quantifying impact and progression.
Our correlation analysis of the REF2014 results at output-by-author level (Supplementary
Report II) has shown that individual metrics give significantly different outcomes from the REF
peer review process, and therefore cannot provide a like-for-like replacement for REF peer
review. Publication year was a significant factor in the calculation of correlation with REF scores,
with all but two metrics showing significant decreases in correlation for more recent outputs. There is
large variation in the coverage of metrics across the REF submission, with particular issues with
coverage in units of assessment (UOAs) in REF Main Panel D (mainly arts & humanities). There is
also evidence to suggest statistically significant differences in the correlation with REF scores for
early-career researchers and women in a small number of UOAs.
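For readers unfamiliar with this kind of analysis, the sketch below illustrates how a rank correlation between peer review scores and a single metric might be computed. It is an illustration only, not the method of Supplementary Report II: the choice of Spearman's rank coefficient is an assumption, the function names are hypothetical, and the scores and citation counts are invented for the example.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented data: peer review scores (1 to 4) and citation counts
# for eight hypothetical outputs. Not REF2014 data.
peer_scores = [4, 3, 3, 2, 4, 1, 2, 3]
citations = [40, 12, 30, 5, 22, 0, 9, 3]
rho = spearman(peer_scores, citations)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient well below 1, as here, means that ordering outputs by the metric would not reproduce the ordering given by peer review, which is the sense in which a metric cannot be a like-for-like replacement.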
Within the REF, it is not currently feasible to assess the quality of UOAs using quantitative
indicators alone. In REF2014, while some indicators (citation counts, and supporting text to
highlight significance or quality in other ways) were supplied to some panels to help inform their
judgements, caution needs to be exercised when considering all disciplines with existing bibliographic
databases. Even if technical problems of coverage and bias can be overcome, no set of numbers,
however broad, is likely to be able to capture the multifaceted and nuanced judgements on the quality
of research outputs that the REF process currently provides.
Similarly, for the impact component of the REF, it is not currently feasible to use quantitative
indicators in place of narrative impact case studies, or the impact template. There is a danger that
the concept of impact might narrow and become too specifically defined by the ready availability of
indicators for some types of impact and not for others. For an exercise like the REF, where HEIs are
competing for funds, defining impact through quantitative indicators is likely to constrain thinking
around which impact stories have greatest currency and should be submitted, potentially constraining
the diversity of the UK’s research base. For the environment component of the REF, there is scope
to enhance the use of quantitative data in the next assessment cycle, provided they are used with
sufficient context to enable their interpretation.
There is a need for more research on research. The study of research systems – sometimes
called the ‘science of science policy’ – is poorly funded in the UK. The evidence to address the
questions that we have been exploring throughout this review remains too limited; but the questions
being asked by funders and HEIs – ‘What should we fund?’ ‘How best should we fund?’ ‘Who should
we hire/promote/invest in?’ – are far from new and can only become more pressing. More investment
is needed as part of a coordinated UK effort to improve the evidence base in this area. Linked to this,
there is potential for the scientometrics community to play a more strategic role in informing how
quantitative indicators are used across the research system, and by policymakers.
In recent years, the concept of ‘responsible research and innovation’ (RRI) has gained currency as a
framework for research governance. Building on this, we propose the notion of responsible metrics
as a way of framing appropriate uses of quantitative indicators in the governance, management and
assessment of research. Responsible metrics can be understood in terms of the following dimensions:
Robustness: basing metrics on the best possible data in terms of accuracy and scope;
Humility: recognising that quantitative evaluation should support – but not supplant
– qualitative, expert assessment;
Transparency: keeping data collection and analytical processes open and
transparent, so that those being evaluated can test and verify the results;
Diversity: accounting for variation by field, and using a range of indicators to reflect
and support a plurality of research and researcher career paths across the system;
Reflexivity: recognising and anticipating the systemic and potential effects of
indicators, and updating them in response.
This review has identified 20 specific recommendations for further work and action by stakeholders
across the UK research system. These draw on the evidence we have gathered, and should be seen as part
of broader attempts to strengthen research governance, management and assessment which have been
gathering momentum, and where the UK is well positioned to play a leading role internationally. The
recommendations are listed below, with targeted recipients in brackets:
Supporting the effective leadership, governance and management of research
1 The research community should develop a more sophisticated and nuanced approach to the
contribution and limitations of quantitative indicators. Greater care with language and terminology is
needed. The term ‘metrics’ is often unhelpful; the preferred term ‘indicators’ reflects a recognition that
data may lack specific relevance, even if they are useful overall. (HEIs, funders, managers, researchers)
2 At an institutional level, HEI leaders should develop a clear statement of principles on their
approach to research management and assessment, including the role of quantitative indicators. On
the basis of these principles, they should carefully select quantitative indicators that are appropriate to
their institutional aims and context. Where institutions are making use of league tables and ranking
measures, they should explain why they are using these as a means to achieve particular ends. Where
possible, alternative indicators that support equality and diversity should be identified and included. Clear
communication of the rationale for selecting particular indicators, and how they will be used as a
management tool, is paramount. As part of this process, HEIs should consider signing up to DORA, or
drawing on its principles and tailoring them to their institutional contexts. (Heads of institutions, heads of
research, HEI governors)
3 Research managers and administrators should champion these principles and the use of
responsible metrics within their institutions. They should pay due attention to the equality and
diversity implications of research assessment choices; engage with external experts such as those at the
Equality Challenge Unit; help to facilitate a more open and transparent data infrastructure; advocate the
use of unique identifiers such as ORCID iDs; work with funders and publishers on data interoperability;
explore indicators for aspects of research that they wish to assess rather than using existing indicators
because they are readily available; advise senior leaders on metrics that are meaningful for their
institutional or departmental context; and exchange best practice through sector bodies such as ARMA.
(Managers, research administrators, ARMA)
4 HR managers and recruitment or promotion panels in HEIs should be explicit about the criteria
used for academic appointment and promotion decisions. These criteria should be founded in expert
judgement and may reflect both the academic quality of outputs and wider contributions to policy,
industry or society. Judgements may sometimes usefully be guided by metrics, if they are relevant to the
criteria in question and used responsibly; article-level citation metrics, for instance, might be useful
indicators of academic impact, as long as they are interpreted in the light of disciplinary norms and with
due regard to their limitations. Journal-level metrics, such as the JIF, should not be used. (HR managers,
recruitment and promotion panels, UUK)
5 Individual researchers should be mindful of the limitations of particular indicators in the way they
present their own CVs and evaluate the work of colleagues. When standard indicators are inadequate,
individual researchers should look for a range of data sources to document and support claims about the
impact of their work. (All researchers)
6 Like HEIs, research funders should develop their own context-specific principles for the use of
quantitative indicators in research assessment and management and ensure that these are well
communicated, easy to locate and understand. They should pursue approaches to data collection that are
transparent, accessible, and allow for greater interoperability across a diversity of platforms. (UK HE
Funding Bodies, Research Councils, other research funders)
7 Data providers, analysts and producers of university rankings and league tables should strive for
greater transparency and interoperability between different measurement systems. Some, such as
the Times Higher Education (THE) university rankings, have taken commendable steps to be more open
about their choice of indicators and the weightings given to these, but other rankings remain ‘black-
boxed’. (Data providers, analysts and producers of university rankings and league tables)
8 Publishers should reduce emphasis on journal impact factors as a promotional tool, and only use
them in the context of a variety of journal-based metrics that provide a richer view of performance.
As suggested by DORA, this broader indicator set could include 5-year impact factor, Eigenfactor,
SCImago, editorial and publication times. Publishers, with the aid of the Committee on Publication Ethics
(COPE), should encourage responsible authorship practices and the provision of more detailed
information about the specific contributions of each author. Publishers should also make available a range
of article-level metrics to encourage a shift toward assessment based on the academic quality of an article
rather than JIFs. (Publishers)
Improving the data infrastructure that supports research information management
9 There is a need for greater transparency and openness in research data infrastructure. A set of
principles should be developed for technologies, practices and cultures that can support open,
trustworthy research information management. These principles should be adopted by funders, data
providers, administrators and researchers as a foundation for further work. (UK HE Funding Bodies,
RCUK, Jisc, data providers, managers, administrators)
10 The UK research system should take full advantage of ORCID as its preferred system of unique
identifiers. ORCID iDs should be mandatory for all researchers in the next REF. Funders and HEIs
should utilise ORCID for grant applications, management and reporting platforms, and the benefits of
ORCID need to be better communicated to researchers. (HEIs, UK HE Funding Bodies, funders,
managers, UUK, HESA)
11 Identifiers are also needed for institutions, and the most likely candidate for a global solution is the
ISNI, which already has good coverage of publishers, funders and research organisations. The use
of ISNIs should therefore be extended to cover all institutions referenced in future REF submissions, and
used more widely in internal HEI and funder management processes. One component of the solution will
be to map the various organisational identifier systems against ISNI to allow the various existing systems
to interoperate. (UK HE Funding Bodies, HEIs, funders, publishers, UUK, HESA)
12 Publishers should mandate ORCID iDs and ISNIs and funder grant references for article
submission, and retain this metadata throughout the publication lifecycle. This will facilitate
exchange of information on research activity, and help deliver data and metrics at minimal burden to
researchers and administrators. (Publishers and data providers)
13 The use of digital object identifiers (DOIs) should be extended to cover all research outputs. This
should include all outputs submitted to a future REF for which DOIs are suitable, and DOIs should also
be more widely adopted in internal HEI and research funder processes. DOIs already predominate in the
journal publishing sphere – they should be extended to cover other outputs where no identifier system
exists, such as book chapters and datasets. (UK HE Funding Bodies, HEIs, funders, UUK)
14 Further investment in research information infrastructure is required. Funders and Jisc should
explore opportunities for additional strategic investments, particularly to improve the interoperability of
research management systems. (HM Treasury, BIS, RCUK, UK HE Funding Bodies, Jisc, ARMA)
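The identifier recommendations above depend on identifiers being recorded accurately. As a small illustration, the final character of an ORCID iD is a check character calculated with the ISO 7064 MOD 11-2 algorithm, so systems that collect iDs can reject many mistyped values before they enter the data infrastructure. The sketch below is a minimal example under that assumption; the function names are hypothetical and it checks only the format and checksum, not whether the iD is actually registered.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape and checksum of an ORCID iD such as 0000-0001-6065-205X."""
    compact = orcid.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == compact[15]


# An iD listed in the steering group section of this report.
print(is_valid_orcid("0000-0001-6065-205X"))
# The same iD with its last character mistyped.
print(is_valid_orcid("0000-0001-6065-2050"))
```

A check of this kind catches transcription errors only; confirming that an iD belongs to the right person still requires authenticated collection from the researcher.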
Increasing the usefulness of existing data and information sources
15 HEFCE, funders, HEIs and Jisc should explore how to leverage data held in existing platforms to
support the REF process, and vice versa. Further debate is also required about the merits of local
collection within HEIs and data collection at the national level. (HEFCE, RCUK, HEIs, Jisc, HESA,
16 BIS should identify ways of linking data gathered from research-related platforms (including
Gateway to Research, Researchfish and the REF) more directly to policy processes in BIS and
other departments, especially around foresight, horizon scanning and research prioritisation. (BIS, other
government departments, UK HE Funding Bodies, RCUK)
Using metrics in the next REF
17 For the next REF cycle, we make some specific recommendations to HEFCE and the other HE
Funding Bodies, as follows. (UK HE Funding Bodies)
a. In assessing outputs, we recommend that quantitative data – particularly around published
outputs – continue to have a place in informing peer review judgements of research quality.
This approach has been used successfully in REF2014, and we recommend that it be continued and
enhanced in future exercises.
b. In assessing impact, we recommend that HEFCE and the UK HE Funding Bodies build on the
analysis of the impact case studies from REF2014 to develop clear guidelines for the use of
quantitative indicators in future impact case studies. While not being prescriptive, these
guidelines should provide suggested data to evidence specific types of impact. They should include
standards for the collection of metadata to ensure the characteristics of the research being described
are captured systematically; for example, by using consistent monetary units.
c. In assessing the research environment, we recommend that there is scope for enhancing the
use of quantitative data, but that these data need to be provided with sufficient context to
enable their interpretation. At a minimum this needs to include information on the total size of the
UOA to which the data refer. In some cases, the collection of data specifically relating to staff
submitted to the exercise may be preferable, albeit more costly. In addition, data on the structure and
use of digital information systems to support research (or research and teaching) may be crucial to
further develop excellent research environments.
Coordinating activity and building evidence
18 The UK research community needs a mechanism to carry forward the agenda set out in this report.
We propose the establishment of a Forum for Responsible Metrics, which would bring together
research funders, HEIs and their representative bodies, publishers, data providers and others to
work on issues of data standards, interoperability, openness and transparency. UK HE Funding
Bodies, UUK and Jisc should coordinate this forum, drawing in support and expertise from other funders
and sector bodies as appropriate. The forum should have preparations for the future REF within its remit,
but should also look more broadly at the use of metrics in HEI management and by other funders. This
forum might also seek to coordinate UK responses to the many initiatives in this area across Europe and
internationally – and those that may yet emerge – around research metrics, standards and data
infrastructure. It can ensure that the UK system stays ahead of the curve and continues to make real
progress on this issue, supporting research in the most intelligent and coordinated way, influencing
debates in Europe and the standards that other countries will eventually follow. (UK HE Funding Bodies,
UUK, Jisc, ARMA)
19 Research funders need to increase investment in the science of science policy. There is a need for
greater research and innovation in this area, to develop and apply insights from computing, statistics,
social science and economics to better understand the relationship between research, its qualities and
wider impacts. (Research funders)
20 One positive aspect of this review has been the debate it has generated. As a legacy initiative, the
steering group is setting up a blog (www.ResponsibleMetrics.org) as a forum for ongoing discussion
of the issues raised by this report. The site will celebrate responsible practices, but also name and
shame bad practices when they occur. Researchers will be encouraged to send in examples of good or bad
design and application of metrics across the research system. Adapting the approach taken by the Literary
Review’s “Bad Sex in Fiction” award, every year we will award a “Bad Metric” prize to the most
egregious example of an inappropriate use of quantitative indicators in research management. (Review
1. Measuring up
“The standing of British science, and the individuals and institutions that comprise
it, is rooted firmly in excellence… Much of the confidence in standards of
excellence promoted comes from decisions being informed by peer-review: leading
experts assessing the quality of proposals and work.”
Our Plan for Growth: science and innovation, HM Treasury/BIS, December 2014
“We have more top ranking universities in London than in any other city in the
world. With 4 universities in the global top 10, we rank second only to the US.”
Jo Johnson MP, Minister for Universities and Science, 1 June 2015
Citations, journal impact factors, h-indices, even tweets and Facebook likes – there is no end of
quantitative measures that can now be used to try to assess the quality and wider impacts of research.
But how robust and reliable are such metrics, and what weight – if any – should we give them in the
future management of research systems at the national or institutional level?
These are questions that have been explored over the past year by the Independent Review of the Role
of Metrics in Research Assessment. The review was announced by David Willetts, then Minister for
Universities and Science, in April 2014, and has been supported by the Higher Education Funding
Council for England (HEFCE).
As the 2014 BIS/HM Treasury science and innovation strategy reminds us, the UK has a remarkable
breadth of excellent research across the sciences, engineering, social sciences, arts and humanities.
These strengths are often expressed in metric shorthand: “with just 3% of global research spending,
0.9% of global population and 4.1% of the world’s researchers, the UK produces 9.5% of article
downloads, 11.6% of citations and 15.9% of the world’s most highly-cited articles”.
The quality and productivity of our research base is, at least in part, the result of smart management of
the dual-support system of research funding. Since the introduction of the Research Assessment
Exercise (RAE) in 1986, the UK has been through six cycles of evaluation and assessment, the latest
of which was the 2014 Research Excellence Framework (REF2014). Processes to ensure and improve
research quality, and more recently its wider impacts, are also used by the UK Research Councils, by
other funders such as the Wellcome Trust, and by universities themselves.
Speech to ‘Going Global’ 2015 conference https://www.gov.uk/government/speeches/international-higher-education
Elsevier. (2013). International Comparative Performance of the UK Research Base – 2013; A report prepared by Elsevier
for the UK’s Department of Business, Innovation and Skills (BIS), p2.
comparative-performance-of-the-UK-research-base-2013.pdf. Retrieved 1 May 2015.
The quality and diverse impacts of research have traditionally been assessed using a combination of
peer review and a variety of quantitative indicators. Peer review has long been the most widely used
method, and underpins the academic system in the UK and around the world. The use of metrics is a
newer approach, but has developed rapidly over the past 20 years as a potential method of measuring
research quality and impact in some fields. How best to do this remains the subject of considerable debate.
There are powerful currents whipping up the metric tide. These include growing pressures for audit
and evaluation of public spending on higher education and research; demands by policymakers for
more strategic intelligence on research quality and impact; the need for institutions to manage and
develop their strategies for research; competition within and between institutions for prestige,
students, staff and resources; and increases in the availability of real-time ‘big data’ on research
uptake, and the capacity of tools for analysing them.
In a positive sense, wider use of quantitative indicators, and the emergence of alternative metrics for
societal impact, can be seen as part of the transition to a more open, accountable and outward-facing
research system. But this has been accompanied by a backlash against the inappropriate weight being
placed on particular indicators – such as journal impact factors (JIFs) – within the research system, as
reflected by the 2013 San Francisco Declaration on Research Assessment (DORA), which now has
over 570 organisational and 12,300 individual signatories. As DORA argues, “The outputs from
scientific research are many and varied… Funding agencies, institutions that employ scientists, and
scientists themselves, all have a desire, and need, to assess the quality and impact of scientific outputs.
It is thus imperative that scientific output is measured accurately and evaluated wisely.”
1.1. Our terms of reference
Our work builds on an earlier pilot exercise in 2008 and 2009, which tested the potential for using
bibliometric indicators of research quality in REF2014. At that time, it was concluded that citation
information was insufficiently robust to be used formulaically or as a primary indicator of quality, but
that there might be scope for it to enhance processes of expert review.
Royal Society. (2012). Science as an Open Enterprise. The Royal Society Science Policy Centre report 02/12
https://royalsociety.org/~/media/policy/projects/sape/2012-06-20-saoe.pdf. Retrieved 1 June 2015.
www.ascb.org/dora. As of June 2015, only three UK universities are DORA signatories: Manchester, Sussex and UCL.
This review has gone beyond the earlier pilot study to take a deeper and broader look at the potential
uses and limitations of research metrics and indicators. It has explored the use of metrics across
different disciplines, and assessed their potential contribution to the development of research
excellence and impact within higher education. It has also analysed their role in processes of research
assessment, including the next cycle of the REF. And it has considered the changing ways in which
universities are using metrics, particularly the growing power of league tables and rankings. Finally, it
has considered the relationship between the use of indicators and issues of equality and diversity, and
the potential for ‘gaming’ that can arise from the use of particular indicators in systems of funding and evaluation.
To give structure and focus to our efforts, clear terms of reference were established at the outset. The
review was asked to examine:
The relative merits of different metrics in assessing the academic qualities and
diverse impacts of research;
The advantages and disadvantages of using metrics, compared with peer review, in
creating an environment that enables and encourages excellent research and diverse
impact, including fostering inter- and multidisciplinary research;
How metrics-based research assessment fits within the missions of universities and
research institutes, and the value that they place on published research outputs in
relation to the portfolio of other activities undertaken by their staff, including
training and education;
The appropriate balance between peer review and metrics in research assessment,
and the consequences of shifting that balance for administrative burden and research
cultures across different disciplines;
What is not, or cannot be, measured by quantitative metrics;
The differential impacts of metrics-based assessment on individual researchers,
including the implications for early-career researchers, equality and diversity;
Ethical considerations, and guidance on how to reduce the unintended effects and
inappropriate use of metrics and university league-tables, including the impact of
metrics-based assessment on research culture;
The extent to which metrics could be used in novel ways by higher education
institutions (HEIs) and research funders to support the assessment and management of research;
The potential contribution of metrics to other aspects of research assessment, such as
the matching of reviewers to proposals, or research portfolio analysis;
The use of metrics in broader aspects of government science, innovation and
Reflecting the evidence we received, this report focuses in greater depth on some aspects of these
terms of reference than others (notably, the use of metrics in the REF, by other funders and in HEI
management). However, we hope that the report provides a clear framework for thinking about the
broader role of metrics, data and indicators within research management, and lays helpful foundations
for further work to be carried out by HEFCE, the Research Councils and others.
The review has been conducted in an open and consultative manner, with the aim of drawing in
evidence, views and perspectives from across the higher education and research system. There has
been a strong emphasis on transparency and plurality throughout the project, and the make-up of the
review’s steering group itself reflects a diversity of disciplines and perspectives. In addition, the group
has engaged actively with stakeholders from across the research community through numerous
workshops, meetings, talks and other channels, including the review’s website and social media.
Papers from steering group meetings have been made publicly available at every stage, as have other
resources, including evidence received and slides presented at workshops.
1.2. Definitions and terminology
The research assessment landscape is contested, contentious and complex. Researchers, funders and
managers face an ever-expanding menu of indicators, metrics and assessment methods in operation,
many of which are explored in this review. Some are founded on peer review, others on quantitative
indicators such as citation counts, or measures of input, such as research funding or student numbers.
The term ‘metric’ is itself open to misunderstanding, because something can be a metric in one
context but not in another. For example, the number of citations received by a researcher’s
publications is a citation metric but not an impact metric because it does not directly measure the
impact of that researcher’s work. In other words, it can imply ‘measurement’ of a quantity or quality
which has not in fact been measured. The term indicator is preferable in contexts in which there is the
potential for confusion. To reduce the scope of possible misunderstanding, this report will adopt the
following definitions and terminology throughout.
All of this material is available at the review’s website: https://www.hefce.ac.uk/rsrch/metrics/
Indicator
A measurable quantity that ‘stands in’ or substitutes for something
less readily measurable and is presumed to associate with it without
directly measuring it. For example, citation counts could be used as
indicators for the scientific impact of journal articles even though
scientific impacts can occur in ways that do not generate citations.
Similarly, counts of online syllabi mentioning a particular book
might be used as an indicator of its educational impact.
Bibliometrics
Bibliometrics focuses on the quantitative analysis of scientific and
scholarly publications, including patents. Bibliometrics is part of the
field of scientometrics: the measurement of all aspects of science and
technology, which may encompass information about any kind of
research output (data, reagents, software, researcher interactions,
funding, research commercialisation, and other outputs).
The most widely exploited bibliometric relies on counts of citations.
Citation counts are sometimes used as an indicator of academic
impact in the sense that citations from other documents suggest that
the cited work has influenced the citing work in some way.
Bibliometric indicators might normalise these citation counts by
research field and by year, to take into account the very different
citation behaviours between disciplines and the increase in citations
over time. It has to be emphasised that as bibliometrics often do not
distinguish between negative or positive citation, highly cited
literature might attract attention due to controversy or even error.
High numbers of citations might also result from a range of different
contributions to a field, including papers that establish new
methodologies or systematically review the field, as well as primary
research.
Alternative or altmetrics
Altmetrics are non-traditional metrics that cover not just citation
counts but also downloads, social media shares and other measures of
impact of research outputs. The term is variously used to mean
‘alternative metrics’ or ‘article level metrics’, and it encompasses
webometrics, or cybermetrics, which measure the features and
relationships of online items, such as websites and log files. The rise
of new social media has created an additional stream of work under
the label altmetrics. These are indicators derived from social
websites, such as Twitter, Academia.edu, Mendeley, and
ResearchGate with data that can be gathered automatically by
computer programs.
Definitions adapted from Encyclopedia of Science Technology and Ethics, 2nd Edition (2014). Macmillan.
Peer review
A process of research assessment based on the use of expert
deliberation and judgement.
Academic or scholarly impact
Academic or scholarly impact is a recorded or otherwise auditable
occasion of influence from academic research on another researcher,
university organisation or academic author. Academic impacts are
most objectively demonstrated by citation indicators in those fields
that publish in international journals.
As for academic or scholarly impact, though where the effect or
influence reaches beyond scholarly research, e.g. on education,
society, culture or the economy.
Research has a societal impact when auditable or recorded influence
is achieved upon non-academic organisation(s) or actor(s) in a sector
outside the university sector itself – for instance, by being used by
one or more business corporations, government bodies, civil society
organisations, media or specialist/professional media organisations or
in public debate. As is the case with academic impacts, societal
impacts need to be demonstrated rather than assumed. Evidence of
external impacts can take the form of references to, citations of or
discussion of a person, their work or research results.
For the purposes of REF2014, impact was defined as an effect
on, change or benefit to the economy, society, culture, public policy
or services, health, the environment or quality of life, beyond
academia. REF2014 impact included, but was not limited to, an effect
on, change or benefit to:
the activity, attitude, awareness, behaviour, capacity,
opportunity, performance, policy, practice, process or
understanding
of an audience, beneficiary, community, constituency,
organisation or individuals
in any geographic location whether locally, regionally, nationally or
internationally.
Adapted from: Council of Canadian Academies. (2012). Informing Research Choices: Indicators and Judgment, p11.
ance/scienceperformance_fullreport_en_web.pdf. Retrieved 6 December 2014.
Taken from LSE Public Policy Group (2011) Maximising the Impacts of Your Research: A Handbook for Social Scientists.
London: PPG. http://blogs.lse.ac.uk/impactofsocialsciences/the-handbook/.
REF 02. 2011. Assessment framework and guidance on submissions, p26, para 141.
df. Retrieved 2 April 2015.
Within REF2014, the research environment was assessed in terms of
its ‘vitality and sustainability’, including its contribution to the
vitality and sustainability of the wider discipline or research base.
Within REF2014, panels assessed the quality of submitted research
outputs in terms of their ‘originality, significance and rigour’, with
reference to international research quality standards.
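The normalisation by research field and by year mentioned under citation indicators above can be illustrated with a minimal sketch. It assumes one simple approach, dividing each paper's citation count by the mean count for papers in the same field and year; the paper records and field names below are invented for illustration and are not data from this review.

```python
from collections import defaultdict
from statistics import mean


def normalised_citation_scores(papers):
    """Divide each paper's citations by the mean for its field and year."""
    groups = defaultdict(list)
    for paper in papers:
        groups[(paper["field"], paper["year"])].append(paper["citations"])
    baselines = {key: mean(counts) for key, counts in groups.items()}

    scores = {}
    for paper in papers:
        baseline = baselines[(paper["field"], paper["year"])]
        # A field-year group with no citations at all has no usable baseline.
        scores[paper["id"]] = paper["citations"] / baseline if baseline > 0 else None
    return scores


# Hypothetical records: two fields with very different citation cultures.
papers = [
    {"id": "A", "field": "cell biology", "year": 2010, "citations": 40},
    {"id": "B", "field": "cell biology", "year": 2010, "citations": 20},
    {"id": "C", "field": "history", "year": 2010, "citations": 4},
    {"id": "D", "field": "history", "year": 2010, "citations": 2},
]
scores = normalised_citation_scores(papers)
```

In this invented example, papers A and C receive the same normalised score although their raw counts differ tenfold, which is the effect the normalisation is meant to achieve. It does nothing to address the other caveats noted above, such as citations that arise from controversy or error.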
1.3. Data collection and analysis
The review drew on an extensive range of evidence sources, including:
A formal call for evidence
A call for evidence was launched on 1 May 2014, with a response deadline of 30 June 2014. The
steering group appealed for evidence from a wide range of sources, including written summaries or
published research. Respondents were asked to focus on four key themes and associated questions, as
follows:
A Identifying useful metrics for research assessment.
B How metrics should be used in research assessment.
C ‘Gaming’ and strategic use of metrics.
D International perspective.
Ibid, p23, para 118, notes that permitted ‘types’ of outputs included: Books (or parts of books); Journal articles and
conference contributions; Physical artefacts; Exhibitions and performances; Other documents; Digital artefacts (including
web content); Other.
The call for evidence letter is available at:
In total, 153 responses were received to the call for evidence: 67 from HEIs, 42 from individuals, 27
from learned societies, 11 from publishers and data providers, three from HE mission groups, and
three from other respondents. An analysis of the evidence received can be found at
A literature review
Two members of the steering group, Paul Wouters and Michael Thelwall, researched and wrote a
comprehensive literature review to inform the review’s work. The findings of the literature review
have been incorporated into this report at appropriate points, and the full review is available as
Supplementary Report I.
Community and stakeholder engagement
The review team engaged actively with stakeholders across the higher education and research
community. These activities included a series of six workshops, organised by the steering group, on
specific aspects of the review, such as the role of metrics within the arts and humanities, and links to
equality and diversity. Members of the steering group also gave talks and presentations about the
work of the review at around 30 conferences, roundtables and workshops. Findings and insights from
these events have been incorporated into the report wherever appropriate. A full itinerary of events
linked to the review can be found in the annex of tables at the end of this report (Table 2).
Media and social media
Over the course of the review, the steering group sought to encourage wider discussion of these issues
in the sector press (particularly Times Higher Education and Research Fortnight) and through social
media. There was extensive use of the #HEFCEmetrics hashtag on Twitter. Members of the steering
group, including Stephen Curry, also wrote blog posts on issues relating to the review, and a number
of other blog posts and articles were written in response to the review.
Wouters, P., et al. (2015). Literature Review: Supplementary Report to the Independent Review of the Role of Metrics in
Research Assessment and Management. HEFCE. DOI: 10.13140/RG.2.1.5066.3520.
Curry, S. (2014). Debating the role of metrics in research assessment. Blog posted at
http://occamstypewriter.org/scurry/2014/10/07/debating-the-role-of-metrics-in-research-assessment/. Retrieved 1 June 2015.
Numerous blog posts, including contributions from steering group members, have been featured at
http://blogs.lse.ac.uk/impactofsocialsciences/2014/04/03/reading-list-for-hefcemetrics/. Retrieved 1 June 2015. We have
referred to some of these posts within this report. Others discussing the review through blog posts include: David
Retrieved 1 June 2015. Also see contributors to: http://thedisorderofthings.com/tag/metrics/. Retrieved 1 June 2015.
Focus groups with REF2014 panel members
The steering group participated in a series of focus group sessions for REF2014 panel members,
organised by HEFCE, to allow panellists to reflect on their experience, and wider strengths and
weaknesses of the exercise. Specific sessions explored the pros and cons of any uses of metrics within
REF2014, and their potential role in future assessment exercises.
Where relevant, the steering group also engaged with and analysed findings from HEFCE’s portfolio
of REF2014 evaluation projects, including:
The nature, scale and beneficiaries of research impact: an initial analysis of
REF2014 case studies;
Preparing impact submissions for REF2014;
Assessing impact submissions for REF2014;
Evaluating the 2014 REF: Feedback from participating institutions;
REF Manager’s report;
REF panel overview reports;
REF Accountability Review: costs, benefits and burden project report.
King’s College London and Digital Science. (2015). The nature, scale and beneficiaries of research impact: An initial
analysis of Research Excellence Framework (REF ) 2014 impact case studies.
www.hefce.ac.uk/pubs/rereports/Year/2015/analysisREFimpact/. Retrieved 1 June 2015.
Manville, C., Morgan Jones, M., Frearson, M., Castle-Clarke, S., Henham, M., Gunashekar, S. and Grant, J. (2015).
Preparing impact submissions for REF 2014: Findings and Observations. Santa Monica, Calif.: RAND Corporation. RR-
Manville, C., Guthrie, S., Henham, M., Garrod, B., Sousa, S., Kirtley, A., Castle-Clarke, S. and Ling, T. (2015). Assessing
impact submissions for REF2014: An Evaluation. Santa Monica, Calif.: RAND Corporation.
HEFCE. (2015). Evaluating the 2014 REF: Feedback from Participating Institutions.
HEFCE. (2015). Research Excellence Framework 2014: Manager’s report.
www.ref.ac.uk/media/ref/content/pub/REF_managers_report.pdf. Retrieved 25 May 2015
HEFCE’s Panel overview reports can be downloaded from www.ref.ac.uk/panels/paneloverviewreports/
Relating REF2014 outcomes to indicators
A final element of our evidence gathering was designed to assess the extent to which the outcome of
the REF2014 assessment correlated with 15 metrics-based indicators of research performance. For the
first time, we were able to link anonymised REF authors, through their paper outputs, to a selection of metric
indicators, including ten bibliometric indicators and five alternative metric indicators. Previous
research in this area has been restricted to specific subject areas and departmental level metrics, as the
detailed level of data required for this analysis was destroyed before publication of the REF2014
results. This work is summarised in Chapter 9, and presented in detail in Supplementary Report II.
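As a rough illustration of what correlating assessment outcomes with an indicator involves, the sketch below computes a Spearman rank correlation between a set of peer-review scores and citation counts. Rank correlation is assumed here as one common choice for ordinal scores; the method actually used is described in Supplementary Report II, not in this chapter, and the scores and counts below are invented.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: star ratings for five outputs and their citation counts.
peer_scores = [4, 3, 3, 2, 1]
citation_counts = [50, 12, 30, 8, 0]
rho = spearman(peer_scores, citation_counts)
```

A coefficient near 1 would mean that the indicator orders outputs much as the panel did, while a value near 0 would mean the indicator carries little information about the panel's ranking. Neither result says anything about why the two agree or differ.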
1.4. The structure of this report
This opening chapter has provided a summary of the aims and working methods of the review, and
the range of evidence sources on which this final report draws.
Chapter 2 (The rising tide) gives a brief history of the role of metrics in research management, and
the evolution of data infrastructure and standards to underpin more complex and varied uses of
quantitative indicators. It also surveys the main features of research assessment systems in a handful
of countries: Australia, Denmark, Italy, the Netherlands, New Zealand and the United States.
Chapter 3 (Rough indications) looks in greater detail at the development, uses and occasional abuses
of four categories of quantitative indicators: bibliometric indicators of research quality; alternative
indicators of quality; input indicators; and indicators of impact.
Chapter 4 (Disciplinary dilemmas) maps the diversity in types of research output, publication
practices and citation cultures across different disciplines, and the implications these have for any
attempts to develop standardised indicators across the entire research base. It also considers the extent
to which quantitative indicators can be used to support or suppress multi- or interdisciplinary research.
Chapter 5 (Judgement and peer review) compares the strengths and weaknesses of the peer review
system with metric-based alternatives, and asks how we strike an appropriate balance between
quantitative indicators and expert judgement.
Chapter 6 (Management by metrics) charts the rise of more formal systems of research management
within HEIs, and the growing significance that is being placed on quantitative indicators, both within
institutions and as a way of benchmarking performance against others. It looks specifically at
university rankings and league tables as a visible manifestation of these trends, and considers how
these might be applied in more responsible ways across the sector.
HEFCE. (2015). Correlation analysis of REF2014 scores and metrics: Supplementary Report II to the Independent Review
of the Role of Metrics in Research Assessment and Management. HEFCE. DOI: 10.13140/RG.2.1.3362.4162.
Chapter 7 (Cultures of counting) assesses the wider effects a heightened emphasis on quantitative
indicators may have on cultures and practices of research, including concerns over systems for
performance management, and negative effects on interdisciplinarity, equality and diversity. It also
considers the extent to which metrics exacerbate problems of gaming and strategic approaches to
Chapter 8 (Sciences in transition) looks beyond HEIs to examine changes in the way key institutions
in the wider research funding system are using quantitative indicators, including the Research
Councils, research charities such as the Wellcome Trust, and the national academies. It also looks to
developments at the European level, within Horizon2020. Finally, it considers how government could
make greater use of available quantitative data sources to inform horizon scanning and policies for
research and innovation.
Chapter 9 (Reflections on REF) provides a detailed analysis of the modest role that quantitative
indicators played in REF2014, and considers a range of scenarios for their use in future assessment
exercises. It also outlines the results of our own quantitative analysis, which correlated the actual
outcomes of REF2014 against 15 metrics-based indicators of research performance.
Finally, Chapter 10 (Responsible metrics) summarises our headline findings, and makes a set of
targeted recommendations to HEIs, research funders (including HEFCE), publishers and data
providers, government and the wider research community. Within a framework of responsible
metrics, the report concludes with clear guidance on how quantitative indicators can be used
intelligently and appropriately to further strengthen the quality and impacts of UK research.
2. The rising tide
“The institutionalization of the citation is the culmination of a decades-long process
starting with the creation of the Science Citation Index. The impact of this
emergence of a new social institution in science and scholarship is often
“A timid, bureaucratic spirit has come to suffuse every aspect of intellectual life.
More often than not, it comes cloaked in the language of creativity, initiative and
The quantitative analysis of scientific papers and scholarly articles has been evolving since the early
20th century. Lotka’s Law, dating back to 1926, first highlighted that within a defined area over a
specific period, a small number of authors accounted for a large percentage of publications. From this
point, the field of scientometrics developed rapidly, especially after the creation of the Science
Citation Index (SCI), and over time we have seen a proliferation of quantitative indicators for
research. This chapter provides a brief history of the use of metrics in research management and
assessment, focusing on bibliometrics, alternative metrics and the role of data providers and data
infrastructure. We then offer a brief outline of research assessment approaches from six countries.
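The concentration that Lotka's Law describes can be sketched numerically. In its usual inverse-square form, the number of authors with n publications is taken to be proportional to 1/n^2. The sketch below assumes that exponent and an arbitrary cut-off of 50 papers per author, both chosen for illustration and not fitted to any dataset.

```python
def lotka_author_shares(max_papers, exponent=2.0):
    """Share of authors with exactly n papers, n = 1..max_papers, under Lotka's Law."""
    weights = [1.0 / n ** exponent for n in range(1, max_papers + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def concentration(shares, threshold):
    """Share of authors with at least `threshold` papers, and their share of all papers."""
    # An author with n papers contributes n papers, so weight each share by n.
    paper_weights = [n * share for n, share in enumerate(shares, start=1)]
    total_papers = sum(paper_weights)
    author_share = sum(shares[threshold - 1:])
    paper_share = sum(paper_weights[threshold - 1:]) / total_papers
    return author_share, paper_share


# Assumed cut-off: no author has more than 50 papers in the period.
shares = lotka_author_shares(max_papers=50)
top_authors, top_papers = concentration(shares, threshold=10)
```

Under these assumptions, roughly one author in twenty has ten or more papers, yet that group accounts for more than a third of all publications. The exact figures shift with the exponent and the cut-off, so they are indicative only.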
The SCI was created in 1961 by Eugene Garfield. Initially, it was mainly used by scientometric
experts, rather than by the wider research community. In this early stage of scientometrics, data were
generally used to describe the development and direction of scientific research, rather than to evaluate
Wouters, P. (2014). The Citation: From Culture to Infrastructure. In Cronin, B. and Sugimoto, C. R. (eds.) Beyond
Bibliometrics: Harnessing Multidimensional Indicators of Scholarly Impact. MIT Press.
Graeber, D. (2015) The Utopia of Rules: On Technology, Stupidity, and the Secret Joys of Bureaucracy. London: Melville
Elsevier (2007). Scientometrics from Past to Present. Research Trends, 1, September 2007.
www.researchtrends.com/issue1-september-2007/sciomentrics-from-past-to-present/. Retrieved 1 March 2015.
“Scientometric research [is] the quantitative mathematical study of science and technology, encompassing both
bibliometric and economic analysis.” Ibid.
Garfield founded the Institute for Scientific Information (ISI), which is now part of Thomson Reuters.
In the 1980s, new approaches to public management, particularly in the UK and US, led to a growing
emphasis on measurable indicators of the value of research. The 1990s gave rise to increasingly
strategic forms of research policy and management, accompanied by greater use of bibliometric
indicators, including JIF scores. These were developed in 1955 by Eugene Garfield, and became
available through Journal Citation Reports from 1975, but were used quite infrequently initially, and
have only seen a real explosion in usage since the 1990s.
Citation analysis has been much more readily available since 2001, when the Web of Science (WoS)
became easily accessible to all, followed by Scopus in 2003 and Google Scholar (GS) in 2004. J.E.
Hirsch invented the Hirsch or h-index in 2005, and this led to a surge of interest in individual level
metrics.
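For readers unfamiliar with the h-index, a minimal sketch of its calculation follows. It uses Hirsch's definition: a researcher has index h if h of their papers have each received at least h citations. The citation lists are invented examples.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


# Two hypothetical researchers with five papers each.
steady = h_index([10, 8, 5, 4, 3])
one_hit = h_index([25, 8, 5, 3, 3])
```

The two invented records have similar total citations but different h-indices, because the index rewards a sustained body of cited work over a single highly cited paper. That design choice is one reason the measure is sensitive to career length and to field-specific publication practices.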
2.2. Alternative metrics
From the mid-1990s, as advances in information technology created new ways for researchers to
network, write and publish, interest grew in novel indicators better suited to electronic communication
and to capturing impacts of different kinds.
These alternative metrics include web citations in digitised scholarly documents (e.g. eprints, books,
science blogs or clinical guidelines) and, more recently, altmetrics derived from social media (e.g.
social bookmarks, comments, ratings and tweets). Scholars may also produce and use non-refereed
academic outputs, such as blog posts, datasets and software, where usage-based indicators are still in
the early stages of development. Significant developments in this area include the establishment of
F1000Prime in 2002, Mendeley in 2008 and Altmetric.com in 2011.
2.3. Approaches to evaluation
Research assessment has traditionally focused on input and output indicators, evaluating academic
impact through bibliometric measures such as citation counts. However, there is now far greater focus
on the wider impacts, outcomes and benefits of research, as reflected in exercises such as REF2014.
The measurement of societal impact, with robust indicators and accurate, comparable data, is still in
its relative infancy.
Garfield, E. (2006). The history and meaning of the journal impact factor. Journal of the American Medical Association,
295 (1), 90-93.
Ingwersen, P. (1998). The calculation of Web impact factors. Journal of Documentation, 54 (2), 236-243; Borgman, C.,
and Furner, J. (2002). Scholarly communication and bibliometrics. Annual Review of Information Science and Technology,
36. Medford, NJ: Information Today Inc., pp. 3-72; Priem, J., Taraborelli, D., Groth, P. and Neylon, C. (2010). Altmetrics:
A manifesto, 26 October 2010. http://altmetrics.org/manifesto. Retrieved 1 June 2015.
Neither research quality nor its impacts are straightforward concepts to pin down or assess. Differing
views on what they are, and how they can be measured, lie at the heart of debates over research
assessment. In this report, we take research quality to include all scholarly impacts. But what
constitutes quality remains contested. As PLOS noted in its submission to this review, “it is unclear
whether any unique quality of research influence or impact is sufficiently general to be measured”.
In the context of research evaluation, quality typically denotes the overall calibre of research based on
the values, criteria or standards inherent in an academic community. However, those values and
standards are highly dependent on context: for instance, views vary enormously across and indeed
within certain disciplines, as a result of different research cultures, practices and philosophical
approaches. It is more productive to think in terms of research qualities, rather than striving for a
2.4. Data providers
As scientometrics has developed, and evaluation systems have become more sophisticated, so the
range of data providers and analysts has grown. Those now engaged with the production of
quantitative data and indicators include government agencies at the international, national and local
level, HEIs, research groups, and a wide range of commercial data providers, publishers and
Funding agencies in the US, France, UK and the Netherlands were pioneers in using bibliometrics for
research evaluation and monitoring, and the Organisation for Economic Co-operation and
Development (OECD) set global standards for national science and technology indicators in its
Today, leading universities around the world have adopted, or are in the process of developing,
comprehensive research information systems in which statistical and qualitative evidence of
Halevi, G. and Colledge, L. (2014). Standardizing research metrics and indicators- perspectives and approaches. Research
Trends. 39, December 2014. www.researchtrends.com/issue-39-december-2014/standardizing-research-metrics-and-
indicators/. Retrieved 4 January 2015.
Council of Canadian Academies. (2012), p43.
Whitley, R. (2010). Reconfiguring the public sciences: the impact of governance changes on authority and innovation in
public science systems, in Reconfiguring knowledge production: changing authority relationships in the sciences and their
consequences for intellectual innovation, edited by R. Whitley et al. Oxford, Oxford University Press.
performance in research, teaching, impact and other services can be recorded.
benchmarking tools such as SciVal and InCites, management systems such as PURE and Converis,
and data consultancy from companies such as Academic Analytics, iFQ, Sciencemetrix and CWTS.
Assisted by reference linking services like CrossRef, these enable users to link sophisticated
bibliometric and other indicator-based analyses with their information infrastructure at all levels, to
monitor institutional, departmental and individual performance. Research funders, such as RCUK, are
also adopting new systems like Researchfish, which gather new information about research progress,
while other funders are using systems such as UberResearch which aggregate existing information
and add value to it.
2.5. Data infrastructure
Systems for data collection and analysis have developed organically and proliferated over the past
decade. In response to this review, many HEIs noted the burden associated with populating and
updating multiple systems, and the need for more uniform standards and identifiers that could work
across all of them. Others raised concerns that underpinning systems may become overly controlled
by private providers, whose long-term interests may not align with those of the wider research
Underpinning infrastructure has to be fit for the purpose of producing robust and trustworthy
Wherever possible, data systems also need to be open and transparent
principles for ‘open’ scholarly infrastructures.
To produce indicators that can be shared across
platforms, there are a number of prerequisites: unique identifiers; defined data standards; agreed data
semantics; and open data processing methods. These are discussed in turn below. In addition, the
infrastructure must be able to present the relevant suites of indicators to optimise forms of assessment
DINI AG Research Information Systems (2015) Research information systems at universities and research
institutions - Position Paper of DINI AG FIS. https://zenodo.org/record/17491/files/DINI_AG-
FIS_Position_Paper_english.pdf. Retrieved 1 July 2015.
Jacso, P. (2006). Deflated, inflated and phantom citation counts. Online Information Review. 30 (3), 297-309; Abramo, G.
and D’Angelo, C. A. (2011). Evaluating research: from informed peer review to bibliometrics. Scientometrics. 87, 499–514.
Bilder, G., Lin, J. and Neylon, C. (2015). Principles for Open Scholarly Infrastructure-v1,
http://cameronneylon.net/blog/principles-for-open-scholarly-infrastructures/. Retrieved 1 June 2015.
Royal Society. (2012). Science as an Open Enterprise. The Royal Society Science Policy Centre report 02/12
https://royalsociety.org/~/media/policy/projects/sape/2012-06-20-saoe.pdf. Retrieved 1 June 2015.
that are sensitive to specific research missions and context. They should not ‘black-box’ particular
indicators or present them as relevant for all fields and purposes.
Some key players in research information
Converis (owned by Thomson Reuters) is an integrated research information system. It provides support for
universities, other research institutions and funding offices in collecting and managing data through the research
CrossRef is a collaborative reference linking service that functions as a sort of digital switchboard. Its specific
mandate is to be the citation linking backbone for all scholarly information in electronic form. It holds no full text
content, but effects linkages through CrossRef Digital Object Identifiers (CrossRef DOI), which are tagged to article
metadata supplied by the participating publishers. www.crossref.org/
Elements (owned by Symplectic) is designed to gather research information to reduce the administrative burden
placed on researchers, and to support research organisation librarians and administrators. http://symplectic.co.uk/
InCites (owned by Thomson Reuters) is a customised, web-based research evaluation tool that allows users to
analyse institutional productivity and benchmark output against peers worldwide, through access to customised citation
data, global metrics, and profiles on leading research institutions. http://researchanalytics.thomsonreuters.com/incites/
PURE (owned by Elsevier) is a research information system. It accesses and aggregates internal and external sources,
and offers analysis, reporting and benchmarking functions. www.elsevier.com/solutions/pure
Researchfish is an online database of outputs reported by researchers linked to awards, now widely used by UK
funding agencies and being taken up by funders in Denmark and Canada. It aims to provide a structured approach to
prospectively capturing outputs and outcomes from as soon as funding starts, potentially to long after awards have
finished. The information is used by funders to track the progress, productivity and quality of funded research, and as a
way of finding examples of impact. https://www.researchfish.com/
SciVal (owned by Elsevier) provides information on the research performance of research institutions across the globe.
This can be used for analysis and benchmarking of performance. www.elsevier.com/solutions/scival
UberResearch provides services aimed at science funders including information tools based on natural language
2.5.1. Unique identifiers
In order for an indicator to be reliable, it is important to be able to collect as much as possible of the
underlying data that the indicator purports to represent. For example, if we consider citations to
academic outputs, it is clear that the main databases do not include all possible citations, and that
numbers of citations within them can vary. As PLOS noted in its response to our call for evidence,
‘there are no adequate sources of bibliometric data that are publicly accessible, useable, auditable and
In order to correctly count the number of citations that an article has, all other articles must be
checked to see if they cite the article in question. This can be achieved through manual processes, but
is subject to error. With unique identifiers for articles, the process can be automated (reducing sources
of error to original mis-citation by the author).
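The automated counting described above can be sketched as follows. This is a minimal illustration, assuming each article record carries its own DOI and the DOIs of the works it cites; the record layout, the function names and the sample DOIs are hypothetical and are not taken from any of the databases discussed in this report. DOIs are compared in lower case because they are case-insensitive.

```python
from collections import Counter


def normalise_doi(doi: str) -> str:
    """DOIs are case-insensitive, so compare them in lower case."""
    return doi.strip().lower()


def count_citations(articles):
    """Return a Counter mapping each cited DOI to its number of citing articles.

    `articles` is an iterable of dicts with two hypothetical keys:
    "doi" (the article's own DOI) and "references" (the DOIs it cites).
    """
    counts = Counter()
    for article in articles:
        own = normalise_doi(article["doi"])
        # A set removes duplicate references within one citing article.
        cited = {normalise_doi(ref) for ref in article["references"]}
        cited.discard(own)  # ignore a record that lists itself
        counts.update(cited)
    return counts


# Invented sample records; these are not real DOIs.
articles = [
    {"doi": "10.1234/a", "references": ["10.1234/B", "10.1234/c"]},
    {"doi": "10.1234/b", "references": ["10.1234/c"]},
    {"doi": "10.1234/c", "references": []},
]
citations = count_citations(articles)
print(citations["10.1234/c"])
```

The count is only as complete as the set of records supplied: a citing article that is missing from the input, or that lacks a DOI for its reference, is not counted.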
The most commonly used identifier is the Digital Object Identifier (DOI). While still not universal,
DOIs have gained considerable traction across the sector. For instance, of the 191,080 outputs
submitted to REF2014, 149,670 (about 78%) were submitted with DOIs (see Supplementary Report II,
Table 1). Use of DOIs varies by discipline, and is still less common in the arts and humanities than in
DOIs in themselves are not sufficient for robust metrics. As well as article identifiers, a robust
management and evaluation system needs unique identifiers for journals, publishers, authors and
institutions. This would enable answers to more sophisticated questions, such as: How many articles
has a particular author produced with citations above the average for the journal in question?
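The question above can be answered mechanically once every record carries article, journal and author identifiers. The sketch below is illustrative only: the record layout, the function name and all identifier values are placeholders, and the journal average is computed from the sample records alone rather than from a full citation database.

```python
from collections import defaultdict
from statistics import mean


def articles_above_journal_average(articles, author_id):
    """Return DOIs of the author's articles cited more than their journal's mean."""
    citations_by_journal = defaultdict(list)
    for article in articles:
        citations_by_journal[article["issn"]].append(article["citations"])
    journal_mean = {
        issn: mean(counts) for issn, counts in citations_by_journal.items()
    }
    return [
        article["doi"]
        for article in articles
        if author_id in article["authors"]
        and article["citations"] > journal_mean[article["issn"]]
    ]


# Invented records: "issn" and "authors" hold placeholder strings standing in
# for real ISSNs and ORCID iDs.
articles = [
    {"doi": "10.1234/a", "issn": "journal-1", "authors": ["author-A"], "citations": 10},
    {"doi": "10.1234/b", "issn": "journal-1", "authors": ["author-B"], "citations": 2},
    {"doi": "10.1234/c", "issn": "journal-1", "authors": ["author-A"], "citations": 3},
    {"doi": "10.1234/d", "issn": "journal-2", "authors": ["author-A", "author-B"], "citations": 4},
    {"doi": "10.1234/e", "issn": "journal-2", "authors": ["author-B"], "citations": 8},
]
print(articles_above_journal_average(articles, "author-A"))
```

Without a unique author identifier, the membership test on `authors` would have to fall back on name matching, which is where the ambiguity discussed in this section arises.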
Journals have, in general, adopted the International Standard Serial Number (ISSN).
However, a small proportion still have not. Journals which appear in more than one
format (e.g. print and online) will have an ISSN for each media type, but one is the master (ISSN-L),
to which the other ISSNs link.
Publisher and institutional identifiers are more problematic. There are various options for uniquely
identifying organisations. One 2013 study found 22 organisational identifiers currently in use in the
higher education sector in the UK.
But while none of these is wholly authoritative, both the
Hammond, M. and Curtis, G. (2013). Landscape study for CASRAI-UK Organisational ID. http://casrai.org/423
International Standard Name Identifier (ISNI) and UK Provider Reference Number (UKPRN) have gained
traction. The former is international, and the latter is more UK-centric and does not include funders;
so it would seem that ISNI is the preferred route for developing an authoritative list of publishers.
Author identifiers are particularly important, because a scholar’s contributions to the scientific
literature can be hard to recognise: personal names are rarely unique, can change (e.g. through
marriage), and may differ culturally in name order or abbreviation. Several types of author
identifiers exist, and a detailed analysis of the pros and cons of these was undertaken in 2012 by
The ORCID system is widely regarded as the best, and uptake of ORCID is now growing
rapidly in the UK and internationally. The same analysis recommended that the UK adopt ORCID,
and many of the key players in the UK research system endorsed this proposal in a joint statement in
A similar initiative in the US funded by the Alfred P. Sloan Foundation highlighted
the importance of advocacy and improved data quality.
A recent Jisc-ARMA initiative has
JISC Researcher Identifier Task and Finish Group. (2012). Researcher Identifier Recommendations – Sector Validation.
www.serohe.co.uk/wp-content/uploads/2013/10/Clax-for-JISC-rID-validation-report-final.pdf. Retrieved 1 June 2015.
Signatories to this joint statement include ARMA, HEFCE, HESA, RCUK, UCISA, Wellcome Trust and Jisc.
Brown, J., Oyler, C. and Haak, L. (2015). Final Report: Sloan ORCID Adoption and Integration Program 2013-2014.
http://dx.doi.org/10.6084/m9.figshare.1290632. Retrieved 25 May 2015.
ORCID (Open Researcher and Contributor ID)
ORCID is a non-proprietary alphanumeric code to uniquely identify academic authors. Its stated aim is to aid "the
transition from science to e-Science, wherein scholarly publications can be mined to spot links and ideas hidden in the
ever-growing volume of scholarly literature". ORCID provides a persistent identity for humans, similar to that created for
content-related entities on digital networks by DOIs.
ORCID launched its registry services and started issuing user identifiers on 16 October 2012. It is now an independent
non-profit organisation, and is freely usable and fully interoperable with other ID systems. ORCID is also a subset of the
International Standard Name Identifier (ISNI). The two organisations are cooperating: ISNI has reserved a block of
identifiers for use by ORCID, so it is now possible for an individual to have both an ISNI and an ORCID.
By the end of 2013 ORCID had 111 member organisations and over 460,000 registrants. As of 1 June 2015, the number
of registered accounts reported by ORCID was 1,370,195. Its organisational members include publishers, such as
Elsevier, Springer, Wiley and Nature Publishing Group, funders, learned societies and universities.
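As an aside that is not drawn from this report: the final character of a 16-character ORCID iD is a check character computed with the ISO 7064 MOD 11-2 algorithm, so a system can reject most mistyped iDs before attempting a lookup. The following is a minimal sketch with illustrative function names; it checks the format and checksum only and says nothing about whether an iD is actually registered. The sample iD is the one ORCID publishes for its fictitious test researcher.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits and check character of a hyphenated ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    if any(ch not in "0123456789" for ch in base):
        return False
    return orcid_check_character(base) == check


print(is_valid_orcid("0000-0002-1825-0097"))
```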
successfully piloted the adoption of ORCID in a number of UK HEIs, and an agreement negotiated
by Jisc Collections will enable UK HEIs to benefit from reduced ORCID membership costs and
enhanced technical support. UK uptake will also be driven by the Wellcome Trust’s decision to
make ORCID iDs a mandatory requirement for funding applications from August 2015, and by the
strong support shown by Research Councils UK. ORCID also recently announced an agreement with
ANVUR (National Agency for the Evaluation of University and Research Institutes) and CRUI
(Conference of Italian University Rectors) to implement ORCID on a national scale in Italy.
For outputs other than journal articles, ISBNs (International Standard Book Numbers) for books are
analogous to ISSNs for journals. A longstanding issue here is that different editions (e.g. hardback
and paperback) have different ISBNs, but retailers such as Amazon have made progress in
disambiguating this information.
Funder references are important unique identifiers for contracts between research-performing and
research-funding organisations. Most funders require this information to be included in
acknowledgement sections within manuscripts submitted for publication. However, despite efforts to
encourage standard forms for this acknowledgement, there is a need for authoritative sources for
funder names (as with institutional names above), and for authenticating the funding references
(although Europe PubMed Central provides a post-publication grant lookup tool populated by those
agencies that fund it).
Increasingly, other forms of output, such as datasets and conference proceedings, are issued with
DOIs, or DOIs can be obtained retrospectively, for example through platforms such as ResearchGate.
Similarly DOIs can also resolve to ISBNs.
http://orcidpilot.jiscinvolve.org/wp/. ORCID is also discussed in Anstey, A. (2014). How can we be certain who authors
really are? Why ORCID is important to the British Journal of Dermatology. British Journal of Dermatology. 171 (4), 679-
680. DOI 10.1111/bjd.13381. Also Butler, D. (2012) Scientists: your number is up. Nature, 485, 564, DOI:
Retrieved 28 June 2015.
https://orcid.org/blog/2015/06/19/italy-launches-national-orcid-implementation. Retrieved 28 June 2015.
www.rin.ac.uk/our-work/research-funding-policy-and-guidance/acknowledgement-funders-journal-articles Retrieved 1
http://europepmc.org/GrantLookup/. Retrieved 1 June 2015.
Other systems of unique identifiers have been proposed to support the sharing of research equipment
and to improve the citation of research resources.
2.5.2. Defined data standards
Once unique and disambiguated identifiers for objects in the research information arena have been
agreed, the next issue is how to represent them and their associated metadata. Various standards for
data structure and metadata have been proposed over time. Across Europe, one standard for research
information management, the Common European Research Information Format (CERIF), has been
adopted. In 1991 the European Commission recommended CERIF to the member states, and in 2002
handed stewardship of the standard to euroCRIS.
There have been a number of iterations since
In 2009, Jisc commissioned a report, Exchanging Research Information in the UK,
the use of CERIF as the UK standard for research information exchange. This was followed by
several Jisc-funded initiatives and a further report, Adoption of CERIF in Higher Education
Institutions in the UK, which noted progress but a lack of UK expertise. The majority of off-the-shelf
research information management systems used in UK HEIs today are CERIF-compliant and able to
exchange data in the agreed format. To date the CERIF standard covers around 300 entities and 2000
attributes, including: people, organisations (and sub units), projects, publications, products,
equipment, funders, programmes, locations, events and prizes, although fully describing research
qualities in this way is an ongoing task.
For example, see the N8 Shared Equipment Inventory System www.n8equipment.org.uk/. Retrieved 1 June 2015.
Bandrowski, A., Brush, M., Grethe, J.S. et al. The Resource Identification Initiative: A cultural shift in publishing [v1; ref
status: awaiting peer review, http://f1000r.es/5fj] F1000Research 2015, 4:134 (DOI:10.12688/f1000research.6555.1).
Retrieved 1 June 2015.
EuroCRIS is a not-for-profit association with offices in The Hague, The Netherlands, that brings together experts on
research information in general and research information systems (CRIS) in particular. The organisation has 200+ members,
mainly coming from Europe, but also from some countries outside of Europe. www.eurocris.org/
Rogers, N., Huxley, L. and Ferguson, N. (2009). Exchanging Research Information in the UK.
http://repository.jisc.ac.uk/448/1/exri_final_v2.pdf. Retrieved 1 June 2015.
Russell, R. (2011). Research Information Management in the UK: Current initiatives using CERIF.
www.ukoln.ac.uk/rim/dissemination/2011/rim-cerif-uk.pdf. Retrieved 1 June 2015.
Russell, R. (2012). Adoption of CERIF in Higher Education Institutions in the UK: A landscape study.
www.ukoln.ac.uk/isc/reports/cerif-landscape-study-2012/CERIF-UK-landscape-report-v1.0.pdf. Retrieved 1 June 2015.
2.5.3. Agreed data semantics
An agreed approach to the semantics of data elements is
required to ensure that everyone interprets data in the
same way. One example is the titles used for academic
staff. In the UK, it might be possible to agree on a
standard scale of lecturer, senior lecturer, reader and
professor, but this does not translate to other countries
where other titles like ‘associate professor’ are
commonly used and ‘readers’ are unknown. Clearly the
context is important to the semantics. In order to
compare research items from different databases, we
need to have a standard vocabulary that we can match
to, ideally at the international level, or else on a country
basis. The Consortia Advancing Standards in Research
Administration Information (CASRAI) is an
international non-profit organisation that constructs such dictionaries, working closely with other
2.5.4. More than pure semantics
Once all these elements are in place, it is possible to build
robust indicators and metrics. But here again, agreed
definitions are key. Take the example of proposal success
rates. If an institution has submitted ten proposals for
funding and three have been funded, it may claim to have a
30% success rate. This indicator could be benchmarked
against other institutions. However, if two of those
proposals were yet to be reviewed, a three in eight or 37.5%
success rate could also be claimed. Alternatively, the
success rate might be calculated based on the financial value
of applications and awards rather than the number
submitted, each definition producing potentially different
‘success rates’ from the same data.
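The effect of the competing definitions can be made concrete with a short sketch. The ten proposals below mirror the example in the text (three funded, two awaiting review); the five rejections, the monetary values, the field names and the function name are all invented for illustration.

```python
def success_rates(proposals):
    """Compute three alternative 'success rates' from the same proposal data."""
    funded = [p for p in proposals if p["status"] == "funded"]
    decided = [p for p in proposals if p["status"] != "pending"]
    return {
        # Funded proposals as a share of everything submitted.
        "by_count_all_submitted": len(funded) / len(proposals),
        # Funded proposals as a share of those already reviewed.
        "by_count_decided_only": len(funded) / len(decided),
        # Value awarded as a share of the value requested in reviewed proposals.
        "by_value_decided_only": sum(p["awarded"] for p in funded)
        / sum(p["requested"] for p in decided),
    }


# Hypothetical data: amounts are invented and carry no currency.
proposals = (
    [{"status": "funded", "requested": r, "awarded": r} for r in (50_000, 50_000, 100_000)]
    + [{"status": "rejected", "requested": 200_000, "awarded": 0} for _ in range(5)]
    + [{"status": "pending", "requested": 100_000, "awarded": 0} for _ in range(2)]
)
rates = success_rates(proposals)
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

The first two figures reproduce the 30% and 37.5% rates from the text; with these invented amounts the value-based rate is lower again, so a benchmark is only meaningful if every institution uses the same definition.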
Kerridge, S. (2015). Questions of identity. Research Fortnight. 27 May 2015.
https://www.researchprofessional.com/0/rr/news/uk/views-of-the-uk/2015/5/Questions-of-identity.html. Retrieved 1 June
The Consortia Advancing Standards in Research
Administration Information (CASRAI) is an
international non-profit organisation dedicated to
reducing the administrative burden on
researchers and improving business intelligence
capacity of research institutions and funders.
CASRAI works by partnering with funders,
universities, suppliers and sector bodies to define
a dictionary and catalogue of exchangeable
business ‘data profiles’. These create an
interoperable ‘drawbridge’ between collaborating
organisations and individuals. http://casrai.org/
Snowball Metrics is a bottom-up academia-
industry initiative. The universities involved aim
to agree on methodologies that are robustly and
clearly defined, so that the metrics they describe
enable the confident comparison of apples with
apples. These metrics (described by recipes) are
data source- and system-agnostic, meaning that
they are not tied to any particular provider of data
or tools. The resulting benchmarks between
research-intensive universities provide reliable
information to help understand research strengths,
and thus to establish and monitor institutional
The semantics of any metrics must also be clear and transparent. Progress in this area has been made
by the UK-led Snowball Metrics consortium, which has specified 24 metrics ‘recipes’ to date, in areas
such as publications and citations, research grants, collaboration, and societal impact. Snowball is also
gaining some traction in the USA and Australia.
2.6. International perspectives
Although this review has focused on the UK, we have taken a keen interest in how other countries
approach these issues. At several of our workshops and steering group meetings, we heard
presentations and considered questions from international perspectives. A handful of the responses
to our call for evidence came from overseas, and our schedule of stakeholder events included
meetings or presentations in Paris, Melbourne, Barcelona and Doha (see Table 2 in the annex).
Dialogue, learning and exchange across different systems are important, and any moves that the UK
makes in respect of greater use of metrics are likely to be watched closely. The UK system continues
to attract the attention of research leaders, managers and policymakers worldwide – particularly since
the introduction of the impact element for REF2014. Here we offer a brief outline of some of the
striking features of research assessment in a handful of other countries – Australia, Denmark, Italy,
the Netherlands, New Zealand and the United States – chosen to reflect the diversity of systems in
2.6.1. Australia
The Australian Research Council administers Excellence in Research for Australia (ERA), which
aims to identify and promote excellence in research across Australian HEIs. There is no funding
attached to its outcomes. The first full round of ERA (in 2010-11) was the first time a nationwide
For relevant discussion, see US Research Universities Futures Consortium. (2013). The current state and
recommendations for meaningful academic research metrics among American research universities.
Retrieved 1 March 2015.
For example, Clare Donovan presented insights from her research in Australia and elsewhere at our Arts and Humanities
workshop hosted by Warwick University;
www.hefce.ac.uk/media/hefce/content/news/Events/2015/HEFCE,metrics,workshop,Warwick/Donovan.pdf. Donovan also
contributed to one of the Review group’s early steering group meetings. Academic Analytics, who presented at our
workshops in Sheffield and Sussex, discussed their approach and use of data in US and UK contexts.
invited to our Sussex workshop, operate at the global level, these being Academic Analytics, Altmetric, PLOS, Snowball
Metrics, Elsevier and The Conversation, Plum Analytics and Thomson Reuters.
See relevant discussion on internationalising the REF
www.researchresearch.com/index.php?option=com_news&template=rr_2col&view=article&articleId=1342955. Retrieved 1
stocktake of disciplinary strengths had been conducted in Australia. Data submitted by 41 HEIs
covered all eligible researchers and their research outputs.
ERA is based upon the principle of expert review informed by citation-based analysis, with the
precise mix depending on discipline; citations are used for most science, engineering and medical
disciplines, and peer review for others. It aims to be “a dynamic and flexible research assessment
system that combines the objectivity of multiple quantitative indicators with the holistic assessment
provided by expert review….”
evaluations were informed by four broad categories of indicators:
Of research quality: publishing profile, citation analysis, ERA peer review and peer
reviewed research income;
Of research volume and activity: total research outputs, research income and other
items within the profile of eligible researchers;
Of research application: commercialisation income and other applied measures;
Of recognition: based on a range of esteem measures.
Evaluation of the data submitted was undertaken by eight evaluation committees, representing
different disciplinary clusters. The next ERA round will take place in 2015.
2.6.2. Denmark
Danish public university funding is allocated according to four parameters: education based on study
credits earned by the institution (45%); research activities measured by external funding (20%);
research activities measured by the ‘BFI’, a metrics-based evaluation system (25%); and number of
PhD graduates (10%). The current system was gradually implemented from 2010 to 2012 following
agreement in 2009 to follow a new model. It is primarily a distribution model, based on the Danish
Agency for Science, Technology and Innovation’s count of peer reviewed research publications. The
goal was to allocate an increasing proportion of the available research funding according to the
outcomes of the national research assessment exercise. Given the methodology employed, the BFI has
www.arc.gov.au/era/faq.htm#Q6. Retrieved 1 June 2015.
Submission guidelines are provided at the following. Australian Research Council (2014) ERA 2015 Submission
Guidelines. www.arc.gov.au/pdf/ERA15/ERA%202015%20Submission%20Guidelines.pdf. These include changes to the
process since 2012, outlined on pp7-9.
Veterager Pedersen, C. (2010). The Danish bibliometric research indicator- BFI: Research publications, research
assessment, university funding. ScieCom Info. 4, 1-4.
been described as a primarily quantitative distribution system, as opposed to a quality measurement
Due to the limitations of existing publications databases (see Chapter 3), the Danish government
decided to create its own. This enables the BFI to be defined by Danish researchers, with 67 expert
groups of academics involved in selecting items for inclusion in two authority lists, one of series
(journals, book series or conference series) and one of publishers. These are then ranked each year by
the panels, and the ranking is used as the basis of a points system for researchers.
The scoring system includes monographs, articles in series and anthologies, doctoral theses and
patents. Peer review is a prerequisite for inclusion on an authoritative list. These lists determine
which publishers and journals are recognised as worth publishing in, and at which level of
recognition: Level 1 or Level 2. Level 2 channels generate more points. These lists effectively
decide which publication channels contain serious research. All eligible research outputs can be
attributed BFI-points as they are entered into the system. Different weights are applied for different
sorts of output and publication channel, so the system aims to assess performance and not just volume
2.6.3. Italy
In 2013, Italy’s National Agency for the Evaluation of the University and Research Systems
(ANVUR) completed its largest ever evaluation initiative, known as the ‘eValuation of the Quality of
Research’ (VQR), across 95 universities, 12 public research bodies and 16 voluntary organisations.
The aim was to construct a national ranking of universities and institutes, based on key indicators,
including: research outcomes obtained from 2004 to 2010; ability to attract funding; number of
international collaborations; patents registered; spin-offs; and other third-party activities.
The results of the VQR are being used by the education and research ministry to award €540 million
in ‘prize funds’ from the government’s university budget. The process included the evaluation of
approximately 195,000 publications, using a hybrid approach of two methodologies:
Bibliometric analysis: based on the impact factor (IF) of the journal
number of citations received in a year, divided by articles published;
A useful analysis of the VQR is provided by Abramo, G. and D’Angelo, C.A. (2015). The VQR, Italy’s Second National
Research Assessment: Methodological Failures and Ranking Distortions. Journal of the Association for Information, Science
For those indexed in Web of Science, or the SCImago Journal Rank for those indexed in Scopus.
Peer review: assigned to around 14,000 external reviewers, more than 4,000 of
whom were from outside Italy.
Bibliometric analysis was used in the natural sciences and engineering; whereas for social sciences
and humanities (Panels 10-14), only peer review was used. The overall evaluation of institutions was
based on a weighted sum of various indicators: 50% for the quality of the research products submitted
(for faculty members, the maximum number of products was three); and the remaining 50% based on
a composite score from six indicators. These are: capacity to attract resources (10%); mobility of
research staff (10%); internationalisation (10%); PhD programmes (10%); ability to attract research
funds (5%); and overall improvement from the last VQR (5%). ANVUR used 14 panels to undertake
evaluations, divided by disciplinary area.
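The weighted sum described above can be sketched as follows. The weights are those stated in the text; everything else is an assumption made for illustration, in particular that each indicator has already been normalised to a score between 0 and 1, which the text does not specify. The indicator keys, the function name and the example scores are hypothetical.

```python
# Weights in percentage points, as listed in the text.
WEIGHTS_PERCENT = {
    "quality_of_research_products": 50,
    "capacity_to_attract_resources": 10,
    "mobility_of_research_staff": 10,
    "internationalisation": 10,
    "phd_programmes": 10,
    "ability_to_attract_research_funds": 5,
    "improvement_since_last_vqr": 5,
}
assert sum(WEIGHTS_PERCENT.values()) == 100


def vqr_score(indicator_scores):
    """Weighted sum of indicator scores, each assumed to lie between 0 and 1."""
    missing = set(WEIGHTS_PERCENT) - set(indicator_scores)
    if missing:
        raise ValueError(f"missing indicators: {sorted(missing)}")
    weighted = sum(
        weight * indicator_scores[name] for name, weight in WEIGHTS_PERCENT.items()
    )
    return weighted / 100


# Invented scores for one hypothetical institution.
example = {name: 0.5 for name in WEIGHTS_PERCENT}
example["quality_of_research_products"] = 0.8
print(round(vqr_score(example), 2))
```

Because quality carries half of the total weight, a change in that one score moves the overall result as much as an equal change across all six remaining indicators combined.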
2.6.4. Netherlands
Since 2003, it has been the responsibility of individual Dutch university boards and faculties to
organise research assessment on a six-yearly cycle, in line with a ‘Standard Evaluation Protocol’.
Assessments are made by expert committees, which may use qualitative and quantitative
indicators to score research groups or programmes on a scale. The distribution of government research
funds is not explicitly linked to this assessment process.
From 2015 onwards, the assessment involves three criteria: quality, societal relevance, and viability.
Productivity was previously a criterion, but has now been removed as a goal in itself (and subsumed
under the quality criterion) to put less emphasis on the number of publications and more on their
quality. The review also looks at the quality of PhD training, and management of research integrity
(including how an institution has dealt with any cases of research misconduct). The research unit’s
own strategy and targets are guiding principles for the evaluation. In addition, the evaluation should
provide feedback to the evaluated research institutes and groups on their research agendas for the near future.
The Standard Evaluation Protocol (SEP) was jointly developed by the Royal Netherlands Academy of Arts and Sciences
(KNAW), The Association of Universities in the Netherlands (VSNU) and the Netherlands Organisation for Scientific
Research (NWO). The goal of the SEP is to provide common guidelines for the evaluation and improvement of research and
research policy to be used by university boards, institutes and the expert evaluation committees.
Key Perspectives Ltd. (2009). A Comparative Review of Research Assessment Regimes in Five Countries and the Role of
Libraries in the Research Assessment Process: Report Commissioned by OCLC Research.
2.6.5. New Zealand
New Zealand’s evaluation system is known as the ‘Performance-Based Research Fund’ (PBRF), and
is used to assess the performance of all Tertiary Education Organisations (TEOs). Its four objectives
are: to increase the quality of basic and applied research at degree-granting TEOs; to support world-
leading teaching and learning at degree and postgraduate levels; to assist TEOs to maintain and lift
their competitive rankings relative to their international peers; and to provide robust public
information to stakeholders about research performance within and across TEOs.
The PBRF is carried out every six years, most recently in 2012, when 27 institutions participated
(eight universities, ten institutes of technology and polytechnics, one wānanga, and eight private
training establishments). The amount of funding that a participating institution receives is based on
three elements: quality evaluation (55%); research degree completions (25%); and external research income (20%).
The quality element of the process rests on the submission and evaluation of evidence portfolios.
Twelve specialist peer-review panels assess and evaluate these portfolios with additional advice from
expert advisory groups and specialists as needed.
The PBRF is unusual in that it takes the individual
(rather than the department or school) as the unit of assessment, so provides very detailed
performance information that can inform strategic planning and resource allocation within
institutions. It does not systematically measure research impacts outside academia.
2.6.6. United States
The US does not have a centralised national assessment system for its universities and research
institutes; however, in recent years, it has actively supported projects including STAR METRICS
(Science and Technology for America’s Reinvestment: Measuring the Effect of Research on Innovation,
Competitiveness and Science). This was launched in 2010 and is led by the National Institutes of
Health (NIH), the National Science Foundation (NSF), and the Office of Science and Technology
Policy (OSTP). It aims to create a repository of data and tools to help assess the impact of federal
www.tec.govt.nz/Funding/Fund-finder/Performance-Based-Research-Fund-PBRF-/. Retrieved 30 March 2015.
Details of the 2012 exercises can be downloaded from www.tec.govt.nz/Funding/Fund-finder/Performance-Based-
In the New Zealand education system, a wānanga is a publicly-owned tertiary education organisation that provides
education in a Māori cultural context.
PBRF Quality evaluation guidance 2012 is provided at www.tec.govt.nz/Documents/Publications/PBRF-Quality-
STAR METRICS focuses on two different levels:
Level I: Developing uniform, auditable and standardised measures of the impact of
science spending on job creation, using data from research institutions’ existing
Level II: Developing measures of the impact of federal science investment on
scientific knowledge (using metrics such as publications and citations), social
outcomes (e.g. health outcomes measures and environmental impact factors),
workforce outcomes (e.g. student mobility and employment), and economic growth
(e.g. tracing patents, new company start-ups and other measures). This is achieved
through the Federal RePORTER tool, thus developing an open and automated data
infrastructure that will enable the documentation and analysis of a subset of the
inputs, outputs, and outcomes resulting from federal investments in science.
The STAR METRICS project involves a broad consortium of federal R&D funding agencies with a
shared vision of developing data infrastructures and products to support evidence-based analyses of
the impact of research investment.
It aims to utilise existing administrative data from federal
agencies and their grantees, and match them with existing research databases of economic, scientific
and social outcomes. It has recently been announced that from 2016 onwards resources will be
redirected away from STAR METRICS data scraping to focus on the RePORTER tool, which has
similarities to the UK Gateway to Research approach.
2.7. Adding it up
As these snapshots reveal, the ways that metrics and indicators are conceived and used vary by
country, often significantly. The nature of the assessment approach, and the choice, use and relative
importance of particular indicators, reflect particular policies, and usually involve compromises
around fairness across disciplines, robustness, administrative and/or cost burdens and sector buy-in.
STAR METRICS will be discontinuing Level I activities as of 1 January 2016.
But not all funders are involved, e.g. the National Endowment for the Humanities.
Two recent studies provide further discussion of how national approaches differ:
A 2012 report by the Council of Canadian Academies looks at the systems used in
10 different countries.
It emphasises the importance of national research context in
defining a given research assessment, underlining that no single set of indicators for
assessment will be ideal in all circumstances. The report also highlights a global
trend towards national research assessment models that incorporate both quantitative
indicators and expert judgment.
A 2014 study by Technopolis examined 12 EU member states and Norway. It
includes a comparative consideration of systems using performance-based research
funding (PRF systems). The report shows that the Czech Republic is the only country
that limits the indicators used to the output of research (even though it is the PRF
system that covers research and innovation-related outputs in the most detailed and
comprehensive manner). In a second group of countries – Denmark, Finland,
Norway (PRI), Belgium/FL (BOF), Norway (HEI) and Sweden – the PRFs include
both output and systemic indicators (in Denmark, Finland and Norway this includes
indicators related to innovation-oriented activities). Only a few countries also
examine research impacts: Italy, UK (REF), France (AERES) and Belgium/FL
(IOF). While the PRFs in France and Belgium focus on impacts in the spheres of
research and innovation, Italy and the UK also consider societal impacts.
It is valuable to learn from the approaches being used by different countries, particularly as research
and data infrastructure are increasingly global. However, context is also crucial to good assessment,
and there will be elements that are specific to the design, operation and objectives of the UK system.
Overall though, we are likely to see greater harmonisation of approaches, certainly across EU member
states. Recent initiatives, such as the 2014 ‘Science 2.0’ White Paper from the European Commission,
point towards a more integrated architecture for research funding, communication, dissemination and
impact. The UK has been at the forefront of these debates since the 1980s, and over that same period
its research system, across many indicators, has grown significantly in strength. Ensuring that the UK
is positioned well for the next wave of change in how the research system operates – in terms of data
Council of Canadian Academies (2012) work included analysis of research assessment systems employed in ten countries
including Australia, China, Finland, Germany, the Netherlands, Norway, Singapore, South Korea, USA and the UK.
Technopolis. (2014). Measuring scientific performance for improved policy making.
Published for the European Parliamentary Research Service. This examined Norway, Sweden, the UK, Spain, France,
Belgium/FL, Italy, Czech Republic, Denmark, the Netherlands, Slovakia, Austria and Finland. A third report published in
2010, by the expert group on assessment in university-based research (AUBR), provided case studies of 16 different
countries, which again represent a breadth of approaches and objectives: http://ec.europa.eu/research/science-
infrastructure, standards and systems of assessment – is a vital part of our overall leadership in
research. Moves by HEFCE to explore the potentially increased internationalisation of research
assessment are to be welcomed, although such steps are not without strategic and operational
challenges. Proceeding cautiously, in an exploratory way, seems an appropriate approach.
3. Rough indications
“ ‘The answer to the Great Question of Life, the Universe and Everything is… forty-
two’, said Deep Thought, with infinite majesty and calm.”
Douglas Adams, The Hitchhiker’s Guide to the Galaxy
Having charted the development of research metrics and indicators and their usage internationally,
this chapter turns the focus on their application. It looks in detail at the current development, uses and
occasional abuses of four broad categories of indicator: bibliometric indicators of quality (3.1);
alternative indicators (3.2); input indicators (3.3); and indicators of impact (3.4).
3.1. Bibliometric indicators of quality
The most common approaches to measuring research quality involve bibliometric methods, notably
weighted publication counts; and citation-based indicators, such as the JIF or h-index. As the
Council of Canadian Academies report states: “Bibliometric indicators are the paradigmatic
quantitative indicators with respect to measurement of scientific research.”
This section gives a brief overview of the technical possibilities of bibliometric indicators. Many
points raised here are addressed in greater detail in our literature review (Supplementary Report I),
reflecting the breadth of existing literature on citation impact indicators, the use of scientometric
indicators in research evaluation, and in measuring the performance of individual researchers.
Several considerations need to be borne in mind when working with bibliometric analyses, including:
differences between academic subjects/disciplines; coverage of sources within databases; the selection
of the appropriate unit of analysis for the indicator in question; the question of credit allocation where
outputs may include multiple authors; and accounting for self-citations.
3.1.1. Bibliographic databases
The three most important multidisciplinary bibliographic databases are Web of Science, Scopus, and
Google Scholar. Scopus has a broader coverage of the scholarly literature than Web of Science. Some
studies report that journals covered by Scopus but not by Web of Science tend to have a low citation
impact and tend to be more nationally oriented, suggesting that the most important international
Council of Canadian Academies. (2012), pp53-54.
Vinkler, P. (2010). The evaluation of research by scientometric indicators. Oxford, Chandos Publishing.
Wildgaard, L., Schneider, J. W. and Larsen, B. (2014). A review of the characteristics of 108 author-level bibliometric
indicators. Scientometrics. 101 (1), 125-158.
academic journals are usually covered by both databases. Certain disciplines, especially the Social
Sciences and Humanities (SSH), create special challenges for bibliometric analyses. Google Scholar
is generally found to outperform both Web of Science and Scopus in terms of its coverage of the
literature. However, there are a few fields, mainly in the natural sciences, in which some studies have
reported the coverage of Google Scholar to be worse than the coverage of Web of Science and
Scopus. On the other hand, the coverage of Google Scholar has been improving over time, so it is not
clear the same still applies today.
3.1.2. Basic citation impact indicators
A large number of citation impact indicators have been proposed in the literature. Most of these
indicators can be seen as variants or extensions of a limited set of basic indicators: the total and the
average number of citations of the publications of a research unit (e.g. of an individual researcher, a
research group, or a research institution); the number and the proportion of highly cited publications
of a research unit; and a research unit’s h-index. There is criticism in the literature of the use of
indicators based on total or average citation counts. Citation distributions tend to be highly skewed,
and therefore the total or the average number of citations of a set of publications may be strongly
influenced by one or a few highly cited publications (‘outliers’). This is often considered undesirable.
Indicators based on the idea of counting highly cited publications are suggested as a more robust
alternative to indicators based on total or average citation counts.
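The effect of skewed citation distributions on these basic indicators can be illustrated with a short Python sketch. The two research units, their citation counts and the threshold of 10 citations for a ‘highly cited’ publication are invented for illustration; in practice a highly cited threshold would be derived from a field-specific reference set.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def basic_indicators(citations, highly_cited_threshold):
    """Total, mean, highly cited count and share, and h-index for one unit."""
    n = len(citations)
    total = sum(citations)
    highly = sum(1 for c in citations if c >= highly_cited_threshold)
    return {
        "total": total,
        "mean": total / n if n else 0.0,
        "highly_cited": highly,
        "highly_cited_share": highly / n if n else 0.0,
        "h_index": h_index(citations),
    }


# Hypothetical citation counts for two units with ten publications each.
unit_a = [3, 4, 2, 5, 3, 4, 3, 4, 2, 3]      # evenly cited
unit_b = [0, 1, 0, 2, 1, 0, 1, 0, 1, 300]    # dominated by one outlier

indicators_a = basic_indicators(unit_a, highly_cited_threshold=10)
indicators_b = basic_indicators(unit_b, highly_cited_threshold=10)
```

In this invented example unit B has a far higher total and mean than unit A purely because of a single publication, while its h-index is lower (2 against 4) and only one of its ten publications passes the threshold.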
3.1.3. Exclusion of specific types of publications and citations
When undertaking bibliometric analyses, one needs to decide which types of publications and
citations are included and which are not. In Web of Science and Scopus, each publication has a
document type. It is clear that research articles, which simply have the document type ‘article’, should
be included in bibliometric analyses. However, publications of other document types, such as
‘editorial material’, ‘letter’, and ‘review’ may be either included or excluded.
Most bibliometric researchers prefer to exclude author self-citations from bibliometric analyses. There
is no full agreement in the literature on the importance of excluding these citations. In some
bibliometric analyses, the effect of author self-citations is very small, suggesting that there is no need
to exclude these citations. In general, however, it is suggested that author self-citations should
preferably be excluded, at least in analyses at low aggregation levels, for instance at the level of
individual researchers. However, as self-citation is a common and acceptable practice in some
See Sections 1.1 and 1.4.1 of the literature review (Supplementary Report I).
disciplines but frowned upon in others, choosing to exclude them will affect some subject areas more than others.
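A minimal Python sketch of how author self-citations might be separated from other citations is given here. It assumes that a citation counts as an author self-citation when the citing and cited publications share at least one author, and that author identifiers have already been disambiguated; both are assumptions of the sketch, and the identifiers are hypothetical.

```python
def is_author_self_citation(citing_authors, cited_authors):
    """True if the citing and cited publications share at least one author."""
    return bool(set(citing_authors) & set(cited_authors))


def citation_counts(cited_authors, citing_author_lists):
    """Count all, self and external citations to a single publication."""
    total = len(citing_author_lists)
    self_cites = sum(
        1
        for citing_authors in citing_author_lists
        if is_author_self_citation(citing_authors, cited_authors)
    )
    return {"all": total, "self": self_cites, "external": total - self_cites}


# Hypothetical author identifiers for one cited paper and four citing papers.
cited = ["author-1", "author-2"]
citing = [["author-1", "author-9"], ["author-7"], ["author-8", "author-2"], ["author-5"]]
counts = citation_counts(cited, citing)
```

At the level of an individual researcher with few publications, removing two of four citations halves the count, which is why exclusion matters most at low aggregation levels.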
3.1.4. Normalisation of citation impact indicators
In research assessment contexts, there is often a requirement to make comparisons between
publications from different fields. There is agreement in the literature that citation counts of
publications from different fields should not be directly compared with each other, because there are
large differences among fields in the average number of citations per publication. Researchers have
proposed various approaches to normalise citation impact indicators for differences between fields,
between older and more recent publications, and between publications of different types.
Most attention in the literature has been paid to normalised indicators based on average citation
counts. Recent discussions focus on various technical issues in the calculation of these indicators, for
instance whether highly cited publication indicators count the proportion of the publications of a
research unit that belong to the top 10% or the top 1% of their field, and more sophisticated variants
thereof, including the position of publications within the citation distribution of their field.
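The two families of normalised indicator mentioned above can be sketched in Python as follows. The sketch assumes that each publication is compared with a reference set sharing its field, publication year and document type, and that a publication is in the top 10% when at least 90% of that reference set has strictly fewer citations. The record layout, the tie-handling rule and the example data are illustrative assumptions; the treatment of ties in particular is one of the technical issues debated in the literature.

```python
from collections import defaultdict


def group_key(pub):
    """Publications are compared only within the same field, year and type."""
    return (pub["field"], pub["year"], pub["doc_type"])


def build_reference_groups(reference_set):
    """Map each (field, year, doc_type) group to its list of citation counts."""
    groups = defaultdict(list)
    for pub in reference_set:
        groups[group_key(pub)].append(pub["citations"])
    return dict(groups)


def normalised_score(pub, groups):
    """Citations divided by the mean citations of the publication's group."""
    counts = groups[group_key(pub)]
    mean = sum(counts) / len(counts)
    return pub["citations"] / mean if mean > 0 else None


def is_top_percent(pub, groups, percent=10):
    """True if at least (100 - percent)% of the group has fewer citations."""
    counts = groups[group_key(pub)]
    below = sum(1 for c in counts if c < pub["citations"])
    return below * 100 >= (100 - percent) * len(counts)


def make_pub(field, citations):
    return {"field": field, "year": 2012, "doc_type": "article", "citations": citations}


# Invented reference sets: biology is cited far more heavily than mathematics.
reference = [make_pub("biology", c) for c in (10, 20, 30, 40)]
reference += [make_pub("mathematics", c) for c in (1, 2, 3, 2)]
groups = build_reference_groups(reference)

biology_paper = make_pub("biology", 25)
maths_paper = make_pub("mathematics", 4)
```

With these invented numbers the biology paper has many more raw citations, yet its normalised score is 1.0 (exactly the field average), while the mathematics paper scores 2.0 and falls within the top 10% of its reference set.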
A key issue in the calculation of normalised citation impact indicators is the way in which the concept
of a research field is operationalised. The most common approach is to work with the predefined
fields in a database such as Web of Science, but this approach is heavily criticised. Some researchers
argue that fields may be defined at different levels of aggregation and that each aggregation level
offers a legitimate but different viewpoint on the citation impact of publications. Other researchers
suggest the use of disciplinary classification systems (e.g. Medical Subject Headings or Chemical
Abstracts sections) or sophisticated computer algorithms to define fields, typically at a relatively low
level of aggregation. Another approach is to calculate normalised citation impact indicators without
defining fields in an explicit way. This idea is implemented in so-called ‘citing-side normalisation’
approaches, which represent a recent development in the literature.
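One simple variant of citing-side normalisation weights each citation by the inverse of the length of the citing publication's reference list, so that citations from fields with long reference lists count for less. The Python sketch below illustrates only this variant, with invented numbers; other citing-side approaches described in the literature differ in detail.

```python
def citing_side_score(reference_list_lengths):
    """Sum of 1/n over citing publications, where n is the number of
    references in each citing publication (zero-length lists are skipped)."""
    return sum(1.0 / n for n in reference_list_lengths if n > 0)


# Invented example: a mathematics paper cited by 3 papers with 10 references
# each, and a biomedical paper cited by 12 papers with 40 references each.
maths_score = citing_side_score([10, 10, 10])
biomedical_score = citing_side_score([40] * 12)
```

Although the biomedical paper receives four times as many citations, both papers obtain the same score of about 0.3, and no explicit field classification was needed.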
3.1.5. Considerations of author position on scholarly published work
In the absence of other reliable indicators of research contribution or value, the contribution of a
particular researcher to a piece of scholarly published work has been estimated by consideration of the
inclusion of a researcher as a listed author on published work – and the relative position in the list.
However, the average number of authors of publications in the scholarly literature continues to
increase, partly due to the pressure to publish to indicate research progression and also due to a trend,
in many disciplines, toward greater collaboration and ‘team science’. Research in many disciplines
An extreme case is the recent physics paper with more than 5,000 authors, as discussed at:
www.nature.com/news/physics-paper-sets-record-with-more-than-5-000-authors-1.17567. Retrieved 1 June 2015.
is increasingly collaborative, and original research papers with a single author are – particularly in the
natural sciences – becoming rarer.
This trend makes it increasingly difficult to determine who did what, and who had a particularly
pivotal role in, or contribution to, scholarly published work. It is currently difficult to decipher individual
contributions by consulting the author lists, acknowledgements or contributions sections of most
journals; and the unstructured information is difficult to text-mine.
There has been a mixture of approaches to identifying contributions of ‘authors’. One example works
on the assumption that any listing of an author is valuable, known as ‘full counting’. The citations to a
multi-author publication are counted multiple times, once for each of the authors, even for authors
who have made only a small contribution. Because the same citations are counted more than once, the
full counting approach has a certain inflationary effect, which is sometimes considered undesirable. A
number of alternative credit allocation approaches have therefore been proposed, including the
fractional counting approach, where the credit for a publication is divided equally among all authors.
Another approach frequently used as short-hand is to assume that the first and/or last authors in a list
have played the most pivotal role in the production of the scholarly outputs. However, this does not
apply across disciplines (e.g. economics and high energy physics where author-listing protocols are
frequently alphabetical). An alternative possibility is to fully allocate the credits of a publication to the
corresponding author instead of the first author. A final approach discussed in the literature is to
allocate the credits of a publication to the individual authors in a weighted manner, with the first
author receiving the largest share of the credits, the second author receiving the second-largest share,
and so on.
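The credit allocation approaches described above can be compared in a short Python sketch. The author names and citation count are invented, author names are assumed to be unique within a publication, and the weighted scheme shown (weights proportional to 1/position, sometimes called harmonic counting) is only one of several possible weightings.

```python
def full_counting(authors, citations):
    """Every listed author receives the full citation count."""
    return {author: float(citations) for author in authors}


def fractional_counting(authors, citations):
    """Credit is divided equally among all listed authors."""
    share = citations / len(authors)
    return {author: share for author in authors}


def single_author_counting(authors, citations, credited):
    """All credit goes to one author, e.g. the first or corresponding author."""
    return {a: (float(citations) if a == credited else 0.0) for a in authors}


def harmonic_counting(authors, citations):
    """Weighted credit: the author in position k gets a share proportional to 1/k."""
    weights = [1.0 / position for position in range(1, len(authors) + 1)]
    total = sum(weights)
    return {a: citations * w / total for a, w in zip(authors, weights)}


# Invented publication with three authors and 12 citations.
authors = ["Ada", "Ben", "Cy"]
full = full_counting(authors, 12)
fractional = fractional_counting(authors, 12)
first_only = single_author_counting(authors, 12, credited=authors[0])
weighted = harmonic_counting(authors, 12)
```

Summing the credits shows the inflationary effect of full counting: the three authors together receive 36 citations for a paper cited 12 times, whereas the other schemes distribute exactly 12.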
Developments in digital technology present opportunities to address the challenge of deriving
contributions to published work. A collaboration between the Evaluation team at the Wellcome Trust,
Harvard University and Digital Science has made steps to address this challenge by working across
the research community to develop a simply structured taxonomy of contributions to scholarly
published work, which captures what has traditionally been masked as ‘authorship’. The taxonomy is
currently being trialled within publishers’ manuscript submission systems and by several
organisations interested in helping researchers gain more visibility for the work that they do. The
Academy of Medical Sciences is also exploring how enabling greater visibility and credit around
contributions to research might help to incentivise, encourage and sustain ‘team science’ in disciplines
where this is highly valuable.
For researchers, the ability to better describe what they contributed would be a more useful currency
than being listed as a specific ‘author number’. Researchers could draw attention to their specific
contributions to published work to distinguish their skills from those of collaborators or competitors,
for example during a grant-application process or when seeking an academic appointment. This could
benefit junior researchers in particular, for whom the opportunities to be a ‘key’ author on a paper can
prove somewhat elusive. Methodological innovators would also stand to benefit from clarified roles –
their contributions are not reliably apparent in a conventional author list. It could also facilitate
collaboration and data sharing by allowing others to seek out the person who provided, for example,
an important piece of data or statistical analysis.
Through the endorsement of individuals’ contributions, researchers could move beyond ‘authorship’
as the dominant measure of esteem. For funding agencies, better information about the contributions
of grant applicants would aid the decision-making process. Greater precision could also enable
automated analysis of the role and potential outputs of those being funded, especially if those
contributions were linked to an open and persistent researcher profile or identifier. It would also help
those looking for the most apt peer reviewers. For institutions, understanding a researcher’s
contribution is fundamental to the academic appointment and promotion process.
3.1.6. Indicators of the citation impact of journals
The best-known indicator of the citation impact of journals is the JIF. This is an annual calculation of
the mean number of citations to articles published in any given journal in the two preceding years.
There is a lot of debate about the JIF, both regarding the way in which it is calculated (which skews
the JIF towards a minority of well-cited papers) and the way in which it is used in research
assessment contexts (as discussed more in Chapter 7).
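The calculation described above, and its sensitivity to a few well-cited papers, can be illustrated with a simplified Python sketch. The record layout and the citation counts are invented, and the sketch divides by all items in the two-year window; it does not reproduce the distinction between citable and non-citable items used in the published JIF.

```python
from statistics import median


def two_year_citation_stats(articles, year):
    """Mean and median citations received in `year` by items published in
    the two preceding years. Returns None if the window is empty."""
    window = [a for a in articles if year - 2 <= a["pub_year"] <= year - 1]
    if not window:
        return None
    cites = [a["cites_by_year"].get(year, 0) for a in window]
    return {
        "items": len(cites),
        "mean": sum(cites) / len(cites),
        "median": median(cites),
    }


# Invented journal: eight items in the 2013-2014 window, one of them heavily cited.
journal = (
    [{"pub_year": 2013, "cites_by_year": {2015: c}} for c in (0, 1, 1, 2)]
    + [{"pub_year": 2014, "cites_by_year": {2015: c}} for c in (0, 0, 1, 35)]
    + [{"pub_year": 2012, "cites_by_year": {2015: 50}}]  # outside the window
)
stats = two_year_citation_stats(journal, 2015)
```

For this invented journal the mean is 5.0 citations per item although the median item received only one citation, because a single paper accounts for 35 of the 40 citations.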
Various improvements of and alternatives to the JIF have been proposed in the literature. It is for
instance suggested to take into account citations during a longer time period, possibly adjusted to the
specific citation characteristics of a journal, or it is proposed to consider the median instead of the
Curry, S. (2012). Sick of impact factors. Post on Reciprocal Space blog.
http://occamstypewriter.org/scurry/2012/08/13/sick-of-impact-factors/. Retrieved 1 June 2015.
Seglen, P. (1992). The skewness of science. Journal of the Association for Information Science and Technology. 43, 628–
638. DOI: 10.1002/(SICI)1097-4571(199210)43:9<628::AID-ASI5>3.0.CO;2-0.
average number of citations of the publications in a journal. Another suggestion is to calculate an h-
index for journals as an alternative or complement to the JIF.
Researchers also argue that citation impact indicators for journals need to be normalised for
differences in citation characteristics among fields. A number of normalisation approaches have been
suggested, such as the SNIP indicator available in Scopus.
Another idea proposed in the literature is that in the calculation of citation impact indicators for
journals more weight should be given to citations from high-impact sources, such as citations from
Nature and Science, than to citations from low-impact sources, for instance from a relatively unknown
national journal that receives hardly any citations itself. This principle is implemented in the
EigenFactor and article influence indicators reported, along with the JIF, in the Journal Citation
Reports. The same idea is also used in the SJR indicator included in Scopus.
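The principle of giving more weight to citations from high-impact sources can be sketched as a PageRank-style iteration over a journal citation network. The Python sketch below illustrates the general idea only; it is not the published EigenFactor or SJR algorithm, and the damping factor, the exclusion of journal self-citations and the four-journal network are assumptions of the sketch.

```python
def influence_scores(citations, damping=0.85, iterations=100):
    """citations[a][b] is the number of citations from journal a to journal b.
    Returns a score per journal; the scores sum to 1."""
    journals = sorted(
        set(citations) | {b for targets in citations.values() for b in targets}
    )
    n = len(journals)
    score = {j: 1.0 / n for j in journals}
    for _ in range(iterations):
        new = {j: (1.0 - damping) / n for j in journals}
        for a in journals:
            # Journal self-citations are ignored in this sketch.
            targets = {b: c for b, c in citations.get(a, {}).items() if b != a and c > 0}
            outgoing = sum(targets.values())
            if outgoing == 0:
                # A journal that cites nothing spreads its score evenly.
                for j in journals:
                    new[j] += damping * score[a] / n
            else:
                for b, c in targets.items():
                    new[b] += damping * score[a] * c / outgoing
        score = new
    return score


# Invented network: X and Y each receive five citations, but X is cited by
# the heavily cited journal "Hub" and Y by the uncited journal "Obscure".
network = {
    "Hub": {"X": 5},
    "X": {"Hub": 10},
    "Y": {"Hub": 10},
    "Obscure": {"Y": 5},
}
scores = influence_scores(network)
```

Journals X and Y have identical raw citation counts, yet X ends up with a much higher score because its citations come from a journal that is itself heavily cited.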
The JIF and other citation impact indicators for journals are often used not only in the assessment of
journals as a whole but also in the assessment of individual publications in a journal. Journal-level
indicators then serve as a substitute for publication-level citation statistics. The use of journal-level
indicators for assessing individual publications is rejected by many bibliometricians. It is argued that
the distribution of citations over the publications in a journal is highly skewed, which means that the
JIF and other journal-level indicators are not representative of the citation impact of a typical
publication in a journal. However, some bibliometricians agree with the use of journal-level indicators
in the assessment of very recent publications. In the case of these publications, citation statistics at the
level of the publication itself provide hardly any information.
3.1.7. Future developments
RCUK has extended its bibliometric analysis beyond an examination of citation counts, having an
interest in the qualities of the literature that cites RCUK-funded research and the qualities of the
literature cited by RCUK-funded research. RCUK has obtained this data from Thomson Reuters,
drawn from Web of Science. Using this approach, it is possible to analyse the body of knowledge that
authors draw upon, and also the diversity of research fields that subsequently draws on these results.
This quantification of the ‘diffusion of ideas’ and mapping of the distance between research subject
According to its provider, “SNIP corrects for differences in citation practices between scientific fields, thereby allowing
for more accurate between-field comparisons of citation impact.” www.journalindicators.com/
areas has been pioneered by Rafols et al. and has contributed to the discussion of how to measure
Another area that is developing fast is analysis of the influence of a given work within a particular
network. In the area of citations this is exemplified by the EigenFactor.
In social media analyses the
concept of reach or page impressions can be more informative than simple counts. Within this
network conception of the spread of knowledge and ideas it is also possible to use knowledge of the
types of connections. Once again this is illustrated by citations in analyses that categorise citations
into types by both function (citing an idea, data) and sentiment (agree or disagree). These much richer
indicators will make it possible to track and understand the way that research outputs spread and
influence activities ranging from further research to public discussion of policy.
3.2. Alternative indicators
Here we consider the more influential of the alternative indicators now in circulation, many of which
are discussed in more detail in Section 3 of our literature review (Supplementary Report I).
Throughout this section, we generally treat alternative indicators in relation to their potential to
indicate scholarly impacts, but in some cases, we also cover wider impacts as well. Table 3 in the
annex provides a summary of key alternative indicators.
The most common method to help assess the value of altmetrics is to investigate their correlation with
citations, despite the hope that they may indicate different aspects of scholarly impact. This is because
it would be strange for two valid impact indicators, no matter how different, to be completely uncorrelated.
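Because both citation counts and altmetric counts tend to be highly skewed, a rank-based measure is a natural choice for such comparisons. The following Python sketch computes a Spearman rank correlation with tied values given their average rank. The choice of Spearman's coefficient and the example counts are illustrative assumptions; the sketch does not handle the case in which one indicator is constant across all publications.

```python
def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented counts for five articles.
citation_counts = [0, 2, 5, 9, 40]
altmetric_counts = [1, 3, 2, 10, 25]
rho = spearman(citation_counts, altmetric_counts)
```

A coefficient well below 1 but clearly above 0 would be consistent with an alternative indicator that is related to citation impact while also capturing something different.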
3.2.1. Open access scholarly databases
The internet now contains a range of websites hosting free general scholarly databases, such as
Google Scholar (discussed above in 3.1.1) and Google Books, as well as institutional and subject
repositories, some of which form new sources of citation or usage data. These inherit many of the
strengths and limitations of traditional bibliometric databases, but with some important differences.
Rafols, I., Porter, A.L. and Leydesdorff, L. (2010). Science overlay maps: A new tool for research policy and library
management. Journal of the American Society for Information Science and Technology. 61(9), 1871–1887.
Shotton, D., Portwin, K., Klyne, G. and Miles, A. (2009). Adventures in Semantic Publishing: Exemplar Semantic
Enhancements of a Research Article. PLOS Comput Biol. 5(4). e1000361. DOI:10.1371/journal.pcbi.1000361; Shotton, D.
(2010). Introducing the Semantic Publishing and Referencing (SPAR) Ontologies. Post on Open Citations and Related Work
Retrieved 1 June 2015; Moed, H. and Halevi, G. (2014). Research assessment: Review of methodologies and approaches.
Research Trends. 36, March 2014. www.researchtrends.com/issue-36-march-2014/research-assessment/. Retrieved 1 June
Although Google Scholar was not primarily developed to rival conventional citation indexes, many
studies have now compared it with them for research assessment, as covered by Appendix A of the
literature review (Supplementary Report I).
3.2.2. Usage indicators from scholarly databases
Usage data is a logical choice to supplement citation counts, and digital readership information can be
easily and routinely collected, except for paper copies of articles. Bibliometric indicators do not show
the usage of a published work by non-authors, such as students, some academics, and non-academic
users who do not usually publish but may read scholarly publications. Usage-based statistics for
academic publications may therefore help to give a better understanding of the usage patterns of
documents and can be more recent than bibliometric indicators.
Many studies have found correlations between usage and bibliometric indicators for articles, and
usage data could be extracted from different sources such as publishers, aggregator services, digital
libraries and academic social websites. Nonetheless, the usage statistics could be inflated or
manipulated and some articles may be downloaded or printed but not read or may be read offline or
via different websites such as authors’ CVs and digital repositories.
Integrated usage statistics from
different sources such as publishers’ websites, repositories and academic social websites would be
optimal for global usage data if they are not manipulated in advance. However, this does not seem to
be practical at present because of differences in how they are collected and categorised.
3.2.3. Citations and links from the general web
It is possible to extract information from the web in order to identify citations to publications, hence
using the web as a huge and uncontrolled de facto citation database. This data collection can be
automated, such as through the Bing API, making the web a practical source of this type of citation
data. Web and URL citations to publications can be located by commercial search engines (Google
manually and Bing automatically) from almost any type of online document, including blog posts,
presentations, clinical guidelines, technical reports or document files (e.g. PDF files) and there is
evidence (although not recent) that they can be indicators of research impact. In theory, then, web and
URL citations could be used to gather evidence about the scholarly impact of research if they were
filtered to remove non-scholarly sources. In contrast, unfiltered web or URL citation counts are easy
to manipulate and many citations are created for navigation, self-publicity or current awareness and so
it does not seem likely that they would genuinely reflect the wider impacts of research, without time-
consuming manual filtering out of irrelevant sources.
Thelwall, M. (2012). Journal impact evaluation: A webometric perspective. Scientometrics, 92(2), 429-441.
In addition to searching for citations from the general web, citations can be counted from specific
parts of the web, including types of website and types of document. This information can be extracted
from appropriate searches in commercial search engines and automated, for example via the Bing
API. The discussions below cover online presentations, syllabi and science blogs, although there is
also some evidence that mentions in news websites and discussion forums may also be useful.
Citations from online ‘grey’ literature seem to be an additional useful source of evidence of the wider
impact of research, but there do not seem to be any systematic studies of these.
Statistics about the uptake of academic publications in academic syllabi may be useful in teaching-
oriented and book-based fields, where the main scholarly outputs of teaching staff are articles or
monographs for which students are an important part of the audience, or textbooks. It is practical to
harvest such data from the minority of syllabi that have been published online in the open web and
indexed by search engines, but it seems that such syllabus mentions may be useful primarily to
identify publications with a particularly high educational impact rather than for the systematic
assessment of the educational impact of research. Syllabus mentions have most potential for the
humanities and social sciences, where they are most common and where educational impact may be
Research may be cited and discussed in blogs by academics or non-academics in order to debate with
or inform other academics or a wider audience. Blog citations can perhaps be considered as evidence
of a combination of academic interest and a potential wider social interest, even if the bloggers
themselves tend to be academics. In addition, the evidence that more blogged articles are likely to
receive more formal citations shows that blog citations could be used for early impact evidence.
Nevertheless, blog citations can be easy to manipulate, and are not straightforward to collect, so may
need to be provided by specialist altmetric software or organisations.
In addition to the types of web citations discussed above, preliminary research is evaluating online
clinical guidelines, government documents and encyclopaedias. Online clinical guidelines could be
useful for medical research funders to help them to assess the societal impact of individual studies.
In support of this, one study extracted 6,128 cited references from 327 documents produced by the
National Institute for Health and Clinical Excellence (NICE) in the UK, finding that articles cited in
guidelines tend to be more highly cited than comparable articles.
Costas, R., Zahedi, Z. and Wouters, P. (2014). Do altmetrics correlate with citations? Extensive comparison of altmetric
indicators with citations from a multidisciplinary perspective. arXiv preprint arXiv:1401.4321; Thelwall, M., Haustein, S.,
Larivière, V. and Sugimoto, C. (2013). Do altmetrics work? Twitter and ten other candidates. PLOS ONE, 8(5), e64841.
Wilkinson, D., Sud, P. and Thelwall, M. (2014). Substance without citation: Evaluating the online impact of grey
literature. Scientometrics. 98(2), 797-806.
For a discussion of issues see Manchikanti, L., Benyamin, R., Falco, F., Caraway, D., Datta, S. and Hirsch, J. (2012).
Guidelines warfare over interventional techniques: is there a lack of discourse or straw man? Pain Physician, 15, E1-E26;
also Kryl, D., Allen, L., Dolby, K., Sherbon, B., and Viney, I. (2012). Tracking the impact of research on policy and
practice: investigating the feasibility of using citations in clinical guidelines for research evaluation. BMJ Open, 2(2),
3.2.4. Altmetrics: citations, links, downloads and likes from social
The advent of the social web has seen an explosion in both the range of indicators that could be
calculated and the ease with which relevant data can be collected (even in comparison to web
impact metrics). Of particular interest are comments, ratings, social bookmarks, and microblogging,
although there have been many concerns about validity and the quality of altmetric indicators due to
the ease with which they can be manipulated.
Elsevier (via Scopus), Springer, Wiley, BioMed
Central, PLOS and Nature Publishing Group have all added article-level altmetrics to their journals,
and uptake is rising among other publishers.
Although the term ‘altmetrics’ refers to indicators for research assessment derived from the social
web, the term ‘alternative metrics’ seems to be gaining currency as a catch-all for web-based metrics.
Thelwall, M., and Maflahi, N. (in press). Guideline references and academic citations as evidence of the clinical value of
health research. Journal of the Association for Information Science and Technology.
For instance: Taraborelli, D. (2008). Soft peer review: Social software and distributed scientific evaluation. Proceedings
of the Eighth International Conference on the Design of Cooperative Systems. Carry–Le–Rouet, 20–23
May. http://nitens.org/docs/spr_coop08.pdf; Neylon, C. and Wu, S. (2009). Article-level metrics and the evolution of
scientific impact. PLOS Biol 7(11). DOI: 10.1371/journal.pbio.1000242; Priem, J., and Hemminger, B. M. (2010).
Scientometrics 2.0: Toward new metrics of scholarly impact on the social web. First Monday, 15(7).
Birkholz, J., and Wang, S. (2011). Who are we talking about?: the validity of online metrics for commenting on science.
Paper presented in: altmetrics11: Tracking scholarly impact on the social Web. An ACM Web Science Conference 2011
Workshop, Koblenz (Germany), 14-15. http://altmetrics.org/workshop2011/birkholz-v0; Rasmussen, P. G., and Andersen,
J.P. (2013). Altmetrics: An alternate perspective on research evaluation. Sciecom Info. 9(2).
Priem, J., Taraborelli, D., Groth, P., and Neylon, C. (2010). Altmetrics: A manifesto. Retrieved from
Priem, J., Piwowar, H., and Hemminger, B. (2012). Altmetrics in the wild: Using social media to explore scholarly
impact. Retrieved from http://arXiv.org/html/1203.4745v1; Thelwall, M., Haustein, S., Larivière, V., and Sugimoto, C.
(2013). Do altmetrics work? Twitter and ten other candidates. PLOS ONE. 8(5), e64841.
DOI:10.1371/journal.pone.0064841; Costas, R., Zahedi, Z., and Wouters, P. (2014).
However, recent research suggests that some factors driving social media and citations are quite different: (1) while
editorials and news items are seldom cited, these types of document are most popular on Twitter; (2) longer papers typically
attract more citations, but the converse is true of social media platforms; (3) SSH papers are most common on social media
platforms, the opposite to citations. Haustein, S., Costas, R., and Larivière, V. (2015). Characterizing Social Media Metrics
A range of altmetrics have been shown to correlate significantly and positively with bibliometric
indicators for individual articles, giving evidence that, despite the uncontrolled nature of the social
web, altmetrics may be related to scholarly activities in some way. This is perhaps most evident
when the altmetrics are aggregated to entire journals rather than to individual articles. Social usage
impact can be extracted from a range of social websites that allow users to upload, or register
information about, academic publications, such as Mendeley, Twitter, Academia and ResearchGate.
These sites can be used for assessing an aspect of the usage of publications based on numbers of
downloads, views or registered readers. Fuller information on the following is included within the
literature review (Supplementary Report I): Faculty of 1000 Web Recommendations; Mendeley and
other Online Reference Managers; Twitter and microblog citations.
3.2.5. Book-based indicators
Research evaluation in book-oriented fields is more challenging than for article-based subject areas
because counts of citations from articles, which dominate traditional citation indexes, seem
insufficient to assess the impact of books. The Book Citation Index within Web of Science is a recent
response to this issue, since journal citations on their own might miss about half of the citations to
books. However, some academic books are primarily written for teaching (e.g. textbooks) or
cultural purposes (e.g. novels and poetry) and citation counts of any kind may be wholly inappropriate.
In REF2014, books (authored books, edited books, scholarly editions and book chapters) were more
frequently submitted to Main Panels C and D (29.4%) than to Main Panels A and B (0.4%), and many
of these books (art, music and literary works) may have merits that are not reflected by conventional
bibliometric methods (see Table 3 in the annex for the full distribution of results in REF2014).
Moreover, the main sources of citations to humanities books are other books. Even today, the
Thomson Reuters Book Citation Index and Scopus index a relatively small number of books (as of
September 2014), and this may cause problems for bibliometric analyses of books. Expert peer
judgement of books seems to be by far the best method, but it is even more time-consuming and
expensive than article peer assessment because books are generally much longer. In response,
alternative sources have been investigated for book impact assessment, including syllabus mentions,
library holding counts, book reviews and publisher prestige. Many of the indicators discussed
elsewhere in the full literature review (Supplementary Report I) can also be used for books but have
not yet been evaluated for this purpose. However, since academic books are still mainly read in print
form, download indicators are not yet so relevant.
of Scholarly Papers: The Effect of Document Properties and Collaboration Patterns. PLOS ONE. 10(3): e0120495.
Alhoori, H. and Furuta, R. (2014). Do altmetrics follow the crowd or does the crowd follow altmetrics? In: Proceedings
of the IEEE/ACM Joint Conference on Digital Libraries (JCDL 2014). Los Alamitos: IEEE Press.
http://people.tamu.edu/~alhoori/publications/alhoori2014jcdl.pdf; Haustein, S. and Siebenlist, T. (2011). Applying social
bookmarking data to evaluate journal usage. Journal of Informetrics. 5(3), 446-457.
Previously noted in Garfield, E. (1996). Citation indexes for retrieval and research evaluation. Consensus Conference on
the Theory and Practice of Research Assessment, Capri.
Hicks, D. (1999). The difficulty of achieving full coverage of international social science literature and the bibliometric
consequences. Scientometrics. 44 (2), 193-215.
Thompson, J. W. (2002). The death of the scholarly monograph in the humanities? Citation patterns in literary
scholarship. Libri. 52(3), 121-136; Kousha, K., and Thelwall, M. (2014). An automatic method for extracting citations from
Google Books. Journal of the Association for Information Science and Technology. DOI: 10.1002/asi.23170.
Google Books contains a large number of academic and non-academic books based upon digitising
the collections of over 40 libraries around the world as well as partnerships with publishers.
Studies have shown that the coverage of Google Books is quite comprehensive, but, due to copyright
considerations, Google Books does not always reveal the full text of the books that it has indexed.
Although Google Books is not a citation index and provides no citation statistics of any kind, it is
possible to manually search it for academic publications and hence identify citations to these
publications from digitised books. Google Books could be useful because citations from books have
been largely invisible in traditional citation indexes and the current book citation search facilities in
Scopus and Web of Science cover relatively few books, predominantly in English and from a
small number of publishers, which is problematic for citation impact assessment in book-based
fields.
For example: Gorraiz, J., Purnell, P. J., and Glänzel, W. (2013). Opportunities for and limitations of the book citation
index. Journal of the American Society for Information Science and Technology, 64(7), 1388-1398; Torres-Salinas, D.,
Robinson-García, N., Jiménez-Contreras, E., and Delgado López-Cózar, E. (2012). Towards a ‘Book Publishers Citation
Reports’. First approach using the ‘Book Citation Index’. Revista Española de Documentación Científica. 35(4), 615-620;
Torres-Salinas, D., Rodríguez-Sánchez, R., Robinson-García, N., Fdez-Valdivia, J., and García, J. A. (2013). Mapping
citation patterns of book chapters in the Book Citation Index. Journal of Informetrics, 7(2), 412-424.
See Weller, A. C. (2001). Editorial peer review: Its strengths and weaknesses. Medford, N.J: Information Today.
Chen, X. (2012). Google Books and WorldCat: A comparison of their content. Online Information Review. 36 (4), 507-
516.; Weiss, A., and James, R. (2013). Assessing the coverage of Hawaiian and pacific books in the Google Books
digitization project. OCLC Systems and Services, 29(1), 13-21.; Weiss, A., and James, R. (2013a). An examination of
massive digital libraries’ coverage of Spanish language materials: Issues of multi-lingual accessibility in a decentralized,
mass-digitized world. Paper presented at the Proceedings – 2013 International Conference on Culture and Computing,
Culture and Computing 2013. 10-14.
Kousha, K., and Thelwall, M. (2009). Google Book Search: Citation analysis for social science and the humanities.
Journal of the American Society for Information Science and Technology. 60(8), 1537-1549; Kousha, K., Thelwall, M., and
Rezaie, S. (2011). Assessing the citation impact of books: The role of Google Books, Google Scholar, and Scopus. Journal
of the American Society for Information Science and Technology, 62(11), 2147-2164.
Gorraiz, J., Purnell, P., and Glänzel, W. (2013); Torres-Salinas et al. (2012), (2013).
National or international library holdings statistics can indicate library interest in books and seem to
reflect a different type of impact to that of citations, perhaps including educational and cultural
impacts. These statistics are relatively simple to collect automatically from the OCLC WorldCat
library holding catalogue, with more than 2.2 billion items from over 72,000 libraries in 170
countries. These data, which are based upon book holdings and hence would be costly to manipulate,
seem promising for assessing the wider influence of books in SSH based on the information needs of
users, teaching staff and researchers. While more detailed borrowing statistics might be even more
useful, these data do not seem to be currently available.
Publisher prestige, reputational surveys, libcitation and citation indicators can also help to identify
prestigious scholarly publishers. A combination of all of the above may be more useful for rating
(rather than ranking) academic publishers of books or monographs as long as other factors, such as
geographical, language and disciplinary differences, are taken into consideration when they are used.
3.2.6. Varieties of outputs
While much of this discussion tends to focus on text-based outputs in peer-reviewed publications, it is
common for scholars across all disciplines to produce a wider variety of outputs from their research
processes. These range from research datasets, software, images, videos and patents, through to
exhibitions, compositions, performances, presentations and non-refereed publications (such as policy
documents or ‘grey’ literature). For some of these there may be plausible indicators of impact, such as
audience size, art gallery prestige, composition commissioner prestige, art sales or sales prices. In
most cases, however, it is likely that the contributions of individual works are so varied that any data
presented to support an impact case would not be directly comparable with other available data,
although they could be presented as evidence to support a specific argument about the contribution of
3.2.7. How robust are alternative quality metrics?
There is empirical evidence that a wide range of indicators derived from the web for scholars or their
outputs are related to scholarly activities in some way because they correlate positively and
significantly with citation counts. In many cases these metrics can also be harvested on a large scale
in an automated way with a high degree of accuracy (see Appendix B of the literature review,
Supplementary Report I, for methods to obtain alternative metric data). Nevertheless, most are easy to
manipulate, and nearly all are susceptible to spam to some extent. Thus, alternative metrics do not
seem to be suitable as a management tool with any kind of objective to measure, evaluate or manage
research. Even if no manipulation took place, which seems unlikely, the results would be suspected
of being affected by manipulation and in the worst case scenario the results would be extensively
manipulated and researchers would waste their time and money on this manipulation.
For example: Dullaart, C. (2014). High Retention, Slow Delivery. (Art piece: 2.5 million Instagram followers bought and
distributed to artists. See e.g. http://jeudepaume.espacevirtuel.org/, http://dismagazine.com/dystopia/67039/constant-dullaart-
In our call for evidence, 19 respondents (of which 15 were HEIs) proposed that altmetrics could be
used as a research assessment tool, while 12 respondents (of which eight were HEIs) argued that
altmetrics are not reliable enough to be used as a measure of research quality. This reflects the
uncertainties often associated with these indicators which are at an early stage of development. For an
altmetric to be taken seriously, empirical evidence of its value is needed in addition to evidence of a
reasonable degree of robustness against accidental or malicious spam.
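One common form of the empirical evidence mentioned above is a rank correlation between an altmetric and citation counts for the same set of articles. The following is a minimal sketch of a Spearman rank correlation in plain Python. The function names are illustrative and the counts in the usage example are invented for demonstration; they are not data from this review. A positive coefficient on its own shows an association only, and says nothing about robustness against manipulation.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented counts for five hypothetical articles.
    reader_counts = [12, 40, 3, 25, 8]
    citation_counts = [5, 30, 1, 9, 10]
    print(round(spearman(reader_counts, citation_counts), 3))
```

Rank correlation is used here rather than a correlation of the raw counts because citation and altmetric counts are typically highly skewed, so a few heavily cited or heavily shared articles would otherwise dominate the result.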
3.3. Input indicators
In some contexts, there is support for the measurement of research quality through the use of proxy
indicators including: external research income (recognising that organisations are in competition for
these funds, so success is a marker of quality); research student enrolments; and research student
completion data. These were all mentioned by a number of respondents to our call for evidence as
potential measures of quality, but more often as a useful means to measure the ‘environment’ or
‘vitality’ of the research base, along the lines of the REF’s environment component.
In UK HEIs, the maturity of current research information systems (CRISs) varies markedly between
institutions. Some HEIs have fully fledged systems that are completely integrated with other core
systems, others have stand-alone systems, and some rely on non-specific systems such as generic
databases and spreadsheets. Some UK HEIs capture or wish to capture data associated with all of the
above items (and more) for internal or external purposes. Publication information is most commonly
collected in central systems. Grant information, commercialisation, and PhD numbers and
completions tend to be collected centrally and most can produce information by staff member/FTE.
On the other hand, prizes, editorships, other esteem indicators and international visitors might more
commonly only be collected locally within departments. Information on research infrastructure is
perhaps the most variable and, anecdotally at least, least likely to be comprehensive (though
initiatives like equipment.data.ac.uk and the work of sharing consortiums like N8 show that
infrastructure can be established and well-utilised).
Wouters, P., and Costas, R. (2012). Users, narcissism and control: Tracking the impact of scholarly publications in the
21st century. SURFfoundation. Retrieved November 29, 2014, from https://www.surf.nl/kennis-en-
innovatie/kennisbank/2012/rapport-users-narcissism-and-control.html [In Dutch].
For instance, see the University of Durham’s response to our call for evidence, available at
3.4. Indicators of impact
Attempting to measure and capture broader societal or external impacts of academic work is a
relatively new concern