International Journal on Digital Libraries (2019) 20:335–350
https://doi.org/10.1007/s00799-018-0234-1
Comparing published scientific journal articles to their pre-print
versions
Martin Klein¹ · Peter Broadwell² · Sharon E. Farb² · Todd Grappone²
Received: 18 May 2017 / Revised: 2 January 2018 / Accepted: 18 January 2018 / Published online: 5 February 2018
© This is a U.S. government work and its text is not subject to copyright protection in the United States; however, its text may be subject to foreign
copyright protection 2018
Abstract
Academic publishers claim that they add value to scholarly communications by coordinating reviews and contributing and
enhancing text during publication. These contributions come at a considerable cost: US academic libraries paid $1.7 billion
for serial subscriptions in 2008 alone. Library budgets, in contrast, are flat and not able to keep pace with serial price
inflation. We have investigated the publishers’ value proposition by conducting a comparative study of pre-print papers from
two distinct science, technology, and medicine corpora and their final published counterparts. This comparison had two
working assumptions: (1) If the publishers’ argument is valid, the text of a pre-print paper should vary measurably from its
corresponding final published version, and (2) by applying standard similarity measures, we should be able to detect and
quantify such differences. Our analysis revealed that the text contents of the scientific papers generally changed very little
from their pre-print to final published versions. These findings contribute empirical indicators to discussions of the added
value of commercial publishers and therefore should influence libraries’ economic decisions regarding access to scholarly
publications.
Keywords Open access · Pre-print · Scholarly publishing · Text similarity
✉ Martin Klein
mklein@lanl.gov; martinklein0815@gmail.com
Peter Broadwell
broadwell@library.ucla.edu
Sharon E. Farb
farb@library.ucla.edu
Todd Grappone
grappone@library.ucla.edu
¹ Los Alamos National Laboratory, Los Alamos, NM, USA
² University of California, Los Angeles, Los Angeles, CA, USA
1 Introduction
Academic publishers of all types claim that they add value to
scholarly communications by coordinating reviews and contributing
and enhancing text during publication. These contributions
come at a considerable cost: U.S. academic libraries
paid $1.7 billion for serial subscriptions in 2008 alone and
this number continues to rise. Library budgets, in contrast, are
flat and not able to keep pace with serial price inflation. Several
institutions have therefore discontinued or significantly
scaled back their subscription agreements with commercial
publishers such as Elsevier and Wiley-Blackwell. We have
investigated the publishers’ value proposition by conducting
a comparative study of pre-print papers and their final published
counterparts in the areas of science, technology, and
medicine (STM). We have two working assumptions:
1. If the publishers’ argument is valid, the text of a pre-print
paper should vary measurably from its corresponding
final published version.
2. By applying standard similarity measures, we should be
able to detect and quantify such differences.
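To make the second assumption concrete, the following is a minimal sketch of how two versions of a text might be scored with generic similarity measures. It is an illustration only: the choice of measures (a word-set Jaccard index and the order-sensitive ratio from Python's standard difflib module), the whitespace tokenizer, the function names, and the sample strings are all assumptions made here, not the measures or data used in this study.

```python
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def jaccard_similarity(a: str, b: str) -> float:
    """Fraction of distinct words shared by both texts, from 0.0 to 1.0."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def sequence_ratio(a: str, b: str) -> float:
    """Order-sensitive similarity of the two word sequences, from 0.0 to 1.0."""
    return SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


if __name__ == "__main__":
    # Hypothetical pre-print and published versions of one sentence.
    preprint = "we compare pre-print papers to their published versions"
    published = "we compare pre-print papers with their final published versions"
    print("Jaccard:", round(jaccard_similarity(preprint, published), 3))
    print("Sequence ratio:", round(sequence_ratio(preprint, published), 3))
```

Scores near 1.0 would indicate that a section changed very little between versions, while lower scores would flag the kind of measurable difference the first assumption predicts. The set-based measure ignores word order, whereas the sequence-based one is sensitive to it, so the two can disagree on reordered text.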
In this paper, we present our preliminary results based on pre-print
publications from arXiv.org and bioRxiv.org and their
final published counterparts. After matching papers via their
digital object identifier (DOI), we applied comparative analytics
and evaluated the textual similarities of components of
the papers such as the title, abstract, and body. Our analysis
revealed that the text of the papers in our test data set changed
very little from their pre-print to final published versions,
although more copyediting changes were evident in the paper
sets from bioRxiv.org than those from arXiv.org. In gen-