Submitted 2 June 2013
Accepted 12 August 2013
Published 5 September 2013
Nicole A. Vasilevsky,
Additional Information and
Declarations can be found on
2013 Vasilevsky et al.
Creative Commons CC-BY 3.0
On the reproducibility of science: unique
identiﬁcation of research resources in the
Nicole A. Vasilevsky
, Matthew H. Brush
, Holly Paddock
, Shreejoy J. Tripathy
, Gregory M. LaRocca
Melissa A. Haendel
Ontology Development Group, Library, Oregon Health & Science University, Portland,
Zebraﬁsh Information Framework, University of Oregon, Eugene, OR, USA
FlyBase, Department of Genetics, University of Cambridge, Cambridge, UK
Department of Biological Sciences and Center for the Neural Basis of Cognition,
Carnegie Mellon University, Pittsburgh, PA, USA
Scientiﬁc reproducibility has been at the forefront of many news stories and there
exist numerous initiatives to help address this problem. We posit that a contributor
is simply a lack of speciﬁcity that is required to enable adequate research repro-
ducibility. In particular, the inability to uniquely identify research resources, such
as antibodies and model organisms, makes it diﬃcult or impossible to reproduce
experiments even where the science is otherwise sound. In order to better understand
the magnitude of this problem, we designed an experiment to ascertain the “iden-
tiﬁability” of research resources in the biomedical literature. We evaluated recent
journal articles in the ﬁelds of Neuroscience, Developmental Biology, Immunology,
Cell and Molecular Biology and General Biology, selected randomly based on a
diversity of impact factors for the journals, publishers, and experimental method
reporting guidelines. We attempted to uniquely identify model organisms (mouse,
rat, zebraﬁsh, worm, ﬂy and yeast), antibodies, knockdown reagents (morpholinos
or RNAi), constructs, and cell lines. Speciﬁc criteria were developed to determine if
a resource was uniquely identiﬁable, and included examining relevant repositories
(such as model organism databases, and the Antibody Registry), as well as vendor
sites. The results of this experiment show that 54% of resources are not uniquely
identiﬁable in publications, regardless of domain, journal impact factor, or reporting
requirements. For example, in many cases the organism strain in which the experi-
ment was performed or antibody that was used could not be identiﬁed. Our results
show that identiﬁability is a serious problem for reproducibility. Based on these
results, we provide recommendations to authors, reviewers, journal editors, vendors,
and publishers. Scientiﬁc eﬃciency and reproducibility depend upon a research-wide
improvement of this substantial problem in science today.
Subjects Cell Biology, Developmental Biology, Neuroscience, Immunology, Science Policy
Keywords Scientiﬁc reproducibility, Materials and Methods, Constructs, Cell lines, Antibodies,
Knockdown reagents, Model organisms
How to cite this article Vasilevsky et al. (2013), On the reproducibility of science: unique identiﬁcation of research resources in the
biomedical literature. PeerJ 1:e148; DOI 10.7717/peerj.148
The scientiﬁc method relies on the ability of scientists to reproduce and build upon each
other’s published results. Although it follows that the prevailing publication model should
support this objective, it is becoming increasingly apparent that it falls short (Haendel,
Vasilevsky & Wirz, 2012; de Waard, 2010). This failure was highlighted in a recent Nature
report from researchers at the Amgen corporation, who found that only 11% of the
academic research in the literature was reproducible by their groups (Begley & Ellis,
2012). Further alarm is raised by the fact that retraction rates, due in large part to a lack
of reproducibility, have steadily increased since the ﬁrst paper was retracted in 1977 (Cokol,
Ozbay & Rodriguez-Esteban, 2008). While many factors are likely at play here, perhaps the
most basic requirement for reproducibility holds that the materials reported in a study
can be uniquely identiﬁed and obtained, such that experiments can be reproduced as
faithfully as possible. Here, we refer to reproducibility deﬁned as the “conditions where
test results are obtained with the same method on identical test materials in diﬀerent
laboratories with diﬀerent operators using diﬀerent equipment” (ISO 5725-1:1994, 1994).
This information is meant to be documented in the ‘materials and methods’ of journal
articles, but as many can attest, the information provided there is often not adequate for
this task. Such a fundamental shortcoming costs time and resources, and prevents eﬃcient
turns of the research cycle whereby research ﬁndings are validated and extended toward
new discoveries. It also prevents us from retrospectively tagging a resource as problematic
or insuﬃcient, should the research process reveal issues with a particular resource.
Until recently, challenges in resource identiﬁcation and methodological reporting have
been largely anecdotal, but several eﬀorts have begun to characterize this problem and
enact solutions. The National Centre for the Replacement, Reﬁnement and Reduction
of Animals in Research (NC3R) evaluated methodological reporting in the literature for
in vivo studies using rodent models or non-human primates. They examined 271 publica-
tions and reported that only 60% of the articles included information about the number
and characteristics of the animals (strain, sex, age, weight) and approximately 30% of the
articles lacked detailed descriptions of the statistical analyses used (Kilkenny et al., 2009).
Based on this study, the ARRIVE guidelines (http://www.nc3rs.org.uk/page.asp?id=1357)
were developed for reporting of in vivo experiments pertaining to animal research. Other
domain speciﬁc standards have been published such as the Minimum information about
a protein aﬃnity reagent (MIAPAR) (Bourbeillon et al., 2010) and the high-proﬁle
communication from Nature to address concerns regarding research reproducibility
where they oﬀered improved standards for reporting life science research (http://www.
nature.com/authors/policies/reporting.pdf). The Neuroscience Information Framework
(NIF; http://neuinfo.org) speciﬁcally developed the Antibody Registry as a means to aid
identiﬁcation of antibodies within published studies, based on a small pilot study which
showed that >50% of antibodies could not be identiﬁed conclusively within published
papers (AE Bandrowski, NA Vasilevsky, MH Brush, MA Haendel, V Astakhov, P Ciccarese,
J McMurry and ME Martone, unpublished data). ISA-TAB provides a generic, tabular
format, which contains metadata standards to facilitate data collection, management,
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 2/22
and reuse (Sansone et al., 2012; Sansone, 2013; Thomas et al., 2013). To promote scientiﬁc
reproducibility, the Force11 community has published a set of recommendations for
minimal data standards for biomedical research (Martone et al., 2012) and published a
manifesto to improve research communication (Phil et al., 2011). The BioSharing initiative
(www.biosharing.org) contains a large registry of community standards for structuring
and curating datasets and has made signiﬁcant strides towards the standardization of data
via its multiple partnerships with journals and other organizations.
While the work highlighted above has oﬀered guidance based on the perceived problem
of inadequate methodological reporting, the fundamental issue of material resource
identiﬁcation has yet to be speciﬁcally characterized using a rigorous scientiﬁc approach.
It is our belief that unless researchers can access the speciﬁc research materials used in
published research, they will continue to struggle to accurately replicate and extend
the ﬁndings of their peers. Until our long held assumptions about a lack of unique
identiﬁability of resources are conﬁrmed with quantitative data, this problem is unlikely
to pique the interest of funding agencies, vendors, publishers, and journals, who are in
a position to facilitate reform. To this end, we report here an experiment to quantify the
extent to which material resources reported in the biomedical literature can be uniquely
identiﬁed. We evaluated 238 journal articles from ﬁve biomedical research sub-disciplines,
including Neuroscience, Developmental Biology, Immunology, Cell and Molecular
Biology, and General Biology. Target journals were selected from each category to include
a representative variety of publishers, impact factors, and stringencies with respect to
materials and methods reporting guidelines. In each article, we tracked reporting of ﬁve
types of resources: (1) model organisms (mouse, rat, zebraﬁsh, worm, ﬂy, frog, and yeast);
(2) antibodies; (3) knockdown reagents (morpholinos or RNAi); (4) DNA constructs; and
(5) cell lines. We developed a detailed set of evaluation criteria for each resource type and
applied them to determine the identiﬁability of over 1,700 individual resources referenced
in our corpus. The results of this experiment quantify a profound lack of unique identiﬁca-
tion of research resources in the biomedical literature across disciplines and resource types.
Based on these results and the insights gained in performing this experiment, we provide
recommendations for how research resource identiﬁcation can be improved by imple-
menting simple but eﬀective solutions throughout the scientiﬁc communication cycle.
Journal selection and classiﬁcation
The core of our evaluated corpus was comprised of articles from a set of target journals
that varied across three features: research discipline, impact factor, and reporting guideline
requirements. For research discipline selection, we followed the Institute for Scientiﬁc In-
formation (ISI) categorization and selected ﬁve journals from Cell Biology, Developmental
Biology, Immunology, and Neuroscience. In addition, a non-ISI category (General Biol-
ogy) was included to cover multidisciplinary journals such as Science, Nature, and PLoS
Biology. Within each discipline, care was taken to include journals with a range of impact
factors as reported in the Journal Citation Report from 2011 (Thomson Reuters, 2011).
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 3/22
Journals were binned into three categories (high, mid, and low) based on whether their
impact factor fell into the top, middle, or lowest third for their discipline in this report.
Finally, we selected journals that varied in the stringency of their recommendations for
reporting data about material resources. Journals were assigned to one of three categories:
(1) Stringent if the journal required detailed information or speciﬁc identiﬁers to reference
materials reported in the manuscript (e.g., required catalog numbers for antibodies);
(2) Sat isfactory if the journal provided only limited recommendations for structured
reporting or resource identiﬁers, but did not restrict space allocated for this information;
and (3) Loose where minimal or no reporting requirements for materials and methods
were provided, and/or the length of material reporting space was restricted. Note that these
guidelines were the ones in eﬀect at the time of manuscript selection (January 18, 2013).
Articles in the core collection of our corpus were selected randomly by performing a
PubMed search ﬁltered for each journal and using the ﬁrst ﬁve publications returned on
January 18, 2013 (all publications were from 2012–2013). This approach was adequate
for all journals except Nature and Science, which cover a very general scientiﬁc spectrum
such that top PubMed hits often failed to include the resource types evaluated in our study.
For these journals, the most recent articles that were likely to contain our resources were
selected directly from the publisher’s website. Recent publications were chosen for our
corpus deliberately to reduce the chance that they had been curated by a model organism
database (MOD) or other curatorial eﬀorts, which could skew results by providing
additional curated data not reported or accessible from the original article alone. NIF
had also noted in a pilot project that the identiﬁability of reagents decreases over time, as
commercial vendors eliminate products from their catalogs.
In addition to this core collection of 135 core articles, we added 86 additional
publications to our study through a collaboration with the Zebraﬁsh Information Network
(ZFIN), who agreed to assess identiﬁability of reported resources according to our
evaluation guidelines as part of their established curation pipeline. Finally, a set of 17 more
articles from the Nathan Urban Laboratory at Carnegie Mellon University was included in
our experiment. The Urban Lab studies cellular and systems neuroscience, and extensively
uses animal models and antibodies. These articles were included to explore how the
thorough and structured documentation practices of this lab in its internal management of
resource inventory and usage is reﬂected in its reporting of materials in the literature they
produce. Articles from these additional ZFIN and Urban Lab collections were also classi-
ﬁed according to discipline and impact factor, so as to be included with our core collection
in our factor analysis. In total, 238 manuscripts were analyzed from 84 journals. All of the
articles contained at least one or more of the research resources we evaluated in this study.
To ensure this was a suﬃcient number of papers, we did preliminary statistical analysis to
determine that we could ﬁnd statistical signiﬁcance in the results. A list of the journals,
domains, impact factors, and PubMed IDs, as well as the complete dataset is available in
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 4/22
Article curation workﬂow
A team of three curators evaluated a selection of articles from the corpus, with each
being reviewed by a single expert to identify and establish the identiﬁability of each
documented resource. In addition, zebraﬁsh and ﬂy genetics experts curated the zebraﬁsh
and drosophila model organisms, respectively, as our primary curators did not have
expertise in these areas. We performed spot-checking of the primary curation and
issues found by the secondary evaluator were documented in the curation spreadsheets
and updates were made to the curation guidelines. Where necessary, the curator used
supplemental data and any referenced articles or publically accessible online data sources,
dating as far back as necessary to ﬁnd uniquely identifying information about a resource.
This included vendor catalogs and a variety of experimental and resource databases,
where identifying information was often resolvable based on information provided in a
publication. More detailed evaluation criteria for unique identiﬁcation of each resource
type are described below. For a given article, evaluation of only the ﬁrst ﬁve resources of
each type was performed in the core publication collection. This was necessary as some
papers referenced a cumbersome number of resources such as antibodies or RNAi oligos,
which were typically reported to the same degree of rigor.
Resource identiﬁcation criteria
Based on our extensive experience in working with these particular resources and on
consultation with several external experts, we developed a set of criteria to determine the
ability of each resource type to be ‘uniquely identiﬁed’. Generally, ‘unique identiﬁcation’
requires that a speciﬁc resource can be obtained or created based on information provided
in or resolvable from the publication directly, or resolvable through referenced literature,
databases, or vendor sites. Below we outline some general and resource-type speciﬁc
requirements for ‘identiﬁability’ applied in our evaluations.
For commercial resources, provision of a catalog number and the name of the vendor
that resolves to a single oﬀering uniquely identiﬁes a resource. In the absence of a
catalog number, if provision of only the vendor and resource name allows unambiguous
resolution to a single oﬀering, a resource is considered identiﬁable. For example, reporting
“polyclonal anti-HDAC4 from Santa Cruz” resolves to a single antibody in the Santa Cruz
catalog even without a catalog number. However, this is not ideal, because the catalog
may expand to include additional polyclonal anti-HDAC4 antibodies in the future, which
would render the resource unidentiﬁable. Additionally, catalog numbers are not stable as
products are discontinued or sold; hence we also looked for a record of the antibody in
the Antibody Registry (www.antibodyregistry.org), which provides stable IDs for antibody
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 5/22
Sequence molecule identiﬁcation
Sequence identiﬁcation is a central aspect of identiﬁability for many resource types.
Examples include specifying the sequence of an immunogenic peptide for a lab-sourced
antibody, the sequence of a DNA insert of a construct, or the sequence of a transgene
incorporated into the genome of an organism or cell line. In such cases, these sequences
need to be resolvable to known information about the speciﬁc nucleic acid or peptide
sequence to support identiﬁability of the resource to which they are related. Criteria
that establish resolution of a sequence in support of identifying a dependent resource
include: (1) directly providing the full sequence; (2) referencing a resource from which
the sequence can be determined (to the extent that it is known)—e.g., by providing
a gene ID or accession number that can be looked up and a sequence determined;
(3) when precise/complete sequence information does not exist, a sequence should be
tied to some other unique entity, such as a single, unique source and procedure through
which the physical sequence can be obtained/replicated (e.g., primers and a speciﬁc
source of template DNA such as a uniquely identiﬁed cell type or biological sample).
The requirement for complete resolution to a speciﬁc sequence is not absolute as it is
sometimes the case that this information is not known, and for some resource types
a complete sequence may not be required to be considered uniquely identiﬁable. One
recurring theme we encountered in our study was authors referencing a gene name or
sequence to identify cDNA or a peptide related to the gene. This can be problematic,
as speciﬁcation of a gene sequence may not be suﬃcient to resolve a single cDNA or
peptide sequence. This is because a single gene may resolve to many diﬀerent transcripts or
peptides (e.g., through alternative splicing), which can prevent unambiguous resolution of
a gene sequence to a cDNA or peptide sequence.
SPECIFIC RESOURCE IDENTIFICATION CRITERIA
Unique antibody identiﬁcation required at least one of the following: (1) an identiﬁer
resolving to a universal registry/database identiﬁer such as the Antibody Registry (www.
antibodyregistry.org) or eagle-i repository (http://www.eagle-i.net), or a vendor name and
catalog number for resolving to a single oﬀering; (2) for antibodies not publicly available,
suﬃcient protocol details on production of the antibody so as to allow reproduction. This
detail minimally includes specifying the host organism and identity of the immunogen
used. For peptide immunogens, criteria for sequence identiﬁcation above apply, i.e., that
an immunogenic protein or peptide resolves to single gene product sequence. Note that the
criteria for identiﬁability do not include the lot or batch number, although a case could be
made for this level of granularity.
For ‘wild-type’ organism strains, an unambiguous name or identiﬁer, such as a stock
number, the oﬃcial International Mouse Strain Resource (IMSR) name or a MOD
number, is required as well as a source vendor, repository, or lab. For genetically modiﬁed
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 6/22
strains, identiﬁability requires reporting or reference to all genotype information known,
including genetic background and breeding information, and precise alterations identiﬁed
in or introduced into the genome (including known sequence, genomic location, and
zygosity of alterations). For random transgene insertions, it is not required that genomic
location of insertion(s) is known, but precise sequence of inserted sequence should be
unambiguously resolvable according to sequence identiﬁcation criteria above. For targeted
alterations, genomic context of the targeted locus and the precise alterations to the locus
should be speciﬁed according to sequence identiﬁcation criteria above. This information
can be provided directly, or through reference to a MOD record or catalog oﬀering where
such information is available. The MODs provide speciﬁc nomenclature guidelines that are
consistent with these views.
For standard publically available lines, an unambiguous name or identiﬁer is required as
well as a source for the line (e.g., a vendor or repository). This information should resolve
to data about the organismal source and line establishment procedures. For example,
a common cell line reported that can be obtained from ATCC would be considered
identiﬁable, however, if only the name of the line is mentioned without any other
identifying information then it is considered unidentiﬁable. For novel lab-generated
cell lines, an organismal source (species and known genotype information, anatomical
entity of origin, developmental stage of origin) and any relevant procedures applied to
establish a stable lineage of cells. Additionally, some indication of passage number is
recommended but not strictly required. For genetically modiﬁed lines, identiﬁability
criteria are analogous to those for genetically modiﬁed organisms, including genomic
location and zygosity or copy number of modiﬁcations where this information is known.
Construct backbone should be unambiguously identiﬁed and resolvable to a complete
vector sequence (typically through a vendor or repository). The sequence of construct
inserts should be identiﬁable according to sequence identiﬁcation criteria above. Most
expression constructs incorporate cDNA—so it is particularly important that the exons
included in this insert are resolvable when more than one splice variant exists for a
gene transcript. This means that specifying the name of a gene or a protein expressed
may not be suﬃcient if this does not allow for unambiguous resolution to a cDNA
sequence. Identiﬁcation does not require precise description of MCS restriction sites
used for cloning, but this information is encouraged. Relative location and sequence of
epitope tags and regulatory sequences (promoters, enhancers, etc.) should be speciﬁed
(e.g., ‘N-terminal dual FLAG tag’ is suﬃcient). For example, referencing the accession
number and the vector backbone is suﬃcient to identify the construct, as in: “for the
full-length Dichaete construct, the insert was ampliﬁed from the full-length cDNA clone
(GenBank accession X96419 and cloned into the HindIII and KpnI sites of pBluescript II
KS(!)” (Shen, Aleksic & Russell, 2013). However, in most constructs, such level of detail is
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 7/22
Identiﬁability requires speciﬁc and complete sequence identiﬁcation according to the
criteria outlined above. This will typically be direct reporting of the sequence, as these
are generally short oligos. For example, this text provided in the method section was
considered identiﬁable: “The DNA target sequence for the rat Egr-2 (NM 053633.1)
gene was CAGGAUCCUUCAGCAUUCUTT” (Yan et al., 2013). In cases where sequence
information was not provided, the reagent was considered unidentiﬁable.
Since the data was binomial in that each resource was either identiﬁable or not, we used
a binomial conﬁdence interval strategy for calculating upper and lower 95% conﬁdence
intervals (CI) (http://www.biyee.net/data-solution/resources/binomial-confidence-
interval-calculator.aspx). Error bars for the corresponding 95% CI are displayed on the
graphs. Statistical signiﬁcance was determined by calculating the z-score.
RESULTS AND DISCUSSION
The goal of our study was to determine the proportion of research resources of ﬁve
common types that can be uniquely identiﬁed as reported in the literature. ‘Unique
identiﬁcation’ requires that a resource can be obtained or re-created based on information
provided in or resolvable from a publication. The criteria for identiﬁability were
established a reasonable level of granularity, recognizing that ﬁner levels, e.g., lot or litter
number, may be possible. Establishing identiﬁability criteria was central to our eﬀort, and
these criteria are complex and varied between resource types as described in the Methods
section. The results of our study provide quantiﬁcation of this problem in the literature. In
total, only 54% (922/1703) of evaluated resources were uniquely identiﬁable. Considerable
variability was found across resource types (Fig. 1A), which may result from the inherent
diﬀerences in the attributes relevant to their identiﬁcation, or from the level of external
support for applying identiﬁers and metadata for their unique identiﬁcation. In addition,
the level of identiﬁability for each resource type is tied directly to the stringency of the
criteria that were separately developed for each, which are unavoidably exposed to some
degree of subjectivity.
Antibody reagents represent one of the most challenging and important resource types to
adequately identify, given their ubiquitous use, expense to create, and condition-speciﬁc
eﬃcacy. The most common issue with reporting of antibodies was a lack of catalog number
(for commercial antibodies) or a lack of reference to the immunogen used to generate
the antibody (for non-commercial antibodies). A separate analysis of commercial versus
non-commercial (e.g., lab-made) antibodies showed an average of 46% of commercial
antibodies, and similarly, 43% of non-commercial antibodies were identiﬁable. While
commercial suppliers do an acceptable job of providing basic metadata about their
oﬀerings (for example, see http://datasheets.scbt.com/sc-546.pdf), the market is ﬂooded
with products of variable quality metadata. In practice, the literature is where most
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 8/22
Figure 1 Resource identiﬁability across disciplines. (A) Summary of average fraction identiﬁed for each resource type. (B–F) Identiﬁability of each
resource type by discipline. The total number of resources for each type is: (B) antibodies, n = 703; (C) cell lines, n = 104; (D) constructs, n = 258;
(E) knockdown reagents, n = 210; (F) organisms, n = 428. The y-axis is the average for each resource type across each domain. Variation from this
average is shown by the bars, error bars indicate upper and lower 95% conﬁdence intervals.
scientists look when searching for the right antibody for their work, as evidenced by a
marketing report from 1 Degree Bio (http://1degreebio.org/) showing 63% of researchers
use journal references to guide antibody selection (A Hodgson, unpublished data). This
makes it all the more troubling that only 44% of antibodies evaluated in our study could
be uniquely identiﬁed (Fig. 1B). While reporting of a catalog number alone is considered
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 9/22
suﬃcient for unique identiﬁcation of a commercially available antibody, we found they
were provided for only 27% of antibodies we evaluated.
A likely reason for the shortcoming in commercial antibody identiﬁcation may be that
journal reporting guidelines rarely require catalog numbers be reported for antibodies
(or any other reagent type for that matter). More commonly, only a name and location
of a manufacturer are required. For example, the journal “Immunology” simply states:
“Materials and Methods: suﬃcient information must be included to permit repetition
of experimental work. For specialist equipment and materials the manufacturer (and
if possible their location) should be stated” (Wiley Online Publishing). By contrast, the
Journal of Comparative Neurology (JCN) is one of the rare journals that do require more
precise reporting of antibody metadata, including their catalog number. An extensive
evaluation of 6,510 antibodies in the JCN Antibody Database (Wiley Online Publishing,
2013) revealed that a catalog number was reported in over 90% of the antibodies captured
in their database (AE Bandrowski, NA Vasilevsky, MH Brush, MA Haendel, V Astakhov, P
Ciccarese, J McMurry and ME Martone, unpublished data, and re-evaluated in this study).
This highlights how a simple solution such as requiring catalog number reporting can
vastly improve resource identiﬁcation in the literature.
Notably, as more data is becoming available about protein structure, localization, and
function, the identity of peptide immunogens and epitopes used in creating an antibody
becomes increasingly valuable for explaining its performance in diﬀerent applications.
Identiﬁcation and tracking of immunogens is one area where there is considerable room
for improvement among vendors and resource databases. Eﬀorts such as the Immune
Epitope Database (IEDB) (http://www.iedb.org/), a manually curated repository of
immunological data about epitope recognition, can be looked to for guidance in how
to capture and represent relevant data about such epitopes. The IEDB curates papers that
report discovery of new epitopes and even in this very speciﬁc use case where the goal is
to report on the speciﬁc epitope, only 81% of the epitopes they curated had the epitope
sequence provided in the published manuscript (R Vita, unpublished data).
A source for cell lines was rarely reported and the lack of source was the most common
factor for their low identiﬁability in our study. For commonly used, unmodiﬁed lines such
as HEK293T cells, our guidelines required a source be provided in addition to the line
name. This information was deemed important given the tendency of lines propagated in
isolation to diverge genetically through continuous passages (Hug hes et al., 2007). There
are increasingly documented occurrences of cell line misidentiﬁcation and contamination,
as highlighted by the infamous HeLa contamination statistics (Gartler, 1968) and other cell
line contaminations (Phuchareon et al., 2009). Simply reporting the name of the line with-
out a source fails to provide any information on the history and integrity of the line. For
lab-generated or genetically modiﬁed cell lines not available from a public source, identi-
ﬁcation required a basic description of the line’s establishment procedure, its anatomical
source, and/or the precise genetic modiﬁcations made (see details in Methods section).
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 10/22
Based on these criteria, the identiﬁability of cell lines was comparable to that for
antibodies, averaging 43% across all disciplines (Fig. 1C). A notable diﬀerence was found
for cell line identiﬁability between our lowest and highest reporting disciplines—General
Biology (0% identiﬁable) and Immunology (88% identiﬁable). This may reﬂect the
tendency for less rigorous reporting requirements and reduced space allocation for
methods that are common in high-proﬁle journals we included in this category (e.g.,
Nature, Science). By contrast, the majority of cell lines reported in Immunology papers
adequately referenced either the lab, investigator, or commercial supplier that provided
the cell line, which may indicate more rigorous conventions for sharing and attribution
for cell lines in this community; however, due to the low number of cell lines evaluated in
immunology journals in this study, we cannot make this conclusion.
An important aspect of cell lines that we found highly neglected in literature reporting
was passage number. This attribute provides an important metric to gauge the integrity
of a cell line sample, and how likely it is to be faithfully reﬂected in another sample. We
found such information to be rarely reported in our study, and thus did not require it
in addition to a source for identiﬁability. But we highly recommend more attention be
paid to tracking and reporting this important attribute in the literature. This practice is
particularly important for lines propagated in research labs, as a survey on cell line usage
reported that 35% of researchers use cell lines obtained from another lab rather than a
cell line repository (Buehring, Eby & Eby). Tracking passage number and contamination
is a lower priority in these labs compared to commercial repositories, such that the use of
genetically or compositionally divergent samples of the same line is likely to be a signiﬁcant
contributor to diﬃculties in reproducing cell-line based research. Towards this end, a
guideline has been published to check for contamination and authenticity of cell lines
(Capes-Davis et al., 2010).
Unique identiﬁcation of constructs was the lowest amongst all resource types examined,
on average 25% were identiﬁable, due to lack of reporting of sequence or other identifying
information (Fig. 1D). This was likely due to the dependency of identiﬁcation on reporting
a complete or approximated sequence, and the lack of incentive, guidelines, or technical
support for providing such metadata. While many construct backbones are obtained
from commercial manufacturers where the relevant sequence information is provided,
the valuable component of a construct are the gene(s) that have been sub-cloned in by
a researcher. Access to this sequence information is critical in order to reproduce the
experiment or fully utilize these resources, but it is rarely directly reported in full. While
resources like Addgene and PlasmID provide detailed information about constructs and
the relevant gene components, submission of plasmids to such repositories is infrequent, as
we found less than 10% of non-commercial plasmids reported in our corpus to be present
in such repositories. In cases where primer sets were used to generate a construct insert,
we often found that the primer sequences were reported; yet the speciﬁc and complete
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 11/22
sequence of the ampliﬁed template was rarely speciﬁed. In such cases, it is not possible to
determine the sequence of the product cloned into a construct.
Gene knockdown reagents
Knockdown reagents were signiﬁcantly more identiﬁable compared to the former resource
types mentioned above, at 83% (Fig. 1E). Knockdown reagents are frequently used, in
particular in Cell and Developmental Biology (Harborth et al., 2001; Nasevicius & Ekker,
2000). Identiﬁability of knockdown reagents was the highest amongst resource types.
This is likely due to the fact that knockdown reagents tend to be comprised of short,
and therefore easy to include, sequence information. Additionally, editors often require
reporting of sequences for custom reagents, as this information is critical to understanding
and verifying the reagent function. MODs also keep track of these sequences as they curate
papers. The majority of knockdown reagents that were curated in this study were from
Developmental Biology journals, which also had the lowest number of identiﬁable reagents
compared to other ﬁelds. Knowing the exact sequence used is necessary to reproduce
the experiment, and concentration and experimental details are similarly important to
determine oﬀ-target eﬀects.
Organisms showed a relatively high identiﬁability of 77% (Fig. 1F). Amongst organisms,
yeast were the most identiﬁable (100%, albeit there were only 5 strains analyzed from one
paper), followed by zebraﬁsh (87%), ﬂies (80%), mice (67%), and rats (60%). Worms
and frogs were the least identiﬁable, at 58%, 33%, and 0%, respectively. The identiﬁcation
of transgenic organisms was higher, with 83% of transgenic organisms being identiﬁable
compared to 46% of non-transgenic wild type strains. The higher identiﬁability may be
due to the fact that 56% of the transgenic strains we analyzed had already been curated
by a MOD, because the organisms reported in our corpus were previously reported in an
earlier publication that had been curated by a MOD. Indeed, identiﬁability of organisms
not found in a MOD was considerably lower at 60%. The MODs review the current
literature and annotate information about genetic modiﬁcations used in transgenic strains,
phenotypes, gene expression, etc., in addition to other relevant types of information
pertaining to the organisms (Bradford et al., 2011; Bowes et al., 2010; Yook et al., 2012;
Marygold et al., 2013; Laulederkind et al., 2013; Bult et al., 2013). While it is reassuring that
these speciﬁc strains have been previously curated via earlier publications, it often requires
the curator to dig through many publications or to contact the authors directly. ZFIN
determined that over a two-month period, they had to contact 29% of authors to properly
curate the resources reported in their manuscript.
Comparing organism identiﬁcation between disciplines, we noted that they were con-
siderably less identiﬁable in Neuroscience papers (46%) relative to other domains. A likely
explanation is that non-transgenic animals are commonly used in neuroscience assays
such as electrophysiology studies (26 out of 62 organisms analyzed were non-transgenic).
Identiﬁcation of such commercially available strains faces similar problems as standard cell
lines, where a source is required to allow some historical information to be obtained about
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 12/22
propagation/breeding. Indeed, it has been reported that there are many variations between
wild type strains of model organisms (Portelli et al., 2009; Sandberg et al., 2000; Wahlsten,
1987), and variations between suppliers (Ezerman & Kromer, 1985).
We further examined if the unique identiﬁcation of resources diﬀered between sub-
disciplines of biomedical research (Table 1). While no discipline was consistently above
or below average with respect to identiﬁcation of the resources, Developmental Biology,
General Biology, and Immunology were generally above average compared to the other
ﬁelds. The identiﬁcation of cell lines was highest in Immunology papers, which was
signiﬁcantly diﬀerent from Cell and General Biology papers, and papers from the “other”
category, even though there was a small sample size (16 out of 104 total cell lines were
from Immunology journals). By contrast, no cell lines were identiﬁable in the General
Biology papers, which was signiﬁcantly lower compared to all disciplines except the
“other” category. However, General Biology journals boasted the highest percentage of
identiﬁable constructs in papers at 59%, which was a signiﬁcantly better compared to
the other disciplines except Immunology. It is notable that identiﬁcation of resources for
Neuroscience was below average compared to the other ﬁelds for all resources except cell
lines. Of note, identiﬁcation of organisms in Neuroscience journals was signiﬁcantly less
than all other disciplines (30 out of 62 organisms were identiﬁable). Overall, there was
not a consistent trend between scientiﬁc sub-domains with respect to identiﬁability of
resources (Figs. 1B–1F).
Impact factor considerations
We next examined whether identiﬁcation of resources diﬀered among journals across a
range of impact factors. We found that resource identiﬁcation did not vary with journal
impact factor, as revealed by the lack of correlation in scatter plot analysis (Figs. 2A–2E).
Analysis by reporting requirements
Very few journals were considered to have stringent reporting requirements, and amongst
those, it was surprising to note that the identiﬁcation of the resources did not appear
improved in journals with satisfactory or loose reporting requirements. Identiﬁcation
of cell lines was especially poor in journals with satisfactory reporting guidelines (0
out of 21 were identiﬁable, from 10 articles analyzed), and overall, the identiﬁcation
of the resources was the poorest in journals with highest reporting requirements (an
average of 45% were identiﬁable in journals with stringent reporting requirements,
while resources from journals with satisfactory and loose were on average 61% and 55%
identiﬁable, respectively; Fig. 3). On average, journals with loose reporting requirements
had a signiﬁcantly higher percentage of identiﬁable resources compared to journals with
stringent reporting requirements.
With most journals having a low or mid-level impact factor (i.e., a skewed distribution),
the majority of high identiﬁability therefore comes from these lower proﬁle journals.
This is an encouraging result, because it means that the lion’s share of the publishing
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 13/22
Table 1 Final numbers of identiﬁable resources in each domain.
Resource type Domain Total number
Antibody Cell biology 69 149 34 46%
Dev biology 68 144 44 47%
General biology 36 74 19 49%
Immunology 48 124 28 39%
Neuroscience 60 136 41 44%
Other 31 76 24 41%
Grand total 312 703 190 44%
Cell lines Cell biology 15 38 17 39%
Dev biology 7 12 5 58%
General biology 0 10 5 0%
Immunology 14 16 6 88%
Neuroscience 4 7 6 57%
Other 5 21 9 24%
Grand total 45 104 48 43%
Constructs Cell biology 16 84 17 19%
Dev biology 18 66 19 27%
General biology 16 27 8 59%
Immunology 3 8 3 38%
Neuroscience 4 35 7 11%
Other 7 38 12 18%
Grand total 64 258 66 25%
Knockdown reagents Cell biology 40 49 16 82%
Dev biology 55 76 22 72%
General biology 31 31 9 100%
Immunology 5 5 3 100%
Neuroscience 9 12 6 75%
Other 35 37 14 95%
Grand total 175 210 70 83%
Organisms Cell biology 57 70 27 81%
Dev biology 119 141 44 84%
General biology 30 36 11 83%
Immunology 38 48 20 79%
Neuroscience 30 62 38 48%
Other 57 71 28 80%
Grand total 331 428 168 77%
Overall total 927 1703 54%
world has already demonstrated a capability of producing identiﬁable resources. It is
especially important to not overlook these higher volume lower-cited journals to produce
quality metadata about research resources. Additionally, higher impact journals tend to
de-emphasize methods over other sections. Therefore, what is needed is to incentivize all
journals to do better with respect to identiﬁability.
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 14/22
Figure 2 Resource identiﬁcation rates across journals of varying impact factors. (A) An overview of fraction identiﬁed by impact factor for all
resource types. (B–F) Fraction identiﬁed by impact factor for each individual resource type. Increasing height on the x-axis corresponds with a
higher impact factor for each journal.
Lab documentation vs. publications
For the Urban lab publications that we evaluated, only 44% of the antibodies used were
identiﬁable (out of 9 total antibodies from 5 papers), and 47% of the organisms were
identiﬁable (out of 17 organisms from 17 papers). We note that this lab internally keeps
highly structured notes and metadata about their resources in the lab; after analyzing
their internal notes, 100% of antibodies and 100% of organisms were identiﬁable using
our criteria. However, despite this information being tracked extensively within the
lab, these details did not make it into their publications. It does suggest, however, that
the information is potentially recoverable, if practices to make resources identiﬁable are
Evaluation criteria and workﬂow
A core challenge of designing this experiment was determining evaluation criteria that
were precise enough to allow for reproducible determination of reported resource
identiﬁability. For simplicity, we used a binary classiﬁcation for the data analysis, but in
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 15/22
Figure 3 Identiﬁcation of resource varies across journals with varying resource-reporting require-
ments. The classiﬁcations of reporting requirements are summarized in the methods. A total of 53
out of 118 resources were identiﬁable in the stringent reporting guidelines category (17 papers were
analyzed), 201 resources were identiﬁable out of 329 resources for the satisfactory category (48 papers
were analyzed) and 662 out of 1,217 resources were identiﬁable in the loose category (182 papers were
analyzed). Variation from this average is shown by the bars, error bars indicate upper and lower 95%
reality the amount of information pertaining to resource identiﬁcation was incremental.
Crafting of these criteria required careful consideration of each resource type, including
how they are generated and acquired and the particular aspects of each that are important
in the context of experimental reproducibility. This was particularly complex for resources
whose identiﬁcation required sequence information relating to a target or part of the
resource, as diﬀerent applications may require diﬀerent degrees of speciﬁcity. Despite
the abundance of public databases that provide identiﬁers for biological sequences, we
found a reluctance of authors to reference such IDs when documenting reagents such as
constructs or antibodies. This may point to a lack of awareness, a lack of incentive, or a
lack of means for the journals and authors to use existing resources to supply uniquely
identiﬁable information. Each problem is likely to have its own set of solutions, which we
discuss in our recommendations below.
To ensure their consistent application, criteria and evaluation workﬂows were centrally
documented, performed, and evaluated performed by expert biocurators. These results
support the speciﬁcity and reproducibility of our guidelines, which we hope will serve to
inform reporting requirements of publishers and the development of support platforms
Improving reporting guidelines for authors is an important step towards addressing
this problem. Very few journals (only 5/83) had high stringency guidelines by our
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 16/22
deﬁnition. Higher impact journals like Science and Nature tended to have looser reporting
requirements, usually due to space limitations in the journal and often required reference
to previously published methods. It has also been previously noted that higher impact
journals have a higher retraction rate (Fang, Casadevall & Morrison, 2011). The Journal
of Comparative Neurology has stringent reporting standards for materials and methods,
requiring that sources for all materials and equipment, sequence information for nucleic
acids and peptides, and immunogen and catalog number for antibodies be reported.
It is our hope that other journals will follow suit. That said, we found that antibody
identiﬁability in the Journal of Comparative Neurology was only slightly higher than
average across all journals (58% in JCN vs. 44% overall). Our ﬁndings are also much
lower than the percentage calculated from the JCN database above, perhaps due to lack of
compliance by authors or lack of enforcement by reviewers. Based on the sampling that
we have, there does not seem to be any relationship between reporting guidelines and
identiﬁability. One might ask, how can this be? The reality is that having quality guidelines
for authors is only one part of the solution. For example, Mike Taylor writes about how the
peer review process fails to enable trustworthy science (Taylor, 2013).
The solution to improving resource identiﬁability and therefore scientiﬁc repro-
ducibility needs to be a partnership between all participants in the scientiﬁc process,
and deﬁciencies in awareness and diﬃculties coordinating across these stakeholders is at
the root of the problem. Better tracking of research resources by researchers during the
course of research can facilitate sharing of information with databases and at publication
time. Electronic lab notebooks and management software (Machina & Wild, 2013;
Hrynaszkiewicz, 2012), or resource sharing repositories such as the eagle-i Network
(www.eagle-i.net) (Vasilevsky et al., 2012) or the Neuroscience Information Framework
(http://www.neuinfo.org/) (Bandrowski et al., 2012) enable creation of stable identiﬁers
and structured tracking of information. The MODs have recommended nomenclature
standards for organisms, but these are not always adhered to (Eppig & Levan, 2005;
MGI, 2013; ZFIN, 2013; Flybase, 2013). In an ideal situation, authors would report the
unique ID pertaining to the model organism directly in the publication by having their
ID assigned and nomenclature approved prior to publication. Then a direct link and easy
access to the information to researchers who are attempting to understand or reproduce an
experiment can be made available. In addition, this can facilitate text-mining and machine
processing using automated agents that recognize these IDs. Journal editors should better
detail reporting requirements, such as in the recent communiqu
e from Nature (http://
www.nature.com/authors/policies/reporting.pdf). Publishers also need functionality to
identify resources at the time of submission. Tools such as the DOMEO Toolkit allow
for semantic markup of papers (Ciccarese, Ocana, & Clark , 2012) and can be utilized
during the submission process whereby researchers can easily check the identiﬁability of
the resources found in their paper. Vendors, if more aware of how their products are being
referenced in the literature and databases, may tend towards better and more stable catalog
schemes as well as to integrate the added knowledge being captured in external resources.
Finally, researchers can be attributed for their resources so that they would be incentivized
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 17/22
to uniquely identify and share them. Recent changes to the NSF biosketch highlight a
speciﬁc area where uniquely identifying such resources can have a positive inﬂuence on the
evaluation of one’s scholarly activities. Similarly, the Bioresource Research Impact Factor
(BRIF) (Mab ile et al., 2013) provides attribution for use and sharing of resources. Unique
reference of resources through databases such as the Antibody Registry, eagle-i or MODs
can facilitate this process. Finally, researchers need to know where the information in their
favorite online resources comes from—the literature and the biocurators that curate their
papers and datasets. Identiﬁability is just as important in the context of data sets, and given
the signiﬁcant eﬀort being made to make informatics analyses reproducible (http://www.
runmycode.org/CompanionSite/) and data sets available (dryad.org, figshare.com/), it is
ironic that in some cases the original data itself may not be reproducible simply because the
antibody used to generate the data was never speciﬁed.
Scientiﬁc reproducibility is dependent on many attributes of the scientiﬁc method.
Being able to the uniquely identify the resources used in the experiments is only one of
these attributes—it just happens to be the easiest one to accomplish. We hope that this
study insights authors, reviewers, editors, vendors, and publishers to work together to
realize this common goal.
We would like to acknowledge Robin Champieux for her help with the experimental
design, John Campbell for his help with data and statistical analysis and discussion, Scott
Hoﬀmann for his help with the data analysis and ﬁgure preparation, Alex Hodgson
for sharing the antibody market analysis and manuscript review, Nathan Urban for
discussions and sharing information on lab internal databases and notes, and Randi Vita
for manuscript review and for sharing the IEDB data, Ceri Van Slkye for her help with
analyzing the yeast strains, and Anita de Waard, Maryann Martone, and Anita Bandrowski
for discussion and manuscript review.
ADDITIONAL INFORMATION AND DECLARATIONS
OHSU acknowledges the support of the OHSU Library and #1R24OD011883-01 from the
NIH Oﬃce of the Director. The Zebraﬁsh Information Network and Flybase are funded
by the National Human Genome Research Institute (P41 HG002659 and P41 HG000739,
respectively). Shreejoy Tripathy of the Urban Lab is funded by an NSF graduate research
fellowship and a RK Mellon Foundation Fellowship. Greg LaRoca is funded by NIH grants
R01DC005798 and R01DC011184. The funders had no role in study design, data collection
and analysis, decision to publish, or preparation of the manuscript.
The following grant information was disclosed by the authors:
NIH Oﬃce of the Director: NIH grant #1R24OD011883-01.
National Human Genome Research Institute: #P41 HG002659, #P41 HG000739.
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 18/22
NSF graduate research fellowship and a RK Mellon Foundation fellowship.
NIH grants: R01DC005798, R01DC011184.
The authors have no competing interests.
• Nicole A. Vasilevsky conceived and designed the experiments, performed the experi-
ments, analyzed the data, wrote the paper.
• Matthew H. Brush and Melissa A. Haendel conceived and designed the experiments,
wrote the paper.
• Holly Paddock and Laura Ponting performed the experiments.
• Shreejoy J. Tripathy provided data from the Nathan Urban Lab that we used in the
analysis and reviewed the manuscript.
• Gregory M. LaRocca provided data from Nathan Urban Lab that we used in the analysis.
The following information was supplied regarding the deposition of related data:
Supplemental information for this article can be found online at http://dx.doi.org/
Bandrowski AE, Cachat J, Li Y, M
uller HM, Sternberg PW, Ciccarese P, Clark T, Marenco L,
Wang R, Astakhov V, Grethe JS, Martone ME. 2012. A hybrid human and machine
resource curation pipeline for the neuroscience information framework. Database
bas005 DOI 10.1093/database/bas005.
Begley CG, Ellis LM. 2012. Drug development: raise standards for preclinical cancer research.
Nature 483(7391):531–533 DOI 10.1038/483531a.
Bourbeillon J, Orchard S, Benhar I, Borrebaeck C, de Daruvar A, D
ubel S, Frank R,
Gibson F, Gloriam D, Haslam N, Hiltker T, Humphrey-Smith I, Hust M, Juncker D,
Koegl M, Konthur Z, Korn B, Krobitsch S, Muyldermans S, Nygren P-
A, Palcy S, Polic B,
Rodriguez H, Sawyer A, Schlapshy M, Snyder M, Stoevesandt O, Taussig MJ, Templin M,
Uhlen M, van der Maarel S, Wingren C, Hermjakob H, Sherman D. 2010. Minimum
information about a protein aﬃnity reagent (MIAPAR). Nature Biotechnology 28(7):650–653
Bowes JB, Snyder KA, Segerdell E, Jarabek CJ, Azam K, Zorn AM, Vize PD. 2010. Xenbase:
gene expression and improved integration. Nucleic Acids Research 38(Suppl 1):D607–D612
Bradford Y, Conlin T, Dunn N, Fashena D, Frazer K, Howe DG, Knight J, Mani P, Martin R,
Moxon SAT, Paddock H, Pich C, Ramachandran S, Ruef BJ, Ruzicka L, Schaper HB,
Schaper K, Shao X, Singer A, Sprague J, Sprunger B, Slyke CV, Westerﬁeld M. 2011. ZFIN:
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 19/22
enhancements and updates to the zebraﬁsh model organism database. Nucleic Acids Research
39(Database issue):D822–D829 DOI 10.1093/nar/gkq1077.
Buehring GC, Eby EA, Eby MJ. 2004. Cell line cross-contamination: how aware are mammalian
cell culturists of the problem and how to monitor it? In Vitro Cellular & Developmental Biology.
Animal 40(7):211–215 DOI 10.1290/1543-706X(2004)40<211:CLCHAA>2.0.CO;2.
Bult CJ, Eppig JT, Blake JA, Kadin JA, Richardson JE. 2013. The mouse genome database:
genotypes, phenotypes, and models of human disease. Nucleic Acids Research
41(D1):D885–D891 DOI 10.1093/nar/gks1115.
Capes-Davis A, Theodosopoulos G, Atkin I, Drexler HG, Kohara A, MacLeod RAF, Masters JR,
Nakamura Y, Reid YA, Reddel RR, Freshney RI. 2010. Check your cultures! A list of
cross-contaminated or misidentiﬁed cell lines. International Journal of Cancer 127(1):1–8
Ciccarese P, Ocana M, Clark T. 2012. Open semantic annotation of scientiﬁc publications using
DOMEO. Journal of Biomedical Semantics 3(Suppl 1):1–14 DOI 10.1186/2041-1480-3-S1-S1.
Cokol M, Ozbay F, Rodriguez-Esteban R. 2008. Retraction rates are on the rise. EMBO Repor ts
9(1):2 DOI 10.1038/sj.embor.7401143.
Eppig JT, Levan G. 2005. MGI 3.22 - rules for nomenclature of genes, genetic markers, alleles, and
mutations in mouse and rat. Available at: http://rgd.mcw.edu/nomen rules.html.
Ezerman EB, Kromer LF. 1985. Outbred Sprague-Dawley rats from two breeders exhibit
diﬀerent incidences of neuroanatomical abnormalities aﬀecting the primary cerebellar ﬁssure.
Experimental Brain Research 59(3): DOI 10.1007/BF00261354.
Fang FC, Casadevall A, Morrison RP. 2011. Retracted science and the retraction index. Infection
and Immunity 79(10):3855–3859 DOI 10.1128/IAI.05661-11.
Flybase. 2013. FlyBase: Genetic nomenclature for Drosophila melanogaster. http://flybase.org/static
Gartler SM. 1968. Apparent Hela cell contamination of human heteroploid cell lines. Nature
217(5130):750–751 DOI 10.1038/217750a0.
Haendel MA, Vasilevsky NA, Wirz JA. 2012. Dealing with data: a case study on information and
data management literacy. PLoS Biology 10(5):e1001339 DOI 10.1371/journal.pbio.1001339.
Harborth J, Elbashir SM, Bechert K, Tuschl T, Weber K. 2001. Identiﬁcation of essential genes
in cultured mammalian cells using small interfering RNAs. Journal of Cell Science 114
Hrynaszkiewicz I. 2012. LabArchives and BioMed Central: a new platform for publishing scientiﬁc
data. BioMed Central Blog. Available at: http://blogs.biomedcentral.com/bmcblog/2012/04/04/
Hughes P, Marshall D, Reid Y, Parkes H, Gelber C. 2007. The costs of using unauthenticated,
over-passaged cell lines: how much more data do we need? BioTechniques 43(5): 575, 577–578,
581–582 passimDOI 10.2144/000112598.
ISO 5725-1:1994. 1994. Accuracy (trueness and precision) of measurement methods and results.
Kilkenny C, Parsons N, Kadyszewski E, Festing MFW, Cuthill IC, Fry D, Hutton J, Altman DG.
2009. Survey of the quality of experimental design, statistical analysis and reporting of research
using animals. PLoS ONE 4(11):e7824 DOI 10.1371/journal.pone.0007824.
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 20/22
Laulederkind SJF, Liu W, Smith JR, Hayman GT, Wang S-J, Nigam R, Petri V, Lowry TF,
de Pons J, Dwinell MR, Shimoyama M. 2013. PhenoMiner: quantitative phenotype curation at
the rat genome database. Database bat015 DOI 10.1093/database/bat015.
Mabile L, Dalgleish R, Thorisson GA, Desch
enes M, Hewitt R, Carpenter J, Bravo E,
Filocamo M, Gourraud PA, Harris JR, Hofman P, Kauﬀmann F, Mu
Pasterk M, Cambon-Thomsen A, BRIF working group. 2013. Quantifying the use of
bioresources for promoting their sharing in scientiﬁc research. GigaScience 2(1):1–8
Machina HK, Wild DJ. 2013. Electronic laboratory notebooks progress and challenges
in implementation. Journal of Laboratory Automation 2211068213484471
Martone M, Eppig J, Turner J, Ellisman M, Bandrowski A, Westerﬁeld M, Canada B, Cheng K,
Dolinski K, Tyers M, Anderson D. 2012. Publishing in the 21st century: Minimal (really) data
standards. FORCE11. http://www.force11.org/node/4145.
Marygold SJ, Leyland PC, Seal RL, Goodman JL, Thurmond J, Strelets VB, Wilson RJ.
2013. FlyBase: improvements to the bibliography. Nucleic Acids Research 41(Database
issue):D751–D757 DOI 10.1093/nar/gks1024.
MGI. 2013. Mouse nomenclature home page. Available at: http://www.informatics.jax.org/
Nasevicius A, Ekker SC. 2000. Eﬀective targeted gene ‘knockdown’ in zebraﬁsh’. Nature Genetics
26(2):216–220 DOI 10.1038/79951.
Phil B, Clark T, Dale R, de Waard A, Herman I, Hovy E, Shotton D. 2011. Force 11 Manifesto:
Improving future research communication and e-Scholarship. FORCE11. Available at: http://
Phuchareon J, Ohta Y, Woo JM, Eisele DS, Tetsu O. 2009. Genetic proﬁling reveals cross-
contamination and misidentiﬁcation of 6 adenoid cystic carcinoma cell lines: ACC2, ACC3,
ACCM, ACCNS, ACCS and CAC2. PLoS ONE 4(6):e6040 DOI 10.1371/journal.pone.0006040.
Portelli J, Aourz N, De Bundel D, Meurs A, Smolders I, Michotte Y, Clinckers R. 2009.
Intrastrain diﬀerences in seizure susceptibility, pharmacological response and basal neurochem-
istry of Wistar rats. Epilepsy Research 87(2):234–246 DOI 10.1016/j.eplepsyres.2009.09.009.
Sandberg R, Yasuda R, Pankratz DG, Carter TA, Del Rio JA, Wodicka L, Mayford M,
Lockhart DJ, Barlow C. 2000. Regional and strain-speciﬁc gene expression mapping in
the adult mouse brain. Proceedings of the National Academy of Sciences of the United States of
America 97(20):11038–11043 DOI 10.1073/pnas.97.20.11038.
Sansone S-A. 2013. Our roadmap to engagement, your call. Scientiﬁc Data Updates Available at:
Sansone S-A, Rocca-Serra P, Field D, Maguire E, Taylor, Hofmann O, Fang H, Neumann S,
Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T,
Coleman L-A, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo CT,
Forster MJ, Gaudet P, Gilbert J, Goble C, Griﬃn JL, Jacob D, Kleinjans J, Harland L,
Haug K, Hermjakob H, Sui SJH, Laederach A, Liang S, Marshall S, McGrath A, Merrill E,
Reilly D, Roux M, Shamu CE, Shang CA, Steinbeck C, Trefethen A, Williams-Jones B,
Wolstencroft K, Xenarios I, Hide W. 2012. Toward interoperable bioscience data. Nature
Genetics 44(2):121–126 DOI 10.1038/ng.1054.
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 21/22
Shen SP, Aleksic J, Russell S. 2013. Identifying targets of the Sox domain protein Dichaete in the
Drosophila CNS via targeted expression of dominant negative proteins. BMC Developmental
Biology 13:1–15 DOI 10.1186/1471-213X-13-1.
Taylor M. 2013. Peer review does not mean we can trust a published paper. Sauropod Vertebra
Picture of the Week. Available at: http://svpow.com/2013/05/03/peer-review-does-not-mean-we-
Thomas DG, Gaheen S, Harper SL, Fritts M, Klaessig F, Hahn-Dantona E, Paik D, Pan S,
Staﬀord GA, Freund ET, Klemm JD, Baker NA. 2013. ISA-TAB-Nano: a speciﬁcation for
sharing nanomaterial research data in spreadsheet-based format. BMC Biotechnology
13(1):2 DOI 10.1186/1472-6750-13-2.
Thomson Reuters. 2011. Journal Citation Report. http://wokinfo.com/products tools/analytical/jcr/.
Vasilevsky N, Johnson T, Corday K, Torniai C, Brush M, Segerdell E, Wilson M, Shaﬀer C,
Robinson D, Haendel M. 2012. Research resources: curating the new eagle-i discovery system.
Database bar067 DOI 10.1093/database/bar067.
de Waard A. 2010. The future of the journal? Integrating research data with scientiﬁc discourse.
Available at: http://precedings.nature.com/documents/4742/version/1
Wahlsten D. 1987. Defects of the fetal forebrain in mice with hereditary agenesis of the corpus
callosum. The Journal of Comparative Neurology 262(2):227–241 DOI 10.1002/cne.902620205.
Wiley Online Publishing. Immunology Author Guidelines. Available at: http://onlinelibrary.wiley.
Wiley Online Publishing. 2013. JCN Antibody Database. Available at: http://onlinelibrary.wiley.
com/journal/10.1002/(ISSN)1096-9861/homepage/jcn antibody database.htm.
Yan Y, Tan X, Wu X, Shao B, Wu X, Cao J, Xu J, Wang X, Gao Y, Cui G. 2013. Involvement of
early growth response-2 (Egr-2) in lipopolysaccharide-induced neuroinﬂammation. Journal
of Molecular Histology 44(3):249–257 DOI 10.1007/s10735-013-9482-y.
Yook K, Harris TW, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, de la Cruz N, Duong A,
Fang R, Ganesan U, Grove C, Howe K, Kadam S, Kishore R, Lee R, Li Y, Muller H-M,
Nakamura C, Nash B, Ozersky P, Paulini M, Raciti D, Rangarajan A, Schindelman G, Shi X,
Schwarz EM, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Hodgkin J, Berriman M,
Durbin R, Kersey P, Spieth J, Stein L, Sternberg PW. 2012. WormBase 2012: more genomes,
more data, new website. Nucleic Acids Research 40(D1):D735–D741 DOI 10.1093/nar/gkr954.
ZFIN. 2013. ZFIN Zebraﬁsh Nomenclature Guidelines. Available at: https://wiki.zfin.org/display/
Vasilevsky et al. (2013), PeerJ, DOI 10.7717/peerj.148 22/22