ASIS&T Annual Meeting 2018 10 Papers
Distraction to Illumination: Mining Biomedical Publications for Serendipity in Research
Carla M. Allen
School of Information Sciences & Learning Technologies,
University of Missouri-Columbia, USA.
Sanda Erdelez
School of Library and Information Science,
Simmons University, MA, USA.
As our technological capabilities for filling information needs improve, developers seek more effective ways
to support different aspects of the users’ experience. One aspect that is gaining attention as an emerging support
area is serendipity. However, supporting serendipity within a recommender system is difficult because the experience
is unexpected and, therefore, unpredictable. While researchers agree that algorithms to support serendipity
need to be able to provide a balance of surprise and value to the end user (Niu & Abbas, 2017), an understanding
of how to provide that balance has not yet been realized. Information that could be puzzling or distracting
to someone as they go about their research activities may provide the trigger someone else needs to make
a serendipitous connection in their research. Reports of serendipitous occurrences in research settings have
been identified in research commentaries (Campanario, 1996) and within full-text research articles (Allen,
Erdelez, & Marinov, 2013). This paper investigates the feasibility of automating the identification of information
encounters in full-text research articles. This study contributes to the development of algorithms for supporting
serendipity in information systems. We identified four variables that are useful for predicting information en-
counters in 25-35% of the instances. While we should continue to search for additional predictive variables, these
findings present a novel approach to undertaking the support of serendipity in information systems.
serendipity; information behavior; recommender systems; information encountering; human-centered information retrieval;
research reporting.
In the history of scientific research, ‘serendipitous’ events, or ‘happy accidents’, are known to have played significant roles in
the advancement of innumerable fields. From the development of Post-it® notes (Schwager, 2000) and Super-glue® (Raja,
2016) to the discovery of x-rays (Shapiro, 1986) and the development of Viagra (Lesko, 2017), society has benefitted from
these ‘pleasant surprises’. The process of serendipity connects seemingly unrelated concepts in what can appear to be a highly
subjective manner and relies on unique characteristics of the encountering individual to make these fortunate connections
(Iaquinta et al., 2008; Maccatrozzo, van Everdingen, Aroyo, & Schreiber, 2017). When viewed from an information behavior
perspective, however, it becomes evident that serendipity involves behaviors that are much more purposeful than mere acci-
dents. Makri and Blandford (2012) note that realization of ‘serendipity’ requires insight and contemplation to make connections
between disparate ideas. The unexpected circumstances must occur in the presence of someone who has the intellectual back-
ground and capacity to understand the significance of the event. That individual must possess an understanding of the context
in which the unexpected event occurs and must also be able to apply that understanding to a different problem. In addition to
being unexpected, the information must be deemed interesting by the user (Ge, Delgado-Battenfeld, & Jannach, 2010). The individual
must both perceive and attend to the unanticipated information for it to be useful in making connections between concepts.
These characteristics are defining features of information encountering (Erdelez, 1997). Information encountering is the process
of noticing and making note of unexpected information that, while not relevant to the most immediate information need, ad-
dresses a background need or an unrealized future information need. Information that is encountered rather than sought presents
the user with features that are not only surprising but also are salient in a way that allows the user to make loose associations
(Kefalidou & Sharples, 2016) between the presented information and distant concepts. The idea of 'loose associations' is par-
ticularly relevant to information encountering as additional, purposeful information search is required to build solid connections
between the concepts. These associations are the starting point for making the concrete connections necessary for achieving
'serendipitous' outcomes. Serendipity forms the foundation for linking seemingly unrelated ideas and using that connection to
improve either or both processes. At the heart of serendipity is a unique type of information behavior: information encountering.
By understanding and identifying the ways in which information encountering is described in published research literature,
we may be able to develop better methods of supporting serendipity within our information systems and catalyze the positive
outcomes of serendipity.
Serendipity and Information Recommender systems
With the ubiquity of electronic resources, information systems have evolved from mere retrieval of information records to
recommender systems which seek to supply end users with the most relevant and salient resources from the vast array of
indexed information. Such recommender systems are powered by carefully developed algorithms (Kefalidou & Sharples, 2016;
Wu, He, & Yang, 2012). While most algorithms focus on the accuracy of the retrieved results, there has been increased interest
in the development of algorithms that will go beyond accuracy to support novelty, diversity and serendipity with the retrieved
results. Development of recommender systems that focus on providing 'serendipitous' encounters has been the focus of several
information behavior studies (Erdelez, 2004; Makri, Toms, McCay-Peet, & Blandford, 2011; McCay-Peet & Toms, 2011;
Wopereis & Braam, 2018) and human computer interaction studies (Dahroug et al., 2017; Fazeli et al., 2017; Maccatrozzo et
al., 2017; Niu & Abbas, 2017) in recent years. One challenge to the development of recommender systems that support seren-
dipity is identifying the type of information that end users will find surprising but also useful. Iaquinta et al. (2008) found that
the concept of serendipity in recommendations is highly subjective, and that serendipity as a quality of recommended infor-
mation is difficult to assess. Many developers (Herlocker, Konstan, Terveen, & Riedl, 2004; Maccatrozzo et al., 2017; Niu &
Abbas, 2017) have defined serendipity in returned results as those items which hold both surprise and value for the end user.
Ge et al. (2010) identified key features of serendipitous results as items which have not been previously discovered or expected
by the user and are considered to be interesting, relevant and useful to the user. Other researchers have described serendipity
in recommender systems as a user experience rather than a feature of the information presented (McNee, Riedl, & Konstan,
2006). While it has been widely accepted that recommender systems need to support serendipity, the development of a mech-
anism for identifying which items may be deemed both surprising and valuable remains elusive.
Identifying the Information Encounter
In a qualitative study of full-text research journal articles by Allen et al. (2013), it was found that serendipitous information
encounters during the research process can be identified by examining the context of synonyms for surprising research findings.
This study revealed that synonyms for serendipity were sometimes used to describe the final fortuitous outcome, i.e. the appli-
cation of an information encounter to a new context in a beneficial way. Other times, synonyms for serendipity were used to
describe an information encounter that had not yet been explored or applied to a different situation. These instances described
the recognition of surprising or unexpected data during the research process and were termed ‘mentioned findings’. As recommender
systems for the support of serendipity require identifying information that users will find surprising, these ‘mentioned
findings’ (articles that contain information that the author has already described as surprising) may provide a mechanism for
fulfilling that aspect of the serendipitous experience. This finding bears further investigation.
This paper investigates features of the ways researchers use terms related to serendipity in their research publications as a
mechanism for identifying characteristics within indexed information resources that may be useful for supplying both surprising
and valuable search results to recommender system end-users. By performing a quantitative analysis on the relationship be-
tween reports of actual information encounters and the synonyms for serendipity used by the author and the location of the
report within the research article, we will look for predictive variables that may enable the creation of algorithms to automate
the identification of these reports. Using logistic regression, this quantitative project addresses the research question:
What characteristics of author reporting are useful for predicting references to actual serendipitous events?
At this stage in the research, serendipity will generally be defined as a process or experience beginning with an information
encounter which is followed up on by an individual with related knowledge, skills and understanding that allows them to realize
a valuable outcome.
Serendipity in research and technological development has been analyzed in numerous and disparate fields, including chemistry
(Giesy, Newsted, & Oris, 2013), environmental science (Wilkinson & Weitkamp, 2013), and psychiatry (Siris, 2011), and is a
widespread topic of interest. Because it crosses disciplinary boundaries and approaches problems in novel ways, serendipity
is a phenomenon that has the potential to connect disparate ideas and produce outcomes that improve our quality of life.
McCay-Peet and Toms (2018) have advanced a model of serendipity that consists of 3 major stages (see Figure 1). This model
was synthesized from five different conceptions (Corneli, Jordanous,
Guckelsberger, Pease, & Colton, 2014; Makri & Blandford, 2012; McCay-Peet & Toms, 2015; Rubin, Burkell, & Quan-Haase,
2011; Sun, Sharples, & Makri, 2011) and provides a framework for understanding the experience of serendipity. In this model,
a serendipitous experience begins with a triggering event that is unexpected and is coupled with insight by the subject that
identifies the encounter as valuable. In order to capitalize on this value, the subject must follow up the finding until positive
results are achieved. For this experience to reach the public as serendipitous, the subject must also go through a period of
reflection, where they acknowledge the unexpectedness of the trigger and view the process as fortuitous, describing it to others
as such. It is known that disciplinary culture and the expectations of funding bodies may suppress reports of serendipity,
especially for early-career researchers and those whose observations may not fit with commonly accepted models of
understanding. This model of serendipity holds true for those unexpected circumstances that are subsequently explored and
their value brought to fruition.
Serendipity can be defined as a fortuitous experience that can occur in any realm of personal or professional occupation. Key
aspects of a complete serendipitous experience include information that is acted upon, an element of unexpectedness, and a
beneficial outcome. Serendipity is a process that starts with an observation which, when combined with an element of insight,
brings about a beneficial outcome (McCay-Peet & Toms, 2015). The unexpectedness may arise at any point in the process. The
initial observation may provide unexpected information. An unexpected information encounter may trigger a fortuitous insight.
The process of observing and acting upon information may lead to an unexpected outcome. These unexpected aspects of
information behavior that occur during the serendipitous experience are information encounters.
Information encountering was first described as a model of information behavior by Sanda Erdelez (1997) and has been
analyzed in contexts ranging from news reading (Yadamsuren & Erdelez, 2016) to information literacy (Erdelez, Basic, &
Levitov, 2011). Because serendipity is associated with positive outcomes, there is interest in supporting these types of
information experiences in multimedia interfaces (McCay-Peet & Toms, 2018), with the expectation that increased information
encounters will foster novel and creative leaps of understanding. Some work has been done fostering information encountering
in music listening (Wopereis & Braam, 2018) and in multi-media recommender systems (Khalili, van Andel, van den Besselaar,
& de Graaf, 2017; Kumpulainen & Kautonen, 2017; Rubin et al., 2011).
As a starting point for investigating how biomedical researchers use terms related to serendipity in their research publications,
we looked first to published research accounts as indexed in PubMed Central. To identify the representation of serendipity
within the language of research articles, we began with a content analysis of full-text articles.
Unit of Analysis
The first step of a content analysis involves the identification of the sampling units, i.e., the synonyms related to the idea of
serendipity. Search for these synonyms was undertaken through a process of chaining: identifying the synonyms of serendipity,
and then identifying the synonyms of the synonyms until the possibilities were exhausted. Terms identified through this
chaining process include: accidental, chance, fortuitous, happenstance, incidental, serendipitous, serendipity, surprising,
unanticipated, unexpected, and unforeseen.
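The chaining procedure amounts to a closure over a synonym lookup. The following is a minimal sketch of that idea, not the authors' procedure: the function name `chain_synonyms` and the toy thesaurus are hypothetical, and a real thesaurus would presumably need human judgment to keep the chain from drifting into unrelated terms.

```python
from collections import deque

def chain_synonyms(seed, thesaurus):
    """Breadth-first closure: synonyms of the seed, then synonyms of those, and so on."""
    found = {seed}
    queue = deque([seed])
    while queue:
        term = queue.popleft()
        for synonym in thesaurus.get(term, ()):
            if synonym not in found:      # each term is expanded only once
                found.add(synonym)
                queue.append(synonym)
    return sorted(found)

# Hypothetical toy thesaurus for illustration; not the source the authors consulted.
TOY_THESAURUS = {
    "serendipity": ["chance", "happenstance", "serendipitous"],
    "serendipitous": ["fortuitous", "accidental"],
    "accidental": ["incidental", "unexpected"],
    "unexpected": ["surprising", "unanticipated", "unforeseen"],
}

terms = chain_synonyms("serendipity", TOY_THESAURUS)
```

With this toy lookup, the closure stops once no new terms appear, which mirrors the "until the possibilities were exhausted" stopping rule.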
Study Population
The analysis sample was drawn from full text journal articles indexed in PubMed Central. PubMed Central was selected for its
extensive collection of over two million full-text, primary research reports. Articles indexed in PubMed Central serve as the
population for this study. PubMed Central acts as a free full-text archive of biomedical and life sciences journal literature at
the U.S. National Institutes of Health’s National Library of Medicine, currently archiving over 4.3 million articles. In addition
to full text searching, PubMed Central offers users the ability to search by Medical Subject Headings, or MeSH terms. MeSH
headings are frequently used to narrow literature searches to the user’s key ideas. However, the MeSH category designation
requires the documents to be classified with these terms by an expert examiner and assigned MeSH terms are limited to 10 to
12 terms per document, so minor ideas contained within the articles are not indexed by the MeSH terms. The current PubMed
Central ontology includes the MeSH term “incidental findings,” which includes the identified synonyms for serendipity: “in-
cidental finding(s)”, “finding(s), incidental”, “incidental discovery(ies)”, and “discovery(ies), incidental”. A search of PubMed
Central by usage of this MeSH heading returned only 303 articles. Thus, while PubMed Central has a MeSH term
that is mapped to serendipity, use of the MeSH heading failed to adequately return instances of serendipity, revealing the
necessity of full text search for the related terms.
Population Strata
A full text search was performed for each of the identified terms with results filtered to include only research and review
articles. The population for this study, consisting of all research or review articles containing a synonym for serendipity in the
full-text article, includes 552,288 articles returned with frequencies as indicated in Table 1.
Instances of use of the terms serendipitous and serendipity, which were identified as synonyms in the chaining procedure, were
combined into a single truncated search of serendipit*.
Random Stratified Sampling
In order to most accurately represent the population of articles, a proportional stratified random sampling method was employed.
PASS 11 software was used to determine the sample size needed to estimate the proportion of reported serendipity for the
population of 552,288 (total over all strata) with 95% confidence, and a precision (half-width of the interval) of 0.1. A sample
of 400 articles is necessary to estimate the true proportions within the population. SAS 9.2 Proc SurveySelect was used to
proportionally allocate the sample among the strata for the total sample of 400 articles. Table 1 details the proportion of
the sample drawn from each stratum.
Within each stratum, each article was assigned a number from 1 to the total number of articles. Article numbers were assigned
based on the default order generated by PubMed Central. SAS was used to randomly identify the designated number of articles
for review within each stratum according to the assigned article number.
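The allocation and selection steps can be illustrated with a short sketch. It is not a reproduction of Proc SurveySelect: the largest-remainder rounding, the function names, the seed, and the stratum sizes are all assumptions made for illustration, and the real per-term frequencies are those reported in Table 1.

```python
import random

def proportional_allocation(strata_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size.

    Largest-remainder rounding keeps the allocations summing to total_sample.
    """
    population = sum(strata_sizes.values())
    exact = {name: total_sample * size / population
             for name, size in strata_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    leftover = total_sample - sum(allocation.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda name: exact[name] - allocation[name],
                         reverse=True)
    for name in by_fraction[:leftover]:
        allocation[name] += 1
    return allocation

def draw_article_numbers(strata_sizes, allocation, seed=2018):
    """Randomly pick article numbers (1..stratum size) without replacement."""
    rng = random.Random(seed)
    return {name: sorted(rng.sample(range(1, strata_sizes[name] + 1),
                                    allocation[name]))
            for name in strata_sizes}

# Hypothetical stratum sizes, chosen only to make the arithmetic easy to follow.
HYPOTHETICAL_STRATA = {"unexpected": 5000, "surprising": 3000,
                       "chance": 1500, "serendipit*": 500}

allocation = proportional_allocation(HYPOTHETICAL_STRATA, 400)
selected = draw_article_numbers(HYPOTHETICAL_STRATA, allocation)
```

Under strict proportional allocation, very small strata can receive few or no sampled articles, which is worth checking before drawing the sample.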
Categorization of Term Use
Term use was coded following the coding scheme described by Allen et al. (2013), where four categories
of term use were of particular interest to our investigation of serendipity. These categories include ‘Inspiration’ where unex-
pected information formed the basis for the research design, ‘Research Focus’ where the entire article is devoted to describing
the unexpected information and its evaluation, ‘Systematic Reviews’ which provide overviews of serendipitous contributions
in a field and ‘Mentioned Findings’ where unexpected or surprising information is presented briefly but is not the focus of the
study. The articles coded into the ‘Inspiration’ and ‘Research Focus’ categories typically describe the ways in which the re-
searcher followed up on the information encounter and applied information gleaned from the encounter to a new context or in
a novel way. The ‘Systematic Reviews’ articles reflect on the impact that serendipitous outcomes have had on a discipline. It
is the ‘Mentioned Findings’ category that is intriguing because it presents surprising information, but investigators have not
yet identified salience of the information to a particular problem context. In achieving a serendipitous outcome, the surprising
information needs to be connected to researchers with the sagacity necessary to apply that information in a novel way. For
example, Roentgen’s discovery of x-rays is a commonly noted serendipitous discovery. Most accounts, however, fail to men-
tion that several other scientists experimenting with Crookes Tubes had noted the same glow from phosphoric plates, yet failed
to connect the significance of that distant glow to the diagnostic tool that has revolutionized medical diagnostics (Harris, 1995,
p. 1). When seeking to populate a recommender system with surprising information, it is the ‘Mentioned Findings’ category
that we feel will prove most useful, as these orphaned findings need to have their salience and application revealed. Accuracy
of the coding was evaluated on 10% of the sample, using independent coding by two researchers. Inter-coder reliability was
calculated for a subset of 100 articles, returning a KALPHA = .9001, indicating a high level of agreement between the coders.
All disagreements were resolved by independently recoding units in disagreement or, failing that, by consensus.
Search Details
accidental[Body - All Words] AND "research and review articles"[fil-
chance[Body - All Words] AND "research and review articles"[fil-
fortuitous[Body - All Words] AND "research and review articles"[fil-
happenstance[Body - All Words] AND "research and review arti-
incidental[Body - All Words] AND "research and review articles"[fil-
serendipit*[Body - All Words] AND "research and review articles"[fil-
surprising[Body - All Words] AND "research and review articles"[fil-
unanticipated[Body - All Words] AND "research and review arti-
unexpected[Body - All Words] AND "research and review arti-
unforeseen[Body - All Words] AND "research and review arti-
Table 1: Article frequency within full-text articles indexed by PubMed Central by serendipity term use, and associated sample size.
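For readers unfamiliar with the reliability statistic, the sketch below assumes that the reported KALPHA is Krippendorff's alpha for nominal categories, which the text does not state explicitly. It covers only the simplest case (two coders, no missing codes); the function name and the example codes are hypothetical.

```python
from collections import Counter

def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same units")
    value_counts = Counter()      # n_c: how often each category was assigned
    off_diagonal = 0              # observed coincidences between different categories
    for a, b in zip(coder_a, coder_b):
        value_counts[a] += 1
        value_counts[b] += 1
        if a != b:
            off_diagonal += 2     # the pair counts as (a, b) and as (b, a)
    n = sum(value_counts.values())
    # Sum of n_c * n_k over all pairs of different categories.
    expected = n * n - sum(c * c for c in value_counts.values())
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - (n - 1) * off_diagonal / expected

# Hypothetical codes for ten term uses (1 = relevant to serendipity, 0 = other meaning).
coder_1 = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
coder_2 = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
alpha = krippendorff_alpha_nominal(coder_1, coder_2)
```

Alpha equals 1 for perfect agreement and approaches 0 when agreement is no better than chance, so a value near .90 indicates strong agreement.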
Coding Procedures
PubMed Central’s search tool was used to identify all articles with instances of serendipity-related term use. Full-text versions of
each of those articles were uploaded into NVivo 11 for Windows. Each term was tagged within the articles, then individually
analyzed to determine the meaning conveyed by the term. While the particular synonym for serendipity used to select the article
for analysis was noted, all synonyms appearing within the text were coded.
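The tagging step was carried out in NVivo, but the idea can be sketched with regular expressions. The whole-word matching below is an assumption and may not match how PubMed Central or NVivo treat word variants; the function names and the example sentence are hypothetical.

```python
import re

# The ten search terms, with serendipitous/serendipity combined as a truncated term.
TERMS = ["accidental", "chance", "fortuitous", "happenstance", "incidental",
         "serendipit*", "surprising", "unanticipated", "unexpected", "unforeseen"]

def compile_term(term):
    """Whole-word pattern; a trailing * matches any word ending."""
    if term.endswith("*"):
        body = re.escape(term[:-1]) + r"\w*"
    else:
        body = re.escape(term)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)

PATTERNS = {term: compile_term(term) for term in TERMS}

def tag_terms(text):
    """Return (offset, search term, matched text) for every term use, in text order."""
    hits = []
    for term, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), term, match.group(0)))
    return sorted(hits)

example = ("An unexpected band appeared; this serendipitous finding "
           "was not due to chance.")
hits = tag_terms(example)
```

Each hit still has to be read in context to decide what the term means, which is the manual coding step described above.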
From this examination, 10 categories of term use were identified and 3 of those categories were classified as indicating a
serendipitous event, as described by Allen, et al. (2013). The reporting of serendipitous experiences as ‘Inspiration’ for a re-
search study closely follows current models of the experience of serendipity. The research reported in these instances was
inspired by an information encounter and the current study was subsequently undertaken (followed up) resulting in a positive
outcome (research findings to report). The ‘Research Focus’ category reports a traditionally designed study (not arising as
follow up to previous observations) where the positive result of the study presents in a way that is completely unexpected. This,
likewise, is consistent with models of serendipitous experiences. The third way that terms can be used to indicate a serendipitous
experience is as ‘Mentioned Findings’. In this category, the anomalous observations are reported, but they have not yet
undergone the follow-up process. It is unknown why follow up is yet to occur. Perhaps the researcher does not have the time
or funding to pursue this observation further. Perhaps they do not possess the sagacity to capitalize on the finding. Regardless,
this type of reporting has yet to realize the possible positive outcomes. This category, in particular, may present opportunities
for information systems to connect such findings with investigators with the education and experiences necessary to bring the
process to fruition.
Descriptive statistics
The 400 sampled articles yielded 1,228 unique uses of a synonym for serendipity. Each of those instances was independently
coded for term use regardless of the stratum for which the article was selected. Following coding of the term use, it was
found that 26% of the terms referred to some aspect of serendipity or information encountering and 74% had other semantic
meanings (see Figure 2). Of the terms relating to serendipitous experiences, 5% indicated using another researcher’s
unexpected finding as inspiration for their study. Twenty-three percent (23%) focused on describing or further applying
serendipitously encountered information. Seventy-one percent (71%) of the terms that were relevant to serendipity fell into
the ‘Mentioned Findings’ category. In these instances, authors reported unexpected or surprising observations made in the
course of an investigation, but did not further explore the observation or attempt to apply the observation to another
context. None of the articles sampled included terms that fell into the ‘Systematic Reviews’ category.
When extrapolated to the population of 552,288 articles within PubMed Central containing synonyms for serendipity, over 143,000
articles are estimated to contain mentions of actual serendipitous events, with nearly 102,000 of those indicating information
encounters where the author noted the information as surprising and valuable but had not yet followed up on the application of
that information.
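The extrapolated figures are consistent with applying the sample proportions directly to the population count. The sketch below makes that arithmetic explicit. It assumes that the term-level proportions (26% and 71%) are carried over to articles unchanged, which appears to be how the estimates were obtained but is not stated; the variable names are illustrative.

```python
POPULATION = 552_288            # articles containing a synonym for serendipity
SHARE_SERENDIPITY = 0.26        # share of term uses referring to serendipity
SHARE_MENTIONED = 0.71          # share of those in the 'Mentioned Findings' category

# Assumption: term-level proportions are applied unchanged at the article level.
serendipity_articles = POPULATION * SHARE_SERENDIPITY
mentioned_findings_articles = serendipity_articles * SHARE_MENTIONED

print(f"{serendipity_articles:,.0f} articles with actual serendipitous events")
print(f"{mentioned_findings_articles:,.0f} of those as 'Mentioned Findings'")
```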
Predictive statistics
Predictive statistics are key to identifying features that will be useful in creating algorithms for a recommender system. Because
we have a dichotomous dependent variable (relevant to a serendipitous event or not) and categorical independent variables
(serendipity synonym used and location within the article), a binary logistic regression is the preferred predictive analysis. In
this analysis, both serendipity synonym used and location of the term within the article were also analyzed for their relationship
to actual instances of serendipity. The first step in performing this predictive analysis is to conduct a chi-square analysis to
determine if significant relationships exist to support performance of binary logistic regression. In this study the test for overall
relationships between search term, location within the article and relationship to actual serendipitous events that was performed
was a Fisher’s Exact Test. A Fisher’s Exact Test is an alternative to the chi-square test that is used when one or more cells
have an expected count of less than 5. The Fisher’s Exact Test showed a significant association between location and relevance
to serendipitous events (p < .001, evaluated at the 95% confidence level). The Fisher’s Exact Test on the relationship between
synonym used and relevance to serendipitous events also showed a significant association (p < .001, evaluated at the 95%
confidence level). This test of bivariate relationships was necessary to verify that the data were suitable for performing
the binary logistic regression. Because both tests were significant, we know that both variables can be included in the
binary logistic regression. Based on these results from the bivariate tests, we can then choose the variables to include in the
final model.
Figure 2: Proportion of term use relevant to serendipity or information encountering within the sample.
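To show what the bivariate test computes, the following is a minimal sketch of a two-sided Fisher's Exact Test for a 2 x 2 table. This is a simplification: the tables in this study have more than two rows (one per synonym or article section), which calls for the generalized form of the test available in statistical packages. The function name and the example counts are hypothetical and are not the study data.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def probability(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(p for p in (probability(x) for x in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))

# Hypothetical counts: rows = term located in abstract/discussion vs. elsewhere,
# columns = refers to a serendipitous event vs. other meaning.
p_value = fisher_exact_2x2(3, 1, 1, 3)
```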
When performing the binary logistic regression, we begin by performing a null analysis, or “null model,” where the analysis is
performed without including any predictors. The results, when the independent variables are included, are compared to the
model when they are not included to see if including the variable significantly impacts the predictive value of the model. For
the relationship between synonyms for serendipity and actual serendipitous events, this analysis returned a Wald chi-square
statistic of 232.04, which is significant at p < .05. This indicates that the coefficients for the variables not in the model are
significantly different from zero. This implies that the addition of one or more of these variables to the model will significantly
affect its predictive power. The significant search terms in the single model were ‘Accidental’ and ‘Incidental’. Likewise, a
null analysis was performed for the relationship between location within the document and relationship
to actual serendipitous events. The residual chi-square statistic for this analysis was 37.804, which is significant at p < .05,
indicating that the coefficients for the variables not in the model are significantly different from zero. This implies that the
addition of one or more of these variables to the model will significantly affect its predictive power. The significant locations
in the single model were Abstract and Discussion.
Finally, an overall test of the predicting model was performed. The results of this analysis, which summarizes the relative
importance of the explanatory variables individually, can be seen in Table 3. An Enter method was selected and the overall test
of the model was statistically significant, indicating that the independent variables in our model are a significant improvement
over those seen in the null model. The independent variables of serendipity synonym and location within the article can
accurately predict whether the author is describing an actual serendipitous event 25-35% of the time. Within that model, the
only variables that contribute to this predictive value are the synonyms ‘Accidental’ and ‘Serendipity’ and the locations Ab-
stract and Discussion. Location is directly related to the relationship to actual serendipitous events, with terms located in the
abstract or discussion sections of the paper being 4 times more likely to refer to actual serendipitous events than terms located
in other areas of the article. Interestingly, the relationship between serendipitous events and the synonym used to describe them
was actually inverse. The term ‘Accidental’ was 8.85 times more likely to refer to something other than serendipity; and the
term ‘Serendipity’ was 4.4 times more likely to refer to something else.
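The "times more likely" figures can be read as odds ratios from the logistic regression, although the text does not say so explicitly. Under that assumption, the sketch below shows how a coefficient maps to an odds ratio and how an odds ratio shifts a probability. The 26% baseline is borrowed from the descriptive statistics purely for illustration, and the function names are hypothetical.

```python
import math

def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)

def apply_odds_ratio(baseline_probability, ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    odds = baseline_probability / (1 - baseline_probability)
    new_odds = odds * ratio
    return new_odds / (1 + new_odds)

BASELINE = 0.26   # illustrative baseline only; the model's reference level is not reported

# Location in the abstract or discussion: read here as an odds ratio of 4.
p_location = apply_odds_ratio(BASELINE, 4.0)

# 'Accidental': 8.85 times more likely to mean something else, i.e. a ratio of 1/8.85.
accidental_ratio = 1 / 8.85
accidental_coefficient = math.log(accidental_ratio)   # negative: an inverse relationship
p_accidental = apply_odds_ratio(BASELINE, accidental_ratio)
```

An odds ratio below 1 corresponds to a negative coefficient, which is the sense in which the relationship between these synonyms and actual serendipitous events is inverse.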
The descriptive statistics show that there is a sizeable population of articles that contain references to serendipity. The majority
of those references are in an early stage, where the researcher noticed unusual or unexpected information in the course of their
research and valued that information enough to include it in their research publication. As an information behavior, this trig-
gering episode is termed an information encounter. While information encountering is integral to serendipity, additional factors
impact the realization of positive benefits from the information encounter. The large population of ‘Mentioned Findings’ type
reports of serendipity foreshadows the possibility of using this pool of surprising or unexpected information to support
serendipity within information systems. In order to effectively incorporate these instances into an information system, the
identification and cataloging of these reports must be automated.
The results of the binary logistic regression allowed us to identify four variables that contribute to the effective identification
of information encounters. The location of the report within the research article is very useful for predicting whether the author
is relating an information encounter. As research articles tend to follow a similar format, automating the identification of these
segments of the articles should be fairly straightforward. It was interesting that the terminology used to convey the information
encounters was not an effective positive predictor of information encountering reports. Perhaps this is due to the varied scien-
tific fields indexed by PubMed Central. Analysis of term use as it relates to serendipity and information encountering in a
narrow disciplinary field is necessary for understanding the linguistic indicators of serendipity.
It was most interesting that ‘serendipity’ was negatively correlated with references to an active serendipitous experience. Per-
haps this is due to the reflective nature of serendipity. It is possible that it is not until after the experience is seasoned and some
perspective is gained that the experience is classified as serendipitous. In the midst of experiencing the unexpected, authors
experience the confusion and distraction, but have not yet achieved illumination. Overall, this study contributes to the
information sciences and human-computer interaction literature by presenting a novel way of identifying items that users of
information systems may find both surprising and valuable. Furthermore, analysis of ways in which authors represent these
information encounters in their research journal articles provides predictive evidence for the development of recommender
algorithms.
Challenges and Limitations
While the current study identified two significant variables for predicting reports of serendipity, these variables only account
for 25-35% of the variability in the reports of serendipity. Further research is needed to identify additional characteristics that
can facilitate the identification of these reports. Initial qualitative analysis hints at the possible existence of domain-specific
terminology that could account for greater proportions of the variability in the reports of serendipity. Furthermore, as PubMed
Central indexes multiple academic disciplines within the biomedical spectrum, it is possible that the variables identified in this
study account for a much greater percentage of the variability in some fields but are not significant at all in others. In-depth
qualitative analysis of articles with identified reports of serendipity could prove fruitful.
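This section does not state how the 25-35% range was computed. If it comes from a binary logistic regression, such a range is commonly reported as a pair of pseudo-R² statistics, with the Cox and Snell value as the lower figure and the Nagelkerke value as the upper. The sketch below shows how the two are derived from model log-likelihoods under that assumption; the sample size and log-likelihood values are invented for illustration and are not results from this study.

```python
import math


def cox_snell_r2(ll_null, ll_model, n):
    """Cox and Snell pseudo-R2 from null and fitted log-likelihoods."""
    return 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo-R2: Cox and Snell rescaled by its maximum value."""
    max_cox_snell = 1.0 - math.exp((2.0 / n) * ll_null)
    return cox_snell_r2(ll_null, ll_model, n) / max_cox_snell


# Hypothetical values: 200 coded passages, intercept-only vs. fitted model.
n_obs = 200
ll_null = -130.0
ll_model = -100.0

cs = cox_snell_r2(ll_null, ll_model, n_obs)
nk = nagelkerke_r2(ll_null, ll_model, n_obs)
print(f"Cox and Snell: {cs:.3f}, Nagelkerke: {nk:.3f}")
```

Because the Cox and Snell statistic cannot reach 1, the Nagelkerke value is always the larger of the two, which is why such results tend to be quoted as a range.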
While this quantitative model holds true for the articles indexed in PubMed Central, it may not be applicable to fields not
indexed by PubMed Central. Furthermore, the cross-section of disciplines within PubMed Central may have masked predictive
variables by combining competing disciplinary languages. For example, ‘accidental’ findings that indicate serendipity in a field
like chemistry or genetics may be hidden by the use of ‘accidental’ to discuss traumatic injury in fields like orthopedics, pedi-
atrics, or rehabilitation sciences.
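One way to check whether pooling disciplines masks a term's predictive value is to compute, for each field separately, the share of passages matching the term that were coded as information encounters. The following is a minimal sketch under stated assumptions: the function name, the tuple layout, and the six passages are invented for illustration and are not data from this study.

```python
from collections import defaultdict


def term_precision_by_field(passages, term):
    """Map each field to the share of term-matching passages coded as encounters.

    `passages` is an iterable of (field, text, is_encounter) tuples.
    Fields with no matching passage are omitted from the result.
    """
    counts = defaultdict(lambda: [0, 0])  # field -> [encounters, matches]
    needle = term.lower()
    for field, text, is_encounter in passages:
        if needle in text.lower():
            counts[field][1] += 1
            counts[field][0] += int(is_encounter)
    return {field: hits / total for field, (hits, total) in counts.items()}


# Invented toy passages; the labels are hypothetical coding decisions.
toy_passages = [
    ("chemistry", "An accidental contamination revealed the catalyst.", True),
    ("chemistry", "The accidental discovery of the polymer followed.", True),
    ("orthopedics", "Accidental falls accounted for most fractures.", False),
    ("orthopedics", "Patients with accidental injury were excluded.", False),
    ("genetics", "The deletion was an accidental finding in screening.", True),
    ("genetics", "Sequencing depth was increased.", False),
]

by_field = term_precision_by_field(toy_passages, "accidental")
matches = [p for p in toy_passages if "accidental" in p[1].lower()]
pooled = sum(p[2] for p in matches) / len(matches)
print(by_field, pooled)
```

In this toy set the pooled share sits between the per-field extremes, which is the masking effect described above: a term that is a strong signal in one field and noise in another looks like a weak signal overall.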
Another challenge to populating information systems with information encounters reported in full-text journal articles is the
limited accessibility of the published content. This study was performed using PubMed Central because it provides a large
archive of freely accessible full-text research articles. Accessing articles indexed through subscription services may prove
cost-prohibitive and inhibit access to information encounters in fields that are primarily indexed in subscription-based archives.
Furthermore, PubMed Central already incorporates an option for full-text search within its user interface. Identifying the
articles for initial screening is much more challenging without the option for full-text search. A comparison of article
identification through archives that are equipped for full-text search with the secondary article identification process that would
be necessary to analyze the contents of archives without a full-text search feature could be warranted.
Possibilities and Future Directions
In addition to connecting researchers to unexpected information, information systems that support serendipity may be helpful in
identifying interdisciplinary collaborators. But research funding mechanisms, and thus the landscape of the research arena, are
changing (Aagaard, 2017; Whitley, 2018; Kenney, 2017). Successful research programs are expected to bring in $5-8 for every
dollar invested in their infrastructure. Such returns on investment are impossible for independent researchers bringing
in single-project grants to achieve. To maintain economic viability, research programs are increasingly moving away from the approach of
research as a solitary, isolated pursuit toward an interdisciplinary, center-based model that is underpinned by computational informatics
(Goecks, Nekrutenko, & Taylor, 2010; Myers et al., 2004) and leverages academic and industry resources to find solutions for
complex, ill-defined problems (Bozeman, 2004; Whitley, 2018). Such problems require a team of researchers pooling their efforts around a
collective problem, sharing resources and building synergy, so that they accomplish more together than any of them could individually.
Building economically viable research teams requires the development of collaborations across disciplinary, organizational
and institutional boundaries. Institutions currently approach the development of these teams rather haphazardly. Holding confer-
ences can provide opportunities to develop these serendipitous connections, but success is far from guaranteed. In fact, many of
the most successful research teams note that their partnerships seem to arise serendipitously. Information systems that mine and
report information encounters would connect users not only to new, surprising and valuable information, but also to new research-
ers from diverse fields whose perspectives could enhance and amplify research productivity. By focusing on the information
behaviors involved in serendipity, we can capitalize on the observations of others, connecting those observations and illuminating
connections for the individuals who are best suited to bring about positive outcomes from them.
References
Allen, C., Erdelez, S., & Marinov, M. (2013). Looking for Opportunistic Discovery of Information in Recent Biomedical Research: A
Content Analysis. In Proceedings of the 76th ASIS&T Annual Meeting (Vol. 49). Montreal: Richard B. Hill.
Campanario, J. M. (1996). Using Citation Classics to study the incidence of serendipity in scientific discovery. Scientometrics, 37(1), 3–24.
Corneli, J., Jordanous, A., Guckelsberger, C., Pease, A., & Colton, S. (2014). Modelling serendipity in a computational context.
ArXiv:1411.0440 [Cs]. Retrieved from
Dahroug, A., López-Nores, M., Pazos-Arias, J. J., González-Soutelo, S., Reboreda-Morillo, S. M., & Antoniou, A. (2017). Exploiting
relevant dates to promote serendipity and situational curiosity in cultural heritage experiences. In 2017 12th International Workshop on
Semantic and Social Media Adaptation and Personalization (SMAP) (pp. 84–88).
Erdelez, S. (1997). Information encountering: a conceptual framework for accidental information discovery (pp. 412–421). Presented at the
ISIC ’96: Proceedings of an international conference on Information seeking in context, Taylor Graham Publishing. Retrieved from
Erdelez, S. (2004). Investigation of information encountering in the controlled research environment. Information Processing &
Management, 40(6), 1013–1025.
Erdelez, S., Basic, J., & Levitov, D. D. (2011). Potential for inclusion of information encountering within information literacy models.
Information Research, 16(3).
Fazeli, S., Drachsler, H., Bitter-Rijpkema, M., Brouns, F., Vegt, W. van der, & Sloep, P. B. (2017). User-centric Evaluation of
Recommender Systems in Social Learning Platforms: Accuracy is Just the Tip of the Iceberg. IEEE Transactions on Learning Technologies,
PP(99), 11.
Ge, M., Delgado-Battenfeld, C., & Jannach, D. (2010). Beyond accuracy: evaluating recommender systems by coverage and serendipity (p.
257). ACM Press.
Giesy, J., Newsted, J., & Oris, J. (2013). Photo‐enhanced toxicity: serendipity of a prepared mind and flexible program management.
Environmental Toxicology and Chemistry, 32(5), 969–971.
Goecks, J., Nekrutenko, A., & Taylor, J. (2010). Galaxy: a comprehensive approach for supporting accessible, reproducible, and
transparent computational research in the life sciences. Genome Biology, 11, R86.
Harris, E. L. (1995). The shadowmakers: A history of radiologic technology (1st edition). Albuquerque, N.M: American Society of
Radiologic Technologists.
Herlocker, J. L., Konstan, J. A., Terveen, L. G., & Riedl, J. T. (2004). Evaluating collaborative filtering recommender systems. ACM
Transactions on Information Systems, 22(1), 5–53.
Iaquinta, L., Gemmis, M. d, Lops, P., Semeraro, G., Filannino, M., & Molino, P. (2008). Introducing Serendipity in a Content-Based
Recommender System. In 2008 Eighth International Conference on Hybrid Intelligent Systems (pp. 168–173).
Kefalidou, G., & Sharples, S. (2016). Encouraging serendipity in research: Designing technologies to support connection-making.
International Journal of Human-Computer Studies, 89, 1–23.
Khalili, A., van Andel, P., van den Besselaar, P., & de Graaf, K. A. (2017). Fostering Serendipitous Knowledge Discovery Using an
Adaptive Multigraph-based Faceted Browser. In Proceedings of the Knowledge Capture Conference (pp. 15:1–15:4). New York, NY, USA:
Kumpulainen, S. W., & Kautonen, H. (2017). Accidentally Successful Searching: Users’ Perceptions of a Digital Library (pp. 257–260).
ACM Press.
Lesko, L. (2017). Efficacy from Strange Sources. Clinical Pharmacology & Therapeutics, 103(2), 253–261.
Maccatrozzo, V., van Everdingen, E., Aroyo, L., & Schreiber, G. (2017). Everybody, More or Less, likes Serendipity (pp. 29–34). ACM
Makri, S., & Blandford, A. (2012). Coming across information serendipitously Part 1: A process model. Journal of Documentation, 68(5),
Makri, S., Toms, E. G., McCay-Peet, L., & Blandford, A. (2011). Encouraging Serendipity in Interactive Systems. In Human-Computer
Interaction – INTERACT 2011 (pp. 728–729). Springer, Berlin, Heidelberg.
McCay-Peet, L., & Toms, E. (2011). Measuring the dimensions of serendipity in digital environments. Usos y Gratificaciones: La
Medición de Las Dimensiones de La Serendipia En Entornos Digitales., 16(3), 66.
McCay-Peet, L., & Toms, E. (2015). Investigating serendipity: How it unfolds and what may influence it. Journal of the Association for
Information Science and Technology, 66(7), 1463–1476.
McCay-Peet, L., & Toms, E. G. (2018). Researching serendipity in digital information environments. San Rafael, California: Morgan &
Claypool Publishers.
McNee, S. M., Riedl, J., & Konstan, J. A. (2006). Being Accurate is Not Enough: How Accuracy Metrics Have Hurt Recommender
Systems. In CHI ’06 Extended Abstracts on Human Factors in Computing Systems (pp. 1097–1101). New York, NY, USA: ACM.
Myers, J. D., Allison, T. C., Bittner, S., Didier, B., Frenklach, M., Green, W. H., … Yang, C. (2004). A collaborative informatics
infrastructure for multi-scale science. In Proceedings of the Second International Workshop on Challenges of Large Applications in
Distributed Environments, 2004. CLADE 2004. (pp. 24–33).
Niu, X., & Abbas, F. (2017). A Framework for Computational Serendipity. In Adjunct Publication of the 25th Conference on User
Modeling, Adaptation and Personalization (pp. 360–363). New York, NY, USA: ACM.
Raja, P. R. (2016). Cyanoacrylate Adhesives: A Critical Review. Reviews of Adhesion and Adhesives, 4(4), 398–416.
Rubin, V. L., Burkell, J., & Quan-Haase, A. (2011). Facets of serendipity in everyday chance encounters: a grounded theory approach to
blog analysis. Information Research, 16(3), 2727.
Schwager, E. (2000). Little-known facts about little-known people. Drug News and Perspectives, 13(2), 126–128.
Shapiro, G. (1986). A skeleton in the darkroom: stories of serendipity in science (1st ed). San Francisco: Harper & Row.
Siris, S. G. (2011). Searching for Serendipity. The Journal of Clinical Psychiatry, 72(8), 1156–1157.
Sun, X., Sharples, S., & Makri, S. (2011). A user-centred mobile diary study approach to understanding serendipity in information
research. Information Research-An International Electronic Journal, 16(3). Retrieved from
Wilkinson, C., & Weitkamp, E. (2013). A Case Study in Serendipity: Environmental Researchers Use of Traditional and Social Media for
Dissemination. PLOS ONE, 8(12), e84339.
Wopereis, I., & Braam, M. (2018). Seeking Serendipity: The Art of Finding the Unsought in Professional Music. In S. Kurbanoğlu, J.
Boustany, S. Špiranec, E. Grassian, D. Mizrachi, & L. Roy (Eds.), Information Literacy in the Workplace (pp. 503–512). Springer
International Publishing.
Wu, W., He, L., & Yang, J. (2012). Evaluating recommender systems. In Seventh International Conference on Digital Information
Management (ICDIM 2012) (pp. 56–61).
Yadamsuren, B., & Erdelez, S. (2016). Incidental Exposure to Online News. Synthesis Lectures on Information Concepts, Retrieval, and
Services, 8(5), i–73.
81st Annual Meeting of the Association for Information Science & Technology | Vancouver, Canada | 10–14 November 2018
Author(s) Retain Copyright