
Victoria Rubin
The University of Western Ontario | UWO · Faculty of Information and Media Studies
MA Linguistics / PhD Information Science & Technology
About
81
Publications
393,911
Reads
2,622
Citations
Introduction
I specialize in information retrieval and natural language processing techniques that enable analyses of texts to identify, extract, and organize structured knowledge. I study complex human information behaviors that are, at least partly, expressed through language, such as deception, uncertainty, credibility, and emotions. Multilingual information access and information organization are my other research interests. More at http://victoriarubin.fims.uwo.ca/
Additional affiliations
Education
September 2001 - May 2006
Publications (81)
This research examines the concept of 'fake news' in the context of information literacy (IL) in a post-secondary educational setting. Educators' perceptions shape both IL curricula and classroom discussions with students. We conducted 18 interviews with members of 3 integral groups implementing IL education (8 professors, 6 librarians, 4 departmen...
Native ads are ubiquitous in the North American digital news context. Their form, content and presentational style are practically indistinguishable from regular news editorials, and thus are often mistaken for informative content by newsreaders. This advertising practice is deceptive, in that it exploits loopholes in human digital literacy. Despit...
The LiT.RL News Verification Browser is a research tool for news readers, journalists, editors or information professionals. The tool analyzes the language used in digital news web pages to determine if they are clickbait, satirical news, or falsified news, and visualizes the results by highlighting content in color-coded categories. Although the c...
Abstract: Automatic clickbait detection is a relatively novel task in natural language processing (NLP) and machine learning (ML). "Clickbait" is a hyperlink created primarily to attract attention to its target content. This article introduces a binary classifier, the Language and Information Technology Research Lab (LiT.RL, pronounced "literal") C...
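As a rough, hypothetical illustration of the kind of binary clickbait classifier the abstract describes, here is a minimal sketch in Python using TF-IDF features and logistic regression on a few made-up headlines. The LiT.RL classifier's actual features, training data, and learning algorithm are not specified in this excerpt, so everything below is an assumption for illustration only.

```python
# Minimal sketch of a binary clickbait classifier over headlines.
# The feature set and learner are illustrative; they are not the ones
# used by the LiT.RL classifier described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, made-up data: 1 = clickbait, 0 = conventional headline.
headlines = [
    "You won't believe what happened next",
    "10 tricks doctors don't want you to know",
    "Parliament passes federal budget after lengthy debate",
    "Researchers report new results on protein folding",
]
labels = [1, 1, 0, 0]

# Word unigrams and bigrams capture phrasing cues such as "won't believe".
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
clf.fit(headlines, labels)

print(clf.predict(["This one weird trick will change your life"]))
```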
Only the Introductory Chapter is freely available.
The full book/e-book/individual chapters/e-chapters are available via Springer Publishers https://link.springer.com/book/10.1007/978-3-030-95656-1 or via any other major book retailer, including Amazon or Barnes & Noble (in the US or Canada) or Lehmanns (in Europe).
Introductory Chapter
Abstract
How do...
The complexity in finding solutions to the socio-technological problem of mis- and disinformation lies in our human nature. The mind requires practical skills and digital literacy in order to overcome this problem. Socio-political and economic systems incentivize the spread of the infodemic across toxic digital media environments and require the pu...
Chapter 7 focuses on artificially intelligent (AI) systems that can help the human eye identify fakes of several kinds and call them out for the benefit of the public good. I explain, in plain language, the principles behind the AI-based methodologies employed by automated deception detectors, clickbait detectors, satirical fake detectors, rumor de...
Chapter 2 focuses on deception as a communicative behavior and establishes, in broad strokes, what deceptive strategies can be used by deceivers and mass disinformers, and what motivates deceptive communication. We consider definitions of deception and typological distinctions, assuming some deceptive strategies have found their way into the digita...
Chapter 1 frames the problem of deceptive, inaccurate, and misleading information in the digital media content and information technologies as an infodemic. Mis- and disinformation proliferate online, yet the solution remains elusive and many of us run the risk of being woefully misinformed in many aspects of our lives including health, finances, a...
Many practices in marketing, advertising, and public relations, presented in Chapter 6, have the intent to persuade and manipulate public opinion from the outset of their endeavors. I lay out marketing communications strategies and dissect the anatomy of the ad revenue model. I review key ideas in advertising standards and self-regulation polici...
Chapter 4 establishes that truth can be seen from different philosophical perspectives, and our methods for connecting beliefs to reality and establishing facts matter for determining the resulting cumulative knowledge. We may not all agree on what truth is, but there is little doubt that truth matters. It is essential to us—as individuals and as a...
Chapter 5 focuses on empirical knowledge as it is applied in three sample professions using stepwise procedures to establish facts, detect lies, or discern truth. Law enforcement, scientific inquiry, and investigative reporting each use well-established traditions for truth-seeking, systematic ways of collecting strong supportive evidence, and cond...
Chapter 3 surveys credibility and trust research in information science, human–computer interaction, psychology, communication, and other social sciences. Several models explain the process of credibility assessment, dissecting it into stages and offering components for online content evaluation. Multiple predictive indicators have been considered...
Artificially Intelligent (AI) systems are pervasive, but poorly understood by their users and, at times, developers. It is often unclear how and why certain algorithms make choices, predictions, or conclusions. What does AI transparency mean? What explanations do AI system users desire? This panel discusses AI opaqueness with examples in applied co...
Artificially Intelligent (AI) systems are pervasive, but poorly understood by their users and, at times, developers. It is often unclear how and why certain algorithms make choices, predictions, or conclusions. What does AI transparency mean? What explanations do AI system users desire? This panel discusses AI opaqueness with examples in applied co...
This chapter describes a study that interviewed 18 participants (8 professors, 6 librarians, and 4 department chairs) about their perceptions of ‘fake news' in the context of their educational roles in information literacy (IL) within a large Canadian university. Qualitative analysis of the interviews reveals a substantial overlap in these educator...
Purpose
The purpose of this paper is to treat disinformation and misinformation (intentionally deceptive and unintentionally inaccurate misleading information, respectively) as a socio-cultural technology-enabled epidemic in digital news, propagated via social media.
Design/methodology/approach
The proposed disinformation and misinformation triang...
With the problem of ‘fake news’ in the digital media, there are efforts at creation of awareness, automation of ‘fake news’ detection and news literacy. This research is descriptive as it pulls evidence from the content of online fabricated news for the features that distinguish fabrications from the legitimate political news around the time of the...
This paper offers a conceptual basis and describes elements for a multi-layered system to provide information users (newsreaders) with credible information and improve the work processes of the online news (content) producers. I overview criteria of excellence (what editors consider newsworthy) and how reporters (and traditional newsroom profession...
Native advertising, paid for by corporate funding, may fool news readers into thinking that they are reading investigative journalism editorials. Such misleading practice constitutes an internal threat to the profession of journalism and may further deteriorate mainstream media trust. If information users are unaware of the Native Ads original prom...
Clickbait is a class of internet content characterized by attention-grabbing headlines, but is criticized for being shallow, misleading, or deceptive. Information sciences can offer a range of solutions to clickbaiting, but the field lacks a concrete, unifying definition of the phenomenon. This poster addresses this need by investigating perceptions...
This research examines the concept of ‘fake news’ in the context of information literacy (IL) in a post‐secondary educational setting. Educators' perceptions shape both IL curricula and classroom discussions with students. We conducted 18 interviews with members of 3 integral groups implementing IL education (8 professors, 6 librarians, 4 departmen...
The News Verification Suite aims to provide users with a set of functions to verify information in the news. This paper offers a conceptual basis and a vision of system elements towards automated fact-checking in news production, curation, and consumption. The traditional model of journalism is compared to 'news sharing a.s.a.p.', highlighting simi...
Native advertising, paid for by corporate funding, may fool news readers into thinking that they are reading investigative journalism editorials. Such misleading practice constitutes an internal threat to the profession of journalism and may further deteriorate mainstream media trust. If information users are unaware of the Native Ads original prom...
I conclude that social media requires content verification analysis with a combination of previously known approaches for deception detection, as well as novel techniques for debunking rumors, credibility assessment, factivity analysis and opinion mining. Hybrid approaches may include text analytics with machine learning for deception detection, ne...
An op-ed commissioned by Tom Zeller Jr, a former New York Times editor, now the Editor-in-Chief of the Undark Magazine, out of MIT. The article was published on 23 November 2016: http://undark.org/article/education-and-automation-tools-for-navigating-a-sea-of-fake-news/. Introduced as: “Every man should have a built-in automatic crap detector opera...
The main premise of this chapter is that the time is ripe for more extensive research and development of social media tools that filter out intentionally deceptive information such as deceptive memes, rumors and hoaxes, fake news or other fake posts, tweets and fraudulent profiles. Social media users’ awareness of intentional manipulation of online...
Satire is an attractive subject in deception detection research: it is a type of deception that intentionally incorporates cues revealing its own deceptiveness. Whereas other types of fabrications aim to instill a false sense of truth in the reader, a successful satirical hoax must eventually be exposed as a jest. This paper provides a conceptual o...
Tabloid journalism is often criticized for its propensity for exaggeration, sensationalization, scare-mongering, and otherwise producing misleading and low quality news. As the news has moved online, a new form of tabloidization has emerged: ‘clickbaiting.’ ‘Clickbait’ refers to “content whose main purpose is to attract attention and encourage visi...
A fake news detection system aims to assist users in detecting and filtering out varieties of potentially deceptive news. The prediction of the chances that a particular news item is intentionally deceptive is based on the analysis of previously seen truthful and deceptive news. A scarcity of deceptive news, available as corpora for predictive mode...
This research surveys the current state-of-the-art technologies that are instrumental in the adoption and development of fake news detection. "Fake news detection" is defined as the task of categorizing news along a continuum of veracity, with an associated measure of certainty. Veracity is compromised by the occurrence of intentional deceptions....
Widespread adoption of internet technologies has changed the way that news is created and consumed. The current online news environment is one that incentivizes speed and spectacle in reporting, at the cost of fact-checking and verification. The line between user generated content and traditional news has also become increasingly blurred. This post...
Purpose
– The purpose of this paper is to respond to Urquhart and Urquhart’s critique of the previous work entitled “Discourse structure differences in lay and professional health communication”, published in this journal in 2012 (Vol. 68 No. 6, pp. 826-851, doi: 10.1108/00220411211277064).
Design/methodology/approach
– The authors examine Urquhar...
News verification is a process of determining whether a particular news report is truthful or deceptive. Deliberately deceptive (fabricated) news creates false conclusions in the readers' minds. Truthful (authentic) news matches the writer's knowledge. How do you tell the difference between the two in an automated way? To investigate this question,...
This paper furthers the development of methods to distinguish truth from deception in textual data. We use rhetorical structure theory (RST) as the analytic framework to identify systematic differences between deceptive and truthful stories in terms of their coherence and structure. A sample of 36 elicited personal stories, self-ranked as truthful...
In hopes of sparking a discussion, I argue for much needed research on automated deception detection in Asian languages. The task of discerning truthful texts from deceptive ones is challenging, but a logical sequel to opinion mining. I suggest that applied computational linguists pursue broader interdisciplinary research on cultural differences an...
Proceedings of the CAIS/ACSI 2012 conference.
This paper argues that big data can possess different characteristics, which affect its quality. Depending on its origin, data processing technologies, and methodologies used for data collection and scientific discoveries, big data can have biases, ambiguities, and inaccuracies which need to be identified and accounted for to reduce inference error...
This paper argues that big data can possess different characteristics, which affect its quality. Depending on its origin, data processing technologies, and methodologies used for data collection and scientific discoveries, big data can have biases, ambiguities, and inaccuracies which need to be identified and accounted for to reduce inference error...
We investigate health care provider and lay consumer perspectives in online health communication, information sharing, and use to improve communication that supports healthy everyday life behavior. With Rhetorical Structure Theory analysis, we differentiate discourse structure patterns and communicative goals in provider and consumer answers regard...
The geographic clues embedded in MARC records have the potential to transform the ways in which library materials are searched and accessed. Focussing on cartographic materials, this study examines the MARC fields used to catalogue maps within two Canadian university library systems.
This research presents the results of a case study on potential users of Cross Language Information Retrieval (CLIR) systems –international students at the University of Western Ontario. The study is designed to test their awareness of Multi-Lingual Information Access (MLIA) tools on the internet and in select electronic databases. The study also i...
Though not new to online gamers, griefing – an act of play intended to cause grief to game players – is understudied in LIS scholarship. We expand on the definition of griefing for library contexts by considering its deceptive elements and examining gamers’ attitudes in a gaming forum and an e-mail survey.
This paper analyzes naturally occurring descriptions of chance encounters as found in blogs. We develop a model of serendipity that describes facets of the phenomenon and their interconnections, and examine the applicability of this model to accounts of everyday chance encounters.
Being innovative is a popular but ambiguous maxim in LIS. To elucidate how institutions use, and what they mean by the concept, we examine white literature and survey website features of 160 libraries across US and Canada. We identify patterns in the language and ethos of modern innovative librarianship.
This study analyses 545 sample fanfiction stories (fics) in their stylistic feature variation by popularity and across eleven 'fandoms' in creative writing forums. Lexical richness, average sentence and paragraph lengths are isolated as promising measures for a text classifier to use in predicting a fic's likely popularity in its fandom.
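The three measures named above (lexical richness, average sentence length, average paragraph length) can be approximated with very little code. The sketch below uses a crude regex tokenization and a type-token ratio as a stand-in for lexical richness; the study's exact operationalizations are not given in this excerpt, so treat this as an assumed illustration.

```python
# Rough stylometric features for a fanfiction text: lexical richness
# (type-token ratio), average sentence length, and average paragraph length.
# The tokenization and definitions are simplified assumptions.
import re

def stylometric_features(text: str) -> dict:
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "lexical_richness": len(set(words)) / len(words) if words else 0.0,
        "avg_sentence_len_words": len(words) / len(sentences) if sentences else 0.0,
        "avg_paragraph_len_words": len(words) / len(paragraphs) if paragraphs else 0.0,
    }

sample = "It was a quiet night.\n\nShe read the forum again. Nothing had changed."
print(stylometric_features(sample))
```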
Purpose – Though not new to online gamers, griefing – an act of play intended to cause grief to game players – is fairly understudied in LIS scholarship. The purpose of this paper is to expand the inventory of griefing varieties, consider their deceptive elements and examine attitudes towards the phenomenon.
Design/methodology/approach – The author...
Information Manipulation is an umbrella term we use for a variety of distortions that occur in the process of transmitting information in the information channel (between human agents via artifacts and various presentation formats). Extending the classical Shannon-Weaver model of information transmission, we consider alternative outcomes of the t...
An earlier version of this paper was presented at the 2011 Canadian Association for Information Science conference.
The authors wish to thank the Everydayhelth.com forum participants whose publicly available questions and answers illuminate new perspectives on lay and professional health communication. The authors are also grateful for suggestions...
This paper reviews advances in geospatial information systems and applications involving geospatial information and natural language. We discuss the role of geographically aware information access in human information behaviours such as information seeking, retrieval, and use, and highlight the role of automation in enriching current geospatial met...
Recent improvements in the effectiveness and accuracy of the emerging field of automated deception detection and the associated potential of language technologies have triggered increased interest from the mass media and the general public. Computational tools capable of alerting users to potentially deceptive content in computer-mediated messages are invaluable...
Some researchers have suggested that opportunities for serendipitous discovery of information may be limited in the online environment as a result of technological facilitation of information behavior. In response, they suggest building tools that enhance opportunities for serendipity. Based on our model of everyday serendipity, we offer design sug...
This research presents the results of a case study on potential users of Cross Language Information Retrieval (CLIR) systems --- international students at a Canadian University. The study is designed to test their awareness of Multi-Lingual Information Access (MLIA) tools on the internet and in select electronic databases. The study investigates ho...
One of the novel research directions in Natural Language Processing and Machine Learning involves creating and developing methods for automatic discernment of deceptive messages from truthful ones. Mistaking intentionally deceptive pieces of information for authentic ones (true to the writer's beliefs) can create negative consequences, since our ev...
This paper extends information quality (IQ) assessment methodology by arguing that veracity/deception should be one of the components of intrinsic IQ dimensions. Since veracity/deception differs contextually from accuracy and other well-studied components of intrinsic IQ, the inclusion of veracity/deception in the set of IQ dimensions has its own c...
The Information Manipulation Classification Theory offers a systematic approach to understanding the differences and similarities among various types of information manipulation (such as falsification, exaggeration, concealment, misinformation or hoax). We distinguish twelve salient factors that manipulation varieties differ by (such as intentional...
Though innovation is a popular theme of LIS literature, its specific meaning for libraries remains obscure. Clarifying the implicit definition of innovation in librarianship can facilitate a more meaningful use of the term. To do so, we employ a ground-up exploration of innovation through the white literature in conjunction with a detailed survey o...
Introduction. This paper explores serendipity in the context of everyday life by analyzing naturally occurring accounts of chance encounters in blogs. Method. We constructed forty-four queries related to accidental encounters to retrieve accounts from GoogleBlog. From among the returned results, we selected fifty-six accounts that provided a rich d...
Though innovation is a popular theme of LIS literature, its specific meaning for libraries remains obscure. Clarifying the implicit definition of innovation in librarianship can facilitate a more meaningful use of the term. To do so, we employ a ground-up exploration of innovation through the white literature in conjunction with a detailed survey...
Deception detection remains novel, challenging, and important in natural language processing, machine learning, and the broader LIS community. Computational tools capable of alerting users to potentially deceptive content in computer-mediated messages are invaluable for supporting undisrupted, computer-mediated communication, information seeking, c...
In this panel we will discuss the importance of knowledge organization and information organization in library and information science curricula and the emerging trends both inside and outside of library and information science which will affect the curriculum in coming years.
In this panel we will discuss … TBD.
Purpose
– Conversational agents are natural language interaction interfaces designed to simulate conversation with a real person. This paper seeks to investigate current development and applications of these systems worldwide, while focusing on their availability in Canadian libraries. It aims to argue that it is both timely and conceivable for Can...
Serendipity has received much attention from library and information science, psychology, and computer science. Yet not much is known about serendipity in the context of everyday information behavior. In general, a key challenge in the study of serendipity is obtaining accounts of serendipitous experiences that provide insight into the phenomenon....
We present a comparative study of abstracts and machine-generated summaries. This study bridges two hitherto independent lines of research: the descriptive analyses of abstracts as a genre and the testing of summaries produced by automatic text summarization (ATS). A pilot sample of eight articles was gathered from Library and Information Science A...
Deception in computer-mediated communication is defined as a message knowingly and intentionally transmitted by a sender to foster a false belief or conclusion by the perceiver. Stated beliefs about deception and deceptive messages or incidents are content analyzed in a sample of 324 computer-mediated communications. Relevant stated beliefs are obt...
This article introduces a type of uncertainty that resides in textual information and requires epistemic interpretation on the information seeker’s part. Epistemic modality, as defined in linguistics and natural language processing, is a writer’s estimation of the validity of propositional content in texts. It is an evaluation of chances that a cer...
This paper defines a concept of “trust incident accounts” as verbal reports of empirical episodes in which a trustor has reached a state of positive or negative expectations of a trustee’s behavior under associated risks. Such expectations are equated to trust and distrust, correspondingly, and present a sharp contrast with hypocritical use of trus...
Texts exhibit subtle yet identifiable modality about writers' estimation of how true each statement is (e.g., definitely true or somewhat true). This study is an analysis of such explicit certainty and doubt markers in epistemically modalized statements for a written news discourse. The study systematically accounts for five levels of writer's...
This study empirically derives a framework for analyzing certainty about written propositions. CERTAINTY, or EPISTEMIC MODALITY, is a linguistic expression of an estimation of the likelihood that a particular state of affairs is, has been, or will be true. The study describes how explicitly marked certainty can be predictably and dependably identi...
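As a toy illustration of how explicitly marked certainty might be spotted in text, the sketch below matches tokens against a small hand-made lexicon of certainty markers mapped to coarse levels. The study's five-level scheme and its marker inventory are far richer; the lexicon and levels here are assumptions for illustration only.

```python
# Toy certainty-marker spotter: a tiny, assumed lexicon of explicit markers
# mapped to coarse certainty levels (not the study's five-level scheme).
CERTAINTY_LEXICON = {
    "undoubtedly": "high",
    "certainly": "high",
    "clearly": "high",
    "probably": "moderate",
    "likely": "moderate",
    "perhaps": "low",
    "possibly": "low",
    "reportedly": "low",
}

def find_certainty_markers(sentence: str):
    tokens = sentence.lower().replace(",", " ").split()
    return [(tok, CERTAINTY_LEXICON[tok]) for tok in tokens if tok in CERTAINTY_LEXICON]

print(find_certainty_markers("The talks will probably resume, officials said."))
```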
This chapter presents a theoretical framework and preliminary results for manual categorization of explicit certainty information in 32 English newspaper articles. Our contribution is in a proposed categorization model and analytical framework for certainty identification. Certainty is presented as a type of subjective information available in text...
Credibility is a perceived quality and is evaluated with at least two major components: trustworthiness and expertise. Weblogs (or blogs) are a potentially fruitful genre for exploration of credibility assessment due to public disclosure of information that might reveal trustworthiness and expertise by webloggers (or bloggers) and availability of a...
The huge increase in volume of online literature has led to a parallel surge in research into methods for retrieving meaningful information from this textual data—"content extraction" has emerged as a prominent field in natural language computing. However, little progress has as yet been made in determining the pragmatic content of a document, 'hi...
We present an empirically verified model of discernable emotions, Watson and Tellegen’s Circumplex Theory of Affect from social and personality psychology, and suggest its usefulness in NLP as a potential model for an automation of an eight-fold categorization of emotions in written English texts. We developed a data collection tool based on the mo...
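One way to picture an eight-fold categorization over a two-dimensional affect circumplex is to place a text on two axes and read off the octant its angle falls into. The sketch below does exactly that with placeholder axis scores and octant labels; the actual categories, axes, and scoring in Watson and Tellegen's model and in the paper's data collection tool are not reproduced here.

```python
# Map a (pleasantness, activation) score pair to one of eight octants of a
# two-dimensional affect circumplex. Axis names and octant labels are
# placeholders, not the categories used in the paper.
import math

OCTANTS = ["O1", "O2", "O3", "O4", "O5", "O6", "O7", "O8"]  # placeholder labels

def octant(pleasantness: float, activation: float) -> str:
    angle = math.degrees(math.atan2(activation, pleasantness)) % 360
    return OCTANTS[int(angle // 45)]

print(octant(0.7, 0.2))   # mildly activated, pleasant text
print(octant(-0.4, 0.9))  # highly activated, unpleasant text
```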
We present a theoretical framework and preliminary results for manual categorization of explicit certainty information in 32 English newspaper articles. The explicit certainty markers were identified and categorized according to the four hypothesized dimensions – perspective, focus, timeline, and level of certainty. One hundred twenty one sentences...
The authors describe the difficulties of translating classifications from a source language and culture to another language and culture. To demonstrate these problems, kinship terms and concepts from native speakers of fourteen languages were collected and analyzed to find differences between their terms and structures and those used in English. Us...
Questions (30)
What's your go-to book on misinformation, disinformation, or fake news? If you have one, please share. Also, why do you like it? How recent is it? Is there anything amiss in that favourite book of yours?
Do you care to read about broader or more specific issues in how information pollution, toxicity, or manipulation in online news and/or social media relates to your professional or everyday life? I'm very curious about your perspective, if you'd like to share. Thanks much!
I'm curious to hear from practitioners in the newsroom. 1. Are there any mundane tasks that you are tired of doing and wish you could off-load to some "magic wand" technology? 2. What would you imagine it doing for you? If you had a "crystal ball", what would you wish it were able to tell you to do your job better? (No, not the future! Rather, something from the realm of what's known as of now, or from the past, no matter how recent.) Thanks for your insights!
I'm looking for cases of justified deception. If you see anything reported in the news, would you kindly provide a link? If you have a story to tell or an opinion to share, I'd also be very curious.
Is it always (morally) wrong to lie?
An example may include an episode of This American Life, "In Defense of Ignorance", in which "Lulu Wang tells the story of an elaborate attempt to keep someone ignorant — her grandmother — and how her family pulled it off". The grandmother (who lived in China) was not told of her terminal illness. One important fact was concealed from her: she had cancer and her doctor predicted she had only 6 months to live. This morally debatable act of withholding the diagnosis (at least in the North American context of the 21st century) is apparently customary in China and some other parts of the world (Russia, for instance). Apparently it was common in patient care in Canada in the 1950s as well. The lie is presented as justifiable since the Chinese grandmother lived another 3 years after the diagnosis, but who knows how the knowledge of her terminal illness would have affected her, had she been told the prognosis. This is just one example.
Another example is found in the Guardian, in an article by James Garvey, an American philosopher based in the UK: "Peter Gleick lied, but was it justified by the wider good?" (Feb. 27, 2012).
I'm aware of the philosophical debate on whether lies are justifiable (e.g., the murderer-at-the-doorstep question: would you lie about your family members sleeping in the house?). But what I'm looking for are recent examples, documented in the press, in which lying may be acceptable for a reason. I would much appreciate the help of the ResearchGate community in tracking them down. Thank you very much!
VR.
Projects
Project (1)
How do you tell when a text is deceptive?
How do you tell when news is fake?