Chapter

Deception Detection and Rumor Debunking for Social Media


Abstract

The main premise of this chapter is that the time is ripe for more extensive research and development of social media tools that filter out intentionally deceptive information such as deceptive memes, rumors and hoaxes, fake news or other fake posts, tweets and fraudulent profiles. Social media users’ awareness of intentional manipulation of online content appears to be relatively low, while the reliance on unverified information (often obtained from strangers) is at an all-time high. I argue that there is a need for content verification, systematic fact-checking and filtering of social media streams. This literature survey provides a background for understanding current automated deception detection research, rumor debunking, and broader content verification methodologies, suggests a path towards hybrid technologies, and explains why the development and adoption of such tools might still be a significant challenge. [The book is forthcoming in 2016: https://us.sagepub.com/en-us/nam/the-sage-handbook-of-social-media-research-methods/book245370]


... Organizations that proactively analyze the value of their data have an opportunity to predict future outcomes and contribute to the firm's ability to maximize profits (Bedi et al., 2014). A deeper understanding of the strategies big data analysts need to improve classifiers for the quality of personal data predictors could help boost overall classification accuracy of machine learning algorithms (Rubin, 2016; Salvetti, 2012). ...
... The problem addressed in this study was that the strategies big data analysts need to improve classifiers for the quality of personal data predictors had not been fully established (Rubin, 2016). Big data is fast becoming a source for predictive and prescriptive analytics to make strategic decisions to improve a firm's profitability, increase customer satisfaction, and reveal insights into new market opportunities (K & Shivakumar, 2014). ...
... Online deception detection has evolved over the past decade, and classifiers leveraging Natural Language Processing (NLP) can detect deceptive behavior at a greater level of accuracy than humans (Fuller, Biros, & Delen, 2011; Ott, Choi, Cardie, & Hancock, 2011; Rubin, 2016; Salvetti, 2012). Additional variables such as personality, gender, and age were explored to determine their contributions to enhancing the strength of existing classifiers. ...
Thesis
The use of big data analytics to uncover hidden knowledge, make better decisions, and support strategic planning is expanding as digital technology becomes more ubiquitous. Machine learning algorithms leverage digital data that is expected to reach zettabytes in volume over the next few years. However, poor data quality can affect the performance of even the most proficient of machine learners. Deceptive content and online misinformation can have a significant effect on the veracity of big data. The focus of this qualitative phenomenological research study was to assess the use of machine learning algorithms to evaluate the quality of big data so that organizations have a higher level of confidence in making strategic decisions. Researchers with expertise in automated deception detection were interviewed to gain insights on the foundational elements of their machine learning algorithms and potential areas of improvement through the use of additional features. The study found that both common and contextual classification methodologies are important in machine learning algorithms to detect deception. Most research teams leveraged supervised machine learning, but there was a growing interest in deep learning for future research due to the high overhead investment required to create machine learning features. The study revealed that the context dependency of features limited the transferability of algorithms, and that there was a lack of consensus on the efficacy of personalized predictors to enhance algorithm efficiency. Keywords: artificial intelligence, big data analytics, data veracity, deception detection, deep learning, descriptive analytics, Five Factor Model, Linguistic Inquiry and Word Count (LIWC), machine learning, Naïve Bayes, natural language processing (NLP), predictive analytics, prescriptive analytics, support vector machines (SVM).
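Most teams described in this study relied on supervised text classification with algorithms such as Naïve Bayes and support vector machines over linguistic features (e.g., LIWC counts). The following minimal Python sketch illustrates that general approach only; the toy texts, labels, and TF-IDF features are assumptions for illustration, not the study's data or feature set.

```python
# A minimal supervised deception classifier in the spirit described above:
# Naive Bayes and SVM over TF-IDF text features. Toy data for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "I absolutely never saw that money, I swear!",
    "The package arrived on Tuesday as scheduled.",
    "Honestly, truly, I would never ever do such a thing.",
    "We met at noon and reviewed the quarterly report.",
]
labels = ["deceptive", "truthful", "deceptive", "truthful"]

for clf in (MultinomialNB(), LinearSVC()):
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    model.fit(texts, labels)
    print(type(clf).__name__, model.predict(["I swear it arrived on time."]))
```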
... Other review-based features, such as the numbers of views and comments received, ratings, and favorite times, were also used to identify spammers [34]. Linguistic features, such as the structure of sentences and modifiers, were also proposed to analyze reviews [15]. Based on the finding that the majority of collected spam messages are generated from underlying templates, Tangram [35] divided spam into segments and extracted templates for accurate and fast spam detection. ...
... Based on the finding that the majority of collected spam messages are generated from underlying templates, Tangram [35] divided spam into segments and extracted templates for accurate and fast spam detection. Some of these techniques, such as [15] and [35], can potentially be adopted by the followers in this paper to classify messages. Taking the classification probabilities of the techniques as the input, however, our proposed game-theoretic models are general and do not depend on specific classification techniques. ...
... Likewise, the condition (14) can be proved by letting \(C_2 N_n - C_3 N_t > \beta_{b2}\), where \(C_2 N_n - C_3 N_t\) is the minimum payoff from a forged message of P, as given in Table II. The condition (15) can be proved by comparing the cumulative payoff of a malicious publisher with that of a benign publisher over the same period of time. If a malicious P publishes \(f\) forged messages, it is first punished for \(\tau_1 = (2 - q_1)f - 1\) rounds, as proved by (3). ...
Article
Full-text available
Online social networks (OSNs) suffer from forged messages. Current studies have typically focused on the detection of forged messages and do not analyze the behaviors of message publishers or the network strategies available to suppress forged messages. This paper carries out that analysis by taking a game-theoretic approach, where infinitely repeated games are constructed to capture the interactions between a publisher and a network administrator and to suppress forged messages in OSNs. Critical conditions, under which the publisher is disincentivized to publish any forged messages, are identified in the absence and presence of misclassification of genuine messages. Closed-form expressions are established for the maximum number of forged messages that a malicious publisher could publish. Confirmed by the numerical results, the proposed infinitely repeated games reveal that forged messages can be suppressed by improving the payoffs for genuine messages, increasing the cost of bots, and/or reducing the payoffs for forged messages. An increasing detection probability of forged messages or a decreasing misclassification probability of genuine messages also has a strong impact on the suppression of forged messages.
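The deterrence logic quoted in the excerpt above can be illustrated numerically: a malicious publisher who posts \(f\) forged messages is punished for \(\tau_1 = (2 - q_1)f - 1\) rounds. In the Python toy below, all payoff values are invented for illustration and are not taken from the paper.

```python
# Toy numerical illustration of the repeated-game deterrence rule quoted
# in the excerpt: after f forged messages, the publisher is punished for
# tau_1 = (2 - q_1) * f - 1 rounds. All payoff numbers are invented.
def punishment_rounds(f, q1):
    """Punishment duration after f forged messages (from the excerpt)."""
    return (2 - q1) * f - 1

def payoffs(f, q1, gain_forged, payoff_genuine, payoff_punished=0.0):
    """Compare a malicious publisher with a benign one over one cycle."""
    tau = punishment_rounds(f, q1)
    horizon = f + tau
    malicious = f * gain_forged + tau * payoff_punished
    benign = horizon * payoff_genuine
    return malicious, benign

m, b = payoffs(f=4, q1=0.5, gain_forged=5.0, payoff_genuine=4.0)
print(f"malicious={m}, benign={b}, forging deterred: {m < b}")
```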
... Their hypothesis was that trolls may be prone to deliberately cite news articles in a misleading fashion to support their own perspective. This feature had a positive effect on classification, and links troll-detection to rumor-debunking, where similar content-comparison methods prevail [147]. ...
... These include, in particular, content comparison, similarity detection, and using metadata. Content comparison allows detecting texts that contain claims with a pre-established truth-value based on an external knowledge base [147]. False claims are not deceptive if they are sincerely believed (see Section 2), but a strong correlation between falsity and deception is nevertheless likely. ...
Preprint
Textual deception constitutes a major problem for online security. Many studies have argued that deceptiveness leaves traces in writing style, which could be detected using text classification techniques. By conducting an extensive literature review of existing empirical work, we demonstrate that while certain linguistic features have been indicative of deception in certain corpora, they fail to generalize across divergent semantic domains. We suggest that deceptiveness as such leaves no content-invariant stylistic trace, and textual similarity measures provide a superior means of classifying texts as potentially deceptive. Additionally, we discuss forms of deception beyond semantic content, focusing on hiding author identity by writing style obfuscation. Surveying the literature on both author identification and obfuscation techniques, we conclude that current style transformation methods fail to achieve reliable obfuscation while simultaneously ensuring semantic faithfulness to the original text. We propose that future work in style transformation should pay particular attention to disallowing semantically drastic changes.
... Their hypothesis was that trolls may be prone to deliberately cite news articles in a misleading fashion to support their own perspective. This feature had a positive effect on classification, and links troll-detection to rumor-debunking, where similar content-comparison methods prevail [147]. ...
... These include, in particular, content comparison, similarity detection, and using metadata. Content comparison allows detecting texts that contain claims with a preestablished truth-value based on an external knowledge base [147]. False claims are not deceptive if they are sincerely believed (see Section 2), but a strong correlation between falsity and deception is nevertheless likely. ...
Article
Textual deception constitutes a major problem for online security. Many studies have argued that deceptiveness leaves traces in writing style, which could be detected using text classification techniques. By conducting an extensive literature review of existing empirical work, we demonstrate that while certain linguistic features have been indicative of deception in certain corpora, they fail to generalize across divergent semantic domains. We suggest that deceptiveness as such leaves no content-invariant stylistic trace, and textual similarity measures provide a superior means of classifying texts as potentially deceptive. Additionally, we discuss forms of deception beyond semantic content, focusing on hiding author identity by writing style obfuscation. Surveying the literature on both author identification and obfuscation techniques, we conclude that current style transformation methods fail to achieve reliable obfuscation while simultaneously ensuring semantic faithfulness to the original text. We propose that future work in style transformation should pay particular attention to disallowing semantically drastic changes.
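The content-comparison idea highlighted in the excerpts (matching a claim against an external knowledge base) can be sketched in a few lines; the toy knowledge base and claim below are stand-ins for verified fact-check entries.

```python
# Content comparison sketch: score a claim against an external knowledge
# base by TF-IDF cosine similarity. The KB entries here are toy stand-ins
# for verified fact-check records.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

knowledge_base = [
    "The city council approved the transit budget on March 3.",
    "The vaccine was authorized after phase III trials concluded.",
]
claim = "The council never approved any budget in March."

vec = TfidfVectorizer().fit(knowledge_base + [claim])
scores = cosine_similarity(vec.transform([claim]),
                           vec.transform(knowledge_base))[0]
best = scores.argmax()
print(f"closest KB entry ({scores[best]:.2f} sim): {knowledge_base[best]}")
```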
... Some studies concentrating on textual deception are seen to focus on many different topics, such as classifying deception in posts made on social media [3], gender deception in internet communication [4], and detecting deception in online reviews [5]. Moreover, over the past few years, a perception can be said to have emerged that fake news has been used to influence the outcomes of political elections around the world [6]. ...
... In recent years there has been an increase in research related to false news across various domains. One domain of false news research centers on developing computer algorithms that can detect and filter false information in social media [11,12,13,14,15], with one having shown an accuracy of 88% within 24 hours [16]. Other research focuses on how to classify false information [17], how false information spreads through social media [18,19], factors that affect social media post credibility [20], and sharing patterns related to how users interact with false information [21,22,23]. ...
... The recent explosion in usage of social networking sites to obtain pecuniary gains from sensationalist stories [2] or strategically influence political campaigns [5], [20], [38] has highlighted the need to better comprehend how misinformation is created, how it diffuses in social media [46] and how it can be spotted [14], [31], [36], [48]. Most efforts to understand the phenomena [1], [2], [6]- [8], [15], [32]- [34], [42], and develop solutions [11], [14], [31], [48] suggest human biases may foster the diffusion of misinformation. ...
Article
Full-text available
We investigated whether and how political misinformation is engineered using a dataset of four months' worth of tweets related to the 2016 presidential election in the United States. The data contained tweets that achieved a significant level of exposure and was manually labelled as misinformation or regular information. We found that misinformation was produced by accounts that exhibit different characteristics and behaviour from regular accounts. Moreover, the content of misinformation is more novel and polarised, and appears to change through coordination. Our findings suggest that the engineering of political misinformation seems to exploit human traits such as reciprocity and confirmation bias. We argue that investigating how misinformation is created is essential to understanding human biases and diffusion, and ultimately to producing better public policy.
... Even if the problem of misinformation and deception detection is not new, its format (e.g., the way it can be easily crafted and then spread through social media) poses new challenges for academics and practitioners alike. In fact, there has been little research in the field of automatic deception detection in social media (Rubin, 2017), as many argue that studying textual elements within social media posts is challenging and contend that a better way to study the problem is by analysing its patterns of diffusion (Monti et al., 2019). ...
... It is difficult even for human beings to differentiate between real news and fake news: humans are able to identify real and fake news with only 50-60% success rates [8]. Therefore, developing an automated fake news detection system is extremely important. ...
Conference Paper
Full-text available
Fake news on social media generally spreads very quickly, and this brings many serious consequences. Traditional lexico-syntactic features have had limited success in detecting fake news. The majority of fake news detection techniques are tested on small datasets containing limited training examples. In this work, we evaluate our architecture on the Liar-Liar dataset, which contains 12,836 short news statements from different sources, including social media. The proposed architecture incorporates POS (part of speech) tag information of a news article through a Bidirectional LSTM and speaker profile information through a Convolutional Neural Network. The results show that the resulting hybrid architecture significantly improves fake news detection performance on the Liar dataset.
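A rough Keras sketch of that hybrid shape follows: a Bidirectional LSTM over POS-tag sequences fused with a CNN over speaker-profile tokens, ending in a six-way output for Liar's truthfulness labels. The vocabulary sizes, sequence lengths, and layer dimensions are assumptions, not the authors' published configuration.

```python
# Rough Keras sketch of the hybrid architecture described above: a BiLSTM
# over POS-tag sequences fused with a CNN over speaker-profile tokens,
# with a six-way output for Liar's truthfulness labels. Sizes are guesses.
from tensorflow.keras import Model, layers

pos_in = layers.Input(shape=(50,), name="pos_tags")          # POS-tag ids
prof_in = layers.Input(shape=(30,), name="speaker_profile")  # profile ids

x = layers.Embedding(input_dim=50, output_dim=32)(pos_in)
x = layers.Bidirectional(layers.LSTM(64))(x)

y = layers.Embedding(input_dim=5000, output_dim=64)(prof_in)
y = layers.Conv1D(128, kernel_size=3, activation="relu")(y)
y = layers.GlobalMaxPooling1D()(y)

z = layers.Dense(64, activation="relu")(layers.concatenate([x, y]))
out = layers.Dense(6, activation="softmax")(z)  # six truth labels

model = Model([pos_in, prof_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```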
... By engineering popularity and relevance, the discursive swarms unleashed by bots and fake accounts can generate an impression of credibility, unanimity and common sense, an outcome essential to normalizing particular modes of thought (Chen, 2015). Ultimately, by concealing the authors and agendas behind communications, such practices facilitate shadow crusades and astroturfing (Rubin, 2017). ...
Article
Answering calls for deeper consideration of the relationship between moral panics and emergent media systems, this exploratory article assesses the effects of social media – web-based venues that enable and encourage the production and exchange of user-generated content. Contra claims of their empowering and deflationary consequences, it finds that, on balance, recent technological transformations unleash and intensify collective alarm. Whether generating fear about social change, sharpening social distance, or offering new opportunities for vilifying outsiders, distorting communications, manipulating public opinion, and mobilizing embittered individuals, digital platforms and communications constitute significant targets, facilitators, and instruments of panic production. The conceptual implications of these findings are considered.
... Current studies have shown that in the social system constructed by social media, individuals have the dual identities of netizens and real social beings, so individual behavior has both psychological and sociological characteristics [19]; for instance, individual behavior is susceptible to the spiral-of-silence effect. Therefore, some researchers analyze the factors that affect individual spreading behavior in social media, and then establish a variety of models to predict individual spreading behavior in social media. ...
Article
Full-text available
Rumors in social media not only affect the health of online social networks, but also reduce the quality of information accessed by social media users. When emergencies occur, the rapid spread of rumors can even trigger mass anxiety and panic. However, existing studies did not make a clear distinction between rumor and non-rumor information in public emergencies, so they cannot effectively predict rumor retweeting behavior. To this end, this paper presents a model for predicting rumor retweeting behavior based on convolutional neural networks (CNN), called the R-CNN model. In this model, rumor retweeting behavior is considered an important driving force behind the increasing depth and breadth of rumor cascades, and four feature vectors are constructed from the historical textual content published by users, consisting of attention to public emergencies, attention to rumors, reaction time, and tweeting frequency. To supply the quantitative feature vectors for the R-CNN, a K-means based core-tweet extraction method is proposed to select the right tweets, and quantitative feature representations are proposed. The predictive capability of the model has been proved by experiments based on two rumor datasets of emergencies crawled from Sina Weibo. Experimental results indicate that the prediction accuracy of the model reaches 88%, and it can be improved by 7% on average compared with other models.
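The K-means core-tweet extraction step might look like the following sketch: cluster a user's historical tweets and keep the tweet nearest each centroid as representative input. The tweets, vectorization, and choice of k = 2 are illustrative, not the paper's setup.

```python
# Sketch of a K-means "core tweet" selection step: cluster a user's
# historical tweets and keep the tweet nearest each centroid.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = [
    "earthquake update from the city government",
    "so worried about the rumor going around",
    "coffee first, questions later",
    "another earthquake rumor spreading fast",
    "officials deny the evacuation rumor",
    "weekend plans with friends",
]
X = TfidfVectorizer().fit_transform(tweets)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
dists = km.transform(X)  # distance of each tweet to each centroid
core = [tweets[dists[:, c].argmin()] for c in range(2)]
print("core tweets:", core)
```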
... The definition of a rumor is "a claim whose truthfulness is in doubt and has no clear source, even if its ideological or partisan origins and intents are clear" [16]. Rumors bring about harmful effects such as spreading fear or even euphoria, causing people to make wrong judgments, and damaging political events, the economy, and social stability [45]. The massive increase in social media data has rendered manual methods of debunking rumors difficult and costly. ...
Preprint
Full-text available
Increased usage of social media has popularized news and events that are not even verified, resulting in the spread of rumors all over the web. Widely available social media platforms and their increased usage have made data available in huge amounts. Manual methods to process such large data are costly and time-consuming, so there has been increased attention to processing and verifying such content automatically for the presence of rumors. Many research studies reveal that identifying the stances of posts in the discussion thread of such events and news is an important step preceding the identification of rumor veracity. In this paper, we propose a multi-task learning framework for jointly predicting rumor stance and veracity on the dataset released at SemEval 2019 RumorEval: Determining rumor veracity and support for rumors (SemEval 2019 Task 7), which includes social media rumors stemming from a variety of breaking news stories from Reddit as well as Twitter. Our framework consists of two parts: (a) the bottom part of our framework classifies the stance of each post in the conversation thread discussing a rumor by modelling the multi-turn conversation and making each post aware of its neighboring posts; (b) the upper part predicts the rumor veracity of the conversation thread using the stance evolution obtained from the bottom part. Experimental results on the SemEval 2019 Task 7 dataset show that our method outperforms previous methods on both rumor stance classification and veracity prediction.
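A compact Keras sketch of the two-part design: a shared recurrent encoder makes each post aware of its neighbors, a per-post head classifies stance, and a thread-level head predicts veracity. Input shapes and label counts are illustrative assumptions.

```python
# Compact Keras sketch of the two-part multi-task idea: a shared encoder
# over the thread, a per-post stance head, and a thread-level veracity
# head. Input shapes and label sets are illustrative assumptions.
from tensorflow.keras import Model, layers

thread = layers.Input(shape=(20, 64), name="post_embeddings")  # 20 posts

# Bottom: make each post aware of its neighbours, then classify stance.
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(thread)
stance = layers.TimeDistributed(
    layers.Dense(4, activation="softmax"), name="stance")(h)

# Top: pool the stance-aware states to predict thread veracity.
pooled = layers.GlobalAveragePooling1D()(h)
veracity = layers.Dense(3, activation="softmax", name="veracity")(pooled)

model = Model(thread, [stance, veracity])
model.compile(optimizer="adam",
              loss={"stance": "sparse_categorical_crossentropy",
                    "veracity": "sparse_categorical_crossentropy"})
model.summary()
```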
... This natural tendency of humans is used to distinguish between fake news and true news. These properties found in the content of a message can serve as linguistic clues that can detect deception [14]. ...
Preprint
As the world becomes more dependent on the internet for information exchange, some overzealous journalists, hackers, bloggers, individuals and organizations tend to abuse the gift of a free information environment by polluting it with fake news, disinformation and pretentious content for their own agendas. Hence, there is a need to address the issue of fake news and disinformation with the utmost seriousness. This paper proposes a methodology for fake news detection and reporting through a constraint mechanism that utilizes the combined weighted accuracies of four machine learning algorithms.
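Combining classifiers by their measured accuracies, as this methodology proposes for four algorithms, can be sketched with scikit-learn. The four stand-in models and synthetic data below are assumptions; a real pipeline would compute weights on a validation split separate from the final test set.

```python
# Sketch of a weighted-accuracy ensemble: each model's vote is weighted
# by its validation accuracy. Models and data are stand-ins.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

models = [LogisticRegression(max_iter=1000), GaussianNB(),
          RandomForestClassifier(random_state=0), SVC(probability=True)]
weights = [m.fit(X_tr, y_tr).score(X_val, y_val) for m in models]

probs = sum(w * m.predict_proba(X_val) for w, m in zip(weights, models))
print("weighted-ensemble accuracy:", (probs.argmax(axis=1) == y_val).mean())
```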
... Fighting conspiracy is a difficult battle, but our study highlighted that influencers and verified organizational users with a larger following could help draw more user participation to debunking posts. Influencers and organizational users can be considered as critical seeds for disseminating debunking information through online social networks (Rubin, 2017). Social media platforms and public agencies may consider actively enlisting their help in the debunking process. ...
Preprint
Full-text available
This paper studies conspiracy and debunking narratives about COVID-19 origination on a major Chinese social media platform, Weibo, from January to April 2020. Popular conspiracies about COVID-19 on Weibo, including that the virus is human-synthesized or a bioweapon, differ substantially from those in the US. They attribute more responsibility to the US than to China, especially following Sino-US confrontations. Compared to conspiracy posts, debunking posts are associated with lower user participation but higher mobilization. Debunking narratives can be more engaging when they come from women and influencers and cite scientists. Our findings suggest that conspiracy narratives can carry strong cultural and political orientations. Correction efforts should consider political motives and identify important stakeholders to reconstruct international dialogues toward intercultural understanding.
... Fighting conspiracy is a difficult battle, but our study highlights that influencers and verified organizational users with a larger following could help draw more user participation to debunking posts. Influencers and organizational users can be considered as critical seeds for disseminating debunking information through online social networks (Rubin, 2017). Social media platforms and public agencies may consider actively enlisting their help in the debunking process. ...
Article
Full-text available
This paper studies conspiracy and debunking narratives about the origins of COVID-19 on a major Chinese social media platform, Weibo, from January to April 2020. Popular conspiracies about COVID-19 on Weibo, including that the virus is human-synthesized or a bioweapon, differ substantially from those in the United States. They attribute more responsibility to the United States than to China, especially following Sino-U.S. confrontations. Compared to conspiracy posts, debunking posts are associated with lower user participation but higher mobilization. Debunking narratives can be more engaging when they come from women and influencers and cite scientists. Our findings suggest that conspiracy narratives can carry strong cultural and political orientations. Correction efforts should consider political motives and identify important stakeholders to reconstruct international dialogues toward intercultural understanding.
... In this study, besides rumors, we also find widely diffused rumor debunking tweets combating country lockdown rumors, virus-related rumors, and other fake news. Though Twitter users are notorious for their poor rumor judgment (Rubin 2017), our study shows that they are also passionate about retweeting rumor debunking tweets. For instance, a rumor debunking tweet, saying "So I'm hearing many myths about #COVID-19 ... Coronavirus will go away in Summer months. ...
Article
Full-text available
This study conducts an analysis of the topics of the most diffused tweets and the retweeting dynamics of crisis information amid Covid-19 to provide insights into how Twitter is used by the public and how crisis information is diffused on Twitter amid this pandemic. Results show that Twitter is first and foremost used as a news seeking and sharing platform, with more than \(70\%\) of the most diffused tweets being related to news and comments on crisis updates. As for the retweeting dynamics, our results show an almost immediate response from Twitter users, with some first retweets occurring as quickly as within 2 s and the vast majority \((90\%)\) of them done within 10 min. Nearly \(86\%\) of the retweeting processes could have \(75\%\) of their retweets finished within 24 h, indicating a one-day information value of tweets. The distribution of retweeting behaviors could be modeled by power-law, Weibull, and log-normal distributions in this study, but there remain \(20\%\) of original tweets whose retweeting distributions are left unexplained. Results of the retweeting community analysis show that following retweeters contribute nearly \(50\%\) of the retweets. In addition, the retweeting contribution of verified Twitter users is significantly \((P<0.05)\) different from that of unverified users. A similar significant \((P<0.05)\) difference is also found in their rates of verified retweeters, and it has been shown that verified Twitter users enjoy a value seven times as high as that of unverified users. In other words, users with the same verification status are more likely to get together to diffuse crisis information.
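Fitting the candidate delay distributions named above (power law, Weibull, log-normal) and comparing their fits is straightforward with SciPy; the synthetic delays and the use of a Pareto as the power-law stand-in are illustrative assumptions.

```python
# Sketch of the distribution-fitting step: fit Weibull, log-normal, and a
# Pareto (as a power-law stand-in) to retweet delays and compare by AIC.
# The synthetic delays below are placeholders for real retweet data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
delays = rng.lognormal(mean=5.0, sigma=1.2, size=2000)  # seconds

candidates = {"weibull": stats.weibull_min,
              "lognorm": stats.lognorm,
              "pareto": stats.pareto}
for name, dist in candidates.items():
    params = dist.fit(delays, floc=0)        # pin location at zero
    aic = 2 * len(params) - 2 * dist.logpdf(delays, *params).sum()
    print(f"{name:8s} AIC = {aic:,.0f}")
```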
... Artificial intelligence-based tools are commercially available on the internet for media specialists (Rubin 2017). ...
Conference Paper
Full-text available
Attitude constitutes an important factor underlying an individual's behavior. This paper discusses the results of a study on EFL teachers' attitudes towards their area consultative forum. An area consultative forum is one where member teachers of the same school subject in the area gather to learn and practice together, and to share as well as work with one another. Therefore, if utilized properly, the forum will be able to empower the teachers joining it. The survey study involved twenty EFL urban primary school teachers in the town of Purwokerto, Central Java, Indonesia. Participants were chosen using convenience sampling. Data were collected through questionnaires and interviews. Data were analysed using descriptive statistics and an inductive procedure. Overall, the results of this study have shown that the EFL teachers under investigation had a positive attitude towards their consultative forum. However, some teachers did not participate enough. Basically, they were unhappy about, among other things, unclearly planned programs, a very low quantity of activities, and the poor quality of activities. When these and other weaknesses within the forum are solved, the forum is expected to contribute to the member teachers' professional development.
... An attention-grabbing fake story, "Pope Francis shocks world, endorses Donald Trump for president," was widely spread among people connected through social media and had a surprising impact on the election in favor of Donald Trump [3]. A fake story badly hurt the shares of Steve Jobs' company in the stock market in October 2008 [4]. Thus, identifying such fake material on social sites is essential to educate users and guard them against its devastating effects. ...
... In addition, they tend to use few cognitive-complexity words and express more negative emotions. Even human judgment can achieve only 50-63% accuracy in detecting deception in the news without proper background knowledge [22]. In our framework, we perform a linguistic analysis to extract and count linguistic characteristics from the speech statement, and use the counts of linguistic characteristics as features. ...
... , share and create news, including online and via social media. While fake news is "completely fabricated, manipulated to resemble credible journalism and attracts maximum attention" (Hunt, 2016), Rubin (2017) describes fake news as covering five types: deliberately deceptive news, jokes taken at face value, large-scale hoaxes, skewed reporting of real facts, and stories where the truth is contingent ...
Article
In the midst of the COVID-19 pandemic, hoax news emerged that made people panic and make wrong decisions. These hoaxes emerge because many people are not information literate, so they trust all incoming information. People also lack the social awareness to filter the information they get. This study aims to describe the digital community's level of literacy towards COVID-19 information on the internet. The research method used is descriptive with a survey approach. The research location was in East Java, with a total of 500 respondents drawn from several segments of people who actively use the internet as a source of information. The results of this study indicate that the digital literacy level for COVID-19 information is good, with an average value of 3.69. Of the five dimensions of digital literacy used as measures, ethical awareness ranked highest (very high), followed by media evaluation in second position, media production in third, media access in fourth, and media awareness in fifth.
... The main challenge is that most users do not pay much attention to manipulated information, while those who manipulate it systematically try to create more confusion. The outcome of this process is that people's ability to distinguish real from false information is further impeded [138,152]. ...
Preprint
Full-text available
Social networks' omnipresence and ease of use have revolutionized the generation and distribution of information in today's world. However, easy access to information does not equal an increased level of public knowledge. Unlike traditional media channels, social networks also facilitate the faster and wider spread of disinformation and misinformation. The viral spread of false information has serious implications for the behaviors, attitudes and beliefs of the public, and ultimately can seriously endanger democratic processes. Limiting false information's negative impact through early detection and control of its extensive spread presents the main challenge facing researchers today. In this survey paper, we extensively analyze a wide range of different solutions for the early detection of fake news in the existing literature. More precisely, we examine Machine Learning (ML) models for the identification and classification of fake news, online fake news detection competitions, statistical outputs, as well as the advantages and disadvantages of some of the available data sets. Finally, we evaluate the online web browsing tools available for detecting and mitigating fake news and present some open research challenges.
... The main challenge is that most users do not pay much attention to manipulated information, while those who manipulate it systematically try to create more confusion. The outcome of this process is that people's ability to distinguish real from false information is further impeded [138,152]. ...
Article
Full-text available
Social networks' omnipresence and ease of use have revolutionized the generation and distribution of information in today's world. However, easy access to information does not equal an increased level of public knowledge. Unlike traditional media channels, social networks also facilitate the faster and wider spread of disinformation and misinformation. The viral spread of false information has serious implications for the behaviours, attitudes and beliefs of the public, and ultimately can seriously endanger democratic processes. Limiting false information's negative impact through early detection and control of its extensive spread presents the main challenge facing researchers today. In this survey paper, we extensively analyse a wide range of different solutions for the early detection of fake news in the existing literature. More precisely, we examine Machine Learning (ML) models for the identification and classification of fake news, online fake news detection competitions, statistical outputs, as well as the advantages and disadvantages of some of the available data sets. Finally, we evaluate the online web browsing tools available for detecting and mitigating fake news and present some open research challenges.
Article
Purpose The purpose of this paper is to treat disinformation and misinformation (intentionally deceptive and unintentionally inaccurate misleading information, respectively) as a socio-cultural technology-enabled epidemic in digital news, propagated via social media. Design/methodology/approach The proposed disinformation and misinformation triangle is a conceptual model that identifies the three minimal causal factors occurring simultaneously to facilitate the spread of the epidemic at the societal level. Findings Following the epidemiological disease triangle model, the three interacting causal factors are translated into the digital news context: the virulent pathogens are falsifications, clickbait, satirical “fakes” and other deceptive or misleading news content; the susceptible hosts are information-overloaded, time-pressed news readers lacking media literacy skills; and the conducive environments are polluted, poorly regulated social media platforms that propagate and encourage the spread of various “fakes.” Originality/value The three types of interventions – automation, education and regulation – are proposed as a set of holistic measures to reveal, and potentially control, predict and prevent further proliferation of the epidemic. Partial automated solutions with natural language processing, machine learning and various automated detection techniques are currently available, as exemplified here briefly. Automated solutions assist (but do not replace) human judgments about whether news is truthful and credible. Information literacy efforts require further in-depth understanding of the phenomenon and interdisciplinary collaboration outside of the traditional library and information science, incorporating media studies, journalism, interpersonal psychology and communication perspectives.
Chapter
Online social media promotes the development of the news industry and makes it easy for everyone to obtain the latest news. Meanwhile, circumstances are made worse by fake news. Fake news is flooding social media and has become a serious threat that may cause high societal and economic losses, making fake news detection important. Unlike traditional news, news on social media tends to be short and misleading, which makes it more confusing to identify. On the other hand, fake news may contain parts of the facts and parts of incorrect content in one statement, which is not clear and simple to classify. Hence, we propose a two-stage model to deal with these difficulties. Our model is built on BERT, a pre-trained model with the more powerful Transformer feature extractor instead of a CNN or RNN. Besides, some accessible information is used to extend the features and calculate attention weights. Finally, inspired by fine-grained sentiment analysis, we treat fake news detection as a fine-grained multi-class classification task and use two similar sub-models to identify labels of different granularity separately. We evaluate our model on a real-world benchmark dataset. The experimental results demonstrate its effectiveness in fine-grained fake news detection and its superior performance to the baselines and other competitive approaches.
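One stage of such a model is essentially BERT fine-tuned for a small fake-news label set. A minimal Hugging Face Transformers sketch follows; the checkpoint, three-label scheme, and example sentences are assumptions, since the benchmark dataset is not specified here.

```python
# Minimal Hugging Face sketch of one stage: BERT fine-tunable for a small
# fake-news label set. The checkpoint, three-label scheme, and sentences
# are assumptions; the paper's actual dataset and labels are unspecified.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)  # e.g. fake / partly fake / real

batch = tok(["Officials confirmed the report this morning.",
             "Miracle cure hidden by doctors, insiders say."],
            padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**batch).logits.softmax(dim=-1)
print(probs)  # untrained head: fine-tune on labeled statements first
```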
Article
Full-text available
This paper offers a conceptual basis and describes elements of a multi-layered system to provide information users (newsreaders) with credible information and to improve the work processes of online news (content) producers. I overview criteria of excellence (what editors consider newsworthy) and how reporters (and traditional newsroom professionals) used to verify information to provide high-quality news. I compare the “traditional model of journalism” to the current journalistic practice of “news sharing a.s.a.p.” and identify certain processes that are currently either missing or could be complemented with automatic verification functions, capitalizing on Natural Language Processing (NLP) and Data Mining (DM).
Article
Full-text available
Subject and purpose of work: Fake news and disinformation are polluting the information environment. Hence, this paper proposes a methodology for fake news detection through the combined weighted accuracies of seven machine learning algorithms. Materials and methods: This paper uses natural language processing to analyze the text content of a list of news samples and then predicts whether they are FAKE or REAL. Results: The weighted-accuracy algorithmic approach has been shown to reduce overfitting. It was revealed that the individual performance of the different algorithms improved after the data was extracted from the news outlet websites and 'quality' data was filtered by the constraint mechanism developed in the experiment. Conclusions: This model differs from existing mechanisms in that it automates the algorithm selection process and at the same time takes into account the performance of all the algorithms used, including the less performing ones, thereby increasing the mean accuracy over all the algorithm accuracies.
Conference Paper
The rapid spread of misinformation is a growing worldwide concern as it has the capacity to greatly influence individual reputations and societal behavior. The consequences of the unchecked spread of misinformation can not only range from political to financial but can also affect global opinion for a long time. Thus, detecting fake news is important but challenging, as the ability to accurately categorize information as true or fake is limited even in humans. Moreover, fake news is a blend of correct news and false information, making accurate classification even more confusing. In this paper, we propose a novel method of multilevel multiclass fake news detection based on relabeling the dataset and learning iteratively. The proposed method outperforms the benchmark, and our experiments indicate that the profile of the information source contributes the most to fake news detection.
Conference Paper
Full-text available
Online Social Network (OSN) platforms are growing steadily and keep attracting users from all over the world. However, OSNs actively and voluntarily collect data about their users at an exponential rate, jeopardizing their right to privacy. Additionally, people willingly disclose and share their personal information, exacerbating the issue. Malicious third parties piggyback on this data sharing and increase the risk of anyone becoming the victim of deception. Even technologically knowledgeable people, concerned for their privacy, have been trapped by the numerous online deception techniques. In addition, the rise of new technologies used to deceive people in OSNs makes the need for privacy protection more crucial than ever, because OSNs bring new choices and new opportunities, but also new dangers. As such, various researchers keep working on and deriving innovative techniques to detect and prevent online deception. In this work, we propose a taxonomy to help coordinate and organize the efforts of protecting OSN users against online deception. To build the taxonomy, we first conducted a survey of the various online deception techniques and their corresponding countermeasures. Then we organized and grouped them into different categories to derive our taxonomy.
Chapter
Chapter 7 focuses on artificially intelligent (AI) systems that can help the human eye identify fakes of several kinds and call them out for the benefit of the public good. I explain, in plain language, the principles behind the AI-based methodologies employed by automated deception detectors, clickbait detectors, satirical fake detectors, rumor debunkers, and computational fact-checking tools. I trace the evolution of such state-of-the-art AI over the past 10–15 years. The inner workings of the systems are explained in simple terms that are accessible to readers without a computer science background. How do these technologies operate in principle? What important features do developers consider? What rates of success do these systems report? And finally, what are the next steps in improvements, collaboration, and integration of various approaches?
Article
Text messages with strategically placed emoticons affect recipient perceptions regarding the truth or deception of the content. This article describes an experiment using 3 treatments applied randomly to 4 deceptive and 4 truthful message snippets. The original content of the snippets related to scholarship interviewee comments that truthfully or deceptively described their background. Each message was represented in one of three ways: plain text, annotated text, or text with embedded emoticons. The data were analyzed using a 2 (Text Veracity: Honest or Dishonest) x 3 (Cues: Plain Text vs Annotated Text vs Emoticons) design. The dependent variable reflected participant perception of the snippet's honesty or dishonesty. Results show that extra emotional cues affected perceptions of the message content. Overall, this study demonstrated that annotated text and text with embedded emoticons were more likely to be judged as deceptive than plain text. This was particularly true when messages were deceptive. True messages were detected as truthful more often in plain text.
Chapter
Previous work in the social sciences, psychology and linguistics has shown that liars have some control over the content of their stories; however, their underlying state of mind may “leak out” through the way that they tell them. To the best of our knowledge, no previous systematic effort exists to describe and model deceptive language in Brazilian Portuguese. To fill this important gap, we carry out an initial empirical linguistic study of false statements in Brazilian news. We methodically analyze linguistic features using the Fake.Br corpus, which includes both fake and true news. The results show that fake and true news present substantial lexical, syntactic and semantic variations, as well as distinctions in punctuation and emotion.
Chapter
In recent years, a lot of false information in medical and healthcare domains has emerged and spread over the Internet. Such false information has become a big risk to public health and safety. This study investigates this problem by analyzing data collected from two fact-checking websites, 416 medical claims from Snopes.com and 1,692 healthcare-related statements from PolitiFact.com. Topic analysis reveals frequent words and common topics occurring in these claims spread online. Furthermore, using text-mining and machine-learning techniques, this study builds prediction models for detecting false information and shows promising performance. Several textual and source features are identified as good indicators for true or false information in medical and healthcare domains.
Article
Purpose The purpose of this study is to unpack the antecedents and consequences of clickbait prevalence in online media at two different levels, namely (1) headline level: what characteristics of clickbait headlines attract user clicks, and (2) publisher level: what happens to publishers who create clickbait on a prolonged basis. Design/methodology/approach To test the proposed conjectures, the authors collected longitudinal data in collaboration with a leading company that operates more than 500 WeChat official accounts in China. This study proposed a text mining framework to extract and quantify clickbait rhetorical features (i.e. hyperbole, insinuation, puzzle, and visual rhetoric). Econometric analysis was employed for empirical validation. Findings The findings revealed that (1) hyperbole, insinuation, and visual rhetoric entice users to click the baited headlines, (2) there is an inverted U-shaped relationship between the number of clickbait headlines posted by a publisher and its visit traffic, and (3) this non-linear relationship is moderated by the publisher's age. Research limitations/implications This research contributes to the current literature on clickbait detection and clickbait consequences. Future studies can design more sophisticated methods for extracting rhetorical characteristics and implement them in different languages. Practical implications The findings could aid online media publishers in designing attractive headlines and developing clickbait strategies that avoid user churn, and help managers enact appropriate regulations and policies to control clickbait prevalence. Originality/value The authors propose a novel text mining framework to quantify the rhetoric embedded in clickbait. This study empirically investigates the antecedents and consequences of clickbait prevalence through an exploratory study of WeChat in China.
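An inverted U-shaped relationship of this kind is typically tested by regressing the outcome on the clickbait count and its square and checking for a negative squared coefficient; here is a synthetic-data sketch with statsmodels (all names and coefficients are invented).

```python
# Sketch of testing an inverted-U: regress traffic on the clickbait count
# and its square; a negative squared coefficient supports the shape.
# Data and coefficients below are synthetic, for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
clickbait = rng.uniform(0, 10, 500)
traffic = 3 + 2.0 * clickbait - 0.15 * clickbait**2 + rng.normal(0, 1, 500)

X = sm.add_constant(np.column_stack([clickbait, clickbait**2]))
fit = sm.OLS(traffic, X).fit()
print(fit.params)  # expect: positive linear term, negative squared term
```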
Article
In recent years, there has been a rise in incorrect information, or misinformation, being shared on social media. Such misinformation tends to be more eye-catching and misleading. Previous studies on detecting misinformation online have focused mainly on linguistic characteristics; however, the role of linguistic characteristics in misinformation dissemination has not yet been thoroughly researched. In this study, we propose a misinformation dissemination model that includes the direct effects of four novel linguistic characteristics on dissemination and the moderating effect of information richness. The model is tested using 9,631 examples of misinformation collected from Sina Weibo, the leading social media platform in China. The results indicate that, compared with correct information, misinformation containing persuasive and uncertainty words is more likely to be disseminated than misinformation containing emotional and comparative words. Furthermore, in the case of information richness, the results indicate that when the misinformation includes images, the effects of persuasive, negative-emotion, comparative, and uncertainty words are strengthened, while the inclusion of videos weakens the effects of the linguistic characteristics. Finally, the results of a robustness check using the FakeNewsNet data set are consistent with our hypotheses, indicating that the four linguistic characteristics proposed by this study also apply to the dissemination of misinformation in English. The robustness check further demonstrates that our method generalizes well.
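A moderating effect such as “images strengthen the influence of persuasive words” is commonly tested with an interaction term; the following statsmodels sketch uses invented column names and effect sizes purely for illustration.

```python
# Sketch of a moderation test: interact a linguistic feature with an
# information-richness indicator (image present) and inspect the
# interaction coefficient. All names and effects are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 800
df = pd.DataFrame({"persuasive": rng.poisson(2, n),
                   "has_image": rng.integers(0, 2, n)})
df["retweets"] = (0.5 * df.persuasive
                  + 0.8 * df.persuasive * df.has_image
                  + rng.normal(0, 1, n))

fit = smf.ols("retweets ~ persuasive * has_image", data=df).fit()
print(fit.params)  # positive interaction: images strengthen the effect
```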
Article
Detection of online subversive activities, such as fake news, concerted campaigns, and bots, is becoming increasingly urgent. However, without specific knowledge of the underlying facts and the disparate valid perspectives on a given issue, it is hard to detect subversive intent in a generic sense. To address this, we approach the problem from a “macro” perspective. Rather than asking whether a specific social media account is acting subversively, we look at the entire discourse around a trending topic, and ask whether the discourse looks “healthy” or whether it is showing signs of getting hijacked or dominated by one particular perspective. To do this, we break down a social discourse into its constituent narratives. Narratives are in turn modeled as latent stories or worldviews, whose visible characterizations take the form of specific distributions over the different opinions expressed in the discourse. Once the discourse is broken down into narratives, the “health” of the discourse can be assessed using various measures, such as the relative sizes of its constituent narratives, the sentiment polarity of internarrative interactions, and the presence or absence of dominant players within each narrative. We conduct experiments on several well-known trending topics on Twitter to identify their constituent narratives and provide a report card on the overall discourse quality. We also show how this top-down approach offers the means to delineate the roles played by users as drivers of the discourse, its constituent narratives, or their component opinions, determined on the basis of dominance centrality measures and narrative affinities.
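Modeling narratives as latent distributions over expressed opinions is reminiscent of topic modeling; the toy LDA sketch below is a stand-in for the paper's own narrative model, with illustrative posts and an assumed two-narrative split.

```python
# Toy stand-in for the latent-narrative idea: LDA recovers topic-like
# distributions over expressed opinions. Posts and the two-narrative
# choice are illustrative, not the paper's model.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

posts = ["the policy saves lives", "the policy kills jobs",
         "jobs matter more than anything", "saving lives is the priority",
         "jobs are the real issue here", "health and lives come first"]
X = CountVectorizer().fit_transform(posts)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print(lda.transform(X).round(2))  # each post's mix over two "narratives"
```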
Chapter
Chapter 1 frames the problem of deceptive, inaccurate, and misleading information in digital media content and information technologies as an infodemic. Mis- and disinformation proliferate online, yet the solution remains elusive, and many of us run the risk of being woefully misinformed in many aspects of our lives, including health, finances, and politics. Chapter 1 untangles key research concepts—infodemic, mis- and disinformation, deception, “fake news,” false news, and various types of digital “fakes.” A conceptual infodemiological framework, the Rubin (2019) Misinformation and Disinformation Triangle, posits three minimal interacting factors that cause the problem—susceptible hosts, virulent pathogens, and conducive environments. Disrupting the interactions of these factors requires greater efforts in educating susceptible minds, detecting virulent fakes, and regulating toxic environments. Given the scale of the problem, technological assistance is inevitable. Human intelligence can and should be, at least in part, enhanced with an artificial one. We require systematic analyses that can reliably and accurately sift through large volumes of data. Such assistance comes from artificial intelligence (AI) applications that use natural language processing (NLP) and machine learning (ML). These fields are briefly introduced, and AI-enabled tasks for detecting various “fakes” are laid out. While AI can assist us, the ultimate decisions are obviously in our own minds. An immediate starting point is to verify suspicious information with simple digital literacy steps, as exemplified here. Societal interventions and countermeasures that help curtail the spread of mis- and disinformation online are discussed throughout this book.
Article
For a rapid dissemination of information during crisis events, official agencies and disaster relief organizations have been utilizing social media platforms, which are susceptible to rumor propagation. To minimize the impact of rumors with limited time and resources, the agencies and social media companies not only need to wisely choose the cases to clarify amongst the numerous cases, but they should also make an informed decision on the timing of clarification. Reacting fast can be misjudged as an obvious best policy as partial/imprecise information may fail to contain the impact of the rumors. On the other hand, investment in terms of time, effort, and money to clarify with more complete information also allows the rumors to spread with their full force during the learning phase, thereby making the process of decision-making very challenging. The objective of this paper is to determine the optimal strategies for the official agencies and social media companies by developing two novel sequential game-theoretic models, namely “Rumor Selection for Clarification” and “Learning for Rumor Clarification”, that can help decide which rumor to clarify and when to clarify, respectively. Results from this study indicate that posting verified information on social media reduces the uncertainties involved in rumor transmission, thereby enabling social media users to make informed decisions on whether to support or oppose the rumor being circulated. This verification needs to be obtained within reasonable limits of time and cost to keep the learning process worthwhile.
Chapter
Automated detection of text with misrepresentations, such as fake reviews, is an important task for online reputation management. We form the Ultimate Deception Dataset, which consists of customer complaints—emotionally charged texts that include descriptions of problems customers experienced with certain businesses. Typically, in customer complaints, either the customer describes a company representative lying, or the customer lies themselves. The Ultimate Deception Dataset includes almost 3,000 complaints in the personal finance domain and provides clear ground truth based on available factual knowledge about the financial domain. Among them, four hundred texts were manually tagged. Experiments were performed to explore the links between implicit cues of the rhetorical structure of texts and the validity of arguments, and also how truthful or deceptive these texts are. We confirmed that communicative discourse trees are essential to detect various forms of misrepresentation in text, achieving 76% F1 on the Ultimate Deception Dataset. We believe that this accuracy is sufficient to assist manual curation of a CRM environment towards having high-quality, trusted content. Recognizing hypocrisy in customer communication concerning their impression of the company, or hypocrisy in customer attitude, is fairly important for properly handling and retaining customers. We collect a dataset of sentences with hypocrisy and learn to detect it relying on syntactic, semantic and discourse-level features, and also on web mining to correlate contrasting entities. The sources are customer complaints, samples of texts with hypocrisy on the web, and tweets tagged as hypocritical. We propose an iterative procedure to grow the training dataset and achieve a detection F1 above 80%, which is expected to be satisfactory for integration into a CRM platform. We conclude this section with the detection of rumors and misinformation in web documents, where discourse analysis is also helpful.
Chapter
Many practices in marketing, advertising, and public relations, presented in Chapter 6, have the intent to persuade and manipulate public opinion from the onset. I lay out marketing communications strategies and dissect the anatomy of the ad revenue model. I review key ideas in advertising standards and self-regulation policies that do not allow misleading ads. Advertising techniques in marketing campaigns and political propaganda abound in truth-bending, but regulatory bodies such as the U.S. Federal Trade Commission distinguish between puffery and materially harmful misleading ads. Digital media users may be constantly bombarded with ads—puffed-up, borderline deceptive, emotionally suggestive, or plainly misleading—and they may grow weary, distrustful, and resistant to ads. Advertisers get more creative with variants of covert advertising, exemplified in Chapter 6 with native ads, sponsored links, or branded content. I discuss the elusive concept of virality and explore what makes us vulnerable to viral conspiracy theories. Masters of persuasion may exploit human biases and logical fallacies including the bandwagon appeal, glittering generalities, and bait and switch, to name a few. Propagandists, advertisers, and public relations experts know when and how to appeal to emotions or individuality, use wit and humor, or fine-tune who delivers the message as a relatable, credible source. More AI countermeasures and stricter regulation are needed to curb the existing ad revenue model and the unscrupulous financing that instigates the spread of mis- and disinformation. AI-based technologies such as spambots, paybots, autolikers, and other inauthentic accounts are farmed out to create hype and social engagement by propagating falsehoods, clickbait, provocations, and misleading or otherwise inaccurate messages. Policymakers and legislators are broadly encouraged to focus on regulating algorithmic transparency, platform accountability, digital advertising, and data privacy, while avoiding crude measures of controlling and criminalizing digital content or stifling free speech. The ultimate goal is to reestablish trust in the basic institutions of a democratic society by bolstering facts and combatting the systematic efforts at devaluing truth. Professional manipulators, propagandists, and their technologies—at the service of a few “deep pockets”—require more public oversight of their unscrupulous disinformation campaigns that misuse commercial and public discourse to manipulate the general public. Establishing a global regulatory framework and venues for enforcement may be key to addressing the problem across nations, cultures, and language boundaries. Nationwide educational efforts in digital literacy should meanwhile instruct digital media users in recognizing manipulative techniques to resist the powers of propaganda and advertising.
Chapter
Full-text available
Research syntheses suggest that verbal cues are more diagnostic of deception than other cues. Recently, to avoid human judgmental biases, researchers have sought faster and more reliable methods to perform automatic content analyses of statements. However, the diversity of methods and inconsistent findings do not present a clear picture of effectiveness. We integrate and statistically synthesize this literature. Our meta-analyses revealed small but significant effect sizes for some linguistic categories. Liars use fewer exclusive words, self- and other-references, and time-related words, but more space-related words, negative and positive emotion words, motion verbs, and negations than truth-tellers.
Technical Report
Full-text available
While most online social media accounts are controlled by humans, these platforms also host automated agents called social bots or sybil accounts. Recent literature has reported cases of social bots imitating humans to manipulate discussions, alter the popularity of users, pollute content, spread misinformation, and even perform terrorist propaganda and recruitment actions. Here we present BotOrNot, a publicly available service that leverages more than one thousand features to evaluate the extent to which a Twitter account exhibits similarity to the known characteristics of social bots. Since its release in May 2014, BotOrNot has served over one million requests via our website and APIs.
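As a hint of what such feature engineering looks like, here is a minimal sketch computing three toy user-metadata features of the sort a bot-scoring service aggregates; the field names and the example account are assumptions for illustration, not BotOrNot's actual features or API.

# Hedged sketch: toy account-metadata features. Real services combine
# over a thousand features across metadata, content, network, sentiment,
# and timing families.
def bot_features(user):
    # Ratio of accounts followed to followers: very high values are a
    # commonly cited bot signal.
    friends_to_followers = user["friends_count"] / max(user["followers_count"], 1)
    # Posting rate: sustained extreme rates suggest automation.
    tweets_per_day = user["statuses_count"] / max(user["account_age_days"], 1)
    return {
        "friends_to_followers": friends_to_followers,
        "tweets_per_day": tweets_per_day,
        "default_profile_image": int(user["default_profile_image"]),
    }

account = {"followers_count": 12, "friends_count": 3400,
           "statuses_count": 90000, "account_age_days": 120,
           "default_profile_image": True}
print(bot_features(account))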
Conference Paper
Full-text available
Tabloid journalism is often criticized for its propensity for exaggeration, sensationalization, scare-mongering, and otherwise producing misleading and low-quality news. As the news has moved online, a new form of tabloidization has emerged: ‘clickbaiting.’ ‘Clickbait’ refers to “content whose main purpose is to attract attention and encourage visitors to click on a link to a particular web page” [‘clickbait,’ n.d.] and has been implicated in the rapid spread of rumor and misinformation online. This paper examines potential methods for the automatic detection of clickbait as a form of deception. Methods for recognizing both textual and non-textual clickbaiting cues are surveyed, leading to the suggestion that a hybrid approach may yield the best results.
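A minimal sketch of the textual side of such detection, matching a few surface cues often associated with clickbait; the cue list and phrasing are illustrative assumptions, and a real system would combine many more textual and non-textual signals.

import re

# Toy clickbait cues: listicle openers, forward-reference pronouns,
# and curiosity-gap phrases.
CUES = [
    (re.compile(r"^\d+\s"), "listicle number"),
    (re.compile(r"\b(this|these)\b", re.I), "forward-reference pronoun"),
    (re.compile(r"you won't believe|what happened next", re.I), "curiosity-gap phrase"),
]

def clickbait_cues(headline):
    return [label for pattern, label in CUES if pattern.search(headline)]

print(clickbait_cues("17 Photos You Won't Believe Are Real"))
print(clickbait_cues("City council approves annual budget"))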
Conference Paper
Full-text available
In this paper, we propose the first real-time rumor debunking algorithm for Twitter. We use cues from the 'wisdom of the crowds,' that is, the aggregate 'common sense' and investigative journalism of Twitter users. We concentrate on identification of a rumor as an event that may comprise one or more conflicting microblogs. We continue monitoring the rumor event and generate real-time updates dynamically based on any additional information received. We show, using real streaming data, that it is possible with our approach to debunk rumors accurately and efficiently, often much faster than manual verification by professionals.
Conference Paper
Full-text available
This paper identifies and evaluates key factors that influence credibility perception in microblogs. Specifically, we report on a demographic survey (N=81) followed by a user experiment (N=102) in order to answer the following research questions: (1) What are the important cues that contribute to information being perceived as credible? and (2) To what extent is such a quantification portable across different microblogging platforms? To answer the second question, we study two popular microblogs, Reddit and Twitter. Key results include that significant effects of individual factors can be isolated, are portable, and that metadata and image type elements are, in general, the strongest influencing factors in credibility assessments.
Conference Paper
Full-text available
Widespread adoption of internet technologies has changed the way that news is created and consumed. The current online news environment is one that incentivizes speed and spectacle in reporting, at the cost of fact-checking and verification. The line between user-generated content and traditional news has also become increasingly blurred. This poster reviews some of the professional and cultural issues surrounding online news and argues for a two-pronged approach inspired by Hemingway's "automatic crap detector" (Manning, 1965) in order to address these problems: a) proactive public engagement by educators, librarians, and information specialists to promote digital literacy practices; b) the development of automated tools and technologies to assist journalists in vetting, verifying, and fact-checking, and to assist news readers by filtering and flagging dubious information.
Conference Paper
Full-text available
A fake news detection system aims to assist users in detecting and filtering out varieties of potentially deceptive news. The prediction of the chances that a particular news item is intentionally deceptive is based on the analysis of previously seen truthful and deceptive news. The scarcity of deceptive news available as corpora for predictive modeling is a major stumbling block in this field of natural language processing (NLP) and deception detection. This paper discusses three types of fake news, each in contrast to genuine serious reporting, and weighs their pros and cons as a corpus for text analytics and predictive modeling. Filtering, vetting, and verifying online information continues to be essential in library and information science (LIS), as the lines between traditional news and online information are blurring.
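To make the corpus question concrete: once labeled fake and genuine items exist, even a simple bag-of-words baseline can be trained, as in the hedged sketch below. The four headlines are fabricated toy examples, not items from any actual corpus.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["Celebrity spotted riding a dragon over city hall",      # fake
         "City council approves the annual budget after debate",  # real
         "Miracle pill cures all diseases overnight",              # fake
         "Local school wins regional science competition"]         # real
labels = ["fake", "real", "fake", "real"]

# Word counts feed a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["Doctors stunned as miracle cure sweeps the city overnight"]))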
Thesis
Full-text available
This study empirically derives a framework for analyzing certainty about written propositions. CERTAINTY, or EPISTEMIC MODALITY, is a linguistic expression of an estimation of the likelihood that a particular state of affairs is, has been, or will be true. The study describes how explicitly marked certainty can be predictably and dependably identified in texts, and answers the following research questions: 1) How is explicit certainty expressed linguistically in English? 2) What are the patterns and frequencies of occurrence of explicit certainty markers? 3) How much inter-subjective agreement can be reached by readers when identifying and categorizing explicit certainty in texts? A dataset of 2,343 sentences in 80 New York Times Service news reports and editorials was analyzed for occurrences of explicit certainty markers (e.g., definitely, somewhat) based on a four-dimensional categorization model. The dimensions included: level (absolute, high, moderate, low certainty, and uncertainty), perspective (the writer’s point of view, reported direct participant’s account, and reported expert’s view), focus (opinions, emotions, or judgments and facts or events), and time (past, present, future, and irrelevant). The intercoder reliability results are modest but show improvements with longer coder training and stricter codebook instructions. The model’s five-level typology is subdivided into forty-three syntactico-semantic classes. Central modal auxiliary verbs, gradable adjectives in their superlative degree, and adverbial intensifiers frequently express explicit certainty, while adjectival downtoners and adverbial value disjuncts rarely do so. Explicitly-qualified statements occur at a significantly higher rate in editorials than in news reports (means of 0.94 and 0.7 certainty markers per sentence, respectively). Editorials have a high likelihood of starting and ending with an explicit certainty-qualified statement, while news reports tend to start with an implicitly certain statement, and have equal chances of ending with an implicitly or explicitly certainty-qualified statement. The model can serve as an analytical framework for automated certainty identification, a novel type of analysis in Natural Language Processing (NLP). Such a framework allows tracking of certainty levels in texts over time and among text genres. This contribution potentially enhances automated search capabilities in NLP and other information access applications.
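A minimal sketch of the surface-level identification step, tagging explicit certainty markers by hypothesized level; the marker lists below are tiny illustrative samples, not the forty-three syntactico-semantic classes the study defines.

import re

# Toy marker inventories keyed by certainty level.
MARKERS = {
    "absolute": ["undoubtedly", "certainly", "definitely"],
    "high": ["probably", "likely", "clearly"],
    "moderate": ["perhaps", "somewhat", "may"],
    "low": ["possibly", "conceivably", "might"],
}

def tag_certainty(sentence):
    found = []
    for level, words in MARKERS.items():
        for word in words:
            if re.search(r"\b%s\b" % word, sentence, re.IGNORECASE):
                found.append((word, level))
    return found

print(tag_certainty("The talks will probably resume, and sanctions may somewhat ease."))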
Article
Full-text available
Recent improvements in effectiveness and accuracy of the emerging field of automated deception detection and the associated potential of language technologies have triggered increased interest in mass media and the general public. Computational tools capable of alerting users to potentially deceptive content in computer-mediated messages are invaluable for supporting undisrupted computer-mediated communication and information practices, credibility assessment and decision-making. The goal of this ongoing research is to inform creation of such automated capabilities. In this study we elicit a sample of 90 computer-mediated personal stories with varying levels of deception. Each story has 10 associated human deception level judgments, confidence scores, and explanations. In total, 990 unique respondents participated in the study. Three approaches are taken to the data analysis of the sample: human judges, linguistic detection cues, and machine learning. Comparable to previous research results, human judgments achieve 50–63 percent success rates, depending on what is considered deceptive. Actual deception levels negatively correlate with their confident judgment as being deceptive (r = -0.35, df = 88, p = 0.008). The highest-performing machine learning algorithms reach 65 percent accuracy. Linguistic cues are extracted, calculated, and modeled with logistic regression, but are found not to be significant predictors of deception level, confidence score, or an author's ability to fool a reader. We address the associated challenges with error analysis. The respondents' stories and explanations are manually content-analyzed and result in a faceted deception classification (theme, centrality, realism, essence, self-distancing) and a stated perceived cue typology. Deception detection remains novel, challenging, and important in natural language processing, machine learning, and the broader library information science and technology community.
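The cue-modeling step can be pictured as below: a regression over per-story linguistic cue rates. The cue names and values are fabricated toy data, not the study's features, and sklearn's regularized logistic regression stands in for the statistical modeling the study reports.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: first-person pronoun rate, negation rate, word count
X = np.array([[0.08, 0.01, 250], [0.05, 0.03, 120], [0.09, 0.02, 310],
              [0.04, 0.04, 90],  [0.07, 0.02, 200], [0.05, 0.05, 150]])
y = np.array([0, 1, 0, 1, 0, 1])  # 1 = judged deceptive

clf = LogisticRegression(max_iter=1000).fit(X, y)
# Inspect the learned coefficient per cue.
print(dict(zip(["pronoun_rate", "negation_rate", "word_count"], clf.coef_[0])))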
Article
Full-text available
This study explores the connections between social and usage metrics (altmetrics) and bibliometric indicators at the author level. It studies to what extent these indicators, gained from academic sites, can provide a proxy for research impact. Close to 10,000 author profiles belonging to the Spanish National Research Council were extracted from the principal scholarly social sites: ResearchGate, Academia.edu and Mendeley, and academic search engines: Microsoft Academic Search and Google Scholar Citations. Results show little overlap between sites because most of the researchers manage only one profile (72%). Correlations point out that there is scant relationship between altmetric and bibliometric indicators at the author level. This is because the altmetric indicators are site-dependent, while the bibliometric ones are more stable across websites. It is concluded that altmetrics could reflect an alternative dimension of research performance, close, perhaps, to science popularization and networking abilities, but far from citation impact.
Article
Full-text available
This paper argues that big data can possess different characteristics, which affect its quality. Depending on its origin, the data processing technologies, and the methodologies used for data collection and scientific discoveries, big data can have biases, ambiguities, and inaccuracies which need to be identified and accounted for to reduce inference errors and improve the accuracy of generated insights. Big data veracity is now being recognized as a necessary property for its utilization, complementing the three previously established quality dimensions (volume, variety, and velocity), but there has been little discussion of the concept of veracity thus far. This paper provides a roadmap for theoretical and empirical definitions of veracity along with its practical implications. We explore veracity across three main dimensions: 1) objectivity/subjectivity, 2) truthfulness/deception, 3) credibility/implausibility – and propose to operationalize each of these dimensions with either existing or potential computational tools, relevant particularly to textual data analytics. We combine the measures of the veracity dimensions into one composite index – the big data veracity index. This newly developed veracity index provides a useful way of assessing systematic variations in big data quality across datasets with textual information. The paper contributes to big data research by categorizing the range of existing tools that measure the suggested dimensions, and to Library and Information Science (LIS) by proposing to account for the heterogeneity of diverse big data and to identify the information quality dimensions important for each big data type.
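One way to picture such a composite index is as a weighted combination of the three dimension scores; the linear form and the equal weights below are an illustrative assumption, not the paper's calibration:

\[
V = w_o O + w_t T + w_c C, \qquad w_o = w_t = w_c = \tfrac{1}{3},
\]

where \(O, T, C \in [0, 1]\) score a dataset's objectivity, truthfulness, and credibility, and a higher \(V\) indicates higher big data veracity.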
Article
Full-text available
Identifying the veracity, or factuality, of event mentions in text is fundamental for reasoning about eventualities in discourse. Inferences derived from events judged as not having happened, or as being only possible, are different from those derived from events evaluated as factual. Event factuality involves two separate levels of information. On the one hand, it deals with polarity, which distinguishes between positive and negative instantiations of events. On the other, it has to do with degrees of certainty (e.g., possible, probable), an information level generally subsumed under the category of epistemic modality. This article aims at contributing to a better understanding of how event factuality is articulated in natural language. For that purpose, we put forward a linguistically oriented computational model which has at its core an algorithm articulating the effect of factuality relations across levels of syntactic embedding. As a proof of concept, this model has been implemented in De Facto, a factuality profiler for eventualities mentioned in text, and tested against a corpus built specifically for the task, yielding an F1 of 0.70 (macro-averaged) and 0.80 (micro-averaged). Each of these two measures compensates for the other's over-emphasis (on the less or the more populated categories, respectively), so they can be interpreted as the lower and upper bounds of De Facto's performance.
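For reference, the two averaging schemes aggregate at different points (these are the standard definitions over \(k\) categories, not notation specific to De Facto): macro-averaging takes the mean of per-category F1 scores, while micro-averaging pools the counts first:

\[
F_1^{\mathrm{macro}} = \frac{1}{k} \sum_{i=1}^{k} F_1(c_i),
\qquad
F_1^{\mathrm{micro}} = \frac{2 \sum_{i} \mathrm{TP}_i}{2 \sum_{i} \mathrm{TP}_i + \sum_{i} \mathrm{FP}_i + \sum_{i} \mathrm{FN}_i}.
\]

Macro-averaging weights all categories equally, so sparse categories count as much as dense ones, while micro-averaging is dominated by the densest categories; this is what motivates reading the two figures as lower and upper bounds.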
Article
Full-text available
Deception research has consistently shown that accuracy rates tend to be just over fifty percent when averaged across truthful and deceptive messages with an equal number of truths and lies judged. Breaking accuracy rates down by truths and lies, however, leads to a radically different conclusion. Across three studies, a large and consistent veracity effect was evident. Truths are most often correctly identified as honest, but errors predominate when lies are judged. Truth accuracy is substantially greater than chance, but the detection of lies was often significantly below chance. Also, consistent with the veracity effect, altering the truth-lie base rate affected accuracy. Accuracy was a positive linear function of the ratio of truthful messages to total messages. The results show that this veracity effect stems from a truth-bias, and suggest that the single best predictor of detection accuracy may be the veracity of the message being judged. The internal consistency and parallelism of overall accuracy scores are also questioned. These findings challenge widely held conclusions about human accuracy in deception detection.
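The reported linear relationship follows directly from decomposing overall accuracy by message type. With \(p\) the proportion of truthful messages and \(a_T, a_L\) the per-class accuracies (a standard decomposition, not the authors' notation):

\[
\mathrm{ACC}(p) = p\, a_T + (1 - p)\, a_L,
\]

which is increasing and linear in \(p\) whenever truth accuracy exceeds lie accuracy \((a_T > a_L)\), exactly the pattern a truth-bias produces.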
Conference Paper
Full-text available
In this article we explore the behavior of Twitter users under an emergency situation. In particular, we analyze the activity related to the 2010 earthquake in Chile and characterize Twitter in the hours and days following this disaster. Furthermore, we perform a preliminary study of certain social phenomena, such as the dissemination of false rumors and confirmed news. We analyze how this information propagated through the Twitter network, with the purpose of assessing the reliability of Twitter as an information source under extreme circumstances. Our analysis shows that the propagation of tweets that correspond to rumors differs from tweets that spread news, because rumors tend to be questioned more than news by the Twitter community. This result shows that it is possible to detect rumors by using aggregate analysis on tweets.
Article
Full-text available
This study examined differences in language usage as a function of message veracity and speech act type. A quasi-experiment crossed truthful and deceptive messages with confessions and denials in an induced cheating situation. Transcribed messages were analyzed with Linguistic Inquiry and Word Count (LIWC) software. Relative to honest participants, liars exhibited fewer negative emotions, less discrepancy, fewer modal verbs, more modifiers, and they spoke longer. Denials, relative to confessions, were characterized by shorter sentences, more negations, greater discrepancy, fewer past tense verbs, and more present tense verbs. The results are inconsistent with previous findings, suggesting a lack of cross-situational diagnostic utility.
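A minimal sketch of LIWC-style category counting, the kind of analysis such studies apply to transcripts; the word lists here are tiny illustrative stand-ins for LIWC's proprietary dictionaries.

# Toy category dictionaries; LIWC's real categories contain hundreds
# of words and word stems.
CATEGORIES = {
    "negemo": {"hate", "worthless", "angry", "sad"},
    "negation": {"no", "not", "never", "none"},
    "modal": {"could", "should", "would", "might"},
}

def category_rates(text):
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    total = max(len(tokens), 1)
    # Proportion of tokens falling in each category.
    return {cat: sum(t in words for t in tokens) / total
            for cat, words in CATEGORIES.items()}

print(category_rates("I would never say I was angry, not once."))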
Article
Full-text available
Lying is generally viewed negatively in Western society. Notwithstanding, it is a ubiquitous expedient for achieving social goals such as fostering harmony, sparing the feelings of friends, concealing wrongdoing, or exploiting others. Despite the wide use of deception, little research has explored what creativity may underlie it. Are novel liars the most effective at achieving their goals? Are those higher in divergent thinking or in ideation more effective in deception? As a preliminary attempt to chart the relationship between creativity and deception, 18 social dilemmas were written for which deception offered a desirable resolution; 89 college students responded to them, 21 males and 68 females. Their resolutions were coded for novelty, effectiveness in achieving the goals called for in the dilemmas in the short term, and their likely long-term damage to the liar–target relationship. A measure of divergent thinking was also administered, as was a measure of ideational tendencies (Ideational Behavioral Scale; Runco, Plucker, & Lim, 2001). Among the major findings, creative liars tended to be higher in divergent thinking and more ideational. Theoretical and practical implications are considered.
Article
Full-text available
This study investigated changes in both the liar's and the conversational partner's linguistic style across truthful and deceptive dyadic communication in a synchronous text-based setting. An analysis of 242 transcripts revealed that liars produced more words, more sense-based words (e.g., seeing, touching), and used fewer self-oriented but more other-oriented pronouns when lying than when telling the truth. In addition, motivated liars avoided causal terms when lying, whereas unmotivated liars tended to increase their use of negations. Conversational partners also changed their behavior during deceptive conversations, despite being blind to the deception manipulation. Partners asked more questions with shorter sentences when they were being deceived, and matched the liar's linguistic style along several dimensions. The linguistic patterns in both the liar and the partner's language use were not related to deception detection, suggesting that partners were unable to use this linguistic information to improve their deception detection accuracy.
Article
Full-text available
Interpersonal deception theory (IDT) represents a merger of interpersonal communication and deception principles designed to better account for deception in interactive contexts. At the same time, it has the potential to enlighten theories related to (a) credibility and truthful communication and (b) interpersonal communication. Presented here are key definitions, assumptions related to the critical attributes and key features of interpersonal communication and deception, and 18 general propositions from which specific testable hypotheses can be derived. Research findings relevant to the propositions are also summarized.
Article
Full-text available
The detection of deception is a promising but challenging task. A systematic discussion of automated Linguistics Based Cues (LBC) to deception has rarely been attempted before. The experiment studied the effectiveness of automated LBC in the context of text-based asynchronous computer mediated communication (TA-CMC). Twenty-seven cues, either extracted from prior research or created for this study, were clustered into nine linguistic constructs: quantity, diversity, complexity, specificity, expressivity, informality, affect, uncertainty, and nonimmediacy. A test of the selected LBC in a simulated TA-CMC experiment showed that: (1) a systematic analysis of linguistic information could be useful in the detection of deception; (2) some existing LBC were effective as expected, while others worked in the direction opposite to that predicted by prior research; and (3) some newly discovered linguistic constructs and their component LBC were helpful in differentiating deception from truth.
Article
Full-text available
We examined the hypothesis that reliable verbal indicators of deception exist in the interrogation context. Participants were recruited for a study addressing security effectiveness and either committed a theft to test the effectiveness of a new security guard or carried out a similar but innocuous task. They then provided either (1) a truthful alibi, (2) a partially deceptive account, (3) a completely false alibi, or (4) a truthful confession regarding the theft to an interrogator hired for the purpose of investigating thefts with a monetary incentive for convincing the interrogator of their truthfulness. Results indicated that only 3 out of the 18 (16.7%) clues tested significantly differentiated the truthful and deceptive accounts. All 3 clues were derived from the Statement Validity Analysis (SVA) technique (amount of detail reported, coherence, and admissions of lack of memory). Implications for credibility assessment in forensic interrogations are discussed.
Chapter
Full-text available
This chapter presents a theoretical framework and preliminary results for manual categorization of explicit certainty information in 32 English newspaper articles. Our contribution is in a proposed categorization model and analytical framework for certainty identification. Certainty is presented as a type of subjective information available in texts. Statements with explicit certainty markers were identified and categorized according to four hypothesized dimensions — level, perspective, focus, and time of certainty. The preliminary results reveal an overall promising picture of the presence of certainty information in texts, and establish its susceptibility to manual identification within the proposed four-dimensional certainty categorization analytical framework. Our findings are that the editorial sample group had a significantly higher frequency of markers per sentence than did the sample group of the news stories. For editorials, high level of certainty, writer’s point of view, and future and present time were the most populated categories. For news stories, the most common categories were high and moderate levels, directly involved third party’s point of view, and past time. These patterns have positive practical implications for automation.
Conference Paper
Full-text available
Credibility is a perceived quality and is evaluated with at least two major components: trustworthiness and expertise. Weblogs (or blogs) are a potentially fruitful genre for exploration of credibility assessment due to public disclosure of information that might reveal trustworthiness and expertise by webloggers (or bloggers) and availability of audience evaluations. The objectives of the planned exploratory study are to compile a list of factors that users take into account in credibility assessment of blog sites, order them in terms of users' perceived importance, and determine which factors can be recognized and evaluated with Natural Language Processing (NLP) techniques. With partial automation in mind, we propose an analytical framework for blog credibility assessment based on four profile factors: 1) the blogger's expertise and the amount of offline identity disclosure, 2) the blogger's trustworthiness (or the overtly stated value system including beliefs, goals, and values), 3) information quality, and 4) appeals of a personal nature. We describe a multi-stage study that combines a qualitative study of credibility judgments of blog-readers with NLP-based analysis of blogs. The study will elicit and test credibility assessment factors (Phase I), perform NLP-based blog profiling (Phase II), and content- analyze blog-readers' comments for partial profile matching (Phase III).
Conference Paper
Full-text available
Our goal is to use natural language processing to identify deceptive and non-deceptive passages in transcribed narratives. We begin by motivating an analysis of language-based deception that relies on specific linguistic indicators to discover deceptive statements. The indicator tags are assigned to a document using a mix of automated and manual methods. Once the tags are assigned, an interpreter automatically discriminates between deceptive and truthful statements based on tag densities. The texts used in our study come entirely from "real world" sources—criminal statements, police interrogations and legal testimony. The corpus was hand-tagged for the truth value of all propositions that could be externally verified as true or false. Classification and Regression Tree techniques suggest that the approach is feasible, with the model able to identify 74.9% of the T/F propositions correctly. Implementation of an automatic tagger with a large subset of tags performed well on test data, producing an average score of 68.6% recall and 85.3% precision.
Conference Paper
Full-text available
We analyze the information credibility of news propagated through Twitter, a popular microblogging service. Previous research has shown that most of the messages posted on Twitter are truthful, but the service is also used to spread misinformation and false rumors, often unintentionally. In this paper we focus on automatic methods for assessing the credibility of a given set of tweets. Specifically, we analyze microblog postings related to "trending" topics, and classify them as credible or not credible, based on features extracted from them. We use features from users' posting and re-posting ("re-tweeting") behavior, from the text of the posts, and from citations to external sources. We evaluate our methods using a significant number of human assessments of the credibility of items in a recent sample of Twitter postings. Our results show that there are measurable differences in the way messages propagate that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.
Book
An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. Opinion Mining and Sentiment Analysis covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. The focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. The survey includes an enumeration of the various applications, a look at general challenges and discusses categorization, extraction and summarization. Finally, it moves beyond just the technical issues, devoting significant attention to the broader implications that the development of opinion-oriented information-access services have: questions of privacy, vulnerability to manipulation, and whether or not reviews can have measurable economic impact. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided. Opinion Mining and Sentiment Analysis is the first such comprehensive survey of this vibrant and important research area and will be of interest to anyone with an interest in opinion-oriented information-seeking systems.
Article
With the rapid growth of social media, sentiment analysis, also called opinion mining, has become one of the most active research areas in natural language processing. Its application is also widespread, from business services to political campaigns. This article gives an introduction to this important area and presents some recent developments.
Article
This paper studies the problem of automatic detection of false rumors on Sina Weibo, the popular Chinese microblogging social network. Traditional feature-based approaches extract features from the false rumor message, its author, as well as the statistics of its responses to form a flat feature vector. This ignores the propagation structure of the messages and has not achieved very good results. We propose a graph-kernel based hybrid SVM classifier which captures the high-order propagation patterns in addition to semantic features such as topics and sentiments. The new model achieves a classification accuracy of 91.3% on randomly selected Weibo dataset, significantly higher than state-of-the-art approaches. Moreover, our approach can be applied at the early stage of rumor propagation and is 88% confident in detecting an average false rumor just 24 hours after the initial broadcast.
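The general mechanism such a hybrid plugs into is an SVM over a precomputed kernel matrix. In the hedged sketch below, a trivial dot-product kernel over toy propagation features stands in for the paper's propagation-tree graph kernel; the features, labels, and scaling are illustrative assumptions.

import numpy as np
from sklearn.svm import SVC

# Toy per-message features: cascade depth, breadth, repost count.
X = np.array([[3, 40, 500], [9, 5, 80], [2, 55, 700],
              [8, 4, 60],  [4, 35, 420], [10, 6, 90]], dtype=float)
y = np.array([0, 1, 0, 1, 0, 1])  # 1 = false rumor

X = X / X.max(axis=0)        # scale each feature to [0, 1]
K = X @ X.T                  # stand-in kernel matrix (train vs. train)
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K))        # here the "test" items are the training items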
Article
The spread of malicious or accidental misinformation in social media, especially in time-sensitive situations such as real-world emergencies, can have harmful effects on individuals and society. This thesis develops models for automated detection and verification of rumors (unverified information) that propagate through Twitter. Detection of rumors about an event is achieved through classifying and clustering assertions made about that event. Assertions are classified through a speech-act classifier for Twitter developed for this thesis. The classifier utilizes a combination of semantic and syntactic features to identify assertions with 91% accuracy. To predict the veracity of rumors, we identify salient features of rumors by examining three aspects of information spread: the linguistic style used to express rumors, the characteristics of the people involved in propagating information, and network propagation dynamics. The veracity of a rumor (a collection of tweets) is predicted from a time series of these features using Hidden Markov Models. The verification algorithm was tested on 209 rumors representing 938,806 tweets collected from real-world events, including the 2013 Boston Marathon bombings, the 2014 Ferguson unrest and the 2014 Ebola epidemic, and many other rumors reported on popular websites that document public rumors. The algorithm is able to predict the veracity of rumors with an accuracy of 75%. The ability to track rumors and predict their outcomes may have practical applications for news consumers, financial markets, journalists, and emergency services, and more generally may help minimize the impact of false information on Twitter.
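One way to realize the Hidden Markov Model step is to train one HMM per class on feature time series and compare likelihoods, as sketched below with the hmmlearn package; the two features and all data are fabricated assumptions, not the thesis's feature set.

import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
# Each rumor: 24 hourly observations of (negation rate, question rate).
true_rumors = [rng.normal(0.10, 0.02, (24, 2)) for _ in range(5)]
false_rumors = [rng.normal(0.30, 0.05, (24, 2)) for _ in range(5)]

def fit_hmm(series):
    X = np.vstack(series)               # stack all sequences
    lengths = [len(s) for s in series]  # sequence boundaries
    return hmm.GaussianHMM(n_components=2, random_state=0).fit(X, lengths)

m_true, m_false = fit_hmm(true_rumors), fit_hmm(false_rumors)
new_rumor = rng.normal(0.28, 0.05, (24, 2))
# Predict the label whose model assigns the higher log-likelihood.
print("false" if m_false.score(new_rumor) > m_true.score(new_rumor) else "true")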
Article
This entry defines the concepts of information credibility and cognitive authority, introduces the key terms and dimensions of each, and discusses major theoretical frameworks tested and proposed in library and information science (LIS) and related fields. It also lays out the fundamental notions of credibility and cognitive authority in historical contexts to trace the evolution of the understanding and enhancement of the two concepts. This entry contends that the assessment of information credibility and cognitive authority is a ubiquitous human activity given that people constantly make decisions and selections based on values of information in a variety of information seeking and use contexts. It further contends that information credibility and cognitive authority assessment can be seen as an ongoing and iterative process rather than a discrete information evaluation event. The judgments made in assessment processes are highly subjective given their dependence on individuals' accumulated beliefs, existing knowledge, and prior experiences. The conclusion of this entry suggests the need for more research by emphasizing the contributions that credibility and cognitive authority research can make to the field of LIS.
Article
Traditionally, most research in NLP has focused on propositional aspects of meaning. To truly understand language, however, extra-propositional aspects are equally important. Modality and negation typically contribute significantly to these extra-propositional meaning aspects. Although modality and negation have often been neglected by mainstream computational linguistics, interest has grown in recent years, as evidenced by several annotation projects dedicated to these phenomena. Researchers have started to work on modeling factuality, belief and certainty, detecting speculative sentences and hedging, identifying contradictions, and determining the scope of expressions of modality and negation. In this article, we will provide an overview of how modality and negation have been modeled in computational linguistics.
Deception detection remains novel, challenging, and important in natural language processing, machine learning, and the broader LIS community. Computational tools capable of alerting users to potentially deceptive content in computer-mediated messages are invaluable for supporting undisrupted, computer-mediated communication, information seeking, credibility assessment and decision making. The goal of this ongoing research is to inform creation of such automated capabilities. In this study we elicit a sample of 90 computer-mediated personal stories with varying levels of deception. Each story has 10 associated human judgments, confidence scores, and explanations. In total, 990 unique respondents participated in the study. Three analytical approaches are applied: human judgment accuracy, linguistic cue detection, and machine learning. Comparable to previous research results, human judges achieve 50–63% success rates. Actual deception levels negatively correlate with their confident judgments as being deceptive (r= −0.35, df=88, p=0.008). The best-performing machine learning algorithms reach 65% accuracy. Linguistic cues are extracted, calculated, and modeled with logistic regression, but are found not to be significant predictors of deception level or confidence score. We address the associated challenges with error analysis of the respondents' stories, and propose a faceted deception classification (theme, centrality, realism, essence, distancing) as well as a typology of stated perceived cues for deception detection (world knowledge, logical contradiction, linguistic evidence, and intuitive sense).
This paper extends information quality (IQ) assessment methodology by arguing that veracity/deception should be one of the components of intrinsic IQ dimensions. Since veracity/deception differs contextually from accuracy and other well-studied components of intrinsic IQ, the inclusion of veracity/deception in the set of IQ dimensions makes its own contribution to the assessment and improvement of IQ. Recently developed software to detect deception in textual information represents ready-to-use IQ assessment (IQA) instruments. The focus of the paper is on the specific IQ problem related to deceptive messages and the information activities they affect, as well as on the IQA instruments (or tools) for detecting deception to improve IQ. In particular, the methodology of automated deception detection in written communication provides the basis for measuring the veracity/deception dimension and demonstrates no overlap with other intrinsic IQ dimensions. Considering several known deception types (such as falsification, concealment and equivocation), we emphasize that the IQA deception tools are primarily suitable for falsification. Certain types of deception strategies cannot be spotted automatically with the existing IQA instruments based on underlying linguistic differences between truth-tellers and liars. We propose potential avenues for the future development of automated instruments to detect deception, taking into account theoretical, methodological and practical aspects and needs. Blending multidisciplinary research on deception detection with research on IQ in Library and Information Science (LIS) and Management Information Systems (MIS), the paper contributes to IQA and its improvement by adding one more dimension, veracity/deception, to intrinsic IQ.
Article
Deception researchers have attempted to improve people’s ability to detect deceit by teaching them which cues to pay attention to. Such training only yields limited success because, we argue, the nonverbal and verbal cues that liars spontaneously display are faint and unreliable. In recent years, the emphasis has radically changed and the current focus is on developing interview techniques that elicit and enhance cues to deception. We give an overview of this innovative research. We also consider to what extent current deception research can be used to fight terrorism. We argue that researchers should pay particular attention to settings that are neglected so far but relevant for terrorism, such as (i) lying about intentions, (ii) examining people when they are secretly observed and (iii) interviewing suspects together. We will commence this paper with general information that puts our reasoning into context. That is, we turn briefly to physiological and neurological lie detection methods that are often discussed in the media, then to the theoretical underpinnings of nonverbal and verbal cues to deceit, and the research methods typically used in nonverbal and verbal lie detection research.
Article
This study examined whether information about a writer and hyperlinks on a citizen journalism Web site affected the perceived credibility of stories. Participants read stories from a popular citizen journalism Web site and rated the stories in terms of perceived credibility. Results show that hyperlinks and information about the writer do enhance perceived story credibility. Credibility is enhanced most greatly when both hyperlink and writer information are included and, to a lesser extent, when just hyperlink or writer information is present.
Article
The problem of gauging information credibility on social networks has received considerable attention in recent years. Most previous work has chosen Twitter, the world's largest micro-blogging platform, as the premise of research. In this work, we shift the premise and study the problem of information credibility on Sina Weibo, China's leading micro-blogging service provider. With eight times more users than Twitter, Sina Weibo is more of a Facebook-Twitter hybrid than a pure Twitter clone, and exhibits several important characteristics that distinguish it from Twitter. We collect an extensive set of microblogs which have been confirmed to be false rumors based on information from the official rumor-busting service provided by Sina Weibo. Unlike previous studies on Twitter where the labeling of rumors is done manually by the participants of the experiments, the official nature of this service ensures the high quality of the dataset. We then examine an extensive set of features that can be extracted from the microblogs, and train a classifier to automatically detect the rumors from a mixed set of true information and false information. The experiments show that some of the new features we propose are indeed effective in the classification, and even the features considered in previous studies have different implications with Sina Weibo than with Twitter. To the best of our knowledge, this is the first study on rumor analysis and detection on Sina Weibo.
Article
Discusses the role of expert psychological witnesses in assessing the truthfulness of testimony by children or juveniles in Germany regarding sexual abuse. An overview of the general diagnostic procedure in these cases is presented, focusing on credibility assessment; and statement analysis, a major diagnostic tool in determining credibility, is described. The empirical basis and scientific status of statement analysis are discussed.
Article
We estimate classification models of deceptive discussions during quarterly earnings conference calls. Using data on subsequent financial restatements (and a set of criteria to identify especially serious accounting problems), we label each call as “truthful” or “deceptive”. Our models are developed with the word categories that have been shown by previous psychological and linguistic research to be related to deception. Using conservative statistical tests, we find that the out-of-sample performance of the models that are based on CEO or CFO narratives is significantly better than random by 6%-16% and statistically dominates or is equivalent to models based on financial and accounting variables. We find that the answers of deceptive executives have more references to general knowledge, fewer non-extreme positive emotions, and fewer references to shareholder value. In addition, deceptive CEOs use significantly more extreme positive emotion and fewer anxiety words.
Article
This article investigates whether deceptions in online dating profiles correlate with changes in the way daters write about themselves in the free-text portion of the profile, and whether these changes are detectable by both computerized linguistic analyses and human judges. Computerized analyses (Study 1) found that deceptions manifested themselves through linguistic cues pertaining to (a) liars' emotions and cognitions and (b) liars' strategic efforts to manage their self-presentations. Technological affordances (i.e., asynchronicity and editability) affected the production of cognitive cues more than that of emotional cues. Human judges (Study 2) relied on different and nonpredictive linguistic cues to assess daters' trustworthiness. The findings inform theories concerned with deception, media, and self-presentation, and also expound on how writing style influences perceived trustworthiness.
Article
In the first part of this article, I briefly review research findings that show that professional lie catchers, such as police officers, are generally rather poor at distinguishing between truths and lies. I believe that there are many reasons contributing towards this poor ability, and give an overview of these reasons in the second part of this article. I also believe that professionals could become better lie detectors and explain how in the final part of this article.
Article
This article introduces a type of uncertainty that resides in textual information and requires epistemic interpretation on the information seeker’s part. Epistemic modality, as defined in linguistics and natural language processing, is a writer’s estimation of the validity of propositional content in texts. It is an evaluation of chances that a certain hypothetical state of affairs is true, e.g., definitely true or possibly true. This research shifts attention from the uncertainty–certainty dichotomy to a gradient epistemic continuum of absolute, high, moderate, low certainty, and uncertainty. An analysis of a New York Times dataset showed that epistemically modalized statements are pervasive in news discourse and they occur at a significantly higher rate in editorials than in news reports. Four independent annotators were able to recognize a gradation on the continuum but individual perceptions of the boundaries between levels were highly subjective. Stricter annotation instructions and longer coder training improved intercoder agreement results. This paper offers an interdisciplinary bridge between research in linguistics, natural language processing, and information seeking with potential benefits to design and implementation of information systems for situations where large amounts of textual information are screened manually on a regular basis, for instance, by professional intelligence or business analysts.
Article
Deception detection is an essential skill in careers such as law enforcement and must be accomplished accurately. However, humans are not very competent at determining veracity without aid. This study examined automated text-based deception detection which attempts to overcome the shortcomings of previous credibility assessment methods. A real-world, high-stakes sample of statements was collected and analyzed. Several different sets of linguistic-based cues were used as inputs for classification models. Overall accuracy rates of up to 74% were achieved, suggesting that automated deception detection systems can be an invaluable tool for those who must assess the credibility of text.
Article
Trust is an integral part of the Semantic Web architecture. Most prior work on trust focuses on entity-centered issues such as authentication and reputation and does not take into account the content, i.e., the nature and use of the information being exchanged. This paper defines content trust and discusses it in the context of other trust measures that have been previously studied. We introduce several factors that users consider in deciding whether to trust the content provided by a Web resource. Our goal is to discern which of these factors could be captured in practice with minimal user interaction in order to maximize the quality of the system's trust estimates. We present results of a study to determine which factors were more important to capture, and describe a simulation environment that we have designed to study alternative models of content trust.
Conference Paper
Given the importance of credibility in computing products, the body of research on computer credibility is relatively small. To enhance knowledge about computers and credibility, we define key terms relating to computer credibility, synthesize the literature in this domain, and propose three new conceptual frameworks for better understanding the elements of computer credibility. To promote further research, we then offer two perspectives on what computer users evaluate when assessing credibility. We conclude by presenting a set of credibility-related terms that can serve in future research and evaluation endeavors.
Conference Paper
A rumor is commonly defined as a statement whose truth value is unverifiable. Rumors may spread misinformation (false information) or disinformation (deliberately false information) on a network of people. Identifying rumors is crucial in online social media, where large amounts of information are easily spread across a large network by sources with unverified authority. In this paper, we address the problem of rumor detection in microblogs and explore the effectiveness of three categories of features: content-based, network-based, and microblog-specific memes for correctly identifying rumors. Moreover, we show how these features are also effective in identifying disinformers, users who endorse a rumor and further help it to spread. We perform our experiments on more than 10,000 manually annotated tweets collected from Twitter and show how our retrieval model achieves more than 0.95 in Mean Average Precision (MAP). Finally, we believe that our dataset is the first large-scale dataset on rumor detection. It can open new dimensions in analyzing online misinformation and other aspects of microblog conversations.