Working Paper

Detecting Fictitious Consumer Reviews: A Theory-Driven Approach Combining Automated Text Analysis and Experimental Design.

... In addition, even if people are confident in detecting fake reviews, research shows that humans are only about 60%-80% accurate in labeling user reviews as fake or genuine (Plotkina, Munzel, & Pallud, 2020; Shukla et al., 2019). Compared to machine learning approaches, which can be up to 90% accurate, human judges are far less accurate in detecting fake reviews (Harris, 2012; Hovy, 2016; Kim, Kang, Shin, & Myaeng, 2021; Kronrod, Lee, & Gordeliy, 2017; Masip, Bethencourt, Lucas, Segundo, & Herrero, 2012; Ott, Choi, Cardie, & Hancock, 2011). ...
... The 15 papers we found during our literature search can be categorized into three groups. First, we have the papers that directly address our research question by identifying or testing cues consumers use when determining the veracity of an online review (Ansari et al., 2018; Ansari & Gupta, 2019, 2021b; DeAndrea et al., 2018; Kronrod et al., 2017; Peng et al., 2016; Román et al., 2019, pp. 141-166). ...
... We found seven papers that concentrated on the identification of fake review detection cues used by consumers (Ansari et al., 2018; Ansari & Gupta, 2019, 2021b; DeAndrea et al., 2018; Kronrod et al., 2017; Peng et al., 2016; Román et al., 2019, pp. 141-166). ...
Article
Background: Consumers rely heavily on online user reviews when shopping online, and cybercriminals produce fake reviews to manipulate consumer opinion. Much prior research focuses on the automated detection of these fake reviews, which is far from perfect. Therefore, consumers must be able to detect fake reviews on their own. In this study, we survey the research examining how consumers detect fake reviews online. Methods: We conducted a systematic literature review of the research on fake review detection from the consumer perspective. We included academic literature presenting new empirical data. We provide a narrative synthesis comparing the theories, methods, and outcomes used across studies to identify how consumers detect fake reviews online. Results: We found only 15 articles that met our inclusion criteria. We classify the most frequently used cues into five categories: (1) review characteristics, (2) textual characteristics, (3) reviewer characteristics, (4) seller characteristics, and (5) characteristics of the platform where the review is displayed. Discussion: We find that theory is applied inconsistently across studies and that cues to deception are often identified in isolation without any unifying theoretical framework. Consequently, we discuss how such a theoretical framework could be developed.
... Hence, we proposed attribute salience and review valence were the theoretical mechanisms underlying the effects of room, food and call complaints on trustworthiness. The literature also suggests that content concreteness is another factor that may play a role in the evaluation of review trustworthiness (Kronrod, Lee, & Gordeliy, 2017; Sparks, Perkins, & Buckley, 2013); therefore, we also included this factor in Study 2 to provide a more complete account of the textual effects. We then conducted a 3 (attribute salience: low, medium, high) by 2 (review valence: positive vs. negative) by 2 (product category: hotel vs. restaurant) by 3 (message variation) mixed factorial experiment with content concreteness as a measured variable. ...
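To make the size of the 3 by 2 by 2 by 3 design concrete, the sketch below enumerates its cells. It is only an illustration: the message-variation labels (`v1`, `v2`, `v3`) are hypothetical placeholders, and because the excerpt does not say which factors were varied between or within participants, the code treats all four as fully crossed.

```python
from itertools import product

# Factor levels as named in the excerpt; the message-variation labels are
# hypothetical placeholders, since the excerpt does not name them.
FACTORS = {
    "attribute_salience": ["low", "medium", "high"],
    "review_valence": ["positive", "negative"],
    "product_category": ["hotel", "restaurant"],
    "message_variation": ["v1", "v2", "v3"],
}


def design_cells(factors=FACTORS):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


cells = design_cells()
print(len(cells))  # 3 * 2 * 2 * 3 combinations
```

Each dict in `cells` describes one stimulus condition; a real mixed design would assign only a subset of these to each participant.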
... They suggested that fraudulent reviews tend to be more extreme than authenticated reviews, regardless of review valence. Using an experimental design, Kronrod et al. (2017) instructed participants to write reviews for a hotel stay that they had or had not actually experienced. They found that fictitious reviews generally used fewer verbs in the past tense, fewer unique words, and more abstract language due to the lack of actual experiences and concrete memories. ...
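The lexical cues mentioned above (share of past-tense verbs, share of unique words) can be approximated with a few lines of code. The following is a minimal sketch, not the procedure used by Kronrod et al.: the function name `review_features`, the small irregular-verb list, and the "-ed" suffix heuristic are all assumptions made for illustration, and a part-of-speech tagger would be needed for reliable tense detection.

```python
import re

# Hypothetical, deliberately tiny list of irregular past-tense forms.
IRREGULAR_PAST = {"was", "were", "had", "went", "ate", "slept", "took", "came", "saw", "got"}


def review_features(text: str) -> dict:
    """Compute crude lexical features for one review.

    Past tense is approximated by an '-ed' suffix or membership in
    IRREGULAR_PAST; this over-counts words such as 'need' and misses
    most irregular verbs, so treat the result as a rough proxy only.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"n_tokens": 0, "unique_ratio": 0.0, "past_share": 0.0}
    past = sum(
        1 for t in tokens
        if t in IRREGULAR_PAST or (len(t) > 3 and t.endswith("ed"))
    )
    return {
        "n_tokens": len(tokens),
        "unique_ratio": len(set(tokens)) / len(tokens),
        "past_share": past / len(tokens),
    }


print(review_features("We stayed two nights and the room was clean"))
print(review_features("This hotel is great great great"))
```

On these two toy sentences, the first yields a higher unique-word ratio and past-tense share than the second, which is the direction of difference the excerpt reports for authentic versus fictitious reviews.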
... Hence, we proposed that attribute salience and review valence were the underlying mechanisms for the effects of room, food, and call complaint on trustworthiness, subject to further testing. In addition, content concreteness has been identified as a factor that plays a role in the evaluation of review trustworthiness (Kronrod et al., 2017; Sparks et al., 2013). Hotel reviews can be concrete and specific through detailed descriptions of a hotel's attribute(s) and the consumer's own experience; they can also be abstract and general in stating holistic evaluations of the hotel. ...
Article
Online consumer reviews are word-of-mouth exchanges on the Internet that can be harnessed for decision support. Combining computational and experimental methods, the current two-part research uncovered the effects of textual features on trustworthiness of consumer reviews on TripAdvisor. Taking a bottom-up approach, Study 1 employed text mining and human rating methods to explore the salient review topics that impact review trustworthiness. Study 2 took a top-down approach by examining the textual features that drive the effects of review topics identified in Study 1 and testing them across two product categories—hotel and restaurant—in an online experiment. The findings indicate that review trustworthiness has a moderating effect on review adoption in that highly trustworthy reviews are more likely to be adopted by consumers to aid in their judgement formation. This research also explicated the role of three textual features—namely, attribute salience, review valence, and content concreteness—in review trustworthiness.
... Third, Luca and Zervas (2016) pointed to the large dispersion in consumer ratings that is typical of fake reviews, that is, legitimate reviews show some consumer consensus and tend to be unimodal, whereas the distribution of fake reviews is bimodal, with greater frequencies at both extremes (e.g., 1- and 5-stars). Fourth, Kronrod, Lee, and Gordeliy (2017) found that authentic reviews used significantly more past tense verbs, unique words, and concrete, specific nouns than fictitious reviews. Lastly, truthful reviews encoded more "spatial" details, characterized by terms such as "bathroom" and "location" for hotels, compared to deceptive reviews that tend to discuss general concepts such as why or with whom the customer went to the hotel (Li, Cardie, & Li, 2013; Ott, Cardie, & Hancock, 2013; Ott, Choi, Cardie, & Hancock, 2011). ...
... Table 6 reflect the notion that fake reviews tend to be more present- and future-oriented than genuine reviews. In endeavoring to explain these findings, we can conjecture that when reviewers are supposed to evaluate purchases, such as hotel visits that have already happened in the past, they would presumably be more likely to write in the past tense (Kronrod, Lee, & Gordeliy, 2017). However, because the composition of a fake review requires making something up, writers might often forget to write in the past tense as if the visit had occurred, and therefore inadvertently tend to write in the present and future tenses. ...
... Whereas Kronrod, Lee, and Gordeliy (2017) showed that authentic reviews used more past tense verbs, our research indicates that present and future language usage may well be signals of fake reviews. ...
Article
As the influence of online consumer reviews grows, deceptive reviews are a worsening problem, betraying consumers' trust in reviews by pretending to be authentic and informative. This research identifies factors that can separate deceptive reviews from genuine ones. First, we create a novel means of detection by contrasting authentic versus fake word patterns specific to a given domain (e.g., hotel services). We use a survey on a crowdsourcing platform to obtain both genuine and deceptive reviews of hotels. We learned the word patterns from each category to discriminate genuine reviews from fake ones for positively and negatively evaluated reviews, respectively. We show that our All Terms procedure outperforms current benchmark methods in computational linguistics and marketing. Our extended analysis reveals the factors that determine fake reviews (e.g., a lack of details, present- and future-time orientation, and emotional exaggeration) and the factors influencing people's willingness to write fake reviews (including social media trust, product quality consciousness, deal proneness, hedonic and utilitarian consumption, prosocial behavior, and individualism). We also use our procedure to analyze more than 250,000 real-world hotel reviews to detect fake reviews and identify the hotel and review characteristics influencing review fakery in the industry (e.g., star rating, franchise hotel, hotel size, room price, review timing, and review rating).
... To help consumers identify fake information, researchers have identified possible quantitative factors, including the frequency with which first-person pronouns, emotional words, and conjunctions are used (Anderson and Simester 2014; Berzack 2011; Newman et al. 2003), and have suggested protocols for consumers to follow when navigating the digital world. Despite this, consumers' success rate for detecting fictitious online information remains low, at around 49%-52%, not much better than guessing by chance (Kronrod, Lee, and Gordeliy 2017). ...
Article
Consumers often observe how other consumers interact with brands to inform their own brand judgments. This research demonstrates that brand relationship quality-indicating cues, such as brand nicknames (e.g., Mickey D’s for McDonald’s and Wally World for Walmart), enhance perceived information authenticity in online communication. An analysis of historical Twitter data followed by six experiments (using both real and fictitious brands across different online platforms, e.g., online reviews and social media posts) show that brand nickname use in user-generated content signals a writer’s relationship quality with the target brand from the reader’s perspective, which the authors term inferred brand attachment (IBA). The authors demonstrate that IBA boosts perceived information authenticity and leads to positive downstream consequences, such as purchase willingness and information sharing. The authors also find that this effect is attenuated when brand nicknames are used in firm-generated content. How consumers’ relationships with brands are portrayed and perceived in a social context (e.g., via brand nickname use) serves as a novel context to examine user-generated content and provides valuable managerial insight regarding how to leverage consumers’ brand attachment cues in brand strategy and online information management.
Article
Two experiments investigated how well bilinguals utilise long-standing semantic associations to encode and retrieve semantic clusters in verbal episodic memory. In Experiment 1, Spanish-English bilinguals (N = 128) studied and recalled word and picture sets. Word recall was equivalent in L1 and L2, picture recall was better in L1 than in L2, and the picture superiority effect was stronger in L1 than in L2. Semantic clustering in word and picture recall was equivalent in L1 and L2. In Experiment 2, Spanish-English bilinguals (N = 128) and English-speaking monolinguals (N = 128) studied and recalled word sequences that contained semantically related pairs. Data were analyzed using a multinomial processing tree approach, the pair-clustering model. Cluster formation was more likely for semantically organised than for randomly ordered word sequences. Probabilities of cluster formation, cluster retrieval, and retrieval of unclustered items did not differ across languages or language groups. Language proficiency has little if any impact on the utilisation of long-standing semantic associations, which are language-general.
Article
Autobiographical remembering can depend on two forms of memory: episodic (event) memory and autobiographical semantic memory (remembering personally relevant semantic knowledge, independent of recalling a specific experience). There is debate about the degree to which the neural signals that support episodic recollection relate to or build upon autobiographical semantic remembering. Pooling data from two fMRI studies of memory for real-world personal events, we investigated whether medial temporal lobe (MTL) and parietal subregions contribute to autobiographical episodic and semantic remembering. During scanning, participants made memory judgments about photograph sequences depicting past events from their life or from others' lives, and indicated whether memory was based on episodic or semantic knowledge. Results revealed several distinct functional patterns: activity in most MTL subregions was selectively associated with autobiographical episodic memory; the hippocampal tail, superior parietal lobule, and intraparietal sulcus were similarly engaged when memory was based on retrieval of an autobiographical episode or autobiographical semantic knowledge; and angular gyrus demonstrated a graded pattern, with activity declining from autobiographical recollection to autobiographical semantic remembering to correct rejections of novel events. Collectively, our data offer insights into MTL and parietal cortex functional organization, and elucidate circuitry that supports different forms of real-world autobiographical memory.
Article
The amount of digital text available for analysis by consumer researchers has risen dramatically. Consumer discussions on the internet, product reviews, and digital archives of news articles and press releases are just a few potential sources for insights about consumer attitudes, interaction, and culture. Drawing from linguistic theory and methods, this article presents an overview of automated text analysis, providing integration of linguistic theory with constructs commonly used in consumer research, guidance for choosing amongst methods, and advice for resolving sampling and statistical issues unique to text analysis. We argue that although automated text analysis cannot be used to study all phenomena, it is a useful tool for examining patterns in text that neither researchers nor consumers can detect unaided. Text analysis can be used to examine psychological and sociological constructs in consumer-produced digital text by enabling discovery or by providing ecological validity. © The Author 2017. Published by Oxford University Press on behalf of Journal of Consumer Research, Inc. All rights reserved.
Article
Text messaging is the most widely used form of computer-mediated communication (CMC). Previous findings have shown that linguistic factors can reliably indicate messages as deceptive. For example, users take longer and use more words to craft deceptive messages than they do truthful messages. Existing research has also examined how factors, such as student status and gender, affect rates of deception and word choice in deceptive messages. However, this research has been limited by small sample sizes and has returned contradictory findings. This paper aims to address these issues by using a dataset of text messages collected from a large and varied set of participants using an Android messaging application. The results of this paper show significant differences in word choice and frequency of deceptive messages between male and female participants, as well as between students and non-students.
Article
We introduce a novel measure of abstractness based on the amount of information of a concept computed from its position in a semantic taxonomy. We refer to this measure as precision. We propose two alternative ways to measure precision, one based on the path length from a concept to the root of the taxonomic tree, and another one based on the number of direct and indirect descendants. Since more information implies greater processing load, we hypothesize that nouns higher in precision will have a processing disadvantage in a lexical decision task. We contrast precision to concreteness, a common measure of abstractness based on the proportion of sensory-based information associated with a concept. Since concreteness facilitates cognitive processing, we predict that while both concreteness and precision are measures of abstractness, they will have opposite effects on performance. In two studies we found empirical support for our hypothesis. Precision and concreteness had opposite effects on latency and accuracy in a lexical decision task, and these opposite effects were observable while controlling for word length, word frequency, affective content and semantic diversity. Our results support the view that concepts organization includes amodal semantic structures which are independent of sensory information. They also suggest that we should distinguish between sensory-based and amount-of-information-based abstractness.
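The two taxonomy-based quantities described above (path length from a concept to the root, and the number of direct and indirect descendants) can be computed as in the minimal sketch below. The toy taxonomy and the function names `depth` and `descendant_count` are hypothetical and are not taken from the article; the abstract also does not say how these raw counts are scaled into a precision score, so the code returns only the raw quantities.

```python
from collections import defaultdict

# Hypothetical toy taxonomy (child -> parent); not data from the article.
PARENT = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "poodle": "dog",
    "cat": "animal",
    "artifact": "entity",
}


def depth(concept, parent=PARENT):
    """Number of edges on the path from `concept` up to the root."""
    steps = 0
    while parent[concept] is not None:
        concept = parent[concept]
        steps += 1
    return steps


def descendant_count(concept, parent=PARENT):
    """Number of direct and indirect descendants of `concept`."""
    children = defaultdict(list)
    for child, par in parent.items():
        if par is not None:
            children[par].append(child)
    stack = list(children[concept])
    count = 0
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(children[node])
    return count


for name in ("entity", "animal", "poodle"):
    print(name, depth(name), descendant_count(name))
```

In this toy tree, a specific concept such as `poodle` sits far from the root and has no descendants, while a general concept such as `entity` sits at the root and subsumes every other node.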
Article
A salient issue for online romantic relationships is the possibility of deception, but it is unclear how lies are communicated before daters meet. We collected mobile dating deceptions from the discovery phase, a conversation period after daters match on profiles but before a face-to-face interaction. Study 1 found that nearly two-thirds of lies were driven by impression management, particularly self-presentation and availability management goals. Study 2 found that approximately 7% of messages were deceptive, and content patterns were consistent with Study 1. Across studies, the participant’s lying rate was correlated with the perceived lying rate of the partner. We discuss the implications of these data in relation to impression management, deception theory, and online dating research.
Article
This article proposes a three-step methodological framework called computational grounded theory, which combines expert human knowledge and hermeneutic skills with the processing power and pattern recognition of computers, producing a more methodologically rigorous but interpretive approach to content analysis. The first, pattern detection step, involves inductive computational exploration of text, using techniques such as unsupervised machine learning and word scores to help researchers to see novel patterns in their data. The second, pattern refinement step, returns to an interpretive engagement with the data through qualitative deep reading or further exploration of the data. The third, pattern confirmation step, assesses the inductively identified patterns using further computational and natural language processing techniques. The result is an efficient, rigorous, and fully reproducible computational grounded theory. This framework can be applied to any qualitative text as data, including transcribed speeches, interviews, open-ended survey data, or ethnographic field notes, and can address many potential research questions.
Article
Payal Arora and her colleagues argue that Facebook has become a widely used tool for finding romance in the global south, especially among marginalized youth. Yet this reliance on Facebook opens users up to the possibility of deception, forcing many to develop a dynamic online deception literacy. In this response paper, I unpack the notion of online deception literacy by reviewing the existing social scientific literature on this topic. I discuss (1) the prevalence of deception in online romance; (2) people’s ability to detect online deception; (3) the cues people use to detect online deception; and (4) the usefulness of those cues in accurately gauging deception. I highlight avenues for future research, especially those inspired by the experience of marginalized users in the global south.