Table 3 - uploaded by James W. Pennebaker

LIWC2001 Means Across 43 Studies

Context in source publication

Context
... can be seen in Table 3, the LIWC2001 version captures, on average, 80 percent of the words people used in writing and speech. Note that except for total word count and words per sentence, all means in Table 3 are expressed as percentage of total word use in any given speech/text sample. Across all of the studies, for example, 15.2 percent of words used were pronouns, 5.8 percent articles, and 4.0 percent were emotional words. ...
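To make the percentage-of-total-words convention concrete, here is a minimal Python sketch of LIWC-style dictionary counting. The category lists, tokenizer, and prefix matching are illustrative stand-ins, not the actual LIWC2001 dictionary or processor.

```python
import re

# Toy category lists standing in for the LIWC2001 dictionary (illustrative only).
# A trailing "*" marks a prefix entry, mirroring the real dictionary format.
CATEGORIES = {
    "pronouns": {"i", "we", "you", "she", "he", "they", "it"},
    "articles": {"a", "an", "the"},
    "emotion": {"happ*", "sad*", "angr*", "love", "hate"},
}

def matches(token: str, entry: str) -> bool:
    return token.startswith(entry[:-1]) if entry.endswith("*") else token == entry

def liwc_style_percentages(text: str) -> dict:
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = {
        cat: sum(any(matches(t, e) for e in entries) for t in tokens)
        for cat, entries in CATEGORIES.items()
    }
    # Apart from raw counts (total words, words per sentence), LIWC output is
    # the percentage of all words in the sample that fall in each category.
    return {cat: 100.0 * n / total for cat, n in counts.items()}

print(liwc_style_percentages("I love the idea, but they found the sad ending bleak."))
```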

Citations

... Cognitive attributes represent human cognitive processing, such as causal and insightful thinking, and are essential to understanding. We used a validated psycholinguistic lexicon, Linguistic Inquiry and Word Count (LIWC) 32 , to measure cognitive attributes in writing; the tool has been extensively utilized in empirical studies on misinformation and detection research [33][34][35] . Lastly, to measure the accessibility of language, we used the Automated Readability Index (ARI) 36 and the Flesch-Kincaid grade level 37 , which indicate the grade level required to understand a text, with a lower score meaning the text is easier to read. ...
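Both readability indices named in this excerpt are closed-form functions of surface counts, so a short sketch can make them concrete. The syllable counter below is a rough vowel-group heuristic, not the one used in the cited work:

```python
import re

def counts(text):
    words = re.findall(r"[A-Za-z]+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    chars = sum(len(w) for w in words)
    # Crude syllable heuristic: count vowel groups (adequate for a sketch only).
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return len(words), sentences, chars, syllables

def ari(text):
    w, s, c, _ = counts(text)
    return 4.71 * (c / w) + 0.5 * (w / s) - 21.43

def flesch_kincaid_grade(text):
    w, s, _, syl = counts(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59

sample = "The cat sat on the mat. It was a sunny day."
print(round(ari(sample), 2), round(flesch_kincaid_grade(sample), 2))
```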
Preprint
Full-text available
    With the wide adoption of large language models (LLMs) in information assistance, it is essential to examine their alignment with human communication styles and values. We situate this study within the context of fact-checking health information, given the critical challenge of rectifying misconceptions and building trust. Recent studies have explored the potential of LLMs for health communication, but style differences between LLMs and human experts, and the associated reader perceptions, remain under-explored. In this light, our study evaluates the communication styles of LLMs, focusing on how their explanations differ from those of humans in three core components of health communication: information, sender, and receiver. We compiled a dataset of 1498 health misinformation explanations from authoritative fact-checking organizations and generated LLM responses to inaccurate health information. Drawing from health communication theory, we evaluate communication styles across three key dimensions: information linguistic features, sender persuasive strategies, and receiver value alignments. We further assessed human perceptions through a blinded evaluation with 99 participants. Our findings reveal that LLM-generated articles showed significantly lower scores in persuasive strategies, certainty expressions, and alignment with social values and moral foundations. However, human evaluation demonstrated a strong preference for LLM content, with over 60% of responses favoring LLM articles for clarity, completeness, and persuasiveness. Our results suggest that LLMs' structured approach to presenting information may be more effective at engaging readers despite scoring lower on traditional measures of quality in fact-checking and health communication.
    ... While developing the proposed models, it was observed that the balanced data were causing overfitting, meaning the models performed well on the training data but did not generalize well to new, unseen (test) data. To address this issue, we introduced noise to the data, an important step in enhancing the models' generalization capabilities, unlike simpler methods such as LIWC, which rely on linguistic categorization [32]. The noise was added during the data augmentation stage. ...
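The excerpt does not specify the noise model; additive Gaussian noise scaled to a target signal-to-noise ratio is a common choice for audio augmentation, sketched here under that assumption:

```python
import numpy as np

def add_gaussian_noise(signal: np.ndarray, snr_db: float) -> np.ndarray:
    """Add white Gaussian noise at a given signal-to-noise ratio (in dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Augment a training waveform with a noisy copy (SNR chosen arbitrarily here).
clean = np.sin(np.linspace(0, 8 * np.pi, 16000))  # stand-in for a real clip
augmented = add_gaussian_noise(clean, snr_db=20.0)
```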
    ... These metrics provide comprehensive details about the performance of classification models, helping to assess their ability to correctly classify emotions and balance precision and recall [32]. ...
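For reference, a minimal sketch of how these classification metrics are typically computed with scikit-learn for an eight-class problem; the labels and predictions are dummy values:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 3, 4, 5, 6, 7, 1, 2]   # dummy ground-truth emotion labels
y_pred = [0, 1, 2, 3, 4, 5, 6, 6, 1, 0]   # dummy model predictions

# Macro averaging weights all eight emotion classes equally.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"acc={accuracy_score(y_true, y_pred):.2f} "
      f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```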
    Article
    Full-text available
    Speech Emotion Recognition (SER) technology helps computers understand human emotions in speech, which fills a critical niche in advancing human–computer interaction and mental health diagnostics. The primary objective of this study is to enhance SER accuracy and generalization through innovative deep learning models. Despite its importance in various fields like human–computer interaction and mental health diagnosis, accurately identifying emotions from speech can be challenging due to differences in speakers, accents, and background noise. The work proposes two innovative deep learning models to improve SER accuracy: a CNN-LSTM model and an Attention-Enhanced CNN-LSTM model. These models were tested on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), collected between 2015 and 2018, which comprises 1440 audio files of male and female actors expressing eight emotions. Both models achieved impressive accuracy rates of over 96% in classifying emotions into eight categories. By comparing the CNN-LSTM and Attention-Enhanced CNN-LSTM models, this study offers comparative insights into modeling techniques, contributes to the development of more effective emotion recognition systems, and offers practical implications for real-time applications in healthcare and customer service.
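The abstract names the CNN-LSTM pattern but not its configuration; the following PyTorch sketch shows the general shape only. The layer sizes, the single Conv1d block, and the use of MFCC inputs are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Conv1d front end over MFCC frames, LSTM over time, logits over 8 emotions."""
    def __init__(self, n_mfcc=40, n_classes=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):                        # x: (batch, n_mfcc, time)
        h = self.conv(x)                         # (batch, 64, time // 2)
        h, _ = self.lstm(h.transpose(1, 2))      # (batch, time // 2, 128)
        return self.head(h[:, -1])               # logits from the last time step

logits = CNNLSTM()(torch.randn(4, 40, 200))      # 4 clips, 40 MFCCs, 200 frames
print(logits.shape)                              # torch.Size([4, 8])
```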
    ... This study examined the content and form of discourse in 10 audio-recorded psychoanalytic treatments (10 of the 27 treatments used in the present study) during which the analysts treated patients who were seated versus on the couch. For each session, a segment 10 min in length, beginning 10 min after the start of the session, was coded using Pennebaker and Francis's Linguistic Inquiry and Word Count program (Pennebaker et al., 2022) and Bucci's Computerized Referential Activity program (Zhou et al., 2021). The authors noted how strikingly similar couch and chair discourse were, contrary to what some psychoanalytic views would predict. ...
    ... Another research direction could be to extend DiNardo et al.'s (2005) linguistic analyses of the subsample of 10 of the 27 treatments. Future researchers could use the full data set of 27 treatments to compare the content and style of discourse in couch and chair sessions using Pennebaker and Francis's Linguistic Inquiry and Word Count program (Pennebaker et al., 2022) and Bucci's Computerized Referential Activity program (Zhou et al., 2021). ...
    Article
    Full-text available
    The use of the couch has been thought to be integral to the facilitation of a psychoanalytic process and an essential component of the therapeutic frame. Yet, the observed therapeutic differences between sessions conducted on the chair and on the couch have not been adequately examined. Thus, we aimed to compare differences in observed therapeutic process (patient behavior, analyst behavior, and their interaction in sessions) between sessions in which the patient lies on the couch and sessions in which the patient sits on a chair. We also examined if the patient’s position (couch vs. chair) predicts the extent to which the therapeutic process reflects an ideal psychoanalytic session. Based on the observer codings on the Psychotherapy Process Q Set (Jones, 2000) in a sample of 287 sessions from 27 psychoanalytic treatments, we compared the most and least characteristic aspects of the therapeutic process in the 197 sessions conducted on the couch and the 90 sessions conducted on the chair. We then compared the level of resemblance to the expert prototype of the ideal analytic process (Psychoanalytic Prototype of the Process Q Set; Ablon & Jones, 2005). The analytic process in chair and couch sessions largely overlapped. We did not detect significant differences in the analytic process between sessions conducted on the couch or chair nor differences in resemblance to the Psychoanalytic Prototype of the Process Q Set prototype. A more flexible approach to analytic work would be supported in the light of our findings. There may be a variety of ways in which either the couch or chair may facilitate analytic work.
    ... Sentiment tone is calculated as the average sentiment score of these news articles for the same year t [87]. The sentiment score is the emotional tone of a single article as calculated by LIWC, ranging from 0 to 100 (percentage) [88]. We calculated sentiment scores annually from 1998 to 2021. ...
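A minimal sketch of the annual aggregation described in this excerpt, assuming a table of per-article LIWC tone scores (0-100) keyed by publication year; the column names and values are illustrative:

```python
import pandas as pd

# Illustrative per-article tone scores as produced by LIWC (0-100 scale).
articles = pd.DataFrame({
    "year": [1998, 1998, 1999, 1999, 1999],
    "tone": [62.0, 48.5, 71.2, 55.0, 60.3],
})

# Sentiment tone for year t = mean tone of all articles published in year t.
annual_tone = articles.groupby("year")["tone"].mean()
print(annual_tone)
```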
    Article
    Full-text available
    Digital media news has emerged as a transformative force propelling businesses of K-pop firms onto the global arena. Yet, our understanding of how digital media exposure affects K-pop firm performance remains limited. Drawing on the magic bullet theory, this study explores and sheds light on the performance implications of digital media exposure of firms competing in the K-pop industry. Using an extensive panel dataset spanning twenty-four years, our investigation reveals that digital media exposure is positively associated with K-pop firm performance. Importantly, the nature of this impact is intricately tied to the specific narrative styles employed in the digital media discourse.
    ... LIWC can be used as a basis from which more robust machine or deep learning models can be built (Liang et al., 2023). The software works with dictionaries, or bags of words, validated to capture the percentage of words related to a construct in a text (Garzón-Velandia et al., 2020; Pennebaker et al., 2022). LIWC has proven to be a reliable representation of the psychological constructs measured in its dictionaries and has undergone several validation methodologies since its development (Tausczik & Pennebaker, 2010). ...
    Article
    Three studies developed and validated a linguistic dictionary to measure negative affective polarization in English and Spanish political texts. It captures three dimensions: negative affect, delegitimization, and political context. In the first study, two independent judges evaluated the candidate words, and reliability indicators were calculated, showing acceptable values for short texts (.572 in English, .541 in Spanish) and higher values for larger corpora (.964 in English, .957 in Spanish). The second study tested discriminant validity by comparing negative affective polarization scores in social media comments on politics and entertainment. Results showed significantly higher polarization scores in political content, confirming the dictionary's validity. The third study compared the dictionary to an existing online polarization measure, finding greater coverage and alignment with the construct. Additionally, polarization scores were higher in texts containing hate speech than in texts where it was absent. The findings suggest that the dictionary has strong psychometric properties in both languages, making it a valuable tool for analyzing online content, particularly social media comments. It can be used as an independent measure or as input for machine and deep learning models.
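The abstract does not state which reliability coefficient was computed; Cohen's kappa is one common choice for two independent judges making keep/drop decisions on candidate words, sketched here on dummy judgments:

```python
from sklearn.metrics import cohen_kappa_score

# Dummy keep(1)/drop(0) judgments by two independent judges over candidate words.
judge_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
judge_b = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]

print(f"Cohen's kappa = {cohen_kappa_score(judge_a, judge_b):.3f}")
```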
    ... We next sought to identify features of users' text that can be used for estimating users' personality. We used the Linguistic Inquiry and Word Count software (LIWC; Pennebaker, Francis, and Booth 2001) to calculate several of these features, such as the number of negative emotion words, positive emotion words, and social words. We also used emotion norms dictionaries to code affect (Warriner et al. 2013), and bigram and topic analyses (see Appendix A). ...
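As an illustration of coding affect with emotion norms, the sketch below averages per-word valence ratings over a text. The tiny norms table merely stands in for the Warriner et al. (2013) lexicon; the real resource's format, coverage, and exact values differ.

```python
import re

# Stand-in for the Warriner et al. (2013) norms: word -> valence (1-9 scale).
VALENCE = {"happy": 8.5, "sad": 2.1, "party": 7.6, "funeral": 1.8, "table": 5.2}

def mean_valence(text: str) -> float:
    """Average the valence of all rated words in the text; NaN if none are rated."""
    tokens = re.findall(r"[a-z]+", text.lower())
    rated = [VALENCE[t] for t in tokens if t in VALENCE]
    return sum(rated) / len(rated) if rated else float("nan")

print(mean_valence("We were happy at the party, then sad at the funeral."))
```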
    Preprint
    Full-text available
    Online communities are an increasingly important stakeholder for firms, and despite the growing body of research on them, much remains to be learned about them and about the factors that determine their attributes and sustainability. Whereas most of the literature focuses on predictors such as community activity, network structure, and platform interface, there is little research about behavioral and psychological aspects of community members and leaders. In the present study we focus on the personality traits of community founders as predictors of community attributes and sustainability. We develop a tool to estimate community members' Big Five personality traits from their social media text and use it to estimate the traits of 35,164 founders in 8,625 Reddit communities. We find support for most of our predictions about the relationships between founder traits and community sustainability and attributes, including the level of engagement within the community, aspects of its social network structure, and whether the founders themselves remain active in it.
    ... Recent research has robustly demonstrated a significant correlation between personality traits and language use (Pennebaker and King 1999;Hirsh and Peterson 2009), underscoring that numerous human behaviours are intricately encoded in language (Tausczik and Pennebaker 2010). A notable advancement in this field is the development of the Linguistic Inquiry and Word Count (LIWC) method (Pennebaker, Francis, and Booth 2001). This methodological innovation enables the examination of the psychometric properties of language and facilitates the summarising of features from human texts, providing a sophisticated tool for linguistic analysis in psychological research. ...
    Article
    Full-text available
    Large language models (LLMs) are increasingly used in everyday life and research. One of the most common use cases is conversational interaction, enabled by the language generation capabilities of LLMs. Just as between two humans, a conversation between an LLM-powered entity and a human depends on the personality of the conversants. However, measuring the personality of a given LLM is currently a challenge. This article introduces the Language Model Linguistic Personality Assessment (LMLPA), a system designed to evaluate the linguistic personalities of LLMs. Our system helps to understand LLMs' language generation capabilities by quantitatively assessing the distinct personality traits reflected in their linguistic outputs. Unlike traditional human-centric psychometrics, the LMLPA adapts a personality assessment questionnaire, specifically the Big Five Inventory, to align with the operational capabilities of LLMs, and also incorporates findings from previous language-based personality measurement literature. To mitigate sensitivity to the order of options, our questionnaire is designed to be open-ended, resulting in textual answers. Thus, an Artificial Intelligence (AI) rater is needed to transform ambiguous personality information from text responses into clear numerical indicators of personality traits. Utilizing Principal Component Analysis and reliability validation methods, our findings demonstrate that LLMs possess distinct personality traits that can be effectively quantified by the LMLPA. This research contributes to Human-Centered AI and Computational Linguistics, providing a robust framework for future studies to refine AI personality assessments and expand their applications in multiple areas, including education and manufacturing.
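To make the validation step concrete, here is a sketch of running Principal Component Analysis over a matrix of AI-rated item scores (responses by questionnaire items); the matrix, its dimensions, and the 1-5 scale are invented for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

# Invented matrix: rows = LLM text responses, columns = numeric item scores
# produced by the AI rater (e.g., 1-5 per Big Five Inventory item).
rng = np.random.default_rng(0)
scores = rng.uniform(1, 5, size=(50, 10))

pca = PCA(n_components=5)
components = pca.fit_transform(scores)
# Variance explained indicates whether items cluster into trait-like factors.
print(pca.explained_variance_ratio_.round(3))
```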
    ... Their study showed that assessing personality based on language could be a reliable measure. Pennebaker et al. (2001) developed the Linguistic Inquiry and Word Count (LIWC) tool by analysing word frequencies. LIWC is widely used for text analysis and includes vocabulary related to style and personality. ...
    Article
    Full-text available
    Personality comprises the characteristics of a person expressed through thoughts, feelings, and behaviours. Knowing the personality characteristics of an individual can help improve interpersonal relationships, regardless of their type. Virtual media of social interaction are a rich source of information where online users share and post comments and express their likes or dislikes. This information reveals traits of users' personality and behaviour. In this sense, it is possible to identify personality traits of the dark triad through computational models. In this area, research has found correlations between personality traits and users' online behaviour. In this study, we propose a computational model that uses neural network architectures and Transformer models to identify narcissistic personality traits in Spanish-language text based on the Narcissistic Personality Inventory (NPI) test. Specifically, we leverage the ability of the pre-trained Transformer models BERT, RoBERTa, and DistilBERT to capture the semantic context and structural features of text using sentence-level embeddings. These attributes make them suitable for multi-class classification tasks, such as identifying personality traits from reviews. Furthermore, the model utilises the algorithms GloVe, FastText, and Word2Vec to generate embeddings, which represent vectors of semantic and syntactic features of words in narcissistic expressions. The semantic information is then used by several neural network architectures (SimpleRNN, LSTM, GRU, BiLSTM, CNN + BiLSTM, and CNN + GRU) to construct a multi-class model for automatically identifying narcissistic personality traits. The model's performance is assessed using a Twitter dataset annotated by psychology experts and expanded using augmentation techniques such as Back Translation, Paraphrasing, and substituting words with their synonyms. Ultimately, the results indicate that the BERT and RoBERTa Transformers yield better accuracy and precision than the neural network architectures.
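A minimal PyTorch sketch of one pipeline from the family the abstract lists: pretrained word embeddings feeding a BiLSTM multi-class classifier. The vocabulary size, dimensions, and four-class output are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Embeddings (e.g., Word2Vec/GloVe/FastText) -> BiLSTM -> class logits."""
    def __init__(self, vocab_size=20000, emb_dim=300, hidden=128, n_classes=4):
        super().__init__()
        # In practice the weight matrix would be initialized from pretrained vectors.
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):                 # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))     # (batch, seq_len, 2 * hidden)
        return self.head(h[:, -1])                # logits per trait class

logits = BiLSTMClassifier()(torch.randint(0, 20000, (8, 30)))
print(logits.shape)  # torch.Size([8, 4])
```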
    ... For example, people who score high on extraversion generally use more social words, show more positive emotions, and tend to write more words but fewer large words [6,15]. Linguistic Inquiry and Word Count (LIWC) [16] has been one of the most widely used tools for analyzing word use. Emotional dictionaries are also frequently mentioned [17,18], as emotional experience has proven to be a key factor in personality analysis [19]. ...
    ... Mairesse Features comprise LIWC [16] features, MRC [47] features, and prosodic and utterance-type features. The LIWC dictionary annotates a word or a prefix with multiple categories involving parts of speech, emotions, society, and the environment, while the MRC psycholinguistic database provides unique annotations based on syntactic and psychological information. ...
    Article
    Detecting personalities in social media content is an important application of personality psychology. Most early studies applied personality detection to a single coherent piece of writing, but today the challenge is to identify dominant personality traits from a series of short, noisy social media posts. To this end, recent studies have attempted to individually encode the deep semantics of posts, often using attention-based methods, and then relate them, or directly assemble them into graph structures. However, due to the inherently disjointed and noisy nature of social media content, constructing meaningful connections remains challenging. While such methods rely on well-defined relationships between posts, effectively capturing these connections in fragmented and sparse content is non-trivial, particularly under limited supervision or noisy input. To tackle this, we draw inspiration from the scanning reading technique (commonly recommended for efficiently processing large volumes of information) and propose an index attention mechanism as a solution. This mechanism leverages prior psycholinguistic knowledge as an "index" to guide attention, thereby enabling more effective information fusion across scattered semantic signals. Building on this idea, we introduce the Index Attention Network (IAN), a novel framework designed to infer personality labels by performing targeted information fusion over deep semantic representations of individual posts. Through a series of experiments, IAN achieved state-of-the-art performance on the Kaggle dataset and performance comparable to graph convolutional networks (GCN) on the Pandora dataset. Notably, IAN delivered an average improvement of 13% in macro-F1 scores on the Kaggle dataset. The code for IAN is available at GitHub: https://github.com/Once2gain/IAN.
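The excerpt does not give the paper's exact formulation, so the sketch below shows only the general idea as described: a fixed psycholinguistic prior vector serves as the attention query (the "index") over per-post encodings, and the weighted sum fuses them. All dimensions and the fusion rule are assumptions.

```python
import torch
import torch.nn.functional as F

def index_attention(posts: torch.Tensor, index: torch.Tensor) -> torch.Tensor:
    """Fuse post encodings with attention weights given by a prior 'index' query.

    posts: (n_posts, dim) deep semantic encodings of a user's posts.
    index: (dim,) vector encoding prior psycholinguistic knowledge.
    """
    scores = posts @ index / index.shape[0] ** 0.5   # scaled dot-product scores
    weights = F.softmax(scores, dim=0)               # attention over posts
    return weights @ posts                           # fused user representation

posts = torch.randn(30, 256)   # 30 noisy short posts, 256-d encodings
index = torch.randn(256)       # assumed prior-knowledge query vector
user_vec = index_attention(posts, index)
print(user_vec.shape)          # torch.Size([256])
```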
    ... To assess the accuracy of the tool, the authors built a dataset consisting of a mixed textual corpus of more than 2 million words across 4500 documents. They further computed Pearson correlations between the results produced by Empath and those obtained with Linguistic Inquiry and Word Count (LIWC), which has been extensively validated in the literature 53 . We proceed to label the posts and comments retrieved from each subreddit by adopting the following protocol: first, we check whether Empath recognizes in the text any of the emotions in F ; if not, the text is discarded; otherwise, we take as label(s) the union of the emotions detected by Empath and Google-T5, then intersect the result with F . This experimental choice is justified by the fact that Google-T5 always outputs one of the above-mentioned emotions, thus leading to potential false positives (in other words, a text could express none of the emotions in F , but one would be produced anyway). ...
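The labelling protocol in this excerpt can be sketched directly. The Empath call follows the empath package's public API, while `t5_emotion` is a hypothetical stand-in for the authors' Google-T5 classifier, and the emotion set F is illustrative since the excerpt does not enumerate it:

```python
from empath import Empath

F = {"joy", "sadness", "anger", "fear", "surprise"}  # illustrative target set
lexicon = Empath()

def t5_emotion(text: str) -> str:
    """Hypothetical stand-in for the Google-T5 emotion classifier, which always
    outputs exactly one emotion (hence the risk of false positives)."""
    raise NotImplementedError

def label(text: str):
    # Step 1: discard the text if Empath detects none of the emotions in F.
    scores = lexicon.analyze(text, categories=list(F)) or {}
    empath_hits = {emo for emo, v in scores.items() if v > 0}
    if not empath_hits:
        return None
    # Step 2: union of Empath and Google-T5 labels, intersected with F.
    return (empath_hits | {t5_emotion(text)}) & F
```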
    Article
    Full-text available
    In the present online social landscape, misogyny is a well-established issue, while misandry remains significantly underexplored. In an effort to rectify this discrepancy and better understand the phenomenon of gendered hate speech, we analyze four openly declared misogynistic and misandric Reddit communities, examining their characteristics at the linguistic, emotional, and structural levels. We investigate whether substantial and systematic discrepancies between misogynistic and misandric groups can be identified when heterogeneous factors are taken into account. Our experimental evaluation shows that no systematic differences can be observed when a double perspective, both male-to-female and female-to-male, is adopted, suggesting that gendered hate speech is not exacerbated by the perpetrators' gender but is instead a common factor of noxious communities.