Table 2 - uploaded by James W. Pennebaker
Context in source publication
Context 1
... purposes of comparison, four classes of text from 43 separate studies were analyzed and compared. As can be seen in Table 2, these analyses reflect the utterances of at least 1,695 writers or speakers, totaling over 1.6 million words. Twenty of the samples are based on individuals from all walks of life (ranging from college students to psychiatric prisoners to elderly and even elementary-aged individuals) who were asked to write about deeply emotional topics. ...
Similar publications
The author investigates the relationships between the archaeological and the linguistic evidence concerning the concepts and representations of the wheel and the sun and their symbolic values.
Key words: wheel, sun, circle, winter solstice, Christmas
Citations
... Of the more recent attempts, Schultheiss (2013a) scored PSE stories based on the closed dictionary Linguistic Inquiry and Word Count (LIWC; Pennebaker et al., 2001), a set of psychology-derived dictionaries that have been used in several research areas (Boyd et al., 2022); previous researchers, including Hogenraad (2005) and Pennebaker and King (1999), used a similar approach. The dictionaries range from "first person plural pronouns" to "swear words" and "positive emotions." ...
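The closed-dictionary approach described in that excerpt can be sketched as a simple word-count procedure. The category names and word lists below are illustrative stand-ins, not the actual LIWC dictionaries (which are proprietary and far larger):

```python
# Minimal sketch of closed-dictionary scoring in the style of LIWC.
# Each category maps to a word list; a text's score per category is
# the fraction of its words that match that list.

def dictionary_score(text, dictionaries):
    """Return, for each category, the fraction of words in `text`
    that appear in that category's word list."""
    words = text.lower().split()
    total = len(words)
    scores = {}
    for category, vocab in dictionaries.items():
        hits = sum(1 for w in words if w in vocab)
        scores[category] = hits / total if total else 0.0
    return scores

# Toy stand-in dictionaries, not LIWC's real category contents.
toy_dictionaries = {
    "first_person_plural": {"we", "us", "our", "ours"},
    "positive_emotion": {"happy", "good", "love", "great"},
}

print(dictionary_score("we love our great team", toy_dictionaries))
```

Real LIWC also handles word stems and multi-word entries; this sketch only shows the core counting-and-normalizing idea.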
Implicit motives, nonconscious needs that influence individuals’ behaviors and shape their emotions, have been part of personality research for nearly a century but differ from personality traits. Assessing implicit motives is very resource-intensive, involving expert coding of individuals’ written stories about ambiguous pictures, which has hampered implicit motive research. Using large language models and machine learning techniques, we aimed to create high-quality implicit motive models that are easy for researchers to use. We trained models to code the need for power, achievement, and affiliation (N = 85,028 sentences). The person-level assessments converged strongly with the holdout data, intraclass correlation coefficient, ICC(1,1) = .85, .87, and .89 for achievement, power, and affiliation, respectively. We demonstrated causal validity by reproducing two classical experimental studies that aroused implicit motives. We let three coders recode sentences where our models and the original coders strongly disagreed. We found that the new coders agreed with our models in 85% of the cases (p < .001, ϕ = .69). Using topic and word embedding analyses, we found the specific language associated with each motive to have high face validity. We argue that these models can be used in addition to, or instead of, human coders. We provide a free, user-friendly framework in the established R package text and a tutorial for researchers to apply the models to their data; the models reduce coding time by over 99% and require no cognitive effort for coding. We hope this coding automation will facilitate a renaissance in implicit motive research.
... Nodes represent editing users and Wikipedia pages, while edges represent individual edit requests. Each edge is timestamped and includes Linguistic Inquiry and Word Count (LIWC) [17] feature vectors characterizing the textual content of the edit. • Reddit [11]: This dataset comprises one month of Reddit posting logs. ...
Aggregating temporal signals from historic interactions is a key step in future link prediction on dynamic graphs. However, incorporating long histories is resource-intensive. Hence, temporal graph neural networks (TGNNs) often rely on historical neighbor sampling heuristics such as uniform sampling or recent-neighbor selection. These heuristics are static and fail to adapt to the underlying graph structure. We introduce FLASH, a learnable and graph-adaptive neighborhood selection mechanism that generalizes existing heuristics. FLASH integrates seamlessly into TGNNs and is trained end-to-end using a self-supervised ranking loss. We provide theoretical evidence that commonly used heuristics hinder TGNN performance, motivating our design. Extensive experiments across multiple benchmarks demonstrate consistent and significant performance improvements for TGNNs equipped with FLASH.
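The two static heuristics this abstract contrasts with FLASH (uniform sampling and recent-neighbor selection) can be sketched over a timestamped edge list. The `(neighbor, timestamp)` event format is an assumption made for this illustration:

```python
import random

# Sketch of static neighbor-sampling heuristics for a node's
# interaction history, given as (neighbor_id, timestamp) events.

def sample_neighbors(history, k, strategy="recent", seed=None):
    """Select up to k historical neighbors by a static heuristic."""
    if strategy == "recent":
        # Keep the k most recent interactions (sorted by timestamp).
        return sorted(history, key=lambda e: e[1])[-k:]
    elif strategy == "uniform":
        # Sample k interactions uniformly at random.
        rng = random.Random(seed)
        return rng.sample(history, min(k, len(history)))
    raise ValueError(f"unknown strategy: {strategy}")

events = [("a", 1), ("b", 5), ("c", 3), ("d", 9)]
print(sample_neighbors(events, 2, "recent"))  # the two most recent events
```

Both heuristics ignore the graph structure entirely, which is the limitation a learnable selection mechanism like FLASH is meant to address.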
... Liu et al. [13] presented a semi-supervised cognitive engagement classification method called B-LIWC-UDA. This method incorporated BERT [10] and LIWC [14] cognitive lexicon dual feature embeddings. ...
... The most widely used general-purpose lexicons are the General Inquirer [27], LIWC [28], the opinion lexicon of Hu and Liu (2004a), and the MPQA Subjectivity Lexicon [29]. However, the efficacy of general-purpose lexicons for domain-specific tasks such as financial and economic textual analysis is questionable. ...
This chapter is concerned with textual and sentiment analysis in the agricultural commodities market using natural language processing (NLP) methods. There is extensive research on textual and sentiment analysis in financial markets; however, most of it focuses on the equity market, with only a minority on other commodities such as energy. Therefore, this chapter first reviews research on textual and sentiment analysis in the agriculture market in general. It then presents textual analysis methods that can be used to study the effect of textual data and sentiment in the agriculture market. Finally, it presents an example of implementing a topic modelling task and textual regression for forecasting the realized volatility of corn returns. To the best of the author’s knowledge, there is no study focusing on textual regression in the agriculture market, and studies conducting textual sentiment analysis are very limited. In this spirit, this study tries to fill the gap by introducing both well-established and new textual and sentiment analysis methods to the agricultural research community. The limited experiments carried out with these methods in the present research testify to the superiority of text-based models in explaining future movements of corn volatility. More specifically, the results of one-month-ahead realized volatility regression indicate statistically significant superior performance of both direct textual regression and sentiment regression compared to traditional methods such as HAR and ARIMA. In addition, textual regression is the most accurate method, with accuracy exceeding that of the sentiment regression model.
... In terms of vocabulary, any sequence of words or sentences expressing social interaction or positive emotions can be taken to indicate extroversion, whereas a sequence expressing negative emotions can be taken to indicate neuroticism. To capture such nuances of language, there is a manually maintained lexicon method, known as Linguistic Inquiry and Word Count (LIWC) [8], that relates words to psychological categories. This procedure also has some challenges alongside the advantages it offers. ...
... Text data is an immediate expression of people's thoughts, ideas, feelings, and emotions, making it a valuable resource in the research area of personality prediction. Previous studies on personality detection were primarily based on linguistic features, such as those of the Linguistic Inquiry and Word Count (LIWC) [8], which helped researchers identify linguistic markers related to personality traits. The prominent Big Five psychological model [2], which has been extensively utilized in computational personality prediction techniques, permits researchers to associate personality traits with textual data. ...
Personality prediction via different techniques is an established and trending topic in psychology. Advances in machine learning algorithms across multiple fields have also drawn attention to Automatic Personality Prediction (APP). This research proposes a novel TraitBertGCN method with a data fusion technique for predicting personality traits. Initially, this work integrates a pre-trained language model, Bidirectional Encoder Representations from Transformers (BERT), with a three-layer Graph Convolutional Network (GCN) to leverage large-scale language understanding and graph-based learning for personality prediction. This study fuses two datasets (essays and myPersonality) to overcome bias and generalize the model across different domains. We fine-tuned our TraitBertGCN model on the fused dataset and then evaluated it on both datasets individually to assess its adaptability and accuracy in varied contexts. We compared the proposed model’s results with previous studies; our model achieved better performance in personality trait prediction across multiple datasets, with an average accuracy of 77.42% on the essays dataset and 87.59% on the myPersonality dataset.
... In addition, we will expand on the personality inference module by implementing procedures for existing methods, such as incorporating linguistic features from user-generated text via LIWC [33], as well as by including multi-modal data, such as images or videos. Finally, we plan to extend our evaluation to more platforms; in our sample of popular GitHub users, we also find a large number of users linking to their Mastodon, YouTube, or Telegram accounts, as well as personal blogs, which may offer a larger variety of platforms for a more extensive study. ...
What can we learn about online users by comparing their profiles across different platforms? We use the term profile to represent displayed personality traits, interests, and behavioral patterns (e.g., offensiveness). We also use the term displayed personas to refer to the personas that users manifest on a platform. Though individuals have a single real persona, it is not difficult to imagine that people can behave differently in different "contexts," as happens in real life (e.g., behavior in an office, a bar, or at a football game). The vast majority of previous studies have focused on profiling users on a single platform. Here, we propose VIKI, a systematic methodology for extracting and integrating the displayed personas of users across different social platforms. First, we extract multiple types of information, including displayed personality traits, interests, and offensiveness. Second, we evaluate, combine, and introduce methods to summarize and visualize cross-platform profiles. Finally, we evaluate VIKI on a dataset that spans three platforms: GitHub, LinkedIn, and X. Our experiments show that displayed personas change significantly across platforms, with over 78% of users exhibiting a significant change. For instance, we find that neuroticism exhibits the largest absolute change. We also identify significant correlations between offensive behavior and displayed personality traits. Overall, we consider VIKI an essential building block for systematic and nuanced profiling of users across platforms.
... Since these relevance judgment datasets greatly influence how neural rankers learn the concept of relevance, the authors focused on quantifying and analyzing gender biases in relevance judgments. The authors use a fine-tuned BERT model to label a large collection of queries within the MS MARCO dataset, which were then used to assess the associated documents for their psychological characteristics using the Linguistic Inquiry and Word Count (LIWC) toolkit (Pennebaker et al., 2001). Their findings showed that stereotypical biases are common in relevance judgment collections, particularly with regard to affective and cognitive processes, as well as personal concerns and drives. ...
... b) The NFaiRR metric evaluates fairness at the document level within ranked lists and across all queries, based on the concept of 'document neutrality', where a higher NFaiRR indicates a fairer ranking. c) Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al., 2001) is employed to determine the gender affiliation of text using the social referents category, specifically the male and female reference subcategories, as outlined in prior work. ...
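The LIWC-based gender-affiliation step described in that excerpt amounts to comparing counts of male versus female reference words. A minimal sketch, with toy word lists standing in for LIWC's actual male/female reference subcategories:

```python
# Illustrative sketch of gender-affiliation scoring via reference-word
# counts. The word lists are toy stand-ins, not LIWC's real
# "male references" / "female references" subcategories.

MALE_REFS = {"he", "him", "his", "man", "men", "father", "son"}
FEMALE_REFS = {"she", "her", "hers", "woman", "women", "mother", "daughter"}

def gender_affiliation(text):
    """Label text by whichever reference-word count dominates."""
    words = text.lower().split()
    m = sum(w in MALE_REFS for w in words)
    f = sum(w in FEMALE_REFS for w in words)
    if m > f:
        return "male"
    if f > m:
        return "female"
    return "neutral"

print(gender_affiliation("she told her mother about the trip"))
```

Documents with no dominant count are treated as neutral, which connects this word-count view to the 'document neutrality' idea behind NFaiRR.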
Recent studies have demonstrated that while neural ranking methods excel in retrieval effectiveness, they also tend to amplify stereotypical biases, especially those related to gender. Current mitigation strategies often focus on adjusting training methods, like adversarial techniques or data balancing, but typically overlook explicit consideration of gender as an attribute. In this paper, we introduce a systematic approach that treats gender as a distinct component within neural ranker representations. Our neural disentanglement method separates content semantics from gender information, enabling the neural ranker to evaluate document relevance based on content alone, without the interference of gender-related information during retrieval. Our extensive experiments demonstrate that: (1) our disentanglement approach matches the effectiveness of baseline models and offers more consistent performance across queries of different gender affiliations; (2) isolating gender within the representations allows the neural ranker to produce an unbiased list of documents, not favoring any specific gender; and (3) the disentangled gender component effectively and concisely captures gender information independently from the semantic content.
... VADER combines a lexicon (that is, dictionary-based analysis) with a rule-based approach to characterize sentiment. VADER uses a gold-standard-quality lexicon that, like Linguistic Inquiry and Word Count (LIWC) [22], has been validated by humans. It distinguishes itself from other established tools, such as LIWC, in being more sensitive to sentiment expression in social media contexts. ...
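The combination of dictionary valences and sentiment rules that this excerpt attributes to VADER can be illustrated with a toy scorer. The lexicon values and rule weights below are invented for illustration and are not VADER's actual parameters:

```python
# Toy lexicon-plus-rules sentiment scorer in the spirit of VADER:
# dictionary valences, adjusted by simple rules for intensification
# and negation. All values here are illustrative.

LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "lonely": -1.8}
INTENSIFIERS = {"very": 0.5, "extremely": 0.8}
NEGATIONS = {"not", "never", "no"}

def score(sentence):
    """Sum rule-adjusted valences of lexicon words in the sentence."""
    words = sentence.lower().split()
    total = 0.0
    for i, w in enumerate(words):
        if w not in LEXICON:
            continue
        valence = LEXICON[w]
        # Rule 1: a preceding intensifier boosts the valence.
        if i > 0 and words[i - 1] in INTENSIFIERS:
            valence *= 1.0 + INTENSIFIERS[words[i - 1]]
        # Rule 2: a negation within the last three words flips
        # and dampens the valence.
        if any(p in NEGATIONS for p in words[max(0, i - 3):i]):
            valence *= -0.74
        total += valence
    return total

print(score("not very good"))  # negation flips the boosted positive valence
```

The real VADER additionally handles punctuation emphasis, capitalization, emoticons, and slang, which is what makes it better suited to social media text than plain dictionary counting.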
Background
Loneliness is a global public health issue contributing to a variety of mental and physical health issues. It increases the risk of life-threatening conditions and contributes to the burden on the economy in terms of the number of productive days lost. Loneliness is a highly varied concept, which is associated with multiple factors.
Objective
This study aimed to understand loneliness through a comparative analysis of loneliness data on Twitter and Reddit, which are popular social media platforms. These platforms differ in terms of their use, as Twitter allows only short posts, while Reddit allows long posts in a forum setting.
Methods
We collected global data on loneliness in October 2022. Twitter posts containing the words “lonely,” “loneliness,” “alone,” “solitude,” and “isolation” were collected. Reddit posts were extracted in March 2023. Using natural language processing techniques (the Valence Aware Dictionary and sEntiment Reasoner [VADER] tool from the Natural Language Toolkit [NLTK]), the study identified and extracted relevant keywords and phrases related to loneliness from user-generated content on both platforms. The study used both sentiment analysis and the number of occurrences of a topic. Quantitative analysis was performed to determine the number of occurrences of a topic in tweets and posts, and overall meaningful topics were reported under a category.
Results
The extracted data were subjected to comparative analysis to identify common themes and trends related to loneliness across Twitter and Reddit. A total of 100,000 collected tweets and 10,000 unique Reddit posts, including comments, were analyzed. The results of the study revealed the relationships of various social, political, and personal-emotional themes with the expression of loneliness on social media. Both platforms showed similar patterns in terms of themes and categories of discussion in conjunction with loneliness-related content. Both Reddit and Twitter addressed loneliness, but they differed in terms of focus. Reddit discussions were predominantly centered on personal-emotional themes, with a higher occurrence of these topics. Twitter, while still emphasizing personal-emotional themes, included a broader range of categories. Both platforms aligned with psychological linguistic features related to the self-expression of mental health issues. The key difference was in the range of topics, with Twitter having a wider variety of topics and Reddit having more focus on personal-emotional aspects.
Conclusions
Reddit posts provide detailed insights into data about the expression of loneliness, although at the cost of the diversity of themes and categories, which can be inferred from the data. These insights can guide future research using social media data to understand loneliness. The findings provide the basis for further comparative investigation of the expression of loneliness on different social media platforms and online platforms.
... The resulting dictionaries for each generic frame, consisting of unigram lexicons comprising keywords and respective collocates, served as frame indicators as summarized in Supplementary Information file Table A1. Lastly, we employed the Linguistic Inquiry and Word Count (LIWC) methodology (Pennebaker et al. 2001) to align tweet content with established frame lexicons. This process involved recording the count of words associated with each frame, calculating the proportional representation of each frame adjusted by tweet length, and determining the tweet's primary frame by identifying the one with the greatest number of matched words. ...
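The frame-assignment procedure described in that excerpt (count lexicon matches, normalize by tweet length, pick the frame with the most matched words) can be sketched as follows, with illustrative stand-in lexicons rather than the study's actual frame dictionaries:

```python
# Sketch of LIWC-style frame assignment: count words matching each
# frame lexicon, compute length-adjusted proportions, and pick the
# frame with the most matches as the primary frame.

FRAME_LEXICONS = {
    "economic": {"cost", "jobs", "economy", "tax"},
    "morality": {"duty", "care", "fair", "wrong"},
}

def frame_profile(tweet):
    words = tweet.lower().split()
    counts = {frame: sum(w in vocab for w in words)
              for frame, vocab in FRAME_LEXICONS.items()}
    n = len(words) or 1
    # Proportional representation adjusted by tweet length.
    proportions = {frame: c / n for frame, c in counts.items()}
    # Primary frame = most matched words (None if nothing matched).
    primary = max(counts, key=counts.get) if any(counts.values()) else None
    return counts, proportions, primary

counts, props, primary = frame_profile(
    "climate action is fair and will create jobs jobs")
print(primary)
```

Ties and tweets with no matches would need a policy of their own; this sketch simply returns None when no frame word is matched.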
The discourse on climate change transcends scientific discussions and policy debates, often incorporating moral language and ethical considerations. This study explores framing strategies in political persuasion and the underlying moral foundations associated with climate change by combining computational methods and critical discourse analysis of tweets from members of the 111th-117th U.S. Congresses. The aim is to map out the bipartisan trends in the use of moral language and framing concerning climate change issues, thereby enriching the understanding of public opinion dynamics and the evolving partisan divide on climate action. Our findings reveal an intensifying partisan polarization in framing. Contrary to the expected moral divide, we uncover a bipartisan agreement on the moral foundations of care and fairness, and a consistent cross-party employment of moral language associated with frames over time. The interplay between generic frames and moral foundations suggests the potential for collective action on climate change across the political spectrum.
... Recent research has robustly demonstrated a significant correlation between personality traits and language use (Pennebaker and King 1999;Hirsh and Peterson 2009), underscoring that numerous human behaviours are intricately encoded in language (Tausczik and Pennebaker 2010). A notable advancement in this field is the development of the Linguistic Inquiry and Word Count (LIWC) method (Pennebaker, Francis, and Booth 2001). This methodological innovation enables the examination of the psychometric properties of language and facilitates the summarising of features from human texts, providing a sophisticated tool for linguistic analysis in psychological research. ...
Large language models (LLMs) are increasingly used in everyday life and research. One of the most common use cases is conversational interaction, enabled by the language generation capabilities of LLMs. Just as between two humans, a conversation between an LLM-powered entity and a human depends on the personality of the conversants. However, measuring the personality of a given LLM is currently a challenge. This article introduces the Language Model Linguistic Personality Assessment (LMLPA), a system designed to evaluate the linguistic personalities of LLMs. Our system helps to understand LLMs’ language generation capabilities by quantitatively assessing the distinct personality traits reflected in their linguistic outputs. Unlike traditional human-centric psychometrics, the LMLPA adapts a personality assessment questionnaire, specifically the Big Five Inventory, to align with the operational capabilities of LLMs, and also incorporates findings from previous language-based personality measurement literature. To mitigate sensitivity to the order of options, our questionnaire is designed to be open-ended, resulting in textual answers. Thus, an Artificial Intelligence (AI) rater is needed to transform ambiguous personality information from text responses into clear numerical indicators of personality traits. Utilizing Principal Component Analysis and reliability validation methods, our findings demonstrate that LLMs possess distinct personality traits that can be effectively quantified by the LMLPA. This research contributes to Human-Centered AI and Computational Linguistics, providing a robust framework for future studies to refine AI personality assessments and expand their applications in multiple areas, including education and manufacturing.