Michal Kosinski's research while affiliated with Stanford University and other places

Publications (122)

Article
Objective: We explore the personality of counties as assessed through linguistic patterns on social media. Such studies were previously limited by the cost and feasibility of large-scale surveys; however, language-based computational models applied to large social media datasets now allow for large-scale personality assessment. Method: We applied a...
Article
Ubiquitous facial recognition technology can expose individuals’ political orientation, as faces of liberals and conservatives consistently differ. A facial recognition algorithm was applied to naturalistic images of 1,085,795 individuals to predict their political orientation by comparing their similarity to faces of liberal and conservative other...
Article
The widely disseminated convergence in physical appearance hypothesis posits that long-term partners’ facial appearance converges with time due to their shared environment, emotional mimicry, and synchronized activities. Although plausible, this hypothesis is incompatible with empirical findings pertaining to a wide range of other traits—such as pe...
Article
Powered by better hardware and software, and fueled by the emergence of computational social science, digital traces of human activity can be used to make highly personal inferences about their owner’s preferences, habits and psychological characteristics. The gained insights allow the application of psychological targeting and make it possible to...
Article
Psychological targeting describes the practice of extracting people's psychological profiles from their digital footprints (e.g. their Facebook Likes, Tweets or credit card records) in order to influence their attitudes, emotions or behaviors through psychologically informed interventions at scale. We discuss how the increasingly blurred lines betw...
Article
The parasite stress hypothesis predicts that individuals living in regions with higher infectious disease rates will show lower openness, agreeableness, and extraversion, but higher conscientiousness. This article, using data from more than 250,000 U.S. Facebook users, reports tests of these predictions at the level of both U.S. states and individu...
Data
Predictive performance on social media based tasks for factors with residualization of age and gender. We show mean Pearson’s R over 10 random train-test splits for FriendSize, Income and IQ while for Likes we show the mean area under the curve (AUC) over all 20 categories. Language based factors (FA) perform competitively and even outperform quest...
Data
Predictive performance on questionnaire based tasks for factors without residualization of age and gender for 10 and 30 factors. For comparison, with the questionnaire items, we calculate the 10 aspect scores and 30 facet based scores, using the relevant IPIP items. Demog indicates that age and gender were also added as co-variates to learn predict...
Data
Predictive performance as a function of vocabulary size. We show mean Pearson’s R over 10 random train-test splits for FriendSize and IQ, while for Likes we show the mean area under the curve (AUC) over all 20 categories. In particular, we learn factors by restricting the vocabulary size to the top K words and evaluate these learned factors on thei...
Data
Predictive performance on questionnaire based tasks for factors with residualization of age and gender. Demog indicates that age and gender were also added as co-variates to learn predictive models. We show mean Pearson’s R over 10 random train-test splits. Language based factors (FA) do not outperform questionnaire based factors. (JPG)
Data
Predictive performance on social media tasks for factors without residualization of age and gender for 10 and 30 factors. For comparison, with the questionnaire items, we calculate the 10 aspect scores and 30 facet based scores, using the relevant IPIP items. Demog indicates that age and gender were also added as co-variates to learn predictive mod...
Data
Predictive performance on social media tasks for factors with residualization of age and gender for 10 and 30 factors. Demog indicates that age and gender were also added as co-variates to learn predictive models. We show mean Pearson’s R over 10 random train-test splits for FriendSize, Income and IQ while for Likes we show the mean area under the...
Data
Word clouds showing the most/least correlated words for each FA factor as obtained using differential language analysis with age and gender residualized. Residualizing out demographics like age and gender appears to reveal other dimensions of variance, such as geography and ethnicity, as illustrated by F5 that reveals a factor highlighting language use o...
Data
Predictive performance on questionnaire based tasks for factors with residualization of age and gender for 10 and 30 factors. Demog indicates that age and gender were also added as co-variates to learn predictive models. We show mean Pearson’s R over 10 random train-test splits. Language based factors (FA) do not outperform questionnaire bas...
Article
Background: Research suggests that humans have the tendency to increase the valence of events when these are imagined to happen in the future, but to decrease the valence when the same events are imagined to happen in the past. This line of research, however, has mostly been conducted by asking participants to value imagined, yet probable, events....
Article
Research over the past decade has shown that various personality traits are communicated through musical preferences. One limitation of that research is external validity, as most studies have assessed individual differences in musical preferences using self-reports of music-genre preferences. Are personality traits communicated through behavioral...
Article
This study empirically examines context collapse on Facebook by examining audience influences on content and language in self-disclosures. Context collapse is the process of disparate audiences being conjoined into one. Using a public longitudinal behavioral data set of 6,378 Facebook users, the study found that the size and heterogeneity of people...
Article
Elderly people are exposed to information technologies to keep them in touch with younger generations. Among various technologies, social network sites (SNSs) are seldom used by the majority of elderly people. To bridge the digital divide, it is necessary to dig deeply into the minority elderly users of SNSs. This study explores usage patterns of e...
Article
We show that faces contain much more information about sexual orientation than can be perceived or interpreted by the human brain. We used deep neural networks to extract features from 35,326 facial images. These features were entered into a logistic regression aimed at classifying sexual orientation. Given a single facial image, a classifier could...
Article
For the past 40 years, the conventional univariate model of self-monitoring has reigned as the dominant interpretative paradigm in the literature. However, recent findings associated with an alternative bivariate model challenge the conventional paradigm. In this study, item response theory is used to develop measures of the bivariate model of acqu...
Data
Variable importance graph. The graph shows the top 50 important topics in the random forest model. (TIF)
Data
Topic and topic words for Facebook activities and the ratings from two raters. The list of words relating to activities used for activity sentiment analysis. (DOCX)
Article
Subjective well-being includes ‘affect’ and ‘satisfaction with life’ (SWL). This study proposes a unified approach to construct a profile of subjective well-being based on social media language in Facebook status updates. We apply sentiment analysis to generate users’ affect scores, and train a random forest model to predict SWL using affect scores...
Data
Investigation of 30-day period immediately prior to SWL survey completion. Here we repeat the SWL pipeline for a limited time period (30 days) for each user. We also study the impact of two different thresholds for minimum number of status updates per user. (DOCX)
Data
Variable importance table, ranked in descending order. The variables are ranked in descending order according to the mean decrease in accuracy. (DOCX)
Article
A growing number of studies have linked facial width-to-height ratio (fWHR) with various antisocial or violent behavioral tendencies. However, those studies have predominantly been laboratory based and low powered. This work reexamined the links between fWHR and behavioral tendencies in a large sample of 137,163 participants. Behavioral tendencies...
Conference Paper
People spend considerable effort managing the impressions they give others. Social psychologists have shown that people manage these impressions differently depending upon their personality. Facebook and other social media provide a new forum for this fundamental process; hence, understanding people's behaviour on social media could provide interes...
Preprint
We show that faces contain much more information about sexual orientation than can be perceived and interpreted by the human brain. We used deep neural networks to extract features from 35,326 facial images. These features were entered into a logistic regression aimed at classifying sexual orientation. Given a single facial image, a classifier coul...
Article
Religious affiliation is an important identifying characteristic for many individuals and relates to numerous life outcomes including health, well-being, policy positions, and cognitive style. Using methods from computational linguistics, we examined language from 12,815 Facebook users in the United States and United Kingdom who indicated their rel...
Article
People spend considerable effort managing the impressions they give others. Social psychologists have shown that people manage these impressions differently depending upon their personality. Facebook and other social media provide a new forum for this fundamental process; hence, understanding people's behaviour on social media could provide interes...
Preprint
As participant recruitment and data collection over the Internet have become more common, numerous observers have expressed concern regarding the validity of research conducted in this fashion. One growing method of conducting research over the Internet involves recruiting participants and administering questionnaires over Facebook, the world’s lar...
Article
Over the past century, personality theory and research has successfully identified core sets of characteristics that consistently describe and explain fundamental differences in the way people think, feel and behave. Such characteristics were derived through theory, dictionary analyses, and survey research using explicit self-reports. The availabil...
Conference Paper
Research has typically examined the link of activity patterns and affect among late middle-aged and older people, in the context of continuity and activity theory. The aim of this present research was to test continuity and activity theory among younger employed age (25-54), and late middle-age and older age (over 55 years of age) in the online cont...
Article
This article aims to fill some gaps in theory and research on age trends in musical preferences in adulthood by presenting a conceptual model that describes three classes of determinants that can affect those trends. The Music Preferences in Adulthood Model (MPAM) posits that some psychological determinants that are extrinsic to the music (individu...
Preprint
Do others perceive the personality changes that take place between the ages of 14 and 29 in a similar fashion as the aging person him- or herself? This cross-sectional study analyzed age trajectories in self- versus other-reported Big Five personality traits and in self-other agreement in a sample of more than 10,000 individuals from the myPersonal...
Article
There are two conflicting perspectives regarding the relationship between profanity and dishonesty. These two forms of norm-violating behavior share common causes and are often considered to be positively related. On the other hand, however, profanity is often used to express one’s genuine feelings and could therefore be negatively related to disho...
Article
Friends and spouses tend to be similar in a broad range of characteristics, such as age, educational level, race, religion, attitudes, and general intelligence. Surprisingly, little evidence has been found for similarity in personality, one of the most fundamental psychological constructs. We argue that the lack of evidence for personality similarit...
Article
This article aims to introduce the reader to essential tools that can be used to obtain insights and build predictive models using large data sets. Recent user proliferation in the digital environment has led to the emergence of large samples containing a wealth of traces of human behaviors, communication, and social interactions. Such samples offe...
Article
Delay discounting has been linked to important behavioral, health, and social outcomes, including academic achievement, social functioning and substance use, but thoroughly measuring delay discounting is tedious and time consuming. We develop and consistently validate an efficient and psychometrically sound computer adaptive measure of discounting....
Article
Social networking sites are a part of everyday life for over a billion people worldwide. They show no sign of declining popularity, with social media use increasing at 3 times the rate of other Internet use. Despite this proliferation, mental healthcare has yet to embrace this unprecedented resource. We argue that social networking site data should...
Article
Over the past decade, online social media has had a tremendous impact on the way people engage in social activism. For instance, about 26M Facebook users expressed their support in upholding the cause of marriage equality by overlaying their profile pictures with rainbow-colored filters. Similarly, hundreds of thousands of users changed their profi...
Conference Paper
Humans’ episodic memory system allows the remembrance of past events and the creation of hypothetical (im)probable events. Most, if not all, studies examining how past and future events are valued have been conducted using narrowly defined conditions, for example by focusing exclusively on positive or negative fictive events. In reality, however, the...
Chapter
In this chapter, we introduce and discuss some of the most important and widely used models of personality. Focusing on trait theories, we first give a brief overview of the history of personality research and assessment. We then move on to discuss some of the most prominent trait models of the twentieth century, including Allport’s trait theory, C...
Article
A variety of approaches have been recently proposed to automatically infer users’ personality from their user generated content in social media. Approaches differ in terms of the machine learning algorithms and the feature sets used, type of utilized footprint, and the social media environment used to collect the data. In this paper, we perform a c...
Article
Using a large social media dataset and open-vocabulary methods from computational linguistics, we explored differences in language use across gender, affiliation, and assertiveness. In Study 1, we analyzed topics (groups of semantically similar words) across 10 million messages from over 52,000 Facebook users. Most language differed little across g...
Article
Research suggests that musical preferences are linked to personality, but this research has been hindered by genre-based theories and methods. We address this limitation using a novel method based on the actual attributes that people perceive from music. In Study 1, using 102 musical pieces representing 26 genres and subgenres, we show that 38 perc...
Data
(A) Questionnaire Total Scores in Experiments 1 and 2. a For the purposes of comparison across studies using the EAT, responses on the 6-point scale are converted as follows: (1:0, 2:0, 3:0, 4:1, 5:2, 6:3) and totaled to produce the mean reported above (Everitt and Robbins, 2005). For analysis purposes (including correlations reported above), howev...
Data
Significant Predictors of Goal-Directed (Model-Based) Learning from Supervised Analysis. Features that are significantly associated with model-based learning identified using elastic net regularization with tenfold cross-validation, observed in >95% of 100 iterations tested. Index refers to the item number from the questionnaire of origin. Beta refe...
Data
(A) Top loading items on ‘Anxious-Depression’ factor. Summary of item loadings onto ‘Anxious-Depression’ factor. The top loading items from each questionnaire are displayed in descending order, provided they are above a threshold loading of +/- 0.25. Words in parentheses, e.g. “(do not)” are added here (but were not presented to participants) to fa...
Article
Prominent theories suggest that compulsive behaviors, characteristic of obsessive-compulsive disorder and addiction, are driven by shared deficits in goal-directed control, which confers vulnerability for developing rigid habits. However, recent studies have shown that deficient goal-directed control accompanies several disorders, including those w...
Data
The relationship between model-based learning and Factor 2 is broadly consistent across ‘putative patients’ (top 25%) and subjects scoring in the normal range (bottom 75%). Plotted here are regression lines indicating the strength of the relationship between model-based deficits and Factor 2 (‘Compulsive Behavior and Intrusive Thought’) in putative...
Data
(A) Association between questionnaire total scores and model-based learning defined using full computational model. *p<0.05 ** p<0.01 ***p<0.001. Each row reflects the results from an independent analysis where each questionnaire total score (z-transformed) was entered as SymptomScorez in the following model: lm(Model-Based-Learning ~ IQz + Agez +...
Article
Objective: Prior attempts at locating self-monitoring within general taxonomies of personality traits have largely proved unsuccessful. However, past research has typically neglected 1) the bi-dimensionality of the Self-Monitoring Scale, and 2) the hierarchical nature of personality. The objective of this study was to test hypothe...
Article
We present the task of predicting individual well-being, as measured by a life satisfaction scale, through the language people use on social media. Well-being, which encompasses much more than emotion and mood, is linked with good mental and physical health. The ability to quickly and accurately assess it can supplement multi-million dollar nationa...
Article
Objective: Temporal orientation refers to individual differences in the relative emphasis one places on the past, present, or future, and is related to academic, financial, and health outcomes. We propose and evaluate a method for automatically measuring temporal orientation through language expressed on social media. Method: Judges rated the te...
Conference Paper
We explore the factors that determine whether individuals are likely to experience intrinsic motivation in end-user programming (EUP). We report two experiments: one that tests whether there are reliable psychometric constructs that describe different aspects of intrinsic motivation, and one that tests whether these constructs are successful in pre...
Article
Facebook is rapidly gaining recognition as a powerful research tool for the social sciences. It constitutes a large and diverse pool of participants, who can be selectively recruited for both online and offline studies. Additionally, it facilitates data collection by storing detailed records of its users' demographic profiles, social interactions,...
Article
Nowadays, millions of people around the world use social networking sites to express everyday thoughts and feelings. Many researchers have tried to make use of social media to study users' online behaviors and psychological states. However, previous studies show mixed results about whether self-generated contents on Facebook reflect users' subjecti...