Lexical, Morphological and Semantic Correlates of the Dark Triad Personality Traits in Russian Facebook Texts

Abstract

The presented project is intended to make use of growing amounts of textual data in social networks in the Russian language, in order to find linguistic correlates of the Dark Triad personality traits, comprising non-clinical Narcissism, Machiavellianism and Psychopathy. The background for the investigation includes, on the one hand, psychological research on these phenomena and their measurement instruments, and on the other hand, recent advances in computational stylometry and text-based author profiling. The measures for these psychological phenomena are provided by recognized self-report psychological surveys adapted to Russian. Morphological and semantic analysis are applied to investigate the relationship between the Dark traits and their linguistic manifestation in social network texts. Significant morphological and semantic correlates of Narcissism, Machiavellianism and Psychopathy are identified and compared to respective advances in English author profiling. In order to deepen our understanding of the relation between these psychological characteristics and natural language use, the identified linguistic features are interpreted in terms of the fine-grained factor structure of the Dark traits. Identifying correlated features is a step towards automatic Dark trait prediction and early detection of the potentially harmful mental states.
Polina Panicheva, Yanina Ledovaya
St.Petersburg State University
St. Petersburg, Russia
ppolin86@gmail.com, y.ledovaya@spbu.ru
Olga Bogolyubova
Department of Psychology,
Clarkson University
obogolyu@clarkson.edu
I. INTRODUCTION
The Internet provides a vast amount of data, including data on
verbal behaviour of individuals and groups of users. Data on
verbal and social network usage patterns can provide insight
into numerous sociological and psychological characteristics
[1]. Text mining can assist in uncovering the potential of the
online verbal data, with the latest works in the field describing
psychological profiling in a multilingual setting [2].
The empirical study in question is part of a larger research
project that explores the relations among online and
offline stressful experience, psychological well-being, negative
personality traits and the language a person uses in online
communication¹. To measure the negatively oriented personality
traits, and thus the general likelihood of a person’s misbehavior,
two questionnaires were chosen: the Short Dark Triad scale
[3], [4] and the Propensity for Moral Disengagement scale [5].
These two scales have already been used to study the predictors
of unethical behavior in English [6]. Moreover, the three traits
(Narcissism, Machiavellianism and Psychopathy), which share a
lack of empathy as their core characteristic, are considered very
fruitful in the study of malevolence, which is the primary goal
of the larger research project [7], [8]. The negatively marked
personality characteristics have been successfully studied in
English-language data [9], [10]. In addition, the Short Dark Triad
scale has recently been translated and adapted into Russian
[11].
¹ "A cross-cultural study of the markers of stress, health and well-being
in social networks", research grant of St. Petersburg State University
#8.38.351.2015.
We present a linguistic approach to the investigation of the
Dark Triad personality traits using natural language processing.
We have launched a Facebook app which gathers textual data
and asks the participants to fill in a survey, thus annotating texts
with levels of Narcissism, Machiavellianism and Psychopathy
of the authors provided by the survey results. We propose
a number of morphological and semantic features, identify
significant correlates of the Dark traits among them, provide
interpretation of the results and relate them to similar findings
in different languages.
The paper is organized as follows: Section II contains
an overview of similar approaches to author profiling and
the Dark Triad research; Section III describes our dataset;
Section IV is a description of the statistical analysis procedure;
Sections V and VI contain the results obtained, their detailed
interpretation, and overall conclusions.
II. RELATED WORK
A. Linguistic Inquiry and Word Count approach
A widely used approach, Linguistic Inquiry and
Word Count (LIWC) [12], has been developed for English and
other languages [13], [14]. The main idea of the approach
is that words are grouped into psychologically meaningful
categories, and the counts of these categories in a person’s
text can be interpreted in psychological terms and linked
to their psychological profile. Lexical items are first divided
into function words and content words: the former are
considered meta-behavioural information on how the author
thinks and communicates, while the latter provide information
on the topics of concern in their texts [15]. Content
and function words are manually grouped into predefined
top-down categories, which are counted in texts and used
for author profiling. Content categories include psychological
processes (social, cognitive, biological, affective), positive
and negative sentiment, and topics of personal concern (work,
leisure, home, money); function words include auxiliary parts
of speech such as pronouns, quantifiers and articles, and a number of
verb categories (tense, person, number); the total number of
categories reaches 80.
B. Russian author profiling
There has been a significant body of work on the LIWC
approach to word psychometrics, including Eastern [13] and
Slavic [14] languages. LIWC for Russian [16] has been developed,
but has not undergone specific validation by its authors.
In Russian, work on linguistic author
profiling has mostly been confined to the clinical scope of
mental disorders and a manual descriptive framework, with
a focus on interpretative diagnostic power [17], [18]. A recent
exception is a study involving written text samples by 500
Russian participants [19]. The Russian version of LIWC was
applied in that work: it was tested for author-specific word
count stability and applied to author profiling with gender and
Big Five personality characteristics [19], without involving social
media texts.
C. Russian LIWC approach revisited
Despite the altogether relevant and fruitful LIWC approach
giving indispensable insights from different languages, there
is an issue with the LIWC approach in Russian: it has been
developed as a direct translation from English, preserving
English-specific category and algorithm structure.
First of all, the division of features into function and content
words, while basically reflecting the state of affairs in English,
can be misleading for Russian: functional categories are largely
represented in Russian by morphological properties of content
words, e.g. the number and person categories of verbs. Second, the
LIWC algorithm is word-base oriented: a word is assigned a
certain category if its base matches a dictionary word. Such
an approach is reasonable for languages with low syntheticity,
where a word base often equals the word. Russian lies
at the opposite end of the spectrum, with high syntheticity and
high fusion: words are formed using a large
number of affixes, and these are often fused, making word form
and meaning non-additive (see [20] and later developments,
e.g. [21], [22]). As a result of high syntheticity and fusion,
a single lexical or grammatical meaning is not represented
by a single word base as often as in English. A word-base
dictionary approach would thus fail to account for a lot of
functional and content phenomena. It is necessary to introduce
an additional feature category containing morphological and
some lexical features in order to represent the phenomena
which cannot be accounted for in a simple dictionary structure.
We apply a bottom-up approach to content and functional
categories: the former are automatically bootstrapped from
corpus data using distributional semantic techniques; the latter
are based on the vast morphological information in Russian.
The outlined bottom-up approach makes it possible to omit the time- and
resource-consuming procedure of manual classification, while
retaining the Russian-specific semantic and morphological
category structure.
D. Facebook language data
A similar data-driven approach to word, phrase and content
topic features correlated with age, gender and the Big Five
personality traits of Facebook users is presented in [23]. The current
work shares the consideration that an open-vocabulary
approach to lexical features could be more revealing than a
preset top-down category list. We also follow [23] in generating
content categories automatically to include most of the
vocabulary. However, there are a number of significant differences: first,
analysing texts in Russian requires scrupulous normalization
or morphological analysis. Second, in the current work content
categories are generated based on large balanced Russian
corpora, and not on the social network data obtained, as the
latter dataset is too small and too specific to represent general
semantic categories.
There has been a considerable amount of work done using
Facebook data in Russian, where a large amount of data
(3M+ users, 550M+ posts) was gathered from Facebook and
applied to the primarily linguistic tasks of Sentiment Analysis
and semi-automatic identification of neologisms [24], [25].
However, it is important to note the difference of our work
in terms of research goals and respective data domains: while
covering a much smaller number of authors, we are concerned
with very fine-grained data containing personal psychological
questionnaire results. The latter data is considerably more
complicated to obtain both technically and from the point of
view of ethical issues, which currently prevents the resulting
corpus from being freely distributed as open access data.
To conclude, the current work presents the first attempt to
gather and explore psychological and linguistic data in Russian
Facebook. To our knowledge, it is the first attempt at using
word-embedding semantic models to generate meaningful
categories for author profiling. It is also the first work to
approach semantic and morphological correlates of the Dark
traits in the Russian language.
E. The Dark Triad personality traits
Our analysis is focused on the Dark Triad personal charac-
teristics. These are related but distinct sub-clinical categories,
where Narcissism is primarily associated with self-focus and
grandiosity, Psychopathy with impulsiveness, aggression and
asocial behaviour, and Machiavellianism with manipulating.
Lack of empathy is reported to be the common feature of the
Dark Triad [4]. The following factor structure is ascribed to
the Dark traits:
Narcissism is described as a combination of Exploita-
tiveness/Entitlement and Leadership/Authority [4].
Machiavellianism is a multi-dimensional but contro-
versial construction; the factors accepted in [4] are
Machiavellian Tactics and Cynical Worldview.
Psychopathy is reported to incorporate Manipulation,
Callous Affect, Erratic Lifestyle, Antisocial Behaviour
[4].
The Dark Triad questionnaire has recently been adapted to
Russian and has undergone language-specific validation [11].
Linguistic correlates of Psychopathy have been effectively
analysed on English-language data in a clinical context [10].
That study involved crime narratives of around 2,500 words each by
14 psychopaths and 52 controls. The authors of [26] describe a
LIWC-based analysis of Twitter texts by 5,700 control group
participants and 150-450 individuals labelled with a diagnosis,
each participant being associated with 25 to 3,200 tweets. An
important contribution of this work is interpretable analytics
combining the users’ verbal and non-verbal online behaviour.
A study of the Dark Triad correlates in English has
been performed with 2,500 authors of Twitter texts [9]. The
positive content correlates included anger, swear and negative
emotion words for Psychopathy and Machiavellianism, sex-
related words for Narcissism. Narcissism was also positively
correlated with function symbols (@, #) used in Twitter to
denote relations between users and between users and topics,
thus modelling a meta-category of social relation symbols in
Twitter. Negative correlates included positive emotion and ’we’
for Machiavellianism.
III. DATA COLLECTION
A total of 1,972 Facebook users participated in the study by
completing the Dark Triad Scale [11] and providing consent to share
their publicly available posts. This questionnaire is a validated
self-report measure of the so-called dark personality traits:
Narcissism, Machiavellianism and subclinical Psychopathy.
Each trait is assessed by 9 questions about the users’ attitude
and behaviour towards themselves and others. The answers are
situated on a 5-point Likert scale representing the degree of
agreement or disagreement with the statements in question.
The total score for each subscale is divided by 9 (the number
of questions) resulting in a score range for each scale from 1
to 5.
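The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `score_subscale` is hypothetical, and the sketch assumes that any reverse-keyed items have already been recoded before averaging, which the text does not specify.

```python
ITEMS_PER_TRAIT = 9  # each trait is assessed by 9 questions


def score_subscale(answers):
    """Average nine 5-point Likert answers into a subscale score in [1, 5].

    Assumption: reverse-keyed items, if any, are already recoded.
    """
    if len(answers) != ITEMS_PER_TRAIT:
        raise ValueError("expected exactly 9 answers per subscale")
    if any(a not in range(1, 6) for a in answers):
        raise ValueError("each answer must be an integer from 1 to 5")
    # Total score divided by the number of questions
    return sum(answers) / ITEMS_PER_TRAIT


# Hypothetical answers of one participant to one subscale
example_score = score_subscale([3, 4, 5, 3, 4, 5, 3, 4, 5])
```

Dividing by the number of items keeps all three subscales on the same 1 to 5 range as the individual answers, which makes the trait scores directly comparable.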
The application with the questionnaire was advertised
on Facebook². The public posts were gathered, with text
quoted or written by the users themselves; repost information
is out of scope of the current work. The obtained dataset
consists of 7.67 posts on average for each participant (standard
deviation = 5.69). This is on average 24.77 sentences (std =
38.13) or 311.99 tokens (std = 565.56) per participant. The
volume of posts by each author was technically restricted by
the Facebook API³, which only allowed a fixed number
of the latest posts to be downloaded; only the posts containing
personal comments by the participant were included in the
study.
The volume of text by each author is modest compared
to previous studies; however, the numbers of authors are
comparable [26], [9]. Preliminary experiments on automatic
classification and regression have confirmed that a larger
dataset is necessary for significant Machine Learning results,
which we plan to obtain in future work. On the other
hand, the number of Facebook users in the study makes it possible to
draw exploratory conclusions based on statistical analysis,
allowing for fruitful psychological interpretation.
Text volume by each author in tokens and sentences is
significantly negatively correlated with Machiavellianism
(p<0.01) and positively with Narcissism (p<0.05). Text volume
characteristics cannot be directly interpreted, as they depend on
the technical restriction on the number of downloadable posts;
however, they will affect the interpretation of further results.
² https://apps.facebook.com/psytest/
³ https://developers.facebook.com/docs/javascript/reference/v2.6
IV. STATISTICAL ANALYSIS
All the data have been processed with the PyMorphy2
morphological package [27] using the default morphological
disambiguation option, unigram statistics. We apply Spearman’s
correlation coefficient, as we are primarily interested in finding
monotonic relationships between linguistic items and the Dark
Triad measures, not limited to the linear correlation identified
by Pearson’s r.
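To illustrate the difference between the two coefficients, the following minimal sketch computes Spearman's coefficient as Pearson's r applied to ranks, with tied values receiving their average rank. The function names and the toy data are hypothetical and are not taken from the study's code; in practice a statistics library would normally be used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: y grows monotonically but not linearly with x
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman(x, y)      # equals 1 up to rounding: perfectly monotonic
r_linear = pearson(x, y)  # below 1: the relationship is not linear
```

A perfectly monotonic but non-linear relationship thus yields a Spearman coefficient of 1 while Pearson's r stays below 1, which is the property motivating the choice above.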
A. Statistical significance correction
An important issue addressed in this work is the
significance of the statistical results. Spearman’s correlation yields
r, the correlation value ranging from -1 to 1, with 0 indicating
no correlation, and p, the probability that an r value at least
this extreme would be obtained by chance in a random
sample with no underlying correlation. It has been shown in previous
linguistic profiling
works with different sample sizes [9], [23] that the values of r
do not usually exceed 0.2 in absolute value. This supports the
consideration that word categories tend to be sparse in
multi-purpose social network texts, so that very high correlation values
between word counts and personal characteristics are
hardly to be expected.
However, there is another strong correlation significance
filter which applies to the p-value. It is a well-known issue that
in multiple-hypothesis testing the p-values must be adjusted
[28]. The intuition behind multiple-testing correction is that
when evaluating correlation with a large number of features,
a small portion of random features obtain statistically
significant correlation by chance. In order to eliminate the random
effects of numerous hypotheses, various statistical filters have been
suggested. For example, the Bonferroni correction procedure requires
the resulting p-values in multiple-hypothesis testing to be
multiplied by the number of hypotheses, thus allowing a much
lower number of p-values to pass the level of p<0.01 or p<0.05
[23].
A surprising number of state-of-the-art works in author
profiling do not mention applying multiple-hypothesis
correction procedures ([12], [9], [13], [19]). We find it necessary to
apply a filtering procedure to our results, as the number of
lexical features exceeds 19K, and content and morphological
features reach 184 and 64 respectively. As the Bonferroni
correction is reported to be too stringent, causing a portion
of genuine correlations to be rejected [23], we apply the
Benjamini-Hochberg false discovery rate (FDR) procedure for
multiple hypothesis testing
[28]. This makes it possible to control for statistically significant results
in the current setting of a modest dataset size with a large
number of correlated features.
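The two correction procedures discussed in this subsection can be contrasted with a short sketch. It follows the standard textbook definitions of the Bonferroni and Benjamini-Hochberg procedures; the function names and the p-values are illustrative assumptions and do not reproduce the study's data or code.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis when its p-value times the number of tests is <= alpha."""
    m = len(p_values)
    return [p * m <= alpha for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate.

    Returns a list of booleans in the original order: True where the
    hypothesis is rejected (the correlation is kept as significant).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= k / m * alpha
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis whose rank does not exceed the cutoff
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


# Hypothetical uncorrected p-values for ten features
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
kept_bonferroni = sum(bonferroni(p_values))  # 1 feature survives
kept_fdr = sum(benjamini_hochberg(p_values))  # 2 features survive
```

On this toy input the FDR procedure retains one feature more than Bonferroni, which illustrates why the less stringent filter is preferred when the number of tested features is large.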
B. Numeric characteristics
Spearman’s correlation (r(1,972)) was applied to
self-reported Dark Triad measures and text-length, lexical and
morphological features. Average sentence length and post length in
sentences and tokens reveal some significant correlations with
the Dark Triad measures, see Table I: significant correlations
are highlighted in italics (p<0.05) and bold (p<0.01).
Table I. SPEARMAN’S CORRELATIONS BETWEEN AVERAGE POST
LENGTH FEATURES AND PERSONALITY SCORES
(Na: Narcissism, Ma: Machiavellianism, Ps: Psychopathy)
Text feature             Na      Ma       Ps
Sentence length          0.022   -0.057   0.006
Post length, sentences   0.054   -0.109   -0.04
Post length, tokens      0.045   -0.101   -0.02
C. Lexical features
Normal forms of words constitute lexical features. Out
of 19K lexemes occurring in the texts there are around 150
correlated words for every Dark trait; these are exemplified
in Table II. However, the FDR correction procedure rejects
their significance based on the current dataset.
D. Morphological features
Morphological features are based on the PyMorphy
tagset [27]; the list of tags is available at
http://opencorpora.org/dict.php?act=gram. Along the lines of
LIWC parameters [15], the features include the following:
• All parts of speech;
    ◦ auxiliary parts of speech (preposition, conjunction,
      particle, interjection) are also grouped together;
• Person and number, standalone and grouped with POS;
• Verb modality features:
    ◦ voice: active, passive;
    ◦ mood: indicative, imperative;
    ◦ tense: present, past;
    ◦ reflexivity;
• Named entity features:
    ◦ name, surname, patronymic;
    ◦ organization, trademark, geographical location,
      abbreviation;
• Adjective features:
    ◦ short, full;
    ◦ qualitative;
    ◦ superlative;
• Possessive pronouns;
• Style characteristics:
    ◦ vernacular, slang words.
Morphological correlates are presented in Table III.
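As an illustration of how such features could be turned into per-author numbers, the sketch below counts the share of an author's tokens whose grammemes match a feature definition. It assumes that every token has already been analysed into a set of OpenCorpora-style grammemes (for instance by a morphological analyser); the feature subset, the function name and the normalisation by token count are assumptions made for this sketch, not the procedure confirmed by the paper.

```python
# Hypothetical subset of feature definitions: feature name -> grammemes
# that must all be present on a token (OpenCorpora-style tags assumed)
FEATURES = {
    "Verb_imperative": {"VERB", "impr"},
    "Pronoun_1person_plural": {"NPRO", "1per", "plur"},
    "1person_plural": {"1per", "plur"},
    "Patronymic": {"Patr"},
}


def feature_rates(tagged_tokens, features=FEATURES):
    """Return the share of tokens matching each morphological feature.

    tagged_tokens: list of grammeme sets, one per token of a single author.
    """
    total = len(tagged_tokens)
    if total == 0:
        return {name: 0.0 for name in features}
    counts = {name: 0 for name in features}
    for grammemes in tagged_tokens:
        for name, required in features.items():
            # A token matches when it carries every required grammeme
            if required <= grammemes:
                counts[name] += 1
    return {name: counts[name] / total for name in features}


# Hypothetical analysed tokens of one author
tokens = [
    {"NPRO", "1per", "plur"},
    {"VERB", "1per", "plur", "indc"},
    {"NOUN", "sing"},
    {"VERB", "impr", "2per", "plur"},
]
rates = feature_rates(tokens)
```

Defining each feature as a set of required grammemes covers both the standalone categories (such as person and number) and the categories grouped with a part of speech.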
E. Generalized content features
In order to obtain semantically interpretable features and
reduce the number of testing hypotheses, we apply clustering
based on word-embeddings semantic modelling.
To reduce the clustering evaluation effort and leave out obscure
and rare items, we only cluster the words which occur in
at least 10 authors’ texts. Thus we obtain a set of 3,700
lexical items to be clustered. We use a Skip-Gram Word2Vec
model trained with the Russian National Corpus data. We
intentionally apply RNC and not a web-trained model, as the
goal is to capture established semantic regularities interpretable
in terms of general semantic categories, while describing web
language peculiarities is a different task.
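The vocabulary filter described above amounts to a threshold on the number of distinct authors using a word. A minimal sketch follows; the function name and the toy input are hypothetical, and the lemmas are assumed to be normalised already.

```python
from collections import Counter


def frequent_lemmas(lemmas_by_author, min_authors=10):
    """Keep the lemmas that occur in the texts of at least min_authors authors.

    lemmas_by_author: mapping from an author id to the list of lemmas
    (normal forms) found in that author's posts.
    """
    author_counts = Counter()
    for lemmas in lemmas_by_author.values():
        # set() ensures each author is counted once per lemma
        author_counts.update(set(lemmas))
    return {lemma for lemma, n in author_counts.items() if n >= min_authors}


# Toy example with a lowered threshold of 2 authors
toy = {
    "author_1": ["friend", "friend", "money"],
    "author_2": ["friend", "word"],
    "author_3": ["word"],
}
kept = frequent_lemmas(toy, min_authors=2)
```

Counting authors rather than raw occurrences prevents a single prolific participant from pushing an idiosyncratic word into the clustered vocabulary.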
The clustering techniques applied in this task have been
compared in [29]. The optimal algorithm for the current
task is K-means with Euclidean distance, yielding the most
homogeneous and precise clusters. The optimal cluster size
for manual labelling, evaluation and interpretation of
the current data is about 20 words per cluster, i.e. 184 clusters. Other
clustering algorithms and parameters have been applied in
preliminary experiments; although they result in various cluster sizes and
slightly different cluster contents, the different algorithms leave
the basic significant topics unchanged.
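For readers unfamiliar with the algorithm, a bare-bones version of K-means with Euclidean distance (Lloyd's iterations) is sketched below on toy two-dimensional points. It is not the implementation used in the study: in the setting described above the inputs would be Word2Vec vectors of about 3,700 lexical items with k = 184, and the evenly spaced choice of initial centroids is an assumption made only to keep the sketch deterministic (practical implementations typically use random or k-means++ initialisation).

```python
from math import dist  # Euclidean distance, available since Python 3.8


def kmeans(vectors, k, max_iterations=100):
    """Cluster vectors into k groups with Lloyd's algorithm.

    Returns (labels, centroids), where labels[i] is the cluster index of
    vectors[i].
    """
    n = len(vectors)
    # Deterministic initialisation for illustration only
    centroids = [list(vectors[i * n // k]) for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each vector goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy "embeddings": two well-separated groups of points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
```

With 3,700 items and a target of about 20 words per cluster, the number of clusters follows as roughly 3,700 / 20, i.e. the 184 clusters reported above.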
The clusters have been manually labelled with a concept
comprising their members. Function words, numerals and
unknown words are out of scope of the semantic model and
out of the clusters.
The generalized semantic correlates are presented in Table
IV. Table V (see Appendix) lists the contents of the
significantly correlated clusters.
V. RESULTS AND INTERPRETATION
A. General considerations
Russian-speaking Facebook users with higher scores on the
scale of Machiavellianism are less likely to produce posts; their
posts and sentences are shorter in comparison
with users characterized by lower scores of Machiavellianism.
The posts of Facebook users scoring high on
Narcissism are significantly longer than those of users scoring
low on Narcissism (see Table I). This is consistent with psychological
models of Machiavellianism and Narcissism, wherein individuals
with prominent Machiavellian traits would be more likely
to closely guard their personal image and avoid oversharing as
a result of careful strategic planning [30], [4], and individuals
with high Narcissism would tend to be more ’ego-promoting’,
i.e. expansive and needing to attract attention to their actions
and personality [31], [4].
Because of the modest amount of textual data, lexical
correlates of the Dark traits and morphological correlates
of Narcissism and Psychopathy have been filtered out by
the statistical significance correction procedure. However, by
presenting a number of significant semantic correlates for
every Dark trait we have shown that semantic clustering
is a useful, intuitive and interpretable approach to reducing
feature dimensionality in case of a limited dataset, high lexical
diversity and sparsity.
B. Narcissism
High Narcissism values are characterized by the following
correlated items:
1) Social involvement and communication importance
is present in clusters denoting social interaction, such as
Appeal, Take_give. These features replicate the findings
reported in [9], where high Narcissism was
characterized by social involvement in terms of high
correlations with the ’Friends’ category and with
punctuation symbols (@, #) denoting social interaction in
Twitter.
2) Importance of self-image and status is present in
the High_low word cluster.
3) Goal-focus, achievement, competency are stressed by
the Reasoning cluster.
With regard to the accepted factor structure of Narcissism
[4], goal focus applies to Leadership/Authority, the status
features are related to Exploitativeness/Entitlement, while social
involvement can be related to both Leadership and
Exploitativeness.
Table II. LEXICAL CORRELATES OF THE DARK PERSONALITY TRAITS
Narcissism, positive (129 words)
  Russian: масса, мой, одновременно, сокращение, важный, целый,
    слово, благодарность, решение, смс, я, президент, собственный,
    спасибо, дурной, править, забытый, ...
  English translation: mass, my, simultaneously, reduction, important,
    whole, word, gratitude, decision, sms, I, president, own,
    thank you, silly, rule, forgotten, ...
Narcissism, negative (10 words)
  Russian: дохлый, посвящаться, доставать, подмосковье, снизу,
    мент, 1993, nice, ерунда, la
  English translation: dead, dedicated, reach, Moscow suburbs, below,
    policeman (derogatory), 1993, nice, nonsense, la
Machiavellianism, positive (16 words)
  Russian: russia, оригинал, стройный, игил, новороссийск,
    инвестиция, жарко, замуж, бродить, беспомощный, выход, фотка,
    новыйгод, президент, провайдер, трое
  English translation: russia, original, slim, isis, Novorossijsk city,
    investment, hot, married, wander, helpless, exit, photo,
    newyear, president, provider, three
Machiavellianism, negative (160 words)
  Russian: и, себя, очень, любить, чтобы, сердце, друг, каждый, мы,
    понять, много, физический, господь, чувство, война, ...
  English translation: and, self, much, love (verb), in order to,
    heart, friend, every, we, understand, a lot, physical, god,
    feeling, war, ...
Psychopathy, positive (136 words)
  Russian: сша, российский, путин, нация, президент, пользователь,
    привязать, нормальный, славянский, заканчиваться, деньга, масса,
    институт, давно, состав, признать, рубль, ...
  English translation: usa, russian, Putin, nation, president, user,
    tie (verb), normal, slavic, be over, money, mass, institute,
    long ago, content, acknowledge, ruble, ...
Psychopathy, negative (13 words)
  Russian: порадоваться, даровать, предновогодний, подводить,
    задуматься, друг, вдохновлять, крыло, мел, раздавать, nice,
    наделить, внимательный
  English translation: be glad, grant, New Year’s Eve, betray, think,
    friend, inspire, wing, chalk, hand out, nice, endow, attentive
Table III. MORPHOLOGICAL CORRELATES OF THE DARK PERSONALITY TRAITS,
*p<0.05, FDR-CORRECTED
Narcissism
  Feature                      Correlation
  Verb_imperative              0.073
  Interjection                 0.065
  Pronoun_2person_plural       0.065
  Pronoun_1person_singular     0.062
  2per_plural                  0.059
  Possessive                   0.050
  Punctuation                  0.047
  Verb_2person_plural          0.046
Machiavellianism
  Feature                      Correlation
  Patronymic*                  -0.083
  1person_plural*              -0.074
  Verb_1person_plural*         -0.073
  3person_plural*              -0.071
  Pronoun_1person_plural*      -0.071
  3person_singular*            -0.069
  Adjective_short*             -0.068
  Pronoun_3person_plural*      -0.066
  Verb_3person_plural*         -0.066
  Pronoun_2person_plural*      -0.064
  Verb_3person_singular*       -0.062
  Adjective_full*              -0.062
  Participle_full*             -0.060
  Adverbial_participle*        -0.060
  2person_plural*              -0.059
  Pronoun_3person_singular*    -0.058
  Surname                      -0.056
  Qualitative                  -0.054
  Active                       -0.053
  Participle_active            -0.053
  Participle_passive           -0.052
  Name                         -0.050
  Verb_2person_plural          -0.048
  Conjunction                  -0.046
  Pronoun                      -0.045
  Possessive                   -0.045
Psychopathy
  Feature                      Correlation
  Vernacular                   0.049
  Slang                        0.049
  Organization                 0.047
  Adjective_short              -0.045
C. Machiavellianism
Lexical, semantic and morphological features correlated
with Machiavellianism follow a single pattern of mostly neg-
ative correlation. The significant topics include the following:
1) Social involvement, communication and relationship
issues are particularly rare in high Machiavellianism:
    • clusters Affirm, Friend, Feeling_vb, Tender_adj,
      male and female names;
    • numerous morphological features: first person
      plural verbs and pronouns, some names
      (patronymics), reference to other people with
      third and second person verbs and pronouns.
The negative social communication feature was also
reported as prominent in Machiavellianism in [9],
represented by negative correlations with first person
plural pronoun, family and social processes word
categories.
2) Positive affect is less likely to occur with high
Machiavellianism according to the negatively correlated
clusters Wellbeing, Impress, confirming the findings
by [9] of negatively correlated affective processes and
positive emotion words.
3) Issues of mental processing are negatively represented
by the clusters Faith, Religion, Perception,
Sensation, Appearance.
4) Negatively correlated pronouns and verbs indicate
personal detachment and formality in speech [15].
5) A number of negatively correlated cluster categories
of common, casual topics (Verbs, Neg_action,
Trouble, face- and body-related words, Citizen, Age,
Being) suggest an overall decrease in general, daily
issues-related speech. This is in line with an
overall decrease in post number and volume as
Machiavellianism increases.
Based on the accepted factors of Machiavellianism described
in [4], negatively correlated general topics and an overall
lack of positive correlates are in line with the Machiavellian
cynical views on the world and people, leading to lack of
self-disclosure. The same is true for lacking positive emotional
expressions, mental processing features and social involvement.
Personal detachment and formality in speech may apply to
Cynical Worldview and to Machiavellian Tactics, as they can
represent deceptive speech.
Table IV. SEMANTIC CLUSTER CORRELATES (p<0.01) OF THE DARK
PERSONALITY TRAITS, *p<0.05, **p<0.01, FDR-CORRECTED
Narcissism
  Cluster              Correlation
  Appeal*              0.080
  Take_give*           0.076
  High_low*            0.073
  Reasoning*           0.073
  Authority            0.068
  Monument             0.068
  Goal                 0.068
  Passion              0.066
  Pos_quality          0.066
  Event                0.064
  Feeling              0.061
  Want                 0.059
  Casual               0.059
  Perfection           0.058
  Regulations          0.058
Machiavellianism
  Cluster              Correlation
  Faith**              -0.101
  Wellbeing**          -0.094
  Friend**             -0.094
  Verbs**              -0.085
  Sensation**          -0.084
  Difficult**          -0.083
  Affirm**             -0.083
  Appearance**         -0.083
  Material**           -0.083
  Tender_adj**         -0.083
  Space**              -0.081
  Neg_action*          -0.076
  Female_name*         -0.076
  Male_name*           -0.076
  Trouble*             -0.070
  Perception*          -0.070
  Feeling_vb*          -0.070
  Impress*             -0.064
  Religion*            -0.063
  Face_part*           -0.062
  Water_object*        -0.062
  Body_situation*      -0.062
  Furniture_object*    -0.062
  Number*              -0.062
  Being*               -0.062
  Wild*                -0.062
  Citizen*             -0.061
  Anniversary*         -0.061
  Clothing*            -0.060
  Age*                 -0.059
Psychopathy
  Cluster              Correlation
  Money**              0.110
  Food*                0.08
  Money_affair*        0.075
  Political*           0.074
  Authority            0.071
  Virtual              0.069
  Sky                  -0.068
  Friend               -0.068
  Powerful_male        0.066
  Money_operation      0.064
D. Psychopathy
1) High concern for basic needs in Psychopathy is
demonstrated by a number of money and food-related
clusters (Money, Money_affair, Food). This confirms
strong ’Sex’ and ’Body’ correlates of Psychopathy
reported in [9].
2) Prevailing political and authority issues are repre-
sented by the Political word cluster.
According to the factor structure of Psychopathy [4], polit-
ical and authority focus displays Manipulation, while Callous
Affect and Antisocial Behavior are represented in high concern
for basic needs.
The statistically identified linguistic features show modest
correlation measures; however, both statistical significance and
correlation levels replicate those identified in
English-speaking samples of comparable and larger sizes [9], [23].
Statistical significance procedures have shown that larger text
samples are necessary in future research to add to the lexical
and morphological findings and confirm the list of significant
semantic features.
VI. CONCLUSIONS AND FUTURE WORK
We have developed semantic and morphological psychometric
categories designed specifically for the Russian language, and
a statistical processing tool based on the PyMorphy analyser [27].
The psychometric tool has been applied to texts of Russian
Facebook users who also completed a self-report questionnaire on
the levels of the Dark Triad personality traits. We have identified
significant morphological and semantic correlates of the
Dark traits and interpreted them in terms of larger linguistic
categories. The most prominent categories replicate the
findings reported in previous work on the Dark Triad in English.
Finally, most of the revealed linguistic features are
related to psychological factors of the Dark traits.
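As a rough illustration of how a semantic category feature could be computed, the sketch below counts the share of a user's lemmas that fall into a few word clusters. It assumes the texts have already been lemmatized (for example with a morphological analyser such as [27]). The cluster subsets are abbreviated from Table V, while the function name `cluster_frequencies`, the relative-frequency normalisation and the sample lemmas are assumptions made for the example, not the authors' implementation.

```python
from collections import Counter

# Abbreviated, illustrative subsets of three clusters from Table V.
CLUSTERS = {
    "Money": {"валюта", "деньга", "доллар", "евро", "рубль"},
    "Food": {"борщ", "каша", "молоко", "салат", "суп"},
    "Political": {"власть", "государство", "партия", "политика"},
}


def cluster_frequencies(lemmas, clusters=CLUSTERS):
    """Return, for each cluster, the share of lemmas that belong to it.

    Hypothetical helper: `lemmas` is a list of already lemmatized tokens
    for one user; the result maps cluster name to relative frequency.
    """
    total = len(lemmas)
    counts = Counter()
    for lemma in lemmas:
        for name, words in clusters.items():
            if lemma in words:
                counts[name] += 1
    return {name: (counts[name] / total if total else 0.0) for name in clusters}


# Made-up lemma sequence for one user (not data from the study).
sample_lemmas = ["я", "купить", "суп", "и", "молоко", "за", "рубль", "власть"]
sample_freqs = cluster_frequencies(sample_lemmas)
```

Per-user frequencies of this kind could then be correlated with the questionnaire scores, one cluster at a time.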
The statistical bottom-up approach has proven to be a
plausible and fruitful research method. The developed morpho-
logical and semantic categories represent Russian language pe-
culiarities and give informative results comparable to English-
language LIWC-based research.
The described significant morphological and semantic pat-
terns will supply a deeper understanding of the psychological
characteristics of individuals with prominent Dark traits. To
our knowledge there have been no previous studies of the lan-
guage of the Dark traits in Russian samples. We may assume
that the increased understanding of the linguistic characteristics
of Narcissism, subclinical Psychopathy and Machiavellianism
in Russian samples will enhance the theoretical psychological
model of the Dark Triad. The local cultural and national
features mirrored in the language will bring more detailed
descriptions of these individuals, which may be used to identify
Dark trait patterns in counselling, psychotherapy, forensic
assessments and negotiation procedures.
In order to extend the list of significant linguistic features,
the next research step will involve larger amounts of
textual data. The proposed categories are subject to refinement:
the semantic categories are to be associated with LIWC-based
categories, specifically those denoting psychological processes
and sentiment; the morphological and lexical categories should
be refined using a dataset with a larger text sample per
participant.
Our next research steps involve expanding the dataset to
allow automatic personality identification, i.e. regression and
classification of the Dark trait levels based on the Facebook
text collection.
ACKNOWLEDGMENTS
The reported study is supported by Saint-Petersburg State
University research grant 8.38.351.2015 "A cross-cultural
study of the markers of stress, health and well-being in social
networks".
REFERENCES
[1] M. Kosinski, S. C. Matz, S. D. Gosling, V. Popov, and D. Stillwell,
“Facebook as a research tool for the social sciences: Opportunities,
challenges, ethical considerations, and practical guidelines.” American
Psychologist, vol. 70, no. 6, p. 543, 2015.
[2] B. Verhoeven, B. Plank, and W. Daelemans, “Multilingual personality
profiling on twitter,” to be presented at DHBenelux 2016, Belval,
Luxembourg, 09/06/2016, in press.
[3] D. L. Paulhus and K. M. Williams, “The dark triad of personality:
Narcissism, machiavellianism, and psychopathy,” Journal of research
in personality, vol. 36, no. 6, pp. 556–563, 2002.
[4] D. N. Jones and D. L. Paulhus, “Introducing the short dark triad (sd3)
a brief measure of dark personality traits,” Assessment, vol. 21, no. 1,
pp. 28–41, 2014.
[5] C. Moore, J. R. Detert, L. Klebe Treviño, V. L. Baker, and
D. M. Mayer, “Why employees do bad things: Moral disengagement
and unethical organizational behavior,” vol. 65, no. 1, pp. 1–48.
[Online]. Available:
http://onlinelibrary.wiley.com/doi/10.1111/j.1744-6570.2011.01237.x/full
[6] V. Egan, N. Hughes, and E. J. Palmer, “Moral
disengagement, the dark triad, and unethical consumer
attitudes,” vol. 76, pp. 123–128. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S019188691400703X
[7] A. Book, B. A. Visser, and A. A. Volk, “Unpacking “evil”: Claiming
the core of the dark triad,” vol. 73, pp. 29–38. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0191886914005182
[8] A. Furnham, S. Richards, L. Rangel, and D. N. Jones,
“Measuring malevolence: Quantitative issues surrounding the dark
triad of personality,” vol. 67, pp. 114–121. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0191886914000932
[9] C. Sumner, A. Byers, R. Boochever, and G. Park, “Predicting dark
triad personality traits from twitter usage and a linguistic analysis of
tweets,” in Machine Learning and Applications (ICMLA), 2012 11th
International Conference on, vol. 2. IEEE, 2012, pp. 386–393.
[10] J. T. Hancock, M. T. Woodworth, and S. Porter, “Hungry like the wolf:
A word-pattern analysis of the language of psychopaths,” Legal and
Criminological Psychology, vol. 18, pp. 102–114, 2013.
[11] M. Egorova and M. Sitnikova, “The dark triad,” Psikhologicheskie
Issledovaniya, vol. 7(38), p. 12, 2014. [Online]. Available:
http://psystudy.ru/index.php/num/2014v7n38/1071-egorova38.html
[12] J. W. Pennebaker, M. E. Francis, and R. J. Booth, “Linguistic inquiry
and word count: Liwc 2001,” Mahwah: Lawrence Erlbaum Associates,
vol. 71, 2001.
[13] C. H. Lee, K. Kim, Y. S. Seo, and C. K. Chung, “The relations between
personality and language use,” The Journal of general psychology, vol.
134, no. 4, pp. 405–413, 2007.
[14] J. Bjekić, L. B. Lazarević, M. Živanović, and G. Knežević,
“Psychometric evaluation of the serbian dictionary for automatic text
analysis–liwcser,” Psihologija, vol. 47, no. 1, 2014.
[15] Y. R. Tausczik and J. W. Pennebaker, “The psychological meaning
of words: Liwc and computerized text analysis methods,” Journal of
language and social psychology, vol. 29, no. 1, pp. 24–54, 2010.
[16] A. Kailer and C. K. Chung, “The russian liwc2007 dictionary,”
LIWC.net, 2011.
[17] N. Zvereva, E. Mikhaleva, S. Nosov, and Y. Y. Nikitina,
“Eksperimental’noe issledovanie osobennostei rechevoi deyatel’nosti u
muzhchin, bol’nykh shizofreniei [elektronnyi resurs] [Experimental
research of features of speech activity in men with schizophrenia],”
Meditsinskaya psikhologiya v Rossii [Medical psychology in Russia],
no. 4, 2011.
[18] P. Y. Zavitaev, “Autism: a clinical-semantic and
experimental-psychological investigation [Autism: kliniko-semanticheskoe
i eksperimental’no-psichologicheskoe issledovanie], in Russian,” Russian
Journal of Psychiatry [Rossiyskiy Psikhiatricheskiy Zhurnal], vol. 5,
pp. 44–48, 2007.
[19] T. Litvinova, O. Litvinova, Y. Ryzhkova, Y. Biryukova, P. Seredin,
and O. Zagorovskaya, “Studying influence of author’s gender and
psychological characteristics on quantitative parameters of text using
“linguistic inquiry and word count” program,” Naučnyj dialog (Scientific
dialogue), in Russian, p. 101, 2015.
[20] J. H. Greenberg, “A quantitative approach to the morphological typology
of language,” International journal of American linguistics, vol. 26,
no. 3, pp. 178–194, 1960.
[21] A. Pirkola, “Morphological typology of languages for ir,” Journal of
Documentation, vol. 57, no. 3, pp. 330–348, 2001.
[22] J. Siegel, B. Szmrecsanyi, and B. Kortmann, “Measuring analyticity
and syntheticity in creoles,” Journal of Pidgin and Creole Languages,
vol. 29, no. 1, pp. 49–85, 2014.
[23] H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, L. Dziurzynski, S. M.
Ramones, M. Agrawal, A. Shah, M. Kosinski, D. Stillwell, M. E.
Seligman et al., “Personality, gender, and age in the language of social
media: The open-vocabulary approach,” PloS one, vol. 8, no. 9, p.
e73791, 2013.
[24] A. Panchenko and others, “Sentiment index of the
russian speaking facebook,” pp. 506–517. [Online]. Available:
http://dial.uclouvain.be/handle/boreal:160359
[25] N. Muravyev, A. Panchenko, S. Obiedkov, and others,
“Neologisms on facebook,” pp. 440–454. [Online]. Available:
http://dial.uclouvain.be/handle/boreal:160352
[26] G. Coppersmith, M. Dredze, and C. Harman, “Quantifying mental
health signals in twitter,” ACL 2014, p. 51, 2014.
[27] M. Korobov, “Morphological analyzer and generator for russian and
ukrainian languages,” in Analysis of Images, Social Networks and Texts.
Springer, 2015, pp. 320–332.
[28] Y. Benjamini and Y. Hochberg, “Controlling the false discovery rate:
a practical and powerful approach to multiple testing,” Journal of the
royal statistical society. Series B (Methodological), pp. 289–300, 1995.
[29] P. Panicheva, Y. Ledovaya, and O. Bogoliubova, “Revealing interpretable
content correlates of the dark triad personality traits,” in Russian
Summer School in Information Retrieval, 2016, accepted for publication.
[30] J. F. Rauthmann and G. P. Kolar, “How “dark” are the dark triad traits?
examining the perceived darkness of narcissism, machiavellianism, and
psychopathy,” Personality and Individual Differences, vol. 53, no. 7, pp.
884–889, 2012.
[31] R. Raskin and H. Terry, “A principal-components analysis of the
narcissistic personality inventory and further evidence of its construct
validity,” Journal of personality and social psychology, vol. 54, no. 5,
p. 890, 1988.
Table V. SIGNIFICANT CLUSTER CONTENTS (IN ALPHABETICAL ORDER)
Cluster | Typical contents (Russian) | English translation
Affirm | зарегистрировать, называть, составлять, утверждать | register, call, constitute, affirm
Age | возраст, детство, здоровье, молодость, прошлое, ранний, старость | age, childhood, health, youth, past, early, old age
Anniversary | поздравлять, праздник, рождение, рождество, свадьба, юбилей | congratulate, holiday, birth, christmas, marriage, jubilee
Appeal | вызов, жалоба, задание, заказ, заявление, обещание, обращение, ответ | call, complaint, task, order, statement, promise, appeal, answer
Appearance | внешность, выглядеть, гордый, красавец, красивый, худой, шикарный | appearance, look, proud, beauty, beautiful, slim, chic
Being | бытие, вселенная, духовность, истина, мир, природа, реальность | being, universe, spirituality, truth, world, nature, reality
Body_situation | висеть, вставать, держаться, лежать, посадить, сидеть, ставить, стоять | hang, stand up, hold, lie, sit down, sit, put, stand
Citizen | американец, гражданин, еврей, китаец, немец, россиянин, француз | american, citizen, jew, chinese, german, russian, french
Clothing | ботинок, костюм, куртка, носок, обувь, одежда, платье, шляпа, штаны | boot, costume, jacket, sock, shoes, clothes, dress, hat, trousers
Difficult | важный, необходимый, полезный, сложный, трудный, успешный | important, necessary, useful, difficult, hard, successful
Face_part | бровь, веко, взгляд, глаз, лицо, рожа, рот, улыбка | eyebrow, eyelid, look, eye, face, face (derogatory), mouth, smile
Faith | ангел, благодать, бог, господний, дар, дух, дьявол, молитва, откровение | angel, grace, god, godly, gift, spirit, devil, prayer, revelation
Feeling_vb | верить, восхищаться, гордиться, доверять, дружить, любить, ненавидеть | believe, admire, be proud, trust, be friends, love, hate
Female_name | анна, вера, виктория, екатерина, елена, мария, надежда, наталья | anna, vera, victoria, ekaterina, elena, maria, nadezhda, natalia
Food | блин, борщ, вкусный, картошка, каша, масло, молоко, пища, салат, суп | pancake, borscht, tasty, potato, porridge, butter, milk, food, salad, soup
Friend | друг, забытый, знакомый, приятель, родные, сосед, чужой | friend, forgotten, familiar, friend, relative, neighbour, stranger
Furniture_object | дверь, доска, дыра, замок, крыша, портал, потолок, стена, табличка, труба | door, board, hole, lock, roof, portal, ceiling, wall, sign, pipe
High_low | вершина, высокий, выше, ниже, низкий, повышение, средний | peak, high, higher, lower, low, advancement, medium
Impress | вдохновлять, великолепный, впечатлять, выдающийся, удивительный | inspire, magnificent, impress, outstanding, amazing
Male_name | андрей, борис, валерий, василий, виктор, владимир, дмитрий, иван | andrey, boris, valeriy, vasiliy, victor, vladimir, dmitriy, ivan
Material | асфальт, зеркало, камень, металл, окно, посуда, разбитый, стекло | asphalt, mirror, stone, metal, window, dishes, broken, glass
Money | валюта, деньга, доллар, дорого, евро, копейка, марка, рубль | currency, money, dollar, expensive, euro, kopeck, mark, ruble
Money_affair | бюджет, вклад, доход, зарплата, кредит, налог, оплата, расход | budget, deposit, income, salary, credit, tax, payment, expense
Neg_action | жестокость, измена, коррупция, ложь, месть, нарушение, насилие, обман | cruelty, treason, bribery, lie, revenge, violation, violence, fraud
Number | восемь, двадцать, девять, десять, пара, пятнадцать, пять, тридцать, четыре | eight, twenty, nine, ten, pair, fifteen, five, thirty, four
Perception | влиять, воспринимать, касаться, осознавать, относиться, ощущать | influence, perceive, regard, recognize, relate, feel
Political | власть, вождь, государство, демократия, партия, политика, правительство | power, leader, state, democracy, party, politics, government
Reasoning | вывод, заключение, мнение, осознание, понимание, решение | inference, conclusion, opinion, awareness, understanding, decision
Religion | возлюбить, господь, иисус, милость, молиться, прощать, сотворить | love, god, jesus, mercy, pray, forgive, create
Sensation | воспоминание, впечатление, иллюзия, мысль, ощущение, шок, эмоция | memory, impression, illusion, thought, sensation, shock, emotion
Space | близко, вне, внизу, внутри, возле, вокруг, позади, посреди, соседний | close, outside, below, inside, near, around, behind, amid, next
Take_give | брать, взять, давать, держать, забирать, нести, отдавать, хватать | take, take, give, hold, take away, carry, give away, grasp
Tender_adj | вежливый, внимательный, добрый, ласковый, любящий, нежный | polite, attentive, kind, tender, affectionate, loving
Trouble | беда, война, несчастье, неудача, перемена, подвиг, разрушение | trouble, war, disaster, failure, change, achievement, destruction
Verbs | бывать, видеть, познакомиться, помнить, сделать, случаться, создать | be, see, meet, remember, do, happen, create
Water_object | берег, болото, вода, канал, море, озеро, океан, остров, пруд, река | bank, marsh, water, channel, sea, lake, ocean, island, pond, river
Wellbeing | благополучие, комфорт, отдых, покой, равновесие, спокойствие, уют | wellbeing, comfort, rest, peace, balance, tranquillity, cosiness
Wild | агрессивный, быстрый, крутой, мощный, острый, сильный, смелый | aggressive, fast, hard, mighty, sharp, strong, courageous
... There are also studies on texts in Chinese, Arabic, Spanish, Dutch, French, German, Italian and Turkish. Several Slavic studies are available too, particularly for Russian and the Serbian language (e.g., Bjekić et al., 2012;Litvinova et al., 2017;Sboev et al.,;Sikos et al., 2014), but comparability of these studies with anglophone studies is often problematic in terms of both their focus and their methodology (see, for example, Panicheva et al., 2016;Kartelj et al., 2012). In the studied context, Slavic languages are without a doubt an under-researched language group and this situation is even more distinct in the Czech language. ...
Article
Full-text available
The study is a follow-up to three published anglophone researches examining the relation between the use of linguistic categories and personality characteristics as outlined in the Big Five model, with the purpose of replicating these and elaborating for the Czech language. The comparative research study in Czech focuses on analysis of both grammatical and semantic variables in six types of text (written and oral), produced by N = 200 participants. Within the study, there were six confirmed relations, however, these appear only in certain types of text. The results show not only an essential role of the text register, but they also allow us to evaluate the universality of findings of studies in English in comparison with other, especially Slavic, languages.
... Due to the rapid development of the Internet, personality detection from social media texts has become the latest trend. For example, personality detection from the semantic and status updates of social media (Akhtar et al., 2018;Panicheva, Ledovaya & Bogolyubova, 2016), which can help companies select the right employees (Gaddis & Foster, 2015). Similarly, Yin et al. studied the relationship between negative textual information and Big Five personality traits (C. ...
Article
Personality detection based on user-generated text content analysis has a significant impact on information science, for instance, information seeking. Existing deep learning-based approaches, however, have two major limitations. Firstly, they extract only keywords for personality detection and lack the analysis of sentiment information and psycholinguistic features. Secondly, the information about the context and polysemous words are ignored. To tackle these problems, we propose a novel multi-label personality detection model based on neural networks, which combines emotional and semantic features. Specifically, we leverage Bidirectional Encoder Representation from Transformers (BERT) to generate sentence-level embedding for text semantic extraction. In addition, a sentiment dictionary is used for text sentiment analysis in order to consider sentiment information. Finally, we input the above semantic information and emotional information into the neural network to construct an automatic personality detection model. The performance of the model has been evaluated on two public personality datasets. The experiments show that we obtain average accuracy improvements of 6.91% and 6.04% on the Myers-Briggs Type Indicator (MBTI) and Big Five datasets, respectively, compared with the state-of-the-art techniques.
... When linguistic (and usually language-dependent) features are used, morphology and syntax are most commonly modeled [Baayen et al. 1996], [Rogov et al. 2007], [Hosseinia and Mukherjee 2018]. Other linguistic methods involve modeling semantics [Panicheva et al. 2016]. ...
Conference Paper
In this work, we perform a method study for the problem of authorship attribution in Russian and English. The datasets used consist of 324 works written in Russian and 207 works in English. We propose a set of text representation models that reflect various linguistic phenomena, in particular, morphological and syntactic ones. One distinctive feature of the proposed models is that they are interpretable. These models are used individually and in combination against a Doc2Vec baseline. For Russian, some of our models outperform Doc2Vec, but this does not happen in the case of English, for various reasons. However, the proposed models can also be used together with Doc2Vec, dramatically improving its performance: by 16.79% in the case of Russian and by 7.2% for English. Additionally, we experiment with two different methods for separating texts into blocks of K sentences (contiguous and bootstrapped) and performed parameter tuning of K. Finally, we conduct a feature importance analysis and show which linguistic markers of author style are the most pertinent for Russian, English and for both these languages. All code used in this work is made freely available to the community.
... Maria Balmaceda et al. [71] have investigated users' personality through evaluating text messages in social network, then verified the stability of the identified personality. Panicheva et al. [80] have investigated the link between the dark triad personality traits and Russian linguistic features in social networking texts. ...
Article
Full-text available
The identification of human behavior can provide useful information across multiple job spectra. Recent advances in applying data-based approaches to social sciences have increased the feasibility of modeling human behavior. In particular, studying human behavior by analyzing unstructured textual data has recently received considerable attention because of the abundance of textual data. The main objective of the present study was to discuss the primary methods for identifying and predicting human behavior through the mining of unstructured textual data. Of the 823 articles analyzed, 87 met the predefined inclusion criteria and were included in the literature review. Our results show that the included articles could be symmetrically classified into two groups. The first group of articles attempted to identify the leading indicators of human behavior in unstructured textual data. In this group, the data-based approaches had three main components: (1) collecting self-reported survey data, (2) collecting data from social media and extracting data features, and (3) applying correlation analysis to evaluate the relationship between two sets of data. In contrast, the second group focused on the accuracy of data-based approaches for predicting human behavior. In this group, the data-based approaches could be categorized into (1) approaches based on labeled unstructured textual data and (2) approaches based on unlabeled unstructured textual data. The review provides a comprehensive insight into unstructured textual data mining to identify and predict human behavior and personality traits.
Article
Social media users can participate increasingly by sharing online information, and the work in this research content can be helpful to assess their personalities. Personality prediction is defined by extracting the digital features from the digital content and mapping those features into a personality prediction model. This human behaviour identification will be helpful to multiple job processes. The advancement of data-based approaches in social sciences will be helpful to model human behaviour based on unstructured text data. Due to the simple nature of the big five personality traits, it has been used in analyzing human behaviour. This paper focuses on predicting human behaviour based on personality prediction with unstructured textual data mining. So far, many researchers have proposed a personality prediction model based on deep learning approaches. However, the existing model slack processing time and the ability to capture the real meaning of the word. This paper proposes a deep learning-based prediction model from the data stored on Social media such as Facebook, Twitter, and Instagram to overcome these issues. Initially, the data are preprocessed to remove the irrelevant data such as URL, symbols and stop words. The features are extracted using the proposed mRMR based cat optimization algorithm from the preprocessed data. This approach identifies the relationship among feature sets and traits from datasets. The human behaviours are classified with an Improved LSTM classifier optimized with a forest optimization algorithm. The proposed mRMR-Cat optimization-based feature extraction and LSTM with forest optimization approaches outperform all feature extraction average baseline sets and Classify on multiple social datasets with improved accuracy of 86.5%, 88.4% and 90.16% for the datasets Facebook, Twitter and Instagram.
Chapter
Users’ interaction with Facebook generates trails of digital footprints, consisting of activity logs, “Likes”, and textual and visual data posted by users, which are extensively collected and mined for commercial purposes, and represent a precious data source for researchers. Recent studies have demonstrated that features obtained using these data show significant links with users’ demographic, behavioral, and psychosocial characteristics. The existence of these links can be exploited for the development of predictive models allowing for the unobtrusive identification of online users’ characteristics based on their recorded online behaviors. Here, we review the literature exploring use of different forms of digital footprints collected on Facebook, the most used social media platform, for the prediction of personality traits. Then, based on selected studies, meta-analytic calculations are performed to establish the overall accuracy of predictions based on the analyses of digital footprints collected on Facebook. Overall, the accuracy of personality predictions based on the mining of digital footprints extracted from Facebook appear to be moderate, and similar to that achievable using data collected on other social media platforms.
Chapter
Health preservation is one of the urgent priorities for any group of people. There is a lot of research currently underway on diagnosing and monitoring health using data from social media. In this paper, the problem of the automatic classification of users of the Russian-language social network VK.com in terms of whether they lead a healthy lifestyle is considered. To solve this problem, various types of information was collected from user profiles: text, numerical and graphic data. The users then took a lifestyle and health survey. The results of this survey were used in order to split the users into groups according to the degree of adherence to a healthy lifestyle. The survey results were used to train various binary classifiers. The best results (about 0.76 F1-score) in our experiment were shown by a model that was trained on combined features (images from users public “walls”, as well as N-gram features compiled from text from the users public “walls”). These results were achieved using the following machine learning models “multilayer perceptron”, “naive Bayesian classifier” and “k nearest neighbours”.
Book
Full-text available
Abstract: The publication focuses on the area of language use psychology – analysis and interpretation of interpersonal communication through psychological and linguistic methods. The aim of this book is to familiarize the reader with the relationships that may be found between the verbal expression, i.e. the form and content of written communication, and the personality characteristics of the communicator, specifically via the application of quantitative computational methods. The book introduces the theoretical background of psychological-linguistic analysis, selected methods of text and communicator’s personality description, and also the design and results of three original studies on the Czech language. The aim of the publication is to offer the reader a comprehensive overview of the matter, to present contemporary research findings, and to support further scientific development of this discipline. Keywords: personality, language, verbal, analysis, quantitative
Chapter
In this era of digital evaluation, social media plays a vital role in human life. Even though it has both pros and cons, through these medium, a great amount of knowledge can be gathered to understand the personality traits of Human behavior. This paper provides an insight on different types of prediction method used using Facebook dataset for identifying personality traits. By reviewing all those works, an ontology enriched sentimental analysis concept was proposed in identifying the personality traits in Facebook.
Conference Paper
Full-text available
In this paper, we present a study of neologisms and loan words frequently occurring in Facebook user posts. We have collected a dataset of over 573 million posts written during 2006-2013 by Russian-speaking Facebook users. From these, we have built a vocabulary of most frequent lemmatized words missing from the Opencorpora dictionary (http://opencorpora.org/dict.php) the assumption being that many such words have entered common use only recently. This assumption is certainly not true for all the words extracted in this way; for that reason, we manually filtered the automatically obtained list in order to exclude non-Russian or incorrectly lemmatized words, as well as words recorded by other dictionaries or those occurring in pre-2000 texts from the Russian National Corpus (http://www.ruscorpora.ru). The result is a list of 168 words that can potentially be considered neologisms. We present an attempt at an etymological classification of these neologisms (unsurprisingly, most of them have recently been borrowed from English, but there are also quite a few new words composed of previously borrowed stems) and identify various derivational patterns. We also classify words into several large thematic areas, "internet", "marketing", and "multimedia" being among those with the largest number of words. We consider our results preliminary, but believe that, together with the word base collected in the process, they can serve as a starting point in further studies of neologisms and lexical processes that lead to their acceptance into the mainstream language.
Article
Full-text available
Understanding the nature of ‘‘evil’’ has been challenging for a number of reasons. A productive psychological approach to this problem has been to study antisocial traits associated with negative outcomes. One such approach has grouped together three antisocial personalities known as the ‘‘Dark Triad’’: Machiavellianism, Narcissism, and Psychopathy. Researchers have proposed various models to account for the common core of these antisocial personalities – a core that might well be considered the psychological equivalent of the core of ‘‘evil’’ – and these models have not been directly compared, to date. We conducted two studies (total N > 700) to compare the utility of the various models using Canonical Correlation Analyses (CCAs). Results confirm that the HEXACO personality model (and, in particular, the Honesty–Humility factor) is not only the most theoretically parsimonious model, it also best accounts for the empirical overlap between these constructs that represents the core of the Dark Triad. Results also support the idea that the core of the Dark Triad represents an alternative life history strategy.
Conference Paper
Full-text available
A sentiment index measures the average emotional level in a corpus. We introduce four such indexes and use them to gauge average " positiveness" of a population during some period based on posts in a social network. This article for the first time presents a text-, rather than word-based sentiment index. Furthermore, this study presents the first large-scale study of the sentiment index of the Russian-speaking Facebook. Our results are consistent with the prior experiments for English language.
Conference Paper
Full-text available
pymorphy2 is a morphological analyzer and generator for Russian and Ukrainian languages. It uses large efficiently encoded lexi- cons built from OpenCorpora and LanguageTool data. A set of linguistically motivated rules is developed to enable morphological analysis and generation of out-of-vocabulary words observed in real-world documents. For Russian pymorphy2 provides state-of-the-arts morphological analysis quality. The analyzer is implemented in Python programming language with optional C++ extensions. Emphasis is put on ease of use, documentation and extensibility. The package is distributed under a permissive open-source license, encouraging its use in both academic and commercial setting.
Article
Full-text available
LIWC (Linguistic Inquiry and Word Count) is widely used word-level content analysis software. It was used in large number of studies in the fields of clinical, social and personality psychology, and it is adapted for text analysis in 11 world languages. The aim of this research was to validate empirically newly constructed adaptation of LIWC software for Serbian language (LIWCser). The sample of the texts consisted of 384 texts in Serbian and 141 texts in English. It included scientific paper abstracts, newspaper articles, movie subtitles, short stories and essays. Comparative analysis of Serbian and English version of the software demonstrated acceptable level of equivalence (ICCM=.70). Average coverage of the texts with LIWCser dictionary was 69.93%, and variability of this measure in different types of texts is in line with expected. Adaptation of LIWC software for Serbian opens entirely new possibilities of assessment of spontaneous verbal behaviour that is highly relevant for different fields of psychology.
Article
Full-text available
Social media sites are now the most popular destination for Internet users, providing social scientists with a great opportunity to understand online behaviour. There are a growing number of research papers related to social media, a small number of which focus on personality prediction. To date, studies have typically focused on the Big Five traits of personality, but one area which is relatively unexplored is that of the anti-social traits of narcissism, Machiavellians and psychopathy, commonly referred to as the Dark Triad. This study explored the extent to which it is possible to determine anti-social personality traits based on Twitter use. This was performed by comparing the Dark Triad and Big Five personality traits of 2,927 Twitter users with their profile attributes and use of language. Analysis shows that there are some statistically significant relationships between these variables. Through the use of crowd sourced machine learning algorithms, we show that machine learning provides useful prediction rates, but is imperfect in predicting an individual's Dark Triad traits from Twitter activity. While predictive models may be unsuitable for predicting an individual's personality, they may still be of practical importance when models are applied to large groups of people, such as gaining the ability to see whether anti-social traits are increasing or decreasing over a population. Our results raise important questions related to the unregulated use of social media analysis for screening purposes. It is important that the practical and ethical implications of drawing conclusions about personal information embedded in social media sites are better understood.
Facebook is rapidly gaining recognition as a powerful research tool for the social sciences. It constitutes a large and diverse pool of participants, who can be selectively recruited for both online and offline studies. Additionally, it facilitates data collection by storing detailed records of its users’ demographic profiles, social interactions, and behaviors. With participants’ consent, these data can be recorded retrospectively in a convenient, accurate, and inexpensive way. Based on our experience in designing, implementing, and maintaining multiple Facebook-based psychological studies that attracted over 10 million participants, we demonstrate how to recruit participants using Facebook, incentivize them effectively, and maximize their engagement. We also outline the most important opportunities and challenges associated with using Facebook for research; provide several practical guidelines on how to successfully implement studies on Facebook; and finally, discuss ethical considerations.
Paulhus and Williams (2002) proposed a constellation of malevolent traits referred to as the Dark Triad (subclinical narcissism, subclinical psychopathy, and Machiavellianism). They used the Dark Triad term to raise awareness about the need for researchers across different areas of psychology to include relevant theory and assessments of all three traits when predicting behaviour. However, there still remain misunderstandings, misinformation, and misperceptions about how to disentangle the psychometric and statistical web of interconnected variance associated with these three traits. We outline the statistical approaches that have been proposed (to date) in assessing the Dark Triad and relevant outcomes, and discuss some promising future directions. This paper is intended to inspire discussion and clarification for the nebulous issue of assessing and disentangling overlapping but distinguishable traits, including the Dark Triad of personality.
Purpose: Bandura's theory of moral disengagement explains how otherwise ethical persons can behave immorally. We examined whether a trait model of general personality and the "dark triad" underlay moral disengagement, the relationship these constructs have to unethical consumer attitudes, and whether moral disengagement provided incremental validity in the prediction of antisocial behaviour. Methods: Self-report data were obtained from a community sample of 380 adults via an online survey that administered all measures. Results: Correlations between unethical consumer attitudes, lower Agreeableness, lower Conscientiousness, higher moral disengagement, higher psychopathy, and higher Machiavellianism were captured by a single factor. When this broad factor was examined using regression, demographic, personality and dark triad traits all predicted moral disengagement, specific influences being age, education, Intellect, psychopathy, and Machiavellianism. A similar model examining predictors of unethical consumer attitudes again found that all blocks contributed to the outcome, with specific influence provided by age, Intellect, and moral disengagement, the latter showing incremental validity as a predictor of unethical consumer attitudes. Conclusions: Moral disengagement is based on low Agreeableness, Machiavellianism and psychopathic-type traits, but provides incremental validity in predicting antisocial attitudes over a trait model alone. Narcissism is related neither to moral disengagement nor to unethical consumer attitudes.