All content in this area was uploaded by Polina Panicheva on Nov 25, 2016
Lexical, Morphological and Semantic Correlates
of the Dark Triad Personality Traits
in Russian Facebook Texts
Polina Panicheva, Yanina Ledovaya
St. Petersburg State University
St. Petersburg, Russia
ppolin86@gmail.com, y.ledovaya@spbu.ru
Olga Bogolyubova
Department of Psychology,
Clarkson University
obogolyu@clarkson.edu
Abstract—The presented project is intended to make use
of growing amounts of textual data in social networks in the
Russian language, in order to find linguistic correlates of the
Dark Triad personality traits, comprising non-clinical Narcissism,
Machiavellianism and Psychopathy. The background for the
investigation includes, on the one hand, psychological research
on these phenomena and their measurement instruments, and on
the other hand, recent advances in computational stylometry and
text-based author profiling. The measures for these psychological
phenomena are provided by recognized self-report psychological
surveys adapted to Russian. Morphological and semantic analyses
are applied to investigate the relationship between the Dark
traits and their linguistic manifestation in social network texts.
Significant morphological and semantic correlates of Narcissism,
Machiavellianism and Psychopathy are identified and compared
to respective advances in English author profiling. In order to
deepen our understanding of the relation between these psycho-
logical characteristics and natural language use, the identified
linguistic features are interpreted in terms of the fine-grained
factor structure of the Dark traits. Identifying correlated features
is a step towards automatic Dark trait prediction and early
detection of potentially harmful mental states.
I. INTRODUCTION
The Internet provides a vast amount of data, including data on the
verbal behaviour of individuals and groups of users. Data on
verbal and social network usage patterns can provide insight
into numerous sociological and psychological characteristics
[1]. Text mining can assist in uncovering the potential of the
online verbal data, with the latest works in the field describing
psychological profiling in a multilingual setting [2].
The empirical study in question is a part of a larger research
project aimed to explore the relations among online and
offline stressful experience, psychological well-being, negative
personality traits and the language a person uses in online
communication¹. To measure the negatively oriented personality
traits, and thus the general likelihood of a person’s misbehavior,
two questionnaires were chosen: the Short Dark Triad scale
[3], [4] and the Propensity for Moral Disengagement scale [5].
These two scales have already been used to study the predictors
of unethical behavior in English [6]. Moreover, the three traits
(Narcissism, Machiavellianism and Psychopathy), which have the
lack of empathy as their core characteristic, are considered very
fruitful in the studies of malevolence, which is the primary goal
of the larger research project [7], [8]. The negatively-marked
personality characteristics have been successfully studied in
English language data [9], [10]. Besides, the Short Dark Triad
scale has been recently translated and adapted into Russian
[11].
¹ "A cross-cultural study of the markers of stress, health and well-being
in social networks", research grant #8.38.351.2015 of St. Petersburg
State University.
We present a linguistic approach to investigation of the
Dark Triad personality traits using natural language processing.
We have launched a Facebook app which gathers textual data
and asks the participants to fill in a survey, thus annotating texts
with levels of Narcissism, Machiavellianism and Psychopathy
of the authors provided by the survey results. We propose
a number of morphological and semantic features, identify
significant correlates of the Dark traits among them, provide
interpretation of the results and relate them to similar findings
in different languages.
The paper is organized as follows: Section II contains
an overview of similar approaches to author profiling and
the Dark Triad research; Section III describes our dataset;
Section IV is a description of the statistical analysis procedure;
Sections V and VI contain the results obtained, their detailed
interpretation, and overall conclusions.
II. RELATED WORK
A. Linguistic Inquiry and Word Count approach
A widely used approach, Linguistic Inquiry and
Word Count (LIWC) [12], has been developed for English and
other languages [13], [14]. The main idea of the approach
is that words are grouped into psychologically meaningful
categories, and specific counts of the categories in a person’s
text can be interpreted in psychological terms and linked
to their psychological profile. Lexical items are first divided
into function words and content words: the former are
considered meta-behavioural information on how the author
thinks and communicates, while the latter provide information
on what the topics of concern are in their texts [15]. Content
and function words are manually grouped to obtain predefined
top-down categories, which are accounted for in texts and used
for author profiling. Content categories include psychological
- social, cognitive, biological, affective - processes, positive
and negative sentiment, topics of personal concern (work,
leisure, home, money); function words include auxiliary parts
of speech like pronouns, quantifiers, articles, and a number of
verb categories - tense, person, number; the total number of
categories reaches 80.
B. Russian author profiling
There has been a significant body of work on the LIWC
approach to word psychometrics, including Eastern [13] and
Slavic [14] languages. LIWC for Russian [16] has been developed,
but has not undergone specific validation by the authors.
In the Russian language the work on linguistic author
profiling has been mostly confined to the clinical scope of
mental disorders and a manual descriptive framework, with
the focus on interpretative diagnostic power [17], [18]. A recent
exception is a study involving written text samples by 500
Russian participants [19]. The Russian version of LIWC was
applied in this work: it was tested for author-specific word
count stability and applied to author profiling with gender and
Big Five personality characteristics [19], not involving social
media texts.
C. Russian LIWC approach revisited
Although the LIWC approach is relevant and fruitful and has
given indispensable insights across different languages, there
is an issue with its Russian version: it has been
developed as a direct translation from English, preserving the
English-specific category and algorithm structure.
First of all, a division of features into function and content
words, while basically reflecting the state of affairs in English,
can be misleading for Russian: functional categories are largely
represented in Russian by morphological properties of content
words, e.g. the number and person categories of verbs. Second, the
LIWC algorithm is word-base oriented: a word is assigned a
certain category if its base matches a dictionary word. Such
an approach is reasonable for languages with low syntheticity,
where a word base often equals the word. The Russian language lies
at the opposite end of the spectrum with high syntheticity and
high fusion, which means that words are formed using a high
number of affixes, and they are often fused, making word form
and meaning non-additive (see [20] and later development,
e.g. [21], [22]). High syntheticity and fusion mean
that a single lexical or grammatical meaning is represented
by a single word base less often than in English. A word-base
dictionary approach would thus fail to account for a lot of
functional and content phenomena. It is necessary to introduce
an additional feature category containing morphological and
some lexical features in order to represent the phenomena
which cannot be accounted for in a simple dictionary structure.
We apply a bottom-up approach to content and functional
categories: the former are automatically bootstrapped from
corpus data using distributional semantic techniques; the latter
are based on the vast morphological information in Russian.
The outlined bottom-up approach makes it possible to omit the
time- and resource-consuming procedure of manual classification,
while retaining the Russian-specific semantic and morphological
category structure.
D. Facebook language data
A similar data-driven approach to word, phrase and content
topic features correlated with age, gender and the Big Five
personality traits of Facebook users is presented in [23]. The
current work shares the consideration that an open-vocabulary
approach to lexical features could be more revealing than a
preset top-down category list. We also follow [23] in generating
content categories automatically to include most of the
vocabulary. However, there are a number of significant
differences: first, analysing texts in Russian requires scrupulous
normalization or morphological analysis. Second, in the current
work content categories are generated based on large balanced
Russian corpora, and not on the social network data obtained, as
the latter dataset is too small and too specific to represent
general semantic categories.
There has been a considerable amount of work done using
Facebook data in Russian, where a large amount of data
(3M+ users, 550M+ posts) was gathered from Facebook and
applied to the primarily linguistic tasks of Sentiment Analysis
and semi-automatic identification of neologisms [24], [25].
However, it is important to notice the difference of our work
in terms of research goals and respective data domains: while
covering a much smaller number of authors, we are concerned
with very fine-grained data containing personal psychological
questionnaire results. The latter data is considerably more
complicated to obtain both technically and from the point of
view of ethical issues, which currently prevents the resulting
corpus from being freely distributed as open access data.
To conclude, the current work presents the first attempt to
gather and explore psychological and linguistic data in Russian
Facebook. To our knowledge, it is the first attempt at using
word-embedding semantic models to generate meaningful
categories for author profiling. It is also the first work to
approach semantic and morphological correlates of the Dark
traits in the Russian language.
E. The Dark Triad personality traits
Our analysis is focused on the Dark Triad personal charac-
teristics. These are related but distinct sub-clinical categories,
where Narcissism is primarily associated with self-focus and
grandiosity, Psychopathy with impulsiveness, aggression and
asocial behaviour, and Machiavellianism with manipulating.
Lack of empathy is reported to be the common feature of the
Dark Triad [4]. The following factor structure is ascribed to
the Dark traits:
• Narcissism is described as a combination of
  Exploitativeness/Entitlement and Leadership/Authority [4].
• Machiavellianism is a multi-dimensional but controversial
  construct; the factors accepted in [4] are
  Machiavellian Tactics and Cynical Worldview.
• Psychopathy is reported to incorporate Manipulation,
  Callous Affect, Erratic Lifestyle and Antisocial Behaviour
  [4].
The Dark Triad questionnaire has been recently adapted to
Russian and has undergone language-specific validation [11].
Linguistic correlates of Psychopathy have been effectively
analysed on English-language data in a clinical context [10].
That study involved crime narratives of around 2,500 words each by
14 psychopaths and 52 controls. Authors of [26] describe
LIWC-based analysis of Twitter texts by 5700 control group
participants and 150-450 individuals labelled with a diagnosis,
each participant being associated with 25 to 3,200 Tweets. An
important contribution of this work is interpretable analytics
combining the users’ verbal and non-verbal online behaviour.
A study of the Dark Triad correlates in English has
been performed with 2,500 authors of Twitter texts [9]. The
positive content correlates included anger, swear and negative
emotion words for Psychopathy and Machiavellianism, sex-
related words for Narcissism. Narcissism was also positively
correlated with function symbols (@, #) used in Twitter to
denote relations between users and between users and topics,
thus modelling a meta-category of social relation symbols in
Twitter. Negative correlates included positive emotion and ’we’
for Machiavellianism.
III. DATA COLLECTION
1972 Facebook users participated in the study by complet-
ing the Dark Triad Scale [11] and providing consent to share
their publicly available posts. This questionnaire is a validated
self-report measure of the so-called dark personality traits:
Narcissism, Machiavellianism and subclinical Psychopathy.
Each trait is assessed by 9 questions about the users’ attitude
and behaviour towards themselves and others. The answers are
situated on a 5-point Likert scale representing the degree of
agreement or disagreement with the statements in question.
The total score for each subscale is divided by 9 (the number
of questions) resulting in a score range for each scale from 1
to 5.
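The scoring rule described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name and the sample answers are hypothetical, and any reverse-keyed questionnaire items are assumed to have been recoded before scoring.

```python
def subscale_score(answers):
    """Mean of nine 5-point Likert answers, giving a score between 1 and 5."""
    if len(answers) != 9:
        raise ValueError("each subscale is assessed by exactly 9 questions")
    if any(a not in (1, 2, 3, 4, 5) for a in answers):
        raise ValueError("answers must lie on the 1-5 Likert scale")
    # Total subscale score divided by the number of questions.
    return sum(answers) / 9


# Hypothetical answers of one participant to the nine Narcissism items.
narcissism_answers = [4, 3, 5, 2, 4, 3, 3, 4, 2]
narcissism_score = subscale_score(narcissism_answers)
```

Dividing by the number of questions keeps the three subscales on the same 1 to 5 range, so their scores can be compared directly.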
The application with the questionnaire was advertised
on Facebook². The public posts were gathered, with text
cited or written by the users themselves; repost information
is out of the scope of the current work. The obtained dataset
consists of 7.67 posts on average per participant (standard
deviation = 5.69). This is on average 24.77 sentences (std =
38.13) or 311.99 tokens (std = 565.56) per participant. The
volume of posts by each author was technically restricted by
the Facebook API³, which only allowed a fixed number
of the latest posts to be downloaded; only the posts containing
personal comments by the participant were included in the
study.
The volume of text by each author is modest compared
to previous studies; however, the numbers of authors are
comparable [26], [9]. Preliminary experiments on automatic
classification and regression have confirmed that a larger
dataset is necessary for significant Machine Learning results;
we plan to obtain such a dataset in future work. On the other
hand, the number of Facebook users in the study allows us to
draw exploratory conclusions based on statistical analysis,
allowing for fruitful psychological interpretation.
Text volume by each author in tokens and sentences is
significantly negatively correlated with Machiavellianism
(p<0.01) and positively with Narcissism (p<0.05). Text volume
characteristics can’t be directly interpreted, as they depend on
the technical restriction on the number of downloadable posts;
however, they will affect the interpretation of further results.
² https://apps.facebook.com/psytest/
³ https://developers.facebook.com/docs/javascript/reference/v2.6
IV. STATISTICAL ANALYSIS
All the data have been processed with the PyMorphy2
morphological package [27] using the default morphological
disambiguation option, unigram statistics. We apply Spearman’s
correlation coefficient, as we are primarily interested in finding
monotonic relationships between linguistic items and the Dark
Triad measures, not limited to the linear correlation identified
by Pearson’s r.
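To illustrate why a rank-based coefficient suits monotonic but non-linear relationships, the sketch below computes Spearman's coefficient as Pearson's r on average ranks, using only the standard library. The data are hypothetical and the helper names are ours; the sketch does not compute p-values, for which a statistics package would normally be used.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the group
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: a monotonic but strongly non-linear relationship.
trait_score = [1.2, 1.9, 2.4, 3.1, 3.8, 4.6]
feature_rate = [s ** 5 for s in trait_score]

rho = spearman(trait_score, feature_rate)  # ranks agree perfectly
r = pearson(trait_score, feature_rate)     # the linear fit is weaker
```

On such data the rank-based coefficient reaches its maximum, while Pearson's r is lowered by the curvature of the relationship.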
A. Statistical significance correction
An important issue addressed in this work is the significance
of the statistical results. Spearman’s correlation yields
r, the correlation value ranging from -1 to 1, with 0 indicating
no correlation, and p, indicating how likely it is that the
current r value could be obtained by chance in a random
sample. It has been shown in previous linguistic profiling
works with different sample sizes [9], [23] that the values of r
do not usually exceed 0.2 in absolute value. This supports the
consideration that word categories tend to be sparse in
multi-purpose social network texts, with very high correlation
values between word counts and personal characteristics being
highly unlikely.
However, there is another strong correlation significance
filter which applies to the p-value. It is a well-known issue that
in multiple-hypothesis testing the p-values must be adapted
[28]. The intuition behind multiple-testing correction is that
when evaluating correlation with a large number of features,
a small portion of random features obtain statistically
significant correlation by chance. In order to eliminate the random
effects of numerous hypotheses, various statistical filters are
suggested. For example, the Bonferroni correction procedure
requires the p-values obtained in multiple-hypothesis testing to be
multiplied by the number of hypotheses, so that far fewer
p-values pass the p<0.01 or p<0.05 threshold
[23].
A surprising number of state-of-the-art works in author
profiling do not mention applying multiple-hypothesis correction
procedures ([12], [9], [13], [19]). We find it necessary to
apply a filtering procedure to our results, as the number of
lexical features exceeds 19K, and content and morphological
features reach 184 and 64 respectively. As the Bonferroni
correction is reported to be too stringent, resulting in a portion
of false rejections [23], we apply the Benjamini-Hochberg false
discovery rate (FDR) procedure for multiple hypothesis testing
[28]. This allows us to control the statistical significance of
results in the current setting of a modest dataset size with a large
number of correlated features.
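The difference between the two corrections can be sketched in a few lines. The code below is an illustrative standard-library implementation of the Benjamini-Hochberg step-up rule next to the Bonferroni rule; the p-values are hypothetical and the function names are ours, not those of the tool used in the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= k / m * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    # All hypotheses up to that rank are rejected.
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Reject only where p multiplied by the number of tests stays within alpha."""
    m = len(p_values)
    return [p * m <= alpha for p in p_values]


# Hypothetical p-values for eight features.
p_values = [0.001, 0.008, 0.012, 0.021, 0.04, 0.2, 0.5, 0.9]
fdr_significant = benjamini_hochberg(p_values)
bonferroni_significant = bonferroni(p_values)
```

With these toy values the FDR procedure retains the four smallest p-values, whereas the Bonferroni rule retains only the smallest one, which illustrates why the latter is considered stringent.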
B. Numeric characteristics
Spearman’s correlation (r(1,972)) was applied to self-
reported Dark Triad measures and text-length, lexical and mor-
phological features. Average sentence length and post length in
sentences and tokens reveal some significant correlations with
the Dark Triad measures, see Table I: significant correlations
are highlighted in italics (p<0.05) and bold (p<0.01).
Table I. SPEARMAN’S CORRELATIONS BETWEEN AVERAGE POST LENGTH FEATURES AND PERSONALITY SCORES
(Na: Narcissism, Ma: Machiavellianism, Ps: Psychopathy)
Text feature                 Na       Ma       Ps
Sentence length           0.022   -0.057    0.006
Post length, sentences    0.054   -0.109    -0.04
Post length, tokens       0.045   -0.101    -0.02
C. Lexical features
Normal forms of words constitute lexical features. Out
of 19K lexemes occurring in the texts, there are around 150
correlated words for every Dark trait; these are exemplified
in Table II. However, the FDR correction procedure rejects
their significance based on the current dataset.
D. Morphological features
Morphological features are based on the PyMorphy tagset [27];
the list of tags is available at
http://opencorpora.org/dict.php?act=gram. Along the lines of
LIWC parameters [15], the features include the following:
• All parts of speech;
  ◦ auxiliary parts of speech (preposition, conjunction,
    particle, interjection) are also grouped together;
• Person and number, standalone and grouped with POS;
• Verb modality features:
  ◦ voice: active, passive;
  ◦ mood: indicative, imperative;
  ◦ tense: present, past;
  ◦ reflexivity;
• Named entity features:
  ◦ name, surname, patronymic;
  ◦ organization, trademark, geographical location,
    abbreviation;
• Adjective features:
  ◦ short, full;
  ◦ qualitative;
  ◦ superlative;
• Possessive pronouns;
• Style characteristics:
  ◦ vernacular, slang words.
Morphological correlates are illustrated in Table III.
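A morphological feature of this kind can be computed as the share of an author's tokens carrying a given combination of tags. The sketch below assumes that tokens have already been analysed (for instance by PyMorphy2) and that each token is represented by its set of OpenCorpora grammemes; the feature definitions are a hypothetical subset, and normalising by the token count is our assumption rather than a detail stated in the paper.

```python
from collections import Counter

# Each feature is a set of grammemes that must all be present on a token.
# The names mirror a few of the categories listed above; the exact
# definitions are assumptions made for this sketch.
FEATURES = {
    "Verb_imperative": {"VERB", "impr"},
    "Verb_1person_plural": {"VERB", "1per", "plur"},
    "Pronoun_1person_singular": {"NPRO", "1per", "sing"},
    "Adjective_short": {"ADJS"},
    "Patronymic": {"Patr"},
}


def feature_rates(tagged_tokens):
    """Relative frequency of each feature among an author's tokens."""
    counts = Counter()
    for grammemes in tagged_tokens:
        for name, required in FEATURES.items():
            if required <= grammemes:  # all required grammemes are present
                counts[name] += 1
    total = len(tagged_tokens)
    return {name: (counts[name] / total if total else 0.0) for name in FEATURES}


# Hypothetical pre-tagged text of one author (four tokens).
author_tokens = [
    {"NPRO", "1per", "sing", "nomn"},                  # first person singular pronoun
    {"VERB", "impf", "1per", "plur", "pres", "indc"},  # first person plural verb
    {"VERB", "perf", "impr", "sing"},                  # imperative verb
    {"NOUN", "anim", "masc", "sing", "nomn"},          # ordinary noun
]
rates = feature_rates(author_tokens)
```

The resulting per-author rates are the kind of values that would then be correlated with the questionnaire scores.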
E. Generalized content features
In order to obtain semantically interpretable features and
reduce the number of hypotheses tested, we apply clustering
based on word-embedding semantic modelling.
To reduce the effort of cluster evaluation and leave out obscure
and rare items, we only cluster the words which occur in
at least 10 authors’ texts. Thus we obtain a set of 3,700
lexical items to be clustered. We use a Skip-Gram Word2Vec
model trained on the Russian National Corpus (RNC) data. We
intentionally apply RNC and not a web-trained model, as the
goal is to capture established semantic regularities interpretable
in terms of general semantic categories, while describing web
language peculiarities is a different task.
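The vocabulary filter described above (words occurring in the texts of at least 10 authors) can be sketched as follows. The data structure, a mapping from an author identifier to the lemmas of that author's posts, and the toy corpus are hypothetical.

```python
from collections import Counter


def frequent_lemmas(lemmas_by_author, min_authors=10):
    """Keep lemmas that occur in the texts of at least min_authors authors."""
    author_counts = Counter()
    for lemmas in lemmas_by_author.values():
        author_counts.update(set(lemmas))  # count each author once per lemma
    return {lemma for lemma, n in author_counts.items() if n >= min_authors}


# Hypothetical toy corpus: author id -> lemmas of that author's posts.
corpus = {
    "a1": ["друг", "друг", "деньга"],
    "a2": ["друг", "война"],
    "a3": ["друг", "деньга"],
}
# A threshold of 2 is used here only because the toy corpus has three authors.
vocabulary = frequent_lemmas(corpus, min_authors=2)
```

Counting authors rather than raw occurrences prevents a single prolific author from pushing a rare word into the vocabulary.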
The clustering techniques applied in this task have been
compared in [29]. The optimal algorithm for the current
task is K-means with Euclidean distance, yielding the most
homogeneous and precise clusters. The optimal cluster size
for manual labelling, evaluation and interpretation of
the current data is about 20 words per cluster, i.e. 184 clusters.
Other clustering algorithms and parameters have been applied in
preliminary experiments; although they result in various cluster
sizes and slightly different cluster contents, the basic
significant topics remain unchanged.
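For readers unfamiliar with the algorithm, a bare-bones K-means with Euclidean distance is sketched below. It is a didactic illustration on hypothetical two-dimensional vectors, with a naive initialisation from the first k points; it is not the implementation compared in [29], and real Word2Vec vectors have far more dimensions.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Plain K-means with Euclidean distance; returns a cluster index per vector."""
    centroids = [list(v) for v in vectors[:k]]  # naive init: the first k vectors
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical 2-D "embeddings" of four lemmas.
words = ["рубль", "друг", "деньга", "подруга"]
vectors = [(0.9, 0.1), (0.1, 0.9), (1.0, 0.2), (0.2, 1.0)]
labels = kmeans(vectors, k=2)
clusters = {}
for word, label in zip(words, labels):
    clusters.setdefault(label, []).append(word)
```

K-means is sensitive to initialisation, so a practical run would normally use a more careful seeding strategy and several restarts.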
The clusters have been manually labelled with a concept
comprising their members. Function words, numerals and
unknown words are out of scope of the semantic model and
out of the clusters.
The generalized semantic correlates are illustrated in Table
IV. Table V (see Appendix) contains contents of the
significantly correlated clusters.
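Once words are assigned to labelled clusters, a generalized content feature can be obtained by aggregating an author's lemmas per cluster. The sketch below is an assumption about how such a feature could be computed: the mapping is a hypothetical fragment, and expressing the feature as a share of all tokens is our choice for illustration.

```python
from collections import Counter


def cluster_rates(lemmas, lemma_to_cluster):
    """Share of an author's tokens that falls into each labelled cluster."""
    total = len(lemmas)
    if total == 0:
        return {}
    # Lemmas outside the clusters (e.g. function words) are skipped.
    counts = Counter(lemma_to_cluster[l] for l in lemmas if l in lemma_to_cluster)
    return {cluster: n / total for cluster, n in counts.items()}


# Hypothetical fragment of the lemma-to-cluster mapping.
lemma_to_cluster = {"рубль": "Money", "деньга": "Money", "друг": "Friend"}
author_lemmas = ["друг", "и", "деньга", "рубль"]
rates = cluster_rates(author_lemmas, lemma_to_cluster)
```

Aggregation of this kind replaces thousands of sparse lexical features with a few hundred denser ones, which is what reduces the number of hypotheses tested.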
V. RESULTS AND INTERPRETATION
A. General considerations
Russian-speaking Facebook users with higher scores on the
scale of Machiavellianism are less likely to produce posts; the
length of their posts and sentences is shorter in comparison
with users characterized by lower scores of Machiavellianism.
The posts and sentences of Facebook users scoring high on
Narcissism are significantly longer than those of users scoring
low on Narcissism. This is consistent with psychological mod-
els of Machiavellianism and Narcissism, wherein individuals
with prominent Machiavellian traits would be more likely
to closely guard their personal image and avoid oversharing as
a result of careful strategic planning [30], [4], and individuals
with high Narcissism would tend to be more ’ego-promoting’,
i.e. expansive and needing to attract attention to their actions
and personality [31], [4].
Because of the modest amount of textual data, lexical
correlates of the Dark traits and morphological correlates
of Narcissism and Psychopathy have been filtered out by
the statistical significance correction procedure. However, by
presenting a number of significant semantic correlates for
every Dark trait we have shown that semantic clustering
is a useful, intuitive and interpretable approach to reducing
feature dimensionality in case of a limited dataset, high lexical
diversity and sparsity.
B. Narcissism
High Narcissism values are characterized by the following
correlated items:
1) Social involvement and the importance of communication
are reflected in clusters denoting social interaction, such as
Appeal and Take_give. These features replicate the findings
reported in [9], where high Narcissism was
characterized by social involvement in terms of high
correlations with the ’Friends’ category and with
punctuation symbols (@, #) denoting social interaction in
Twitter.
2) Importance of self-image and status is present in
the High_low word cluster.
3) Goal-focus, achievement, competency are stressed by
the Reasoning cluster.

Table II. LEXICAL CORRELATES OF THE DARK PERSONALITY TRAITS

Narcissism: Positive, 129
  Russian: масса, мой, одновременно, сокращение, важный, целый,
    слово, благодарность, решение, смс, я, президент, собственный,
    спасибо, дурной, править, забытый, ...
  English translation: mass, my, simultaneously, reduction, important,
    whole, word, gratitude, decision, sms, I, president, own,
    thank you, silly, rule, forgotten, ...
Narcissism: Negative, 10
  Russian: дохлый, посвящаться, доставать, подмосковье, снизу,
    мент, 1993, nice, ерунда, la
  English translation: dead, dedicated, reach, Moscow suburbs, below,
    policeman (derogatory), 1993, nice, nonsense, la
Machiavellianism: Positive, 16
  Russian: russia, оригинал, стройный, игил, новороссийск,
    инвестиция, жарко, замуж, бродить, беспомощный, выход, фотка,
    новыйгод, президент, провайдер, трое
  English translation: russia, original, slim, isis, Novorossijsk city,
    investment, hot, married, wander, helpless, exit, photo,
    newyear, president, provider, three
Machiavellianism: Negative, 160
  Russian: и, себя, очень, любить, чтобы, сердце, друг, каждый, мы,
    понять, много, физический, господь, чувство, война, ...
  English translation: and, self, much, love (verb), in order to,
    heart, friend, every, we, understand, a lot, physical, god,
    feeling, war, ...
Psychopathy: Positive, 136
  Russian: сша, российский, путин, нация, президент, пользователь,
    привязать, нормальный, славянский, заканчиваться, деньга, масса,
    институт, давно, состав, признать, рубль, ...
  English translation: usa, russian, Putin, nation, president, user,
    tie (verb), normal, slavic, be over, money, mass, institute,
    long ago, content, acknowledge, ruble, ...
Psychopathy: Negative, 13
  Russian: порадоваться, даровать, предновогодний, подводить,
    задуматься, друг, вдохновлять, крыло, мел, раздавать, nice,
    наделить, внимательный
  English translation: be glad, grant, new year eve’s, betray, think,
    friend, inspire, wing, chalk, hand out, nice, endow, attentive

Table III. MORPHOLOGICAL CORRELATES OF THE DARK PERSONALITY TRAITS, *p<0.05, FDR-CORRECTED

Narcissism                          Machiavellianism                     Psychopathy
Feature                     Corr.   Feature                      Corr.   Feature            Corr.
Verb_imperative             0.073   Patronymic*                 -0.083   Vernacular         0.049
Interjection                0.065   1person_plural*             -0.074   Slang              0.049
Pronoun_2person_plural      0.065   Verb_1person_plural*        -0.073   Organization       0.047
Pronoun_1person_singular    0.062   3person_plural*             -0.071   Adjective_short   -0.045
2per_plural                 0.059   Pronoun_1person_plural*     -0.071
Possessive                  0.050   3person_singular*           -0.069
Punctuation                 0.047   Adjective_short*            -0.068
Verb_2person_plural         0.046   Pronoun_3person_plural*     -0.066
                                    Verb_3person_plural*        -0.066
                                    Pronoun_2person_plural*     -0.064
                                    Verb_3person_singular*      -0.062
                                    Adjective_full*             -0.062
                                    Participle_full*            -0.060
                                    Adverbial_participle*       -0.060
                                    2person_plural*             -0.059
                                    Pronoun_3person_singular*   -0.058
                                    Surname                     -0.056
                                    Qualitative                 -0.054
                                    Active                      -0.053
                                    Participle_active           -0.053
                                    Participle_passive          -0.052
                                    Name                        -0.050
                                    Verb_2person_plural         -0.048
                                    Conjunction                 -0.046
                                    Pronoun                     -0.045
                                    Possessive                  -0.045

With regard to the accepted factor structure of Narcissism
[4], goal focus applies to Leadership/Authority, the status fea-
tures are related to Exploitativeness/Entitlement, while social
involvement can be related to both Leadership and Exploita-
tiveness.
C. Machiavellianism
Lexical, semantic and morphological features correlated
with Machiavellianism follow a single pattern of mostly neg-
ative correlation. The significant topics include the following:
1) Social involvement, communication and relationship
issues are particularly rare in high Machiavellianism:
• clusters Affirm, Friend, Feeling_vb, Tender_adj,
  male and female names;
• numerous morphological features: first person
  plural verbs and pronouns, some names
  (patronymics), reference to other people with
  third and second person verbs and pronouns.
The negative social communication feature was also
reported as prominent in Machiavellianism in [9],
represented by negative correlations with first person
plural pronoun, family and social processes word
categories.
2) Positive affect is less likely to occur with high
Machiavellianism according to the negatively correlated
clusters Wellbeing, Impress, confirming the findings
by [9] of negatively correlated affective processes and
positive emotion words.
3) Issues of mental processing are negatively represented
by the clusters Faith, Religion, Perception,
Sensation, Appearance.
4) Negatively correlated pronouns and verbs indicate
personal detachment and formality in speech ([15]).
5) A number of negatively correlated cluster categories
of common, casual topics (Verbs, Neg_action,
Trouble, face- and body-related words, Citizen, Age,
Being) suggest an overall decrease in general, daily
issues-related speech. This is in line with an
overall decrease in post number and volume, as
Machiavellianism increases.

Table IV. SEMANTIC CLUSTER CORRELATES (p<0.01) OF THE DARK PERSONALITY TRAITS, *p<0.05, **p<0.01, FDR-CORRECTED

Narcissism              Machiavellianism             Psychopathy
Cluster         Corr.   Cluster              Corr.   Cluster            Corr.
Appeal*         0.080   Faith**             -0.101   Money**            0.110
Take_give*      0.076   Wellbeing**         -0.094   Food*               0.08
High_low*       0.073   Friend**            -0.094   Money_affair*      0.075
Reasoning*      0.073   Verbs**             -0.085   Political*         0.074
Authority       0.068   Sensation**         -0.084   Authority          0.071
Monument        0.068   Difficult**         -0.083   Virtual            0.069
Goal            0.068   Affirm**            -0.083   Sky               -0.068
Passion         0.066   Appearance**        -0.083   Friend            -0.068
Pos_quality     0.066   Material**          -0.083   Powerful_male      0.066
Event           0.064   Tender_adj**        -0.083   Money_operation    0.064
Feeling         0.061   Space**             -0.081
Want            0.059   Neg_action*         -0.076
Casual          0.059   Female_name*        -0.076
Perfection      0.058   Male_name*          -0.076
Regulations     0.058   Trouble*            -0.070
                        Perception*         -0.070
                        Feeling_vb*         -0.070
                        Impress*            -0.064
                        Religion*           -0.063
                        Face_part*          -0.062
                        Water_object*       -0.062
                        Body_situation*     -0.062
                        Furniture_object*   -0.062
                        Number*             -0.062
                        Being*              -0.062
                        Wild*               -0.062
                        Citizen*            -0.061
                        Anniversary*        -0.061
                        Clothing*           -0.060
                        Age*                -0.059

Based on the accepted factors of Machiavellianism described
in [4], negatively correlated general topics and an overall
lack of positive correlates are in line with the Machiavellian
cynical views on the world and people, leading to lack of self-
disclosure. The same is true for lacking positive emotional ex-
pressions, mental processing features and social involvement.
Personal detachment and formality in speech may apply to
Cynical Worldview and to Machiavellian Tactics, as they can
represent deceptive speech.
D. Psychopathy
1) High concern for basic needs in Psychopathy is
demonstrated by a number of money and food-related
clusters (Money, Money_affair, Food). This confirms
strong ’Sex’ and ’Body’ correlates of Psychopathy
reported in [9].
2) Prevailing political and authority issues are repre-
sented by the Political word cluster.
According to the factor structure of Psychopathy [4], polit-
ical and authority focus displays Manipulation, while Callous
Affect and Antisocial Behavior are represented in high concern
for basic needs.
Statistically identified linguistic features show modest
correlation measures; however, both statistical significance and
correlation levels replicate those identified in
English-speaking samples of comparable and larger sizes [9], [23].
Statistical significance procedures have shown that larger text
samples are necessary in future research to add to lexical
and morphological findings and confirm the list of significant
semantic features.
VI. CONCLUSIONS AND FUTURE WORK
We have developed semantic and morphological psycho-
metric categories based specifically on Russian language and
a statistical processing tool based on PyMorphy analyser [27].
The psychometric tool has been applied to texts of Russian
Facebook users, who also filled in a self-report questionnaire on
the levels of the Dark Triad personality traits. We have
identified significant morphological and semantic correlates of the
Dark traits, and interpreted them in terms of larger linguistic
categories. The most prominent categories have replicated the
findings reported in previous work on the Dark Triad in the English
language. Finally, most of the revealed linguistic features are
related to psychological factors of the Dark traits.
The statistical bottom-up approach has proven to be a
plausible and fruitful research method. The developed morpho-
logical and semantic categories represent Russian language pe-
culiarities and give informative results comparable to English-
language LIWC-based research.
The described significant morphological and semantic patterns
supply a deeper understanding of the psychological
characteristics of individuals with prominent Dark traits. To
our knowledge, there have been no previous studies of the
language of the Dark traits in Russian samples. We may assume
that the increased understanding of the linguistic characteristics
of Narcissism, subclinical Psychopathy and Machiavellianism
in Russian samples will enhance the theoretical psychological
model of the Dark Triad. The local cultural and national
features mirrored in the language can yield more detailed
descriptions of these individuals, which may be used to identify
Dark trait patterns in counselling, psychotherapy, forensic
assessments and negotiation procedures.
To extend the list of significant linguistic features,
the next research step will involve larger amounts of
textual data. The proposed categories are subject to refinement:
the semantic categories are to be mapped to LIWC-based
categories, specifically those denoting psychological processes
and sentiment; the morphological and lexical categories should
be refined using a dataset with a larger text sample per
participant.
A further research step is to expand the dataset to
allow automatic personality identification, i.e. regression and
classification of the Dark trait levels based on the Facebook
text collection.
ACKNOWLEDGMENTS
The reported study is supported by Saint-Petersburg State
University research grant 8.38.351.2015 "A cross-cultural
study of the markers of stress, health and well-being in social
networks".
REFERENCES
[1] M. Kosinski, S. C. Matz, S. D. Gosling, V. Popov, and D. Stillwell,
“Facebook as a research tool for the social sciences: Opportunities,
challenges, ethical considerations, and practical guidelines.” American
Psychologist, vol. 70, no. 6, p. 543, 2015.
[2] B. Verhoeven, B. Plank, and W. Daelemans, “Multilingual personality
profiling on twitter,” To be presented at DHBenelux 2016, Belval,
Luxembourg, 09/06/2016 In Press.
[3] D. L. Paulhus and K. M. Williams, “The dark triad of personality:
Narcissism, machiavellianism, and psychopathy,” Journal of research
in personality, vol. 36, no. 6, pp. 556–563, 2002.
[4] D. N. Jones and D. L. Paulhus, “Introducing the short dark triad (sd3)
a brief measure of dark personality traits,” Assessment, vol. 21, no. 1,
pp. 28–41, 2014.
[5] C. Moore, J. R. Detert, L. Klebe Treviño, V. L. Baker, and
D. M. Mayer, “Why employees do bad things: Moral disengagement
and unethical organizational behavior,” vol. 65, no. 1, pp. 1–48.
[Online]. Available:
http://onlinelibrary.wiley.com/doi/10.1111/j.1744-6570.2011.01237.x/full
[6] V. Egan, N. Hughes, and E. J. Palmer, “Moral
disengagement, the dark triad, and unethical consumer
attitudes,” vol. 76, pp. 123–128. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S019188691400703X
[7] A. Book, B. A. Visser, and A. A. Volk, “Unpacking “evil”: Claiming
the core of the dark triad,” vol. 73, pp. 29–38. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0191886914005182
[8] A. Furnham, S. Richards, L. Rangel, and D. N. Jones,
“Measuring malevolence: Quantitative issues surrounding the dark
triad of personality,” vol. 67, pp. 114–121. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0191886914000932
[9] C. Sumner, A. Byers, R. Boochever, and G. Park, “Predicting dark
triad personality traits from twitter usage and a linguistic analysis of
tweets,” in Machine Learning and Applications (ICMLA), 2012 11th
International Conference on, vol. 2. IEEE, 2012, pp. 386–393.
[10] J. T. Hancock, M. T. Woodworth, and S. Porter, “Hungry like the wolf:
A word-pattern analysis of the language of psychopaths,” Legal and
Criminological Psychology, vol. 18, pp. 102–114, 2013.
[11] M. Egorova and M. Sitnikova, “The dark triad,” Psikhologicheskie
Issledovaniya, vol. 7(38), p. 12, 2014. [Online]. Available:
http://psystudy.ru/index.php/num/2014v7n38/1071-egorova38.html
[12] J. W. Pennebaker, M. E. Francis, and R. J. Booth, “Linguistic inquiry
and word count: Liwc 2001,” Mahwah: Lawrence Erlbaum Associates,
vol. 71, 2001.
[13] C. H. Lee, K. Kim, Y. S. Seo, and C. K. Chung, “The relations between
personality and language use,” The Journal of general psychology, vol.
134, no. 4, pp. 405–413, 2007.
[14] J. Bjekić, L. B. Lazarević, M. Živanović, and G. Knežević,
“Psychometric evaluation of the serbian dictionary for automatic
text analysis–liwcser,” Psihologija, vol. 47, no. 1, 2014.
[15] Y. R. Tausczik and J. W. Pennebaker, “The psychological meaning
of words: Liwc and computerized text analysis methods,” Journal of
language and social psychology, vol. 29, no. 1, pp. 24–54, 2010.
[16] A. Kailer and C. K. Chung, “The russian liwc2007 dictionary,”
LIWC.net, 2011.
[17] N. Zvereva, E. Mikhaleva, S. Nosov, and Y. Y. Nikitina,
“Eksperimental’noe issledovanie osobennostei rechevoi deyatel’nosti
u muzhchin, bol’nykh shizofreniei. [elektronnyi resurs] [experimental
research of features of speech activity in men with schizophrenia],”
Meditsinskaya psikhologiya v Rossii [Medical psychology in Russia],
no. 4, 2011.
[18] P. Y. Zavitaev, “Autism: a clinical-semantic and experimental-
psychological investigation, [autism: kliniko-semanticheskoe i
eksperimental’no-psichologicheskoe issledovanie], in russian,” Russian
Journal of Psychiatry, [Rossiyskiy Psikhiatricheskiy Zhurnal], vol. 5,
pp. 44–48, 2007.
[19] T. Litvinova, O. Litvinova, Y. Ryzhkova, Y. Biryukova, P. Seredin,
and O. Zagorovskaya, “Studying influence of author’s gender and
psychological characteristics on quantitative parameters of text using
“linguistic inquiry and word count” program,” Naučnyj dialog (Scientific
dialogue), in Russian, p. 101, 2015.
[20] J. H. Greenberg, “A quantitative approach to the morphological typology
of language,” International journal of American linguistics, vol. 26,
no. 3, pp. 178–194, 1960.
[21] A. Pirkola, “Morphological typology of languages for ir,” Journal of
Documentation, vol. 57, no. 3, pp. 330–348, 2001.
[22] J. Siegel, B. Szmrecsanyi, and B. Kortmann, “Measuring analyticity
and syntheticity in creoles,” Journal of Pidgin and Creole Languages,
vol. 29, no. 1, pp. 49–85, 2014.
[23] H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, L. Dziurzynski, S. M.
Ramones, M. Agrawal, A. Shah, M. Kosinski, D. Stillwell, M. E.
Seligman et al., “Personality, gender, and age in the language of social
media: The open-vocabulary approach,” PloS one, vol. 8, no. 9, p.
e73791, 2013.
[24] A. Panchenko and others, “Sentiment index of the
russian speaking facebook,” pp. 506–517. [Online]. Available:
http://dial.uclouvain.be/handle/boreal:160359
[25] N. Muravyev, A. Panchenko, S. Obiedkov, and others,
“Neologisms on facebook,” pp. 440–454. [Online]. Available:
http://dial.uclouvain.be/handle/boreal:160352
[26] G. Coppersmith, M. Dredze, and C. Harman, “Quantifying mental
health signals in twitter,” ACL 2014, p. 51, 2014.
[27] M. Korobov, “Morphological analyzer and generator for russian and
ukrainian languages,” in Analysis of Images, Social Networks and Texts.
Springer, 2015, pp. 320–332.
[28] Y. Benjamini and Y. Hochberg, “Controlling the false discovery rate:
a practical and powerful approach to multiple testing,” Journal of the
royal statistical society. Series B (Methodological), pp. 289–300, 1995.
[29] P. Panicheva, Y. Ledovaya, and O. Bogoliubova, “Revealing interpretable
content correlates of the dark triad personality traits,” in Russian
Summer School in Information Retrieval, 2016, accepted for publication.
[30] J. F. Rauthmann and G. P. Kolar, “How “dark” are the dark triad traits?
examining the perceived darkness of narcissism, machiavellianism, and
psychopathy,” Personality and Individual Differences, vol. 53, no. 7, pp.
884–889, 2012.
[31] R. Raskin and H. Terry, “A principal-components analysis of the
narcissistic personality inventory and further evidence of its construct
validity.” Journal of personality and social psychology, vol. 54, no. 5,
p. 890, 1988.
Table V. SIGNIFICANT CLUSTER CONTENTS (IN ALPHABETICAL ORDER)
Cluster | Typical contents (Russian) | English translation
Affirm | зарегистрировать, называть, составлять, утверждать | register, call, constitute, affirm
Age | возраст, детство, здоровье, молодость, прошлое, ранний, старость | age, childhood, health, youth, past, early, old age
Anniversary | поздравлять, праздник, рождение, рождество, свадьба, юбилей | congratulate, holiday, birth, christmas, marriage, jubilee
Appeal | вызов, жалоба, задание, заказ, заявление, обещание, обращение, ответ | call, complaint, task, order, statement, promise, appeal, answer
Appearance | внешность, выглядеть, гордый, красавец, красивый, худой, шикарный | appearance, look, proud, beauty, beautiful, slim, chic
Being | бытие, вселенная, духовность, истина, мир, природа, реальность | being, universe, spirituality, truth, world, nature, reality
Body_situation | висеть, вставать, держаться, лежать, посадить, сидеть, ставить, стоять | hang, stand up, hold, lie, sit down, sit, put, stand
Citizen | американец, гражданин, еврей, китаец, немец, россиянин, француз | american, citizen, jew, chinese, german, russian, french
Clothing | ботинок, костюм, куртка, носок, обувь, одежда, платье, шляпа, штаны | boot, costume, jacket, sock, shoes, clothes, dress, hat, trousers
Difficult | важный, необходимый, полезный, сложный, трудный, успешный | important, necessary, useful, difficult, hard, successful
Face_part | бровь, веко, взгляд, глаз, лицо, рожа, рот, улыбка | eyebrow, eyelid, look, eye, face, face (derogatory), mouth, smile
Faith | ангел, благодать, бог, господний, дар, дух, дьявол, молитва, откровение | angel, grace, god, godly, gift, spirit, devil, prayer, revelation
Feeling_vb | верить, восхищаться, гордиться, доверять, дружить, любить, ненавидеть | believe, admire, be proud, trust, be friends, love, hate
Female_name | анна, вера, виктория, екатерина, елена, мария, надежда, наталья | anna, vera, victoria, ekaterina, elena, maria, nadezhda, natalia
Food | блин, борщ, вкусный, картошка, каша, масло, молоко, пища, салат, суп | pancake, borscht, tasty, potato, porridge, butter, milk, food, salad, soup
Friend | друг, забытый, знакомый, приятель, родные, сосед, чужой | friend, forgotten, familiar, friend, relative, neighbour, stranger
Furniture_object | дверь, доска, дыра, замок, крыша, портал, потолок, стена, табличка, труба | door, board, hole, lock, roof, portal, ceiling, wall, sign, pipe
High_low | вершина, высокий, выше, ниже, низкий, повышение, средний | peak, high, higher, lower, low, advancement, medium
Impress | вдохновлять, великолепный, впечатлять, выдающийся, удивительный | inspire, magnificent, impress, outstanding, amazing
Male_name | андрей, борис, валерий, василий, виктор, владимир, дмитрий, иван | andrey, boris, valeriy, vasiliy, victor, vladimir, dmitriy, ivan
Material | асфальт, зеркало, камень, металл, окно, посуда, разбитый, стекло | asphalt, mirror, stone, metal, window, dishes, broken, glass
Money | валюта, деньга, доллар, дорого, евро, копейка, марка, рубль | currency, money, dollar, expensive, euro, kopeck, mark, ruble
Money_affair | бюджет, вклад, доход, зарплата, кредит, налог, оплата, расход | budget, deposit, income, salary, credit, tax, payment, expense
Neg_action | жестокость, измена, коррупция, ложь, месть, нарушение, насилие, обман | cruelty, treason, bribery, lie, revenge, violation, violence, fraud
Number | восемь, двадцать, девять, десять, пара, пятнадцать, пять, тридцать, четыре | eight, twenty, nine, ten, pair, fifteen, five, thirty, four
Perception | влиять, воспринимать, касаться, осознавать, относиться, ощущать | influence, perceive, regard, recognize, relate, feel
Political | власть, вождь, государство, демократия, партия, политика, правительство | power, leader, state, democracy, party, politics, government
Reasoning | вывод, заключение, мнение, осознание, понимание, решение | inference, conclusion, opinion, awareness, understanding, decision
Religion | возлюбить, господь, иисус, милость, молиться, прощать, сотворить | love, god, jesus, mercy, pray, forgive, create
Sensation | воспоминание, впечатление, иллюзия, мысль, ощущение, шок, эмоция | memory, impression, illusion, thought, sensation, shock, emotion
Space | близко, вне, внизу, внутри, возле, вокруг, позади, посреди, соседний | close, outside, below, inside, near, around, behind, amid, next
Take_give | брать, взять, давать, держать, забирать, нести, отдавать, хватать | take, take, give, hold, take away, carry, give away, grasp
Tender_adj | вежливый, внимательный, добрый, ласковый, любящий, нежный | polite, attentive, kind, tender, affectionate, loving
Trouble | беда, война, несчастье, неудача, перемена, подвиг, разрушение | trouble, war, disaster, failure, change, achievement, destruction
Verbs | бывать, видеть, познакомиться, помнить, сделать, случаться, создать | be, see, meet, remember, do, happen, create
Water_object | берег, болото, вода, канал, море, озеро, океан, остров, пруд, река | bank, marsh, water, channel, sea, lake, ocean, island, pond, river
Wellbeing | благополучие, комфорт, отдых, покой, равновесие, спокойствие, уют | wellbeing, comfort, rest, peace, balance, tranquillity, cosiness
Wild | агрессивный, быстрый, крутой, мощный, острый, сильный, смелый | aggressive, fast, hard, mighty, sharp, strong, courageous