Analyzing Connections Between User Attributes, Images, and
Text
Laura Burdick1, Rada Mihalcea1, Ryan L. Boyd2, and James W. Pennebaker3
1Department of Computer Science and Engineering, University of Michigan (Bob and
Betty Beyster Building, 2260 Hayward Street, Ann Arbor, MI 48109-2121)
2Department of Psychology, Lancaster University (Lancaster, United Kingdom, LA1
4YF)
3Department of Psychology, The University of Texas at Austin (SEA 4.208, 108 E.
Dean Keeton Stop A8000, Austin, TX 78712-1043)
wenlaura@umich.edu, (tel) 734-763-0503
mihalcea@umich.edu
r.boyd@lancaster.ac.uk
pennebaker@utexas.edu
Abstract
Background / Introduction. This work explores the relationship between a person’s
demographic/psychological traits (e.g., gender, personality) and self-identity images and
captions.
Methods. We use a dataset of images and captions provided by N ≈ 1,350 individuals,
and we automatically extract features from both the images and captions.
Results. We identify several visual and textual properties that show reliable relationships
with individual differences between participants. The automated techniques presented here
allow us to draw interesting conclusions from our data that would be difficult to identify
manually, and these techniques are extensible to other large datasets. Additionally, we
consider the task of predicting gender and personality using both single-modality features
and multimodal features.
Conclusions. We show that a multimodal predictive approach outperforms purely visual
methods and purely textual methods. We believe that our work on the relationship between
user characteristics and user data has relevance in online settings, where users upload billions
of images each day (Meeker M, 2014. Internet trends 2014-Code conference. Retrieved May
28, 2014).
Keywords. Personality, Gender, Natural Language Processing, Computer Vision, Computational
Social Science.
1 Introduction
Images have increasingly become a central part of most people’s online ecosystem – people
upload profile photos, create memes, and use images as a means of communication. In total,
over 1.8 billion digital images are added to the internet each day [1]. This tremendous quantity
of visual data offers exciting potential for gaining a deeper understanding of people's thoughts
and behaviors. Since many of the images shared online are personalized by a user, studying
them gives us insight into the user herself.
Specifically, in this work, we aim to present new, interpretable psychological insight into the
ways that image attributes (such as objects, scenes, and faces) and language features (such as
words and semantic categories) relate to personality and gender. We do this by using a dataset
of N ≈ 1,350 individuals, where each person has provided images and captions. From this
dataset, we extract an extensive set of visual and textual features. We then use these features to
identify relationships with the individual traits of personality and gender.
To show the strength of these relationships, we also briefly consider whether image and language
features have predictive power for demographic/psychological characteristics. This work builds
on the work presented in Wendlandt et al. [2]. Specifically, in this paper, we expand on our
previous work by including a larger set of correlations between image and text features and
personality traits, reporting results on a regression task not previously considered, validating
our results on a second dataset collected at a later period of time, expanding our overview of
previous related work, and expanding our analyses.
After examining related work, we begin by describing our dataset. We then explain the
various methods used for analyzing images and text, showing significant correlations that are
found. Finally, we use our visual and textual features to predict both personality and gender.
We close with a discussion and conclusion.
1.1 Related Work
When studying individuals, we are often trying to get a general sense of who they are as a person.
These types of evaluations fall under the broader umbrella of individual differences, a large area
of research that tries to understand the various ways in which people are psychologically different
from one another, yet relatively consistent over time [3]. A large amount of research in the past
decade has been dedicated to the assessment and estimation of individual characteristics as a
function of various behavioral traces. In our case, these traces are images and captions collected
from undergraduate students.
The estimation of individual characteristics has been employed in various downstream tasks
in fields such as public health [4] and politics [5, 6]. Some of the attributes targeted for extraction
focus on demographic related information, such as gender/age [7, 8, 9, 10, 11, 12, 13, 14],
race/ethnicity [15, 16, 17, 14], and location [18], yet other aspects are mined as well, among
them emotion and sentiment [19], personality types [20, 21, 22, 23], user political affiliation
and sentiment [24, 6, 25], mental health diagnosis [26], and even lifestyle choices such as coffee
preference [15]. The task is typically approached from a machine learning perspective, with
data originating from a variety of user-generated content, most often microblogs (from Twitter)
[23, 4, 14], article comments to news stories or op-ed pieces [27], social posts (originating from
sites such as Facebook, MySpace, Google+) [26], or discussion forums on particular topics [28].
Classification labels are then assigned either based on manual annotations (such as Amazon
Mechanical Turk [14]), self-identified user attributes (“I am a 20 year old African American”)
[15], affiliation with a given discussion forum type, or online surveys set up to link a social
media user identification to the responses provided (such as the embedded personality test
survey application developed by Schwartz et al. [29]). Additional modeling information may
surface from meta-data, such as geolocation provided by the Twitter API [16], or by applying
distributions learnt from real world data, such as those collected as part of the US Census [30,
31], or by leveraging the social connections of a given user within a network [32, 33]. Learning
has typically employed bag-of-words lexical features (n-grams) [12, 34, 35], with some works
focusing on deriving additional signals from the underlying social network structure [15, 32, 33,
25], syntactic and stylistic features [36], or the intrinsic social media generation dynamic [25].
We should note that some works have also explored unsupervised approaches for extracting
demographic dimensions, among them large-scale clustering [37] and probabilistic graphical models
[38].
In our work, we focus on two individual attributes, namely personality and gender, and we
highlight below the previous work on these tasks.
1.1.1 Personality
Much of the work in individual differences research focuses on the topic of personality. Generally
speaking, “personality” refers to constellations of feelings, behaviors, and cognitions that co-
occur within an individual and are relatively stable across time and contexts. Personality is
most often conceived within the Big 5 personality framework, and these five dimensions of
personality are predictive of important behavioral outcomes such as marital satisfaction [39]
and even health [40].
From a computational perspective, the problem of identifying user personality has primarily
been approached using Natural Language Processing (NLP) methods. While the textual
component of our work focuses on short image captions, most previous research used longer
bodies of text such as essays or social media updates [41]. N-grams, as well as
psychologically-derived linguistic features such as those provided by LIWC, have been shown to have significant
predictive power for personality [42, 43].
In addition to textual inference, there has been a recent movement towards incorporating
images into the study of individual differences [44, 45].
1.1.2 Gender
Contemporary research on individual differences extends well beyond personality evaluations to
include variables such as gender, age, life experiences, and so on – facets that differ between
individuals but are not necessarily caused by internal psychological processes. In addition to
personality, we also consider gender in this work.
As with personality, the computational inference of gender has primarily been approached
using NLP techniques [8, 18, 46]. Relevant to the current work, however, is work by You et al.
exploring the task of predicting gender given a user’s selected images on Pinterest, an online
social networking site [47]. In contrast to using data from a social network, this work uses data
collected from students at a university. Additionally, we consider both gender and personality,
while only gender is considered in You et al.
1.1.3 Inference from Multiple Modalities
Our work also relates to the recent body of research on the joint computational use of language
and vision. Our multimodal predictive approach is particularly related to automatic image
annotation, the task of extracting semantically meaningful keywords from images [48]. Other
related multimodal approaches can be found in the fields of image captioning [49] and joint
text-image embeddings [50]. Some of these approaches rely on very large visual and textual
corpora. For example, Johnson et al. train an image captioning algorithm using Visual Genome,
a dataset with more than 94,000 images [51].
2 Methods
In this section, we describe the dataset that we use, as well as the visual and textual features
that we extract.
2.1 Dataset
We use a dataset provided by James Pennebaker and Samuel Gosling at the University of
Texas at Austin, collected from their Fall 2015 online undergraduate introductory psychology
class.¹ The dataset includes free response data and responses to standard surveys collected
from 1,353 students ages 16 to 46 (average 18.8 ± 2.10). The ethnicity distribution is
¹ This data was collected under IRB approval at UT Austin.
40.3% Anglo-Saxon/White, 27.1% Hispanic/Latino, 22.3% Asian/Asian American, 5.5% African
American/Black, and 4.8% Other/Undefined.
Three elements of this dataset are of particular interest to our research:
2.1.1 Free Response Image Data
Each student was asked to submit and caption five images that expressed who he/she is as a
person. The following prompt was used: Please upload 5 different pictures that express who you
are. They could be pictures of you, your friends, your possessions, or anything that you feel
expresses your personality. Pick out the five pictures before you begin the assignment. Also,
when you upload each picture, write a brief description or caption about it.
As Fig 1 illustrates, students submitted a wide range of images, from memes to family
photos to landscapes. Some students chose to submit fewer than five images. All images were
converted to the JPG format and resized so that the longest edge of the image was 700 pixels;
this preprocessing ensures efficient and uniform calculations across the dataset.
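For concreteness, a minimal sketch of this preprocessing step follows (assuming the Pillow library; the directory names are placeholders rather than the pipeline actually used):

```python
from pathlib import Path
from PIL import Image

MAX_EDGE = 700  # longest edge after resizing, as described above

def preprocess(src_path: Path, dst_path: Path) -> None:
    """Convert an image to JPG and shrink it so its longest edge is MAX_EDGE pixels."""
    img = Image.open(src_path).convert("RGB")
    w, h = img.size
    scale = MAX_EDGE / max(w, h)
    if scale < 1.0:  # only shrink, never enlarge
        img = img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
    img.save(dst_path, format="JPEG")

# Hypothetical usage over a directory of submitted images.
out_dir = Path("processed")
out_dir.mkdir(exist_ok=True)
for src in Path("raw_images").glob("*"):
    preprocess(src, out_dir / (src.stem + ".jpg"))
```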
 
Figure 1: Five images from the dataset submitted by a single student (with student faces blurred
out for privacy) [2]. The accompanying captions are: (A) I’d rather be on the water. (B) The
littlest things are always so pretty (and harder to capture). (C) I crossed this bridge almost
every day for 18 years and never got tired of it. (D) The real me is right behind you. (E) Gotta
find something to do when I have nothing to say.
2.1.2 Big 5 Personality Ratings
The Big 5 personality dimensions include: Openness (example adjectives: artistic, curious,
imaginative, insightful, original, wide interests); Conscientiousness (efficient, organized, planful,
reliable, responsible, thorough); Extraversion (active, assertive, energetic, enthusiastic, outgoing,
talkative); Agreeableness (appreciative, forgiving, generous, kind, sympathetic, trusting);
and Neuroticism (anxious, self-pitying, tense, touchy, unstable, worrying) [52].
To measure these personality dimensions, each student completed the BFI-44 personality
inventory, a self-report 44-question survey used to score individuals along each of the Big 5
personality dimensions using a 1-to-5 Likert scale [53]. Descriptive statistics for each personality
dimension within the current sample are presented in Table 1. Figure 2 shows correlations
between the different dimensions of personality in our dataset. The highest positive correlation is
between Agreeableness and Conscientiousness, while the largest negative correlation is between
Extraversion and Neuroticism.
Table 1: Statistics for each personality dimension.
Dimension Mean Median Std Dev
Openness 3.618 3.600 0.623
Conscientiousness 3.459 3.444 0.649
Extraversion 3.173 3.125 0.813
Agreeableness 3.716 3.778 0.644
Neuroticism 3.011 3.000 0.752
Figure 2: Pearson correlations between Big 5 personality dimensions in our dataset. O, C, E, A,
and N stand for Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism,
respectively.
2.1.3 Gender
Finally, demographic data is also associated with each student, including gender, which we use
in our work. The gender distribution in the dataset is 61.6% female, 37.8% male, and 0.5%
undefined. Gender-unspecified students are omitted from our analyses.
2.1.4 Computing Correlations
An important contribution of our work is gathering new insights regarding textual and image
attributes that correlate with personality and gender. Each of the personality dimensions is
continuous; therefore, a version of the Pearson correlation coefficient is used to calculate correlations
between personality and visual and textual features. Since there are, in some cases,
thousands of image or text features, we must account for inferential issues associated with multiple
testing (e.g., inflated error rates); we address such issues using a multivariate permutation
test [54].
This approach first calculates the Pearson product-moment correlation coefficient
r for two variables. Then, for a large number of iterations (in our case, 10,000), the two variables
are randomly shuffled and the Pearson coefficient is re-calculated. At the end of the shuffling,
a two-tailed test is conducted: only if the original Pearson's r is statistically
significant in comparison to the distribution of random coefficients is the original result considered
to be legitimate. As discussed in Yoder et al. [54], for small sample sizes, this multivariate
permutation test has more statistical power than the common Bonferroni correction.
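A minimal sketch of the per-feature shuffling step described above (assuming NumPy and SciPy; the family-wise bookkeeping across features in Yoder et al. is omitted):

```python
import numpy as np
from scipy.stats import pearsonr

def permutation_corrected_r(x, y, n_iter=10_000, seed=0):
    """Pearson r with a two-tailed permutation p-value, following the shuffling step above."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    r_obs = pearsonr(x, y)[0]
    null_rs = np.array([pearsonr(rng.permutation(x), y)[0] for _ in range(n_iter)])
    # Two-tailed: fraction of shuffled correlations at least as extreme as the observed one.
    p_value = float(np.mean(np.abs(null_rs) >= abs(r_obs)))
    return r_obs, p_value
```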
Unlike personality, gender is a categorical variable. Thus, Welch’s t-tests are used to look
for significant relationships between gender and image and text features. These relationships
are measured using effect size (Cohen’s d), which measures how many standard deviations the
two groups differ by, and is calculated by dividing the mean difference by the pooled standard
deviation.
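The gender comparison can be sketched in the same spirit (assuming SciPy; a positive d is read here as a larger mean for women, matching the sign convention used in the tables below):

```python
import numpy as np
from scipy.stats import ttest_ind

def gender_effect(values_women, values_men):
    """Welch's t-test plus Cohen's d (mean difference divided by the pooled standard deviation)."""
    a = np.asarray(values_women, dtype=float)
    b = np.asarray(values_men, dtype=float)
    t_stat, p_value = ttest_ind(a, b, equal_var=False)   # Welch's t-test
    n1, n2 = len(a), len(b)
    pooled_sd = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2))
    d = (a.mean() - b.mean()) / pooled_sd                 # positive d: larger mean for women
    return t_stat, p_value, d
```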
2.2 Analyzing Images
In order to explore the relationship between images and psychological attributes, we want to
extract meaningful and interpretable image features that have some connection to the user. In
this section, we summarize both low-level raw visual features as well as high-level attributes
such as the scene of an image, the number of faces in an image, and the objects in an image.
How we extracted these features is described in more detail in previous work [2]. We use these
features to explore significant correlations between image attributes and user attributes.
2.2.1 Raw Visual Features
In this section, we describe basic image statistics that can provide a good summary of the
structural, color, and textural properties of an image, which in turn can provide insights into
the attributes of the person submitting the image. Higher-level visual features are considered
in following sections.
Colors. Past research has shown that colors are associated with abstract concepts [55]. For
instance, red is associated with excitement, yellow with cheerfulness, and blue with comfort,
wealth, and trust. Furthermore, research has shown that men and women perceive color
differently. In particular, one study found that men are more tolerant of gray, white, and black than
are women [56].
To characterize the distribution of colors in an image, we classify each pixel as one of eleven
named colors using the method presented by Van De Weijer et al. [57].
Brightness and Saturation. Images are often characterized in terms of their brightness
and saturation. Here, we use the HSV color space, where brightness is defined as the relative
lightness or darkness of a particular color, from black (no brightness) to light, vivid color (full
brightness). Saturation captures the relationship between the hue of a color and its brightness
and ranges from white (no saturation) to pure color (full saturation). We calculate the mean
and the standard deviation for both the brightness and the saturation.
Previous work has also used brightness and saturation to calculate metrics measuring
pleasure, arousal, and dominance [58].²
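As an illustration, the brightness and saturation statistics, together with the dominance metric from the footnote, could be computed roughly as follows (assuming Pillow; reading y as mean brightness and s as mean saturation is our interpretation of the footnote, not a detail stated in the text):

```python
import numpy as np
from PIL import Image

def brightness_saturation_features(path):
    """Mean and standard deviation of saturation (S) and brightness (V) in HSV, plus the
    dominance metric from the footnote (Dominance = 0.76y + 0.32s), reading y as mean
    brightness and s as mean saturation."""
    hsv = np.asarray(Image.open(path).convert("HSV"), dtype=float) / 255.0
    s, v = hsv[..., 1], hsv[..., 2]
    feats = {
        "saturation_mean": s.mean(), "saturation_std": s.std(),
        "brightness_mean": v.mean(), "brightness_std": v.std(),
    }
    feats["dominance"] = 0.76 * feats["brightness_mean"] + 0.32 * feats["saturation_mean"]
    return feats
```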
Texture. The texture of an image provides information about the patterns of colors or inten-
sities in the image. Following Lovato et al. [60], we use Grey Level Co-occurrence Matrices
(GLCMs) to calculate four texture metrics: contrast, correlation, energy, and homogeneity.
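A possible implementation of these texture metrics, assuming scikit-image (the single offset and angle are illustrative choices):

```python
import numpy as np
from PIL import Image
from skimage.feature import graycomatrix, graycoprops

def texture_features(path):
    """Contrast, correlation, energy, and homogeneity from a grey-level co-occurrence matrix."""
    grey = np.asarray(Image.open(path).convert("L"))
    # Single one-pixel horizontal offset; averaging over several offsets/angles is also common.
    glcm = graycomatrix(grey, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    return {prop: float(graycoprops(glcm, prop)[0, 0])
            for prop in ("contrast", "correlation", "energy", "homogeneity")}
```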
Static and Dynamic Lines. Previous work has shown that the orientation and width of a line
can have various emotional effects on the viewer [59]. For example, diagonal lines are associated
with movement and a lack of equilibrium. To capture some of these effects, we measure the
percentage of static lines with respect to all of the lines in the image. Static lines are defined
as lines that are within π/12 radians of being vertical or horizontal.
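One way to approximate this measurement, assuming OpenCV (the edge and Hough thresholds are illustrative, not the values used in this work):

```python
import math
import cv2
import numpy as np

def static_line_fraction(path, tolerance=math.pi / 12):
    """Fraction of detected lines lying within pi/12 radians of horizontal or vertical."""
    grey = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(grey, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=30, maxLineGap=5)
    if lines is None:
        return 0.0
    static = 0
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = abs(math.atan2(y2 - y1, x2 - x1))  # in [0, pi]
        # Distance to the nearest horizontal (0 or pi) or vertical (pi/2) orientation.
        distance = min(angle, abs(angle - math.pi / 2), abs(angle - math.pi))
        static += distance <= tolerance
    return static / len(lines)
```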
Circles. The presence of circles and other curves in images has been found to be associated
with emotions such as anger and sadness [55]. Following the example of Redi et al. [55], we
calculate the number of circles in an image.
Correlations. Once the entire set of raw features is extracted from the images, correlations
between raw features and personality/demographic features are calculated. Table 2 presents
significant correlations between visual features and personality traits. One correlation to note
is a positive relationship between the number of circles in an image and extraversion. This
is likely because the circle detection algorithm often counts faces as circles, and faces have a
² For prediction results, we use a slightly different version of dominance (Dominance = 0.76y + 0.32s), as
formulated in Machajdik and Hanbury [59].
natural connection with the social facets of extraversion. Our results also validate the findings
of Valdez and Mehrabian, who suggest that pleasure, arousal, and dominance have emotional
connections [58]. Here we show that these metrics also have connections to personality.
Table 2: Significant correlations between raw visual features and Big 5 personality traits. These
correlations are corrected using a multivariate permutation test, as described in the paper.
O, C, E, A, and N stand for Openness, Conscientiousness, Extraversion, Agreeableness, and
Neuroticism, respectively.
Big 5 Personality Dimensions
Image Attributes O C E A N
Black - - - - -0.06
Blue - -0.07 0.06 - -
Grey - - -0.11 0.06 -
Orange - - 0.07 - -
Purple - - 0.06 - -
Red - - - -0.06 -
Brightness Std Dev - - 0.07 - -
Saturation Mean - -0.06 0.07 - -
Saturation Std Dev - -0.06 0.06 -0.06 -
Pleasure - -0.05 0.07 - -
Arousal - 0.06 - - -
Dominance - - -0.06 -0.02 0.06
Homogeneity 0.05 - - - -
Static Lines % - - -0.07 - -
Num of Circles - -0.06 0.10 - -
Table 3 shows effect sizes for features significantly different between men and women. As
suggested by previous research, men are more likely to use the color black [56]; other correlations
appear to confirm stereotypes, e.g., a stronger preference by women for pink and purple.
Table 3: Raw visual features where there is a significant difference (p < 0.05) between male
and female images. Positive effect sizes indicate that women prefer the feature, while negative
effect sizes indicate that men prefer the feature.
Image Attributes Effect Size
Pink 0.455
Static Lines % -0.360
Black -0.325
Brightness Mean 0.266
Saturation Std Dev -0.176
Purple 0.167
Brown 0.166
Homogeneity 0.118
Red 0.111
2.2.2 Scenes
Previous research has linked personal spaces (such as bedrooms and offices) with various
personality attributes, indicating that how people compose their spaces provides clues about their
psychology, particularly through self-presentation and related social processes [61].
In order to identify the scene of an image, we use Places-CNN [62], a convolutional neural
network (CNN) trained on approximately 2.5 million images and able to classify an image into
205 scene categories. To illustrate, Fig 3 shows two classified images. For each image, we use
the softmax probability distribution over all scenes as a feature vector.
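Schematically, the scene feature extraction looks like the sketch below. This is only an illustration: the original Places-CNN is a Caffe model, whereas here we assume a PyTorch checkpoint with 205 output classes is available locally (the backbone choice and file name are placeholders):

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

# Placeholder setup: a ResNet backbone with 205 scene classes, loaded from a locally
# downloaded checkpoint (the backbone and file name are assumptions, not the original model).
model = models.resnet18(num_classes=205)
model.load_state_dict(torch.load("places205_resnet18.pth", map_location="cpu"))
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def scene_feature_vector(path):
    """Softmax probability over all scene categories, used directly as a feature vector."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return F.softmax(model(x), dim=1).squeeze(0).numpy()
```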
Figure 3: Top scene classifications for two images, along with their probabilities [2]. A: Coffee
Shop (0.53), Ice Cream Parlor (0.24). B: Parking Lot (0.57), Sky (0.26).
Correlations. Scenes strongly correlated with personality traits are shown in Table 4. The
strongest positive correlation is between extraversion and ballrooms, and the strongest negative
correlation is between extraversion and home offices. Findings such as these are conceptually
sound, as individuals tend to engage in personality-congruent behaviors. In other words, indi-
viduals scoring high on extraversion are expected to feel that inherently social locations, such
as ballrooms, are more relevant to the self than locations indicative of social isolation, such as
home offices.
We also measure the relationship between scenes and gender. Table 5 shows scenes that
are associated with either males or females. Men are more commonly characterized by
sports-related scenes, such as football and baseball stadiums, whereas women are more likely to have
photos from ice cream and beauty parlors. As illustrated in Fig 3, the scene detection algorithm
tends to conflate coffee shops and ice cream parlors, so this observed preference for ice cream
parlors could be partially attributed to a preference for coffee shops.
Table 4: Significant correlations between scene features and Big 5 personality traits. Only
correlations with p ≤ 0.07 are shown. These correlations are corrected using a multivariate
permutation test, as described in the paper. O, C, E, A, and N stand for Openness, Conscien-
tiousness, Extraversion, Agreeableness, and Neuroticism, respectively.
Big 5 Personality Dimensions
Image Attributes O C E A N
Art Studio - -0.09 - - -
Auditorium - - 0.08 - -
Ballroom - - 0.12 - -
Baseball Field -0.08 - - - -
Beauty Salon - - - - 0.08
Bedroom - - -0.08 - -
Boardwalk - - - 0.08 -
Bookstore - 0.08 -0.11 - -
Botanical Garden - 0.08 - - -
Bus Interior - - - -0.08 -
Butte 0.08 - - - -
Canyon - - - 0.11 -
Coffee Shop - -0.08 - - -
Creek 0.08 - - - -
Fire Station -0.08 - - - 0.08
Formal Garden - 0.09 - - -
Fountain - 0.08 - - -
Game Room - -0.08 - - -
Home Office - - -0.12 - -
Hot Spring 0.08 - - - -
Mansion - - - 0.10 -
Martial Arts Gym -0.09 - - - -
Museum Indoor - - - -0.08 -
Pantry - - -0.11 -0.10 -
Pavilion - - - - -
Phone Booth -0.08 - - - -
Playground - - - 0.09 -
Restaurant -0.09 - - - -
River - 0.09 - - -
Shower - - -0.09 - -
Stadium Baseball -0.08 - - - -
Valley 0.08 - - - -
Veranda - - - 0.08 -
Table 5: Scene features where there is a significant difference (p < 0.05) between male and
female images. Only features with an effect size of magnitude >0.07 are shown. Positive effect
sizes indicate that women prefer the feature, while negative effect sizes indicate that men prefer
the feature.
Image Attributes Effect Size
Beauty Salon 0.168
Ice Cream Parlor 0.156
Office -0.133
Slum 0.131
Football Stadium -0.130
Basement -0.109
Art Studio 0.105
Herb Garden 0.103
Music Studio -0.102
Baseball Stadium -0.102
Gas Station -0.101
Game Room -0.099
Vegetable Garden 0.097
Botanical Garden 0.096
Yard 0.096
Conference Room -0.094
Engine Room -0.094
Home Office -0.092
Reception -0.091
Assembly Line -0.090
Bedroom 0.088
Television Studio -0.083
Baseball Field -0.080
Office Building -0.080
Butcher’s Shop 0.079
Playground 0.075
Hot Spring 0.074
Nursery 0.073
Shoe Shop -0.073
2.2.3 Faces
Most aspects of a person’s personality are expressed through their social behaviors, and the
number of faces in an image can capture some of this behavior. We use the work by Mathias
et al. to detect faces in images [63].
Correlations. There is a strong positive correlation (r = 0.17) between the number of faces
and extraversion, which is intuitive because extraverts are often thought of as enjoying social
activities. There are also positive correlations between faces and openness (r = 0.08) and
neuroticism (r = 0.11), while there is a negative correlation between faces and agreeableness
(r = −0.07). With respect to gender, women have significantly more faces in their images than
men (effect size = 0.160).
2.2.4 Objects
Previous research has indicated that people can successfully predict other people’s personality
traits by observing their possessions [64]. This indicates that object detection has the ability
to capture certain psychological insight. Fig 4 shows an example image with several detected
objects.
Figure 4: Several detected objects and their bounding boxes in an image of a library [2].
Pictured objects are libraries (cyan), plate racks (blue), tobacco shops (green), window screens
(red), table lamps (white), dining tables (yellow), and bookcases (black).
Because of the small size of our dataset and the large number of ImageNet objects, this
feature vector is somewhat sparse and hard to interpret. To increase interpretability, we
consider two coarser-grained systems of classification: WordNet supersenses and WordNet domains.
WordNet [65] is a large hierarchical database of English concepts (or synsets), and each
ImageNet object is directly associated with a WordNet concept. Supersenses are broad semantic
classes labeled by lexicographers [66], and eight supersenses are present in our set of ImageNet
objects: communication, object, plant, food, artifact, animal, substance, and person.
WordNet domains [67] is a complementary synset labeling. It groups WordNet synsets into various
domains, such as medicine, astronomy, and history. The domain structure is hierarchical, but
here we consider only basic WordNet domains, which are domains that are broad enough to
be easily interpretable. An object is allowed to fall into more than one domain. Table 6 lists
both the WordNet supersenses and the WordNet domains and provides a few examples of each
category.
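As an illustration of the supersense mapping, WordNet lexicographer file names (available, for example, through NLTK) already encode the supersense of each noun synset; WordNet domains, by contrast, would require a separate synset-to-domain mapping file:

```python
from nltk.corpus import wordnet as wn  # requires a prior nltk.download("wordnet")

def supersense(object_label):
    """Map a detected object label to its WordNet supersense (lexicographer file name).
    Lexnames such as 'noun.animal' or 'noun.artifact' correspond to the eight supersenses
    listed above once the 'noun.' prefix is dropped."""
    synsets = wn.synsets(object_label.replace(" ", "_"), pos=wn.NOUN)
    if not synsets:
        return None
    return synsets[0].lexname().split(".")[-1]   # e.g. 'meerkat' -> 'animal'

# WordNet domains are not bundled with NLTK; they come from a separate synset-to-domain
# mapping that would be loaded as an additional lookup table.
```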
Correlations. WordNet supersenses and WordNet domains correlate significantly with
multiple personality traits, as shown in Table 7. We begin to see some patterns emerge across
different domains. For example, multiple technical disciplines (engineering, telecommunication,
physics) are negatively correlated with both conscientiousness and agreeableness. There are
very few correlations with neuroticism, which is something that we observe with other image
features as well.
Table 8 shows object classes that are different for males and females. These object classes
connect back to scenes associated with men and women. For example, men are more likely to
have sports objects in their images, reflected in the fact that men are more likely to include
scenes of offices and sports stadiums.
2.3 Captions
When available, captions can be considered another way of representing image content via a
textual description of the salient objects, people, or scenes in the image. Importantly, the
captions have been contributed by the same people who contributed the images, and therefore
they represent the views that the image “owners” have about their content. How we extracted
these features is described in more detail in previous work [2].
2.3.1 Stylistic Features
To capture writing style, we consider surface-level stylistic features, such as the number of words
and the number of words longer than six characters. We also use the Stanford Named Entity
Recognition system to extract the number of references to people, locations, and organizations
[68].
Finally, we look at readability and specificity metrics. Readability scores are usually based
on the length and difficulty of words and sentences, and they capture how hard the text is
to comprehend. We consider a variety of metrics: Flesch Reading Ease (FRE), Automated
Readability Index (ARI), Flesch-Kincaid Grade Level (FK), Coleman-Liau Index (CLI), Gunning
Table 6: Selected examples for each WordNet supersense and basic WordNet domain.
WordNet Supersenses
Communication Web site, comic book, traffic light
Object Alp, bubble, cliff
Plant Rapeseed, daisy, corn
Food Menu, plate, guacamole, trifle
Artifact Abacus, bakery, breastplate
Animal Mud turtle, airedale, meerkat
Substance Toilet tissue
Person Ballplayer, groom, scuba diver
Basic WordNet Domains
History Breastplate, cuirass, pickelhaube
Art Paintbrush, violin, Polaroid camera
Religion Church, monastery, mosque
Radio and TV Radio, screen, television
Play Golf ball, jigsaw puzzle, punching bag
Sport Baseball, basketball, golf cart
Agriculture Harvester, plow, thresher
Food Menu, eggnog, acorn
Home Bath towel, broom, dishwasher
Architecture Triumphal arch, library, traffic light
Computer Science Computer keyboard, mouse, web site
Engineering Pier, oscilloscope, remote control
Telecommunication Loudspeaker, cellular telephone, pay-phone
Medicine Medicine chest, neck brace, Band Aid
Astronomy Radio telescope
Biology Tench, goldfish, daisy
Animals Walker hound, sea cucumber, wood rabbit
Chemistry Face powder, French loaf, cauliflower
Plants Granny Smith, strawberry, coral fungus
Earth Alp, cliff, coral reef
Mathematics Abacus
Physics Gasmask, whistle, oscilloscope
Anthropology Maypole
Health Face powder, hair spray, lipstick
Military Assault rifle, bow, breastplate
Publishing Ballpoint, binder, fountain pen
Artisanship Hammer, plane, thimble
Commerce Grocery store, hand blower, restaurant
Industry Carpenter’s kit, chain, lumbermill, power drill
Transport Ambulance, missile, seat belt
Economy Slot, streetcar, lumbermill
Administration File
Law Guillotine, prison
Tourism Cab, triumphal arch, volcano
Fashion Apron, bow tie, cowboy boot
Table 7: Significant correlations between object features and Big 5 personality traits. These
correlations are corrected using a multivariate permutation test, as described in the paper.
O, C, E, A, and N stand for Openness, Conscientiousness, Extraversion, Agreeableness, and
Neuroticism, respectively.
Big 5 Personality Dimensions
Image Attributes O C E A N
WordNet Supersenses
Animal - - 0.063 - -
Person - - - - -0.058
Basic WordNet Domains
History - - 0.06 - -
Play -0.10 - - - -
Sport -0.10 - - - -
Home - -0.06 -0.09 - -
Engineering - -0.06 -0.07 -0.08 -
Telecommunication - -0.07 - -0.08 -
Astronomy - - - -0.06 -
Biology - - 0.07 - -
Animals -0.08 - - - -
Physics - -0.08 - -0.09 -
Anthropology - 0.06 - - -
Artisanship - -0.06 - - -
Industry - - -0.08 - -
Transport - - - - -0.07
Economy - - -0.08 - -
Fashion -0.07 0.06 0.11 0.05 -
Table 8: Object features where there is a significant difference (p < 0.05) between male and
female images. Positive effect sizes indicate that women prefer the feature, while negative effect
sizes indicate that men prefer the feature.
Image Attributes Effect Size
WordNet Supersenses
Artifact -0.213
Person -0.173
Food 0.107
Basic WordNet Domains
Sport -0.235
Play -0.231
Transport -0.186
Military -0.172
Animals -0.155
History -0.153
Art -0.142
Chemistry 0.140
Food 0.136
Plants 0.121
Fog Index (GFI), and SMOG score (SMOG). Specificity refers to how much detail a text
contains. We calculate this using the Speciteller system [69].
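To make the readability computation concrete, a rough sketch of the Flesch Reading Ease score follows (syllables are approximated by counting vowel groups, which is only a stand-in for a proper syllabifier):

```python
import re

def flesch_reading_ease(text):
    """Approximate Flesch Reading Ease:
    206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))
```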
2.3.2 N-grams
In addition to style, we want to capture the content of each caption. We do this by considering
unigrams, bigrams, and trigrams.
2.3.3 LIWC Features
Linguistic Inquiry and Word Count (LIWC) is a word-based text analysis program [42]. It
focuses on emotional, cognitive, and social processes, as well as broad categories such as language
composition. We analyze each piece of text using LIWC in order to capture psychological
dimensions of writing. For each of the 86 LIWC categories, we calculate a feature reflecting
the percentage of words in the caption that belong to that category.
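Since the LIWC dictionaries themselves are proprietary, the per-category percentages can be sketched schematically with a generic word-list lookup (LIWC additionally supports wildcard prefixes, which are omitted here):

```python
import re

def category_percentages(caption, category_lexicons):
    """Percentage of caption words falling in each category, given a mapping from category
    name to a set of words (a stand-in for the proprietary LIWC dictionaries)."""
    words = re.findall(r"[a-z']+", caption.lower())
    if not words:
        return {name: 0.0 for name in category_lexicons}
    return {name: 100.0 * sum(w in lexicon for w in words) / len(words)
            for name, lexicon in category_lexicons.items()}
```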
2.3.4 MRC Features
The MRC Psycholinguistic Database contains statistics about word use [70]. MRC features are
calculated by averaging the values of all of the words in a caption. Specifically, in our analysis,
certain MRC features emerge as particularly relevant. These include several features suggested
by Kucera and Francis, including word frequency counts, which capture how common a word
is in standard English usage. We also see measures for meaningfulness, imagery, and length
(e.g., number of letters, phonemes, and syllables). These features provide a complementary
perspective to the LIWC features.
2.3.5 Word Embeddings
For prediction purposes, we also consider each word’s embedding. Word2vec (w2v) is a method
for creating a multidimensional embedding for a particular word [71]. Google provides
pre-trained word embeddings on approximately 100 billion words of the Google News dataset.³ For
each caption in our dataset, we average together all of the word embeddings to produce a single
feature vector of length 300. We use the Google embeddings for this, discarding words that are
not present in the pre-trained embeddings.
³ Available at https://code.google.com/archive/p/word2vec/.
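A minimal sketch of the caption averaging step, assuming the gensim library and a locally downloaded copy of the Google News vectors:

```python
import numpy as np
from gensim.models import KeyedVectors

# Pre-trained Google News vectors (300 dimensions), assumed to be downloaded locally.
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

def caption_embedding(caption):
    """Average the embeddings of all in-vocabulary words; out-of-vocabulary words are dropped."""
    vectors = [w2v[w] for w in caption.lower().split() if w in w2v]
    return np.mean(vectors, axis=0) if vectors else np.zeros(w2v.vector_size)
```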
2.3.6 Correlations
For our analysis, all text features are normalized by word count. Table 9 shows correlations
between language features and personality. Interestingly, there are very few strong correlations for
extraversion. This is complementary to what we see with images, where there are many strong
correlations for extraversion, suggesting that we are gleaning different aspects of personality
from both images and text.
Table 10 shows language features that are different between men and women. Things to
note here are that women tend to write longer captions and men again exhibit a preference for
talking about sports.
3 Results
In this section, we consider single-modality and multimodal prediction tasks.
3.1 Multimodal Prediction
The task of prediction can provide valuable insights into the relationship between images,
captions, and demographic or psychological dimensions. In this section, we consider both single
modality features and multimodal features. We also address two prediction tasks: classification
and regression. Classification is a more coarse-grained task, and is the typical prediction task
considered in previous work for such latent user dimensions. On the other hand, regression is
more fine-grained and produces more nuanced results.
3.1.1 Single Modality Methods
To understand the predictive power of images and captions individually, we consider a series of
predictions using feature sets derived from either only visual data or only textual data. These
feature sets are the same features that we described above.
To gain insight into whether textual and visual data complement each other, Fig 5 shows
correlations between attributes predicted using only image features and attributes predicted
using only text features. The low correlations between visual and textual predictions of the
same trait indicate that images and text are capturing different aspects of each trait and have
the potential to be used together to gain a fuller picture. We explore the joint use of images
and text below.
Table 9: Significant correlations between language attributes and Big 5 personality traits [2].
All features except for the word count itself are normalized by the word count. Only unigrams,
LIWC categories, and MRC categories that have one of the top five highest correlations or
one of the top five lowest correlations are shown. These correlations are corrected using a
multivariate permutation test, as described in the paper. O, C, E, A, and N stand for Openness,
Conscientiousness, Extraversion, Agreeableness, and Neuroticism, respectively.
Big 5 Personality Dimensions
Language Attributes O C E A N
Stylistic Features
Number of words - 0.14 - - 0.07
Words longer than six chars - 0.09 0.06 - -
Number of locations - 0.07 - - -0.07
Readability - FRE -0.13 - - - -
Readability - ARI - 0.06 - - -
Readability - GFI -0.14 - - - -
Readability - SMOG -0.13 - - - -0.06
Specificity - 0.08 - - -0.06
Unigrams
Decid - -0.12 - - -
Diff 0.11 - - - -
In 0.11 - - - -
It 0.11 - - - -
King - 0.06 - -0.15 -
Level - - -0.12 - -
My -0.14 0.07 - - -
Photoshop 0.10 - - - -
Sport -0.14 - - - -
Writ 0.10 - - - -
LIWC Categories
Achievement - 0.08 - - -
All Punctuation - 0.08 - - -0.07
Discrepancies - - 0.10 - -0.07
1st person singular personal pronouns -0.10 - - - -
Inclusive - - - 0.08 -
Occupation - 0.08 0.06 - -
Other References -0.10 - - - -0.06
1st person personal pronouns -0.10 - - - -
Sports -0.11 0.07 - - -
Unique - 0.07 - 0.08 -0.09
MRC Categories
Imagery -0.07 0.06 - 0.06 -0.07
Kucera-Francis Num of Categories -0.07 0.06 - 0.07 -0.09
Kucera-Francis Num of Samples -0.08 - - - -0.07
Mean Pavio Meaningfulness -0.08 - - - -0.07
Num of Letters in Word - 0.08 - 0.07 -0.08
Num of Phonemes in Word - 0.08 - 0.07 -0.08
Num of Syllables in Word - 0.08 - 0.08 -0.08
Table 10: Language features where there is a significant difference (p < 0.05) between male
and female images [2]. All features except for the word count itself are normalized by the word
count. Only unigrams that have one of the top ten effect sizes (by magnitude) are shown.
Positive effect sizes indicate that women prefer the feature, while negative effect sizes indicate
that men prefer the feature.
Feature Effect Size
Stylistic Features
Number of Words 0.174
Readability - GFI -0.161
Readability - SMOG 0.146
Readability - FRE -0.136
Unigrams
Boyfriend 0.361
Girlfriend -0.360
Was 0.287
Play -0.285
She 0.264
Them 0.262
Sport -0.254
Sist 0.244
Gam -0.242
Enjoy -0.236
LIWC Categories
Prepositions -0.200
Past Focus 0.176
Sports -0.173
Work -0.167
Period -0.157
Other References 0.145
Quote -0.133
Other 0.123
1st person plural personal pronouns 0.123
MRC Categories
Kucera-Francis Written Freq. -0.139
Kucera-Francis Num of Samples -0.134
Figure 5: Pearson correlations between traits predicted using only image attributes and traits
predicted using only text attributes [2]. Big 5 personality traits are denoted by O,C,E,A,
and N. These predictions are done using 1,346 people (75% training, 25% test) and a random
forest regressor with 500 trees.
3.1.2 Multimodal Methods
We experiment with several methods of combining visual and textual data. First, we concatenate
both the image and text feature vectors (excluding w2v embeddings). In the following results
tables, this is labeled as the All row in the Image and Caption Attributes section.
To provide a more nuanced combination of features, we introduce the idea of image-enhanced
unigrams (IEUs). This is a bag-of-words representation of both an image and its corresponding
caption. It includes all of the caption unigrams as well as unigrams derived from each image.
We consider two methods, macro and micro, for generating image unigrams. For the macro
method, we examine each individual image. If a color covers more than one-third of the image,
the name of the color is added to the bag-of-words. The scene with the highest probability and
any objects detected in the image are also added. The unigrams from each individual image are
then combined with the caption unigrams to form the set of macro IEUs. To generate micro
IEUs, we reverse the process. First, we aggregate the image feature vectors into a single vector,
and then we extract the image unigrams and combine them with the caption unigrams.
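A rough sketch of the macro IEU construction (the per-image feature dictionaries assumed here are placeholders for the color, scene, and object outputs described earlier):

```python
def macro_ieus(images, caption_unigrams, color_threshold=1 / 3):
    """Macro image-enhanced unigrams: for each image, add the name of any color covering
    more than a third of its pixels, its most probable scene, and any detected objects,
    then merge with the caption unigrams. Each item in `images` is assumed to be a dict
    with precomputed 'color_fractions', 'scene_probs', and 'objects' entries."""
    bag = list(caption_unigrams)
    for img in images:
        bag += [color for color, frac in img["color_fractions"].items()
                if frac > color_threshold]
        bag.append(max(img["scene_probs"], key=img["scene_probs"].get))  # most likely scene
        bag += img["objects"]
    return bag
```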
We use IEUs in several different ways for prediction. First, we consider them both in isolation
and concatenated with all of the previous visual and textual features (excluding w2v). We also
explore using the pre-trained w2v model to represent the IEUs and produce richer embeddings.
Instead of only averaging together the embeddings of each caption unigram, we average together
the embeddings of each IEU. Finally, we consider these enriched embeddings concatenated with
all of the previous visual and textual features.
A significant advantage of these multimodal approaches is that they can be used with
relatively small corpora of images and text. Large background corpora are used for training (e.g.,
for training the scene CNN), but these models have already been trained and released. Our
approaches work when there is only a small amount of training data, as is often the case when
ground truth labels are expensive to obtain (e.g., when these labels come from a survey, as in
our case). This is demonstrated on our dataset, which consists of short captions and a relatively
limited set of images.
3.1.3 Classification Results
In order to assess the different prediction methods, we consider six coarse-grained classification
tasks, one for each personality trait and one for gender. For each prediction, we divide the data
into high and low segments. The high segment includes any person who has a score greater
than half a standard deviation above the mean, while the low segment includes any person who
has a score lower than half a standard deviation below the mean. All other data points are
discarded. In doing these coarse-grained classification tasks, we follow previous work [72, 43],
which suggested that classification serves as a useful approximation to continuous rating.
We use a random forest with 500 trees and 10-fold cross validation across individuals in the
dataset. Table 11 shows the classification results. As a baseline, we include a model that always
predicts the most common training class.
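Schematically, one classification run can be sketched as follows, assuming scikit-learn (the half-standard-deviation split and 500-tree forest follow the description above; other settings are defaults):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def high_low_classification(features, trait_scores, n_folds=10):
    """Split a continuous trait into high/low classes at half a standard deviation above and
    below the mean (middle scores are dropped), then evaluate a 500-tree random forest with
    10-fold cross-validation."""
    scores = np.asarray(trait_scores, dtype=float)
    mu, sd = scores.mean(), scores.std()
    keep = (scores > mu + 0.5 * sd) | (scores < mu - 0.5 * sd)
    X, y = np.asarray(features)[keep], (scores[keep] > mu).astype(int)
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    return cross_val_score(clf, X, y, cv=n_folds).mean()
```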
Results using individual modalities are shown in the top part of Table 11. The prediction
results show that image features in isolation are able to significantly classify both extraversion
and gender. Text features are also able to significantly classify these traits, with slightly less
accuracy than image features. Text features have additional predictive power for openness.
The results obtained with the multimodal methods are shown in the bottom part of Table
11. As seen in the table, the methods using IEUs achieve the best results and are able to
significantly classify both neuroticism and agreeableness, something that neither visual features
nor textual features are able to do in isolation.
The features that we are able to classify with the highest accuracy are openness and
agreeableness. This could be related to the fact that these two traits are positively correlated in our
dataset (Figure 2).
Table 11: Classification accuracy percentages. * indicates significance with respect to the
baseline (p < 0.05). O, C, E, A, and N stand for Openness, Conscientiousness, Extraversion,
Agreeableness, and Neuroticism, respectively.
Predicted Attributes
Feature Set Used O C E A N Gender
Baseline: Most Common Class 51.4 52.0 49.2 51.8 52.3 59.8
Image Attributes Only
Raw 49.7 47.3 53.9 51.6 52.2 61.5
Color 49.8 51.7 52.1 51.8 53.6 62.2
Object 55.6 51.7 57.2* 51.3 51.7 64.7*
Scene 55.5 53.8 59.8* 55.0 55.2 66.8*
Face 50.7 51.2 58.5* 54.1 51.1 59.7
All 54.8 54.3 59.9* 55.3 55.3 68.6*
Caption Attributes Only
Stylistic 60.5* 48.0 50.4 50.8 51.0 58.6
Unigrams 60.2* 53.1 54.3 54.2 53.4 67.6*
Bigrams 58.0* 53.2 57.6* 53.4 57.3 65.1*
Trigrams 56.2* 51.0 55.5* 50.5 54.9 61.7
POS Unigrams 58.0* 49.3 50.0 52.9 56.1 60.2
POS Bigrams 57.5* 48.9 50.9 50.7 53.9 61.0
POS Trigrams 55.7 50.0 52.2 48.4 56.7 61.0
LIWC 59.6* 53.2 54.1 53.4 54.2 64.9*
MRC 55.4 49.4 50.9 52.4 52.4 60.8
All (except pre-trained w2v) 61.2* 52.2 53.3 54.6 55.2 65.1*
Pre-trained w2v (caption only) 61.8* 51.4 55.4 55.4 56.5 67.1*
All + Pre-trained w2v (caption only) 61.2* 52.3 55.5 53.0 56.1 65.6*
Image and Caption Attributes
All 60.5* 55.1 57.9* 55.3 56.8 67.1*
Macro IEU 58.5* 56.6 58.5* 54.2 54.7 71.0*
Micro IEU 58.7* 54.4 58.9* 54.0 52.7 71.0*
All + Macro IEU 60.0* 57.1 58.3* 54.2 56.9 68.1*
All + Micro IEU 59.1* 55.6 60.3* 54.8 58.3* 69.1*
Pre-trained w2v (w/ Micro IEU) 61.4* 54.8 59.6* 56.4* 56.5 68.6*
Pre-trained w2v (w/ Macro IEU) 61.0* 55.6 60.5* 57.0* 56.6 69.0*
All + Pre-trained w2v (w/ Micro IEU) 59.5* 54.8 59.1* 55.3 55.3 70.1*
All + Pre-trained w2v (w/ Macro IEU) 61.4* 54.7 59.4* 55.2 56.5 70.8*
To enable direct comparison to previously published results, we also use our data to re-train
the models used by Mairesse et al. to predict personality [43]; the re-trained classifier with the
highest accuracies on our data, SMO, is shown in Table 12. We also include the relative error
rate reduction between this model and our highest multimodal result. As shown in Table 12,
our best multimodal approach outperforms the method from Mairesse et al., achieving relative
error rate reductions between 5% and 16% across all categories. It is true that the relative error
rate reduction for openness and neuroticism is smaller than the other dimensions, but in general
we see that we are able to improve over Mairesse et al.
Table 12: Comparison between our best classification model and the best model (SMO) from
Mairesse et al. [2]. * indicates significance with respect to the baseline (p < 0.05). The relative
error rate reduction is between our model and the model from Mairesse et al. O, C, E, A,
and N stand for Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism,
respectively.
Predicted Attributes
Feature Set Used O C E A N Gender
Baseline: Most Common Class 51.4 52.0 49.2 51.8 52.3 59.8
Mairesse et al.: SMO 59.1* 51.3 53.3 54.4 54.7 63.0
Our model: All + Pre-trained w2v (w/ Macro IEU) 61.0* 55.6 60.5* 57.0* 56.6 69.0*
Relative error rate reduction 4.6% 8.8% 15.4% 5.7% 4.2% 16.2%
3.1.4 Regression Results
Regression is a more fine-grained, and therefore more difficult, task than classification. We
consider it because it gives us a more nuanced view of the effectiveness of our methods.
We use a random forest regressor with 500 trees and 10-fold cross-validation across
individuals. As a baseline, we include a model that always predicts the average value of the training data.
Table 13 reports r² scores. We notice patterns similar to the ones observed in the classification
results. As before, image features alone are able to significantly predict both extraversion and
gender, while text features are able to significantly predict openness, extraversion, and gender.
Again, multimodal approaches outperform purely textual and purely visual approaches. Here,
multimodal methods are able to significantly predict five out of the six categories, failing only
to predict conscientiousness.
As we did for classification, we use our data to re-train the regression models used in Mairesse
et al. [43]. Results for the regressor with the highest scores on our data, REPTree, are shown
in Table 14, along with the relative error rate reduction between this model and our highest
Table 13: Regression r² scores. * indicates significance with respect to the baseline (p < 0.05).
O, C, E, A, and N stand for Openness, Conscientiousness, Extraversion, Agreeableness, and
Neuroticism, respectively.
Predicted Attributes
Feature Set Used O C E A N Gender
Baseline: Average Value -0.011 -0.011 -0.007 -0.06 -0.004 -0.004
Image Attributes Only
Raw -0.044 -0.046 -0.026 -0.035 -0.012 0.036*
Color -0.038 -0.048 -0.017 -0.044 -0.021 0.043*
Object -0.057 -0.101 -0.030 -0.076 -0.097 0.043
Scene -0.005 -0.002 0.021 -0.013 -0.007 0.121*
Face -0.039 -0.036 -0.014 -0.019 -0.027 -0.040
All 0.009* -0.006 0.033* -0.009 -0.000 0.145*
Caption Attributes Only
Stylistic -0.020 -0.058 -0.077 -0.049 -0.066 -0.016
Unigrams 0.050* 0.012 -0.001 -0.011 -0.010 0.169*
Bigrams 0.004 -0.024 0.007 -0.044 -0.037 0.104*
Trigrams -0.049 -0.078 -0.053 -0.082 -0.109 0.021
POS Unigrams -0.005 -0.020 -0.024 -0.021 -0.025 0.016
POS Bigrams 0.007 -0.055 -0.015 -0.019 -0.018 0.033*
POS Trigrams -0.008 -0.052 -0.014 -0.022 -0.014 0.023
LIWC 0.057* -0.005 0.005 -0.013 -0.010 0.101*
MRC 0.001 -0.047 -0.047 -0.015 -0.042 0.014
All (except pre-trained w2v) 0.055* -0.003 0.021* -0.003 -0.010 0.164*
Pre-trained w2v (caption only) 0.065* 0.006 0.018 -0.001 0.003 0.121*
All + Pre-trained w2v (caption only) 0.079* 0.011 0.021* -0.003 -0.000 0.178*
Image and Caption Attributes
All 0.048* 0.003 0.038* -0.002 -0.001 0.203*
Macro IEU 0.028* 0.010 0.027 -0.013 -0.014 0.204*
Micro IEU 0.025 -0.002 0.033 -0.030 -0.035 0.197*
All + Macro IEU 0.057* 0.004 0.043* -0.003 -0.001 0.208*
All + Micro IEU 0.054* 0.004 0.034* -0.001 -0.003 0.207*
Pre-trained w2v (w/ Macro IEU) 0.045* 0.008 0.053* 0.016* 0.004 0.176*
Pre-trained w2v (w/ Micro IEU) 0.048* 0.011 0.050* 0.017 -0.000 0.159*
All + Pre-trained w2v (w/ Macro IEU) 0.063* 0.015 0.057* 0.005 0.005 0.224*
All + Pre-trained w2v (w/ Micro IEU) 0.060* 0.016 0.056* 0.010 -0.003 0.222*
multimodal result. When calculating error rates, r² scores are bounded below by zero, which is
the expected performance of a baseline algorithm. Again, we see an improvement over Mairesse
et al. Our best method achieves error rate reductions between 0.5% and 20%.
Table 14: Comparison between our best regression model and the best model (REPTree) from
Mairesse et al. * indicates significance with respect to the baseline (p < 0.05). The relative
error rate reduction is between our model and the model from Mairesse et al. O, C, E, A,
and N stand for Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism,
respectively.
Predicted Attributes
Feature Set Used O C E A N Gender
Baseline: Average Value -0.011 -0.011 -0.007 -0.06 -0.004 -0.004
Mairesse et al.: REPTree 0.012 -0.049 -0.018 -0.013 -0.015 0.027*
Our model: All + Pre-trained w2v (w/ Macro IEU) 0.063* 0.015 0.057* 0.005 0.005 0.224*
Relative error rate reduction 5.2% 1.5% 5.7% 0.5% 0.5% 20.2%
4 Discussion
In order to explore the replicability of these results, we gather another dataset, again from an
online undergraduate introductory psychology class at the University of Texas at Austin. This
second dataset was collected in Winter 2016, and contains 711 students. A comparison of the
features extracted for the 2015 and 2016 data shows that for most of the features, the means
and standard deviations are comparable.
We train a classifier on 2015 data and test it on 2016 data to evaluate how general the
features that we extract are. To build the classifier, for each personality trait, students are split
into a high group (where that trait is greater than a standard deviation above the mean) and
a low group (where that trait is less than a standard deviation below the mean), discarding
everyone who falls in the middle. The high group and the low group are balanced using
undersampling for each trait, and a random forest classifier with 500 trees is trained using ten-fold
validation. The classifier is trained on all of the features from the 2015 data (excluding
n-grams and part-of-speech n-grams) and tested on the 2016 data. Table 15 shows the results. In
general, the classifier is able to beat the baseline, though not always significantly. This
nonetheless indicates that the features being used carry some information about the underlying
personality and demographic traits.
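A rough sketch of this replication setup, assuming scikit-learn (for simplicity the sketch reports plain accuracy rather than the precision-recall AUC over ten folds given in Table 15):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

def train_2015_test_2016(X_15, trait_15, X_16, trait_16):
    """Split each cohort into high/low groups at one standard deviation from the mean,
    balance the 2015 training groups by undersampling the larger one, train a 500-tree
    random forest, and score it on the 2016 split."""
    def split(X, t):
        t = np.asarray(t, dtype=float)
        mu, sd = t.mean(), t.std()
        keep = (t > mu + sd) | (t < mu - sd)
        return np.asarray(X)[keep], (t[keep] > mu).astype(int)

    X_tr, y_tr = split(X_15, trait_15)
    X_te, y_te = split(X_16, trait_16)
    # Undersample the majority class in the 2015 training data.
    idx_hi, idx_lo = np.where(y_tr == 1)[0], np.where(y_tr == 0)[0]
    n = min(len(idx_hi), len(idx_lo))
    idx = np.concatenate([resample(idx_hi, n_samples=n, replace=False, random_state=0),
                          resample(idx_lo, n_samples=n, replace=False, random_state=0)])
    clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr[idx], y_tr[idx])
    return clf.score(X_te, y_te)
```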
The small size of both the 2015 and 2016 datasets could explain the lack of statistical
significance in these results. It is possible that because both datasets are small, there is topic
Table 15: Precision-recall AUC for a classifier trained on 2015 data and tested on 2016 data.
The standard deviation over ten folds is shown, and significant increases over the baseline are in
bold. O, C, E, A, and N stand for Openness, Conscientiousness, Extraversion, Agreeableness,
and Neuroticism, respectively.
Predicted Attributes
O C E A N Gender
Baseline 0.50 ±0.07 0.50 ±0.11 0.50 ±0.07 0.50 ±0.10 0.50 ±0.06 0.40 ±0.03
Random Forest 0.58 ±0.08 0.56 ±0.09 0.56 ±0.10 0.51 ±0.12 0.57 ±0.07 0.52 ±0.04
shift between the two datasets, causing some of the 2015 results to be irrelevant on the 2016
dataset.
5 Conclusion
In this research, using a new dataset of captioned images associated with user attributes, we
have extracted a large set of visual and textual features and identified significant correlations
between these features and the user traits of personality and gender. The automated techniques
used to derive these features and find significant relationships are broadly applicable to other
large visual and textual datasets. Specifically, in the domain of online communities, massive
amounts of data are available. Some of these communities, like Pinterest, rely exclusively
on visual content, while other communities, like Facebook and Twitter, include more textual
content. We show how to automatically analyze this data and find meaningful psychological
relationships. These techniques are not limited to the user dimensions of personality and gender
and could be extended to other dimensions, such as age, education level, or location.
We have demonstrated the effectiveness of these image features in predicting user attributes;
we believe this result can have applications in many areas of the web where textual data is
limited. Finally, we have shown that a multimodal predictive approach outperforms purely
visual methods and purely textual methods. Our multimodal methods are also effective on a
relatively small corpus of images and text, which is useful in situations where data is limited.
6 Acknowledgements
This material is based in part upon work supported by the National Science Foundation
(#1344257), the John Templeton Foundation (#48503), and the Michigan Institute for Data
Science. Any opinions, findings, and conclusions or recommendations expressed in this material
are those of the author and do not necessarily reflect the views of the National Science
Foundation, the John Templeton Foundation, or the Michigan Institute for Data Science.
We would like to thank Samuel Gosling for helping with the dataset collection, Shibamouli
Lahiri for providing the code to calculate readability features, and Steven R. Wilson for
providing the code to implement the Mairesse et al. paper that we use for prediction comparison.
References
[1] Meeker M. Internet trends 2014–Code conference. 2014;Retrieved May 28, 2014.
[2] Wendlandt L, Mihalcea R, Boyd R, Pennebaker J. Multimodal Analysis and Prediction
of Latent User Dimensions. In: Proceedings of the 9th International Conference on Social
Informatics (SocInfo 2017). Oxford, UK; 2017. p. 323–340.
[3] Boyd RL. Psychological text analysis in the digital humanities. In: Data analytics in
digital humanities. Springer; 2017. p. 161–189.
[4] Coppersmith G, Dredze M, Harman C, Hollingshead K. From ADHD to SAD: Analyzing
the language of mental health on Twitter through self-reported diagnoses. In: Proceed-
ings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From
Linguistic Signal to Clinical Reality; 2015. p. 1–10.
[5] Conover M, Gonçalves B, Ratkiewicz J, Flammini A, Menczer F. Predicting the political
alignment of Twitter users. In: Proceedings of 3rd IEEE Conference on Social Computing
(SocialCom); 2011. p. 192–199.
[6] Cohen R, Ruths D. Classifying political orientation on Twitter: It’s not easy! In: Proceed-
ings of the Seventh International AAAI Conference on Weblogs and Social Media (ICWSM
2013); 2013. p. 91–99.
[7] van der Goot R, Ljubešić N, Matroos I, Nissim M, Plank B. Bleaching Text: Abstract
Features for Cross-lingual Gender Prediction. In: Proceedings of the 56th Annual Meeting
of the Association for Computational Linguistics; 2018. p. 383–389.
[8] Ciccone G, Sultan A, Laporte L, Egyed-Zsigmond E, Alhamzeh A, Granitzer M. Stacked
Gender Prediction from Tweet Texts and Images: Notebook for PAN at CLEF 2018. In:
CLEF 2018 - Conference and Labs of the Evaluation Forum; 2018. 11 p. Available from:
https://hal.archives-ouvertes.fr/hal-02013987.
[9] Mukherjee A, Liu B. Improving gender classification of blog authors. In: Proceedings
of the 2010 Conference on Empirical Methods in Natural Language Processing; 2010. p.
207–217.
[10] Rao D, Yarowsky D, Shreevats A, Gupta M. Classifying latent user attributes in Twitter.
In: Proceedings of the 2nd International Workshop on Search and Mining User-generated
Contents; 2010. p. 37–44.
[11] Burger JD, Henderson J, Kim G, Zarrella G. Discriminating gender on Twitter. In:
Proceedings of the Conference on Empirical Methods in Natural Language Processing;
2011. p. 1301–1309.
[12] Van Durme B. Streaming analysis of discourse participants. In: Proceedings of the 2012
Joint Conference on Empirical Methods in Natural Language Processing and Computa-
tional Natural Language Learning; 2012. p. 48–58.
[13] Volkova S, Yarowsky D. Improving gender prediction of social media users via weighted
annotator rationales. In: NeurIPS Workshop on Personalization; 2014.
[14] Volkova S, Bachrach Y, Armstrong M, Sharma V. Inferring Latent User Properties from
Texts Published in Social Media. In: AAAI Conference on Artificial Intelligence; 2015. p.
4296–4297.
[15] Pennacchiotti M, Popescu AM. Democrats, Republicans and Starbucks afficionados: User
classification in Twitter. In: Proceedings of the 17th ACM SIGKDD International Confer-
ence on Knowledge Discovery and Data Mining; 2011. p. 430–438.
[16] Eisenstein J, Smith NA, Xing EP. Discovering sociolinguistic associations with structured
sparsity. In: Proceedings of the 49th Annual Meeting of the Association for Computational
Linguistics: Human Language Technologies - Volume 1; 2011. p. 1365–1374.
[17] Rao D, Paul M, Fink C, Yarowsky D, Oates T, Coppersmith G. Hierarchical Bayesian
models for latent attribute detection in social media. In: International AAAI Conference
on Weblogs and Social Media; 2011. p. 598–601.
[18] Li Y, Yang L, Xu B, Wang J, Lin H. Improving User Attribute Classification with Text
and Social Network Attention. Cognitive Computation. 2019 Aug;11(4):459–468. Available
from: https://doi.org/10.1007/s12559-019-9624-y.
[19] Favaretto RM, Knob P, Musse SR, Vilanova F, Costa ÂB. Detecting personality and
emotion traits in crowds from video sequences. Machine Vision and Applications. 2019
Jul;30(5):999–1012. Available from: https://doi.org/10.1007/s00138-018-0979-y.
[20] Al-Ghadir AI, Azmi AM. A Study of Arabic Social Media Users—Posting Behavior and
Author’s Gender Prediction. Cognitive Computation. 2019 Feb;11(1):71–86. Available
from: https://doi.org/10.1007/s12559-018-9592-7.
[21] Favaretto RM, Knob P, Musse SR, Vilanova F, Costa ÂB. Detecting personality and
emotion traits in crowds from video sequences. Machine Vision and Applications.
2019;30(5):999–1012.
[22] An G, Levitan SI, Hirschberg J, Levitan R. Deep Personality Recognition for Deception
Detection. In: Interspeech; 2018. p. 421–425.
[23] Moreno DRJ, Gomez JC, Almanza-Ojeda DL, Ibarra-Manzano MA. Prediction of Person-
ality Traits in Twitter Users with Latent Features. In: 2019 International Conference on
Electronics, Communications and Computers; 2019. p. 176–181.
[24] Bose R, Dey RK, Roy S, Sarddar D. Analyzing Political Sentiment Using Twitter Data. In:
Information and Communication Technology for Intelligent Systems. Springer Singapore;
2019. p. 427–436.
[25] Volkova S, Durme BV. Online Bayesian Models for Personal Analytics in Social Media. In:
AAAI Conference on Artificial Intelligence; 2015. p. 2325–2331.
[26] Seabrook EM, Kern ML, Fulcher BD, Rickard NS. Predicting Depression From
Language-Based Emotion Dynamics: Longitudinal Analysis of Facebook and Twit-
ter Status Updates. J Med Internet Res. 2018 May;20(5):e168. Available from:
http://www.jmir.org/2018/5/e168/.
[27] Riordan B, Wade H, Upal A. Detecting sociostructural beliefs about group status differ-
ences in online discussions. In: Proceedings of the Joint Workshop on Social Dynamics and
Personal Attributes in Social Media; 2014. p. 1–6.
[28] Gottipati S, Qiu M, Yang L, Zhu F, Jiang J. An Integrated Model for User Attribute
Discovery: A Case Study on Political Affiliation Identification. In: Tseng V, Ho T, Zhou
ZH, Chen AP, Kao HY, editors. Advances in Knowledge Discovery and Data Mining. vol.
8443 of Lecture Notes in Computer Science. Springer International Publishing; 2014. p.
434–446.
[29] Schwartz HA, Eichstaedt JC, Kern ML, Dziurzynski L, Ramones SM, Agrawal M, et al.
Personality, gender, and age in the language of social media: The open vocabulary ap-
proach. PLOS ONE. 2013 Sept;8(9):1–16.
[30] Chang J, Rosenn I, Backstrom L, Marlow C. ePluribus: Ethnicity on social networks. In:
Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media;
2010. p. 18–25.
[31] Mohammady E, Culotta A. Using county demographics to infer attributes of Twitter
users. In: Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes
in Social Media; 2014. p. 7–16.
[32] Yang SH, Long B, Smola A, Sadagopan N, Zheng Z, Zha H. Like like alike: Joint friendship
and interest propagation in social networks. In: Proceedings of the 20th International
Conference on World Wide Web. WWW ’11; 2011. p. 537–546.
[33] Gong NZ, Talwalkar A, Mackey LW, Huang L, Shin ECR, Stefanov E, et al. Predicting links
and inferring attributes using a social-attribute network (SAN). In: The 6th SNA-KDD
Workshop; 2012.
[34] Filippova K. User demographics and language in an implicit social network. In: Proceedings
of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and
Computational Natural Language Learning (EMNLP-CoNLL); 2012. p. 1478–1488.
[35] Nguyen D, Gravel R, Trieschnigg D, Meder T. “How old do you think I am?” A study
of language and age in Twitter. In: Proceedings of the AAAI Conference on Weblogs and
Social Media (ICWSM); 2013. p. 439–448.
[36] Bergsma S, Post M, Yarowsky D. Stylometric Analysis of Scientific Articles. In: Proceedings
of the Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies. Montreal, Canada; 2012. p. 327–337.
[37] Bergsma S, Dredze M, Durme BV, Wilson T, Yarowsky D. Broadly improving user classifi-
cation via communication-based name and location clustering on Twitter. In: Proceedings
of the Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies; 2013. p. 1010–1019.
[38] Eisenstein J, O’Connor B, Smith NA, Xing EP. A Latent Variable Model for Geographic
Lexical Variation. In: Proceedings of the 2010 Conference on Empirical Methods in Natural
Language Processing. EMNLP ’10; 2010. p. 1277–1287.
[39] Kelly EL, Conley JJ. Personality and compatibility: A prospective analysis of marital sta-
bility and marital satisfaction. Journal of Personality and Social Psychology. 1987;52(1):27.
[40] Roberts B, Kuncel N, Shiner R, Caspi A, Goldberg L. The power of personality: The
comparative validity of personality traits, socioeconomic status, and cognitive ability for
predicting important life outcomes. Perspectives on Psychological Science. 2007;4(2):313–
345.
[41] Park G, Schwartz HA, Eichstaedt JC, Kern ML, Kosinski M, Stillwell DJ, et al. Automatic
Personality Assessment Through Social Media Language. Journal of Personality and Social
Psychology. 2014;.
[42] Pennebaker JW, King LA. Linguistic styles: Language use as an individual difference.
Journal of Personality and Social Psychology. 1999;77(6):1296.
[43] Mairesse F, Walker MA, Mehl MR, Moore RK. Using linguistic cues for the automatic
recognition of personality in conversation and text. Journal of Artificial Intelligence Re-
search. 2007;30:457–500.
[44] Whitty MT, Doodson J, Creese S, Hodges D. A picture tells a thousand words: What
Facebook and Twitter images convey about our personality. Personality and Individual
Differences. 2018;133:109–114.
[45] Lay A, Ferwerda B. Predicting Users’ Personality Based on Their ’Liked’ Images on Instagram.
In: The 23rd International Conference on Intelligent User Interfaces, March 7-11, 2018; 2018.
[46] Newman ML, Groom CJ, Handelman LD, Pennebaker JW. Gender differences in language
use: An analysis of 14,000 text samples. Discourse Processes. 2008;45(3):211–236.
[47] You Q, Bhatia S, Sun T, Luo J. The eyes of the beholder: Gender prediction using images
posted in online social networks. In: 2014 IEEE International Conference on Data Mining
Workshop. IEEE; 2014. p. 1026–1030.
[48] Zhang D, Islam MM, Lu G. A review on automatic image annotation techniques. Pattern
Recognition. 2012;45(1):346–362.
[49] Hossain M, Sohel F, Shiratuddin MF, Laga H. A comprehensive survey of deep learning
for image captioning. ACM Computing Surveys (CSUR). 2019;51(6):118.
[50] Mithun NC, Panda R, Papalexakis EE, Roy-Chowdhury AK. Webly Supervised Joint
Embedding for Cross-Modal Image-Text Retrieval. In: Proceedings of the 26th ACM
International Conference on Multimedia. MM ’18. New York, NY, USA: ACM; 2018. p.
1856–1864. Available from: http://doi.acm.org/10.1145/3240508.3240712.
[51] Johnson J, Karpathy A, Fei-Fei L. Densecap: Fully convolutional localization networks
for dense captioning. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition; 2016. p. 4565–4574.
[52] McCrae RR, John OP. An introduction to the five-factor model and its applications.
Journal of Personality. 1992;60(2):175–215.
[53] John OP, Srivastava S. The Big Five trait taxonomy: History, measurement, and theoretical
perspectives. Handbook of Personality: Theory and Research. 1999;2(1999):102–138.
[54] Yoder PJ, Blackford JU, Waller NG, Kim G. Enhancing power while controlling family-
wise error: An illustration of the issues using electrocortical studies. Journal of Clinical
and Experimental Neuropsychology. 2004;26(3):320–331.
[55] Redi M, Quercia D, Graham L, Gosling S. Like Partying? Your Face Says It All. Predicting
the Ambiance of Places with Profile Pictures. In: Ninth International AAAI Conference
on Web and Social Media; 2015.
[56] Khouw N. The meaning of color for gender. Color Matters–Research. 2002.
[57] Van De Weijer J, Schmid C, Verbeek J, Larlus D. Learning color names for real-world
applications. IEEE Transactions on Image Processing. 2009;18(7):1512–1523.
[58] Valdez P, Mehrabian A. Effects of color on emotions. Journal of Experimental Psychology:
General. 1994;123(4):394.
[59] Machajdik J, Hanbury A. Affective image classification using features inspired by psy-
chology and art theory. In: Proceedings of the 18th ACM International Conference on
Multimedia. ACM; 2010. p. 83–92.
[60] Lovato P, Bicego M, Segalin C, Perina A, Sebe N, Cristani M. Faved! Biometrics: Tell
me which image you like and I’ll tell you who you are. IEEE Transactions on Information
Forensics and Security. 2014;9(3):364–374.
[61] Gosling SD, Ko SJ, Mannarelli T, Morris ME. A room with a cue: Personality judg-
ments based on offices and bedrooms. Journal of Personality and Social Psychology.
2002;82(3):379.
[62] Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A. Learning deep features for scene
recognition using places database. In: Advances in Neural Information Processing Systems;
2014. p. 487–495.
[63] Mathias M, Benenson R, Pedersoli M, Van Gool L. Face detection without bells and
whistles. In: European Conference on Computer Vision. Springer; 2014. p. 720–735.
[64] Gosling SD, Craik KH, Martin NR, Pryor MR. Material attributes of personal living spaces.
Home Cultures. 2005;2(1):51–87.
[65] Fellbaum C. WordNet. Wiley Online Library; 1998.
[66] Ciaramita M, Johnson M. Supersense tagging of unknown nouns in WordNet. In: Pro-
ceedings of the 2003 Conference on Empirical Methods in Natural Language Processing.
Association for Computational Linguistics; 2003. p. 168–175.
[67] Bentivogli L, Forner P, Magnini B, Pianta E. Revising the WordNet domains hierarchy:
Semantics, coverage and balancing. In: Proceedings of the Workshop on Multilingual
Linguistic Ressources. Association for Computational Linguistics; 2004. p. 101–108.
[68] Finkel JR, Grenager T, Manning C. Incorporating non-local information into information
extraction systems by Gibbs sampling. In: Proceedings of the 43rd Annual Meeting on
Association for Computational Linguistics; 2005. p. 363–370.
[69] Li JJ, Nenkova A. Fast and Accurate Prediction of Sentence Specificity. In: AAAI; 2015.
p. 2281–2287.
[70] Coltheart M. The MRC psycholinguistic database. The Quarterly Journal of Experimental
Psychology. 1981;33(4):497–505.
[71] Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words
and phrases and their compositionality. In: Advances in Neural Information Processing
Systems; 2013. p. 3111–3119.
[72] Oberlander J, Nowson S. Whose thumb is it anyway?: Classifying author personality from
weblog text. In: COLING/ACL; 2006. p. 627–634.