Predicting Personality from Book Preferences
with User-Generated Content Labels
Ng Annalyn, Maarten W. Bos, Leonid Sigal, and Boyang Li
Abstract—Psychological studies have shown that personality traits are associated with book preferences. However, past
findings are based on questionnaires focusing on conventional book genres and are unrepresentative of niche content. For a
more comprehensive measure of book content, this study harnesses a massive archive of content labels, also known as ‘tags’,
created by users of an online book catalogue. Combined with data on preferences and personality scores
collected from Facebook users, the tag labels achieve high accuracy in personality prediction by psychological standards. We
also group tags into broader genres, to check their validity against past findings. Our results are robust across both tag and
genre levels of analyses, and consistent with existing literature. Moreover, user-generated tag labels reveal unexpected
insights, such as cultural differences, book reading behaviors, and other non-content factors affecting preferences. To our
knowledge, this is currently the largest study that explores the relationship between personality and book content preferences.
Index Terms: Personality Profiling, Narrative Preferences, Social Media, Behavioural Footprints
“Histories make men wise;
poets witty;
the mathematics subtle;
natural philosophy deep;
moral grave;
logic and rhetoric able to contend.”
By Francis Bacon, Of Studies (1597)
Francis Bacon may have been the first to suggest a correlation,
perhaps even a causal relation, between book
preferences and the personality of readers. Indeed, research
has found that reading fiction leads to changes in
personality [1] and increased empathy [2]. While book
reading may influence personality, personality in turn may
affect book choice. This is supported by correlations found
between personality and book preferences [3]. Being able
to predict book preferences using readers’ personality has
many potential applications, such as personalizing prod-
ucts and services, improving recommender systems, and
enabling targeted advertising.
However, due to difficulties in data collection, research
on personality and book preferences typically focuses on a
few dozen book genres or fewer, such as having four genres
for novels [4], 16 genres for books [5], or 34 genres for
books and magazines combined [6]. The largest study [3]
to our knowledge inspected 81 book topics and their corre-
lations with readers’ personality. Narrow categorizations
of book content can be problematic as preferences for niche
genres may be inaccurately inferred. Moreover, studies
measured book preferences using self-report question-
naires, which can be lengthy and thus vulnerable to errant
or null responses [7].
To improve both the quantity and quality of data for our
study, we combine two online data sources. For data on
book content, we use over 24,000 user-supplied tags from
a book catalogue website. For data on
reader personality and book preferences, we use a data-
base of Facebook profiles comprising more than 60,000 re-
spondents who had liked book-themed Facebook pages
and who had also completed a personality survey on the
social networking site [8].
We adopt the Big Five personality model, also known as
the five-factor model, which consists of extraversion,
agreeableness, openness, neuroticism and conscientious-
ness. This set of five traits is known to predict a wide range
of behaviors and psychopathology [9]. We briefly review
their known associations with book preferences here:
Extraverts enjoy social activities and have high arousal
levels [9]. They prefer content related to social activities
such as parties [10], as well as arousing content such as hor-
ror [6]. Hence, we hypothesize that extraversion would be
associated with a preference for genres with socially ori-
ented themes, as well as genres which are stimulating.
Agreeable individuals are kind and considerate [9], and
tend to empathize with story characters [11]. They prefer
narratives on positive social relationships, such as romance
and family [6], hence they are likely to steer clear of violent
or disturbing themes. As agreeable people tend to evaluate
media content favorably in general [10], we might also expect
them to ‘like’ more books on average.
xxxx-xxxx/0x/$xx.00 © 200x IEEE Published by the IEEE Computer Society
N. Annalyn is with the Ministry of Defence (Singapore), #16-01 Defence Technology Tower B, Singapore 109681.
M. W. Bos, L. Sigal, and B. Li are with Disney Research Pittsburgh, 4720 Forbes Avenue, Lower Level, Suite 110, Pittsburgh, PA 15213.
E-mail: {mbos, lsigal, }
Open individuals seek intellectual stimulation and are
comfortable with new ideas [9]. Openness predicts a pref-
erence for avant-garde genres [12], and has been a con-
sistent predictor of fiction exposure [2], [3], [4], [13]. Based
on existing literature, we hypothesize that individuals
with high openness would appreciate fiction and
intellectually stimulating genres. Since reading is an intellectual
activity, we would also expect open individuals to ‘like’ more
books in general.
Neurotic individuals are emotionally unstable. Being
prone to feeling lonely and depressed [14], they may use
media narratives as a means of escape from everyday life
[4]. Hence, we hypothesize that neurotic individuals
would prefer narratives about feel-good, alternative realities.
Conscientious individuals are achievement-striving
and self-disciplined, preferring deliberate planning over
spontaneity [9]. They have also been reported to like non-
fiction content such as news and politics [6], [15]. Hence,
we hypothesize that conscientiousness would be associ-
ated with a preference for books of informative and practi-
cal content.
Merging personality data (from Facebook surveys) with
book data (from Facebook ‘likes’ and Goodreads), we conduct
three levels of analysis. The first is a tag-level analysis,
in which we correlate personality with book tags. As tags
were spontaneously generated by readers themselves, they
contain richer information on book content compared to
the usual, smaller set of genre categories. Second is a genre-
level analysis. To verify that our findings are consistent
with those from previous studies, tags are clustered into
broader book genres and then correlated with personality
again. Third, we examine whether personality has an overarching
influence on one’s tendency to like books in general.
2.1 Personality and Book Preferences
We use data collected from a Facebook app called myPer-
sonality [8], which allows users to measure their Big Five
personality traits with the International Personality Item
Pool questionnaire [16]. Because users received feedback
on their personality scores, they were likely to be moti-
vated to respond diligently.
Besides their personality scores, users of the myPerson-
ality app also shared which Facebook pages they had
‘liked’. Facebook pages can be dedicated to any entity, such
as a book, movie, or celebrity (see Fig. 1 for an example).
For our study, we focus on pages labeled as books. This en-
ables us to examine the correlation between users’ person-
ality scores and their book preferences.
Data collected via Facebook has been shown to be com-
parable to data collected via standalone websites [17].
Moreover, the Facebook personality dataset we use in this
study has successfully predicted a range of personal traits,
from web browsing habits [18] to language use [19].
Youyou et al. even suggested that personality inferences of
users made based on digital footprints such as Facebook
‘likes’ are more accurate than those made by users' friends
[20]. Due to its wide adoption, we deem the reliability of
this dataset to be satisfactory for our study.
We use book pages with at least 50 ‘likes’ from Facebook
users who also completed the personality questionnaire.
This yields more reliable estimates of the personality profile
of people who liked each book. In all, we analyze
479 books that were ‘liked’ by 61,662 users. For each of the
five personality dimensions, we took the median score of
all users who ‘liked’ the book as the aggregate personality
score for that book. Median scores were favored over mean
scores to reduce the influence of outliers.
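The aggregation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `aggregate_book_profiles`, the trait keys, and the input layout (pairs of user and book identifiers, plus a per-user score dictionary) are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

# Trait names are illustrative keys, not identifiers from the original dataset.
TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")


def aggregate_book_profiles(likes, scores, min_likes=50):
    """Return book_id -> {trait: median score of users who liked the book}.

    likes:  iterable of (user_id, book_id) pairs
    scores: user_id -> {trait: score}; users without scores are ignored
    """
    users_by_book = defaultdict(set)
    for user_id, book_id in likes:
        if user_id in scores:  # keep only users who completed the questionnaire
            users_by_book[book_id].add(user_id)

    profiles = {}
    for book_id, users in users_by_book.items():
        if len(users) < min_likes:  # too few 'likes' for a reliable estimate
            continue
        profiles[book_id] = {
            trait: median(scores[u][trait] for u in users) for trait in TRAITS
        }
    return profiles
```

Using the median means that a single user with an extreme score shifts a book's profile far less than it would shift a mean.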
Fig. 1. Facebook ‘likes’ for various pages.
2.2 Book Content
To extract a book’s content, we adopt a data-driven approach
by mining user-generated tags from Goodreads, an
online book catalogue. When this study was conducted in
2016, the site had more than 40 million users and more than
1.3 billion books. Users can label books with descriptive
tags, which cover a broad range of concepts like genre (e.g.
children’s-literature), time of publication (e.g. 20th-
century-fiction), story characters (e.g. Dumbledore), au-
thor information (e.g. British-author), awards (e.g. or-
ange-prize) and reading behavior (e.g., back-burner).
Due to their richness, we chose these user-generated tags
as a proxy for book content. However, some tags, such as
upstairs-bookshelves, appear to make sense only to a
small group of users. This calls for robust analysis tech-
niques that can withstand noise.
With permission from Goodreads, we crawl their site for
the 479 books in our personality dataset, and then harvest
the tags which Goodreads users had associated with these
books. Besides accounting for the books present in the per-
sonality dataset, we also identify the top 50 books associ-
ated with each of the top 2000 most frequently-used tags
across the whole catalogue. We then crawl all tags associated
with these books.
Next, we match book titles from Goodreads to their re-
spective pages on Facebook, leading to a many-to-many re-
lationship. For example, the book Harry Potter and the Phi-
losopher’s Stone is matched not only to the Facebook page
of the same name, but also to a general page for the Harry
Potter series. At the same time, the general Facebook page
for Harry Potter is matched to all seven books in the series.
Goodreads users can create their own tags. While this pro-
vides a rich source of information, it also introduces noise
that poses several challenges for analysis. To overcome
these challenges, we employ several techniques.
First, we use a set of criteria to filter tags for analysis:
• For each book crawled, only tags that were applied 3 times
or more are recorded.
• Tags applied fewer than 50 times in total and tags applied
to fewer than 15 books are discarded.
• Tags must consist of at least 3 characters, at least 1
letter, and at most 2 non-English characters.
We use these filtering criteria because they were
deemed via manual inspection to be effective at eliminat-
ing non-informative tags. After crawling and filtering, our
dataset contained 14,731 unique books, 24,091 unique tags
and 193,498,469 total tags.
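The three filtering criteria can be expressed as a short routine. The sketch below is illustrative only: the function names are hypothetical, "non-English character" is interpreted as any non-ASCII character (an assumption), and the corpus-level totals are computed after the per-book threshold has been applied, which the text does not specify.

```python
from collections import Counter


def is_well_formed(tag, max_non_ascii=2):
    """At least 3 characters, at least 1 letter, at most 2 non-ASCII characters."""
    non_ascii = sum(1 for ch in tag if ord(ch) > 127)
    has_letter = any(ch.isalpha() for ch in tag)
    return len(tag) >= 3 and has_letter and non_ascii <= max_non_ascii


def filter_tags(book_tags, min_per_book=3, min_total=50, min_books=15):
    """book_tags: book_id -> {tag: number of times the tag was applied to that book}."""
    # Criterion 1: keep a tag on a book only if it was applied often enough there.
    kept = {
        book: {t: c for t, c in tags.items() if c >= min_per_book}
        for book, tags in book_tags.items()
    }
    # Criterion 2: corpus-level totals (assumed to be counted after criterion 1).
    total_uses, book_count = Counter(), Counter()
    for tags in kept.values():
        for t, c in tags.items():
            total_uses[t] += c
            book_count[t] += 1
    # Criterion 3 (tag shape) is combined with criterion 2 here.
    valid = {
        t for t in total_uses
        if total_uses[t] >= min_total
        and book_count[t] >= min_books
        and is_well_formed(t)
    }
    return {
        book: {t: c for t, c in tags.items() if t in valid}
        for book, tags in kept.items()
    }
```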
Next, we identify four challenges in analyzing the tags:
• Information Value. Common tags (e.g. fiction,
book-club, and favorites) appear frequently across
many titles, and thus are not useful in distinguishing
between books.
• Synonyms. Some tags have identical or similar
meanings (e.g. children and kids), and hence need
to be analyzed as one.
• Idiosyncrasies. Some tags are used whimsically. For
example, Harry Potter and the Philosopher’s Stone was
tagged as science more than 20 times.
• Random Noise. We expect a baseline level of random
noise. If a tag is applied to one book 20 times, and to
another 21 times, this difference would likely be
due to random fluctuations rather than an actual difference
in content.
To distinguish informative tags, we use the term frequency-inverse
document frequency (tf-idf) measure. With
tf-idf, the frequency $f_{t,b}$ of a tag appearing in a book is discounted
by how common the tag $t$ generally is. In other
words, common tags such as fiction and favourites are
discounted heavily to indicate their low information value.
Letting $f_{t,b}$ denote the frequency of tag $t$ appearing in book
$b$, $N$ denote the total number of books, and $n_t$ denote the number
of unique books that tag $t$ is applied to, we have
$$\text{tf-idf}(t, b) = f_{t,b} \log\left(1 + \frac{N}{n_t}\right)$$
Using tf-idf, we can build a book-by-tag matrix, $M$. In $M$,
each row represents a book, each column represents a tag, and
each entry represents the corresponding tf-idf value.
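As a rough illustration of how such a book-by-tag matrix might be assembled, the sketch below weights each tag frequency by log(1 + N / n), where N is the number of books and n is the number of books carrying the tag. That weighting is this example's reading of the definition above and should be treated as an assumption; the function name and the dictionary input format are likewise hypothetical.

```python
import math


def tfidf_matrix(book_tags):
    """Build a book-by-tag tf-idf matrix.

    book_tags: book_id -> {tag: frequency of the tag on that book}
    Returns (books, tags, matrix); matrix[i][j] is the tf-idf of tags[j] in books[i].
    """
    books = sorted(book_tags)
    tags = sorted({t for counts in book_tags.values() for t in counts})
    n_books = len(books)

    # Number of distinct books each tag is applied to.
    books_per_tag = {
        t: sum(1 for b in books if book_tags[b].get(t, 0) > 0) for t in tags
    }

    matrix = [
        [
            book_tags[b].get(t, 0) * math.log(1 + n_books / books_per_tag[t])
            for t in tags
        ]
        for b in books
    ]
    return books, tags, matrix
```

With two books, a tag that appears on both is weighted by log(2) per application, while a tag that appears on only one is weighted by log(3), so the rarer tag counts for more.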
Next, we group similar tags together. We do this by
combining results from two similarity measures.
The first similarity measure is derived from the co-oc-
currence of tags in books. That is, if two tags occur in sim-
ilar books, the tags are likely to share similar meanings and
belong to the same genre. We compute a low-rank approximation
of $M$, matrix $\hat{M}$. Formally, we minimize the following
objective:
$$\min_{\hat{M}} \lVert M - \hat{M} \rVert_F \quad \text{s.t.} \quad \operatorname{rank}(\hat{M}) \le k$$
where $k$ is the desired rank of $\hat{M}$ and $\lVert \cdot \rVert_F$ is the Frobenius
norm. The minimization is achieved using singular
value decomposition. Each tag $t$ is represented as a column
vector $v_t$ in $\hat{M}$. The similarity between two tags $t$ and $t'$ is
then computed as the cosine of the angle in between:
$$\text{similarity}(t, t') = \frac{v_t \cdot v_{t'}}{\lVert v_t \rVert \, \lVert v_{t'} \rVert}$$
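The cosine step can be illustrated on any book-by-tag matrix. The sketch below deliberately omits the low-rank approximation (which would normally be obtained from a truncated singular value decomposition in a numerical library) and only shows how two tag columns are compared; the helper names are hypothetical.

```python
import math


def column(matrix, j):
    """Extract column j (one tag's values across all books) from a list-of-rows matrix."""
    return [row[j] for row in matrix]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Two tags that are applied to the same books in the same proportions score 1, while tags that never share a book score 0, regardless of how often each tag is used overall.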
Although the above captures co-occurrence between
tags, we also want to directly capture lexical similarity.
Thus, we derive a second similarity measure based on
shared words between tags (e.g. between historical-
novel and historical-fiction). Each word in a tag is first
lemmatized using ClearNLP [21]. As in co-occurrence sim-
ilarity, we compute a tag-by-word matrix using tf-idf to
discount frequent words, followed by a low-rank approximation
of the matrix. Similarity between tags can be computed
as the cosine similarity between row vectors in this
low-rank matrix.
Overall similarity is computed as a weighted sum of co-
occurrence-based (95% weight) and word-lemma-based
(5% weight) similarities. Then, we use the OPTICS cluster-
ing algorithm [22] to cluster similar tags together and to
discard tags that do not fit into any cluster. A round of
manual coding is performed to correct any errors in the
clustering, resulting in a total of 396 tag clusters, where
each tag cluster corresponds to a single semantic meaning.
Each tag cluster is then labelled with a semantically repre-
sentative tag as a label for book content, and henceforth
treated as a single tag for analysis against personality. Spe-
cifically, to consolidate tags belonging to the same cluster,
the median of their tf-idf values is used.
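The weighting and consolidation steps are simple enough to sketch directly. The function names below are hypothetical, and treating a tag that is absent from a book as having a tf-idf of zero is an assumption of this example rather than something the text states.

```python
from statistics import median


def combined_similarity(cooccurrence_sim, lemma_sim, cooccurrence_weight=0.95):
    """Weighted sum of the two similarity measures (95% / 5% by default)."""
    return (cooccurrence_weight * cooccurrence_sim
            + (1.0 - cooccurrence_weight) * lemma_sim)


def consolidate_cluster(book_row, cluster_tags):
    """Collapse a cluster of tags into one feature for a single book.

    book_row:     {tag: tf-idf value} for one book
    cluster_tags: tags belonging to the same cluster
    Missing tags are counted as 0.0 (an assumption of this sketch).
    """
    return median(book_row.get(t, 0.0) for t in cluster_tags)
```

Because co-occurrence carries 95% of the weight, lexical overlap mainly acts as a tie-breaker between tags that already appear on similar books.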
As a single Facebook page can correspond to multiple books
on Goodreads, we consolidate the book data by manually
mapping Goodreads books to Facebook pages. For each
book, we first normalize its feature vector of tf-idf
values to unit length. Next, feature vectors of books referred
to by the same Facebook page are summed; the resulting
summed vector is then normalized to unit length.
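A minimal sketch of this normalize, sum, and renormalize step follows; `unit` and `page_vector` are illustrative names, and the vectors are plain Python lists of tf-idf values.

```python
import math


def unit(vector):
    """Scale a vector to unit Euclidean length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def page_vector(book_vectors):
    """Combine the feature vectors of all books mapped to one Facebook page."""
    normalized = [unit(v) for v in book_vectors]
    summed = [sum(values) for values in zip(*normalized)]
    return unit(summed)
```

Normalizing each book before summing keeps a heavily tagged book from dominating the page's combined vector: scaling any one input vector by a constant leaves the result unchanged.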
We conduct a two-level analysis to examine how personality
predicts book content preferences at the tag level and at
the genre level. We also analyze how personality could in-
fluence one’s general tendency to like books.
4.1 Tag-Level Analysis
We compute correlations between the tf-idf values of each
tag cluster and each of the Big Five personality dimensions.
Next, we perform lasso regression to predict personality
from tag cluster features. Unlike regular regression, lasso
regression maintains a higher prediction accuracy despite
correlations between features (i.e. multi-collinearity)
through regularization. For each personality trait, we use
the regularization coefficient yielding the lowest mean
squared test error from a 10-fold cross-validation.
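To make the procedure concrete, the following is a small pure-Python sketch of lasso regression fitted by coordinate descent, with the regularization coefficient chosen by k-fold cross-validation. It is not the authors' implementation: the objective scaling (mean squared error divided by two, plus the coefficient times the L1 norm), the candidate grid passed to `choose_lambda`, and all function names are assumptions. In practice a library routine such as scikit-learn's `LassoCV` would normally be used instead.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def fit_lasso(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - b0 - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered features
    resid = [yi - y_mean for yi in y]  # residual while all coefficients are zero
    coef = [0.0] * p
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * coef[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - coef[j]
            if delta:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                coef[j] = new
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return coef, intercept


def mse(X, y, coef, intercept):
    """Mean squared prediction error of a fitted linear model."""
    return sum(
        (yi - intercept - sum(c * x for c, x in zip(coef, row))) ** 2
        for row, yi in zip(X, y)
    ) / len(y)


def choose_lambda(X, y, lambdas, k=10, seed=0):
    """Return the candidate with the lowest mean held-out MSE over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best = None
    for lam in lambdas:
        errors = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            coef, b0 = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
            errors.append(mse([X[i] for i in fold], [y[i] for i in fold], coef, b0))
        avg = sum(errors) / k
        if best is None or avg < best[0]:
            best = (avg, lam)
    return best[1]
```

The soft-thresholding step is what sets the coefficients of uninformative features to exactly zero, which is why the fitted model can be read as a short list of predictive tag clusters.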
We also perform the same prediction using a random
forest regression with 500 trees. The technique involves
simulating different combinations of features in multiple
decision trees to select the best combination of features that
predicts personality. As determined by cross validation,
each tree utilizes 132 variables selected randomly. With the
random forest regression, we compute the importance of
each tag cluster feature based on the increase in mean
squared prediction error when that feature is removed.
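The idea of scoring a feature by the error increase it causes can be sketched independently of any particular model. The example below makes two explicit assumptions: it approximates "removing" a feature by randomly permuting its values across observations (a common way random forest software estimates importance, but not confirmed by the text), and it works with any prediction function rather than an actual random forest. The names `mse` and `permutation_importance` are illustrative.

```python
import random


def mse(predict, X, y):
    """Mean squared error of a prediction function over rows X and targets y."""
    return sum((predict(row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, feature, n_repeats=20, seed=0):
    """Average increase in MSE when one feature's values are shuffled across rows.

    predict: callable mapping one feature row (a list) to a predicted score
    feature: index of the column to shuffle
    """
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[feature] for row in X]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value in place of the original one.
        X_perm = [
            row[:feature] + [value] + row[feature + 1:]
            for row, value in zip(X, shuffled)
        ]
        increases.append(mse(predict, X_perm, y) - baseline)
    return sum(increases) / n_repeats
```

A feature that the model ignores yields an importance of zero, while a feature the model relies on yields a positive value that grows with how much predictive signal it carries.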
4.1.1 Results
Table 1 shows the tag clusters that are most strongly correlated
with each personality trait. All correlations shown are
statistically significant at $p < 0.05$ and most are significant
at $p < 0.001$. The most positive correlation is between the
back-burner tag cluster and the openness trait,
while the most negative correlation is between the light-fantasy
tag cluster and openness.
Table 1 notes: $r$ = correlation coefficient; * $p < 0.05$; ** $p < 0.01$; *** $p < 0.001$.
Table 2 shows the tag clusters with the biggest absolute
coefficients in the five lasso regression analyses predicting
scores for each personality trait. Based on the $R^2$ values
from each analysis, book content seems best at predicting
scores on the openness trait.
Fig. 2 shows the tag clusters that result in the largest de-
creases in mean squared error in the five random forest re-
gression analyses predicting scores for each personality
trait. Green and red colors represent positive and negative
correlations respectively between clusters and traits. Note
that most of the top predictive tag clusters for agreeable-
ness have positive correlations with the trait.
Results from correlation, lasso regression and random
forest regression analyses are largely consistent. For example,
fantasy-sci-fiction has a strong negative correlation
with extraversion (Table 1), and this is supported by
the lasso regression predicting extraversion, which shows
strong negative coefficients for fantasy settings such as
parallel-world and forgotten_realms (Table 2). This is
again supported by random forest regression findings that
the second most important variable in predicting extraversion
is fantasy-sci-fiction (Fig. 2). Differences between
Tables 1 and 2 may be attributed to the use of L1-regularization
in lasso. The regularization penalizes the sum of absolute
coefficient values, which drives many coefficients to exactly
zero and pushes the algorithm to assign
weights to tags that are not strongly correlated with each
other.
While results from lasso and random forest regressions
are consistent, their $R^2$ values for each personality trait differ.
For example, predictions of conscientiousness scores
have the lowest $R^2$ for lasso regression, but the second
highest $R^2$ for random forest regression. This difference
may be explained by the linearity constraint of lasso regression:
if the relationship between tag features and personality
scores is non-linear, its $R^2$ in lasso regression may be affected.
4.1.2 Discussion
Overall, we find that book preferences can potentially be
used to predict personality traits:
Extraversion. As expected, our findings suggest that extraverts
enjoy books with social themes, as described by
tags like relationships and chick lit. They also
seem interested in reading about the lives of others, from
memoirs to celebrity romance. Curiously, preference for
African American literature is also associated with being
extraverted. This may be explained by African Americans
themselves being more extraverted than white Americans
[23]. Since we did not record race in our study, we cannot
rule out this explanation. On the other hand, introverts
seem to prefer books with themes such as fantasy, science
fiction, and supernatural forces, exhibiting a tendency to
indulge in imagination. Appreciation of Japanese culture,
especially manga and comics, is also associated with intro-
version. In general, book preferences explain a substantial
amount of variation in the extraversion dimension, con-
sistent with the consensus that extraversion is typically a
more salient trait to measure.
Agreeableness. Our findings suggest that agreeable
people enjoy books with family and religious themes, both
of which promote positive social relationships. On the flip
side, disagreeable individuals seem attracted to dark-
themed content such as psychological dramas. Cult clas-
sics, known for their controversial narratives, also seem ap-
pealing to these individuals who may have fewer qualms
about resisting popular opinion. Books with content re-
lated to Japan, Italy, and Russia are also read by people
who are less agreeable, possibly because people from these
cultures tend to score lower on agreeableness compared to
Americans [23]. Interestingly, most of the top tags predict-
ing agreeableness are positively correlated with the trait.
The absence of consistent tags endorsed by disagreeable
people suggests that these people also tend to disagree on
what they ‘like’.
Openness. Open individuals seem to enjoy intellectually
challenging books that the average person may find
difficult to complete (e.g. back-burners). Their preference
for classic literature further reinforces this view, as books
of this genre usually take substantial effort to finish. This
is consistent with past studies that found openness to be
highly correlated with appreciation for art and literature
[13]. Our results also show that individuals scoring lower
on openness prefer mainstream content that is less cognitively
taxing and easier to digest, such as light-fantasy.
Content related to Christianity and India is also preferred
by readers with low openness, likely due to religious individuals
[24] and Indians [23] scoring low on this trait.
Table 2 notes: $R^2$ = coefficient of determination, or the proportion of variance explained by
the lasso regression model; $\beta$ = regression coefficient.
Fig. 2. Top tags that resulted in largest decreases in mean squared error (MSE) in random forest regression analysis for each personality
trait. Colors show the correlation between tag and personality trait (red is negative while green is positive). Panels: (a) extraversion; (b)
agreeableness; (c) openness; (d) neuroticism; (e) conscientiousness.
Neuroticism. Neurotic individuals seem to indulge in
narratives that reflect their own emotional states, such as
sad endings and mental issues. They also appear to enjoy
books on alternative realities, in line with the hypothesis
that these genres provide a means of escape [4]. Interestingly,
neurotic individuals like books with pretty covers,
possibly due to a gender effect as females tend to
score higher in neuroticism than males [25]. On the other
hand, emotionally stable individuals prefer self-improvement
and other non-fiction content that better reflects reality.
In general, we found book preferences to be good predictors
of neuroticism, explaining as much as 59% of the
variance in this dimension.
Conscientiousness. Hardworking people appear to
prefer informative content that contributes to their profes-
sional development, or that simply boosts their knowledge
[6], [15]. On the other hand, people with low conscientious-
ness scores tend to like lighthearted content (e.g. humor)
and books aimed at youths (e.g., teenage-books). This can
be explained by how teenagers tend to score lower than the
middle-aged in conscientiousness [26].
In sum, our results show how book preferences can be
used to predict one’s personality. Besides personality, our
findings also reveal cultural differences in book prefer-
ences, further supporting the utility of online, user-gener-
ated data in deducing more comprehensive profiles of tar-
get audiences.
4.2 Genre-Level Analysis
Conclusions from our tag-level analyses are based on finer
descriptions of book content rather than traditional genres.
To test the integrity of tags as book content descriptors, we
further group tag clusters into broader genres, which are
then used to predict personality scores again.
To obtain genre clusters, we compute the Pearson’s correlation
between the tf-idf values of books as a proxy for
dissimilarity (i.e. distance) between books. Next, the books
are clustered using the Partitioning Around Medoids
(PAM) algorithm, a form of k-medoid clustering [27]. Like
k-means, PAM aims to minimize the distance between
cluster members and their respective cluster centers
through an iterative algorithm. Unlike k-means however,
PAM assigns actual data points as cluster centers. Hence, it
is more robust to noise and outliers than k-means because
it minimizes the sum of pairwise dissimilarities rather than
the sum of squared Euclidean distances.
To determine the optimal number of clusters, we use silhouette
width, a measure of data points' similarity within
their assigned cluster against their similarity to points in
other clusters. For data point $i$, we let $a(i)$ denote the average
distance between $i$ and all other data points in the cluster
that $i$ is assigned to, and $b(i)$ be the lowest average distance
of $i$ to any other cluster. Silhouette $s(i)$ is defined as
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$
We examined results for 4 to 30 clusters, and eventually
chose the 27-cluster solution as it yielded large mean and
median silhouette widths across all clusters. These clusters
also represented a diverse range of genres that enable com-
parison with past literature. The composition of each genre
in terms of tags, as well as the personality profile of each
genre, are presented in the following results section.
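Silhouette width can be computed directly from a dissimilarity matrix and a set of cluster labels, using the standard definition: the lowest average distance to another cluster minus the average distance within the point's own cluster, divided by the larger of the two. The sketch below is a minimal illustration; the function name is hypothetical, it assumes at least two clusters, and it assigns a width of 0 to points in singleton clusters, which is a common convention rather than something stated in the text.

```python
def silhouette_widths(dist, labels):
    """Per-point silhouette widths from an n x n dissimilarity matrix and cluster labels."""
    n = len(labels)
    clusters = sorted(set(labels))
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: width defined as 0 by convention
            widths.append(0.0)
            continue
        # a: average distance to the other members of the point's own cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # b: lowest average distance to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths
```

Widths near 1 indicate points that sit well inside their cluster, while negative widths indicate points that are on average closer to another cluster. Comparing the mean and median width across candidate cluster counts gives a simple selection criterion.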
4.2.1 Results
Genre clusters are given labels that are representative of
their member tags. Top tags from example clusters are
shown in Table 3. These are the tags that appear most frequently
in a genre relative to the entire dataset.
For each genre, we took the median personality scores
of all books in that genre cluster, thus generating an overall
personality profile for that genre. Fig. 3 shows the aggregated
personality profiles for all 27 genres. Sizes of pie chart
slices are normalized to zero mean and unit variance.
A principal component analysis was performed on the
aggregated personality scores across genres, and we found
that the openness and conscientiousness traits captured
the most variation in genre profiles. Thus, for visualization
purposes, we plot book genres for these two dimensions in
Fig. 4.
4.2.2 Discussion
Results from the genre-level analysis are consistent with
findings from both existing literature and our tag-level
analysis. For instance, people who like Self-improvement
books are more conscientious than those who like Comics,
and people who like Philosophy books are more open than
those who like Religious books.
There is one exception, however. While a previous
study [6] found that extraverts like horror, our findings
suggest the opposite: books with horror themes seem to
appeal more to introverts. This discrepancy may be due to
the mode of narrative: while our study focuses on books,
the other study had examined television shows in addition
to books. While horror TV shows may be highly stimulat-
ing, the arousal may be muted in books, explaining the
lower preference for horror books among our extraverted users.
Apart from confirming our earlier results, genre clusters
also give new insights. For example, the Thriller cluster
contains detective and legal elements, which require criti-
cal thinking and perhaps even background knowledge on
law for a reader to fully appreciate the plot. This may ex-
plain why readers who like mystery books also score
higher on conscientiousness. Another interesting observa-
tion is that the Classics cluster has an average profile for all
five personality traits. This cluster contains time-honored
and household favorites, which would have appealed to
most people regardless of personality, thus resulting in a
profile that reflects the sample average.
Cultural differences are also apparent. People who like
Asian books are less open [23], consistent with results from
our tag-level analysis. However, people who like Asian
books are also relatively extraverted, which runs contrary
to claims that Asians are more introverted [28]. This dis-
crepancy may be due to Facebook being more attractive to
extraverted individuals in the first place [29], thus result-
ing in a more extraverted Asian user base.
We have shown how personality profiles of readers can
be inferred from their preferred content, at both the tag and
genre level. A detailed tag-level analysis can provide more
resolution on book content, while a broader genre-level
analysis can identify associations between tags.
4.3 General Reading Disposition
Since personality has been found to correlate with book
preferences, it may also correlate with the tendency to like
books in the first place. To examine this, we compute cor-
relations between users’ personality scores and the num-
ber of book pages they ‘liked’.
It turns out that correlations are very weak (|r| < 0.06)
across four of the five traits: conscientiousness, extraver-
sion, neuroticism and, importantly, agreeableness. Alt-
hough previous studies found that agreeable people tend
to evaluate content favorably [10], our study finds a near-
zero correlation between agreeableness and the number of
books ‘liked’ on Facebook (r = -0.02). A possible explana-
tion may be that while agreeable people are less likely to
express dislike to avoid disagreements, they may nonetheless
only ‘like’ a book when they genuinely enjoy the content.
The openness trait, on the other hand, is a relatively
strong and significant predictor (r = 0.12, p < 0.001) of num-
ber of books ‘liked’. This result lends support to our earlier
hypothesis: Open individuals appreciate a wider variety of
books, and thus ‘like’ more books on average.
We acknowledge that reliance on web data may lead to a
few limitations. First, only popular books that had a Face-
book page with sufficient ‘likes’ are included in the analy-
sis. Hence, newer or niche books on the heavy tail of a book
popularity distribution may be overlooked. Second, Face-
book users have been found to be more extraverted, more
narcissistic, and less conscientious than average, and hence
they may not be representative of the general population
[29]. Third, Facebook ‘likes’ may be driven by the need for
social acceptance or recognition [30], [31], and thus may
not be a faithful reflection of a person’s preferences.
However, because our findings are in line with existing
literature, the above concerns are unlikely to have been sig-
nificant enough to skew results. In fact, despite sources of
noise and idiosyncrasies, user-generated tags have proven
to be a rich well of information that not only enabled us to
dive deeper into sub-genre preferences, but also to explore
broader preference-related behaviors.
Findings from our study are consistent across both tag and
genre levels of analyses, and also in line with existing liter-
ature, thus demonstrating the utility of online user-gener-
ated data in profiling target audiences. Besides predicting
personality from book preferences, user tags allow us to
uncover unexpected insights, such as cultural differences,
book reading behaviors (e.g. ‘back-burner’), and other
non-content factors affecting preferences (e.g. ‘pretty co-
Fig. 4. Genres on the Openness-Conscientiousness dimensions.
Future research can incorporate additional dimensions
such as year of publication, which may allow us to track
the evolution of genres. For instance, vanilla love stories in
the romance genre seem to be increasingly overtaken by
vampire-related themes, with series such as Twilight (2005-
2008) and Vampire Academy (2007-2010). Trends like these
may be overlooked if books are analyzed by genre instead
of tags. Another possible avenue of research may be to ex-
amine popular combinations of tags within books. Find-
ings may help authors identify unique tag combinations to
spin fresh story plots.
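One simple way to examine tag combinations is to count how many books each pair of tags co-occurs in. The following is a minimal sketch under stated assumptions: the mapping `book_tags`, the helper `count_tag_pairs`, and the toy titles and tags are all hypothetical and are not part of the study's pipeline.

```python
from collections import Counter
from itertools import combinations


def count_tag_pairs(book_tags):
    """Count how many books each unordered pair of tags co-occurs in."""
    pair_counts = Counter()
    for tags in book_tags.values():
        # Deduplicate and sort so that each pair has one canonical ordering.
        unique_tags = sorted(set(tags))
        pair_counts.update(combinations(unique_tags, 2))
    return pair_counts


# Hypothetical toy catalogue; titles and tags are illustrative only.
book_tags = {
    "book_a": ["romance", "vampires", "young-adult"],
    "book_b": ["romance", "vampires"],
    "book_c": ["romance", "historical"],
}
pair_counts = count_tag_pairs(book_tags)

# Pairs seen in only one book are candidates for "unique" combinations.
rarest_pairs = [pair for pair, n in pair_counts.items() if n == 1]
```

Frequent pairs would point to established sub-genres, whereas rare pairs might suggest combinations that few existing books cover.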
With growing online activity, we believe that large, user-
generated datasets, as well as the ability to parse them ef-
fectively, can play an important role in the study of arts
and social sciences fields, such as literature, psychology,
and marketing.
The authors thank Dr. Michal Kosinski for his valuable
feedback, and for allowing us to crawl for
[1] M. Djikic, K. Oatley, S. Zoeterman, and J. B. Peterson, "On being moved by art: How reading fiction transforms the self," Creativity Research Journal, vol. 21, no. 1, pp. 24–29, Feb. 2009.
[2] R. A. Mar, K. Oatley, and J. B. Peterson, "Exploring the link between reading fiction and empathy: Ruling out individual differences and examining outcomes," Communications, vol. 34, no. 4, Jan. 2009.
[3] W. C. Tirre and S. Dixit, "Reading interests: Their dimensionality and correlation with personality and cognitive factors," Personality and Individual Differences, vol. 18, no. 6, pp. 731–738, Jun. 1995.
[4] G. Kraaykamp and K. van Eijck, "Personality, media preferences, and cultural participation," Personality and Individual Differences, vol. 38, no. 7, pp. 1675–1688, May 2005.
[5] I. Cantador, I. Fernández-Tobías, and A. Bellogín, "Relating personality types with user preferences in multiple entertainment domains," CEUR Workshop Proceedings, vol. 997, 2013.
Fig. 3. Personality profiles of genres.
[6] P. Rentfrow, L. Goldberg, and R. Zilca, "Listening, watching, and reading: The structure and correlates of entertainment preferences," Journal of Personality, vol. 79, no. 2, pp. 223–58, Jul. 2010.
[7] M. Galesic and M. Bosnjak, "Effects of questionnaire length on participation and indicators of response quality in a web survey," Public Opinion Quarterly, vol. 73, no. 2, pp. 349–360, Jan. 2009.
[8] M. Kosinski, D. Stillwell, and T. Graepel, "Private traits and attributes are predictable from digital records of human behavior," Proceedings of the National Academy of Sciences, vol. 110, no. 15, pp. 5802–5805, Sep.
[9] P. T. Costa and R. R. McCrae, "The revised NEO personality inventory (NEO-PI-R)," The SAGE Handbook of Personality Theory and Assessment, vol. 2, pp. 179–198, 2008.
[10] J. B. Weaver, H.-B. Brosius, and N. Mundorf, "Personality and movie preferences: A comparison of American and German audiences," Personality and Individual Differences, vol. 14, no. 2, pp. 307–315, Feb.
[11] M. T. Soto-Sanfiel, L. Aymerich Franch, and E. Romero, "Personality in interaction: How the big Five relate to the reception of interactive narratives," Comunicación y sociedad = Communication & Society, vol. 27, no. 3, pp. 151–186, 2014.
[12] A. Furnham and J. Walker, "Personality and judgements of abstract, pop art, and representational paintings," European Journal of Personality, vol. 15, no. 1, pp. 57–72, Jan. 2001.
[13] I. C. McManus and A. Furnham, "Aesthetic activities and aesthetic attitudes: Influences of education, background and personality on interest and involvement in the arts," British Journal of Psychology, vol. 97, no. 4, pp. 555–587, Nov. 2006.
[14] J. C. Conway and A. M. Rubin, "Psychological predictors of television viewing motivation," Communication Research, vol. 18, no. 4, pp. 443–463, Aug. 1991.
[15] A. S. Gerber, G. A. Huber, D. Doherty, and C. M. Dowling, "Personality traits and the consumption of political information," American Politics Research, vol. 39, no. 1, pp. 32–84, Sep. 2010.
[16] L. R. Goldberg et al., "The international personality item pool and the future of public-domain personality measures," Journal of Research in Personality, vol. 40, no. 1, pp. 84–96, Feb. 2006.
[17] S. C. Rife, K. L. Cate, M. Kosinski, and D. Stillwell, "Participant recruitment and data collection through Facebook: The role of personality factors," International Journal of Social Research Methodology, pp. 1–15, Sep. 2014.
[18] M. Kosinski, Y. Bachrach, P. Kohli, D. Stillwell, and T. Graepel, "Manifestations of user personality in website choice and behaviour on online social networks," Machine Learning, vol. 95, no. 3, pp. 357–380, Jan.
[19] G. Park et al., "Automatic personality assessment through social media language," Journal of Personality and Social Psychology, vol. 108, no. 6, pp. 934–952, 2015.
[20] W. Youyou, M. Kosinski, and D. Stillwell, "Computer-based personality judgments are more accurate than those made by humans," Proceedings of the National Academy of Sciences, vol. 112, no. 4, pp. 1036–1040, Jan. 2015.
[21] J. D. Choi and M. Palmer, "Fast and robust part-of-speech tagging using dynamic model selection," Association for Computational Linguistics, 2012, pp. 363–367.
[22] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander, "OPTICS: ordering points to identify the clustering structure," ACM SIGMOD Record, vol. 28, no. 2, pp. 49–60, Jan. 1999.
[23] J. Allik and R. R. McCrae, "Toward a geography of personality traits: Patterns of profiles across 36 cultures," Journal of Cross-Cultural Psychology, vol. 35, no. 1, pp. 13–28, Jan. 2004.
[24] V. Saroglou, "Religion and the five factors of personality: A meta-analytic review," Personality and Individual Differences, vol. 32, no. 1, pp. 15–25, Jan. 2002.
[25] D. P. Schmitt, A. Realo, M. Voracek, and J. Allik, "Why can’t a man be more like a woman? Sex differences in big Five personality traits across 55 cultures," Journal of Personality and Social Psychology, vol. 94, no. 1, pp. 168–182, 2008.
[26] M. B. Donnellan and R. E. Lucas, "Age differences in the big five across the life span: Evidence from two national samples," Psychology and Aging, vol. 23, no. 3, pp. 558–566, 2008.
[27] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis (Wiley Series in Probability and Statistics). John Wiley & Sons, 2008, ch. 2, pp. 68–125.
[28] D. P. Schmitt, J. Allik, R. R. McCrae, and V. Benet-Martinez, "The geographic distribution of big Five personality traits: Patterns and profiles of human self-description across 56 nations," Journal of Cross-Cultural Psychology, vol. 38, no. 2, pp. 173–212, Mar. 2007.
[29] T. Ryan and S. Xenos, "Who uses Facebook? An investigation into the relationship between the big Five, shyness, narcissism, loneliness, and Facebook usage," Computers in Human Behavior, vol. 27, no. 5, pp. 1658–1664, Sep. 2011.
[30] H. Gangadharbatla, "Facebook me: Collective self-esteem, need to belong, and Internet self-efficacy as predictors of the igeneration’s attitudes toward social networking sites," Journal of Interactive Advertising, vol. 8, no. 2, pp. 5–15, Mar. 2008.
[31] L. E. Buffardi and W. K. Campbell, "Narcissism and social networking web sites," Personality and Social Psychology Bulletin, vol. 34, no. 10, pp. 1303–1314, Jul. 2008.
Ng Annalyn was a research associate with Disney Research Pitts-
burgh and is currently employed by Singapore’s Ministry of Defence.
She received her M.Phil. degree in Psychology from the University of
Cambridge, where she mined consumer data for targeted advertising
and programmed cognitive tests for job recruitment with the Cam-
bridge Psychometrics Centre. She has a B.Sc. in Psychology and
Economics from the University of Michigan (Ann Arbor), where she
was also an undergraduate statistics tutor. Her research interests in-
clude machine learning applications in social sciences. She is the au-
thor of the book: “Numsense! Data Science for the Layman”.
Maarten W. Bos received his MS degree in Social Psychology from
the University of Amsterdam, and his PhD degree from the Radboud
University in The Netherlands. After a postdoctoral fellowship at the
Harvard Business School, he is currently a Research Scientist at Dis-
ney Research. His research interests include decision science and
behavioral economics. He has high impact publications in decision
science, and he is a member of the Society for Personality and Social
Psychology, the Association for Psychological Science, and the Soci-
ety for Judgment and Decision Making.
Leonid Sigal is a Senior Research Scientist at Disney Research Pitts-
burgh and an adjunct faculty at Carnegie Mellon University. Prior to
this he was a postdoctoral fellow in the Department of Computer Science
at University of Toronto. He completed his Ph.D. at Brown University
in 2008; he received his B.Sc. degrees in Computer Science
and Mathematics from Boston University (1999), his M.A. from Boston
University (1999), and his M.S. from Brown University (2003). From
1999 to 2001, he worked as a senior vision engineer at Cognex Cor-
poration, where he developed industrial vision applications for pattern
analysis and verification. Leonid’s research interests mainly lie in the
areas of computer vision, machine learning, and computer graphics.
He has published more than 50 peer-reviewed papers in venues and
journals in these fields (including publications in PAMI, IJCV, CVPR,
ICCV, ECCV, NIPS, UAI, and ACM SIGGRAPH). His work received
the Best Paper Awards at the AMDO conference in 2006 / 2012 and
at WACV in 2014. He has also coedited the book Guide to Visual An-
alytics of Humans: Looking at People (Springer, 2011).
Boyang "Albert" Li is a Research Scientist at Disney Research,
where he directs the Narrative Intelligence group. He obtained his
Ph.D. in Computer Science from Georgia Institute of Technology in
2014, and his B. Eng. from Nanyang Technological University, Singa-
pore in 2008. His research interests include computational narrative
intelligence, or the creation of Artificial Intelligence that can under-
stand, craft, tell, direct, and respond appropriately to narratives, and
understanding how human cognition comprehends narratives and
produces narrative-related affects. He has authored and co-authored
more than 30 peer-reviewed papers in international journals and con-