
Mass-scale emotionality reveals human behaviour and marketplace success


Abstract

Online reviews promise to provide people with immediate access to the wisdom of the crowds. Yet, half of all reviews on Amazon and Yelp provide the most positive rating possible, despite human behaviour being substantially more varied in nature. We term the challenge of discerning success within this sea of positive ratings the ‘positivity problem’. Positivity, however, is only one facet of individuals’ opinions. We propose that one solution to the positivity problem lies with the emotionality of people’s opinions. Using computational linguistics, we predict the box office revenue of nearly 2,400 movies, sales of 1.6 million books, new brand followers across two years of Super Bowl commercials, and real-world reservations at over 1,000 restaurants. Whereas star ratings are an unreliable predictor of success, emotionality from the very same reviews offers a consistent diagnostic signal. More emotional language was associated with more subsequent success. Rocklage et al. find a positivity problem: 80% or more of online ratings are positive and are unreliable predictors of success. As an alternative, mass-scale emotion predicts behaviour towards and success of movies, books, commercials and restaurants.
1College of Management, University of Massachusetts, Boston, MA, USA. 2Kellogg School of Management, Northwestern University, Evanston, IL, USA.
People have always looked to and relied on the opinions of
those around them to make decisions1,2. Now, the rise and pro-
liferation of online crowd-sourced platforms, such as Yelp and
Glassdoor, have fundamentally transformed the scope and speed
with which people can harness others’ assessments. Given their
scale, openness and availability, these platforms promise to facilitate
people’s ability to find the best option3,4. Indeed, rather than rely on
trial and error or small, informal networks, people have immedi-
ate access to the experience and wisdom of crowds. In the case of
movies and restaurants, for instance, this aggregated wisdom should
help quickly identify success—those items that have thrived and
become popular. For most platforms, the primary means to identify
successful goods is through an aggregated ‘star rating’—a numeric
rating that measures the extent to which people’s opinions are rela-
tively positive versus negative.
Perhaps surprisingly, a striking limitation of these online rating
systems has emerged: reviews are overwhelmingly positive5. On Amazon, for example, the average star rating is approximately
4.2 out of 5, with well over half of the reviews being 5-star ratings6,7.
Nearly half of all Yelp reviews are 5-star ratings8, and recent research
indicates that nearly 90% of Uber ratings may be 5 stars9. A visual
representation of most online ratings reveals a J-shaped distribution,
with many 4- and 5-star ratings, a few 1-star ratings and few ratings
in between5. The degree of overwhelming positivity suggests that
individuals are often confronted with choosing between numerous
items with similar star ratings, especially given that people will not
even consider options that garner less than a 3-star rating.
A principal problem with this degree of positivity is that the rat-
ings themselves may ultimately be an unreliable indicator of the
success of that item and the human behaviour that underlies this
success (for example, restaurant reservations). Specifically, two
items might receive nearly identical ratings but vary vastly in their
success. Indeed, past research has shown substantial variability
in the link between the positivity of individuals’ ratings and success10–12. For example, the positivity of online ratings shows little
association with the underlying quality of products and fails to pre-
dict their resale value13. Moreover, an analysis of over 400 movies
revealed that greater positivity in online ratings was associated with
fewer people attending a movie, as evidenced by lower box office
revenue14. This problem has even led companies such as Netflix to
abandon standard rating systems due to their poor performance15.
Put simply, these ratings seem not to hold the wisdom that people
believe they do.
Across disciplines, behavioural scientists are beginning to recog-
nize the problematic nature of these ratings. That is, given this large
degree of positivity, a number of cases exist where items receive a
similarly positive rating. Yet, when it comes to human behaviour,
substantial differences exist—not all 5-star restaurants are equally
popular. The high degree of positivity effectively makes the ratings
ineffective signals for discriminating what are likely to be the best or
most successful options. We label this challenge to discern success
within the mass of positive ratings the ‘positivity problem’.
Although quantitative ratings are the most salient and accessible
output of online reviews, most crowd-sourced platforms include a
written portion where people provide qualitative assessments. As
technology has improved, researchers have embraced computa-
tional social science techniques to quantify these qualitative assess-
ments. Perhaps the most common method to analyse text in this
way is via sentiment analysis, which most often quantifies language
in terms of its positivity16. Some words suggest greater favourability
(for example, the word ‘liked’), whereas others suggest greater nega-
tivity (for example, ‘disliked’).
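Lexicon-based sentiment analysis of this kind can be sketched in a few lines: each word of a review is matched against a dictionary of valence scores and the matches are averaged. The word scores below are invented for illustration, not drawn from any published lexicon.

```python
# Minimal sketch of lexicon-based valence scoring (illustrative scores only).
VALENCE = {
    "liked": 7.0, "loved": 8.5, "amazing": 8.8, "enjoyable": 7.9,
    "disliked": 2.0, "awful": 1.2, "boring": 2.5,
}

def valence_score(text):
    """Average valence of the lexicon words found in `text` (None if no match)."""
    words = text.lower().split()
    hits = [VALENCE[w] for w in words if w in VALENCE]
    return sum(hits) / len(hits) if hits else None

print(valence_score("loved it amazing story"))  # mean of 8.5 and 8.8
```

Real analyses would use a validated dictionary and handle negation and morphology, but the averaging logic is the same.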
Computational social science has focused primarily on the
positivity (also known as valence) of people’s attitudes17. Relatively
few efforts have sought to quantify aspects of individuals’ attitudes
beyond positivity18. Nevertheless, social psychologists have long
acknowledged that positivity is not always a reliable predictor of
behaviour17,19. To address the limitations of positivity, scholars have
introduced and explored additional facets of an attitude that can
improve its predictive ability17,20. One such facet is the emotionality
of an attitude—the extent to which an attitude is based on individuals’ feelings or emotional reactions21–24. Positivity and emotionality
are conceptually and empirically distinct. For example, the words
‘enjoyable’ and ‘impeccable’ imply very similar levels of positivity,
but research indicates that the word ‘enjoyable’ is likely to be indica-
tive of a more emotional attitude than the word ‘impeccable’25.
Matthew D. Rocklage1 ✉, Derek D. Rucker2 and Loran F. Nordgren2
Articles | Nature Human Behaviour
Moreover, the emotionality of individuals’ attitudes can now be cap-
tured via text analysis25,26.
Attitudes based more on emotion tend to be stronger and more
predictive of behaviour. In the political domain, voters’ emotional
reactions to a political candidate—compared with their more cogni-
tive reactions—were better predictors of future voting behaviour27.
Attitudes based more on emotion also tend to come to mind more
quickly28, are more extreme25,26 and are more consistent across con-
texts29 and time30. One reason for this relationship is that emotions
provide individuals themselves with an indication that something
especially impactful has occurred31,32, and they can thereby act as
a particularly clear signal to individuals regarding their own atti-
tude28,33,34. This strong signal, in turn, can lead attitudes to be held
more strongly in memory28, which is an established predictor of the
impact and durability of an attitude17,35.
Outward displays of emotion also signal the importance of one’s
attitude to others. The social–functional approach to emotion
puts forth that a primary function of emotion is to communicate
the strength of one’s attitudes, desires and intentions36–38. As social
animals, understanding others’ goals and intentions is vital for suc-
cessful social coordination. Displays of joy and anger, for instance,
provide others with strong signals regarding a person’s state of
mind, goals and priorities. In the context of negotiation, expressions
of happiness signal that one is open to concession, whereas displays
of anger signal that one is unlikely to compromise39,40. These find-
ings indicate that when humans use emotion online, it is probably a
signal that an experience was particularly impactful to them.
Taken together, research suggests that attitudes based on emo-
tion are stronger and more predictive of one’s own behaviour, and
that people use emotion to communicate the impact of their experi-
ence to others. The consequence is that emotionality in text may be
more indicative of the success of a product or service. To illustrate,
consider a restaurant. From an attitudinal perspective, the ability of
a restaurant to elicit a positive, emotional, feelings-based reaction
is likely to lead to a more strongly held attitude in the individual.
This stronger attitude could lead that restaurant to come to mind
more frequently in the future and lead the individual to be more
likely to visit again. From a social–functional perspective, individu-
als’ emotional reactions may also signal to others just how impactful
an experience was and thereby generate more attention for that res-
taurant from others. Thus, for both these reasons, more emotional
language may be able to predict success where star ratings cannot.
In short, we argue that capturing the emotionality expressed in
online reviews may offer one solution to the positivity problem.
More specifically, we hypothesize that the emotionality of people’s
online reviews can predict success and the mass-scale human behav-
iour that underlies this success where aggregated online ratings do
not. In providing evidence of both the positivity problem and the
relationship between emotionality and mass-scale human behav-
iour across multiple domains, we aim to accomplish two objec-
tives. First, we demonstrate the breadth of the positivity problem.
Second, we offer one solution to this problem using a theory-based
approach. In doing so, this work also advances our understanding
of emotionality—a construct considered of great importance across
the social sciences32—by revealing that it has the ability to predict
mass-scale behaviour and marketplace success.
Study 1. In Study 1, we predicted human behaviour and success
in the movie industry in the form of box office revenue earned in
the United States. We obtained all online reviews for all movies from Metacritic.com from 2005 to 2018—13 years of data—and
used the first 30 user reviews written for each movie to measure
the movie’s star rating (0 to 10 stars) and text emotionality. We
also measured the valence (that is, positivity) of the text to assess
the unique contribution of emotionality. We selected the first 30
reviews for two reasons. First, using the first reviews written for a
movie helped avoid a situation where the success of the movie is
already known by reviewers, which can influence how individu-
als write about the movie41. Second, this approach helped ensure
that reviewers were expressing their own opinions as opposed
to echoing the consensus viewpoint of others. Prior work indi-
cates that early reviews can systematically bias subsequent post-
ing behaviour both in the real world and in well-controlled
laboratory experiments42,43. By using early reviews, we sought to
avoid these influences. Moreover, we used this same number of
reviews consistently in all applicable studies. These results were
also robust when using an alternative number of reviews (that is,
the first 40 reviews) and when using all possible reviews (see the
Supplementary Results for Study 1).
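The early-review selection described above amounts to taking each item's first 30 reviews by posting date. A minimal sketch, with hypothetical field names:

```python
# Sketch: select each item's earliest n reviews by posting date.
# The `date`/`text` field names are assumptions for illustration.
def first_n_reviews(reviews, n=30):
    return sorted(reviews, key=lambda r: r["date"])[:n]

reviews = [
    {"date": "2006-01-03", "text": "amazing"},
    {"date": "2005-12-30", "text": "enjoyable"},
    {"date": "2006-02-11", "text": "boring"},
]
early = first_n_reviews(reviews, n=2)
print([r["date"] for r in early])  # the two earliest reviews
```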
Across all studies, we used the Evaluative Lexicon (www.evaluativelexicon.com) to quantify the average valence and emotionality expressed25,26. Specifically, the Evaluative Lexicon measures the
valence and emotionality implied by the words that individuals
use (for example, ‘amazing’ or ‘enjoyable’). It has been directly vali-
dated as a measure of the valence and emotionality of individuals’
attitudes with both well-controlled laboratory experiments and
real-world naturalistic text25,26. While past work using the Evaluative
Lexicon has focused on the relationship between emotionality and
star ratings25,26, that work did not examine emotionality’s unique
relation with mass-scale behaviour, above and beyond emotional-
ity’s connection with star ratings. As overviewed earlier, while there
is a relationship between emotionality and individuals’ positivity,
these are separable constructs. Unless noted otherwise, all results
across studies utilize multiple regression with standardized coef-
ficients (B), log-transformed dependent variables and two-tailed
significance tests.
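As a rough illustration of this analytic recipe, the sketch below standardizes the predictors and the log-transformed outcome and fits ordinary least squares on synthetic data; the variable names and effect sizes are invented and do not reproduce the paper's estimates.

```python
import numpy as np

# Synthetic data (illustrative only): revenue rises with review emotionality.
rng = np.random.default_rng(0)
n = 500
star = rng.normal(7, 1, n)                               # star ratings
emot = rng.normal(4, 1, n)                               # review emotionality
revenue = np.exp(10 + 0.5 * emot + rng.normal(0, 1, n))  # box office revenue

def zscore(x):
    return (x - x.mean()) / x.std()

# Standardize the predictors and the log-transformed outcome, then fit OLS.
y = zscore(np.log(revenue))
X = np.column_stack([np.ones(n), zscore(star), zscore(emot)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta = [intercept (~0), standardized B for star rating,
#         standardized B for emotionality (positive here by construction)]
```

With everything standardized, the fitted coefficients are directly comparable in magnitude, which is what reporting standardized B buys in the studies above.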
As evidence for the large number of positive ratings on this plat-
form, 81% of movies were rated positively (that is, they received an
average star rating above the midpoint of 5 stars). Given that our
aim is to predict success and human behaviour within a sea of posi-
tive reviews, our analyses examined whether emotionality was pre-
dictive of box office revenue for movies that were judged as positive
(those rated above 5 stars on average). There were 2,383 movies.
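The selection step can be sketched as filtering to items whose average early rating falls above the scale midpoint; the titles and ratings below are invented for illustration.

```python
# Sketch: keep items rated above the midpoint of a 0-10 scale (invented data).
MIDPOINT = 5.0

ratings = {"Movie A": 8.1, "Movie B": 4.2, "Movie C": 6.7, "Movie D": 5.0}
positive = {m: r for m, r in ratings.items() if r > MIDPOINT}
share = len(positive) / len(ratings)
print(sorted(positive), share)  # positively rated movies and their share
```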
We first assessed whether the movie’s average star rating was predictive of its box office revenue. A movie’s star rating was predictive of a movie making less box office revenue (B = −0.08; t(2,381) = −3.24; P = 0.001; 95% confidence interval (CI), (−0.136, −0.033)). When all movies were included—even those with an initial negative rating—star ratings were not significantly predictive of box office revenue (B = 0.004; t(2,931) = 0.15; P = 0.88; 95% CI, (−0.043, 0.050)).
We then added the average emotionality of the reviews’ text
to this same model and the average text valence as a control. Star ratings continued to be a significant negative predictor of the movie’s box office revenue (B = −0.13; t(2,379) = −3.86; P < 0.001; 95% CI, (−0.193, −0.063); Fig. 1, left panel), and text valence was in the positive direction but ultimately non-significant (B = 0.06; t(2,379) = 1.78; P = 0.07; 95% CI, (−0.006, 0.124)). Of the greatest importance, beyond these effects, emotionality was a significant positive predictor of future box office revenue (B = 0.08; t(2,379) = 3.01; P = 0.003; 95% CI, (0.027, 0.130); Fig. 1, right panel).
These results hold when controlling for (1) movie genre, (2) the
year the movie was released, (3) the length of the movie, (4) the
budget of the movie and (5) the arousal implied by the text as mea-
sured by the word list in Warriner et al.18. Regarding the arousal
of the text, although arousal and emotionality are related, arousal
refers to energy level, whereas the emotionality of an attitude is the
extent to which that attitude is based on emotions or feelings25,44.
Emotionality can be high or low in arousal. For example, the adjec-
tives ‘exciting’ and ‘lovable’ imply similar levels of emotionality but
higher or lower levels of arousal, respectively. Research has shown
that emotionality and arousal are separable in online reviews25.
Emotionality is thus a measure of whether a movie was able to elicit
a feeling or emotional reaction (for example, a movie as ‘inspira-
tional’, ‘enchanting’ or ‘adorable’) rather than how ‘exciting’ that
movie was.
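To make the distinction concrete, one can score text on valence, emotionality and arousal as separate dimensions. The mini-lexicon below uses made-up values chosen only to echo the 'exciting' versus 'lovable' contrast; the actual Evaluative Lexicon provides validated norms.

```python
# Sketch: score text on three separable dimensions (all values hypothetical).
LEXICON = {
    #             valence, emotionality, arousal
    "exciting":   (7.8,    7.5,          8.0),
    "lovable":    (7.9,    7.6,          3.5),
    "impeccable": (8.0,    3.0,          4.0),
}

def dimension_means(text):
    """Per-dimension means over matched words (None if nothing matches)."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not hits:
        return None
    return tuple(sum(dim) / len(hits) for dim in zip(*hits))

v, e, a = dimension_means("an exciting and lovable movie")
print(e, a)  # similar emotionality for both words, but arousal diverges
```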
To summarize, whereas the effects of star rating were incon-
sistent across these models, emotionality was a consistent positive
predictor of box office revenue (Supplementary Table 2). Finally,
emotionality was also a significant predictor when not controlling
for any additional variables (B = 0.07; t(2,381) = 2.72; P = 0.007; 95%
CI, (0.020, 0.122); see the Supplementary Results for Study 1 for the
details of all robustness analyses).
Study 2. In Study 2, we generalized these results to a new domain.
Specifically, we predicted the success of all books on Amazon.com from 1995 to 2015 (20 years of data). We again used the first 30
reviews for each book to index the book’s star rating (1 to 5 stars),
text valence and text emotionality. The results that follow also hold
when using an alternative cut-off (that is, the first 40 reviews) and
when using all possible reviews (see the Supplementary Results for
Study 2). We measured the success of each book on the basis of the
number of verified purchases it accrued over time.
A full 91% of the books received a positive rating by falling above
the midpoint of the star rating scale (3 stars). There were 1.6 million
positively rated books.
The regression results with average star ratings were mixed.
Aggregated ratings were a negative predictor of the number of book purchases (B = −0.047; t(1,576,840) = −164.60; P < 0.001; 95% CI, (−0.047, −0.046)). When books rated as negative were
also included, positive star ratings were significantly predictive of
more purchases (B = 0.015; t(1,727,821) = 57.54; P < 0.001; 95% CI,
(0.015, 0.016)). However, the overall evidence here was mixed, as
star ratings were non-significant or negative predictors in 1/3 of
book genres (Supplementary Table 4).
Analysing positive books, we then predicted the book’s pur-
chases on the basis of that book’s average star rating and emotional-
ity. As in Study 1, we included text valence as a control. The average
star rating was a negative predictor of purchases (B = −0.057; t(1,576,838) = −189.25; P < 0.001; 95% CI, (−0.058, −0.057)), and the
valence of the text was a significant positive predictor (B = 0.024;
t(1,576,838) = 78.28; P < 0.001; 95% CI, (0.024, 0.025)). Beyond
these effects, greater emotionality of the first 30 reviews predicted
greater purchases (B = 0.017; t(1,576,838) = 56.47; P < 0.001; 95%
CI, (0.016, 0.017)). Moreover, greater emotionality was predictive of
more book purchases in 93% of genres.
We also conducted robustness analyses controlling for (1) book
genre, (2) the year the book was released and (3) the arousal implied
by the review text. All primary results replicated (Supplementary
Table 5). Finally, emotionality was also a significant predic-
tor when not controlling for any additional variables (B = 0.016;
t(1,576,840) = 54.87; P < 0.001; 95% CI, (0.015, 0.016); see the
Supplementary Results for Study 2 for the details of all robustness analyses).
Study 3. Study 3 examined whether the emotionality of real-time
tweets in response to television commercials predicted success and
human behaviour in the form of daily new followers of a brand.
For both the 2016 and 2017 Super Bowls, we obtained all real-time
tweets that occurred on the day of that Super Bowl that referenced
a commercial shown during the Super Bowl. There were 94 com-
mercials across 84 businesses and a total of 187,206 tweets about
these commercials. We then used the Evaluative Lexicon to quantify
the average valence and emotionality expressed towards each com-
mercial across the tweets.
For the ratings of each commercial, we used the results from USA
Today’s Ad Meter survey, which is the most popular set of Super
Bowl ratings45. The Ad Meter survey specified to respondents that
ratings between 1 and 3 indicate a ‘poor’ commercial, between 4 and
7 a ‘good’ commercial, and between 8 and 10 an ‘excellent’ commer-
cial. Though the final number of survey participants is not disclosed
by USA Today, they indicate the panel to be in the thousands46.
We predicted the average number of daily new followers each
company obtained on Facebook in the two weeks after the Super
Bowl. This number of new followers reflects the number of individ-
uals who became interested in learning more about a company and its
general offerings and took active steps to interact with that company.
Because each company has only a single Facebook page, we aggre-
gated the Twitter and ratings data at the level of each company by
averaging across that company’s commercials for each Super Bowl
(n = 84). Given that our analysis emphasized the change in new fol-
lowers that a company accrued after the Super Bowl, we controlled
for the average number of daily new followers each company gained
prior to the Super Bowl (see the Supplementary Methods for Study
3 for additional details).
The USA Today scale explicitly specifies ‘good’ commercials as
those above 3 on the scale. Thus, unlike the rating scales in Studies
1 and 2 where we counted a movie or book as positively rated if it
fell above the midpoint of the scale, using the midpoint of the USA
Today scale would not capture all of the positive commercials. We
therefore included commercials that earned a ‘good’ rating or higher
Fig. 1 | Predicting movie revenue. Scatter plots, best-fit lines and 95% CIs predicting each movie’s total US box office revenue (US dollars, log transformed) from Metacritic star ratings (left) and emotionality (right; possible range: 0 to 9). The scatter points are the raw data and thus not adjusted for covariates.
(that is, above 3). In fact, 100% of commercials were rated as ‘good’
or higher across both Super Bowls. Thus, we used all observations.
We again began with a regression model that included each com-
mercial’s average USA Today rating to predict the average daily new
Facebook followers that a company gained in the two weeks after
the Super Bowl. We additionally controlled for the average daily
new Facebook followers (log transformed) that the company gained
prior to the Super Bowl to assess change. The number of followers
that a company accrued before the Super Bowl predicted the fol-
lowers they accrued after the Super Bowl (B = 0.15; t(81) = 14.57;
P < 0.001; 95% CI, (0.131, 0.171)), but the USA Today rating was not predictive of followers (B = 0.01; t(81) = 1.39; P = 0.17; 95% CI, (−0.006, 0.033)).
We then added the average emotionality of the tweets for each
commercial as our primary predictor and the average valence as
a control. The average USA Today rating (B = 0.02; t(79) = 1.59; P = 0.12; 95% CI, (−0.004, 0.039)) and valence of the tweets were not predictive of the number of new followers (B = −0.02; t(79) = −1.49; P = 0.14; 95% CI, (−0.039, 0.005)). However, beyond these effects,
the greater the emotionality of the tweets about a commercial, the
more Facebook followers a company accrued over the next two
weeks (B = 0.02; t(79) = 2.38; P = 0.02; 95% CI, (0.004, 0.042)).
Past research has indicated that the relative number of positive
versus negative tweets can be predictive of different outcomes47,48.
We therefore also included this metric as a test of the robustness
of the effects. Conceptually replicating previous research, the
greater the number of positive (minus negative) tweets a commer-
cial received, the more followers the company gained (B = 0.03;
t(78) = 2.62; P = 0.01; 95% CI, (0.007, 0.051)). As before, the USA Today rating was not predictive (B = 0.01; t(78) = 0.53; P = 0.59; 95% CI, (−0.016, 0.028)), and the average valence of the tweets became a negative predictor of new followers (B = −0.02; t(78) = −2.08; P = 0.04; 95% CI, (−0.045, −0.001)). Beyond these effects, greater
emotionality once again predicted a greater number of new follow-
ers (B = 0.02; t(78) = 2.66; P = 0.009; 95% CI, (0.007, 0.043)).
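The positive-minus-negative metric is simply a count difference over classified tweets. A minimal sketch, assuming a neutral midpoint on a hypothetical 0 to 9 valence scale:

```python
# Sketch of the positive-minus-negative tweet metric for one commercial.
# The neutral threshold and valence values are assumptions for illustration.
NEUTRAL = 5.0

def net_positive(tweet_valences):
    """Count of positive tweets minus count of negative tweets."""
    pos = sum(1 for v in tweet_valences if v > NEUTRAL)
    neg = sum(1 for v in tweet_valences if v < NEUTRAL)
    return pos - neg

print(net_positive([8.2, 7.5, 6.1, 2.3]))  # 3 positive minus 1 negative
```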
In additional robustness analyses, we controlled for (1) the num-
ber of commercials a company showed, (2) the quarter in the game
when the commercial was advertised and (3) the arousal implied
by the tweets. All effects were similar (Supplementary Table 7).
Moreover, the effects were consistent across both Super Bowls.
Emotionality was also a significant predictor when controlling only
for the average daily new Facebook followers each company gained
prior to the Super Bowl (B = 0.02; t(81) = 2.45; P = 0.016; 95% CI,
(0.005, 0.042); see the Supplementary Results for Study 3 for the
details of all robustness analyses).
Study 4. In Study 4, we examined success and human behaviour
in the form of table reservations for restaurants on the basis of the
first 30 Yelp reviews for all restaurants that existed in Chicago,
Illinois, as of 2017. We used these reviews to index each restaurant’s
average star rating (1 to 5 stars), text valence and text emotionality.
The results also hold when using an alternative number of reviews
(that is, the first 40 reviews) and when using all possible reviews (see
the Supplementary Results for Study 4). We examined the average
daily table reservations across a two-month period on OpenTable.
com—the most popular online table reservation service in the
United States. Across this two-month period, there were 1.30 mil-
lion table reservations (see the Supplementary Methods for Study 4
for additional details).
On Yelp, restaurants are rated on a 5-point star rating scale. As
evidence for the large number of positive reviews, 92% of restau-
rants received an average star rating that was above the midpoint of
3 stars. We used the restaurants falling above this midpoint. There
were 1,052 restaurants.
Unlike prior studies, the average star rating was predictive of
more table reservations (B = 0.05; t(1,050) = 3.06; P = 0.002; 95% CI,
(0.019, 0.085)). This outcome was the same when including even
negatively rated restaurants (B = 0.08; t(1,137) = 4.97; P < 0.001;
95% CI, (0.049, 0.112); see the Supplementary Results for Study 4).
This positive predictive effect of star ratings allows us to examine
whether emotionality continues to be a unique predictor even when
ratings are initially in the positive direction.
We then added the average emotionality of the restaurant’s first
30 reviews as well as the average valence to the model. The average star rating fell to non-significance (B = −0.03; t(1,048) = −0.97; P = 0.33; 95% CI, (−0.089, 0.030); Fig. 2, left panel), and text valence
was a positive predictor (B = 0.08; t(1,048) = 2.76; P = 0.006; 95% CI,
(0.024, 0.143)). Beyond these effects, restaurants that elicited more
emotion were associated with more table reservations (B = 0.06;
t(1,048) = 3.39; P < 0.001; 95% CI, (0.025, 0.092); Fig. 2, right panel).
We conducted additional analyses to assess the robustness of our
findings. Specifically, we controlled for (1) how well-established the
restaurant is as indexed by the relative number of years the restau-
rant has been open, (2) the neighbourhood where the restaurant is
located, (3) the cuisine of the restaurant (for example, American,
Indian or seafood), (4) the average price of a meal at the restaurant
and (5) the arousal of the text. Again, an individual can use words
that convey an emotional attitude (for example, describing a res-
taurant and its food as ‘enjoyable’, ‘comforting’ or ‘alluring’), inde-
pendent of whether it fosters high or low arousal in that individual.
We found that, across these analyses, emotionality was a significant
Fig. 2 | Predicting restaurant table reservations. Scatter plots, best-fit lines and 95% CIs predicting each restaurant’s table reservations from Yelp star ratings (left) and emotionality (right). The scatter points are the raw data and thus not adjusted for covariates.
predictor, whereas the star rating was not (Supplementary Table 9).
Finally, emotionality was again a significant predictor when not
controlling for additional variables (B = 0.07; t(1,050) = 4.12;
P < 0.001; 95% CI, (0.036, 0.102); see the Supplementary Results for
Study 4 for the details of all robustness analyses).
Across four large-scale studies, we demonstrate that anywhere from
80% to 100% of ratings were positive. The challenge of discerning
success and how people will behave in this sea of positive ratings is
what we term the positivity problem.
Reflecting this problem, the current research indicates that mov-
ies, books, commercials and restaurants that receive similar ratings
often do not have similar levels of success. Throughout our studies,
online ratings tended to provide an unreliable signal of behaviour
towards, and thus success of, a large range of items. As one solu-
tion to this problem, we examined whether emotionality assessed
on a massive scale using computational linguistics provided a more
diagnostic signal. We found that emotionality predicted behaviour
across diverse items and several distinct sources—from Metacritic,
Amazon, Twitter, Yelp, Facebook and OpenTable.
This work has implications for work on online ratings and
discerning the aggregated wisdom from these ratings. In line
with past research, the current work further calls into question
the utility of star ratings for assessing and understanding human
behaviour and ultimately success. Research has indicated that the
predictive ability of star ratings is at best variable10–12 and at worst absent or even negative with respect to behaviour and success14.
In the current work, we demonstrate similar outcomes: increas-
ingly positive ratings were commonly non-diagnostic of success.
Moreover, we demonstrate these outcomes across a wide range
of items and online platforms. As we show, one solution to this
problem is for people and organizations to pay greater attention
to the emotionality of individuals’ attitudes. One possibility is
that organizations could consider aggregating reviewers’ language
and providing an ‘emotional star rating’ to provide more mean-
ingful assessments to individuals. Future research could explore
whether star ratings can be fruitfully replaced with other, more
predictive metrics.
The aim of this research is to demonstrate the positivity problem
and the predictive ability of emotionality as one solution. As such,
one limitation to the current work is that we did not identify the
mechanism behind emotionality’s predictive ability. This research
thus provides a springboard for future work where researchers can
delve further into illuminating the paths through which emotional-
ity is able to predict human behaviour. As noted earlier, attitudes
based more on emotion tend to be stronger and more consistent
across contexts and time27,29,30,49. One reason for these outcomes is
that these attitudes tend to be stored more strongly in memory28.
Stronger links in memory predict what individuals think about and
what captures their attention in their environment, thereby pro-
viding a general guide for behaviour17,35,50. Thus, when individuals
consider which restaurant to frequent, website to visit or movie to
see again, attitudes based more on emotion are less likely to have
changed, more likely to come to mind and consequently more likely
to guide behaviour.
Additional work could explore whether attitudes based more on
emotion also affect success by increasing individuals’ propensity
to spread information via word of mouth. This may happen either
spontaneously or when individuals are directly asked for recom-
mendations. In the former case, attitudes based on emotion may
come to mind with relatively little prodding and lead individuals to
spontaneously think of and talk to others about an item. In the latter
case, when asked for a recommendation, individuals may think of
and recommend an emotion-evoking item first, given its stronger
link in memory. In line with this possibility, prior research indicates
that emotion-evoking news articles are generally more likely to be
shared with others51. Future research could explore this potential
implication of attitudes based on emotion.
We show that emotionality offers one means to solve the positivity
problem, but a second limitation of this work is that we did not
seek to maximize predictive accuracy, and other solutions are
possible. For example, one
approach would be to use machine learning to predict success in an
effort to maximize accuracy. However, the present approach bene-
fits from offering a theory-based solution to the positivity problem.
Indeed, machine learning is powerful in its predictive ability but
often does not provide a clear understanding of the underlying con-
structs that help provide this accuracy52. We show that emotionality,
considered of great importance across the behavioural sciences, is
predictive. In doing so, we also provide a conceptual advance to the
study of emotion itself. We show that mass-scale emotion can pre-
dict behaviour and marketplace success.
Whereas most past work on sentiment analysis has focused
on valence, the current work builds on theorizing and empiri-
cal findings in the attitudes and affective science literatures to
put forth emotionality as a unique diagnostic signal. Though the
words ‘enjoyable’ and ‘impeccable’ indicate similar levels of posi-
tivity (valence), they signal higher or lower levels of emotionality,
respectively. Through the current research, it is our hope to urge
researchers to assess factors outside of valence in the endeavour
to understand mass-scale sentiment and to use it to address issues
such as the positivity problem.
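To illustrate the distinction between valence and emotionality, the toy scorer below assigns each word separate valence and emotionality values, so that words like 'enjoyable' and 'impeccable' can share positivity while diverging in emotionality. All numeric values and the `score` helper are our invented illustrations, not the published Evaluative Lexicon norms.

```python
# Toy lexicon giving each word separate valence and emotionality values
# (both on a 0-9 scale here). All numbers are invented for illustration;
# the actual Evaluative Lexicon provides empirically normed values.
LEXICON = {
    "enjoyable":  (7.0, 6.1),   # positive AND emotional
    "wonderful":  (8.0, 7.4),
    "impeccable": (7.1, 2.9),   # similarly positive, far less emotional
    "flawless":   (8.1, 3.0),
}

def score(text):
    """Return (mean valence, mean emotionality) over matched words, or None."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not hits:
        return None
    valence = sum(v for v, _ in hits) / len(hits)
    emotionality = sum(e for _, e in hits) / len(hits)
    return valence, emotionality

v1, e1 = score("an enjoyable wonderful movie")   # emotional praise
v2, e2 = score("an impeccable flawless movie")   # cognitive praise
# v1 and v2 are nearly identical; e1 is far higher than e2.
```

The two reviews receive almost the same valence score, yet their emotionality scores differ sharply, which is precisely the signal star ratings collapse away.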
Methods
Study 1. We obtained all of the online user reviews for all movies from Metacritic.com
from 2005 to 2018 using Python v.2.7 (ref. 53) in consultation with the site
owners regarding the use of the data. We began with movies released in 2005
because this was the first year in which there was a meaningful number of user
reviews on the platform.
We used the first 30 reviews for each movie to measure the movie's star rating
(0 to 10 stars), text valence and text emotionality. We quantified text valence and
emotionality using the Evaluative Lexicon25. Some movies garnered fewer than 30
reviews, so we used the maximum number of reviews possible for these movies.
As a robustness analysis, we controlled for the number of initial reviews for each
movie, and the results replicate. The results also replicate when focusing only on
those movies that garnered at least 30 reviews.
We measured the success of movies using the box office revenue for each movie
(total United States box office revenue). See the Supplementary Results for Study 1
for more detail.
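A minimal sketch of the per-movie aggregation described above, under assumed field names ('stars', 'text') and an injected `text_scorer` standing in for the Evaluative Lexicon:

```python
# Sketch of the Study 1 per-movie aggregation: average the star rating,
# text valence and text emotionality over a movie's first 30 reviews
# (or all of them, if fewer). Field names and the injected text_scorer
# are our assumptions, not the paper's exact code.
def aggregate_movie(reviews, text_scorer, n_initial=30):
    """reviews: list of dicts like {'stars': 8, 'text': '...'} in posting order.
    text_scorer: callable mapping review text to (valence, emotionality)."""
    first = reviews[:n_initial]
    stars = sum(r["stars"] for r in first) / len(first)
    valences, emotionalities = zip(*(text_scorer(r["text"]) for r in first))
    return {
        "n_reviews": len(first),  # retained as a robustness control
        "stars": stars,
        "valence": sum(valences) / len(valences),
        "emotionality": sum(emotionalities) / len(emotionalities),
    }
```

Keeping `n_reviews` in the output mirrors the robustness analysis in which the number of initial reviews was controlled for.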
Study 2. We obtained all book reviews from Amazon from its beginning in
1995 until 2015 and used those books that had an identified genre. These reviews
are publicly available for download54,55. We used the first 30 reviews for each book
to measure the book’s star rating (1 to 5 stars), text valence and text emotionality.
We quantified text valence and emotionality using the Evaluative Lexicon.
We measured the success of each book by the number of verified purchases that
book had. See the Supplementary Results for Study 2 for more detail.
Study 3. We obtained all the tweets associated with Super Bowl commercials
from both the 2016 and 2017 Super Bowls using Python v.2.7 and in line with the
terms of use. We used tweets that occurred in real time on the day of each Super
Bowl, that mentioned the name of the company or an affiliated keyword, and that
referenced either the Super Bowl or a commercial. This helped ensure that the
tweets were about the target commercials (see the Supplementary Methods for
Study 3 for additional detail).
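The filtering rules above might be sketched as follows; the date, keyword lists and crude substring matching are our illustrative assumptions, not the study's exact implementation:

```python
import datetime

# Sketch of the Study 3 tweet filter: keep tweets from the day of the game
# that mention the brand (or an affiliated keyword) AND reference the
# Super Bowl or a commercial. Keyword lists are illustrative only.
GAME_DAY = datetime.date(2017, 2, 5)  # Super Bowl LI
EVENT_TERMS = ("super bowl", "superbowl", "commercial", "#sb51")

def keep_tweet(tweet, brand_terms):
    """tweet: dict with 'text' (str) and 'date' (datetime.date)."""
    text = tweet["text"].lower()
    return (
        tweet["date"] == GAME_DAY
        and any(b in text for b in brand_terms)
        and any(e in text for e in EVENT_TERMS)
    )
```

A tweet mentioning a brand alone, or posted on another day, is excluded; both the brand and an event reference must co-occur on game day.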
Given that Facebook did not provide easy access to long-term historical data
for companies’ Facebook pages, we began to collect the number of followers from
each company’s Facebook page in real time as soon as that company announced
it would be advertising during the Super Bowl. This was done manually and in
line with the terms of use. We used the Facebook page that corresponded to the
most salient brand or company advertised in each commercial. As the Super
Bowl is primarily viewed by those in the United States, we used the Facebook
page specifically affiliated with the United States (for example, mercedesbenzusa)
as opposed to its worldwide Facebook page (for example, mercedesbenz). We
obtained an average of 21.85 days of daily new followers for each company
before the 2016 Super Bowl (s.d. = 7.83) and 16.05 days for the 2017 Super Bowl
(s.d. = 10.73). Capturing these pre–Super Bowl data was imperative to assess the
change in the average number of followers for each company after the Super Bowl.
We then continued to extract the daily number of new followers for each
company for the two weeks after each Super Bowl. This average number of daily
new followers over these two weeks served as the dependent variable. See the
Supplementary Methods and Supplementary Results for Study 3 for more detail.
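The dependent variable described above reduces to a pre/post comparison of average daily new followers; a sketch, with variable names of our choosing:

```python
# Sketch of the Study 3 outcome: the shift in average daily new Facebook
# followers from the pre-Super Bowl window to the two weeks afterwards.
# Variable names are ours; the paper's exact computation may differ.
def follower_change(daily_new_pre, daily_new_post):
    """Each argument is a list of daily counts of new followers."""
    pre = sum(daily_new_pre) / len(daily_new_pre)
    post = sum(daily_new_post) / len(daily_new_post)
    return post - pre
```

This is why capturing the pre–Super Bowl window was imperative: without the baseline, the post-game average alone cannot isolate the change.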
Study 4. We obtained all reviews on Yelp for all restaurants in Chicago,
Illinois, using Python v.2.7 in consultation with the site owners regarding the use of
the data. To do so, we used an existing database of all zip codes in the United States
and used those zip codes in the state of Illinois that directly named Chicago as the
originating city (n = 91 zip codes; see the Supplementary Methods for Study 4). The
reviews began in 2004, when Yelp was first founded, and continued until September 2017.
To measure the success of and behaviour towards each restaurant, we obtained
the number of daily table reservations made at all Chicago restaurants that used the
table reservation platform OpenTable—the most popular online table
reservation platform in the United States56. We used Python v.2.7 and obtained
the data in line with the terms of use. Over a two-month period (14 July to 27
September 2017), we obtained the average number of daily table reservations
made at each restaurant. There was a total of 1.30 million table reservations
across the Chicago restaurants at this time. See the Supplementary Methods and
Supplementary Results for Study 4 for more detail.
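The Study 4 dependent variable, average daily table reservations per restaurant over the observation window, can be sketched as follows (the input layout is our assumption):

```python
# Sketch of the Study 4 dependent variable: each restaurant's average
# number of daily table reservations over the observation window.
# The input layout (restaurant name -> list of daily counts) is assumed.
def avg_daily_reservations(daily_counts_by_restaurant):
    return {
        name: sum(counts) / len(counts)
        for name, counts in daily_counts_by_restaurant.items()
        if counts  # skip restaurants with no observed days
    }
```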
Reporting Summary. Further information on research design is available in the
Nature Research Reporting Summary linked to this article.
Data availability
The data for Study 2 are available from Amazon (https://s3.amazonaws.com/
amazon-reviews-pds/readme.html). The data from Studies 1, 3 and 4 are publicly
hosted on Metacritic.com (Study 1), Twitter.com (Study 3), Facebook.com (Study 3),
Yelp.com (Study 4) and OpenTable.com (Study 4).
For purposes of verification and reproducibility, readers will be provided with
the code and anonymized aggregated data results upon request. Although the
data are publicly available, their use is governed by each site’s terms of use. Those
interested in the original data should contact the site administrators for permission.
Code availability
The code for these analyses is available from the authors upon request.
Received: 14 May 2019; Accepted: 10 March 2021;
Published: xx xx xxxx
References
1. Asch, S. E. Studies of independence and conformity: I. A minority of one
against a unanimous majority. Psychol. Monogr. Gen. Appl. 70, 1–70 (1956).
2. Sherif, M. A study of some social factors in perception. Arch. Psychol.
Columbia Univ. 187, 60 (1935).
3. Simonson, I. & Rosen, E. Absolute Value: What Really Influences Customers in
the Age of (Nearly) Perfect Information (HarperBusiness, 2014).
4. Smith, A. & Anderson, M. Online Shopping and E-Commerce (Pew Research
Center, 2016);
5. Hu, N., Zhang, J. & Pavlou, P. A. Overcoming the J-shaped distribution of
product reviews. Commun. ACM 52, 144–147 (2009).
6. Woolf, M. Playing with 80 million Amazon product review ratings using
Apache Spark. minimaxir
7. McAuley, J., Pandey, R. & Leskovec, J. Inferring networks of substitutable and
complementary products. in Proc. 21st ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2015).
8. Yelp Factsheet (Yelp, 2017);
9. Athey, S., Castillo, J. C. & Knoepfle, D. Service quality in the gig economy:
empirical evidence about driving quality at Uber. White Paper.
https://doi.org/10.2139/ssrn.3499781 (2019).
10. Babić Rosario, A., Sotgiu, F., De Valck, K. & Bijmolt, T. H. A. The effect of
electronic word of mouth on sales: a meta-analytic review of platform,
product, and metric factors. J. Mark. Res. 53, 297–318 (2015).
11. Floyd, K., Freling, R., Alhoqail, S., Cho, H. Y. & Freling, T. How
online product reviews affect retail sales: a meta-analysis. J. Retail. 90,
217–232 (2014).
12. You, Y., Vadakkepatt, G. G. & Joshi, A. M. A meta-analysis of electronic
word-of-mouth elasticity. J. Mark. 79, 19–39 (2015).
13. de Langhe, B., Fernbach, P. M. & Lichtenstein, D. R. Navigating by
the stars: investigating the actual and perceived validity of online user ratings.
J. Consum. Res. 42, 817–833 (2016).
14. Holbrook, M. B. & Addis, M. Taste versus the market: an extension of
research on the consumption of popular culture. J. Consum. Res. 34,
415–424 (2007).
15. Fowler, G. A. When 4.3 stars is average: the Internet's grade-inflation
problem; Netflix is going with simpler thumbs-up or thumbs-down reviews,
while online star ratings for many products have lost their meaning. Wall
Street Journal
he-internets-grade-inflation-problem-1491414200 (5 April 2017).
16. Pang, B., Lee, L. & Vaithyanathan, S. Thumbs up? Sentiment classification
using machine learning techniques. in Proc. ACL-02 Conference on Empirical
Methods in Natural Language Processing 10, 79–86 (Association for
Computational Linguistics, 2002).
17. Petty, R. E. & Krosnick, J. A. Attitude Strength: Antecedents and Consequences
(Psychology Press, 1995).
18. Warriner, A. B., Kuperman, V. & Brysbaert, M. Norms of valence, arousal,
and dominance for 13,915 English lemmas. Behav. Res. Methods 45,
1191–1207 (2013).
19. Wicker, A. W. Attitudes versus actions: the relationship of verbal and overt
behavioral responses to attitude objects. J. Soc. Issues 25, 41–78 (1969).
20. Visser, P. S., Bizer, G. Y. & Krosnick, J. A. in Advances in Experimental Social
Psychology Vol. 38 (ed. Zanna, M. P.) 1–61 (Academic Press, 2006).
21. Petty, R. E., Fabrigar, L. R. & Wegener, D. T. in Handbook of Affective Sciences
(eds Davidson, R. J. et al.) 752–772 (Oxford Univ. Press, 2003).
22. Zanna, M. P. & Rempel, J. K. in The Social Psychology of Knowledge (eds
Bar-Tal, D. & Kruglanski, A. W.) 315–334 (Cambridge Univ. Press, 1988).
23. Haddock, G., Zanna, M. P. & Esses, V. M. Assessing the structure of
prejudicial attitudes: the case of attitudes toward homosexuals. J. Pers. Soc.
Psychol. 65, 1105–1118 (1993).
24. Maio, G. R. & Esses, V. M. The need for affect: individual differences in the
motivation to approach or avoid emotions. J. Pers. 69, 583–614 (2001).
25. Rocklage, M. D., Rucker, D. D. & Nordgren, L. F. The Evaluative Lexicon 2.0:
the measurement of emotionality, extremity, and valence in language. Behav.
Res. Methods 50, 1327–1344 (2018).
26. Rocklage, M. D. & Fazio, R. H. The evaluative lexicon: adjective use as a
means of assessing and distinguishing attitude valence, extremity, and
emotionality. J. Exp. Soc. Psychol. 56, 214–227 (2015).
27. Lavine, H., Thomsen, C. J., Zanna, M. P. & Borgida, E. On the primacy of
affect in the determination of attitudes and behavior: the moderating role of
affective-cognitive ambivalence. J. Exp. Soc. Psychol. 34, 398–421 (1998).
28. Rocklage, M. D. & Fazio, R. H. Attitude accessibility as a function of
emotionality. Pers. Soc. Psychol. Bull. 44, 508–520 (2018).
29. Rocklage, M. D. & Fazio, R. H. On the dominance of attitude emotionality.
Pers. Soc. Psychol. Bull. 42, 259–270 (2016).
30. Rocklage, M. D. & Luttrell, A. Attitudes based on feelings: fixed or fleeting?
Psychol. Sci. (2021).
31. Tooby, J. & Cosmides, L. The past explains the present. Ethol. Sociobiol. 11,
375–424 (1990).
32. Ekman, P. E. & Davidson, R. J. The Nature of Emotion: Fundamental
Questions (Oxford Univ. Press, 1994).
33. Fazio, R. H. in Attitude Strength: Antecedents and Consequences (eds Petty, R. E.
& Krosnick, J. A.) 247–282 (Lawrence Erlbaum Associates, 1995).
34. Schwarz, N. in Handbook of Theories of Social Psychology (eds Van Lange, P.
et al.) 289–308 (Sage, 2012).
35. Fazio, R. H. Attitudes as object–evaluation associations of varying strength.
Soc. Cogn. 25, 603–637 (2007).
36. Frijda, N. H. & Mesquita, B. in Emotion and Culture: Empirical Studies of
Mutual Inuence (eds Kitayama, S. & Markus, H. R.) 51–87 (American
Psychological Association, 1994).
37. Keltner, D. & Haidt, J. Social functions of emotions at four levels of analysis.
Cogn. Emot. 13, 505–521 (1999).
38. Rocklage, M. D., Rucker, D. D. & Nordgren, L. F. Persuasion, emotion, and
language: the intent to persuade transforms language via emotionality.
Psychol. Sci. 29, 749–760 (2018).
39. Van Kleef, G. A., De Dreu, C. K. W. & Manstead, A. S. R. The interpersonal
effects of anger and happiness in negotiations. J. Pers. Soc. Psychol. 86,
57–76 (2004).
40. Andrade, E. B. & Ho, T.-H. Gaming emotions in social interactions.
J. Consum. Res. 36, 539–552 (2009).
41. Lee, Y.-J., Hosanagar, K. & Tan, Y. Do I follow my friends or the
crowd? Information cascades in online movie ratings. Manage. Sci. 61,
2241–2258 (2015).
42. Schlosser, A. E. Posting versus lurking: communicating in a multiple audience
context. J. Consum. Res. 32, 260–265 (2005).
43. Moe, W. W. & Schweidel, D. A. Online product opinions: incidence,
evaluation, and evolution. Mark. Sci. 31, 372–386 (2012).
44. Russell, J. A. & Barrett, L. F. Core affect, prototypical emotional episodes, and
other things called emotion: dissecting the elephant. J. Pers. Soc. Psychol. 76,
805–819 (1999).
45. Ad Meter https://
30th-ad-150000342.html (2018).
46. Ad Meter 2017 FAQ (Ad Meter, 2017); http://admeter.usatoday.
47. Asur, S. & Huberman, B. A. Predicting the future with social media. in Proc.
2010 IEEE/ACM International Conference on Web Intelligence-Intelligent Agent
Technology (WI-IAT) 1, 492–499 (IEEE Computer Society, 2010).
48. O’Connor, B., Balasubramanyan, R., Routledge, B. & Smith, N. From tweets to
polls: linking text sentiment to public opinion time series. in Proc. 4th AAAI
Conference on Weblogs and Social Media 11, 122–129 (AAAI Press, 2010).
49. Pham, M. T., Cohen, J. B., Pracejus, J. W. & Hughes, G. D. Affect monitoring
and the primacy of feelings in judgment. J. Consum. Res. 28, 167–188 (2001).
50. Roskos-Ewoldsen, D. R. & Fazio, R. H. On the orienting value of attitudes:
attitude accessibility as a determinant of an object’s attraction of visual
attention. J. Pers. Soc. Psychol. 63, 198–211 (1992).
51. Berger, J. & Milkman, K. L. What makes online content viral? J. Mark. Res.
49, 192–205 (2012).
52. Castelvecchi, D. Can we open the black box of AI? Nature 538, 20–23 (2016).
53. Python Language Reference, version 2.7. (Python
Soware Foundation, 2017).
54. Amazon Customer Reviews Dataset (Amazon, 2020); https://s3.amazonaws.com/amazon-reviews-pds/readme.html
55. Ni, J., Li, J. & McAuley, J. Justifying recommendations using distantly-labeled
reviews and fine-grained aspects. In Proc. 2019 Conference on Empirical
Methods in Natural Language Processing and the 9th International Joint
Conference on Natural Language Processing (EMNLP-IJCNLP) 188–197
(Association for Computational Linguistics, 2019).
56. Filloon, W. In the battle for restaurant reservations, OpenTable is still way
ahead. Eater
reservations-app-danny-meyer (2018).
Acknowledgements
We received no specific funding for this work. We thank Internet Video Archive LLC for
their assistance in providing access to the movie data and metadata from Study 1.
Author contributions
M.D.R., D.D.R. and L.F.N. conceptualized the work. M.D.R. obtained and analysed
the data with collaboration from D.D.R. and L.F.N. M.D.R., D.D.R. and L.F.N. wrote
the manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary information The online version contains supplementary material
available at
Correspondence and requests for materials should be addressed to M.D.R.
Peer review information Nature Human Behaviour thanks Jonah Berger, Saif
Mohammad and the other, anonymous, reviewer(s) for their contribution to the peer
review of this work.
Reprints and permissions information is available at
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
© The Author(s), under exclusive licence to Springer Nature Limited 2021
nature research | reporting summary April 2020
Corresponding author(s): Matthew D. Rocklage
Last updated by author(s): Feb 17, 2021
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested
A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Software and code
Policy information about availability of computer code
Data collection Python 2.7
Data analysis R 3.5.1, SPSS 25
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability
The data for Study 2 are available from Amazon (https://s3.amazonaws.com/amazon-reviews-pds/readme.html). The data from Studies 1, 3, and 4 are publicly
hosted on Metacritic.com (Study 1), Twitter.com (Study 3), Facebook.com (Study 3), Yelp.com (Study 4), and OpenTable.com (Study 4).
For purposes of verification and reproducibility, readers will be provided with the code and anonymized aggregated data results upon request. Although the data
are publicly available, their use is governed by each site’s terms of use. Those interested in the original data should contact the site administrators for permission.
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see
Behavioural & social sciences study design
All studies must disclose on these points even when the disclosure is negative.
Study description Quantitative field data.
Research sample Online reviews (Metacritic, Amazon, Yelp) and tweets from Twitter. Each sample is representative of online postings from the
different online platforms. The online reviews are those from among the most popular online review websites for their category and
we obtained all possible tweets on Twitter.
Sampling strategy The sample size represents the available data for each domain.
Data collection Data were collected manually or by an automated script from each corresponding website.
Timing Data were collected from 2016 to 2020.
Data exclusions No data were excluded.
Non-participation No participants dropped out.
Randomization Participants were not allocated into experimental groups.
Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material,
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.
Materials & experimental systems
n/a Involved in the study
Eukaryotic cell lines
Palaeontology and archaeology
Animals and other organisms
Human research participants
Clinical data
Dual use research of concern
n/a Involved in the study
Flow cytometry
MRI-based neuroimaging
... This matters because emotionality in text predicts behavior in a variety of domains. Text with greater emotional content predicts voting behavior (Lavine, Thomsen, Zanna & Borgida 1998), and reviews with more emotionality predict restaurant reservations, movie box office gross, and book sales (Rocklage, Rucker, & Nordgren, 2021), over and above positive sentiment. ...
... When convincing someone to eat healthy food, people use less emotion. Prior research suggests that more emotionality in authors' persuasive messages increases readers' engagement across a variety of domains (Rocklage et al., 2021;Rocklage & Fazio, 2021). Analogously, to understand whether emotional descriptions increase readers' desire to eat the foods described, and in particular, healthy foods, we ran a follow-up study. ...
... Previous research explored the effect of emotionality as a predictor (finding it increases persuasion, e.g., Rocklage et al., 2021;Rocklage & Fazio, 2021). In contrast, this research treats emotionality as an outcome variable. ...
... First, it offers a simple but effective approach to eliciting helpful information from online reviews. While previous studies 1,4,7,21 demonstrated various approaches for eliciting helpful information, these approaches require sophisticated analyses of each review's content, while our study shows that based on simple analyses of reviews' degrees of deviation from the average opinion (i.e., the outlier degrees of reviews), people can effectively identify helpful reviews. Thus, our research provides a less costly way to harness the online wisdom-of-crowds. ...
... In recent decades, online reviews have become a valuable source of information for assisting people in making decisions on online platforms, such as whether to purchase a product 1 . Given the vast number of online reviews 2,3 , a prominent issue is how to effectively elicit helpful information for decision-making purposes [1][2][3][4][5][6] . ...
... In recent decades, online reviews have become a valuable source of information for assisting people in making decisions on online platforms, such as whether to purchase a product 1 . Given the vast number of online reviews 2,3 , a prominent issue is how to effectively elicit helpful information for decision-making purposes [1][2][3][4][5][6] . In this regard, many previous studies [7][8][9] have considered average aggregation to be a useful method. ...
Full-text available
Identifying helpful information from large-scale online reviews has become a core issue in studies on harnessing wisdom-of-crowds. We investigated whether online reviews expressing dissenting opinions (i.e., outlier reviews) can provide helpful information. Using statistical and simulation methods with a large-scale dataset, we found that, compared with other online reviews, outlier reviews were deemed more helpful because they provided more sufficient, neutral, and concise information. To interpret these results, we considered that in collective behaviours, a prevalent social psychological process—conformity (i.e., changing one’s behaviour in response to pressure from others)—pressured reviewers expressing dissenting opinions. This motivated them to provide more convincing evidence (i.e., sufficient, neutral, and concise information). This study offers a simple yet effective approach for eliciting helpful information from many online reviews and deepens the understanding of the mechanism underlying collective online behaviour. Specifically, conformity was considered to cause biases in the collective behaviour of humans; however, this study revealed that conformity can elicit valuable outcomes in collective behaviour.
... Focusing on text-based communications, Berger and Milkman (2012) reported that consumers are more likely to share content that is highly arousing. Similarly, Rocklage, Rucker, and Nordgren (2021) find that emotional language in online reviews was diagnostic for market success of products. Berger and Packard (2018) found that cultural items that are atypical of their genre tend to be more popular, while Packard and Berger (2020) find that the use of second-person pronouns in cultural items results in more purchases. ...
... We face a similar quandary with regard to the text of the post. While prior research suggests that we want to use arousing and emotional language (e.g., Berger & Milkman, 2012;Rocklage et al., 2021) and second-person pronouns (Packard & Berger, 2020), what else should appear in the text? ...
... Early social scientific evidence argued that words offered rich accounts of peoples' thoughts and feelings (Allport, 1942;Berger, 2023;Freud, 1915;Weiner & Mehrabian, 1968), which could then be used to model their internal states. In marketing, analyses of language patterns, for example, have been instrumental to understand hedonic and utilitarian consumption (Kronrod et al., 2012a;Kronrod & Danziger, 2013), cultural products' virality and popularity (Berger & Milkman, 2012;Berger & Packard, 2018;Packard & Berger, 2020), loan defaults (Netzer et al., 2019), online review helpfulness (Lafreniere et al., 2022;, customer satisfaction Packard & Berger, 2021), market structure (Netzer et al., 2012), and consumer emotionality (Rocklage et al., 2021a(Rocklage et al., , 2021b. Similarly, in psychological sciences, such analyses have shed light on psychological experiences and disorders such as depression (Weintraub, 1981), and expanded to more recent evaluations such as personality (Yarkoni, 2010) and romantic relationships (Ireland et al., 2011;Seraj et al., 2021). ...
The academic study of grammatical voice (e.g., active and passive voice) has a long history in the social sciences. It has been examined in relation to psychological distance, attribution, credibility, and deception. Most evaluations of passive voice are experimental or small‐scale field studies, however, and perhaps one reason for its lack of adoption is the difficulty associated with obtaining valid, reliable, and replicable results through automated means. We introduce an automated tool to identify passive voice from large‐scale text data, PassivePy, a Python package (readymade website: ). This package achieves 98% agreement with human‐coded data for grammatical voice as revealed in two large validation studies. In this paper, we discuss how PassivePy works, and present preliminary empirical evidence of how passive voice connects to various behavioral outcomes across three contexts relevant to consumer psychology: product complaints, online reviews, and charitable giving. Future research can build on this work and further explore the potential relevance of passive voice to consumer psychology and beyond.
... Fiction offers us vicarious emotional experiences through empathic identification with one or more fictional characters negotiating a series of emotionally salient events. There is evidence to suggest that what readers most want from fiction is for it to elicit strong emotions, such as the fact that the emotionality of book reviews is a better predictor of sales than the number of stars a book receives (Rocklage et al., 2021). The genre of fiction in which King made his name, horror, is defined by the emotion it is supposed to elicit in audiences: fear. ...
Full-text available
Many successful novelists offer writing advice, but do they actually follow it themselves? And if so, can it truly account for the success of their novels? We dissect and examine three pieces of writing advice from Stephen King's book On Writing (2000). King counsels writers to (1) write in a simple language to aid readers' narrative immersion; (2) avoid -ly adverbs, especially in dialogue attribution; and (3) avoid the passive voice. We examine these three pieces of advice both theoretically, reviewing them in light of what we know about how literature affects readers from such fields as literary linguistics and evolutionary literary studies, and empirically, using a computational linguistics approach to test whether King follows his own advice and whether it can explain his success as a novelist. We find that King's advice about simple language makes sense if an author's goal is to sell books while his advice against -ly adverbs makes sense if the goal is instead literary recognition. For his advice against using the passive voice, we find no substantial theoretical or empirical basis.
... Excepting all the quantitative ratings, the qualitative assessments, i.e., participants' mixed feelings, were also collected, given that the information contained in natural language texts may be able to predict human behaviour [76]. The lower-left corner of Fig. 2D shows an example of one participant's mixed feelings about the past stage: '过红绿灯 时停车较急促。' (The car stopped more quickly at traffic lights). ...
Full-text available
Autonomous cars are indispensable when humans go further down the hands-free route. Although existing literature highlights that the acceptance of the autonomous car will increase if it drives in a human-like manner, sparse research offers the naturalistic experience from a passenger's seat perspective to examine the humanness of current autonomous cars. The present study tested whether the AI driver could create a human-like ride experience for passengers based on 69 participants' feedback in a real-road scenario. We designed a ride experience-based version of the non-verbal Turing test for automated driving. Participants rode in autonomous cars (driven by either human or AI drivers) as a passenger and judged whether the driver was human or AI. The AI driver failed to pass our test because passengers detected the AI driver above chance. In contrast, when the human driver drove the car, the passengers' judgement was around chance. We further investigated how human passengers ascribe humanness in our test. Based on Lewin's field theory, we advanced a computational model combining signal detection theory with pre-trained language models to predict passengers' humanness rating behaviour. We employed affective transition between pre-study baseline emotions and corresponding post-stage emotions as the signal strength of our model. Results showed that the passengers' ascription of humanness would increase with the greater affective transition. Our study suggested an important role of affective transition in passengers' ascription of humanness, which might become a future direction for autonomous driving.
... Packard and Berger (2021) analyze how the concreteness of language shapes customer satisfaction. Rocklage, Rucker, and Nordgren (2021) predict marketplace success based on mass-scale emotionality. Tirunillai and Tellis (2012) relate user-generated content to abnormal returns on stock markets. ...
Consumers seek out online user-generated content to inform their purchase decisions because they perceive content created by other consumers as more believable than marketing communications. This research provides a theory of consumer digital trust in which consumer trust in user-generated content requires a digital environment that minimizes consumer suspicion of misrepresented or missing content. The theory is supported with empirical evidence from a hierarchical meta-analysis of 128 effects from 19 online platforms over 19 years (2004–2022). Account verification features, which alleviate suspicions of misrepresented content-creator identities, increase the effect of user-generated content on firm performance, but content-enhancing features, such as photo filters, that can prompt suspicion of misrepresented brand experiences weaken this link. Content-removal features, which can spark speculation about missing information in content creators' historical content, and platform moderation media, which raise questions about missing content in brand conversations, both weaken the influence of some user-generated content.
In the last two years, consumers have experienced massive changes in consumption – whether due to shifts in habits, the changing information landscape, challenges to their identity, or new economic experiences of scarcity or abundance. What can we expect from these experiences? How are the world's leading thinkers applying both foundational knowledge and novel insights as we seek to understand consumer psychology in a constantly changing landscape? And how can informed readers both contribute to and evaluate our knowledge? This handbook offers a critical overview of both fundamental topics in consumer psychology and those that are of prominence in the contemporary marketplace, beginning with an examination of individual psychology and broadening to topics related to wider cultural and marketplace systems. The Cambridge Handbook of Consumer Psychology, 2nd edition, will act as a valuable guide for teachers and graduate and undergraduate students in psychology, marketing, management, economics, sociology, and anthropology.
Researchers and practitioners want to create opinions that stick. Yet whereas some opinions stay fixed, others are as fleeting as the time it takes to report them. In seven longitudinal studies with more than 20,000 individuals, we found that attitudes based more on emotion are relatively fixed. Whether participants evaluated brand-new Christmas gifts or one of 40 brands, the more emotional their opinion, the less it changed over time, particularly if it was positive. In a word-of-mouth linguistic analysis of 75,000 real-world online reviews, we found that the more emotional consumers are in their first review, the more that attitude persists when they express it again even years later. Finally, more emotion-evoking persuasive messages create attitudes that decay less over time, further establishing emotion’s causal effect. These effects persist above and beyond other attitude-strength attributes. Interestingly, we also found that lay individuals generally fail to appreciate the relation between emotionality and attitude stability.
Persuasion is a foundational topic within psychology, in which researchers have long investigated effective versus ineffective means to change other people’s minds. Yet little is known about how individuals’ communications are shaped by the intent to persuade others. This research examined the possibility that people possess a learned association between emotion and persuasion that spontaneously shifts their language toward more emotional appeals, even when such appeals may be suboptimal. We used a novel quantitative linguistic approach in conjunction with controlled laboratory experiments and real-world data. This work revealed that the intent to persuade other people spontaneously increases the emotionality of individuals’ appeals via the words they use. Furthermore, in a preregistered experiment, the association between emotion and persuasion appeared sufficiently strong that people persisted in the use of more emotional appeals even when such appeals might backfire. Finally, direct evidence was provided for an association in memory between persuasion and emotionality.
Despite the centrality of both attitude accessibility and attitude basis to the last 30 years of theoretical and empirical work concerning attitudes, little work has systematically investigated their relation. The research that does exist provides conflicting results and is not at all conclusive given the methodology that has been used. The current research uses recent advances in statistical modeling and attitude measurement to provide the most systematic examination of the relation between attitude accessibility and basis to date. Specifically, we use mixed-effects modeling which accounts for variation across individuals and attitude objects in conjunction with the Evaluative Lexicon (EL)—a linguistic approach that allows for the simultaneous measurement of an attitude’s valence, extremity, and emotionality. We demonstrate across four studies, over 10,000 attitudes, and nearly 50 attitude objects that attitudes based on emotion tend to be more accessible in memory, particularly if the attitude is positive.
The rapid expansion of the Internet and the availability of vast repositories of natural text provide researchers with the immense opportunity to study human reactions, opinions, and behavior on a massive scale. To help researchers take advantage of this new frontier, the present work introduces and validates the Evaluative Lexicon 2.0 (EL 2.0)—a quantitative linguistic tool that specializes in the measurement of the emotionality of individuals’ evaluations in text. Specifically, the EL 2.0 utilizes natural language to measure the emotionality, extremity, and valence of evaluative reactions and attitudes. The present article describes how we used a combination of 9 million real-world online reviews and over 1,500 participant judges to construct the EL 2.0 and an additional 5.7 million reviews to validate it. To assess its unique value, the EL 2.0 is compared with two other prominent text analysis tools—LIWC and Warriner et al.’s (Behavior Research Methods, 45, 1191–1207, 2013) wordlist. The EL 2.0 is comparatively distinct in its ability to measure emotionality and explains a significantly greater proportion of the variance in individuals’ evaluations. The EL 2.0 can be used with any data that involve speech or writing and provides researchers with the opportunity to capture evaluative reactions both in the laboratory and “in the wild.” The EL 2.0 wordlist and normative emotionality, extremity, and valence ratings are freely available.
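To make the wordlist approach concrete, here is a minimal sketch of how a tool like the EL 2.0 scores a review: match each word against a normed lexicon and average the matched ratings. The wordlist and all numeric ratings below are invented placeholders for illustration, not the published EL 2.0 norms.

```python
# Minimal sketch of wordlist-based emotionality scoring.
# EL_SKETCH is a stand-in lexicon with invented ratings; the real
# EL 2.0 pairs each word with emotionality, extremity, and valence
# norms collected from human judges.
import re

EL_SKETCH = {
    "amazing":   {"emotionality": 8.0, "valence": 8.5},
    "wonderful": {"emotionality": 7.8, "valence": 8.3},
    "awful":     {"emotionality": 7.5, "valence": 1.2},
    "helpful":   {"emotionality": 3.1, "valence": 7.4},
    "reliable":  {"emotionality": 2.4, "valence": 7.1},
}

def score_review(text):
    """Average the emotionality and valence of matched evaluative words."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [EL_SKETCH[w] for w in words if w in EL_SKETCH]
    if not hits:
        return None  # no evaluative words matched
    n = len(hits)
    return {
        "emotionality": sum(h["emotionality"] for h in hits) / n,
        "valence": sum(h["valence"] for h in hits) / n,
        "matched": n,
    }

print(score_review("An amazing, wonderful movie with a reliable plot."))
```

Averaging over matched words is what lets the same review contribute separate emotionality and valence signals, which is the property the article above exploits.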
Many situations in our lives require us to make relatively quick decisions, such as whether to approach or avoid a person or object, buy or pass on a product, or accept or reject an offer. These decisions are particularly difficult when there are both positive and negative aspects to the object. How do people go about navigating this conflict to come to a summary judgment? Using the Evaluative Lexicon (EL), we demonstrate across three studies, 7,700 attitude expressions, and nearly 50 different attitude objects that when positivity and negativity conflict, the valence that is based more on emotion is more likely to dominate. Furthermore, individuals are also more consistent in the expression of their univalent summary judgments when they involve greater emotionality. In sum, valence that is based on emotion tends to dominate when resolving ambivalence and also helps individuals to remain consistent when offering quick judgments.
We connect measures of public opinion derived from polls with sentiment measured from text. We analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period and find that they correlate with sentiment word frequencies in contemporaneous Twitter messages. While our results vary across datasets, in several cases the correlations are as high as 80% and capture important large-scale trends. The results highlight the potential of text streams as a substitute for and supplement to traditional polling. Sentiment measured from text tracks consumer confidence and political opinion and can also predict future movements in the polls. We find that temporal smoothing is a critically important issue in supporting a successful model.
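The pipeline that abstract describes — count sentiment-word frequencies in a text stream, smooth the daily series, then correlate it with a poll series — can be sketched in a few pure-Python helpers. The wordlists and function names below are invented placeholders, not the authors' actual system.

```python
# Hedged sketch of a text-stream-to-polls pipeline.
# POSITIVE/NEGATIVE are toy wordlists standing in for a real lexicon.
POSITIVE = {"good", "great", "confident"}
NEGATIVE = {"bad", "worried", "fear"}

def daily_sentiment_ratio(messages):
    """Fraction of sentiment-bearing words in a day's messages that are positive."""
    pos = neg = 0
    for msg in messages:
        for w in msg.lower().split():
            if w in POSITIVE:
                pos += 1
            elif w in NEGATIVE:
                neg += 1
    total = pos + neg
    return pos / total if total else 0.5  # neutral default when no hits

def moving_average(series, window):
    """Trailing moving average; the abstract stresses temporal smoothing."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)
```

Given per-day message batches `days` and an aligned poll series `polls`, something like `pearson(moving_average([daily_sentiment_ratio(d) for d in days], 7), polls)` yields the kind of smoothed text–poll correlation the abstract reports.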
Artificial intelligence is everywhere. But before scientists trust it, they first need to understand how machines learn.
This research documents a substantial disconnect between the objective quality information that online user ratings actually convey and the extent to which consumers trust them as indicators of objective quality. Analyses of a dataset covering 1,272 products across 120 vertically-differentiated product categories reveal that average user ratings (1) lack convergence with Consumer Reports scores, the most commonly used measure of objective quality in the consumer behavior literature, (2) are often based on insufficient sample sizes which limits their informativeness, (3) do not predict resale prices in the used-product marketplace, and (4) are higher for more expensive products and premium brands, controlling for Consumer Reports scores. However, when forming quality inferences and purchase intentions, consumers heavily weight the average rating compared to other cues for quality like price and the number of ratings. They also fail to moderate their reliance on the average user rating as a function of sample size sufficiency. Consumers’ trust in the average user rating as a cue for objective quality appears to be based on an “illusion of validity.”