
Mass-scale emotionality reveals human behaviour and marketplace success

Authors: Matthew D. Rocklage, Derek D. Rucker and Loran F. Nordgren


Online reviews promise to provide people with immediate access to the wisdom of the crowds. Yet, half of all reviews on Amazon and Yelp provide the most positive rating possible, despite human behaviour being substantially more varied in nature. We term the challenge of discerning success within this sea of positive ratings the ‘positivity problem’. Positivity, however, is only one facet of individuals’ opinions. We propose that one solution to the positivity problem lies with the emotionality of people’s opinions. Using computational linguistics, we predict the box office revenue of nearly 2,400 movies, sales of 1.6 million books, new brand followers across two years of Super Bowl commercials, and real-world reservations at over 1,000 restaurants. Whereas star ratings are an unreliable predictor of success, emotionality from the very same reviews offers a consistent diagnostic signal. More emotional language was associated with more subsequent success. Rocklage et al. find a positivity problem: 80% or more of online ratings are positive and are unreliable predictors of success. As an alternative, mass-scale emotion predicts behaviour towards and success of movies, books, commercials and restaurants.
Articles
https://doi.org/10.1038/s41562-021-01098-5
1College of Management, University of Massachusetts, Boston, MA, USA. 2Kellogg School of Management, Northwestern University, Evanston, IL, USA.
e-mail: matthew.rocklage@umb.edu
People have always looked to and relied on the opinions of those around them to make decisions1,2. Now, the rise and proliferation of online crowd-sourced platforms, such as Yelp and Glassdoor, have fundamentally transformed the scope and speed with which people can harness others’ assessments. Given their scale, openness and availability, these platforms promise to help people find the best option3,4. Indeed, rather than rely on trial and error or small, informal networks, people have immediate access to the experience and wisdom of crowds. In the case of movies and restaurants, for instance, this aggregated wisdom should help quickly identify success: those items that have thrived and become popular. For most platforms, the primary means to identify successful goods is an aggregated ‘star rating’, a numeric rating that measures the extent to which people’s opinions are relatively positive versus negative.
Perhaps surprisingly, a striking limitation of these online rating systems has emerged: reviews are overwhelmingly positive5. On Amazon.com, for example, the average star rating is approximately 4.2 out of 5, with well over half of the reviews being 5-star ratings6,7. Nearly half of all Yelp reviews are 5-star ratings8, and recent research indicates that nearly 90% of Uber ratings may be 5 stars9. Plotted, most online ratings form a J-shaped distribution, with many 4- and 5-star ratings, a few 1-star ratings and few ratings in between5. This overwhelming positivity suggests that individuals are often confronted with choosing between numerous items with similar star ratings, especially given that people will not even consider options rated below 3 stars.
A principal problem with this degree of positivity is that the ratings themselves may ultimately be an unreliable indicator of an item’s success and of the human behaviour that underlies this success (for example, restaurant reservations). Specifically, two items might receive nearly identical ratings but vary vastly in their success. Indeed, past research has shown substantial variability in the link between the positivity of individuals’ ratings and success10–12. For example, the positivity of online ratings shows little association with the underlying quality of products and fails to predict their resale value13. Moreover, an analysis of over 400 movies revealed that greater positivity in online ratings was associated with fewer people attending a movie, as evidenced by lower box office revenue14. This problem has even led companies such as Netflix to abandon standard rating systems because of their poor performance15. Put simply, these ratings seem not to hold the wisdom that people believe they do.
Across disciplines, behavioural scientists are beginning to recognize the problematic nature of these ratings. Given this large degree of positivity, many items receive similarly positive ratings. Yet, when it comes to human behaviour, substantial differences exist: not all 5-star restaurants are equally popular. The high degree of positivity makes the ratings ineffective signals for discriminating which options are likely to be the best or most successful. We label this challenge of discerning success within the mass of positive ratings the ‘positivity problem’.
Although quantitative ratings are the most salient and accessible output of online reviews, most crowd-sourced platforms include a written portion where people provide qualitative assessments. As technology has improved, researchers have embraced computational social science techniques to quantify these qualitative assessments. Perhaps the most common method for analysing text in this way is sentiment analysis, which most often quantifies language in terms of its positivity16. Some words suggest greater favourability (for example, ‘liked’), whereas others suggest greater negativity (for example, ‘disliked’).
Computational social science has focused primarily on the positivity (also known as valence) of people’s attitudes17. Relatively few efforts have sought to quantify aspects of individuals’ attitudes beyond positivity18. Nevertheless, social psychologists have long acknowledged that positivity is not always a reliable predictor of behaviour17,19. To address the limitations of positivity, scholars have introduced and explored additional facets of an attitude that can improve its predictive ability17,20. One such facet is the emotionality of an attitude: the extent to which an attitude is based on individuals’ feelings or emotional reactions21–24. Positivity and emotionality are conceptually and empirically distinct. For example, the words ‘enjoyable’ and ‘impeccable’ imply very similar levels of positivity, but research indicates that ‘enjoyable’ is likely to indicate a more emotional attitude than ‘impeccable’25.
Matthew D. Rocklage 1 ✉ , Derek D. Rucker2 and Loran F. Nordgren2
NATURE HUMAN BEHAVIOUR | www.nature.com/nathumbehav
Moreover, the emotionality of individuals’ attitudes can now be captured via text analysis25,26.
Attitudes based more on emotion tend to be stronger and more predictive of behaviour. In the political domain, voters’ emotional reactions to a political candidate (compared with their more cognitive reactions) were better predictors of future voting behaviour27. Attitudes based more on emotion also tend to come to mind more quickly28, are more extreme25,26 and are more consistent across contexts29 and time30. One reason for this relationship is that emotions signal to individuals themselves that something especially impactful has occurred31,32, and they can thereby act as a particularly clear signal to individuals regarding their own attitude28,33,34. This strong signal, in turn, can lead attitudes to be held more strongly in memory28, which is an established predictor of the impact and durability of an attitude17,35.
Outward displays of emotion also signal the importance of one’s attitude to others. The social–functional approach to emotion holds that a primary function of emotion is to communicate the strength of one’s attitudes, desires and intentions36–38. For social animals, understanding others’ goals and intentions is vital for successful social coordination. Displays of joy and anger, for instance, provide others with strong signals regarding a person’s state of mind, goals and priorities. In the context of negotiation, expressions of happiness signal that one is open to concession, whereas displays of anger signal that one is unlikely to compromise39,40. These findings indicate that when people use emotion online, it is probably a signal that an experience was particularly impactful to them.
Taken together, research suggests that attitudes based on emotion are stronger and more predictive of one’s own behaviour, and that people use emotion to communicate the impact of their experience to others. The consequence is that emotionality in text may be more indicative of the success of a product or service. To illustrate, consider a restaurant. From an attitudinal perspective, a restaurant that elicits a positive, emotional, feelings-based reaction is likely to produce a more strongly held attitude in the individual. This stronger attitude could lead the restaurant to come to mind more frequently in the future and make the individual more likely to visit again. From a social–functional perspective, individuals’ emotional reactions may also signal to others just how impactful an experience was and thereby draw more attention to that restaurant from others. For both these reasons, more emotional language may be able to predict success where star ratings cannot.
In short, we argue that capturing the emotionality expressed in online reviews may offer one solution to the positivity problem. More specifically, we hypothesize that the emotionality of people’s online reviews can predict success, and the mass-scale human behaviour that underlies this success, where aggregated online ratings do not. In providing evidence of both the positivity problem and the relationship between emotionality and mass-scale human behaviour across multiple domains, we aim to accomplish two objectives. First, we demonstrate the breadth of the positivity problem. Second, we offer one solution to this problem using a theory-based approach. In doing so, this work also advances our understanding of emotionality (a construct considered of great importance across the social sciences32) by revealing that it can predict mass-scale behaviour and marketplace success.
Results
Study 1. In Study 1, we predicted human behaviour and success in the movie industry in the form of box office revenue earned in the United States. We obtained all online reviews for all movies from Metacritic.com from 2005 to 2018 (13 years of data) and used the first 30 user reviews written for each movie to measure the movie’s star rating (0 to 10 stars) and text emotionality. We also measured the valence (that is, positivity) of the text to assess the unique contribution of emotionality. We selected the first 30 reviews for two reasons. First, using the first reviews written for a movie helped avoid a situation where reviewers already know how successful the movie is, which can influence how they write about it41. Second, this approach helped ensure that reviewers were expressing their own opinions as opposed to echoing the consensus viewpoint of others. Prior work indicates that early reviews can systematically bias subsequent posting behaviour both in the real world and in well-controlled laboratory experiments42,43. By using early reviews, we sought to avoid these influences. Moreover, we used this same number of reviews consistently in all applicable studies. These results were also robust when using an alternative number of reviews (that is, the first 40 reviews) and when using all possible reviews (see the Supplementary Results for Study 1).
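The review-selection rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Review` record, its field names and the `first_n_reviews` helper are hypothetical, and each review is assumed to carry a posting date.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Review:
    # Hypothetical record layout; the field names are illustrative only.
    item_id: str
    posted_at: date
    stars: float
    text: str


def first_n_reviews(reviews, n=30):
    """Group reviews by item and keep the n earliest reviews for each item."""
    by_item = defaultdict(list)
    for review in reviews:
        by_item[review.item_id].append(review)
    # Sort each item's reviews chronologically, then truncate to the first n.
    return {
        item: sorted(item_reviews, key=lambda r: r.posted_at)[:n]
        for item, item_reviews in by_item.items()
    }


reviews = [
    Review("m1", date(2010, 1, 3), 8, "enjoyable"),
    Review("m1", date(2010, 1, 1), 9, "amazing"),
    Review("m2", date(2011, 5, 1), 6, "fine"),
]
earliest = first_n_reviews(reviews, n=1)  # earliest["m1"][0].text == "amazing"
```

The cut-off is applied within each item, so passing `n=40` corresponds to the alternative cut-off used in the robustness checks.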
Across all studies, we used the Evaluative Lexicon (www.evaluativelexicon.com) to quantify the average valence and emotionality expressed25,26. Specifically, the Evaluative Lexicon measures the valence and emotionality implied by the words that individuals use (for example, ‘amazing’ or ‘enjoyable’). It has been directly validated as a measure of the valence and emotionality of individuals’ attitudes with both well-controlled laboratory experiments and real-world naturalistic text25,26. While past work using the Evaluative Lexicon has focused on the relationship between emotionality and star ratings25,26, that work did not examine emotionality’s unique relation with mass-scale behaviour, above and beyond emotionality’s connection with star ratings. As noted earlier, although emotionality and individuals’ positivity are related, they are separable constructs. Unless noted otherwise, all results across studies use multiple regression with standardized coefficients (B), log-transformed dependent variables and two-tailed significance tests.
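Lexicon-based measurement can be illustrated with a minimal sketch. It assumes a simple match-and-average scheme. Each word found in the lexicon contributes a (valence, emotionality) pair, a review is scored by the mean over its matched words, and an item is scored by the mean over its reviews. The entries in `TOY_LEXICON` are invented placeholder values on an assumed 0 to 9 scale, chosen only to echo the ‘enjoyable’ versus ‘impeccable’ contrast described earlier. They are not the published Evaluative Lexicon norms, and the tokenizer is deliberately naive.

```python
import re
from statistics import mean

# Invented placeholder scores (valence, emotionality) on an assumed 0-9 scale.
# These are NOT the real Evaluative Lexicon values.
TOY_LEXICON = {
    "amazing": (8.0, 6.0),
    "enjoyable": (7.0, 5.5),
    "impeccable": (7.5, 2.5),
}


def score_text(text, lexicon=TOY_LEXICON):
    """Return (mean valence, mean emotionality) over matched words, or None."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return None  # no evaluative words matched
    return mean(v for v, _ in hits), mean(e for _, e in hits)


def score_item(review_texts, lexicon=TOY_LEXICON):
    """Average per-review scores across an item's reviews, skipping unscored ones."""
    scored = [s for s in (score_text(t, lexicon) for t in review_texts) if s is not None]
    if not scored:
        return None
    return mean(v for v, _ in scored), mean(e for _, e in scored)
```

Under these toy values, `score_text("An enjoyable, impeccable meal")` returns a mean valence of 7.25 and a mean emotionality of 4.0. Texts with no lexicon words return `None` and are skipped when averaging.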
As evidence of the large number of positive ratings on this platform, 81% of movies were rated positively (that is, they received an average star rating above the midpoint of 5 stars). Given that our aim is to predict success and human behaviour within a sea of positive reviews, our analyses examined whether emotionality predicted box office revenue for movies judged as positive (those rated above 5 stars on average). There were 2,383 such movies.
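The inclusion rule amounts to a strict threshold on each item's mean rating. A minimal sketch follows, assuming the per-item means are already computed. The hypothetical `midpoint` argument would be 5 for the 0 to 10 Metacritic scale and 3 for the 1 to 5 Amazon and Yelp scales.

```python
def split_positive(mean_ratings, midpoint):
    """Keep items whose mean rating is strictly above the scale midpoint.

    mean_ratings: dict mapping item id -> mean star rating.
    Returns (positive_items, share_of_items_that_are_positive).
    """
    if not mean_ratings:
        return {}, 0.0
    positive = {item: r for item, r in mean_ratings.items() if r > midpoint}
    return positive, len(positive) / len(mean_ratings)
```

The returned share is the kind of figure reported as the percentage of positively rated items. An item sitting exactly on the midpoint is excluded.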
We first assessed whether a movie’s average star rating predicted its box office revenue. A higher star rating predicted less box office revenue (B = −0.08; t(2,381) = −3.24; P = 0.001; 95% confidence interval (CI), (−0.136, −0.033)). When all movies were included, even those with an initial negative rating, star ratings were not significantly predictive of box office revenue (B = 0.004; t(2,931) = 0.15; P = 0.88; 95% CI, (−0.043, 0.050)).
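The analysis convention (multiple regression, standardized coefficients, log-transformed outcome) can be sketched in dependency-free Python. This is an illustrative sketch, not the authors' code. It assumes strictly positive outcomes and non-constant predictors, reports point estimates only (no standard errors or P values), and solves the normal equations directly. For real data, a statistics package would be the practical choice.

```python
import math
from statistics import mean, stdev


def zscore(values):
    """Standardize a list to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def standardized_ols(predictors, outcome, log_outcome=True):
    """Return standardized coefficients (B) for outcome regressed on predictors.

    predictors: dict mapping name -> list of values (one per item).
    outcome: list of strictly positive values, log-transformed if log_outcome.
    """
    y = [math.log(v) for v in outcome] if log_outcome else list(outcome)
    y = zscore(y)
    names = list(predictors)
    cols = [zscore(predictors[name]) for name in names]
    # Every variable is centred, so the intercept is zero and can be dropped.
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return dict(zip(names, solve(xtx, xty)))
```

Passing a dict such as `{"stars": ..., "valence": ..., "emotionality": ...}` with revenues as the outcome would mirror the structure of the Study 1 model; the keys are placeholders. Because every variable is z-scored, each B is expressed in standard-deviation units. This makes coefficients comparable across predictors measured on different scales.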
We then added the average emotionality of the reviews’ text to this same model, with the average text valence as a control. Star ratings continued to be a significant negative predictor of the movie’s box office revenue (B = −0.13; t(2,379) = −3.86; P < 0.001; 95% CI, (−0.193, −0.063); Fig. 1, left panel), and text valence was in the positive direction but ultimately non-significant (B = 0.06; t(2,379) = 1.78; P = 0.07; 95% CI, (−0.006, 0.124)). Most importantly, beyond these effects, emotionality was a significant positive predictor of future box office revenue (B = 0.08; t(2,379) = 3.01; P = 0.003; 95% CI, (0.027, 0.130); Fig. 1, right panel).
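Because a 95% CI is symmetric around B, the reported intervals can be used to sanity-check the coefficients and t statistics, including their signs. Below is a minimal sketch. It assumes the normal approximation to the t distribution, which is reasonable at 2,379 degrees of freedom, and uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist


def implied_stats(ci_low, ci_high, level=0.95):
    """Back out B, its standard error and t from a symmetric confidence interval.

    Uses the normal approximation to the t distribution, which is adequate
    when the degrees of freedom are large.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    b = (ci_low + ci_high) / 2
    se = (ci_high - ci_low) / (2 * z)
    return b, se, b / se
```

For the star-rating interval (−0.193, −0.063), it returns B = −0.128 and t ≈ −3.86, matching the reported values. For emotionality, (0.027, 0.130) gives t ≈ 2.99 against the reported 3.01, a gap consistent with rounding of the published bounds.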
These results hold when controlling for (1) movie genre, (2) the year the movie was released, (3) the length of the movie, (4) the budget of the movie and (5) the arousal implied by the text as measured by the word list in Warriner et al.18. Although arousal and emotionality are related, arousal refers to energy level, whereas the emotionality of an attitude is the extent to which that attitude is based on emotions or feelings25,44. Emotionality can be high or low in arousal. For example, the adjectives ‘exciting’ and ‘lovable’ imply similar levels of emotionality but higher and lower levels of arousal, respectively. Research has shown that emotionality and arousal are separable in online reviews25. Emotionality is thus a measure of whether a movie was able to elicit a feeling or emotional reaction (for example, describing a movie as ‘inspirational’, ‘enchanting’ or ‘adorable’) rather than how ‘exciting’ that movie was.
To summarize, whereas the effects of star rating were inconsistent across these models, emotionality was a consistent positive predictor of box office revenue (Supplementary Table 2). Finally, emotionality was also a significant predictor when not controlling for any additional variables (B = 0.07; t(2,381) = 2.72; P = 0.007; 95% CI, (0.020, 0.122); see the Supplementary Results for Study 1 for the details of all robustness analyses).
Study 2. In Study 2, we generalized these results to a new domain. Specifically, we predicted the success of all books on Amazon.com from 1995 to 2015 (20 years of data). We again used the first 30 reviews for each book to index the book’s star rating (1 to 5 stars), text valence and text emotionality. The results that follow also hold when using an alternative cut-off (that is, the first 40 reviews) and when using all possible reviews (see the Supplementary Results for Study 2). We measured each book’s success by the number of verified purchases it accrued over time.
A full 91% of the books received a positive rating, falling above the midpoint of the star rating scale (3 stars). There were 1.6 million positively rated books.
The regression results with average star ratings were mixed. Aggregated ratings were a negative predictor of the number of book purchases (B = −0.047; t(1,576,840) = −164.60; P < 0.001; 95% CI, (−0.047, −0.046)). When books rated as negative were also included, higher star ratings significantly predicted more purchases (B = 0.015; t(1,727,821) = 57.54; P < 0.001; 95% CI, (0.015, 0.016)). However, the overall evidence here was mixed, as star ratings were non-significant or negative predictors in one-third of book genres (Supplementary Table 4).
Analysing positively rated books, we then predicted each book’s purchases from its average star rating and emotionality. As in Study 1, we included text valence as a control. The average star rating was a negative predictor of purchases (B = −0.057; t(1,576,838) = −189.25; P < 0.001; 95% CI, (−0.058, −0.057)), and the valence of the text was a significant positive predictor (B = 0.024; t(1,576,838) = 78.28; P < 0.001; 95% CI, (0.024, 0.025)). Beyond these effects, greater emotionality in the first 30 reviews predicted more purchases (B = 0.017; t(1,576,838) = 56.47; P < 0.001; 95% CI, (0.016, 0.017)). Moreover, greater emotionality was predictive of more book purchases in 93% of genres.
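A genre-level summary of this kind can be computed by fitting a model within each genre and counting the share of positive emotionality coefficients. The sketch below is deliberately simplified. It assumes a bivariate slope of log purchases on emotionality within each genre and, unlike the reported analysis, omits the star-rating and valence controls. The input layout is hypothetical.

```python
from statistics import mean


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (needs two distinct xs)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def share_positive_slopes(groups):
    """groups: dict genre -> (emotionality values, log-purchase values).

    Returns the fraction of genres whose within-genre slope is positive.
    """
    positive = [slope(xs, ys) > 0 for xs, ys in groups.values()]
    return sum(positive) / len(positive)
```

A fuller version would replace `slope` with a multiple regression per genre and count only coefficients that are both positive and significant.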
We also conducted robustness analyses controlling for (1) book genre, (2) the year the book was released and (3) the arousal implied by the review text. All primary results replicated (Supplementary Table 5). Finally, emotionality was also a significant predictor when not controlling for any additional variables (B = 0.016; t(1,576,840) = 54.87; P < 0.001; 95% CI, (0.015, 0.016); see the Supplementary Results for Study 2 for the details of all robustness analyses).
Study 3. Study 3 examined whether the emotionality of real-time tweets in response to television commercials predicted success and human behaviour in the form of daily new followers of a brand. For both the 2016 and 2017 Super Bowls, we obtained all real-time tweets posted on the day of that Super Bowl that referenced a commercial shown during the game. There were 94 commercials across 84 businesses and a total of 187,206 tweets about these commercials. We then used the Evaluative Lexicon to quantify the average valence and emotionality expressed towards each commercial across the tweets.
For the ratings of each commercial, we used the results from USA Today’s Ad Meter survey, the most popular set of Super Bowl ratings45. The Ad Meter survey specified to respondents that ratings between 1 and 3 indicate a ‘poor’ commercial, between 4 and 7 a ‘good’ commercial, and between 8 and 10 an ‘excellent’ commercial. Although USA Today does not disclose the final number of survey participants, it indicates that the panel numbers in the thousands46.
We predicted the average number of daily new followers each company obtained on Facebook in the two weeks after the Super Bowl. This number reflects the individuals who became interested in learning more about the company and its general offerings and who took active steps to interact with it. Because each company has only a single Facebook page, we aggregated the Twitter and ratings data at the company level by averaging across each company’s commercials for each Super Bowl (n = 84). Because our analysis emphasized the change in new followers that a company accrued after the Super Bowl, we controlled for the average number of daily new followers each company gained before the Super Bowl (see the Supplementary Methods for Study 3 for additional details).
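The company-level aggregation can be sketched as two rounds of averaging: first tweets within each commercial, then commercials within each company and Super Bowl. This is a minimal sketch assuming hypothetical `(company, year, commercial_id, score)` tuples, with commercial IDs unique across years. The paper does not describe its data layout.

```python
from collections import defaultdict
from statistics import mean


def company_level(tweet_scores):
    """Average tweet scores per commercial, then commercials per company-year.

    tweet_scores: iterable of (company, year, commercial_id, score) tuples.
    Returns dict mapping (company, year) -> mean of commercial-level means.
    """
    per_ad = defaultdict(list)
    ad_owner = {}
    for company, year, ad, score in tweet_scores:
        per_ad[ad].append(score)
        ad_owner[ad] = (company, year)
    per_company = defaultdict(list)
    for ad, scores in per_ad.items():
        per_company[ad_owner[ad]].append(mean(scores))
    return {key: mean(values) for key, values in per_company.items()}
```

Averaging in two stages weights each commercial equally, so a heavily tweeted advert does not dominate its company's score. Suppose one commercial's two tweets score 4 and 6, and another commercial's single tweet scores 8. The two-stage average gives the company 6.5, whereas pooling all three tweets would give 6.0.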
The USA Today scale explicitly defines ‘good’ commercials as those above 3 on the scale. Thus, unlike the rating scales in Studies 1 and 2, where we counted a movie or book as positively rated if it fell above the midpoint of the scale, using the midpoint of the USA Today scale would not capture all of the positive commercials. We therefore included commercials that earned a ‘good’ rating or higher (that is, above 3). In fact, 100% of commercials were rated as ‘good’ or higher across both Super Bowls, so we used all observations.

Fig. 1 | Predicting movie revenue. Scatter plots, best-fit lines and 95% CIs predicting each movie’s total US box office revenue (US dollars, log transformed) from Metacritic star ratings (left) and emotionality (right; possible range: 0 to 9). The scatter points are the raw data and thus not adjusted for covariates. [Figure not reproduced; both panels plot log(box office revenue) on the y axis.]
We again began with a regression model that included each commercial’s average USA Today rating to predict the average daily new Facebook followers that a company gained in the two weeks after the Super Bowl. We additionally controlled for the average daily new Facebook followers (log transformed) that the company gained before the Super Bowl to assess change. The number of followers that a company accrued before the Super Bowl predicted the followers it accrued after the Super Bowl (B = 0.15; t(81) = 14.57; P < 0.001; 95% CI, (0.131, 0.171)), but the USA Today rating was not predictive of followers (B = 0.01; t(81) = 1.39; P = 0.17; 95% CI, (−0.006, 0.033)).
We then added the average emotionality of the tweets for each commercial as our primary predictor and the average valence as a control. Neither the average USA Today rating (B = 0.02; t(79) = 1.59; P = 0.12; 95% CI, (−0.004, 0.039)) nor the valence of the tweets (B = −0.02; t(79) = −1.49; P = 0.14; 95% CI, (−0.039, 0.005)) predicted the number of new followers. However, beyond these effects, the greater the emotionality of the tweets about a commercial, the more Facebook followers a company accrued over the next two weeks (B = 0.02; t(79) = 2.38; P = 0.02; 95% CI, (0.004, 0.042)).
Past research has indicated that the relative number of positive versus negative tweets can predict different outcomes47,48. We therefore also included this metric as a test of the robustness of the effects. Conceptually replicating previous research, the greater the number of positive (minus negative) tweets a commercial received, the more followers the company gained (B = 0.03; t(78) = 2.62; P = 0.01; 95% CI, (0.007, 0.051)). As before, the USA Today rating was not predictive (B = 0.01; t(78) = 0.53; P = 0.59; 95% CI, (−0.016, 0.028)), and the average valence of the tweets became a negative predictor of new followers (B = −0.02; t(78) = −2.08; P = 0.04; 95% CI, (−0.045, −0.001)). Beyond these effects, greater emotionality once again predicted a greater number of new followers (B = 0.02; t(78) = 2.66; P = 0.009; 95% CI, (0.007, 0.043)).
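The positive-minus-negative metric needs a rule for calling a tweet positive or negative, which this section does not specify. The sketch below assumes, hypothetically, that a tweet counts as positive when its mean lexicon valence exceeds a neutral point of 4.5 (the middle of an assumed 0 to 9 scale). It counts as negative when it falls below that point, and it is ignored when it sits exactly at neutral or has no scored words (`None`).

```python
def net_positive(valences, neutral=4.5):
    """Count positive minus negative tweets from per-tweet mean valence scores.

    Tweets with no scored words (None) or exactly at the neutral point are ignored.
    The neutral point of 4.5 is an assumption, not a value from the paper.
    """
    scored = [v for v in valences if v is not None]
    positive = sum(1 for v in scored if v > neutral)
    negative = sum(1 for v in scored if v < neutral)
    return positive - negative
```

Different neutral points or a three-way classifier would change the counts, so the threshold is best treated as a tunable assumption.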
In additional robustness analyses, we controlled for (1) the number of commercials a company showed, (2) the quarter of the game in which the commercial aired and (3) the arousal implied by the tweets. All effects were similar (Supplementary Table 7). Moreover, the effects were consistent across both Super Bowls. Emotionality was also a significant predictor when controlling only for the average daily new Facebook followers each company gained before the Super Bowl (B = 0.02; t(81) = 2.45; P = 0.016; 95% CI, (0.005, 0.042); see the Supplementary Results for Study 3 for the details of all robustness analyses).
Study 4. In Study 4, we examined success and human behaviour in the form of table reservations for restaurants, using the first 30 Yelp.com reviews for all restaurants that existed in Chicago, Illinois, as of 2017. We used these reviews to index each restaurant’s average star rating (1 to 5 stars), text valence and text emotionality. The results also hold when using an alternative number of reviews (that is, the first 40 reviews) and when using all possible reviews (see the Supplementary Results for Study 4). We examined the average daily table reservations across a two-month period on OpenTable.com, the most popular online table reservation service in the United States. Across this two-month period, there were 1.30 million table reservations (see the Supplementary Methods for Study 4 for additional details).
On Yelp, restaurants are rated on a 5-point star rating scale. As evidence of the large number of positive reviews, 92% of restaurants received an average star rating above the midpoint of 3 stars. We used the 1,052 restaurants falling above this midpoint.
Unlike in the prior studies, the average star rating predicted more table reservations (B = 0.05; t(1,050) = 3.06; P = 0.002; 95% CI, (0.019, 0.085)). The outcome was the same when negatively rated restaurants were also included (B = 0.08; t(1,137) = 4.97; P < 0.001; 95% CI, (0.049, 0.112); see the Supplementary Results for Study 4). This positive predictive effect of star ratings allows us to examine whether emotionality remains a unique predictor even when ratings initially predict success in the positive direction.
We then added the average emotionality of the restaurant’s first 30 reviews, as well as their average valence, to the model. The average star rating fell to non-significance (B = −0.03; t(1,048) = −0.97; P = 0.33; 95% CI, (−0.089, 0.030); Fig. 2, left panel), and text valence was a positive predictor (B = 0.08; t(1,048) = 2.76; P = 0.006; 95% CI, (0.024, 0.143)). Beyond these effects, restaurants that elicited more emotion were associated with more table reservations (B = 0.06; t(1,048) = 3.39; P < 0.001; 95% CI, (0.025, 0.092); Fig. 2, right panel).
We conducted additional analyses to assess the robustness of our findings. Specifically, we controlled for (1) how well established the restaurant is, as indexed by the relative number of years it has been open, (2) the neighbourhood where the restaurant is located, (3) the cuisine of the restaurant (for example, American, Indian or seafood), (4) the average price of a meal at the restaurant and (5) the arousal of the text. Again, an individual can use words that convey an emotional attitude (for example, describing a restaurant and its food as ‘enjoyable’, ‘comforting’ or ‘alluring’), independent of whether the restaurant fosters high or low arousal in that individual. Across these analyses, emotionality was a significant predictor, whereas the star rating was not (Supplementary Table 9). Finally, emotionality was again a significant predictor when not controlling for additional variables (B = 0.07; t(1,050) = 4.12; P < 0.001; 95% CI, (0.036, 0.102); see the Supplementary Results for Study 4 for the details of all robustness analyses).

Fig. 2 | Predicting restaurant table reservations. Scatter plots, best-fit lines and 95% CIs predicting each restaurant’s table reservations from Yelp star ratings (left) and emotionality (right). The scatter points are the raw data and thus not adjusted for covariates. [Figure not reproduced; both panels plot log(average times booked each day) on the y axis.]
Discussion
Across four large-scale studies, we found that anywhere from 80% to 100% of ratings were positive. The challenge of discerning success, and how people will behave, within this sea of positive ratings is what we term the positivity problem.
Reflecting this problem, the current research indicates that movies, books, commercials and restaurants that receive similar ratings often do not have similar levels of success. Throughout our studies, online ratings tended to provide an unreliable signal of behaviour towards, and thus the success of, a large range of items. As one solution to this problem, we examined whether emotionality, assessed on a massive scale using computational linguistics, provided a more diagnostic signal. We found that emotionality predicted behaviour across diverse items and several distinct sources: Metacritic, Amazon, Twitter, Yelp, Facebook and OpenTable.
This work has implications for research on online ratings and on discerning the aggregated wisdom in these ratings. In line with past research, the current work further calls into question the utility of star ratings for assessing and understanding human behaviour and, ultimately, success. Research has indicated that the predictive ability of star ratings is at best variable10–12 and at worst absent or even negative14. In the current work, we demonstrate similar outcomes: increasingly positive ratings were commonly non-diagnostic of success. Moreover, we demonstrate these outcomes across a wide range of items and online platforms. As we show, one solution to this problem is for people and organizations to pay greater attention to the emotionality of individuals’ attitudes. For example, organizations could aggregate reviewers’ language and provide an ‘emotional star rating’ that gives individuals more meaningful assessments. Future research could explore whether star ratings can be fruitfully replaced with other, more predictive metrics.
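The ‘emotional star rating’ idea is left open here, so any implementation is speculative. One hypothetical version ranks items by mean emotionality and maps each item's percentile rank onto a 1 to 5 star display in half-star steps. The rank-based mapping is an assumption, chosen so that scores clustered in a narrow range still spread across the display. It is not a method proposed in the paper.

```python
from bisect import bisect_left


def emotional_star_ratings(scores, stars=5):
    """Map each item's mean emotionality onto 1..stars by percentile rank.

    scores: dict mapping item -> mean emotionality. Tied items share a rating.
    Ratings are rounded to the nearest half star.
    """
    ordered = sorted(scores.values())
    n = len(ordered)
    ratings = {}
    for item, score in scores.items():
        below = bisect_left(ordered, score)  # number of items with lower scores
        pct = below / (n - 1) if n > 1 else 0.5  # a lone item sits mid-scale
        ratings[item] = round((1 + (stars - 1) * pct) * 2) / 2
    return ratings
```

Because the rating is relative, it would need to be computed within a comparable set of items (for example, one cuisine or one genre). Whether such a display would help consumers is the open question raised above.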
The aim of this research is to demonstrate the positivity problem and the predictive ability of emotionality as one solution. As such, one limitation of the current work is that we did not identify the mechanism behind emotionality’s predictive ability. This research thus provides a springboard for future work that illuminates the paths through which emotionality predicts human behaviour. As noted earlier, attitudes based more on emotion tend to be stronger and more consistent across contexts and time27,29,30,49. One reason for these outcomes is that these attitudes tend to be stored more strongly in memory28. Stronger links in memory predict what individuals think about and what captures their attention in their environment, thereby providing a general guide for behaviour17,35,50. Thus, when individuals consider which restaurant to frequent, website to visit or movie to see again, attitudes based more on emotion are less likely to have changed, more likely to come to mind and consequently more likely to guide behaviour.
Additional work could explore whether attitudes based more on
emotion also affect success by increasing individuals’ propensity
to spread information via word of mouth. This may happen either
spontaneously or when individuals are directly asked for recommendations.
In the former case, attitudes based on emotion may
come to mind with relatively little prodding and lead individuals to
spontaneously think of and talk to others about an item. In the latter
case, when asked for a recommendation, individuals may think of
and recommend an emotion-evoking item first, given its stronger
link in memory. In line with this possibility, prior research indicates
that emotion-evoking news articles are generally more likely to be
shared with others51. Future research could explore this potential
implication of attitudes based on emotion.
We show that emotionality offers one means to solve the positivity
problem, but if maximizing predictive accuracy is one’s final
goal, a second limitation of this work is that we did not maximize
predictive ability, and other solutions are possible. For example, one
approach would be to use machine learning to predict success in an
effort to maximize accuracy. However, the present approach benefits
from offering a theory-based solution to the positivity problem.
Indeed, machine learning can be highly predictive but often does not
provide a clear understanding of the underlying constructs that drive
this accuracy52. We show that emotionality, considered of great
importance across the behavioural sciences, is predictive. In doing so,
we also provide a conceptual advance to the study of emotion itself:
we show that mass-scale emotion can predict behaviour and marketplace
success.
Whereas most past work on sentiment analysis has focused
on valence, the current work builds on theorizing and empirical
findings in the attitudes and affective science literatures to
put forth emotionality as a unique diagnostic signal. Though the
words ‘enjoyable’ and ‘impeccable’ indicate similar levels of positivity
(valence), they signal higher and lower levels of emotionality,
respectively. Through the current research, we hope to encourage
researchers to look beyond valence when seeking to understand
mass-scale sentiment and to use it to address issues such as the
positivity problem.
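To make the distinction between valence and emotionality concrete, the sketch below scores a text by averaging the ratings of the lexicon words it contains. The four-word lexicon and its numbers are invented for illustration and are not Evaluative Lexicon values. They only reproduce the ordering described above, with ‘enjoyable’ and ‘impeccable’ equally positive but differing in emotionality. The averaging rule is likewise an assumed simplification.

```python
import string

# Invented ratings: word -> (valence, emotionality). NOT real Evaluative Lexicon values.
TOY_LEXICON = {
    "enjoyable": (8.0, 6.5),
    "impeccable": (8.0, 3.0),
    "terrible": (1.0, 6.0),
    "flawed": (2.5, 2.5),
}


def score_text(text, lexicon=TOY_LEXICON):
    """Average valence and emotionality over the lexicon words found in text.

    Returns None when no lexicon word occurs.
    """
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        return None
    valence = sum(v for v, _ in hits) / len(hits)
    emotionality = sum(e for _, e in hits) / len(hits)
    return valence, emotionality


print(score_text("An enjoyable film."))   # (8.0, 6.5)
print(score_text("Impeccable service."))  # (8.0, 3.0)
```

Both texts receive the same valence but different emotionality. A valence-only sentiment score would miss this kind of signal.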
Methods
Study 1. We obtained all of the online user reviews for all movies from Metacritic.com
from 2005 to 2018 using Python v.2.7 (ref. 53) in consultation with the site
owners regarding the use of the data. We began with movies released in 2005
because this was the first year in which there was a meaningful number of user
reviews on the platform.
We used the first 30 reviews for each movie to measure the movie’s star rating
(0 to 10 stars), text valence and text emotionality. We quantified text valence and
emotionality using the Evaluative Lexicon25. For movies that garnered fewer than 30
reviews, we used all available reviews.
As a robustness analysis, we controlled for the number of initial reviews for each
movie, and the results replicate. The results also replicate when focusing only on
those movies that garnered at least 30 reviews.
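The selection rule above can be sketched in Python: use the first 30 reviews per movie, or all reviews when fewer exist. The sketch assumes each review already carries a posting date, a star rating and lexicon-based valence and emotionality scores. The `Review` record and the function name are hypothetical. The function also returns the number of reviews used, which corresponds to the robustness covariate mentioned above.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Review:
    item_id: str
    posted: date
    stars: float         # e.g. 0 to 10 stars on Metacritic
    valence: float       # per-review text valence from a lexicon
    emotionality: float  # per-review text emotionality from a lexicon


def initial_review_summary(reviews, n_initial=30):
    """Average the first n_initial reviews of each item (all reviews if fewer exist)."""
    by_item = {}
    for review in reviews:
        by_item.setdefault(review.item_id, []).append(review)

    summary = {}
    for item_id, item_reviews in by_item.items():
        # Earliest reviews first; slicing keeps every review when fewer than n_initial exist.
        first = sorted(item_reviews, key=lambda r: r.posted)[:n_initial]
        summary[item_id] = {
            "n_reviews": len(first),  # usable as a robustness covariate
            "stars": mean(r.stars for r in first),
            "valence": mean(r.valence for r in first),
            "emotionality": mean(r.emotionality for r in first),
        }
    return summary
```

Sorting by posting date matters because review sites often display reviews in other orders, such as most helpful first.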
We measured the success of movies using the box office revenue for each movie
(total United States box office revenue). See the Supplementary Results for Study 1
for more detail.
Study 2. We obtained all book reviews from Amazon.com from its beginning in
1995 until 2015 and used those books that had an identified genre. These reviews
are publicly available for download54,55. We used the first 30 reviews for each book
to measure the book’s star rating (1 to 5 stars), text valence and text emotionality.
We quantified text valence and emotionality using the Evaluative Lexicon.
We measured the success of each book by the number of verified purchases that
book had. See the Supplementary Results for Study 2 for more detail.
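Counting verified purchases per book can be sketched as a simple tally. The sketch assumes tab-separated rows with a `product_id` column and a `verified_purchase` flag coded ‘Y’ or ‘N’, as in the documented layout of the Amazon Customer Reviews Dataset. Other releases of Amazon review data (for example, that of ref. 55) use different field names, so the column names are parameters. Whether the article counted exactly this flag is an assumption.

```python
import csv
import io
from collections import Counter


def count_verified_purchases(rows, id_field="product_id", flag_field="verified_purchase"):
    """Tally rows flagged as verified purchases, per product identifier."""
    counts = Counter()
    for row in rows:
        if row.get(flag_field, "").strip().upper() == "Y":
            counts[row[id_field]] += 1
    return counts


# Tiny in-memory stand-in for a review file (hypothetical rows).
SAMPLE_TSV = (
    "product_id\tverified_purchase\tstar_rating\n"
    "B001\tY\t5\n"
    "B001\tN\t4\n"
    "B001\tY\t5\n"
    "B002\tY\t3\n"
)

# Real review files may also need quoting=csv.QUOTE_NONE because review text contains quote characters.
reader = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
print(count_verified_purchases(reader))  # Counter({'B001': 2, 'B002': 1})
```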
Study 3. We obtained all the tweets associated with Super Bowl commercials
from both the 2016 and 2017 Super Bowls using Python v.2.7 and in line with the
terms of use. We used tweets that occurred in real time on the day of each Super
Bowl, that mentioned the name of the company or an affiliated keyword, and that
referenced either the Super Bowl or a commercial. This helped ensure that the
tweets were about the target commercials (see the Supplementary Methods for
Study 3 for additional detail).
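The tweet-inclusion rule described above combines three conditions: the tweet was posted on game day, mentions the brand, and references the Super Bowl or a commercial. A minimal Python sketch follows. The regular expressions, the brand keyword lists and the assumption that timestamps are already in local time are illustrative choices. The article's exact filters are described in its Supplementary Methods.

```python
import re
from datetime import date, datetime

# Illustrative pattern for references to the game or to an advert (an assumption).
EVENT_PATTERN = re.compile(r"super\s*bowl|\bcommercials?\b", re.IGNORECASE)


def mentions_any(text, terms):
    """True if any term appears in text as a whole word (case-insensitive)."""
    return any(re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) for term in terms)


def is_relevant_tweet(text, created_at, brand_terms, game_day):
    """Keep a tweet only if it was posted on game day, names the brand and mentions the game or an ad.

    created_at is assumed to be already converted to the relevant local time zone.
    """
    if created_at.date() != game_day:
        return False
    return mentions_any(text, brand_terms) and EVENT_PATTERN.search(text) is not None


GAME_DAY_2016 = date(2016, 2, 7)  # Super Bowl 50
print(is_relevant_tweet("Loved that Audi commercial!", datetime(2016, 2, 7, 20, 15), ["Audi"], GAME_DAY_2016))  # True
```

Whole-word matching keeps a brand such as ‘Audi’ from matching ‘audience’. As a consequence, handle variants (for example, ‘mercedesbenzusa’) must be listed as separate keywords.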
Given that Facebook did not provide easy access to long-term historical data
for companies’ Facebook pages, we began to collect the number of followers from
each company’s Facebook page in real time as soon as that company announced
it would be advertising during the Super Bowl. This was done manually and in
line with the terms of use. We used the Facebook page that corresponded to the
most salient brand or company advertised in each commercial. As the Super
Bowl is primarily viewed by those in the United States, we used the Facebook
page specifically affiliated with the United States (for example, mercedesbenzusa)
as opposed to its worldwide Facebook page (for example, mercedesbenz). We
obtained an average of 21.85 days of daily new followers for each company
before the 2016 Super Bowl (s.d. = 7.83) and 16.05 days for the 2017 Super Bowl
(s.d. = 10.73). Capturing these pre–Super Bowl data was imperative to assess the
change in the average number of followers for each company after the Super Bowl.
We then continued to extract the daily number of new followers for each
company for the two weeks after each Super Bowl. This average number of daily
new followers over these two weeks served as the dependent variable. See the
Supplementary Methods and Supplementary Results for Study 3 for more detail.
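The dependent variable can be sketched in Python. The sketch assumes the manually collected data are daily snapshots of each page's total follower count, so daily new followers are the day-to-day differences. The function names are hypothetical. Excluding the game-day gain from both windows is an assumption made here, not a detail stated in the article.

```python
from datetime import date, timedelta
from statistics import mean


def daily_new_followers(cumulative):
    """Day-over-day gains from a dict of date -> total followers recorded that day.

    Gains are computed only between consecutive calendar days, so gaps are skipped.
    """
    days = sorted(cumulative)
    return {
        day: cumulative[day] - cumulative[prev]
        for prev, day in zip(days, days[1:])
        if (day - prev).days == 1
    }


def pre_post_means(cumulative, game_day, post_days=14):
    """Mean daily new followers before game day and over the post_days days after it."""
    gains = daily_new_followers(cumulative)
    pre = [n for day, n in gains.items() if day < game_day]
    post = [n for day, n in gains.items() if game_day < day <= game_day + timedelta(days=post_days)]
    return mean(pre), mean(post)


game_day = date(2017, 2, 5)  # Super Bowl LI
# Hypothetical page: +100 followers per day before the game, +200 per day afterwards.
snapshots = {game_day + timedelta(days=k): 1000 + 100 * (k + 3) for k in range(-3, 1)}
snapshots.update({game_day + timedelta(days=k): 1300 + 200 * k for k in range(1, 15)})
print(pre_post_means(snapshots, game_day))  # (100, 200)
```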
Study 4. We obtained all reviews on Yelp.com for all restaurants in Chicago,
Illinois, using Python v.2.7 in consultation with the site owners regarding the use of
the data. To do so, we used an existing database of all zip codes in the United States
and used those zip codes in the state of Illinois that directly named Chicago as the
originating city (n = 91 zip codes; see the Supplementary Methods for Study 4). The
reviews began in 2004 when Yelp was first founded and continued until September
2017.
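Selecting Chicago zip codes from a national zip-code table amounts to a filter on state and named city. The sketch below uses hypothetical column names (`zip`, `state`, `primary_city`). Actual zip-code databases may label these fields differently. Reading the codes as strings preserves any leading zeros.

```python
import csv
import io


def chicago_zip_codes(rows, state="IL", city="chicago"):
    """Return sorted, de-duplicated zip codes in the given state whose named city matches city."""
    return sorted({
        row["zip"]
        for row in rows
        if row["state"] == state and row["primary_city"].strip().lower() == city
    })


SAMPLE = (
    "zip,state,primary_city\n"
    "60614,IL,Chicago\n"
    "60601,IL,Chicago\n"
    "60614,IL,Chicago\n"
    "60201,IL,Evanston\n"
    "46320,IN,Hammond\n"
)
print(chicago_zip_codes(csv.DictReader(io.StringIO(SAMPLE))))  # ['60601', '60614']
```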
To measure the success of and behaviour towards each restaurant, we obtained
the number of daily table reservations made at all Chicago restaurants that used the
table reservation platform from OpenTable.com—the most popular online table
reservation platform in the United States56. We used Python v.2.7 and obtained
the data in line with the terms of use. Over a two-month period (14 July to 27
September 2017), we obtained the average number of daily table reservations
made at each restaurant. In total, there were 1.30 million table reservations
across these Chicago restaurants during this period. See the Supplementary Methods and
Supplementary Results for Study 4 for more detail.
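The reservation outcome can be sketched as each restaurant's total reservations in the collection window divided by the number of days in that window. From 14 July to 27 September 2017 inclusive, the window spans 76 days. The record format is hypothetical. Treating days with no recorded reservations as zero is an assumption of this sketch.

```python
from collections import defaultdict
from datetime import date

WINDOW_START = date(2017, 7, 14)
WINDOW_END = date(2017, 9, 27)  # inclusive


def average_daily_reservations(records, start=WINDOW_START, end=WINDOW_END):
    """Average daily reservations per restaurant over [start, end].

    records: iterable of (restaurant_id, day, n_reservations) tuples.
    Days without a record count as zero reservations (an assumption).
    """
    n_days = (end - start).days + 1
    totals = defaultdict(int)
    for restaurant_id, day, n_reservations in records:
        if start <= day <= end:
            totals[restaurant_id] += n_reservations
    return {rid: total / n_days for rid, total in totals.items()}


records = [
    ("r1", date(2017, 7, 14), 76),
    ("r1", date(2017, 9, 27), 76),
    ("r1", date(2017, 10, 1), 500),  # outside the window, ignored
    ("r2", date(2017, 8, 1), 38),
]
print(average_daily_reservations(records))  # {'r1': 2.0, 'r2': 0.5}
```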
Reporting Summary. Further information on research design is available in the
Nature Research Reporting Summary linked to this article.
Data availability
The data for Study 2 are available from Amazon (https://s3.amazonaws.com/amazon-reviews-pds/readme.html).
The data from Studies 1, 3 and 4 are publicly hosted on www.metacritic.com (Study 1),
www.twitter.com (Study 3), www.facebook.com (Study 3), www.opentable.com (Study 4) and www.yelp.com (Study 4).
For purposes of verification and reproducibility, readers will be provided with
the code and anonymized aggregated data results upon request. Although the
data are publicly available, their use is governed by each site’s terms of use. Those
interested in the original data should contact the site administrators for permission.
Code availability
The code for these analyses is available from the authors upon request.
Received: 14 May 2019; Accepted: 10 March 2021;
Published: xx xx xxxx
References
1. Asch, S. E. Studies of independence and conformity: I. A minority of one
against a unanimous majority. Psychol. Monogr. Gen. Appl. 70, 1–70 (1956).
2. Sherif, M. A study of some social factors in perception. Arch. Psychol.
Columbia Univ. 187, 60 (1935).
3. Simonson, I. & Rosen, E. Absolute Value: What Really Inuences Customers in
the Age of (Nearly) Perfect Information (HarperBusiness, 2014).
4. Smith, A. & Anderson, M. Online Shopping and E-Commerce (Pew Research
Center, 2016); http://assets.pewresearch.org/wp-content/uploads/sites/14/2016/12/16113209/PI_2016.12.19_Online-Shopping_FINAL.pdf
5. Hu, N., Zhang, J. & Pavlou, P. A. Overcoming the J-shaped distribution of
product reviews. Commun. ACM 52, 144–147 (2009).
6. Woolf, M. Playing with 80 million Amazon product review ratings using
Apache Spark. minimaxir http://minimaxir.com/2017/01/amazon-spark/
(2017).
7. McAuley, J., Pandey, R. & Leskovec, J. Inferring networks of substitutable and
complementary products. in Proc. 21st ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2015).
8. Yelp Factsheet (Yelp, 2017); https://www.yelp.com/factsheet
9. Athey, S., Castillo, J. C. & Knoepe, D. Service quality in the gig economy:
empirical evidence about driving quality at Uber. White Paper. https://doi.
org/10.2139/ssrn.3499781 (2019).
10. Babić Rosario, A., Sotgiu, F., De Valck, K. & Bijmolt, T. H. A. e eect of
electronic word of mouth on sales: a meta-analytic review of platform,
product, and metric factors. J. Mark. Res. 53, 297–318 (2015).
11. Floyd, K., Freling, R., Alhoqail, S., Cho, H. Y. & Freling, T. How
online product reviews aect retail sales: a meta-analysis. J. Retail. 90,
217–232 (2014).
12. You, Y., Vadakkepatt, G. G. & Joshi, A. M. A meta-analysis of electronic
word-of-mouth elasticity. J. Mark. 79, 19–39 (2015).
13. de Langhe, B., Fernbach, P. M. & Lichtenstein, D. R. Navigating by
the stars: investigating the actual and perceived validity of online user ratings.
J. Consum. Res. 42, 817–833 (2016).
14. Holbrook, M. B. & Addis, M. Taste versus the market: an extension of
research on the consumption of popular culture. J. Consum. Res. 34,
415–424 (2007).
15. Fowler, G. A. When 4.3 stars is average: the Internet’s grade-ination
problem; Netix is going with simpler thumbs-up or thumbs-down reviews,
while online star ratings for many products have lost their meaning. Wall
Street Journal https://www.wsj.com/articles/when-4-3-stars-is-average-t
he-internets-grade-ination-problem-1491414200 (5 April, 2017).
16. Pang, B., Lee, L. & Vaithyanathan, S. umbs up? Sentiment classication
using machine learning techniques. in Proc. ACL-02 Conference on Empirical
Methods in Natural Language Processing 10, 79–86 (Association for
Computational Linguistics, 2002).
17. Petty, R. E. & Krosnick, J. A. Attitude Strength: Antecedents and Consequences
(Psychology Press, 1995).
18. Warriner, A. B., Kuperman, V. & Brysbaert, M. Norms of valence, arousal,
and dominance for 13,915 English lemmas. Behav. Res. Methods 45,
1191–1207 (2013).
19. Wicker, A. W. Attitudes versus actions: the relationship of verbal and overt
behavioral responses to attitude objects. J. Soc. Issues 25, 41–78 (1969).
20. Visser, P. S., Bizer, G. Y. & Krosnick, J. A. in Advances in Experimental Social
Psychology Vol. 38 (ed. Zanna, M. P.) 1–61 (Academic Press, 2006).
21. Petty, R. E., Fabrigar, L. R. & Wegener, D. T. in Handbook of Aective Sciences
(eds Davidson, R. J. et al.) 752–772 (Oxford Univ. Press, 2003).
22. Zanna, M. P. & Rempel, J. K. in e Social Psychology of Knowledge (eds
Bar-Tal, D. & Kruglanski, A. W.) 315–334 (Cambridge Univ. Press, 1988).
23. Haddock, G., Zanna, M. P. & Esses, V. M. Assessing the structure of
prejudicial attitudes: the case of attitudes toward homosexuals. J. Pers. Soc.
Psychol. 65, 1105–1118 (1993).
24. Maio, G. R. & Esses, V. M. e need for aect: individual dierences in the
motivation to approach or avoid emotions. J. Pers. 69, 583–614 (2001).
25. Rocklage, M. D., Rucker, D. D. & Nordgren, L. F. e Evaluative Lexicon 2.0:
the measurement of emotionality, extremity, and valence in language. Beha v.
Res. Methods 50, 1327–1344 (2018).
26. Rocklage, M. D. & Fazio, R. H. e evaluative lexicon: adjective use as a
means of assessing and distinguishing attitude valence, extremity, and
emotionality. J. Exp. Soc. Psychol. 56, 214–227 (2015).
27. Lavine, H., omsen, C. J., Zanna, M. P. & Borgida, E. On the primacy of
aect in the determination of attitudes and behavior: the moderating role of
aective-cognitive ambivalence. J. Exp. Soc. Psychol. 34, 398–421 (1998).
28. Rocklage, M. D. & Fazio, R. H. Attitude accessibility as a function of
emotionality. Pers. Soc. Psychol. Bull. 44, 508–520 (2018).
29. Rocklage, M. D. & Fazio, R. H. On the dominance of attitude emotionality.
Pers. Soc. Psychol. Bull. 42, 259–270 (2016).
30. Rocklage, M. D. & Luttrell, A. Attitudes based on feelings: xed or eeting?
Psychol. Sci. https://doi.org/10.1177/0956797620965532 (2021).
31. Tooby, J. & Cosmides, L. e past explains the present. Ethol. Sociobiol. 11,
375–424 (1990).
32. Ekman, P. E. & Davidson, R. J. e Nature of Emotion: Fundamental
Questions (Oxford Univ. Press, 1994).
33. Fazio, R. H. in Attitude Strength: Antecedents and Consequences (eds Petty, R. E.
& Krosnick, J. A.) 247–282 (Lawrence Erlbaum Associates, 1995).
34. Schwarz, N. in Handbook of eories of Social Psychology (eds Van Lange, P.
et al.) 289–308 (Sage, 2012).
35. Fazio, R. H. Attitudes as object–evaluation associations of varying strength.
Soc. Cogn. 25, 603–637 (2007).
36. Frijda, N. H. & Mesquita, B. in Emotion and Culture: Empirical Studies of
Mutual Inuence (eds Kitayama, S. & Markus, H. R.) 51–87 (American
Psychological Association, 1994).
37. Keltner, D. & Haidt, J. Social functions of emotions at four levels of analysis.
Cogn. Emot. 13, 505–521 (1999).
38. Rocklage, M. D., Rucker, D. D. & Nordgren, L. F. Persuasion, emotion, and
language: the intent to persuade transforms language via emotionality.
Psychol. Sci. 29, 749–760 (2018).
39. Van Kleef, G. A., De Dreu, C. K. W. & Manstead, A. S. R. e interpersonal
eects of anger and happiness in negotiations. J. Pers. Soc. Psychol. 86,
57–76 (2004).
40. Andrade, E. B. & Ho, T.-H. Gaming emotions in social interactions.
J. Consum. Res. 36, 539–552 (2009).
41. Lee, Y.-J., Hosanagar, K. & Tan, Y. Do I follow my friends or the
crowd? Information cascades in online movie ratings. Manage. Sci. 61,
2241–2258 (2015).
42. Schlosser, A. E. Posting versus lurking: communicating in a multiple audience
context. J. Consum. Res. 32, 260–265 (2005).
43. Moe, W. W. & Schweidel, D. A. Online product opinions: incidence,
evaluation, and evolution. Mark. Sci. 31, 372–386 (2012).
44. Russell, J. A. & Barrett, L. F. Core aect, prototypical emotional episodes, and
other things called emotion: dissecting the elephant. J. Pers. Soc. Psychol. 76,
805–819 (1999).
45. Ad Meter https://nance.yahoo.com/news/usa-today-commemorate-
30th-ad-150000342.html (2018).
46. Ad Meter 2017 FAQ (Ad Meter, 2017); http://admeter.usatoday.com/2017/01/17/ad-meter-2017-faq/
47. Asur, S. & Huberman, B. A. Predicting the future with social media. in Proc.
2010 IEEE/ACM International Conference on Web Intelligence-Intelligent Agent
Technology (WI-IAT) 1, 492–499 (IEEE Computer Society, 2010).
48. O’Connor, B., Balasubramanyan, R., Routledge, B. & Smith, N. From tweets to
polls: linking text sentiment to public opinion time series. in Proc. 4th AAAI
Conference on Weblogs and Social Media 11, 122–129 (AAAI Press, 2010).
49. Pham, M. T., Cohen, J. B., Pracejus, J. W. & Hughes, G. D. Aect monitoring
and the primacy of feelings in judgment. J. Consum. Res. 28, 167–188 (2001).
50. Roskos-Ewoldsen, D. R. & Fazio, R. H. On the orienting value of attitudes:
attitude accessibility as a determinant of an object’s attraction of visual
attention. J. Pers. Soc. Psychol. 63, 198–211 (1992).
51. Berger, J. & Milkman, K. L. What makes online content viral? J. Mark. Res.
49, 192–205 (2012).
52. Castelvecchi, D. Can we open the black box of AI? Nature 538, 20–23 (2016).
53. Python Language Reference, version 2.7. http://www.python.org (Python
Soware Foundation, 2017).
54. Amazon Customer Reviews Dataset (Amazon, 2020); https://s3.amazonaws.com/amazon-reviews-pds/readme.html
55. Ni, J., Li, J. & McAuley, J. Justifying recommendations using distantly-labeled
reviews and ne-grained aspects. In Proc. 2019 Conference on Empirical
Methods in Natural Language Processing and the 9th International Joint
Conference on Natural Language Processing (EMNLP-IJCNLP) 188–197
(Association for Computational Linguistics, 2019).
56. Filloon, W. In the battle for restaurant reservations, OpenTable is still way
ahead. Eater https://www.eater.com/2018/9/24/17883688/opentable-resy-online-reservations-app-danny-meyer (2018).
Acknowledgements
We received no specific funding for this work. We thank Internet Video Archive LLC for
their assistance in providing access to the movie data and metadata from Study 1.
Author contributions
M.D.R., D.D.R. and L.F.N. conceptualized the work. M.D.R. obtained and analysed
the data with collaboration from D.D.R. and L.F.N. M.D.R., D.D.R. and L.F.N. wrote
the manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary information The online version contains supplementary material
available at https://doi.org/10.1038/s41562-021-01098-5.
Correspondence and requests for materials should be addressed to M.D.R.
Peer review information Nature Human Behaviour thanks Jonah Berger, Saif
Mohammad and the other, anonymous, reviewer(s) for their contribution to the peer
review of this work.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
© The Author(s), under exclusive licence to Springer Nature Limited 2021
nature research | reporting summary April 2020
Corresponding author(s): Matthew D. Rocklage
Last updated by author(s): Feb 17, 2021
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested
A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Software and code
Policy information about availability of computer code
Data collection Python 2.7
Data analysis R 3.5.1, SPSS 25
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability
The data for Study 2 are available from Amazon (https://s3.amazonaws.com/amazon-reviews-pds/readme.html). The data from Studies 1, 3, and 4 are publicly
hosted on www.metacritic.com (Study 1), www.twitter.com (Study 3), www.facebook.com (Study 3), www.opentable.com (Study 4), and www.yelp.com (Study 4).
For purposes of verification and reproducibility, readers will be provided with the code and anonymized aggregated data results upon request. Although the data
are publicly available, their use is governed by each site’s terms of use. Those interested in the original data should contact the site administrators for permission.
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Behavioural & social sciences study design
All studies must disclose on these points even when the disclosure is negative.
Study description Quantitative field data.
Research sample Online reviews (Metacritic, Amazon, Yelp) and tweets from Twitter. Each sample is representative of online postings from the
different online platforms. The online reviews are those from among the most popular online review websites for their category and
we obtained all possible tweets on Twitter.
Sampling strategy The sample size represents the available data for each domain.
Data collection Data were collected manually or by an automated script from each corresponding website.
Timing Data were collected from 2016 to 2020.
Data exclusions No data were excluded.
Non-participation No participants dropped out.
Randomization Participants were not allocated into experimental groups.
Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material,
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.
Materials & experimental systems
n/a Involved in the study
Antibodies
Eukaryotic cell lines
Palaeontology and archaeology
Animals and other organisms
Human research participants
Clinical data
Dual use research of concern
Methods
n/a Involved in the study
ChIP-seq
Flow cytometry
MRI-based neuroimaging
... First, aesthetic scores are highly dependent on the voting procedure (i.e., voting scale, number of stimuli, questions and adjectives in the voting scale). Second, it has been shown that they might provide a variable or even negative impact on the prediction of human behavior and thus on the success of social content [30]. Third, aesthetic scores do not provide any interpretability of why an image is aesthetically pleasing or not. ...
... Third, the comments of our dataset are on average longer and more informative (basing on the score proposed in [8]) than those of the previous datasets. The degree of emotion and valence of critique comments is an excellent indicator of the success of contents on social media [30]. Therefore, together with the dataset, we present a new solution to rank images by exploiting the polarity of criticism as an indicator of aesthetic judgments. ...
... To the best of our knowledge, this is the first attempt to estimate the aesthetic quality of visuals directly from critiques rather than from human ratings. Our proposal (i) allows for weakly-supervised labeling, (ii) naturally connects two related aesthetic tasks, and (iii) provides a strong indicator of human judgment, generally more valuable than simple ratings [30]. ...
Preprint
Full-text available
Computational inference of aesthetics is an ill-defined task due to its subjective nature. Many datasets have been proposed to tackle the problem by providing pairs of images and aesthetic scores based on human ratings. However, humans are better at expressing their opinion, taste, and emotions by means of language rather than summarizing them in a single number. In fact, photo critiques provide much richer information as they reveal how and why users rate the aesthetics of visual stimuli. In this regard, we propose the Reddit Photo Critique Dataset (RPCD), which contains tuples of image and photo critiques. RPCD consists of 74K images and 220K comments and is collected from a Reddit community used by hobbyists and professional photographers to improve their photography skills by leveraging constructive community feedback. The proposed dataset differs from previous aesthetics datasets mainly in three aspects, namely (i) the large scale of the dataset and the extension of the comments criticizing different aspects of the image, (ii) it contains mostly UltraHD images, and (iii) it can easily be extended to new data as it is collected through an automatic pipeline. To the best of our knowledge, in this work, we propose the first attempt to estimate the aesthetic quality of visual stimuli from the critiques. To this end, we exploit the polarity of the sentiment of criticism as an indicator of aesthetic judgment. We demonstrate how sentiment polarity correlates positively with the aesthetic judgment available for two aesthetic assessment benchmarks. Finally, we experiment with several models by using the sentiment scores as a target for ranking images. Dataset and baselines are available (https://github.com/mediatechnologycenter/aestheval).
... Through this model, they demonstrated that emotions mediate the impact of perceived quality on behavioral intention (Ribeiro and Prayag, 2019). Recently, Rocklage et al. (2021) concluded that emotionality in text may be more indicative of the success of a product or service compared to average star rating. ...
... Researchers have found that in contrast to online ratings, potential consumers often refer to reviews from other customers who have already spent money making purchase decisions . Among other things, the emotion in the text may be more indicative of the success of the product or service (Rocklage et al., 2021). Moreover, in emotional psychology, there are different types of emotions, which are discrete (Izard and Carroll, 1977;Plutchik, 2000). ...
Article
Full-text available
Purpose Despite a significant focus on customer evaluation and sentiment analysis, limited attention has been paid to discrete emotional perspective in terms of the emotionality used in text. This paper aims to extend the general-sentiment dictionary in Chinese to a restaurant-domain-specific dictionary, visualize spatiotemporal sentiment trends, identify the main discrete emotions that affect customers’ ratings in a restaurant setting and identify constituents of influential emotions. Design/methodology/approach A total of 683,610 online restaurant reviews downloaded from Dianping.com were analyzed by a sentiment dictionary optimized by the authors; the main emotions (joy, love, trust, anger, sadness and surprise) that affect online ratings were explored by using multiple linear regression methods. After tracking these sentiment review texts, Latent Dirichlet Allocation (LDA) and LDA models with term frequency-inverse document frequency as weights were used to find the factors that constitute influential emotions. Findings The results show that it is viable to optimize or expand sentiment dictionary by word similarity. The findings highlight that love and anger have the highest effect on online ratings. The main factors that constitute consumers’ anger (local characteristics, incorrect food portions and unobtrusive location) and love (comfortable dining atmosphere, obvious local characteristics and complete supporting services) are identified. Different from previous studies, negativity bias is not observed, which poses a question of whether it has to do with Chinese culture. Practical implications These findings can help managers monitor the true quality of restaurant service in an area on time. 
Based on the results, restaurant operators can better decide which aspects they should pay more attention to; platforms can operate better and can have more manageable webpage settings; and consumers can easily capture the quality of restaurants to make better purchase decisions. Originality/value This study builds upon the existing general sentiment dictionary in Chinese and, to the best of the authors’ knowledge, is the first to provide a restaurant-domain-specific sentiment dictionary and use it for analysis. It also reveals the constituents of two prominent emotions (love and anger) in the case of restaurant reviews.
... Additionally, we should also pay attention to the emotional language in the text comments of the evaluation Frontiers in Physics | www.frontiersin.org February 2022 | Volume 10 | Article 839462 system, which can provide more meaningful information to individuals [46]. ...
Article
Full-text available
Characterizing the reputation of an evaluator is particularly significant for consumers to obtain useful information from online rating systems. Furthermore, overcoming the difficulties of spam attacks on a rating system and determining the reliability and reputation of evaluators are important topics in the research. We have noticed that most existing reputation evaluation methods rely only on using the evaluator’s rating information and abnormal behaviour to establish a reputation system, which disregards the systematic aspects of the rating systems, by including the structure of the evaluator-object bipartite network and nonlinear effects. In this study, we propose an improved reputation evaluation method by combining the structure of the evaluator-object bipartite network with rating information and introducing penalty and reward factors. The proposed method is empirically analyzed on a large-scale artificial data set and two real data sets. The results have shown that this method has better performance than the original correlation-based and IARR2 in the presence of spamming attacks. Our work contributes a new idea to build reputation evaluation models in sparse bipartite rating networks.
... Greater emotionality in a review, whether positive or negative, increases its perceived value and increases sales (Rocklage, Rucker, & Nordgren, 2019;Schindler & Bickart, 2012;Yazdani et al., 2018). ...
Article
Full-text available
Online word‐of‐mouth (WOM) can impact consumers’ product evaluations, purchase intentions, and choices—but when does it do so? How do those receiving WOM know whether to rely on a particular message? This article suggests that the multiple players involved in online WOM (receivers, senders, sellers, platforms, and other consumers) each have their own interests, which are often in conflict. Thus, receivers of WOM are faced with a judgment task in deciding what information to rely on: They must make inferences about the product in question and about the players who provide or present WOM. To do so, they use signals embedded in various components of WOM, such as average star ratings, message content, or sender characteristics. The product and player information provided by these signals shapes the impact of WOM by allowing receivers to make inferences about (a) their likelihood of product satisfaction, and (b) the trustworthiness of WOM players, and therefore the trustworthiness of their content. This article summarizes how each player changes the impact of online WOM, providing a lens for understanding the current literature in online WOM, offering insights for theory in this context, and opening up pathways for future research.
Article
Language is an integral part of marketing. Consumers share word of mouth, salespeople pitch services, and advertisements try to persuade. Further, small differences in wording can have a big impact. But while it is clear that language is both frequent and important, how can we extract insight from this new form of data? This paper provides an introduction to the main approaches to automated textual analysis and how researchers can use them to extract marketing insight. We provide a brief summary of dictionaries, topic modeling, and embeddings, some examples of how each approach can be used, and some advantages and limitations inherent to each method. Further, we outline how these approaches can be used both in empirical analysis of field data as well as experiments. Finally, an appendix provides links to relevant tools and readings to help interested readers learn more. By introducing more researchers to these valuable and accessible tools, we hope to encourage their adoption in a wide variety of areas of research.
Article
Perceived financial constraints are ubiquitous, and prior research suggests that consumers who feel financially constrained are especially likely to engage in compensatory consumption to signal positive attributes or offset the aversiveness associated with their state. However, it is unclear whether spending confers greater happiness when consumers feel financially constrained. Seven high-powered studies (N = 7,228) demonstrate that perceived financial constraints decrease the happiness consumers derive from their purchases. This effect is robust across several purchase types and occurs in part because consumers who perceive greater financial constraints are more likely to consider opportunity costs when evaluating their purchases (studies 2A-2B). Consistent with this mechanism, the effect attenuates when all consumers are prompted to consider opportunity costs (study 3) and when consumers consider planned purchases (study 4). The negative effect of perceived financial constraints on purchase happiness results in an important behavioral outcome: less favorable consumer reviews (studies 5A-5B). The authors conclude by meta-analyzing their file drawer (25,765 participants; 42 studies) to explore how the effect differs across several purchase types and discussing theoretical and practical implications for consumers and marketers.
Article
Consumers' morality has been shown to affect their marketplace behavior in many ways. We present a theoretical framework for how to conceive of and study marketplace morality in an attempt to unify these disparate findings. First, we describe two common conceptualizations of marketplace morality: (a) the attribute‐level approach (where a product attribute fits within a category that is normatively considered moral) and (b) the person‐level approach (where consumers differ in the extent to which they dispositionally value morality). We then introduce a third conceptualization: (c) the attitude‐level approach (where consumers differ in the extent to which they see their relevant attitude as based in their morality). Through this approach, we demonstrate morality's predictive utility for consumers' marketplace behaviors and help explain why other research may have found mixed evidence for its influence. Moreover, we use this approach to illuminate four contexts in which consumers' morality is more likely to influence marketplace attitudes and thereby impact their behavior: when the consumer's attitude is emotional, value‐relevant, identity‐relevant, and/or conceived in a negative valence. We conclude with a discussion of some of the unique challenges to attitude moralization in the marketplace as well as implications for managers promoting morally positioned purchases.
Article
Researchers and practitioners want to create opinions that stick. Yet whereas some opinions stay fixed, others are as fleeting as the time it takes to report them. In seven longitudinal studies with more than 20,000 individuals, we found that attitudes based more on emotion are relatively fixed. Whether participants evaluated brand-new Christmas gifts or one of 40 brands, the more emotional their opinion, the less it changed over time, particularly if it was positive. In a word-of-mouth linguistic analysis of 75,000 real-world online reviews, we found that the more emotional consumers are in their first review, the more that attitude persists when they express it again even years later. Finally, more emotion-evoking persuasive messages create attitudes that decay less over time, further establishing emotion’s causal effect. These effects persist above and beyond other attitude-strength attributes. Interestingly, we also found that lay individuals generally fail to appreciate the relation between emotionality and attitude stability.
Article
Persuasion is a foundational topic within psychology, in which researchers have long investigated effective versus ineffective means to change other people’s minds. Yet little is known about how individuals’ communications are shaped by the intent to persuade others. This research examined the possibility that people possess a learned association between emotion and persuasion that spontaneously shifts their language toward more emotional appeals, even when such appeals may be suboptimal. We used a novel quantitative linguistic approach in conjunction with controlled laboratory experiments and real-world data. This work revealed that the intent to persuade other people spontaneously increases the emotionality of individuals’ appeals via the words they use. Furthermore, in a preregistered experiment, the association between emotion and persuasion appeared sufficiently strong that people persisted in the use of more emotional appeals even when such appeals might backfire. Finally, direct evidence was provided for an association in memory between persuasion and emotionality.
Article
Despite the centrality of both attitude accessibility and attitude basis to the last 30 years of theoretical and empirical work concerning attitudes, little work has systematically investigated their relation. The research that does exist provides conflicting results and is not at all conclusive given the methodology that has been used. The current research uses recent advances in statistical modeling and attitude measurement to provide the most systematic examination of the relation between attitude accessibility and basis to date. Specifically, we use mixed-effects modeling, which accounts for variation across individuals and attitude objects, in conjunction with the Evaluative Lexicon (EL), a linguistic approach that allows for the simultaneous measurement of an attitude's valence, extremity, and emotionality. We demonstrate across four studies, over 10,000 attitudes, and nearly 50 attitude objects that attitudes based on emotion tend to be more accessible in memory, particularly if the attitude is positive.
Article
The rapid expansion of the Internet and the availability of vast repositories of natural text provide researchers with the immense opportunity to study human reactions, opinions, and behavior on a massive scale. To help researchers take advantage of this new frontier, the present work introduces and validates the Evaluative Lexicon 2.0 (EL 2.0)—a quantitative linguistic tool that specializes in the measurement of the emotionality of individuals’ evaluations in text. Specifically, the EL 2.0 utilizes natural language to measure the emotionality, extremity, and valence of evaluative reactions and attitudes. The present article describes how we used a combination of 9 million real-world online reviews and over 1,500 participant judges to construct the EL 2.0 and an additional 5.7 million reviews to validate it. To assess its unique value, the EL 2.0 is compared with two other prominent text analysis tools—LIWC and Warriner et al.’s (Behavior Research Methods, 45, 1191–1207, 2013) wordlist. The EL 2.0 is comparatively distinct in its ability to measure emotionality and explains a significantly greater proportion of the variance in individuals’ evaluations. The EL 2.0 can be used with any data that involve speech or writing and provides researchers with the opportunity to capture evaluative reactions both in the laboratory and “in the wild.” The EL 2.0 wordlist and normative emotionality, extremity, and valence ratings are freely available from www.evaluativelexicon.com.
Article
Many situations in our lives require us to make relatively quick decisions about whether to approach or avoid a person or object, buy or pass on a product, or accept or reject an offer. These decisions are particularly difficult when the object has both positive and negative aspects. How do people navigate this conflict to arrive at a summary judgment? Using the Evaluative Lexicon (EL), we demonstrate across three studies, 7,700 attitude expressions, and nearly 50 different attitude objects that when positivity and negativity conflict, the valence that is based more on emotion is more likely to dominate. Furthermore, individuals are also more consistent in the expression of their univalent summary judgments when they involve greater emotionality. In sum, valence that is based on emotion tends to dominate when resolving ambivalence and also helps individuals to remain consistent when offering quick judgments.
Article
The increasing amount of electronic word of mouth (eWOM) has significantly affected the way consumers make purchase decisions. Empirical studies have established an effect of eWOM on sales but disagree on which online platforms, products, and eWOM metrics moderate this effect. The authors conduct a meta-analysis of 1,532 effect sizes across 96 studies covering 40 platforms and 26 product categories. On average, eWOM is positively correlated with sales (.091), but its effectiveness differs across platform, product, and metric factors. For example, the effectiveness of eWOM on social media platforms is stronger when eWOM receivers can assess their own similarity to eWOM senders, whereas these homophily details do not influence the effectiveness of eWOM for e-commerce platforms. In addition, whereas eWOM has a stronger effect on sales for tangible goods new to the market, the product life cycle does not moderate the eWOM effectiveness for services. With respect to the eWOM metrics, eWOM volume has a stronger impact on sales than eWOM valence. In addition, negative eWOM does not always jeopardize sales, but high variability does.
Article
Artificial intelligence is everywhere. But before scientists trust it, they first need to understand how machines learn.
Article
This research documents a substantial disconnect between the objective quality information that online user ratings actually convey and the extent to which consumers trust them as indicators of objective quality. Analyses of a dataset covering 1,272 products across 120 vertically-differentiated product categories reveal that average user ratings (1) lack convergence with Consumer Reports scores, the most commonly used measure of objective quality in the consumer behavior literature, (2) are often based on insufficient sample sizes, which limits their informativeness, (3) do not predict resale prices in the used-product marketplace, and (4) are higher for more expensive products and premium brands, controlling for Consumer Reports scores. However, when forming quality inferences and purchase intentions, consumers heavily weight the average rating compared to other cues for quality like price and the number of ratings. They also fail to moderate their reliance on the average user rating as a function of sample size sufficiency. Consumers' trust in the average user rating as a cue for objective quality appears to be based on an "illusion of validity."