IDENTIFYING HIGH VALUE CONSUMERS IN A NETWORK:
NETWORK STRUCTURE VERSUS INDIVIDUAL CHARACTERISTICS
Gary J. Russell†
†Sang-Uk Jung is a Lecturer in Marketing in the Business School at the University of Auckland, New Zealand; Qin
Zhang is an Assistant Professor of Marketing and Gary J. Russell is the Henry B. Tippie Research Professor of
Marketing in the Tippie College of Business at the University of Iowa. Corresponding author: Qin Zhang, e-mail:
firstname.lastname@example.org, Ph: 319-335-3125.
Firms are interested in identifying customers who generate the highest
revenues. Traditionally, customers are regarded as isolated individuals whose buying behavior
depends solely on their own characteristics. In a social network setting, however, customer
interactions can play an important role in purchase behavior. This study proposes a spatial
autoregressive model that explicitly shows how network effects and individual characteristics
interact in generating firm revenue. Using model output, we develop a method of identifying
individuals whose purchase behavior most impacts the total revenues in the network. An
empirical study using a user-level online gaming dataset demonstrates that the proposed model
outperforms benchmark models in predicting revenues. Moreover, the proposed value measure
outperforms a variety of benchmark measures in identifying the most valuable customers.
Keywords: Social Network, Social Influence, Spatial Autoregressive Model, Customer
Relationship Management, Customer Value, Online Games.
Firms have long been interested in identifying and predicting high value customers. In
traditional customer relationship management (CRM), customers are often viewed as isolated
individuals whose purchase behaviors are solely determined by their own characteristics (e.g.,
demographic and/or behavioral characteristics). In a network setting where customers are
connected and communicate with each other, however, the purchases of a customer can also be
influenced by other customers in the network.
Clearly, not all customers have the same degree of influence on others in a network. It is
vital for marketing managers to identify key customers who have greater influence on others’
behavior and to understand how behavioral contagion spreads through networks. Because of
spillover effects, focusing marketing efforts on influential customers can increase the magnitude
and speed of diffusion of marketing effects in a way not possible in traditional marketing. This,
in turn, provides opportunities for improvement in the return on marketing investment (ROMI).
Traditionally, customer influence is inferred from the structure of the network (Burt 1987).
This amounts to measuring the influence of a customer based on how well he/she is connected
with others. Recent studies (Trusov, Bodapati and Bucklin 2010) have argued that the network
connections of an individual may not directly translate into his/her impact in influencing the
purchase behavior of others, i.e., the purchases that generate revenues for a firm. Instead, they
propose to measure an individual’s influence based on how an individual’s behavioral outcomes
affect those of others in the network. We agree with this view and consider that it is necessary to
account for both network structure and purchase behavior in measuring the influence of a
customer on others in a network.
In this paper, we further apply this view to customer valuation in a network setting,
where not only the influence of a customer on others’ contribution but also the customer’s own
contribution to a firm should be taken into account. Specifically, we argue that when measuring
customer value in a network it is important to simultaneously account for both network structure
and individual (demographic and/or behavioral) characteristics. A measure that accounts
for both aspects can help firms better understand the value of their customers and compare
customers who show different merits. For example, a firm can compare customers who are well
connected in the network but contribute little to the firm’s revenues on their own with
customers who are important individual contributors to the firm’s revenues but are not well
connected to other customers.
For this purpose, we first propose a spatial autoregressive (SAR) model to explicitly
examine how network structure and customers’ characteristics interact in generating revenues for
a firm. The spatial weight matrix of the SAR model, which represents the social network, is
constructed in such a way that potentially asymmetric influence relationships between connected
customers are accommodated and real-valued parameter estimates are ensured. Using the model
output, we construct a measure of customer valuation that takes into account both
network effects and individual characteristics. This measure enables us to identify customers
who have greater impact on the total revenues of a network.
In our empirical application, we take advantage of data on a virtual social
network, an online gaming community. We obtained this unique user-level dataset from a
popular online gaming company in Korea. The dataset contains information about individual
players’ characteristics (demographics and game-playing behaviors in the virtual game
environment), the behavioral outcomes of these players (the revenues they generated for the
company), and the social network (communications between players). Together, these
data allow us to study how member characteristics and
network linkages interact to generate behavioral outcomes.
We estimate the proposed model using the online gaming data. The estimated social
network effect is in line with what is found in the literature, and other parameter estimates are
also consistent with our expectations. We also estimate two benchmark models - a model that
only accounts for the individual characteristics but ignores the network effects and a model that
only accounts for the network effects but ignores the effects of individual characteristics on
customer revenues. The proposed model outperforms the benchmark models for both calibration
and holdout samples based on standard fit criteria.
Next, we conduct policy simulations to demonstrate how the proposed customer value
measure can be used to help firms identify high value customers in a network. We compare the
proposed measure with a variety of benchmark measures that are commonly used to evaluate
customer value in a network. Our proposed measure outperforms all of the benchmark
measures by a wide margin. Consistent with the findings by Trusov, Bodapati and
Bucklin (2010) and Stephen and Toubia (2010), our results support the view that knowledge of
network structure alone is not sufficient in identifying the most valuable customers in a network.
The rest of the paper is organized as follows. We first discuss pertinent previous literature
and position our work relative to this literature. Next, we describe the proposed model and the
proposed customer valuation measure, and apply the theory empirically. We conclude by
discussing the applicability of our proposed measure in more general network settings and
outline opportunities for future research.
There is a growing body of marketing research on social influence and social
networks. While there are many studies in the literature about the diffusion of
innovations (Iyengar, Van den Bulte and Valente 2010; Watts and Dodds 2007; Argo, Dahl and
Morales 2006, 2008; Goldenberg et al. 2009), the effects of word-of-mouth (Trusov, Bucklin and
Pauwels 2009; Godes et al. 2005; Godes and Mayzlin 2004) and joint group decision making
(Hartmann 2010), we focus attention here on modeling interdependent decisions of individuals
and identifying an individual’s impact on revenue generation to firms.
Identifying Individual Influence in a Network
Social network analysis (SNA) has emerged in quantitative sociology as a key technique to
measure an individual’s influence in a network. It has been widely used to
analyze social networks in various disciplines such as sociology, economics, physics, computer
science and marketing. SNA measures are grounded on the basic assumption that people
occupying some important positions in a network tend to have greater access to relevant
resources and tend to have more influence on others (Freeman 1979; Keller and Barry 2003).
Various measures have been suggested, such as centrality, structural equivalence and structural
holes (Burt 1987, 1992). For example, centrality measures such as degree, betweenness and
closeness represent the social power of an individual based on how well they are connected with
others in the network. Thus, the importance of an individual can be inferred from his or her
location in the network (Bavelas 1950; Beauchamp 1965; Freeman 1977; Opsahl, Agneessens
and Skvoretz 2010).
The notion that individuals’ influence on others can be measured using their location in a
network has also been discussed in marketing (e.g., Iacobucci 1990, 1996, 1998; Iacobucci and
Hopkins 1992; Van Den Bulte and Wuyts 2007). Social networking sites allow researchers to
gather relational data among individuals, such as connections of friendship. Because these links
are easily observable by the firm and researchers, it is tempting to apply SNA directly to infer a
person’s importance in the network to the firm. However, measuring customer value from a
revenue creation perspective is more challenging because little is known about how network
connections translate into revenues for the firm.
The connection between network structure and behavior has been explored in several
studies. Trusov, Bodapati and Bucklin (2010) found that having many links (high degree) does
not make users influential in terms of revenue creation. They illustrate the potential for large
gaps in financial returns to the firm from using model-based estimates of influence versus count
of connections. Also, Stephen and Toubia (2010) found that the sellers who benefit the most
from the network are not necessarily those who are central to the network, but rather those whose
accessibility is most enhanced by the network. Iyengar, Han and Gupta (2009) attempted to
quantify social influence in terms of purchase probability and revenues at the individual level
using actual purchase data in a social networking site. They found significant heterogeneity in
social influence, in that highly connected people tend to be negatively influenced by their friends’
purchases, whereas people with moderate connections are positively influenced by such
purchases. All these studies make the larger point that knowledge of network structure per se
does not provide sufficient information to predict behavioral outcomes.
Modeling Interdependent Behaviors in a Network
Choosing the correct model of interpersonal influence plays a key role in measuring
individual impact in a network. Individual decision-making models can be categorized into two
major types: linear-in-means, and spatial econometrics. The linear-in-means model has often
been used in social econometrics (Manski 1993). Spatial econometrics, originally developed in
the academic geography literature, is widely applied to social network research in sociology
(Leenders 2002). These two types of models share the basic assumption that an
individual’s preferences or behaviors are a function of others’ preferences or behaviors.
However, there are several key differences.
The linear-in-means model assumes that the outcome of each individual in a group is
linearly dependent upon the average outcomes and characteristics of his or her reference group
(Manski 1993). In this domain, individual-level variables are typically aggregated into group-
level measures (Hartmann et al. 2008), and significant group-level variables are interpreted as
the presence of neighborhood effects. One basic assumption of this model is that the social
influence on different people in the same group is the same. Since the pioneering work by
Datcher (1982), much of the empirical literature on social interactions in econometrics has
involved extending the general form of the linear-in-means model (Solon 1999, Durlauf and
Seshadri 2003, Graham and Hahn 2005). However, these patterns of social interactions are
highly specialized and cannot be generally applied to all social networks.
In contrast, the spatial econometrics approach is more flexible (LeSage and Pace 2009).
These models focus on the microstructure of interactions among individuals and allow for the
heterogeneity of interactions across pairs of individual actors. Depending on the theory and the
empirical applications, the interdependence in a network can be represented in two different
ways. First, an individual’s behavioral outcome may depend directly upon the outcomes of
others and thus in proportion to their influence. This model, called a spatial autoregressive (SAR)
process, is formalized by including a lagged dependent variable as an additional predictor.
Second, interdependence may be modeled through error terms. This may occur when the
observed dependence does not reflect a truly causal effect (such as homophily or unobserved
Research Framework of This Study
In this research, we adopt a spatial autoregressive (SAR) model to investigate interactions
in a social network. The framework has three key advantages. First, it assumes that individuals
who are near each other are more related than individuals who are distant. Second, it allows for
heterogeneous interactions across different pairs of individuals. Third, it implies a causal link
between individual behavior and the behavior of others in the network. Because the model
allows for spillover and magnified effects across individuals (Anselin 1988; Kelejian, Tavlas
and Hondroyiannis 2006), it is possible to infer the relative influence of different individuals on
overall network outcomes.
MODELING CUSTOMER VALUE IN A NETWORK
Customer value in a network is defined as the impact of a particular customer’s actions
on the total behavior of a network. We begin by proposing a spatial autoregressive (SAR) model
and discussing how properties of the SAR model specification are useful in the social network
setting. Using this specification, we show how the SAR model structure can be manipulated to
yield an easily-computed measure of customer value.
Spatial Autoregressive (SAR) Model
Drawing upon the spatial statistics literature (LeSage and Pace 2009), we propose to use
a spatial autoregressive (SAR) model to explore how a customer’s own characteristics interact
with the purchases of others in the network in generating revenues for a firm. Let $Y$ denote an
$N \times 1$ vector of revenues generated by N customers in a customer network. We can describe the
network revenues using a SAR model as
$$Y = \rho W Y + X\beta + \varepsilon, \qquad (1)$$
where $X$ is an $N \times k$ matrix that denotes the N customers’ own (k) characteristics (such as
demographic and behavioral characteristics) that affect purchase behavior, $\beta$ is the $k \times 1$
parameter vector, and $\varepsilon$ is an $N \times 1$ vector of errors, assumed to be normally
distributed, i.e., $\varepsilon \sim N(0, \sigma^2 I)$. Network effects are represented by the $\rho W Y$ term. Here, $W$ is
an $N \times N$ spatial weight matrix that represents the network connections between customers and $\rho$ is a
spatial lag parameter. The parameter $\rho$ measures the degree of overall interdependence of
purchases among customers in the network. We call $\rho$
the social influence parameter.
The autoregressive term $\rho W Y$ can be understood as a weighted sum of revenues of other
customers in the network. Assuming that the matrix $(I - \rho W)$ is invertible, the model in (1) can be
rewritten as
$$Y = (I - \rho W)^{-1} X\beta + (I - \rho W)^{-1}\varepsilon. \qquad (2)$$
The first term on the right-hand side of (2), $(I - \rho W)^{-1} X\beta$, can be interpreted as the expected
value of revenues, given individual characteristics X and the network structure W. Further,
assuming that $|\rho| < 1$, the matrix inverse in (2), $(I - \rho W)^{-1}$, can be expanded in an infinite
power series (Debreu and Herstein 1953) as
$$(I - \rho W)^{-1} = I + \rho W + \rho^2 W^2 + \rho^3 W^3 + \cdots, \qquad (3)$$
where $W^m$ is the mth-order neighbor matrix, measuring the extent to which any two individuals
can be reached in m relational jumps. This expression implies that actions of each customer are
magnified across the network due to a spillover pattern dictated by the network structure.
More generally, equation (2) implies that the impact of each customer on expected
revenues is an interaction between personal characteristics and network structure. Stated
intuitively, the impact of a customer on network revenues depends on both who the customer is
and where he/she is located in the network.
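The reduced form and its power-series expansion can be checked numerically. The sketch below is illustrative only: the four-customer weight matrix, the value of rho, the characteristics, and the coefficients are made-up numbers, not estimates from the gaming data.

```python
import numpy as np

# Hypothetical row-standardized weight matrix for 4 customers (zero diagonal).
W = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
])
rho = 0.4                       # assumed social influence parameter, |rho| < 1
X = np.array([[1.0, 2.0],
              [1.0, 0.5],
              [1.0, 1.0],
              [1.0, 3.0]])      # intercept plus one made-up characteristic
beta = np.array([1.0, 2.0])     # assumed coefficients

I = np.eye(4)
multiplier = np.linalg.inv(I - rho * W)        # (I - rho W)^{-1}

# Truncated power series I + rho W + rho^2 W^2 + ... (60 terms).
series = sum(np.linalg.matrix_power(rho * W, m) for m in range(60))

# Expected revenues: first term of the reduced form.
expected_revenue = multiplier @ X @ beta
```

Because W is row-standardized, every row of the multiplier matrix sums to 1/(1 - rho), so a one-unit change in everyone's direct revenue is scaled up by that factor network-wide.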
Representing the Network
Constructing the social network is a critical decision in modeling interdependence among
individuals. A social network is typically represented by the contiguity/adjacency matrix C, in
which each element indicates the strength of the relationship between the two
individuals represented by the corresponding row and column. While the contiguity/adjacency
matrix in spatial econometrics is often constructed using geographical proximity (Manchanda,
Xie and Youn 2008; Nam, Manchanda and Chintagunta 2008; Bell and Song 2007), it can also
be constructed using socio-demographic or geo-demographic similarity (Strang and Tuma 1993;
Robins, Pattison, and Elliott 2001), self-reported relationships (Iyengar, Van den Bulte and
Valente 2010; Nair, Manchanda and Bhatia, 2010) or observed friendships (Iyengar, Han and
Gupta 2009; Trusov, Bodapati and Bucklin 2010; Trusov, Bucklin and Pauwels 2009).
In this research, we use the communication among network members to represent the
network, which takes into account both the interactions and the connections between members.
We first construct a symmetric contiguity matrix C. A typical element, $C_{ij}$, represents the
degree of strength in the connection between customer i and customer j. We assume that
$$C_{ij} = S_{ij} B_i B_j, \qquad (4)$$
where $S_{ij}$ represents the communication between i and j; $B_i$ represents the total communication
of i with all other customers in the network; and $B_j$ represents the total communication of j with all other
customers in the network. The spatial weight matrix W is then generated by row standardizing
the contiguity matrix C (so that all rows in W sum to unity). Following conventional
practice in the spatial statistics literature, all diagonal elements of C (and consequently W) are set
to zero.
The matrix W in our model has two key features. First, W is in the form of a quasi-symmetric
matrix. This ensures that the SAR model is appropriately specified, leading to a
real-valued estimate of the spatial lag parameter $\rho$ (Bhatia, Kittaneh and Li 1998). Second, W
incorporates a type of dominance pattern into the spatial weight structure, allowing for
asymmetric influence relationships between members that may exist in a social network. A
typical element in W, $W_{ij}$, represents the influence of individual j on individual i in the network
and is proportional to $S_{ij} B_j$, i.e., $W_{ij} = S_{ij} B_j / \sum_{k \neq i} S_{ik} B_k$, which results from the row
standardization of the contiguity matrix C. The term $S_{ij} B_j$ indicates that the influence depends not only
on the number of communications between individual i and j (represented by $S_{ij}$) but also on the
total number of communications of individual j (represented by $B_j$). In other words, the influence
is determined by how active individual j is in the network, as well as how much interaction exists between
individual i and j. Therefore, individuals who are more active in the network tend to have larger
weights, thus greater influence, than those who are less active.
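A minimal sketch of this construction follows. It assumes, for illustration, that the contiguity element takes the product form C_ij = S_ij * B_i * B_j, which is one symmetric form consistent with the description here; the message counts are invented.

```python
import numpy as np

# Made-up symmetric counts of messages exchanged between 4 players.
S = np.array([
    [0, 5, 1, 0],
    [5, 0, 2, 3],
    [1, 2, 0, 0],
    [0, 3, 0, 0],
], dtype=float)

B = S.sum(axis=1)                       # total communication of each player
C = S * np.outer(B, B)                  # assumed symmetric contiguity matrix
np.fill_diagonal(C, 0.0)                # no self-influence
W = C / C.sum(axis=1, keepdims=True)    # row standardization

# W = D^{-1} C with D diagonal and C symmetric, so W is similar to the
# symmetric matrix D^{-1/2} C D^{-1/2} and its eigenvalues are real.
eigenvalues = np.linalg.eigvals(W)
```

In this toy network the weight that player 0 places on player 1 differs from the weight player 1 places on player 0, even though the underlying message counts are symmetric. That asymmetry comes entirely from the row standardization.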
The SAR model can be regarded as a simultaneous set of regression models (one for each
customer) that are interlocked: the purchases of customer i impact customer j, and vice versa.
Parameters are estimated using maximum likelihood procedures that take into account the special
structure of the SAR model (LeSage and Pace 2009).
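One standard way to carry out such an estimation is to concentrate the regression coefficients and error variance out of the likelihood and search over the spatial lag parameter alone. The sketch below does this with a simple grid search on simulated data. The function name, the simulated network, and all numerical settings are assumptions for illustration, and this is not necessarily the exact procedure used in the study.

```python
import numpy as np

def sar_concentrated_loglik(rho, Y, X, W):
    """Log-likelihood of Y = rho*W*Y + X*beta + eps with beta, sigma^2 profiled out."""
    n = len(Y)
    A = np.eye(n) - rho * W
    AY = A @ Y
    beta, *_ = np.linalg.lstsq(X, AY, rcond=None)   # OLS of (I - rho W)Y on X
    resid = AY - X @ beta
    sigma2 = resid @ resid / n
    _, logdet = np.linalg.slogdet(A)                # Jacobian term log|I - rho W|
    loglik = logdet - 0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return loglik, beta, sigma2

# Simulated example (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
n, true_rho, true_beta = 200, 0.4, np.array([1.0, 2.0])
links = (rng.random((n, n)) < 0.05).astype(float)
links = np.maximum(links, links.T)                  # undirected ties
np.fill_diagonal(links, 0.0)
links[np.arange(n), (np.arange(n) + 1) % n] = 1.0   # guarantee no empty row
W = links / links.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(scale=0.1, size=n)
Y = np.linalg.solve(np.eye(n) - true_rho * W, X @ true_beta + eps)

# Grid search over rho, then recover beta and sigma^2 at the maximizer.
grid = np.linspace(-0.9, 0.9, 181)
logliks = [sar_concentrated_loglik(r, Y, X, W)[0] for r in grid]
rho_hat = grid[int(np.argmax(logliks))]
_, beta_hat, sigma2_hat = sar_concentrated_loglik(rho_hat, Y, X, W)
```

The log-determinant term is what distinguishes this from ordinary least squares with a lagged regressor; dropping it would bias the estimate of rho. For large networks a sparse representation of W would be preferable to the dense matrices used here.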
Measuring Customer Value in a Network
We define customer value in a network as the impact of an individual customer on the
total revenues of the network. The general strategy is to first fit the proposed SAR interaction
model to network data and then to use model output to construct individual measures of
customer value. The proposed measure takes into account not only the effects of a customer’s
characteristics on his/her own revenue contribution, but also how those characteristics interact
with the network effect and affect others’ revenue contributions. In other words, the measure
accommodates both the spillover effect - any change of a customer’s characteristics will affect
the revenues of other customers - and the magnified effect - any change of characteristics of
other customers will affect the revenue of the focal customer.
Assume that a firm has N customers and the goal is to choose the customers who receive an
intervention, such as a targeted promotion. The objective is to maximize the total
revenues of the whole network under the intervention. Based on equation (2), we define the
long-run mean revenue of customers as
$$\mu = (I - \rho W)^{-1} X\beta, \qquad (5)$$
where $\mu$ is an $N \times 1$ vector, and $\rho$, $W$, $X$, and $\beta$ are as described for equation (1) earlier. This is the
expected revenue of each customer taking into account their own characteristics X and the
spillover effects due to the purchases of other customers.
Suppose now the firm implements an intervention on customers, which yields a direct
increase in spending for each of the customers before they re-enter the network. This direct
increase represents customers’ individual responses to the intervention. It is determined by
customers’ characteristics and is a function of X. We denote it as an $N \times 1$ vector, $\delta$. The ith
element of $\delta$ represents the direct increase in spending by customer i. The updated revenues after
the intervention can be written as
$$Y^{*} = \rho W Y^{*} + X\beta + \delta + \varepsilon. \qquad (6)$$
Thus, the distribution of revenues after an intervention can be written as $Y^{*} \sim N(\mu^{*}, \Sigma)$, where
$$\mu^{*} = (I - \rho W)^{-1}(X\beta + \delta), \qquad \Sigma = \sigma^2 (I - \rho W)^{-1}[(I - \rho W)^{-1}]'. \qquad (7)$$
Using this analysis, we define the impact of an intervention on the whole network as the
difference between the total mean revenues across all customers before and after an intervention.
It can be written as
$$\Delta = \mathbf{1}'(\mu^{*} - \mu) = \mathbf{1}'(I - \rho W)^{-1}\delta, \qquad (8)$$
where $\mathbf{1}$ is an $N \times 1$ vector of ones, and $\mu = (I - \rho W)^{-1} X\beta$ and $\mu^{*}$ are the mean revenue
vectors before and after the intervention. Thus, the increase in system revenues due to an intervention
depends upon the direct increase in revenues $\delta$, which is a function of X, and the network
structure defined by W. By defining $\delta_j$ as 0 for all customers except customer i, we can use this
expression to measure the value of each customer.
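To make the intervention calculation concrete, the sketch below computes the network-wide change in expected revenue for a hypothetical vector of direct spending increases, and then the change attributable to each customer when only that customer receives the intervention. The three-customer weight matrix, the value of rho, and the increases are invented; the names `delta` and `network_impact` are ours.

```python
import numpy as np

def network_impact(delta, W, rho):
    """Total change in expected network revenue: 1' (I - rho W)^{-1} delta."""
    n = W.shape[0]
    # Solve (I - rho W) x = delta rather than forming the inverse explicitly.
    return np.linalg.solve(np.eye(n) - rho * W, delta).sum()

# Hypothetical row-standardized weights: customers 0 and 2 lean heavily on 1.
W = np.array([
    [0.0, 0.9, 0.1],
    [0.5, 0.0, 0.5],
    [0.2, 0.8, 0.0],
])
rho = 0.5
delta = np.array([10.0, 10.0, 10.0])    # assumed direct increases in spending

total = network_impact(delta, W, rho)

# Value of customer i: keep delta_i and set the increase to zero for everyone else.
per_customer = np.array([
    network_impact(np.where(np.arange(3) == i, delta, 0.0), W, rho)
    for i in range(3)
])
```

Although all three customers respond identically to the intervention in this toy example, customer 1 contributes the most to the network total because the other two customers place most of their weight on customer 1.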
Selecting Customers for Intervention
In any practical application, we need to select a group of customers (not just one) in such
a way that spillover and magnified effects across customers in the network are taken into account.
For this purpose, we assume that the firm plans to select a group of m customers to implement an
intervention and the direct increase in spending yielded from the intervention is $z \odot \delta$, where
$\delta$ is the vector of customers’ direct responses to the intervention, $\odot$ denotes the element-wise
product, and the ith element of the vector $z$, $z_i$, is defined as:
$$z_i = \begin{cases} 1, & \text{if customer } i \text{ is selected for the intervention} \\ 0, & \text{otherwise.} \end{cases} \qquad (9)$$
Thus, we can rewrite equation (8) for the total network impact from the selected
customers as
$$\Delta = \mathbf{1}'(I - \rho W)^{-1}(z \odot \delta) = \sum_{i=1}^{N} z_i \delta_i v_i, \qquad (10)$$
where $v_i$ is the sum of all elements across the ith column of $(I - \rho W)^{-1}$.
From equation (10), it can be seen that to maximize the total impact of the intervention
using m customers, the firm can first sort all customers in descending order by their respective
values of the product $\delta_i v_i$, and then choose the top m customers from this list. Given this logic,
we propose to use the product $\delta_i v_i$ to measure the value of each customer in a network. We
call this proposed measure of customer value in a network Customer Network Value.
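The ranking rule can be sketched in a few lines and checked against an exhaustive search. The example below is a hedged illustration: the five-customer network is randomly generated, the direct responses are invented, and the function names are ours rather than part of the study.

```python
import numpy as np
from itertools import combinations

def customer_network_value(delta, W, rho):
    """Direct response of each customer times the matching column sum of (I - rho W)^{-1}."""
    n = W.shape[0]
    multiplier = np.linalg.inv(np.eye(n) - rho * W)
    return delta * multiplier.sum(axis=0)       # axis=0 sums down each column

def select_top_m(delta, W, rho, m):
    """Indices of the m customers with the largest Customer Network Value."""
    cnv = customer_network_value(delta, W, rho)
    return np.argsort(-cnv)[:m]

def total_impact(selected, delta, W, rho):
    """Network-wide revenue change when only the selected customers are treated."""
    z = np.zeros(len(delta))
    z[list(selected)] = 1.0
    return np.linalg.solve(np.eye(len(delta)) - rho * W, z * delta).sum()

# Invented 5-customer network with row-standardized weights.
rng = np.random.default_rng(1)
raw = rng.random((5, 5))
np.fill_diagonal(raw, 0.0)
W = raw / raw.sum(axis=1, keepdims=True)
rho = 0.3
delta = np.array([4.0, 1.0, 6.0, 2.0, 5.0])     # assumed direct responses

cnv = customer_network_value(delta, W, rho)
chosen = select_top_m(delta, W, rho, m=2)

# Brute-force check: evaluate every possible pair of customers directly.
best_pair = max(combinations(range(5), 2),
                key=lambda s: total_impact(s, delta, W, rho))
```

Because the total impact is a sum of per-customer terms, sorting once is enough; no combinatorial search over groups is needed. The brute-force step is included only to confirm that the two approaches pick the same pair.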