Marketing Science
Credit Scoring with Social Network Data
Yanhao Wei, Pinar Yildirim, Christophe Van den Bulte, Chrysanthos Dellarocas
To cite this article:
Yanhao Wei, Pinar Yildirim, Christophe Van den Bulte, Chrysanthos Dellarocas (2016) Credit Scoring with Social Network Data.
Marketing Science 35(2):234-258. https://doi.org/10.1287/mksc.2015.0949
Copyright © 2016, INFORMS
Vol. 35, No. 2, March–April 2016, pp. 234–258
ISSN 0732-2399 (print) ISSN 1526-548X (online) http://dx.doi.org/10.1287/mksc.2015.0949
© 2016 INFORMS
Credit Scoring with Social Network Data
Yanhao Wei
Department of Economics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, yanhao@sas.upenn.edu
Pinar Yildirim, Christophe Van den Bulte
Marketing Department, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104
{pyild@wharton.upenn.edu,vdbulte@wharton.upenn.edu}
Chrysanthos Dellarocas
Information Systems Department, Questrom School of Business, Boston University, Boston, Massachusetts 02215, dell@bu.edu
Motivated by the growing practice of using social network data in credit scoring, we analyze the impact
of using network-based measures on customer score accuracy and on tie formation among customers. We
develop a series of models to compare the accuracy of customer scores obtained with and without network data.
We also investigate how the accuracy of social network-based scores changes when consumers can strategically
construct their social networks to attain higher scores. We find that those who are motivated to improve their
scores may form fewer ties and focus more on similar partners. The impact of such endogenous tie formation on
the accuracy of consumer scores is ambiguous. Scores can become more accurate as a result of modifications in
social networks, but this accuracy improvement may come with greater network fragmentation. The threat of
social exclusion in such endogenously formed networks provides incentives to low-type members to exert effort
that improves everyone’s creditworthiness. We discuss implications for managers and public policy.
Keywords: social networks; credit score; customer scoring; social status; social discrimination; endogenous tie
formation
History: Received: July 18, 2014; accepted: June 21, 2015; K. Sudhir served as the senior editor and Yuxin Chen
served as associate editor for this article. Published online in Articles in Advance October 26, 2015.
1. Introduction
When a consumer applies for credit, attempts to refinance a loan, or wants to rent a house, potential lenders often seek information about the applicant’s financial background in the form of a credit score provided by a credit bureau or other analysts. A consumer’s score can influence the lender’s decision to extend credit and the terms of the credit. In general, consumers with high scores are more likely to obtain credit, and to obtain it on better terms, including the annual percentage rate (APR), the grace period, and other contractual loan obligations (Rusli 2013). Given that consumers use credit for a range of undertakings that affect social and financial mobility, such as purchasing a house, starting a business, or obtaining higher education, credit scores have a considerable impact on access to opportunities and hence on social inequality among citizens.
Until recently, assessing consumers’ creditworthiness relied solely on their financial history. The financial credit score popularized by the Fair Isaac Corporation (FICO), for example, relies on three key types of data to determine access to credit: consumers’ debt level, length of credit history, and regular and on-time payments. Together, these elements account for about 80% of the FICO score. In the past few years, however, the credit scoring industry has witnessed a dramatic change in data sources (Chui 2013, Jenkins 2014, Lohr 2015). An increasing number of firms rely on network-based data to assess consumer creditworthiness. One such company, Lenddo, reportedly assigns credit scores based on information in users’ social networking profiles, such as education and employment history, how many followers they have, who they are friends with, and information about those friends (Rusli 2013).¹ Similar to Lenddo, a growing number of start-ups specialize in using data from social networks. Such firms claim that their social network-based credit scoring and financing practices broaden opportunities for a larger portion of the population and may benefit low-income consumers who would otherwise find it hard to obtain credit.
Our study is motivated by the growing use of such practices and investigates whether a move to network-based credit scoring affects financing inequality. In particular, we address the following questions. First, from the perspective of lenders, is there an advantage to using network-based measures rather than measures based only on an individual’s data? Second, as use of social network data becomes common practice, how may consumers’ endogenous network formation influence the accuracy of credit scores? Third, how does peer pressure operate in network-based credit scoring? Finally, and most important for public policy, how do these scores influence inequality in access to financing?

¹ Network data can be collected from a variety of sources. Lenddo, for instance, obtains applicants’ consent to scan a variety of their online social accounts (Facebook, Gmail, Twitter, LinkedIn, Yahoo, Microsoft Live) and sometimes also their phone activity.
1.1. Main Insights
Access to financing is correlated with one’s credit score. Following Demirgüç-Kunt and Levine (2009), we assume that credit scores can influence access to financing at the extensive and intensive margins, i.e., by increasing the number of those who are considered eligible for financing as well as by providing access to credit on better terms. Although network-based scoring can affect access to financing at both the extensive and the intensive margins, the impact on each might be uneven across different segments of society.
We first develop a model with continuous risk types incorporating network-based data (§2). Under the assumption of homophily, the notion that people are more likely to form social ties with others who are similar to them, we show that network data provide additional information about consumers and reduce the uncertainty about their creditworthiness. We find that the accuracy of network-based scores depends primarily on information from the direct ties, i.e., the assessed consumer’s ego-network. This implies that credit-scoring firms can efficiently assess an individual’s creditworthiness using data from a subset of the overall network.
In §3, we extend our model to allow consumers in a network to form ties strategically to improve their credit scores. We find that they may then choose not to connect to people with lower scores. This can result in social fragmentation within a network: Those with better access to financing opportunities choose to segregate themselves from those with worse financing opportunities. As a result, consumers self-select into highly homogeneous yet smaller subnetworks. The impact of such social fragmentation on credit scoring accuracy is ambiguous. On the one hand, scores may more accurately reflect borrowers’ risk as each agent is situated in a more homogeneous ego-network. On the other hand, scores may become less accurate because smaller ego-networks provide fewer data points and hence less information on each person. How important financial scores are relative to social relationships determines whether strategic tie formation improves or harms credit score accuracy. When accuracy declines, network-based scoring could put deserving consumers with poor financing opportunities in further hardship. This result supports concerns about social credit scoring from consumer advocates and regulators such as the Consumer Financial Protection Bureau (CFPB) and the Federal Trade Commission (FTC) (Armour 2014).
In §§2 and 3, we study environments wherein all consumers, independent of their type, have similar needs for financing. We relax this assumption in §4 and introduce a formulation with discrete risk types that may vary in their needs for financing. When studying this environment, we pay particular attention to the strategic formation of social ties. An important result is the emergence of social exclusion or discrimination among low-type consumers. They avoid associating with one another because such associations signal even more strongly to lending institutions that their type is low. Such within-group discrimination is different from the between-group discrimination commonly studied in the literature (e.g., Arrow 1998, Becker 1971, Phelps 1972).
In §5, again within a discrete setting, we allow consumers to exert effort to improve their true creditworthiness or type. When social ties motivate effort, social credit scoring may benefit those with poor financial health in two ways: not only by letting them benefit from a positive signal from social ties with others on a stronger financial footing, but also by motivating them to invest more in their own financial health. We consider environments with explicit discrimination and with homophily. We find that when there are complementarities between the effort exerted by individuals, between-group connections can motivate effort and thus lead to increased social mobility in both environments. Within-group connections also improve effort in a discriminatory environment. By contrast, when homophily is the only factor determining tie formation, a high number of low-type friends who exert low effort will reduce an individual’s desire to exert effort. In §6, we analyze another way consumers can exert effort to improve their financial outcomes, i.e., by actively networking to endogenously alter the probability of meeting people with high creditworthiness. Our analysis demonstrates that low types exert effort to meet others more aggressively than high types only when they are in dire need of improving credit access. Otherwise, high types exert greater effort.
1.2. Related Literature
Though motivated by and couched in terms of social credit scoring, the insights we develop go beyond that realm. Our models involve a relatively abstract notion of customer attractiveness or “type” that has two properties: (1) Social relationships are homophilic with respect to types; and (2) A third party such as a firm or society at large values higher types more and bestows some rewards (external to social relationships) that are monotonically increasing with one’s type. The notion of homophily in customer value, i.e., the notion that attractive prospects or customers are more likely to be connected to one another than to the unattractive, and vice versa, underlies social customer scoring in predictive analytics (e.g., Benoit and Van den Poel 2012, Goel and Goldstein 2013, Haenlein 2011). It is also the basis for targeting friends and other network connections of valuable customers in new product launches (e.g., Haenlein and Libai 2013, Hill et al. 2006), in targeted online advertising (Bagherjeiran et al. 2010, Bakshy et al. 2012, Liu and Tang 2011), and in customer referral programs (e.g., Kornish and Li 2010, Schmitt et al. 2011). The basic insights also apply to employment settings, where firms have long used employee referral programs to attract better applicants (e.g., Castilla 2005) and many have started to use social network data to gain more information about applicants’ character and work ethic (e.g., Roth et al. 2016).
The model construct that we label “social credit score” captures a customer’s attractiveness or type as perceived by a firm based on social network information, where the firm bestows some benefits that are monotonically increasing with type. Hence, our insights about social credit scoring can also be interpreted as pertaining to consumers’ social status more broadly, i.e., their “position in a social structure based on esteem that is bestowed by others” (Hu and Van den Bulte 2014, p. 510). As such, our analysis involving endogenous tie formation contributes not only to research traditions in economics and sociology (e.g., Ball et al. 2001, Podolny 2008) but also to recent marketing research on how status considerations affect consumers’ networking behavior (Lu et al. 2013, Toubia and Stephen 2013), their acceptance of new products (Iyengar et al. 2015), and their appeal as customers (Hu and Van den Bulte 2014).
Even when limited to the realm of financial credit scoring, our analysis relates to several streams of recent work. First is the large and growing body of work on microfinance and, more specifically, on how group lending helps improve access to capital by reducing the negative consequences of information asymmetries between creditor and debtor (e.g., Ambrus et al. 2014; Bramoullé and Kranton 2007a, b; Stiglitz 1990; Townsend 1994). Our analysis focuses on individual rather than group loans, and on a priori customer scoring rather than a posteriori compliance through group monitoring and social pressure. Hence, our result that social credit scoring can lead people to form their network ties differently and to exert more effort in improving their financial health is different from, yet dovetails with, the evidence by Feigenberg et al. (2013) that group lending tends to trigger changes in network structure that in turn reduce loan defaults. Both kinds of “social financing” practices, acting at two different stages of the loan (customer selection and terms definition versus compliance), can lead to improved outcomes mediated through endogenous changes in network structure.
Second, we provide new insights on the risk of discrimination and exclusion triggered by social financing (Ambrus et al. 2014, Armour 2014). Our model allows for the possibility of discrimination against less creditworthy consumers. There are two ways in which such discrimination can come about. The first is that consumers may be subject to discrimination based on type. In an endogenous network, borrowers will be more selective in forming relationships and may prefer to form relationships with higher-type consumers to protect their credit score. Forming networks to attain a high credit score can be an indirect form of discrimination because some consumers are systematically excluded from others’ networks. The second is that consumers may observe each other’s effort to improve their score and may discriminate based on personal effort. Any low-type consumer who does not exert effort may face disengagement by fellow low-type contacts who exert effort and who want to disassociate their own credit score from hers.
Third, our work is relevant to ongoing debates on the impact of new social technologies on social integration versus balkanization. Rosenblat and Mobius (2004) find that a reduction in communication costs decreases the separation between individuals but increases the separation between groups. Along similar lines, van Alstyne and Brynjolfsson (2005) find that the Internet can lead to segregation among different types of individuals. In this study, we identify conditions under which network-based credit scoring (and customer scoring in general) may foster or harm integration within versus between groups.
Finally, our work will be of topical interest to the growing number of scholars seeking to better understand consumers’ financial behaviors, especially the role of homophily (Galak et al. 2011) and trust signaling (e.g., Herzenstein et al. 2011, Lin et al. 2013) in gaining access to credit. It will also be of interest to researchers focusing on practices in emerging economies, where consumer finance and access to credit are particularly important yet the traditional credit scoring apparatus is found lacking. Creditors in these markets often seek to enrich scores based on an individual’s history with additional information (e.g., Guseva and Rona-Tas 2001, Sudhir et al. 2015, Rona-Tas and Guseva 2014).
The rest of the article develops as follows. In §2, we present a benchmark model with data collection from networks to assess creditworthiness, and then provide justification for the emergence of this industry. In §3, we investigate the possibility of networks forming endogenously in response to the social credit scoring practice. We extend our model to allow consumers to vary in their financing needs in §4. We consider the possibility of social mobility through effort in §5. We extend the model in several directions in §6 and conclude with implications for public policy and marketing practice in §7.
2. Model with Exogenous Network
Consider a society with a large population $S$. Each person $i$ is denoted by a type $x_i$, and $x_i$ follows $\mathcal{N}(0, q^{-1})$ across individuals, with precision $q > 0$. We assume that each agent knows her own type and discovers that of fellow consumers upon meeting them.
The process of forming friendships is specified as follows. Each pair of consumers meets with a very small independent probability $\delta > 0$. Between $i$ and $j$ there is an independent match value $m_{ij} \sim \chi^2_2$. A friendship between $i$ and $j$ creates utility $m_{ij} - (x_i - x_j)^2$ for either individual. So our model features homophily based on preference rather than opportunity (Zeng and Xie 2008): Individuals enjoy the company of others like them more than that of others unlike them. Person $i$ accepts the formation of a friendship tie with $j$ iff they have met and

$$m_{ij} > (x_i - x_j)^2. \qquad (1)$$
On mutual consent of both parties, a friendship tie is created. The assumption of a $\chi^2_2$ distribution implies that the probability that $i$ and $j$ become friends upon meeting is

$$\Pr\big(m_{ij} > (x_i - x_j)^2\big) = e^{-(x_i - x_j)^2/2}. \qquad (2)$$
Let $G$ denote the set of friendships (ties) in society, and let $n_i$ denote the number of friends of $i$, i.e., the degree of $i$ under $G$. The expected number of friends for $i$ is (see footnote 2 for the derivation)

$$\mathbb{E}(n_i \mid x_i) = S\delta\sqrt{\frac{q}{q+1}}\;e^{-\frac{q}{1+q}\,x_i^2/2}.$$

To represent an environment with sufficient uncertainty about the creditworthiness of consumers, we make three assumptions: (i) the society is large ($S \to +\infty$); (ii) the probability that any pair of individuals meet is very small ($\delta \to 0$); and (iii) types are diffuse ($q \to 0$). These three properties characterize a society with sufficient uncertainty about individuals. They also allow us to assume that the product term $S\delta\sqrt{q/(q+1)}$ is held constant, which we denote by $N$.³
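The closed form for $\mathbb{E}(n_i \mid x_i)$ can be checked by Monte Carlo: average the tie probability (2) over partner types $x_j \sim \mathcal{N}(0, q^{-1})$ and scale by the number of potential partners $S$ times the meeting probability (written $\delta$ here). The values $S = 10{,}000$, $\delta = 0.001$, and $q = 0.5$ are hypothetical; the sketch only illustrates that the two computations agree.

```python
import math
import random


def expected_degree_closed_form(x_i: float, S: int, delta: float, q: float) -> float:
    """S * delta * sqrt(q/(q+1)) * exp(-(q/(1+q)) * x_i**2 / 2)."""
    return S * delta * math.sqrt(q / (q + 1)) * math.exp(-(q / (1 + q)) * x_i ** 2 / 2)


def expected_degree_monte_carlo(x_i: float, S: int, delta: float, q: float,
                                draws: int = 100_000, seed: int = 1) -> float:
    """S * delta times the average of exp(-(x_i - x_j)^2 / 2) over x_j ~ N(0, 1/q)."""
    rng = random.Random(seed)
    sd = 1 / math.sqrt(q)  # q is a precision, so the standard deviation is q ** -0.5
    total = sum(math.exp(-((x_i - rng.gauss(0.0, sd)) ** 2) / 2) for _ in range(draws))
    return S * delta * total / draws


# Hypothetical parameter values.
S, delta, q = 10_000, 0.001, 0.5
for x_i in (0.0, 1.5):
    print(x_i,
          round(expected_degree_closed_form(x_i, S, delta, q), 3),
          round(expected_degree_monte_carlo(x_i, S, delta, q), 3))
```

Note that the expected degree falls as $|x_i|$ grows: people with extreme types have fewer similar partners to befriend.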
Suppose that friendships in the society have been formed. The lender is interested in updating its information about the types of consumers using signals collected from the network. For any individual $i$, the lender may observe a noisy signal $y_i$ about her type,

$$y_i = x_i + \varepsilon_i, \qquad (3)$$

where $\varepsilon_i \sim \mathcal{N}(0, c^{-1})$ and is independent across individuals. The firm observes the signals of a finite set of consumers $\mathbf{y}$, which we also refer to as the vector of signals. For these consumers, the firm may observe the presence or absence of a tie. We use $g = (g_1, g_0)$ to denote such information. Specifically, $g_1$ is the set of dyads that the lender knows are friends, and $g_0$ is the set of dyads that the lender knows are not friends. Furthermore, for each person in $\mathbf{y}$, we allow $g_0$ to include all of the dyads that involve her and someone outside $\mathbf{y}$.⁴

² $\mathbb{E}(n_i \mid x_i) = S\delta\int_{-\infty}^{+\infty} e^{-(t - x_i)^2/2}\sqrt{\frac{q}{2\pi}}\,e^{-qt^2/2}\,dt = S\delta\sqrt{\frac{q}{q+1}}\;e^{-\frac{q}{1+q}\,x_i^2/2}$.

³ In a small society where everyone is likely to be friends with others, or in a society where each type is organized in perfectly homogeneous and mutually disconnected subgraphs (i.e., components), there is little to no uncertainty about an individual’s type. This implies that network-based scores are less useful.
First, we present some properties of the firm’s posterior on the types of consumers in a network. Together with the nodes in $\mathbf{y}$, the ties in $g_1$ define a subnetwork involving only nodes on which a signal is observed. In this subnetwork, let $d_i$ be the degree of $i$,⁵ and let $r(i, j)$ be the length of the shortest path (i.e., geodesic distance) between $i$ and $j$.
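Proposition 1. Let vector $\mathbf{x}$ indicate the types of consumers in vector $\mathbf{y}$. $\Pr(\mathbf{x} \mid g, \mathbf{y})$ is a multivariate normal density with precision matrix $\Sigma^{-1}$,

$$(\Sigma^{-1})_{ii} = c + d_i, \qquad (\Sigma^{-1})_{ij} = -\mathbf{1}\{ij \in g_1\} \quad (i \neq j),$$

and mean vector

$$\boldsymbol{\mu} = c\,\Sigma\,\mathbf{y}. \qquad (4)$$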
Proposition 1 states that the lender’s beliefs about the types of consumers in the network follow a multivariate normal distribution whose parameters depend on the network structure. So two consumers with identical individual signals (such as personal financial history) may obtain different network-based scores because of their social connections. These consumers would obtain similar financing opportunities if credit scores relied solely on individual history. In the new regime, despite identical individual financial histories, they may have unequal access to financing because of score gains and losses from the social network.
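A minimal sketch of Proposition 1, assuming a small observed subnetwork given as a list of ties between 0-indexed nodes: it builds $\Sigma^{-1}$ ($c + d_i$ on the diagonal, $-1$ for each observed tie), inverts it, and returns the weight matrix $c\Sigma$ and the posterior mean $c\Sigma\mathbf{y}$. The helper names are hypothetical, and a plain Gauss-Jordan inversion stands in for a numerical library. Applied to the star network of Example 1 (node 1 there is index 0 here), it reproduces the first row of that example’s weight matrix.

```python
def invert(a):
    """Gauss-Jordan inversion with partial pivoting (adequate for small examples)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def precision_matrix(n, ties, c):
    """Sigma^{-1} from Proposition 1: c + d_i on the diagonal, -1 for each observed tie."""
    prec = [[0.0] * n for _ in range(n)]
    for i in range(n):
        prec[i][i] = c
    for i, j in ties:  # each tie listed once
        prec[i][i] += 1.0
        prec[j][j] += 1.0
        prec[i][j] = prec[j][i] = -1.0
    return prec


def weight_matrix(n, ties, c):
    """c * Sigma: row i holds the weights on (y_1, ..., y_n) in the posterior mean of x_i."""
    return [[c * v for v in row] for row in invert(precision_matrix(n, ties, c))]


def posterior_mean(n, ties, c, y):
    """Equation (4): mu = c * Sigma * y."""
    return [sum(w_ij * y_j for w_ij, y_j in zip(row, y))
            for row in weight_matrix(n, ties, c)]


star = [(0, 1), (0, 2), (0, 3)]  # Example 1: node 1 is index 0 here
print([round(v, 2) for v in weight_matrix(4, star, c=1.0)[0]])  # → [0.4, 0.2, 0.2, 0.2]
```

Each row of $c\Sigma$ sums to one, because every row of $\Sigma^{-1}$ sums to $c$ (the $d_i$ on the diagonal cancels the $d_i$ off-diagonal entries equal to $-1$); the posterior mean is therefore a weighted average of the observed signals.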
Equation (4) shows that the weight that contact $j$’s signal receives depends on her location in the network. Proposition 2 states an upper bound on the weight of connection $j$’s signal on $i$’s posterior mean. All else being equal, the upper bound on the weight of $j$ decreases in the distance $r(i, j)$. If $i$ and $j$ are not connected in the subnetwork, the weight is zero.

⁴ This type of information arises when the lender observes all of $i$’s friends and their signals, which implies that $i$ is not friends with the rest of the society. Corollary 1 demonstrates an example of such a situation.

⁵ Note that $d_i$, the observed degree of $i$, need not be the same as her true degree, $n_i$, as here we allow for observing any subnetwork of friends: $d_i \le n_i$.
Proposition 2. For all $i \neq j$ with $r(i, j) < +\infty$, the weight matrix of Proposition 1 satisfies

$$c\,\Sigma_{ij} < \frac{c}{c + d_i}\,\frac{\lambda^{r(i,j)}}{1 - \lambda}, \qquad \text{where } \lambda = \frac{\max_{k \in \mathbf{y}}\{d_k\}}{c + \max_{k \in \mathbf{y}}\{d_k\}}.$$
a connection’s signal changes with distance, we follow
with two examples:
Example 1.
For a simple example, consider a star
network g1that is centered at 1.
1
23
4
With c=1, cèequals
004 002 002 002
002 006 001 001
002 001 006 001
002 001 001 006
0
By Proposition 1, this is a “weight” matrix: To calculate the posterior mean of $x_1$, for example, the firm should weigh the signals $(y_1, y_2, y_3, y_4)$ by $(0.4, 0.2, 0.2, 0.2)$. Note further that, in the rows for nodes 2, 3, and 4, direct neighbors (friends) receive more weight than indirect neighbors (friends of friends).
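Example 2. Consider a line network $g_1$ in which node 1 is linked to node 2, node 2 to node 3, and node 3 to node 4. With $c = 1$, the weight matrix is

$$c\Sigma = \begin{pmatrix} 0.62 & 0.24 & 0.10 & 0.05 \\ 0.24 & 0.48 & 0.19 & 0.10 \\ 0.10 & 0.19 & 0.48 & 0.24 \\ 0.05 & 0.10 & 0.24 & 0.62 \end{pmatrix}.$$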
Note that direct neighbors are weighted more heavily than indirect neighbors, and that direct neighbors need not receive equal weight. For instance, the updating of $x_2$ weighs the signal from node 1 more heavily than that from node 3.
The above examples convey the intuition that distant signals on average receive lower weight in a firm’s updating of beliefs about a consumer’s type. In Examples 1 and 2, the weight of the signal of an individual who is two links away is always lower than the weight of an individual who is only one link away. In the second example, although individual 2 is at an equal distance from persons 1 and 3, their signals receive different weights: Individual 3’s signal is diluted because she is also linked to individual 4.
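Rearranging $\Sigma^{-1}\boldsymbol{\mu} = c\,\mathbf{y}$ from (4) row by row gives $\mu_i = \big(c\,y_i + \sum_{j:\, ij \in g_1} \mu_j\big)/(c + d_i)$: each posterior mean averages the person’s own signal with her observed friends’ posterior means, which is where the dilution comes from. Iterating this fixed point (a Jacobi iteration, which converges here because the averaging step puts total weight $d_i/(c + d_i) < 1$ on neighbors) with a unit signal on one node at a time recovers the weight matrix without an explicit inversion. The sketch below, whose names are illustrative, applies it to the line network of Example 2 with $c = 1$.

```python
def fixed_point_weights(n, ties, c, tol=1e-13, max_iter=100_000):
    """Weight matrix c*Sigma via the fixed point mu_i = (c*y_i + sum of friends' mu) / (c + d_i)."""
    nbrs = [[] for _ in range(n)]
    for i, j in ties:
        nbrs[i].append(j)
        nbrs[j].append(i)
    columns = []
    for source in range(n):
        y = [1.0 if k == source else 0.0 for k in range(n)]  # unit signal on one node
        mu = y[:]
        for _ in range(max_iter):
            new = [(c * y[i] + sum(mu[k] for k in nbrs[i])) / (c + len(nbrs[i]))
                   for i in range(n)]
            change = max(abs(a - b) for a, b in zip(new, mu))
            mu = new
            if change < tol:
                break
        columns.append(mu)  # mu[i] = weight of y_source in the posterior mean of x_i
    # Transpose so that row i lists the weights used for x_i.
    return [[columns[s][i] for s in range(n)] for i in range(n)]


path = [(0, 1), (1, 2), (2, 3)]  # Example 2: nodes 1-2-3-4 become indices 0-3
for row in fixed_point_weights(4, path, c=1.0):
    print([round(v, 2) for v in row])
```

The exact weights here are multiples of 1/21 (for instance, 13/21 ≈ 0.62 and 5/21 ≈ 0.24), so the printed rows match the matrix in Example 2 after rounding, and the second row shows node 1’s signal (5/21) outweighing node 3’s (4/21).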
Propositions 1 and 2 together imply that agents who are closer to high-type consumers can receive a more favorable posterior in credit score assessment. Conversely, proximity to those with low signals may hurt an individual’s assessment. Consumers cannot yet choose their distance, as we have not considered active selection of friendship ties to attain such benefits (see §3). When the weight of $j$’s signal (in updating the beliefs about the type of $i$) is zero, either it is unknown whether there is a friendship between the ego and $j$, or $ij \in g_0$ and they are not friends. When two people are not friends, the interpretation is that they have not met, owing to the low meeting probability.
In the remainder of the paper, we assume that when evaluating a particular $i$, the firm observes the complete ego-network of $i$, i.e., all of the ties $ij \in G$, and receives a signal on each of $i$’s friends. We collect the signals in the vector $\mathbf{y}_i$, which we will refer to as the set of $i$’s friends. Note that this imposes an additional assumption on the previous analysis: We now require that $g_1$ equal the complete set of $i$’s direct ties. The posterior belief of the firm about an individual’s type can then be stated as a special case of Proposition 1.
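Corollary 1. For the risk assessment of type $i$, $\Pr(x_i \mid \mathbf{y}_i)$ is normal with precision

$$\tau_i = c + \frac{c}{c+1}\,n_i \qquad (5)$$

and mean

$$\mu_i = \frac{1}{\tau_i}\Big(c\,y_i + \frac{c}{c+1}\sum_{j:\, ij \in G} y_j\Big).$$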
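Corollary 1 is simple enough to compute directly. In the sketch below (function name hypothetical), each friend’s signal enters with precision $c/(c+1)$. One way to read this term: under the diffuse prior, the tie kernel $e^{-(x_i - x_j)^2/2}$ places a friend’s type around $x_i$ with variance 1, and her signal adds noise of variance $1/c$, for a total variance of $1 + 1/c$. With three friends and $c = 1$, the resulting weights $(0.4, 0.2, 0.2, 0.2)$ coincide with the first row of Example 1, as they should, since the star there is exactly an ego-network.

```python
from typing import Sequence, Tuple


def corollary1_posterior(c: float, own_signal: float,
                         friend_signals: Sequence[float]) -> Tuple[float, float]:
    """Return (tau_i, mu_i) from Corollary 1 for an ego-network with all friends observed."""
    n_i = len(friend_signals)
    friend_precision = c / (c + 1)          # precision of one friend's signal about x_i
    tau = c + friend_precision * n_i        # Equation (5)
    mu = (c * own_signal + friend_precision * sum(friend_signals)) / tau
    return tau, mu


tau, mu = corollary1_posterior(1.0, own_signal=1.0, friend_signals=[0.0, 0.0, 0.0])
print(tau, mu)  # → 2.5 0.4
```

Here $\mu_i = 0.4$ is exactly the weight on the ego’s own signal, because only that signal is nonzero.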
Corollary 1 states that when an individual has a higher number of connections, the posterior about her type will be more precise. The assessment of an individual with a higher degree is likely to be closer to her true type, $x_i$.⁶ More important, (5) implies that the precision of a lender’s beliefs is higher than the precision of the individual signal of $i$, even with data only from the direct relationships of $i$. The corollary thus provides useful information about the efficiency of risk assessment based on network data. If gathering data on the whole network is impossible or costly, efficiency gains can still be attained by using data from the focal consumer’s immediate neighbors. Recall from Proposition 2 that first-degree contacts of $i$ receive greater weight, and that data from longer paths in the network are expected to receive gradually lower weights in the beliefs about one’s creditworthiness.
⁶ Note that $\tau_i = 1/\mathbb{E}\big[(\mu_i - x_i)^2 \mid \mathbf{y}_i\big]$, which is the inverse of the conditional mean squared error. Because $\tau_i$ in (5) is increasing in $n_i$, the conditional mean squared error is decreasing in $n_i$.
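Footnote 6 can be illustrated by simulation. The sketch fixes a true type $x_i$, draws each friend’s type as $x_i$ plus standard normal noise (an assumption that mirrors the tie kernel under diffuse types), adds signal noise with precision $c$, and computes $\mu_i$ from Corollary 1. The simulated mean squared error should be close to $1/\tau_i$ and should fall as $n_i$ grows. The seed, replication count, and function name are illustrative.

```python
import math
import random


def simulated_mse(c: float, n_i: int, x_i: float = 0.0,
                  reps: int = 50_000, seed: int = 7):
    """Return (simulated MSE of mu_i, theoretical 1/tau_i) for an ego with n_i friends."""
    rng = random.Random(seed)
    noise_sd = 1 / math.sqrt(c)      # signals have noise precision c
    w = c / (c + 1)                  # weight per friend signal in Corollary 1
    tau = c + w * n_i                # Equation (5)
    total = 0.0
    for _ in range(reps):
        own = x_i + rng.gauss(0.0, noise_sd)
        # Assumption: friend type = x_i + N(0, 1); her signal adds N(0, 1/c) noise.
        friend_sum = sum(x_i + rng.gauss(0.0, 1.0) + rng.gauss(0.0, noise_sd)
                         for _ in range(n_i))
        mu = (c * own + w * friend_sum) / tau
        total += (mu - x_i) ** 2
    return total / reps, 1 / tau


for n in (0, 2, 8):
    sim, theory = simulated_mse(c=1.0, n_i=n)
    print(n, round(sim, 3), round(theory, 3))
```

Under these assumptions the error variance is $(c + n_i\,c/(c+1))/\tau_i^2 = 1/\tau_i$ exactly, so any gap between the two columns is simulation noise.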
3. Endogenous Tie Formation
We next study consumers’ incentives to form network ties to improve their scores. In this setting, the probability that two agents become friends depends not only on their types, $x_i$ and $x_j$, but also on the expected utility from improving their credit scores.
Facing network-based scoring, a consumer has an incentive not to form ties with low types, to achieve a more favorable score. Such endogenous tie formation involves a trade-off between utility from friendship ties with people one likes and utility from a high score. To formally express this, we assume that the posterior mean $\mu_i$ enters the utility additively. The utility of individual $i$ is

$$U_i = \sum_{j:\, ij \in G}\big(m_{ij} - (x_i - x_j)^2\big) + \gamma\,\mu_i, \qquad (6)$$

where the first part of the utility, $m_{ij} - (x_i - x_j)^2$, indicates a social utility that takes homophily into account. The second part, $\gamma\mu_i$, indicates how much $i$ enjoys having a high posterior mean. Here, $\gamma$ calibrates the relative importance an individual places on receiving a high credit score versus the utility from friendship ties with people she likes. All consumers gain utility from their posterior credit score at rate $\gamma$.⁷ If $\gamma = 0$, the individual cares only about forming friendships for social utility. If $\gamma \to +\infty$, the agent cares little about social utility but cares greatly about improving her score.
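The trade-off in (6) can be made concrete with a toy calculation. The sketch below evaluates $U_i$ for a consumer who either keeps or drops a lower-type friend, writing $\gamma$ for the weight on the posterior mean. Two simplifications are assumptions made only for illustration: the posterior mean is taken from the exogenous-network formula of Corollary 1 (the endogenous analysis that follows changes how the lender updates), and each signal is set equal to the corresponding type. All numbers and function names are hypothetical.

```python
from typing import Sequence, Tuple


def utility(x_i: float, friends: Sequence[Tuple[float, float]],
            mu_i: float, gamma: float) -> float:
    """Equation (6): social utility of every tie plus gamma times the posterior mean.

    friends: (m_ij, x_j) for each tie ij in G; mu_i is supplied by the caller.
    """
    social = sum(m_ij - (x_i - x_j) ** 2 for m_ij, x_j in friends)
    return social + gamma * mu_i


def corollary1_mean(c: float, own_signal: float, friend_signals: Sequence[float]) -> float:
    """Illustrative stand-in for mu_i: the exogenous-network mean from Corollary 1."""
    w = c / (c + 1)
    return (c * own_signal + w * sum(friend_signals)) / (c + w * len(friend_signals))


# Hypothetical numbers: i has type 0.5; friend A has type 0.7, lower-type friend B has -0.5.
# Signals are set equal to types purely for illustration.
x_i, c = 0.5, 1.0
friend_a, friend_b = (2.0, 0.7), (1.5, -0.5)
for gamma in (0.0, 5.0):
    keep = utility(x_i, [friend_a, friend_b], corollary1_mean(c, x_i, [0.7, -0.5]), gamma)
    drop = utility(x_i, [friend_a], corollary1_mean(c, x_i, [0.7]), gamma)
    print(f"gamma={gamma}: keep B -> {keep:.3f}, drop B -> {drop:.3f}")
```

With $\gamma = 0$, keeping the lower-type friend yields the higher utility (2.46 versus 1.96); with $\gamma = 5$, dropping her does, because the gain in $\mu_i$ (from 0.3 to about 0.57) outweighs the lost social utility. In this toy setup the switch occurs at $\gamma = 1.875$.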
The parameter $\gamma$ can also be interpreted as a measure of the desire for status. How much people care about how highly others evaluate them (i.e., form a posterior about their type based on characteristics of their network) captures the importance people place on their position in a social structure based on esteem that is bestowed by others, i.e., their status. Let each consumer $i$ adopt a tie formation rule a priori (i.e., before meeting $j$), which states that she will accept friendship with $j$ iff

$$\begin{cases} m_{ij} > \alpha_i\,(x_i - x_j)^2 & \text{for } x_j \ge x_i, \\ m_{ij} > \beta_i\,(x_i - x_j)^2 & \text{for } x_j < x_i. \end{cases}$$
The parameters $\alpha_i$ and $\beta_i$ represent, respectively, the degree to which $i$ is willing to accept a higher-type and a lower-type individual as a friend (a larger value means a stricter threshold). These parameters are not exogenous but are chosen simultaneously⁸ and optimally by consumers. Although individual $i$ would prefer to be friends with others similar to her, as expressed in (1), she may gain additional utility from adding high-type friends or removing low-type friends owing to the improvement in her credit assessment. This suggests that consumers will form relationships with lower-type others only if the match value $m_{ij}$ yields sufficiently high utility.

⁷ To allow for the possibility that some agents may have no interest in improving their scores when they meet others with similar types, §4 presents a discrete formulation of our matching model, and we provide a special case wherein the high types have zero utility from credit scores.

⁸ Note that in this model consumers form ties simultaneously. A model with sequential friendship formation would need to consider, in addition to tie formation rules, rules about the order in which consumers form ties, and would need to assume that individual beliefs about firms’ financial assessment are consistent with equilibrium outcomes.
Comparing
(6)
with
(1)
, a greater (lesser) desire to
link to individuals with higher (lower) types would
indicate that an agent should pick
i
1 and
i1
.
9
Remember that forming a friendship tie requires mutual
consent: For $i$ and $j$ to become friends, $i$ should want to
connect with $j$ and $j$ should want to connect with $i$.¹⁰
Thus $\underline{\theta}_i$ becomes irrelevant (because $\underline{\theta}_i \le 1 \le \bar{\theta}_j$, a tie between a lower-type $i$ and a higher-type $j$ is decided by whether $j$ accepts $i$), and $\bar{\theta}_i$ becomes the parameter that sets the level of mixing with others. In the
rest of the paper we omit any further references to $\underline{\theta}_i$.
Consider the symmetric case where $\bar{\theta}_i = \theta$ for all $i$.
If everyone applies the same rule with common $\theta$,
a friendship is established after meeting iff $m_{ij} > \sqrt{\theta}\,|x_i - x_j|$. With the common rule in place, the probability
of becoming friends after meeting becomes
$$\Pr(x_i, x_j; \theta) = e^{-\theta (x_i - x_j)^2/2}.$$
Compared with the tie formation probability in an
exogenous setting (given by Equation (2)), consumers
will be more selective in linking to others (for $\theta > 1$). Fewer ties
will be formed in the endogenous case.
3.1. Credit Scoring with Endogenous Tie Formation
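In this section we complete the analysis of endogenous
relationship formation using an equilibrium concept.
We use $(\theta, \theta_i)$ to denote the common rule $\theta$ with the
possible deviation of $i$. The expected utility of $i$ becomes
$$\mathbb{E}(U_i \mid x_i; \theta, \theta_i) = \mathbb{E}\Big[\sum_{j:\, ij \in G} \big(m_{ij} - |x_i - x_j|\big) \,\Big|\, x_i; \theta, \theta_i\Big] + \gamma\, \mathbb{E}[\mu_i(\theta) \mid x_i; \theta, \theta_i], \qquad (7)$$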
9 The benefits of network-based scoring are measured by the difference between one’s expected posterior mean and one’s individual signal. This difference increases in $\bar{\theta}_i$ (i.e., the rate at which the individual rejects ties with low-type friends) and decreases in $\underline{\theta}_i$ (i.e., the individual’s selectivity towards high-type friends). Choosing $\underline{\theta}_i > 1$ is worse than $\underline{\theta}_i = 1$ because it decreases both the expected score benefit and the social utility of a tie. Similarly, choosing $\bar{\theta}_i < 1$ rather than $\bar{\theta}_i = 1$ would decrease the utility from a higher credit score and the social utility of a well-matching tie. Together, these two arguments imply that (i) any symmetric equilibrium derived with restrictions is still an equilibrium even if we allow $\underline{\theta}_i > 1$ or $\bar{\theta}_i < 1$; and, more important, (ii) there is no symmetric equilibrium where $\underline{\theta} > 1$ or $\bar{\theta} < 1$.
10
If we allowed consumers to form friendships without mutual
consent, then everyone could link to anyone to improve her own
score. The benefits of network-based scoring would be limited since
a connection to a high type would not be informative of one’s type.
where $\mu_i(\theta) = \mathbb{E}(x_i \mid \mathbf{y}_i; \theta)$ is the lender’s posterior. Each
person calculates her expected utility from being in
a friendship network before the network is formed,
implying that expected utility will depend on the
friendship rule $(\theta, \theta_i)$ adopted. The expectation $\mathbb{E}(\cdot)$ is
taken before meeting others. We first display a version
of Corollary 1 under a symmetric rule. In the following,
when $i$ conforms with the common rule, we omit $\theta_i$
in the expectation conditionals.
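Lemma 1. Under a common relationship formation
rule $\theta$, the posterior $\Pr(x_i \mid \mathbf{y}_i; \theta)$ is normal with precision
$$\tau_i(\theta) = c + \frac{c\theta}{c + \theta}\, n_i, \qquad (8)$$
and mean
$$\mu_i(\theta) = \frac{1}{\tau_i(\theta)}\Big(c\, y_i + \frac{c\theta}{c + \theta} \sum_{j:\, ij \in G} y_j\Big).$$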
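As a quick numerical illustration of Lemma 1, the sketch below computes the posterior precision and mean for one consumer from her own signal and her friends' signals. It assumes the diffuse-prior limit under which Equation (8) holds exactly; the function name `lemma1_posterior` and the example numbers are hypothetical.

```python
def lemma1_posterior(y_own, friend_signals, c, theta):
    """Posterior (precision, mean) of x_i under a common rule theta, per Lemma 1.

    y_own          -- the lender's signal about consumer i
    friend_signals -- signals y_j of i's friends (n_i = len(friend_signals))
    c              -- precision of each individual signal
    theta          -- common tie formation rule (theta = 1 is the exogenous case)
    """
    w = c * theta / (c + theta)            # weight on each friend's signal
    n_i = len(friend_signals)
    tau = c + w * n_i                       # Equation (8)
    mu = (c * y_own + w * sum(friend_signals)) / tau
    return tau, mu


# Same signals, two rules: the more selective rule puts more weight on friends.
print(lemma1_posterior(0.0, [1.0, 1.0], c=1.0, theta=1.0))  # (2.0, 0.5)
print(lemma1_posterior(0.0, [1.0, 1.0], c=1.0, theta=3.0))  # (2.5, 0.6)
```

With no friends the function returns the individual score, $(c, y_i)$, which is the benchmark against which network-based scoring is compared.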
Compared to Corollary 1, in Lemma 1, $\tau_i$ and $\mu_i$ are
scaled by the selection rule $\theta$. When borrowers are
more selective in forming friendships with lower types
(when $\theta$ is higher), a financial institution will put more
weight on friends’ signals to update beliefs about the
type of an individual (i.e., to calculate the posterior),
because the weight $c\theta/(c+\theta)$ on each friend’s signal increases in $\theta$.
In broad terms, this selectivity bears on our second
main research question: When consumers react to an
environment with network-based scoring, will scores be
less or more precise? In other words, will network data
yield a better assessment? Our
answer to this question is a qualified yes. We explain
the mechanism through which this improvement can
be achieved via a lemma and a proposition.
Lemma 2. The expected degree under a symmetric rule $\theta$
satisfies
$$\mathbb{E}(n_i) = \frac{N}{\sqrt{\theta}}. \qquad (9)$$
A lower rate of mixing between types (a higher $\theta$)
results in a smaller number of ties per person. Ties
are formed only between those who are highly similar
to each other in type. Such self-selection reduces the
expected number of connections among consumers but
increases the information value of any single link and
the signal it conveys. The net effect on the precision
of network-based scores is not yet clear. We address it next.
Proposition 3 shows that, under the limits $S \to +\infty$
and $\lambda \to 0$ (together with the limit taken for $q$), there is a symmetric equilibrium
where $\theta_i = \theta^*$, which maximizes (7) for any individual $i$,
given that $\theta = \theta^*$ is the common rule adopted by
everyone else. In other words, there exists a common
tie formation rule from which no individual wants
to deviate, and with which the lender’s posterior is
consistent.
Proposition 3. For $0 < \gamma < N$, there exists at least one
symmetric equilibrium, and any symmetric equilibrium $\theta^*$
must satisfy
$$1 < \theta^* < \frac{1}{(1 - \gamma/N)^2}. \qquad (10)$$
In words, when networks are created endogenously, con-
sumers are more selective in accepting friendships in equilib-
rium; the upper bound on selectivity is determined by how
much importance consumers put on a high credit score and
the expected degree in society.
Corollary 2. If $c \ge N/(N - \gamma)$, then
$$\mathbb{E}[\tau_i(\theta^*)] > \mathbb{E}[\tau_i(1)],$$
where $\tau_i \equiv \mathrm{Precision}(x_i \mid \mathbf{y}_i; \theta)$. On average, the network-based score becomes more accurate when consumers are
averse to connecting with lower type peers. By contrast, if
$c \le 1$, then $\mathbb{E}[\tau_i(\theta^*)] < \mathbb{E}[\tau_i(1)]$. On average, the
network-based scores are then less accurate.
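Combining Equation (8) with Lemma 2 gives $\mathbb{E}[\tau_i(\theta)] = c + cN\sqrt{\theta}/(c+\theta)$, and a line of algebra shows that, for $\theta^* > 1$, $\mathbb{E}[\tau_i(\theta^*)] > \mathbb{E}[\tau_i(1)]$ exactly when $c > \sqrt{\theta^*}$. With the upper bound in (10), $\sqrt{\theta^*} < N/(N-\gamma)$, so $c \ge N/(N-\gamma)$ suffices, while $c \le 1$ reverses the comparison. The check below is a sketch under these formulas; the values of $N$, $\gamma$, and $c$ are hypothetical.

```python
import math

def expected_precision(theta, c, N):
    """E[tau_i(theta)]: Equation (8) with n_i replaced by E(n_i) = N / sqrt(theta)."""
    return c + (c * theta / (c + theta)) * N / math.sqrt(theta)

def theta_upper_bound(gamma, N):
    """Upper bound on the equilibrium rule theta* from (10); needs 0 < gamma < N."""
    return 1.0 / (1.0 - gamma / N) ** 2

N, gamma = 10.0, 5.0                      # hypothetical values
bound = theta_upper_bound(gamma, N)       # 4.0
c_high = N / (N - gamma)                  # 2.0, the sufficient condition in Corollary 2
thetas = [1.0 + k * (bound - 1.0) / 50 for k in range(1, 50)]  # grid inside (1, bound)

# c >= N/(N - gamma): every admissible theta* raises expected precision.
gains = [expected_precision(t, c_high, N) - expected_precision(1.0, c_high, N) for t in thetas]
# c <= 1: every theta* > 1 lowers it.
losses = [expected_precision(t, 1.0, N) - expected_precision(1.0, 1.0, N) for t in thetas]
print(min(gains) > 0, max(losses) < 0)    # True True
```

Between the two cases ($1 < c < N/(N-\gamma)$) the sign depends on where $\theta^*$ falls relative to $c^2$, which is why the corollary only gives sufficient conditions.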
Social credit scoring changes consumer incentives to
form relationships in two directions. Compared to the
exogenous setting ($\theta = 1$), in the endogenous setting
with $\theta = \theta^* > 1$, relationships are formed more selectively. This has two consequences. First, relationships
are more strongly homophilous, that is, consumers
form relationships with others who are closer to their
own type. For lenders, this first effect has a positive
impact on network scores: The accuracy of their assessment will improve as a result of obtaining signals from
friends who are expected to be closer in type.
Second, consumers will reject friendship ties with
others who have lower types. This implies that ego-
networks will shrink (Lemma 2). This second effect has
a negative impact on network scoring accuracy. The
two forces, i.e., homogenization and the shrinking of
ego-networks, work against each other. The net effect
is ambiguous.
Corollary 2 identifies a further condition to characterize situations in which the net effect is positive and
network score accuracy improves with endogenous
tie formation. For some sufficiently small $\gamma$, lenders
may benefit from using network-based credit scoring,
as it becomes even more precise when consumers
self-select into networks to improve their credit
scores (a smaller $\gamma$ lowers the threshold $N/(N-\gamma)$ that $c$ must exceed). The improvement in precision is conditional on
consumers placing sufficiently low weight on financial
outcomes relative to the utility derived from social connections. Paradoxically, when consumers care greatly
about their score or status, they may reduce the size
of their social networks so much that network-based
scoring becomes less reliable in equilibrium.
Can the social fabric make network-based scoring more
effective in some societies than others? Corollary 2
implies that the parameter range under which network-based scores are more precise is larger when the average
number of friends is higher, since the threshold $N/(N-\gamma)$ decreases in $N$. If everything else remains
the same, the benefits of network-based scoring may
be greater in societies where people maintain a large
number of connections, which are likely to be societies
with collectivist cultures (Hofstede 2001). Interestingly,
several start-ups turning to social scoring have been
growing in countries known to have collectivist cultures
where the density of relationships is generally higher.
Lenddo, for instance, operates in Mexico, Colombia,
and the Philippines, and reports that Mexico is its
fastest growing market.11
3.2. Lending Rates with Endogenous
Network Formation
We now relate our scoring formulation to lending
rates, i.e., access to finance at the intensive margin.
The discussion in this section implies that network-
based scoring affects the rates at which consumers can
borrow, even if they would qualify to receive credit
using the individual score system. For simplicity and
concreteness of discussion, we specify the perceived
probability of repayment of credit by consumer $i$, $P_i$,
as
$$P_i = \frac{1}{1 + e^{-\mu_i}},$$
which increases from 0 to 1 as the lender’s assessment
of the borrower’s posterior mean, $\mu_i$, increases from
$-\infty$ to $+\infty$. Consider a risk-neutral lender who earns a
rate of $r_o$ from a nonrisky investment. Let $r_i$ be the
lending rate to be charged to consumer $i$ with type $x_i$.
The firm determines the rate by solving
$$P_i(1 + r_i) + (1 - P_i) \cdot 0 = 1 + r_o.$$
This formulation takes into account not only the
expected creditworthiness of a consumer, $\mu_i$, but also
the outside options of the lender, $r_o$. For $r_o = 0$, the
borrowing rate for $i$ equals the odds of default
versus repayment:
$$r_i = \frac{1 - P_i}{P_i} = e^{-\mu_i}. \qquad (11)$$
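The break-even condition can be solved in one line, $r_i = (1 + r_o)/P_i - 1$. The sketch below (function names are hypothetical) computes this rate from the posterior mean and prints it next to $e^{-\mu_i}$, the value Equation (11) gives for $r_o = 0$.

```python
import math

def repayment_probability(mu):
    """Perceived repayment probability P_i = 1 / (1 + exp(-mu_i))."""
    return 1.0 / (1.0 + math.exp(-mu))

def lending_rate(mu, r_o=0.0):
    """Break-even rate r_i solving P_i (1 + r_i) + (1 - P_i) * 0 = 1 + r_o."""
    return (1.0 + r_o) / repayment_probability(mu) - 1.0

for mu in (-1.0, 0.0, 2.0):
    # With r_o = 0 the rate equals the odds of default, exp(-mu).
    print(mu, round(lending_rate(mu), 4), round(math.exp(-mu), 4))
```

Since $-\log r_i = \mu_i$ when $r_o = 0$, a utility term $-\gamma \log(r_i)$ reproduces the financial part $\gamma\mu_i$ of Equation (6).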
As the consumer’s likelihood of default increases,
she faces a higher borrowing rate. Note that the financial utility of a consumer given in Equation (6) can
be derived by assuming that the lending rate enters
the utility through $-\gamma \log(r_i)$, since $-\log(r_i) = \mu_i$ by (11). If lending rates can be
interpreted in the context of economic opportunities
available to consumers, then a consumer with a better
network score will be likely to receive a loan on bet-
ter terms. This links network-based credit scores to
financing access at the intensive margin.
11
http://techonomy.com/2014/02/lenddos-borrowers-mexico-philip
-pines-get-credit-via-facebook/.
4. Role of Signals from Social Contacts
In the preceding sections, we developed a model with
continuous types and assumed that every individual
had identical incentives to improve her credit score.
In reality, there may be differences among consumers
about how much utility they can gain from improving
their credit score conditional on their type. In this
section, we introduce a discrete version of the model
to allow for this possibility. The discrete version allows
us to analyze in greater detail how the firm uses
signals of low versus high type friends when assessing
a consumer’s creditworthiness. This enables us to
disentangle and contrast the role of high- and low-
type contact signals in the network.
4.1. Credit Scoring and Tie Formation with
High and Low Types
Consider a society with two types of borrowers: high
types ($h$) and low types ($l$), where the prior is uniform,
with $\Pr(x_i = l) = \Pr(x_i = h) = 1/2$. Whereas high types
have a low risk of credit default, low types have a
higher risk. With probability $\lambda$, any two consumers
will meet. On meeting, they learn each other’s type
and their match value $m_{ij} > 0$, which is i.i.d. across
pairs, with positive distribution density $f$. For $i$, the
utility of becoming friends with $j$ is
$$m_{ij} - \mathbb{1}\{x_j \neq x_i\}, \qquad (12)$$
where the disutility of becoming friends with a different type is normalized to 1. The utility of not
becoming friends is 0. Given the specification, the
probability that two same-type consumers will become
friends conditional on meeting is 1, while the probability of two different types becoming friends is
$p \equiv \Pr(m_{ij} > 1) < 1$. Hence the network features preference-based homophily. We retain the assumptions $S \to +\infty$
and $\lambda \to 0$ and set $\lambda S = N$ for some positive number $N$.
With the discrete formulation, the expected number of
friends for any type is $\frac{1}{2}\lambda S(1 + p)$: Increasing the degree
of homophily (a lower $p$) reduces the expected number
of friends.
Network-based Score. We assume that the lender
may observe a signal $y_i$, which is $-1$ or $1$, indicating a
low or high type. The signal is credible but incorrect
with probability $\epsilon < 1/2$. This implies, for example, that
if the lender receives a signal from an $l$-type consumer,
with probability $1 - \epsilon$ it observes $y_i = -1$ and with the
remaining probability $\epsilon$ it observes $y_i = 1$. Let $\mathbf{y}_i$ be the
collection of signals from $i$ and the friends of $i$. We
first explore how the firm perceives the probability of
an agent being of $h$-type conditional on the structure
of her social network.
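Lemma 3. In evaluating the type of $i$, the posterior for
her to be high type is
$$\Pr(x_i = h \mid \mathbf{y}_i) = \left[1 + \left(\frac{\epsilon}{1-\epsilon}\right)^{y_i} \left(\frac{\epsilon p + (1-\epsilon)}{\epsilon + (1-\epsilon)p}\right)^{L_i} \left(\frac{\epsilon + (1-\epsilon)p}{\epsilon p + (1-\epsilon)}\right)^{H_i}\right]^{-1}, \qquad (13)$$
where $y_i$ is the signal observed for agent $i$, $H_i$ is the number
of friends with high signal, and $L_i$ is the number of friends
with low signal.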
Lemma 3 suggests that low- and high-type signals
observed for a consumer’s social connections affect the
lender’s assessment of that consumer’s creditworthiness in different directions. Note that $(\epsilon + (1-\epsilon)p)/(\epsilon p + (1-\epsilon)) < 1$ and $(\epsilon p + (1-\epsilon))/(\epsilon + (1-\epsilon)p) > 1$, because the numerator minus the denominator of the first ratio equals $(1-p)(2\epsilon - 1) < 0$.
Thus, high-type signals increase the likelihood that
an agent will be categorized as high type, whereas
low-type signals reduce this likelihood. Figures 1 and 2
illustrate how $\Pr(x_i = h \mid \mathbf{y}_i)$ changes with $H_i$ and $L_i$.
The firm would prefer to extend credit to l-types with a
higher number of h-type connections, if everything else
remained the same. This suggests that in a given net-
work where l-types are fairly segregated from h-types
due to homophily, l-types who are bridges between
l-types and h-types may be favored by the lender
(compared to l-types surrounded by the same-types).
Put differently, in-group centrality of l-types will hurt
their financing opportunities whereas between-group
centrality will improve them.
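Equation (13) can be checked against a direct application of Bayes' rule. Given $i$'s type, each friend is of the same type with probability $1/(1+p)$ and of the other type with probability $p/(1+p)$, and the degree has the same distribution for both types, so only the signal counts matter. The sketch below (function names are hypothetical) implements both routes and evaluates the posterior at the parameters used in Figures 1 and 2.

```python
def posterior_high(y_own, H, L, eps, p):
    """Pr(x_i = h | y_i) from Equation (13) (exogenous ties, uniform prior)."""
    own = (eps / (1 - eps)) ** y_own
    low_ratio = (eps * p + (1 - eps)) / (eps + (1 - eps) * p)   # > 1 per low signal
    high_ratio = (eps + (1 - eps) * p) / (eps * p + (1 - eps))  # < 1 per high signal
    odds_low = own * low_ratio ** L * high_ratio ** H           # odds of l versus h
    return 1.0 / (1.0 + odds_low)

def posterior_by_bayes(y_own, H, L, eps, p):
    """Same posterior by direct Bayes rule over friends' types and signals."""
    hi_if_h = ((1 - eps) + eps * p) / (1 + p)   # P(friend sends high signal | i is h)
    hi_if_l = (eps + (1 - eps) * p) / (1 + p)   # P(friend sends high signal | i is l)
    own_h = 1 - eps if y_own == 1 else eps
    own_l = eps if y_own == 1 else 1 - eps
    lik_h = own_h * hi_if_h ** H * (1 - hi_if_h) ** L
    lik_l = own_l * hi_if_l ** H * (1 - hi_if_l) ** L
    return lik_h / (lik_h + lik_l)              # the uniform prior 1/2 cancels

# Parameters of Figures 1 and 2: eps = 0.4, p = 0.6, y_i = -1.
print([round(posterior_high(-1, H, 10, 0.4, 0.6), 3) for H in (10, 20, 30, 40, 50)])
print([round(posterior_high(-1, 10, L, 0.4, 0.6), 3) for L in (10, 20, 30, 40, 50)])
```

At $H_i = L_i = 10$ the two friend ratios cancel and the posterior equals the one implied by the individual signal alone, 0.4, which is where both figures meet.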
Endogenous Network Formation. Equation (13)
applies only when tie formation is based solely on social
utility and excludes the credit score ($\gamma = 0$). We now
consider the case wherein consumer utility includes the
credit score. We construct the utility of a borrower
similarly to §3.2. $P_i$ is the firm’s assessment of borrower
$i$’s probability of repayment, which we may take as the
posterior probability that $i$ is a high type. The lending
rate for borrower $i$ is again given by $r_i = (1 - P_i)/P_i$.
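Figure 1 $\Pr(x_i = h \mid \mathbf{y}_i)$ vs. $H_i$ ($\epsilon = 0.4$, $p = 0.6$, $L_i = 10$, $y_i = -1$) [plot: horizontal axis $H_i$, vertical axis $\Pr(x_i = h \mid \mathbf{y}_i)$]
Figure 2 $\Pr(x_i = h \mid \mathbf{y}_i)$ vs. $L_i$ ($\epsilon = 0.4$, $p = 0.6$, $H_i = 10$, $y_i = -1$) [plot: horizontal axis $L_i$, vertical axis $\Pr(x_i = h \mid \mathbf{y}_i)$]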
Because the lending rate enters the utility additively
through $-\gamma_{x_i} \log(r_i)$, we have
$$U_i = \sum_{j:\, ij \in G} \big(m_{ij} - \mathbb{1}\{x_i \neq x_j\}\big) + \gamma_{x_i} R_i, \qquad (14)$$
where $R_i \equiv \log\big(P_i/(1 - P_i)\big)$ is the log odds of repayment. A higher $R_i$ implies a lower risk of extending
credit to an individual. Furthermore, the parameter $\gamma_{x_i}$
calibrates the importance of improving access to financing. Note that this formulation allows low and high
types to have two different levels of financial need.
When $\gamma_h < \gamma_l$, high types’ utility is less dependent on
improving financing compared to the low types. When
$\gamma_h = \gamma_l$, both types have identical financial needs. This
exposition mirrors our continuous-type model, except
that different types may weigh financial concerns
(represented by $R_i$) differently when forming ties.
Let consumers choose tie formation rules before the
meeting process. Intuitively, given the network-based
score, consumers will be more selective towards low
types and less selective towards high types. Because of
the simplicity of the discrete-type model, the friendship rules we allow are general and flexible. More
specifically, two high types will continue to form a tie
with probability 1 after they meet. As to the friendship
between low types, a low type $i$ will set a threshold
$\theta_i$ and accept another low type $j$ iff
$$m_{ij} - \theta_i > 0.$$
Because friendships are formed based on mutual consent, a friendship between a high and a low type can
only be formed when the high type accepts the friendship.
A high type $i$ will accept a low type $j$ iff
$$m_{ij} - \phi_i > 0.$$
As in the continuous case, social credit scoring makes
consumers wary of forming ties with low types. In the
discrete case, low and high types are allowed to differ
in their need for financing, and low types face discrimination or social rejection from both low and high types.
This result is notable because discrimination is usually
thought to take place between groups, exercised by one
group on another. Here, within-group discrimination against
low types arises endogenously with the use of network-based scoring,
in addition to the more common between-group discrimination. Within-group discrimination may make
the surviving within-group ties more valuable, as we
will see next in Lemma 4.
We define a symmetric profile characterized by two
thresholds, i.e., $(\theta, \phi)$, where $\theta_i = \theta$ for all low type $i$
and $\phi_i = \phi$ for all high type $i$. Let $(\theta, \phi, \theta_i)$ denote a
symmetric profile except for possible deviation of a low
type $i$. Let $\mathbb{E}(U_i \mid l; \theta, \phi, \theta_i)$ represent the expected utility
before the meeting process for a low-type
individual $i$:
$$\mathbb{E}(U_i \mid l; \theta, \phi, \theta_i) = \mathbb{E}\Big[\sum_{j:\, ij \in G} \big(m_{ij} - \mathbb{1}\{x_i \neq x_j\}\big) \,\Big|\, l; \theta, \phi, \theta_i\Big] + \gamma_l\, \mathbb{E}[R_i(\theta, \phi) \mid l; \theta, \phi, \theta_i],$$
where the lender’s posterior assessment is $P_i(\theta, \phi) = \Pr(x_i = h \mid \mathbf{y}_i; \theta, \phi)$, consistent with the profile. Similarly,
$\mathbb{E}(U_i \mid h; \theta, \phi, \phi_i)$ is the corresponding expected utility
of a high type. Using this utility formulation, we first
lay out the lender’s posterior about consumer types in
Lemma 4.
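Lemma 4. Let $(\theta, \phi)$ be the symmetric criterion, $p_\theta \equiv \Pr(m_{ij} > \theta)$ the probability of two $l$-types forming a tie,
and $p_\phi \equiv \Pr(m_{ij} > \phi)$ the probability of a tie formation
between $h$ and $l$ types. Then the posterior probability of $i$
being high type is
$$\Pr(x_i = h \mid \mathbf{y}_i; \theta, \phi) = \left[1 + \left(\frac{\epsilon}{1-\epsilon}\right)^{y_i} \left(\frac{\epsilon p_\phi + (1-\epsilon)p_\theta}{\epsilon + (1-\epsilon)p_\phi}\right)^{L_i} \left(\frac{\epsilon p_\theta + (1-\epsilon)p_\phi}{\epsilon p_\phi + (1-\epsilon)}\right)^{H_i} e^{\frac{1}{2}N(1 - p_\theta)}\right]^{-1}, \qquad (15)$$
where $H_i$ is the number of friends with high signal, and $L_i$
is the number of friends with low signal.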
Lemma 4 presents a slightly different result compared with Lemma 3 in decomposing the contributions of high and low signals. When consumers form
ties endogenously, the probability of a favorable risk
assessment, $P_i(\theta, \phi)$ (or the corresponding $R_i(\theta, \phi)$), is
increasing in the number of high signals (i.e., $H_i$) for
any level of $p_\theta$. By contrast, $R_i(\theta, \phi)$ increases in the
number of friends with low signals (i.e., $L_i$) only if $p_\theta$
is sufficiently small,¹² and decreases in $L_i$ otherwise. In
other words, when $l$-types are very selective in forming
ties among themselves ($p_\theta$ low), in-group ties help
to achieve a more favorable assessment from the firm:
low types have fewer ties than high types, so a
large friendship circle becomes a conspicuous signal
suggesting that one is more likely to be an $h$-type. That
is why low-type signals can increase the perceived probability of being a high type, $P_i(\theta, \phi)$. Yet when low types are less
selective towards their own type, the negative signal
begins to dominate the positive impact of the size of the
social circle, and $P_i$ decreases in $L_i$.
12 Precisely, when $p_\theta < p_\phi + \frac{\epsilon}{1-\epsilon}(1 - p_\phi)$.
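The exponential factor in (15) is what makes network size informative. One way to see where it comes from, under the assumption that in the limit $S \to +\infty$, $\lambda \to 0$, $\lambda S = N$ the numbers of high- and low-signal friends are independent Poisson counts given $i$'s type, is to compute the posterior directly from those Poisson likelihoods and compare it with the closed form. The sketch below does this; the function names and parameter values are hypothetical.

```python
import math

def posterior_high_endog(y_own, H, L, eps, p_theta, p_phi, N):
    """Pr(x_i = h | y_i; theta, phi) from Equation (15)."""
    own = (eps / (1 - eps)) ** y_own
    low_ratio = (eps * p_phi + (1 - eps) * p_theta) / (eps + (1 - eps) * p_phi)
    high_ratio = (eps * p_theta + (1 - eps) * p_phi) / (eps * p_phi + (1 - eps))
    degree_term = math.exp(0.5 * N * (1 - p_theta))  # low types expect fewer friends
    return 1.0 / (1.0 + own * low_ratio ** L * high_ratio ** H * degree_term)

def posterior_by_bayes(y_own, H, L, eps, p_theta, p_phi, N):
    """Direct Bayes check, assuming H and L are independent Poisson given i's type."""
    def pois(k, m):
        return math.exp(-m) * m ** k / math.factorial(k)
    half = 0.5 * N
    # High type: N/2 expected h-friends (tie w.p. 1) and N/2 * p_phi l-friends.
    mean_H_h = half * ((1 - eps) + eps * p_phi)
    mean_L_h = half * (eps + (1 - eps) * p_phi)
    # Low type: N/2 * p_theta l-friends and N/2 * p_phi h-friends.
    mean_H_l = half * ((1 - eps) * p_phi + eps * p_theta)
    mean_L_l = half * (eps * p_phi + (1 - eps) * p_theta)
    own_h = 1 - eps if y_own == 1 else eps
    own_l = eps if y_own == 1 else 1 - eps
    lik_h = own_h * pois(H, mean_H_h) * pois(L, mean_L_h)
    lik_l = own_l * pois(H, mean_H_l) * pois(L, mean_L_l)
    return lik_h / (lik_h + lik_l)

print(posterior_high_endog(1, 4, 2, 0.2, 0.3, 0.1, 10.0),
      posterior_by_bayes(1, 4, 2, 0.2, 0.3, 0.1, 10.0))
```

Setting $p_\theta = 1$ and $p_\phi = p$ removes the degree term and recovers Lemma 3, and the sign of the effect of $L_i$ flips at the cutoff in footnote 12.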
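We now turn to the impact of how selective low types
are in forming ties among themselves, characterized
by the selection rule $\theta$. $R_i(\theta, \phi)$ is not always decreasing
in $L_i$. In particular, we can define a value $\bar{\theta}(\phi)$ such
that the expected effect of an additional low-type friend
on $R_i(\theta, \phi)$ is positive iff $\theta > \bar{\theta}(\phi)$. Because such a friend sends a low signal with probability $1 - \epsilon$ and a high signal with probability $\epsilon$, $\bar{\theta}$ can formally be
defined by
$$\left(\frac{\epsilon p_\phi + (1-\epsilon)p_{\bar{\theta}}}{\epsilon + (1-\epsilon)p_\phi}\right)^{1-\epsilon} \left(\frac{\epsilon p_{\bar{\theta}} + (1-\epsilon)p_\phi}{\epsilon p_\phi + (1-\epsilon)}\right)^{\epsilon} = 1.$$
It can be easily shown that $0 < \bar{\theta}(\phi) < \phi$. We show,
in detail, how a consumer’s odds of a favorable risk
assessment vary with respect to the selectivity of $l$-types
in Lemma 5.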
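The defining equation for $\bar{\theta}(\phi)$ has no closed form in general, but its left-hand side is monotone in $\theta$, so a bisection finds it. The paper leaves $f$ general; this sketch assumes, purely for illustration, $m_{ij} \sim \mathrm{Exp}(1)$, so that $p_\theta = e^{-\theta}$. The function names are hypothetical.

```python
import math

def exp_survival(t):
    """Hypothetical match-value distribution: m_ij ~ Exp(1), so Pr(m_ij > t) = exp(-t)."""
    return math.exp(-t)

def low_friend_index(theta, phi, eps, survival=exp_survival):
    """Log of the left-hand side defining theta_bar. One more low-type friend changes
    R_i by minus this amount in expectation, so the effect is positive iff it is < 0."""
    p_t, p_f = survival(theta), survival(phi)
    low_ratio = (eps * p_f + (1 - eps) * p_t) / (eps + (1 - eps) * p_f)
    high_ratio = (eps * p_t + (1 - eps) * p_f) / (eps * p_f + (1 - eps))
    return (1 - eps) * math.log(low_ratio) + eps * math.log(high_ratio)

def theta_bar(phi, eps, survival=exp_survival, tol=1e-10):
    """Root of low_friend_index on (0, phi) by bisection: the index is positive at
    theta = 0, negative at theta = phi, and decreasing in theta."""
    lo, hi = 0.0, phi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if low_friend_index(mid, phi, eps, survival) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(theta_bar(phi=5.0, eps=0.2), 3))
```

The bracket $(0, \phi)$ is valid because at $\theta = 0$ the index equals $(1-2\epsilon)$ times the log of a ratio above 1, while at $\theta = \phi$ both ratios are below 1, which is the argument behind $0 < \bar{\theta}(\phi) < \phi$.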
Lemma 5. The expected log odds for a low type under a
common tie formation criterion $(\theta, \phi)$, $\mathbb{E}[R_i(\theta, \phi) \mid l; \theta, \phi]$,
is strictly quasi-concave in $\theta$ and achieves its maximum at
$\bar{\theta}(\phi)$. Furthermore, $0 < \bar{\theta}(\phi) < \phi$, i.e., the selectivity among
low types that results in the most favorable risk assessment
for a low type is lower than the selectivity of high types
towards the low types.
Figure 3 Expected Log Odds of Repayment vs. Selectivity [plot: horizontal axis $\theta$, vertical axis $\mathbb{E}(R_i(\theta, \phi) \mid l; \theta, \phi)$]
Note. $\epsilon = 0.2$, $\phi = 5$; $f$ with parameters $(3, 2)$.
Figure 3 plots a numerical example for the expected
log odds of repayment as a function of $\theta$. Note that
very high or very low levels of within-group selectivity
result in lower expected odds, whereas medium levels
of selectivity among low types yield the most favorable
risk assessment for them. The inverse U-curve rela-
tionship stems from two competing forces that shape
low-type borrowers’ chances of receiving a loan. As
the level of selectivity begins to increase from zero, the
expected assessment initially improves. Consumers
benefit from disassociating themselves from l-types,
thus improving the appearance of being an h-type.
As selectivity increases further, however, a second
and competing effect starts to dominate: Consumers’
ego-networks begin to shrink extensively. Recall that
the size of a borrower’s network becomes a conspicu-
ous signal of her type when consumers can form ties
endogenously. Extreme selectivity leads to a smaller
number of ties and so reveals the true low type of a
borrower, thus reducing her chances of a favorable
credit assessment.
Lemma 6. The expected log odds for a low type is strictly
decreasing in $\phi$ for $\theta < \phi$. Higher levels of selectivity of high
types towards low types reduce the chances of a favorable
assessment for low types.
The lemma states that, unlike the within-group exclu-
sion that helps low types to some degree, between-type
exclusion strictly reduces their chances of improving
their financial outcomes. As high types exclude lower
types from their networks, the latter’s chances of a
favorable assessment from the firm decrease, resulting
in further hardship for this segment.
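Because $R_i$ in (15) is linear in $y_i$, $H_i$, and $L_i$, its expectation for a low type only needs their conditional means: $\mathbb{E}[y_i \mid l] = 2\epsilon - 1$, $\mathbb{E}[L_i \mid l] = \frac{1}{2}N(\epsilon p_\phi + (1-\epsilon)p_\theta)$, and $\mathbb{E}[H_i \mid l] = \frac{1}{2}N(\epsilon p_\theta + (1-\epsilon)p_\phi)$. The sketch below uses this to trace the inverse U of Lemma 5 and the monotonicity of Lemma 6. As before, $m_{ij} \sim \mathrm{Exp}(1)$ and the values of $\epsilon$, $\phi$, $N$ are illustrative assumptions, not the settings behind Figure 3.

```python
import math

def expected_log_odds_low(theta, phi, eps, N, survival=lambda t: math.exp(-t)):
    """E[R_i | l; theta, phi] from Eq. (15), using the conditional means of y_i, H_i, L_i."""
    p_t, p_f = survival(theta), survival(phi)
    low_ratio = (eps * p_f + (1 - eps) * p_t) / (eps + (1 - eps) * p_f)
    high_ratio = (eps * p_t + (1 - eps) * p_f) / (eps * p_f + (1 - eps))
    mean_y = eps - (1 - eps)                                # E[y_i | l] = 2 eps - 1
    mean_L = 0.5 * N * (eps * p_f + (1 - eps) * p_t)        # low-signal friends
    mean_H = 0.5 * N * (eps * p_t + (1 - eps) * p_f)        # high-signal friends
    log_odds_low = (mean_y * math.log(eps / (1 - eps)) + mean_L * math.log(low_ratio)
                    + mean_H * math.log(high_ratio) + 0.5 * N * (1 - p_t))
    return -log_odds_low                                    # R_i = log odds of repayment

eps, phi, N = 0.2, 5.0, 10.0                                # hypothetical values
grid = [0.01 * k for k in range(1, 500)]                    # theta in (0, phi)
values = [expected_log_odds_low(t, phi, eps, N) for t in grid]
best = grid[values.index(max(values))]
print(round(best, 2))
```

The derivative of this expression with respect to $p_\theta$ is $-\frac{1}{2}N$ times the index that defines $\bar{\theta}(\phi)$, which is why the peak of the curve sits at $\bar{\theta}(\phi)$.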
We will look for a symmetric equilibrium where
no consumers have an ex ante incentive to deviate, and
the company’s posterior is consistent with their equilibrium behaviors. More precisely, $(\theta^*, \phi^*)$ is a symmetric equilibrium if, for all $i$, $\mathbb{E}(U_i \mid l; \theta^*, \phi^*, \theta_i)$ (or
$\mathbb{E}(U_i \mid h; \theta^*, \phi^*, \phi_i)$, depending on $i$’s type) is maximized
by $\theta_i = \theta^*$ (or $\phi_i = \phi^*$). While ensuring that there will
be no unilateral deviation, a Nash equilibrium in social
networks does not necessarily rule out mutual improvement in the utility of consumers. For example, a very
high acceptance threshold such as $\theta^* = \infty$ can always
be part of an equilibrium because if no $l$-type accepts
another $l$-type, an $l$-type would have no incentive for
unilaterally deviating from this threshold. We remove
“unintuitive” equilibria similar to the one described
from consideration. Formally, we will not consider
tie formation criteria $(\theta^*, \phi^*)$ an equilibrium if there
is another profile $(\theta^{**}, \phi^*)$ such that (i) low types are
better off, and (ii) given that high types choose $\phi^*$ and
every other low type chooses $\theta^{**}$, a low type is willing
to set her criterion to $\theta^{**}$ as well. Similarly, we do not
consider $(\theta^*, \phi^*)$ an equilibrium if there is a profile
$(\theta^*, \phi^{**})$ with analogous properties.
Note that from Lemma 5, $\theta^* < \phi^*$
should hold for any equilibrium. In words, when both low and high types
need financing, regardless of how dire the needs of
the low types are (i.e., independent of the value of $\gamma_l$),
low types will face within and between group exclusion.
More important, since high types are more successful
in tie formation, they can afford to be selective in
forming friendships. The low types, by contrast, cannot
be picky choosers: If they set the friendship threshold
too high, they find themselves on the downhill side of
the expected log odds curve (Figure 3). They would
achieve a higher score and higher social utility by
being less selective. As a result, the within-group
discrimination against low types is always lower than
the between-group discrimination against them. This
result is formally stated in Proposition 4.
Proposition 4. Suppose $\gamma_h, \gamma_l > 0$. In any symmetric
equilibrium $(\theta^*, \phi^*)$, we have $0 < \theta^* < \bar{\theta}(\phi^*)$ and $\phi^* > 1$,
i.e., when both types gain utility from improving their credit
scores, the within-group discrimination among low types is
always lower than the between-group discrimination against
them.
In summary, two forces determine whether the network-based
score in equilibrium becomes more or less diagnostic for
detecting a low type. Compared with the scenario
before people react, higher exclusion among low types
makes social network-based scoring less powerful, by
Lemma 5. By contrast, higher levels of exclusion of low
types by high types increase the accuracy of the scores,
by Lemma 6.
4.2. Special Case: Lower Financing Needs for
High Types
Until now, we have focused on an environment where
the high types need financing. In reality, it is often the
case that the need for financing (i.e., obtaining credit
or a loan) is markedly more severe for low types. To
address this possibility, we provide the outcomes from
the special case when $\gamma_h = 0$. Note that, by continuity,
this implies that similar results would hold if $\gamma_h$ is
a very small positive number. Note also that when
$\gamma_h = 0$, $\phi$ is no longer material, and high types form a
tie with low types only when $m_{ij} > 1$ (i.e., $\phi = 1$).
Proposition 5. When $\gamma_h = 0$, there exists a unique
equilibrium among low types such that $0 < \theta^* < \bar{\theta}(1)$. When
only low types gain utility from improving their credit
scores, there exists within-group discrimination among low
types in equilibrium. This discrimination level is lower than
the preference of high types to avoid forming relationships
with low types due to mere homophily.
Proposition 5 suggests that when high types put no
or very little weight on access to financing, they may
reject many social ties with low types due to homophily.
In addition, due to financial concerns, l-type consumers
are systematically excluded even from the networks
of others similar to them. Put differently, existing
financial inequality breeds within-group discrimination
and social isolation among those of lower type and
greater need.
4.3. Explicit Discrimination Against Low Types
We have shown how strategic discrimination against
low types may emerge endogenously even in the
presence of nonstrategic homophily among low types.
To extend the discussion on discrimination, we analyze
an environment with exogenous discrimination against
l-types. To formally express such discrimination, we
construct the utility for $i$ of becoming friends with $j$ in
a manner similar to but different from the specification
in Equation (12):
$$m_{ij} - \mathbb{1}\{x_j = l\}.$$
Keeping the discrete matching formulation with this
slight modification, the probability that two $h$-type
individuals will become friends conditional on meeting
is 1 and the probability that any other pair of types
will become friends is $p_1 \equiv \Pr(m_{ij} > 1) < 1$. Social utility
is penalized whenever one becomes friends with an
individual who is an $l$-type.
Parallel to Lemma 3, the following lemma gives the
posterior before consumers strategically form their
social ties to obtain better network-based scores. Note
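that mathematically the following lemma is a special
case of Lemma 4 where $p_\theta = p_\phi = p_1$.
Lemma 7. Let $p_1 \equiv \Pr(m_{ij} > 1)$ be the probability of
formation of a tie with at least one low type. Then,
$$\Pr(x_i = h \mid \mathbf{y}_i) = \left[1 + \left(\frac{\epsilon}{1-\epsilon}\right)^{y_i} \left(\frac{p_1}{\epsilon + (1-\epsilon)p_1}\right)^{L_i} \left(\frac{p_1}{\epsilon p_1 + (1-\epsilon)}\right)^{H_i} e^{\frac{1}{2}N(1 - p_1)}\right]^{-1}, \qquad (16)$$
where $H_i$ is the number of friends with a high signal, and $L_i$
is the number of friends with a low signal.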
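The lemma says that having a friend with a low
signal actually improves one’s score: because $p_1 < \epsilon + (1-\epsilon)p_1$, each low signal multiplies the odds of being a low type by a factor below 1. When explicit discrimination is present, the expected number of friends
varies by type: For a high type, the expected
degree is $\frac{1}{2}\lambda S(1 + p_1)$, whereas for a low type it is $\lambda S p_1$.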
Similar to the endogenous rise of discrimination, a
larger social network is a conspicuous signal. A con-
sumer with a larger network emits a stronger signal
that she is a high type. Because in expectation low
types have a smaller social circle, any tie becomes a
signal of being high type.
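The sketch below evaluates Equation (16) and the expected degrees under explicit discrimination, written with $\lambda S = N$. It shows that under this specification every additional friend, whatever her signal, raises the posterior of being a high type. Function names and parameter values are hypothetical.

```python
import math

def posterior_high_discrimination(y_own, H, L, eps, p1, N):
    """Pr(x_i = h | y_i) from Equation (16): Lemma 4 with p_theta = p_phi = p1."""
    own = (eps / (1 - eps)) ** y_own
    low_ratio = p1 / (eps + (1 - eps) * p1)      # < 1, so low-signal friends also help
    high_ratio = p1 / (eps * p1 + (1 - eps))     # < 1
    degree_term = math.exp(0.5 * N * (1 - p1))
    return 1.0 / (1.0 + own * low_ratio ** L * high_ratio ** H * degree_term)

def expected_degrees(p1, N):
    """Expected number of friends for (high type, low type): (N (1 + p1) / 2, N p1)."""
    return 0.5 * N * (1 + p1), N * p1

eps, p1, N = 0.2, 0.4, 10.0                      # hypothetical values
print(expected_degrees(p1, N))
print([round(posterior_high_discrimination(1, 2, L, eps, p1, N), 3) for L in range(6)])
```

Since $\frac{1}{2}(1 + p_1) > p_1$ whenever $p_1 < 1$, high types have the larger expected circle, which is what turns any tie into evidence of being a high type.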
What happens when both exogenous discrimination
and endogenous tie formation are at work? Lemma 7
implies that consumers will be less selective towards
low types in an attempt to obtain better scores. Similar
to the thresholds we defined for the homophily case,
we let low types choose a criterion $\theta_i \le 1$ towards their
same type, and let high types choose $\phi_i \le 1$ towards
low types. High types continue to form ties with
probability 1 upon meeting. A tie between two different
types forms only when the high type accepts the low
type. It is not difficult to see that Lemmas 5 and 6 can
be stated here without change. Furthermore, a result
can be derived that corresponds to Proposition 4.
Proposition 6. When low types are exogenously discriminated against and $\gamma_l, \gamma_h > 0$, in a symmetric equilibrium, $\bar{\theta}(\phi^*) < \theta^* < 1$ and $\phi^* < 1$.
5. Effort to Become a High Type
Our results thus far have relied on the assumption
that consumers are endowed with types that cannot be
changed. In other words, we assumed that there is no
social mobility. Although some type indicators (e.g.,
family, race, birthplace, country of origin) cannot be
altered, other potential indicators, such as occupation or
financial discipline, can be improved if low types exert
effort (e.g., by investing in education). In this section,
we extend our discussion to allow for this possibility.
An array of factors may force l-type consumers to exert
effort, but we will focus on factors endogenous to tie
formation such as the reduction of borrowing costs
and the threat of social exclusion.
We model the mechanism in the following fashion.
Consider a friends network $G$ among $l$- and $h$-type
consumers. Let $G_l$ denote the subnetwork among the
low types. Furthermore, let $H_i$ denote the number of
$h$-type contacts of a low-type $i$, which are collectively
represented with the vector $\mathbf{H}$ for all of the low types.
Similarly, let $L_i$ denote the number of $l$-type contacts of
a low-type $i$. Each low-type consumer may exert effort
$e_i \ge 0$ such that, with probability $e_i$, she will become a
high type. Effort therefore determines the future types
of one’s possible contacts. We assume that, given the
network and parameters of our model, $e_i \le 1$ for all
low-type $i$. High-type consumers exert zero effort and
remain high types.
The utility that a low-type individual $i$ derives from
exerting effort $e_i$ is composed of two parts:
$$U_i(\mathbf{e}, G) = \sum_{j:\, ij \in G} \big\{m_{ij} - \mathbb{1}\{x_j = l\}(1 - e_j)\big\} + u_i, \qquad (17)$$
where
$$u_i = a e_i - \frac{b e_i^2}{2} + \beta\Big(H_i + \sum_{j:\, ij \in G,\, x_j = l} e_j\Big) e_i. \qquad (18)$$
The term in curly brackets in Equation (17) captures
consumer $i$’s expected social utility under the assumption of explicit discrimination (§4.3) and exertion of
own and friends’ effort. Given the effort of a friend $e_j$,
there is a $1 - e_j$ probability that $j$ will remain a low type,
in which case $i$’s utility from forming ties with $j$ will
be discounted by a unit normalized to 1.
The term $u_i$ expresses the nonsocial benefits and costs
of exerting effort. First, the term $a e_i$ captures the expected
intrinsic benefits of becoming a high type. Second, the
cost of effort is captured by $b e_i^2/2$, whose marginal cost $b e_i$
is increasing in effort. Third, under social network-based scoring, a (potential) high-type friend $j$ has a
positive effect on $i$’s credit score and thus reduces $i$’s
financing burden. We formally express this network
effect by allowing the marginal cost of effort for $i$ to
decrease in the number of high-type friends she has and
in the efforts of her low-type friends to become high
types, at rate $\beta > 0$. Alternatively, $\beta\big(H_i + \sum_{j:\, ij \in G,\, x_j = l} e_j\big)e_i$
can be thought of as an interaction term, representing
how the return to one’s own effort ($e_i$) is expected to
be amplified by the number of friends one expects will
be considered high type. Some investors, for instance,
may prefer friends who are also invited to participate
in exclusive investment opportunities (Bursztyn et al.
2014). In a very different setting, one is likely to gain
admission to an exclusive bar or dance club if oneself
and the rest of one’s party are attractively dressed.
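A minimal Python sketch of the complementarity built into Equation (18) follows. It is an illustration under assumed values (a = 1, b = 5, δ = 1/5, no high-type friends, a single low-type friend), not the authors' code, and the function names are hypothetical. Because the curly-bracket term in (17) does not involve $e_i$, consumer $i$'s best response maximizes $u_i$ alone; the first-order condition $a - b e_i + b\delta(H_i + \sum_j e_j) = 0$ gives $e_i = a/b + \delta(H_i + \sum_j e_j)$, and a grid search over $[0, 1]$ confirms it.

```python
import numpy as np

def u_nonsocial(e_i, friends_low_effort, H_i, a, b, delta):
    """Nonsocial utility from Eq. (18):
    u_i = a*e_i - b*(e_i/2 - delta*(H_i + sum of low-type friends' efforts))*e_i."""
    network_term = H_i + np.sum(friends_low_effort)
    return a * e_i - b * (e_i / 2 - delta * network_term) * e_i

def best_response_grid(friends_low_effort, H_i, a, b, delta, n=10001):
    """Maximize u_i over a grid of efforts in [0, 1]; the social term in (17)
    does not depend on e_i, so it can be ignored for this choice."""
    grid = np.linspace(0.0, 1.0, n)
    values = u_nonsocial(grid, friends_low_effort, H_i, a, b, delta)
    return grid[np.argmax(values)]

def best_response_foc(friends_low_effort, H_i, a, b, delta):
    """Closed form from a - b*e_i + b*delta*(H_i + sum e_j) = 0."""
    return a / b + delta * (H_i + np.sum(friends_low_effort))

a, b, delta, H_i = 1.0, 5.0, 0.2, 0  # hypothetical illustrative values
for e_friend in (0.0, 0.5):
    grid_br = best_response_grid([e_friend], H_i, a, b, delta)
    foc_br = best_response_foc([e_friend], H_i, a, b, delta)
    print(f"friend effort {e_friend:.1f}: best response {grid_br:.4f} (FOC {foc_br:.4f})")
```

Raising the friend's effort from 0 to 0.5 raises $i$'s best response from 0.2 to 0.3; this upward-sloping best response is what makes efforts strategic complements.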
It is important to make two notes here. First, the functional form of $u_i$ is a reduced-form way to capture complementarity between one's effort and the effort of her friends. It is possible to derive this form of complementarity from the results provided in the earlier sections. (In the online appendix (available as supplemental material at http://dx.doi.org/10.1287/mksc.2015.0949), we offer a more detailed description of how Equation (18) can be derived through this route.) As demonstrated in §4, under network-based credit scoring with nonzero financing needs for both types, low types face within-group and between-group discrimination. Under such pressure, $l$-type consumers would exert effort to increase their social and credit-scoring utility from friendships. The benefits of exerting effort depend on the expected numbers of low- and high-type friends.
Second, it is possible to consider alternative specifications of social utility. For instance, we could also investigate an environment with pure homophily instead of discrimination, in which case Equation (17) would be replaced with

$$U_i(e, G) = \sum_{j:\, ij \in G} \Big\{ m_{ij} - e_i\,\mathbb{1}\{x_j = l\}(1 - e_j) - (1 - e_i)\big[\mathbb{1}\{x_j = h\} + \mathbb{1}\{x_j = l\}\, e_j\big] \Big\} + u_i. \qquad (19)$$
In an environment with homophily, consumer $i$ will become a high type with probability $e_i$, in which case there will be a disutility from a tie with consumer $j$ who, after exerting effort $e_j$, remains a low type (which happens with probability $1 - e_j$). With probability $1 - e_i$, consumer $i$ will remain a low type, in which case she will face a disutility from ties with high types (including low types who become high types after exerting effort $e_j$).
Next, given the utility form in (18), we derive the optimal effort level in a given network.
5.1. Effort in an Exogenous Network
We are interested in the Nash equilibrium under which consumers simultaneously choose their efforts when the network is exogenously given. Proposition 7 summarizes the optimal level of effort for a consumer conditional on her social network, following Ballester et al. (2006).
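Proposition 7. Let $A_l$ be the sociomatrix (i.e., the adjacency matrix) of $G_l$.

(i) Under a discriminating social utility, if the largest-magnitude eigenvalue of $A_l$ is smaller than $1/\delta$, then the equilibrium effort is

$$e^* = (I - \delta A_l)^{-1}\big(a b^{-1}\mathbf{1} + \delta H\big) = \big(I + \delta A_l + \delta^2 A_l^2 + \cdots\big)\big(a b^{-1}\mathbf{1} + \delta H\big). \qquad (20)$$

(ii) Under a homophilic social utility, if the largest-magnitude eigenvalue of $A_l$ is smaller than $(2b^{-1} + \delta)^{-1}$, the equilibrium effort is

$$e^* = \big[I - (2b^{-1} + \delta) A_l\big]^{-1}\big[(a\mathbf{1} + H - L)\, b^{-1} + \delta H\big] = \big[I + (2b^{-1} + \delta) A_l + (2b^{-1} + \delta)^2 A_l^2 + \cdots\big]\big[(a\mathbf{1} + H - L)\, b^{-1} + \delta H\big]. \qquad (21)$$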
Proposition 7 states that the effort exerted by consumers to improve their score depends on several factors. A discriminatory environment and an environment with homophily differ in the role of the low types in inducing effort. In both environments, a consumer with a higher number of high-type friends is likely to exert more effort, as her overall cost of borrowing is lower. In an environment with discrimination, if two $l$-type consumers are connected to the same number of $h$-type friends, the one with a higher number of $l$-type friends is incentivized to exert more effort. This is perhaps surprising: sufficiently high within-group connectivity can be a stronger motivator of effort. By contrast, under homophily, a larger share of low-type friends can reduce effort, because a consumer with many low-type friends enjoys higher social utility if she exerts little effort and remains a low type.
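The closed forms (20) and (21) reduce to linear systems that are easy to check numerically. The Python sketch below is a minimal illustration, not the authors' implementation: it assumes a hypothetical star network among four low types (consumer 0 at the center), one high-type friend each, and illustrative values a = 1, b = 10, δ = 0.1, which satisfy both spectral conditions. It solves each system, verifies that the solution sets every consumer's first-order condition to zero, and prints the center and peripheral efforts.

```python
import numpy as np

def equilibrium_effort(A_l, H, a, b, delta, regime="discrimination"):
    """Closed-form Nash efforts of low types from Proposition 7.

    A_l    : symmetric 0/1 adjacency matrix among low types
    H      : number of high-type friends of each low type
    regime : "discrimination" (Eq. 20) or "homophily" (Eq. 21)
    """
    A_l = np.asarray(A_l, dtype=float)
    H = np.asarray(H, dtype=float)
    n = A_l.shape[0]
    L = A_l.sum(axis=1)  # number of low-type friends of each low type
    if regime == "discrimination":
        phi = delta
        rhs = a / b + delta * H
    elif regime == "homophily":
        phi = 2.0 / b + delta
        rhs = (a + H - L) / b + delta * H
    else:
        raise ValueError(f"unknown regime: {regime}")
    # Proposition 7 requires the largest-magnitude eigenvalue of A_l below 1/phi.
    if np.max(np.abs(np.linalg.eigvalsh(A_l))) >= 1.0 / phi:
        raise ValueError("spectral condition of Proposition 7 is violated")
    return np.linalg.solve(np.eye(n) - phi * A_l, rhs)

def foc_residual(e, A_l, H, a, b, delta, regime="discrimination"):
    """dU_i/de_i at effort profile e; zero at an interior equilibrium."""
    A_l = np.asarray(A_l, dtype=float)
    H = np.asarray(H, dtype=float)
    Ae = A_l @ e  # sum of low-type friends' efforts
    r = a - b * e + b * delta * (H + Ae)  # derivative of u_i in Eq. (18)
    if regime == "homophily":
        r = r + H - A_l.sum(axis=1) + 2 * Ae  # extra terms from Eq. (19)
    return r

# Hypothetical star: low type 0 is linked to low types 1, 2, 3; each has one high-type friend.
A = np.zeros((4, 4))
A[0, 1:] = 1
A[1:, 0] = 1
H = np.ones(4)
a, b, delta = 1.0, 10.0, 0.1

for regime in ("discrimination", "homophily"):
    e = equilibrium_effort(A, H, a, b, delta, regime)
    assert np.allclose(foc_residual(e, A, H, a, b, delta, regime), 0.0)
    print(regime, "center:", round(e[0], 4), "periphery:", round(e[1], 4))
```

With these values the center exerts more effort than the periphery under discrimination and less under homophily, in line with the contrast described above. The homophily ranking depends on the parameters, so other values need not reproduce it.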
The expression for the equilibrium level of effort given in Equations (20) and (21) is a form of Bonacich centrality. The effort exerted by an agent to improve her credit score is proportional to her Bonacich centrality measure, which is the "summed connections to others, weighted by their centralities" (Bonacich 1987, p. 1172). With a discriminating social utility, a consumer who is at the center of a social network is likely to be exposed to stronger positive network effects and may therefore exert greater effort. As a result, consumers who are more central in the network are more prone to social mobility when there are complementarities. In an environment with pure homophily, two conflicting forces shape the relationship between centrality and social mobility. First, being central in a network of high types and of low types who exert effort can increase a consumer's chances of social mobility. Second, if a low-type consumer is central among other low types who exert little effort, she will reduce her effort to "fit in" with her network and enhance her social utility. Therefore, when tie formation is based on homophily, central low types may exert low effort, leading to permanent low-class membership and financial hardship.
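The series form in (20) makes the link to Bonacich centrality explicit: the $k$-th term of $(I + \delta A_l + \delta^2 A_l^2 + \cdots)$ counts walks of length $k$ that start at consumer $i$, discounted by $\delta^k$. The Python sketch below is a hypothetical illustration with assumed values (a star of four low types with consumer 0 at the center, a = 1, b = 10, δ = 0.1, no high-type friends); it sums the series until the terms vanish and compares the result with the matrix inverse.

```python
import numpy as np

def walk_sum_effort(A_l, w, phi, tol=1e-14, max_terms=1000):
    """Truncated series sum_k phi^k A_l^k w (a weighted Bonacich centrality).

    The k-th term counts walks of length k starting at each consumer,
    discounted by phi^k and weighted by w at the walk's end point.
    """
    A_l = np.asarray(A_l, dtype=float)
    term = np.asarray(w, dtype=float).copy()
    total = term.copy()
    for _ in range(max_terms):
        term = phi * (A_l @ term)  # walks that are one step longer
        total += term
        if np.max(np.abs(term)) < tol:
            break
    return total

# Hypothetical star of four low types (consumer 0 at the center), no high-type friends.
A = np.zeros((4, 4))
A[0, 1:] = 1
A[1:, 0] = 1
a, b, delta = 1.0, 10.0, 0.1
H = np.zeros(4)
w = a / b + delta * H  # the vector a b^{-1} 1 + delta H in Eq. (20)

series = walk_sum_effort(A, w, delta)
closed = np.linalg.solve(np.eye(4) - delta * A, w)
print("series:", np.round(series, 6))
print("closed:", np.round(closed, 6))
```

Because the center is reached by more walks than any peripheral consumer, it has the largest centrality and, under discrimination, the highest equilibrium effort.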
5.2. Effort with Endogenous Network Formation Among Low Types Under Discriminating Utility
As we have specified in (17) and (19), the friendship utility that a friend of $i$ derives depends on the effort that $i$ will exert. Hence the effort of $i$ plays an important role in her friends' network formation. Moreover, in the last section, we saw that $i$'s effort depends on her position in the network. This mutual dependence between network position and effort suggests the possibility of multiple stable outcomes. With discriminating social utility, for example, people in one society may exert low effort and, as a result, become sparsely connected, which in turn gives them little incentive to exert effort. Conversely, people in another society may exert high effort and thus become more densely connected, reinforcing their high-effort behavior.
To further explore how effort mitigates the likelihood of exclusion, we consider a two-stage game under the discrimination environment. In the first stage, consumers choose friends, and friendships are formed bilaterally. In the second stage, consumers exert effort. Let $e^*(G)$ be the Nash effort for a given network $G$, which is characterized in Proposition 7. The first-stage reduced-form utility for $i$ depends on $G$ only:

$$U_i(G) \equiv U_i\big(e^*(G), G\big).$$

We look for pairwise-stable networks $G$ under $U$. $G$ is pairwise stable if (i) for any $ij \in G$, we have both $U_i(G) > U_i(G - ij)$ and $U_j(G) > U_j(G - ij)$; and (ii) for any $ij \notin G$, $U_i(G) \geq U_i(G + ij)$ or $U_j(G) \geq U_j(G + ij)$. Example 3 provides an application of different stability outcomes in equilibrium.
Example 3. Consider a society with four low-type consumers and explicit discrimination, and assume that $a = 1$, $b = 5$, and $m_{ij} = 1/2$ for all $i, j$. Let $\delta = 1/5$. It can be easily verified that the empty network and the complete network are pairwise stable. For the empty network, each person exerts effort $1/5$ and obtains utility $1/10$. For the complete network, each person exerts effort $1/2$ and has utility $5/8$.
The example demonstrates that the empty network is pairwise stable because everyone exerts very low effort. A single link between a pair would not raise the partners' efforts enough: the disutility of friendship with a low type (which is normalized to 1) prevents any pair from becoming friends. Moreover, the complete network is pairwise stable because everyone exerts relatively high effort. The effort reduces the disutility of friendship between low types; the friendship utility between any pair is exactly zero. Breaking any one link increases the costs of effort for the pair; thus, they will decrease their efforts. This leads to higher costs for their friends, and eventually everyone's effort will decrease. As a result, everyone receives less utility from the friendship and effort.
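The claims in Example 3 can be checked mechanically. The Python sketch below is a hypothetical check rather than the authors' procedure: it computes second-stage Nash efforts from Eq. (20) with $H = 0$, evaluates the reduced-form utilities implied by (17) and (18), and tests pairwise stability link by link with the Example 3 values ($a = 1$, $b = 5$, $m_{ij} = 1/2$, $\delta = 1/5$). It assumes the weak inequality $\geq$ in condition (ii) of the stability definition. It also shows that a network with a single link is not pairwise stable: each partner's effort rises only to 1/4, and her utility falls from 1/10 to −3/32.

```python
import itertools
import numpy as np

A_PARAM, B_PARAM, DELTA, M = 1.0, 5.0, 0.2, 0.5  # Example 3 values (a, b, delta, m_ij)

def nash_effort(A):
    """Second-stage Nash effort of low types with no high-type friends (Eq. 20, H = 0)."""
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) - DELTA * A, (A_PARAM / B_PARAM) * np.ones(n))

def utilities(A):
    """Reduced-form utilities U_i(G) = U_i(e*(G), G) from Eqs. (17)-(18); all consumers are low types."""
    e = nash_effort(A)
    social = A @ (M - (1.0 - e))  # sum over friends j of m_ij - (1 - e_j)
    u = A_PARAM * e - B_PARAM * (e / 2 - DELTA * (A @ e)) * e
    return social + u

def toggled(A, i, j):
    """Copy of A with link ij added if absent or removed if present."""
    B = A.copy()
    B[i, j] = B[j, i] = 1 - A[i, j]
    return B

def is_pairwise_stable(A, tol=1e-12):
    U = utilities(A)
    for i, j in itertools.combinations(range(A.shape[0]), 2):
        U_alt = utilities(toggled(A, i, j))
        if A[i, j]:  # existing link: both partners must strictly prefer keeping it
            if not (U[i] > U_alt[i] + tol and U[j] > U_alt[j] + tol):
                return False
        else:  # missing link: at least one side must weakly prefer not adding it
            if not (U[i] >= U_alt[i] - tol or U[j] >= U_alt[j] - tol):
                return False
    return True

empty = np.zeros((4, 4))
complete = np.ones((4, 4)) - np.eye(4)
for name, G in (("empty", empty), ("complete", complete)):
    print(name, nash_effort(G), utilities(G), is_pairwise_stable(G))
```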
Overall, the example suggests that the network struc-
ture in different societies may facilitate social pressure
to exert effort at different rates. In particular, in societies where network structure is sparse, social pressure
is expected to be less effective and social mobility
may remain limited. By contrast, in denser societies,
social pressure can be more effective, motivating higher
levels of social mobility. The difference suggests that
network-based scoring practices are expected to reach
different levels of success in different societies, and
that the performance is conditional on the network
structure of society.
6. Extensions
6.1. Uncertainty About Friends’ Types
In our main model, the underlying assumption was that upon meeting, consumers learn each other's types with certainty. In reality, types may be observed with some noise. Consider the case wherein consumers meet others but observe their types imperfectly. Let consumer $i$ observe a signal of $x_j$ upon meeting with $j$, which is correct with probability $1 - \varepsilon$, with $0 < \varepsilon < 1/2$. This implies that the added utility from homophily depends on how the uncertainty about the other's type is resolved: expected social utility is $m_{ij} - \varepsilon$ if the signal is the same as one's own type, and $m_{ij} - 1 + \varepsilon$ otherwise. Accordingly, the probabilities $p_\varepsilon \equiv P(m_{ij} > \varepsilon)$ and $p_{1-\varepsilon} \equiv P(m_{ij} > 1 - \varepsilon)$ define how likely two consumers are to become friends upon meeting.
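Compared with the benchmark model, the added uncertainty implies that ties will be less informative for the firm in predicting a consumer's type. To see this, first note that a pair forms a tie at the lower threshold $\varepsilon$ only when each partner's signal indicates the same type as her own; otherwise the binding threshold is $1 - \varepsilon$. Under this formulation, the probability that two consumers of the same type will form a tie upon meeting (both signals correct) is

$$q_s \equiv (1 - \varepsilon)^2 p_\varepsilon + \big[1 - (1 - \varepsilon)^2\big] p_{1-\varepsilon}, \qquad (22)$$

and the probability that two consumers of opposite types will form a tie (both signals wrong) is

$$q_d \equiv \varepsilon^2 p_\varepsilon + (1 - \varepsilon^2)\, p_{1-\varepsilon}. \qquad (23)$$

Using these probabilities, we can formulate how the firm will assess the probability that a borrower's type is high, as given in Lemma 8.¹³

¹³ The derivation of Lemma 8 follows the derivation of Lemma 4.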
Lemma 8.
When consumers learn about each others’
types with uncertainty,
Pr4xi=hyi5=1+
1yiqd+415qs
qs+415qdLi
·qs+415qd
qd+415qsHi1
1
where $H_i$ is the number of friends with a high signal, and $L_i$ is the number of friends with a low signal.
We are interested in how noise in detecting each other's true types in social relationships may influence the firm's ability to rely on social credit scores. We compare Lemma 8 with Lemma 3. Since
p
1
< q
d
< q
s
<1, 1 < 4q
d
+415q
s
5/4q
s
+415q
d
5 <
4p
1
+4155/4 +415p
1
5and 4 +415p
1
5/4p
1
+
4155 < 4q
s
+415q
d
5/4q
d
+415q
s
5 < 1. In words,
signals from contacts carry less weight in forming
beliefs about a consumer’s type when types cannot be
perfectly observed in friendship.
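The loss of informativeness can be illustrated with the tie-formation probabilities (22) and (23). The Python sketch below assumes, purely for illustration, that match values $m_{ij}$ are exponentially distributed with mean 1, so that $P(m_{ij} > t) = e^{-t}$; this distribution and the noise levels $\varepsilon$ (the probability of misreading the other's type) are hypothetical choices. It compares the likelihood ratio of a tie between same-type versus opposite-type consumers, $q_s/q_d$, with the noiseless benchmark $1/p_1$, in which same-type pairs always form a tie upon meeting and opposite-type pairs do so with probability $p_1 = P(m_{ij} > 1)$.

```python
import math

def survival(t):
    """P(m_ij > t) under a hypothetical Exponential(1) match-value distribution."""
    return math.exp(-max(t, 0.0))

def tie_probabilities(eps):
    """q_s and q_d from Eqs. (22)-(23) when each side misreads the other's type w.p. eps."""
    p_eps, p_1m = survival(eps), survival(1.0 - eps)
    q_s = (1 - eps) ** 2 * p_eps + (1 - (1 - eps) ** 2) * p_1m
    q_d = eps ** 2 * p_eps + (1 - eps ** 2) * p_1m
    return q_s, q_d

p_1 = survival(1.0)  # benchmark probability of a cross-type tie
print(f"benchmark ratio 1/p_1 = {1 / p_1:.3f}")
for eps in (0.1, 0.3):
    q_s, q_d = tie_probabilities(eps)
    print(f"eps = {eps}: q_s/q_d = {q_s / q_d:.3f}")
```

Setting $\varepsilon = 0$ recovers the benchmark ($q_s = 1$, $q_d = p_1$); as $\varepsilon$ grows, the ratio shrinks toward 1, so an observed tie says less about whether the two partners share a type.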
There are two observations related to this finding. First, the level of information sharing between consumers can change the appropriateness of a social network for credit scoring. For example, if an online network allows consumers to frequently communicate and exchange in-depth information, this may improve the efficiency of credit assessment by reducing the uncertainty about friends' types. Second, the ability of peers to observe each other's types may correlate with characteristics of the network, including tie strength. For example, the parameter $\varepsilon$ could reflect the strength of ties, which correlates with the ability to convey complex or subtle information (Van den Bulte and Wuyts 2007, pp. 71–72) and hence with one's ability to observe a friend's type. We discuss this in detail in §6.2.
6.2. Friendship Formation and Strength of Ties
In §6.1, we maintained the assumption that all relationships carry equal information and pointed out that the informativeness of a link may relate to tie strength. We adjust the earlier model slightly to extend that discussion.
Specifically, we assume that consumers can form weak and strong ties, and that they learn others' types with certainty only if they have strong ties with them. After meeting, a match value $m_{ij} > 0$ and the tie type are randomly determined. If the tie is strong, consumers obtain the utility $m_{ij} - \mathbb{1}\{x_i \neq x_j\}$ by forming a friendship. If the tie is weak, types remain unknown, and the social utility of forming a tie is $m_{ij}$ minus a constant parameter that captures the disutility from forming a weak tie.
Because weak ties do not carry information about
the type or the type difference between the ego and the
friend, a firm cannot use them to update its posterior
belief about a consumer’s type. Only the strong ties
will reveal information about a contact’s type and
become eligible for the firm to use to determine the
social score.
The general implication is straightforward. Because
strong ties are more homophilous than weak ties and
since they provide a greater ability to learn about one’s
contacts, the accuracy of social scoring increases with
the relative prevalence of strong versus weak ties.
6.3. Effort to Enhance Probability of Meeting High Types
In §5, the model was built such that low-type consumers exerted effort to climb the social ladder by improving their type. Under some circumstances, consumers cannot change their type but can exert effort to increase the probability of meeting high types. Networking is an example of such directed effort. In this section we explore this possibility, which also allows us to endogenize the probability of meeting between two consumers.
We use the settings of the discrete-type model in §4 and allow individual $i$ to choose an effort level $e_i$. Conditional on the effort exerted, the individual is likely to meet another person randomly with probability $(M/S)e_i$, where $M$ is a constant that calibrates the chance of meeting another person in proportion to the effort exerted in a society of size $S$. A meeting between $i$ and $j$ happens when either of the two "runs into" the other. Suppose a common effort $e$ is exerted by everyone but $i$. Then the expected number of meetings for $i$ becomes

$$(S - 1)\left[1 - \left(1 - e_i\frac{M}{S}\right)\left(1 - e\frac{M}{S}\right)\right] = \left(e_i + e - e_i e\frac{M}{S}\right)\frac{(S - 1)M}{S}.$$

When $S \to +\infty$, the expected number of meetings increases to $(e_i + e)M$.
First consider the scenario of exogenous tie formation, where an individual's utility depends only on the social utility from friendships. Recall that, upon meeting with $j$, $i$ always forms a tie if $j$ is of the same type, and forms a tie if and only if $m_{ij} > 1$ if $j$ is of the different type. Let $f$ be the density of the match value distribution. The expected social utility for $i$, given that a symmetric effort $e$ is used except for a possible deviation of $i$ to $e_i$, can be derived from

$$\mathbb{E}\Big[\sum_{j:\, ij \in G}\big(m_{ij} - \mathbb{1}\{x_i \neq x_j\}\big) \,\Big|\, x_i, e, e_i\Big] = (e_i + e)\,\frac{M}{2}\left(\int_0^{\infty} t f(t)\,dt + \int_1^{\infty} (t - 1) f(t)\,dt\right).$$

Let $\Delta$ denote the term in the last parentheses, and let $e_i^2/2$ be the cost of effort. The equilibrium effort is then given by

$$e^* = \frac{M\Delta}{2}. \qquad (24)$$
Under this common effort level, the firm’s posterior on
type is again given by (13) in Lemma 3.
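The meeting probability and Eq. (24) can be checked numerically. The Python sketch below assumes, as the expressions above do, that the two "run-into" events are independent and that half of the population is of each type; the Exponential(1) match-value density and the values of $M$, $S$, and the efforts are hypothetical choices. For that density the first integral equals 1 (the mean) and the second equals $e^{-1}$, so the term in the last parentheses (written $\Delta$ here) is $1 + e^{-1}$. The code approximates both integrals with a midpoint rule and finds the best response by maximizing $(e_i + e)(M/2)\Delta - e_i^2/2$ on a grid.

```python
import math

def expected_meetings(e_i, e, M, S):
    """(S - 1) times the probability that i meets a given j, where i runs into j
    w.p. e_i*M/S and j runs into i w.p. e*M/S, independently."""
    return (S - 1) * (1 - (1 - e_i * M / S) * (1 - e * M / S))

def expected_meetings_closed(e_i, e, M, S):
    """Right-hand side of the meeting identity."""
    return (e_i + e - e_i * e * M / S) * (S - 1) * M / S

def delta_term(density, upper=60.0, n=200_000):
    """Midpoint-rule approximation of int_0^inf t f(t) dt + int_1^inf (t - 1) f(t) dt."""
    h = upper / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += (t + max(t - 1.0, 0.0)) * density(t) * h
    return total

def best_response(e, M, Delta, n=100_001, e_max=5.0):
    """Grid-maximize (e_i + e) * (M/2) * Delta - e_i**2 / 2 over e_i in [0, e_max]."""
    best, best_val = 0.0, -math.inf
    for k in range(n):
        e_i = e_max * k / (n - 1)
        val = (e_i + e) * (M / 2) * Delta - e_i ** 2 / 2
        if val > best_val:
            best, best_val = e_i, val
    return best

D = delta_term(lambda t: math.exp(-t))  # hypothetical Exponential(1) density
print("Delta:", round(D, 6), "exact:", round(1 + math.exp(-1), 6))
print("best response:", best_response(0.3, 1.0, D), "M*Delta/2:", D / 2)
```

Because expected social utility is linear in $e_i$, the best response does not depend on others' common effort $e$, which is why (24) pins down a single symmetric equilibrium.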
Next, we set this equilibrium effort level as the baseline and compare it with the effort level when social relationships affect financial benefits. Because the credit score introduces asymmetric desirability of low-type and high-type friends, the effort levels exerted by low types and high types will, in general, be different. In principle, the firm's posterior needs to incorporate the difference in efforts. Here, for simplicity, we focus on how effort levels differ between types but omit how they would affect the firm's posterior assessment.
Formally, we let the credit score enter utility additively through $\lambda_{x_i} R_i$, with $P_i$ given simply by (13). We characterize a symmetric equilibrium, by which we mean the effort pair $(e_l, e_h)$ where every low type chooses $e_l$ and every high type chooses $e_h$ such that no consumer has an incentive to deviate.
The following proposition summarizes how the
consumer motivation to meet others changes compared
with the effort they would exert simply to maximize
their utility from friendships.
Proposition 8.

(i) For $\lambda_l = \lambda_h > 0$, $e_h > e^* > e_l$. When both types have identical needs for financing, high types exert more effort than low types.

(ii) For $\lambda_l$ sufficiently larger than $\lambda_h$, $e_l > e_h \geq e^*$. When low types have higher needs for financing, they exert more effort than high types.
The proposition suggests that when $l$- and $h$-type consumers have identical needs for financing, high types exert higher levels of effort to increase their probability of meeting others, both compared with low types and compared with the effort exerted when consumers only want to maximize social utility. This is because high types have a higher marginal return on effort than low types (i.e., they are more likely to form new ties as a result of effort). As a result, independent of their financial needs, high types always exert more effort when they earn utility from improving their access to credit in addition to social utility than when they earn social utility alone. Low types, by contrast, have lower returns, but when high types make an effort to meet others, low types also benefit from it: with some probability, a meeting will take place between a low and a high type, and a friendship will be formed if $m_{ij}$ is sufficiently high.
If, on the other hand, the low types' utility from improving their credit scores is very high ($\lambda_l$ very high), this pattern could reverse. Low types would feel immense pressure to increase the probability of becoming friends with high types, resulting in a higher level of effort exerted by low types than by high types.
Finally, an environment where the low types exert sufficiently high levels of effort could help create a bridge between the two types, possibly reducing social separation. Therefore, low types who have more to gain from improving their financing ($\lambda_l > \lambda_h$) could exert sufficiently great networking effort to connect the two types.
7. Conclusion
7.1. Main Insights
Increasing access to financing is important in many countries where institutions and contract enforcement are weak (e.g., Feigenberg et al. 2013, Rona-Tas and Guseva 2014). In low-income countries, in particular, part of the credit access problem stems from the fact that reliable data on financial history do not exist or are limited, costly to collect, or hard to verify. In these countries, lenders tend to be very conservative in accepting borrowers' credit applications. This, of course, makes it even harder for those who are in financial hardship to obtain credit and generate a financial track record. Group lending has proven to be a popular way to address this problem. An alternative and possible complement is to use additional available data to assess applicants' creditworthiness. Using social data is one such option.
Motivated by the importance of consumer access
to credit and by the increasing use of network-based
credit scoring, we analyzed the potential implications
of such practices for consumers. Our study shows
that there are benefits to collecting information from a
consumer’s network rather than only individualized
data. Simply put, when consumers have an above-average chance of interacting with others of similar
creditworthiness, then network ties provide additional
reliable signals about their true creditworthiness. Hence,
social scoring can reduce lenders’ misgivings about
engaging applicants with limited personal financial
history, which include many who are economically
disadvantaged and underbanked.
As these new scoring methods gain popularity, consumers may adapt their personal networks, which in turn may affect the usefulness of these scores. If one's network can influence one's financing chances, some consumers, particularly those in more dire need of improving their credit score, may form social ties more selectively. If all consumers behave in this manner and forming social ties requires mutual agreement, the end result of such behavior will be social fragmentation into subnetworks where consumers only connect to others who are very similar to them. Though we expect that such fragmentation and balkanization will be deemed socially undesirable by many, their implications for network scoring accuracy are not straightforward. Although there will be fewer ties conveying information about
one’s contacts useful in updating lenders’ prior beliefs,
each of the ties will be more informative. We find, however, that there are situations in which social scoring is
beneficial even when consumers adjust their networks.
Specifically, these are: (i) consumers place sufficiently
low importance on the posterior mean of the firm’s
beliefs about their type (low ), (ii) high precision on
individual credit scores (high c), and (iii) relatively
dense network (high N).
To focus on the role of connections to consumers with
different levels of financial strength in the emergence
of balkanized societal structures, we introduce discrete
types and discrete type matching. Not surprisingly,
connections to those with high-type signals have an
overall positive impact. More interesting is that the
impact of connections to low-type signal consumers
can be positive or negative, depending on the tie
formation rules used in society. As demonstrated in
Figure 3, consumers with poor financial health would
prefer others like them not to be too selective but also
not to be too liberal in their willingness to associate
with people with poor financial health. Intuitively, as
the selectivity of same-type consumers increases, the
impact of negative signals received from some of the
low-type friends weakens. As the selectivity increases
even further, low types’ social circles will shrink such
that it will be harder for them to emit a high-type
signal, since the size of one's social circle is a conspicuous signal
of type. As a result, disadvantaged consumers would
prefer some intermediate level of ostracism and social
isolation.
In our extensions, we discuss two scenarios that
may reduce the reliability of social scores. First, if
consumers cannot observe their social contacts’ types
perfectly upon meeting, the added noise will imply
that homophily will play a lesser role in the formation
of social networks. As a result, firms’ ability to detect
a borrower’s type by looking at her friends will be
limited. Similarly, if the network consists mainly of
weak rather than strong ties, this will also reduce social
scores’ diagnosticity, since strength correlates with
how well consumers know each other. In both of these
scenarios, contacts’ signals carry lower value to the
firm in assessing the risk of a borrower.
We also consider the possibility of exerting effort
in two different ways. First, we move away from the
static type model and allow consumers to improve
their type. We find that when there is discrimination
against low types, low- and high-type contacts play a
role in motivating effort, but high types, in general,
have a stronger effect. In an environment with only
homophily, these results hold as well, unless a consumer
is highly embedded in a network with many low-type
friends who exert low effort. Such consumers are not
motivated to exert effort towards improving themselves
and are more likely to remain a low type. Second,
when types are sticky and cannot be altered, we allow
consumers to exert effort to improve their chances of
meeting other people. This second model shows that
consumers’ networking effort will depend on their
need for financing. When high and low types have
comparable needs for financing, high types have higher
returns on their effort of creating new ties and thus
exert more effort to meet others. Because the types are
revealed only after meeting, low types’ likelihood of
meeting a higher type increases when high types exert
effort, too. Therefore they choose to free ride on others’
efforts. This outcome reverses when low types are in
dire need of financing, and they become the primary
driver of meetings in society.
One possible outcome of social scoring, which is not
addressed in this research, is that consumers strategi-
cally manipulate the perception of their type by trading
friendships for financial access. In particular, realizing
their higher financial status, high types may want
to offer their friendships in exchange for monetary
rewards. To model an environment wherein friendships
are traded, we may need to consider several additional
layers of complexity. First, traded friendships would need to be formed such that the credit-scoring firm is unable to distinguish a fake relationship from a true friendship. Otherwise, low types would have no incentive to pay for a high type's friendship. Second, high types must be financially motivated, and the benefit from forming a friendship with a low type must exceed the losses from less favorable risk assessment. Third, trading friendships must be rare enough
that a credit-scoring firm still benefits and desires to use
data from the social networks. Altogether, modeling
an environment of this sort would require a fairly
complicated model, which goes beyond the purposes
of the current study. Despite the complication, our
expectation for the findings would be fairly simple: In
line with the extensions discussed in §§6.1 and 6.2, if
social ties have lower informative value and homophily
is diluted, social credit scores will be less diagnostic in
detecting one’s true creditworthiness.
7.2. Implications for Public Policy
The link between credit scores and income is hard to ignore.¹⁴ It is reported that most U.S. consumers with an income under $60K have a poor credit score.¹⁵ Moreover, a significant portion of the individualized credit score calculation relies on a consumer's existing debt level. Those with higher amounts of debt, all else equal, are expected to have lower credit scores. With network-based assessment, it is possible for immigrants, underbanked consumers, recent college graduates, and others who do not have a credit history but who are creditworthy to signal this to lenders with higher accuracy. The benefits introduced through network-based systems may help overcome a portion of the financing problems, particularly if networks are created based on attributes correlated with financial health.

¹⁴ This is so even though FICO and other leading institutions state that income is not a part of one's individual credit score, as it is a self-reported item of assessment.
¹⁵ http://www.creditsesame.com/about/press/consumers-who-earn-60000-or-less-have-dangerously-high-credit-usage-levels-according-to-credit-sesame/.
However, our analysis also raises an important concern about discrimination against already financially disadvantaged and underbanked groups. For instance, the Equal Credit Opportunity Act (ECOA) prohibits lenders from discriminating on the basis of sex, race, color, religion, national origin, or age. To the extent that some of these characteristics correlate with creditworthiness and that homophily along those dimensions correlates with homophily along levels of creditworthiness, a side effect of social credit scoring could be discrimination in access to credit along characteristics prohibited by the ECOA (National Consumer Law Center 2014, pp. 27–29). Aside from strict legality, there is also a concern that social scoring opens an additional back door to discrimination along dimensions that many may find objectionable (Dixon and Gelman 2014, Pasquale 2015).
Matters are even more complex as our results also
show that social scoring may lead consumers with
low creditworthiness to prefer being discriminated
against (in tie formation at least) to some moderate
extent. Thus moderate levels of discrimination and
social ostracism by fellow consumers may actually
help rather than harm disadvantaged consumers. Also,
one hitherto ignored societal benefit of social scoring is
that it can motivate rather than demotivate financially
disadvantaged citizens to exert greater effort to improve
their creditworthiness. The financial discrimination and
social exclusion implications of social credit scoring,
and how they balance against its benefits, warrant
attention from policy makers and researchers.
Finally, our findings here are of interest to policy
makers keen on understanding the mutual interaction
between social status and network structure. As noted
at the outset, our mathematical analysis of credit scoring
applies broadly to social status. Some people command
less respect than others. Differences in status are rarely
based solely on differences in true but hard-to-observe
ability or character. People often use the company that
others keep as a signal when assessing the respect they
deserve. Our analysis of the benefits and challenges of
social credit scoring, including improved diagnosticity
paired with the risk of unwitting discrimination and
the seeming paradox of optimal ostracism, extends to
situations wherein citizens, employees or customers
are valued and accorded status based on the company
they keep.
7.3. Implications for Management
To managers in the financial industry, our analysis
suggests that lenders can expect to reduce their risk in
the short run by incorporating network-based measures.
This dovetails with new governmental policies on risk.
For example, as part of the regulations by the Basel Committee on Banking Supervision, European
banks have been encouraged to reduce the level of risk
they undertake (Sousa et al. 2013). Regulations in the
banking industry encourage U.S. financial institutions
to better manage risk as well. These regulations have
come at a time when big data analytics are enabling
financial institutions to access larger and richer data sets.
Indeed, it has been reported that social media and
social network data are being used not only by start-ups but also by established and more institutionalized
credit scoring firms such as Experian (Armour 2014).
The trend toward using social data may prove to be
useful in the post-crisis environment.
Our study also offers some insight to managers
outside the financial industry who use social scoring
for targeting customers when launching new products,
targeting ads or designing referral programs. (i) The
effectiveness of social scoring need not decrease when
customers purposely adapt their networks to improve
their score and their access to the benefits it entails.
(ii) Marketers do not need information on the complete
network. Data on the focal consumer’s immediate
contacts already provide an improvement in scoring
accuracy. (iii) Social scoring is likely to be most diagnos-
tic in societies and communities (online or not) where
consumers maintain many strong, rather than weak,
ties. (iv) Smart marketers will go beyond generic ties
and seek to leverage specific ties that correlate highly
with the traits they seek in their target customers.
A car manufacturer such as Audi, for instance, will
benefit from focusing on Twitter connections pertaining
to cars (personal communication). (v) The benefits
of social scoring to the marketer are greater when
the benefits of having a high score matter little to customers or at least have little impact on those with whom they choose to form ties. More generally, the
benefits of social scoring are greater when they involve
networks of ties that not only exhibit great homophily
but also are built and maintained for intrinsic rather
than extrinsic reasons. Examples of the former used in
social scoring include telephone call data and kinship
data (Benoit and Van den Poel 2012, Hill et al. 2006).
Examples of the latter are many ties in general-purpose
online social networking platforms, where linking is
very easy and often occurs between casual contacts.
(vi) Customers with a high number of connections
(degree centrality) in an undirected network such as
Facebook or LinkedIn are not necessarily the most
attractive. This is not only because centrality in such
networks cannot distinguish between opinion seekers
and opinion leaders (in-degree versus out-degree cen-
trality) but also because (as our analysis shows) the
most active networkers may be high-type or low-type
customers, depending on whether low types value
the benefits of a high consumer score more than high
types. (vii) Marketers should be concerned that social
customer scoring may create the impression of unfair
discrimination. This is not only a legal and an ethi-
cal issue but also a commercial one. For instance, in
January 2015, users of WeChat, the Chinese chat app,
protested against discrimination after they were not
targeted to see an ad for BMW, the luxury car maker.
Some believed that the targeting algorithm involved
social scoring based on those to whom the potential
targets were connected (Clover 2015). Because social
scoring uses inputs beyond one’s traits and history,
marketers must balance improved diagnostics against
actual and perceived fairness.
The insights in this paper also provide some guid-
ance on data collection and system design. Several
firms already create credit scores using social network
data. One important question they face is whether the
number of friends is a useful signal for measuring
creditworthiness. Our study demonstrates that even
when it is not a signal directly linked to one’s type, the
practice of network scoring would endogenously make
the number of friends a useful signal. Thus social credit
scoring may shape credit assessment in its own image,
i.e., help to construct the reality it is meant to describe,
just as modern option theory did for valuing financial
derivatives (MacKenzie 2006, MacKenzie and Millo
2003). This makes the implications of our analysis all
the more important.
Supplemental Material
Supplemental material to this paper is available at http://dx.doi.org/10.1287/mksc.2015.0949.
Acknowledgments
The authors thank the senior editor, associate editor, and
two anonymous referees for their comments and sugges-
tions. The authors also benefited from comments by Lisa
George, Yogesh Joshi, Upender Subramanian, Yi Zhu, and
participants at the 2014 Columbia-NYU-Wharton-Yale Four
School Marketing Conference, the 2014 Marketing Science
Conference, the 12th ZEW Conference on Information and
Communication Technologies at the University of Mannheim,
the 2015 INSEAD Marketing Camp, and the 2015 CRES
Strategy Conference at Washington University in St. Louis.
The authors gratefully acknowledge financial support from
the Rodney L. White Center for Financial Research and
the Social Impact Initiative of the Wharton School of the
University of Pennsylvania.
Appendix. Proofs
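Proof of Proposition 1. Because, conditional on the types $x$, the signals $y$ are independent of the network, we have $\Pr(y \mid x) = \Pr(y \mid g, x)$. Using Bayes' rule we have
$$\Pr(x \mid g, y) \propto \Pr(x)\Pr(g, y \mid x) = \Pr(x)\Pr(y \mid x)\Pr(g \mid x).$$
Thus
$$\Pr(x \mid g, y) \propto \prod_{i\in\mathcal{Y}} e^{-q x_i^2/2} \times \prod_{i\in\mathcal{Y}} e^{-c(y_i - x_i)^2/2} \times \prod_{ij\in g_1} \rho\, e^{-(x_i - x_j)^2/2} \times \prod_{ij\in g_0:\, i,j\in\mathcal{Y}} \left[1 - \rho\, e^{-(x_i - x_j)^2/2}\right] \times \prod_{ij\in g_0:\, j\notin\mathcal{Y}} \left(1 - \frac{\mathbb{E}(n_i \mid x_i)}{S}\right). \tag{25}$$
In the expression above, $1 - \mathbb{E}(n_i \mid x_i)/S$ is the probability that $i$ is not friends with $j$ for some $i$ whose type is $x_i$ and some $j$ whose type is unknown. Fix some $i \in \mathcal{Y}$ and consider the term $\prod_{ij\in g_0:\, j\notin\mathcal{Y}} \left(1 - \mathbb{E}(n_i \mid x_i)/S\right)$. If $\{ij \in g_0 : j \notin \mathcal{Y}\}$ is not empty, then by our assumption on the information structure, it multiplies across everyone in the rest of the society. So its value under the limits of $S$, $\rho$, and $q$ is
$$\lim_{S\to\infty,\ \mathbb{E}(n_i \mid x_i)\to N} \left(1 - \frac{\mathbb{E}(n_i \mid x_i)}{S}\right)^{S - |\mathcal{Y}|} = e^{-N},$$
which is not a function of $x$ and thus does not contribute to the conditional density. Note that the remaining terms on the right-hand side of (25) are products over finitely many factors. It is easy to see that as $\rho \to 0$ and $q \to 0$,
$$\Pr(x \mid g, y) \propto \prod_{i\in\mathcal{Y}} e^{-c(y_i - x_i)^2/2} \times \prod_{ij\in g_1} e^{-(x_i - x_j)^2/2}. \tag{26}$$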
This implies that $\Pr(x \mid g, y)$ is a multivariate normal density $N(\mu, \Sigma)$. To find the parameters $\mu$ and $\Sigma$, all we need to do is match the coefficients. The coefficients of $x_i^2$, $x_i x_j$, and $x_i$ in the quadratic form $-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)$ are $-\frac{1}{2}(\Sigma^{-1})_{ii}$, $-(\Sigma^{-1})_{ij}$, and $(\Sigma^{-1})_{i1}\mu_1 + (\Sigma^{-1})_{i2}\mu_2 + \cdots$. The corresponding coefficients on the right-hand side of (26) are $-\frac{1}{2}(c + d_i)$, $\mathbf{1}\{ij \in g_1\}$, and $c y_i$. Matching them gives us the results in the Proposition.
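Matching coefficients in this way gives a posterior precision matrix $\Sigma^{-1} = D - A$, where $D$ is diagonal with $D_{ii} = c + d_i$ and $A$ is the adjacency matrix of the observed ties $g_1$, together with the linear system $\Sigma^{-1}\mu = c\,y$. The Python sketch below computes $\mu$ and $\Sigma$ this way with NumPy. The helper name `network_posterior`, the four-person line network, the signal values, and $c = 1$ are hypothetical choices for illustration, not data or code from the paper.

```python
import numpy as np


def network_posterior(adj, y, c):
    """Posterior mean and covariance of types x given signals y and observed ties.

    Hypothetical helper: (Sigma^{-1})_ii = c + d_i, (Sigma^{-1})_ij = -1 for an
    observed tie ij, and the mean solves Sigma^{-1} mu = c * y.
    """
    adj = np.asarray(adj, dtype=float)
    y = np.asarray(y, dtype=float)
    degrees = adj.sum(axis=1)                  # d_i: observed ties of individual i
    precision = np.diag(c + degrees) - adj     # Sigma^{-1} = D - A
    cov = np.linalg.inv(precision)             # Sigma
    mu = cov @ (c * y)                         # mu = c * Sigma * y
    return mu, cov


# Hypothetical four-person line network 0-1-2-3 with noisy signals.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
y = np.array([1.0, 0.5, -0.2, 0.8])
mu, cov = network_posterior(A, y, c=1.0)
print(np.round(mu, 3))            # posterior means: signals smoothed over the network
print(np.round(np.diag(cov), 3))  # posterior variances, all below 1/c = 1 here
```

Because $D - A = cI + L$, with $L$ the graph Laplacian of $g_1$, the entries of $\mu$ sum to the same total as the signals, which makes a quick sanity check on any implementation of these formulas.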
Proof of Corollary 1. This is a special case of Proposition 1, where $i$ is fixed and $\mathcal{Y} = \{j \mid ij \in G\}$, $g_1 = \{ij \mid ij \in G\}$, and $g_0 = \{ij \mid ij \notin G,\ j \neq i\}$.
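In this special case the observed ties form a star centered on the focal individual. A small Python sketch, assuming NumPy, applies the same precision-matrix construction (diagonal $c + d_i$, entry $-1$ for each observed tie, mean solving $\Sigma^{-1}\mu = c\,y$) to such a star. The helper `ego_posterior` and the example values are hypothetical and only illustrate the corollary's setting.

```python
import numpy as np


def ego_posterior(y_focal, y_friends, c):
    """Posterior mean and variance of the focal type from an ego network.

    Hypothetical helper: node 0 is the focal individual and nodes 1..k are its
    friends; only the k ties involving node 0 are observed (a star graph).
    """
    y = np.concatenate(([y_focal], np.asarray(y_friends, dtype=float)))
    k = len(y) - 1
    adj = np.zeros((k + 1, k + 1))
    adj[0, 1:] = 1.0                      # focal -> friend ties
    adj[1:, 0] = 1.0                      # undirected: friend -> focal ties
    precision = np.diag(c + adj.sum(axis=1)) - adj
    cov = np.linalg.inv(precision)
    mu = cov @ (c * y)
    return mu[0], cov[0, 0]


mean_i, var_i = ego_posterior(0.3, [1.0, 0.8, 1.2], c=0.5)
print(mean_i, var_i)
```

Under these assumptions, eliminating the friends' types (a Schur complement) gives the closed forms $\mathrm{Var}(x_i) = 1/\left(c + kc/(c+1)\right)$ and $\mu_i = \left((c+1)y_i + \sum_j y_j\right)/(c+1+k)$ for $k$ friends, which the sketch reproduces numerically; each additional friend adds $c/(c+1)$ to the focal individual's posterior precision.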
Proof of Proposition 2. Let $D$ be the diagonal matrix with $D_{ii} = c + d_i$, and let $B = D^{-1}A$, where $A$ is the adjacency matrix of $g_1$. We express the covariance matrix as
$$\Sigma = (I - B)^{-1}D^{-1}.$$
Let $B_0$ denote the matrix $B$ when $c = 0$. Since $B_0$ is a stochastic matrix (i.e., each row sums to 1), its largest-magnitude eigenvalue is 1. When $c > 0$, $B$ is non-negative and it is easy to see that
$$B < B_0.$$
By the Perron-Frobenius Theorem, the largest-magnitude eigenvalue of $B$, which we denote by $\lambda$, is smaller than that of $B_0$, which is 1. Given that $\lambda < 1$, we can write
$$\Sigma = (I + B + B^2 + \cdots)D^{-1}.$$
Because for any $k \geq 1$, $B^k$ is non-negative and $B^k < \lambda^k$, we have
$$(B^k)_{ij} < \lambda^k.$$
Now consider a node $j$ whose distance from $i$ in the subnetwork defined by $g_1$ is $r(i, j) \geq 1$. Because $A$ is the adjacency matrix of $g_1$, and there is no path between $i$ and $j$ whose length is less than $r(i, j)$, we know that $(B^k)_{ij} = 0$ for all $k < r(i, j)$. Hence an upper bound of $(I + B + B^2 + \cdots)_{ij}$ is
$$\sum_{k=r(i,j)}^{+\infty} \lambda^k = \frac{\lambda^{r(i,j)}}{1 - \lambda}.$$
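The series argument can be checked on a small example. The Python sketch below, assuming NumPy, builds $B = D^{-1}A$ for a hypothetical five-person path network with $c = 0.5$ (arbitrary illustrative values), computes its largest-magnitude eigenvalue (called `lam` here), compares the truncated series $(I + B + \cdots + B^K)D^{-1}$ with $\Sigma = (D - A)^{-1}$, and confirms that $(B^k)_{ij} = 0$ whenever $k$ is below the distance $r(i, j)$, which in a path graph is $|i - j|$.

```python
import numpy as np

# Hypothetical path network 0-1-2-3-4 and signal precision c > 0 (illustrative values).
n, c = 5, 0.5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

d = A.sum(axis=1)                           # d_i: number of ties of node i
D = np.diag(c + d)                          # D_ii = c + d_i
D_inv = np.diag(1.0 / (c + d))              # exact inverse of a diagonal matrix
B = D_inv @ A                               # B = D^{-1} A
lam = np.max(np.abs(np.linalg.eigvals(B)))  # largest-magnitude eigenvalue of B
Sigma = np.linalg.inv(D - A)                # covariance implied by Proposition 1

# Truncated Neumann series (I + B + B^2 + ... + B^K) D^{-1}.
K = 200
series = np.zeros((n, n))
B_power = np.eye(n)
for _ in range(K + 1):
    series += B_power
    B_power = B_power @ B
approx = series @ D_inv

# In a path graph r(0, 4) = 4, so (B^k)_{04} should be exactly 0 for k < 4.
low_powers_vanish = all(np.linalg.matrix_power(B, k)[0, 4] == 0.0 for k in range(4))
print(lam, np.abs(approx - Sigma).max(), low_powers_vanish)
```

Because $\lambda < 1$, the omitted tail of the series shrinks geometrically, which is why a few hundred terms are more than enough for a network this small.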
Proof of Lemma 1. The derivation of the lemma is similar to the proof of Corollary 1.
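Proof of Lemma 2. Under a symmetric rule $\alpha$, $i$ and $j$ become friends if and only if they have met and $m_{ij} > \alpha(x_i - x_j)^2$. Thus
$$\mathbb{E}(n_i \mid x_i, \alpha) = S\int_{-\infty}^{+\infty} e^{-\alpha(t - x_i)^2/2}\, \rho\sqrt{\frac{q}{2\pi}}\, e^{-q t^2/2}\, dt = S\rho\sqrt{\frac{q}{q+\alpha}}\, e^{-\frac{\alpha q}{\alpha+q} x_i^2/2}.$$
Recall that $S\rho\sqrt{q/(q+1)} = N$. Taking $q \to 0$ gives the result.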
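The integral in this proof is a Gaussian convolution, so the closed form can be checked numerically. The Python sketch below (standard library only) compares a Riemann-sum evaluation of $\int e^{-\alpha(t-x_i)^2/2}\sqrt{q/(2\pi)}\,e^{-qt^2/2}\,dt$ with $\sqrt{q/(q+\alpha)}\,e^{-\frac{\alpha q}{\alpha+q}x_i^2/2}$. It then rescales the expected number of friends so that a type-0 individual has $N$ friends when $\alpha = 1$, mirroring the normalization the proof recalls, and lets $q$ shrink. The function names and parameter values are hypothetical illustrations; any constant factor such as a meeting probability cancels in this ratio.

```python
import math


def closed_form(x, alpha, q):
    """sqrt(q/(q+alpha)) * exp(-(alpha*q/(alpha+q)) * x**2 / 2), as in the proof."""
    return math.sqrt(q / (q + alpha)) * math.exp(-(alpha * q / (alpha + q)) * x * x / 2)


def numeric_integral(x, alpha, q, step=1e-3):
    """Riemann sum of exp(-alpha*(t-x)**2/2) times the N(0, 1/q) density of t."""
    half_width = 12.0 / math.sqrt(q) + abs(x)   # +/- 12 prior standard deviations
    n_steps = int(2 * half_width / step)
    norm_const = math.sqrt(q / (2 * math.pi))
    total = 0.0
    for k in range(n_steps + 1):
        t = -half_width + k * step
        total += math.exp(-alpha * (t - x) ** 2 / 2) * norm_const * math.exp(-q * t * t / 2)
    return total * step


def expected_degree(x, alpha, q, n_baseline):
    """Expected friends, scaled so a type-0 individual has n_baseline friends when alpha = 1."""
    return n_baseline * closed_form(x, alpha, q) / closed_form(0.0, 1.0, q)


print(numeric_integral(0.7, 2.0, 1.5), closed_form(0.7, 2.0, 1.5))
for q in (1.0, 1e-2, 1e-6):
    # As q -> 0 this tends to n_baseline / sqrt(alpha) = 10 / 2 = 5.
    print(q, expected_degree(0.7, 4.0, q, n_baseline=10.0))
```

Letting $q \to 0$ in $N\sqrt{(q+1)/(q+\alpha)}\,e^{-\frac{\alpha q}{\alpha+q}x_i^2/2}$ gives $N/\sqrt{\alpha}$ for every $x_i$, so under this normalization the expected number of friends no longer depends on the type, and a stricter rule ($\alpha > 1$) lowers it.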
Proof of Proposition 3. For notational simplicity, the expectation sign $\mathbb{E}(\cdot)$ throughout this proof refers to the conditional expectation $\mathbb{E}(\cdot \mid x_i, \alpha, \alpha_i)$, which is computed conditional on the type $x_i$ and a symmetric rule $\alpha$, except for a possible deviation of $i$ to $\alpha_i$. Similarly, the notation $\Pr(\cdot)$ refers to the probability with the same conditionals.
First we calculate the expected social utility, $\mathbb{E}\sum_{ij\in G}\left(m_{ij} - (x_j - x_i)^2\right)$, which we denote more compactly as $\mathbb{E}u_i$. For any $j$
we have
Pr4xj1ij G5 =Pr4xj5Pr4ij Gxj5
=rq
2eqx2