Online Demand Under Limited Consumer Search.
Electronic copy available at: http://ssrn.com/abstract=1340267
ONLINE DEMAND UNDER LIMITED CONSUMER SEARCH∗
Jun B. Kim†
Paulo Albuquerque‡
Bart J. Bronnenberg§
October 19, 2009
Abstract
Using aggregate product search data from Amazon.com, we jointly estimate consumer information
search and online demand for consumer durable goods. To estimate the demand and search primitives,
we introduce an optimal sequential search process into a model of choice and treat the observed
market-level product search data as aggregations of individual-level optimal search sequences. The
model builds on the dynamic programming framework by Weitzman (1979) and combines it with a choice
model. It can accommodate highly complex demand patterns at the market level. At the individual
level, the model has a number of attractive properties in estimation, including closed-form
expressions for the probability distribution of alternative sets of searched goods and avoidance of
the curse of dimensionality. Using numerical experiments, we verify the model's ability to identify
the heterogeneous consumer tastes and search costs from product search data. Empirically, the model
is applied to the online market for camcorders and is used to answer manufacturer questions about
market structure and competition, and to address policy maker issues about the effect of
selectively lowered search costs on consumer surplus outcomes. We find that consumer search for
camcorders at Amazon.com is typically limited to a little over 10 choice options, and that this
affects the estimates of own and cross elasticities. In a policy simulation, we also find that the
vast majority of households benefit from Amazon.com's product recommendations via lower search
costs.
Keywords: cost-benefit analysis, optimal sequential search, demand for durable goods, information
economics, consideration sets
∗ The authors are grateful to seminar participants at the 2009 Marketing Dynamics Conference in Waikato (New Zealand),
Dartmouth College, Erasmus University Rotterdam, Georgia Tech, GSB Stanford University, GSB University of Chicago, Hong
Kong University of Science and Technology, National University of Singapore, Santa Clara University, Tilburg University, and
the University of Texas, Dallas. We also thank participants at the 2009 QME conference, and Chad Syverson in
particular, for comments and feedback.
† Jun B. Kim is Assistant Professor of Management at Georgia Institute of Technology.
‡ Paulo Albuquerque is Assistant Professor of Marketing at the Simon Graduate School of Business, University of Rochester.
§ Bart J. Bronnenberg is Professor of Marketing, and CentER Research Fellow, Tilburg University. Bronnenberg gratefully
acknowledges EU funding from the Marie Curie Program (IRG 230962).
1 Introduction
Online demand for consumer durables and search goods is large and rapidly growing. Comscore (2007)
estimates that non-travel U.S. online consumer spending in 2006 reached $102.1 billion. Jupiter Media
Metrix (2004) estimated U.S. online consumer spending of $65 billion in 2004, with $20.2 billion on durable
consumer search goods, and another $8.3 billion on information goods. The Comscore report shows that the
fastest growing e-commerce categories include durable consumer goods such as video consoles, consumer
electronics, furniture, appliances, and equipment, as well as information goods such as books and magazines,
music, and software. These categories saw annual growth rates for 2007 in the 25% to 50% range. PC World
reported in 2007 that the "appeal of online shopping" is growing. Between August 2006 and the same
month a year later, 14 percent of the $159 billion that U.S. shoppers spent on consumer electronics was
spent online, up from 5 percent a year earlier, according to the Consumer Electronics Association.
In this paper, we seek to understand online demand by studying the product information acquisition
for durable search goods and/or information goods at Amazon.com using aggregated histories of search
behavior, which provides us with a unique opportunity to directly observe category-level consumer browsing
behaviors. Our premise is that we can learn about the preferences of consumers by studying their "shopping
behaviors." That is, because the examination and inspection of goods or services come at the cost of
the consumer's time and effort, search outcomes become informative about what the consumer wants.
Observing the particular region of the attribute space in which a consumer invests time browsing products
may teach us something about her preferences. Specifically, our proposal is to treat browsing behavior
as the outcome of an optimal sequential search process across choice options for which the consumer has
different expectations and uncertainties. In addition, these choice options need not all be equally accessible
and may be offered to consumers at different search costs (for instance through the use of seller-sponsored
recommendation engines). Recognizing that the three demand primitives (expectations, uncertainties, and
search cost) can be changed by interested parties, e.g., manufacturers or policy makers, the substantive
goal of this paper is to analyze the impact of limited search on choice decisions of the consumers and on the
competitive market structure. Methodologically, we introduce an optimal sequential search process into a
model of choice and identify the demand parameters of interest from the search data.
Table 1 presents an example of viewing data for camcorders obtained from Amazon.com. The table lists
products that were viewed by consumers conditional on viewing a particular (focal) product, the Sony
DCR-DVD108. In addition, the order in which these products are listed is determined by an Amazon.com
algorithm that uses the frequency of same-session viewing of the focal product and the other products.¹
¹ This data-generating mechanism is explained separately in the data section.
The table does not list all existing 300+ camcorder options and reflects the fact that some options are
seldom or never viewed together with the focal product by any consumer in the same online session. The
data in Table 1 exist for each of the camcorder options as the focal product. Because the products in the
view list are rank ordered, we refer to these data as the view-rank data. The paper shows that, across
viewed products, the view-rank data are informative about substitution, and that from viewed to
non-viewed products, the data imply either low or lack of substitution. For durable goods, where meaningful
observations of consumer switching are usually very limited, the premise of this paper is that the view-rank
data are in the spirit of revealed measures of substitution.
A related premise is that the view-rank data can be used to estimate the demand system. This would
be of interest to practitioners and policy makers because the Amazon.com view-rank data are publicly
available, and contain cross-product information that is not present in reports of sales volume or market
shares.
The general approach in the paper is to model the view-rank data as the aggregation across consumers
of individual-level optimal search sequences, in which each consumer tries to maximize her expected
utility, taking into account the search costs of the alternatives that she inspects. At the individual level,
our approach yields a probabilistic model of optimal search set formation, is not subject to the curse of
dimensionality, and is purposely suited to be estimable using the view-rank data. Using data experiments,
we find that the model is successful at identifying the parameters of a choice-based demand system with
random effects. In addition, the model correctly identifies search cost and search set size.
From an application of our model to the Amazon.com camcorder category, we find the following results.
The median (average) search set contains 11 (14) products, with about 40% of consumers searching fewer
than five products out of a total of over 90 products. We find that the cost of search is significant and is
subject to consumer heterogeneity. The search cost is lower for products that appear more frequently
at Amazon.com, as measured by the total number of references to the product. We also find that online
competition between many products is effectively zero, because many products are not jointly searched by
consumers. In fact, when looking at the estimated frequency of co-viewership of two products, we find that
the large majority of all possible product pairs, about 70%, is viewed by less than 5% of the population. This
implies severe limits on substitution, which in turn causes many cross-price elasticities to be numerically
zero. Finally, our results show that almost everyone benefits from the product references that selectively
lower search costs at Amazon.com.
The remainder of the paper is organized as follows. The next section reviews the background literature.
Section 3 outlines the model. Section 4 presents the data and discusses Amazon.com's data generation
process. Section 5 explains model operationalization and estimation, and discusses empirical identification.
View rank   Brand       Media format   Optical zoom   ···   Price
 1          SONY        DVD            25             ···   $443.32
 2          PANASONIC   MINIDV         32             ···   $248.11
 3          SONY        DVD            20             ···   $539.00
 4          SONY        HD             10             ···   $665.20
 5          SONY        HD             40             ···   $509.84
 6          SONY        MINIDV         40             ···   $299.99
 7          SONY        MINIDV         25             ···   $363.88
 8          PANASONIC   DVD            30             ···   $347.55
 9          SONY        MINIDV         20             ···   $257.43
10          CANON       DVD            25             ···   $345.99
11          SONY        MINIDV         10             ···   $552.42
12          HITACHI     DVD            10             ···   $378.45
13          SONY        DVD            10             ···   $790.22
...         ...         ...            ...            ···   ...
37          SONY        DVD            10             ···   $752.75
38          CANON       DVD            35             ···   $354.78
39          CANON       DVD            35             ···   $376.57
40          PANASONIC   MINIDV         32             ···   $289.39
41          SONY        HD             25             ···   $554.14
42          JVC         MINIDV         32             ···   $488.88
43          PANASONIC   DVD            32             ···   $361.81
Table 1: Product alternatives searched at Amazon.com, in May 2007, given search of a Sony Camcorder
with DVD media format, 40× optical zoom, 2.5-inch swivel screen, etc., selling at $328
Section 6 presents evidence from numerical experiments to show that the model is identified. Section 7
presents empirical results, model robustness checks, and model validations. Section 8 contains two policy
experiments and describes managerial implications. Section 9 concludes.
2 Background
Marketing scholars and economists have long recognized that consumers do not in general search or consider
the universal choice set, for reasons such as nonzero search cost, product proliferation, and preference
dispersion (e.g., Hauser and Wernerfelt 1989; Howard and Sheth 1969; Nelson 1970; Stigler 1961). The
recent popularity of choice-based demand systems has brought renewed attention to the issue of modeling
choice sets, and concerns exist that not taking into account the limited nature of choice sets leads to
biased estimates of demand (Bruno and Vilcassim 2008; Chiang, Chib, and Narasimhan 1999; Goeree
2008). Papers in this tradition specify a probability of a product being known (Goeree 2008) or accessible
(Bruno and Vilcassim 2008) that is not the outcome of an optimal search process but simply constitutes
a consumer response to firms' actions. In this paper, we advocate that such responses can be measured
in the context of how they affect the consumer's search strategies, and that if one has access to outcomes
of search behavior, as we do here, those become informative of important demand primitives when viewed
through the lens of optimal information search.
Understanding consumer information search has been an important topic in both marketing and
economics and hence research on consumer information acquisition abounds. Starting with Stigler (1961),
early research on consumer information acquisition focused on consumers searching, at some effort, for
price quotes in homogeneous goods markets. Extending the scope of consumer search to issues of market
outcomes, several authors theorized that limited consumer information search may have a significant
impact on market structure (Diamond 1971; Nelson 1974; Anderson and Renault 1999). In this paper, we
model consumer search behavior not only to evaluate market structure issues, but also to evaluate the
impact on consumer surplus of firms changing search costs.
We model the consumer's willingness to search for choice options by assuming that the consumer is
motivated to search only if she benefits from doing so. There is already a tradition in the consideration set
literature to represent consideration sets as the outcome of non-sequential search (Roberts and Lattin 1991;
Mehta, Rajiv and Srinivasan 2003). This tradition rests on the fixed-sample strategy proposed in Stigler
(1961) as an optimal search policy for a consumer in a commodities goods market under price uncertainty.
In contrast, McCall (1965) and Nelson (1970) argue that a sequential search strategy is optimal in terms of
total cost.² Since we additionally believe that online search is more correctly captured as a sequential
process, we model online search for information in this study as a sequential process and use the
theory of optimal sequential search. Seminal contributions to sequential search theory have been made
by Weitzman (1979), in the case of single-agent problems, and by Reinganum (1982, 1983), in the case of
multiple-agent problems. We implement the optimal search strategies of these papers in a single-agent
random utility choice model.
In contrast to a large volume of theoretical work, there has been relatively limited empirical research on
consumer information search using secondary data. Two recent exceptions are papers on empirical search
for commodities (Hong and Shum 2006) and for differentiated products (Hortaçsu and Syverson 2004).
In the former, the authors devise a model that translates the price dispersion into heterogeneous search
cost across the population. In the latter, the authors develop a model to translate the utility distribution into
heterogeneous search cost. In our case, like Hortaçsu and Syverson (2004), we model search for differentiated
products, but unlike them, we have collected direct measures of search outcomes, allowing us to estimate
a more general demand model. For instance, in contrast to the homogeneous demand model in Hortaçsu
and Syverson (2004), we believe that information about which products tend to be viewed together allows
us to estimate heterogeneous consumer preferences in a differentiated product category.³
² Actually, block-sampled search strategies have been argued to be even better (see, e.g., Morgan and Manning 1985).
However, in online search such strategies cannot be executed and therefore they are not considered here.
³ For a comprehensive review of several empirical applications, see Moraga-González (2006).
With our choice model that includes optimal sequential search, we seek to explore the influence of
retailer product recommendations, a mechanism to selectively lower search costs, on consumer search
behavior and its impact on market structure. Given the popularity and ubiquity of recommendations at
many online stores, it is of practical and academic interest to investigate how recommendations affect
the consumer information and product search decisions. In behavioral work, Huang and Chen (2006)
report that the recommendations of other consumers influence the choices of subjects more effectively than
recommendations from an expert. Senecal and Nantel (2004) also show that retailer recommendations
significantly affect demand.
3 A demand model with costly sequential product search
3.1 Utility
Our modeling assumptions at the individual level are as follows. Consumer $i$ has a utility for product
$j = 1, \ldots, J$ that is equal to
$$u_{ij} = V_{ij} + e_{ij} \tag{1}$$
with
$$V_{ij} = X_j b_i, \qquad b_i \sim N(b, B), \qquad e_{ij} \sim N(0, \sigma_{ij}^2),$$
where $X_j$ is a row vector of product characteristics and $b_i$ is a vector that represents individual-specific
sensitivities to product characteristics. We assume the matrix $B$ is diagonal. The outside good is the
$(J + 1)$st alternative, and the consumer is aware of the option not to buy. This option does not require a
search and is available at no cost.
The utility function contains an expected component, $V_{ij}$, and an unknown component, $e_{ij}$. Our
interpretation is that this decomposition partitions what the consumer knows and does not know into $V_{ij}$
and $e_{ij}$, and the consumer's goal of search is to resolve $e_{ij}$.⁴ The most relevant attributes, whose values
determine $V_{ij}$, are accessible from general category information displays without retrieving the product
detail web page,⁵ thus facilitating the existence of an expectation $V_{ij}$ prior to search. Before accessing a
⁴ Our interpretation is consistent with Nelson (1970), who defines consumer search as an information problem to fully
evaluate the utility of each option.
⁵ In the digital camcorder category page at Amazon.com, consumers have access to important product characteristics in
product page, knowledgeable consumers may have lower-variance $e_{ij}$'s and less knowledgeable consumers
may have higher-variance $e_{ij}$'s. When consumers request the product detail web page, they see more details
about the product, which resolves $e_{ij}$.
Resolving $e_{ij}$ upon search comes at some cost. We introduce product- and individual-specific search
cost, $c_{ij}$, which we interpret mainly as time spent on discovering and evaluating the product.⁶ We model
search cost as a log-normally distributed random effect
$$c_{ij} \propto \exp(L_j \gamma_i), \tag{2}$$
with
$$\gamma_i \sim N(\gamma, \Gamma),$$
where the matrix $\Gamma$ is diagonal. The log-normal specification ensures that $c_{ij}$ is positive,
consistent with theory. The cost attributes $L_j$ describe, for instance, the accessibility of product $j$ and are
assumed to be known by the consumers. For instance, $L_j$ may contain the appearance frequency of product
$j$ at the store or the number of times it is recommended.
The consumer's search and choice process are the outcome of her desire to maximize expected utility
minus total search cost. This involves contrasting the marginal benefit and marginal cost of search. The
objective of the analyst is to estimate $b$, $B$, $\gamma$, and $\Gamma$ from data.⁷
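To make the primitives of equations 1 and 2 concrete, the following is a minimal Python sketch that draws one consumer's tastes, expected utilities, utility shocks, and search costs. It is an illustration only: the function name `draw_consumer`, its argument names, and the choice of a proportionality constant of 1 in equation 2 are assumptions made for this example, not part of the model's estimation procedure.

```python
import math
import random


def draw_consumer(X, L, b_mean, b_sd, g_mean, g_sd, sigma=1.0, rng=random):
    """Draw the primitives of one consumer i (hypothetical helper).

    X: J rows of product characteristics X_j (each a list of K numbers)
    L: J rows of search-cost attributes L_j (each a list of M numbers)
    b_mean, b_sd: mean and std. dev. of each taste coefficient (diagonal B)
    g_mean, g_sd: mean and std. dev. of each cost coefficient (diagonal Gamma)
    sigma: std. dev. of e_ij, assumed common across products here
    """
    b_i = [rng.gauss(m, s) for m, s in zip(b_mean, b_sd)]  # b_i ~ N(b, B)
    g_i = [rng.gauss(m, s) for m, s in zip(g_mean, g_sd)]  # gamma_i ~ N(gamma, Gamma)
    # Expected utility V_ij = X_j b_i, known to the consumer before search.
    V = [sum(x * b for x, b in zip(row, b_i)) for row in X]
    # Shock e_ij, revealed only when product j is searched.
    e = [rng.gauss(0.0, sigma) for _ in X]
    # Search cost c_ij = exp(L_j gamma_i); constant of proportionality set to 1.
    c = [math.exp(sum(l * g for l, g in zip(row, g_i))) for row in L]
    u = [v + s for v, s in zip(V, e)]  # u_ij = V_ij + e_ij
    return {"b": b_i, "gamma": g_i, "V": V, "e": e, "c": c, "u": u}
```

Because the cost is the exponential of a normal draw, every simulated $c_{ij}$ is strictly positive, which is the property the log-normal specification is meant to guarantee.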
3.2 A model of sequential search
In sequential search, a consumer decides to stop or continue search each time after having searched a
product. The theory of optimal sequential search states that consumers only continue search if the marginal
benefits of doing so outweigh the marginal costs.
Utility $u_{ij}$ of consumer $i$ for product $j$ is $V_{ij} + e_{ij}$. Define $u_i^*$ at any stage of the search process as
the highest utility among the products searched thus far. The consumer's expected marginal benefit from
search of product $j$ is
$$B_{ij}(u_i^*) = \int_{u_i^*}^{\infty} (u_{ij} - u_i^*)\, f(u_{ij})\, du_{ij}, \tag{3}$$
where $f(\cdot)$ is the probability density function of $u_{ij}$. The marginal benefit is the expectation of the
camcorders such as brand, price, media format, zoom, pixel number, and dimensions for all products in this category.
⁶ Search cost is different between consumer packaged goods and consumer durables. For packaged goods in which experience
is more easily obtained, mental maintenance and processing cost constitute the majority of search cost (Roberts and Lattin
1991). For one-time purchases such as consumer durable goods, it is more likely that search costs are determined by the
time spent on searching for more information and the need for evaluation. Therefore in the context of digital camcorders, we
interpret search cost as the opportunity cost of time invested in identifying and evaluating another candidate product.
⁷ In the empirical analysis, we will assume that $\sigma_{ij}^2 = 1$, but in the modeling section we wish to keep the level of product
uncertainty general.
utility of $j$ in excess of $u_i^*$, given that it is higher than $u_i^*$, multiplied by the probability that $u_{ij}$
exceeds $u_i^*$.⁸ Note that the benefit of search only depends on the arrangement of utility above $u_i^*$.
The left tail of the utility distribution below $u_i^*$ can be arbitrarily rearranged without affecting search
or choice.
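Under the normality assumption on $e_{ij}$, the integral in equation 3 has a well-known closed form: with $x = (u_i^* - V_{ij})/\sigma_{ij}$, the benefit equals $\sigma_{ij}\,[\phi(x) - x\,(1 - \Phi(x))]$, where $\phi$ and $\Phi$ are the standard normal density and distribution function. The Python sketch below, offered as an illustration under that assumption, evaluates the closed form and checks it against a simple midpoint-rule approximation of the integral. The function names are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def marginal_benefit(u_star, V, sigma=1.0):
    """Closed form of equation 3 when u_ij ~ N(V, sigma^2)."""
    x = (u_star - V) / sigma
    # 1 - Phi(x) is computed as Phi(-x) for better accuracy in the right tail.
    return sigma * (_STD.pdf(x) - x * _STD.cdf(-x))


def marginal_benefit_numeric(u_star, V, sigma=1.0, upper=12.0, n=20000):
    """Midpoint-rule approximation of the integral in equation 3.

    The infinite upper limit is truncated at V + upper * sigma.
    """
    dist = NormalDist(V, sigma)
    hi = V + upper * sigma
    if hi <= u_star:
        return 0.0
    h = (hi - u_star) / n
    total = 0.0
    for k in range(n):
        u = u_star + (k + 0.5) * h
        total += (u - u_star) * dist.pdf(u)
    return total * h
```

The closed form makes the point in the text visible: only the part of the distribution above $u_i^*$ enters, and the benefit falls as the current best utility $u_i^*$ rises.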
The goal of the consumer is, given the current best option, to maximize expected utility minus incurred
search cost over a set of options that, at the individual level, are characterized by product-specific mean
utilities, $V_{ij}$, product-specific search costs, $c_{ij}$, and product-specific uncertainties, captured by
$\sigma_{ij}^2$. This implies that the consumer continues search if there exists at least one unsearched $j$ such that
$$c_{ij} < B_{ij}(u_i^*), \tag{4}$$
i.e., if the expected marginal benefit of searching is larger than the marginal cost, $c_{ij}$.
The optimal sequential search strategy can be formalized as follows. First, partition the set of options
into $S_i \cup \bar{S}_i$, with $S_i$ containing all searched options and $\bar{S}_i$ containing all non-searched options. All
decision-relevant information about $S_i$ is contained in $u_i^* = \max_{j \in S_i}\{u_{ij}, e_{i(J+1)}\}$, provided we assign 0
to the deterministic component of utility for the outside good.
At any point in the search process, the state of the system is given by $(u_i^*, \bar{S}_i)$. Define the value function
$W(u_i^*, \bar{S}_i)$ as the expected (discounted) value of following an optimal search policy, from the current state
going forward. This value function must satisfy the following Bellman equation (Weitzman 1979)
$$W(u_i^*, \bar{S}_i) = \max\Bigl( u_i^*,\ \max_{j \in \bar{S}_i} \Bigl( -c_{ij} + \beta_i \cdot \Bigl[ \underbrace{F(u_i^*) \cdot W(u_i^*, \bar{S}_i - \{j\})}_{u_{ij} \le u_i^*} + \underbrace{\int_{u_i^*}^{\infty} W(u_{ij}, \bar{S}_i - \{j\})\, f(u_{ij})\, du_{ij}}_{u_{ij} > u_i^*} \Bigr] \Bigr) \Bigr) \tag{5}$$
This equation says that from state $(u_i^*, \bar{S}_i)$, the consumer can terminate search and collect $u_i^*$, or the
consumer can search any $j \in \bar{S}_i$. In the latter case, the consumer gets an expectation
$F(u_i^*) \cdot W(u_i^*, \bar{S}_i - \{j\}) + \int_{u_i^*}^{\infty} W(u_{ij}, \bar{S}_i - \{j\})\, f(u_{ij})\, du_{ij}$, which she seeks to maximize across $j$. Because all online searches
in a single session are conducted in a short time span, we set the discount factor $\beta_i$ to 1.
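As a worked special case, added here for illustration only, suppose a single unsearched product $j$ remains, set $\beta_i = 1$, and use $W(x, \emptyset) = x$, since with nothing left to search the consumer simply collects her best utility so far. The Bellman equation then reduces to the cost-benefit comparison of equation 4:

```latex
\begin{aligned}
W(u_i^*, \{j\}) &= \max\Bigl(u_i^*,\; -c_{ij} + F(u_i^*)\,u_i^*
    + \int_{u_i^*}^{\infty} u_{ij}\, f(u_{ij})\, du_{ij}\Bigr), \\
\text{search } j
  &\iff -c_{ij} + F(u_i^*)\,u_i^* + \int_{u_i^*}^{\infty} u_{ij}\, f(u_{ij})\, du_{ij} > u_i^* \\
  &\iff \int_{u_i^*}^{\infty} u_{ij}\, f(u_{ij})\, du_{ij} - \bigl(1 - F(u_i^*)\bigr)\,u_i^* > c_{ij} \\
  &\iff \int_{u_i^*}^{\infty} (u_{ij} - u_i^*)\, f(u_{ij})\, du_{ij} > c_{ij}
  \iff c_{ij} < B_{ij}(u_i^*).
\end{aligned}
```

The third line uses $\bigl(1 - F(u_i^*)\bigr)\,u_i^* = \int_{u_i^*}^{\infty} u_i^*\, f(u_{ij})\, du_{ij}$ to fold the two terms into a single integral.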
Now we discuss some important modeling assumptions in our proposed search model. First, our model
is a full information model in which the consumers are assumed to have full knowledge about the products
⁸ This can be seen by writing equation 3 alternatively as
$$B_{ij}(u_i^*) = \bigl(1 - F_j(u_i^*)\bigr) \times \int_{u_i^*}^{\infty} (u_{ij} - u_i^*)\, \frac{f(u_{ij})}{1 - F_j(u_i^*)}\, du_{ij},$$
which is the product of the chance that the utility draw is larger than $u_i^*$ and the expected value, measured
in excess of $u_i^*$, of a truncated draw from the distribution of $u_{ij}$ above $u_i^*$.
and their attribute values. This allows the consumers to form $V_{ij}$ for all products prior to search and
to use them in computing the reservation utilities during the sequential search process. Later in the
empirical section, we conduct a series of robustness tests in which we relax this assumption.
In these robustness tests, we assume that consumers have partial knowledge about the products and
investigate whether this partial knowledge assumption meaningfully affects our model estimates. Second,
we assume that $E(e_{ij} \cdot e_{kl}) = 0$ for $(i,j) \neq (k,l)$ and thus that the correlation of the unobserved portion
of the utility across individuals and products is zero. The corresponding consumer behavior underlying this
assumption is that the consumers have well-defined preferences prior to search and do not learn about the
products during the search process.⁹ Third, we do not include any context or reference effect in the proposed
model of optimal consumer search. Although we acknowledge that a more comprehensive model should include
such effects, we use the cost-benefit framework as the first-order approximation of the optimal consumer
search. Lastly, we do not model Amazon.com's (potentially) strategic behavior in setting prices and product
information ($e_{ij}$).¹⁰
3.3 The optimal strategy
The solution to the above dynamic program is to continue searching until a utility $u_i^*$ is discovered that
is larger than some limit, which in turn depends on how much option value is still left in the unsearched
set. This limit depends on a quantity that is called a "reservation utility". To define this concept, each
consumer $i$ has a reservation utility $z_{ij}$ for each product $j$ that, if she had already found a product with
that utility, leaves her indifferent between searching and not searching $j$. In other words, the reservation
utility $z_{ij}$ obeys the following equation (see also equation 4, above):
$$c_{ij} = B_{ij}(z_{ij}) = \int_{z_{ij}}^{\infty} (u_{ij} - z_{ij})\, f(u_{ij})\, du_{ij}. \tag{6}$$
Thus, the reservation utilities solve $z_{ij} = B_{ij}^{-1}(c_{ij})$. The estimation section establishes that $B_{ij}$ is monotonic,
and a separate appendix provides the details of the computation of $z_{ij}$, including its existence and uniqueness.
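One way to compute a reservation utility numerically is sketched below in Python. It assumes normal $e_{ij}$, so that $B_{ij}(z) = \sigma_{ij}\,\xi\bigl((z - V_{ij})/\sigma_{ij}\bigr)$ with $\xi(x) = \phi(x) - x\,(1 - \Phi(x))$, and inverts the strictly decreasing function $\xi$ by bisection. This is an illustrative routine, not the computation described in the paper's appendix; the function names and the bracketing scheme are assumptions of this sketch, and it is intended for search costs that are not vanishingly small.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def xi(x):
    """Standardized marginal benefit: E[max(Z - x, 0)] for Z ~ N(0, 1)."""
    return _STD.pdf(x) - x * _STD.cdf(-x)


def reservation_utility(c, V, sigma=1.0, tol=1e-12):
    """Solve c = B(z) (equation 6) for z when u ~ N(V, sigma^2)."""
    if c <= 0:
        raise ValueError("search cost must be positive")
    target = c / sigma
    # xi(x) >= -x for all x, so xi(lo) >= target + 1 > target.
    lo = -target - 1.0
    # xi decreases toward 0, so step right until xi(hi) <= target.
    hi = 1.0
    while xi(hi) > target:
        hi += 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if xi(mid) > target:
            lo = mid  # benefit still exceeds cost: root lies to the right
        else:
            hi = mid
        if hi - lo < tol:
            break
    return V + sigma * 0.5 * (lo + hi)
```

For example, `reservation_utility(0.1, 1.0, 1.0)` returns the utility level at which a consumer with $V_{ij} = 1$, $\sigma_{ij} = 1$, and $c_{ij} = 0.1$ is indifferent between searching $j$ and stopping.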
The optimal search strategy (see, e.g., Weitzman 1979) that solves the consumer's maximization problem
of equation 5 has three components: a selection rule, which determines the ordering of the search sequence;
a stopping rule, which determines the length of the search sequence; and a choice rule.
⁹ We discuss this topic in detail in the next section.
¹⁰ Amazon.com does not differentiate prices or product information across consumers. We allow for flexible product fixed
effects in the model, thereby reducing the potential for endogeneity biases stemming from potentially strategic prices or supply
of product information. We acknowledge that the issue is important and warrants future study. The issue of how much and
which product information Amazon.com should supply in order to maximize profits is interesting but is outside the scope of
this paper.
1. Selection rule: Compute all reservation utilities $z_{ij}$ and sort them in descending order. If a product is
to be searched, it should be the product with the highest reservation utility $z_{ij}$ among the products
not yet searched.
2. Stopping rule: Stop searching when the highest utility obtained so far, $u_i^*$, is greater than
$\max_{j \in \bar{S}_i}(z_{ij})$ among the unsearched items.
3. Choice rule: Once search stops, collect $u_i^*$ by choosing the maximum utility alternative in $S_i$.
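The three rules above can be sketched in a few lines of Python. The sketch is illustrative: it takes the reservation utilities `z` and the realized utilities `u` as given inputs, and it represents the outside good by a fixed utility `outside` (default 0), which is a simplifying assumption of this example rather than the paper's treatment of $e_{i(J+1)}$.

```python
def sequential_search(u, z, outside=0.0):
    """Apply the selection, stopping, and choice rules for one consumer.

    u: realized utilities u_ij (each revealed only if product j is searched)
    z: reservation utilities z_ij
    outside: utility of the outside good (hypothetical fixed value)
    Returns (searched, chosen): the products searched, in order, and the
    index of the chosen product, or None if the outside good is chosen.
    """
    # Selection rule: visit products in descending order of z_ij.
    order = sorted(range(len(z)), key=lambda j: z[j], reverse=True)
    best_u, best_j = outside, None
    searched = []
    for j in order:
        # Stopping rule: z[j] is the largest reservation utility left,
        # so stop as soon as the best utility so far exceeds it.
        if best_u > z[j]:
            break
        searched.append(j)
        if u[j] > best_u:
            best_u, best_j = u[j], j
    # Choice rule: pick the best alternative found (or the outside good).
    return searched, best_j


# Example: the consumer searches products 0 and 1, then stops and buys 1.
print(sequential_search(u=[0.2, 1.5, -0.3, 2.0], z=[1.8, 1.2, 0.9, -0.5]))
```

Note that product 3 has the highest realized utility in the example but is never searched, because its reservation utility is the lowest.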
We note that this search and choice process can accommodate that some consumers do not search at all.
Indeed, consumers for whom $\max_j(z_{ij}) < 0$ will not find it worth their time to search brands. They
will choose the outside good.¹¹ The same process can also accommodate that some consumers just browse
but do not buy. For such consumers, $\max_{j \in S_i}(z_{ij}) > 0$, but $\max_{j \in S_i}(u_{ij}) < 0$. These two statements are
not in conflict, as will be seen below. These consumers will also choose the outside good.
The optimal selection and stopping rules above are derived under the assumption that information
obtained by searching one product does not affect the knowledge of other products. That is, we do not
assume consumer learning during the search process. From the modeling perspective, we assume that the $e_{ij}$
are independent given $V_{ij}$ during the search process. The current consumer behavior literature advocates
that this is a reasonable assumption for consumers engaging in search processes (Moorthy, Ratchford, and
Talukdar 1997).
Two important points need to be made. First, given a choice set, the choice model above is not a probit
model. For instance, given the stopping rule above, search beyond item $k$ is continued only if the utility
draw for $e_{ik}$ is low enough. This implies that conditional on observing a specific choice set, the $e_{ij}$ are
not distributed normal with mean 0 and variance $\sigma_{ij}^2$. Therefore, given search, choice probabilities do not
follow a standard probit.
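A small simulation can illustrate why the shocks are no longer mean-zero normal once we condition on the search set. The Python sketch below is a stylized two-product example under assumptions chosen for illustration: product 1 is always searched first, the outside good has a fixed utility of 0, and `z2` is the (given) reservation utility of product 2. It keeps only the draws of $e_{i1}$ for which search continues to product 2.

```python
import random


def shock_given_two_searched(V1, z2, sigma=1.0, n=100000, seed=0):
    """Mean and maximum of e_i1 conditional on product 2 also being searched.

    V1: expected utility of the first-searched product (hypothetical value)
    z2: reservation utility of the second product (hypothetical value)
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        e1 = rng.gauss(0.0, sigma)
        # Search continues past product 1 only if the best utility so far
        # (the outside good or product 1) does not exceed z2.
        if max(0.0, V1 + e1) <= z2:
            kept.append(e1)
    return sum(kept) / len(kept), max(kept)


mean_e1, max_e1 = shock_given_two_searched(V1=1.0, z2=1.4)
print(mean_e1, max_e1)
```

With these inputs, every retained draw satisfies $e_{i1} \le z_{i2} - V_{i1} = 0.4$, so the conditional distribution is truncated from above and its mean is negative, in line with the argument in the text.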
Second, Chiang, Chib and Narasimhan (1999) mention that identification of choice sets (or in this case:
search sets) is subject to the curse of dimensionality. Indeed, in a non-sequential search process, with $J$
possible alternatives, there exist $2^J - 1$ possible search sets. This large number of subsets would
render the computation of the search frequency of any given product impossible with universal choice set
sizes of $J = 300+$ at Amazon.com. However, an important computational windfall of the sequential search
process is that it is not subject to the curse of dimensionality. Given the selection rule above, there are only
$J$ possible optimal choice sets at the individual level. Given a set of individual-level parameters, there will
be an ordering of the choice alternatives along their reservation utilities $z_{ij}$, and the consumer optimally
¹¹ Note that because we estimate our model with search data, we assume that all consumers search at least one product.
However, the model actually accommodates non-search behavior.
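[Figure 1 panels. Left: reservation utility $z$ (vertical axis, −1 to 3) against search cost $c$ (horizontal axis, 0 to 2). Right: $z$ (vertical axis, 0 to 5) against product uncertainty $\sigma^2$ (horizontal axis, 0 to 4).]
Figure 1: The relation between search cost, $c$, product uncertainty, $\sigma^2$, and search attractiveness, $z$.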
samples these choice alternatives in descending order. Thus, if the zijcan be computed, the contents of a
search set of size m is known. In sum, whereas, across consumers, the model allows for the existence of any
of the 2Jpossible search sets as a consequence of consumer heterogeneity, at the individual level only J of
these sets can be an outcome of the optimal sequential search process that belongs to a particular vector
of individual parameters.
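The reduction from $2^J - 1$ candidate sets to $J$ follows directly from ranking on $z_{ij}$. As a minimal illustration, the Python sketch below ranks the products of one consumer by reservation utility and lists the $J$ nested candidate search sets. The function name `candidate_search_sets` and the numerical values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def candidate_search_sets(z):
    """Return the J nested candidate search sets implied by reservation utilities z.

    z[j] is the reservation utility of product j for one consumer. The k-th set
    holds the k products with the highest z, in the order they are searched.
    """
    order = sorted(range(len(z)), key=lambda j: z[j], reverse=True)
    return [order[:k] for k in range(1, len(order) + 1)]


# Hypothetical reservation utilities for J = 4 products.
z = [0.3, 1.9, 1.1, -0.4]
sets = candidate_search_sets(z)
for k, s in enumerate(sets, start=1):
    print(k, s)
print("candidate sets:", len(sets), "out of", 2 ** len(z) - 1, "nonempty subsets")
```

Each set extends the previous one by the next product in the ranking, so only the stopping point $k$ remains uncertain for a given consumer.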
Before completing the model, we investigate some properties of the search sequence by means of an
example.
3.4 Some characteristics of the search sequence
In Figure 1, we plot the relation between $c$, $\sigma^2$, and $z$, under the assumption of normality of $e_{ij}$. The
left-hand panel varies search cost from 0 to 2, and measures the change in the reservation utility $z_{ij}$. For
reference, in this example we choose $V_{ij} = 1$ and $\sigma_{ij}^2 = 1$. The reservation utility $z_{ij}$, or more intuitively
the relative attractiveness of searching $j$, is decreasing in its search cost. As search cost increases, $z_{ij}$ goes
to $V_{ij} - c_{ij}$. This implies that as search costs increase relative to product uncertainty, the attractiveness
of search tends to go to the expected utility net of search cost. On the other hand, if search costs are low
relative to product uncertainty, or product uncertainty is high relative to search cost, $z_{ij}$ goes to infinity.
Indeed, if it is free to search, the option value (upside) of searching any product that has utility support
on $\mathbb{R}^+$ is infinite.
The relation between $\sigma^2$ and $z$ in the right-hand panel shows the option value of uncertain prospects.
For reference, in this graph $V_{ij} = 1$ and $c_{ij} = 0.1$. As outlined above, in sequential search, the search value
of a product is determined by its upside. That is, anything lower than the current maximum $u_i^*$ is irrelevant.
Consequently, the reservation utility $z_{ij}$ is increasing in product variance. As a natural consequence, if
novice consumers are characterized by having high $\sigma_{ij}^2$ relative to $V_{ij}$, they will tend to have higher $z_{ij}$ and
thus search more than consumers who have more experience. For completeness, we note that $z_{ij}$ increases
linearly in $V_{ij}$.
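The comparative statics in this subsection can be checked numerically. The sketch below is illustrative only. It assumes the standard Weitzman (1979) characterization, in which the reservation utility $z$ solves $c = E[\max(V + e - z, 0)]$ with $e$ normal with mean 0 and standard deviation $\sigma$, and it solves this equation by bisection. The function names and the bracket used for the root search are hypothetical choices made for this example, not part of the paper.

```python
import math

SQRT2 = math.sqrt(2.0)


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_tail(x):
    # Pr(e > x) for a standard normal e
    return 0.5 * math.erfc(x / SQRT2)


def expected_gain(zeta):
    # E[max(e - zeta, 0)] for a standard normal e; decreasing in zeta
    return norm_pdf(zeta) - zeta * upper_tail(zeta)


def reservation_utility(V, sigma, c):
    """Solve c = E[max(V + sigma * e - z, 0)] for z (assumed definition).

    Bisection on the standardized value zeta = (z - V) / sigma. The bracket
    below is an assumption that covers c / sigma between about 1e-24 and 50.
    """
    target = c / sigma
    lo, hi = -50.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_gain(mid) > target:
            lo = mid  # gain still above cost, so z must be higher
        else:
            hi = mid
    return V + sigma * 0.5 * (lo + hi)


# Left panel of Figure 1: V = 1, sigma^2 = 1, search cost varies.
for c in (0.1, 0.5, 1.0, 2.0):
    print("c =", c, "z =", round(reservation_utility(1.0, 1.0, c), 3))

# Right panel of Figure 1: V = 1, c = 0.1, variance varies.
for var in (0.5, 1.0, 2.0, 4.0):
    print("sigma^2 =", var, "z =", round(reservation_utility(1.0, math.sqrt(var), 0.1), 3))
```

Under these assumptions, $z$ falls toward $V - c$ as $c$ grows, rises with $\sigma^2$, and shifts one for one with $V$, in line with the discussion above.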
3.5 Inclusion probabilities and set occurrence
Our data are a function of the frequency with which products are being viewed or searched, and therefore
we seek to derive the probability $\pi_{ij}$ that a given product $j$ is included in the optimal search set of consumer
$i$. Suppose that we know $z_{ij}$ and $V_{ij}$ for each individual and product. With some abuse of notation, denote
the rank of $z_{ij}$ by $r$, with $r(1)$ returning the index $j$ of the highest ranked $z_{ij}$ and $r(J)$ returning the index $j$
of the lowest ranked $z_{ij}$ for individual $i$. From these definitions, $\pi_{i,r(1)}$ is the inclusion probability of the
product with the highest ranked $z_{ij}$ for consumer $i$, and $\pi_{i,r(j)}$ is the inclusion probability of the product
with the $j$th highest ranked $z_{ij}$.
The contents of set $S_{ik}$ are fully determined by the selection rule (ranking on $z$) and the stopping rule
(the size $k$). The probability $\pi_{i,r(j)}$ that product $r(j)$ is in the set is equal to the probability that the first
$j - 1$ draws of utilities all fell short of $z_{i,r(j)}$ (which is less than $z_{i,r(j-1)}$ by the selection rule above). Thus,
the inclusion probability of product $r(j)$ is
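\begin{align}
\pi_{i,r(j)} &= \Pr\left( \max_{k=1}^{j-1} \left\{ V_{i,r(k)} + e_{i,r(k)} \right\} < z_{i,r(j)} \right), \quad j > 1 \tag{7} \\
&= \prod_{k=1}^{j-1} F\left( z_{i,r(j)} - V_{i,r(k)} \right) \tag{8}
\end{align}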
with $\pi_{i,r(1)} = 1$¹² and $F(\cdot)$ is the cumulative probability distribution of $e_{ij}$, which in our case is the normal
distribution with mean 0 and variance $\sigma_{ij}^2$.
¹² Because our data are predicated on the occurrence of search, consumers search at least one product.

There are three useful properties of these probabilities of inclusion.
1. First, it is trivial to show that $\pi_{i,r(j)} > \pi_{i,r(j+1)}$, that is, the inclusion probability of the $(j+1)$th product
is always less than the inclusion probability of the $j$th product.
2. Second, given the sequential nature of search and the selection rules of the optimal strategy, the
probability that $r(j)$ and $r(j+k)$ occur together in a set is equal to the probability that $r(j+k)$ is
in the set,
\begin{equation}
\pi_{i,\{r(j) \text{ and } r(j+k)\}} = \pi_{i,r(j+k)} = \min\left( \pi_{i,r(j)}, \pi_{i,r(j+k)} \right), \tag{9}
\end{equation}
where the last step is from the first property. In the estimation section, we will use the last formulation
of this property when we need to determine the probability that two products $j$ and $k$ are jointly in
the set.
3. Third, given the sequential nature of choice and the independence of the $e_{ij}$, the probability that
the set $S_{ik}$ occurs can be computed as follows. First, recall that $S_{ik}$ is the optimal set of size $k$ for
individual $i$. The probability that $S_{ik}$ occurs is equal to the probability that search continues beyond
$r(k-1)$ minus the probability of continuing search beyond $r(k)$. This is equal to the chance that
$r(k)$ is in the choice set minus the chance that $r(k+1)$ is in the choice set of consumer $i$. Thus
\begin{equation}
\Pr\left( S_{i,r(k)} \right) = \pi_{i,r(k)} - \pi_{i,r(k+1)}. \tag{10}
\end{equation}
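Equations (8) and (10) are simple enough to evaluate directly. The following Python sketch is a minimal illustration under stated assumptions: the $e_{ij}$ are independent normal draws with product-specific standard deviations `sigma`, and $\pi_{i,r(J+1)}$ is set to 0 so that the probabilities of the $J$ candidate sets sum to one (this boundary convention is an assumption of the sketch, not a statement from the paper). All function names and numerical values are hypothetical.

```python
import math


def normal_cdf(x, sigma):
    # CDF at x of a normal distribution with mean 0 and standard deviation sigma
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def inclusion_probabilities(V, z, sigma):
    """Equation (8): inclusion probabilities in search order r(1), ..., r(J)."""
    order = sorted(range(len(z)), key=lambda j: z[j], reverse=True)
    pi = []
    for pos, j in enumerate(order):
        p = 1.0  # empty product for the first product, so pi_{i,r(1)} = 1
        for k in order[:pos]:
            # earlier product k must have drawn a utility below z of product j
            p *= normal_cdf(z[j] - V[k], sigma[k])
        pi.append(p)
    return order, pi


def set_probabilities(pi):
    """Equation (10), with pi_{i,r(J+1)} = 0 assumed for the largest set."""
    padded = pi + [0.0]
    return [padded[k] - padded[k + 1] for k in range(len(pi))]


# Hypothetical values for one consumer and J = 3 products.
V = [1.0, 0.5, 0.8]
z = [1.9, 1.2, 1.6]
sigma = [1.0, 1.0, 1.0]

order, pi = inclusion_probabilities(V, z, sigma)
set_probs = set_probabilities(pi)
print("search order:", order)
print("inclusion probabilities:", [round(p, 4) for p in pi])
print("set probabilities:", [round(p, 4) for p in set_probs])
```

Because the inclusion probabilities are decreasing along the ranking, the joint inclusion probability of equation (9) for any two products is simply the smaller of their two entries in `pi`.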
This concludes the statement of the individual-level model. The aggregation to the level at which
Amazon.com reports its data is explained in the estimation section. For completeness, we note that
an alternative to the approach in this subsection is to use draws of the $e_{ij}$ and compute realizations of
the process. This would lead to less computation at the individual level, but at the aggregate level we
would have to use a frequency estimator for market-level behavior, whereas using the model above, we can
integrate over a probability model with far greater precision.
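The simulation alternative mentioned above can be sketched as follows. This is an illustrative frequency estimator, not the paper's estimation code: it draws the $e_{ij}$, applies the selection rule (search in descending order of $z_{ij}$) and the stopping rule (continue only while the best utility found so far is below the next reservation utility), and counts how often each product is searched. The function name, the number of draws, the seed, and the numerical values are all hypothetical.

```python
import random


def simulate_search_frequencies(V, z, sigma, draws=20000, seed=0):
    """Frequency estimate of inclusion probabilities from simulated searches."""
    rng = random.Random(seed)
    order = sorted(range(len(z)), key=lambda j: z[j], reverse=True)
    counts = [0] * len(order)
    for _ in range(draws):
        best = float("-inf")  # highest utility found so far
        for pos, j in enumerate(order):
            if pos > 0 and best >= z[j]:
                break  # stop: no remaining product has z above the current best
            counts[pos] += 1
            best = max(best, V[j] + rng.gauss(0.0, sigma[j]))
    return order, [n / draws for n in counts]


# Hypothetical values for one consumer and J = 3 products.
V = [1.0, 0.5, 0.8]
z = [1.9, 1.2, 1.6]
sigma = [1.0, 1.0, 1.0]

order, freq = simulate_search_frequencies(V, z, sigma)
print("search order:", order)
print("simulated inclusion frequencies:", [round(f, 4) for f in freq])
```

The simulated frequencies carry sampling noise of order $1/\sqrt{\text{draws}}$, which is the loss of precision relative to the closed-form probabilities that the text refers to.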
4 Data
4.1 The viewrank data
We have collected, on a regular basis, the viewrank data for all camcorder products from May 2006 until
October 2007. To ensure that the analysis is based on a sufficiently large sample of viewing behaviors, and
because we do not have information about the temporal window used by Amazon.com in computing the
viewrank data, we use the products that appear throughout the data collection period and aggregate
their rank orders in the viewrank data to the monthly level.¹³ We use data from the month of May 2007.
First, we extracted the top 200 camcorders from the Amazon.com website, based on sales rank. We
removed the smaller players such as Aiptek, Samsonic and DXG, who cater to the lowest-price tier only
with different types of camcorders and which have very low sales rank. We also removed from the analysis
those camcorders on which we had no observations of media format,¹⁴ and all camcorders of professional
grade. After applying these data filters, we are left with 91 choice alternatives. The summary statistics of
the products are shown in Table 2.

Attributes        Ranges
Brand             Sony (31), Panasonic (19), Canon (15), JVC (15), other (11)
Media Formats     MiniDV (33), DVD (30), FM (9), HD (19)
Price             $530 (mean), $263 (std. dev.)
Form              Compact (8), Conventional (83)
High Definition   Yes (14), No (77)
Pixel             1.67M (mean), 1.45M (std. dev.)
Zoom              19.8 (mean), 10.9 (std. dev.)
Table 2: Description of the choice options in the empirical data (with frequency of occurrence in parentheses)

¹³ We use average sales price in our analysis. In the data, we observe that the positions of products in the viewrank lists
fluctuate over time. This calls for an averaging mechanism for the different positions of a product in the viewlists over time.
For this averaging procedure, we use the percentile ranking similar to Bajari, Fox and Ryan (2007). In the percentile ranking,
the product with the highest rank among $J$ products is coded as $J$, not 1. Then we normalize the rank of product $j$ at time
$t$ as
\begin{equation}
\hat{r}_{jt} = \frac{r_{jt}}{\max_k \{ r_{kt} \}}. \tag{11}
\end{equation}
Once we compute $\hat{r}_{jt}$, the percentile ranking of product $j$ at $t$, we compute the average ranking of product $j$ as the mean
of the daily percentile rankings, $\hat{r}_j = \frac{1}{T} \sum_t \hat{r}_{jt}$.
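The percentile-rank averaging described in footnote 13 can be illustrated with a short Python sketch. It assumes that daily ranks are stored as one list per day with the highest rank coded as $J$, as in the footnote; the function name `average_percentile_rank` and the example ranks are hypothetical.

```python
def average_percentile_rank(ranks_by_day):
    """Average over days of r_jt / max_k r_kt for each product j.

    ranks_by_day[t][j] is the rank of product j on day t, with the highest
    ranked product coded as J rather than 1.
    """
    T = len(ranks_by_day)
    J = len(ranks_by_day[0])
    totals = [0.0] * J
    for day in ranks_by_day:
        top = max(day)  # max_k r_kt on this day
        for j, r in enumerate(day):
            totals[j] += r / top
    return [total / T for total in totals]


# Two hypothetical days of ranks for J = 3 products.
ranks = [[3, 2, 1], [2, 3, 1]]
avg = average_percentile_rank(ranks)
print([round(a, 4) for a in avg])
```

Dividing by the daily maximum keeps the measure comparable across days even when the number of ranked products changes.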
All 91 products have their own viewrank lists, i.e., all of the products have a list from which we
observe which other products are closely related, in the order of decreasing relationship. On average, a
given product appears on 24 of the 90 other products' view lists, with a standard deviation of 18. The
minimum number of appearances is 0 while the maximum is 83.
Table 3 gives the results of a descriptive regression of the number of appearances on the viewlists. Note
that Sony, Panasonic, and Canon appear most frequently in the viewrank lists. Further, high definition
and pixel size improve the number of appearances, while higher price reduces it. We conclude that the
number of appearances on the viewrank data depends on demand drivers such as product attributes and
prices.
We point out the rich information embedded in the Amazon viewrank data. For every focal product $k$,
Amazon.com provides a list of the top $N$ most related products among the remaining $J - 1$ products.¹⁵ Also,
product $k$ may appear on the viewrank lists of the other $J - 1$ products. Therefore, the data reveal a complex
pattern of relations between a given product $k$ and the other $J - 1$ products.
Lastly, we discuss the type of consumers who we believe are represented in the product search data.
Moe (2003) classifies online store browsing behavior of consumers into four different categories: directed
buying, search and deliberation, hedonic browsing, and knowledge building. She also classifies the contents
of e-commerce web pages into three different categories: product, category, and information pages. She
reports that consumers in directed buying mode will frequently visit the product page, while
consumers in the mode of search and deliberation focus on both product and category pages. Hedonic
browsers focus on category pages, while consumers in knowledge building will focus on information pages.
Montgomery et al. (2004) also find that the focus of consumers in the buying mode is product detail
pages. Amazon.com's product search data are based on the number of consumers who requested product
detail pages from the Amazon.com server. Therefore, consistent with previous research, we conjecture that
Amazon.com's product search data predominantly reflect the behaviors of consumers in either the buying or
the search phase with a vested interest in the product category.

¹⁴ The media formats of the products in the data include flash memory (FM) and hard drive (HD).
¹⁵ During the data collection period, Amazon.com listed up to 45 products that are related to the focal product.

Variable               β        std. err.
Intercept              10.64    22.71
Sony                   51.80    15.16
Panasonic              48.61    14.56
Canon                  44.61    14.09
JVC                    30.42    14.51
Samsung                36.09    12.79
MiniDV                 13.42     4.21
DVD                    22.58     4.34
FM                     18.69     9.00
Compact                 1.80     8.88
High Definition        16.10     5.38
Zoom                    0.17     0.20
Screen Size             1.28     6.28
Pixel                   6.49     1.64
Price                  36.40     9.28
Appearance frequency    1.65     0.23
R2                      0.60
Table 3: Descriptive regression of the frequency of product appearance against product characteristics
4.2 Other measures of search at Amazon.com
We now discuss other data that are available at Amazon.com. On each product detail page, Amazon.com
lists up to four top products purchased by past consumers who searched the product on the current detail
page. These product references serve as shortcuts to other closely relevant products, thereby reducing
consumers' search costs for potentially attractive products. We use the number or frequency of appearances
of each product aggregated across other product pages as an explanatory variable that affects search cost.
For instance, we hypothesize that a product that appears frequently at the store level will have a smaller
search cost compared to products that do not. Hereafter, $L_j$ denotes the frequency of appearances of
product $j$ over all product pages.¹⁶
¹⁶ It is important to ensure that there is no multicollinearity between $L_j$ and the other product characteristics we use in our
empirical analysis. To verify this, we regressed $L_j$ on product attributes such as brands, media formats, prices,
etc. We find that the model fit is relatively low ($R^2 = 0.12$) and none of the coefficients are significant, thereby supporting a
lack or low level of multicollinearity between $L_j$ and other product characteristics.
 Available from Paulo Albuquerque · Jun 5, 2014
 Available from SSRN