# Online Demand Under Limited Consumer Search

**ABSTRACT** Using aggregate product search data from Amazon.com, we jointly estimate consumer information search and online demand for durable goods. To estimate demand and search primitives, we introduce an optimal sequential search process into a model of choice and treat the observed market-level product search data as aggregations of individual-level optimal search sequences. The model builds on the dynamic programming framework by Weitzman (1979) and combines it with a choice model. At the individual level, the model has several attractive properties including closed-form expressions for the probability distribution of alternative search sets and breaking the curse of dimensionality. Using numerical experiments, we verify the model's ability to identify consumer tastes and search cost from product search data. Empirically, the model is applied to the camcorder online market and is used to answer manufacturer questions about market structure and competition, and to address policy maker issues about the effect of recommendation tools on consumer surplus outcomes. We find that consumer search for camcorders is typically limited to about 10 choice options, and that this affects the estimates of own and cross-elasticities. We also find that the vast majority of households benefit from Amazon.com's product recommendations via lower search costs.


Electronic copy available at: http://ssrn.com/abstract=1340267

ONLINE DEMAND UNDER LIMITED CONSUMER SEARCH∗

Jun B. Kim†

Paulo Albuquerque‡

Bart J. Bronnenberg§

October 19, 2009

Abstract

Using aggregate product search data from Amazon.com, we jointly estimate consumer information search and online demand for consumer durable goods. To estimate the demand and search primitives, we introduce an optimal sequential search process into a model of choice and treat the observed market-level product search data as aggregations of individual-level optimal search sequences. The model builds on the dynamic programming framework by Weitzman (1979) and combines it with a choice model. It can accommodate highly complex demand patterns at the market level. At the individual level, the model has a number of attractive properties in estimation, including closed-form expressions for the probability distribution of alternative sets of searched goods and breaking the curse of dimensionality. Using numerical experiments, we verify the model's ability to identify the heterogeneous consumer tastes and search costs from product search data. Empirically, the model is applied to the online market for camcorders and is used to answer manufacturer questions about market structure and competition, and to address policy maker issues about the effect of selectively lowered search costs on consumer surplus outcomes. We find that consumer search for camcorders at Amazon.com is typically limited to a little over 10 choice options, and that this affects the estimates of own and cross elasticities. In a policy simulation, we also find that the vast majority of households benefit from Amazon.com's product recommendations via lower search costs.

Keywords: cost-benefit analysis, optimal sequential search, demand for durable goods, information economics, consideration sets

∗ The authors are grateful to seminar participants at the 2009 Marketing Dynamics Conference in Waikato (New Zealand), Dartmouth College, Erasmus University Rotterdam, Georgia Tech, GSB Stanford University, GSB University of Chicago, Hong Kong University of Science and Technology, National University of Singapore, Santa Clara University, Tilburg University, and the University of Texas, Dallas. We also thank participants at the 2009 QME conference, and Chad Syverson in particular, for comments and feedback.

† Jun B. Kim is Assistant Professor of Management at Georgia Institute of Technology.

‡ Paulo Albuquerque is Assistant Professor of Marketing at the Simon Graduate School of Business, University of Rochester.

§ Bart J. Bronnenberg is Professor of Marketing, and CentER Research Fellow, Tilburg University. Bronnenberg gratefully acknowledges EU funding from the Marie Curie Program (IRG 230962).


## 1 Introduction

Online demand for consumer durables and search goods is large and rapidly growing. Comscore (2007) estimates that non-travel U.S. online consumer spending in 2006 reached $102.1 billion. Jupiter Media Metrix (2004) estimated U.S. online consumer spending of $65 billion in 2004, with $20.2 billion on durable consumer search goods, and another $8.3 billion on information goods. The Comscore report shows that the fastest growing e-commerce categories include durable consumer goods such as video consoles, consumer electronics, furniture, appliances, and equipment, as well as information goods such as books and magazines, music, and software. These categories saw annual growth rates for 2007 in the 25% to 50% range. PC World reported in 2007 that the “appeal of online shopping” is growing. Between August 2006 and the same month a year later, 14 percent of the $159 billion that U.S. shoppers spent on consumer electronics was spent online, up from 5 percent a year earlier, according to the Consumer Electronics Association.

In this paper, we seek to understand online demand by studying the product information acquisition for durable search goods and/or information goods at Amazon.com using aggregated histories of search behavior, which provides us with a unique opportunity to directly observe category-level consumer browsing behaviors. Our premise is that we can learn about the preferences of consumers by studying their “shopping behaviors.” That is, because the examination and inspection of goods or services come at the cost of the consumer's time and effort, search outcomes become informative about what the consumer wants. Observing the particular region of the attribute space in which a consumer invests time browsing products may teach us something about her preferences. Specifically, our proposal is to treat browsing behavior as the outcome of an optimal sequential search process across choice options for which the consumer has different expectations and uncertainties. In addition, these choice options need not all be equally accessible and may be offered to consumers at different search costs (for instance through the use of seller sponsored recommendation engines). Recognizing that the three demand primitives (expectations, uncertainties, and search cost) can be changed by interested parties, e.g., manufacturers or policy makers, the substantive goal of this paper is to analyze the impact of limited search on choice decisions of the consumers and on the competitive market structure. Methodologically, we introduce an optimal sequential search process into a model of choice and identify the demand parameters of interest from the search data.

Table 1 presents an example of viewing data for camcorders obtained from Amazon.com. The table lists products that were viewed by consumers conditional on viewing a particular (or a focal) product, the Sony DCR-DVD108. In addition, the order in which these products are listed is determined by an Amazon.com algorithm that uses the frequency of same-session viewing of the focal product and the other products.¹ The table does not list all existing 300+ camcorder options and reflects the fact that some options are seldom or never viewed together with the focal product by any consumer in the same online session. The data in Table 1 exist for each of the camcorder options as the focal product. Because the products in the view list are rank ordered, we refer to these data as the view-rank data. The paper shows that, across viewed products, the view-rank data are informative about substitution, and that from viewed to non-viewed products, the data imply either low or lack of substitution. For durable goods, where meaningful observations of consumer switching are usually very limited, the premise of this paper is that the view-rank data are in the spirit of revealed measures of substitution.

¹ This data generating mechanism is explained separately in the data section.

A related premise is that the view-rank data can be used to estimate the demand system. This would be of interest to practitioners and policy makers because the Amazon.com view-rank data are publicly available, and contain cross-product information that is not present in reports of sales volume or market shares.

The general approach in the paper is to model the view-rank data as the aggregation across consumers of individual-level optimal search sequences, in which each consumer tries to maximize her expected utility, taking into account the search costs of the alternatives that she inspects. At the individual level, our approach yields a probabilistic model of optimal search set formation, is not subject to the curse of dimensionality, and is purposely suited to be estimable using the view-rank data. Using data experiments, we find that the model is successful at identifying the parameters of a choice-based demand system with random effects. In addition, the model correctly identifies search cost and search set size.

From an application of our model to the Amazon.com camcorder category, we find the following results. The median (average) search set contains 11 (14) products, with about 40% of consumers searching fewer than five products out of a total of over 90 products. We find that the cost of search is significant and is subject to consumer heterogeneity. The search cost is lowered for products which appear more frequently at Amazon.com, measured by the total number of references to the product. We also find that online competition between many products is effectively 0, because many products are not jointly searched by consumers. In fact, when looking at the estimated frequency of co-viewership of two products, we find that the large majority of all possible product pairs, about 70%, is viewed by less than 5% of the population. This implies severe limits on substitution, which in turn causes many cross-price elasticities to be numerically zero. Finally, our results show that almost everyone benefits from the product references that selectively lower search costs at Amazon.com.

The remainder of the paper is organized as follows. The next section reviews the background literature. Section 3 outlines the model. Section 4 presents the data and discusses Amazon.com's data generation process. Section 5 explains model operationalization and estimation, and discusses empirical identification.


| View-rank | Optical zoom | ··· | Price |
|---|---|---|---|
| 1 | 25 | ··· | $443.32 |
| 2 | 32 | ··· | $248.11 |
| 3 | 20 | ··· | $539.00 |
| 4 | 10 | ··· | $665.20 |
| 5 | 40 | ··· | $509.84 |
| 6 | 40 | ··· | $299.99 |
| 7 | 25 | ··· | $363.88 |
| 8 | 30 | ··· | $347.55 |
| 9 | 20 | ··· | $257.43 |
| 10 | 25 | ··· | $345.99 |
| 11 | 10 | ··· | $552.42 |
| 12 | 10 | ··· | $378.45 |
| 13 | 10 | ··· | $790.22 |
| ... | ... | ··· | ... |
| 37 | 10 | ··· | $752.75 |
| 38 | 35 | ··· | $354.78 |
| 39 | 35 | ··· | $376.57 |
| 40 | 32 | ··· | $289.39 |
| 41 | 25 | ··· | $554.14 |
| 42 | 32 | ··· | $488.88 |
| 43 | 32 | ··· | $361.81 |

The table also has Brand and Media format columns whose row alignment is not recoverable here. Across the 20 listed rows, the brands are Sony (11), Panasonic (4), Canon (3), Hitachi (1), and JVC (1), and the media formats are DVD (10), MiniDV (7), and HD (3).

Table 1: Product alternatives searched at Amazon.com, in May 2007, given search of a Sony Camcorder with DVD media format, 40× optical zoom, 2.5-Inch swivel screen, etc., selling at $328

Section 6 presents evidence from numerical experiments to show that the model is identified. Section 7 presents empirical results, model robustness checks, and model validations. Section 8 contains two policy experiments and describes managerial implications. Section 9 concludes.

## 2 Background

Marketing scholars and economists have long recognized that consumers do not in general search or consider the universal choice set, due to reasons such as non-zero search cost, product proliferation, and preference dispersion (e.g., Hauser and Wernerfelt 1989; Howard and Sheth 1969; Nelson 1970; Stigler 1961). The recent popularity of the choice based demand system has brought renewed attention to the issue of modeling choice sets, and concerns exist that not taking into account the limited nature of choice sets leads to biased estimates of demand (Bruno and Vilcassim 2008; Chiang, Chib, and Narasimhan 1999; Goeree 2008). Papers in this tradition specify a probability of a product being known (Goeree 2008) or accessible (Bruno and Vilcassim 2008) that is not the outcome of an optimal search process but simply constitutes a consumer response to firms' actions. In this paper, we advocate that such responses can be measured in the context of how they affect the consumer's search strategies, and that if one has access to outcomes of search behavior, as we do here, those become informative of important demand primitives when viewed through the lens of optimal information search.

Understanding consumer information search has been an important topic in both marketing and economics and hence research on consumer information acquisition abounds. Starting with Stigler (1961), early research on consumer information acquisition focused on consumers searching for price-quotes in homogeneous goods markets at some effort. Extending the scope of consumer search to issues of market outcomes, several authors theorized that limited consumer information search may have a significant impact on market structure (Diamond 1971; Nelson 1974; Anderson and Renault 1999). In this paper, we model consumer search behavior not only to evaluate market structure issues, but also to evaluate the impact of changing search costs by firms on consumer surplus.

We model the consumer's willingness to search for choice options by assuming that the consumer is motivated to search only if she benefits from doing so. There is already a tradition in the consideration set literature to represent consideration sets as the outcome of non-sequential search (Roberts and Lattin 1991; Mehta, Rajiv and Srinivasan 2003). This tradition rests on the fixed sample strategy proposed in Stigler (1961) as an optimal search policy for a consumer in a commodities goods market under price uncertainty. In contrast, McCall (1965) and Nelson (1970) argue that a sequential search strategy is optimal in terms of total cost.² Since we additionally believe that online search is more correctly captured as a sequential process, we model online search for information in this study as a sequential process and use the theory of optimal sequential search. Seminal contributions to sequential search theory have been made by Weitzman (1979), in the case of single agent problems, and by Reinganum (1982, 1983), in the case of multiple agent problems. We implement the optimal search strategies of these papers into a single-agent random utility choice model.

In contrast to a large volume of theoretical work, there has been relatively limited empirical research on consumer information search using secondary data. Two recent exceptions are papers on empirical search for commodities (Hong and Shum 2006) and for differentiated products (Hortaçsu and Syverson 2004). In the former, the authors devise a model that translates the price dispersion into heterogeneous search cost across the population. In the latter, the authors develop a model to translate the utility distribution into heterogeneous search cost. In our case, like Hortaçsu and Syverson (2004), we model search for differentiated products, but unlike them, we have collected direct measures of search outcomes, allowing us to estimate a more general demand model. For instance, in contrast to the homogeneous demand model in Hortaçsu and Syverson (2004), we believe that information about which products tend to be viewed together allows us to estimate heterogeneous consumer preferences in a differentiated product category.³

² Actually, block-sampled search strategies have been argued to be even better (see e.g., Morgan and Manning 1985). However, in online search such strategies cannot be executed and therefore they are not considered here.

³ For a comprehensive review of several empirical applications, see Moraga-González (2006).


With our choice model that includes optimal sequential search, we seek to explore the influence of retailer product recommendations, a mechanism to selectively lower search costs, on consumer search behavior and its impact on market structure. Given the popularity and ubiquity of recommendations at many online stores, it is of practical and academic interest to investigate how recommendations affect the consumer information and product search decisions. In behavioral work, Huang and Chen (2006) report that the recommendations of other consumers influence the choices of subjects more effectively than recommendations from an expert. Senecal and Nantel (2004) also show that retailer recommendations will significantly affect demand.

## 3 A demand model with costly sequential product search

### 3.1 Utility

Our modeling assumptions at the individual level are as follows. Consumer $i$ has a utility for product $j = 1, \dots, J$ that is equal to

$$u_{ij} = V_{ij} + e_{ij} \tag{1}$$

with

$$V_{ij} = X_j b_i, \qquad b_i \sim N(b, B), \qquad e_{ij} \sim N(0, \sigma^2_{ij}),$$

where $X_j$ is a row vector of product characteristics and $b_i$ is a vector that represents individual-specific sensitivities to product characteristics. We assume the matrix $B$ is diagonal. The outside good is the $(J + 1)$st alternative, and the consumer is aware of the option not to buy. This option does not require a search and is available at no cost.
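The utility specification in equation 1 can be made concrete with a small simulation. The sketch below is illustrative only: the attribute matrix `X`, the taste means `b_mean`, and the variances `b_var` are hypothetical values, not estimates from the paper, and the function names are introduced here for illustration.

```python
import random

def draw_consumer_tastes(b_mean, b_var, rng):
    """Draw b_i ~ N(b, B) where B is diagonal, so components are independent."""
    return [rng.gauss(m, v ** 0.5) for m, v in zip(b_mean, b_var)]

def expected_utility(x_j, b_i):
    """V_ij = X_j b_i: the part of utility known before search."""
    return sum(x * b for x, b in zip(x_j, b_i))

def realized_utility(v_ij, sigma2_ij, rng):
    """u_ij = V_ij + e_ij with e_ij ~ N(0, sigma2_ij), revealed by search."""
    return v_ij + rng.gauss(0.0, sigma2_ij ** 0.5)

rng = random.Random(0)
# Hypothetical attributes per product: [intercept, price in $100s].
X = [[1.0, 2.5], [1.0, 3.5], [1.0, 5.0]]
b_mean, b_var = [2.0, -0.4], [0.25, 0.01]   # assumed values, for illustration

b_i = draw_consumer_tastes(b_mean, b_var, rng)
V_i = [expected_utility(x, b_i) for x in X]
u_i = [realized_utility(v, 1.0, rng) for v in V_i]
```

Because $B$ is diagonal, each taste coefficient can be drawn independently, which is why no multivariate normal sampler is needed here.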

The utility function contains an expectation $V_{ij}$ and an unknown component of utility, $e_{ij}$. Our interpretation is that this decomposition partitions what the consumer knows and does not know into $V_{ij}$ and $e_{ij}$, and the consumer's goal of search is to resolve $e_{ij}$.⁴ The most relevant attributes, whose values are defined by $V_{ij}$, are accessible from general category information displays without retrieving the product detail web page,⁵ thus facilitating the existence of an expectation $V_{ij}$ prior to search. Before accessing a product page, knowledgeable consumers may have lower-variance $e_{ij}$'s and less knowledgeable consumers may have higher-variance $e_{ij}$'s. When consumers request the product detail web page, they see more details about the product, which resolves $e_{ij}$.

⁴ Our interpretation is consistent with Nelson (1970) who defines consumer search as an information problem to fully evaluate the utility of each option.

⁵ In the digital camcorder category page at Amazon.com, consumers have access to important camcorder characteristics such as brand, price, media format, zoom, pixel number, and dimensions for all products in this category.

Resolving $e_{ij}$ upon search comes at some cost. We introduce product and individual specific search cost, $c_{ij}$, which we interpret mainly as time spent on discovering and evaluating the product.⁶ We model search cost as a log normally distributed random effect

$$c_{ij} \propto \exp(L_j \gamma_i), \tag{2}$$

with

$$\gamma_i \sim N(\gamma, \Gamma),$$

where the matrix $\Gamma$ is diagonal. The lognormal specification ensures that the sign of $c_{ij}$ is positive, consistent with theory. The cost attributes $L_j$ describe, for instance, the accessibility of product $j$ and are assumed to be known by the consumers. For example, $L_j$ may contain the appearance frequency of product $j$ at the store or the number of times it is recommended.

⁶ Search cost is different between consumer packaged goods and consumer durables. For packaged goods in which experience is more easily obtained, mental maintenance and processing cost constitute the majority of search cost (Roberts and Lattin 1991). For one-time purchases such as consumer durable goods, it is more likely that search costs are determined by the time spent on searching for more information and the need for evaluation. Therefore in the context of digital camcorders, we interpret search cost as the opportunity cost of time invested in identifying and evaluating another candidate product.

The consumer's search and choice process are the outcome of her desire to maximize expected utility minus total search cost. This involves contrasting the marginal benefit and marginal cost of search. The objective of the analyst is to estimate $b$, $B$, $\gamma$, and $\Gamma$ from data.⁷

⁷ In the empirical analysis, we will assume that $\sigma^2_{ij} = 1$, but in the modeling section we wish to keep the level of product uncertainty general.

### 3.2 A model of sequential search

In sequential search, a consumer decides to stop or continue search each time after having searched a product. The theory of optimal sequential search states that consumers only continue search if the marginal benefits of doing so outweigh the marginal costs.

Utility $u_{ij}$ of consumer $i$ for product $j$ is $V_{ij} + e_{ij}$. Define $u_i^*$ at any stage of the search process as the highest utility among the searched products thus far. The consumer's expected marginal benefit from search of product $j$ is

$$B_{ij}(u_i^*) = \int_{u_i^*}^{\infty} (u_{ij} - u_i^*)\, f(u_{ij})\, du_{ij}, \tag{3}$$

where $f(\cdot)$ is the probability density function of $u_{ij}$. The marginal benefit is the expected utility gain $u_{ij} - u_i^*$ given that $u_{ij}$ is higher than $u_i^*$, multiplied by the probability that $u_{ij}$ exceeds $u_i^*$.⁸ Note that the benefit of search only depends on the arrangement of utility above $u_i^*$. The left tail of the utility distribution below $u_i^*$ can be arbitrarily rearranged without affecting search or choice.
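Under the normality assumption on $e_{ij}$, the integral in equation 3 has a well-known closed form: with $\zeta = (u_i^* - V_{ij})/\sigma_{ij}$, the benefit is $\sigma_{ij}\,[\phi(\zeta) - \zeta\,(1 - \Phi(\zeta))]$, where $\phi$ and $\Phi$ are the standard normal density and distribution function. The following sketch, with function names chosen here for illustration, evaluates that expression and checks it against a direct midpoint-rule integration of equation 3. It is a minimal illustration, not the authors' estimation code.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_upper_tail(x):
    """1 - Phi(x), computed with erfc for accuracy in the right tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def search_benefit(u_star, v, sigma):
    """Closed form of equation 3 when u ~ N(v, sigma^2)."""
    zeta = (u_star - v) / sigma
    return sigma * (norm_pdf(zeta) - zeta * norm_upper_tail(zeta))

def search_benefit_numeric(u_star, v, sigma, steps=20000, width=10.0):
    """Midpoint-rule integration of equation 3, upper limit truncated at
    max(u_star, v) + width * sigma, where the remaining mass is negligible."""
    upper = max(u_star, v) + width * sigma
    h = (upper - u_star) / steps
    total = 0.0
    for k in range(steps):
        u = u_star + (k + 0.5) * h
        total += (u - u_star) * norm_pdf((u - v) / sigma) / sigma * h
    return total
```

Evaluating `search_benefit` at increasing values of `u_star` shows the benefit falling toward zero: the better the current best option, the less there is to gain from another search.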

The goal of the consumer is, given the current best option, to maximize expected utility minus incurred search cost over a set of options that, at the individual level, are characterized by product specific mean utilities, $V_{ij}$, product specific search costs, $c_{ij}$, and product specific uncertainties, captured by $\sigma^2_{ij}$. This implies that the consumer continues search if there exists at least one $j$ such that

$$c_{ij} < B_{ij}(u_i^*), \tag{4}$$

i.e., if the expected marginal benefit of searching is larger than the marginal cost, $c_{ij}$.

The optimal sequential search strategy can be formalized as follows. First, partition the set of options into $S_i \cup \bar{S}_i$, with $S_i$ containing all searched options and $\bar{S}_i$ containing all non-searched options. All decision relevant information about $S_i$ is contained in $u_i^* = \max_{j \in S_i}\{u_{ij}, e_{i(J+1)}\}$, provided we assign 0 to the deterministic component of utility for the outside good.

At any point in the search process, the state of the system is given by $(u_i^*, \bar{S}_i)$. Define the value function $W(u_i^*, \bar{S}_i)$ as the expected (discounted) value of following an optimal search policy, from the current state going forward. This value function must satisfy the following Bellman equation (Weitzman 1979)

$$W(u_i^*, \bar{S}_i) = \max\Bigg(u_i^*,\; \max_{j \in \bar{S}_i}\Big(-c_{ij} + \beta_i \cdot \Big[\underbrace{F(u_i^*) \cdot W(u_i^*, \bar{S}_i - \{j\})}_{u_{ij} \le u_i^*} + \underbrace{\int_{u_i^*}^{\infty} W(u_{ij}, \bar{S}_i - \{j\})\, f(u_{ij})\, du_{ij}}_{u_{ij} > u_i^*}\Big]\Big)\Bigg) \tag{5}$$

This equation says that from state $(u_i^*, \bar{S}_i)$, the consumer can terminate search and collect $u_i^*$, or the consumer can search any $j \in \bar{S}_i$. In the latter case, the consumer gets an expectation $F(u_i^*) \cdot W(u_i^*, \bar{S}_i - \{j\}) + \int_{u_i^*}^{\infty} W(u_{ij}, \bar{S}_i - \{j\})\, f(u_{ij})\, du_{ij}$, which she seeks to maximize across $j$. Because all online searches in a single session are conducted in a short time span, we set the discount rate $\beta_i$ to 1.

Now we discuss some important modeling assumptions in our proposed search model. First, our model is a full information model in which the consumers are assumed to have full knowledge about the products and their attribute values. This allows the consumers to form $V_{ij}$ for all products prior to search and to use them in computing the reservation utilities during the sequential search process. Later in the empirical section, we conduct a series of robustness tests in which we relax the aforementioned assumption. In these robustness tests, we assume that consumers have partial knowledge about the products and investigate whether this partial knowledge assumption meaningfully affects our model estimates. Second, we assume that $E(e_{ij} \cdot e_{kl}) = 0$ and thus that the correlation of the unobserved portion of the utility across individuals and products is zero. The corresponding consumer behavior underlying this assumption is that the consumers have well-defined preferences prior to search and do not learn about the products during the search process.⁹ Third, we do not include any context or reference effect in the proposed model of optimal consumer search. Although we acknowledge that a more comprehensive model should include such effects, we use the cost-benefit framework as the first-order approximation of the optimal consumer search. Lastly, we do not model Amazon.com's (potentially) strategic behavior in setting prices and product information ($e_{ij}$).¹⁰

⁸ This can be seen by writing equation 3 alternatively as

$$B_{ij}(u_i^*) = \big(1 - F_j(u_i^*)\big) \times \int_{u_i^*}^{\infty} (u_{ij} - u_i^*)\, \frac{f(u_{ij})}{1 - F_j(u_i^*)}\, du_{ij},$$

which is the product of the chance that the utility draw is larger than $u_i^*$ and the expected excess over $u_i^*$ of a draw from the distribution of $u_{ij}$ truncated below at $u_i^*$.

### 3.3 The optimal strategy

The solution to the above dynamic program is to continue searching until a utility $u_i^*$ is discovered that is larger than some limit, which in turn depends on how much option value is still left in the unsearched set. This limit depends on a quantity that is called a “reservation utility”. To define this concept, each consumer $i$ has a reservation utility $z_{ij}$ for each product $j$ that, if she had already found a product with that utility, leaves her indifferent between searching and not searching $j$. In other words, the reservation utility $z_{ij}$ obeys the following equation (see also equation 4, above):

$$c_{ij} = B_{ij}(z_{ij}) = \int_{z_{ij}}^{\infty} (u_{ij} - z_{ij})\, f(u_{ij})\, du_{ij}. \tag{6}$$

Thus, the reservation utilities solve $z_{ij} = B_{ij}^{-1}(c_{ij})$. The estimation section establishes that $B_{ij}$ is monotonic and a separate appendix provides the details of computation of $z_{ij}$ including its existence and uniqueness.
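Since $B_{ij}$ is monotonic, equation 6 can be inverted numerically. The appendix with the authors' own computation is not part of this section, so the sketch below substitutes plain bisection and assumes normal $e_{ij}$, for which $B_{ij}$ has a closed form. The function names are chosen here for illustration.

```python
import math

def search_benefit(z, v, sigma):
    """B(z) for u ~ N(v, sigma^2), closed form under normality (an assumption)."""
    zeta = (z - v) / sigma
    pdf = math.exp(-0.5 * zeta * zeta) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(zeta / math.sqrt(2.0))
    return sigma * (pdf - zeta * upper_tail)

def reservation_utility(c, v, sigma, iterations=200):
    """Solve c = B(z) for z by bisection; B is strictly decreasing in z."""
    if c <= 0.0:
        raise ValueError("search cost c must be positive")
    # Bracket the root: B(lo) >= c and B(hi) <= c.
    step = max(1.0, c)
    while search_benefit(v - step, v, sigma) < c:
        step *= 2.0
    lo = v - step
    step = max(1.0, sigma)
    while search_benefit(v + step, v, sigma) > c:
        step *= 2.0
    hi = v + step
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if search_benefit(mid, v, sigma) > c:
            lo = mid      # benefit still exceeds cost, root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Two properties are easy to confirm with this sketch: $z_{ij}$ falls as $c_{ij}$ rises, and shifting $V_{ij}$ by a constant shifts $z_{ij}$ by the same constant, so under these assumptions the reservation utility is the expected utility plus a term that depends only on cost and uncertainty.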

The optimal search strategy (see, e.g., Weitzman 1979) that solves the consumer's maximization problem of equation 5 has three components: a selection rule, which determines the ordering of the search sequence; a stopping rule, which determines the length of the search sequence; and a choice rule.

⁹ We discuss this topic in detail in the next section.

¹⁰ Amazon.com does not differentiate prices or product information across consumers. We allow for flexible product fixed effects in the model, thereby reducing the potential for endogeneity biases stemming from potentially strategic prices or supply of product information. We acknowledge that the issue is important and warrants future study. The issue of how much and which product information Amazon.com should supply in order to maximize profits is interesting but is outside the scope of this paper.


1. Selection rule: Compute all reservation utilities $z_{ij}$ and sort them in descending order. If a product is to be searched, it should be the product with the highest reservation utility $z_{ij}$ among the products not yet searched.

2. Stopping rule: Stop searching when the highest utility obtained so far, $u_i^*$, is greater than $\max_{j \in \bar{S}_i}(z_{ij})$ among the unsearched items.

3. Choice rule: Once search stops, collect $u_i^*$ by choosing the maximum utility alternative in $S_i$.
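The three rules can be sketched for a single consumer as follows. This is a minimal illustration under stated assumptions: reservation utilities are supplied as a list, realized utilities are revealed through a caller-supplied function `draw_utility` (a hypothetical name), and the outside good's utility is passed in as a known number, whereas the paper treats its random component as $e_{i(J+1)}$.

```python
def sequential_search(z, draw_utility, outside_utility=0.0):
    """Apply the selection, stopping, and choice rules for one consumer.

    z: reservation utilities z_ij, indexed by product j.
    draw_utility: function j -> realized u_ij, called only when j is searched.
    Returns (searched, choice): products in search order, and the chosen
    product index, or None when the outside good is chosen.
    """
    # Selection rule: visit products in descending order of z_ij.
    order = sorted(range(len(z)), key=lambda j: z[j], reverse=True)
    best_u, choice = outside_utility, None
    searched = []
    for j in order:
        # Stopping rule: stop once the best utility so far exceeds the
        # highest reservation utility among the unsearched products.
        if best_u > z[j]:
            break
        u = draw_utility(j)
        searched.append(j)
        if u > best_u:
            best_u, choice = u, j
    # Choice rule: the best alternative found, or the outside good.
    return searched, choice
```

For example, with `z = [1.0, 3.0, 2.0]` and realized utilities `{0: 0.5, 1: 2.5, 2: 1.0}`, the consumer searches product 1 first, finds 2.5, and stops because 2.5 exceeds the next reservation utility of 2.0.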

We note that this search and choice process can accommodate that some consumers do not search at all. Indeed, consumers for whom $\max_j(z_{ij}) < 0$ will not find it worth their time to search brands. They will choose the outside good.¹¹ The same process can also accommodate that some consumers just browse but do not buy. For such consumers, $\max_{j \in S_i}(z_{ij}) > 0$, but $\max_{j \in S_i}(u_{ij}) < 0$. These two statements are not in conflict, as will be seen below. These consumers will also choose the outside good.

The optimal selection and stopping rules above are derived under the assumption that information obtained by searching one product does not affect the knowledge of other products. That is, we do not assume consumer learning during the search process. From the modeling perspective, we assume that the $e_{ij}$ are independent given $V_{ij}$ during the search process. The current consumer behavior literature advocates that this is a reasonable assumption for consumers engaging in search processes (Moorthy, Ratchford, and Talukdar 1997).

Two important points need to be made. First, given a choice set, the choice model above is not a probit model. For instance, given the stopping rule above, search beyond item $k$ is continued only if the utility draw for $e_{ik}$ is low enough. This implies that conditional on observing a specific choice set, the $e_{ij}$ are not distributed normal with mean 0 and variance $\sigma^2_{ij}$. Therefore, given search, choice probabilities do not follow a standard probit.
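The truncation argument can be illustrated with a stylized simulation. The setup is hypothetical and simpler than the full model: product $k$ is searched first, the outside option is worth less than the next reservation utility `z_next`, and search therefore continues only when $V_{ik} + e_{ik}$ falls below `z_next`. Conditional on continuing, the mean of $e_{ik}$ is negative and matches the mean of a normal truncated from above.

```python
import math
import random

def mean_shock_given_continued(v_k, z_next, sigma=1.0, n=100_000, seed=1):
    """Average e_ik among simulated consumers who keep searching after k."""
    rng = random.Random(seed)
    draws = (rng.gauss(0.0, sigma) for _ in range(n))
    kept = [e for e in draws if v_k + e < z_next]   # search continues
    return sum(kept) / len(kept)

def truncated_normal_mean(a, sigma=1.0):
    """E[e | e < a] for e ~ N(0, sigma^2): -sigma * pdf(a/sigma) / cdf(a/sigma)."""
    t = a / sigma
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-t / math.sqrt(2.0))
    return -sigma * pdf / cdf

# Illustrative values: V_ik = 1.0 and next reservation utility 1.2.
simulated = mean_shock_given_continued(v_k=1.0, z_next=1.2)
exact = truncated_normal_mean(1.2 - 1.0)
```

The conditional mean is well below zero in this example, which is the sense in which a probit fitted to the observed search set would be misspecified.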

Second, Chiang, Chib and Narasimhan (1999) mention that identification of choice sets (or in this case: search sets) is subject to the curse of dimensionality. Indeed, in a non-sequential search process, with $J$ possible alternatives, there exist $2^J - 1$ possible search sets. This large number of combinations would render the computation of the search frequency of any given product infeasible with universal choice set sizes of $J = 300+$ at Amazon.com. However, an important computational windfall of the sequential search process is that it is not subject to the curse of dimensionality. Given the selection rule above, there are only $J$ possible optimal choice sets at the individual level. Given a set of individual level parameters, there will be an ordering of the choice alternatives along their reservation utilities $z_{ij}$, and the consumer optimally

¹¹ Note that because we estimate our model with search data, we assume that all consumers search at least one product. However, the model actually accommodates non-search behavior.


Figure 1: The relation between search cost, $c$, product uncertainty, $\sigma^2$, and search attractiveness, $z$. Left panel: $z$ plotted against $c$ from 0 to 2. Right panel: $z$ plotted against $\sigma^2$ from 0 to 4.

samples these choice alternatives in descending order. Thus, if the $z_{ij}$ can be computed, the contents of a search set of size $m$ are known. In sum, whereas, across consumers, the model allows for the existence of any of the $2^J$ possible search sets as a consequence of consumer heterogeneity, at the individual level only $J$ of these sets can be an outcome of the optimal sequential search process that belongs to a particular vector of individual parameters.
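To make the counting argument concrete, the following sketch lists the search sets that can be optimal for a single consumer. The function name and the numerical values are hypothetical; the only ingredient taken from the text is that the optimal set of size $m$ consists of the $m$ products with the highest reservation utilities.

```python
def candidate_search_sets(z):
    """Return the J search sets that can be optimal for one consumer.

    z: reservation utilities for the J products (hypothetical input).
    Under the selection rule, the optimal set of size m is always the m
    products with the highest z, so only nested prefixes of the ranking occur.
    """
    order = sorted(range(len(z)), key=lambda j: z[j], reverse=True)
    return [order[:m] for m in range(1, len(order) + 1)]

z = [0.4, 1.7, -0.2, 0.9]          # illustrative values, not from the paper
sets = candidate_search_sets(z)
print(sets)                         # [[1], [1, 3], [1, 3, 0], [1, 3, 0, 2]]
print(len(sets), 2 ** len(z) - 1)   # 4 candidate sets versus 15 non-empty subsets
```

With $J = 300$ the same comparison is 300 candidate sets against $2^{300} - 1$ subsets, which is why the individual-level computation stays tractable.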

Before completing the model, we investigate some properties of the search sequence by means of an

example.

3.4 Some characteristics of the search sequence

In Figure 1, we plot the relation between $c$, $\sigma^2$, and $z$, under the assumption of normality of $e_{ij}$. The left hand panel varies search cost from 0 to 2, and measures the change in the reservation utility $z_{ij}$. For reference, in this example we choose $V_{ij} = 1$ and $\sigma_{ij}^2 = 1$. The reservation utility $z_{ij}$, or more intuitively the relative attractiveness of searching $j$, is decreasing in its search cost. As search cost increases, $z_{ij}$ goes to $V_{ij} - c_{ij}$. This implies that as search costs increase relative to product uncertainty, the attractiveness of search tends to the expected utility net of search cost. On the other hand, if search costs are low relative to product uncertainty, or product uncertainty is high relative to search cost, $z_{ij}$ goes to infinity. Indeed, if it is free to search, the option value (upside) of searching any product that has utility support on $\mathbb{R}^+$ is infinite.
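The left panel can be reproduced numerically with a short sketch. It assumes that the reservation utility is defined by the standard Weitzman (1979) condition, under which $z_{ij}$ solves $c_{ij} = E[\max(u_{ij} - z_{ij}, 0)]$ with $u_{ij} \sim N(V_{ij}, \sigma_{ij}^2)$; the function names and the bisection solver are choices made here for illustration, not the authors' implementation.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_upper_tail(x):
    # 1 - Phi(x), computed with erfc to stay accurate for large x.
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def expected_gain(z, V, sigma):
    """E[max(u - z, 0)] for u ~ Normal(V, sigma^2)."""
    m = (z - V) / sigma
    return sigma * (normal_pdf(m) - m * normal_upper_tail(m))

def reservation_utility(V, sigma, c, iterations=200):
    """Solve expected_gain(z) = c for z by bisection (requires c > 0)."""
    lo = V - c            # expected_gain(lo) >= V - lo = c by Jensen's inequality
    hi = V + sigma
    while expected_gain(hi, V, sigma) > c:   # widen until the root is bracketed
        hi += sigma
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if expected_gain(mid, V, sigma) > c:
            lo = mid      # gain still too large, so the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Values of V and sigma follow the example in the text (V = 1, sigma^2 = 1).
for c in (0.1, 0.5, 1.0, 2.0):
    print(c, reservation_utility(V=1.0, sigma=1.0, c=c))
```

The printed values fall as $c$ rises, and for large $c$ they approach $V_{ij} - c_{ij}$, in line with the description of the left panel.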

The relation between $\sigma^2$ and $z$ in the right hand side panel shows the option value of uncertain prospects. For reference, in this graph $V_{ij} = 1$ and $c_{ij} = 0.1$. As outlined above, in sequential search, the search value of a product is determined by its upside. That is, anything lower than the current maximum $u_i^*$ is irrelevant.


Consequently, the reservation utility $z_{ij}$ is increasing in product variance. It follows that, if novice consumers are characterized by having high $\sigma_{ij}^2$ relative to $V_{ij}$, they will tend to have higher $z_{ij}$ and thus search more than consumers who have more experience. For completeness, we note that $z_{ij}$ increases linearly in $V_{ij}$.
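These comparative statics can be made explicit with a short derivation. It is a sketch that assumes the reservation utility is defined by the standard Weitzman (1979) condition $c_{ij} = E[\max(u_{ij} - z_{ij}, 0)]$ with $u_{ij} = V_{ij} + e_{ij}$ and $e_{ij} \sim N(0, \sigma_{ij}^2)$. The symbols $m$ and $g$ are introduced here for illustration only, and $\phi$ and $\Phi$ denote the standard normal density and distribution function. Subscripts are dropped for readability.

$$
\begin{aligned}
m &= \frac{z - V}{\sigma}, \qquad \frac{c}{\sigma} = g(m) \equiv \phi(m) - m\,[1 - \Phi(m)], \qquad g'(m) = -[1 - \Phi(m)] < 0, \\
z &= V + \sigma\, g^{-1}\!\left(\frac{c}{\sigma}\right), \\
\frac{\partial z}{\partial V} &= 1, \qquad
\frac{\partial z}{\partial c} = -\frac{1}{1 - \Phi(m)} < 0, \qquad
\frac{\partial z}{\partial \sigma} = \frac{\phi(m)}{1 - \Phi(m)} > 0.
\end{aligned}
$$

Under these assumptions, $z$ moves one-for-one with $V$, falls in $c$, and rises in $\sigma$ (and hence in $\sigma^2$). As $c$ grows, $m \to -\infty$ and $g(m) \approx -m$, which gives $z \approx V - c$; as $c \to 0$, $g^{-1}(c/\sigma) \to \infty$ and $z$ grows without bound.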

3.5 Inclusion probabilities and set occurrence

Our data are a function of the frequency with which products are being viewed or searched, and therefore we seek to derive the probability $\pi_{ij}$ that a given product $j$ is included in the optimal search set of consumer $i$. Suppose that we know $z_{ij}$ and $V_{ij}$ for each individual and product. With some abuse of notation, denote the rank of $z_{ij}$ by $r$, with $r(1)$ returning the index $j$ of the highest ranked $z_{ij}$ and $r(J)$ returning the index $j$ with the lowest ranked $z_{ij}$ for individual $i$. From these definitions, $\pi_{i,r(1)}$ is the inclusion probability of the product with the highest ranked $z_{ij}$ for consumer $i$, and $\pi_{i,r(j)}$ is the inclusion probability of the product with the $j$th highest ranked $z_{ij}$.

The contents of set $S_{ik}$ are fully determined by the selection rule (ranking on $z$) and the stopping rule (the size $k$). The probability $\pi_{i,r(j)}$ that product $r(j)$ is in the set is equal to the probability that the first $j - 1$ draws of utilities all fell short of $z_{i,r(j)}$ (which is less than $z_{i,r(j-1)}$ by the selection rule above). Thus, the inclusion probability of product $r(j)$ is

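\begin{align}
\pi_{i,r(j)} &= \Pr\left( \max_{k=1}^{j-1} \left\{ V_{i,r(k)} + e_{i,r(k)} \right\} < z_{i,r(j)} \right), \quad j > 1 \tag{7} \\
&= \prod_{k=1}^{j-1} F\left( z_{i,r(j)} - V_{i,r(k)} \right) \tag{8}
\end{align}

with $\pi_{i,r(1)} = 1$,¹² and where $F(\cdot)$ is the cumulative distribution function of $e_{ij}$, which in our case is the normal distribution with mean 0 and variance $\sigma_{ij}^2$.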
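Equation (8) translates directly into a few lines of code. The sketch below is illustrative only: the function names and inputs are hypothetical, and it assumes that the factor for product $r(k)$ uses that product's own standard deviation, so that $F(x) = \Phi(x / \sigma_{i,r(k)})$.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inclusion_probabilities(V, z, sigma):
    """Inclusion probability of each product for one consumer, following equation (8).

    V, z, sigma: per-product expected utility, reservation utility, and
    standard deviation of the utility shock (hypothetical inputs).
    Returns (order, pi): order[p] is the product with the (p+1)-th highest z
    and pi[p] is the probability that it enters the search set.
    """
    order = sorted(range(len(z)), key=lambda j: z[j], reverse=True)
    pi = []
    for pos, j in enumerate(order):
        p = 1.0  # the top-ranked product is always searched
        for k in order[:pos]:
            # Every higher-ranked draw must fall short of z[j] for search to reach j.
            p *= normal_cdf((z[j] - V[k]) / sigma[k])
        pi.append(p)
    return order, pi

order, pi = inclusion_probabilities(
    V=[1.0, 1.0, 1.0], z=[2.0, 1.5, 1.2], sigma=[1.0, 1.0, 1.0]
)
print(order, pi)
```

The nested loop costs on the order of $J^2$ evaluations of the normal distribution function per consumer, which is small even for several hundred products.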

There are three useful properties of these probabilities of inclusion.

1. First, it is trivial to show that $\pi_{i,r(j)} > \pi_{i,r(j+1)}$, or the inclusion probability of the $(j+1)$th product is always less than the inclusion probability of the $j$th product.

2. Second, given the sequential nature of search and the selection rules of the optimal strategy, the probability that $r(j)$ and $r(j+k)$ occur together in a set is equal to the probability that $r(j+k)$ is in the set.

$$
\pi_{i,\{r(j) \text{ and } r(j+k)\}} = \pi_{i,r(j+k)} = \min\left( \pi_{i,r(j)}, \pi_{i,r(j+k)} \right), \tag{9}
$$

¹² Because our data are predicated on the occurrence of search, consumers search at least one product.


where the last step is from the first property. In the estimation section, we will use the last formulation of this property, when we need to determine the probability that two products $j$ and $k$ are jointly in the set.

3. Third, given the sequential nature of choice and the independence of the $e_{ij}$, the probability that the set $S_{ik}$ occurs can be computed as follows. First, recall that $S_{ik}$ is the optimal set of size $k$ for individual $i$. The probability that $S_{ik}$ occurs is equal to the probability that search continues beyond $r(k-1)$ minus the probability of continuing search beyond $r(k)$. This is equal to the chance that $r(k)$ is in the choice set minus the chance that $r(k+1)$ is in the choice set of consumer $i$. Thus

$$
\Pr\left( S_{i,r(k)} \right) = \pi_{i,r(k)} - \pi_{i,r(k+1)}. \tag{10}
$$
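The second and third properties can be illustrated with a minimal sketch that takes a vector of inclusion probabilities as given. The function names and the numbers are hypothetical, and the convention that the probability of continuing past the last product is 0 is an assumption added here to close equation (10) at $k = J$.

```python
def set_probabilities(pi):
    """Probability of each nested search set, following equation (10).

    pi: inclusion probabilities sorted by descending reservation utility,
    with pi[0] == 1. Continuing beyond the last product is given
    probability 0 (an assumption: search must stop once all J are searched).
    """
    padded = list(pi) + [0.0]
    return [padded[k] - padded[k + 1] for k in range(len(pi))]

def joint_inclusion(pi, j, k):
    """Probability that the products at ranks j and k are both searched, following equation (9)."""
    return min(pi[j], pi[k])

pi = [1.0, 0.69, 0.34]        # illustrative values, not estimates from the paper
probs = set_probabilities(pi)
print(probs)                  # approximately [0.31, 0.35, 0.34]
print(sum(probs))             # the differences telescope to pi[0], i.e. about 1
print(joint_inclusion(pi, 0, 2))
```

Because the differences telescope, the set probabilities sum to one whenever the top-ranked product is searched with certainty.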

This concludes the statement of the individual-level model. The aggregation to the level at which Amazon.com reports its data is explained in the estimation section. For completeness, we note that an alternative to the approach in this subsection is to use draws of the $e_{ij}$ and compute realizations of the process. This would lead to less computation at the individual level, but at the aggregate level we would have to use a frequency estimator for market level behavior, whereas using the model above, we can integrate over a probability model with far greater precision.

4 Data

4.1 The view-rank data

We have collected, on a regular basis, the view-rank data for all camcorder products from May 2006 until October 2007. To ensure that the analysis is based on a sufficiently large sample of viewing behaviors, and because we do not have information about the temporal window used by Amazon.com in computing the view-rank data, we use the products that appear throughout the data collection period and aggregate their rank orders in the view-rank data to the monthly level.¹³ We use data from the month of May 2007.

First, we extracted the top 200 camcorders from the Amazon.com website, based on sales rank. We

¹³ We use average sales price in our analysis. In the data, we observe that the positions of products in the view-rank lists fluctuate over time. This calls for an averaging mechanism for the different positions of a product in the view-lists over time. For this averaging procedure, we use the percentile ranking similar to Bajari, Fox and Ryan (2007). In the percentile ranking, the product with the highest rank among $J$ products is coded as $J$, not 1. Then we normalize the rank of product $j$ at time $t$ as

$$
\hat{r}_{jt} = \frac{r_{jt}}{\max_k \{ r_{kt} \}}. \tag{11}
$$

Once we compute $\hat{r}_{jt}$, the percentile ranking of product $j$ at $t$, we compute the average ranking of product $j$ as the mean of the daily percentile rankings, $\hat{r}_j = \frac{1}{T} \sum_t \hat{r}_{jt}$.


| Attributes | Ranges |
|---|---|
| Brand | Sony (31), Panasonic (19), Canon (15), JVC (15), other (11) |
| Media Formats | MiniDV (33), DVD (30), FM (9), HD (19) |
| Price | $530 (mean), $263 (std. dev.) |
| Form | Compact (8), Conventional (83) |
| High-Definition | Yes (14), No (77) |
| Pixel | 1.67M (mean), 1.45M (std. dev.) |
| Zoom | 19.8 (mean), 10.9 (std. dev.) |

Table 2: Description of the choice options in the empirical data (with frequency of occurrence in parentheses)

removed the smaller players such as Aiptek, Samsonic and DXG, who cater to the lowest-price tier only with different types of camcorders and which have very low sales rank. We also removed from the analysis those camcorders on which we had no observations of media format,¹⁴ and all camcorders of professional grade. After applying these data filters, we are left with 91 choice alternatives. The summary statistics of the products are shown in Table 2.

All 91 products have their own view-rank lists, i.e., all of the products have a list from which we observe which other products are closely related, in order of decreasing relationship. On average, a given product appears on 24 of the 90 other products' view lists, with a standard deviation of 18. The minimum number of appearances is 0 while the maximum is 83.

Table 3 gives the results of a descriptive regression of the number of appearances on the view-lists. Note that Sony, Panasonic, and Canon appear most frequently in the view-rank lists. Further, high definition and pixel size increase the number of appearances, while higher price reduces it. We conclude that the number of appearances in the view-rank data depends on demand drivers such as product attributes and prices.

We point out the rich information embedded in the Amazon view-rank data. For every focal product $k$, Amazon.com provides a list of the top $N$ most related products among the remaining $J - 1$ products.¹⁵ Also, product $k$ may appear on the view-rank lists of the other $J - 1$ products. Therefore, the data reveal a complex pattern of relations between a given product $k$ and the other $J - 1$ products.

Lastly, we discuss the type of consumers who we believe are represented in the product search data. Moe (2003) classifies online store browsing behavior of consumers into four different categories: directed buying, search and deliberation, hedonic browsing, and knowledge building. She also classifies the contents of e-commerce web pages into three different categories: product, category, and information pages. She reports that consumers in directed buying mode will frequently visit the product page, while consumers in the mode of search and deliberation focus on both product and category pages. Hedonic

¹⁴ The media formats of the products in the data include flash memory (FM) and hard drive (HD).

¹⁵ During the data collection period, Amazon.com listed up to 45 products that are related to the focal product.


| Variable | β | std.err. |
|---|---|---|
| Intercept | -10.64 | 22.71 |
| Sony | 51.80 | 15.16 |
| Panasonic | 48.61 | 14.56 |
| Canon | 44.61 | 14.09 |
| JVC | 30.42 | 14.51 |
| Samsung | 36.09 | 12.79 |
| MiniDV | -13.42 | 4.21 |
| DVD | -22.58 | 4.34 |
| FM | -18.69 | 9.00 |
| Compact | 1.80 | 8.88 |
| High Definition | 16.10 | 5.38 |
| Zoom | 0.17 | 0.20 |
| Screen Size | 1.28 | 6.28 |
| Pixel | 6.49 | 1.64 |
| Price | -36.40 | 9.28 |
| Appearance frequency | 1.65 | 0.23 |
| R2 | 0.60 | |

Table 3: Descriptive regression of the frequency of product appearance against product characteristics

browsers focus on category pages while consumers in knowledge building will focus on information pages. Montgomery et al. (2004) also identify that the focus of the consumers in the buying mode is product detail pages. Amazon.com's product search data are based on the number of consumers who requested product detail pages from the Amazon.com server. Therefore, consistent with previous research, we conjecture that Amazon.com's product search data predominantly reflect the behaviors of consumers in either the buying or the search phase with a vested interest in the product category.

4.2 Other measures of search at Amazon.com

We now discuss other data that are available at Amazon.com. On each product detail page, Amazon.com lists up to four top products purchased by past consumers who searched the product on the current detail page. These product references serve as shortcuts to other closely relevant products, thereby reducing consumers' search costs for potentially attractive products. We use the number or frequency of appearances of each product aggregated across other product pages as an explanatory variable that affects search cost. For instance, we hypothesize that a product that appears frequently at the store level will have a smaller search cost compared to products that do not. Hereafter, $L_j$ denotes the frequency of appearances of product $j$ over all product pages.¹⁶

¹⁶ It is important to ensure that there is no multi-collinearity between $L_j$ and other product characteristics we use in our empirical analysis. In order to verify this, we regressed $L_j$ on product attributes such as brand, media format, and price. We find that the model fit is relatively low ($R^2 = 0.12$) and none of the coefficients are significant, thereby supporting a lack or low level of multi-collinearity between $L_j$ and other product characteristics.


#### View other sources


- Available from Paulo Albuquerque · Jun 5, 2014
- Available from SSRN