All content in this area was uploaded by Richard Chen on Oct 24, 2017


arXiv:1709.00197v1 [stat.AP] 1 Sep 2017

Incentivized Advertising: Treatment Effect and Adverse Selection

Khai X. Chiong

Assistant Professor of Marketing

Naveen Jindal School of Management

University of Texas at Dallas

khai.chiong@utdallas.edu

Richard Y. Chen

Research Scientist

Y Combinator Research, San Francisco

richard.chen@ycr.org

Sha Yang

Professor of Marketing

Marshall School of Business

University of Southern California

shayang@marshall.usc.edu

Abstract

Incentivized advertising is a new ad format that is gaining popularity in digital mobile advertising. In incentivized advertising, the publisher rewards users for watching an ad. An endemic issue here is adverse selection, where reward-seeking users select into incentivized ad placements to obtain rewards. Adverse selection reduces the publisher’s ad profit and makes it difficult to infer the causal effectiveness of incentivized advertising. To this end, we develop a treatment effect model that allows and controls for unobserved adverse selection, and estimate the model using data from a mobile gaming app that offers both incentivized and non-incentivized ads. We find that rewarding users to watch an ad has an overall positive effect on the ad conversion rate. A user is 27% more likely to convert when rewarded for watching an ad. However, there is a negative offsetting effect that reduces the effectiveness of incentivized ads. Some users are averse to delayed rewards: they prefer to collect their rewards immediately after watching the incentivized ads, instead of pursuing the content of the ads further. For the subset of users who are averse to delayed rewards, the treatment effect is only 13%, while it can be as high as 47% for other users.

Keywords: online advertising, mobile, causal inference, Bayesian estimation, endogenous selection


1. Introduction

Mobile advertising, including video ads and banner ads on mobile devices, is a dominant segment of digital advertising. In the U.S., business spending on mobile advertising accounts for more than 50% of total spending on digital advertising.¹ The growth of mobile advertising is fueled by the widespread usage of mobile applications, or apps (Ghose and Han [2014]); it is now commonplace to advertise on mobile apps.

Mobile advertising is also a fast-evolving industry, where advertisers and publishers continuously innovate on ad formats, improve data tracking capabilities (Goldfarb and Tucker [2011a,b]) and optimize ad placements. In recent years, mobile publishers have widely adopted a new format of ad placement, called incentivized advertising. In an incentivized ad placement, publishers reward users for watching an ad. More generally, an incentivized ad rewards a user for completing an action related to the ad.² Incentivized advertising is also commonly known as reward advertising.

Incentivized ads first appeared among mobile gaming apps. Examples include ad placements where publishers reward users with in-game virtual items, additional game levels and lives, for viewing an ad, typically in a full-screen video format. One of the reasons for using incentivized advertising is to reduce annoyance towards ads, which is of particular concern in mobile advertising. Mobile devices have smaller screens than personal computers, and as such it is more difficult to advertise effectively on mobile devices. For instance, conventional banner ads are very intrusive on mobile devices. Moreover, mobile apps, especially mobile gaming apps, rely on a continuous user experience, so that interstitial full-screen ads do not tend to work well.³

¹ According to the 2017 Internet Advertising Revenue Report from PricewaterhouseCoopers, spending on mobile ads was $36.6 billion in 2016, while total spending on digital advertising was $72.5 billion.

² WSJ (Jan 5, 2016), More Marketers Offer Incentives for Watching Ads

Incentivized advertising allows the app developer to incorporate advertising into the gameplay, for instance, by offering to revitalize an injured game character if the user watches an ad. Incentivized ads therefore allow for a more seamless transition between gameplay and ads, which improves the playability of the game and reduces the annoyance due to interruptions. Moreover, rewarding users to watch an ad could affect the mood of the users, and contribute to an overall positive perception of the ads. For these reasons, incentivized advertising has become a popular format of advertising within mobile gaming apps. Various industry white papers have reported that incentivized advertising is well received by users.⁴ It has even expanded beyond mobile gaming publishers.⁵

Despite the increasing adoption of incentivized advertising, little is known about how incentivized advertising affects users’ behavior (in contrast, much more is known about the effects of other important formats of online advertising; see Bart, Stephen, and Sarvary [2014], Bruce, Murthi, and Rao [2017], Manchanda, Dubé, Goh, and Chintagunta [2006]). To this end, we aim to study the causal effect of incentivized advertising by developing a treatment effect model with unobserved selection. Our goal is to understand and quantify the effect of incentivized advertising on users’ conversion rate as compared with non-incentivized advertising, that is, how much the ad conversion rate changes as a result of offering rewards to users for watching ads. From a managerial perspective, this model allows us to ask whether a publisher can obtain higher ad revenue using incentivized or non-incentivized ad placements.

³ This is related to the topic of ‘viewability’ in advertising; cf. The Economist (March 26, 2016), Invisible ads, phantom readers.

⁴ eMarketer (July 1, 2014): Want App Users to Interact with Your Ads? Reward Them

⁵ For example, the mobile music streaming app Spotify incentivizes users to watch a video ad with 30 minutes of ad-free music; the video streaming website Hulu incentivizes users to watch a longer video ad with an ad-free episode; the mobile operator Sprint rewards certain users with a reduced phone bill for watching ads.

We estimate this treatment effect model using a large impressions-level dataset from a publisher who uses both incentivized and non-incentivized ad placements. This publisher is a mobile gaming app, and incentivized ads take the form of rewarding a user with additional game levels if the user watches a full-screen video ad trailer about another app. The publisher uses CPI (cost-per-install)⁶ pricing for all its ads, so that the publisher is paid only when an ad leads to a conversion event, defined as the user installing the advertised app.

The main feature of our treatment effect model is that we allow and control for unobserved adverse selection. When the publisher rewards users for watching an ad, it causes an adverse selection effect where users who are reward-seeking self-select into incentivized ad placements to obtain rewards. In the presence of adverse selection, a user is not randomly assigned to either incentivized or non-incentivized ads; it is therefore important to control for adverse selection in order to properly assess the causal effect of incentivized advertising. If reward-seeking attitude is an observable characteristic, adverse selection can be controlled for using propensity score methods (Section 5).

To handle unobserved adverse selection, we develop and estimate a model where users can endogenously select into watching incentivized ads, and where watching incentivized ads then translates into users’ outcomes. This model has two outcomes: an intermediate outcome where the user can express an intention to install the advertised app, and a final outcome where the user decides to install the app. In our dataset, we observe both the intermediate and the final outcomes of the users. In the intermediate stage, the user chooses whether or not to click at the end of the ad, which redirects the user to the App Store. In the final stage, the user chooses whether to install the app that was advertised. Identification of the model requires a variable that enters into the selection equation but not the outcome equations, while estimation is implemented using Bayesian MCMC.

⁶ In other forms of online advertising such as sponsored search advertising, it is more common for the publisher to be paid per click; see Ghose and Yang [2009], Hu, Shin, and Tang [2015], Rutz and Bucklin [2011], Yao and Mela [2011], Zhu and Wilbur [2011].

Our main result shows that rewarding users to watch an ad has a negative effect on the intermediate outcome (where the user clicks on the ad to proceed to the App Store). Our explanation is that some users are averse to delayed rewards, and therefore prefer to collect their rewards immediately after watching incentivized ads. As such, rewards have the negative effect of reducing the user’s intention to take any action that delays the rewards. The user prefers to collect the rewards immediately instead of going to the App Store and installing a new app. We also find that users exhibit varying degrees of aversion to delayed rewards.

On the flip side, we find that rewarding users to watch an ad has a positive effect on install (the final outcome) conditional on clicking the ad (the intermediate outcome). This result is in line with common findings that giving out rewards induces positive effects on product adoption and purchases. In our context, when the publisher gives rewards to its users, it induces the users to perceive the publisher’s content more favorably. As such, an ad that is published in an incentivized ad placement is perceived more favorably by the users, and elicits a more positive response. This particular finding has some basis in the consumer behavior literature, where researchers have found that consumers’ affective feelings of favorability toward the ad itself are an important predictor of advertising effectiveness and response (Calder and Sternthal [1980], MacKenzie and Lutz [1989], MacKenzie et al. [1986], Mitchell and Olson [1981], Shimp [1981]). Their findings resonate with our explanation that rewarding users to watch an ad causes users to feel less ad annoyance, and consequently increases the ad conversion rate.

The overall causal effect of incentivized advertising depends on the interplay between the negative effect on clicking and the positive effect on installing conditional on clicking. For our particular publisher, we find that incentivized advertising has an overall positive effect on the ad conversion rate. A user is 27% more likely to install when served incentivized advertising compared to non-incentivized advertising. In terms of ad revenue, this effect is equivalent to a CPM (revenue per thousand impressions) of $0.413. To give a sense of the industry (mobile ad networks) benchmarks, the average CPMs for the US and China are reported to be $7.00 and $2.70, respectively.⁷

Our result highlights the benefits of targeting the placement of incentivized ads according to demographics. Rewards have a negative effect on Clicks for the users who are averse to delayed rewards, and therefore the overall treatment effect on Install is heterogeneous according to users’ characteristics. We find that incentivized advertising is least effective when the device language is set to Russian (effect size of 13%), and most effective for Chinese languages (effect size of 47%). Given the potential cost of giving rewards, the publisher should not use incentivized ads when the effect size is close to zero.

⁷ See http://ecpm.adtapsy.com/


The rest of the paper is organized as follows. Section 2 describes the data and relevant industry background. Section 3 develops and estimates the model. Section 4 develops an alternative estimator using the propensity scores. Section 5 concludes. The appendix contains all figures and tables.

2. Data and industry background

The dataset comes from a mobile gaming app. The genre of the app is classified as “Action” in the Android App Store (the app is not available on iOS or other operating systems). The app relies on publishing ads to monetize its user base. It uses both incentivized and non-incentivized ad placements.

In the context of this publisher, we define an incentivized ad to be a video ad that rewards users after the ad has been played, while a non-incentivized ad is a video ad that does not reward users after the ad has been played. Every ad is either incentivized or non-incentivized. The rewards are tied to the game itself (in-app rewards). Typically, the rewards unlock additional levels in the game for the users.⁸

The content of the ad consists of a short video trailer showing another mobile app. These ads are user-targeted: they show mobile apps that users are likely to download and install. The targeting and serving of these ads are operated by a platform. The platform shares a pre-specified percentage of revenue with the publisher.

Users are not allowed to skip the ads. At the end of the ad, the user can either exit the ad by clicking the ‘x’ button or click on the ‘Install’ button. When the user clicks on the ‘Install’ button, the user will be directed to the App Store where she can download the advertised app.

⁸ Another kind of incentivized advertising provides rewards for users to install apps, but Apple has blocked applications with such ad formats since 2011; cf. TechCrunch (April 2011), Apple Clamps Down On Incentivized App Downloads

We define Intermediate to be a binary variable indicating whether the user has expressed an intention to install by clicking on the ‘Install’ button at the end of the ad, whereby the user would have a chance to review more information about the advertised app in the App Store. Intermediate is an intermediate outcome. The final outcome is Install, which is a binary variable indicating whether the user has downloaded the advertised app.

This particular platform operates on a cost-per-install (CPI) model, where an advertiser pays the publisher only in the event that the user installs the advertiser’s app. CPI advertising is growing rapidly. Spending on CPI campaigns increased by 80% from 2014 to 2015 and accounted for 10.3% of mobile advertising spend in 2015.⁹

2.1. Adverse selection

Adverse selection is an issue endemic to incentivized advertising. Adverse selection here means the following: users deliberately seek out incentivized ad placements in order to obtain rewards. For instance, users who know where and when in the game to find ad placements that are incentivized could then seek them out. These reward-seeking users have low intention to install new apps. Incentivized advertising becomes ineffective when adverse selection is severe: users only watch ads to collect rewards and are not converted to install. It remains an open question whether incentivized advertising is effective and should be widely adopted by publishers.

⁹ eMarketer (December, 2015). Mobile Advertising and Marketing Trends Roundup

Adverse selection also poses challenges to data analysis and causal inference. Whenever an ad is served, it appears as an observation in our dataset. Therefore, in the presence of adverse selection, our sample of incentivized ads is self-selected and consists disproportionately of reward-seeking users. Since a user is not randomly assigned to either incentivized or non-incentivized ads, a naive estimate of the effect of incentivized advertising would be biased. If reward-seeking attitude is an observable characteristic, correcting for selection can be done using propensity score methods. This is accomplished in Section 5. More generally, we develop and estimate a model which allows and controls for unobserved adverse selection in Section 3.
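The selection bias described in this subsection can be illustrated with a small simulation. The sketch below is a hypothetical example, not the paper's data or estimator: it assumes a standard normal unobserved reward-seeking trait `v` that raises the chance of seeing an incentivized ad and lowers the chance of clicking, and it sets the true treatment effect to zero. The function names, the baseline click utility of -2.0, and all other numeric values are assumptions chosen for illustration.

```python
import math
import random


def logistic_draw(rng):
    """Draw from the standard logistic distribution by inverting its CDF."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return math.log(u / (1.0 - u))


def naive_click_gap(n=100_000, true_effect=0.0, baseline=-2.0, seed=0):
    """Click rate of incentivized minus non-incentivized impressions when an
    unobserved reward-seeking trait v drives selection into treatment."""
    rng = random.Random(seed)
    shown = [0, 0]    # impressions by treatment status d
    clicked = [0, 0]  # clicks by treatment status d
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)                      # unobserved reward-seeking trait
        d = 1 if v + logistic_draw(rng) >= 0 else 0  # reward seekers select into d = 1
        # Reward seekers are less inclined to click, so v enters negatively.
        y_tau = 1 if true_effect * d + baseline - v + logistic_draw(rng) >= 0 else 0
        shown[d] += 1
        clicked[d] += y_tau
    return clicked[1] / shown[1] - clicked[0] / shown[0]


gap = naive_click_gap()
print(f"naive click-rate gap with a true effect of zero: {gap:+.3f}")
```

Under these assumptions the raw comparison reports a negative gap even though incentivizing has no effect by construction, which is the kind of bias the selection model is meant to remove.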

2.2. Data and variable description

The dataset contains 365,847 observations generated from the publisher. The timeframe

spans from May 1, 2016 to May 31, 2016. Each observation consists of an ad serving

instance. An ad serving instance is also commonly called an impression.

Whenever an ad is served, it is recorded as a unit of observation in the database. Note that after the ad has been served, a user can choose not to watch or pay attention to the ad. The user can take some actions such as clicking or installing after the ad has been served, which we observe (outcome variables). We also observe some characteristics of the users (control variables).

Each row of the dataset corresponds to an impression, hence we say that we have an impression-level dataset. A single user may be served multiple ads by the publisher. Although we have 365,847 impressions, there are 143,280 unique users. The median user generated only 1 impression, while the average user generated 2.55 impressions (standard deviation of 3.26).

We now describe the treatment and the outcome variables. We also provide some

summary statistics of these variables. Each variable is subscripted by i, which we refer to

as impression i.

(1) Incentivized, $d_i$: a binary (zero or one) variable, where $d_i = 1$ indicates that the user is in the treatment group during impression $i$, that is, the user has been served an incentivized ad. If $d_i = 0$, then the user is in the control group and has been served a non-incentivized ad. The mean of $d_i$ is 0.6898, i.e. 68.98% of all observations correspond to incentivized ads.

(2) Intermediate, $y_i^\tau$: a binary outcome variable indicating whether the user during impression $i$ has expressed intention to install by clicking on the ‘Install’ button at the end of the ad. This intention is credible in the sense that the user would then be redirected to the relevant page in the Android App Store for downloading the advertised app. The mean of $y_i^\tau$ is 0.1344, that is, there are 49,179 clicks on ‘Install’.

(3) Install, $y_i$: a binary outcome variable indicating whether the user during impression $i$ has downloaded the advertiser’s app to her mobile device from the Android App Store. The mean of $y_i$ is 0.0029, that is, there are 1,067 installs in total.
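The reported means can be cross-checked against the reported counts with simple arithmetic. The short sketch below uses only the figures quoted in this section; the variable names are illustrative, and the install rate conditional on a click is a derived quantity that the text does not report directly.

```python
# Counts reported in Section 2.2.
impressions = 365_847
unique_users = 143_280
clicks = 49_179      # impressions with Intermediate = 1
installs = 1_067     # impressions with Install = 1

click_rate = clicks / impressions                 # mean of the Intermediate outcome
install_rate = installs / impressions             # mean of the Install outcome
install_given_click = installs / clicks           # derived here, not reported in the text
impressions_per_user = impressions / unique_users

print(f"click rate:            {click_rate:.4f}")
print(f"install rate:          {install_rate:.4f}")
print(f"install given click:   {install_given_click:.4f}")
print(f"impressions per user:  {impressions_per_user:.2f}")
```

The first, second, and fourth quantities round to the 0.1344, 0.0029, and 2.55 stated in the text.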

In addition to the treatment and outcome variables above, we now describe the control

or covariate variables. These variables are the observable characteristics of the users.


(1) Language: the language used in the user’s mobile device. The top 5 languages by number of observations are: (1) Spanish (ES), 35.86%; (2) English (EN), 25.82%; (3) Portuguese (PT), 11.03%; (4) Russian (RU), 6.93%; (5) Chinese (ZH), 6.81%.

(2) Country: the country of the user based on device and time-zone setting. The top 5 countries by number of observations are: (1) India, 12.13%; (2) Mexico, 11.75%; (3) Brazil, 10.36%; (4) China, 7.40%; (5) Indonesia, 5.52%.

(3) Region: it is useful to group countries into geographical regions that are similar to each other. We classify countries into statistical subregions as defined by the United Nations. The top 10 subregions by number of observations are: (1) South America, 28.84%; (2) Central America, 15.33%; (3) Southern Asia, 14.68%; (4) South-Eastern Asia, 14.19%; (5) Eastern Asia, 7.94%; (6) Eastern Europe, 5.21%; (7) Western Asia (Middle East), 3.70%; (8) Northern America, 2.35%; (9) Central Asia, 1.97%; (10) Southern Europe, 1.88%. However, some of these regions are highly correlated with languages. As such, we will not construct indicator variables for Eastern Asia (correlation of 0.91 with ZH), Central America (correlation of 0.55 with ES), and Central Asia (correlation of 0.50 with RU).

(4) WiFi: whether the device is connected via WiFi or mobile data when the ad request is sent to the intermediary. The average value of WiFi is 0.7804, that is, 78.04% of the users were on WiFi.

(5) Device Brand: the manufacturer of the user’s mobile device. Since this particular app operates on the Android platform, one of the most prominent brands, Apple, is not included here. The top 5 device brands by number of observations are: (1) Samsung, 40.89%; (2) Motorola, 7.11%; (3) Huawei, 5.77%; (4) LG, 4.76%; (5) Lenovo, 4.39%.

(6) Device Volume: a numeric value in [0,1] that describes the level of device volume when the device sends an ad request to the intermediary. The mean of Device Volume is 0.55, with a standard deviation of 0.30.

(7) Screen Resolution: the number of pixels (in millions) of the user’s mobile device. It is computed by multiplying the number of pixels per horizontal line by the number of pixels per vertical line. A higher screen resolution means better visual quality. The mean is 0.857, while the standard deviation is 0.645.

(8) Android Version: an integer-valued variable from 1 to 8 indicating the version number of the Android mobile operating system. A higher number corresponds to a more recent Android operating system. At the time of this dataset, the most recent Android version is Android 6.0 (code name: Marshmallow). The mean is 4.45 and the standard deviation is 0.61.

The characteristics of a user can change over time; for instance, a user could have different device volume settings at different time periods. Causal inference therefore does not follow simply from comparing the outcomes of a user when she was served incentivized versus non-incentivized ads.

3. Treatment effect model with unobserved selection

How does rewarding users for watching an ad affect the subsequent action (Install) taken by the user? When users are randomly assigned incentivized (treatment) or non-incentivized (control) ads, the causal effect of incentivized ads can be determined by comparing the outcome of the treatment versus the control group. Here, we do not have the luxury of random assignment, and we must then control for the selection of reward-seeking users into the treatment group (i.e. adverse selection).

When adverse selection is solely attributed to the observable characteristics of the users, estimators based on propensity scores can be used to obtain the treatment effect of incentivized advertising. This is done in Section 5. Here, we undertake a more general treatment effect model that allows for unobserved selection. As a motivation, suppose that there is an unobserved variable $v_i$ that measures the degree of reward-seeking behavior of user $i$. Users who are more reward-seeking are more likely to self-select into the treatment group due to the rewards from incentivized ads. This is modeled as Equation 1 below, where $d_i = \mathbf{1}[x_{1i}\gamma + v_i + \epsilon_{1i} \ge 0]$. Here, $x_{1i}$ is a vector of observed characteristics of the user $i$, and $\gamma$ is a vector of unknown parameters.

Whether the user $i$ then expresses the intention to install is given by $y_i^\tau = \mathbf{1}[u_i + \epsilon_{2i} \ge 0]$. Here, $u_i$ is the utility that user $i$ enjoys from installing a new app, and $\epsilon_{2i}$ is the unobserved taste of the user. If $\epsilon_{2i}$ and $v_i$ are correlated, then the assumption underlying the standard propensity score method (Section 5) is violated.¹⁰ In particular, it is likely that $v_i$ is negatively correlated with $\epsilon_{2i}$. That is, a more reward-seeking user is less likely to click on ‘install’, because the reward-seeking user would rather collect the rewards immediately instead of clicking on ‘install’ and going to the App Store. We will take unobserved adverse selection as meaning that there is a negative correlation between $\epsilon_{1i}$ and $\epsilon_{2i}$.

¹⁰ Users’ outcome is no longer independent of their treatment assignment conditional on observables. Here, a user who has a higher unobserved $v_i$ is more likely to be selected into $d_i = 1$, which subsequently affects the outcome $y_i^\tau$.

Conditional on clicking on ‘install’, whether the user installs the app is given by $\mathbf{1}[u_i + \epsilon_{3i} \ge 0]$, where $\epsilon_{3i}$ is the unobserved taste that affects users at the App Store (where users can see more information about the app). As before, $u_i$ is the utility that the user enjoys from installing a new app.

3.1. Unobserved selection

Based on our preceding discussion, we can estimate a model incorporating unobserved adverse selection. The model is an endogenous treatment effect model with two layers of outcomes: the intermediate outcome and the final Install outcome. The model consists of three interdependent non-linear equations, as given below. Note that we have absorbed $v_i$ (the user’s reward-seeking attitude) into $\epsilon_{1i}$.

$$d_i = \mathbf{1}[u_{1i} + \epsilon_{1i} \ge 0] \tag{1}$$

$$y_i^\tau = \mathbf{1}[\alpha_1 d_i + u_{2i} + \epsilon_{2i} \ge 0] \tag{2}$$

$$y_i \mid (y_i^\tau = 1) = \mathbf{1}[\alpha_2 d_i + u_{3i} + \epsilon_{3i} \ge 0] \tag{3}$$

$$y_i \mid (y_i^\tau = 0) = 0$$

Equation 1 is the selection equation; it determines when a user is selected into the incentivized ads treatment. Equations 2 and 3 are the outcome equations. Equation 2 determines when a user would express the intention to install (by clicking on “install”). Equation 3 determines when a user would install the advertised app after clicking on “install”. Equation 3 can be written more compactly as $y_i \mid y_i^\tau = y_i^\tau \cdot \mathbf{1}[\alpha_2 d_i + u_{3i} + \epsilon_{3i} \ge 0]$.


$\alpha_1$ and $\alpha_2$ measure the effect of incentivized advertising on the pair of outcomes, intention and install. $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$ are idiosyncratic preferences unobserved to us, but observed by the users. Crucially, we allow these errors to be correlated with each other. If they are uncorrelated, there is no unobserved selection effect and we can use propensity score methods. It is not feasible to use a two-stage plug-in procedure where we first estimate the selection equation and then plug in the estimates for $d_i$. These equations must be estimated jointly. The joint distribution of $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$ will be specified in the next section.

Now we parameterize the utilities as follows: (i) $u_{1i} = x_{1i}\gamma$, (ii) $u_{2i} = x_{2i}\beta$, and (iii) $u_{3i} = w_1 \cdot (x_{2i}\beta) + w_2$. Here $x_{1i}$ and $x_{2i}$ are vectors of covariates that are subsets of $x_i$. The utility from installing a new app is $u_{2i} = x_{2i}\beta$. This utility enters into the equations for both Intermediate and Install. We allow this utility to be scaled and translated by $w_1$ and $w_2$ when it enters into the equation for Install. The parameter $w_1$ allows the user to express curiosity or motives for information acquisition. For example, when $0 < w_1 < 1$, the user’s utility for the app is magnified during the Intermediate stage relative to the Install stage, and the user is more likely to click on the ad to find out more about the app in the App Store. At the Install stage, this amplification disappears, and the likelihood of installing the app depends only on the actual utility for the app plus some noise that represents new information from the App Store.
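To make the structure of Equations 1 to 3 and the utility parameterization concrete, the sketch below simulates impressions from the three-equation model. It is a simplified illustration rather than the paper's specification: it uses a single hypothetical covariate `x`, made-up parameter values, and independent standard logistic errors, whereas the model in the text allows the errors to be dependent.

```python
import math
import random


def logistic_draw(rng):
    """Draw from the standard logistic distribution by inverting its CDF."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return math.log(u / (1.0 - u))


def simulate_impression(rng, x, gamma=0.5, beta0=-2.0, beta1=0.3,
                        alpha1=-0.3, alpha2=0.8, w1=0.5, w2=-3.0):
    """Simulate (d, y_tau, y) for one impression; all defaults are hypothetical."""
    u1 = gamma * x            # u_1i = x_1i * gamma
    u2 = beta0 + beta1 * x    # u_2i = x_2i * beta
    u3 = w1 * u2 + w2         # u_3i = w1 * (x_2i * beta) + w2
    # Errors are drawn independently here, which is a simplification.
    d = int(u1 + logistic_draw(rng) >= 0)                       # Equation 1
    y_tau = int(alpha1 * d + u2 + logistic_draw(rng) >= 0)      # Equation 2
    y = y_tau * int(alpha2 * d + u3 + logistic_draw(rng) >= 0)  # Equation 3, compact form
    return d, y_tau, y


rng = random.Random(1)
draws = [simulate_impression(rng, rng.gauss(0.0, 1.0)) for _ in range(10_000)]
n_clicks = sum(y_tau for _, y_tau, _ in draws)
n_installs = sum(y for _, _, y in draws)
print(f"clicks: {n_clicks}, installs: {n_installs}")
```

Because Install is multiplied by the click indicator, an install can never occur without a click, mirroring the last line of the equation system.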

This formulation of utilities is not crucial to the model. We parameterize the utilities in this manner in order to reduce the number of parameters to be estimated. Even with this structure, we have a high-dimensional set of parameters to be estimated. Almost all our covariates are indicator or categorical variables: whether a user is located in a certain region, whether a user speaks a certain language, etc. For this reason, the formulation $u_{3i} = w_1 \cdot (x_{2i}\beta) + w_2$ is helpful in reducing the number of parameters.

The pair of Equations 1 and 2 represents a standard approach for handling treatment endogeneity in binary outcome models (see, e.g., Smith and Blundell [1986], Rivers and Vuong [1988], or Wooldridge [2002], Section 15.7). The outcome variable is modeled as Equation 2, but it contains an endogenous treatment variable $d_i$, which we model as Equation 1. This endogeneity arises because of the correlation among $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$. Our framework here differs from the standard approach in that we have an additional outcome variable (Equation 3) that also depends on the endogenous treatment variable. In a well-known study, Evans and Schwab [1995] estimate the pair of Equations 1 and 2 as a bivariate probit model.

3.2. Identification

In the frequentist setting, identification and estimation of the model rely on the presence of an exclusion restriction, that is, an instrumental variable that enters into the selection equation but does not enter into the outcome equations (see Wooldridge [2002] and Evans and Schwab [1995]). Among the variables that are available to us in Section 2.2, it is not clear a priori whether we have an exogenous instrumental variable. Therefore we follow the plausibly exogenous approach of Conley, Hansen, and Rossi [2012], where we place a prior concentrated near zero on the outcome-equation coefficient of a plausibly exogenous variable. We then estimate the model using Bayesian MCMC.

Specifically, we choose the variable Device Volume as a plausible instrumental variable. Let the coefficient on Device Volume in Equation 2 be denoted by $\gamma$; our prior for $\gamma$ is $\gamma \mid \alpha_1 \sim \mathcal{N}(0, \delta^2 \alpha_1^2)$. When $\delta = 0$, Device Volume is a fully valid exclusion restriction in the frequentist sense. We set $\delta = 0.25$, which allows Device Volume to have a small effect in the outcome equation; in particular, the effect of Device Volume is proportionally smaller than the treatment effect $\alpha_1$. The idea is that Device Volume enters into the selection equation, but only has a relatively small effect on the user’s eventual outcomes.
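The prior on the Device Volume coefficient can be written as a log-density whose standard deviation scales with the treatment effect. The sketch below is one possible way to code it, assuming the prior is exactly the normal density stated above; the function name `log_prior_gamma` and the example values of `alpha1` are hypothetical.

```python
import math


def log_prior_gamma(gamma, alpha1, delta=0.25):
    """Log density of gamma | alpha1 ~ N(0, delta^2 * alpha1^2).

    The prior standard deviation is delta * |alpha1|, so the Device Volume
    coefficient is kept small relative to the treatment effect alpha1.
    """
    sd = delta * abs(alpha1)
    if sd == 0.0:
        # delta = 0 (or alpha1 = 0) is the exact exclusion restriction:
        # a point mass at zero, which has no density.
        raise ValueError("prior is degenerate when delta * |alpha1| is zero")
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * (gamma / sd) ** 2


# With alpha1 = 1 the prior standard deviation is 0.25, so gamma = 0.5 lies
# two prior standard deviations from zero and is penalized accordingly.
print(log_prior_gamma(0.0, alpha1=1.0))
print(log_prior_gamma(0.5, alpha1=1.0))
```

A smaller `delta` tightens the prior toward the strict exclusion restriction, while a larger value lets the data speak more about any direct effect of Device Volume on the outcome.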

This is reasonable: the user’s device volume is recorded at the moment the ad is served. If the user’s volume setting is high, she will be less inclined to seek out and watch incentivized ads, hence Device Volume affects selection (negatively). After the selection stage, the user is free to adjust her volume setting during the ad. Because users adjust their volumes during the ads, the pre-adjusted volume settings should not affect users’ outcomes. While the volume setting that prevails during the ad could affect users’ outcomes, this setting is different from the recorded volume setting, which should not affect users’ outcomes.

3.3. Scalable Estimation

A desideratum for our estimation procedure is that it must be scalable, in the sense that it must be suitable for impressions-level data. For some popular publishers, impressions-level data means billions of observations in a single day.¹¹ Estimation entails calculating the likelihood for each impression and summing them up. Moreover, calculating the likelihood for each impression involves modeling the dependence between the unobservables in the selection and the outcome equations (due to adverse selection). We find that modeling the dependence between $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$ as a multivariate Gaussian is too slow in this setting, even though we have only a little over 350,000 impressions. The reason is that we need to compute the CDF of a trivariate Gaussian as many times as there are impressions. Computing each CDF of a trivariate Gaussian involves multi-dimensional integration, which requires either Monte Carlo integration or numerical quadrature.¹²

¹¹ http://www.businessinsider.com/the-size-of-fbx-facebooks-ad-exchange-2012-11

With this in mind, we now specify the distributions of $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$ that lead to a tractable likelihood. The marginal distributions of $\epsilon_{1i}$, $\epsilon_{2i}$ and $\epsilon_{3i}$ are assumed to be standard logistic. That is, $\epsilon_{1i} \sim \text{Logistic}(0,1)$, and the CDF of $\epsilon_{1i}$ is $\Pr(\epsilon_{1i} \le x) = \frac{1}{1+e^{-x}}$. Similarly, the marginal distributions of $\epsilon_{2i}$ and $\epsilon_{3i}$ are both assumed to be standard logistic. Denote $F_1(e_1)$, $F_2(e_2)$, $F_3(e_3)$ as the marginal CDFs of $\epsilon_{1i}$, $\epsilon_{2i}$ and $\epsilon_{3i}$ respectively.

To model the dependence between $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$, the joint CDF of $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$ is formulated as $C(F_1(e_1), F_2(e_2), F_3(e_3))$. This is without loss of generality: any joint CDF of $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$ can be written this way (Sklar's theorem). The function $C$ is known as a copula. Conversely, when $C$ satisfies some properties, $C(F_1(e_1), F_2(e_2), F_3(e_3))$ is a valid joint CDF. The idea is to choose a copula that is more tractable than the multivariate Gaussian. Copulas are used extensively in finance to model the dependence among random variables, and recently copulas have appeared in various marketing journals; see Danaher and Smith [2011a,b], George and Jensen [2011], Kumar, Zhang, and Luo [2014]. These papers also contain formal introductions to copulas and their applicability in marketing.

[12] For instance, in MATLAB and R, the algorithm to calculate the CDF of a trivariate Gaussian employs numerical quadrature techniques developed by Drezner and Wesolowsky (1989) and Genz (2004). For higher dimensions, a quasi-Monte Carlo integration algorithm is used.


We model the joint CDF of $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$ as

\[ \Pr(\epsilon_{1i} \le e_1, \epsilon_{2i} \le e_2, \epsilon_{3i} \le e_3) = \left( \left[ F_1(e_1)^{-\theta} + F_2(e_2)^{-\theta} + F_3(e_3)^{-\theta} - 2 \right]_+ \right)^{-1/\theta}. \]

The notation $[x]_+$ means $\max\{x, 0\}$, i.e. $[x]_+$ cannot be negative. $F_1$, $F_2$, and $F_3$ are the marginal CDFs of $\epsilon_{1i}$, $\epsilon_{2i}$ and $\epsilon_{3i}$ respectively. The parameter $\theta \in [-1, \infty) \setminus \{0\}$ controls the dependence among the variables. This copula is known as the Clayton copula, where $C(x, y, z; \theta) = \left( [x^{-\theta} + y^{-\theta} + z^{-\theta} - 2]_+ \right)^{-1/\theta}$. There is a one-to-one relationship between the parameter $\theta$ and the Kendall rank correlation coefficient $\tau$ between the variables, given by $\tau = \frac{\theta}{\theta + 2}$. Therefore, when $\theta$ is negative, $\epsilon_{1i}$ and $\epsilon_{2i}$ are negatively correlated in the sense of having a negative rank correlation coefficient, which is indicative of unobserved adverse selection. When $\tau$ is estimated to be close to zero, $(\epsilon_{1i}, \epsilon_{2i})$ have essentially no rank correlation, and there is no unobserved adverse selection (we can then use standard propensity score methods). Another commonly used copula is the Gumbel copula, which is a multivariate extension of the familiar Gumbel distribution. We do not use the Gumbel copula because it restricts $\tau$ to be positive.
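To make the closed form concrete, the following is a minimal Python sketch of the Clayton joint CDF with standard logistic marginals and of the map from $\theta$ to Kendall's $\tau$. The function names (`logistic_cdf`, `clayton_cdf`, `joint_cdf`, `kendall_tau`) are illustrative assumptions and are not taken from the paper's implementation.

```python
import math


def logistic_cdf(x: float) -> float:
    """CDF of the standard logistic distribution, written to avoid overflow."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def clayton_cdf(u: float, v: float, w: float, theta: float) -> float:
    """Trivariate Clayton copula C(u, v, w; theta) for u, v, w in [0, 1]."""
    if theta == 0.0 or theta < -1.0:
        raise ValueError("theta must lie in [-1, inf) and be nonzero")
    if min(u, v, w) <= 0.0:
        return 0.0  # a CDF is zero when any argument has zero probability mass
    s = u ** (-theta) + v ** (-theta) + w ** (-theta) - 2.0
    if s <= 0.0:
        return 0.0  # [x]_+ = max(x, 0), which can bind only when theta < 0
    return s ** (-1.0 / theta)


def joint_cdf(e1: float, e2: float, e3: float, theta: float) -> float:
    """Pr(eps1 <= e1, eps2 <= e2, eps3 <= e3) with logistic marginals."""
    return clayton_cdf(logistic_cdf(e1), logistic_cdf(e2), logistic_cdf(e3), theta)


def kendall_tau(theta: float) -> float:
    """Kendall rank correlation implied by the Clayton parameter theta."""
    return theta / (theta + 2.0)
```

Each evaluation uses only a handful of elementary operations, which is the source of the speed advantage over a trivariate Gaussian CDF. As a quick check of the formula, setting two arguments of the copula to 1 returns the third argument, as any copula must.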

Having formulated the joint distribution of $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$, we can then derive the likelihood for each impression $i$ according to Equations 1 to 3. The log-likelihood of observing the data $(d_i, y_i^\tau, y_i, x_i)_{i=1}^{n}$ given $\Theta$, the set of parameters to be estimated, is denoted as $L\big((d_i, y_i^\tau, y_i, x_i)_{i=1}^{n} \mid \Theta\big)$. There are 52 parameters to be estimated, and we will describe them in the next section. Due to the choice of our joint distribution, this log-likelihood function can be derived in closed form. It can be computed very quickly even when there is a large number of impressions because it does not involve numerical integration.

More importantly, the gradient of the log-likelihood function with respect to the parameters can also be computed with ease. Being able to easily compute the gradient of the target distribution allows us to employ more efficient Markov Chain Monte Carlo algorithms such as Hamiltonian Monte Carlo or the Metropolis-adjusted Langevin algorithm (MALA) (Roberts and Tweedie [1996]). These MCMC methods are more suitable here than plain random-walk Metropolis since we have a moderately large number of parameters. Our MCMC method will be based on MALA. Informally, MALA constructs a random walk that drifts in the direction of the gradient, and hence the gradient enables the random walk to move more efficiently towards regions of high probability. It also has a Metropolis-Hastings accept/reject mechanism that improves the mixing and convergence properties of this random walk.
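The drift-plus-accept/reject mechanism can be sketched in a few lines of Python. This is a generic textbook MALA update applied to a toy two-dimensional standard normal target, not the paper's 52-parameter posterior; the step size, starting point, and the names `mala_step` and `run_chain` are assumptions made for illustration.

```python
import math
import random


def mala_step(x, log_density, grad_log_density, step, rng):
    """Return (new_state, accepted) after one MALA update from state x."""

    def drift(point):
        # Mean of the Langevin proposal: a half-step along the gradient.
        grad = grad_log_density(point)
        return [p + 0.5 * step * g for p, g in zip(point, grad)]

    def log_q(to, frm):
        # Log proposal density of moving frm -> to, up to an additive constant.
        mean = drift(frm)
        return -sum((t - m) ** 2 for t, m in zip(to, mean)) / (2.0 * step)

    proposal = [m + math.sqrt(step) * rng.gauss(0.0, 1.0) for m in drift(x)]
    log_alpha = (log_density(proposal) + log_q(x, proposal)
                 - log_density(x) - log_q(proposal, x))
    # Metropolis-Hastings accept/reject; 1 - random() lies in (0, 1].
    if math.log(1.0 - rng.random()) < log_alpha:
        return proposal, True
    return x, False


def run_chain(n_iter=5000, step=0.5, seed=0):
    """Toy run on a 2-D standard normal target (illustration only)."""
    log_density = lambda x: -0.5 * sum(v * v for v in x)
    grad_log_density = lambda x: [-v for v in x]
    rng = random.Random(seed)
    state, draws, n_accept = [3.0, -3.0], [], 0
    for _ in range(n_iter):
        state, accepted = mala_step(state, log_density, grad_log_density, step, rng)
        draws.append(state)
        n_accept += accepted
    kept = draws[n_iter // 2:]  # discard the first half as burn-in
    means = [sum(d[k] for d in kept) / len(kept) for k in range(2)]
    return means, n_accept / n_iter
```

Calling `run_chain()` returns the post-burn-in sample means and the acceptance rate. In an application like the one described here, `log_density` and `grad_log_density` would be replaced by the closed-form log-posterior and its gradient.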

For the priors, we impose uninformative priors on all the parameters, except for the parameters corresponding to the instrumental variable (Device Volume) and the scale parameter $w_1$. The uninformative prior for a parameter is given by the Gaussian distribution with a mean of zero and a standard deviation of 100. The scale parameter $w_1$ has a prior of $N(0.5, 0.25)$. In order to restrict the copula dependence parameter $\theta$ to be within $[-1, \infty)$, we apply the transformation $\theta = f(\tilde{\theta}) = (\tilde{\theta} + 1)^2 - 1$, and subsequently impose an uninformative prior of $N(0, 100)$ on $\tilde{\theta}$.
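A short worked check of why this transformation enforces the constraint (the remark on symmetry is an observation added here, not a statement from the paper):

```latex
\theta = f(\tilde{\theta}) = (\tilde{\theta} + 1)^2 - 1,
\qquad (\tilde{\theta} + 1)^2 \ge 0 \;\Longrightarrow\; \theta \ge -1,
\qquad f(-1) = -1, \quad f(0) = f(-2) = 0 .
```

Because $f(\tilde{\theta}) = f(-2 - \tilde{\theta})$, two values of $\tilde{\theta}$ map to each $\theta > -1$, so only $\theta$ itself, and not $\tilde{\theta}$, should be interpreted.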

We ran the MALA Markov chain for 5,000 iterations. Despite such a small number of iterations, convergence occurred quickly, which is not surprising since we have employed a gradient-based MCMC algorithm. Specifically, applying the diagnostic of Heidelberger and Welch to each parameter individually, every parameter passes the stationarity test when the first half of the chain is discarded as burn-in samples. We report the posterior means and standard deviations, computed after discarding the burn-in samples, in the next section.


4. Parameter estimates and results

In total, there are 52 parameters to be estimated. We allow the treatment effect for Intermediate to vary over the main language groups, so that Equation (2) now becomes $y_i^\tau = 1[(\alpha_1 z_i) d_i + x_i\beta + \epsilon_{2i} \ge 0]$, where $\alpha_1 z_i = a_0 + a_1 \times EN_i + a_2 \times ES_i + a_3 \times PT_i + a_4 \times RU_i + a_5 \times ZH_i$. The indicator variables $EN_i$, $ES_i$, $PT_i$, $RU_i$ and $ZH_i$ indicate whether the language setting of impression $i$ is English, Spanish, Portuguese, Russian, or Chinese, respectively. These are the five major language groups, covering over 86% of all impressions. We do not estimate heterogeneous treatment effects in the Install stage because the number of impressions where both selection and install occurred is much smaller than the number of impressions where both selection and clicks occurred.

To summarize, there are 21 parameters to be estimated in the selection equation $d_i = 1[x_i\gamma + \epsilon_{1i} \ge 0]$. We list these parameters and show their estimates in Table 1. There are 26 parameters to be estimated in the Intermediate outcome equation $y_i^\tau = 1[(\alpha_1 z_i) d_i + x_i\beta + \epsilon_{2i} \ge 0]$. We describe these parameters and show their estimates in Table 2. There are 4 parameters to be estimated in the Install outcome equation $y_i = y_i^\tau \cdot 1[\alpha_2 d_i + w_1 x_i\beta + w_2 + \epsilon_{3i} \ge 0]$. We list these parameters in Table 3. Finally, we also need to estimate the parameter $\theta$, which controls the degree of dependence among the unobserved error terms.

In Section 5.3, we show that the standard propensity score method yields qualitatively similar results. While our model here controls for unobserved selection, standard propensity score methods control only for selection on observables.


4.1. Estimates of the selection equation

Let us elaborate on Table 1, which reports the posterior means and standard errors of the parameters in the selection equation, $d_i = 1[x_i\gamma + \epsilon_{1i} \ge 0]$.

First, we see that $\theta$, the dependence parameter of the copula, is −0.353. This translates to a Kendall rank correlation coefficient $\tau$ between $\epsilon_{1i}$ and $\epsilon_{2i}$ of $\tau = \frac{\theta}{\theta + 2} = -0.214$. This is evidence of unobserved adverse selection. There is an unobserved user characteristic (degree of reward-seeking) that increases the likelihood of selection into treatment and, at the same time, decreases the likelihood of clicking on ‘install’.

Looking at the other coefficients in Table 1, we find that they support an adverse selection narrative. For instance, the coefficient on WiFi is positive: a user with a WiFi internet connection is more likely to seek out the incentivized ad treatment. Users are less likely to seek out incentivized ad placements when connected to cellular networks, which are slower and more costly.

The coeﬃcient on Device Volume is negative. A user whose device’s volume is higher

is less likely to seek out incentivized ad treatment. An explanation is that a user would

experience more annoyance and discomfort from watching an ad when the volume is

higher, and hence, she is more reluctant to seek out incentivized ads.

The coeﬃcient on Screen Resolution is positive. A user who has a better visual ex-

perience is less averse to watching ads, and hence is more likely to seek out incentivized

ad treatment. The coeﬃcient on Android Version is also positive, suggesting that a user

with a more recent Android operating system is more likely to seek out incentivized ad

treatment.


Overall, the results in Table 1 show evidence of adverse selection: users deliberately seek out incentivized ads to obtain rewards.

4.2. Estimates of the intermediate outcome equation

Now we examine the estimates for the Intermediate outcome equation, $y_i^\tau = 1[(\alpha_1 z_i) d_i + x_i\beta + \epsilon_{2i} \ge 0]$. Table 2 reports the posterior means and standard deviations of the coefficients.

We find that the treatment effects vary according to the languages chosen by the users. The baseline treatment effect $\alpha_1$ is significantly negative. Moreover, for users who have chosen English, Spanish and Russian, the treatment effects are significantly negative and larger in magnitude than the baseline. For users who have chosen Portuguese and Chinese, by contrast, the treatment effects are significantly positive.

The negative treatment effect is surprising, as it implies that incentivized ads decrease the probability of clicks compared to non-incentivized ads. That is, a subset of users exposed to incentivized ads are less likely to go beyond this intermediate step of clicking on the ads than their counterparts in the control group (exposed to non-incentivized ads).

Our explanation is that rewards have negative distortionary effects in the intermediate stage because users prefer not to delay their rewards by clicking on ‘install’. These users are averse to delayed rewards. They would rather collect their rewards immediately than go to the App Store, even though they are sufficiently interested in the advertised app. In the absence of rewards (setting $d_i = 0$), these users would not be distracted by the rewards, and would actually be more likely to click on the ads and go to the App Store.

For the users whose device languages are Portuguese and Chinese, the treatment eﬀect

on the intermediate outcome is positive. The fact that rewards have a positive eﬀect is

somewhat less surprising. We will postpone the explanation to the next section when we

discuss the ﬁnal outcome equation.

4.3. Estimates of the ﬁnal outcome equation

We see in Table 3 that $\alpha_2$, the treatment effect on Install (conditional on having clicked), is positive. Previously, we also saw that during the intermediate stage, the treatment effect on clicks is positive for some users. Therefore, the overall treatment effect for these users is unambiguously positive.

We now offer an explanation for the positive treatment effects of incentivized ads on Intermediate and Install. Research in the consumer behavior literature (Calder and Sternthal [1980], MacKenzie and Lutz [1989], MacKenzie et al. [1986], Mitchell and Olson [1981], Shimp [1981]) suggests that a person's affective state (moods and feelings) when he or she watches the ad is an important predictor of advertising effectiveness and purchase intention. The reward, which is given by the publisher, causes the user to perceive the publisher's content more favorably, including the ads that are published therein. Therefore ad conversion is higher when users are being rewarded for watching the ads.

Note that the reward is unrelated to the advertiser's content or product, so we can rule out complementarity between rewards and the advertiser's product. If there were such a complementarity, a user could be more interested in the advertiser's app when she is also being rewarded.

Now for those users who experienced a negative treatment effect during the Intermediate stage, the overall treatment effect is ambiguous. We will quantify the overall treatment effect in the next section. Our estimation suggests that a user can experience both a negative treatment effect during the Intermediate stage and a positive treatment effect during the final Install stage. This does not contradict our explanation. If the user were to reach the final stage, the aversion to delayed rewards would diminish since there is now a shorter time between Install and the collection of rewards.

4.4. Counterfactuals

In the previous section, we have seen that the overall treatment effects are ambiguously signed for some users. Here, we would like to quantify the overall treatment effects. First, we calculate the overall Average Treatment Effect (ATE) on Install implied by the model. The ATE is calculated as follows: for each impression $i$, we compute the probability that the user would click on ‘install’ and eventually install if the user were in the treatment group, and subtract the probability that the user would click on ‘install’ and eventually install if the user were in the control group. More precisely, with the first probability evaluated at $d_i = 1$, we have:

\[ ATE = \frac{1}{n} \sum_{i=1}^{n} \Big( \hat{\Pr}\big[ (\alpha_1 z_i) d_i + x_i\beta + \epsilon_{2i} \ge 0,\ \alpha_2 d_i + w_1 x_i\beta + w_2 + \epsilon_{3i} \ge 0 \big] - \hat{\Pr}\big[ x_i\beta + \epsilon_{2i} \ge 0,\ w_1 x_i\beta + w_2 + \epsilon_{3i} \ge 0 \big] \Big) \qquad (4) \]


The ATE measures how much the overall unconditional Install rate would change as a

result of comparing two counterfactual scenarios for every impression: (1) when the user’s

impression is served an incentivized ad, and (2) when the user’s impression is served a non-

incentivized ad. These changes in the Install rate are then averaged over all impressions

to obtain the ATE.
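One way to evaluate the two probabilities in this average is sketched below in Python. It relies on two assumptions that are not spelled out in the text: that the pair of outcome errors follows the bivariate margin of the Clayton copula (obtained by setting the first copula argument to 1), and that the joint exceedance probability is computed by inclusion-exclusion from the logistic marginals. The function names and the input layout `(a_i, xb_i)` are hypothetical.

```python
import math


def logistic_cdf(x: float) -> float:
    """CDF of the standard logistic distribution, written to avoid overflow."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def clayton2(v: float, w: float, theta: float) -> float:
    """Bivariate Clayton copula, i.e. the trivariate form with u = 1."""
    if v <= 0.0 or w <= 0.0:
        return 0.0
    s = v ** (-theta) + w ** (-theta) - 1.0
    return s ** (-1.0 / theta) if s > 0.0 else 0.0


def prob_both_exceed(s: float, t: float, theta: float) -> float:
    """Pr(eps2 >= s, eps3 >= t) by inclusion-exclusion on the joint CDF."""
    fs, ft = logistic_cdf(s), logistic_cdf(t)
    return 1.0 - fs - ft + clayton2(fs, ft, theta)


def average_treatment_effect(rows, alpha2, w1, w2, theta):
    """rows holds (a_i, xb_i) pairs with a_i = alpha_1 z_i and xb_i = x_i beta."""
    total, n = 0.0, 0
    for a_i, xb_i in rows:
        # Click and install indices are both non-negative under treatment...
        treated = prob_both_exceed(-(a_i + xb_i), -(alpha2 + w1 * xb_i + w2), theta)
        # ...versus under control, where the treatment terms drop out.
        control = prob_both_exceed(-xb_i, -(w1 * xb_i + w2), theta)
        total += treated - control
        n += 1
    return total / n
```

When both treatment terms are zero the two probabilities coincide and the function returns exactly zero, which is a convenient sanity check before plugging in estimated parameters.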

Using the formula in Equation 4, the Average Treatment Effect implied by the model is 0.000795. This is a large magnitude given that the baseline install rate is 0.00292 (1,067 installs out of 365,847 ad servings). The ATE of 0.000795 represents an increase of 27%. Therefore, a user is 27% more likely to install when served incentivized advertising compared to non-incentivized advertising. Since the publisher is paid per install, this represents a large increase in ad revenue for the publisher (as well as for the platform, which shares revenue with the publisher). We have proposed an explanation for why rewards have a positive effect on user behavior. There is a well-known link between a person's affective state (moods and feelings) during ad exposure and the subsequent purchase intention. Being rewarded for watching an ad therefore causes the user to feel less annoyed at advertising, which increases ad effectiveness and the conversion rate.

How does this ATE translate to ad revenue? We can provide a back-of-the-envelope calculation. The average price per install commanded by this publisher is $0.52. Hence this ATE translates to 0.000795 × $0.52 = $0.0004134 per impression, or $0.413 per thousand impressions. Ad revenues are frequently measured in terms of CPM (revenue per thousand impressions). To give a sense of the industry (mobile ad networks) benchmarks, the average CPMs for the US and China are reported to be $7.00 and $2.70.[13]

[13] http://ecpm.adtapsy.com/
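The back-of-the-envelope figures can be reproduced with a few lines of Python, using only numbers reported in the text; the variable names are illustrative.

```python
ate = 0.000795              # model-implied ATE on Install
installs = 1067             # observed installs
impressions = 365847        # ad servings
price_per_install = 0.52    # average price per install, in dollars

baseline = installs / impressions             # baseline install rate
relative_lift = ate / baseline                # ATE relative to the baseline
cpm_gain = ate * price_per_install * 1000     # dollars per thousand impressions

print(f"baseline install rate: {baseline:.5f}")
print(f"relative lift: {relative_lift:.1%}")
print(f"CPM gain: ${cpm_gain:.3f}")
```

The same conversion, multiplying an effect on the install probability by the price per install and by 1,000, applies to any of the treatment effects expressed in CPM terms.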

While incentivized advertising has an overall positive effect, we saw previously that there is a negative countervailing effect. This negative countervailing effect enters in the intermediate stage. Rewards have negative distortionary effects in the intermediate stage when users prefer to collect their rewards immediately after watching the ads, instead of clicking on ‘install’ and going to the App Store. Moreover, this negative effect varies widely among users. Therefore we expect the effect of incentivized ads to be smaller for those users who are averse to delayed rewards. To quantify this, we compute the treatment effects averaged locally according to users' languages. When we calculate the (Local) Average Treatment Effects by the languages of the users, we see that rewarding users to watch ads has the largest effect on users whose device language is Chinese. The treatment effects averaged over English, Spanish, Portuguese, Russian, and Chinese users are respectively 0.000752, 0.000667, 0.000608, 0.000391, and 0.00138. In CPM terms, the magnitudes of these treatment effects are $0.391, $0.347, $0.316, $0.203, and $0.718, respectively.

Another useful counterfactual from the perspective of the publisher is the Average Treatment Effect on the Treated. Suppose we had switched all incentivized ads to non-incentivized ads; what would the effect be? This is more relevant to the publisher because it represents a counterfactual that the publisher can directly implement. The average treatment effect on the treated is computed by averaging the summand of Equation 4 over the impressions i with d_i = 1, which amounts to 0.000724, or an equivalent CPM of $0.376. Moreover, since there are 252,379 treated observations, this implies that the publisher would lose 183 installs.


We can also quantify the revenue impact of adverse selection. In the following counterfactual, we remove unobserved adverse selection, that is, we suppose that selection is independent of outcomes.[14] Whether or not an impression is served an incentivized ad is independent of the actions that would be taken during the Intermediate and Install stages. This rules out reward-seeking users who self-select into watching incentivized ads but are otherwise not interested in the ad itself. The revenue impact of unobserved adverse selection is calculated using Equation 5 below, which amounts to 0.000552, or $0.287 CPM. Therefore, adverse selection negatively impacts the publisher's ad revenue.

\[ \frac{1}{252{,}379} \sum_{i:\, d_i = 1} \Big( \hat{\Pr}[d_i = 1] \cdot \hat{\Pr}[y_i^\tau = 1,\ y_i = 1] - \hat{\Pr}[d_i = 1,\ y_i^\tau = 1,\ y_i = 1] \Big) \qquad (5) \]

5. Estimating treatment effects using propensity scores

In this section, we estimate the treatment effect of incentivized advertising using propensity scores. We want to compare our previous results to those from model-free approaches. The propensity score method can control for selection bias to the extent that selection is based on observables. Therefore it is not valid in the presence of unobserved selection, which we have analyzed previously.

5.1. Estimation procedure

As in the previous data environment, we observe $(d_i, y_i^\tau, y_i, x_i)$ for the sample of impressions $i = 1, \dots, n$, where $x_i$ is a vector of the user's covariates during impression $i$.

[14] We implicitly condition on observed covariates. Note that this is precisely the assumption that underlies standard propensity score methods.


Our estimation procedure consists of two steps. In the first step, we estimate the propensity scores $\hat{p}_i = \Pr(d_i = 1 \mid x_i)$, the probability that a user is served an incentivized ad during impression $i$. We estimate the propensity scores using a Probit regression of $d_i$ on the user's covariates $x_i$. Note that $x_i$ must contain only pre-treatment covariates. Pre-treatment covariates are the user's characteristics that could affect the user's selection into treatment.

In the second step, we take $\hat{p}_i$, the fitted values of the Probit regression from the first step, and run the regression of the outcome on $1, d_i, \hat{p}_i, d_i(\hat{p}_i - \mu_p)$ for $i = 1, \dots, n$, where $\mu_p$ is the average value of $\hat{p}_i$ across $i = 1, \dots, n$. This is the control function approach explained in Proposition 18.5 of Wooldridge [2002]. Under some assumptions, the ATE on Intermediate can be recovered as the coefficient on the regressor $d_i$ when regressing $y_i^\tau$ on $1, d_i, \hat{p}_i, d_i(\hat{p}_i - \mu_p)$, while the ATE on Install can be obtained as the coefficient on the regressor $d_i$ when regressing $y_i$ on the same regressors.

In addition, we can include higher-order polynomial terms of the propensity scores in order to better control for selection bias (making sure to de-mean the propensity score term before constructing its interaction with $d_i$). Therefore we also run these regressions with the higher-order terms $\hat{p}_i^2$ and $\hat{p}_i^3$ included, for $i = 1, \dots, n$.

The assumptions needed are explained in Proposition 18.5 of Wooldridge [2002]. We will briefly discuss the main assumption, which is the assumption of “ignorability of treatment” (Rosenbaum and Rubin [1983]). This assumption is also known as selection on observables: $d_i$ and $(y_{0i}, y_{1i})$ are independent conditional on the observed covariates $x_i$. This assumption implies that $E[y_{0i} \mid x_i, d_i] = E[y_{0i} \mid x_i]$ and $E[y_{1i} \mid x_i, d_i] = E[y_{1i} \mid x_i]$.


There are other methods for estimating the ATE, relying on different assumptions. We find that these other methods deliver similar results. For instance, the ATE can be estimated as an Inverse Probability Weighted Estimator using the propensity scores. That is,

\[ ATE = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i \left( d_i - \hat{p}(x_i) \right)}{\hat{p}(x_i) \left( 1 - \hat{p}(x_i) \right)} \]

(see Proposition 18.3 of Wooldridge [2002]). One method to compute the ATE that does not rely on the propensity scores is $\frac{1}{n} \sum_{i=1}^{n} \hat{r}(x_i)$, where $r(x) = \Pr[y_i = 1 \mid x, d_i = 1] - \Pr[y_i = 1 \mid x, d_i = 0]$.
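As an illustration of the inverse probability weighted formula, here is a minimal Python sketch. It assumes the propensity scores have already been estimated and lie strictly between 0 and 1; the function name `ipw_ate` and the toy inputs are hypothetical.

```python
def ipw_ate(y, d, p):
    """Inverse probability weighted ATE from outcomes y, treatments d, scores p."""
    if not (len(y) == len(d) == len(p)) or not y:
        raise ValueError("y, d and p must be non-empty and of equal length")
    if any(pi <= 0.0 or pi >= 1.0 for pi in p):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    terms = (yi * (di - pi) / (pi * (1.0 - pi)) for yi, di, pi in zip(y, d, p))
    return sum(terms) / len(y)


# Toy data: with a constant score of 0.5 and equal group sizes, the estimator
# reduces to the difference in mean outcomes between treated and control units.
y = [1, 1, 0, 0]
d = [1, 1, 0, 0]
p = [0.5, 0.5, 0.5, 0.5]
toy_estimate = ipw_ate(y, d, p)
```

Scores very close to 0 or 1 make individual weights extremely large, so in practice it is common to inspect the range of the estimated scores before applying the formula.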

5.2. First-stage adverse selection estimation

In the first stage, we estimate the propensity scores via a Probit regression. Specifically, the dependent variable is the binary treatment variable Incentivized, or $d_i$. The covariates are Android Version, WiFi, Screen Resolution, and Device Volume. We also control for the following fixed effects: Countries, Languages and Device Brands.

The result is given in Table 4. We ﬁnd that the result is qualitatively similar to the

result obtained from estimating the selection equation (see Section 4.1).

5.3. Second-stage treatment eﬀect estimation

Using the first-stage propensity scores, we now estimate the average treatment effects (ATE). We show the results in Tables 5 and 6. Again, the results obtained here are qualitatively similar to the model-based results.

The ATE on Intermediate is significantly negative, while the ATE on Install is significantly positive. From Column 2 (Intermediate) of Table 5, the ATE on Intermediate is −0.0635. This means that rewarding users to watch an ad reduces the probability that a user clicks on install by 0.0635 on average. The baseline Intermediate is 0.1344, i.e. 49,179 clicks out of 365,847. An ATE of this magnitude represents an almost 50% decrease in the probability that a user would click on install.

Now the ATE for Install is statistically significant at 0.00795 (Column 2 of Table 6). This is a large magnitude because the baseline Install is 0.0217 (i.e. 1,067 installs out of 49,179 clicks). Therefore an ATE of this magnitude represents a 36.6% increase in Install. In other words, if users are rewarded for watching the ads, they are 36.6% more likely to install the advertised app at the App Store.
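The baselines and relative changes quoted for the two stages follow from the reported counts; a short Python check, with illustrative variable names, is:

```python
impressions = 365847
clicks = 49179
installs = 1067

click_rate = clicks / impressions          # baseline Intermediate
install_given_click = installs / clicks    # baseline Install, conditional on a click

ate_click = -0.0635      # Table 5, column 2
ate_install = 0.00795    # Table 6, column 2

relative_click_change = ate_click / click_rate
relative_install_change = ate_install / install_given_click

print(f"click baseline {click_rate:.4f}, change {relative_click_change:.1%}")
print(f"install baseline {install_given_click:.4f}, change {relative_install_change:.1%}")
```

The relative change on clicks works out to roughly 47%, which is the basis for the "almost 50%" description.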

Compounding the eﬀect of Intermediate, the overall eﬀect on Install is positive and

signiﬁcant. From Column 4 of Table 6, the overall ATE obtained using the propensity

score method here is 0.00187, while the ATE obtained using the model that controls for

unobserved selection is 0.000795. Hence, the propensity score method biases the ATE

upwards.

5.4. Naive treatment eﬀects

In the Appendix (Table 7), we show results without controlling for any selection bias.

We use probit regressions to show how incentivized advertising is related to (i) the user’s

probability of clicking ‘install’, and (ii) the user’s probability of installing. We control for

all the user’s characteristics mentioned in the preceding section. However these regressions

are not valid if there is a selection bias. We will not interpret these coeﬃcients further.

6. Appendix

6.1. Tables and Figures


Table 1. Parameters appearing in the selection equation, $d_i = 1[x_i\gamma + \epsilon_{1i} \ge 0]$. The variables that correspond to these parameters are detailed in Section 2.2.

| Parameter (Description) | Estimates |
|---|---|
| θ (Dependence parameter of the copula) | -0.353 (0.00323) |
| Device Volume | -0.0879 (0.00536) |
| WiFi | 0.352 (0.00712) |
| Android Version | 0.133 (0.00106) |
| Screen Resolution | -0.0172 (0.00174) |
| Huawei Dummy | 0.0837 (0.0134) |
| Lenovo Dummy | -0.0792 (0.00346) |
| LG Dummy | 0.157 (0.00415) |
| Motorola Dummy | 0.17 (0.00186) |
| Samsung Dummy | 0.0141 (0.000779) |
| EN (English Language Dummy) | -0.183 (0.00179) |
| ES (Spanish Language Dummy) | 0.253 (0.00617) |
| PT (Portuguese Language Dummy) | 0.317 (0.00976) |
| RU (Russian Language Dummy) | 0.114 (0.0019) |
| ZH (Chinese Language Dummy) | -0.573 (0.0192) |
| North America Dummy | 0.0571 (0.00441) |
| South America Dummy | 0.18 (0.00206) |
| South-East Asia Dummy | 0.00197 (0.00162) |
| South Asia Dummy | -0.276 (0.00528) |
| Middle East Dummy | -0.214 (0.0141) |
| Southern and Eastern Europe Dummy | 0.117 (0.00133) |
| Constant | 0.0151 (0.000812) |

Table 2. Parameters appearing in the Intermediate outcome equation, $y_i^\tau = 1[(\alpha_1 z_i) d_i + x_i\beta + \epsilon_{2i} \ge 0]$

| Parameter (Description) | Estimates (Standard Error) |
|---|---|
| α1 (Treatment effect baseline) | -0.0124 (0.000371) |
| α1 × EN (Interaction of treatment effect and EN) | -0.0122 (0.00107) |
| α1 × ES (Interaction of treatment effect and ES) | -0.0734 (0.00161) |
| α1 × PT (Interaction of treatment effect and PT) | 0.0553 (0.00568) |
| α1 × RU (Interaction of treatment effect and RU) | -0.0616 (0.00293) |
| α1 × ZH (Interaction of treatment effect and ZH) | 0.111 (0.0066) |
| Device Volume | -0.0132 (0.0223) |
| WiFi | -0.181 (0.00526) |
| Android Version | -0.327 (0.000725) |
| Screen Resolution | -0.0314 (0.000939) |
| Huawei Dummy | -0.0413 (0.00175) |
| Lenovo Dummy | -0.0505 (0.00447) |
| LG Dummy | -0.0787 (0.0023) |
| Motorola Dummy | -0.0388 (0.00102) |
| Samsung Dummy | -0.026 (0.00141) |
| EN (English Language Dummy) | -0.0245 (0.00243) |
| ES (Spanish Language Dummy) | -0.145 (0.00583) |
| PT (Portuguese Language Dummy) | 0.0349 (0.00109) |
| RU (Russian Language Dummy) | -0.0917 (0.00214) |
| ZH (Chinese Language Dummy) | 0.00464 (0.00311) |
| North America Dummy | -0.138 (0.00566) |
| South America Dummy | -0.0577 (0.00194) |
| South-East Asia Dummy | -0.115 (0.00379) |
| South Asia Dummy | 0.0727 (0.0043) |
| Middle East Dummy | 0.0922 (0.00567) |
| Southern and Eastern Europe Dummy | -0.119 (0.00637) |
| Constant | -0.0325 (0.000378) |

Table 3. Parameters appearing in the Install outcome equation, $y_i = y_i^\tau \cdot 1[\alpha_2 d_i + w_1 x_i\beta + w_2 + \epsilon_{3i} \ge 0]$

| Parameter (Description) | Estimates |
|---|---|
| α2 (Install treatment effect) | 0.141 (0.0074) |
| w1 (Scale parameter) | 0.00732 (0.000534) |
| w2 (Constant) | -0.199 (0.0015) |


| | (1) Incentivized |
|---|---|
| Android Version | 0.117∗∗∗ (0.00482) |
| Device Volume | -0.217∗∗∗ (0.00810) |
| Screen Resolution (millions of pixels) | 0.0117∗∗∗ (0.00432) |
| WiFi | 0.556∗∗∗ (0.00594) |
| Constant | -1.148∗∗∗ (0.0628) |
| N | 358,127 |
| Countries controlled | Yes (178 indicator variables) |
| Languages controlled | Yes (48 indicator variables) |
| Device brands controlled | Yes (10 indicator variables) |

Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01

Table 4. First-stage probit estimation of propensity scores.


| | (1) Intermediate | (2) Intermediate |
|---|---|---|
| Incentivized | -0.0611∗∗∗ (0.00146) | -0.0635∗∗∗ (0.00146) |
| $\hat{p}(x)$ | -0.213∗∗∗ (0.00540) | -0.452∗∗∗ (0.0685) |
| Incentivized × $\dot{\hat{p}}(x)$ | 0.150∗∗∗ (0.00680) | 0.0978∗∗∗ (0.00754) |
| $\hat{p}(x)^2$ | | 0.194 (0.126) |
| $\hat{p}(x)^3$ | | 0.0319 (0.0717) |
| Constant | 0.317∗∗∗ (0.00338) | 0.372∗∗∗ (0.0113) |
| N | 358128 | 358128 |

Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01

Table 5. Regressions with propensity score to control for selection bias. The coefficient on Incentivized shows the average treatment effect of incentivized advertising on the Intermediate outcome.


| | (1) Install | (2) Install | (3) Install | (4) Install |
|---|---|---|---|---|
| Incentivized | 0.00724∗∗∗ (0.00111) | 0.00795∗∗∗ (0.00112) | 0.00800∗∗∗ (0.00113) | 0.00187∗∗∗ (0.000196) |
| $\hat{p}(x)$ | 0.000597 (0.00230) | 0.140∗∗∗ (0.0302) | 0.136∗∗∗ (0.0238) | 0.0527∗∗∗ (0.00796) |
| Incentivized × $\dot{\hat{p}}(x)$ | -0.00811∗ (0.00488) | -0.00242 (0.00539) | -0.00480 (0.00497) | -0.00369∗∗∗ (0.00121) |
| $\hat{p}(x)^2$ | | -0.253∗∗∗ (0.0662) | -0.295∗∗∗ (0.0586) | -0.101∗∗∗ (0.0160) |
| $\hat{p}(x)^3$ | | 0.136∗∗∗ (0.0427) | 0.183∗∗∗ (0.0412) | 0.0548∗∗∗ (0.00972) |
| Constant | 0.00802∗∗∗ (0.00128) | -0.0132∗∗∗ (0.00378) | -0.00786∗∗∗ (0.00248) | -0.00477∗∗∗ (0.00112) |
| N | 48390 | 48390 | 48266 | 358128 |

Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01

Table 6. Regressions with propensity score to control for selection bias. In Columns (1) to (3), we conditioned on Intermediate = 1. In Column (3), the first-stage propensity scores are computed using only the subset of data such that Intermediate = 1.


| | (1) Intermediate | (2) Intermediate | (3) Install | (4) Install | (5) Install |
|---|---|---|---|---|---|
| Incentivized | -0.408∗∗∗ (0.00541) | -0.307∗∗∗ (0.00625) | 0.0984∗∗∗ (0.0228) | 0.227∗∗∗ (0.0270) | 0.271∗∗∗ (0.0396) |
| Device Volume | | 0.100∗∗∗ (0.00926) | | 0.0433 (0.0371) | -0.0482 (0.0569) |
| Android Version | | 0.0850∗∗∗ (0.00543) | | 0.0358∗ (0.0216) | -0.0177 (0.0331) |
| Screen Resolution (millions of pixels) | | -0.0411∗∗∗ (0.00496) | | -0.0224 (0.0200) | 0.0343 (0.0293) |
| WiFi | | -0.106∗∗∗ (0.00696) | | -0.136∗∗∗ (0.0259) | 0.0872∗∗ (0.0427) |
| N | 365847 | 358087 | 365847 | 340662 | 45724 |
| Marginal Effects | -0.0867∗∗∗ (0.0011) | -0.0636∗∗∗ (0.0013) | 0.00088∗∗∗ (0.00020) | 0.00201∗∗∗ (0.00024) | 0.00859∗∗∗ (0.00129) |
| Countries controlled | No | Yes | No | Yes | Yes |
| Languages controlled | No | Yes | No | Yes | Yes |
| Device brands controlled | No | Yes | No | Yes | Yes |

Standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01

Table 7. Probit regressions without controlling for selection bias. In the last column, we condition on Intermediate = 1.


References

Yakov Bart, Andrew T Stephen, and Miklos Sarvary. Which products are best suited

to mobile advertising? a ﬁeld study of mobile display advertising eﬀects on consumer

attitudes and intentions. Journal of Marketing Research, 51(3):270–285, 2014.

Norris I Bruce, BPS Murthi, and Ram C Rao. A dynamic model for digital advertising:

The eﬀects of creative format, message content, and targeting on engagement. Journal

of Marketing Research, 54(2):202–218, April 2017.

Bobby J Calder and Brian Sternthal. Television commercial wearout: An information

processing view. Journal of Marketing Research, pages 173–186, 1980.

Timothy G Conley, Christian B Hansen, and Peter E Rossi. Plausibly exogenous. Review

of Economics and Statistics, 94(1):260–272, 2012.

Peter J Danaher and Michael S Smith. Modeling multivariate distributions using copulas:

applications in marketing. Marketing Science, 30(1):4–21, 2011a.

Peter J Danaher and Michael S Smith. Rejoinder: Estimation issues for copulas applied to marketing data. Marketing Science, 30(1):25–28, 2011b.

William N Evans and Robert M Schwab. Finishing high school and starting college:

Do catholic schools make a diﬀerence? The Quarterly Journal of Economics, 110(4):

941–974, 1995.

Edward I George and Shane T Jensen. Commentary: A latent variable perspective of copula modeling. Marketing Science, 30(1):22–24, 2011.

Anindya Ghose and Sang Pil Han. Estimating demand for mobile applications in the new

economy. Management Science, 60(6):1470–1488, 2014.

Anindya Ghose and Sha Yang. An empirical analysis of search engine advertising: Spon-

sored search in electronic markets. Management Science, 55(10):1605–1622, 2009.

Avi Goldfarb and Catherine Tucker. Online display advertising: Targeting and obtrusive-

ness. Marketing Science, 30(3):389–404, 2011a.

Avi Goldfarb and Catherine E Tucker. Privacy regulation and online advertising. Man-

agement science, 57(1):57–71, 2011b.

Heikki Haario, Eero Saksman, and Johanna Tamminen. An adaptive metropolis algo-

rithm. Bernoulli, pages 223–242, 2001.

Yu Hu, Jiwoong Shin, and Zhulei Tang. Incentive problems in performance-based online

advertising pricing: cost per click vs. cost per action. Management Science, 62(7):

2022–2038, 2015.

V Kumar, Xi Alan Zhang, and Anita Luo. Modeling customer opt-in and opt-out in a

permission-based marketing context. American Marketing Association, 2014.

Scott B MacKenzie and Richard J Lutz. An empirical examination of the structural

antecedents of attitude toward the ad in an advertising pretesting context. The Journal

of Marketing, pages 48–65, 1989.

Scott B MacKenzie, Richard J Lutz, and George E Belch. The role of attitude toward the

ad as a mediator of advertising eﬀectiveness: A test of competing explanations. Journal

of marketing research, pages 130–143, 1986.

Puneet Manchanda, Jean-Pierre Dubé, Khim Yong Goh, and Pradeep K Chintagunta. The effect of banner advertising on internet purchasing. Journal of Marketing Research, 43(1):98–108, 2006.

Andrew A. Mitchell and Jerry C. Olson. Are product attribute beliefs the only mediator

of advertising eﬀects on brand attitude? Journal of Marketing Research, 18(3):318–332,

1981. ISSN 00222437. URL http://www.jstor.org/stable/3150973.

Douglas Rivers and Quang H Vuong. Limited information estimators and exogeneity tests

for simultaneous probit models. Journal of econometrics, 39(3):347–366, 1988.

Gareth O Roberts and Jeﬀrey S Rosenthal. Examples of adaptive MCMC. Journal of

Computational and Graphical Statistics, 18(2):349–367, 2009.

Gareth O Roberts and Richard L Tweedie. Exponential convergence of langevin distri-

butions and their discrete approximations. Bernoulli, pages 341–363, 1996.

Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in

observational studies for causal eﬀects. Biometrika, 70(1):41–55, 1983.

Oliver J Rutz and Randolph E Bucklin. From generic to branded: A model of spillover

in paid search advertising. Journal of Marketing Research, 48(1):87–102, 2011.

Richard J Smith and Richard W Blundell. An exogeneity test for a simultaneous equa-

tion tobit model with an application to labor supply. Econometrica: Journal of the

Econometric Society, pages 679–685, 1986.

Jeﬀrey M Wooldridge. Econometric analysis of cross section and panel data. MIT press,

2002.

Song Yao and Carl F Mela. A dynamic model of sponsored search advertising. Marketing

Science, 30(3):447–468, 2011.

Yi Zhu and Kenneth C Wilbur. Hybrid advertising auctions. Marketing Science, 30(2):

249–273, 2011.
