All content in this area was uploaded by Richard Chen on Oct 24, 2017
arXiv:1709.00197v1 [stat.AP] 1 Sep 2017
Incentivized Advertising: Treatment Effect and Adverse Selection
Khai X. Chiong
Assistant Professor of Marketing
Naveen Jindal School of Management
University of Texas at Dallas
khai.chiong@utdallas.edu
Richard Y. Chen
Research Scientist
Y Combinator Research, San Francisco
richard.chen@ycr.org
Sha Yang
Professor of Marketing
Marshall School of Business
University of Southern California
shayang@marshall.usc.edu
Abstract
Incentivized advertising is a new ad format that is gaining popularity in digital mobile advertising. In incentivized advertising, the publisher rewards users for watching an ad. An endemic issue here is adverse selection, where reward-seeking users select into incentivized ad placements to obtain rewards. Adverse selection reduces the publisher's ad profit and makes causal inference on the effectiveness of incentivized advertising difficult. To this end, we develop a treatment effect model that allows and controls for unobserved adverse selection, and estimate the model using data from a mobile gaming app that offers both incentivized and non-incentivized ads. We find that rewarding users for watching an ad has an overall positive effect on the ad conversion rate: a user is 27% more likely to convert when rewarded for watching an ad. However, there is a negative offsetting effect that reduces the effectiveness of incentivized ads. Some users are averse to delayed rewards; they prefer to collect their rewards immediately after watching the incentivized ads instead of pursuing the content of the ads further. For the subset of users who are averse to delayed rewards, the treatment effect is only 13%, while it can be as high as 47% for other users.
Keywords: online advertising, mobile, causal inference, Bayesian estimation, endogenous selection
1. Introduction
Mobile advertising, including video ads and banner ads on mobile devices, is a dominant segment of digital advertising. In the U.S., business spending on mobile advertising accounts for more than 50% of total spending on digital advertising.¹ The growth of mobile advertising is fueled by the widespread usage of mobile applications, or apps (Ghose and Han [2014]); it is now commonplace to advertise on mobile apps.
Mobile advertising is also a fast-evolving industry, where advertisers and publishers continuously innovate on ad formats, improve data tracking capabilities (Goldfarb and Tucker [2011a,b]), and optimize ad placements. In recent years, mobile publishers have widely adopted a new format of ad placement called incentivized advertising. In an incentivized ad placement, publishers reward users for watching an ad. More generally, incentivized advertising takes the form of rewarding a user for completing an action related to the ad.² Incentivized advertising is also commonly known as reward advertising.
Incentivized ads first appeared among mobile gaming apps. Examples include ad placements where publishers reward users with in-game virtual items, additional game levels, and lives for viewing an ad, typically in a full-screen video format. One of the reasons for using incentivized advertising is to reduce annoyance towards ads, which is of particular concern in mobile advertising. Mobile devices have smaller screens than personal computers, which makes it more difficult to advertise effectively on them. For instance, conventional banner ads are very intrusive on mobile devices. Moreover, mobile apps, especially mobile gaming apps, rely on a continuous user experience, so interstitial full-screen ads tend not to work well.³

¹ According to the 2017 Internet Advertising Revenue Report from PricewaterhouseCoopers, spending on mobile ads was $36.6 billion in 2016, while total spending on digital advertising was $72.5 billion.
² WSJ (Jan 5, 2016), More Marketers Offer Incentives for Watching Ads.
Incentivized advertising allows the app developer to incorporate advertising into the gameplay, for instance, by offering to revitalize an injured game character if the user watches an ad. Therefore, incentivized ads allow for a more seamless transition between gameplay and ads, which improves the playability of the game and reduces the annoyance due to interruptions. Moreover, rewarding users for watching an ad could lift users' mood and contribute to an overall positive perception of the ads. For these reasons, incentivized advertising has become a popular format of advertising within mobile gaming apps. Various industry white papers have reported that incentivized advertising is well received by users.⁴ It has even expanded beyond mobile gaming publishers.⁵
Despite the increasing adoption of incentivized advertising, little is known about how incentivized advertising affects users' behavior (by contrast, much is known about the effects of other important formats of online advertising; see Bart, Stephen, and Sarvary [2014], Bruce, Murthi, and Rao [2017], Manchanda, Dubé, Goh, and Chintagunta [2006]). To this end, we aim to study the causal effect of incentivized advertising by developing a treatment effect model with unobserved selection. Our goal is to understand and quantify the effect of incentivized advertising on users' conversion rate as compared with non-incentivized advertising, that is, how much the ad conversion rate changes as a result of offering rewards to users for watching ads. From a managerial perspective, this model allows us to ask whether a publisher can obtain higher ad revenue using incentivized or non-incentivized ad placements.

³ This is related to the topic of 'viewability' in advertising; cf. The Economist (March 26, 2016), Invisible ads, phantom readers.
⁴ eMarketer (July 1, 2014): Want App Users to Interact with Your Ads? Reward Them.
⁵ For example, the mobile music streaming app Spotify incentivizes users to watch a video ad with 30 minutes of ad-free music; the video streaming website Hulu incentivizes users to watch a longer video ad with an ad-free episode; the mobile operator Sprint rewards certain users with a reduced phone bill for watching ads.
We estimate this treatment effect model using a large impression-level dataset from a publisher who uses both incentivized and non-incentivized ad placements. This publisher is a mobile gaming app, and incentivized ads take the form of rewarding a user with additional game levels if the user watches a full-screen video ad trailer about another app. The publisher uses CPI (cost-per-install)⁶ pricing for all its ads, so the publisher is paid only when an ad leads to a conversion event, defined as the user installing the advertised app.
The main feature of our treatment effect model is that we allow and control for unobserved adverse selection. When the publisher rewards users for watching an ad, it causes an adverse selection effect: users who are reward-seeking self-select into incentivized ad placements to obtain rewards. In the presence of adverse selection, a user is not randomly assigned to either incentivized or non-incentivized ads; it is therefore important to control for adverse selection in order to properly assess the causal effect of incentivized advertising. If reward-seeking attitude is an observable characteristic, adverse selection can be controlled for using propensity score methods (Section 5).
When there is unobserved adverse selection, we develop and estimate a model in which users can endogenously select into watching incentivized ads, and watching incentivized ads then translates into users' outcomes. This model has two outcomes: an intermediate outcome, where the user can express an intention to install the advertised app, and a final outcome, where the user decides whether to install the app. In our dataset, we observe both the intermediate and the final outcomes of the users. In the intermediate stage, the user chooses whether or not to click ‘Install’ at the end of the ad, which redirects the user to the App Store. In the final stage, the user chooses whether to install the app that was advertised. Identification of the model requires a variable that enters the selection equation but not the outcome equations, while estimation is implemented using Bayesian MCMC.

⁶ In other forms of online advertising such as sponsored search advertising, it is more common for the publisher to be paid per click; see Ghose and Yang [2009], Hu, Shin, and Tang [2015], Rutz and Bucklin [2011], Yao and Mela [2011], Zhu and Wilbur [2011].
Our main result shows that rewarding users for watching an ad has a negative effect on the intermediate outcome (where the user clicks on the ad to proceed to the App Store). Our explanation is that some users are averse to delayed rewards and therefore prefer to collect their rewards immediately after watching incentivized ads. As such, rewards have the negative effect of reducing the user's intention to take any action that delays the rewards: the user prefers to collect the rewards immediately instead of going to the App Store and installing a new app. We also find that users exhibit varying degrees of aversion to delayed rewards.
On the flip side, we find that rewarding users for watching an ad has a positive effect on install (the final outcome) conditional on clicking the ad (the intermediate outcome). This result is in line with common findings that giving out rewards induces positive effects on product adoption and purchases. In our context, when the publisher gives rewards to its users, the rewards induce the users to perceive the publisher's content more favorably. As such, an ad that is published in an incentivized ad placement is perceived more favorably by the users and elicits a more positive response. This particular finding has some basis in the consumer behavior literature, where researchers have found that consumers' affective feelings of favorability toward the ad itself are an important predictor of advertising effectiveness and response (Calder and Sternthal [1980], MacKenzie and Lutz [1989], MacKenzie et al. [1986], Mitchell and Olson [1981], Shimp [1981]). Their findings resonate with our explanation that rewarding users for watching an ad causes users to feel less ad annoyance and consequently increases the ad conversion rate.
The overall causal effect of incentivized advertising depends on the interplay between the negative effect on clicking and the positive effect on installing conditional on clicking. For our particular publisher, we find that incentivized advertising has an overall positive effect on the ad conversion rate. A user is 27% more likely to install when served incentivized advertising compared to non-incentivized advertising. In terms of ad revenue, this effect is equivalent to a CPM (revenue per thousand impressions) of $0.413. To give a sense of industry benchmarks for mobile ad networks, the average CPMs for the US and China are reported to be $7.00 and $2.70, respectively.⁷
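Under CPI pricing, revenue per impression is the install probability times the payout per install, so a lift in the install rate translates directly into a CPM. A minimal sketch of that arithmetic; the baseline install rate and per-install payout below are hypothetical placeholders (this section reports neither), so the numbers do not reproduce the $0.413 figure:

```python
def incremental_cpm(baseline_install_rate, relative_lift, payout_per_install):
    """Extra publisher revenue per 1,000 impressions under CPI pricing.

    All inputs are hypothetical placeholders:
      baseline_install_rate -- installs per non-incentivized impression
      relative_lift         -- e.g. 0.27 for "27% more likely to install"
      payout_per_install    -- publisher revenue per install (after any platform share)
    """
    extra_installs_per_impression = baseline_install_rate * relative_lift
    return 1000 * extra_installs_per_impression * payout_per_install


# Hypothetical inputs: 0.2% baseline install rate, 27% lift, $2 per install.
print(round(incremental_cpm(0.002, 0.27, 2.0), 3))
```

Because revenue is earned only on installs, the CPM equivalent scales linearly in each of the three inputs.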
Our result highlights the benefits of targeting the placement of incentivized ads according to demographics. Rewards have a negative effect on clicks for the users who are averse to delayed rewards, and therefore the overall treatment effect on Install is heterogeneous across user characteristics. We find that incentivized advertising is least effective when the device language is set to Russian (effect size of 13%) and most effective for Chinese (effect size of 47%). Given the potential cost of giving rewards, the publisher should not use incentivized ads when the effect size is close to zero.
⁷ See http://ecpm.adtapsy.com/
The rest of the paper is organized as follows. Section 2 describes the data and relevant industry background. Section 3 develops and estimates the model. Section 4 develops an alternative estimator using propensity scores. Section 5 concludes. The appendix contains all figures and tables.
2. Data and industry background
The dataset comes from a mobile gaming app. The genre of the app is classified as “Action” in the Android App Store (it is not available on iOS or other operating systems). The app relies on publishing ads to monetize its user base and uses both incentivized and non-incentivized ad placements.
In the context of this publisher, we define an incentivized ad to be a video ad that rewards users after the ad has been played, and a non-incentivized ad to be a video ad that does not. Every ad is either incentivized or non-incentivized. The rewards are tied to the game itself (in-app rewards); typically, they unlock additional levels in the game.⁸

The content of the ad consists of a short video trailer showing another mobile app. These ads are user-targeted: they show mobile apps that users are likely to download and install. The targeting and serving of these ads are handled by a platform, which shares a pre-specified percentage of revenue with the publisher.

Users are not allowed to skip the ads. At the end of the ad, the user can either exit the ad by clicking the ‘x’ button or click the ‘Install’ button. When the user clicks the ‘Install’ button, she is directed to the App Store, where she can download the advertised app.

⁸ Another kind of incentivized advertising provides rewards for users to install apps, but Apple has blocked applications with such ad formats since 2011; cf. TechCrunch (April 2011), Apple Clamps Down On Incentivized App Downloads.
We define Intermediate to be a binary variable indicating whether the user has expressed an intention to install by clicking on the ‘Install’ button at the end of the ad, whereby the user would have a chance to review more information about the advertised app in the App Store. Intermediate is an intermediate outcome. The final outcome is Install, a binary variable indicating whether the user has downloaded the advertised app.
This particular platform operates on a cost-per-install (CPI) model, where an advertiser pays the publisher only in the event that the user installs the advertiser's app. CPI advertising is growing rapidly: spending on CPI campaigns increased by 80% from 2014 to 2015 and accounted for 10.3% of mobile advertising spend in 2015.⁹
2.1. Adverse selection
Adverse selection is an issue endemic to incentivized advertising. Here, adverse selection means that users deliberately seek out incentivized ad placements in order to obtain rewards. For instance, users who know where and when in the game to find incentivized ad placements could seek them out. These reward-seeking users have low intention to install new apps. Incentivized advertising becomes ineffective when adverse selection is severe, that is, when users watch ads only to collect rewards and are not converted to install. It remains an open question whether incentivized advertising is effective and should be widely adopted by publishers.
⁹ eMarketer (December, 2015). Mobile Advertising and Marketing Trends Roundup.
Adverse selection also poses challenges to data analysis and causal inference. Whenever an ad is served, it appears as an observation in our dataset. Therefore, in the presence of adverse selection, our sample of incentivized ads is self-selected and consists disproportionately of reward-seeking users. Since a user is not randomly assigned to either incentivized or non-incentivized ads, a naive comparison of outcomes between the two groups would give a biased estimate of the effect of incentivized advertising. If reward-seeking attitude is an observable characteristic, correcting for selection can be done using propensity score methods; this is accomplished in Section 5. More generally, we develop and estimate a model that allows and controls for unobserved adverse selection in Section 3.
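A small simulation shows the direction of the bias. This is an illustrative sketch, not our estimation model: it uses one binary outcome, bivariate normal errors instead of the logistic margins and copula adopted in Section 3.3, and made-up coefficients. With a negative correlation between the selection and outcome errors, reward-seekers select into treatment and convert less, so the naive treated-minus-control gap understates the true average effect:

```python
import numpy as np


def naive_vs_true_effect(n=200_000, rho=-0.5, alpha=0.3, seed=0):
    """Compare the naive treated-minus-control gap with the true average effect.

    Hypothetical data-generating process (illustrative numbers, not estimates):
      selection: d = 1[0.5 + e1 >= 0]
      outcome:   y = 1[-1.0 + alpha * d + e2 >= 0]
    (e1, e2) are standard bivariate normal with correlation rho; rho < 0 mimics
    reward-seekers who are more likely to select in and less likely to convert.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    e1, e2 = rng.multivariate_normal(np.zeros(2), cov, size=n).T
    d = (0.5 + e1 >= 0).astype(int)
    # Potential outcomes for every impression, without and with the reward.
    y0 = (-1.0 + e2 >= 0).astype(int)
    y1 = (-1.0 + alpha + e2 >= 0).astype(int)
    y = np.where(d == 1, y1, y0)  # only one potential outcome is observed
    true_ate = float((y1 - y0).mean())
    naive = float(y[d == 1].mean() - y[d == 0].mean())
    return true_ate, naive


true_ate, naive = naive_vs_true_effect()
print(f"true effect {true_ate:.3f}, naive difference {naive:.3f}")
```

Setting `rho=0.0` removes the unobserved selection, and the naive gap then matches the true effect up to sampling noise.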
2.2. Data and variable description
The dataset contains 365,847 observations generated from the publisher, spanning May 1, 2016 to May 31, 2016. Each observation is an ad-serving instance, also commonly called an impression.

Whenever an ad is served, it is recorded as a unit of observation in the database. Note that after the ad has been served, a user can choose not to watch or pay attention to the ad. We observe the actions the user takes after the ad has been served, such as clicking or installing (outcome variables). We also observe some characteristics of the users (control variables).

Each row of the dataset corresponds to an impression; hence we have an impression-level dataset. A single user may be served multiple ads by the publisher: although we have 365,847 impressions, there are 143,280 unique users. The median user generated only 1 impression, while the average user generated 2.55 impressions (standard deviation of 3.26).
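Per-user statistics like these come from grouping impressions by a user identifier. A minimal sketch, assuming each impression carries a hypothetical user identifier (the dataset's actual field is not described here); a right-skewed distribution of impressions per user, as in our data, yields a mean above the median:

```python
from collections import Counter
from statistics import mean, median, stdev


def impressions_per_user(user_ids):
    """Return (number of users, median, mean, sample std) of impressions per user."""
    counts = list(Counter(user_ids).values())
    return len(counts), median(counts), mean(counts), stdev(counts)


# Toy data: five users, one of whom generated four impressions.
n_users, med, avg, sd = impressions_per_user(["a", "b", "c", "d", "d", "d", "d", "e", "e"])
print(n_users, med, avg, round(sd, 3))
```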
We now describe the treatment and the outcome variables and provide some summary statistics for them. Each variable is subscripted by $i$, which we refer to as impression $i$.
(1) Incentivized, $d_i$: a binary (zero or one) variable, where $d_i = 1$ indicates that the user is in the treatment group during impression $i$, i.e. the user has been served an incentivized ad. If $d_i = 0$, the user is in the control group and has been served a non-incentivized ad. The mean of $d_i$ is 0.6898, i.e. 68.98% of all observations correspond to incentivized ads.
(2) Intermediate, $y^\tau_i$: a binary outcome variable indicating whether the user during impression $i$ has expressed an intention to install by clicking on the ‘Install’ button at the end of the ad. This intention is credible in the sense that the user is then redirected to the relevant page in the Android App Store for downloading the advertised app. The mean of $y^\tau_i$ is 0.1344, that is, there are 49,179 clicks on ‘Install’.
(3) Install, $y_i$: a binary outcome variable indicating whether the user during impression $i$ has downloaded the advertiser's app to her mobile device from the Android App Store. The mean of $y_i$ is 0.0029, that is, there are 1,067 installs in total.
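The three variables can be held in a simple per-impression record. A minimal sketch; the class and field names are hypothetical, and the check that an install implies a click-through encodes the nesting assumed by the model in Section 3 ($y_i = 0$ whenever $y^\tau_i = 0$):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    """One ad-serving instance (hypothetical record layout for the three variables)."""
    incentivized: int  # d_i: 1 = incentivized ad (treatment), 0 = non-incentivized (control)
    intermediate: int  # y^tau_i: 1 = clicked 'Install' at the end of the ad
    install: int       # y_i: 1 = downloaded the advertised app

    def __post_init__(self):
        for name in ("incentivized", "intermediate", "install"):
            if getattr(self, name) not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1")
        # Assumed nesting (matches the model): no install without the click-through.
        if self.install > self.intermediate:
            raise ValueError("install = 1 requires intermediate = 1")


def summarize(impressions):
    """Sample means of d_i, y^tau_i and y_i over a list of impressions."""
    n = len(impressions)
    return {
        "mean_incentivized": sum(r.incentivized for r in impressions) / n,
        "mean_intermediate": sum(r.intermediate for r in impressions) / n,
        "mean_install": sum(r.install for r in impressions) / n,
    }


sample = [Impression(1, 1, 1), Impression(1, 0, 0), Impression(0, 1, 0), Impression(0, 0, 0)]
print(summarize(sample))
```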
In addition to the treatment and outcome variables above, we now describe the control (covariate) variables, which are the observable characteristics of the users.
(1) Language: the language used on the user's mobile device. The top 5 languages by number of observations are: (1) Spanish (ES), 35.86%; (2) English (EN), 25.82%; (3) Portuguese (PT), 11.03%; (4) Russian (RU), 6.93%; (5) Chinese (ZH), 6.81%.
(2) Country: the country of the user based on device and time-zone settings. The top 5 countries by number of observations are: (1) India, 12.13%; (2) Mexico, 11.75%; (3) Brazil, 10.36%; (4) China, 7.40%; (5) Indonesia, 5.52%.
(3) Region: it is useful to group countries into geographical regions that are similar to each other. We classify countries into statistical subregions as defined by the United Nations. The top 10 subregions by number of observations are: (1) South America, 28.84%; (2) Central America, 15.33%; (3) Southern Asia, 14.68%; (4) South-Eastern Asia, 14.19%; (5) Eastern Asia, 7.94%; (6) Eastern Europe, 5.21%; (7) Western Asia (Middle East), 3.70%; (8) Northern America, 2.35%; (9) Central Asia, 1.97%; (10) Southern Europe, 1.88%. However, some of these regions are highly correlated with languages. As such, we do not construct indicator variables for Eastern Asia (correlation of 0.91 with ZH), Central America (correlation of 0.55 with ES), and Central Asia (correlation of 0.50 with RU).
(4) WiFi: whether the device is connected via WiFi or mobile data when the ad
request is sent to the intermediary. The average value of WiFi is 0.7804, that is,
78.04% of the users were on WiFi.
(5) Device Brand: the manufacturer of the user's mobile device. Since this particular app runs on the Android platform, one of the most prominent brands, Apple, is not included here. The top 5 device brands by number of observations are: (1) Samsung, 40.89%; (2) Motorola, 7.11%; (3) Huawei, 5.77%; (4) LG, 4.76%; (5) Lenovo, 4.39%.
(6) Device Volume: a numeric value in [0,1] that describes the level of device volume when the device sends an ad request to the intermediary. The mean of Device Volume is 0.55, with a standard deviation of 0.30.
(7) Screen Resolution: the number of pixels (in millions) of the user's mobile device, computed by multiplying the number of pixels per horizontal line by the number of pixels per vertical line. A higher screen resolution means better visual quality. The mean is 0.857, while the standard deviation is 0.645.
(8) Android Version: an integer-valued variable from 1 to 8 indicating the version number of the Android mobile operating system. A higher number corresponds to a more recent Android operating system. At the time the data were collected, the most recent Android version was Android 6.0 (code name: Marshmallow). The mean is 4.45 and the standard deviation is 0.61.
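The region exclusions in item (3) amount to screening indicator variables by their correlation with language indicators (for two 0/1 variables, the Pearson correlation is the phi coefficient). A minimal sketch with toy data; the helper names are hypothetical, and the 0.5 cutoff is our assumption since no explicit threshold is stated:

```python
import numpy as np


def indicators(values, categories):
    """One 0/1 indicator column per category (hypothetical helper)."""
    values = np.asarray(values)
    return {c: (values == c).astype(float) for c in categories}


def drop_correlated_regions(region_dummies, language_dummies, threshold=0.5):
    """Keep region indicators whose |correlation| with every language indicator is below threshold.

    For two 0/1 variables the Pearson correlation equals the phi coefficient.
    The 0.5 cutoff is an assumed value, not one stated in the text.
    """
    kept = {}
    for region, r_col in region_dummies.items():
        max_abs_corr = max(
            abs(np.corrcoef(r_col, l_col)[0, 1]) for l_col in language_dummies.values()
        )
        if max_abs_corr < threshold:
            kept[region] = r_col
    return kept


# Toy data: every "EAs" (Eastern Asia) user has language "ZH", so that region is dropped.
languages = ["ZH", "ZH", "ES", "ES", "ES", "EN", "EN", "PT"]
regions = ["EAs", "EAs", "SAm", "CAm", "NAm", "SAm", "NAm", "SAm"]
kept = drop_correlated_regions(
    indicators(regions, ["EAs", "SAm"]),
    indicators(languages, ["ZH", "ES", "EN", "PT"]),
)
print(sorted(kept))
```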
The characteristics of a user can change over time; for instance, a user could have different device volume settings at different times. Hence, causal inference does not follow simply from comparing the outcomes of a user when she was served incentivized versus non-incentivized ads.
3. Treatment effect model with unobserved selection
How does rewarding users for watching an ad affect the subsequent action (Install) taken by the user? When users are randomly assigned incentivized (treatment) or non-incentivized ads (control), the causal effect of incentivized ads can be determined by comparing the outcomes of the treatment and the control groups. Here, we do not have the luxury of random assignment, and we must therefore control for the selection of reward-seeking users into the treatment group (i.e. adverse selection).
When adverse selection is solely attributable to the observable characteristics of the users, estimators based on propensity scores can be used to obtain the treatment effect of incentivized advertising. This is done in Section 5. Here, we undertake a more general treatment effect model that allows for unobserved selection. As a motivation, suppose that there is an unobserved variable $v_i$ that measures the degree of reward-seeking behavior of user $i$. Users who are more reward-seeking are more likely to self-select into the treatment group because of the rewards from incentivized ads. This is modeled as Equation 1 below, where $d_i = \mathbf{1}[x_{1i}\gamma + v_i + \epsilon_{1i} \ge 0]$. Here, $x_{1i}$ is a vector of observed characteristics of user $i$, and $\gamma$ is a vector of unknown parameters.
The user then expresses the intention to install according to $y^\tau_i = \mathbf{1}[u_i + \epsilon_{2i} \ge 0]$, where $u_i$ is the utility that user $i$ enjoys from installing a new app and $\epsilon_{2i}$ captures the user's unobserved taste. If $\epsilon_{2i}$ and $v_i$ are correlated, the assumption underlying the standard propensity score method (Section 5) is violated.¹⁰ In particular, it is likely that $v_i$ is negatively correlated with $\epsilon_{2i}$. That is, a more reward-seeking user is less likely to click on ‘Install’, because the reward-seeking user would rather collect the rewards immediately than click on ‘Install’ and go to the App Store. We take unobserved adverse selection to mean that there is a negative correlation between $\epsilon_{1i}$ and $\epsilon_{2i}$.
¹⁰ Users' outcomes are no longer independent of their treatment assignment conditional on observables. Here, a user with a higher unobserved $v_i$ is more likely to be selected into $d_i = 1$, and $v_i$ in turn affects the outcome $y^\tau_i$.
Conditional on clicking on ‘Install’, the user installs the app if and only if $u_i + \epsilon_{3i} \ge 0$, i.e. $y_i = \mathbf{1}[u_i + \epsilon_{3i} \ge 0]$, where $\epsilon_{3i}$ captures the unobserved tastes that affect users at the App Store (when users can see more information about the app). As before, $u_i$ is the utility that the user enjoys from installing a new app.
3.1. Unobserved selection
Based on the preceding discussion, we can estimate a model incorporating unobserved adverse selection. The model is an endogenous treatment effect model with two layers of outcomes: the intermediate outcome and the final Install outcome. It consists of three interdependent non-linear equations, given below. Note that we have absorbed $v_i$ (the user's reward-seeking attitude) into $\epsilon_{1i}$.
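$$d_i = \mathbf{1}[u_{1i} + \epsilon_{1i} \ge 0] \qquad (1)$$
$$y^\tau_i = \mathbf{1}[\alpha_1 d_i + u_{2i} + \epsilon_{2i} \ge 0] \qquad (2)$$
$$y_i \mid (y^\tau_i = 1) = \mathbf{1}[\alpha_2 d_i + u_{3i} + \epsilon_{3i} \ge 0] \qquad (3)$$
$$y_i \mid (y^\tau_i = 0) = 0$$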
Equation 1 is the selection equation; it determines when a user is selected into the incentivized ads treatment. Equations 2 and 3 are the outcome equations. Equation 2 determines when a user would express the intention to install (by clicking on ‘Install’). Equation 3 determines when a user would install the advertised app after clicking on ‘Install’. Equation 3 can be written more compactly as $y_i = y^\tau_i \cdot \mathbf{1}[\alpha_2 d_i + u_{3i} + \epsilon_{3i} \ge 0]$.
The parameters $\alpha_1$ and $\alpha_2$ measure the effect of incentivized advertising on the pair of outcomes, intention and install. $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$ are idiosyncratic preferences that are unobserved by us but observed by the users. Crucially, we allow these errors to be correlated with each other. If they are uncorrelated, there is no unobserved selection effect and we can use propensity score methods. It is not feasible to use a two-stage plug-in procedure in which we first estimate the selection equation and then plug in the estimates for $d_i$; these equations must be estimated jointly. The joint distribution of $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$ is specified in Section 3.3.
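Given the utility indices and a draw of the unobservables, Equations 1 to 3 determine the observed triple $(d_i, y^\tau_i, y_i)$ deterministically. A minimal NumPy sketch with illustrative values; the function name is ours, and the compact form of Equation 3 enforces $y_i = 0$ whenever $y^\tau_i = 0$:

```python
import numpy as np


def outcomes(u1, u2, u3, alpha1, alpha2, e1, e2, e3):
    """Observed (d, y_tau, y) implied by Equations 1-3 for given indices and unobservables."""
    d = (u1 + e1 >= 0).astype(int)                       # Equation 1: selection
    y_tau = (alpha1 * d + u2 + e2 >= 0).astype(int)      # Equation 2: click 'Install'
    y = y_tau * (alpha2 * d + u3 + e3 >= 0).astype(int)  # Equation 3, compact form
    return d, y_tau, y


# Three impressions with illustrative values; alpha1 < 0 mimics a negative click effect.
u1 = np.array([1.0, -1.0, -1.0])
e1 = np.array([0.0, 0.5, 2.0])
u2 = np.array([0.2, 0.2, 0.2])
e2 = np.array([0.4, -0.5, 0.1])
u3 = np.array([-0.1, -0.1, -0.1])
e3 = np.array([0.0, 5.0, 0.0])
print(outcomes(u1, u2, u3, -0.5, 0.3, e1, e2, e3))
```

In the third impression the user selects into the incentivized ad, but the negative $\alpha_1$ pushes the click index below zero, so no install is recorded even though the Install index is positive.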
We parameterize the utilities as follows: (i) $u_{1i} = x_{1i}\gamma$, (ii) $u_{2i} = x_{2i}\beta$, and (iii) $u_{3i} = w_1 \cdot (x_{2i}\beta) + w_2$, where $x_{1i}$ and $x_{2i}$ are vectors of covariates that are subsets of $x_i$. The utility from installing a new app is $u_{2i} = x_{2i}\beta$. This utility enters the equations for both Intermediate and Install. We allow this utility to be scaled and translated by $w_1$ and $w_2$ when it enters the equation for Install. The parameter $w_1$ allows the user to express curiosity or motives for information acquisition. For example, when $0 < w_1 < 1$, the user's utility for the app is magnified during the Intermediate stage, and the user is more likely to click on the ad to find out more about the app in the App Store. At the Install stage, this amplification disappears, and the likelihood of installing the app depends only on the actual utility for the app plus some noise that represents new information from the App Store.
This formulation of utilities is not crucial to the model; we parameterize the utilities in this manner to reduce the number of parameters to be estimated. Even with this structure, we have a high-dimensional set of parameters to estimate. Almost all our covariates are indicator or categorical variables: whether a user is located in a certain region, whether a user speaks a certain language, etc. For this reason, the formulation $u_{3i} = w_1 \cdot (x_{2i}\beta) + w_2$ is helpful in reducing the number of parameters: the Install equation adds only two parameters instead of a separate coefficient vector.
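The parameterization reduces to a few matrix products. A sketch assuming covariates are stored as NumPy design matrices (the function name and array layout are hypothetical): with $k_2$ outcome covariates, a free Install index would need another $k_2$ coefficients, while $u_{3i} = w_1 u_{2i} + w_2$ needs only two.

```python
import numpy as np


def utility_indices(x1, x2, gamma, beta, w1, w2):
    """Utility indices for n impressions.

    x1: (n, k1) covariates in the selection equation; x2: (n, k2) covariates in
    the outcome equations. u3 reuses x2 @ beta, scaled by w1 and shifted by w2,
    so the Install index adds two parameters rather than k2 new coefficients.
    """
    u1 = x1 @ gamma    # selection index
    u2 = x2 @ beta     # Intermediate index (utility of the new app)
    u3 = w1 * u2 + w2  # Install index
    return u1, u2, u3


x1 = np.array([[1.0, 0.0], [1.0, 1.0]])
x2 = np.array([[1.0, 1.0], [1.0, 0.0]])
u1, u2, u3 = utility_indices(x1, x2, np.array([0.5, -1.0]), np.array([-2.0, 0.5]), 0.5, 0.1)
print(u1, u2, u3)
```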
The pair of Equations 1 and 2 represents a standard approach for handling treatment endogeneity in binary outcome models (see, e.g., Smith and Blundell [1986], Rivers and Vuong [1988], or Wooldridge [2002, Section 15.7]). The outcome variable is modeled as Equation 2, but it contains an endogenous treatment variable $d_i$, which we model as Equation 1. This endogeneity arises because of the correlation among $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$. Our framework differs from the standard approach in that we have an additional outcome variable (Equation 3) that also depends on the endogenous treatment variable. In a well-known study, Evans and Schwab [1995] estimate the pair of Equations 1 and 2 as a bivariate probit model.
3.2. Identification
In the frequentist setting, identification and estimation of the model rely on the presence of an exclusion restriction: an instrumental variable that enters the selection equation but does not enter the outcome equations (see Wooldridge [2002] and Evans and Schwab [1995]). Among the variables available to us in Section 2.2, it is not clear a priori whether we have an exogenous instrumental variable. Therefore, we follow the plausibly exogenous approach of Conley, Hansen, and Rossi [2012], placing a prior concentrated near zero on the outcome-equation coefficient of a plausibly exogenous variable. We then estimate the model using Bayesian MCMC.
Specifically, we choose the variable Device Volume as a plausible instrumental variable. Let the coefficient on Device Volume in Equation 2 be denoted by $\gamma$; our prior for $\gamma$ is $\gamma \mid \alpha_1 \sim \mathcal{N}(0, \delta^2 \alpha_1^2)$. When $\delta = 0$, Device Volume is a fully valid exclusion restriction in the frequentist sense. We set $\delta = 0.25$, which allows Device Volume to have a small effect in the outcome equation; in particular, the effect of Device Volume is proportionally smaller than the treatment effect $\alpha_1$. The idea is that Device Volume enters the selection equation but has only a relatively small effect on the user's eventual outcomes.
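The conditional prior is a normal density whose scale is tied to $|\alpha_1|$. A minimal standard-library sketch of the Device Volume coefficient's prior (the $\gamma$ of the prior above); the function names are hypothetical, and how the prior enters the MCMC sampler is not specified here:

```python
import math
import random


def log_prior_device_volume(coef, alpha1, delta=0.25):
    """log N(coef; 0, delta^2 * alpha1^2): prior on the Device Volume coefficient in Equation 2."""
    sd = delta * abs(alpha1)
    if sd == 0.0:
        raise ValueError("the prior is a point mass at zero when delta * alpha1 == 0")
    return -0.5 * math.log(2.0 * math.pi * sd**2) - 0.5 * (coef / sd) ** 2


def draw_device_volume_coef(alpha1, delta=0.25, rng=random):
    """One draw from the conditional prior (e.g. for a prior predictive check)."""
    return rng.gauss(0.0, delta * abs(alpha1))


rng = random.Random(0)
draws = [draw_device_volume_coef(-2.0, rng=rng) for _ in range(20_000)]
print(log_prior_device_volume(0.0, alpha1=-2.0), sum(draws) / len(draws))
```

Only $|\alpha_1|$ matters for the scale, so a negative treatment effect on clicks implies the same prior width as a positive one of equal size; with $\alpha_1 = -2$ and $\delta = 0.25$ the prior standard deviation is 0.5.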
This is reasonable: the user's device volume is recorded at the moment the ad is served. If the user's volume setting is high, she will be less inclined to seek out and watch incentivized ads; hence Device Volume affects selection (negatively). After the selection stage, the user is free to adjust her volume setting during the ad. Because users adjust their volumes during the ads, the pre-adjustment volume settings should not affect users' outcomes. While the volume settings that prevail during the ads could affect users' outcomes, these differ from the recorded volume settings, which should not.
3.3. Scalable Estimation
A desideratum for our estimation procedure is that it be scalable, in the sense that it must be suitable for impression-level data. For some popular publishers, impression-level data means billions of observations in a single day.¹¹ Estimation entails calculating the log-likelihood contribution of each impression and summing them. Moreover, calculating the likelihood for each impression involves modeling the dependence between the unobservables in the selection and the outcome equations (due to adverse selection). We find that modeling the dependence between $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$ as a multivariate Gaussian is too slow in this setting, even though we have only somewhat over 350,000 impressions. The reason is that we need to compute the CDF of a trivariate Gaussian as many times as there are impressions, and computing each such CDF involves multi-dimensional integration, which requires either Monte Carlo integration or numerical quadrature.¹²

¹¹ http://www.businessinsider.com/the-size-of-fbx-facebooks-ad-exchange-2012-11
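To see why per-impression Gaussian CDFs are costly, consider the plain Monte Carlo route. A rough NumPy sketch for illustration only (established libraries use more refined quadrature or quasi-Monte Carlo methods); the function names are ours:

```python
import math

import numpy as np


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mc_trivariate_normal_cdf(upper, corr, n_draws=100_000, seed=0):
    """Crude Monte Carlo estimate of Pr(Z1 <= a1, Z2 <= a2, Z3 <= a3), Z ~ N(0, corr).

    Each impression's likelihood would need an integral like this, so the cost
    scales with (impressions) x (draws) x (MCMC iterations).
    """
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(np.asarray(corr, dtype=float))
    z = rng.standard_normal((n_draws, 3)) @ chol.T  # rows are draws with covariance corr
    return float(np.mean(np.all(z <= np.asarray(upper, dtype=float), axis=1)))


# Independent case: the estimate should be close to a product of univariate CDFs.
a = (0.5, -0.2, 1.0)
print(mc_trivariate_normal_cdf(a, np.eye(3)), math.prod(std_normal_cdf(v) for v in a))
```

Every likelihood evaluation would repeat such an integral once per impression, and an MCMC sampler evaluates the likelihood at every iteration.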
With this in mind, we now specify distributions of $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$ that lead to a tractable likelihood. The marginal distributions of $\epsilon_{1i}$, $\epsilon_{2i}$, and $\epsilon_{3i}$ are assumed to be standard logistic. That is, $\epsilon_{1i} \sim \text{Logistic}(0, 1)$, and the CDF of $\epsilon_{1i}$ is $\Pr(\epsilon_{1i} \le x) = \frac{1}{1 + e^{-x}}$; the same holds for $\epsilon_{2i}$ and $\epsilon_{3i}$. Denote by $F_1(e_1)$, $F_2(e_2)$, $F_3(e_3)$ the marginal CDFs of $\epsilon_{1i}$, $\epsilon_{2i}$, and $\epsilon_{3i}$, respectively.
To model the dependence between $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$, their joint CDF is formulated as $C(F_1(e_1), F_2(e_2), F_3(e_3))$. This is without loss of generality: any joint CDF of $(\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})$ can be written this way (Sklar's theorem). The function $C$ is known as a copula. Conversely, when $C$ is itself a joint CDF on $[0,1]^3$ with uniform margins, $C(F_1(e_1), F_2(e_2), F_3(e_3))$ is a valid joint CDF. The idea is to choose a copula that is more tractable than the multivariate Gaussian. Copulas are used extensively in finance to model the dependence among random variables, and recently copulas have appeared in various marketing journals; see Danaher and Smith [2011a,b], George and Jensen [2011], Kumar, Zhang, and Luo [2014]. These papers also contain formal introductions to copulas and their applicability in marketing.
¹² For instance, in MATLAB and R, the algorithm to calculate the CDF of a trivariate Gaussian employs numerical quadrature techniques developed by Drezner and Wesolowsky (1989) and Genz (2004). For higher dimensions, a quasi-Monte Carlo integration algorithm is used.
We model the joint CDF of \((\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})\) as
\[
\Pr(\epsilon_{1i} \le e_1,\ \epsilon_{2i} \le e_2,\ \epsilon_{3i} \le e_3) = \left(\left[F_1(e_1)^{-\theta} + F_2(e_2)^{-\theta} + F_3(e_3)^{-\theta} - 2\right]_+\right)^{-1/\theta}.
\]
The notation \([x]_+\) means \(\max\{x, 0\}\), i.e. \([x]_+\) cannot be negative. \(F_1\), \(F_2\) and \(F_3\) are the marginal CDFs of \(\epsilon_{1i}\), \(\epsilon_{2i}\) and \(\epsilon_{3i}\), respectively. The parameter \(\theta \in [-1, \infty) \setminus \{0\}\) controls the dependence among the variables. This copula is known as the Clayton copula, \(C(x, y, z; \theta) = \left([x^{-\theta} + y^{-\theta} + z^{-\theta} - 2]_+\right)^{-1/\theta}\). There is a one-to-one relationship between the parameter \(\theta\) and the Kendall rank correlation coefficient \(\tau\) between each pair of variables, given by \(\tau = \frac{\theta}{\theta+2}\). Therefore, when \(\theta\) is negative, \(\epsilon_{1i}\) and \(\epsilon_{2i}\) are negatively correlated in the sense of having a negative rank correlation coefficient, which is indicative of unobserved adverse selection. When \(\tau\) is estimated to be close to zero, \(\epsilon_{1i}\) and \(\epsilon_{2i}\) are uncorrelated, and there is no unobserved adverse selection (we can then use standard propensity score methods). Another commonly used copula is the Gumbel copula, which is a multivariate extension of the familiar Gumbel distribution. We do not use the Gumbel copula because it restricts \(\tau\) to be non-negative.
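The joint CDF above can be sketched as follows, assuming nothing beyond the formulas in the text: standard logistic marginals are plugged into the trivariate Clayton copula (Sklar's representation), and `kendall_tau` applies \(\tau = \theta/(\theta+2)\). The function names are hypothetical.

```python
import math


def logistic_cdf(x):
    """Standard logistic CDF 1 / (1 + exp(-x)), arranged to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def clayton_cdf(u, v, w, theta):
    """Trivariate Clayton copula ([u^-theta + v^-theta + w^-theta - 2]_+)^(-1/theta)."""
    if theta == 0:
        raise ValueError("theta = 0 is excluded from the parameter space")
    if min(u, v, w) <= 0.0:
        return 0.0  # any copula vanishes when one argument is zero
    s = u ** (-theta) + v ** (-theta) + w ** (-theta) - 2.0
    s = max(s, 0.0)  # the [.]_+ truncation can only bind when theta < 0
    return s ** (-1.0 / theta) if s > 0.0 else 0.0


def joint_error_cdf(e1, e2, e3, theta):
    """Pr(eps1 <= e1, eps2 <= e2, eps3 <= e3) = C(F1(e1), F2(e2), F3(e3))."""
    return clayton_cdf(logistic_cdf(e1), logistic_cdf(e2), logistic_cdf(e3), theta)


def kendall_tau(theta):
    """Pairwise Kendall rank correlation implied by the Clayton parameter."""
    return theta / (theta + 2.0)
```

A quick sanity check is that `clayton_cdf(u, 1, 1, theta)` returns `u`, as any copula must. With \(\theta = -0.353\), `kendall_tau` gives about −0.214, and `joint_error_cdf(0, 0, 0, theta)` falls below the independence value 1/8, reflecting negative dependence.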
Having formulated the joint distribution of \((\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})\), we can derive the likelihood for each impression \(i\) according to Equations 1 to 3. The log-likelihood of observing the data \((d_i, y^\tau_i, y_i, x_i)_{i=1}^n\) given \(\Theta\), the set of parameters to be estimated, is denoted \(\mathcal{L}\big((d_i, y^\tau_i, y_i, x_i)_{i=1}^n \mid \Theta\big)\). There are 52 parameters to be estimated, which we describe in the next section. Owing to our choice of joint distribution, this log-likelihood function can be derived in closed form. It can be computed very quickly even for a large number of impressions because it does not involve numerical integration.
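To illustrate why no numerical integration is needed, the sketch below computes one impression's likelihood contribution. It assumes Equations 1 to 3 take the form restated in Section 4: \(d_i = \mathbb{1}[x_i\gamma + \epsilon_{1i} \ge 0]\), \(y^\tau_i = \mathbb{1}[(\alpha_1 z_i) d_i + x_i\beta + \epsilon_{2i} \ge 0]\) and \(y_i = y^\tau_i \cdot \mathbb{1}[\alpha_2 d_i + w_1 x_i\beta + w_2 + \epsilon_{3i} \ge 0]\). Each observed triple \((d_i, y^\tau_i, y_i)\) then corresponds to a box for \((\epsilon_{1i}, \epsilon_{2i}, \epsilon_{3i})\), whose probability follows from eight copula evaluations by inclusion-exclusion. The linear indices `a`, `b`, `c` and all function names are hypothetical; this is a sketch, not the authors' code.

```python
import itertools
import math


def logistic_cdf(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def clayton_cdf(u, v, w, theta):
    if min(u, v, w) <= 0.0:
        return 0.0
    s = max(u ** (-theta) + v ** (-theta) + w ** (-theta) - 2.0, 0.0)
    return s ** (-1.0 / theta) if s > 0.0 else 0.0


def box_probability(intervals, theta):
    """Pr(lo_k < U_k <= hi_k for k = 1, 2, 3) under the Clayton copula (inclusion-exclusion)."""
    total = 0.0
    for corner in itertools.product((0, 1), repeat=3):
        point = [intervals[k][corner[k]] for k in range(3)]  # 0 = lower end, 1 = upper end
        n_lower = 3 - sum(corner)
        total += (-1) ** n_lower * clayton_cdf(point[0], point[1], point[2], theta)
    return total


def cell_probability(d, y_tau, y, a, b, c, theta):
    """Likelihood of one impression's observed (d, y_tau, y).

    a = x*gamma, b = (alpha1*z)*d + x*beta, c = alpha2*d + w1*x*beta + w2,
    all evaluated at the observed d.
    """
    if y_tau == 0 and y == 1:
        return 0.0  # an install cannot occur without the Intermediate click
    f1, f2, f3 = logistic_cdf(-a), logistic_cdf(-b), logistic_cdf(-c)
    i1 = (f1, 1.0) if d == 1 else (0.0, f1)        # eps1 >= -a  or  eps1 < -a
    i2 = (f2, 1.0) if y_tau == 1 else (0.0, f2)    # eps2 >= -b  or  eps2 < -b
    if y_tau == 0:
        i3 = (0.0, 1.0)                            # eps3 is unrestricted
    else:
        i3 = (f3, 1.0) if y == 1 else (0.0, f3)    # eps3 >= -c  or  eps3 < -c
    return box_probability([i1, i2, i3], theta)


def log_likelihood(rows, theta):
    """Sum of log cell probabilities; each row holds (d, y_tau, y, a, b, c)."""
    return sum(math.log(cell_probability(*row, theta)) for row in rows)
```

Because the six possible cells partition the error space for any given indices, their probabilities sum to one, which is a convenient check on an implementation of this kind.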
More importantly, the gradient of the log-likelihood function with respect to the parameters can also be computed with ease. Being able to compute the gradient of the target distribution easily allows us to employ more efficient Markov chain Monte Carlo (MCMC) algorithms, such as Hamiltonian Monte Carlo or the Metropolis-adjusted Langevin algorithm (MALA) (Roberts and Tweedie [1996]). These MCMC methods are better suited here than the plain random-walk Metropolis algorithm because we have a moderately large number of parameters. Our MCMC method is based on MALA. Informally, MALA constructs a random walk that drifts in the direction of the gradient, so the gradient lets the walk move more efficiently towards regions of high probability. It also has a Metropolis-Hastings accept/reject step, which corrects the discretization error of the Langevin drift so that the chain converges to the correct posterior distribution.
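A generic MALA sampler can be sketched as follows. This is an illustrative implementation on a toy target, not the authors' sampler; the function name `mala`, the step size and the target in the usage note are assumptions. The proposal drifts along the gradient of the log density, and the accept/reject step uses the asymmetric proposal density in both directions.

```python
import math
import random


def mala(log_density, grad_log_density, x0, step, n_iter, seed=0):
    """Metropolis-adjusted Langevin algorithm on a list-valued parameter vector.

    Proposal: x' = x + (step / 2) * grad log pi(x) + sqrt(step) * N(0, I).
    The Metropolis-Hastings correction uses the (asymmetric) Gaussian
    proposal densities in both directions.
    """
    rng = random.Random(seed)

    def log_q(to, frm, grad_frm):
        # log proposal density of moving frm -> to, up to an additive constant
        return -sum((t - f - 0.5 * step * g) ** 2
                    for t, f, g in zip(to, frm, grad_frm)) / (2.0 * step)

    x = list(x0)
    logp, grad = log_density(x), grad_log_density(x)
    samples, accepted = [], 0
    for _ in range(n_iter):
        prop = [xi + 0.5 * step * gi + math.sqrt(step) * rng.gauss(0.0, 1.0)
                for xi, gi in zip(x, grad)]
        logp_prop, grad_prop = log_density(prop), grad_log_density(prop)
        log_alpha = (logp_prop + log_q(x, prop, grad_prop)) - (logp + log_q(prop, x, grad))
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            x, logp, grad = prop, logp_prop, grad_prop
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_iter
```

For example, with a standard bivariate normal target (log density \(-\lVert x \rVert^2/2\), gradient \(-x\)), step 0.5, 20,000 iterations and the first half discarded, the retained draws should have means near 0 and variances near 1.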
For the priors, we impose uninformative priors on all the parameters, except for the parameters corresponding to the instrumental variable (Device Volume) and the scale parameter \(w_1\). The uninformative prior for a parameter is a Gaussian distribution with a mean of zero and a standard deviation of 100. The scale parameter \(w_1\) has a prior of \(N(0.5, 0.25)\). To restrict the copula dependence parameter \(\theta\) to lie within \([-1, \infty)\), we apply the transformation \(\theta = f(\tilde\theta) = (\tilde\theta + 1)^2 - 1\) and then impose an uninformative prior of \(N(0, 100)\) on \(\tilde\theta\).
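The prior set-up can be sketched as below. The names are hypothetical, and two points are assumptions: \(N(m, s)\) is read as mean and standard deviation (consistent with the stated standard deviation of 100), so the \(N(0.5, 0.25)\) prior on \(w_1\) is taken to have standard deviation 0.25; and the prior for the Device Volume parameters, which the text does not specify, is omitted.

```python
import math


def theta_from_unconstrained(theta_tilde):
    """theta = (theta_tilde + 1)^2 - 1, which is always >= -1."""
    return (theta_tilde + 1.0) ** 2 - 1.0


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def log_prior(betas, w1, theta_tilde):
    """Log prior: N(0, 100) on coefficients, N(0.5, 0.25) on w1, N(0, 100) on theta_tilde.

    The prior is placed on theta_tilde, not on theta, so no Jacobian term is needed.
    """
    lp = sum(normal_logpdf(b, 0.0, 100.0) for b in betas)
    lp += normal_logpdf(w1, 0.5, 0.25)
    lp += normal_logpdf(theta_tilde, 0.0, 100.0)
    return lp
```

Because \(f(\tilde\theta) = f(-2 - \tilde\theta)\), the likelihood is symmetric about \(\tilde\theta = -1\), so the posterior of \(\tilde\theta\) can be bimodal; posterior summaries are naturally reported for \(\theta = f(\tilde\theta)\).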
We ran the MALA Markov chain for 5,000 iterations. Despite this small number of iterations, the chain converged quickly, which is not surprising given that we employ a gradient-based MCMC algorithm. Specifically, applying the Heidelberger and Welch diagnostic to each parameter individually, with the first half of the chain discarded as burn-in, we do not reject the null hypothesis of stationarity for any parameter. We report the posterior means and standard deviations after discarding the burn-in samples; this is done in the next section.
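The reporting step (discard the first half of each chain, then summarize) amounts to the following minimal sketch; `posterior_summary` is a hypothetical helper, and it does not implement the Heidelberger and Welch test itself.

```python
import math


def posterior_summary(chain, burn_in_fraction=0.5):
    """Posterior mean and standard deviation of one parameter's draws after burn-in."""
    kept = chain[int(len(chain) * burn_in_fraction):]
    mean = sum(kept) / len(kept)
    var = sum((v - mean) ** 2 for v in kept) / (len(kept) - 1)
    return mean, math.sqrt(var)
```

With 5,000 iterations and the default fraction, the summary uses the last 2,500 draws of each parameter.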
4. Parameter estimates and results
In total, there are 52 parameters to be estimated. We allow the treatment effect on Intermediate to vary across the main language groups, so that Equation (2) becomes \(y^\tau_i = \mathbb{1}[(\alpha_1 z_i) d_i + x_i\beta + \epsilon_{2i} \ge 0]\), where \(\alpha_1 z_i = a_0 + a_1 \times EN_i + a_2 \times ES_i + a_3 \times PT_i + a_4 \times RU_i + a_5 \times ZH_i\). The indicator variables \(EN_i\), \(ES_i\), \(PT_i\), \(RU_i\) and \(ZH_i\) indicate whether the language setting of impression \(i\) is English, Spanish, Portuguese, Russian, or Chinese. These five major language groups cover over 86% of all impressions. We do not estimate heterogeneous treatment effects in the Install stage because the number of impressions in which both selection and install occurred is much smaller than the number in which both selection and clicks occurred.
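To make the heterogeneous coefficient concrete, the sketch below evaluates \(\alpha_1 z_i\) for one impression using the posterior means reported in Table 2 (baseline \(a_0\) and the five language interactions). The constant, dictionary and function names are hypothetical.

```python
# Posterior means from Table 2: baseline a0 and the language interactions a1..a5.
A0 = -0.0124
LANGUAGE_SHIFT = {"EN": -0.0122, "ES": -0.0734, "PT": 0.0553, "RU": -0.0616, "ZH": 0.111}


def intermediate_treatment_coefficient(language):
    """alpha_1 z_i = a0 + a1*EN_i + ... + a5*ZH_i for a single impression.

    Languages outside the five major groups receive the baseline a0.
    """
    return A0 + LANGUAGE_SHIFT.get(language, 0.0)
```

For example, the coefficient is about 0.0986 for a Chinese-language impression and about −0.0858 for a Spanish-language one. These are shifts in the latent index of the Intermediate equation, not changes in click probability.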
To summarize, there are 21 parameters to be estimated in the selection equation \(d_i = \mathbb{1}[x_i\gamma + \epsilon_{1i} \ge 0]\); we list them and show their estimates in Table 1. There are 27 parameters to be estimated in the Intermediate outcome equation \(y^\tau_i = \mathbb{1}[(\alpha_1 z_i) d_i + x_i\beta + \epsilon_{2i} \ge 0]\); we describe them and show their estimates in Table 2. There are 3 parameters to be estimated in the Install outcome equation \(y_i = y^\tau_i \cdot \mathbb{1}[\alpha_2 d_i + w_1 x_i\beta + w_2 + \epsilon_{3i} \ge 0]\); we list them in Table 3. Finally, we also need to estimate the parameter \(\theta\), which controls the degree of dependence among the unobserved error terms.
In Section 5.3, we use the standard propensity score method to show that qualitatively similar results are obtained. While our model controls for unobserved selection, standard propensity score methods control only for observed selection.
4.1. Estimates of the selection equation
Let us elaborate on Table 1, which reports the posterior means and standard deviations of the parameters in the selection equation, \(d_i = \mathbb{1}[x_i\gamma + \epsilon_{1i} \ge 0]\).
First, we see that \(\theta\), the dependence parameter of the copula, is −0.353. This translates to a Kendall rank correlation coefficient between \(\epsilon_{1i}\) and \(\epsilon_{2i}\) of \(\tau = \frac{\theta}{\theta+2} = -0.214\). This is evidence of unobserved adverse selection: there is an unobserved user characteristic (degree of reward-seeking) that increases the likelihood of selection into treatment and, at the same time, decreases the likelihood of clicking on ‘install’.
Looking at the other coefficients in Table 1, we find that they support an adverse selection narrative. For instance, the coefficient on WiFi is positive: a user with a WiFi internet connection is more likely to seek out the incentivized ad treatment. Users are less likely to seek out incentivized ad placements when connected to cellular networks, which are slower and more costly.
The coefficient on Device Volume is negative. A user whose device’s volume is higher
is less likely to seek out incentivized ad treatment. An explanation is that a user would
experience more annoyance and discomfort from watching an ad when the volume is
higher, and hence, she is more reluctant to seek out incentivized ads.
The coefficient on Screen Resolution is positive. A user who has a better visual experience is less averse to watching ads, and hence is more likely to seek out incentivized ad treatment. The coefficient on Android Version is also positive, suggesting that a user with a more recent Android operating system is more likely to seek out incentivized ad treatment.
Overall, the result from Table 1 shows evidence of adverse selection: users deliberately seek out incentivized ads to obtain rewards.
4.2. Estimates of the intermediate outcome equation
Now we examine the estimates for the Intermediate outcome equation, \(y^\tau_i = \mathbb{1}[(\alpha_1 z_i) d_i + x_i\beta + \epsilon_{2i} \ge 0]\). Table 2 reports the posterior means and standard deviations of the coefficients.
We find that the treatment effects vary with the languages chosen by the users. The baseline treatment effect \(\alpha_1\) is significantly negative. Moreover, for users who have chosen English, Spanish, or Russian, the treatment effects are significantly negative and larger in magnitude than the baseline, whereas for users who have chosen Portuguese or Chinese, the treatment effects are significantly positive.
The negative treatment effect is surprising, as it implies that incentivized ads decrease the probability of clicks compared to non-incentivized ads. That is, a subset of users exposed to incentivized ads are less likely to go beyond this intermediate step of clicking on the ads than their counterparts in the control group (exposed to non-incentivized ads).
Our explanation is that rewards have negative distortionary effects in the intermediate stage because users prefer not to delay their rewards by clicking on ‘install’. These users are averse to delayed rewards: they would rather collect their rewards immediately than go to the App Store, even though they are sufficiently interested in the advertised app. In the absence of rewards (setting \(d_i = 0\)), these users would not be distracted by the rewards, and would actually be more likely to click on the ads and go to the App Store.
For users whose device language is Portuguese or Chinese, the treatment effect on the intermediate outcome is positive. That rewards have a positive effect is somewhat less surprising; we postpone the explanation to the next section, where we discuss the final outcome equation.
4.3. Estimates of the final outcome equation
We see in Table 3 that \(\alpha_2\), the treatment effect on Install (conditional on having clicked), is positive. We also saw that during the intermediate stage the treatment effect on clicks is positive for some users. Therefore, the overall treatment effect for these users is unambiguously positive.
We now offer an explanation for the positive treatment effects of incentivized ads on Intermediate and Install. Research in the consumer behavior literature (Calder and Sternthal [1980], MacKenzie and Lutz [1989], MacKenzie et al. [1986], Mitchell and Olson [1981], Shimp [1981]) suggests that a person's affective state (moods and feelings) while watching an ad is an important predictor of advertising effectiveness and purchase intention. The reward, which is given by the publisher, causes the user to perceive the publisher's content more favorably, including the ads published therein. Therefore, ad conversion is higher when users are rewarded for watching the ads.
Note that the reward is unrelated to the advertiser's content or product, so we can rule out complementarity between rewards and the advertiser's product. Under such complementarity, a user could be more interested in the advertiser's app when she is also being rewarded.
Now, for users who experienced a negative treatment effect during the Intermediate stage, the overall treatment effect is ambiguous; we quantify it in the next section. Our estimates suggest that a user can experience both a negative treatment effect during the Intermediate stage and a positive treatment effect during the final Install stage. This does not contradict our explanation: if the user reaches the final stage, the aversion to delayed rewards diminishes because there is now less time between Install and the collection of rewards.
4.4. Counterfactuals
In the previous section, we saw that the overall treatment effects are ambiguously signed for some users. Here, we quantify the overall treatment effects. First, we calculate the overall Average Treatment Effect (ATE) on Install implied by the model. The ATE is calculated as follows: for each impression \(i\), we compute the probability that the user would click on ‘install’ and eventually install if the user were in the treatment group, minus the probability that the user would click on ‘install’ and eventually install if the user were in the control group. More precisely, we have:
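\[
\mathrm{ATE} = \frac{1}{n}\sum_{i=1}^{n}\Big[\widehat{\Pr}\big((\alpha_1 z_i) + x_i\beta + \epsilon_{2i} \ge 0,\ \alpha_2 + w_1 x_i\beta + w_2 + \epsilon_{3i} \ge 0\big) - \widehat{\Pr}\big(x_i\beta + \epsilon_{2i} \ge 0,\ w_1 x_i\beta + w_2 + \epsilon_{3i} \ge 0\big)\Big] \qquad (4)
\]
where \(d_i = 1\) has been substituted in the first probability and \(d_i = 0\) in the second.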
The ATE measures how much the overall unconditional Install rate would change between two counterfactual scenarios for every impression: (1) the user's impression is served an incentivized ad, and (2) the user's impression is served a non-incentivized ad. These changes in the Install rate are then averaged over all impressions to obtain the ATE.
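Each probability in Equation 4 is a joint survival probability of \((\epsilon_{2i}, \epsilon_{3i})\), which depends only on the bivariate margin \(C(u, v, 1)\) of the Clayton copula, so it also has a closed form. The sketch below assumes the treated scenario sets \(d_i = 1\) and the control scenario \(d_i = 0\), as described above; the per-impression inputs `x_beta` (\(x_i\beta\)) and `alpha1_z` (\(\alpha_1 z_i\)) and all function names are hypothetical.

```python
import math


def logistic_cdf(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def clayton2(u, v, theta):
    """Bivariate margin of the trivariate Clayton copula, C(u, v, 1; theta)."""
    if min(u, v) <= 0.0:
        return 0.0
    s = max(u ** (-theta) + v ** (-theta) - 1.0, 0.0)
    return s ** (-1.0 / theta) if s > 0.0 else 0.0


def prob_click_and_install(b, c, theta):
    """Pr(eps2 >= -b, eps3 >= -c) = 1 - F2(-b) - F3(-c) + C(F2(-b), F3(-c))."""
    u, v = logistic_cdf(-b), logistic_cdf(-c)
    return 1.0 - u - v + clayton2(u, v, theta)


def average_treatment_effect(impressions, alpha2, w1, w2, theta):
    """Mean over impressions of Pr(click, install | d=1) - Pr(click, install | d=0).

    Each impression is a pair (x_beta, alpha1_z).
    """
    total = 0.0
    for x_beta, alpha1_z in impressions:
        treated = prob_click_and_install(alpha1_z + x_beta, alpha2 + w1 * x_beta + w2, theta)
        control = prob_click_and_install(x_beta, w1 * x_beta + w2, theta)
        total += treated - control
    return total / len(impressions)
```

Restricting the sum to impressions with \(d_i = 1\) gives the treatment effect on the treated in the same way.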
Using the formula in Equation 4, the Average Treatment Effect implied by the model is 0.000795. This is a large magnitude given that the baseline install rate is 0.00292 (1,067 installs out of 365,847 ad impressions): the ATE of 0.000795 represents an increase of 27%. Therefore, a user is 27% more likely to install when served incentivized advertising than when served non-incentivized advertising. Since the publisher is paid per install, this represents a large increase in ad revenue for the publisher (as well as for the platform, which shares revenue with the publisher). We have proposed an explanation for why rewards have a positive effect on user behavior: there is a well-known link between a person's affective state (moods and feelings) during ad exposure and the subsequent purchase intention. Being rewarded for watching an ad therefore makes the user feel less annoyed by advertising, which increases ad effectiveness and the conversion rate.
How does this ATE translate to ad revenue? We can provide a back-of-the-envelope calculation. The average price per install commanded by this publisher is $0.52. Hence this ATE translates to 0.000795 × $0.52 = $0.0004134 per impression, or $0.413 per thousand impressions. Ad revenues are frequently measured in terms of CPM (revenue per thousand impressions). To give a sense of industry (mobile ad network) benchmarks, the average CPMs for the US and China are reported to be $7.00 and $2.70, respectively.¹³
¹³ http://ecpm.adtapsy.com/
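The back-of-the-envelope arithmetic above can be reproduced in a few lines; the numbers are those reported in the text, and the helper names are illustrative.

```python
def relative_lift(ate, baseline_rate):
    """ATE as a fraction of the baseline conversion rate."""
    return ate / baseline_rate


def cpm_from_ate(ate, price_per_install):
    """Incremental revenue per thousand impressions, in dollars."""
    return ate * price_per_install * 1000.0


baseline_install_rate = 1067 / 365847                  # roughly 0.00292
lift = relative_lift(0.000795, baseline_install_rate)  # roughly 0.27
cpm = cpm_from_ate(0.000795, 0.52)                     # roughly 0.413
```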
While incentivized advertising has an overall positive effect, we saw previously that there is a negative countervailing effect, which enters in the intermediate stage. Rewards have negative distortionary effects in the intermediate stage when users prefer to collect their rewards immediately after watching the ads instead of clicking on ‘install’ and going to the App Store. Moreover, this negative effect varies widely among users. We therefore expect the effect of incentivized ads to be smaller for users who are averse to delayed rewards. To quantify this, we compute the treatment effects averaged locally by the users' languages. These (Local) Average Treatment Effects show that rewarding users to watch ads has the largest effect on users whose device language is Chinese. The treatment effects averaged over English, Spanish, Portuguese, Russian, and Chinese users are 0.000752, 0.000667, 0.000608, 0.000391, and 0.00138, respectively. Expressed as CPM, these treatment effects are $0.391, $0.347, $0.316, $0.203, and $0.718, respectively.
Another useful counterfactual from the publisher's perspective is the Average Treatment Effect on the Treated. Suppose we switched all incentivized ads to non-incentivized ads: what would the effect be? This is more relevant to the publisher because it is a counterfactual that the publisher can directly implement. The average treatment effect on the treated is computed by averaging Equation 4 over the impressions i with d_i = 1, which amounts to 0.000724, or an equivalent CPM of $0.376. Moreover, since there are 252,379 treated observations, this implies that the publisher would lose about 183 installs.
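The treatment-on-the-treated figures follow from the same kind of arithmetic. The inputs below are the reported ATT, the number of treated impressions and the $0.52 price per install; the variable names are illustrative.

```python
att = 0.000724            # ATT on Install reported in the text
n_treated = 252379        # treated (incentivized) impressions
price_per_install = 0.52  # average price per install, in dollars

lost_installs = att * n_treated                # about 182.7, i.e. roughly 183 installs
att_cpm = att * price_per_install * 1000.0     # about 0.376 dollars per thousand impressions
```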
We can also quantify the revenue impact of adverse selection. In the following counterfactual, we remove unobserved adverse selection; that is, we suppose that selection is independent of outcomes.¹⁴ Whether or not an impression is served an incentivized ad is then independent of the actions that would be taken during the Intermediate and Install stages. This rules out reward-seeking users who self-select into watching incentivized ads but are otherwise not interested in the ad itself. The revenue impact of unobserved adverse selection is calculated using Equation 5 below and amounts to 0.000552, or $0.287 CPM. Therefore, adverse selection negatively impacts the publisher's ad revenue.
\[
\frac{1}{252{,}379}\sum_{i:\,d_i = 1}\Big(\widehat{\Pr}[d_i = 1]\cdot\widehat{\Pr}[y^\tau_i = 1,\ y_i = 1] - \widehat{\Pr}[d_i = 1,\ y^\tau_i = 1,\ y_i = 1]\Big) \qquad (5)
\]
5. Estimating treatment effects using propensity scores
In this section, we estimate the treatment effect of incentivized advertising using propensity scores, in order to compare our previous results with those of other, model-free approaches. The propensity score method can control for selection bias to the extent that selection is based on observables. It is therefore not valid in the presence of unobserved selection, which we analyzed previously.
5.1. Estimation procedure
As in the previous data environment, we observe \((d_i, y^\tau_i, y_i, x_i)\) for the sample of impressions \(i = 1, \dots, n\), where \(x_i\) is a vector of the user's covariates during impression \(i\).
¹⁴ We implicitly condition on observed covariates. Note that this is precisely the assumption that underlies standard propensity score methods.
Our estimation procedure consists of two steps. In the first step, we estimate the propensity scores \(\hat p_i = \Pr(d_i = 1 \mid x_i)\), the probability that a user is served an incentivized ad during impression \(i\). We estimate the propensity scores using a probit regression of \(d_i\) on the user's covariates \(x_i\). Note that \(x_i\) must contain only pre-treatment covariates, that is, user characteristics determined before treatment that could affect the user's selection into treatment.
In the second step, we construct \(\hat p_i\), the fitted values of the first-step probit regression. We then regress \(y_i\) on \(1, d_i, \hat p_i, d_i(\hat p_i - \mu_p)\) for \(i = 1, \dots, n\), where \(\mu_p\) is the average of \(\hat p_i\) across \(i = 1, \dots, n\). This is the control function approach explained in Proposition 18.5 of Wooldridge [2002]. Under some assumptions, the ATE on Intermediate is recovered as the coefficient on \(d_i\) when regressing \(y^\tau_i\) on \(1, d_i, \hat p_i, d_i(\hat p_i - \mu_p)\) for \(i = 1, \dots, n\), while the ATE on Install is obtained as the coefficient on \(d_i\) when regressing \(y_i\) on the same regressors.
In addition, we can include higher-order polynomial terms of the propensity scores to better control for selection bias (making sure to de-mean the propensity score before constructing its interaction with \(d_i\)). Accordingly, we also regress \(y_i\) on \(1, d_i, \hat p_i, \hat p_i^2, \hat p_i^3, d_i(\hat p_i - \mu_p)\) for \(i = 1, \dots, n\).
The assumptions needed are explained in Proposition 18.5 of Wooldridge [2002]. We briefly discuss the main one, the assumption of “ignorability of treatment” (Rosenbaum and Rubin [1983]), also known as selection on observables: \(d_i\) and \((y_{0i}, y_{1i})\) are independent conditional on the observed covariates \(x_i\). This assumption implies that \(E[y_{0i} \mid x_i, d_i] = E[y_{0i} \mid x_i]\) and \(E[y_{1i} \mid x_i, d_i] = E[y_{1i} \mid x_i]\).
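The second step can be sketched as below, assuming the first-step probit fitted values \(\hat p_i\) are already available. To stay self-contained, the sketch solves the least-squares normal equations in pure Python, whereas a regression library would normally be used; `ols` and `control_function_ate` are hypothetical names. It returns the coefficient on \(d_i\) from the regression of the outcome on \(1, d_i, \hat p_i, d_i(\hat p_i - \mu_p)\).

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def control_function_ate(y, d, p_hat):
    """Coefficient on d from regressing y on 1, d, p_hat, d * (p_hat - mean(p_hat))."""
    mu_p = sum(p_hat) / len(p_hat)
    X = [[1.0, di, pi, di * (pi - mu_p)] for di, pi in zip(d, p_hat)]
    return ols(X, y)[1]
```

Passing \(y^\tau_i\) instead of \(y_i\) gives the Intermediate version, and the polynomial variant only adds columns for \(\hat p_i^2\) and \(\hat p_i^3\) to each row.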
There are other methods for estimating the ATE that rely on different assumptions, and we find that they deliver similar results. For instance, the ATE can be estimated with an inverse probability weighted estimator using the propensity scores (see Proposition 18.3 of Wooldridge [2002]):
\[
\widehat{\mathrm{ATE}} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i\,(d_i - \hat p(x_i))}{\hat p(x_i)\,(1 - \hat p(x_i))}.
\]
One method to compute the ATE that does not rely on the propensity scores is \(\frac{1}{n}\sum_{i=1}^{n}\hat r(x_i)\), where \(r(x) = \Pr[y_i = 1 \mid x, d_i = 1] - \Pr[y_i = 1 \mid x, d_i = 0]\).
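Both alternative estimators can be sketched in a few lines. `ipw_ate` implements the inverse probability weighted formula above; `regression_adjustment_ate` estimates \(\hat r(x)\) by cell means, which assumes discrete covariates with both treated and control impressions in every cell. Both names are hypothetical.

```python
from collections import defaultdict


def ipw_ate(y, d, p_hat):
    """Inverse probability weighted ATE: mean of y (d - p) / (p (1 - p))."""
    terms = (yi * (di - pi) / (pi * (1.0 - pi)) for yi, di, pi in zip(y, d, p_hat))
    return sum(terms) / len(y)


def regression_adjustment_ate(y, d, x):
    """(1/n) * sum_i r_hat(x_i), with r_hat(x) = mean(y | x, d=1) - mean(y | x, d=0)."""
    # Per cell: [sum of y among treated, n treated, sum of y among controls, n controls]
    cells = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for yi, di, xi in zip(y, d, x):
        cell = cells[xi]
        if di == 1:
            cell[0] += yi
            cell[1] += 1
        else:
            cell[2] += yi
            cell[3] += 1
    r_hat = {key: c[0] / c[1] - c[2] / c[3] for key, c in cells.items()}
    return sum(r_hat[xi] for xi in x) / len(x)
```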
5.2. First-stage adverse selection estimation
In the first stage, we estimate the propensity scores via a probit regression. Specifically, the dependent variable is the binary treatment variable Incentivized, or \(d_i\). The covariates are Android Version, WiFi, Screen Resolution and Device Volume. We also control for the following fixed effects: Countries, Languages and Device Brands.
The result is given in Table 4. We find that the result is qualitatively similar to the
result obtained from estimating the selection equation (see Section 4.1).
5.3. Second-stage treatment effect estimation
Using the first-stage propensity scores, we now estimate the average treatment effects (ATE). We show the results in Tables 5 and 6. Again, the results obtained here are qualitatively similar to the model-based results.
The ATE on Intermediate is significantly negative, while the ATE on Install is significantly positive. From Column 2 (Intermediate) of Table 5, the ATE on Intermediate is −0.0635. This means that rewarding users to watch an ad reduces the probability that a user clicks on install by 0.0635 on average. The baseline Intermediate rate is 0.1344, i.e. 49,179 clicks out of 365,847 impressions, so an ATE of this magnitude represents a decrease of almost 50% in the probability that a user clicks on install.
The ATE on Install, conditional on Intermediate = 1, is 0.00795 and statistically significant (Column 2 of Table 6). This is a large magnitude because the baseline Install rate among clicks is 0.0217 (i.e. 1,067 installs out of 49,179 clicks), so an ATE of this magnitude represents a 36.6% increase in Install. In other words, if users are rewarded for watching the ads, those who click are 36.6% more likely to install the advertised app at the App Store.
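The percentage changes quoted in this subsection follow from the reported counts and coefficients; the variable names are illustrative.

```python
baseline_click_rate = 49179 / 365847                      # about 0.1344
click_change = -0.0635 / baseline_click_rate              # about -0.47 ("almost 50%")
baseline_install_given_click = 1067 / 49179               # about 0.0217
install_change = 0.00795 / baseline_install_given_click   # about 0.366
```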
Compounding the effect of Intermediate, the overall effect on Install is positive and
significant. From Column 4 of Table 6, the overall ATE obtained using the propensity
score method here is 0.00187, while the ATE obtained using the model that controls for
unobserved selection is 0.000795. Hence, the propensity score method biases the ATE
upwards.
5.4. Naive treatment effects
In the Appendix (Table 7), we show results without controlling for any selection bias.
We use probit regressions to show how incentivized advertising is related to (i) the user’s
probability of clicking ‘install’, and (ii) the user’s probability of installing. We control for
all the user characteristics mentioned in the preceding section. However, these regressions are not valid in the presence of selection bias, so we do not interpret these coefficients further.
6. Appendix
6.1. Tables and Figures
Table 1. Parameters appearing in the selection equation, \(d_i = \mathbb{1}[x_i\gamma + \epsilon_{1i} \ge 0]\). The variables that correspond to these parameters are detailed in Section 2.2.
Parameter (Description) Estimate (Posterior SD)
θ (Dependence parameter of the copula) -0.353 (0.00323)
Device Volume -0.0879 (0.00536)
WiFi 0.352 (0.00712)
Android Version 0.133 (0.00106)
Screen Resolution -0.0172 (0.00174)
Huawei Dummy 0.0837 (0.0134)
Lenovo Dummy -0.0792 (0.00346)
LG Dummy 0.157 (0.00415)
Motorola Dummy 0.17 (0.00186)
Samsung Dummy 0.0141 (0.000779)
EN (English Language Dummy) -0.183 (0.00179)
ES (Spanish Language Dummy) 0.253 (0.00617)
PT (Portuguese Language Dummy) 0.317 (0.00976)
RU (Russian Language Dummy) 0.114 (0.0019)
ZH (Chinese Language Dummy) -0.573 (0.0192)
North America Dummy 0.0571 (0.00441)
South America Dummy 0.18 (0.00206)
South-East Asia Dummy 0.00197 (0.00162)
South Asia Dummy -0.276 (0.00528)
Middle East Dummy -0.214 (0.0141)
Southern and Eastern Europe Dummy 0.117 (0.00133)
Constant 0.0151 (0.000812)
Table 2. Parameters appearing in the Intermediate outcome equation, \(y^\tau_i = \mathbb{1}[(\alpha_1 z_i) d_i + x_i\beta + \epsilon_{2i} \ge 0]\)
Parameter (Description) Estimate (Posterior SD)
α1 (Treatment effect baseline) -0.0124 (0.000371)
α1 × EN (Interaction of treatment effect and EN) -0.0122 (0.00107)
α1 × ES (Interaction of treatment effect and ES) -0.0734 (0.00161)
α1 × PT (Interaction of treatment effect and PT) 0.0553 (0.00568)
α1 × RU (Interaction of treatment effect and RU) -0.0616 (0.00293)
α1 × ZH (Interaction of treatment effect and ZH) 0.111 (0.0066)
Device Volume -0.0132 (0.0223)
WiFi -0.181 (0.00526)
Android Version -0.327 (0.000725)
Screen Resolution -0.0314 (0.000939)
Huawei Dummy -0.0413 (0.00175)
Lenovo Dummy -0.0505 (0.00447)
LG Dummy -0.0787 (0.0023)
Motorola Dummy -0.0388 (0.00102)
Samsung Dummy -0.026 (0.00141)
EN (English Language Dummy) -0.0245 (0.00243)
ES (Spanish Language Dummy) -0.145 (0.00583)
PT (Portuguese Language Dummy) 0.0349 (0.00109)
RU (Russian Language Dummy) -0.0917 (0.00214)
ZH (Chinese Language Dummy) 0.00464 (0.00311)
North America Dummy -0.138 (0.00566)
South America Dummy -0.0577 (0.00194)
South-East Asia Dummy -0.115 (0.00379)
South Asia Dummy 0.0727 (0.0043)
Middle East Dummy 0.0922 (0.00567)
Southern and Eastern Europe Dummy -0.119 (0.00637)
Constant -0.0325 (0.000378)
Table 3. Parameters appearing in the Install outcome equation, \(y_i = y^\tau_i \cdot \mathbb{1}[\alpha_2 d_i + w_1 x_i\beta + w_2 + \epsilon_{3i} \ge 0]\)
Parameter (Description) Estimate (Posterior SD)
α2 (Install treatment effect) 0.141 (0.0074)
w1 (Scale parameter) 0.00732 (0.000534)
w2 (Constant) -0.199 (0.0015)
(1)
Incentivized
Android Version 0.117∗∗∗
(0.00482)
Device Volume -0.217∗∗∗
(0.00810)
Screen Resolution 0.0117∗∗∗
(millions of pixels) (0.00432)
WiFi 0.556∗∗∗
(0.00594)
Constant -1.148∗∗∗
(0.0628)
N 358,127
Countries controlled: Yes (178 indicator variables)
Languages controlled: Yes (48 indicator variables)
Device brands controlled: Yes (10 indicator variables)
Standard errors in parentheses. ∗p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Table 4. First-stage probit estimation of propensity scores.
(1) (2)
Intermediate Intermediate
Incentivized -0.0611∗∗∗ -0.0635∗∗∗
(0.00146) (0.00146)
p̂(x) -0.213∗∗∗ -0.452∗∗∗
(0.00540) (0.0685)
Incentivized × (p̂(x) − µp) 0.150∗∗∗ 0.0978∗∗∗
(0.00680) (0.00754)
p̂(x)² 0.194
(0.126)
p̂(x)³ 0.0319
(0.0717)
Constant 0.317∗∗∗ 0.372∗∗∗
(0.00338) (0.0113)
N 358,128 358,128
Standard errors in parentheses
∗p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Table 5. Regressions with propensity score to control for selection bias. The coefficient on Incentivized shows the average treatment effect of incentivized advertising on the Intermediate outcome.
(1) (2) (3) (4)
Install Install Install Install
Incentivized 0.00724∗∗∗ 0.00795∗∗∗ 0.00800∗∗∗ 0.00187∗∗∗
(0.00111) (0.00112) (0.00113) (0.000196)
p̂(x) 0.000597 0.140∗∗∗ 0.136∗∗∗ 0.0527∗∗∗
(0.00230) (0.0302) (0.0238) (0.00796)
Incentivized × (p̂(x) − µp) -0.00811∗ -0.00242 -0.00480 -0.00369∗∗∗
(0.00488) (0.00539) (0.00497) (0.00121)
p̂(x)² -0.253∗∗∗ -0.295∗∗∗ -0.101∗∗∗
(0.0662) (0.0586) (0.0160)
p̂(x)³ 0.136∗∗∗ 0.183∗∗∗ 0.0548∗∗∗
(0.0427) (0.0412) (0.00972)
Constant 0.00802∗∗∗ -0.0132∗∗∗ -0.00786∗∗∗ -0.00477∗∗∗
(0.00128) (0.00378) (0.00248) (0.00112)
N 48,390 48,390 48,266 358,128
Standard errors in parentheses
∗p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Table 6. Regressions with propensity score to control for selection bias.
In Columns (1) to (3), we conditioned on Intermediate = 1. In Column (3),
the first-stage propensity scores are computed using only the subset of data
such that Intermediate = 1.
(1) (2) (3) (4) (5)
Intermediate Intermediate Install Install Install
Incentivized -0.408∗∗∗ -0.307∗∗∗ 0.0984∗∗∗ 0.227∗∗∗ 0.271∗∗∗
(0.00541) (0.00625) (0.0228) (0.0270) (0.0396)
Device Volume 0.100∗∗∗ 0.0433 -0.0482
(0.00926) (0.0371) (0.0569)
Android Version 0.0850∗∗∗ 0.0358∗ -0.0177
(0.00543) (0.0216) (0.0331)
Screen Resolution -0.0411∗∗∗ -0.0224 0.0343
(millions of pixels) (0.00496) (0.0200) (0.0293)
WiFi -0.106∗∗∗ -0.136∗∗∗ 0.0872∗∗
(0.00696) (0.0259) (0.0427)
N 365,847 358,087 365,847 340,662 45,724
Marginal Effects -0.0867∗∗∗ -0.0636∗∗∗ 0.00088∗∗∗ 0.00201∗∗∗ 0.00859∗∗∗
(0.0011) (0.0013) (0.00020) (0.00024) (0.00129)
Countries controlled: No Yes No Yes Yes
Languages controlled: No Yes No Yes Yes
Device brands controlled: No Yes No Yes Yes
Standard errors in parentheses
∗p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Table 7. Probit regressions without controlling for selection bias. In the last
column, we condition on Intermediate = 1.
References
Yakov Bart, Andrew T Stephen, and Miklos Sarvary. Which products are best suited to mobile advertising? A field study of mobile display advertising effects on consumer attitudes and intentions. Journal of Marketing Research, 51(3):270–285, 2014.
Norris I Bruce, BPS Murthi, and Ram C Rao. A dynamic model for digital advertising:
The effects of creative format, message content, and targeting on engagement. Journal
of Marketing Research, 54(2):202–218, April 2017.
Bobby J Calder and Brian Sternthal. Television commercial wearout: An information
processing view. Journal of Marketing Research, pages 173–186, 1980.
Timothy G Conley, Christian B Hansen, and Peter E Rossi. Plausibly exogenous. Review
of Economics and Statistics, 94(1):260–272, 2012.
Peter J Danaher and Michael S Smith. Modeling multivariate distributions using copulas:
applications in marketing. Marketing Science, 30(1):4–21, 2011a.
Peter J Danaher and Michael S Smith. Rejoinder: Estimation issues for copulas applied to marketing data. Marketing Science, 30(1):25–28, 2011b.
William N Evans and Robert M Schwab. Finishing high school and starting college: Do Catholic schools make a difference? The Quarterly Journal of Economics, 110(4):941–974, 1995.
Edward I George and Shane T Jensen. Commentary: A latent variable perspective of copula modeling. Marketing Science, 30(1):22–24, 2011.
Anindya Ghose and Sang Pil Han. Estimating demand for mobile applications in the new
economy. Management Science, 60(6):1470–1488, 2014.
Anindya Ghose and Sha Yang. An empirical analysis of search engine advertising: Sponsored search in electronic markets. Management Science, 55(10):1605–1622, 2009.
Avi Goldfarb and Catherine Tucker. Online display advertising: Targeting and obtrusiveness. Marketing Science, 30(3):389–404, 2011a.
Avi Goldfarb and Catherine E Tucker. Privacy regulation and online advertising. Management Science, 57(1):57–71, 2011b.
Heikki Haario, Eero Saksman, and Johanna Tamminen. An adaptive Metropolis algorithm. Bernoulli, pages 223–242, 2001.
Yu Hu, Jiwoong Shin, and Zhulei Tang. Incentive problems in performance-based online
advertising pricing: cost per click vs. cost per action. Management Science, 62(7):
2022–2038, 2015.
V Kumar, Xi Alan Zhang, and Anita Luo. Modeling customer opt-in and opt-out in a
permission-based marketing context. American Marketing Association, 2014.
Scott B MacKenzie and Richard J Lutz. An empirical examination of the structural
antecedents of attitude toward the ad in an advertising pretesting context. The Journal
of Marketing, pages 48–65, 1989.
Scott B MacKenzie, Richard J Lutz, and George E Belch. The role of attitude toward the ad as a mediator of advertising effectiveness: A test of competing explanations. Journal of Marketing Research, pages 130–143, 1986.
Puneet Manchanda, Jean-Pierre Dubé, Khim Yong Goh, and Pradeep K Chintagunta. The effect of banner advertising on internet purchasing. Journal of Marketing Research, 43(1):98–108, 2006.
Andrew A. Mitchell and Jerry C. Olson. Are product attribute beliefs the only mediator
of advertising effects on brand attitude? Journal of Marketing Research, 18(3):318–332,
1981. ISSN 00222437. URL http://www.jstor.org/stable/3150973.
Douglas Rivers and Quang H Vuong. Limited information estimators and exogeneity tests for simultaneous probit models. Journal of Econometrics, 39(3):347–366, 1988.
Gareth O Roberts and Jeffrey S Rosenthal. Examples of adaptive MCMC. Journal of
Computational and Graphical Statistics, 18(2):349–367, 2009.
Gareth O Roberts and Richard L Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, pages 341–363, 1996.
Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in
observational studies for causal effects. Biometrika, 70(1):41–55, 1983.
Oliver J Rutz and Randolph E Bucklin. From generic to branded: A model of spillover
in paid search advertising. Journal of Marketing Research, 48(1):87–102, 2011.
Richard J Smith and Richard W Blundell. An exogeneity test for a simultaneous equation tobit model with an application to labor supply. Econometrica: Journal of the Econometric Society, pages 679–685, 1986.
Jeffrey M Wooldridge. Econometric analysis of cross section and panel data. MIT Press, 2002.
Song Yao and Carl F Mela. A dynamic model of sponsored search advertising. Marketing
Science, 30(3):447–468, 2011.
Yi Zhu and Kenneth C Wilbur. Hybrid advertising auctions. Marketing Science, 30(2):
249–273, 2011.