
An Effective Budget Management Framework for Real-Time Bidding in Online Advertising

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.2970463, IEEE Access
VOLUME XX, 2020 1
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2020.Doi Number
Mengjuan Liu1, Wei Yue1, Lizhou Qiu1, and Jiaxing Li1
1Department of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, 610054 China
Corresponding author: Mengjuan Liu (e-mail: mjliu@uestc.edu.cn).
This work was supported in part by the National Natural Science Foundation of China under Grant 61202445 and Grant 61472064.
ABSTRACT Real-time bidding (RTB) has achieved great success and significantly improved the efficiency and transparency of online advertising. It allows advertisers to purchase ad impressions via auctions. Advertisers who adopt RTB seek an optimal budget-spending strategy that reaches as wide an audience as possible with as sustained an impact as possible. Traditional bidding strategies, such as fixed bidding and performance-based bidding, lack optimal budget management and therefore tend to be either too aggressive (the budget is exhausted too quickly) or too conservative (a budget surplus remains at the end, with few clicks). In this paper, we study the optimization of budget efficiency under the smooth delivery constraint for display advertising. We model the problem as a multi-constrained budget allocation optimization and use a heuristic algorithm to find an approximately optimal budget allocation. The key to the solution is determining the bidding function for each time slot. We propose a piecewise bidding strategy that filters out low-quality impressions: each time slot has its own predicted click-through rate (pCTR) threshold, and the campaign participates in the bidding only when the pCTR of an impression is not lower than that threshold. Determining the pCTR threshold is challenging because of market uncertainty; we tackle this problem by modeling the distributions of pCTRs and market prices in each time slot. On this basis, we derive an optimal bidding function for each time slot so that the bid price adapts better to the available budget. Experiments on a public real-world dataset show that, under a given budget, our method outperforms state-of-the-art baselines on various standard metrics (e.g., the number of clicks and cost-per-click).
INDEX TERMS Real-Time Bidding, Demand-Side Platform, Budget Management, Bid Optimization
I. INTRODUCTION
In recent years, online advertising has developed into a multi-billion-dollar industry [1]. As one of the most exciting advances in online advertising, real-time bidding (RTB) has received increasing attention because it improves the efficiency and transparency of the display advertising ecosystem [2]. In traditional ad delivery, advertisers buy guaranteed ad display opportunities directly from publishers (such as websites or mobile apps) through private contracts. In this setting, publishers (sellers) usually have only partial information about the market demand for their ad impressions, gathered from historical transactions, and advertisers (buyers) do not know the prices paid by one another. As a result, the traditional approach falls short in market efficiency and transparency. Unlike direct buys, RTB enables publishers to sell individual ad impressions through real-time auctions, which are generally regarded as a fair and transparent way for advertisers and publishers to agree on a price quickly while achieving the best possible sales outcome. RTB also allows advertisers to evaluate and bid on each ad impression individually, so the budget can be used more efficiently by spending it on impressions that elicit more positive user responses.
The typical business process of ad delivery in RTB [3] is illustrated in Fig. 1. When a user loads a publisher's page, the script of the ad slot embedded in the page sends a request for the impression to the supply-side platform (SSP). After receiving the request, the SSP sends a bid request containing the user's cookie and the context information to the ad exchange (ADX). The ADX then broadcasts the bid request to the connected demand-side platforms (DSPs). Each DSP obtains user information from the data management platform (DMP) and runs an auction among the ad campaigns that match the impression according
to their targeting rules. The winner within each DSP enters a second-round auction at the ADX. The final winner is selected at the ADX according to the generalized second-price (GSP) mechanism [4]; that is, the actual cost of the impression is the second-highest bid price received by the ADX, also called the market price. A win notice is then sent to the advertiser, and the winner's ad is returned to the SSP and displayed to the user on the publisher's page. In practice, the entire process must be completed within 100 milliseconds [14]. Afterwards, the DSP to which the winner belongs tracks the user's behavior (e.g., clicks and conversions) and optimizes its bidding strategy based on the user response. On an ordinary RTB platform, this request-bidding-feedback loop occurs billions of times every day [25].
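To make the two-round auction concrete, the following Python sketch simulates one impression under simplifying assumptions: a single ad slot (in which case GSP reduces to a second-price auction), bids expressed in the same unit as the market price, and an optional reserve price. The `Bid` class and the function names are hypothetical and do not come from any RTB system or from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bid:
    campaign: str   # hypothetical campaign identifier
    dsp: str        # DSP that manages the campaign
    price: float    # bid price, same unit as the market price


def dsp_internal_auction(bids: List[Bid]) -> List[Bid]:
    """First round: each DSP forwards only its highest-bidding campaign."""
    best = {}
    for b in bids:
        if b.dsp not in best or b.price > best[b.dsp].price:
            best[b.dsp] = b
    return list(best.values())


def adx_second_price(dsp_bids: List[Bid], reserve: float = 0.0) -> Tuple[Optional[Bid], float]:
    """Second round at the ADX: the highest bid wins and pays the second-highest
    bid (the market price), or the reserve price if that is higher."""
    ranked = sorted(dsp_bids, key=lambda b: b.price, reverse=True)
    if not ranked or ranked[0].price < reserve:
        return None, 0.0  # no sale
    second = ranked[1].price if len(ranked) > 1 else 0.0
    return ranked[0], max(second, reserve)


bids = [Bid("c1", "dspA", 100), Bid("c2", "dspA", 120),
        Bid("c3", "dspB", 80), Bid("c4", "dspC", 60)]
winner, market_price = adx_second_price(dsp_internal_auction(bids))
print(winner.campaign, market_price)  # c2 80
```

Campaign c1's bid of 100 never reaches the ADX because it loses the first-round auction inside dspA, so the winner pays 80 rather than 100.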
[Figure 1: message flow among USER, SSP, ADX, DSP (ad campaigns, advertiser), and DMP: 1. ad impression; 2. ad request; 3. bid request; 4. user information; 5. bid response; 6. auction; 7. win notification; 8. ad; 9. show ad; 10. feedback tracking]
FIGURE 1. The general process of an ad delivery in RTB.
In RTB, the DSP plays a critical role as the advertisers' agent and helps them manage the budget of every ad campaign. It also assists advertisers in making intelligent bidding decisions for every ad impression under specific campaign targets. Ideally, by optimizing the bidding strategy, the DSP can help each advertiser achieve optimal budget utilization, such as maximizing ad clicks or minimizing cost-per-click (CPC) under a fixed budget [5, 6]. Unfortunately, because the RTB market is highly dynamic and unpredictable, ad impressions and market competition may differ greatly between ad delivery periods. Therefore, a bidding strategy that is optimal on historical data does not guarantee optimal use of the total budget in a new delivery period, mainly because most bidding strategies share a common weakness: they ignore budget management, i.e., the smoothness of spending. As a result, most bidding strategies are either too aggressive, exhausting advertisers' budgets too quickly and consequently missing high-quality impressions in later time slots, or too conservative, leaving part of the budget unused at the end of the ad delivery period and potentially losing many impressions that could have led to clicks. Advertisers in RTB therefore prefer to spend their budgets smoothly over the entire ad delivery period to achieve optimal budget utilization [7].
To achieve this goal, several budget allocation algorithms have been proposed. In general, they divide an ad delivery period (typically, a day) into a sequence of time slots and allocate a portion of the budget to each time slot. During the ad delivery period, the spending in each time slot cannot exceed its allocated budget. However, most budget allocation algorithms are not optimal because they consider neither the quality of ad impressions (usually measured by the predicted click-through rate) in each time slot nor the variation of market prices across time slots [8]. Both time-based and traffic-based allocations typically have these shortcomings. In recent studies, the reinforcement learning (RL) framework has been used to learn the optimal bidding strategy under a limited budget [9]. Unlike bidding strategies based on static optimization [5, 10], RL-based schemes decide the bid price for an impression based not only on the impression's value to the campaign but also on the impact of this bid on future profits, so that the budget can be allocated dynamically across all available impressions in the whole delivery period. Unfortunately, although RL-based bidding strategies cope with the dynamics of the RTB environment to some extent, their performance is still unsatisfactory. Therefore, in this paper we still derive the optimal bidding functions by static optimization and control the spending rate through the budget allocated to each time slot.
On the other hand, by analyzing the iPinYou dataset, a public dataset used in many RTB papers [5, 9], we observe that most impressions with low pCTRs also have very low market prices. Because such impressions are cheap, a campaign wins many of them even with low bids, so much money is wasted on low-pCTR impressions. To avoid this, we set a pCTR threshold for each time slot to filter out these low-quality impressions and skip bidding on them. However, determining the pCTR threshold for each time slot is challenging because of market uncertainty.
In this paper, we propose an effective budget management framework that allows a DSP to optimize the bidding strategy while satisfying the smooth delivery constraint of each campaign. Our framework consists of two interdependent components, namely a budget allocation algorithm and a bidding strategy. The budget allocation algorithm specifies the budget allocated to each time slot and adjusts these budgets dynamically throughout the ad delivery period. The bidding strategy is a piecewise function associated with a specific time slot that helps the ad campaign determine the bid price for each ad impression. To our knowledge, this is the first study that improves budget utilization by integrating the bidding strategy into the budget allocation algorithm.
The contributions of this work are summarized as follows:
We model budget use optimization under the smooth delivery constraint as a multi-constraint optimization problem and propose a heuristic algorithm that finds an approximately optimal budget allocation for each time slot.
To avoid wasting the budget on impressions with low pCTRs, we propose a piecewise bidding strategy in which each time slot has its own pCTR threshold: only when
the pCTR of an impression is not lower than the threshold does the campaign participate in the bidding. Specifically, we derive the pCTR threshold of each time slot with a simple method that models two distributions per time slot: the number of impressions over pCTR and the market price over pCTR.
Following the idea of piecewise bidding, we mathematically derive an optimal bidding function for each time slot that helps the ad campaign win high-quality impressions at reasonable prices. It depends on two factors: the pCTR of an impression and the probability of winning the impression at a given bid price. In addition, a factor related to the spending rate is introduced to adjust the bid price so that it adapts better to the available budget.
The rest of this paper is organized as follows. Related work is reviewed in Section II. Section III formulates the budget allocation optimization problem with multiple constraints. Section IV details the proposed budget management framework. The experimental results are presented and analyzed in Section V. Finally, Section VI concludes the paper and discusses future work.
II. RELATED WORK
Most existing bidding strategies, e.g., linear bidding [11] and non-linear bidding [5], decide the bid price of each impression independently based on its value to the campaign (usually measured by the pCTR or the predicted conversion rate) and do not take the smooth delivery constraint into account. In these strategies, the bidding decision is generally formulated as a static optimization problem, and the optimal bidding function is derived from historical data. Because the RTB market is constantly changing, such strategies easily exhaust the budget prematurely or leave a large budget surplus in a new ad delivery period. To overcome this problem, several budget allocation algorithms have been proposed to work together with existing bidding strategies. A simple yet widely used budget allocation scheme that meets the smooth delivery constraint is uniform allocation, in which the budget is split evenly across the delivery period. However, this time-based scheme cannot effectively distinguish how time affects ad traffic and impression quality, and it may force the campaign to waste money on low-quality impressions.
Several traffic-based budget allocation algorithms have therefore been proposed. For example, in [8], each ad delivery period (usually a day) is divided into a sequence of time slots, and the budget allocated to each time slot is proportional to the predicted number of impressions in that slot. In [7], Lee et al. propose a more effective allocation strategy in which the budget of each time slot is allocated based on the ad traffic and adjusted dynamically according to the actual cost. They also define the pacing rate, the fraction of impressions in a time slot that the campaign would like to bid on, which limits the number of bids. Their experimental evaluation shows consistent improvements in CPC without under- or over-pacing. In [12], the author proves mathematically that allocating the budget in proportion to ad traffic is better than uniform pacing, assuming that all impressions have the same quality. However, these traffic-based algorithms may not work well because they consider only the number of impressions in each time slot and ignore both impression quality and market competition.
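As a rough illustration of the difference between the time-based and traffic-based schemes discussed above, the Python sketch below splits a daily budget uniformly or in proportion to the predicted impressions per slot, and re-spreads the leftover budget over the remaining slots. The function names are hypothetical, and `reallocate_remaining` is a simplified stand-in for dynamic adjustment, not the exact algorithm of [7] or [8].

```python
from typing import List


def uniform_allocation(total_budget: float, num_slots: int) -> List[float]:
    """Time-based scheme: split the budget evenly across all time slots."""
    return [total_budget / num_slots] * num_slots


def traffic_allocation(total_budget: float, predicted_impressions: List[float]) -> List[float]:
    """Traffic-based scheme: each slot's budget is proportional to its predicted impressions."""
    total = sum(predicted_impressions)
    return [total_budget * n / total for n in predicted_impressions]


def reallocate_remaining(remaining_budget: float, predicted_impressions: List[float],
                         finished_slot: int) -> List[float]:
    """After slot `finished_slot` ends, spread the unspent budget over the
    later slots, again in proportion to their predicted traffic."""
    future = predicted_impressions[finished_slot + 1:]
    return traffic_allocation(remaining_budget, future) if future else []


traffic = [1, 3, 6]                          # toy predicted impressions per slot
print(uniform_allocation(90, 3))             # [30.0, 30.0, 30.0]
print(traffic_allocation(100, traffic))      # [10.0, 30.0, 60.0]
print(reallocate_remaining(45, traffic, 0))  # [15.0, 30.0]
```

Neither scheme looks at impression quality or market prices, which is exactly the limitation noted above.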
Different from the above schemes, Xu et al. propose a smart pacing control based on pCTR [13], which first divides the impressions into several groups according to their pCTRs and then sets a different pacing rate for each group. Impressions in the same group share the same pacing rate, and groups with higher pCTRs typically have higher pacing rates. The benefit of this scheme is that it takes the quality of an impression into account when deciding whether to bid on it. However, it does not support budget management, so budget efficiency is not guaranteed. Recently, the authors of [14] identify three main challenges that advertisers face when optimizing their bidding strategies in RTB: (i) estimating the pCTR of an ad impression, (ii) forecasting the market value of a given ad impression, and (iii) deciding the optimal bid for a given auction based on the first two. They further point out that the three challenges are strongly correlated and that solving any one of them independently may not be globally optimal. They therefore propose a comprehensive bidding framework that consists of three optimizers, one per challenge, and jointly optimizes the three parts as a whole. Unfortunately, this joint optimization scheme still treats the bidding decision as a static optimization problem and does not support budget management.
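The pCTR-grouped pacing idea of [13] can be illustrated with a small Python sketch: impressions are bucketed by pCTR cut points, and the campaign bids on an impression with the pacing rate of its bucket. The cut points, the rates, and the function names are hypothetical fixed inputs; any adaptive adjustment of the rates used in [13] is outside this sketch.

```python
import bisect
import random
from typing import List


def group_index(pctr: float, boundaries: List[float]) -> int:
    """Group of an impression; `boundaries` are ascending pCTR cut points."""
    return bisect.bisect_right(boundaries, pctr)


def should_bid(pctr: float, boundaries: List[float], pacing_rates: List[float],
               rng: random.Random) -> bool:
    """Participate in the auction with the pacing rate of the impression's group."""
    return rng.random() < pacing_rates[group_index(pctr, boundaries)]


boundaries = [0.001, 0.01]       # three groups: low, medium, high pCTR
pacing_rates = [0.0, 0.5, 1.0]   # higher-pCTR groups are paced more generously
rng = random.Random(0)
bids_placed = sum(should_bid(0.005, boundaries, pacing_rates, rng) for _ in range(10_000))
print(bids_placed)  # about half of the 10,000 medium-pCTR requests
```

Because the pacing rates are fixed here, nothing ties the number of bids to a budget, which mirrors the limitation of [13] pointed out above.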
As a cutting-edge technique, reinforcement learning (RL) has been introduced into RTB bidding decisions to cope with the dynamic changes of the auction environment. For example, Cai et al. [9] propose a model-based RL framework (called RLB) that sequentially generates the optimal bid price for each impression; the bidding problem is modeled as a Markov decision process (MDP), which is solved with a dynamic programming algorithm. Specifically, at each time step (triggered by the arrival of a bid request), the bidding agent first observes a state (describing the current RTB environment) and then selects a bid action from the action space under that state, where the action is the bid price for the auctioned impression. However, such a model-based RL method requires an explicit state transition probability matrix, which is difficult to obtain in real-world settings because of its huge computational cost. As an improvement, the authors of [15, 16] solve the MDP with model-free RL: the budget-constrained bidding problem is formulated as a control problem over the parameter of the linear bidding function, and a value-based model-free RL algorithm is used to solve it. Unfortunately, both
model-based and model-free bidding strategies still perform unsatisfactorily. For comparison, we evaluate RLB in our experiments, and the results show that its bidding performance is worse than that of budget allocation algorithms. Moreover, value-based model-free RL algorithms such as the Deep Q Network (DQN) [16] may have convergence issues in practice; such algorithms have been shown to be able to fail to converge to any policy even for simple MDPs and simple function approximators [17].
Our budget management framework draws on ideas from [7, 8] but makes several improvements. First, we model budget use optimization under the smooth delivery constraint as a multi-constraint budget allocation optimization problem at the time-slot level and propose a heuristic algorithm that finds an approximately optimal budget allocation for each time slot. Second, we propose a piecewise bidding function with a pCTR threshold, which not only selects high-quality impressions but also links the allocated budget to the bidding decisions. Finally, we derive an optimal bidding function for each time slot to help the ad campaign win high-quality impressions at reasonable prices.
III. PROBLEM FORMULATION
The goal of this paper is to maximize the total revenue of each ad campaign under a given budget while satisfying the smooth delivery constraint. We therefore model the problem as a budget allocation optimization problem with multiple constraints, as shown in (1). Here, the total revenue is defined as the total number of clicks the campaign obtains, and $B$ is the total budget of the ad campaign for a delivery period. An ad delivery period (typically, a day) is divided into $T$ time slots, and $\{b_1^*, b_2^*, \dots, b_T^*\}$ is the optimal budget allocation over the $T$ time slots, i.e., the allocation under which the campaign obtains the most clicks. $clicks(t)$ and $cost(t)$ are the number of clicks and the actual cost in time slot $t$; $\bar{b}^*$ is the mean of the optimal budget allocation, and $\sigma$ is its standard deviation, which must not exceed the smoothness threshold $\epsilon$.
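$$
\begin{aligned}
\max_{\{b_1^*,\dots,b_T^*\}}\ & \sum_{t=1}^{T} clicks(t)\\
\text{s.t.}\ & cost(t)\le b_t^*,\ t=1,\dots,T,\quad \sum_{t=1}^{T} b_t^* = B,\quad \sum_{t=1}^{T} cost(t)\le B,\\
& \sigma=\sqrt{\frac{1}{T}\sum_{t=1}^{T}\bigl(b_t^*-\bar{b}^*\bigr)^2}\le \epsilon
\end{aligned}\tag{1}
$$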
Furthermore, based on statistical modeling, we express the number of clicks in each time slot as (2), where $\theta$ is the pCTR of an impression for a specific ad campaign, $f(\theta,t)$ is the number of impressions in time slot $t$ that are believed to have pCTR $\theta$, and $win(bid(\theta,t),t)$ is the winning rate of an impression bid at price $bid(\theta,t)$.
$$
clicks(t)=\int_{0}^{1}\theta\, f(\theta,t)\, win\bigl(bid(\theta,t),t\bigr)\, d\theta \tag{2}
$$
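Equation (2) can be evaluated numerically once $f$, $win$, and $bid$ are known for a slot. The Python sketch below uses the midpoint rule and treats $f(\theta,t)$ as an impression density over pCTR for one fixed slot $t$; the function name, the optional `lower` bound (handy later for a pCTR threshold), and the toy functions in the example are assumptions for illustration.

```python
from typing import Callable


def expected_clicks(f: Callable[[float], float], win: Callable[[float], float],
                    bid: Callable[[float], float], lower: float = 0.0,
                    upper: float = 1.0, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of theta * f(theta) * win(bid(theta))."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n):
        theta = lower + (i + 0.5) * h
        total += theta * f(theta) * win(bid(theta))
    return total * h


# Toy slot: 1000 impressions spread uniformly over pCTR in [0, 1], win rate 0.5.
clicks = expected_clicks(lambda th: 1000.0, lambda b: 0.5, lambda th: 100.0)
print(round(clicks, 6))  # 250.0
```

Summing this quantity over all slots gives the objective of (3).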
Therefore, (1) can be rewritten as (3):
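$$
\begin{aligned}
\max_{\{b_1^*,\dots,b_T^*\}}\ & \sum_{t=1}^{T}\int_{0}^{1}\theta\, f(\theta,t)\, win\bigl(bid(\theta,t),t\bigr)\, d\theta\\
\text{s.t.}\ & cost(t)\le b_t^*,\ t=1,\dots,T,\quad \sum_{t=1}^{T} b_t^* = B,\quad \sum_{t=1}^{T} cost(t)\le B,\\
& \sigma=\sqrt{\frac{1}{T}\sum_{t=1}^{T}\bigl(b_t^*-\bar{b}^*\bigr)^2}\le \epsilon
\end{aligned}\tag{3}
$$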
Finally, we propose a heuristic algorithm to solve (3) and obtain an approximately optimal budget allocation for each time slot. Before that, three functions must be determined for each time slot, namely $f(\theta,t)$, $win(bid(\theta,t),t)$, and $bid(\theta,t)$. Section IV details how the optimal budget allocation is obtained through the two key components of our budget management framework.
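As a small illustration of the constraints shared by (1) and (3), the Python sketch below checks a candidate allocation and the realized per-slot costs, reading the constraints as: each slot's cost within its allocation, allocations summing to the total budget, total cost within the budget, and a 1/T (population) standard deviation of the allocation no larger than a smoothness threshold `eps`. The function name and the `tol` tolerance are illustrative assumptions.

```python
import math
from typing import Dict, List


def check_allocation(alloc: List[float], costs: List[float], total_budget: float,
                     eps: float, tol: float = 1e-9) -> Dict[str, object]:
    """Evaluate the budget and smoothness constraints for allocations b_1..b_T and costs cost(1..T)."""
    T = len(alloc)
    mean = sum(alloc) / T
    sigma = math.sqrt(sum((b - mean) ** 2 for b in alloc) / T)  # 1/T (population) std
    return {
        "slot_costs_within_budget": all(c <= b + tol for c, b in zip(costs, alloc)),
        "allocation_sums_to_budget": abs(sum(alloc) - total_budget) <= tol,
        "total_cost_within_budget": sum(costs) <= total_budget + tol,
        "smooth": sigma <= eps,
        "sigma": sigma,
    }


report = check_allocation([30, 30, 20, 20], [28, 30, 19, 15], total_budget=100, eps=6)
print(report["sigma"], report["smooth"])  # 5.0 True
```

Tightening `eps` to 4 would flag the same allocation as not smooth enough, which is how the smoothness constraint limits how unevenly the budget can be spread over the slots.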
IV. BUDGET MANAGEMENT FRAMEWORK
In this section, we first describe the process of budget allocation and adjustment and explain how the bidding strategy works under the allocated budget and the pCTR threshold. We then describe the bidding strategy, the budget allocation algorithm, and the CTR estimation in detail.
[Figure 2 (budget allocation): offline: initiate the budget for each time slot; before a new time slot starts: update the budget and determine the pCTR threshold; online bidding. Figure 3 (bidding strategy): bid request → is budget spent out? (yes: stop bidding; no: estimate CTR) → pCTR >= pCTR threshold? (yes: bidding → bid price)]
FIGURE 2. Budget Allocation. FIGURE 3. Bidding Strategy.
Our budget management framework consists of two interdependent components: a piecewise bidding strategy and a budget allocation algorithm. Fig. 2 shows the process of budget allocation and dynamic adjustment. We first divide an ad delivery period (typically, a day) into a sequence of time slots (usually 2 hours each) and then allocate the budget to each time slot by solving (3) with the heuristic algorithm. During the delivery period, the budget allocation algorithm dynamically adjusts the budgets of the subsequent time slots based on the actual cost of the finished time slots. In each time slot, we use a piecewise bidding strategy to determine the bid price for an impression. Unlike other strategies, ours gives each time slot its own predicted click-through rate (pCTR) threshold, and the campaign participates in the bidding only when the pCTR of an impression is not lower than that threshold. When an impression arrives at a DSP, the DSP calculates the bid prices of all eligible campaigns. Fig. 3 shows how the bid price is calculated. First, the bidding strategy checks whether the ad campaign has budget available in the current time slot. If so, it estimates the pCTR of the impression and compares it with the pCTR threshold. The bid price is calculated only if the pCTR is not lower than the threshold. Finally, the DSP chooses the campaign
with the highest bid price to participate in the auction and returns that bid price to the ad exchange (ADX). The ADX determines the winner and charges the winning campaign according to the GSP (generalized second-price) mechanism.
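The per-impression flow of Fig. 3 can be sketched in Python as follows. Everything named here (`Campaign`, `campaign_bid`, `dsp_response`, `charge_win`) is hypothetical, and the CTR model and bidding function are plain callables standing in for the components described later in this section.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Campaign:
    name: str
    slot_budget: float                      # budget left in the current time slot
    pctr_threshold: float                   # pctr_thresh(t) of the current slot
    predict_ctr: Callable[[dict], float]    # hypothetical CTR model
    bid_function: Callable[[float], float]  # hypothetical bid(theta, t) for this slot


def campaign_bid(c: Campaign, request: dict) -> float:
    """Follow Fig. 3: budget check, pCTR estimate, threshold test, then bid."""
    if c.slot_budget <= 0:              # budget of this slot spent out: stop bidding
        return 0.0
    pctr = c.predict_ctr(request)
    if pctr < c.pctr_threshold:         # low-quality impression: do not bid
        return 0.0
    return c.bid_function(pctr)


def dsp_response(campaigns: List[Campaign], request: dict) -> Optional[Tuple[str, float]]:
    """Return the campaign with the highest positive bid, or None if nobody bids."""
    best = max(((c.name, campaign_bid(c, request)) for c in campaigns),
               key=lambda pair: pair[1], default=None)
    return best if best is not None and best[1] > 0 else None


def charge_win(c: Campaign, market_price: float) -> None:
    """Deduct the GSP market price from the winner's slot budget."""
    c.slot_budget -= market_price


a = Campaign("a", 50.0, 0.001, lambda r: r["ctr"], lambda p: 1e5 * p)
b = Campaign("b", 0.0, 0.0, lambda r: r["ctr"], lambda p: 1e6 * p)
print(dsp_response([a, b], {"ctr": 0.002}))  # campaign "a" bids about 200
```

Campaign b would bid higher, but its slot budget is already spent, so only campaign a is forwarded to the ADX.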
A. PIECEWISE BIDDING STRATEGY
To solve (3), it is necessary to determine three functions for
each time slot, i.e. the pCTR distribution function, the win-
ning function, and the bidding function. In our budget man-
agement framework, these functions are formulated by the
bidding strategy component.
[Figure 4: number of impressions (y-axis, 0 to 4000) versus predicted CTR (x-axis, log scale from 2.44141E-4 to 1) for time slots 8 and 12]
FIGURE 4. The number of impressions over the pCTR (advertiser ID = 1458, 2013/06/11).
First, we use historical data to fit the pCTR distribution function. To do so, we construct empirical scatter charts of the pCTR distribution $f(\theta,t)$ for each time slot from the training set; all charts exhibit a pattern similar to the F-distribution (only two example time slots are plotted in Fig. 4 for visualization). However, the F-distribution is difficult to fit because of its complexity. Since very few points lie to the left of the peak, we ignore them and fit $f(\theta,t)$ only on the points at and to the right of the peak, which follow a power-law distribution. Therefore, we define $f(\theta,t)$ as (4), where the parameters $c$ and $\alpha$ of each time slot are learned by fitting the historical data.

$$
f(\theta,t)=c\,\theta^{-\alpha} \tag{4}
$$
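One simple way to learn $c$ and $\alpha$ in (4) is an ordinary least-squares fit in log-log space on the points at and to the right of the histogram peak. The Python sketch below assumes the histogram counts are proportional to $f(\theta,t)$ at the bin centres (e.g., equal-width bins; with log-spaced bins the counts also scale with bin width, which shifts the fitted exponent). The paper does not state its fitting procedure, so this is one reasonable choice, not the authors' method.

```python
import math
from typing import List, Tuple


def fit_power_law(pctrs: List[float], counts: List[float]) -> Tuple[float, float]:
    """Fit counts ~ c * pctr**(-alpha) on the points at and right of the peak,
    via least squares on log(count) = log(c) - alpha * log(pctr)."""
    peak = max(range(len(counts)), key=counts.__getitem__)
    pts = [(math.log(p), math.log(n))
           for p, n in zip(pctrs[peak:], counts[peak:]) if n > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    intercept = my - slope * mx
    return math.exp(intercept), -slope   # (c, alpha)


# Synthetic check: one point left of the peak, then an exact power law.
xs = [0.0002] + [0.0005 * 2 ** k for k in range(8)]
ys = [10.0] + [3.0 * x ** -1.5 for x in xs[1:]]
c, alpha = fit_power_law(xs, ys)
print(round(c, 6), round(alpha, 6))  # 3.0 1.5
```

The point left of the peak is discarded by construction, matching the decision in the text to fit only the right-hand tail.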
Second, we estimate the winning function of each time slot by fitting the historical data. Fig. 5 shows the relationship between bid prices and winning rates in different time slots, where the differences in winning rates reflect changes in market competition over time. For example, at a bid price of 100 (in units of 10^-3 Chinese fen), the winning rates in time slots 9, 8, 1, and 12 are 80.72%, 82.30%, 85.11%, and 88.67%, respectively, indicating that market competition is fiercer in time slot 9 than in time slot 12. For simplicity, we define the winning function as (5), which depends on the bid price and the time slot $t$. Here, $k_1$ and $k_2$ are learned for each time slot from the historical data.
$$
win\bigl(bid(\theta,t),t\bigr)=\frac{bid(\theta,t)}{k_1\, bid(\theta,t)+k_2} \tag{5}
$$
[Figure 5: win rate (y-axis, 0.0 to 1.0) versus bid price (x-axis, 0 to 300) for time slots 1, 8, 9, and 12]
FIGURE 5. The winning rate over the bid price (advertiser ID = 1458, 2013/06/11).
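The form in (5) is linear in reciprocals, $1/win = k_1 + k_2 \cdot (1/bid)$, so an ordinary least-squares line through the points $(1/bid, 1/win)$ is one simple way to recover both parameters. The Python sketch below does this; the paper does not specify its fitting procedure, and because the reciprocal transform weights low bids more heavily, a non-linear least-squares fit of (5) directly is a reasonable alternative.

```python
from typing import List, Tuple


def fit_win_function(bids: List[float], win_rates: List[float]) -> Tuple[float, float]:
    """Fit win(b) = b / (k1*b + k2) via least squares on 1/win = k1 + k2 * (1/b).
    Bids and win rates must be strictly positive."""
    xs = [1.0 / b for b in bids]
    ys = [1.0 / w for w in win_rates]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    k2 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    k1 = my - k2 * mx
    return k1, k2


def win_rate(bid: float, k1: float, k2: float) -> float:
    """Winning probability of a bid under (5); non-positive bids never win."""
    return bid / (k1 * bid + k2) if bid > 0 else 0.0


prices = [10, 50, 100, 200, 300]
observed = [b / (b + 20.0) for b in prices]   # synthetic data with k1 = 1, k2 = 20
k1, k2 = fit_win_function(prices, observed)
print(round(k1, 6), round(k2, 6))            # 1.0 20.0
```

With $k_1 = 1$ the winning rate approaches 1 as the bid grows, and a larger $k_2$ (fiercer competition) lowers the winning rate at any given bid.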
Finally, we formulate the bidding function of each time slot. Analysis of the historical logs shows that most impressions with low pCTRs also have low market prices; that is, the advertiser has bought many low-quality impressions even when bidding very low prices. To avoid wasting the budget on these low-quality impressions, we design the piecewise bidding strategy in (6), where each time slot has its own pCTR threshold $pctr\_thresh(t)$ and the ad campaign participates in the auction only when the pCTR of an impression is not lower than that threshold. In (6), $\widetilde{bid}(\theta,t)$ denotes the price actually submitted for an impression with pCTR $\theta$, and $bid(\theta,t)$ is the bidding function of time slot $t$. Two challenges remain: how to determine the pCTR threshold of each time slot, and what price to offer for each high-quality impression. The next two subsections discuss them in detail.

$$
\widetilde{bid}(\theta,t)=\begin{cases} bid(\theta,t), & \theta \ge pctr\_thresh(t)\\ 0, & \theta < pctr\_thresh(t)\end{cases} \tag{6}
$$
1) PREDICTED CTR THRESHOLD
In our budget management framework, the pCTR threshold of each time slot is crucial because it links the piecewise bidding strategy to the budget allocation algorithm: in effect, the spending rate in a time slot is controlled by its pCTR threshold. In theory, the pCTR threshold depends not only on the allocated budget but also on the pCTR distribution of impressions and their market prices, as formulated in (7).
$$
b_t^* = req(t)\int_{pctr\_thresh(t)}^{1} p(\theta,t)\, g(\theta,t)\, d\theta=\int_{pctr\_thresh(t)}^{1} f(\theta,t)\, g(\theta,t)\, d\theta \tag{7}
$$
Here, $f(\theta,t)=req(t)\,p(\theta,t)$ denotes the number of impressions in time slot $t$ that are believed to have pCTR $\theta$, expressed through the number of impressions $req(t)$ and the probability density function of pCTR $p(\theta,t)$; $g(\theta,t)$ is the market price of an impression with pCTR $\theta$. Equation (7) states that the total market price of the impressions whose pCTRs are no less than the threshold $pctr\_thresh(t)$ equals $b_t^*$ in time slot $t$, assuming that all impressions selected by the threshold are won.
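Before $f$ and $g$ are specialized, (7) can already be solved numerically for any non-negative $f(\theta,t)$ and $g(\theta,t)$: the spend above a threshold never increases as the threshold rises, so bisection finds the threshold at which it equals $b_t^*$. The Python sketch below assumes both functions are given for one fixed slot and approximates the integral with the midpoint rule; the function names and the lower search limit `lo` are illustrative choices.

```python
from typing import Callable


def spend_above(threshold: float, f: Callable[[float], float],
                g: Callable[[float], float], n: int = 2000) -> float:
    """Midpoint-rule approximation of the integral of f(theta)*g(theta) over [threshold, 1]."""
    h = (1.0 - threshold) / n
    return h * sum(f(threshold + (i + 0.5) * h) * g(threshold + (i + 0.5) * h)
                   for i in range(n))


def solve_threshold(slot_budget: float, f: Callable[[float], float],
                    g: Callable[[float], float], lo: float = 1e-6,
                    hi: float = 1.0, iters: int = 60) -> float:
    """Bisection for the threshold whose spend above it equals slot_budget.
    Assumes f, g >= 0, so spend_above is non-increasing in the threshold."""
    if spend_above(lo, f, g) <= slot_budget:
        return lo          # the budget covers every impression above lo
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if spend_above(mid, f, g) > slot_budget:
            lo = mid       # too much spend: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy slot: 1000 impressions per unit pCTR, market price 2 everywhere.
print(round(solve_threshold(500.0, lambda th: 1000.0, lambda th: 2.0), 6))  # 0.75
```

With the power-law $f$ of (4) and a constant market price, the same threshold can be obtained in closed form, which avoids the numerical search.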
To calculate the pCTR threshold, we need the two functions $f(\theta,t)$ and $g(\theta,t)$. Since $f(\theta,t)$ has already been defined by (4), only $g(\theta,t)$ remains to be formulated. First, we analyze the relationship between the market price and the pCTR of an impression, shown in Fig. 6, where the pCTR range [0, 1] is divided into 10000 equal intervals and each point is the average market price of the impressions whose pCTRs fall into one interval. The scatter plots for two example time slots show no significant correlation between the market price of an impression and its pCTR. This is because, in a GSP auction, the market price of an impression is mostly determined by its pCTRs for all participating ad campaigns (bidders), rather than by its pCTR for one specific campaign. It is therefore difficult to fit $g(\theta,t)$ accurately from historical data. However, Fig. 7 shows that the average market price of impressions with high pCTRs is mostly higher than that of impressions with low pCTRs; in that figure, the impressions of two example time slots are sorted in ascending order of pCTR, and each point is the average market price of 5000 adjacent impressions. Furthermore, when we divide the sorted impressions of time slot 8 into two groups, the average market price of the low-pCTR group is 64.47, significantly lower than that of the high-pCTR group (77.13). Thus, in this paper, we simply replace $g(\theta,t)$ with the average market price of high-quality impressions, as shown in (8), where $mprice(i)$ is the market price of impression $i$, $I$ is the set of impressions whose pCTRs are no less than a hyperparameter $\lambda$, and $N$ is the number of impressions in $I$.
$$
\overline{mprice}=\frac{1}{N}\sum_{i\in I} mprice(i) \tag{8}
$$
Then (7) can be rewritten as (9):
$$
b_t^{*}=\int_{\mathrm{pctr\_thresh}(t)}^{1} c\cdot\overline{mprice}\,d\theta \tag{9}
$$
Solving (9) gives the pCTR threshold of time slot $t$:

$$
\mathrm{pctr\_thresh}(t)=1-\frac{b_t^{*}}{c\cdot\overline{mprice}} \tag{10}
$$
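The threshold computation in (8) to (10) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code, and the function names are hypothetical. `c` stands for the constant in (9) and (10); its definition comes from (4), outside this excerpt. Clipping the threshold to [0, 1] is our own assumption, made so the result stays a valid pCTR.

```python
from statistics import fmean


def avg_high_quality_price(pctrs, mprices, hq_threshold):
    """Eq. (8): mean market price of impressions whose pCTR >= hq_threshold."""
    selected = [m for p, m in zip(pctrs, mprices) if p >= hq_threshold]
    if not selected:
        raise ValueError("no impression reaches the high-quality threshold")
    return fmean(selected)


def pctr_threshold(slot_budget, c, mprice_bar):
    """Eq. (10): 1 - b_t^* / (c * mprice_bar), clipped to [0, 1] (assumption)."""
    thresh = 1.0 - slot_budget / (c * mprice_bar)
    return min(1.0, max(0.0, thresh))


# Toy numbers for one time slot (hypothetical, not from the iPinYou data).
mprice_bar = avg_high_quality_price([0.001, 0.004, 0.02, 0.05], [50, 60, 80, 90], 0.01)
print(mprice_bar, pctr_threshold(42500, 1000, mprice_bar))  # 85.0 0.5
```

A larger slot budget lowers the threshold, and a higher average market price raises it.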
2) OPTIMAL BIDDING FUNCTION
A bidding function is the logic that decides the bid price for a given impression. In most bidding strategies, each campaign learns a single optimal bidding function and uses it to bid for all available impressions over the whole delivery period. This is not ideal, because it ignores how market competition in different time slots affects the winning rate. Therefore, in this paper, we derive an optimal bidding function for each time slot, so that bid prices better suit the dynamic RTB market at the time-slot level.
The goal of our optimal bidding function is to maximize the number of clicks in a time slot, as defined in (11), where $\theta$ is the pCTR of an impression. The lower bound of the integral is $\mathrm{pctr\_thresh}(t)$ because, in the piecewise bidding strategy, each campaign only bids for impressions whose pCTRs are not below the threshold. Furthermore, substituting (4) and (5) into (11) shows that $bid(\theta,t)$ is the only variable of the objective function. Thus, (11) is a functional extremum problem [17]. We define the Lagrange function of (11) in (12), where $\lambda$ is the Lagrangian multiplier.
FIGURE 6. The average of the market prices over the pCTR (advertiser ID = 1458, 2013/06/11). [Figure: the average market price versus the predicted CTR (logarithmic x-axis from 9.76563E-4 to 1) for time slots 8 and 12.]
FIGURE 7. The average of the market prices over the segment number (each segment contains 5000 impressions where all impressions are sorted in ascending order according to pCTR, advertiser ID = 1458, 2013/06/11). [Figure: the average market price versus the segment number (sorted by pCTR) for time slots 8 and 12.]
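$$
\begin{aligned}
\max_{bid(\cdot,t)}\quad & \int_{\mathrm{pctr\_thresh}(t)}^{1} f(\theta,t)\,\theta\,win(bid(\theta,t),t)\,d\theta\\
\text{s.t.}\quad & \int_{\mathrm{pctr\_thresh}(t)}^{1} f(\theta,t)\,win(bid(\theta,t),t)\,bid(\theta,t)\,d\theta=b_t^{*}
\end{aligned}
\tag{11}
$$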
$$
L(\theta,bid(\theta,t),bid'(\theta,t))=f(\theta,t)\,\theta\,win(bid(\theta,t),t)-\lambda\,f(\theta,t)\,win(bid(\theta,t),t)\,bid(\theta,t) \tag{12}
$$
Here, the Euler-Lagrange equation is given as follows:

$$
\frac{\partial L}{\partial bid}-\frac{d}{d\theta}\left(\frac{\partial L}{\partial bid'}\right)=0
$$
Substituting (12) into the Euler-Lagrange equation gives

$$
\frac{\partial L}{\partial bid'}=0,\qquad \frac{d}{d\theta}\left(\frac{\partial L}{\partial bid'}\right)=0,
$$

since $L(\theta,bid(\theta,t),bid'(\theta,t))$ does not contain $bid'(\theta,t)$. Therefore, we obtain (13).
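$$
\begin{aligned}
\frac{\partial L(\theta,bid(\theta,t),bid'(\theta,t))}{\partial bid(\theta,t)}=0
&\;\Rightarrow\; f(\theta,t)\,\theta\,\frac{\partial win(bid(\theta,t),t)}{\partial bid(\theta,t)}-\lambda f(\theta,t)\left[win(bid(\theta,t),t)+bid(\theta,t)\,\frac{\partial win(bid(\theta,t),t)}{\partial bid(\theta,t)}\right]=0\\
&\;\Rightarrow\; \theta\,\frac{\partial win(bid(\theta,t),t)}{\partial bid(\theta,t)}=\lambda\left[win(bid(\theta,t),t)+bid(\theta,t)\,\frac{\partial win(bid(\theta,t),t)}{\partial bid(\theta,t)}\right]
\end{aligned}
\tag{13}
$$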
Taking the derivative of the winning function with respect to $bid(\theta,t)$ gives:

$$
\frac{\partial win(bid(\theta,t),t)}{\partial bid(\theta,t)}=\frac{k_1k_2}{\left(bid(\theta,t)+k_2\right)^2} \tag{14}
$$
Substituting (5) and (14) back into (13) gives:

$$
bid(\theta,t)=\sqrt{k_2^2+\frac{k_2\,\theta}{\lambda}}-k_2 \tag{15}
$$
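As a sanity check on (13) to (15), the sketch below evaluates the closed-form bid and verifies that it satisfies the first-order condition (13). It assumes, as (14) implies, a winning function of the form $win(b,t)=k_1 b/(b+k_2)$. The actual definition is (5), outside this excerpt, and the parameter values are made up for illustration.

```python
import math


def win_rate(bid, k1, k2):
    """Assumed winning function whose derivative is (14): k1 * bid / (bid + k2)."""
    return k1 * bid / (bid + k2)


def dwin(bid, k1, k2):
    """Eq. (14): d win / d bid = k1 * k2 / (bid + k2)^2."""
    return k1 * k2 / (bid + k2) ** 2


def optimal_bid(theta, k2, lam):
    """Eq. (15): positive root of lam*b^2 + 2*lam*k2*b - k2*theta = 0."""
    return math.sqrt(k2 * k2 + k2 * theta / lam) - k2


def first_order_residual(theta, k1, k2, lam):
    """theta * win' - lam * (win + bid * win'), which (13) requires to be zero."""
    b = optimal_bid(theta, k2, lam)
    return theta * dwin(b, k1, k2) - lam * (win_rate(b, k1, k2) + b * dwin(b, k1, k2))


print(round(optimal_bid(0.001, 50.0, 1e-5), 4))  # 36.6025
```

Under this assumed form, $k_1$ scales both sides of (13) equally and cancels out. Only $k_2$ and $\lambda$ shape the bid curve, and $\lambda$ would be tuned so that the constraint in (11) spends exactly $b_t^{*}$.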
Furthermore, considering the dynamic nature of the market, we also introduce a variable factor related to the spending rate of the current time slot. Therefore, (15) is reformulated as follows:

$$
bid(i,t)=\frac{sp(t)}{sp(i)}\,bid(\theta,t) \tag{16}
$$
In (16), $\theta$ is the pCTR of impression $i$, and $sp(t)$ is the ratio of the available budget of time slot $t$ to the duration of a time slot, i.e., the ideal spending rate of time slot $t$. $sp(i)$ is the ratio of the actual cost to the elapsed time of the current time slot, i.e., the instantaneous spending rate when impression $i$ arrives. If the campaign spends faster than planned, then $sp(i)>sp(t)$ and the bid is scaled down; if it spends more slowly, the bid is scaled up. Thus, our bid price is adjusted dynamically according to the real market competition to better fit the available budget of the current time slot. In Section V, our experimental results show that our bidding function performs best under the given budget.
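Putting the pCTR threshold and (16) together, a piecewise bid for one incoming impression could look like the sketch below. It is a hedged illustration, and the function and argument names are hypothetical. `base_bid` stands for $bid(\theta,t)$ from (15). Falling back to the unscaled bid before any money has been spent in the slot is our own assumption, since (16) is undefined when $sp(i)=0$.

```python
def piecewise_bid(pctr, thresh, base_bid, slot_budget, slot_duration, spent, elapsed):
    """Return the bid for one impression under the piecewise strategy.

    pctr < thresh -> do not bid (return 0.0)
    otherwise     -> base_bid scaled by sp(t) / sp(i), as in (16)
    """
    if pctr < thresh:
        return 0.0  # low-quality impression: filtered out by the pCTR threshold
    if spent <= 0 or elapsed <= 0:
        return base_bid  # assumption: no spending history yet, so no scaling
    ideal_rate = slot_budget / slot_duration  # sp(t): planned spend per second
    actual_rate = spent / elapsed  # sp(i): observed spend per second so far
    return base_bid * ideal_rate / actual_rate


# A 2-hour slot (7,200 s) with a budget of 720,000, i.e. an ideal rate of 100 per second.
print(piecewise_bid(0.002, 0.001, 40.0, 720_000, 7_200, 200_000, 1_000))  # 20.0
```

Spending twice as fast as planned halves the bid, which gives the self-correcting behaviour described above.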
B. BUDGET ALLOCATION ALGORITHM
The basic idea of budget allocation is to take the daily budget as input and compute the scheduled spending of each ad campaign. Following the scheduled spending, each campaign can achieve smooth ad delivery throughout its lifetime. We use a heuristic algorithm to solve the multi-constrained optimal budget allocation problem shown in (3). Based on historical data, we randomly generate 5,000 budget allocation schemes that satisfy the smoothness constraint. We then choose the scheme with the largest number of clicks on the training set as the optimal allocation, $B^{*}=\{b_1^{*},b_2^{*},\ldots,b_T^{*}\}$.
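The random-search step can be sketched as follows. The smoothness constraint (3) is not reproduced in this excerpt, so the sketch substitutes a simple stand-in: each slot's share must stay within `max_dev` of the uniform share 1/T. `expected_clicks` is a hypothetical callback that would replay a candidate allocation on the training logs. Only the count of 5,000 schemes comes from the text.

```python
import random


def random_allocation(daily_budget, n_slots, max_dev, rng):
    """Sample budget shares near uniform; reject samples violating the stand-in constraint."""
    while True:
        noise = [rng.uniform(-max_dev, max_dev) for _ in range(n_slots)]
        mean = sum(noise) / n_slots
        shares = [1.0 / n_slots + e - mean for e in noise]  # shares sum to 1
        # Stand-in for constraint (3) (assumption): every share stays near 1/T.
        if all(s > 0 and abs(s - 1.0 / n_slots) <= max_dev for s in shares):
            return [daily_budget * s for s in shares]


def search_allocation(daily_budget, n_slots, expected_clicks,
                      n_schemes=5000, max_dev=0.02, seed=0):
    """Keep the sampled allocation with the most (estimated) clicks."""
    rng = random.Random(seed)
    best, best_clicks = None, float("-inf")
    for _ in range(n_schemes):
        alloc = random_allocation(daily_budget, n_slots, max_dev, rng)
        clicks = expected_clicks(alloc)
        if clicks > best_clicks:
            best, best_clicks = alloc, clicks
    return best, best_clicks


# Toy evaluator (hypothetical): clicks proportional to budget, weighted per slot.
ctr_profile = [0.5, 0.4, 0.3, 0.3, 0.5, 0.8, 1.0, 1.2, 1.1, 1.0, 0.9, 0.7]
toy_clicks = lambda alloc: sum(w * b for w, b in zip(ctr_profile, alloc)) / 1e5
best, clicks = search_allocation(5_000_000, 12, toy_clicks, n_schemes=500)
```

Uniform sampling around 1/T is only one choice. Any sampler that respects the real constraint (3) would fit into the same loop.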
In practice, there is always a difference between an ad campaign's actual cost in a time slot and the budget allocated to that slot. Therefore, similarly to [7, 13], we take a dynamic sequential approach that adjusts the budget of the next time slot based on the actual spending in the finished time slots. The budget of the next time slot is updated by the simple function defined in (17). There, $b_t$ is the updated budget of time slot $t$, $spend(i)$ is the actual cost of the campaign in time slot $i$, and $B$ is the daily budget.
$$
b_t=b_t^{*}\left(B-\sum_{i=1}^{t-1} spend(i)\right)\Big/\sum_{i=t}^{T} b_i^{*} \tag{17}
$$
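Read as a proportional rescaling, (17) gives the remaining daily budget to the unfinished slots in the same proportions as the optimal plan. A minimal sketch, assuming 1-based slot indices; the function name is hypothetical.

```python
def update_slot_budget(planned, spent, daily_budget, t):
    """Eq. (17) for slot t (1-based); hypothetical helper name.

    planned      -- optimal allocation [b_1^*, ..., b_T^*]
    spent        -- actual costs of the finished slots 1 .. t-1
    daily_budget -- B
    """
    remaining = daily_budget - sum(spent[: t - 1])
    return planned[t - 1] * remaining / sum(planned[t - 1:])


plan = [100.0, 100.0, 200.0, 200.0]  # B = 600 split over T = 4 slots
print(update_slot_budget(plan, [50.0, 50.0], 600.0, 3))  # 250.0: under-spend is redistributed
```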
C. CTR ESTIMATION
As mentioned above, a good budget management framework relies heavily on the performance of the CTR estimator. There are many studies on CTR estimation. However, designing a good CTR estimator remains a challenge because click events are extremely sparse in real datasets [18]. Since this paper focuses on budget management rather than CTR estimation, we empirically compare five representative models and choose the best one as our CTR estimator. Three of them are based on shallow structures: logistic regression (LR) [19], factorization machine (FM) [20], and field-aware factorization machine (FFM) [21]. The other two are based on deep neural networks: factorization machine supported neural network (FNN) [22] and product-based neural network (PNN) [23].
V. EXPERIMENTAL RESULTS AND DISCUSSIONS
In this section, we conduct experiments on the publicly available real-world dataset iPinYou¹. We evaluate the performance of the proposed budget management framework by comparing it with several state-of-the-art baselines. Before presenting the detailed experimental results, we first briefly describe the dataset.
A. IPINYOU DATASET
Our dataset is published by iPinYou, one of the leading DSP companies in the online advertising industry. It includes logs of impressions, bids, clicks, and final conversions. For each impression, the bid logs contain information about the user, which allows us to train a better CTR estimator on such rich data. In addition, the impression and click logs provide the bid price, the paying (market) price, user feedback (e.g., clicks), and whether the advertiser won the auction. More details of the dataset can be found in [24].

In our experiments, we use the data of 7 days, from 2013/06/06 to 2013/06/12. The aggregated data of the first 6 days is used as the training set, and the data of the last day is used to construct the testing set. Note that we use the winning impressions as the received impressions and ignore the losing impressions, because losing impressions have no market prices or user feedback. Therefore, the number of impressions in our experiments is far smaller than the actual number in the iPinYou dataset. Besides, the budget of each campaign is managed independently in our framework. Due to space limitations, we therefore present the statistics and experimental results of only one advertising campaign (advertiser ID = 1458). We obtained consistent results for the other campaigns.
The descriptive statistics of this campaign are shown in Table I. In the iPinYou dataset, the DSP bids for every impression with a fixed bid price of 300. All price-related numbers are in RMB, with a unit of $10^{-3}$ Chinese fen. The click-through rates on all 7 days are very low, ranging from 0.0642% to 0.0903%, which is likely to reduce the accuracy of the CTR estimators significantly. Furthermore, the number of win bids per day (taken as the received impressions in our experiments) shows no significant change. This means we can estimate the key functions (such as the pCTR distribution function, the winning function and the bidding
¹ iPinYou Dataset website: http://data.computational-advertising.org/
TABLE I
THE STATISTICS OF THE CAMPAIGN (ADVERTISER ID = 1458)

| Date | 2013/6/6 | 2013/6/7 | 2013/6/8 | 2013/6/9 | 2013/6/10 | 2013/6/11 | 2013/6/12 |
|---|---|---|---|---|---|---|---|
| Cost | 30,096,630 | 30,228,554 | 30,615,541 | 30,548,604 | 30,303,929 | 30,309,883 | 30,297,100 |
| Reqs | 3,250,536 | 3,733,421 | 1,390,738 | 1,851,641 | 1,551,919 | 1,485,384 | 1,437,857 |
| Win bids | 448,164 | 478,109 | 413,804 | 423,726 | 434,240 | 437,520 | 447,493 |
| Clicks | 328 | 307 | 347 | 351 | 370 | 395 | 356 |
| Win rate | 13.7874% | 12.8062% | 29.7543% | 22.8838% | 27.9808% | 29.4550% | 31.1222% |
| CTR | 0.0732% | 0.0642% | 0.0839% | 0.0828% | 0.0852% | 0.0903% | 0.0796% |
| CPM | 67.1554 | 63.2252 | 73.9856 | 72.0952 | 69.7861 | 69.2766 | 67.7041 |
| CPC | 91758 | 98464 | 88229 | 87033 | 81903 | 76734 | 85104 |
FIGURE 8. The distributions of key metrics (advertiser ID=1458, 2013/06/11 and 2013/06/12). [Panels: (a) win bids and costs, (b) clicks and CTR, (c) CPC and CPM, each per time slot on 06/11 and 06/12.]
function) and metrics (such as the pCTR threshold) of each time slot in a new delivery period from the historical data in the training set. Unfortunately, it is difficult to analyze the precise evolving patterns of these statistics, since the dataset only contains records for 7 days. In fact, some key metrics, such as clicks and CPC, differ considerably from day to day. Therefore, when seeking the optimal budget allocation with the heuristic algorithm, we fit the key functions and metrics of each time slot for every day in the training set, based on the data that actually occurred in that time slot.

For a new ad delivery period (e.g., the 7th day, i.e., the testing set), however, the RTB market is unknown. In this paper, we therefore simply learn the key functions and metrics of each time slot on the 7th day from the data of the 6th day (the day before the testing day). This rests on the assumption that the closer two days are in time, the more similar their market environments. In addition, we split a day into 12 time slots and allocate a budget to each time slot to satisfy the smooth delivery constraint.
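The derived rows of Table I follow directly from the raw counts, and the sketch below recomputes them for 2013/6/12. The definitions are inferred from the table values themselves; for instance, the row labelled CPM equals cost divided by win bids, i.e., the average paying price per won impression. Treat them as a reading of the table rather than as the authors' stated formulas.

```python
def derived_metrics(cost, reqs, win_bids, clicks):
    """Recompute the ratio rows of Table I from the raw counts."""
    return {
        "win_rate_pct": 100 * win_bids / reqs,  # Win rate (%)
        "ctr_pct": 100 * clicks / win_bids,  # CTR (%), clicks per won impression
        "cpm": cost / win_bids,  # as tabulated: cost per won impression
        "cpc": cost / clicks,  # cost per click
    }


m = derived_metrics(cost=30_297_100, reqs=1_437_857, win_bids=447_493, clicks=356)
print({k: round(v, 4) for k, v in m.items()})
```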
Fig. 8 shows the real statistics of each time slot on the 6th and 7th days. Except for win bids and cost, the per-slot distributions of the statistics differ considerably between the two days. This is because the RTB market (including impressions, market competition and user feedback) is a constantly changing environment, even for the same ad campaign.

Specifically, Fig. 8(a) shows the distributions of winning impressions and costs on the two days. The winning impressions and costs in the first 6 time slots are significantly lower than those in the last 6 time slots, and they are especially small from 2:00 a.m. to 8:00 a.m. Fig. 8(b) focuses on clicks and CTRs, both of which are distributed differently on the two days. For example, in time slot 10, the number of clicks on the 6th day is higher than that on the 7th day, while in time slot 8 the reverse is true. Fig. 8(c) displays the CPC and CPM of each time slot. In time slots 1, 2, 3, 6, 7, 8, 9 and 10, the CPMs (the average market prices) on the 6th day are much higher than those on the 7th day, implying more intense market competition in these slots on the 6th day.

Therefore, using the historical data of the 6th day to fit the key functions and metrics of the 7th day has some defects, which can degrade the performance of our budget management framework. However, it is still regarded as a feasible method, given the dynamic and unpredictable nature of the RTB market. For comparison, we also evaluate the proposed framework under a budget of 5,000,000, with the key functions and metrics learned from the historical data of all 6 days in the training set. This yields 239 clicks at a CPC of 19878 ($10^{-3}$ Chinese fen), i.e., fewer clicks than the scheme that learns the key functions and metrics from the 6th day only.
B. CTR ESTIMATION
Although the CTR estimator is not the focus of this paper, a good one is critical to the effectiveness of the budget management framework. In this experiment, we evaluate five CTR estimators (LR, FM, FFM, FNN, and PNN) using two standard metrics: AUC and LogLoss. The experimental results are shown in Table II. The FFM model has a clear advantage over the other four models in terms of AUC (0.8742, versus at most 0.8172), although its LogLoss is the highest. Therefore, we use the FFM model to predict the CTR of every impression in the subsequent experiments.
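For reference, the two metrics in Table II can be computed as below. This is a plain-Python sketch of the standard definitions, not the evaluation code used for Table II. AUC is the probability that a random clicked impression is scored above a random non-clicked one, and LogLoss is the mean binary cross-entropy. The pairwise AUC loop is quadratic and only meant for small samples.

```python
import math


def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Table II reports LogLoss in units of $10^{-3}$, so a tabulated 4.4364 corresponds to 0.0044364 from `log_loss`.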
TABLE II
PERFORMANCE COMPARISON OF CTR ESTIMATORS

| Model | LR | FM | FFM | FNN | PNN |
|---|---|---|---|---|---|
| AUC | 0.8100 | 0.8114 | 0.8742 | 0.8117 | 0.8172 |
| LogLoss ($10^{-3}$) | 4.4364 | 4.4103 | 5.4519 | 4.5024 | 4.5649 |
C. COMPARISON OF BIDDING STRATEGIES
In this section, we compare our bidding strategy with four other bidding strategies: fixed bidding, linear bidding [11], non-linear bidding [5], and RL-based bidding [9]. The comparison uses various performance metrics, such as the number of clicks, CTR (click-through rate), and CPC. In our dataset, the paying prices fluctuate between 0 and 300. Accordingly, we test fixed bid prices from 5 to 300 in steps of 1. The linear bidding function is defined as
$$
bid(i,t)=cpv(t)\,pCTR(i)/epCTR(t) \tag{18}
$$
where $cpv(t)$ and $epCTR(t)$ are the historical average market price and the historical average pCTR of impressions in time slot $t$, respectively, and $pCTR(i)$ is the predicted CTR of impression $i$. In [5], the non-linear bidding function is defined as (19), which depends only on the pCTR of the impression. Here, $c$ and $\lambda$ are parameters that do not change across time slots. In RL-based bidding, the bid price of each impression is determined by an optimal action-selection policy learned through a model-based RL framework. In contrast, our bidding function is specific to each time slot, because it depends on the winning function of that slot. Note that this experiment compares the five bidding strategies without any budget management. Therefore, our bidding function for each time slot has no pCTR threshold here. The daily budget of the campaign is 5,000,000 ($10^{-3}$ Chinese fen).
$$
bid(i,t)=\sqrt{\frac{c}{\lambda}\,pCTR(i)+c^2}-c \tag{19}
$$
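The two baseline bidding functions (18) and (19) translate directly into code. This is a minimal sketch, and the parameter values in the example are hypothetical. In the experiments, `cpv_t` and `epctr_t` would come from the historical statistics of time slot t, while `c` and `lam` are fixed across slots.

```python
import math


def linear_bid(pctr, cpv_t, epctr_t):
    """Eq. (18): scale the slot's average market price by relative pCTR."""
    return cpv_t * pctr / epctr_t


def nonlinear_bid(pctr, c, lam):
    """Eq. (19), the non-linear bidding function of [5]: sqrt(c/lam * pCTR + c^2) - c."""
    return math.sqrt(c / lam * pctr + c * c) - c


for pctr in (0.0005, 0.001, 0.004):
    print(pctr, round(linear_bid(pctr, 70.0, 0.001), 2), round(nonlinear_bid(pctr, 50.0, 1e-5), 2))
```

The linear rule grows in proportion to pCTR, whereas (19) is concave. In this example, a fourfold pCTR increase from 0.001 to 0.004 raises the non-linear bid only from about 36.6 to 100.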
First, among all fixed-bid experiments, the best performance is obtained with a bid price of 75, so we only report the results for the fixed bid of 75. The results are shown in Table III, and the distributions of key metrics per time slot are shown in Fig. 9.

RL-based bidding is superior to the others, obtaining the highest number of clicks and the lowest CPC. This is because RL-based bidding models the bid decision as a sequential, dynamic and interactive process, in which the bid price of an impression depends on both immediate and long-term future rewards. This suggests that reinforcement learning is indeed beneficial for solving optimization problems in dynamic environments. Excluding RL-based bidding, our bidding strategy performs best. The worst bidding strategy is the fixed bid of 75, followed by the linear and non-linear bids.

In fact, there is a large gap in the number of clicks between the best bidding strategy (153) and the ground truth (356) on the testing day, since the given budget is only 16.5% of the total cost of all impressions on that day. Empirically, all strategies except RL-based bidding spent their budgets too early because they lack budget management. RL-based bidding, in contrast, bids for all available impressions throughout the whole delivery period and achieves the best performance.
Furthermore, Table IV summarizes the reasons for losing clicks in four of the bidding strategies (excluding RL-based bidding). The results reveal that most of the missing clicks are due to the budget being wiped out too early. Non-linear bidding stopped bidding earliest, at time slot 6, while linear, fixed (75) and our strategy all stopped at time slot 7. As a result, they lost all impressions that could have brought clicks in the subsequent time slots.
TABLE III
PERFORMANCE COMPARISON OF BIDDING STRATEGIES

| Bidding strategy | Bids | Imps. | Clicks | CPC | CTR (%) |
|---|---|---|---|---|---|
| Fixed(75) | 146,511 | 109,415 | 76 | 65788 | 0.0694 |
| Linear | 114,135 | 94,944 | 85 | 58823 | 0.0895 |
| Non-Linear | 130,805 | 102,245 | 95 | 52631 | 0.0929 |
| RL-based | 447,493 | 116,463 | 153 | 25331 | 0.1314 |
| Ours | 144,617 | 104,757 | 101 | 49504 | 0.0964 |
TABLE IV
THE NUMBER OF MISSING CLICKS FOR DIFFERENT REASONS
(R1: BUDGET WIPED OUT; R2: BID PRICE BELOW THE MARKET PRICE)

| Reason | Fixed(75) | Linear | Non-Linear | Ours |
|---|---|---|---|---|
| R1 | 238 | 267 | 249 | 242 |
| R2 | 42 | 4 | 12 | 13 |
| Total | 280 | 271 | 261 | 255 |
FIGURE 9. The distributions of key metrics under different bidding strategies without any budget management. [Panels: the number of clicks and the cost-per-click ($10^{-3}$ Chinese fen) per time slot for Ours, Linear, Non-Linear, Fixed bid(75) and RL-based.]
D. COMPARISON OF BUDGET ALLOCATION ALGORITHMS
In this subsection, we validate the performance of imposing the budget allocation algorithm on the bidding strategy. Our budget allocation algorithm consists of two components: budget allocation and its dynamic adjustment. In this experiment, we introduce several representative budget allocation algorithms as baselines.
No Budget Allocation (NBA): this scheme bids for each bid request without budget allocation and without limiting the cost in each time slot.
RL-based Bidding (RLB): this scheme bids for each bid request according to a bid price-selection policy learned by a model-based RL framework, without budget allocation.
Uniform Allocation (UA): this scheme splits the daily budget uniformly across the day and does not support dynamic adjustment of the budget.
Uniform Allocation with Adjustment (UAA): this scheme allocates the budget uniformly to each time slot while supporting dynamic adjustment.
Traffic-based Allocation with Adjustment (TAA): in this scheme, the allocated budget is proportional to the predicted number of impressions in each time slot and can be adjusted dynamically according to the actual cost.
Optimal Allocation with Adjustment (OAA): the budget is allocated and adjusted for each time slot according to our budget allocation algorithm.
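The two non-optimized baselines differ only in how they split the daily budget before any adjustment. A minimal sketch with hypothetical function names; in UAA and TAA, the resulting plan would then be corrected slot by slot with the update rule (17).

```python
def uniform_allocation(daily_budget, n_slots):
    """UA / UAA: the same budget for every time slot."""
    return [daily_budget / n_slots] * n_slots


def traffic_allocation(daily_budget, predicted_imps):
    """TAA: budget proportional to each slot's predicted number of impressions."""
    total = sum(predicted_imps)
    return [daily_budget * n / total for n in predicted_imps]


print(uniform_allocation(1200.0, 4))  # [300.0, 300.0, 300.0, 300.0]
print(traffic_allocation(1000.0, [10_000, 30_000]))  # [250.0, 750.0]
```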
1) PERFORMANCE COMPARISON
We first compare the performance of the six algorithms under a daily budget of 5,000,000, as shown in Table V. We then describe the distributions of key metrics throughout the day, which are illustrated in Fig. 10. Note that all schemes in this experiment except NBA and RLB adopt our piecewise bidding strategy. In NBA, the bidding function of each time slot bids for an impression according to (15), and in RLB, the bid price is determined by an optimal bid price-selection policy. Therefore, neither of them has a pCTR threshold or a budget constraint for each time slot.
TABLE V
PERFORMANCE COMPARISON OF DIFFERENT BUDGET ALLOCATION ALGORITHMS (BUDGET = 5,000,000)

| Scheme | Bids | Imps. | Clicks | CPC | Smoothness |
|---|---|---|---|---|---|
| NBA | 144,617 | 104,757 | 101 | 49504 | 567,116 |
| RLB | 447,493 | 116,463 | 153 | 25331 | 153,724 |
| UA | 66,207 | 60,702 | 224 | 17937 | 22,187 |
| UAA | 80,251 | 73,410 | 237 | 20340 | 69,003 |
| TAA | 76,540 | 70,633 | 241 | 19797 | 209,092 |
| OAA | 78,154 | 73,452 | 244 | 20184 | 131,702 |
First, the four schemes with budget management (UA, UAA, TAA, and OAA) perform much better than NBA and RLB. The budget allocation, together with the piecewise bidding strategy and its pCTR threshold, spends money on impressions with a higher click probability, which plays an important role in improving bidding performance. Second, as the cost curves of UA and UAA show, the adaptive adjustments of the budgets of subsequent time slots are very small. This is because the bid price includes a factor related to the spending rate of the current time slot, so the budgets of most time slots are almost spent out. Finally, our budget allocation obtains only slightly more clicks than UAA and TAA. The margin is small because our optimal budget allocation is learned from historical data covering only 6 days, which is insufficient for predicting the distributions of some metrics (such as clicks and the average market prices).

Additionally, Table V reports the standard deviation of the cost across time slots under each allocation scheme, recorded as smoothness. The smoothness of our optimal budget allocation is better than that of RLB, TAA, and NBA, but worse than that of UA and UAA. These results show that imposing the budget allocation algorithm on the bidding strategy lets the budget be spent smoothly over the whole delivery period. Meanwhile, RLB is also able to keep budget spending fairly smooth.

FIGURE 10. Detailed performance comparison of six budget allocation algorithms on the 7th day (budget = 5,000,000). [Panels: the number of clicks, the cost-per-click ($10^{-3}$ Chinese fen), the cost, and the number of winning impressions per time slot for NBA, RLB, UA, UAA, TAA and OAA.]
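The smoothness column in Tables V and VII is the standard deviation of the cost across time slots. A minimal sketch, assuming the per-slot costs are aggregated from (slot, paying price) records of won impressions. The text does not say whether the population or the sample standard deviation is used, so the use of `pstdev` here is an assumption.

```python
from statistics import pstdev


def smoothness(won_impressions, n_slots):
    """Std. dev. of per-slot cost; slots with no wins contribute a cost of 0."""
    per_slot = [0.0] * n_slots
    for slot, paying_price in won_impressions:
        per_slot[slot] += paying_price
    return pstdev(per_slot)  # population std. dev. (assumption)


records = [(0, 2), (1, 4), (2, 4), (3, 4), (4, 5), (5, 5), (6, 7), (7, 9)]
print(smoothness(records, 8))  # 2.0
```

Lower values mean more even spending. A scheme that exhausts its budget in the first few slots, like NBA, scores high.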
TABLE VI
THE NUMBER OF MISSING CLICKS FOR DIFFERENT REASONS
(BUDGET = 5,000,000; R1: BUDGET WIPED OUT; R2: LOWER THAN THE MARKET PRICE; R3: LOWER THAN THE THRESHOLD)

| Reasons | NBA | UA | UAA | TAA | OAA |
|---|---|---|---|---|---|
| R1 | 242 | 0 | 0 | 0 | 0 |
| R2 | 13 | 18 | 21 | 22 | 11 |
| R3 | 0 | 114 | 98 | 93 | 101 |
| Total | 255 | 132 | 119 | 115 | 112 |
Furthermore, Table VI breaks down the reasons for missing the actual clicks on the testing day. RLB is omitted because it loses actual clicks for only one reason: the bid prices of those impressions are lower than their market prices. For the other four schemes with budget allocation, the primary reason is that the pCTRs of these impressions are lower than the pCTR thresholds. Only a few clicks are lost because the bid prices are lower than the market prices.

We further analyze why most time slots have high pCTR thresholds. According to (10), the pCTR threshold of each time slot depends mainly on the allocated budget and the average market price of high-quality impressions: the higher the average market price, the higher the pCTR threshold. In this experiment, we use the average market price of high-quality impressions in each time slot on the 6th day to estimate that on the 7th day. However, the estimated averages are significantly higher than the real values (as shown in Fig. 8(c)), so the derived pCTR thresholds in most time slots are higher than the real ones. These high thresholds also explain why the schemes with budget management did not spend out their budgets, even though the budget is only 16.5% of the total cost of all impressions on the 7th day.
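Tables IV, VI and IX attribute every clicked-but-lost impression on the testing day to one reason, and one way to reproduce such a tally is sketched below. Two choices are our own assumptions, since the text does not specify them: the order in which the checks are applied (threshold, then budget, then price), and treating a bid equal to the market price as a win.

```python
from collections import Counter


def missing_click_reason(pctr, thresh, bid, market_price, budget_left):
    """Classify why a clicked impression was not won; None means it was won.

    The check order (R3, then R1, then R2) is an assumption.
    """
    if pctr < thresh:
        return "R3"  # filtered out by the pCTR threshold
    if budget_left <= 0:
        return "R1"  # budget already wiped out
    if bid < market_price:
        return "R2"  # bid price lower than the market price
    return None


def tally(clicked_impressions):
    """Count reasons over (pctr, thresh, bid, market_price, budget_left) tuples."""
    reasons = (missing_click_reason(*rec) for rec in clicked_impressions)
    return Counter(r for r in reasons if r is not None)


sample = [
    (0.001, 0.002, 50, 40, 100),  # below threshold -> R3
    (0.003, 0.002, 30, 40, 100),  # outbid -> R2
    (0.003, 0.002, 50, 40, 0),  # no budget left -> R1
    (0.003, 0.002, 50, 40, 100),  # won, not counted
]
print(tally(sample))
```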
2) PERFORMANCE UNDER DIFFERENT BUDGETS
In this experiment, we study the impact of the daily budget on the performance of budget management. We set the daily budget to 10,000,000, i.e., 33% of the total cost of all impressions in the testing set.
TABLE VII
PERFORMANCE COMPARISON AMONG DIFFERENT BUDGET ALLOCATION ALGORITHMS (BUDGET = 10,000,000)

| Scheme | Bids | Imps. | Clicks | CPC | CPM | Smoothness |
|---|---|---|---|---|---|---|
| NBA | 286,836 | 208,782 | 202 | 49504 | 47.8968 | 722,277 |
| RLB | 447,493 | 189,082 | 210 | 36929 | 41.0152 | 307,671 |
| UAA | 168,076 | 151,465 | 286 | 33122 | 62.5421 | 221,238 |
| TAA | 164,893 | 148,286 | 285 | 32854 | 63.1459 | 423,119 |
| OAA | 171,738 | 155,267 | 291 | 33352 | 62.5098 | 338,152 |
FIGURE 11. Detailed performance comparison of five budget allocation algorithms on the 7th day (budget = 10,000,000). [Panels: the number of clicks, the cost-per-click ($10^{-3}$ Chinese fen), the cost, and the number of winning impressions per time slot for NBA, RLB, UAA, TAA and OAA.]
The results are summarized in Table VII, and the distributions of key metrics throughout the day are shown in Fig. 11. The number of clicks of every scheme increases significantly with the budget; for NBA in particular, it rises to 56.74% of the actual clicks. However, the CPC and the total cost of OAA are higher than those of UAA and TAA, although OAA still obtains the most clicks. This is because OAA allocates higher budgets than UAA and TAA to many time slots, which lowers the pCTR thresholds of those slots. As a result, OAA buys more low-value impressions than UAA and TAA. In addition, the CPC of OAA increases by 65.24% compared with that under the budget of 5,000,000. The reason is that the pCTR threshold of each time slot decreases as the budget increases, so our piecewise bidding strategy bids on more impressions with low pCTRs. In summary, our budget allocation and piecewise bidding strategy still work well under different daily budgets.
E. DIFFERENT BIDDING STRATEGIES WITH OUR BUDGET ALLOCATION
In this section, we compare the performance of different bidding strategies under the proposed budget allocation algorithm. The daily budget is still set to 10,000,000. We take linear bidding, non-linear bidding and our optimal bidding without the pCTR threshold (denoted OAA*) as the baselines. The results are shown in Table VIII. Our piecewise bidding (OAA) performs best, followed by OAA*, non-linear and linear bidding. This demonstrates that piecewise bidding is effective, because it concentrates money on impressions with high pCTRs.

Table IX analyzes the reasons for missing clicks. For linear and non-linear bidding, which have no pCTR threshold, the main reason for losing clicks is that the allocated budget is wiped out early. For OAA*, budget exhaustion and bids below the market price contribute almost equally (61 and 62 clicks). These results demonstrate that our piecewise bidding strategy with a pCTR threshold is indeed effective for improving bidding performance.
TABLE VIII
PERFORMANCE COMPARISON UNDER DIFFERENT BIDDING STRATEGIES
(BUDGET = 10,000,000)

| Scheme | Bids | Imps. | Clicks | CTR (%) | CPC |
|---|---|---|---|---|---|
| Linear | 235,694 | 189,360 | 175 | 0.0924 | 57142 |
| Non-Linear | 307,386 | 214,578 | 214 | 0.0997 | 46728 |
| OAA* | 360,552 | 219,446 | 233 | 0.1062 | 42918 |
| OAA | 171,738 | 155,267 | 291 | 0.1874 | 33352 |
TABLE IX
THE NUMBER OF MISSING CLICKS FOR DIFFERENT REASONS
(BUDGET = 10,000,000; R1: BUDGET WIPED OUT; R2: LOWER THAN THE MARKET PRICE; R3: LOWER THAN THE THRESHOLD)

| Reasons | Linear | Non-Linear | OAA* | OAA |
|---|---|---|---|---|
| R1 | 170 | 105 | 61 | 0 |
| R2 | 11 | 37 | 62 | 30 |
| R3 | 0 | 0 | 0 | 35 |
| Total | 181 | 142 | 123 | 65 |
VI. CONCLUSION
In this paper, we present a budget management framework
to maximize the obtained clicks of an ad campaign under
budget and smooth delivery constraints. It allocates the sub-
budget for each time slot while considering not only the
expected number of clicks but also budget consumption
speed. We design a piecewise bidding strategy with the
pCTR threshold to select high-quality impressions and im-
plement a time-varying bidding function to capture the dy-
namics of RTB for improving the performance. In our ex-
periments conducted on a real-world dataset, we compare
our budget allocation algorithms with state-of-the-art base-
lines under different budget amounts and various bidding
strategies. The experimental results show that our proposed
methods can effectively improve the revenue of advertisers
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.2970463, IEEE Access
VOLUME 10, 2020 12
and achieve smooth delivery. In future work, we plan to further investigate bidding strategies based on machine learning models (e.g., [14]) and recent reinforcement learning models (e.g., [9], [15]).
REFERENCES
[1] J. Xu, X. Shao, J. Ma, K. C. Lee, H. Qi and Q. Lu, “Lift-based bidding in ad selection,” in Proc. AAAI, Phoenix, AZ, USA, 2016, pp. 651-657.
[2] S. Yuan, J. Wang and X. Zhao, “Real-time bidding for online advertising: measurement and analysis,” in Proc. ACM-ADKDD, Chicago, IL, USA, 2013, pp. 19-27.
[3] J. Wang, W. Zhang and S. Yuan, “Display advertising with real-time bidding (RTB) and behavioural targeting,” Foundations and Trends in Information Retrieval, vol. 11, 2017.
[4] Y. Yuan et al., “A survey on real time bidding advertising,” in Proc. IEEE-SOLI, Qingdao, China, 2014, pp. 418-423.
[5] W. Zhang, S. Yuan and J. Wang, “Optimal real-time bidding for display advertising,” in Proc. ACM-SIGKDD, New York, NY, USA, 2014, pp. 1077-1086.
[6] J. Zhao, G. Qiu, Z. Guan, et al., “Deep reinforcement learning for sponsored search real-time bidding,” in Proc. ACM-SIGKDD, London, UK, 2018, pp. 1021-1030.
[7] K. C. Lee, A. Jalali and A. Dasdan, “Real time bid optimization with smooth budget delivery in online advertising,” in Proc. ACM-ADKDD, Chicago, IL, USA, 2013, pp. 1-9.
[8] D. Agarwal, S. Ghosh, K. Wei and S. You, “Budget pacing for targeted online advertisements at LinkedIn,” in Proc. ACM-SIGKDD, New York, NY, USA, 2014, pp. 1613-1619.
[9] H. Cai, K. Ren, W. Zhang, et al., “Real-time bidding by reinforcement learning in display advertising,” in Proc. ACM-WSDM, Cambridge, UK, 2017, pp. 661-670.
[10] Y. Chen, P. Berkhin, B. Anderson, et al., “Real-time bidding algorithms for performance-based display ad allocation,” in Proc. ACM-SIGKDD, San Diego, CA, USA, 2011, pp. 1307-1315.
[11] C. Perlich et al., “Bid optimizing and inventory scoring in targeted online advertising,” in Proc. ACM-SIGKDD, Beijing, China, 2012, pp. 804-812.
[12] J. Fernandez-Tapia, “Optimal budget-pacing for real-time bidding,” Social Science Electronic Publishing, 2015.
[13] J. Xu, K. C. Lee, W. Li, H. Qi and Q. Lu, “Smart pacing for effective online ad campaign optimization,” in Proc. ACM-SIGKDD, Sydney, NSW, Australia, 2015, pp. 2217-2226.
[14] K. Ren, W. Zhang, K. Chang, et al., “Bidding machine: learning to bid for directly optimizing profits in display advertising,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 4, pp. 645-659, Apr. 2018, doi: 10.1109/TKDE.2017.2775228.
[15] J. Zhao, G. Qiu, Z. Guan, et al., “Deep reinforcement learning for sponsored search real-time bidding,” in Proc. ACM-SIGKDD, London, UK, 2018, pp. 1021-1030.
[16] D. Wu, X. Chen, X. Yang, H. Wang, Q. Tan, X. Zhang, J. Xu and K. Gai, “Budget constrained bidding by model-free reinforcement learning in display advertising,” in Proc. ACM-CIKM, Torino, Italy, 2018, pp. 1443-1451.
[17] O. P. Agrawal, “Formulation of Euler-Lagrange equations for fractional variational problems,” Journal of Mathematical Analysis and Applications, vol. 272, no. 1, pp. 368-379, Aug. 2002, doi: 10.1016/S0022-247X(02)00180-4.
[18] O. Chapelle, E. Manavoglu and R. Rosales, “Simple and scalable response prediction for display advertising,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 5, no. 4, pp. 1-34, Jan. 2014, doi: 10.1145/2532128.
[19] M. Richardson, E. Dominowska and R. Ragno, “Predicting clicks: estimating the click-through rate for new ads,” in Proc. WWW, Banff, Alberta, Canada, 2007, pp. 521-530.
[20] S. Rendle, “Factorization machines with libFM,” ACM Transactions on Intelligent Systems and Technology, vol. 3, no. 3, pp. 1-22, May 2012, doi: 10.1145/2168752.2168771.
[21] Y. Juan et al., “Field-aware factorization machines for CTR prediction,” in Proc. ACM-RecSys, Boston, MA, USA, 2016, pp. 43-50.
[22] W. Zhang, T. Du and J. Wang, “Deep learning over multi-field categorical data: a case study on user response prediction,” in Proc. ECIR, Padua, Italy, 2016, pp. 45-57.
[23] Y. Qu, H. Cai, K. Ren, W. Zhang, Y. Yu, Y. Wen and J. Wang, “Product-based neural networks for user response prediction,” in Proc. IEEE-ICDM, Barcelona, Spain, 2016, pp. 1149-1154.
[24] H. Liao, L. Peng, Z. Liu and X. Shen, “iPinYou global RTB bidding algorithm competition dataset,” in Proc. ACM-ADKDD, New York, NY, USA, 2014, pp. 1-6.
[25] J. Shen et al., “From 0.5 million to 2.5 million: efficiently scaling up real-time bidding,” in Proc. IEEE-ICDM, Atlantic City, NJ, USA, 2015, pp. 973-978.
MENGJUAN LIU received the Ph.D. degree in Communication and Information System from the University of Science and Technology of China in 2007. She is currently an associate professor with the Department of Information and Software Engineering, University of Electronic Science and Technology of China. Her main research interests are in the areas of computational advertising, data mining, and artificial intelligence.
WEI YUE was born in Henan Province, China, in 1995. He received the B.S. degree from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2017. Since 2017, he has been pursuing the M.S. degree in software engineering at the School of Information and Software Engineering, UESTC. His research interests include computational advertising and machine learning.
LIZHOU QIU was born in 1995. He received the B.S. degree from the School of Computer and Communication Engineering, Changsha University of Science and Technology, China, in 2017. Since 2017, he has been pursuing the master’s degree in software engineering at the School of Information and Software Engineering, UESTC. His main research interests include computational advertising, recommendation systems, and data mining.
JIAXING LI was born in Sichuan Province, China, in 1996. He received the B.S. degree in software engineering from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2018. Since 2018, he has been pursuing the M.S. degree in software engineering at the School of Information and Software Engineering, UESTC, Chengdu, China. His research interests include machine learning and computational advertising.