Time-Aware Subscription Prediction Model for User Acquisition in Digital

News Media

Heidar Davoudi∗, Morteza Zihayat∗, Aijun An∗

Abstract

User acquisition is one of the most challenging problems for online news providers. Because many news media are available, users have many choices when selecting a news source. To date, most digital news portals have approached the problem indirectly, targeting user satisfaction through recommendation systems. In contrast, we address the problem directly by identifying valuable visitors who are likely to become subscribers in the future. First, we suggest that the decision to subscribe is not a sudden, instantaneous action but an informed decision based on positive experience with the digital medium. As such, we propose engagement measures and show that they are effective in building a predictive model for subscription. We design a model that not only predicts potential subscribers but also answers queries about the subscription occurrence time. The proposed model can be used to predict the subscription time and to recommend "potential users" accurately to the current marketing campaign. We evaluate the proposed model using a real dataset from The Globe and Mail, a major newspaper in Canada. The experimental results show that the proposed model significantly outperforms traditional state-of-the-art approaches.

1 Introduction

Digital media and online news providers face the user acquisition challenge as a more pressing issue than ever before. From a business point of view, successful user acquisition translates directly into profit and value. However, whilst around 45% of people pay for a printed newspaper at least once a week, it has been much harder to persuade readers to pay for an online news subscription [10].

News recommender systems are widely exploited to improve the user experience and, consequently, user acquisition indirectly. However, such systems mainly focus on recommending items that coincide with the user's interests (to maximize the user's satisfaction); they do not identify potential subscribers or predict the subscription time. Identifying potential subscribers and predicting their subscription time are of paramount importance for news websites since this allows them to launch a targeted marketing campaign in advance. To the best of our knowledge, this problem has not been explored directly in the digital news media domain from data mining/machine learning perspectives, but rather considered in marketing studies, which require a lot of human effort.

∗Department of Electrical Engineering and Computer Science, York University, Canada, {davoudi, zihayatm, ann}@cse.yorku.ca.

The problem of identifying potential subscribers for news media from the data mining/machine learning point of view faces several challenges. First, a decision to subscribe is influenced by many factors, such as demographic, social, or cultural circumstances. For example, one might decide to subscribe because she/he was referred by a friend (e.g., word of mouth), or based on her/his good experience. Finding an appropriate set of predictors for identifying and recommending such users (i.e., potential subscribers) is a challenging problem. Second, domain knowledge about "the decision to subscribe" process is extremely limited (i.e., the knowledge acquisition bottleneck). In other words, domain experts do not have a clear idea of who subscribes and why/when a subscription occurs. Third, subscription should be considered in combination with the time dimension. The predictive model should identify potential subscribers at the right time (i.e., neither too early nor too late), since a marketing campaign that targets a user who is either not yet ready to subscribe or no longer interested in subscribing (despite earlier interest) results in no subscription.

In this paper, we propose an end-to-end solution to address the aforementioned challenges in identifying users prone to subscription in news portals. First, we argue that the act of subscribing is not a sudden, instantaneous decision, but rather an informed decision based on previous positive experiences. Accordingly, we propose a set of engagement measures as subscription predictors. The engagement measures are quantified in a fully data-driven fashion, so we do not rely on domain expert knowledge for their calculation. Then, we propose a Time-aware Subscription Prediction (TASP) model that combines the time dimension with the suggested predictors. The proposed model not only identifies and recommends the users who are very likely to become subscribers but is also able to predict their subscription time. In the TASP model, we treat subscription time as a dependent random variable and utilize a generalized linear model to combine all

Copyright © by SIAM

Unauthorized reproduction of this article is prohibited

Downloaded 06/18/18 to 143.191.196.75. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

engagement measures (i.e., independent random variables). Then, we cast the problem as an optimization problem that maximizes the likelihood of the proposed model. Finally, we design a learning algorithm and learn the parameters of the model accordingly. Our main contributions are as follows:

• We define the problem of time-aware subscription prediction for user acquisition in news portals and design an end-to-end data-driven solution based on data that are usually available in news portals.

• We propose effective user engagement measures as the main component of the subscription prediction model and show that they have good predictive power for modeling subscription occurrence and time.

• We argue that time is an important factor in user subscription prediction and develop a probabilistic model to recommend trustworthy potential subscribers. The proposed model predicts the users who are prone to subscribe before a given time. Moreover, it can predict when the subscription occurs.

• The experiments conducted on a real dataset show the effectiveness of the proposed framework and the developed model in solving the problem of time-aware subscription prediction for user acquisition.

The rest of the paper is organized as follows. Section 2 discusses the proposed framework for user acquisition. In particular, we present our Time-aware Subscription Prediction (TASP) model in Section 2.3. We outline the empirical evaluation in Section 3 and discuss related work in Section 4. Section 5 concludes the paper and presents future work.

2 Time-Aware User Acquisition in News

Portals

Figure 1 shows an overview of the proposed framework for user acquisition in news portals. The framework consists of three main components: (1) Data preparation: most news portals (e.g., The Globe and Mail¹) use a data collection platform (e.g., Omniture by Adobe²) to capture the interactions with users. However, the captured data need to be preprocessed and aggregated before applying any learning algorithm (see §2.1). (2) Learning phase: given the preprocessed data, this component first finds a set of engagement measures (see §2.2) and then uses them to design the Time-aware Subscription Prediction (TASP) model (see §2.3). (3) Inference phase: once the parameters of the proposed model are learned, the inference models answer two types of questions: (i) time-aware subscription occurrence prediction (i.e., what is the probability that a user becomes a subscriber by the given time t since the first visit?) and (ii) subscription time prediction (i.e., when will a user become a subscriber since the first visit?). The inference outcomes can be utilized by the marketing campaign to boost user acquisition.

¹ www.theglobeandmail.com

² https://my.omniture.com

Figure 1: The proposed user acquisition framework. (Diagram: Data preparation [Data Collection; Preprocessing and Aggregation], Learning phase [User Engagement Measures; Time-Aware Subscription Prediction], Inference phase [Subscription Occurrence Prediction; Subscription Time Prediction]; other labels: Users, Portal, Management, Marketing Campaign.)

2.1 Data Preparation In this section we present

two main phases to prepare the data for user acquisition

analysis: (1) Data collection, and (2) Preprocessing and

aggregation.

2.1.1 Data collection: Every time a user reads an article, watches a video, or more generally takes an action in a news portal, the interaction is tracked by the portal and recorded as a hit. In data collection frameworks (e.g., Omniture), a hit is simply a record in the data warehouse that contains rich information about the visitor and her/his actions. Typically, a hit contains information such as date, time, user id (for a subscribed user), user environment variables (e.g., browser type, IP address), the visited article, and special events of interest such as subscription or sign-in. Although the clickstream data are composed of billions of hits that tell what visitors have done in their visits, they contain a lot of noise and need to be cleaned properly.

2.1.2 Preprocessing and Aggregation: The data captured in any data collection platform contain many low-level interactions (e.g., hits) mixed with a lot of noise. For example, a long session does not necessarily mean that the user spent more time reading the clicked articles, as the user might use a multi-tab browser and be engaged in other activities. As such, we need to deal with both aggregation and data cleansing as part of data preparation.

Given that the data are organized as hits, we roll up the data from page-view hits to visits and then to visitors. We refer to a visit as a set of page views in one "session" (a session is terminated if the data collection server does not hear from the same user for 30 minutes). We use the cookie and the device's IP information, which are anonymized and encoded in the data warehouse, to detect unique visitors.

Data collection platforms record a timestamp for each hit, so the difference between two consecutive page-click timestamps can be used to calculate the time the user spent on an article. As usual in web analytics, the last article in a visit is ignored since we cannot estimate the time the user spent on it.
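The roll-up from hits to visits and the dwell-time calculation described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `(timestamp_seconds, page_id)` hit layout and the function names are assumptions introduced here; only the 30-minute session rule and the "drop the last page" convention come from the text.

```python
SESSION_TIMEOUT = 30 * 60  # seconds; the 30-minute rule stated in the text


def split_into_visits(hits):
    """Group one visitor's hits into visits.

    `hits` is a list of (timestamp_seconds, page_id) tuples (hypothetical layout).
    A gap longer than SESSION_TIMEOUT between consecutive hits starts a new visit.
    """
    visits = []
    for ts, page in sorted(hits):
        if visits and ts - visits[-1][-1][0] <= SESSION_TIMEOUT:
            visits[-1].append((ts, page))
        else:
            visits.append([(ts, page)])
    return visits


def time_per_page(visit):
    """Dwell time per page from consecutive timestamps.

    The last page of the visit has no following hit, so it is dropped.
    """
    return [(page, next_ts - ts) for (ts, page), (next_ts, _) in zip(visit, visit[1:])]


hits = [(0, "a1"), (120, "a2"), (300, "a3"), (5000, "a4"), (5060, "a5")]
visits = split_into_visits(hits)
dwell = [time_per_page(v) for v in visits]
```

In this toy input the gap between the third and fourth hit exceeds 30 minutes, so the hits form two visits, and the last page of each visit contributes no dwell time.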

We filter out the attributes that are not needed to calculate the user engagement measures defined in the next section (§2.2). We perform data cleaning by removing the outlier visitors whose engagement measures deviate by more than 3 standard deviations from the mean of the respective measure in the data [5]. This simply removes unreasonable values for the measures. Finally, all the engagement measures are normalized using the z-score method.
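A minimal sketch of the cleaning and normalization steps is given below. It assumes each visitor is a row of engagement measures and uses the population standard deviation; the paper does not say whether the sample or population form was used, nor whether normalization statistics were recomputed after outlier removal, so both choices here are assumptions.

```python
import statistics


def column_stats(rows):
    """Mean and population standard deviation of every column."""
    cols = list(zip(*rows))
    return ([statistics.fmean(c) for c in cols],
            [statistics.pstdev(c) for c in cols])


def remove_outliers(rows, k=3.0):
    """Drop visitors with any measure more than k standard deviations from its mean."""
    means, stds = column_stats(rows)
    return [
        r for r in rows
        if all(s == 0 or abs(v - m) <= k * s for v, m, s in zip(r, means, stds))
    ]


def zscore(rows):
    """Normalize every column to zero mean and unit variance (constant columns map to 0)."""
    means, stds = column_stats(rows)
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, means, stds)]
        for r in rows
    ]


# 19 ordinary visitors plus one visitor with an extreme first measure.
rows = [[float(i % 5), 10.0 + i % 3] for i in range(19)] + [[100.0, 11.0]]
cleaned = remove_outliers(rows)
normalized = zscore(cleaned)
```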

2.2 User Engagement Measures As we suggest that user engagement has a close relationship with user acquisition, one important task in the proposed framework is to measure user engagement. To understand the rationale behind the relationship, consider the scenario in which we want to predict users prone to subscription based on historical data stored as a clickstream collection. A reasonable assumption is that the user's decision to subscribe is based on long-term and short-term positive experiences rather than on a sudden, instantaneous thought. This is exactly the concern of "user engagement" modeling. In fact, a well-known definition of engagement is based on the "positive aspects" of user experience while interacting with an online application [7]. The positive aspects of experience differ among domains and applications and are very hard to measure (e.g., a user visiting Twitter more frequently than Facebook does not necessarily have a better experience with Twitter, because engagement patterns differ between these two social media). Moreover, other engagement measurement approaches, such as self-reporting methods [8] (i.e., using questionnaires, surveys, or interviews) and physiological methods [3] (i.e., utilizing observational methods such as facial expression or speech analysis), are based on a small number of users who are assumed to be representative of the whole population.

Alternatively, as we aim for a fully data-driven framework, we propose the following simple but effective web analytics measures, inspired by [7], to quantify user engagement, and we show that they have predictive power for subscription prediction in the digital news media domain.

Total Number of Paywalls: In news portals that provide subscription services, there is a restriction on the number of articles that a non-subscriber can read in a period of time. For example, in The Globe and Mail this period is one month. That is, when a visitor tries to read more articles, she/he is directed to a page asking for subscription (or login). This page is referred to as a paywall. In our proposed approach, this interaction is used as an indicator of a user's interest in subscription. We calculate the total number of paywalls each user hits in all of her/his visits.

Average Number of Paywalls per Visit: This measure is calculated by normalizing the total number of paywalls by the number of visits.

Total Articles Read: This measure is simply defined as the number of articles read by the user. It differs from the number of page visits: page visits count all pages (e.g., navigational or search pages), whereas this measure counts only article pages, since they may better show the user's interest in content and be closer to real user engagement. For instance, a user may visit many navigational pages while looking for a single article, which inflates the page-visit count.

Average Number of Articles per Visit: This measure is the number of articles read by the user normalized by the number of visits.

Average Spent Time per Article: The time a user spent on each article is calculated using the method described in §2.1.2. The average time spent per article is calculated by dividing the total time that the user spent on articles by the number of articles she/he visited. This measure roughly shows how much a user is interested in articles.

Average Spent Time per Visit: This measure is defined as the time that the user spent on visits divided by the number of visits. Each visit time is calculated as the sum of the time that the user spent on all articles during the respective visit.

Total Spent Time: The total spent time is measured as the sum of the time that a visitor spent on each article during all her/his visits.

Although these measures are indirect proxies of real engagement, our experimental results show their effectiveness for user subscription prediction.
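The seven measures above reduce to a few sums and ratios per visitor. The sketch below is illustrative only: the per-visit dictionary layout with keys `paywalls` and `article_times` is a hypothetical input format, not the schema of the authors' data warehouse.

```python
def engagement_measures(visits):
    """Compute the seven measures of Section 2.2 for one visitor.

    `visits` is a list of dicts with hypothetical keys:
      "paywalls":      number of paywall pages hit during the visit
      "article_times": seconds spent on each article read during the visit
    """
    n_visits = len(visits)
    total_paywalls = sum(v["paywalls"] for v in visits)
    total_articles = sum(len(v["article_times"]) for v in visits)
    total_time = sum(sum(v["article_times"]) for v in visits)
    return {
        "total_paywalls": total_paywalls,
        "avg_paywalls_per_visit": total_paywalls / n_visits if n_visits else 0.0,
        "total_articles": total_articles,
        "avg_articles_per_visit": total_articles / n_visits if n_visits else 0.0,
        "avg_time_per_article": total_time / total_articles if total_articles else 0.0,
        "avg_time_per_visit": total_time / n_visits if n_visits else 0.0,
        "total_time": total_time,
    }


example = engagement_measures([
    {"paywalls": 1, "article_times": [60, 120]},
    {"paywalls": 0, "article_times": [30]},
])
```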

2.3 Time-aware Subscription Prediction Model (TASP): Given the set of engagement measures, in this section we first outline the problem statement; then, in subsequent sections, we describe our proposed Time-aware Subscription Prediction (TASP) model in detail. We use a generalized linear model as the building block of the model. By assuming an underlying distribution for subscription time (i.e., Weibull), we cast the problem as a maximum likelihood optimization. Finally, we derive the solution to learn the parameters of the model.

2.3.1 Problem Statement: Given the processed data for all users, we refer to the time period covered by this data set as the "exploration period". We first remove the users who subscribed before the exploration period. The remaining users either subscribed during the exploration period (i.e., subscribers) or never subscribed before or during the period (i.e., non-subscribers). Note that we do not consider the users who subscribed before the exploration period since we do not have information from before their subscription, and our target problem is to build a model that predicts how and when unsubscribed users become subscribers.

Definition 2.1. (Subscription Occurrence Time): The subscription time $\tilde{t}_i$ is defined as the time that passed since the first visit of user $i$ until her/his subscription. Thus, given the absolute subscription time $t'_i$ and the first visit time $t_{f_i}$ for user $i$, $\tilde{t}_i$ is computed as follows:

(2.1) $\tilde{t}_i = t'_i - t_{f_i}$

The absolute subscription time refers to the timestamp that is recorded for each subscription. In our analysis, all timestamps are in day scale.

For non-subscribers, we define the possible subscription period as follows:

Definition 2.2. (Possible Subscription Period): We define $(\bar{t}_i, \infty)$ as the possible subscription period for user $i$, where $\bar{t}_i$ is defined as:

(2.2) $\bar{t}_i = t_{l_i} - t_{f_i}$

where $t_{l_i}$ is the last visit time in the exploration period for a non-subscriber. Alternatively, $\bar{t}_i$ might be considered as the time, measured from the first visit of user $i$, after which a subscription might occur. Please note that if the subscription occurs we know the exact time of subscription ($\tilde{t}_i$), whereas if the subscription does not occur, all we know is that the subscription time exceeds $\bar{t}_i$.

The training set for the subscription time prediction problem is defined as follows:

(2.3) $L = \{(X_i, t_i, I_i) \mid i = 1, 2, \ldots, n\}$

where $X_i = [x_{ji}]_{m \times 1}$ is the engagement measure vector for user $i$ ($x_{ji}$ is the $j$'th engagement measure calculated for user $i$, see §2.2). We calculate the user engagement measures for subscribers based on the visits before the subscription time, and for non-subscribers based on the visits from the first visit to the last visit in the exploration period. For simplicity, a 1 is appended to the vector $X_i$ to account for the bias term in the linear system. $I_i$ is defined as the indicator function that specifies whether user $i$ subscribed during the exploration period:

(2.4) $I_i = \begin{cases} 1 & \text{if user } i \text{ is a subscriber} \\ 0 & \text{otherwise} \end{cases}$

and $t_i$ is defined as $\tilde{t}_i$ for subscribed users (i.e., $I_i = 1$) and $\bar{t}_i$ for non-subscribed users (i.e., $I_i = 0$). We refer to this arrangement in §2.3.4 when we formulate the optimization problem.
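As an illustration of Eqs. 2.1 to 2.4, the sketch below assembles one training triple $(X_i, t_i, I_i)$ from day-scale timestamps. The function name and argument layout are assumptions made for this example; a subscriber is signalled here by passing a subscription day, a non-subscriber by leaving it as `None`.

```python
def training_example(measures, first_visit, last_visit, subscribed_at=None):
    """Return (X_i, t_i, I_i) for one user; all times are in days.

    measures      : the engagement measures of Section 2.2 (already normalized)
    first_visit   : day of the first visit, t_f
    last_visit    : day of the last visit in the exploration period, t_l
    subscribed_at : absolute subscription day t', or None for a non-subscriber
    """
    x = list(measures) + [1.0]  # append 1 so the last coefficient acts as the bias
    if subscribed_at is not None:
        return x, subscribed_at - first_visit, 1  # Eq. 2.1, I_i = 1
    return x, last_visit - first_visit, 0         # Eq. 2.2, I_i = 0


subscriber = training_example([0.2, -1.0], first_visit=3, last_visit=40, subscribed_at=10)
non_subscriber = training_example([0.2, -1.0], first_visit=3, last_visit=40)
```

Note that a duration of zero (for example, a subscription on the day of the first visit) would make $\log t_i$ undefined in the log-likelihood of §2.3.4; how such cases were handled is not stated in the text, so this sketch leaves them to the caller.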

Let $T$ be a non-negative continuous random variable representing the waiting time for subscription occurrence since the first visit. Let $f_T(t)$ be the probability density function (p.d.f.) and $F_T(t) = P(T < t)$ be the cumulative distribution function (c.d.f.) of subscription occurrence by time $t$.

Now we define the problem of user subscription time prediction as follows. Given training data $L$ (Eq. 2.3), we want to estimate the cumulative distribution function $F_T(t) = P(T < t)$ for any subscription time $t$.

2.3.2 Generalized Linear Model: In order to make a connection between subscription time (i.e., the variable of interest) and engagement factors, we first develop a generalized linear model. The generalized linear model bridges the gap between the probability distribution of subscription time and the engagement factors calculated for each user, and lets us parameterize our model from observed data. Once the connection (i.e., a model) is established, we can predict the subscription time from the engagement behaviors.

Given vector $X_i$ as the engagement measure vector (i.e., explanatory variables), the subscription time observation for user $i$ is modeled as follows:

(2.5) $T_i = B^{\top} X_i + \epsilon$

where $\epsilon$ is a stochastic residual coming from the exponential family. The main idea is to model the expectation of subscription time as a function (i.e., link function) of a linear combination of engagement measures. So,

(2.6) $E[T_i] = g^{-1}(B^{\top} X_i)$

To ensure the strict positivity of $E[T_i]$, we assume $g^{-1}$ is an exponential function:

(2.7) $E[T_i] \propto \exp(-B^{\top} X_i)$

This assumption also helps us simplify the objective function introduced in §2.3.4. Please note that if we choose the Gaussian or Bernoulli distribution, the model reduces to linear regression or logistic regression, respectively.

Figure 2: Weibull Distribution.

2.3.3 Underlying Distribution for Subscription Time: As our goal is to model the relationship between user engagement and subscription time, we need to find a proper distribution for predicting the subscription time. The Weibull distribution has the flexibility to model right-skewed, left-skewed, or even symmetrically distributed data. Thus, we chose to use it in our model. It has been used in different domains to model the waiting time of an event [6]. The Weibull probability density function for subscription time is as follows:

(2.8) $f_{T_i}(t; \gamma, \alpha_i) = \frac{\gamma}{\alpha_i} \left(\frac{t}{\alpha_i}\right)^{\gamma - 1} \exp\left\{-\left(\frac{t}{\alpha_i}\right)^{\gamma}\right\}$

where $\alpha_i$ and $\gamma$ are the scale and shape parameters, respectively. The shape parameter $\gamma$ can be learned to model a waiting time whose event rate (i.e., hazard function) decreases ($\gamma < 1$), increases ($\gamma > 1$), or is constant ($\gamma = 1$) with time. Increasing the value of the scale parameter ($\alpha_i$) while holding the shape parameter ($\gamma$) constant has the effect of stretching out the probability density function. Figure 2 shows the Weibull distribution for different parameters. The expectation of the Weibull distribution is expressed as:

(2.9) $E[T_i] = \alpha_i \, \Gamma\left(1 + \frac{1}{\gamma}\right)$

where $\Gamma$ is the Gamma function. Given (Eq. 2.7), we can assume that:

(2.10) $\alpha_i = \exp(-B^{\top} X_i)$

The cumulative distribution function is written as follows:

(2.11) $F_{T_i}(t) = P(T_i \le t) = 1 - \exp\left\{-\left(\frac{t}{\alpha_i}\right)^{\gamma}\right\}$

Note that the distributions in (Eq. 2.8) have the same shape parameter $\gamma$ but different expectation values via the parameter $\alpha_i$. In fact, the basic assumption is that each value of the random variable $T_i$ is drawn from the distribution in (Eq. 2.8), whose expectation depends on the data point through (Eq. 2.9 and 2.10).
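For readers who want to experiment with Eqs. 2.8 to 2.11, a small standard-library sketch follows. The function names are introduced here for illustration; the formulas themselves are the ones stated above.

```python
import math


def scale_from_engagement(B, x):
    """Eq. 2.10: alpha_i = exp(-B^T X_i)."""
    return math.exp(-sum(b * v for b, v in zip(B, x)))


def weibull_pdf(t, alpha, gamma):
    """Eq. 2.8: density of the subscription time at t > 0."""
    return (gamma / alpha) * (t / alpha) ** (gamma - 1.0) * math.exp(-((t / alpha) ** gamma))


def weibull_cdf(t, alpha, gamma):
    """Eq. 2.11: probability that the subscription occurs by time t."""
    return 1.0 - math.exp(-((t / alpha) ** gamma))


def weibull_mean(alpha, gamma):
    """Eq. 2.9: expected subscription time."""
    return alpha * math.gamma(1.0 + 1.0 / gamma)


# With gamma = 1 the Weibull reduces to an exponential distribution with mean alpha.
mean_exponential = weibull_mean(2.0, 1.0)
```

With the sign convention of Eq. 2.10, a larger value of $B^{\top} X_i$ gives a smaller scale $\alpha_i$ and therefore a shorter expected waiting time before subscription.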

2.3.4 Optimization Problem: Assuming that observations (i.e., data points) are statistically independent and drawn from the distribution in (Eq. 2.8), the log-likelihood of the model is formulated as follows:

(2.12) $\log \ell = \sum_{i=1}^{n} \left\{ I_i \log f_{T_i}(t_i; \gamma, \alpha_i) + (1 - I_i) \log P(T_i > t_i) \right\}$

where $t_i$ is the subscription occurrence time ($\tilde{t}_i$) for subscriber $i$ (i.e., $I_i = 1$) and the start of the possible subscription period ($\bar{t}_i$) for non-subscriber $i$ (i.e., $I_i = 0$). The basic idea is that subscribers contribute to the log-likelihood through the probability density function $f_{T_i}$, while non-subscribers contribute through the probability $P(T_i > t_i)$. If we plug the probability density function in (Eq. 2.8) and the cumulative distribution function in (Eq. 2.11) into the log-likelihood function (Eq. 2.12), we can simplify the log-likelihood of the model in vector form as follows:

(2.13) $\log \ell = I^{\top}\big(\log(\gamma)\mathbf{1} + (\gamma - 1)\log(T_s)\big) + \gamma I^{\top} X B - \mathbf{1}^{\top} \exp\{\gamma(\log(T_s) + XB)\}$

where $\log$ and $\exp$ are applied elementwise, $I = [I_i]_{n \times 1}$ is the indicator vector whose components are defined in (Eq. 2.4), $\mathbf{1} = [1]_{n \times 1}$ is the all-ones vector (i.e., all components are 1), $T_s = [t_i]_{n \times 1}$ is the vector of subscription times defined in (Eq. 2.3), $X = [X_i]_{n \times m}$ is the matrix of engagement measures, where each row is $X_i$ defined in (Eq. 2.3), and $\gamma$ (scalar) and $B = [\beta_i]_{m \times 1}$ (vector) are parameters.
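The step from (Eq. 2.12) to (Eq. 2.13) is not spelled out in the text; one way to fill it in, using only (Eq. 2.8), (Eq. 2.10) and (Eq. 2.11), is the following per-user expansion, where $-\log \alpha_i = B^{\top} X_i$:

```latex
\begin{align*}
\log f_{T_i}(t_i;\gamma,\alpha_i)
  &= \log\gamma - \log\alpha_i + (\gamma-1)(\log t_i - \log\alpha_i)
     - \left(\frac{t_i}{\alpha_i}\right)^{\gamma} \\
  &= \log\gamma + (\gamma-1)\log t_i + \gamma B^{\top}X_i
     - \exp\{\gamma(\log t_i + B^{\top}X_i)\}, \\
\log P(T_i > t_i)
  &= -\left(\frac{t_i}{\alpha_i}\right)^{\gamma}
   = -\exp\{\gamma(\log t_i + B^{\top}X_i)\}, \\
I_i \log f_{T_i} + (1-I_i)\log P(T_i > t_i)
  &= I_i\big(\log\gamma + (\gamma-1)\log t_i + \gamma B^{\top}X_i\big)
     - \exp\{\gamma(\log t_i + B^{\top}X_i)\}.
\end{align*}
```

Summing the last line over $i = 1, \ldots, n$ and stacking the terms into vectors gives the three terms of (Eq. 2.13): the exponential term appears for every user, while the remaining terms are switched on only for subscribers through $I$.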

Algorithm 1 TASP Learning Algorithm

1: Initialize $B_1^{(0)}, B_2^{(0)}, \ldots, B_m^{(0)}$ randomly

2: Initialize $\gamma^{(0)} = 1$

3: $t := 0$

4: while not converged and $t <$ max iterations do

5:     $B^{(t+1)} := B^{(t)} + \eta \nabla_B[\log \ell(B^{(t)}, \gamma^{(t)})]$

6:     $\gamma^{(t+1)} := \gamma^{(t)} + \eta \nabla_\gamma[\log \ell(B^{(t+1)}, \gamma^{(t)})]$

7:     $t := t + 1$

8: end while

9: return $B, \gamma$

2.3.5 Learning Algorithm: We use the gradient ascent method to maximize the log-likelihood and learn the parameters of the proposed model. First, we derive the gradient of the log-likelihood of the model (Eq. 2.13) with respect to $\gamma$ and $B$. The gradient with respect to $B$ is as follows:

(2.14) $\nabla_B[\log \ell(B, \gamma)] = \gamma X^{\top} I - \gamma X^{\top} \exp\{\gamma(\log(T_s) + XB)\}$

and the gradient of the log-likelihood with respect to $\gamma$ is derived as follows:

(2.15) $\nabla_\gamma[\log \ell(B, \gamma)] = I^{\top}\{(1/\gamma)\mathbf{1} + \log(T_s) + XB\} - (\log(T_s) + XB)^{\top} \exp\{\gamma(\log(T_s) + XB)\}$

The overall procedure is outlined in Algorithm 1. We use the coordinate ascent method [14] to learn $B$ and $\gamma$ iteratively. In step 5, we update the parameter $B$ based on the gradient derived in (Eq. 2.14); then, in step 6, keeping $B$ fixed, the parameter $\gamma$ is updated according to the gradient in (Eq. 2.15).
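A compact, standard-library sketch of Algorithm 1 with the gradients of (Eq. 2.14) and (Eq. 2.15) is given below. It is an illustration rather than the authors' implementation: the convergence test on the change in log-likelihood, the initialization range, the fixed seed, and the positivity safeguard on $\gamma$ are assumptions added here, and all times are assumed to be strictly positive so that $\log t_i$ is defined.

```python
import math
import random


def linear_terms(X, T, B):
    """z_i = log(t_i) + X_i . B for every user (the argument of exp in Eq. 2.13)."""
    return [math.log(t) + sum(b * v for b, v in zip(B, x)) for x, t in zip(X, T)]


def log_likelihood(X, T, I, B, gamma):
    """Eq. 2.13 written as a sum over users instead of in vector form."""
    total = 0.0
    for t, ind, z in zip(T, I, linear_terms(X, T, B)):
        xb = z - math.log(t)
        total += ind * (math.log(gamma) + (gamma - 1.0) * math.log(t) + gamma * xb)
        total -= math.exp(gamma * z)
    return total


def grad_B(X, T, I, B, gamma):
    """Eq. 2.14: gamma * X^T (I - exp{gamma (log T_s + X B)})."""
    g = [0.0] * len(B)
    for x, ind, z in zip(X, I, linear_terms(X, T, B)):
        w = gamma * (ind - math.exp(gamma * z))
        for j, v in enumerate(x):
            g[j] += w * v
    return g


def grad_gamma(X, T, I, B, gamma):
    """Eq. 2.15 as a sum over users."""
    return sum(ind * (1.0 / gamma + z) - z * math.exp(gamma * z)
               for ind, z in zip(I, linear_terms(X, T, B)))


def fit_tasp(X, T, I, eta=0.01, max_iterations=1000, tol=1e-8, seed=0):
    """Alternating gradient ascent following Algorithm 1."""
    rng = random.Random(seed)
    B = [rng.uniform(-0.1, 0.1) for _ in X[0]]  # step 1 (range is an assumption)
    gamma = 1.0                                  # step 2
    previous = log_likelihood(X, T, I, B, gamma)
    for _ in range(max_iterations):
        B = [b + eta * g for b, g in zip(B, grad_B(X, T, I, B, gamma))]  # step 5
        gamma = gamma + eta * grad_gamma(X, T, I, B, gamma)               # step 6
        gamma = max(gamma, 1e-6)  # safeguard (not in Algorithm 1): keep the shape positive
        current = log_likelihood(X, T, I, B, gamma)
        if abs(current - previous) < tol:  # assumed convergence criterion
            break
        previous = current
    return B, gamma
```

Because the gradients are sums over all users, a fixed learning rate that works on a small sample may be too large on a big data set; scaling the step size by the number of users or using a line search is a common remedy, though the text does not say whether the authors did so.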

2.4 Inference models: After the parameters of the model (i.e., $\gamma$ and $B$) are learned, inference with the model is straightforward. In particular, we are interested in answering two types of questions: (1) what is the probability that a user becomes a subscriber by a given time $t$ since the first visit (time-aware subscriber prediction)? (2) when will a user become a subscriber since the first visit (subscription time prediction)?

2.4.1 Time-aware Subscription Occurrence Prediction: To find the users who will be subscribers by time $t$ since the first visit, we need to estimate $P(T \le t)$. Given a user $\hat{u}$ with engagement vector $X_{\hat{u}}$, we calculate the scale parameter $\alpha_{\hat{u}}$ using (Eq. 2.10):

(2.16) $\alpha_{\hat{u}} = \exp(-B^{\top} X_{\hat{u}})$

The desired probability is calculated as follows:

(2.17) $F_T(t) = P(T \le t) = 1 - \exp\left\{-\left(\frac{t}{\alpha_{\hat{u}}}\right)^{\gamma}\right\}$

Figure 3: Subscription time prediction performance.

We consider $F_T(t) = P(T \le t) \ge 0.5$ as the subscription occurrence.
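The occurrence query can be sketched directly from (Eq. 2.16) and (Eq. 2.17). The function names are introduced here for illustration, and the engagement vector is assumed to already carry the appended 1 for the bias term.

```python
import math


def subscription_probability(B, gamma, x, t):
    """P(T <= t) for a user with engagement vector x (Eqs. 2.16 and 2.17)."""
    alpha = math.exp(-sum(b * v for b, v in zip(B, x)))  # Eq. 2.16
    return 1.0 - math.exp(-((t / alpha) ** gamma))        # Eq. 2.17


def predicts_subscription(B, gamma, x, t, threshold=0.5):
    """Apply the 0.5 decision rule stated in the text."""
    return subscription_probability(B, gamma, x, t) >= threshold


# Toy parameters: alpha = 1 and gamma = 1, so P(T <= t) = 1 - exp(-t).
p_half = subscription_probability([0.0], 1.0, [1.0], math.log(2.0))
```

Ranking users by this probability, rather than only thresholding it, is one natural way to pick targets for a campaign with a fixed budget; that use is a suggestion here, not something the text prescribes.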

2.4.2 Subscription Time Prediction: For subscription time prediction, as the final distribution can be skewed, we propose to use the median as the predicted time. This measure is less susceptible to outliers and extreme values and empirically performs better in our experiments. Given a user $\hat{u}$ with engagement vector $X_{\hat{u}}$, the subscription time $t_{\hat{u}}$ for the user is calculated as follows:

(2.18) $t_{\hat{u}} = \alpha_{\hat{u}} (\log 2)^{1/\gamma}$

where $\alpha_{\hat{u}}$ is estimated using (Eq. 2.16).
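Setting $F_T(t) = 0.5$ in (Eq. 2.17) and solving for $t$ gives (Eq. 2.18). A short sketch of the resulting predictor follows; the function name is hypothetical.

```python
import math


def predicted_subscription_time(B, gamma, x):
    """Median of the fitted Weibull distribution (Eq. 2.18), in days."""
    alpha = math.exp(-sum(b * v for b, v in zip(B, x)))  # Eq. 2.16
    return alpha * math.log(2.0) ** (1.0 / gamma)


# Toy parameters chosen only for illustration.
B, gamma, x = [0.4, -0.2], 1.7, [1.5, 1.0]
t_median = predicted_subscription_time(B, gamma, x)
```

By construction, the probability of subscribing by `t_median` under (Eq. 2.17) is one half, which ties this predictor to the 0.5 threshold used for occurrence prediction.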

3 Empirical Evaluation

In this section, we evaluate our proposed Time-aware Subscription Prediction (TASP) model and compare it with state-of-the-art techniques as baselines. We compare our model with Logistic Regression (LR), Random Forest (RF), Decision Tree (J48), and Naive Bayes (NB). We use the Mean Absolute Error (MAE) and the F1-Measure as performance measures for "subscription time prediction" and "subscription occurrence prediction", respectively. All the experiments in this section are based on 10-fold cross validation. All the time values in the experiments are in day scale. We use The Globe and Mail dataset in our experiments. We set the learning rate $\eta$ and the maximum number of iterations (i.e., max iterations) in Algorithm 1 to 0.01 and 1000, respectively.
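For reference, the two performance measures can be computed as in the sketch below. These are the standard definitions of MAE and of F1 for the positive (subscriber) class; the text does not state how F1 was averaged, so treating subscribers as the positive class is an assumption.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted times (in days)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def f1_score(actual, predicted):
    """F1 for the positive class, where 1 marks a subscriber."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


mae = mean_absolute_error([3, 10, 7], [5, 8, 7])
f1 = f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```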

3.1 Dataset The Globe and Mail is a major newspaper in Canada. In this news portal, interactions with users are captured using the Omniture data collection platform. The original data repository contains about 2 billion hits (see §2.1.2), ranging from article reading behavior to video watching. We use the data from 2014-01 to 2014-08 as the exploration period in our experiments. Since the data contain a lot of irrelevant information, the original data are processed to keep only the information needed to calculate the proposed engagement measures. Then, we aggregate and preprocess the data following the steps described in §2.1.2. The dataset used in the experiments contains 17,009 subscribers and 71,639 non-subscribers. Note again that the subscribers are the ones who subscribed during the exploration period.

Figure 4: Subscription occurrence prediction performance (all time values are in days).

3.2 Subscription Time Prediction: Figure 3 shows the results of subscription time prediction for the proposed model (TASP) and for Average Time (AVG) as the baseline. Each point in the figure shows the MAE between the predicted and actual subscription times for users who subscribe before time $t$ (all time values are in days). For the AVG model, we calculate the average subscription time of visitors in the training set who subscribed before time $t$. Then, the MAE is calculated from the difference between the actual subscription time of users in the test set and this average. As observed, the MAE of the proposed method is much lower than that of the AVG method for the different values of $t$. The MAE of the proposed model is smallest for small values of $t$, which means that the model predicts short-term subscription times more accurately than long-term ones, although it outperforms the AVG method in both cases.

3.3 Subscription Occurrence Prediction: Figure 4 shows the performance of TASP compared to the baselines for different values of $t$. Each panel shows the performance of the different models in predicting the subscription occurrence before time $t$. Note that the proposed model (TASP) considers time in the training stage and answers queries about subscription with respect to time (i.e., the probability of subscription before a given time $t$). Figure 4 shows that the proposed model outperforms the baselines for the different values of $t$. Moreover, it can be seen that the TASP model performs better in short-term subscription prediction. Among the baselines, the tree-based models (i.e., J48 and Random Forest) perform the best and Logistic Regression has the worst performance.

3.4 Imbalanced Sensitivity Analysis: In this section, we study the sensitivity of the proposed model's performance to the ratio of non-subscribers to subscribers in the training data. To this end, we vary the ratio by downsampling the non-subscribers. Figure 5 shows the performance of the proposed model (TASP) as well as the baselines for different ratios of non-subscribers to subscribers ($n_{ns}/n_s$), where $n_{ns}$ and $n_s$ are the numbers of non-subscribers and subscribers, respectively, in the training set. The proposed model predicts subscribers better when the dataset is balanced, and it is consistently better than the baselines across the different ratios.

Figure 5: The subscription occurrence prediction performance sensitivity with respect to the ratio of non-subscribers to subscribers ($n_{ns}/n_s$).

Because our model performs better when the dataset is balanced, we aim to embed a mechanism for dealing with imbalanced data in our model as future work. Figure 6 shows the performance sensitivity of the proposed model in predicting the subscription time. As can be seen, the MAE has little sensitivity to the ratio of non-subscribers to subscribers in the training set.
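The downsampling used to vary $n_{ns}/n_s$ can be sketched as follows. The function name, the `(X_i, t_i, I_i)` tuple layout, and the use of uniform random sampling with a fixed seed are assumptions for illustration; the text only states that non-subscribers were downsampled.

```python
import random


def downsample_non_subscribers(examples, ratio, seed=0):
    """Keep all subscribers and about ratio * n_s randomly chosen non-subscribers.

    `examples` is a list of (X_i, t_i, I_i) triples as in Eq. 2.3.
    """
    subscribers = [e for e in examples if e[2] == 1]
    non_subscribers = [e for e in examples if e[2] == 0]
    k = min(len(non_subscribers), int(round(ratio * len(subscribers))))
    rng = random.Random(seed)
    return subscribers + rng.sample(non_subscribers, k)


# Toy training set: 10 subscribers followed by 40 non-subscribers.
examples = [([float(i)], i + 1, 1 if i < 10 else 0) for i in range(50)]
balanced = downsample_non_subscribers(examples, ratio=1)
two_to_one = downsample_non_subscribers(examples, ratio=2)
```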

4 Related Work

User acquisition has traditionally been studied in the area of Customer Relationship Management (CRM) [12], where the main goal is to understand customer behavior and maximize the customer's value to the organization in the long term. However, to date, most efforts have focused on user attraction, retention, and churn management rather than user acquisition.

Ng et al. [11] made one of the first attempts at

using data mining techniques for user retention in a

fictitious telecommunication domain (the real domain was

kept anonymous due to privacy issues). They identified the

objective indicators and used a decision tree induction

method for the prediction purpose. In [1], the authors

proposed a rule-based evolutionary algorithm and applied it

to predict churn in a telecommunication domain. They

argued that interpretability was important in this

problem, and that their suggested rule-based method could

address the issue by uncovering interpretable churn

patterns.

In another work, Mozer et al. [9] considered the

problem of churn prediction for a major carrier

company. They utilized features (134 variables overall) such

as call detail records, billing information, and applications

for service to predict user churn. Three classes

of predictive models (decision tree, logit regression,

and nonlinear neural network) were exploited and

compared for user churn prediction.

Kim et al. [4] conducted a study to measure

the attractiveness (click values) of individual words

for users. Assuming some words induce significantly

more clicks than others, they proposed a generative

model which jointly modeled headlines, contents of news

articles, and the clickstream data. The model was

an extension of Latent Dirichlet Allocation (LDA) [2]

in which topic-specific click values of each word and

clicked words were modeled using beta and binomial

distributions, respectively.

Customer Lifetime Value (LTV) analysis is another

area related to user acquisition. Customer LTV

is usually defined as the total net income that a

company expects from its customers. For example, Rosset

et al. [13] calculated the current customer LTV based on

three factors: customer value over time, length of

service, and a discounting factor. However, they estimated

the effects of retention campaigns on LTV and

did not investigate how a visitor (e.g., non-subscriber)

becomes a customer (subscriber).
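
As a rough illustration of how the three factors named above can combine, the following Python sketch computes a discrete-time discounted LTV. The function name, the per-period discretization, the use of a survival probability as a stand-in for length of service, and the compounding convention are all assumptions made for this example; it is not the exact formulation of [13].

```python
def discounted_ltv(values, survival, annual_discount_rate, periods_per_year=12):
    """Sum expected per-period customer value, discounted back to the present.

    values[t]   -- assumed net value of the customer in period t
    survival[t] -- assumed probability the customer is still active in period t
                   (a proxy for length of service)
    annual_discount_rate -- e.g. 0.10 for 10% per year (hypothetical input)
    """
    if len(values) != len(survival):
        raise ValueError("values and survival must have the same length")
    total = 0.0
    for t, (value, alive) in enumerate(zip(values, survival)):
        # Discounting factor for a cash flow received t periods from now.
        discount = (1.0 + annual_discount_rate) ** (-(t / periods_per_year))
        total += value * alive * discount
    return total


if __name__ == "__main__":
    # Hypothetical customer: constant monthly value, decaying retention.
    monthly_value = [20.0] * 6
    still_active = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
    print(discounted_ltv(monthly_value, still_active, annual_discount_rate=0.10))
```

A retention campaign would show up in such a calculation as a change to the `survival` sequence, which is why this kind of analysis says little about how a visitor becomes a subscriber in the first place.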

In this paper, we consider the problem of user

acquisition in the digital news portal domain. To the best of


our knowledge, this is the first work that considers this

problem in the digital news media domain and provides

an end-to-end solution for it. In particular, our proposed

predictive model considers subscription time as a main

component in both the learning and inference processes, which

has not been done before.

Figure 6: The subscription time prediction performance

sensitivity with respect to the ratio of

non-subscribers to subscribers (nns/ns).

5 Conclusion and Future Work

User acquisition for digital news portals is one of the

most pressing issues, as users are exposed to many

available news sources. In this paper, we addressed

the problem by predicting users who are prone to

subscription in a given period of time. One important

challenge is to define measures that have enough

power to predict subscription (since subscription

is a complex decision depending on many factors). We

showed that engagement measures have good

capability in predicting subscription. The intuition

is that engagement, as a positive experience, has a

direct impact on subscription. We proposed a

time-aware prediction model that can predict not only

whether subscription occurs in a given period of time, but also

the subscription time. The empirical study on a real

dataset showed that the proposed model performed well

compared to the baseline models. In the future, we plan

to embed a mechanism in the model to deal

with imbalanced data (for situations where the ratio

of subscribers to non-subscribers is very low). We will

also investigate the capability of the proposed model in

other domains.

Acknowledgements

This work is funded by the Natural Sciences and Engineering

Research Council of Canada (NSERC), The Globe

and Mail, and the Big Data Research, Analytics, and

Information Network (BRAIN) Alliance established by

the Ontario Research Fund - Research Excellence Pro-

gram (ORF-RE). We would like to thank The Globe and

Mail for providing the dataset used in this research. In

particular, we thank Gordon Edall and Shengqing Wu of

The Globe and Mail for their insights and collaboration

in our joint project.

References

[1] W.-H. Au, K. C. Chan, and X. Yao. A novel evolution-

ary data mining algorithm with applications to churn

prediction. IEEE transactions on evolutionary compu-

tation, 7(6):532–545, 2003.

[2] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet

allocation. Journal of machine Learning research,

3(Jan):993–1022, 2003.

[3] C. Jennett, A. L. Cox, P. Cairns, S. Dhoparee, A. Epps,

T. Tijs, and A. Walton. Measuring and defining the experience

of immersion in games. International journal

of human-computer studies, 66(9):641–661, 2008.

[4] J. H. Kim, A. Mantrach, A. Jaimes, and A. Oh.

How to compete online for news audience: Modeling

words that attract clicks. In Proceedings of the 22nd

ACM SIGKDD international conference on Knowledge

discovery and data mining. ACM, 2016.

[5] H.-P. Kriegel, P. Kröger, and A. Zimek. Outlier detection

techniques. In Tutorial at the 16th ACM SIGKDD

international conference on Knowledge discovery and

data mining. ACM, 2010.

[6] C.-D. Lai, D. Murthy, and M. Xie. Weibull distribu-

tions and their applications. In Springer Handbook of

Engineering Statistics, pages 63–78. Springer, 2006.

[7] M. Lalmas, H. O’Brien, and E. Yom-Tov. Measuring

user engagement. Synthesis Lectures on Information

Concepts, Retrieval, and Services, 6(4):1–132, 2014.

[8] I. Lopatovska and I. Arapakis. Theories, methods and

current research on emotions in library and information

science, information retrieval and human–computer

interaction. Information Processing & Management,

47(4):575–592, 2011.

[9] M. C. Mozer, R. Wolniewicz, D. B. Grimes, E. Johnson,

and H. Kaushansky. Predicting subscriber dissatisfac-

tion and improving retention in the wireless telecom-

munications industry. IEEE Transactions on neural

networks, 11(3):690–696, 2000.

[10] N. Newman, D. A. Levy, and R. K. Nielsen. Reuters

institute digital news report 2016. Available at SSRN

2619576, 2016.

[11] K. Ng and H. Liu. Customer retention via data mining.

Artificial Intelligence Review, 14(6):569–590, 2000.

[12] E. W. Ngai, L. Xiu, and D. C. Chau. Application of

data mining techniques in customer relationship management:

A literature review and classification. Expert

systems with applications, 36(2):2592–2602, 2009.

[13] S. Rosset, E. Neumann, U. Eick, and N. Vatnik.

Customer lifetime value models for decision support.

Data mining and knowledge discovery, 7(3):321–339,

2003.

[14] S. J. Wright. Coordinate descent algorithms. Mathe-

matical Programming, 151(1):3–34, 2015.
