Conference Paper

Data-driven multi-touch attribution models

Authors: Shao and Li (2011)

Abstract

In digital advertising, attribution is the problem of assigning credit to one or more advertisements for driving the user to a desirable action such as making a purchase. Rather than giving all the credit to the last ad a user sees, multi-touch attribution allows more than one ad to receive credit based on its corresponding contribution. Multi-touch attribution is one of the most important problems in digital advertising, especially when multiple media channels, such as search, display, social, mobile and video, are involved. Due to the lack of a statistical framework and a viable modeling approach, a true data-driven methodology does not exist today in the industry. While predictive modeling has been thoroughly researched in recent years in the digital advertising domain, the attribution problem focuses more on accurate and stable interpretation of the influence of each user interaction on the final user decision, rather than on mere user classification. Traditional classification models fail to achieve those goals. In this paper, we first propose a bivariate metric: one component measures the variability of the estimate, and the other measures the accuracy of classifying positive and negative users. We then develop a bagged logistic regression model, which we show achieves classification accuracy comparable to ordinary logistic regression but a much more stable estimate of individual advertising channel contributions. We also propose an intuitive and simple probabilistic model to directly quantify the attribution of different advertising channels. We then apply both the bagged logistic model and the probabilistic model to a real-world data set from a multi-channel advertising campaign for a well-known consumer software and services brand. The two models produce consistent general conclusions and thus offer useful cross-validation. The results of our attribution models also yield several important insights that have been validated by the advertising team.
We have implemented the probabilistic model in the production advertising platform of the first author's company, and plan to implement the bagged logistic regression in the next product release. We believe the availability of such data-driven multi-touch attribution metrics and models is a breakthrough in the digital advertising industry.
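The "intuitive and simple probabilistic model" mentioned above can be sketched from empirical conversion rates: a channel's credit combines its first-order conversion rate with averaged second-order interaction terms. The journeys, channel names, and exact credit formula below are illustrative assumptions following the commonly cited second-order form, not the paper's full specification:

```python
# Toy user journeys: (set of channels the user saw, whether they converted).
# Channel names and data are illustrative, not from the paper's campaign.
journeys = [
    ({"search", "display"}, True),
    ({"search"}, True),
    ({"display"}, False),
    ({"social", "search"}, True),
    ({"social"}, False),
    ({"display", "social"}, False),
    ({"search", "display", "social"}, True),
    ({"display"}, False),
]

def p_conv(subset):
    """Empirical P(conversion | user was exposed to every channel in subset)."""
    exposed = [converted for channels, converted in journeys if subset <= channels]
    return sum(exposed) / len(exposed) if exposed else 0.0

all_channels = sorted({ch for channels, _ in journeys for ch in channels})

def credit(ch):
    """First-order conversion rate plus averaged second-order interaction terms."""
    others = [o for o in all_channels if o != ch]
    second = sum(p_conv({ch, o}) - p_conv({ch}) - p_conv({o}) for o in others)
    return p_conv({ch}) + second / (2 * len(others))

credits = {ch: credit(ch) for ch in all_channels}
```

On this toy data, search earns the most credit because every journey containing it converts, while display's frequent non-converting exposures pull its credit down.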


... To better allocate individual contributions and synergies, algorithm-based methodologies have increasingly attracted attention over the past decade (Abhishek et al., 2012;Anderl et al., 2013;Dalessandro et al., 2012;Li and Kannan, 2013;Shao and Li, 2011;Tucker, 2012;Wiesel et al., 2011). Shao and Li (2011) have proposed a "simple probabilistic model". It examines, in addition to each individual channel's effect, the joint effect of all possible combinations of 2, 3 or more channels. ...
... It examines, in addition to each individual channel's effect, the joint effect of all possible combinations of 2, 3 or more channels. Subsequently, Dalessandro et al. (2012) have developed a channel importance attribution method as a generalization of Shao and Li (2011). This method analyzes channel interactions up to full order, the so-called joint effects of all advertising channels. ...
Preprint
This paper examines and proposes several attribution modeling methods that quantify how revenue should be attributed to online advertising inputs. We adopt and further develop relative importance method, which is based on regression models that have been extensively studied and utilized to investigate the relationship between advertising efforts and market reaction (revenue). Relative importance method aims at decomposing and allocating marginal contributions to the coefficient of determination (R^2) of regression models as attribution values. In particular, we adopt two alternative submethods to perform this decomposition: dominance analysis and relative weight analysis. Moreover, we demonstrate an extension of the decomposition methods from standard linear model to additive model. We claim that our new approaches are more flexible and accurate in modeling the underlying relationship and calculating the attribution values. We use simulation examples to demonstrate the superior performance of our new approaches over traditional methods. We further illustrate the value of our proposed approaches using a real advertising campaign dataset.
... Artificial intelligence (AI) coupled with promising machine learning (ML) techniques well known from computer science is broadly affecting many aspects of various fields, including science and technology, industry, and even our day-to-day life [28]. Nowadays, instead of attributing ad touchpoints by heuristic rules [6], data-driven methods [3,8,13,21,23,30], which estimate attribution credits according to historical data, have become the mainstream techniques. These methods learn a conversion prediction model with all observed historical data and then generate counterfactual ad journeys by removing or replacing some touchpoints. ...
... To overcome these drawbacks, researchers proposed data-driven methods. The data-driven MTA model was first proposed in [23], and has been combined with survival analysis [32] and hazard rate [13] to reflect the influence of ad exposure. However, the data-driven methods mentioned above neglect the customers' features and cannot directly allocate personalized attribution. ...
... In our experiments, CausalMTA is compared with 8 baseline methods which can be divided into three categories, i.e., statistical learning-based methods, deep learningbased methods, and causal learning-based methods. The statistical learning-based methods consist of three methods, i.e., Logistic Regression [23] (LR), Simple Probabilistic [8] (SP), and Additive Hazard [32] (AH). Deep learning-based methods contain three methods, i.e., DNAMTA [3], DARNN [21], and DeepMTA [30]. ...
... In recent years, several data-driven attribution models have been proposed in computational advertising (Shao and Li 2011; Dalessandro et al. 2012; Zhang, Wei, and Ren 2014) and marketing analytics (Xu, Duan, and Whinston 2014; Wooff and Anderson 2015). However, these existing models only consider either the time-independent conversion rate of a user or the conversion time. ...
... In the domain of computational advertising, some recent research has been devoted to the study of MTA for ad conversions through data-driven approaches. A bagged logistic regression method was proposed to predict the conversion rate based on the viewed ads of a user (Shao and Li 2011), which is the first study in this field. This approach characterizes the user journey with the counts of ad exposures and uses the weights to measure the credits of different channels. ...
... A proportional hazards model was used to predict the conversion time based on the viewed ads of users (Manchanda et al. 2006). It is similar to the logistic regression method (Shao and Li 2011), the difference being that it targets the conversion time, whereas Shao's model targets the conversion rate. Inspired by first-touch attribution and last-touch attribution, Wooff et al. used a beta distribution to model the influence of an ad exposure, which attributes most credit to the first ad and the last ad (Wooff and Anderson 2015). ...
Article
Multi-Touch Attribution studies the effects of various types of online advertisements on purchase conversions. It is a very important problem in computational advertising, as it allows marketers to assign credits for conversions to different advertising channels and optimize advertising campaigns. In this paper, we propose an additional multi-touch attribution model (AMTA) based on two obvious assumptions: (1) the effect of an ad exposure is fading with time and (2) the effects of ad exposures on the browsing path of a user are additive.AMTA borrows the techniques from survival analysis and uses the hazard rate to measure the influence of an ad exposure. In addition, we both take the conversion time and the intrinsic conversion rate of users into consideration.Experimental results on a large real-world advertising dataset illustrate that the our proposed method is superior to state-of-the-art techniques in conversion rate prediction and the credit allocation based on AMTA is reasonable.
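AMTA's two assumptions (an ad's effect fades with time; effects are additive) suggest a simple credit rule: each exposure contributes a hazard term that decays with elapsed time, and a channel's credit is that term's share of the total hazard at conversion. The channel weights and decay rate below are invented for illustration, whereas AMTA learns them from data via survival analysis:

```python
import math

# Per-channel base influence and a shared exponential decay rate; both are
# illustrative assumptions here, whereas AMTA fits them from data.
base = {"display": 0.30, "search": 0.50, "social": 0.20}
decay = 0.10  # per hour

# One user's journey: (channel, hours elapsed between exposure and conversion).
exposures = [("display", 30.0), ("search", 6.0), ("social", 2.0)]

# Assumption 1: an exposure's effect fades exponentially with elapsed time.
# Assumption 2: effects of exposures are additive, so the total hazard is a sum.
contrib = {ch: base[ch] * math.exp(-decay * age) for ch, age in exposures}
total_hazard = sum(contrib.values())

# Credit each channel with its share of the hazard at conversion time.
credit = {ch: c / total_hazard for ch, c in contrib.items()}
```

Note how the 30-hour-old display exposure gets little credit despite a sizable base weight: the decay term dominates, which is the point of modeling conversion time rather than conversion rate alone.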
... Exposure to advertisements in different channels helps customers make a purchase, subscribe to a service or take another advertised action (Fulgoni, 2016). Campaigns with several digital channels increase sales more than campaigns with only one channel (van der Veen and van Ossenbruggen, 2015). Customers visit a website several times through different channels before making a purchase (Li and Kannan, 2014). ...
... Popular attribution models like last-click or first-click do not take into account the influence of other touchpoints visited in a customer journey (Wiesel et al., 2011). Previous studies provide insights on the performance of individual touchpoints, but not on how different channels should be credited when working together (Kireyev et al., 2016). Other studies confirmed that customers visit multiple touchpoints before conversion and that those touchpoints affect the likelihood of purchase in different ways (Shao and Li, 2011). The multi-channel attribution approach assumes that each touchpoint contributes to sales, and thus it is important to measure the true influence of each touchpoint on conversion when those touchpoints work together (Lovett, 2009; Wiesel et al., 2011). ...
... Academic studies have applied various methods to attribution modelling in advertising: game theory (Dalessandro et al., 2012), logistic regression (Shao and Li, 2011), Bayesian methods, Markov chains (Abhishek et al., 2012; Anderl et al., 2016), the Shapley value (Berman, 2014) and others. These studies have discovered that alternative methods produce channel contributions that differ from GA's 'Last Interaction' model. ...
... Nowadays, instead of attributing the ad touchpoints by heuristic rules (Berman 2018), data-driven methods (Shao and Li 2011;Dalessandro et al. 2012;Ji and Wang 2017;Ren and etc. 2018;Arava et al. 2018;Yang, Dyer, and Wang Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. ...
... To overcome these drawbacks, researchers proposed data-driven attribution methods. The data-driven MTA model was first proposed in (Shao and Li 2011), and has been combined with survival analysis (Zhang, Wei, and Ren 2014) and hazard rate (Ji and Wang 2017) to reflect the influence of ad exposure. However, the data-driven methods mentioned above neglect the customers' features and cannot directly allocate personalized attribution. ...
... • LR (Logistic Regression) model for ad attribution is proposed by Shao and Li (Shao and Li 2011), in which each channel's attribution value is calculated as the learned coefficient. • SP (Simple Probabilistic) model calculates the conversion rate, taking the conversion probability observed in the data into account. ...
Preprint
Full-text available
Multi-touch attribution (MTA), which aims to estimate the contribution of each advertisement touchpoint in conversion journeys, is essential for budget allocation and automated advertising. Existing methods first train a model to predict the conversion probability of advertisement journeys with historical data and calculate the attribution of each touchpoint using counterfactual predictions. An assumption of these works is that the conversion prediction model is unbiased, i.e., it can give accurate predictions on any randomly assigned journey, including both factual and counterfactual ones. Nevertheless, this assumption does not always hold, as the exposed advertisements are recommended according to user preferences. This confounding bias of users would lead to an out-of-distribution (OOD) problem in the counterfactual prediction and cause concept drift in attribution. In this paper, we define the causal MTA task and propose CausalMTA to eliminate the influence of user preferences. It systematically eliminates the confounding bias from both static and dynamic preferences to learn the conversion prediction model using historical data. We also provide a theoretical analysis to prove that CausalMTA can learn an unbiased prediction model with sufficient data. Extensive experiments on both public datasets and the impression data of an e-commerce company show that CausalMTA not only achieves better prediction performance than the state-of-the-art method but also generates meaningful attribution credits across different advertising channels.
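The counterfactual attribution scheme these excerpts describe (predict conversion on the factual journey, then on journeys with one touchpoint removed) can be sketched as below. The logistic "model" here is hand-set for illustration, standing in for a learned, and ideally debiased, conversion predictor:

```python
import math

# Stand-in for a trained conversion model: hand-set per-channel weights.
# In the papers above this would be a learned (ideally debiased) predictor.
weights = {"display": 0.4, "search": 1.2, "social": 0.6}
bias = -2.0

def p_conversion(journey):
    """Predicted conversion probability for a sequence of touchpoints."""
    z = bias + sum(weights[ch] for ch in journey)
    return 1.0 / (1.0 + math.exp(-z))

journey = ["display", "search", "display", "social"]
p_full = p_conversion(journey)

# Counterfactual credit: how much the predicted probability drops when
# one touchpoint is removed, normalised over the journey.
drops = []
for i in range(len(journey)):
    counterfactual = journey[:i] + journey[i + 1:]
    drops.append(p_full - p_conversion(counterfactual))
total = sum(drops)
credit = [d / total for d in drops]
```

The OOD concern raised above enters exactly here: the counterfactual journeys fed to `p_conversion` may look nothing like the journeys the model was trained on.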
... 32). Shao & Li (2011) examined a massive advertising campaign data set, with data related to search, display, social, email, and video channels. The research is also one of the first to propose a more accurate data-driven method that considers all the data available (multi-touch). ...
... As Li & Kannan (2014), the study concentrated not only on advertising but also on other digital channels: social network, referral, paid search, organic search, email, display ad, and direct. The work of Zhao et al. (2019) resembles the study of Shao & Li (2011), evaluating different regression models in various advertising channels. One of the specific contributions of Zhao et al. (2019) is the proposal of the use of partially linear additive models. ...
... Preceding studies argue that the last-click model provides "biased insights" to marketing practitioners, mainly due to completely ignoring the previous touchpoints in the journey (de Haan et al., 2016;Shao & Li, 2011). Metrics provided by a linear model and weighted models (as time-touch and position-touch) consider all touchpoints of the journey that leads to a conversion. ...
Conference Paper
Full-text available
This study presents a data-driven attribution model applied in the context of the Higher Education customer journey (HECJ), employing a novel technique based on panel data from online and offline channels, including detailed data on social media engagement. It aims to contribute to the extant knowledge of attribution models applied to the HECJ. Through a case study at a Brazilian HEI, a total of 185,631 customer journeys were organized into two data sets corresponding to the Covid-19 pre-pandemic and pandemic periods. The data are modeled in a graph-based attribution model. The research finds that channels such as email, online chat, call center, sales, and inbound marketing are driving more than 70% of conversions. Instagram, sales promotion, online advertising, and instant messages grew 38% in the pandemic period. The HECJ gets longer, from 3.8 months on average in the pre-pandemic period to 7.8 months during the pandemic period. This research provides a practical guide based on the proposed model application that permits a more accurate evaluation of marketing channels in the context of the HECJ.
... programmatic native or display [19]. Multi-touch attribution modeling attempts to address this problem [22], including theoretical modeling approaches [5,23]. However, these methods generally assign credit to every exposed and converting user while ignoring the counterfactual response of the user without ad exposures. ...
... Measuring advertising channel incrementality has previously been addressed as an online conversion attribution problem. Shao and Li approached multi-touch attribution as a feature-importance machine learning problem, regardless of the nature of the touch point [22]. By addressing the fundamental difference between user-initiated and firm-generated touch points, Li and Kannan [19] developed an econometrics-based attribution model. ...
... Last-touch attribution has been the standard value attribution in online advertising for more than a decade, despite numerous studies highlighting its issues [5,22,23]. Scientific evidence has shown that last-touch attribution based on viewed impressions over-estimates the value of display advertising [18]. As a result, the industry has adopted a click-to-conversion attribution model (C2C) in an attempt to diminish this over-estimation. ...
Conference Paper
Full-text available
Measuring the incremental value of advertising (incrementality) is critical for financial planning and budget allocation by advertisers. Running randomized controlled experiments is the gold standard in marketing incrementality measurement. Current literature and industry practices for running incrementality experiments focus on placebo, intention-to-treat (ITT), or ghost-bidding-based experiments. A fundamental challenge with these is that the serving engine, as treatment administrator, is not blind to the user's treatment assignment. Similarly, ITT and ghost-bidding solutions provide greatly decreased precision, since many experiment users never see ads. We present a novel randomized design for incrementality testing based on ghost bidding with improved measurement precision. Our design provides faster and cheaper results, including post-auction experiment execution that is double-blind to both the users and the serving engine, without ad-targeting bias. We also identify ghost impressions in open ad exchanges by matching the bidding values or ads sent to external auctions with held-out bid values. This design yields greater precision than ITT or current ghost-bidding solutions. Our proposed design has been fully deployed in a real production system within a commercial programmatic ad network combined with a Demand Side Platform (DSP) that places ad bids in third-party ad exchanges. We have found reductions of up to 85% in the advertiser budget needed to reach statistical significance with typical ghost-bid conversion and win rates. Moreover, the highest statistical power of the current practice's 50% control-size design is reached at 8% of our proposed design.
By deploying this design for an advertiser in the insurance industry, to measure the incrementality of display and native programmatic advertising, we have found conclusive evidence that the last-touch attribution framework (the current industry standard) undervalues these channels by 87% when compared to the incremental conversions derived from the experiment.
... Multi-touch attribution distributes the total value of the occurred conversion across several touchpoints within the observed customer journey (Shao & Li, 2011;Wooff & Anderson, 2015). Earlier studies discuss it in the context of a solely digital environment, which is related to the earlier availability of the digital data in comparison to offline touchpoints. ...
... Earlier studies discuss it in the context of a solely digital environment, which is related to the earlier availability of the digital data in comparison to offline touchpoints. Currently, multi-touch attribution summarises a variety of methods from the point of view of applied metrics, accounted channels and applied formulas of weight allocation to the touchpoints (Shao & Li, 2011;Wooff & Anderson, 2015). This capability leads to occasional confusion and the interchangeable use of the notion 'multi-touch' and the names of the specific cases of attribution, such as 'weighted attribution' (Wooff & Anderson, 2015). ...
... The application of Big Data enables the dynamic elaboration and adjustments of the standardised principles, leading to sophisticated mathematic models to be introduced (Larson & Chang, 2016). It can utilise linear or logistic regressions (Shao & Li, 2011;Wiesel et al., 2011) and incorporate analytical tools such as machine learning (Abhishek et al., 2012;Li & Kannan, 2014) and cooperative game theory to generate results (Abakus, 2013;Berman, 2018). The dependence of such methods on both data and sophisticated computational methods often leads to the interchangeable application of the terms 'algorithmic' and, sometimes, 'data-driven' attribution. ...
Article
The integration of technology into business strategy increases the complexity of marketing communications and urges the need for advanced marketing performance analytics. Rapid advancements in marketing attribution methods have created gaps in the systematic description of the methods and the explanation of their capabilities. This paper contrasts theoretically elaborated facilitators and the capabilities of data-driven analytics against the empirically identified classes of marketing attribution. It proposes a novel taxonomy, which serves as a tool for systematically naming and describing marketing attribution methods. The findings allow reflection on contemporary attribution methods' capabilities to account for the specifics of the customer journey, thereby creating the currently lacking theoretical backbone for advancing the accuracy of value attribution.
... Online advertising is an effective way for advertisers to reach their targeted audiences and drive conversions. Compared to a single ad exposure, sequential advertising [27] has a higher chance of cultivating consumers' awareness, interest and driving purchases in several steps through multiple scenarios. Fig. 1 shows an example of sequential advertising on a Gaming chair in two scenarios. ...
... To overcome these difficulties, there are several related works. Interpretable methods like multi-touch attribution (MTA) [12,27] focus on assigning credits to the previously displayed ads before the conversion, but they usually do not provide future strategy optimization. Performance-oriented methods such as deep reinforcement learning (DRL) usually aggregate the consumer's historical behaviors as an input of a black-box neural network and obtain the advertising action directly from the output of the network [4,5,7,9,13]. ...
... However, these methods are usually evaluated with simple tasks and impractical for realistic applications. Although MTA methods [12,27] know how each exposure contributes to the conversion, they do not model user latent states and cannot support online inference; thus, they cannot directly solve our problem. ...
Conference Paper
Full-text available
To drive purchases in online advertising, it is in the advertiser's great interest to optimize the sequential advertising strategy, whose performance and interpretability are both important. The lack of interpretability in existing deep reinforcement learning methods makes it difficult to understand, diagnose and further optimize the strategy. In this paper, we propose our Deep Intents Sequential Advertising (DISA) method to address these issues. The key to interpretability is understanding a consumer's purchase intent, which is, however, unobservable (a hidden state). In this paper, we model this intention as a latent variable and formulate the problem as a Partially Observable Markov Decision Process (POMDP) in which the underlying intents are inferred from the observable behaviors. Large-scale industrial offline and online experiments demonstrate our method's superior performance over several baselines. The inferred hidden states are analyzed, and the results prove the rationality of our inference.
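Inferring a hidden purchase intent from observable behaviors, as described above, follows the standard POMDP belief update: predict with the transition model, then correct with the observation likelihood. All states, behaviors, and probabilities below are hypothetical two-state illustrations, not DISA's learned parameters:

```python
# Hypothetical two-state intent model: 0 = browsing, 1 = ready to buy.
# All probabilities are assumed for illustration, not learned from data.
transition = [[0.8, 0.2],   # P(next intent | current intent = browsing)
              [0.1, 0.9]]   # P(next intent | current intent = ready)
# P(observed behavior | intent), for behaviors "view" and "add_to_cart".
emission = {"view": [0.9, 0.4], "add_to_cart": [0.1, 0.6]}

def update(belief, obs):
    """One POMDP belief update: predict via transitions, correct via the observation."""
    predicted = [sum(belief[s] * transition[s][s2] for s in range(2))
                 for s2 in range(2)]
    unnorm = [predicted[s2] * emission[obs][s2] for s2 in range(2)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

belief = [0.9, 0.1]  # prior: most users start out just browsing
for obs in ["view", "view", "add_to_cart"]:
    belief = update(belief, obs)
```

After the add-to-cart observation the belief mass shifts to the "ready to buy" state, which is the kind of interpretable latent signal the paper argues black-box DRL policies lack.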
... The main novelty of previous work in this category comes from the following two perspectives. The majority of research focuses on proposing new models to describe user behavior (e.g., Shao & Li 2011, Breuer et al. 2011, Danaher & van Heerde 2018, Xu et al. 2014, Zhao et al. 2019), while the others focus on studying attribution scoring methods such as the justification and fairness of the Shapley value (e.g., Dalessandro et al. 2012, Singal et al. 2022). ...
... As a result, the change in the conversion probability is also known as the removal effect. In the past decade, researchers have developed a variety of models to describe consumer behavior, such as regression models (e.g., Shao & Li 2011, Breuer et al. 2011, Danaher & van Heerde 2018, Zhao et al. 2019), Markov models (Yang & Ghose 2010, Anderl et al. 2016, Berman 2018, Kakalejčík et al. 2018), Bayesian models (e.g., Li & Kannan 2014), time series models (Kireyev et al. 2016, De Haan et al. 2016), survival theory-based models (Zhang et al. 2014, Ji et al. 2016), deep learning models (Li et al. 2018, Kumar et al. 2020), and so on. The main novelty of previous work in this line comes from modeling user behavior. ...
Preprint
Marketers employ various online advertising channels to reach customers, and they are particularly interested in attribution for measuring the degree to which individual touchpoints contribute to an eventual conversion. The availability of individual customer-level path-to-purchase data and the increasing number of online marketing channels and types of touchpoints bring new challenges to this fundamental problem. We aim to tackle the attribution problem with finer granularity by conducting attribution at the path level. To this end, we develop a novel graphical point process framework to study the direct conversion effects and the full relational structure among numerous types of touchpoints simultaneously. Utilizing the temporal point process of conversion and the graphical structure, we further propose graphical attribution methods to allocate proper path-level conversion credit, called the attribution score, to individual touchpoints or corresponding channels for each customer's path to purchase. Our proposed attribution methods consider the attribution score as the removal effect, and we use the rigorous probabilistic definition to derive two types of removal effects. We examine the performance of our proposed methods in extensive simulation studies and compare their performance with commonly used attribution models. We also demonstrate the performance of the proposed methods in a real-world attribution application.
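A common concrete instance of the removal effect discussed above is the first-order Markov attribution model: conversion probability is computed on the journey graph, and a channel's credit is the relative drop in that probability when the channel is switched off (redirected to a null state). The toy graph and transition probabilities below are illustrative only:

```python
# Toy first-order Markov journey graph. Non-absorbing states: START plus two
# channels; CONV and NULL are absorbing. Transition probabilities are invented.
graph = {
    "START":   {"search": 0.6, "display": 0.4},
    "search":  {"display": 0.3, "CONV": 0.4, "NULL": 0.3},
    "display": {"search": 0.2, "CONV": 0.1, "NULL": 0.7},
}

def p_conv(g, iters=200):
    """P(reaching CONV from START), by fixed-point iteration on the graph."""
    p = {s: 0.0 for s in g}
    for _ in range(iters):
        for s, nxt in g.items():
            p[s] = sum(q * (1.0 if t == "CONV" else p.get(t, 0.0))
                       for t, q in nxt.items())
    return p["START"]

def removal_effect(channel):
    """Relative drop in conversion probability when `channel` is removed."""
    g = {}
    for s, nxt in graph.items():
        if s == channel:
            continue
        g[s] = {}
        for t, q in nxt.items():          # redirect transitions into the
            t2 = "NULL" if t == channel else t  # removed channel to NULL
            g[s][t2] = g[s].get(t2, 0.0) + q
    return 1.0 - p_conv(g) / p_conv(graph)
```

On this graph, removing `search` collapses most of the conversion probability, so its removal effect (and hence its credit share) far exceeds `display`'s.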
... Customer journey mapping improves these interactions, resulting in increased sales [13]. Scholars have proposed a variety of frameworks, including logistic regression models [14], game theory-based approaches [15], Bayesian models [16], mutually exciting point process models [17], VAR models [18], and hidden Markov models [19]. However, despite its great potential, our knowledge of predicting online consumer behavior using machine learning is not sufficient. ...
... In the marketing literature, the conversion funnel model has been studied as an important theory for understanding consumer decision-making and behavior [7][8][9], and is used as a core framework for marketing decision-making [3]. Researchers have tried to understand consumer behavior through various transformations in the basic funnel model structure such as attention, interest, decision, and purchase [14]. In recent years, online data has been used to more easily identify the funnel stage of online consumers compared to offline [3,28]. ...
Article
Full-text available
Machine learning technology is recently being applied to various fields. However, in the field of online consumer conversion, research is limited despite the high possibility of machine learning application due to the availability of big data. In this context, we investigate the following three research questions. First, what is the suitable machine learning model for predicting online consumer behavior? Second, what is the good data sampling method for predicting online consumer behavior? Third, can we interpret machine learning’s online consumer behavior prediction results? We analyze 374,749 online consumer behavior data from Google Merchandise Store, an online shopping mall, and explore research questions. As a result of the empirical analysis, the performance of the ensemble model eXtreme Gradient Boosting model is most suitable for predicting purchase conversion of online consumers, and oversampling is the best method to mitigate data imbalance bias. In addition, by applying explainable artificial intelligence methods to the context of retargeting advertisements, we investigate which consumers are effective in retargeting advertisements. This study theoretically contributes to the marketing and machine learning literature by exploring and answering the problems that arise when applying machine learning models to predicting online consumer conversion. It also contributes to the online advertising literature by exploring consumer characteristics that are effective for retargeting advertisements.
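The oversampling the abstract recommends for mitigating data-imbalance bias can be sketched as plain random oversampling of the minority (converted) class. The toy labels below stand in for real conversion logs, and this simple duplication scheme is only one of several oversampling variants:

```python
import random

random.seed(0)

# Toy conversion log: 400 sessions, ~5% converted (label 1). Illustrative only.
data = [(i, 1 if i % 20 == 0 else 0) for i in range(400)]  # (features, label)

pos = [d for d in data if d[1] == 1]
neg = [d for d in data if d[1] == 0]

# Random oversampling: duplicate minority-class rows (with replacement)
# until both classes are the same size, then shuffle before training.
extra = [random.choice(pos) for _ in range(len(neg) - len(pos))]
balanced = data + extra
random.shuffle(balanced)
```

The oversampled set should only be used for training; evaluation must stay on the original, imbalanced distribution.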
... The root cause of this conflict is that the industry-standard attribution model does not attribute actions fairly to the DSPs. Researchers have pointed out that an action should be attributed to multiple touch points in a data driven fashion (Shao and Li 2011) or based on causal lift (Dalessandro et al. 2012b). In the Attribution and Bidding section, we discuss the relationship between attribution models and bidding strategies. ...
... So they still bid based on the absolute AR. Several data-driven multi-touch attribution methods have been proposed in recent years (Shao and Li 2011;Dalessandro et al. 2012b;Wooff and Anderson 2014). Our focus in this paper is more on illustrating the relationship between bidding strategies and attribution models than on a specific attribution method. ...
Article
Real-time bidding has become one of the largest online advertising markets in the world. Today the bid price per ad impression is typically decided by the expected value of how it can lead to a desired action event to the advertiser. However, this industry standard approach to decide the bid price does not consider the actual effect of the ad shown to the user, which should be measured based on the performance lift among users who have been or have not been exposed to a certain treatment of ads. In this paper, we propose a new bidding strategy and prove that if the bid price is decided based on the performance lift rather than absolute performance value, advertisers can actually gain more action events. We describe the modeling methodology to predict the performance lift and demonstrate the actual performance gain through blind A/B test with real ad campaigns. We also show that to move the demand-side platforms to bid based on performance lift, they should be rewarded based on the relative performance lift they contribute.
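A minimal sketch of the lift-based pricing idea in this abstract (the function names and example numbers are illustrative, not taken from the paper): the bid is tied to the incremental conversion probability the ad adds, rather than the absolute conversion probability.

```python
def expected_value_bid(p_convert_exposed: float, value_per_action: float) -> float:
    """Industry-standard bid: price the impression by the user's
    absolute conversion probability."""
    return p_convert_exposed * value_per_action


def lift_based_bid(p_convert_exposed: float, p_convert_unexposed: float,
                   value_per_action: float) -> float:
    """Lift-based bid: price the impression by the incremental
    probability the ad adds over not showing it (floored at zero)."""
    lift = max(p_convert_exposed - p_convert_unexposed, 0.0)
    return lift * value_per_action


# A user who would likely convert anyway earns a much lower lift-based bid.
print(expected_value_bid(0.05, 100.0))    # 5.0
print(lift_based_bid(0.05, 0.04, 100.0))  # ~1.0
```

Under such a rule, impressions to users who would convert regardless receive little budget, which is the mechanism behind the claimed gain in total action events.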
... It minimizes the bias in the touchpoint as well as the time gap. [3] A probabilistic logistic regression model was applied to bivariate data (positively and negatively converting users) from a multi-channel advertising campaign, yielding classification accuracy for each individual channel and quantifying each channel's attribution. [4] The Probabilistic Multi-Touch Attribution (PMTA) model was applied to a large real-world data set. ...
... While implementing the Random Forest model, parameters such as the number of estimators (10, 25, 50, 100), the split criterion (Gini, entropy), maximum depth, minimum sample split, and minimum impurity decrease were considered. While implementing the K-Neighbors Classifier model, parameters such as the number of neighbors (2, 3, 5), the weighting scheme (uniform, distance), the neighbor-search algorithm (auto, ball_tree, kd_tree, brute), leaf size, and distance metric were considered. While implementing the SVM classifier model, parameters such as the kernel (linear, poly, rbf, sigmoid, precomputed), gamma (scale, auto), cache size, and class weight were considered. ...
... An alternative approach is to observe a corpus of data on touchpoints and conversions, then train a model to determine how much weight to assign each touchpoint. Shao and Li (2011) [12] develop a multi-touch attribution model of this type. They employ a bagged logistic regression approach in which they train sub-models on subsets of the data, validate misclassification rates against holdout data, then aggregate up into a final model. ...
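The bagging scheme described here can be sketched as follows (a toy illustration on synthetic data, not the authors' implementation; the bag count, sampling fraction, and learning rate are arbitrary):

```python
import math
import random


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Plain logistic regression trained by stochastic gradient ascent
    on the log-likelihood; the last column of X acts as the bias."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi):
                w[j] += lr * (yi - p) * xj
    return w


def bagged_logistic(X, y, n_bags=20, sample_frac=0.8, seed=0):
    """Fit sub-models on bootstrap samples, then average their
    coefficients for a more stable per-channel contribution estimate."""
    rng = random.Random(seed)
    n, k = len(X), int(sample_frac * len(X))
    coefs = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(k)]
        coefs.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    return [sum(c[j] for c in coefs) / n_bags for j in range(len(X[0]))]


# Toy data: channel 0 drives conversion, channel 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 1), data_rng.randint(0, 1), 1.0] for _ in range(300)]
y = [1 if x[0] == 1 and data_rng.random() < 0.8
     else (1 if data_rng.random() < 0.1 else 0) for x in X]
w = bagged_logistic(X, y)
print(w)  # the channel-0 weight should clearly dominate channel 1's
```

Averaging across bootstrap fits is what damps the coefficient variance, which is the stability property the cited work emphasizes.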
Preprint
Full-text available
Subscription services face a difficult problem when estimating the causal impact of content launches on acquisition. Customers buy subscriptions, not individual pieces of content, and once subscribed they may consume many pieces of content in addition to the one(s) that drew them to the service. In this paper, we propose a scalable methodology to estimate the incremental acquisition impact of content launches in a subscription business model when randomized experimentation is not feasible. Our approach uses simple assumptions to transform the problem into an equivalent question: what is the expected consumption rate for new subscribers who did not join due to the content launch? We estimate this counterfactual rate using the consumption rate of new subscribers who joined just prior to launch, while making adjustments for variation related to subscriber attributes, the in-product experience, and seasonality. We then compare our counterfactual consumption to the actual rate in order to back out an acquisition estimate. Our methodology provides top-line impact estimates at the content / day / region grain. Additionally, to enable subscriber-level attribution, we present an algorithm that assigns specific individual accounts to add up to the top-line estimate. Subscriber-level attribution is derived by solving an optimization problem to minimize the number of subscribers attributed to more than one piece of content, while maximizing the average propensity to be incremental for subscribers attributed to each piece of content. Finally, in the absence of definitive ground truth, we present several validation methods which can be used to assess the plausibility of impact estimates generated by these methods.
... Several data-driven approaches are proposed in literature. In Shao and Li (2011) the authors propose a logistic regression method to predict the conversion rate with respect to advertisement occurrences. The authors in Zhang et al. (2014) propose data-driven MTA with survival theory, but do not personalise MTA since they neglect user characteristics. ...
... • Logistic regression (LR) approach presented in Shao and Li (2011), where attributions of each channel is computed using logistic regression. ...
Preprint
Advertising channels have evolved from conventional print media, billboards and radio advertising to online digital advertising (ad), where the users are exposed to a sequence of ad campaigns via social networks, display ads, search etc. While advertisers revisit the design of ad campaigns to concurrently serve the requirements emerging out of new ad channels, it is also critical for advertisers to estimate the contribution from touch-points (view, clicks, converts) on different channels, based on the sequence of customer actions. This process of contribution measurement is often referred to as multi-touch attribution (MTA). In this work, we propose CAMTA, a novel deep recurrent neural network architecture which is a causal attribution mechanism for user-personalised MTA in the context of observational data. CAMTA minimizes the selection bias in channel assignment across time-steps and touchpoints. Furthermore, it utilizes the users' pre-conversion actions in a principled way in order to predict per-channel attribution. To quantitatively benchmark the proposed MTA model, we employ the real world Criteo dataset and demonstrate the superior performance of CAMTA with respect to prediction accuracy as compared to several baselines. In addition, we provide results for budget allocation and user-behaviour modelling on the predicted channel attribution.
... Our research is inspired by the multi-touch attribution (MTA) literature (Shao and Li 2011, Ji et al. 2016, Li et al. 2018). This stream of work aims at designing machine learning algorithms that assign fractional credits to different touchpoints from the advertiser's point of view. ...
... Our notion of marginality is temporal, i.e. we are quantifying the marginal contribution with respect to the previous actions. This temporal marginality approach is in line with several works on the additivity effect of ads (Shao and Li 2011, Ji and Wang 2017, Zhang et al. 2014). This is different than a counterfactual marginality, used for instance in Danaher (2018), Anderl et al. (2016), which quantifies the marginal contribution with respect to an alternative action. ...
Preprint
Full-text available
Attribution-the mechanism that assigns conversion credits to individual marketing interactions-is one of the foremost digital marketing research topic. Indeed, attribution is used everywhere in online marketing whether for budget decisions or for the design of algorithmic bidders that the advertiser relies on to buy inventory. Still, an attribution definition over which everyone in the marketing community agrees upon is yet to be found. In this paper, we focus on the problem faced by a bidder who is subject to an attribution black box decided by the advertiser and needs to convert it into a bid. This naturally introduces a second level of attribution, performed internally by the bidder. We first formalize an often implicitly used solution concept for this task. Second, we characterize the solution of this problem which we call the core internal attribution. Moreover, we show it can be computed using a fixed-point method. Interestingly, the core attribution is based on a notion of marginality that differs from previously used definitions such as the counterfactual marginality.
... At the same time, the marketing industry has developed sophisticated tools to measure the impact of online advertising on the consumer journey and, finally, empowered marketers, who can now collect data about user online behavior in almost real time (Hanssens & Pauwels, 2016; Wedel & Kannan, 2016). Conversion attribution models allow marketers to assign the impact of particular advertising activities to marketing campaign goals (Shao & Li, 2011; Danaher & van Heerde, 2018). Currently, more money is spent on online advertising than on TV, radio, and press put together (Molla, 2018). ...
... Marketers seek and prefer solutions that allow the creation of daily reports which are based on day-to-day budget management (Shao & Li, 2011). Dalessandro et al. (2012) state that proper conversion attribution models must be: • fair-all channels must be taken under consideration and show a proper impact on the final conversion, • data-driven-a valuable conversion attribution model should be designed for advertising campaign goals and assess both consumer reaction to advertisements and data on conversions from the campaign, • interpretable-it should be widely accepted by practitioners involved in the marketing industry; acceptance should arise on the basis of the gained metrics and an intuitive understanding of model rules. ...
Article
Full-text available
Marketers are currently focused on proper budget allocation to maximize ROI from online advertising. They use conversion attribution models assessing the impact of specific media channels (display, search engine ads, social media, etc.). Marketers use the data gathered from paid, owned, and earned media and do not take into consideration customer activities in category media, which are covered by the OPEC (owned, paid, earned, category) media model that the author of this paper proposes. The aim of this article is to provide a comprehensive review of the scientific literature related to the topic of marketing attribution for the period of 2010-2019 and to present the theoretical implications of not including the data from category media in marketers' analyses of conversion attribution. The results of the review and the analysis provide information about the development of the subject, the popularity of particular conversion attribution models, the ideas of how to overcome obstacles that result from data being absent from analyses. Also, a direction for further research on online consumer behavior is presented.
... Attribution credit in these models is based on the change in conversion probabilities when a channel is removed from the chain. [4] and [11] are similar in depending only on the sequence of ad events, but they each fit logistic regression models for a binary conversion outcome. They then use Shapley values to distribute attribution credit, using the probability of a conversion as the value function in the Shapley algorithm. ...
... Other methodologies, such as the popular Shapley value method used in [4,11,6], split this synergy evenly amongst the ads involved. Shapley values come from game theory and try to fairly divide the payoff (increase in intensity) for a coalition (group of ads) amongst the players (individual ads). ...
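The Shapley computation referred to here can be sketched directly from its definition (the channel names and the value function below are hypothetical; with many channels one would sample coalitions rather than enumerate them):

```python
from itertools import combinations
from math import factorial


def shapley_attribution(channels, value):
    """Exact Shapley values; `value` maps a frozenset of channels to the
    conversion probability that coalition achieves together."""
    n = len(channels)
    shap = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        total = 0.0
        for r in range(n):
            for coalition in combinations(others, r):
                s = frozenset(coalition)
                # Weight of this coalition size in the Shapley average.
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                total += weight * (value(s | {ch}) - value(s))
        shap[ch] = total
    return shap


# Hypothetical value function: search alone converts 10%, display alone 5%,
# both together 20% -- a 5-point synergy that Shapley splits evenly.
v = {frozenset(): 0.0,
     frozenset({"search"}): 0.10,
     frozenset({"display"}): 0.05,
     frozenset({"search", "display"}): 0.20}
shap = shapley_attribution(["search", "display"], lambda s: v[s])
print(shap)  # ≈ search 0.125, display 0.075 (sums to 0.20)
```

Note how the 0.05 synergy is split evenly between the two channels, which is exactly the behavior the excerpt above describes.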
Preprint
Full-text available
Multi-touch attribution (MTA) estimates the relative contributions of the multiple ads a user may see prior to any observed conversions. Increasingly, advertisers also want to base budget and bidding decisions on these attributions, spending more on ads that drive more conversions. We describe two requirements for an MTA system to be suitable for this application: First, it must be able to handle continuously updated and incomplete data. Second, it must be sufficiently flexible to capture that an ad's effect will change over time. We describe an MTA system, consisting of a model for user conversion behavior and a credit assignment algorithm, that satisfies these requirements. Our model for user conversion behavior treats conversions as occurrences in an inhomogeneous Poisson process, while our attribution algorithm is based on iteratively removing the last ad in the path.
... These models offer significant advantages in understanding channel contributions but are increasingly limited by gaps in path data caused by stricter data privacy regulations and restrictions on third-party tracking (Romero Leguina et al. 2020). In response, researchers have proposed alternative solutions, such as logistic regression models for predictive accuracy (Shao and Li 2011), causal methods leveraging observational data (Dalessandro et al. 2012) and multilevel models to analyse semi-continuous data over extended periods (Blozis 2022). Despite these advancements, the quest for robust attribution methodologies continues, as businesses grapple with the complexities of measuring the effectiveness of marketing efforts in an evolving digital landscape. ...
Article
Full-text available
This study addresses a fundamental challenge in digital advertising: how performance-focused campaigns, traditionally aimed at lower-funnel activity, may influence brand awareness. With the growth of online advertising, businesses increasingly rely on performance campaigns across social media, search engine marketing and display networks to drive results. However, a notable phenomenon has emerged. Businesses may observe substantial branded search activity, suggesting lower-funnel campaigns could indirectly shape brand awareness. To investigate this relationship, data were analysed from an e-commerce business operating in five European countries with an annual multimillion-euro lower-funnel budget. We employed a ridge regression model with adstock values to account for temporal advertising effects and address common challenges of multicollinearity in multichannel analysis. Our analysis reveals that lower-funnel campaigns significantly drive branded search activity, suggesting they play an unacknowledged role in building brand awareness. The incorporation of adstock values into ridge regression improves model performance, increasing the R² from 0.658 to 0.768 and lowering the root mean square error. This approach stabilizes residual variance and provides a more accurate representation of advertising’s fading effects over time. These findings offer a replicable framework for practitioners to measure the indirect branding effects of performance campaigns and optimise cross-channel investments.
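The adstock transformation used in that analysis is commonly implemented as a geometric carry-over; a minimal sketch (the decay rate here is illustrative, and in practice it would be fit or grid-searched per channel):

```python
def adstock(spend, decay=0.5):
    """Geometric adstock: each period retains `decay` of the previous
    period's accumulated advertising pressure."""
    out, carry = [], 0.0
    for s in spend:
        carry = s + decay * carry
        out.append(carry)
    return out


# A single burst of spend fades gradually instead of vanishing, which is
# what lets a regression on adstocked inputs capture lagged ad effects.
print(adstock([100, 0, 0, 0], decay=0.5))  # [100.0, 50.0, 25.0, 12.5]
```

Regressing outcomes on the adstocked series rather than raw spend is what produces the improved fit the abstract reports.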
... Data-driven models are highly adaptable and can provide personalized insights, though they require large datasets and computational power. [5] C. Machine Learning and Advanced Algorithms in MTA: Machine learning algorithms have been increasingly used to enhance the accuracy and scalability of MTA models. Techniques such as logistic regression, Markov chain models, and neural networks are particularly effective for handling the non-linear and complex nature of customer journeys. ...
Article
Multi-touch attribution (MTA) offers a sophisticated, data-driven approach to distributing credit across touchpoints within the customer journey [1] [2], thereby enabling more precise insights into channel and campaign performance. However, implementing MTA at scale for a global enterprise's sales and marketing team presents unique challenges, including the need to integrate disparate data sources, handle vast data volumes, apply advanced attribution models, and provide insights in real-time. This paper presents the design and implementation of a scalable MTA framework tailored to the needs of a global sales and marketing team, leveraging cloud-native infrastructure, big data processing, and machine learning algorithms to achieve accurate and actionable insights at scale. The proposed MTA solution addresses data heterogeneity by creating a unified data pipeline that integrates structured and unstructured data from CRM systems, web analytics, social media, paid media platforms, and offline sources. Keywords: Multi-touch attribution, sales and marketing, data integration, machine learning, cloud computing, enterprise analytics.
... More recent attribution models are data driven. These models leverage advanced analytics and machine learning algorithms to dynamically allocate credit based on patterns and trends in the data (Shao & Li, 2011). A significant challenge in attribution is the issue of cross-platform and cross-device tracking (Brookman et al., 2017). ...
Article
Full-text available
Marketing professionals and business owners strive to evaluate the effectiveness of their marketing investments. With multiple marketing channels at their disposal, understanding how these channels interact and influence each other is crucial. Digital analytics tools, such as Google Analytics, tend to measure the isolated success of each marketing channel. However, the intertwined effects and interdependencies between channels are often undervalued. This study, therefore, ventures into this territory. It focuses on the association between website traffic from various digital marketing channels and the purchases made by users visiting websites through direct traffic sources. We analyzed 89,394 purchases from an e-commerce business in Europe. We conclude that three marketing channels can explain 61% of the variance. By shedding light on this overlooked aspect, we aim to guide advertisers toward a more holistic understanding of digital marketing channels.
... To quantify the impact of various marketing touchpoints on the first purchase decision, we propose a two-stage attribution model: • Shapley Value Attribution: Implement a game theory-based approach to fairly allocate credit among marketing touchpoints, accounting for all possible combinations of interactions [10]. • Time-Decay Adjustment: Apply a time-decay factor to the Shapley values to account for the recency of interactions, based on the assumption that more recent touchpoints have a stronger influence on the final purchase decision [11]. ...
Article
In the era of digital commerce, understanding the consumer’s journey to their first purchase is crucial for businesses seeking to optimize their marketing strategies and improve conversion rates. This paper presents a comprehensive statistical framework for modeling and analyzing the consumer’s decision pathway in online marketplaces.
... Attribution Modeling As marketing activities expand across online and offline channels, accurate attribution of their impact on outcomes like conversions becomes increasingly important (Farahat & Cunningham, 2012). Predictive algorithms including Last-Click, Time-Decay and Markov chain models analyze past customer journey data and experimentally assign credit to the multiple touchpoints influencing conversions (Shao & Li, 2011). Industry research shows these machine learning attribution techniques more accurately partition effects versus manual heuristics, commonly outperforming on evaluation metrics such as Adjusted R-Squared and log-loss (He et al., 2014). ...
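The Markov-chain family of models mentioned here is usually scored by removal effects: how much total conversion drops when a channel is taken out. Below is a simplified path-level version of that idea (not the full transition-matrix computation; the journeys and flags are invented for illustration):

```python
def conversion_rate(paths):
    """Fraction of observed user paths that end in a conversion."""
    return sum(1 for _, converted in paths if converted) / len(paths)


def removal_effects(paths, channels):
    """Relative drop in overall conversion when every path touching a
    channel is treated as non-converting."""
    base = conversion_rate(paths)
    return {ch: (base - conversion_rate(
                [(p, c and ch not in p) for p, c in paths])) / base
            for ch in channels}


# Invented journeys: (sequence of channels, converted?)
paths = [(("search", "display"), True),
         (("search",), True),
         (("display",), False),
         (("email",), True),
         (("display", "email"), False)]
effects = removal_effects(paths, ["search", "display", "email"])
print(effects)  # ≈ search 0.67, display 0.33, email 0.33
```

Credit is then typically assigned to each channel in proportion to its removal effect.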
Article
Full-text available
In the modern business landscape, data and analytics are playing an increasingly pivotal role in decision making across all sectors. Small and medium sized enterprises (SMEs) constitute a major portion of the economic fabric in the United States, yet many struggle with limited resources and an inability to leverage insights from customer and market data at their disposal. This study seeks to explore how SMEs operating in various industries across the US can optimize their marketing strategies through the application of predictive analytics techniques. By focusing on identifying patterns and trends in structured and unstructured data related to areas such as customer behavior, competitors, and industry shifts, SMEs stand to gain actionable recommendations for improving key metrics like sales, customer retention, and profitability. The findings of this research have implications for SME leadership teams seeking data-driven approaches to gain competitive advantage in a digital era defined by information abundance.
... The use of ML in marketing has been fertile over the last decades, given the wealth of data naturally produced by marketing and the capacity of ML tools/methods to transform data into information that can be used for marketing optimization and decision making. ML solutions have been proposed and/or applied to multiple domains, for instance: customer segmentation in industries such as retail [13,31], hospitality [1,41] and banking [32,42]; customer lifetime value (LTV) [18] in industries ranging from gaming [11], to e-commerce [21], to subscription services [27]; marketing attribution, in both multi-touch attribution [37] and marketing mix modelling [22]; the optimization of different marketing channels, such as email [30,38], paid search bidding [15,17], display advertising [35,39], among others. For a comprehensive review of the use of ML in marketing, we recommend the work presented in [25,28]. ...
Preprint
Full-text available
Computational marketing has become increasingly important in today's digital world, facing challenges such as massive heterogeneous data, multi-channel customer journeys, and limited marketing budgets. In this paper, we propose a general framework for marketing AI systems, the Neural Optimization with Adaptive Heuristics (NOAH) framework. NOAH is the first general framework for marketing optimization that considers both to-business (2B) and to-consumer (2C) products, as well as both owned and paid channels. We describe key modules of the NOAH framework, including prediction, optimization, and adaptive heuristics, providing examples for bidding and content optimization. We then detail the successful application of NOAH to LinkedIn's email marketing system, showcasing significant wins over the legacy ranking system. Additionally, we share details and insights that are broadly useful, particularly on: (i) addressing delayed feedback with lifetime value, (ii) performing large-scale linear programming with randomization, (iii) improving retrieval with audience expansion, (iv) reducing signal dilution in targeting tests, and (v) handling zero-inflated heavy-tail metrics in statistical testing.
... For context, mix modeling allows businesses to understand the contribution of marketing and various internal and external factors on their sales (or other business objective) at an aggregate level, typically a daily or weekly time series [19][20][21]. Attribution, in contrast, helps marketers understand the detailed interaction between digital media marketing and some conversion, such as sales [22][23][24][25][26][27][28][29][30]. While attribution is much more granular and precise, it requires data to be stitched to individual users, which is only possible for some digital media marketing and is becoming increasingly difficult with the removal of support for third party cookies. ...
Article
Artificial intelligence (AI) is widely deployed to solve problems related to marketing attribution and budget optimization. However, AI models can be quite complex, and it can be difficult to understand model workings and insights without extensive implementation teams. In principle, recently developed large language models (LLMs), like GPT-4, can be deployed to provide marketing insights, reducing the time and effort required to make critical decisions. In practice, there are substantial challenges that need to be overcome to reliably use such models. We focus on domain-specific question-answering, SQL generation needed for data retrieval, and tabular analysis and show how a combination of semantic search, prompt engineering, and fine-tuning can be applied to dramatically improve the ability of LLMs to execute these tasks accurately. We compare both proprietary models, like GPT-4, and open-source models, like Llama-2-70b, as well as various embedding methods. These models are tested on sample use cases specific to marketing mix modeling and attribution.
... Attribution credit in these models is based on the change in conversion probabilities when a channel is removed from the chain. Dalessandro et al. (2012) and Shao and Li (2011) are similar in depending only on the sequence of ad events, but they each fit logistic regression models for a binary conversion outcome. They then use Shapley values to distribute attribution credit, using the probability of a conversion as the value function in the Shapley algorithm. ...
Article
Full-text available
Multi-touch attribution (MTA) estimates the relative contributions of the multiple ads a user may see prior to any observed conversions. Increasingly, advertisers also want to base budget and bidding decisions on these attributions, spending more on ads that drive more conversions. We describe two requirements for an MTA system to be suitable for this application: First, it must be able to handle continuously updated and incomplete data. Second, it must be sufficiently flexible to capture that an ad’s effect will change over time. We describe an MTA system, consisting of a model for user conversion behavior and a credit assignment algorithm, that satisfies these requirements. Our model for user conversion behavior treats conversions as occurrences in an inhomogeneous Poisson process, while our attribution algorithm is based on iteratively removing the last ad in the path.
... To predict a certain customer action, researchers used historical interaction data to attempt to predict attribution. The authors (Shao and Li, 2011) suggest a bagged logistic regression model to find the best-fitting predictive model, and then apply basic probabilistic models (second- or higher-order models) to assign credit for the outcome. Later, Dalessandro et al. (2012), analogously, suggested a data-driven causal approach, which compares the outcome with a touchpoint to the outcome without it, all else being equal. ...
Conference Paper
Full-text available
Marketers face significant obstacles as technology evolves and gets more complex. Attribution is one of the most important research problems in the field of marketing and studying it may assist media investment optimization strategies, as well as understand customer behavior across channels and platforms. This article provides a historical bibliometric analysis of Multi-Channel Attribution (MCA). A thorough study of the 156 (one hundred and fifty-six) papers aided in identifying the state-of-the-art methodologies being explored by researchers, as well as the most relevant authors, sources, and nations. Additionally, co-citation analyses were conducted in order to establish the conceptual and intellectual network.
... The MTA library provides modules for analysing advertising channels with models such as those of Shao & Li, who propose several statistical models for attributing value based on pre-collected data: a bagged logistic regression model and a simple probabilistic model [6]. In addition to the Shao & Li models, the authors also chose to work with a Shapley-value approach [7]. ...
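The "simple probabilistic model" named here is commonly summarized as an empirical per-channel conversion rate plus a pairwise interaction correction. The sketch below follows that summary (the example rows are invented, and this is an illustration rather than the library's implementation):

```python
from itertools import combinations


def second_order_attribution(users):
    """Credit each channel with P(y|x_i) plus half the averaged pairwise
    interaction terms P(y|x_i,x_j) - P(y|x_i) - P(y|x_j)."""
    channels = sorted({ch for chans, _ in users for ch in chans})

    def rate(seen):
        # Empirical conversion rate among users exposed to all of `seen`.
        hit = [conv for chans, conv in users if seen <= chans]
        return sum(hit) / len(hit) if hit else 0.0

    single = {ch: rate({ch}) for ch in channels}
    pair = {frozenset(c): rate(set(c)) for c in combinations(channels, 2)}
    credit = {}
    for ch in channels:
        others = [o for o in channels if o != ch]
        if not others:
            credit[ch] = single[ch]
            continue
        interaction = sum(pair[frozenset((ch, o))] - single[ch] - single[o]
                          for o in others)
        credit[ch] = single[ch] + interaction / (2 * len(others))
    return credit


# Invented rows: (set of channels the user saw, converted flag)
users = [({"search", "display"}, 1), ({"search"}, 1),
         ({"display"}, 0), ({"search"}, 0)]
print(second_order_attribution(users))  # ≈ search 0.583, display 0.417
```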
... The growth in digital marketing and the ease of data collection has opened up the possibility of building data-driven models for attribution. The first data driven approach to solving this problem [5] involved frequency based methods to compute the relative credits for different marketing channels. A causal framework for attribution was proposed in [6]. ...
Preprint
Full-text available
In a multi-channel marketing world, the purchase decision journey encounters many interactions (e.g., email, mobile notifications, display advertising, social media, and so on). These impressions have direct (main effects), as well as interactive influence on the final decision of the customer. To maximize conversions, a marketer needs to understand how each of these marketing efforts individually and collectively affect the customer's final decision. This insight will help her optimize the advertising budget over interacting marketing channels. This problem of interpreting the influence of various marketing channels to the customer's decision process is called marketing attribution. We propose a Bayesian model of marketing attribution that captures established modes of action of advertisements, including the direct effect of the ad, decay of the ad effect, interaction between ads, and customer heterogeneity. Our model allows us to incorporate information from customer's features and provides usable error bounds for parameters of interest, like the ad effect or the half-life of an ad. We apply our model on a real-world dataset and evaluate its performance against alternatives in simulations.
... For example, time decay (distributing credit based on the assumption that the further back a touchpoint was interacted with, the less important it was) and linear attribution (credit spread evenly across all channels). Shao & Li (2011) state that the goal of attribution is to pinpoint the credit assignment of each positive user to one or more touchpoints, and suggest that attribution modelling should be easy to interpret and be used to derive insights for businesses to optimise their marketing strategies. They also stress the criticality of choosing the right attribution model, as it drives the performance metric, produces insights regarding advertising and helps to optimise marketing strategies. ...
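The two heuristics mentioned here, linear and time-decay attribution, can be stated in a few lines (channel names, day gaps, and the half-life are illustrative; the linear version assumes each channel appears once per path):

```python
def linear_attribution(path):
    """Even credit across every touchpoint on a converting path."""
    share = 1.0 / len(path)
    return {ch: share for ch in path}


def time_decay_attribution(path, half_life=7.0):
    """Credit halves for every `half_life` days between the touchpoint
    and the conversion, then is normalized to sum to one."""
    weights = [0.5 ** (days_before / half_life) for _, days_before in path]
    total = sum(weights)
    return {ch: w / total for (ch, _), w in zip(path, weights)}


print(linear_attribution(["search", "display", "email"]))
print(time_decay_attribution([("search", 14), ("display", 7), ("email", 0)]))
# in the decayed example, the email touch closest to conversion gets 4/7
```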
Conference Paper
Full-text available
This paper presents an exploration of market attribution methods and the integration of user behaviour. Attribution is the measurement of interaction between marketing touchpoints and channels along the customer journey, improving customer insights and driving smarter business decisions. Improving the accuracy of attribution requires a deeper understanding of user behaviour, not just marketing channel credit assignment. Evidence has been provided regarding the problems in the standardized approach to behavioural modelling and alternatives have been presented. The study explores data provided by a British based jewellery company with an investigation into pre-existing data features that can aid with the analysis of user behaviour. The study contains over 10 million rows collected over 2 years and presents the initial findings made in the first 15 months of a PhD study.
... Early methods to assign credit to various channels for conversions are found in the computer science literature. For example, Shao and Li (2011) use classification methods to determine whether consumers are going to convert as a result of a specific promotional campaign. ...
Article
In today’s online environment, consumers and sellers interact through multiple channels such as email, search engines, banner ads, affiliate websites and comparison-shopping websites. In this paper, we investigate whether knowing the history of channels the consumer has used until a point of time is predictive of their future visit patterns and purchase conversions. We propose a model in which future visits and conversions are stochastically dependent on the channels a consumer used on their path up to a point. Salient features of our model are: (1) visits by consumers are allowed to be clustered, which enables separation of their visits into intra- and inter-session components, (2) interaction effects between channels where prior visits and conversions from channels impact future inter-session visits, intra-session visits and conversions through a latent variable reflecting the cumulative weighted inventory of prior visits, (3) each channel attracts inter-session and intra-session visits differently, (4) each channel has different association with conversion conditional on a customer’s arrival to the website through that channel, (5) each channel engages customers differently (i.e., keeps the customer alive for a next session or for a next visit within a session), (6) the channel from which there was an arrival in the previous session can have an enhanced ability to generate an arrival for the same channel in the current session (channel persistence), and (7) parsimonious specification for high dimensionality in a low-velocity, sparse-data environment. We estimate the model on easy-to-collect first-party data obtained from an online retailer selling a durable good and find that information on the identities of channels and incorporation of inter- and intra-session visits have significant predictive power for future visitation and conversion behavior. 
We find that some channels act as “closers” and others as “engagers”—consumers arriving through the former are more likely to make a purchase, while consumers arriving through the latter, even if they do not make a purchase, are more likely to visit again in the future or extend the current session. We also find that some channels engage customers more than others, and that there are interaction effects between the channels visited. Our estimates show that the effect of prior inventory of visits is different from the immediate prior visit, and that visit and purchase probabilities can increase or decrease based on the history of channels used. We discuss several managerial implications of the model including using the predictions of the model to aid in selecting customers for marketing actions and using the model to evaluate a policy change regarding the obscuring of channel information.
... The research community has proposed alternative data-driven models to Shapley value. These works propose the utilization of neural networks [4], Markov chains [5,6], survival analysis [7,8], regressions [9] or econometric models [10]. Finally, some works have analyzed the impact of specific channels, in particular, display ads, in the global attribution [11,12]. ...
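The Shapley-value attribution that this snippet contrasts with data-driven alternatives can be sketched directly from its definition; the coalition-worth table and channel names below are hypothetical examples:

```python
from itertools import combinations
from math import factorial

def shapley(channels, v):
    """Shapley value of each channel, given the worth v(S) of every
    coalition S of channels (e.g., the conversion rate observed when
    exactly the channels in S touched the user)."""
    n = len(channels)
    phi = {}
    for c in channels:
        others = [x for x in channels if x != c]
        value = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                S = frozenset(subset)
                # Weight of this coalition size in the Shapley average.
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                value += w * (v(S | {c}) - v(S))
        phi[c] = value
    return phi

# Hypothetical coalition worths: conversion rates by exposed channel set.
worth = {frozenset(): 0.0,
         frozenset({"search"}): 0.2,
         frozenset({"display"}): 0.1,
         frozenset({"search", "display"}): 0.4}
credit = shapley(["search", "display"], worth.__getitem__)
# Credits sum to the worth of the full coalition (efficiency axiom).
```

The exact computation enumerates all 2^n coalitions, which is why the alternatives cited above (regressions, Markov chains, neural networks) become attractive as the number of channels grows.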
Article
Full-text available
Digital marketing is a profitable business generating annual revenue over USD 200B and inter-annual growth over 20%. The definition of efficient marketing investment strategies across different types of channels and campaigns is a key task in digital marketing. Attribution models are an instrument used to assess the return on investment of different channels and campaigns, so that they can assist in the decision-making process. A new generation of more powerful data-driven attribution models has emerged in the market in recent years. Unfortunately, their adoption is slower than expected. One of the main reasons is that the industry lacks a proper understanding of these models and how to configure them. To address this issue, in this paper we present an empirical study to better understand the key properties of user-paths and their impact on attribution models. Our analysis is based on a large-scale dataset including more than 95M user-paths from real advertising campaigns of an international hotel group. The main contribution of the paper is a set of recommendations for building accurate, interpretable and computationally efficient attribution models, such as: (i) the use of linear regression, an interpretable machine learning algorithm, to build accurate attribution models; (ii) user-paths including around 12 events are enough to produce accurate models; (iii) the recency of events considered in the user-paths is important for the accuracy of the model.
... While traditional attribution modeling has used aggregate metrics (e.g., overall TV ad budget, number of website visits, net social media sentiment), more recent research uses individual-level path-to-purchase data. This has enabled researchers to obtain a richer understanding of carryover and spillover effects across channels (Dalessandro et al. 2012; Ghose and Todri 2016; Li and Kannan 2014; Shao and Li 2011). ...
Article
Full-text available
Omnichannel marketing is often viewed as the panacea for one-to-one marketing, but this strategic path is mired with obstacles. This article investigates three challenges in realizing the full potential of omnichannel marketing: (1) data access and integration, (2) marketing attribution, and (3) consumer privacy protection. While these challenges predate omnichannel marketing, they are exacerbated in a digital omnichannel environment. This article argues that advances in machine learning and blockchain offer some promising solutions. In turn, these technologies present new challenges and opportunities for firms, which warrant further academic research. The authors identify both recent developments in practice and promising avenues for future research. Editor’s Note This article is part of the JM-MSI Special Issue on “From Marketing Priorities to Research Agendas,” edited by John A. Deighton, Carl F. Mela, and Christine Moorman. Written by teams led by members of the inaugural class of MSI Scholars, these articles review the literature on an important marketing topic reflected in the MSI Priorities and offer an expansive research agenda for the marketing discipline. A list of articles appearing in the Special Issue can be found at http://www.ama.org/JM-MSI-2020 .
... Channel Attribution. The problem of attributing conversions to known prior events is well studied in marketing [8] and, more recently, machine learning [23]; recent work in deep learning on attribution includes LSTM-based frameworks such as [17,35]. While marketing campaigns could be included in our model, we measure in-session attribution focusing on fine-grained user interactions. ...
Preprint
We tackle the challenge of in-session attribution for on-site search engines in eCommerce. We phrase the problem as a causal counterfactual inference, and contrast the approach with rule-based systems from industry settings and prediction models from the multi-touch attribution literature. We approach counterfactuals in analogy with treatments in formal semantics, explicitly modeling possible outcomes through alternative shopper timelines; in particular, we propose to learn a generative browsing model over a target shop, leveraging the latent space induced by prod2vec embeddings; we show how natural language queries can be effectively represented in the same space and how "search intervention" can be performed to assess causal contribution. Finally, we validate the methodology on a synthetic dataset, mimicking important patterns emerged in customer interviews and qualitative analysis, and we present preliminary findings on an industry dataset from a partnering shop.
Chapter
Deep Dive 1 (Targets): As in many other fields, the notion of "you can't manage what you can't measure" serves as a guiding beacon in digital marketing. In this chapter the authors explain why they believe this mindset is essential for building and implementing a marketing performance management system, in the context of their experience at Infineon Automotive in tracking the success of their activities along the customer journey. They describe in a detailed case study how they used a basic and simplified version of a performance measurement construct and the Plan-Do-Check-Act approach to quantify improvements and monitor progress within their transformation. Keywords: Plan-Do-Check-Act, Analytics, Stakeholder management, Target model, Performance marketing, Change management
Preprint
Given an unexpected change in the output metric of a large-scale system, it is important to answer why the change occurred: which inputs caused the change in the metric? A key component of such an attribution question is estimating the counterfactual: the (hypothetical) change in the system metric due to a specified change in a single input. However, due to inherent stochasticity and complex interactions between parts of the system, it is difficult to model an output metric directly. We utilize the computational structure of a system to break up the modelling task into sub-parts, such that each sub-part corresponds to a more stable mechanism that can be modelled accurately over time. Using the system's structure also helps to view the metric as a computation over a structural causal model (SCM), thus providing a principled way to estimate counterfactuals. Specifically, we propose a method to estimate counterfactuals using time-series predictive models and construct an attribution score, CF-Shapley, that is consistent with desirable axioms for attributing an observed change in the output metric. Unlike past work on causal Shapley values, our proposed method can attribute a single observed change in output (rather than a population-level effect) and thus provides more accurate attribution scores when evaluated on simulated datasets. As a real-world application, we analyze a query-ad matching system with the goal of attributing an observed change in a metric for ad matching density. Attribution scores explain how query volume and ad demand from different query categories affect the ad matching density, leading to actionable insights and uncovering the role of external events (e.g., "Cheetah Day") in driving the matching density.
Article
In online advertising, it is critical for advertisers to forecast the conversion rate (CVR) of campaigns. Previous work on campaign forecasting concentrates on time-series analysis, which depends on a sufficient length of history. These approaches therefore become inadequate for cold-start campaigns, which lack observations of the past. In this work, we attempt to mitigate this challenge by learning an unsupervised and composite campaign embedding to capture multi-view semantic relationships in campaign information, and consequently forecasting cold-start campaigns using neighboring campaigns. Specifically, we propose a novel embedding framework which simultaneously extracts and fuses heterogeneous knowledge from multiple views of campaign data in a multi-task learning fashion, to learn the semantic relationships of ad message, conversion rule, and audience targeting. We develop a hierarchical attention mechanism to refine the embedding model at two levels: an intra-view attention to improve context aggregation, and an inter-task attention to balance task importance. Finally, we adopt a k-NN regression model to predict the CVR based on neighboring campaigns, using learned embeddings that encode the multi-view proximity. We conduct extensive experiments on a real-world advertising campaign dataset. The results demonstrate the effectiveness of the embedding method for CVR forecasting in cold-start scenarios.
Article
Display advertising is a $50 billion industry in which advertisers’ (e.g., P&G, Geico) demand for impressions is matched to publishers’ (e.g., Facebook, Wall Street Journal) supply of them. An ideal match is one wherein the publisher’s ad impression is assigned to the advertiser with the highest value for it. Intermediaries (e.g., Google) facilitate this match between advertisers and publishers by managing data and providing optimization tools and algorithms for serving ads. Although these markets exhibit high allocative efficiency, we argue there is considerable scope for improvement.
Chapter
Full-text available
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
Article
Full-text available
Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several things. (1) Contrary to some prior observations, logistic regression does not generally outperform tree induction. (2) More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (that is, the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. (3) Contrary to conventional wisdom, tree induction is effective at producing probability-based rankings, although apparently comparatively less so for a given training-set size than at making classifications. Finally, (4) the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by a simple measure of the separability of signal from noise.
Article
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Article
Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
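The bootstrap-replicate-and-plurality-vote procedure this abstract describes can be sketched in a few lines; the 1-nearest-neighbour base learner and the toy data are illustrative choices, not from the cited work:

```python
import random
from collections import Counter

# Minimal sketch of bagging: train each base model on a bootstrap
# replicate of the learning set and aggregate by plurality vote.
# The 1-nearest-neighbour base learner is an illustrative choice.

def one_nn(train, x):
    # Predict the label of the closest training point (1-D features).
    _, label = min(train, key=lambda p: abs(p[0] - x))
    return label

def bagged_predict(data, x, n_models=25, seed=0):
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        # Bootstrap replicate: sample len(data) points with replacement.
        replicate = [rng.choice(data) for _ in data]
        votes[one_nn(replicate, x)] += 1
    return votes.most_common(1)[0][0]  # plurality vote

data = [(0.1, "neg"), (0.3, "neg"), (0.7, "pos"), (0.9, "pos")]
```

As the abstract notes, the gain comes from the instability of the base learner: each bootstrap replicate perturbs the learning set, and aggregating the perturbed predictors reduces variance.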
Article
Online advertising has been a popular topic in recent years. In this paper, we address one of the important problems in online advertising, i.e., how to detect whether a publisher webpage contains sensitive content and is appropriate for showing advertisement(s) on it. We take a webpage classification approach to solve this problem. First we design a unique sensitive content taxonomy. Then we adopt an iterative training data collection and classifier building approach to build a hierarchical classifier which can classify webpages into one of the nodes in the sensitive content taxonomy. The experimental results show that using this approach, we are able to build a unique sensitive content classifier with decent accuracy while requiring only a limited amount of human labeling effort.
Conference Paper
This paper describes and evaluates privacy-friendly methods for extracting quasi-social networks from browser behavior on user-generated content sites, for the purpose of finding good audiences for brand advertising (as opposed to click maximizing, for example). Targeting social-network neighbors resonates well with advertisers, and on-line browsing behavior data counterintuitively can allow the identification of good audiences anonymously. Besides being one of the first papers to our knowledge on data mining for on-line brand advertising, this paper makes several important contributions. We introduce a framework for evaluating brand audiences, in analogy to predictive-modeling holdout evaluation. We introduce methods for extracting quasi-social networks from data on visitations to social networking pages, without collecting any information on the identities of the browsers or the content of the social-network pages. We introduce measures of brand proximity in the network, and show that audiences with high brand proximity indeed show substantially higher brand affinity. Finally, we provide evidence that the quasi-social network embeds a true social network, which along with results from social theory offers one explanation for the increases in audience brand affinity.
Conference Paper
The dynamic marketplace in online advertising calls for ranking systems that are optimized to consistently promote and capitalize on better-performing ads. The streaming nature of online data inevitably makes an advertising system choose between maximizing its expected revenue according to its current knowledge in the short term (exploitation) and trying to learn more about the unknown to improve its knowledge (exploration), since the latter might increase its revenue in the future. The exploitation-exploration (EE) tradeoff has been extensively studied in the reinforcement learning community, but has received little attention in online advertising until recently. In this paper, we develop two novel EE strategies for online advertising. Specifically, our methods can adaptively balance the two aspects of EE by automatically learning the optimal tradeoff and incorporating confidence metrics of historical performance. Within a deliberately designed offline simulation framework, we apply our algorithms to an industry-leading performance-based contextual advertising system and conduct extensive evaluations with real online event log data. The experimental results and detailed analysis reveal several important findings about EE behaviors in online advertising and demonstrate that our algorithms perform better in terms of ad reach and click-through rate (CTR).
Book
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting---the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for ``wide'' data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. 
Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.
Book
This book provides the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts of pattern recognition, the book describes techniques for modelling probability density functions, and discusses the properties and relative merits of the multi-layer perceptron and radial basis function network models. It also motivates the use of various forms of error functions, and reviews the principal algorithms for error function minimization. As well as providing a detailed discussion of learning and generalization in neural networks, the book also covers the important topics of data processing, feature extraction, and prior knowledge. The book concludes with an extensive treatment of Bayesian techniques and their applications to neural networks.
  • C M Bishop
Bishop, C.M. Pattern Recognition and Machine Learning, Springer, 2007.
Clearsaleing Attribution Model
  • Clearsaleing Inc
Clearsaleing Inc. Clearsaleing Attribution Model. http://www.clearsaleing.com/product/accurate-attribution-management/
Atlas Institute, Microsoft Advertising. Measuring ROI Beyond the Last Ad
  • J Chandler-Pepelnjak
Chandler-Pepelnjak, J. Atlas Institute, Microsoft Advertising. Measuring ROI Beyond the Last Ad. http://www.atlassolutions.com/uploadedFiles/Atlas/Atlas_Institute/Published_Content/dmi-MeasuringROIBeyondLastAd.pdf.
Sensitive Webpage Classification for Content Advertising
  • X Jin
  • Y Li
  • T Mah
  • J Tong
Jin, X., Li, Y., Mah, T., and Tong, J. Sensitive Webpage Classification for Content Advertising. In Proceedings of the 1st International Workshop on Data Mining and Audience Intelligence for Advertising, 2007.