Conference Paper · PDF Available

Cross channel optimized marketing by reinforcement learning

Abstract

The issues of cross-channel integration and customer lifetime value modeling are two of the most important topics surrounding customer relationship management (CRM) today. In the present paper, we describe and evaluate a novel solution that treats these two issues in the unified framework of Markov Decision Processes (MDP). In particular, we report on the results of a joint project between IBM Research and Saks Fifth Avenue to investigate the applicability of this technology to real-world problems. The business problem we use as a testbed for our evaluation is that of optimizing direct-mail campaign mailings to maximize profits in the store channel. We identify a problem common to cross-channel CRM, which we call the Cross-Channel Challenge, arising from the lack of explicit links between the marketing actions taken in one channel and the customer responses obtained in another. We provide a solution to this problem based on old and new techniques in reinforcement learning. Our in-laboratory experimental evaluation using actual customer interaction data shows that an increase of as much as 7 to 8 percent in store profits can be expected by employing a mailing policy automatically generated by our methodology. These results confirm that our approach is valid for real-world cross-channel CRM scenarios.
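The MDP framing in the abstract can be illustrated with a toy sketch. Everything below — the three customer states, the two actions, the profit figures, and the deterministic dynamics — is invented for illustration and is not taken from the paper, which learns from real customer interaction data with function approximation; a small table and plain Q-learning are enough to show how a long-term-value policy differs from a per-mailing one.

```python
import random

# Toy illustration of the cross-channel MDP framing. Everything here
# (states, actions, profit figures, deterministic dynamics) is invented
# for illustration and is NOT taken from the paper, which learns from
# real customer interaction data with function approximation.
GAMMA, ALPHA = 0.9, 0.1
ACTIONS = ("mail", "no_mail")
PROFIT = {"dormant": 0.0, "browsing": 1.0, "loyal": 5.0}  # store profit
MAIL_COST = 1.0

# Hypothetical dynamics: a mailing moves a customer one step up;
# without mail, a browsing customer decays back to dormant.
NEXT = {
    ("dormant", "mail"): "browsing",
    ("dormant", "no_mail"): "dormant",
    ("browsing", "mail"): "loyal",
    ("browsing", "no_mail"): "dormant",
    ("loyal", "mail"): "loyal",
    ("loyal", "no_mail"): "loyal",  # loyal customers stay loyal unmailed
}

Q = {key: 0.0 for key in NEXT}
random.seed(0)
for _ in range(20000):
    s, a = random.choice(list(NEXT))  # uniform exploration over pairs
    s2 = NEXT[(s, a)]
    r = PROFIT[s2] - (MAIL_COST if a == "mail" else 0.0)
    # Q-learning backup: immediate store profit net of mailing cost,
    # plus discounted lifetime value of the customer's next state.
    target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in PROFIT}
print(policy)
```

In this made-up world, the learned policy mails dormant and browsing customers, whose long-term value a mailing can still improve, and skips already-loyal ones, for whom the mailing cost buys nothing.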
Cross Channel Optimized Marketing by Reinforcement Learning
Naoki Abe, Naval Verma and Chid Apte
Mathematical Sciences Dept.
IBM T. J. Watson Res. Ctr.
Yorktown Heights, NY 10598
{nabe, nverma, apte}@us.ibm.com
Robert Schroko
Database Marketing
Saks Fifth Avenue
12 E. 49th Street, New York, NY 10017
Robert_Schroko@s5a.com
KDD’04, August 22–25, 2004, Seattle, Washington, USA.
Copyright 2004 ACM 1-58113-888-1/04/0008 ...$5.00.
1. INTRODUCTION
2. PROBLEM DESCRIPTION
2.1 The business problem
2.2 The cross channel challenge
3. METHODOLOGY
3.1 Cross channel life time value maximization and MDP
3.2 Q-learning with variable time intervals
3.3 Batch Advantage Updating
4. EXPERIMENTS
4.1 Data
4.2 Evaluation
4.3 Experimental Results
5. CONCLUSIONS
6. ACKNOWLEDGMENTS
7. REFERENCES
... Although we consider direct feedback (i.e., users clicking or ordering product items on the landing page of search stories), indirect feedback (i.e., users clicking or ordering product items on the search page) is more important. Such a cross-channel effect [1] is hard to model with the conventional supervised learning framework. This motivates us to propose a novel reinforcement learning framework for personalized search story recommendation. ...
... There are some pioneering works applying RL to different tasks in recommendation and ranking, such as cross-channel recommendation [1], personalized news recommendation [44], impression allocation of advertisements [6], and learning to rank for search sessions [12]. Their motivations for using RL are all based on the long-term effect of current actions in the corresponding problems. ...
Preprint
In recent years, the search story, a display combined with other organic channels, has become a major source of user traffic on platforms such as e-commerce search platforms, news feed platforms, and web and image search platforms. The recommended search story guides a user to identify her own preference and personal intent, which subsequently influences the user's real-time and long-term search behavior. As search stories become increasingly important, in this work we study the problem of personalized search story recommendation within a search engine, which aims to suggest a search story relevant both to a search keyword and to an individual user's interest. To address the challenge of modeling both immediate and future values of recommended search stories (i.e., the cross-channel effect), for which the conventional supervised learning framework is not applicable, we resort to a Markov decision process and propose a deep reinforcement learning architecture trained by both imitation learning and reinforcement learning. We empirically demonstrate the effectiveness of our proposed approach through extensive experiments on real-world data sets from JD.com.
... Decision trees, for example, offer interpretability and are computationally efficient, making them suitable for real-time fraud detection (Coussement & Benoit, 2021). Logistic regression has been used extensively for binary classification tasks due to its simplicity and robustness, especially in detecting credit card fraud (Abe et al., 2004). Additionally, ensemble methods like random forests and gradient boosting improve classification accuracy by combining predictions from multiple models (Rahman & Kumar, 2020). ...
Article
Financial fraud is an ever-evolving threat that poses significant challenges to the stability and integrity of financial systems. This study systematically reviews the application of big data analytics and advanced technologies, including machine learning and natural language processing (NLP), in detecting and mitigating financial fraud. Adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, 189 peer-reviewed articles were meticulously analyzed to explore state-of-the-art methodologies and identify critical gaps in the literature. The findings underscore the transformative role of big data analytics in uncovering patterns and anomalies that traditional systems often miss, enabling real-time fraud detection with enhanced accuracy and efficiency. Machine learning techniques, particularly ensemble models and deep learning algorithms, were found to adapt effectively to dynamic fraud patterns, while NLP extended the scope of detection to unstructured data, such as emails, contracts, and social media. Furthermore, real-time fraud detection systems emerged as vital tools for immediate mitigation, yet challenges persist in integrating multimodal data sources like audio and video and shifting from reactive to proactive fraud prevention strategies. This comprehensive review provides a holistic understanding of current advancements, highlighting both achievements and areas requiring further exploration, and sets the stage for the development of robust, scalable, and ethical fraud detection frameworks.
... For example, Cui et al. (2006) and Kim et al. (2005) show that neural networks are valuable tools to improve targeting in response models. Moreover, Abe et al. (2004) successfully apply reinforcement learning to cross-channel marketing, and Schwartz et al. (2017) apply a multi-armed bandit to target display advertising. See Simester et al. (2020) for a discussion of common data challenges of applying machine learning to target customers "in the wild." ...
Preprint
Full-text available
This paper studies optimal targeting as a means to increase fundraising efficacy. We randomly provide potential donors with an unconditional gift and use causal-machine learning techniques to "optimally" target this fundraising tool to the predicted net donors: individuals who, in expectation, give more than their solicitation costs. With this strategy, our fundraiser avoids lossy solicitations, significantly boosts available funds, and, consequently, can increase service and goods provision. Further, to realize these gains, the charity can merely rely on readily available data. We conclude that charities that refrain from fundraising targeting waste significant resources.
... Reinforcement learning is used above all in robotics. The machine, also called an agent in this context, performs an action within a specific environment (for example, moving a robot [Bagnell and Schneider, 2001], sending promotional mail [Abe et al., 2004], playing a turn of backgammon [Tesauro et al., 1992]) whose effects are judged by a set of rules defined by the researchers. Based on these rules, a positive or negative reward is given to the agent, modifying its state. ...
Thesis
Histology produces images at the cellular scale using high-performance optical microscopes. Quantifying labeled tissue such as neurons increasingly relies on machine-learning-based segmentation. However, machine learning requires a large amount of intermediate information, or features, extracted from the raw data, multiplying the amount of data to process accordingly. The large number of these features is thus an obstacle to robust, fast processing of series of histological images. Feature-selection algorithms could reduce the amount of information required, but the selected feature sets are poorly reproducible. We propose an original methodology, running on high-performance computing (HPC) infrastructures, that aims to select small, stable feature sets so as to enable fast and robust segmentation of histological images acquired at very high resolution. This selection proceeds in two stages: the first at the level of feature families, the second applied directly to the features drawn from those families. In this work, we obtained generalizable, stable sets for two different neuronal markers. These sets allow significant reductions in processing time and in the memory used. This methodology will make exhaustive high-resolution histological studies possible on HPC infrastructures, in preclinical and potentially clinical research.
... Researchers of various domains, such as digital marketing [1,47], sports [10,16,21,36,37,50], and healthcare [7,41], have proposed statistical models to perform what-if analysis on data. Visual techniques and interactive tools [17,33,55] have been developed to provide user-friendly interfaces for these models. ...
Preprint
Using causal relations to guide decision making has become an essential analytical task across various domains, from marketing and medicine to education and social science. While powerful statistical models have been developed for inferring causal relations from data, domain practitioners still lack effective visual interface for interpreting the causal relations and applying them in their decision-making process. Through interview studies with domain experts, we characterize their current decision-making workflows, challenges, and needs. Through an iterative design process, we developed a visualization tool that allows analysts to explore, validate, and apply causal relations in real-world decision-making scenarios. The tool provides an uncertainty-aware causal graph visualization for presenting a large set of causal relations inferred from high-dimensional data. On top of the causal graph, it supports a set of intuitive user controls for performing what-if analyses and making action plans. We report on two case studies in marketing and student advising to demonstrate that users can effectively explore causal relations and design action plans for reaching their goals.
... TITLE-ABS-KEY( ("reinforcement learning" OR "contextual bandit") AND ("personalization" OR "personalized" OR "personal" OR "personalisation" OR "personalised" OR "customization" OR "customized" OR "customised" OR "individualized" OR "individualised" OR "tailored") ) ...
Article
Full-text available
The major application areas of reinforcement learning (RL) have traditionally been game playing and continuous control. In recent years, however, RL has been increasingly applied in systems that interact with humans. RL can personalize digital systems to make them more relevant to individual users. Challenges in personalization settings may be different from challenges found in traditional application areas of RL. An overview of work that uses RL for personalization, however, is lacking. In this work, we introduce a framework of personalization settings and use it in a systematic literature review. Besides setting, we review solutions and evaluation strategies. Results show that RL has been increasingly applied to personalization problems and realistic evaluations have become more prevalent. RL has become sufficiently robust to apply in contexts that involve humans and the field as a whole is growing. However, it seems not to be maturing: the ratios of studies that include a comparison or a realistic evaluation are not showing upward trends and the vast majority of algorithms are used only once. This review can be used to find related work across domains, provides insights into the state of the field and identifies opportunities for future work.
... Recent advancement of reinforcement learning (RL) shows great success within broad domains ranging from market strategy decisions [2], load balancing [9] to autonomous driving [26]. Reinforcement learning is a process to obtain a good policy in a given environment in an unsupervised learning manner. ...
Preprint
We study locally differentially private algorithms for reinforcement learning to obtain a robust policy that performs well across distributed private environments. Our algorithm protects the information in local agents' models from being exploited by adversarial reverse engineering. Since a local policy is strongly affected by its individual environment, the agent's output may unintentionally release private information. In our proposed algorithm, local agents update the model in their environments and report noisy gradients designed to satisfy local differential privacy (LDP), which gives a rigorous local privacy guarantee. By utilizing a set of reported noisy gradients, a central aggregator updates its model and delivers it to different local agents. In our empirical evaluation, we demonstrate how our method performs well under LDP. To the best of our knowledge, this is the first work that realizes distributed reinforcement learning under LDP. This work enables us to obtain a robust agent that performs well across distributed private environments.
Chapter
The fusion of Data Analytics, Big Data, and Machine Learning has become a powerful force in the always-changing world of data-driven decision-making. This chapter offers a brief overview of their practical uses, illuminating how these technologies are reshaping markets and driving creativity. The cornerstone, data analytics, is studied first, emphasizing its capacity to extract useful insights from a variety of sources. To demonstrate how Data Analytics enables organizations to optimize processes, improve consumer experiences, and manage risks through data-driven decision-making, real-world examples from industries including e-commerce, finance, and healthcare are shown. Next, Big Data takes center stage to demonstrate its ability to handle enormous amounts of data. We examine its uses in industries ranging from urban planning to agriculture, showing how it facilitates better decision-making through data-driven insights. The third element of the equation, machine learning, emerges as a crucial enabler of automation and intelligence. We highlight its use in customization, fraud detection, and healthcare diagnostics through fascinating real-world examples, highlighting its disruptive potential. The synergistic potential of these technologies, notably in predictive modeling and pattern recognition, is highlighted in the chapter’s conclusion. It also discusses the ethical issues surrounding the use of data and the proper application of AI, urging businesses to proceed in the data-driven world with caution and foresight. This chapter provides readers with a concise yet thorough overview of the influential trio of Big Data, Machine Learning, and Data Analytics, encouraging further investigation of their potential to reshape industries and spur innovation in the real world.
Article
Using causal relations to guide decision making has become an essential analytical task across various domains, from marketing and medicine to education and social science. While powerful statistical models have been developed for inferring causal relations from data, domain practitioners still lack effective visual interface for interpreting the causal relations and applying them in their decision-making process. Through interview studies with domain experts, we characterize their current decision-making workflows, challenges, and needs. Through an iterative design process, we developed a visualization tool that allows analysts to explore, validate, and apply causal relations in real-world decision-making scenarios. The tool provides an uncertainty-aware causal graph visualization for presenting a large set of causal relations inferred from high-dimensional data. On top of the causal graph, it supports a set of intuitive user controls for performing what-if analyses and making action plans. We report on two case studies in marketing and student advising to demonstrate that users can effectively explore causal relations and design action plans for reaching their goals.
Article
Developing marketing campaigns for a new product or a new target population is challenging because of the scarcity of relevant historical data. Building on dynamic Bayesian learning, a sequential optimization assists in creating new data points within a finite number of learning phases. This procedure identifies effective advertisement design elements as well as customer segments that maximize the expected outcome of the final marketing campaign. In this paper, the marketing campaign performance is modeled by a multiplicative advertising exposure model with Poisson arrivals. The intensity of the Poisson process is a function of the marketing campaign features. A forward-looking measurement policy is formulated to maximize the expected improvement in the value of information in each learning phase. A computationally efficient approach is proposed that consists of solving a sequence of mixed-integer linear optimization problems. The performance of the optimal learning policy over a set of benchmark policies is evaluated using examples inspired from the property and casualty insurance industry. Further extensions of the model are discussed. This paper was accepted by Eric Anderson, marketing.
Conference Paper
Full-text available
Recently, there has been increasing interest in the issues of cost-sensitive learning and decision making in a variety of applications of data mining. A number of approaches have been developed that are effective at optimizing cost-sensitive decisions when each decision is considered in isolation. However, the issue of sequential decision making, with the goal of maximizing total benefits accrued over a period of time instead of immediate benefits, has rarely been addressed. In the present paper, we propose a novel approach to sequential decision making based on the reinforcement learning framework. Our approach attempts to learn decision rules that optimize a sequence of cost-sensitive decisions so as to maximize the total benefits accrued over time. We use the domain of targeted marketing as a testbed for empirical evaluation of the proposed method. We conducted experiments using approximately two years of monthly promotion data derived from the well-known KDD Cup 1998 donation data set. The experimental results show that the proposed method for optimizing total accrued benefits outperforms the usual targeted-marketing methodology of optimizing each promotion in isolation. We also analyze the behavior of the targeting rules that were obtained and discuss their appropriateness to the application domain.
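The sequential decision-making idea in this abstract can be sketched in a few lines. The states, actions, rewards, and logged transitions below are hypothetical, and a lookup table stands in for the regression-based function approximation such methods use in practice; repeated value-iteration sweeps over a fixed log of transitions play the role of batch reinforcement learning.

```python
# Hedged sketch of sequential, batch-style learning from logged
# campaign data. States, actions, rewards, and transitions are made up;
# a table replaces the regression models used on real data.
GAMMA = 0.9
ACTIONS = ("promo", "skip")
STATES = ("new", "engaged")

# Logged (state, action, reward, next_state) tuples; note the promo
# loses money immediately on a "new" customer.
logged = [
    ("new", "promo", -1.0, "engaged"),
    ("new", "skip", 0.0, "new"),
    ("engaged", "promo", 3.0, "engaged"),
    ("engaged", "skip", 1.0, "new"),
]

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
for _ in range(200):  # repeated synchronous sweeps over the fixed batch
    newQ = dict(Q)
    for s, a, r, s2 in logged:
        # Backup: immediate benefit plus discounted value of the
        # customer's logged next state.
        newQ[(s, a)] = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    Q = newQ

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
print(policy)
```

This also illustrates the abstract's central point: at the hypothetical state "new", the promotion has a negative immediate reward, so optimizing each promotion in isolation would skip it, while the sequential policy mails anyway to credit the discounted future benefits of an engaged customer.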
Article
Full-text available
Semi-Markov Decision Problems are continuous-time generalizations of discrete-time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approximation. Among these are TD(λ), Q-learning, and Real-Time Dynamic Programming. After reviewing semi-Markov Decision Problems and Bellman's optimality equation in that context, we propose algorithms similar to those named above, adapted to the solution of semi-Markov Decision Problems. We demonstrate these algorithms by applying them to the problem of determining the optimal control for a simple queueing system. We conclude with a discussion of circumstances under which these algorithms may be usefully applied.
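The core adaptation such semi-Markov algorithms make to Q-learning can be shown in a small sketch. The function name, states, and numbers below are illustrative assumptions, not the paper's notation; the one essential change from standard Q-learning is discounting by `gamma ** tau` for a transition that took `tau` time units.

```python
# Hedged sketch of a Q-learning update for variable time intervals
# (semi-Markov setting): discount is applied per unit of elapsed time.
GAMMA = 0.95   # discount per unit of time
ALPHA = 0.5    # learning rate

def smdp_q_update(Q, s, a, reward, s_next, tau, actions):
    """One tabular update for a transition that took tau time units;
    `reward` is the return accrued over the interval."""
    best_next = max(Q[(s_next, b)] for b in actions)
    target = reward + (GAMMA ** tau) * best_next  # time-scaled discount
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
    return Q

actions = ("stay", "switch")
Q0 = {(s, b): 0.0 for s in ("busy", "idle") for b in actions}
Q0[("idle", "stay")] = 10.0   # pretend 'idle' already has a known value

# The same observed reward is worth less when it took longer to obtain:
Q_fast = smdp_q_update(dict(Q0), "busy", "stay", 1.0, "idle", 1, actions)
Q_slow = smdp_q_update(dict(Q0), "busy", "stay", 1.0, "idle", 3, actions)
print(Q_fast[("busy", "stay")], Q_slow[("busy", "stay")])
```

The printed pair shows the identical transition backed up after 3 time units contributing less future value than after 1 time unit, which is the behavior a fixed per-transition discount cannot capture.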
Conference Paper
We describe two methodologies for obtaining segmented regression estimators from massive training data sets. The first methodology, called Linear Regression Tree (LRT), is used for continuous response variables, and the second and complementary methodology, called Naive Bayes Tree (NBT), is used for categorical response variables. These are implemented in the IBM ProbE™ (Probabilistic Estimation) data mining engine, which is an object-oriented framework for building classes of segmented predictive models from massive training data sets. Based on this methodology, an application called ATM-SETM for direct-mail targeted marketing has been ...
Conference Paper
A new algorithm for reinforcement learning, advantage updating, is described. Advantage updating is a direct learning technique; it does not require a model to be given or learned. It is incremental, requiring only a constant amount of calculation per time step, independent of the number of possible actions, possible outcomes from a given action, or number of states. Analysis and simulation indicate that advantage updating is applicable to reinforcement learning systems working in continuous time (or discrete time with small time steps) for which standard algorithms such as Q-learning are not applicable. Simulation results are presented indicating that for a simple linear quadratic regulator (LQR) problem, advantage updating learns more quickly than Q-learning by a factor of 100,000 when the time step is small. Even for large time steps, advantage updating is never slower than Q-learning, and advantage updating is more resistant to noise than is Q-learning. Convergence properties are discussed. It is proved that the learning rule for advantage updating converges to the optimal policy with probability one.
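The flavor of advantage updating can be conveyed with a much-simplified sketch. These are not Baird's exact update rules (which use separate learning rates and a normalization constant), and all states, actions, rewards, and rates below are invented; the one idea retained is that the advantage is learned per unit of time, so the signal separating actions does not shrink as the time step does.

```python
# Much-simplified sketch of advantage updating's central idea: learn
# A(s, a), the advantage of action a expressed per unit of time, by
# dividing the TD error by the step duration, then renormalize so the
# best action's advantage is zero, folding the shift into V(s).
# NOT Baird's exact rules; the learning rate must stay small relative
# to 1/DT for this simplified version to behave.
GAMMA, ALPHA = 0.99, 0.2
DT = 0.5                      # duration of one decision step

def advantage_update(A, V, s, a, reward, s_next, actions):
    # Per-unit-time TD error: stays well-scaled as DT shrinks, which is
    # exactly where plain Q-learning's gap between actions vanishes.
    delta = (reward + (GAMMA ** DT) * V[s_next] - V[s]) / DT
    A[(s, a)] += ALPHA * (delta - A[(s, a)])
    shift = max(A[(s, b)] for b in actions)  # enforce max_a A(s, a) = 0
    for b in actions:
        A[(s, b)] -= shift
    V[s] += shift

actions = ("left", "right")
A = {("x", b): 0.0 for b in actions}
V = {"x": 0.0, "y": 0.0}      # 'y' acts as a terminal state of value 0
for _ in range(200):
    advantage_update(A, V, "x", "right", 1.0, "y", actions)  # pays 1
    advantage_update(A, V, "x", "left", 0.0, "y", actions)   # pays 0
print(A[("x", "right")], A[("x", "left")], V["x"])
```

In this toy run, 'right' earns reward at rate 1/DT = 2 per unit time and 'left' at rate 0, so the advantages settle near 0 and −2 while V("x") approaches the one-step payoff of 1; the per-time-unit gap between the actions is explicit rather than proportional to the step size.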
Article
Fingerhut Business Intelligence (BI) has a long and successful history of building statistical models to predict consumer behavior, and it constantly strives to improve its decision-making processes and tools. Fingerhut has found that predictive models can be much more effective when the target audience is split into subpopulations (i.e., customer segments) and individually tailored predictive models are developed for each segment. Historically, Fingerhut BI has used decision trees or simply domain expertise for creating customer segments. Even though these approaches work well, they are "sub-optimal" because effectiveness (i.e., predictive strength) of the segment models is not considered when defining the segments. Given their mailing volumes, Fingerhut is sensitive to the fact that increasing the predictive power of their models means millions of dollars in new revenue. Fingerhut BI approached IBM Research with the problem of how to build segmentation-based models more effectively...