Chapter · PDF Available

AB Testing for Process Versions with Contextual Multi-armed Bandit Algorithms


Abstract

Business process improvement ideas can be validated through sequential experiment techniques like AB Testing. Such approaches have the inherent risk of exposing customers to an inferior process version, which is why the inferior version should be discarded as quickly as possible. In this paper, we propose a contextual multi-armed bandit algorithm that can observe the performance of process versions and dynamically adjust the routing policy so that the customers are directed to the version that can best serve them. Our algorithm learns the best routing policy in the presence of complications such as multiple process performance indicators, delays in indicator observation, incomplete or partial observations, and contextual factors. We also propose a pluggable architecture that supports such routing algorithms. We evaluate our approach with a case study. Furthermore, we demonstrate that our approach identifies the best routing policy given the process performance and that it scales horizontally.
This document is a pre-print copy of the manuscript (Satyal et al. 2018) published by Springer (available at link.springer.com).
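To make the routing idea in the abstract above concrete, the following is a minimal sketch of a contextual Thompson Sampling router over two process versions. It is an illustration only, not the algorithm from the chapter: the names (LinTSRouter, route, update) are invented for this example, a single scalar reward stands in for the multiple process performance indicators, and delayed or partial observations show up only in that update may be called long after route.

    import numpy as np

    class LinTSRouter:
        """Toy linear Thompson Sampling router over process versions ("arms")."""

        def __init__(self, arms, dim, noise=0.25):
            self.arms = list(arms)
            self.noise = noise
            # Per-arm Bayesian ridge-regression statistics: B = I + sum(x x^T), f = sum(reward * x).
            self.B = {a: np.eye(dim) for a in self.arms}
            self.f = {a: np.zeros(dim) for a in self.arms}

        def route(self, context):
            # Sample a coefficient vector per version and send the instance to the
            # version with the highest predicted reward for this context.
            scores = {}
            for a in self.arms:
                B_inv = np.linalg.inv(self.B[a])
                theta = np.random.multivariate_normal(B_inv @ self.f[a], self.noise * B_inv)
                scores[a] = float(np.dot(context, theta))
            return max(scores, key=scores.get)

        def update(self, arm, context, reward):
            # Apply a (possibly delayed) reward observation for a finished instance.
            self.B[arm] += np.outer(context, context)
            self.f[arm] = self.f[arm] + reward * np.asarray(context, dtype=float)

    # Toy usage: version "B" happens to serve one customer segment better than "A".
    rng = np.random.default_rng(0)
    router = LinTSRouter(arms=["A", "B"], dim=2)
    for _ in range(500):
        x = rng.random(2)                   # contextual factors of the incoming instance
        version = router.route(x)
        reward = (x[0] if version == "A" else x[1]) + 0.1 * rng.standard_normal()
        router.update(version, x, reward)   # in practice this arrives after the instance completes

In the chapter's setting, the reward passed to update would be derived from the registered process performance indicators once an instance (or part of it) completes, which is exactly where the delay and partial-observation complications arise.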
Conference Paper
Full-text available
The rollout of new versions of a feature in modern applications is a manual multi-stage process, as the feature is released to ever larger groups of users, while its performance is carefully monitored. This kind of A/B testing is ubiquitous, but suboptimal, as the monitoring requires heavy human intervention, is not guaranteed to capture consistent but short-term fluctuations in performance, and is inefficient, as better versions take a long time to reach the full population. In this work we formulate this problem as one of expert learning, and give a new algorithm, Follow-The-Best-Interval (FTBI), that works in dynamic, non-stationary environments. Our approach is practical, simple, and efficient, and has rigorous guarantees on its performance. Finally, we perform a thorough evaluation on synthetic and real-world datasets and show that our approach outperforms current state-of-the-art methods.
Chapter
Full-text available
A fundamental assumption of improvement in Business Process Management (BPM) is that redesigns deliver refined and improved versions of business processes. These improvements can be validated online through sequential experiment techniques like AB Testing, as we have shown in earlier work. Such approaches have the inherent risk of exposing customers to an inferior process version during the early stages of the test. This risk can be managed by offline techniques like simulation. However, offline techniques do not validate the improvements because there is no user interaction with the new versions. In this paper, we propose a middle ground through shadow testing, which avoids the downsides of simulation and direct execution. In this approach, a new version is deployed and executed alongside the current version, but in such a way that the new version is hidden from the customers and process workers. Copies of user requests are partially simulated and partially executed by the new version as if it were running in production. We present an architecture, algorithm, and implementation of the approach, which isolates new versions from production, facilitates fair comparison, and manages the overhead of running shadow tests. We demonstrate the efficacy of our technique by evaluating the executions of synthetic and realistic process redesigns.
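As a rough sketch of the shadow-testing idea (an assumed illustration, not the architecture, algorithm, or code from the paper; handle_with_shadow and the thread-pool mechanics are invented here), a copy of each request can be replayed against the hidden new version while only the production version's result ever reaches the customer:

    import concurrent.futures

    _shadow_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def handle_with_shadow(request, production_version, shadow_version, comparison_log):
        """Serve the request from the production version and replay a copy of it
        against the hidden shadow version so both outcomes can be compared offline."""
        shadow_future = _shadow_pool.submit(shadow_version, dict(request))  # copy: the shadow must not affect the customer
        result = production_version(request)                                # the only customer-facing execution
        try:
            shadow_result = shadow_future.result(timeout=5)                 # bound the overhead of the shadow run
        except Exception:
            shadow_result = None
        comparison_log.append({"production": result, "shadow": shadow_result})
        return result

    # Toy usage with stand-in process versions:
    log = []
    handle_with_shadow({"customer": "c1"}, lambda r: "result-v1", lambda r: "result-v2", log)

The timeout and the shared pool are simple stand-ins for the paper's more careful handling of isolation and shadow-test overhead.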
Conference Paper
Full-text available
A fundamental assumption of Business Process Management (BPM) is that redesign delivers new and improved versions of business processes. This assumption, however, does not necessarily hold, and required compensatory action may be delayed until a new round in the BPM life-cycle completes. Current approaches to process redesign face this problem in one way or another, which makes rapid process improvement a central research problem of BPM today. In this paper, we address this problem by integrating concepts from process execution with ideas from DevOps. More specifically, we develop a technique called AB-BPM that offers AB testing for process versions with immediate feedback at runtime. We implemented this technique in such a way that two versions (A and B) are operational in parallel and any new process instance is routed to one of them. The routing decision is made at runtime on the basis of the achieved results for the registered performance metrics of each version. AB-BPM provides for ultimate convergence towards the best-performing version, whether it is the old or the new one. We demonstrate the efficacy of our technique by conducting an extensive evaluation based on both synthetic and real-life data.
Article
Full-text available
Adaptive and sequential experiment design is a well-studied area in numerous domains. We survey and synthesize the work of the online statistical learning paradigm referred to as multi-armed bandits, integrating the existing research as a resource for a certain class of online experiments. We first explore the traditional stochastic model of a multi-armed bandit, then explore a taxonomic scheme of complications to that model, relating each complication to a specific requirement or consideration of the experiment design context. Finally, we present a table of known upper bounds of regret for all studied algorithms, providing both a perspective for future theoretical work and a decision-making tool for practitioners looking for theoretical guarantees.
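For orientation, the regret whose known upper bounds such a table collects is usually the expected cumulative shortfall against the best fixed arm; a standard formulation (assumed here, not quoted from the article) is

$$R_T \;=\; T\,\mu^{*} \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{a_t}\Big], \qquad \mu^{*} = \max_{a}\mu_{a},$$

where $\mu_a$ is the expected reward of arm $a$ and $a_t$ is the arm played in round $t$.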
Article
Full-text available
Controlled experiments, also called randomized experiments and A/B tests, have had a profound influence on multiple fields, including medicine, agriculture, manufacturing, and advertising. Through randomization and proper design, experiments allow establishing causality scientifically, which is why they are the gold standard in drug tests. In software development, multiple techniques are used to define product requirements; controlled experiments provide a valuable way to assess the impact of new features on customer behavior. At Microsoft, we have built the capability for running controlled experiments on web sites and services, thus enabling a more scientific approach to evaluating ideas at different stages of the planning process. In our previous papers, we did not have good examples of controlled experiments at Microsoft; now we do! The humbling results we share call into question whether a priori prioritization is as good as most people believe it is. The Experimentation Platform (ExP) was built to accelerate innovation through trustworthy experimentation. Along the way, we had to tackle both technical and cultural challenges, and we provided software developers, program managers, and designers the benefit of an unbiased ear to listen to their customers and make data-driven decisions. A technical survey of the literature on controlled experiments was recently published by us in a journal (Kohavi, Longbotham, Sommerfield, & Henne, 2009). The goal of this paper is to share lessons and challenges focused more on the cultural aspects and the value of controlled experiments.
Article
Business Process Management (BPM) is the art and science of how work should be performed in an organization in order to ensure consistent outputs and to take advantage of improvement opportunities, e.g. reducing costs, execution times or error rates. Importantly, BPM is not about improving the way individual activities are performed, but rather about managing entire chains of events, activities and decisions that ultimately produce added value for an organization and its customers. This textbook encompasses the entire BPM lifecycle, from process identification to process monitoring, covering along the way process modelling, analysis, redesign and automation. Concepts, methods and tools from business management, computer science and industrial engineering are blended into one comprehensive and inter-disciplinary approach. The presentation is illustrated using the BPMN industry standard defined by the Object Management Group and widely endorsed by practitioners and vendors worldwide. In addition to explaining the relevant conceptual background, the book provides dozens of examples, more than 100 hands-on exercises – many with solutions – as well as numerous suggestions for further reading. The textbook is the result of many years of combined teaching experience of the authors, both at the undergraduate and graduate levels as well as in the context of professional training. Students and professionals from both business management and computer science will benefit from the step-by-step style of the textbook and its focus on fundamental concepts and proven methods. Lecturers will appreciate the class-tested format and the additional teaching material available on the accompanying website fundamentals-of-bpm.org.
Article
In this paper, we explore applications in which a company interacts concurrently with many customers. The company has an objective function, such as maximising revenue, customer satisfaction, or customer loyalty, which depends primarily on the sequence of interactions between company and customer. A key aspect of this setting is that interactions with different customers occur in parallel. As a result, it is imperative to learn online from partial interaction sequences, so that information acquired from one customer is efficiently assimilated and applied in subsequent interactions with other customers. We present the first framework for concurrent reinforcement learning, using a variant of temporal-difference learning to learn efficiently from partial interaction sequences. We evaluate our algorithms in two large-scale test-beds for online and email interaction respectively, generated from a database of 300,000 customer records.
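The temporal-difference updates referred to here have, in their textbook TD(0) form (the concrete variant used for partial interaction sequences in the article may differ), the shape

$$V(s_t) \;\leftarrow\; V(s_t) + \alpha\big[r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t)\big],$$

where $\alpha$ is the learning rate, $\gamma$ the discount factor, and $r_{t+1}$ the reward observed after the step from $s_t$ to $s_{t+1}$; because each update needs only one transition, partial customer interaction sequences can already contribute learning signal before they complete.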
Article
The first book of its kind to review the current status and future direction of the exciting new branch of machine learning/data mining called imbalanced learning. Imbalanced learning focuses on how an intelligent system can learn when it is provided with imbalanced data. Solving imbalanced learning problems is critical in numerous data-intensive networked systems, including surveillance, security, Internet, finance, biomedical, defense, and more. Due to the inherent complex characteristics of imbalanced data sets, learning from such data requires new understandings, principles, algorithms, and tools to transform vast amounts of raw data efficiently into information and knowledge representation. The first comprehensive look at this new branch of machine learning, this book offers a critical review of the problem of imbalanced learning, covering the state of the art in techniques, principles, and real-world applications. Featuring contributions from experts in both academia and industry, Imbalanced Learning: Foundations, Algorithms, and Applications provides chapter coverage of: Foundations of Imbalanced Learning; Imbalanced Datasets: From Sampling to Classifiers; Ensemble Methods for Class Imbalance Learning; Class Imbalance Learning Methods for Support Vector Machines; Class Imbalance and Active Learning; Nonstationary Stream Data Learning with Imbalanced Class Distribution; and Assessment Metrics for Imbalanced Learning. Imbalanced Learning: Foundations, Algorithms, and Applications will help scientists and engineers learn how to tackle the problem of learning from imbalanced datasets, and gain insight into current developments in the field as well as future research directions.
Article
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to the state-of-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we design and analyze a generalization of the Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary. This is among the most important and widely studied versions of the contextual bandit problem. We prove a high probability regret bound of $\tilde{O}(\frac{d^2}{\epsilon}\sqrt{T^{1+\epsilon}})$ in time $T$ for any $0<\epsilon<1$, where $d$ is the dimension of each context vector and $\epsilon$ is a parameter used by the algorithm. Our results provide the first theoretical guarantees for the contextual version of Thompson Sampling, and are close to the lower bound of $\Omega(d\sqrt{T})$ for this problem. This essentially solves a COLT open problem of Chapelle and Li [COLT 2012].
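In outline (a standard presentation of Thompson Sampling with linear payoffs, not a quotation from the article), the algorithm maintains in round $t$ the statistics

$$B_t = I_d + \sum_{\tau < t} x_\tau x_\tau^{\top}, \qquad \hat{\mu}_t = B_t^{-1}\sum_{\tau < t} r_\tau x_\tau,$$

samples $\tilde{\mu}_t \sim \mathcal{N}\big(\hat{\mu}_t,\, v^2 B_t^{-1}\big)$ for a suitable scale $v$, and plays the arm whose context $x$ maximizes $x^{\top}\tilde{\mu}_t$; the observed reward then updates $B_t$ and the running sum.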