Ron Kohavi

  • PhD
  • Owner at Kohavi

About

131
Publications
360,530
Reads
52,701
Citations
Current institution
Kohavi
Current position
  • Owner
Additional affiliations
November 2019 - November 2021
Airbnb
Position
  • Vice President and Technical Fellow
Description
  • Led Relevance and Experimentation, a team of world-class engineers and data scientists that included two principals and ten senior staff members. The team delivered 6%+ improvements to booking conversion from about 20 successful product changes out of over 250 ideas that were tested in controlled experiments. Surfaces impacted include search, product detail page, home page, post-booking cross-sells, and email. See Improving Deep Learning for Ranking Stays at Airbnb (https://medium.com/airbnb-e
August 1991 - July 1995
Stanford University

Publications

Publications (131)
Article
Full-text available
The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments, A/B tests (and their generalizations), split tests, Control/Treatment tests, MultiVariable Tests (MVT) and parallel flights. Controlled experiments embody the best scientific design for establishing a causal rela...
Conference Paper
Full-text available
Web-facing companies, including Amazon, eBay, Etsy, Facebook, Google, Groupon, Intuit, LinkedIn, Microsoft, Netflix, Shop Direct, StumbleUpon, Yahoo, and Zynga use online controlled experiments to guide product development and accelerate innovation. At Microsoft’s Bing, the use of controlled experiments has grown exponentially over time, with over...
Conference Paper
Full-text available
Chapter
Full-text available
Many good resources are available with motivation and explanations about online controlled experiments (Kohavi et al. 2009a, 2020; Thomke 2020; Luca and Bazerman 2020; Georgiev 2018, 2019; Kohavi and Thomke 2017; Siroker and Koomen 2013; Goward 2012; Schrage 2014; King et al. 2017; McFarland 2012; Manzi 2012; Tang et al. 2010). For organizations ru...
Preprint
Full-text available
The rise of internet-based services and products in the late 1990's brought about an unprecedented opportunity for online businesses to engage in large scale data-driven decision making. Over the past two decades, organizations such as Airbnb, Alibaba, Amazon, Baidu, Booking.com, Alphabet's Google, LinkedIn, Lyft, Meta's Facebook, Microsoft, Netfli...
Conference Paper
Full-text available
A/B tests, or online controlled experiments, are heavily used in industry to evaluate implementations of ideas. While the statistics behind controlled experiments are well documented and some basic pitfalls known, we have observed some seemingly intuitive concepts being touted, including by A/B tool vendors and agencies, which are misleading, often...
Chapter
Full-text available
Getting numbers is easy; getting numbers you can trust is hard. This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft will teach you how to accelerate innovation using trustworthy online controlled experiments, or A/B tests. Based on practical experiences at companies that each run more than 20,000 controlled experimen...
Book
Full-text available
Getting numbers is easy; getting numbers you can trust is hard. This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft will teach you how to accelerate innovation using trustworthy online controlled experiments, or A/B tests. Based on practical experiences at companies that each run more than 20,000 controlled experiment...
Article
Full-text available
Background: Many technology companies, including Airbnb, Amazon, Booking.com, eBay, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, and Yahoo!/Oath, run online randomized controlled experiments at scale, namely hundreds of concurrent controlled experiments on millions of users each, commonly referred to as A/B tests. Originall...
Chapter
For the digital parts of businesses in Society 5.0, such as web sites and mobile applications, manual testing is impractical and slow. Instead, the implementation of ideas can now be evaluated with scientific rigor using online controlled experiments (A/B tests), which provide trustworthy, reliable assessments of the impact of the implementations on key...
Conference Paper
Full-text available
The Internet and the general digitalization of products and operations provides an unprecedented opportunity to accelerate innovation while applying a rigorous and trustworthy methodology for supporting key product decisions. Developers of connected software, including web sites, applications, and devices, can now evaluate ideas quickly and accurat...
Article
Full-text available
Online controlled experiments (OCEs), also known as A/B tests, have become ubiquitous in evaluating the impact of changes made to software products and services. While the concept of online controlled experiments is simple, there are many practical challenges in running OCEs at scale. To understand the top practical challenges in running OCEs at sc...
Conference Paper
Full-text available
The Internet provides developers of connected software, including web sites, applications, and devices, an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using controlled experiments, also known as A/B tests. From front-end user-interface changes to backend algorithms, from search engines (e.g., Google...
Chapter
Full-text available
The Internet connectivity of client software (e.g., apps running on phones and PCs), websites, and online services provide an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called A/B tests, split tests, randomized experiments, control/treatment tests, and online field experiments. Unlike most data mining tec...
Conference Paper
Full-text available
Online controlled experiments (e.g., A/B tests) are now regularly used to guide product development and accelerate innovation in software. Product ideas are evaluated as scientific hypotheses, and tested on web sites, mobile applications, desktop applications, services, and operating system features. One of the key challenges for organizations tha...
Presentation
Full-text available
It’s easy to run a controlled experiment and compute a p-value with five digits after the decimal point. While getting such precise numbers is easy, getting numbers you can trust is much harder. We share practical pitfalls from online controlled experiments across multiple groups at Microsoft
Chapter
Full-text available
The internet connectivity of client software (e.g., apps running on phones and PCs), web sites, and online services provide an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called A/B tests, split tests, randomized experiments, control/treatment tests, and online field experiments. Unlike most data mining te...
Conference Paper
Full-text available
Online controlled experiments are now widely run in the software industry. I share several challenging problems and motivate their importance. These include high-variance metrics, issues with p-values, metric-driven vs. design-driven decisions, novelty effects, and leaks.
Conference Paper
Full-text available
Five Challenging Problems for A/B/n Tests
Conference Paper
Full-text available
The Internet provides developers of connected software, including web sites, applications, and devices, an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using trustworthy controlled experiments (e.g., A/B tests and their generalizations). From front-end user-interface changes to backend recommendation...
Conference Paper
Full-text available
Web site owners, from small web sites to the largest properties that include Amazon, Facebook, Google, LinkedIn, Microsoft, and Yahoo, attempt to improve their web sites, optimizing for criteria ranging from repeat usage and time on site to revenue. Having been involved in running thousands of controlled experiments at Amazon, Booking.com, LinkedIn,...
Conference Paper
Full-text available
The web provides an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using controlled experiments (e.g., A/B tests and their generalizations). From front-end user-interface changes to backend algorithms, online controlled experiments are now utilized to make data-driven decisions at many other companies....
Patent
Full-text available
A system and method are disclosed for automatically detecting associations between particular sets of search criteria, such as particular search strings, and particular items. Actions of users of an interactive system, such as a web site, are monitored over time to generate event histories reflective of searches, item selection actions, and possibl...
Patent
Full-text available
Computing services that unwanted entities may wish to access for improper, and potentially illegal, use can be more effectively protected by using Active HIP systems and methodologies. An Active HIP involves dynamically swapping one random HIP challenge, e.g., but not limited to, image, for a second random HIP challenge, e.g., but not limited to, i...
Conference Paper
Full-text available
Online controlled experiments are at the heart of making data-driven decisions at a diverse set of companies, including Amazon, eBay, Facebook, Google, Microsoft, Yahoo, and Zynga. Small differences in key metrics, on the order of fractions of a percent, may have very significant business implications. At Bing it is not uncommon to see experiments th...
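Since fractions of a percent in a key metric can carry significant business implications, the basic statistics behind such comparisons are worth making concrete. The following is a minimal sketch (not taken from any of the papers above) of the standard two-proportion z-test used to compare conversion rates between control and treatment; the function name and sample numbers are illustrative.

```python
import math

def two_proportion_z(conv_c, n_c, conv_t, n_t):
    """Two-sided z-test for a difference in conversion rates between
    a control group (conv_c successes out of n_c users) and a
    treatment group (conv_t out of n_t)."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    # Two-sided p-value via the normal CDF, built from math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# A 0.25-point lift on a 5% base rate, with 100,000 users per arm.
z, p = two_proportion_z(5000, 100_000, 5250, 100_000)
```

Even this textbook test shows why small effects demand large samples: a 5% relative lift with 100,000 users per arm yields a p-value of only about 0.01.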
Article
Full-text available
The web provides an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using controlled experiments (e.g., A/B tests and their generalizations). Whether for front-end user-interface changes, or backend recommendation systems and relevance algorithms, online controlled experiments are now utilized to make d...
Conference Paper
Full-text available
Online controlled experiments are often utilized to make data-driven decisions at Amazon, Microsoft, eBay, Facebook, Google, Yahoo, Zynga, and at many other companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher's experiments at the Rothamsted Agricultural Experimental Station in England in the 1920...
Article
Full-text available
From ancient times through the 19th century, physicians used bloodletting to treat acne, cancer, diabetes, jaundice, plague, and hundreds of other diseases and ailments (D. Wooton, Doctors Doing Harm since Hippocrates, Oxford Univ. Press, 2006). It was judged most effective to bleed patients while they were sitting upright or standing erect, and bl...
Article
Full-text available
Tracking users' online clicks and form submits (e.g., searches) is critical for web analytics, controlled experiments, and business intelligence. Most sites use web beacons to track user actions, but waiting for the beacon to return on clicks and submits slows the next action (e.g., showing search results or the destination page). One possibility i...
Article
Full-text available
Controlled experiments, also called randomized experiments and A/B tests, have had a profound influence on multiple fields, including medicine, agriculture, manufacturing, and advertising. Offline controlled experiments have been well studied and documented since Sir Ronald A. Fisher led the development of statistical experimental design while work...
Article
Full-text available
Controlled experiments, also called randomized experiments and A/B tests, have had a profound influence on multiple fields, including medicine, agriculture, manufacturing, and advertising. Through randomization and proper design, experiments allow establishing causality scientifically, which is why they are the gold standard in drug tests. In softw...
Conference Paper
Full-text available
Controlled experiments, also called randomized experiments and A/B tests, have had a profound influence on multiple fields, including medicine, agriculture, manufacturing, and advertising. While the theoretical aspects of offline controlled experiments have been well studied and documented, the practical aspects of running them in online settings,...
Article
Full-text available
While at Amazon.com from 1997 to 2002, Greg Linden created a prototype system that made personalized recommendations to customers when they placed items in their shopping cart (http://glinden.blogspot.com/2006/04/early-amazon-shopping-cart.html). The prototype looked promising, but “a marketing senior vice-president was dead set against it,” claim...
Conference Paper
Full-text available
The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments (single-factor or factorial designs), A/B tests (and their generalizations), split tests, Control/Treatment tests, and parallel flights. Controlled experiments embody the best scientific design for establishing a...
Chapter
Full-text available
We describe an experimental study of pruning methods for decision tree classifiers when the goal is minimizing loss rather than error. In addition to two common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and one pruning variant based on the...
Conference Paper
Full-text available
Electronic Commerce is now entering its second decade, with Amazon.com and eBay now in existence for ten years. With massive amounts of data, an actionable domain, and measurable ROI, multiple companies use data mining and knowledge discovery to understand their customers and improve interactions. We present important lessons and challenges using e...
Article
Full-text available
The architecture of Blue Martini Software's e-commerce suite has supported data collection, data transformation, and data mining since its inception. With clickstreams being collected at the application-server layer, high-level events being logged, and data automatically transformed into a data warehouse using meta-data, common problems plaguing da...
Conference Paper
Full-text available
Segmentation based on RFM (Recency, Frequency, and Monetary) has been used for over 50 years by direct marketers to target a subset of their customers, save mailing costs, and improve profits. RFM analysis is commonly performed using the Arthur Hughes method, which bins each of the three RFM attributes independently into five equal frequency bins....
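The binning scheme described here (each RFM attribute binned independently into five equal-frequency bins, in the style of the Arthur Hughes method) can be sketched in a few lines. This is an illustrative sketch, not code from the paper; the helper names and the three-digit segment-code convention are assumptions.

```python
def equal_frequency_bins(values, n_bins=5):
    """Label each value 1..n_bins so the bins hold (roughly) equal
    numbers of customers."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values) + 1
    return labels

def rfm_codes(recency, frequency, monetary):
    """Bin each RFM attribute independently into quintiles and join the
    labels into a three-digit segment code such as '515'."""
    # A recent purchase (low recency) is best, so reverse that label.
    r = [6 - b for b in equal_frequency_bins(recency)]
    f = equal_frequency_bins(frequency)
    m = equal_frequency_bins(monetary)
    return [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]
```

Because each attribute is binned independently, the 125 resulting cells can be very unevenly populated, which is one of the issues such an analysis has to contend with.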
Article
Full-text available
Typical web analytic packages provide basic key performance indicators and standard reports to help assess traffic patterns on the website, evaluate site performance, and identify potential problems such as bad links resulting in page not found errors. Based on our experience in mining data for multiple retail e-commerce sites, we offer several rec...
Article
Bayesian classification addresses the classification problem by learning the distribution of instances given different class values. We review the basic notion of Bayesian classification, describe in some detail the naive Bayesian classifier, and briefly discuss some extensions.
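The naive Bayesian classifier reviewed here combines class priors with per-attribute conditional probabilities under the conditional-independence assumption. A minimal sketch for discrete attributes follows; Laplace smoothing is added for robustness, and the function names are illustrative rather than from the article.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Fit class priors and per-attribute conditional probabilities
    under the naive conditional-independence assumption (discrete
    attributes, Laplace smoothing)."""
    n = len(labels)
    classes = Counter(labels)
    n_attrs = len(rows[0])
    values = [set() for _ in range(n_attrs)]   # distinct values per attribute
    counts = defaultdict(Counter)              # (attr, class) -> value counts
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            values[j].add(v)
            counts[(j, y)][v] += 1

    def predict(row):
        # Work in log space to avoid underflow when multiplying
        # many small probabilities.
        def log_posterior(c):
            lp = math.log(classes[c] / n)
            for j, v in enumerate(row):
                lp += math.log((counts[(j, c)][v] + 1) /
                               (classes[c] + len(values[j])))
            return lp
        return max(classes, key=log_posterior)

    return predict
```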
Article
Full-text available
In this paper we discuss the technology and enterprise-adoption trends in the area of business analytics. The key consumer of these analytics is the business user, a person whose job is not directly related to analytics per se (e.g., a merchandiser, marketer, salesperson), but who typically must use analytical tools to improve the results of a busines...
Article
Full-text available
This study compares five well-known association rule algorithms using three real-world datasets and an artificial dataset. The experimental results confirm the performance improvements previously claimed by the authors on the artificial data, but some of these gains do not carry over to the real datasets, indicating overfitting of the algorithms to...
Article
Full-text available
We show that the e-commerce domain can provide all the right ingredients for successful data mining. We describe an integrated architecture for supporting this integration. The architecture can dramatically reduce the pre-processing, cleaning, and data understanding effort often documented to take 80% of the time in knowledge discovery projects. We...
Article
Full-text available
Web robots are software programs that automatically traverse the hyperlink structure of the World Wide Web in order to locate and retrieve information. There are many reasons why it is important to identify visits by Web robots and distinguish them from ...
Book
Workshop Theme: The ease and speed with which business transactions can be carried out over the Web has been a key driving force in the rapid growth of electronic commerce. In addition, customer interactions, including personalized content, e-mail campaigns, and online feedback provide new channels of communication that were not previously available...
Conference Paper
Full-text available
This study compares five well-known association rule algorithms using three real-world datasets and an artificial dataset. The experimental results confirm the performance improvements previously claimed by the authors on the artificial data, but some of these gains do not carry over to the real datasets, indicating overfitting of the algorithms to...
Article
Full-text available
Organizations conducting Electronic Commerce (e-commerce) can greatly benefit from the insight that data mining of transactional and clickstream data provides. Such insight helps not only to improve the electronic channel (e.g., a web site), but it is also a learning vehicle for the bigger organization conducting business at brick-and-mortar stores...
Article
Full-text available
We critically analyze the use of classification accuracy to compare classifiers on natural data sets, providing a thorough investigation using ROC analysis, standard machine learning algorithms, and standard benchmark data sets. The results raise serious concerns about the use of accuracy for comparing classifiers and draw into question the conclusi...
Conference Paper
Full-text available
Electronic commerce provides all the right ingredients for successful data mining (the Good). Web logs, however, are at a very low granularity level, and attempts to mine e-commerce data using only web logs often result in little interesting insight (the Bad). Getting the data into minable formats requires significant pre-processing and data transf...
Article
this document is to provide coding standards for writing C++ code in MLC++. The description here can be used as a general guideline for programming in C++, independent of MLC++, but it is a low-level guide that does not discuss important issues of design. The MLC++ coding standards define higher-level concepts used in MLC++, including erro...
Article
Full-text available
The simple Bayesian classifier (SBC), sometimes called Naive-Bayes, is built based on a conditional independence model of each attribute given the class. The model was previously shown to be surprisingly robust to obvious violations of this independence assumption, yielding accurate classification models even when there are clear conditional depe...
Article
Full-text available
We present a comparison of error-based and entropy-based methods for discretization of continuous features. Our study includes both an extensive empirical comparison as well as an analysis of scenarios where error minimization may be an inappropriate discretization criterion. We present a discretization method based on the C4.5 decision tree algorit...
Article
Full-text available
We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more...
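The ten-fold cross-validation procedure compared in this study can be sketched as a short generic harness (this is not the paper's code): each example is held out exactly once, and the learner is retrained on the remaining folds. The trivial majority-class learner below exists only to exercise the harness.

```python
import random

def k_fold_cv(xs, ys, train_fn, k=10, seed=0):
    """Estimate accuracy by k-fold cross-validation: each example is
    held out exactly once, and the model never sees its own fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = train_fn([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(model(xs[i]) == ys[i] for i in fold)
    return correct / len(xs)

def majority_learner(train_x, train_y):
    """Trivial learner: always predict the most common training label."""
    top = max(set(train_y), key=train_y.count)
    return lambda x: top
```

With a majority-class learner the estimate equals the majority-class frequency, a useful sanity check for any CV harness.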
Article
Full-text available
In the wrapper approach to feature subset selection, a search for an optimal set of features is made using the induction algorithm as a blackbox. The estimated future performance of the algorithm is the heuristic guiding the search. Statistical methods for feature subset selection including forward selection, backward elimination, and their stepwis...
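The wrapper idea, treating the induction algorithm as a black box scored by its estimated accuracy, reduces forward selection to a greedy search. A minimal sketch under that framing follows; the names are illustrative, and in practice `evaluate` would return cross-validated accuracy of the induction algorithm on the given feature subset.

```python
def forward_selection(features, evaluate):
    """Greedy forward selection in the wrapper style: evaluate(subset)
    runs the induction algorithm as a black box and returns an accuracy
    estimate; features are added one at a time while the estimate
    keeps improving."""
    selected, best = [], evaluate(frozenset())
    while True:
        remaining = [f for f in features if f not in selected]
        if not remaining:
            break
        # Try adding each remaining feature; keep the best candidate.
        score, f = max((evaluate(frozenset(selected + [f])), f)
                       for f in remaining)
        if score <= best:
            break
        selected.append(f)
        best = score
    return selected, best
```

Backward elimination is the mirror image: start from the full set and greedily drop the feature whose removal most improves the estimate.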
Conference Paper
Full-text available
We show that the e-commerce domain can provide all the right ingredients for successful data mining. We describe an integrated architecture for supporting this integration. The architecture can dramatically reduce the pre-processing, cleaning, and data understanding effort often documented to take 80% of the time in knowledge discovery projects. We...
Article
Full-text available
We show that the e-commerce domain can provide all the right ingredients for successful data mining and claim that it is a killer domain for data mining. We describe an integrated architecture, based on our experience at Blue Martini Software, for supporting this integration. The architecture can dramatically reduce the pre-processing, cleaning, and data...
Book
Applications of Data Mining to Electronic Commerce brings together in one place important contributions and up-to-date research results in this fast moving area. Applications of Data Mining to Electronic Commerce serves as an excellent reference, providing insight into some of the most challenging research issues in the field.
Article
Full-text available
We describe KDD-Cup 2000, the yearly competition in data mining. For the first time the Cup included insight problems in addition to prediction problems, thus posing new challenges in both the knowledge discovery and the evaluation criteria, and highlighting the need to "peel the onion" and drill deeper into the reasons for the initial patterns fou...
Article
Full-text available
a data mining project. Unfortunately, the other 80% contains several substantial hurdles that without heroic effort may block the successful completion of the project. The following are five desiderata for success. Seldom are they all present in one data mining application. 1. Data with rich descriptions. For example, wide customer records with ma...
Article
Full-text available
We show that the e-commerce domain can provide all the right ingredients for successful data mining. We describe an integrated architecture for supporting this integration. The architecture can dramatically reduce the pre-processing, cleaning, and data understanding effort often documented to take 80% of the time in knowledge discovery projects. We e...
Preprint
Electronic commerce is emerging as the killer domain for data mining technology. The following are five desiderata for success. Seldom are they all present in one data mining application. 1. Data with rich descriptions. For example, wide customer records with many potentially useful fields allow data mining algorithms to search beyond obvious...
Article
This paper presents a case study of a machine-aided knowledge discovery process within the general area of drug design. Within drug design, the particular problem of pharmacophore discovery is isolated, and the Inductive Logic Programming (ILP) system progol is applied to the problem of identifying potential pharmacophores for ACE inhibition. The...
Article
Full-text available
Data Mining is the process of identifying new patterns and insights in data. As the volume of data collected and stored in databases grows, there is a growing need to provide data summarization (e.g., through visualization), identify important patterns and trends, and act upon the findings. Insight derived from data mining can provide tremendous ec...
Article
Full-text available
At KDD-99, the panel on Integrating Data Mining into Vertical Solutions addressed a series of questions regarding future trends in industrial applications. Panelists were chosen to represent different viewpoints from a variety of industry segments, including data providers (Jim Bozik), horizontal and vertical tool providers (Ken Ono and Steve Belch...
Article
Full-text available
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three varia...
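Bagging, one of the voting methods studied here, is easy to state: train each model on a bootstrap resample of the data and predict by majority vote. A compact sketch follows; the one-split threshold stump is an illustrative base learner, not one of the paper's inducers.

```python
import random
from collections import Counter

def stump(tx, ty):
    """One-split 'decision stump': threshold at the mean of the inputs,
    predict the majority label on each side (illustrative base learner)."""
    t = sum(tx) / len(tx)
    overall = Counter(ty).most_common(1)[0][0]
    def side(pair_labels):
        c = Counter(pair_labels)
        return c.most_common(1)[0][0] if c else overall
    left = side([y for x, y in zip(tx, ty) if x <= t])
    right = side([y for x, y in zip(tx, ty) if x > t])
    return lambda x: left if x <= t else right

def bagged_predictor(train_fn, xs, ys, n_models=25, seed=0):
    """Bagging: train each model on a bootstrap resample (sampling with
    replacement) and combine predictions by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        models.append(train_fn([xs[i] for i in idx], [ys[i] for i in idx]))
    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict
```

Boosting differs in that resampling (or reweighting) is adaptive, concentrating on examples the current ensemble misclassifies, and votes are weighted rather than uniform.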
Article
Full-text available
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis sp...
Article
Full-text available
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider...
Article
Full-text available
We describe an experimental study of Option Decision Trees with majority votes. Option Decision Trees generalize regular decision trees by allowing option nodes in addition to decision nodes; such nodes allow for several possible tests to be conducted instead of the commonly used single test. Our goal was to explore when option nodes are most usefu...
Article
Full-text available
In this paper we detail some things that worked well, some things that did not work as well as we hoped, and some thoughts about the future.
Article
Full-text available
Loan level modeling of prepayment is an important aspect of hedging, risk assessment, and retention efforts of the hundreds of companies in the US that trade and initiate Mortgage Backed Securities (MBS). In this paper we review and investigate different aspects of modeling customers who have taken jumbo loans in the US using MineSet. We show how r...
Article
Full-text available
The simple Bayesian classifier (SBC), sometimes called Naive-Bayes, is built based on a conditional independence model of each attribute given the class. The model was previously shown to be surprisingly robust to obvious violations of this independence assumption, yielding accurate classification models even when there are clear conditional depend...
Chapter
Full-text available
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider h...
Article
We present a comparison of error-based and entropy-based methods for discretization of continuous features. Our study includes both an extensive empirical comparison as well as an analysis of scenarios where error minimization may be an inappropriate discretization criterion. We present a discretization method based on the C4.5 decision tree algorit...
Article
Full-text available
We address the problem of finding a subset of features that allows a supervised induction algorithm to induce small high-accuracy concepts. We examine notions of relevance and irrelevance, and show that the definitions used in the machine learning literature do not adequately partition the features into useful categories of relevance. We present de...
Article
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular domain, a feature subset selection method should consider how the...
Article
We address the problem of finding the parameter settings that will result in optimal performance of a given learning algorithm using a particular dataset as training data. We describe a “wrapper” method, considering determination of the best parameters as a discrete function optimization problem. The method uses best-first search and crossvalidatio...
Article
Full-text available
The reasons for including application papers in the machine learning literature are discussed. Application papers in the machine learning literature are often included because these papers have success stories which act as an advertisement and boost morale. However, there is another reason why such papers are of value to the field, which is even more vital....
Article
Full-text available
Conference Paper
Full-text available
Business users and analysts commonly use spreadsheets and 2D plots to analyze and understand their data. On-line Analytical Processing (OLAP) provides these users with added flexibility in pivoting data around different attributes and drilling up and down the multi-dimensional cube of aggregations. Machine learning researchers, however, have concen...
Article
Full-text available
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider h...
Article
Full-text available
Data mining algorithms including machine learning, statistical analysis, and pattern recognition techniques can greatly improve our understanding of data warehouses that are now becoming more widespread. In this paper, we focus on classification algorithms and review the need for multiple classification algorithms. We describe a system called MLC++...
Article
Full-text available
Nearest-neighbor algorithms are known to depend heavily on their distance metric. In this paper, we investigate the use of a weighted Euclidean metric in which the weight for each feature comes from a small set of options. We describe Diet, an algorithm that directs search through a space of discrete weights using cross-validation error as its ev...
Article
Full-text available
We evaluate the power of decision tables as a hypothesis space for supervised learning algorithms. Decision tables are one of the simplest hypothesis spaces possible, and usually they are easy to understand. Experimental results show that on artificial and real-world domains containing only discrete features, IDTM, an algorithm inducing decision...
Article
Full-text available
...problem. Only 8% of the articles presented results for more than one problem using real world data. While Prechelt (1996) only looked at whether comparisons were done, Cohen went a step further and described how to design good experiments. He wrote that "books like this one encourage well-designed experiments, which, if one isn't careful, can be utter...
Article
Full-text available
MineSet TM , Silicon Graphics' interactive system for data mining, integrates three powerful technologies: database access, analytical data mining, and data visualization. It supports the knowledge discovery process from data access and preparation through iterative analysis and visualization to deployment. MineSet is based on a client-server archi...
Article
Full-text available
Lazy learning algorithms, exemplified by nearest-neighbor algorithms, do not induce a concise hypothesis from a given training set; the inductive process is delayed until a test instance is given. Algorithms for constructing decision trees, such as C4.5, ID3, and CART create a single "best" decision tree during the training phase, and this tree is t...
