Conference Paper

Coarse Personalization

References
Article
Full-text available
We present a general framework to target customers using optimal targeting policies, and we document the profit differences from alternative estimates of the optimal targeting policies. Two foundations of the framework are conditional average treatment effects (CATEs) and off-policy evaluation using data with randomized targeting. This policy evaluation approach allows us to evaluate an arbitrary number of different targeting policies using only one randomized data set and thus provides large cost advantages over conducting a corresponding number of field experiments. We use different CATE estimation methods to construct and compare alternative targeting policies. Our particular focus is on the distinction between indirect and direct methods. The indirect methods predict the CATEs using a conditional expectation function estimated on outcome levels, whereas the direct methods specifically predict the treatment effects of targeting. We introduce a new direct estimation method called treatment effect projection (TEP). The TEP is a non-parametric CATE estimator that we regularize using a transformed outcome loss which, in expectation, is identical to a loss that we could construct if the individual treatment effects were observed. The empirical application is to a catalog mailing with a high-dimensional set of customer features. We document the profits of the estimated policies using data from two campaigns conducted one year apart, which allows us to assess the transportability of the predictions to a campaign implemented one year after collecting the training data. All estimates of the optimal targeting policies yield larger profits than uniform policies that target none or all customers. Further, there are significant profit differences across the methods, with the direct estimation methods yielding substantially larger economic value than the indirect methods.
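The transformed-outcome construction the abstract refers to can be sketched generically as follows. This is an illustration only, assuming a randomized campaign with a known, constant treatment probability e; the function names and the gradient-boosting learner are our choices, not the paper's TEP implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def transformed_outcome(y, w, e):
    """Transformed outcome Y*: under randomization, E[Y* | X] equals the CATE.

    y : observed outcomes, w : binary treatment indicator,
    e : treatment probability (known in a randomized campaign).
    """
    return y * (w - e) / (e * (1.0 - e))

def fit_direct_cate(X, y, w, e=0.5):
    """Illustrative 'direct' CATE estimator: regress Y* on customer features X."""
    y_star = transformed_outcome(y, w, e)
    model = GradientBoostingRegressor()   # stand-in learner, not the paper's TEP
    model.fit(X, y_star)
    return model                          # model.predict(X_new) ~ estimated CATEs
```

A targeting policy would then, for example, mail only customers whose predicted CATE exceeds the per-customer mailing cost.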
Article
Full-text available
In this paper we describe an equivalence between random utility discrete-choice models and two-sided matching models with imperfectly transferable utility, and we exploit its consequences. Based on it, we suggest new approaches for estimation and identification of non-additive random utility models (NARUM), in which the utility shocks do not affect decision-makers' utilities in an additive manner. The estimation algorithms and procedures we describe are inspired by those in the matching literature. A noteworthy feature of our algorithms is that they yield the point estimate when the model is point identified, and yield the upper and lower bounds on the parameters under partial identification.
Article
Full-text available
Using results from Convex Analysis, we investigate a novel approach to identification and estimation of discrete-choice models that we call the mass transport approach. We show that the conditional choice probabilities and the choice-specific payoffs in these models are related in the sense of conjugate duality, and that the identification problem is a mass transport problem. Based on this, we propose a new two-step estimator for these models; interestingly, the first step of our estimator involves solving a linear program that is identical to the classic assignment (two-sided matching) game of Shapley and Shubik (1971). The application of convex-analytic tools to dynamic discrete-choice models and the connection with two-sided matching models is new in the literature. Monte Carlo results demonstrate the good performance of this estimator, and we provide an empirical application based on Rust's (1987) bus engine replacement model.
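The first step of the proposed estimator solves a linear program equivalent to the classic Shapley-Shubik assignment game. As a minimal illustration of that kind of problem (not of the paper's estimator itself), the sketch below solves an assignment problem on a hypothetical surplus matrix with SciPy's Hungarian-algorithm solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical surplus[i, j]: joint surplus from matching agent i with object j.
# The LP relaxation of this assignment game has an integral optimum.
rng = np.random.default_rng(0)
surplus = rng.normal(size=(5, 5))

rows, cols = linear_sum_assignment(surplus, maximize=True)
total_surplus = surplus[rows, cols].sum()
print(list(zip(rows, cols)), total_surplus)
```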
Article
Full-text available
Customized communications have the potential to reduce information overload and aid customer decisions, and the highly relevant products that result from customization can form the cornerstone of enduring customer relationships. In spite of such potential benefits, few models exist in the marketing literature to exploit the Internet's unique ability to design communications or marketing programs at the individual level. The authors develop a statistical and optimization approach for customization of information on the Internet. The authors use clickstream data from users at one of the top ten most trafficked Web sites to estimate the model and optimize the design and content of such communications for each user. The authors apply the model to the context of permission-based e-mail marketing, in which the objective is to customize the design and content of the e-mail to increase Web site traffic. The analysis suggests that the content-targeting approach can potentially increase the expected number of click-throughs by 62%.
Article
Full-text available
Scholars have questioned the effectiveness of several customer relationship management strategies. The author investigates the differential effects of customer relationship perceptions and relationship marketing instruments on customer retention and customer share development over time. Customer relationship perceptions are considered evaluations of relationship strength and a supplier's offerings, and customer share development is the change in customer share between two periods. The results show that affective commitment and loyalty programs that provide economic incentives positively affect both customer retention and customer share development, whereas direct mailings influence customer share development. However, the effect of these variables is rather small. The results also indicate that firms can use the same strategies to affect both customer retention and customer share development.
Article
Full-text available
Marketing scholars commonly characterize market structure by studying the patterns of substitution implied by brand switching. Though the approach is useful, it typically ignores the destabilizing role of marketing variables (e.g., price) in switching behavior. The authors propose a flexible choice model that partitions the market into consumer segments differing in both brand preference and price sensitivity. The result is a unified description of market structure that links the pattern of brand switching to the magnitudes of own- and cross-price elasticities. The approach is applied in a study of competition between national brands and private labels in one product category.
Article
Full-text available
The main objective of this paper is to provide a decision-support system for micro-level customized promotions, primarily for use in online stores. Our proposed approach utilizes the one-on-one and interactive nature of the Internet shopping environment and provides household-level promotion recommendations. We address the issue by first constructing a joint purchase incidence-brand choice-purchase quantity model that incorporates how variety-seeking/inertia tendency differs among households and changes over time for the same household. Based on the model, we develop an optimization procedure to derive the optimal amount of price discount for each household on each shopping trip. We demonstrate that the proposed customization method could greatly improve the effectiveness of current promotion practices, and discuss the implications for retailers and consumer packaged goods companies in the age of Internet technology.
Article
Full-text available
An important aspect of marketing practice is the targeting of consumer segments for differential promotional activity. The premise of this activity is that there exist distinct segments of homogeneous consumers who can be identified by readily available demographic information. The increased availability of individual consumer panel data opens up the possibility of direct targeting of individual households. The goal of this paper is to assess the information content of various information sets available for direct marketing purposes. Information on the consumer is obtained from the current and past purchase history as well as demographic characteristics. We consider the situation in which the marketer may have access to a reasonably long purchase history which includes both the products purchased and information on the causal environment. Short of this complete purchase history, we also consider more limited information sets which consist of only the current purchase occasion or only information on past product choice without causal variables. Proper evaluation of this information requires a flexible model of heterogeneity which can accommodate observable and unobservable heterogeneity as well as produce household-level inferences for targeting purposes. We develop new econometric methods to implement a random coefficient choice model in which the heterogeneity distribution is related to observable demographics. We couple this approach to modeling heterogeneity with a target couponing problem in which coupons are customized to specific households on the basis of various information sets. The couponing problem allows us to place a monetary value on the information sets. Our results indicate there exists a tremendous potential for improving the profitability of direct marketing efforts by more fully utilizing household purchase histories. Even rather short purchase histories can produce a net gain in revenue from target couponing which is 2.5 times the gain from blanket couponing. The most popular current electronic couponing trigger strategy uses only one observation to customize the delivery of coupons. Surprisingly, even the information contained in observing one purchase occasion boosts net couponing revenue by 50% over that which would be gained by the blanket strategy. This result, coupled with increased competitive pressures, will force targeted marketing strategies to become much more prevalent in the future than they are today.
Article
Full-text available
We provide a fully personalized model for optimizing multiple marketing interventions in intermediate-term customer relationship management (CRM). We derive theoretically based propositions on the moderating effects of past customer behavior and conduct a longitudinal validation test to compare the performance of our model with that of commonly used segmentation models in predicting intermediate-term, customer-specific gross profit change. Our findings show that response to marketing interventions is highly heterogeneous, that heterogeneity of response varies across different marketing interventions, and that the heterogeneity of response to marketing interventions may be partially explained by customer-specific variables related to customer characteristics and the customer’s past interactions with the company. One important result from these moderating effects is that relationship-oriented interventions are more effective with loyal customers, while action-oriented interventions are more effective with nonloyal customers. We show that our proposed model outperformed models based on demographics, recency-frequency-monetary value (RFM), or finite mixture segmentation in predicting the effectiveness of intermediate-term CRM. The empirical results project a significant increase in intermediate-term profitability over all of the competing segmentation approaches and a significant increase in intermediate-term profitability over current practice.
Article
Full-text available
Does it matter if managers use an absolute amount or the relative percentage discount when determining the optimal price and promotional strategy for a good? Intuitively, one might expect that the results of both models (deal amount and deal percentage) would be identical. This research shows that the deal-percentage model dominates the deal-amount model on three dimensions: (1) consumers pay more when an amount model is used, (2) the seller has lower profits and different promotional strategies when an amount model is used, and (3) consumer behavior is unrealistically constrained by the amount model. This research shows that whenever the seller offers a promotion in the deal-amount model, the net price paid by the consumer (regular price minus the deal) is always higher than the net price in the deal-discount model. This result implies that the ultimate price and promotional strategy (such as depth of promotion, timing of promotions, and so on) prescribed by the models are different. Additionally, this research shows that when consumers respond similarly in the models, the seller's profits are higher in the deal-percentage model. This finding is a direct result of the higher net price in the deal-amount model. Finally, contrary to empirical findings in the marketing literature, the deal-amount model requires consumers to respond more strongly to price changes than to promotions (that is, the promotional elasticity plus one must be less than the magnitude of the price elasticity) for the optimal price to be positive. The deal-percentage model does not place similar restrictions on consumer behavior.
Article
Problem definition: We seek to provide an interpretable framework for segmenting users in a population for personalized decision making. Methodology/results: We propose a general methodology, market segmentation trees (MSTs), for learning market segmentations explicitly driven by identifying differences in user response patterns. To demonstrate the versatility of our methodology, we design two new specialized MST algorithms: (i) choice model trees (CMTs), which can be used to predict a user’s choice amongst multiple options, and (ii) isotonic regression trees (IRTs), which can be used to solve the bid landscape forecasting problem. We provide a theoretical analysis of the asymptotic running times of our algorithmic methods, which validates their computational tractability on large data sets. We also provide a customizable, open-source code base for training MSTs in Python that uses several strategies for scalability, including parallel processing and warm starts. Finally, we assess the practical performance of MSTs on several synthetic and real-world data sets, showing that our method reliably finds market segmentations that accurately model response behavior. Managerial implications: The standard approach to conduct market segmentation for personalized decision making is to first perform market segmentation by clustering users according to similarities in their contextual features and then fit a “response model” to each segment to model how users respond to decisions. However, this approach may not be ideal if the contextual features prominent in distinguishing clusters are not key drivers of response behavior. Our approach addresses this issue by integrating market segmentation and response modeling, which consistently leads to improvements in response prediction accuracy, thereby aiding personalization. We find that such an integrated approach can be computationally tractable and effective even on large-scale data sets. Moreover, MSTs are interpretable because the market segments can easily be described by a decision tree and often require only a fraction of the number of market segments generated by traditional approaches. Disclaimer: This work was done prior to Ryan McNellis joining Amazon. Funding: This work was supported by the National Science Foundation [Grants CMMI-1763000 and CMMI-1944428]. Supplemental Material: The online appendices are available at https://doi.org/10.1287/msom.2023.1195 .
Article
We estimate the causal effects of different targeted email promotions on the opening and purchase decisions of the consumers who receive them.
Article
Free trial promotions are a commonly used customer acquisition strategy in the Software as a Service industry. We use data from a large-scale field experiment to study the effect of trial length on customer-level outcomes. We find that, on average, shorter trial lengths (surprisingly) maximize customer acquisition, retention, and profitability. Next, we examine the mechanism through which trial length affects conversions and rule out the demand cannibalization theory, find support for the consumer learning hypothesis, and show that long stretches of inactivity at the end of the trial are associated with lower conversions. We then develop a personalized targeting policy that allocates the optimal treatment to each user based on individual-level predictions of the outcome of interest (e.g., subscriptions) using a lasso model. We evaluate this policy using the inverse propensity score reward estimator and show that it leads to a 6.8% improvement in subscriptions compared with a uniform 30-days-for-all policy. It also performs well on long-term customer retention and revenues in our setting. Further analysis of this policy suggests that skilled and experienced users are more likely to benefit from longer trials, whereas beginners are more responsive to shorter trials. Finally, we show that personalized policies do not always outperform uniform policies, and we should be careful when designing and evaluating personalized policies. In our setting, personalized policies based on other methods (e.g., causal forests, random forests) perform worse than a simple uniform policy that assigns a short trial length to all users. This paper was accepted by Duncan Simester, marketing.
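The inverse propensity score reward estimator used to evaluate such a policy can be sketched generically as below; the variable names and the policy interface are illustrative assumptions, not the paper's code.

```python
import numpy as np

def ips_value(policy, X, w_obs, y, propensity):
    """Inverse-propensity-score estimate of the expected reward of a targeting policy.

    policy     : function mapping a feature vector to a recommended treatment arm
    X          : customer features (iterable of feature vectors)
    w_obs      : treatment arm actually assigned in the experiment
    y          : observed outcome (e.g., subscription indicator)
    propensity : probability of the observed arm under the experimental design
    """
    recommended = np.array([policy(x) for x in X])
    match = (recommended == w_obs).astype(float)
    return np.mean(match * y / propensity)
```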
Article
In many areas, practitioners seek to use observational data to learn a treatment assignment policy that satisfies application‐specific constraints, such as budget, fairness, simplicity, or other functional form constraints. For example, policies may be restricted to take the form of decision trees based on a limited set of easily observable individual characteristics. We propose a new approach to this problem motivated by the theory of semiparametrically efficient estimation. Our method can be used to optimize either binary treatments or infinitesimal nudges to continuous treatments, and can leverage observational data where causal effects are identified using a variety of strategies, including selection on observables and instrumental variables. Given a doubly robust estimator of the causal effect of assigning everyone to treatment, we develop an algorithm for choosing whom to treat, and establish strong guarantees for the asymptotic utilitarian regret of the resulting policy.
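A minimal sketch of the doubly robust (AIPW) score the method builds on, paired with a common weighted-classification surrogate for learning a depth-limited tree policy. The function names, nuisance inputs, and tree depth are our illustrative choices; this is not the authors' exact algorithm.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def doubly_robust_scores(y, w, e_hat, mu0_hat, mu1_hat):
    """AIPW / doubly robust score for the effect of treating each unit."""
    return (mu1_hat - mu0_hat
            + w * (y - mu1_hat) / e_hat
            - (1 - w) * (y - mu0_hat) / (1 - e_hat))

def fit_tree_policy(X, scores, max_depth=2):
    """Depth-limited tree policy: classify the sign of the score, weighting
    each unit by |score| (a standard surrogate for policy learning)."""
    labels = (scores > 0).astype(int)
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(X, labels, sample_weight=np.abs(scores))
    return tree   # tree.predict(X_new) gives the treat / don't-treat rule
```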
Article
We study deep neural networks and their use in semiparametric inference. We establish novel nonasymptotic high probability bounds for deep feedforward neural nets. These deliver rates of convergence that are sufficiently fast (in some cases minimax optimal) to allow us to establish valid second‐step inference after first‐step estimation with deep learning, a result also new to the literature. Our nonasymptotic high probability bounds, and the subsequent semiparametric inference, treat the current standard architecture: fully connected feedforward neural networks (multilayer perceptrons), with the now‐common rectified linear unit activation function, unbounded weights, and a depth explicitly diverging with the sample size. We discuss other architectures as well, including fixed‐width, very deep networks. We establish the nonasymptotic bounds for these deep nets for a general class of nonparametric regression‐type loss functions, which includes as special cases least squares, logistic regression, and other generalized linear models. We then apply our theory to develop semiparametric inference, focusing on causal parameters for concreteness, and demonstrate the effectiveness of deep learning with an empirical application to direct mail marketing.
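A minimal PyTorch sketch of the architecture the paper analyzes, a fully connected feedforward network with ReLU activations; the width and depth below are placeholder choices rather than the paper's tuning.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Fully connected feedforward net (multilayer perceptron) with ReLU units."""
    def __init__(self, d_in, width=64, depth=3):
        super().__init__()
        layers, d = [], d_in
        for _ in range(depth):
            layers += [nn.Linear(d, width), nn.ReLU()]
            d = width
        layers.append(nn.Linear(d, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# A first-step nuisance estimate would fit this net by least squares
# (nn.MSELoss) or logistic loss (nn.BCEWithLogitsLoss) before the second,
# semiparametric inference step.
```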
Article
Mobile in-app advertising is now the dominant form of digital advertising. Although these ads have excellent user-tracking properties, they have raised concerns among privacy advocates. This has resulted in an ongoing debate on the value of different types of targeting information, the incentives of ad networks to engage in behavioral targeting, and the role of regulation. To answer these questions, we propose a unified modeling framework that consists of two components—a machine learning framework for targeting and an analytical auction model for examining market outcomes under counterfactual targeting regimes. We apply our framework to large-scale data from the leading in-app ad network of an Asian country. We find that an efficient targeting policy based on our machine learning framework improves the average click-through rate by 66.80% over the current system. These gains mainly stem from behavioral information compared with contextual information. Theoretical and empirical counterfactuals show that although total surplus grows with more granular targeting, the ad network’s revenues are nonmonotonic; that is, the most efficient targeting does not maximize ad network revenues. Rather, it is maximized when the ad network does not allow advertisers to engage in behavioral targeting. Our results suggest that ad networks may have economic incentives to preserve users’ privacy without external regulation.
Book
Optimal transport theory is used widely to solve problems in mathematics and some areas of the sciences, but it can also be used to understand a range of problems in applied economics, such as the matching between job seekers and jobs, the determinants of real estate prices, and the formation of matrimonial unions. This is the first text to develop clear applications of optimal transport to economic modeling, statistics, and econometrics. It covers the basic results of the theory as well as their relations to linear programming, network flow problems, convex analysis, and computational geometry. Emphasizing computational methods, it also includes programming examples that provide details on implementation. Applications include discrete choice models, models of differential demand, and quantile-based statistical estimation methods, as well as asset pricing models. The book also features numerous exercises throughout that help to develop mathematical agility, deepen computational skills, and strengthen economic intuition.
Article
Mixture models are versatile tools that are used extensively in many fields, including operations, marketing, and econometrics. The main challenge in estimating mixture models is that the mixing distribution is often unknown, and imposing a priori parametric assumptions can lead to model misspecification issues. In this paper, we propose a new methodology for nonparametric estimation of the mixing distribution of a mixture of logit models. We formulate the likelihood-based estimation problem as a constrained convex program and apply the conditional gradient (also known as Frank–Wolfe) algorithm to solve this convex program. We show that our method iteratively generates the support of the mixing distribution and the mixing proportions. Theoretically, we establish the sublinear convergence rate of our estimator and characterize the structure of the recovered mixing distribution. Empirically, we test our approach on real-world datasets. We show that it outperforms the standard expectation-maximization (EM) benchmark on speed (16 times faster), in-sample fit (up to 24% reduction in the log-likelihood loss), and predictive (average 28% reduction in standard error metrics) and decision accuracies (extracts around 23% more revenue). On synthetic data, we show that our estimator is robust to different ground-truth mixing distributions and can also account for endogeneity. This paper was accepted by Serguei Netessine, operations management.
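The conditional-gradient (Frank-Wolfe) idea can be sketched as below. For simplicity the candidate taste vectors are restricted to a finite grid, whereas the paper optimizes the support freely; all names here are ours, not the paper's implementation.

```python
import numpy as np

def logit_probs(theta, X_choice):
    """Multinomial logit choice probabilities for one taste vector theta.
    X_choice has shape (n_obs, n_alternatives, n_features)."""
    u = X_choice @ theta                       # (n_obs, n_alternatives)
    u = u - u.max(axis=1, keepdims=True)
    p = np.exp(u)
    return p / p.sum(axis=1, keepdims=True)

def frank_wolfe_mixture(X_choice, chosen, grid, n_iter=50):
    """Frank-Wolfe sketch: iteratively add the candidate taste vector whose
    logit model most increases the mixture log-likelihood."""
    n = X_choice.shape[0]
    # likelihood of each observed choice under each candidate taste vector
    lik = np.stack([logit_probs(th, X_choice)[np.arange(n), chosen] for th in grid])
    support, weights = [0], np.array([1.0])    # start from the first candidate
    for t in range(1, n_iter + 1):
        mix_lik = weights @ lik[support]              # current mixture P(choice_i)
        grad = (lik / mix_lik).sum(axis=1)            # directional derivative per candidate
        k = int(np.argmax(grad))                      # best new support point
        gamma = 2.0 / (t + 2.0)                       # standard Frank-Wolfe step size
        if k in support:
            weights = (1 - gamma) * weights
            weights[support.index(k)] += gamma
        else:
            support.append(k)
            weights = np.append((1 - gamma) * weights, gamma)
    return [grid[k] for k in support], weights
```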
Book
The goal of Optimal Transport (OT) is to define geometric tools that are useful to compare probability distributions. Their use dates back to 1781. Recent years have witnessed a new revolution in the spread of OT, thanks to the emergence of approximate solvers that can scale to sizes and dimensions that are relevant to data sciences. Thanks to this newfound scalability, OT is being increasingly used to unlock various problems in imaging sciences (such as color or texture processing), computer vision and graphics (for shape manipulation) or machine learning (for regression, classification and density fitting). This monograph reviews OT with a bias toward numerical methods and their applications in data sciences, and sheds light on the theoretical properties of OT that make it particularly useful for some of these applications. Computational Optimal Transport presents an overview of the main theoretical insights that support the practical effectiveness of OT before explaining how to turn these insights into fast computational schemes. Written for readers at all levels, the book describes the foundational theory at two levels: a treatment generally accessible to all readers, and specially identified, more general mathematical expositions of optimal transport beyond discrete measures aimed at more advanced readers. Furthermore, several chapters deal with the interplay between continuous and discrete measures and thus target a more mathematically inclined audience. This monograph will be a valuable reference for researchers and students wishing to get a thorough understanding of Computational Optimal Transport, a mathematical gem at the interface of probability, analysis and optimization.
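A compact Sinkhorn iteration, one of the entropic approximate solvers this monograph surveys; the function and parameter names below are our own.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.1, n_iter=500):
    """Entropy-regularized OT between discrete measures a and b with cost matrix C.

    Returns a transport plan P whose marginals are (approximately) a and b.
    """
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]
```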
Article
We propose generalized random forests, a method for nonparametric statistical estimation based on random forests (Breiman [Mach. Learn. 45 (2001) 5–32]) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method considers a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: nonparametric quantile regression, conditional average partial effect estimation and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN.
Article
The authors develop a joint estimation approach to segment households on the basis of their response to price and promotion in brand choice, purchase incidence, and purchase quantity decisions. The authors model brand choice (what to buy) by multinomial logit, incidence (whether to buy) by nested logit, and quantity (how much to buy) by Poisson regression. Response segments are determined probabilistically using a latent mixture model. The approach simultaneously calibrates sales response on two dimensions: across segments and the three purchase behaviors. The procedure permits market-level sales elasticities to be decomposed by segment and purchase behavior (i.e., choice, incidence, and quantity). The authors apply the approach to scanner panel data for the yogurt category and find substantial differences across segments in the relative impact of the choice, incidence, and quantity decisions on overall sales response to price.
Article
The authors develop an approach to market segmentation based on consumer response to marketing variables in both brand choice and category purchase incidence. The approach reveals segmentation as well as the nature of choice and incidence response for each segment. Brand choice and purchase incidence decisions are modeled at the segment level with the disaggregate multinomial logit and nested logit models; segment sizes are estimated simultaneously with the choice and incidence probabilities. Households are assigned to segments by using their posterior probabilities of segment membership based on their purchase histories. The procedure thereby permits an analysis of the demographic, purchase behavior, and brand preference characteristics of each response segment. The authors illustrate their approach with scanner panel data on the liquid laundry detergent category and find segmentation in price and promotion sensitivity for both brand choice and category purchase incidence. The results suggest that many households that switch brands on the basis of price and promotion do not also accelerate their category purchases and that households that accelerate purchases do not necessarily switch brands.
Article
The author reviews the current status and recent advances in segmentation research, covering segmentation problem definition, research design considerations, data collection approaches, data analysis procedures, and data interpretation and implementation. Areas for future research are identified.
Article
Many scientific and engineering challenges---ranging from personalized medicine to customized marketing recommendations---require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. Given a potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms that, to our knowledge, is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially as the number of covariates increases.
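In Python, the econml package provides a causal forest implementation in this spirit. The sketch below assumes econml's CausalForestDML interface (consult the package documentation for the exact API) and uses simulated data whose true CATE is the first feature.

```python
import numpy as np
from econml.dml import CausalForestDML

# Hypothetical data: X drives effect heterogeneity, T is a binary treatment.
rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
T = rng.binomial(1, 0.5, size=n)
Y = X[:, 0] * T + X[:, 1] + rng.normal(size=n)   # true CATE equals X[:, 0]

est = CausalForestDML(discrete_treatment=True, random_state=0)
est.fit(Y, T, X=X)                                # forest-based CATE estimation
cate_hat = est.effect(X)                          # point estimates
lo, hi = est.effect_interval(X, alpha=0.05)       # asymptotic confidence intervals
```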
Article
Companies in a variety of sectors are increasingly managing customer churn proactively, generally by detecting customers at the highest risk of churning and targeting retention efforts towards them. While there is a vast literature on developing churn prediction models that identify customers at the highest risk of churning, no research has investigated whether it is indeed optimal to target those individuals. Combining two field experiments with machine learning techniques, the author demonstrates that customers identified as having the highest risk of churning are not necessarily the best targets for proactive churn programs. This finding is not only contrary to common wisdom but also suggests that retention programs are sometimes futile not because firms offer the wrong incentives but because they do not apply the right targeting rules. Accordingly, firms should focus their modeling efforts on identifying the observed heterogeneity in response to the intervention and to target customers on the basis of their sensitivity to the intervention, regardless of their risk of churning. This approach is empirically demonstrated to be significantly more effective than the standard practice of targeting customers with the highest risk of churning. More broadly, the author encourages firms and researchers using randomized trials (or A/B tests) to look beyond the average effect of interventions and leverage the observed heterogeneity in customers' response to select customer targets.
Article
Clustering is a fundamental problem in statistics and machine learning. Lloyd's algorithm, proposed in 1957, is still possibly the most widely used clustering algorithm in practice due to its simplicity and empirical performance. However, there has been little theoretical investigation of the statistical and computational guarantees of Lloyd's algorithm. This paper is an attempt to bridge this gap between practice and theory. We investigate the performance of Lloyd's algorithm on clustering sub-Gaussian mixtures. Under an appropriate initialization for labels or centers, we show that Lloyd's algorithm converges to an exponentially small clustering error after an order of log n iterations, where n is the sample size. The error rate is shown to be minimax optimal. For the two-mixture case, we only require the initializer to be slightly better than a random guess. In addition, we extend Lloyd's algorithm and its analysis to community detection and crowdsourcing, two problems that have received a lot of attention recently in statistics and machine learning. Two variants of Lloyd's algorithm are proposed for community detection and crowdsourcing, respectively. On the theoretical side, we provide statistical and computational guarantees for the two algorithms, and the results improve upon some previous signal-to-noise ratio conditions in the literature for both problems. Experimental results on simulated and real data sets demonstrate that our algorithms are competitive with state-of-the-art methods.
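Lloyd's algorithm itself is only a few lines; below is a plain NumPy sketch in which the initialization and stopping rule are our simplifications.

```python
import numpy as np

def lloyd(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations: assign points to the nearest center, then
    recompute each center as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```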
Article
In this paper we propose methods for estimating heterogeneity in causal effects in experimental and observational studies and for conducting hypothesis tests about the magnitude of differences in treatment effects across subsets of the population. We provide a data-driven approach to partition the data into subpopulations that differ in the magnitude of their treatment effects. The approach enables the construction of valid confidence intervals for treatment effects, even with many covariates relative to the sample size, and without "sparsity" assumptions. We propose an "honest" approach to estimation, whereby one sample is used to construct the partition and another to estimate treatment effects for each subpopulation. Our approach builds on regression tree methods, modified to optimize for goodness of fit in treatment effects and to account for honest estimation. Our model selection criterion anticipates that bias will be eliminated by honest estimation and also accounts for the effect of making additional splits on the variance of treatment effect estimates within each subpopulation. We address the challenge that the "ground truth" for a causal effect is not observed for any individual unit, so that standard approaches to cross-validation must be modified. Through a simulation study, we show that for our preferred method honest estimation results in nominal coverage for 90% confidence intervals, whereas coverage ranges between 74% and 84% for nonhonest approaches. Honest estimation requires estimating the model with a smaller sample size; the cost in terms of mean squared error of treatment effects for our preferred method ranges between 7% and 22%.
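A minimal sketch of the honest split-sample idea: one half of the data builds the partition and the other half re-estimates within-leaf treatment effects, so splits and estimates use disjoint observations. For simplicity the partition here is grown on a transformed outcome rather than the authors' modified splitting criterion; the names and tuning choices are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def honest_tree_effects(X, y, w, e=0.5, max_depth=3, seed=0):
    """Honest estimation sketch: build the tree on one half, estimate
    leaf-level effects as difference-in-means on the other half."""
    y_star = y * (w - e) / (e * (1 - e))                 # unbiased for the CATE
    idx_tr, idx_est = train_test_split(np.arange(len(y)), test_size=0.5,
                                       random_state=seed)
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(X[idx_tr], y_star[idx_tr])                  # partition-building sample
    leaves = tree.apply(X)                               # leaf id for every unit
    effects = {}
    for leaf in np.unique(leaves[idx_est]):
        in_leaf = idx_est[leaves[idx_est] == leaf]
        treated = in_leaf[w[in_leaf] == 1]
        control = in_leaf[w[in_leaf] == 0]
        if len(treated) and len(control):
            effects[leaf] = y[treated].mean() - y[control].mean()
    return tree, effects
```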
Article
We propose a notion of conditional vector quantile function and a vector quantile regression. A conditional vector quantile function (CVQF) of a random vector Y, taking values in ℝ^d, given covariates Z = z, taking values in ℝ^k, is a map u ↦ Q_{Y|Z}(u, z) that is monotone, in the sense of being the gradient of a convex function, and such that, given that a vector U follows a reference non-atomic distribution F_U (for instance, the uniform distribution on the unit cube in ℝ^d), the random vector Q_{Y|Z}(U, z) has the distribution of Y conditional on Z = z. Moreover, we have the strong representation Y = Q_{Y|Z}(U, Z) almost surely, for some version of U. The vector quantile regression (VQR) is a linear model for the CVQF of Y given Z. Under correct specification, the notion produces the strong representation Y = β(U)^T f(Z), for f(Z) denoting a known set of transformations of Z, where u ↦ β(u)^T f(Z) is a monotone map (the gradient of a convex function) and the quantile regression coefficients u ↦ β(u) have interpretations analogous to those of standard scalar quantile regression. As f(Z) becomes a richer class of transformations of Z, the model becomes nonparametric, as in series modelling. A key property of VQR is the embedding of the classical Monge-Kantorovich optimal transportation problem at its core as a special case. In the classical case where Y is scalar, VQR reduces to a version of the classical QR, and the CVQF reduces to the scalar conditional quantile function. An application to multiple Engel curve estimation is considered.
Book
Most questions in social and biomedical sciences are causal in nature: what would happen to individuals, or to groups, if part of their environment were changed? In this groundbreaking text, two world-renowned experts present statistical methods for studying such questions. This book starts with the notion of potential outcomes, each corresponding to the outcome that would be realized if a subject were exposed to a particular treatment or regime. In this approach, causal effects are comparisons of such potential outcomes. The fundamental problem of causal inference is that we can only observe one of the potential outcomes for a particular subject. The authors discuss how randomized experiments allow us to assess causal effects and then turn to observational studies. They lay out the assumptions needed for causal inference and describe the leading analysis methods, including matching, propensity-score methods, and instrumental variables. Many detailed applications are included, with special focus on practical aspects for the empirical researcher.
Article
Recently, Grover and Srinivasan developed a latent-class-based approach to analyze market structure by using brand-switching data. The authors provide an iterative maximum likelihood procedure for estimating parameters of a model that incorporates heterogeneity within segments. Using the same dataset, they compare the results obtained by incorporating heterogeneity in two different ways.
Article
The authors propose an extension of the logit-mixture model that defines prior segment membership probabilities as functions of concomitant (demographic) variables. Using this approach it is possible to describe how membership in each of the segments, segments being characterized by a specific profile of brand preferences and marketing variable sensitivities, is related to household demographic characteristics. An empirical application of the methodology is provided using A.C. Nielsen scanner panel data on catsup. The authors provide a comparison with the results obtained using the extant methodology in estimation and validation samples of households.
Article
The authors define a market segment to be a group of consumers homogeneous in terms of the probabilities of choosing the different brands in a product class. Because the vector of choice probabilities is homogeneous within segments and heterogeneous across segments, each segment is characterized by its corresponding group of brands with "large" choice probabilities. The competitive market structure is determined as the possibly overlapping groups of brands corresponding to the different segments. The use of brand choice probabilities as the basis for segmentation leads to market structuring and market segmentation becoming reverse sides of the same analysis. Using panel data, the authors obtain the matrix of cross-classification of brands chosen on two purchase occasions and extract segments by using the maximum likelihood method for estimating latent class models. An application to the instant coffee market indicates that the proposed approach has substantial validity and suggests the presence of submarkets related to product attributes as well as to brand names.
Article
The authors develop a framework that incorporates projected profitability of customers in the computation of life-time duration. Furthermore, the authors identify factors under a manager's control that explain the variation in the profitable lifetime duration. They also compare other frameworks with the traditional methods such as the recency, frequency, and monetary value framework and past customer value and illustrate the superiority of the proposed framework. Finally, the authors develop several key implications that can be of value to decision makers in managing customer relationships.
Article
We investigate a model of one-to-one matching with transferable utility when some of the characteristics of the players are unobservable to the analyst. We allow for a wide class of distributions of unobserved heterogeneity, subject only to a separability assumption that generalizes Choo and Siow (2006). We first show that the stable matching maximizes a social gain function that trades off the average surplus due to the observable characteristics and a generalized entropy term that reflects the impact of matching on unobserved characteristics. We use this result to derive simple closed-form formulæ that identify the joint surplus in every possible match and the equilibrium utilities of all participants, given any known distribution of unobserved heterogeneity. If transfers are observed, then the pre-transfer utilities of both partners are also identified. We also present a very fast algorithm that computes the optimal matching for any specification of the joint surplus. We conclude by discussing some empirical approaches suggested by these results.
Article
With the advent of panel data on household purchase behavior, and the development of statistical procedures to utilize this data, firms can now target coupons to selected households with considerable accuracy and cost effectiveness. In this article, we develop an analytical framework to examine the effect of such targeting on firm profits, prices, and coupon face values. We also derive comparative statics on firms' optimal mix of offensive and defensive couponing, the number of coupons distributed, redemption rates, face values, and incremental sales per redemption. Among our findings: when rival firms can target their coupon promotions at brand switchers, the outcome will be a prisoner's dilemma in which the net effect of targeting is simply the cost of distribution plus the discount given to redeemers.
Article
It has long been realized that in pulse-code modulation (PCM), with a given ensemble of signals to handle, the quantum values should be spaced more closely in the voltage regions where the signal amplitude is more likely to fall. It has been shown by Panter and Dite that, in the limit as the number of quanta becomes infinite, the asymptotic fractional density of quanta per unit voltage should vary as the one-third power of the probability density per unit voltage of signal amplitudes. In this paper the corresponding result for any finite number of quanta is derived; that is, necessary conditions are found that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy. The optimization criterion used is that the average quantization noise power be a minimum. It is shown that the result obtained here goes over into the Panter and Dite result as the number of quanta becomes large. The optimum quantization schemes for 2^{b} quanta, b = 1, 2, \cdots, 7, are given numerically for Gaussian and for Laplacian distributions of signal amplitudes.
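The necessary conditions derived here are what are now known as the Lloyd-Max conditions. A restatement in our own notation (quanta q_1 < ... < q_N, interval endpoints x_0 < x_1 < ... < x_N, amplitude density p), not the paper's original symbols:

```latex
\begin{align}
  x_j &= \frac{q_j + q_{j+1}}{2}, \qquad j = 1, \dots, N-1
      && \text{(each boundary is the midpoint of adjacent quanta)} \\
  q_j &= \frac{\int_{x_{j-1}}^{x_j} x \, p(x)\, dx}{\int_{x_{j-1}}^{x_j} p(x)\, dx}
      && \text{(each quantum is the centroid of its interval)}
\end{align}
```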
Article
In this paper, we examine the pattern of correlation among consumer price sensitivities for customer purchase incidence decisions across complementary product categories. We use a hierarchical Bayesian multivariate probit model to uncover this pattern. We estimated this model using purchase incidence data for six categories involving three pairs of complementary products. Our results show a new and interesting pattern of correlation among price parameters of complementary products. For example, we find that the correlation of own-price sensitivities of complementary products is negative. These results are consistent across the three complementary pairs of products. We also investigate the reason for this counterintuitive result. Finally, we present some managerial implications of our model. We show how our model can be used for cross-category targeting decisions by retailers. We find that compared to nontargeted discounting, the average profitability gain from customized discounting across the three category pairs is only 1.29% when complementarity is ignored, but this gain improves to 8.26% when full complementarity is taken into account. We also investigate whether ignoring the complex pattern of correlation has implications for managerial actions regarding targeting and optimal discounting. We find that retailers can make misleading inferences about the impact of targeted discounts when they ignore cross-category effects in modeling.
Article
This study investigates the effectiveness of customized promotions at three levels of granularity (mass market, segment specific, and individual specific) in online and offline stores. The authors conduct an empirical examination of the profit potential of these customized promotion programs with a joint model of purchase incidence, choice, and quantity and through optimization procedures for approximately 300 conditions. They find that (1) optimization procedures lead to substantial profit improvements over the current practice for all types of promotions (customized and undifferentiated); (2) loyalty promotions are more profitable in online stores than in offline stores, while the opposite holds for competitive promotions; (3) the incremental payoff of individual-level over segment- and mass market-level customized promotions is small in general, especially in offline stores; (4) for categories that are promotion sensitive, individual-level customized promotions can lead to a meaningful profit increase over segment- and mass market-level customized promotions in online stores; and (5) low redemption rates are a major impediment to the success of customized promotions in offline stores. Optimal undifferentiated promotions should be the primary promotion program in this channel, and firms can benefit from offering a combination of optimal undifferentiated and customized promotions for suitable categories in offline stores.
Article
Asymptotic results from the statistical theory of k-means clustering are applied to problems of vector quantization. The behavior of quantizers constructed from long training sequences of data is analyzed by relating it to the consistency problem for k-means.
The Age of Personalization
  • Harvard Business Review Analytic Services
PyTorch: An imperative style, high-performance deep learning library
  • Adam Paszke
  • Sam Gross
  • Francisco Massa
  • Adam Lerer
  • James Bradbury
  • Gregory Chanan
  • Trevor Killeen
  • Zeming Lin
  • Natalia Gimelshein
  • Luca Antiga
  • Alban Desmaison
  • Andreas Köpf
  • Edward Yang
  • Zach DeVito
Deep Learning for Individual Heterogeneity
  • Max H. Farrell
  • Tengyuan Liang
  • Sanjog Misra