Matt Taddy’s research while affiliated with Amazon and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (37)


Lasso approximation of a correlated random effects model with approximately sparse deviations. Notes: Dots indicate $(M_i, a_i)$, with the horizontal axis showing values of $\bar{M}_i$ and the vertical axis the values of $a_i = M_i' \delta_{0M} + \xi_i$. The lasso-estimated unit effects are shown by rhombi $(M_i, \hat{a}_i)$, where $\hat{a}_i := M_i \hat{\delta}_M + \hat{\xi}_i$. The time-invariant controls $M_i$ are generated as i.i.d. draws from $U[0,1]$; the sparse deviations are $\xi_i = 1/i^2$, $i = 1, 2, \ldots, N$, with $N = 20$ and $T = 1$; and $\delta_{0M} = 1$. We show the realization for a single experiment.
Histogram of estimated price elasticities for each category of Level 2 (Figure 2a) and Level 3 (Figure 2b) for Snacks. Estimates: Orthogonal lasso (left panel), debiased orthogonal lasso (middle panel), OLS (right panel). See the text for details.
Inference on heterogeneous treatment effects in high‐dimensional dynamic panels under weak dependence
  • Article
  • Full-text available

May 2023 · 63 Reads · 24 Citations

Quantitative Economics

Matt Goldman · Victor Chernozhukov · Matt Taddy

This paper provides estimation and inference methods for conditional average treatment effects (CATE) characterized by a high‐dimensional parameter in both homogeneous cross‐sectional and unit‐heterogeneous dynamic panel data settings. In our leading example, we model CATE by interacting the base treatment variable with explanatory variables. The first step of our procedure is orthogonalization, where we partial out the controls and unit effects from the outcome and the base treatment and take the cross‐fitted residuals. This step uses a novel generic cross‐fitting method that we design for weakly dependent time series and panel data. This method "leaves out the neighbors" when fitting nuisance components, and we theoretically power it by using Strassen's coupling. As a result, we can rely on any modern machine learning method in the first step, provided it learns the residuals well enough. Second, we construct an orthogonal (or residual) learner of CATE—the lasso CATE—that regresses the outcome residual on the vector of interactions of the residualized treatment with explanatory variables. If the CATE function is less complex than the first‐stage regression, the orthogonal learner converges faster than the single‐stage regression‐based learner. Third, we perform simultaneous inference on parameters of the CATE function using debiasing. We can also use ordinary least squares in the last two steps when CATE is low‐dimensional. In heterogeneous panel data settings, we model the unobserved unit heterogeneity as a weakly sparse deviation from Mundlak's (1978) model of correlated unit effects as a linear function of time‐invariant covariates and make use of L1‐penalization to estimate these models. We demonstrate our methods by estimating price elasticities of groceries based on scanner data. We note that our results are new even for the cross‐sectional (i.i.d.) case.
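As a rough illustration of the residual-on-residual construction described above, the sketch below cross-fits nuisance regressions and then runs a lasso of the outcome residual on the residualized treatment interacted with the explanatory variables. It is not the authors' implementation: the array names, the random-forest nuisance learners, and the plain i.i.d. two-fold split (in place of the paper's leave-neighbors-out cross-fitting for dependent data) are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV

def orthogonal_lasso_cate(y, d, X, W, n_folds=2, seed=0):
    """Cross-fitted residual-on-residual (orthogonal) lasso learner of CATE.

    y: outcome, d: base treatment, X: explanatory variables entering the CATE
    function, W: controls. The i.i.d. fold split below is a simplification of
    the paper's leave-neighbors-out cross-fitting for dependent data.
    """
    n = len(y)
    folds = np.random.default_rng(seed).integers(0, n_folds, size=n)
    y_res, d_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        y_res[test] = y[test] - RandomForestRegressor().fit(W[train], y[train]).predict(W[test])
        d_res[test] = d[test] - RandomForestRegressor().fit(W[train], d[train]).predict(W[test])
    # Lasso of the outcome residual on the residualized treatment interacted
    # with a constant and the explanatory variables.
    Z = d_res[:, None] * np.column_stack([np.ones(n), X])
    coef = LassoCV(fit_intercept=False).fit(Z, y_res).coef_
    return coef  # CATE(x) is approximated by coef @ [1, x]
```

The paper's third step, debiasing and simultaneous confidence bands for the CATE coefficients, would be layered on top of this lasso fit.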


Measuring Technological Innovation over the Long Run

September 2021 · 108 Reads · 223 Citations

American Economic Review: Insights

We use textual analysis of high-dimensional data from patent documents to create new indicators of technological innovation. We identify important patents based on textual similarity of a given patent to previous and subsequent work: these patents are distinct from previous work but related to subsequent innovations. Our importance indicators correlate with existing measures of patent quality but also provide complementary information. We identify breakthrough innovations as the most important patents—those in the right tail of our measure—and construct time series indices of technological change at the aggregate and sectoral levels. Our technology indices capture the evolution of technological waves over a long time span (1840 to the present) and cover innovation by private and public firms as well as nonprofit organizations and the US government. Advances in electricity and transportation drive the index in the 1880s, chemicals and electricity in the 1920s and 1930s, and computers and communication in the post-1980s. (JEL C43, N71, N72, O31, O33, O34)
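The importance measure described above can be caricatured with off-the-shelf text tools. The sketch below is only a toy operationalization (TF-IDF cosine similarity, fixed year windows, a made-up `importance_scores` helper), not the paper's measure, which handles vocabulary drift and document weighting far more carefully.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def importance_scores(texts, years, back=5, fwd=5):
    """Toy importance score for each document: average similarity to subsequent
    documents divided by average similarity to prior documents, within fixed
    year windows. Documents in the right tail are 'breakthrough' candidates."""
    V = TfidfVectorizer(stop_words="english").fit_transform(texts)
    years = np.asarray(years)
    scores = np.full(len(texts), np.nan)
    for i, y in enumerate(years):
        past = np.flatnonzero((years >= y - back) & (years < y))
        future = np.flatnonzero((years > y) & (years <= y + fwd))
        if past.size and future.size:
            backward = cosine_similarity(V[i], V[past]).mean()
            forward = cosine_similarity(V[i], V[future]).mean()
            # Novel with respect to the past, influential on the future.
            scores[i] = forward / max(backward, 1e-8)
    return scores
```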


Text as Data

September 2019 · 214 Reads · 639 Citations

Journal of Economic Literature

An ever-increasing share of human interaction, communication, and culture is recorded as digital text. We provide an introduction to the use of text as an input to economic research. We discuss the features that make text different from other forms of data, offer a practical overview of relevant statistical methods, and survey a variety of applications. (JEL C38, C55, L82, Z13)


Measuring Group Differences in High‐Dimensional Choices: Method and Application to Congressional Speech

July 2019 · 83 Reads · 273 Citations

Econometrica

We study the problem of measuring group differences in choices when the dimensionality of the choice set is large. We show that standard approaches suffer from a severe finite‐sample bias, and we propose an estimator that applies recent advances in machine learning to address this bias. We apply this method to measure trends in the partisanship of congressional speech from 1873 to 2016, defining partisanship to be the ease with which an observer could infer a congressperson's party from a single utterance. Our estimates imply that partisanship is far greater in recent years than in the past, and that it increased sharply in the early 1990s after remaining low and relatively constant over the preceding century.
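The finite-sample bias the abstract refers to is easy to reproduce in a toy simulation: when both parties draw phrases from the same distribution, a naive plug-in estimate of how well a single utterance reveals the party still comes out well above the true value of 0.5. The snippet below illustrates the problem only, not the paper's bias-corrected estimator, and all quantities are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_phrases, n_tokens = 1000, 500   # hypothetical vocabulary and sample sizes

def naive_partisanship(counts_r, counts_d):
    """Plug-in estimate of the expected posterior probability that an observer
    assigns to a speaker's true party after hearing a single phrase."""
    pr, pdm = counts_r / counts_r.sum(), counts_d / counts_d.sum()
    rho = pr / (pr + pdm + 1e-12)                  # plug-in P(party R | phrase)
    return 0.5 * (pr @ rho) + 0.5 * (pdm @ (1 - rho))

# Both parties speak from the *same* phrase distribution, so the true value is 0.5.
p = rng.dirichlet(np.ones(n_phrases))
counts_r = rng.multinomial(n_tokens, p).astype(float)
counts_d = rng.multinomial(n_tokens, p).astype(float)
print(naive_partisanship(counts_r, counts_d))      # noticeably above 0.5 in finite samples
```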


The Geometry of Culture: Analyzing Meaning through Word Embeddings

March 2018 · 626 Reads · 412 Citations

American Sociological Review

We demonstrate the utility of a new methodological tool, neural-network word embedding models, for large-scale text analysis, revealing how these models produce richer insights into cultural associations and categories than possible with prior methods. Word embeddings represent semantic relations between words as geometric relationships between vectors in a high-dimensional space, operationalizing a relational model of meaning consistent with contemporary theories of identity and culture. We show that dimensions induced by word differences (e.g. man - woman, rich - poor, black - white, liberal - conservative) in these vector spaces closely correspond to dimensions of cultural meaning, and the projection of words onto these dimensions reflects widely shared cultural connotations when compared to surveyed responses and labeled historical data. We pilot a method for testing the stability of these associations, then demonstrate applications of word embeddings for macro-cultural investigation with a longitudinal analysis of the coevolution of gender and class associations in the United States over the 20th century and a comparative analysis of historic distinctions between markers of gender and class in the U.S. and Britain. We argue that the success of these high-dimensional models motivates a move towards "high-dimensional theorizing" of meanings, identities and cultural processes.
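The core geometric operations described above, building a dimension from word-pair differences and projecting other words onto it, reduce to a few lines of linear algebra. The sketch below is a generic illustration rather than the authors' pipeline; the tiny random embedding dictionary exists only so the example runs, and in real use the vectors would come from pretrained word2vec or GloVe models.

```python
import numpy as np

def cultural_dimension(pairs, emb):
    """Average the normalized difference vectors of antonym pairs
    (e.g. ('rich', 'poor')) to induce a semantic dimension."""
    diffs = [(emb[a] - emb[b]) / np.linalg.norm(emb[a] - emb[b]) for a, b in pairs]
    dim = np.mean(diffs, axis=0)
    return dim / np.linalg.norm(dim)

def project(word, dim, emb):
    """Cosine projection of a word vector onto the induced dimension."""
    v = emb[word]
    return float(v @ dim / np.linalg.norm(v))

if __name__ == "__main__":
    # In real use `emb` would hold pretrained word2vec/GloVe vectors; the tiny
    # random dictionary below exists only so that the sketch runs end to end.
    rng = np.random.default_rng(0)
    vocab = ["rich", "poor", "affluent", "impoverished", "opera", "camping"]
    emb = {w: rng.normal(size=100) for w in vocab}
    affluence = cultural_dimension([("rich", "poor"), ("affluent", "impoverished")], emb)
    print(project("opera", affluence), project("camping", affluence))
```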


Low-Rank Bandit Methods for High-Dimensional Dynamic Pricing

January 2018 · 40 Reads · 22 Citations

We consider high dimensional dynamic multi-product pricing with an evolving but low-dimensional linear demand model. Assuming the temporal variation in cross-elasticities exhibits low-rank structure based on fixed (latent) features of the products, we show that the revenue maximization problem reduces to an online bandit convex optimization with side information given by the observed demands. We design dynamic pricing algorithms whose revenue approaches that of the best fixed price vector in hindsight, at a rate that only depends on the intrinsic rank of the demand model and not the number of products. Our approach applies a bandit convex optimization algorithm in a projected low-dimensional space spanned by the latent product features, while simultaneously learning this span via online singular value decomposition of a carefully-crafted matrix containing the observed demands.
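A very rough sketch of the two ingredients in this abstract, tracking a low-dimensional product subspace via an SVD of the demand history and updating prices inside that subspace, is given below. It is a loose caricature under many simplifying assumptions (a crude ascent direction in place of bandit convex optimization, a full SVD in place of an online one, made-up demand and price ranges), not the algorithm from the paper.

```python
import numpy as np

def low_rank_pricing(demand_fn, n_products, rank, horizon, eta=0.05, seed=0):
    """Caricature of low-rank dynamic pricing: estimate a rank-`rank` subspace
    from the history of observed demand vectors via SVD, and move the price
    vector along a crude revenue-ascent direction projected into that subspace
    (the ascent direction ignores how demand itself shifts with price)."""
    rng = np.random.default_rng(seed)
    p = np.ones(n_products)            # flat starting price vector
    history = []                       # observed demand vectors
    for _ in range(horizon):
        d = demand_fn(p) + 0.01 * rng.normal(size=n_products)   # noisy demand
        history.append(d)
        U, _, _ = np.linalg.svd(np.array(history).T, full_matrices=False)
        B = U[:, :rank]                # basis of the learned low-dimensional span
        p = np.clip(p + eta * B @ (B.T @ d), 0.5, 2.0)
    return p

# Toy linear demand model with intrinsic rank 2 (all numbers are made up).
rng = np.random.default_rng(1)
A = 0.05 * rng.normal(size=(30, 2)) @ rng.normal(size=(2, 30))
print(low_rank_pricing(lambda p: 1.0 - A @ p, n_products=30, rank=2, horizon=50))
```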



Figure 1: Product classification for Drinks.
Figure 2: Average price elasticity for Level 2 categories: estimate and 95% confidence bands. The estimator is constructed in two steps. In Step 1, we estimate the reduced form of (log) price and (log) quantity sold by regressing the respective dependent variable on the $L = 4$ lags of price, quantity, and product classification indicators, using a variant of the lasso proposed by Kock and Tang (2019) with the penalty $\lambda_P$ chosen by cross-validation, and take the residuals. In Step 2, we estimate the price elasticity by OLS with the sales residual as the dependent variable and the price residual, interacted with product categories, as the regressors (i.e., the Orthogonal Least Squares estimator of Definition 5.1). We use different samples in the two stages in the form of the panel cross-fitting of Definition 2.1 with a $K = 2$ sample partition (i.e., $I_1$ consists of the first two years of data and $I_2$ of the last two years). The total sample sizes are 418,494 observations for Protein, 388,211 for Household Items, 270,733 for Drinks, and 148,013 for Other Food.
Figure 3: Histograms of estimated price elasticities for each category of Level 2 (Figure 3a), Level 3 (Figure 3b), and Level 4 (Figure 3c) for Dairy products. In the left panel we run the lasso regression (i.e., the Orthogonal Lasso of Definition 3.1). In the middle panel we run a version of the Debiased Orthogonal Lasso of Definition 3.3 that uses the ridge inverse $(Q + \lambda I_d)^{-1}$ instead of the CLIME inverse to simplify computation. In the right panel we run OLS (i.e., the Orthogonal Least Squares of Definition 5.1). The total sample size is 418,494. See the text for details.
Figure 4: Average price elasticity for Level 2 categories by month: estimate and 95% confidence bands. Step 1 is as in Figure 2. In Step 2, we use OLS with the sales residual as the dependent variable and the price residual, interacted with month dummies, as the regressors. Panel cross-fitting with $K = 2$ is as in Figure 2. The total sample sizes are 418,494 observations for Protein, 388,211 for Household Items, 270,733 for Drinks (Water and Sodas), and 148,013 for Other Food.
Orthogonal Machine Learning for Demand Estimation: High Dimensional Causal Inference in Dynamic Panels

December 2017 · 1,514 Reads · 31 Citations

There has been growing interest in how economists can import machine learning tools designed for prediction to facilitate, optimize, and automate the model selection process, while still retaining desirable inference properties for causal parameters. Focusing on partially linear models, we extend the Double ML framework to allow for (1) a number of treatments that may grow with the sample size and (2) the analysis of panel data under sequentially exogenous errors. Our low-dimensional treatment (LD) regime directly extends the work in Chernozhukov et al. (2016) by showing that the coefficients from a second-stage ordinary least squares estimator attain root-n convergence and desired coverage even if the dimensionality of the treatment is allowed to grow at a rate of O(N/log N). Additionally, we consider a high-dimensional sparse (HDS) regime in which we show that the second-stage orthogonal LASSO and debiased orthogonal LASSO have asymptotic properties equivalent to oracle estimators with known first-stage estimators. We argue that these advances make Double ML methods a desirable alternative for practitioners estimating short-term demand elasticities in non-contractual settings.
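For concreteness, the two-step construction used in this abstract and in the figure captions above can be sketched as follows. This is an illustrative sketch only: the column names (`log_price`, `log_quantity`, `year`, `category`) are hypothetical, a plain `LassoCV` stands in for the Kock and Tang (2019) lasso variant, and a two-block time split stands in for the paper's panel cross-fitting.

```python
import pandas as pd
from sklearn.linear_model import LassoCV, LinearRegression

def elasticity_two_step(df, controls, split_year):
    """Step 1: lasso out controls/lags from log price and log quantity on one
    time block and residualize the other block (and vice versa). Step 2: OLS of
    the quantity residual on the price residual interacted with category dummies."""
    out = df.copy()
    early, late = df["year"] < split_year, df["year"] >= split_year
    for fit_idx, pred_idx in [(early, late), (late, early)]:
        for col in ["log_price", "log_quantity"]:
            m = LassoCV().fit(df.loc[fit_idx, controls], df.loc[fit_idx, col])
            out.loc[pred_idx, col + "_res"] = (
                df.loc[pred_idx, col] - m.predict(df.loc[pred_idx, controls])
            )
    cats = pd.get_dummies(out["category"], dtype=float)
    Z = cats.mul(out["log_price_res"], axis=0)           # price residual x category
    ols = LinearRegression(fit_intercept=False).fit(Z, out["log_quantity_res"])
    return pd.Series(ols.coef_, index=cats.columns)      # per-category elasticities
```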


Accurate Inference for Adaptive Linear Models

December 2017 · 47 Reads · 18 Citations

Estimators computed from adaptively collected data do not behave like their non-adaptive brethren. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite-data limit. We develop a general decorrelation procedure -- W-decorrelation -- for transforming the bias of adaptive linear regression estimators into variance. The method uses only coarse-grained information about the data collection policy and does not need access to propensity scores or exact knowledge of the policy. We bound the finite-sample bias and variance of the W-estimator and develop asymptotically correct confidence intervals based on a novel martingale central limit theorem. We then demonstrate the empirical benefits of the generic W-decorrelation procedure in two different adaptive data settings: multi-armed bandits and autoregressive time series models.
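The distributional distortion described in the opening sentences is easy to see in a toy simulation: under a greedy two-armed bandit, the per-arm sample mean is biased even when both arms are identical. The snippet below only illustrates that phenomenon; it is not the W-decorrelation procedure.

```python
import numpy as np

# Toy illustration: with a greedy two-armed bandit collecting the data, the
# per-arm sample mean is biased downward even though both arms have the same
# true mean (here 0).
rng = np.random.default_rng(0)
n_sims, horizon = 2000, 200
bias = []
for _ in range(n_sims):
    rewards = [[rng.normal()], [rng.normal()]]                 # pull each arm once
    for _ in range(horizon - 2):
        arm = int(np.mean(rewards[1]) > np.mean(rewards[0]))   # greedy choice
        rewards[arm].append(rng.normal())
    bias.append(np.mean(rewards[0]))                           # true mean is 0
print(np.mean(bias))   # clearly negative: arms that look bad early stay under-sampled
```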


Citations (31)


... The argument above relies on the classical partialling-out or orthogonalization approach (Frisch and Waugh, 1933; Lovell, 1963; Robinson, 1988). The "residual-on-residual" method underlies double machine learning (also called R-learning), which uses cross-fitted machine learning to estimate residuals and then infers the CAPE by least squares (Chernozhukov et al., 2018a; Nie and Wager, 2021; Semenova et al., 2023). This approach is part of a broader class of debiased machine learning (DML) algorithms rooted in semiparametric learning theory (Levit, 1975; Hasminskii and Ibragimov, 1978; Pfanzagl and Wefelmeyer, 1985). ...

Reference:

Adventures in Demand Analysis Using AI
Inference on heterogeneous treatment effects in high‐dimensional dynamic panels under weak dependence

Quantitative Economics

... Papers by Audet and Dennis [2001] and by Lucidi et al. [2005], as well as the paper by Abramson et al. [2009] (describing NOMAD), consider the mixed case, and [Liuzzi et al. 2012, 2014] deal with problems whose variables are mixed-integer. Other methods of interest include HOPSPACK (Hybrid Optimization Parallel Search Package) [Gray et al. 2008, 2010], which provides a general set of tools for pattern search exploiting parallel computing but does not handle multilevel problems explicitly, and MAPS (Model-Assisted Pattern Search) [Siefert et al. 1997], which uses pattern search on purely continuous problems. ...

HOPSPACK: Hybrid Optimization Parallel Search Package

Matt Taddy · [...] · Tamara G Kolda

... This shift from the mere presence of words to capturing meaning and connotations allows text models to identify abstract patterns and implicit ideas. For legislative rhetoric, this means the ability to detect the tone and the context that construct these messages (Radford et al., 2019; Gentzkow et al., 2019). ...

Measuring Group Differences in High‐Dimensional Choices: Method and Application to Congressional Speech
  • Citing Article
  • July 2019

Econometrica

... The essence of technological innovation lies in its commercial application and the economic benefits it generates, and not all inventions meet this criterion. Many patents may never be converted into innovative products in the market, and therefore patent data cannot be used as the only indicator of technological innovation performance (Kelly et al., 2018). ...

Measuring Technological Innovation over the Long Run
  • Citing Article
  • January 2018

SSRN Electronic Journal

... Word embeddings address the aforementioned limitations by creating a consistent and continuous meaning space, where words are positioned based on their similarity to other words, as determined by their usage in natural language samples [21, 76, 84]. ...

The Geometry of Culture: Analyzing Meaning through Word Embeddings
  • Citing Article
  • March 2018

American Sociological Review

... Bandit algorithms were already suggested for pricing by Rothschild (1974) long before digital platform markets emerged. Today, there is an extensive academic literature on multi-armed bandit algorithms for pricing (Trovo et al. 2015, den Boer 2015, Bauer and Jannach 2018, Mueller et al. 2019, Elreedy et al. 2021, Taywade et al. 2023, Qu 2024), and there are numerous resources by practitioners on how to implement bandit algorithms for pricing. Online optimization algorithms target online problems where agents optimize against a stochastic process that is unknown and independent of their actions. Game-theoretical problems are different because the actions of one player impact the objective of the others. ...

Low-Rank Bandit Methods for High-Dimensional Dynamic Pricing
  • Citing Article
  • January 2018

... where f(X) is a trained model and ρ(X) is the treatment effect. Although ρ(X) is regarded as a function by some studies [42], we treat it as a constant ρ(X) = τ for simplicity. The next step is to explore the effect of the text feature T on the output Y while holding the rest of the features X constant. ...

Orthogonal Machine Learning for Demand Estimation: High Dimensional Causal Inference in Dynamic Panels

... We emphasise that this is a well-studied phenomenon in MAB problems, referred to as incomplete learning (see, e.g., Keskin and Zeevi 2018) and occurring when parameter estimates fail to converge to the true value. The main reason is insufficient exploration of the arms, although recent work has pointed to some consequences of the sequential nature of data collection (see Deshpande et al. 2018). As a result, when using standard statistical estimators, such as the sample mean, on adaptively collected data, unbiasedness, consistency, and asymptotic normality are no longer guaranteed (Hadad et al. 2021), with negative impacts on hypothesis testing, that is, inflated type-I error and low power. ...

Accurate Inference for Adaptive Linear Models
  • Citing Article
  • December 2017