# Sokbae Lee's research while affiliated with Columbia University and other places

## Publications (33)

Preprint
Full-text available
While applications of big data analytics have brought many new opportunities to economic research, with datasets containing millions of observations, making the usual econometric inferences based on extremum estimators would require computing power and memory that are often not accessible. In this paper, we focus on linear quantile regression em...
Article
This article describes three methods for carrying out nonasymptotic inference on partially identified parameters that are solutions to a class of optimization problems. Applications in which the optimization problems arise include estimation under shape restrictions, estimation of models of discrete games, and estimation based on grouped data. The...
Article
We develop a new method of online inference for a vector of parameters estimated by the Polyak-Ruppert averaging procedure of stochastic gradient descent (SGD) algorithms. We leverage insights from time series regression in econometrics and construct asymptotically pivotal statistics via random scaling. Our approach is fully operational with online...
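As a rough illustration of the Polyak-Ruppert averaging recursion mentioned in this abstract, the sketch below runs SGD on a simulated least-squares problem and keeps a running average of the iterates. It does not implement the paper's random-scaling inference. The function names (`averaged_sgd`, `simulate`), the step-size schedule `gamma0 * t ** (-alpha)`, and the simulated design are all illustrative assumptions, not taken from the source.

```python
import random


def averaged_sgd(stream, dim, gamma0=0.1, alpha=0.6):
    """Run SGD on squared loss and return the Polyak-Ruppert average.

    stream yields (x, y) pairs one at a time, so the whole procedure is online:
    only the current iterate and the running average are stored.
    gamma0 and alpha are assumed tuning constants for the step size.
    """
    theta = [0.0] * dim      # current SGD iterate
    theta_bar = [0.0] * dim  # running average of the iterates
    for t, (x, y) in enumerate(stream, start=1):
        # Gradient of 0.5 * (x'theta - y)^2 with respect to theta is resid * x.
        resid = sum(xj * tj for xj, tj in zip(x, theta)) - y
        step = gamma0 * t ** (-alpha)
        theta = [tj - step * resid * xj for tj, xj in zip(theta, x)]
        # Recursive update of the average: bar_t = bar_{t-1} + (theta_t - bar_{t-1}) / t.
        theta_bar = [bj + (tj - bj) / t for bj, tj in zip(theta_bar, theta)]
    return theta_bar


def simulate(n, beta, noise_sd, seed):
    """Yield n draws from a hypothetical linear model y = x'beta + noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        y = sum(xj * bj for xj, bj in zip(x, beta)) + rng.gauss(0.0, noise_sd)
        yield x, y


beta_true = [1.0, -2.0]
estimate = averaged_sgd(simulate(20000, beta_true, 0.5, seed=0), dim=2)
```

The averaged iterate `estimate` should land close to `beta_true` in this simulation. The recursive form of the average is what keeps memory use constant in the number of observations.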
Preprint
Full-text available
This paper investigates the development of competitiveness by comparing three Korean groups in South Korea, born and raised in three countries with distinct institutional environments: South Korea, North Korea, and China. Results based on laboratory experiments show that North Korean refugees are significantly less competitive than South Koreans or...
Preprint
Full-text available
We develop a new method of online inference for a vector of parameters estimated by the Polyak-Ruppert averaging procedure of stochastic gradient descent (SGD) algorithms. We leverage insights from time series regression in econometrics and construct asymptotically pivotal statistics via random scaling. Our approach is fully operational with online...
Article
Full-text available
In this paper, we estimate the time-varying COVID-19 contact rate of a Susceptible-Infected-Recovered (SIR) model. Our measurement of the contact rate is constructed using data on actively infected, recovered and deceased cases. We propose a new trend filtering method that is a variant of the Hodrick-Prescott (HP) filter, constrained by the number...
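For reference, the standard (unconstrained) Hodrick-Prescott filter that the abstract's method is described as a variant of can be written as the penalized least-squares problem below. The symbols are assumptions introduced here for illustration: $y_t$ is the observed series, $\tau_t$ the fitted trend, and $\lambda \ge 0$ the smoothing parameter. The constraint used in the paper's variant is not reproduced.

```latex
\min_{\{\tau_t\}_{t=1}^{T}} \;
  \sum_{t=1}^{T} (y_t - \tau_t)^2
  + \lambda \sum_{t=2}^{T-1} \bigl(\tau_{t+1} - 2\tau_t + \tau_{t-1}\bigr)^2
```

The first term rewards fit to the data, and the second penalizes changes in the slope of the trend.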
Preprint
We develop an inference method for a (sub)vector of parameters identified by conditional moment restrictions, which are implied by economic models such as rational behavior and Euler equations. Building on Bierens (1990), we propose penalized maximum statistics and combine bootstrap inference with model selection. Our method is optimized to be powe...
Article
Full-text available
This paper examines the trends in geographic localization of knowledge spillovers via patent citations, extracting multiple cohorts of new sample US patents from the period of 1976–2015. Despite accelerating globalization and widespread perception of the ‘death of distance’, our matched-sample study reveals significant and growing localization effe...
Article
Full-text available
We investigate state‐dependent effects of fiscal multipliers and allow for endogenous sample splitting to determine whether the U.S. economy is in a slack state. When the endogenized slack state is estimated as the period of the unemployment rate higher than about 12%, the estimated cumulative multipliers are significantly larger during slack perio...
Article
Full-text available
We consider the problem of binary classification with covariate selection. We construct a classification procedure by minimizing the empirical misclassification risk with a penalty on the number of selected covariates. This optimization problem is equivalent to obtaining an ℓ0-penalized maximum score estimator. We derive probability bounds on the e...
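The objective described in this abstract, empirical misclassification risk plus a penalty on the number of selected covariates, can be sketched as follows. This is a minimal illustration that only evaluates the criterion for a given coefficient vector; it does not perform the minimization. The function name `penalized_risk`, the 0/1 label coding, and the classification rule `1{x'beta >= 0}` are assumptions made for the example.

```python
def penalized_risk(beta, X, y, lam):
    """Evaluate empirical misclassification risk plus an l0 penalty.

    beta: candidate coefficient vector
    X:    list of covariate vectors
    y:    list of labels coded as 0 or 1 (assumed coding)
    lam:  penalty per selected (non-zero) covariate
    """
    n = len(y)
    errors = 0
    for xi, yi in zip(X, y):
        index = sum(xj * bj for xj, bj in zip(xi, beta))
        prediction = 1 if index >= 0 else 0  # assumed rule: 1{x'beta >= 0}
        errors += prediction != yi
    l0_norm = sum(1 for b in beta if b != 0)  # number of selected covariates
    return errors / n + lam * l0_norm


# Tiny hypothetical data set: the first column is an intercept.
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
y = [1, 0, 1]
sparse_value = penalized_risk([0.0, 1.0], X, y, lam=0.1)
dense_value = penalized_risk([1.0, 1.0], X, y, lam=0.1)
```

In this toy example the sparser vector classifies every point correctly and pays a smaller penalty, so `sparse_value` is below `dense_value`.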
Preprint
We consider both $\ell _{0}$-penalized and $\ell _{0}$-constrained quantile regression estimators. For the $\ell _{0}$-penalized estimator, we derive an exponential inequality on the tail probability of excess quantile prediction risk and apply it to obtain non-asymptotic upper bounds on the mean-square parameter and regression function estimation...
Article
Full-text available
We compare two groups of the non-student Korean population—native-born South Koreans (SK) and North Korean refugees (NK)—with contrasting institutional and cultural backgrounds. In our experiment, the subjects play dictator games under three different treatments in which the income source varies: first, the income is randomly given to the subject;...
Article
Full-text available
The majority of immigrants to the United States at the beginning of the 20th century adopted American first names. In this paper we study the economic determinants of name choice, by relating the propensity of immigrants to carry an American first name to the local concentration of their compatriots and local labor market conditions. We find that h...
Preprint
We consider a high dimensional binary classification problem and construct a classification procedure by minimizing the empirical misclassification risk with a penalty on the number of selected features. We derive non-asymptotic probability bounds on the estimated sparsity as well as on the excess misclassification risk. In particular, we show that...
Article
Full-text available
In this paper, we investigate what can be learned about average counterfactual outcomes as well as average treatment effects when it is assumed that treatment response functions are smooth. We obtain a set of new partial identification results for both the average treatment response and the average treatment effect. In particular, we find that the...
Article
Full-text available
We analyse cross-country trends in several aspects of technological progress during 1980–2011 by examining the US patent citations data. Our estimation results on patent quality and citation lags relative to the US reveal the following. The emerging Asian economies of Korea, Taiwan and China have achieved substantial catch-up. In the case of Korea...
Data
Appendix A. Data. Appendix B. Hirsch Index for Patent Citations. Appendix C. Robustness Check.
Article
Full-text available
In this paper, we propose a doubly robust method to estimate the heterogeneity of the average treatment effect with respect to observed covariates of interest. We consider a situation where a large number of covariates are needed for identifying the average treatment effect but the covariates of interest for analyzing heterogeneity are of much lowe...
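To make "doubly robust" concrete, the sketch below computes the standard augmented inverse probability weighting (AIPW) score for each observation, given fitted nuisance values. It is not claimed to be the paper's exact estimator. The function name `aipw_scores` and its arguments are hypothetical, and the propensity scores and outcome predictions are assumed to have been estimated elsewhere.

```python
def aipw_scores(y, d, p, m1, m0):
    """Return standard AIPW scores, one per observation.

    y:  observed outcomes
    d:  treatment indicators (1 = treated, 0 = control)
    p:  estimated propensity scores P(d = 1 | x), strictly between 0 and 1
    m1: predicted outcomes under treatment, E[y | x, d = 1]
    m0: predicted outcomes under control,   E[y | x, d = 0]
    """
    scores = []
    for yi, di, pi, m1i, m0i in zip(y, d, p, m1, m0):
        # The outcome-model difference is corrected by inverse-probability-weighted residuals.
        treated_term = di * (yi - m1i) / pi
        control_term = (1 - di) * (yi - m0i) / (1 - pi)
        scores.append(m1i - m0i + treated_term - control_term)
    return scores


# Two hypothetical observations with made-up nuisance estimates.
scores = aipw_scores(
    y=[3.0, 1.0], d=[1, 0], p=[0.5, 0.5], m1=[2.0, 2.0], m0=[1.0, 1.0]
)
ate_estimate = sum(scores) / len(scores)
```

Averaging the scores gives an estimate of the average treatment effect. Regressing them on a low-dimensional set of covariates is one common way to study heterogeneity, under the assumption that the nuisance estimates are adequate.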
Article
Full-text available
We show that the generalized method of moments (GMM) estimation problem in instrumental variable quantile regression (IVQR) models can be equivalently formulated as a mixed integer quadratic programming problem. This enables exact computation of the GMM estimators for the IVQR models. We illustrate the usefulness of our algorithm via Monte Carlo ex...
Article
Full-text available
Recombinant innovation, the combination of existing ideas, is important for technological progress; we want to understand how important market frictions are in stifling the transmission of ideas from one firm to another. Although the theoretical literature emphasizes the importance of these frictions, direct empirical evidence on them is limited. W...
Article
Full-text available
We consider a variable selection problem for the prediction of binary outcomes. We study the best subset selection procedure by which the explanatory variables are chosen by maximising Manski (1975, 1985)'s maximum score type objective function subject to a constraint on the maximal number of selected variables. We show that this procedure can be e...
Article
The division of Korea is a historic social experiment that randomly assigned ex ante identical individuals into two different economic and political institutions. About 70 years after the division, we sample Koreans who were born and raised in the two different parts of Korea to study whether institutions affect social preferences. We find that tho...
Article
Full-text available
In a randomized control trial, the precision of an average treatment effect estimator can be improved either by collecting data on additional individuals, or by collecting additional covariates that predict the outcome variable. We propose the use of pre-experimental data such as a census, or a household survey, to inform the choice of both the sam...
Article
In this paper, we propose a doubly robust method to present the heterogeneity of the average treatment effect with respect to observed covariates of interest. We consider a situation where a large number of covariates are needed for identifying the average treatment effect but the covariates of interest for analyzing heterogeneity are of much lower...
Article
We present the clrbound, clr2bound, clr3bound, and clrtest commands for estimation and inference on intersection bounds as developed by Chernozhukov, Lee, and Rosen (2013, Econometrica 81: 667–737). The intersection bounds framework encompasses situations where a population parameter of interest is partially identified by a collection of consistent...
Article
We analyze cross-country trends in several aspects of technological progress over the period of 1980-2011 by examining citations data from almost 4 million utility patents granted by the US Patent and Trademark Office (USPTO). Our estimation results on patent quality and citation lags relative to the US reveal the following observations. The emergi...

## Citations

... In our benchmark Monte Carlo experiment, we find that our strategy for finding the projections of the identified set can be approximately one million times faster than the Ciliberto and Tamer (2009) algorithm (the relative speed gain depends on the simulation parameters such as the number of simulation draws and the number of points used to approximate the parameter space). Finally, we also propose a simple and computationally attractive approach to constructing confidence sets for the identified sets by leveraging the key insights from Horowitz and Lee (2021); the main idea is to account for the sampling uncertainty by constructing simultaneous confidence intervals for the conditional choice probabilities. ...
... Threshold models with a nonconstant threshold have the advantage of flexibility in capturing time-varying economic concepts containing meaningful economic implications in empirical studies (e.g. Dueker et al. 2013; Yang and Su 2018; Yang 2019, 2020, 2021; Yu and Fan 2021; Yang et al. 2021; Lee et al. 2021). ...
... First, the true number of cases is not observed and the ratio of unreported to reported cases varies over time, due to both changes in testing capacity and in testing behavior. Some econometric studies ignore unreported cases and model only reported cases (Jiang et al., Liu et al., 2021, Khismatullina and Vogt, 2021, Lee et al., 2021); this can lead to inconsistent parameter estimates or to serious mid and long-term forecasting errors, depending on the goal of the study (Korolev, 2021). Other studies employ various strategies to identify the share of unreported cases: Li et al. (2020) and Hortaçsu et al. (2021) identify the unreported cases through their mobility across regions; Arias et al. (2022), Rozhnova et al. (2021), Viana et al. (2021) and Toulis (2021) use random sample serology tests; Gourieroux and Jasiak (2020) use parametric time-varying transition probabilities, and Sonabend et al. (2021) use random tests in the population. ...
... Since knowledge is tacit and embodied in the people or routines of an organisation, the speed and geographic distance by which knowledge can be transferred are strictly limited (Nelson & Sidney, 1982). Such a nature of knowledge results in "localisation of knowledge spillover" (Almeida & Kogut, 1999; Kwon et al., 2022) and empirical evidence provides strong support for this. analyse patent citations to test the extent to which knowledge spillovers are localised geographically, and find that patent citations tend to come from the same area where the originating patents are located. ...
... Faria e Castro (2021) analyzes the effects of the coronavirus outbreak on the United States economy through an econometric model relating fiscal policies, public debt, household income, and unemployment. Also, Lee, Liao, Seo, and Shin (2020) investigate the effects of fiscal multipliers on the United States economy. Auray and Eyquem (2020) discuss the effects of lockdown in the Euro Area on inflation and unemployment, as well as government spending and unemployment insurance. ...
... For binary classification, where a matrix B reduces to a single vector $\beta \in \mathbb{R}^d$, the sparsity is naturally characterized by the $\ell_0$ (quasi)-norm $\|\beta\|_0$, the number of non-zero entries of $\beta$ (see, e.g., Abramovich and Grinshtein, 2019; Chen and Lee, 2021). For the multiclass case there is a wide spectrum of possible ways to extend the notion of sparsity associated with different assumptions on the regression coefficients matrix B. In this section we consider several of them and derive misclassification excess risk bounds for the resulting multiclass sparse linear classifiers. ...
... However, early studies about names mainly focus on name preferences (Allen et al., 1941;Arthaud et al., 1948;Finch et al., 1944), which remains a central topic in name studies even after 80 years (Beaudin et al., 2022;Carneiro et al., 2020). Although these literatures do not offer information about how a name changes one's personality or decisions, they reflect people's beliefs about names over time. ...
... It is natural to couple Assumption 3 with the following smoothness assumption, which has been used in models with counterfactual outcomes (Kim et al. 2018). ...
... One group of studies on Chinese domestic patents (e.g., Hu and Jefferson (2009), Eberhardt et al. (2017), Hu et al. (2017), and Fang et al. (2020)) merged firm-level data from Annual Surveys of Industrial Enterprises in China (ASIEC) released by the National Bureau of Statistics (NBS) with patent data from CNIPA, but none of these studies contain substantial information on patent quality. Another group of studies on Chinese patents (e.g., Boeing and Mueller (2019), Fisch et al. (2017), Hu and Mathews (2008), Kwon et al. (2017), and Rong et al. (2017)) used information on the quality of Chinese patents granted overseas. These studies, although useful, do not provide a comprehensive picture of Chinese patenting. ...
... Using the estimates from the two-model approach in combination with inverse probability weighting (IPW) decreases the variance of the estimator and controls for observed confounding (see e.g. ). Additional orthogonalization using the two conditional mean functions produced by the TMA further decreases the bias of the parameter of interest (Lee et al., 2017). The doubly-robust estimator can be used in high-dimensional settings to estimate a reduced dimensional conditional average treatment effect function. ...