Jeffrey Scott Racine, McMaster University | McMaster · Department of Economics, Graduate Program in Statistics, Department of Mathematics and Statistics
PhD (1989) University of Western Ontario
About
Publications: 128
Reads: 17,838
Citations: 6,162
Introduction
Jeffrey S. Racine occupies the Senator William McMaster Chair in Econometrics at McMaster University and is a Fellow of the Journal of Econometrics. His research interests include nonparametric estimation and inference, shape constrained estimation, cross-validatory model selection, frequentist model averaging, nonparametric instrumental methods, and entropy-based measures of dependence and their statistical underpinnings. He is also interested in parallel distributed computing paradigms and their application to computationally intensive nonparametric estimators. He is currently serving as an Associate Editor for Econometric Reviews. He is the co-author of Nonparametric Econometrics: Theory and Practice (2007, Princeton University Press).
Additional affiliations
January 2005 - present
January 2006 - present
July 2002 - January 2005
Education
September 1985 - August 1989
September 1984 - August 1985
September 1980 - April 1984
Publications
Model averaging has a rich history dating from its use for combining forecasts from time-series models (Bates and Granger, 1969) and presents a compelling alternative to model selection methods. We propose a frequentist model averaging procedure defined over categorical regression splines (Ma, Racine, and Yang, 2015) that allows for mixed-data pred...
We consider a B-spline regression approach toward nonparametric modeling of a random effects (error component) model. We focus our attention on the estimation of marginal effects (derivatives) and their asymptotic properties. Theoretical underpinnings are provided, finite-sample performance is evaluated via Monte–Carlo simulation, and an applicatio...
We propose a kernel function for ordered categorical data that overcomes limitations present in ordered kernel functions appearing in the literature on the estimation of probability mass functions for multinomial ordered data. Some limitations arise from assumptions made about the support of the underlying random variable. Furthermore, many existin...
A number of tests have been proposed for assessing the location-scale assumption that is often invoked by practitioners. Existing approaches include Kolmogorov-Smirnov and Cramér-von-Mises statistics that each involve measures of divergence between unknown joint distribution functions and products of marginal distributions. In practice, the unknown...
This chapter looks at a range of popular univariate time series models and their use for forecasting.
This chapter looks at alternatives to the use of asymptotic theory and finite-sample theory for the purpose of inference. It considers numerical approaches that include the bootstrap and the jackknife and considers procedures for dependent processes as well as heteroskedastic and independent identically distributed instances.
This chapter looks at issues surrounding outliers in data and methods for addressing their presence.
This book is designed to facilitate reproducibility in Econometrics. It does so by using open source software (R) and recently developed tools (R Markdown and bookdown) that allow the reader to engage in reproducible research. Illustrative examples are provided throughout, and a range of topics are covered. Assignments, exams, slides, and a solutio...
This chapter covers model selection methods and model averaging methods. It relies on knowledge of solving a quadratic program which is outlined in an appendix.
This chapter covers two advanced topics: a machine learning method (support vector machines useful for classification) and nonparametric kernel regression.
This chapter outlines pitfalls of using standard inference procedures common in cross-sectional settings in time series settings and presents alternative procedures. It also addresses the issue of spurious regression and cautions the reader against the unquestioning use of cross-sectional tools in time series settings.
This chapter introduces time series data and outlines how it differs from cross sectional data. It also highlights how the object of interest when modelling time series data is the forecast, which differs from the object of interest in cross-sectional modelling, which is typically some parameter of interest that has an economic interpretation.
We consider the problem of model averaging over a set of semiparametric varying coefficient models where the varying coefficients can be functions of continuous and categorical variables. We propose a Mallows model averaging procedure that is capable of delivering model averaging estimators with solid finite-sample performance. Theoretical underpin...
In this paper we outline an interactive nonparametric methodology designed to facilitate the statistical analysis of nonlinear systems. The approach exploits an ensemble of nonparametric techniques including conditional density function estimation, conditional distribution function estimation, conditional mean estimation (regression) and conditiona...
The focus of this paper is the nonparametric estimation of the marginal effects (i.e. first partial derivatives) of an instrumental regression function ϕ defined by conditional moment restrictions that stem from a structural econometric model and involve endogenous variables Y and Z and instruments W. The derivative function (...
The semiparametric varying coefficient model is used in a wide range of applications. However, the traditional specification does not account for endogenous covariates, which restricts its application. In this paper we consider the estimation of semiparametric varying coefficient models when the functional coefficients may contain (continuous) endo...
Nonparametric conditional cumulative distribution function (CDF) estimation has emerged as a powerful tool having widespread potential application, which has led to a literature on estimators of conditional quantile functions that are obtained via inversion of the nonparametrically estimated conditional CDF. Other nonparametric estimators of condit...
This article outlines recent developments in Markdown scripting languages that facilitate the production of reproducible, publication-quality research. The approach is similar to that achieved by using, say, Sweave, R and LaTeX, but is written instead in simple Markdown syntax and not tied to any particular output format (e.g., MS Word) nor comput...
We propose a computationally efficient data-driven least squares cross-validation method to optimally select smoothing parameters for the nonparametric estimation of cumulative distribution/survivor functions. We allow for general multivariate covariates that can be continuous, discrete/ordered categorical or a mix of either. We provide asymptotic...
This article outlines recent developments in Markdown scripting languages that facilitate the production of replicable, publication-quality research. The approach is similar to that achieved by using, say, Sweave, R and LaTeX, but is written instead in simple Markdown syntax and not tied to any particular output format (e.g., MS Word) nor computat...
We apply parametric and non-parametric regression discontinuity methodology within a multinomial choice setting to examine the effect of public healthcare user fee abolition on health facility choice by using data from South Africa. The non-parametric model is found to outperform the parametric model both in and out of sample, while also delivering...
Kernel estimates of entropy and mutual information have been studied extensively in statistics and econometrics, Kullback-Leibler divergence has been used in the kernel estimation literature, yet the information characteristic of kernel estimation remains unexplored. We explore kernel estimation as an information transmission operation where the em...
Local polynomial regression is extremely popular in applied settings. Recent developments in shape constrained nonparametric regression allow practitioners to impose constraints on local polynomial estimators thereby ensuring that the resulting estimates are consistent with underlying theory. However, it turns out that local polynomial derivative e...
In this paper, we examine the net effect of several major tax changes in Australia on residential property prices. Specifically, we consider the announcement and introduction effects that resulted from several policy changes including the introduction of the Goods and Services Tax (GST) and the accompanying First Home Owner Grant (FHOG). Using a la...
Li & Racine (2004) have proposed a nonparametric kernel-based method for smoothing in the presence of categorical predictors as an alternative to the classical nonparametric approach that splits the data into subsets (‘cells’) defined by the unique combinations of the categorical predictors. Li, Simar & Zelenyuk (2014) present an alternative to Li...
A semiparametric regression estimator that exploits categorical (i.e. discrete-support) kernel functions is developed for a broad class of hierarchical models including the pooled regression estimator, the fixed-effects estimator familiar from panel data, and the varying coefficient estimator, among others. Separate shrinking is allowed for each co...
We consider the problem of estimating a relationship nonparametrically using regression splines when there exist both continuous and categorical predictors. We combine the global properties of regression splines with the local properties of categorical kernel functions to handle the presence of categorical predictors rather than resorting to sample...
In this paper the consequences of considering the household ‘food share’ distribution as a welfare measure, in isolation from the joint distribution of itemized budget shares, is examined through the unconditional and conditional distribution of ‘food share’ both parametrically and nonparametrically. The parametric framework uses Dirichlet and Beta...
When comparing two competing approximate models, the one having smallest 'expected true error' is closest to the data generating process (according to the specified loss function) and is therefore to be preferred. In this paper we consider a data-driven method of testing whether two competing approximate models, for instance a parametric and a non...
A number of approaches toward the kernel estimation of copula have appeared in the literature. Most existing approaches use a manifestation of the copula that requires kernel density estimation of bounded variates lying on a d-dimensional unit hypercube. This gives rise to a number of issues as it requires special treatment of the boundary and...
We extend Robinson's (1988) partially linear estimator to admit the mix of datatypes typically encountered by applied researchers, namely, categorical (nominal and ordinal) and continuous. We also relax the independence assumption that is prevalent in this literature and allow for β-mixing time-series data. We employ Li, Ouyang, and Racine's (2009)...
Many practical problems require nonparametric estimates of regression functions, and local polynomial regression has emerged as a leading approach. In applied settings practitioners often adopt either the local constant or local linear variants, or choose the order of the local polynomial to be slightly greater than the order of the maximum derivat...
This paper uses panel data and the Local Linear Kernel Estimator (LLKE) to investigate the effects of aid on economic growth in developing countries. Specifically, we investigate the robustness of a popular parametric specification of the aid/economic growth relationship in Less Developed Countries (LDCs). First, we find that aid has a significant...
Production frontiers (i.e., 'production functions') specify the maximum output of firms, industries, or economies as a function of their inputs. A variety of innovative methods have been proposed for estimating both 'deterministic' and 'stochastic' frontiers. However, existing approaches are either parametric in nature, rely on nonsmooth nonparametri...
Nonparametric smoothing under shape constraints has recently received much well-deserved attention. Powerful methods have been proposed for imposing a single shape constraint such as monotonicity and concavity on univariate functions. In this paper, we extend the monotone kernel regression method in Hall and Huang (2001) to the multivariate and mu...
Semiparametric varying-coefficient models have become a common fixture in applied data analysis. Existing approaches, however, presume that those variables affecting the coefficients are continuous in nature (or that there exists at least one such continuous variable) which is often not the case. Furthermore, when all variables affecting the coeffi...
We propose a consistent kernel-based specification test for conditional density models when the dependent variable is categorical/discrete. The method is applicable to popular parametric binary choice models such as the logit and probit specification and their multinomial and ordered counterparts, along with parametric count models, among others. T...
We consider the problem of estimating a relationship using semiparametric additive regression splines when there exist both continuous and categorical regressors, some of which are irrelevant but this is not known a priori. We show that choosing the spline degree, number of subintervals, and bandwidths via cross-validation can automatically remove...
We propose a data-driven least-squares cross-validation method to optimally select smoothing parameters for the nonparametric estimation of conditional cumulative distribution functions and conditional quantile functions. We allow for general multivariate covariates that can be continuous, categorical, or a mix of either. We provide asymptotic analy...
A new package crs is introduced for computing nonparametric regression (and quantile) splines in the presence of both continuous and categorical predictors. B-splines are employed in the regression model for the continuous predictors and kernel weighting is employed for the categorical predictors. We also develop a simple R interface to NOMAD, whic...
We consider the problem of obtaining appropriate weights for averaging M approximate (misspecified) models for improved estimation of an unknown conditional mean in the face of non-nested model uncertainty in heteroskedastic error settings. We propose a "jackknife model averaging" (JMA) estimator which selects the weights by minimizing a cross-vali...
This paper uses panel data from 77 developing countries, two measures of aid, and a dynamic panel data (DPD) estimator to investigate the effects of aid on economic growth. We find that the relationship between income growth and aid is quadratic in nature. We find a negative partial growth effect of aid at low levels of aid but a positive effect wh...
Brief introduction on the theoretical underpinnings of stKDE. The theoretical underpinnings of stKDE are briefly introduced here and more references are pointed out for interested readers.
trKDE Spatio-temporal relative risk surface showing the risk changes of Burkitt's lymphoma in the Western Nile district of Uganda from 1961–1975. The degree of risk is denoted by the shade of gray with black shading representing the highest risk and white the least risk. The solid contour lines delineate the significant high risk regions. Compared...
trKDE Spatio-temporal relative risk surface depicting the dynamic changes of schistosomiasis risk in the Guichi region of China from 2001–2006. The degree of risk is denoted by the shade of gray with black shading representing the highest risk and white the least risk. The solid contour lines delineate the significant high risk regions. Compared to...
Quantifying the distributions of disease risk in space and time jointly is a key element for understanding spatio-temporal phenomena while also having the potential to enhance our understanding of epidemiologic trajectories. However, most studies to date have neglected the time dimension and focused instead on the "average" spatial pattern of disease ris...
This paper uses panel data and the Local Linear Kernel Estimator (LLKE) to investigate the effects of aid on physical capital investment in developing countries. Specifically, we investigate the robustness of the relationship between aid and physical capital investment in Less Developed Countries (LDCs) using two different measures of aid and five...
The R environment for statistical computing and graphics (R Development Core Team, 2008) offers practitioners a rich set of statistical methods ranging from random number generation and optimization methods through regression, panel data, and time series methods, by way of illustration. The standard R distribution (base R) comes preloaded with a ri...
Data-driven methods of bandwidth selection are necessary for the sound application of kernel methods, with benefits including but not limited to automatic dimensionality reduction in the presence of irrelevant regressors [P. Hall, Q. Li, and J.S. Racine, ‘Nonparametric estimation of regression functions in the presence of irrelevant regressors, Rev...
Objectives: To estimate the impacts of policy change in two Canadian provinces on the ability of households to purchase dental care from 1969 to 2004. Methods: Data on out-of-pocket dental care expenditures were gathered from Statistics Canada's Family Expenditure Surveys and Surveys of Household Spending. Statistical techniques were used that contr...
The paper focuses on satisfaction with income and proposes a utility model built on two value systems, the 'Ego' system, described as one's own income assessment relative to one's own past and future income, and the 'Alter' system, described as one's own income assessment relative to a reference group. We show how the union of these two value syst...
In this paper we consider the problem of testing for equality of two density or two conditional density functions defined over mixed discrete and continuous variables. We smooth both the discrete and continuous variables, with the smoothing parameters chosen via least-squares cross-validation. The test statistics are shown to have (asymptotic) norm...
We consider the problem of estimating a nonparametric regression model containing categorical regressors only. We investigate the theoretical properties of least squares cross-validated smoothing parameter selection, establish the rate of convergence (to zero) of the smoothing parameters for relevant regressors, and show that there is a high probab...
We consider a metric entropy capable of detecting deviations from symmetry that is suitable for both discrete and continuous processes. A test statistic is constructed from an integrated normed difference between nonparametric estimates of two density functions. The null distribution (symmetry) is obtained by resampling from an artificially lengthe...
The estimation of conditional probability distribution functions (PDFs) in a kernel nonparametric framework has recently received attention. As emphasized by Hall, Racine & Li (2004), these conditional PDFs are extremely useful for a range of tasks including modelling and predicting consumer choice. The aim of this paper is threefold. First, we imp...
We propose a new nonparametric conditional cumulative distribution function kernel estimator that admits a mix of discrete and categorical data along with an associated nonparametric conditional quantile estimator. Bandwidth selection for kernel quantile regression remains an open topic of research. We employ a conditional probability density funct...
We describe the R np package via a series of applications that may be of interest to applied econometricians. The np package implements a variety of nonparametric and semiparametric kernel-based estimators that are popular among econometricians. There are also procedures for nonparametric tests of significance and consistent model specification tests...
Kernel smoothing techniques have attracted much attention and some notoriety in recent years. The attention is well deserved as kernel methods free researchers from having to impose rigid parametric structure on their data. The notoriety arises from the fact that the amount of smoothing (i.e., local averaging) that is appropriate for the problem at...
We propose a semiparametric varying-coefficient estimator that admits both qualitative and quantitative covariates along with a test for correct specification of parametric varying-coefficient models. The proposed estimator is exceedingly flexible and has a wide range of potential applications including hierarchical (mixed) settings, small area es...
This article is a primer for those who wish to familiarize themselves with nonparametric econometrics. Though the underlying theory for many of these methods can be daunting for some practitioners, this article will demonstrate how a range of nonparametric methods can in fact be deployed in a fairly straightforward manner. Rather than aiming for en...
In this paper we propose a nonparametric kernel-based model specification test that can be used when the regression model contains both discrete and continuous regressors. We employ discrete variable kernel functions and we smooth both the discrete and continuous regressors using least squares cross-validation (CV) methods. The test statistic is sh...
We examine the performance of a metric entropy statistic as a robust test for time-reversibility (TR), symmetry, and serial dependence. It also serves as a measure of goodness-of-fit. The statistic provides a consistent and unified basis in model search, and is a powerful diagnostic measure with surprising ability to pinpoint areas of model failure...
In this paper we consider a nonparametric regression model that admits a mix of continuous and discrete regressors, some of which may in fact be redundant (that is, irrelevant). We show that, asymptotically, a data-driven least squares cross-validation method can remove irrelevant regressors. Simulations reveal that this "automatic dimensionality r...
Until recently, students and researchers in nonparametric and semiparametric statistics and econometrics have had to turn to the latest journal articles to keep pace with these emerging methods of economic analysis. Nonparametric Econometrics fills a major gap by gathering together the most up-to-date theory and techniques and presenting them in a...
In this paper we propose a test for the significance of categorical predictors in nonparametric regression models. The test is fully data-driven and employs cross-validated smoothing parameter selection while the null distribution of the test is obtained via bootstrapping. The proposed approach allows applied researchers to test hypotheses concerni...
Resampling methods such as the bootstrap are routinely used to estimate the finite-sample null distributions of a range of test statistics. We present a simple and tractable way to perform classical hypothesis tests based upon a kernel estimate of the CDF of the bootstrap statistics. This approach has a number of appealing features: (i) it can perf...
The identification of improved methods for characterizing crop yield densities has experienced a recent surge in activity due in part to the central role played by crop insurance in the Agricultural Risk Protection Act of 2000 (estimates of yield densities are required for the determination of insurance premium rates). Nonparametric kernel methods...
The relationship between alcohol availability and crime is investigated in this study. It first considers common parametric specifications that have been used in the literature. After applying a powerful consistent conditional moment test for correct specification, it is found that these common parametric specifications are rejected by the data. Th...
In this paper we analyse the influence of characteristics of the income distribution in modelling aggregate consumption expenditure. We model the aggregate consumption relation of a heterogeneous population, using a statistical distributional approach of aggregation, and apply it to UK-Family Expenditure Survey data. A bootstrap test based...
In this paper, we consider the problem of estimating a joint distribution that is defined over a set of discrete variables. We use a smoothing kernel estimator to estimate the joint distribution. We allow for the case in which some of the discrete variables are uniformly distributed, and explicitly address the vector-valued smoothing parameter case...
gnuplot, under active development since 1986, is an interactive plotting utility for UNIX, IBM OS/2, MS Windows, DOS, Apple Macintosh, VMS, Atari and many other platforms. It is free and open source, though it is not licensed under the GPL, nor is it GNU software. gnuplot supports a number of 'terminal' types including interactive screen terminals...
We revisit Fair's (1978) 'theory of extramarital affairs' using robust nonparametric methods developed for the analysis of categorical data. We find evidence suggesting that the number of years married is not a relevant predictor of the propensity to engage in extramarital affairs having controlled for other factors. This finding runs counter to th...
A transformed metric entropy measure of dependence is studied which satisfies many desirable properties, including being a proper measure of distance. It is capable of good performance in identifying dependence even in possibly nonlinear time series, and is applicable for both continuous and discrete variables. A nonparametric kernel density implem...
In this paper we consider the problem of estimating an unknown joint distribution which is defined over mixed discrete and continuous variables. A nonparametric kernel approach is proposed with smoothing parameters obtained from the cross-validated minimization of the estimator's integrated squared error. We derive the rate of convergence of the cr...
We consider a metric entropy capable of detecting deviations from symmetry. A consistent test statistic is constructed from an integrated normed difference between two density functions estimated using kernel methods. The null distribution (symmetry) is obtained by resampling from an artificially lengthened series constructed from a rotation of t...
This paper proposes a semiparametric approach to the estimation of 'generalized' binary choice models. A 'generalized' binary choice model is one with separate indices for each conditioning variable which constitutes a generalization of the standard single-index approach typically employed in applied work. The choice probability distribution is...
R, an open-source programming environment for data analysis and graphics, has in only a decade grown to become a de-facto standard for statistical analysis against which many popular commercial programs may be measured. The use of R for the teaching of econometric methods is appealing. It provides cutting-edge statistical methods which are, by R's...
In this paper we consider a recently developed nonparametric econometric method which is ideally suited to a wide range of marketing applications. We demonstrate the usefulness of this method via an application to direct marketing using data obtained from the Direct Marketing Association. Using independent hold-out data, the benchmark parametric m...