Article

Estimation of a transformation model with truncation, interval observation and time-varying covariates


Abstract

Abrevaya (1999b) considered estimation of a transformation model in the presence of left truncation. This paper observes that a cross-sectional version of the statistical model considered in Frederiksen et al. (2007) is a generalization of the model considered by Abrevaya (1999b) and the generalized model can be estimated by a pairwise comparison version of one of the estimators in Frederiksen et al. (2007). Specifically, our generalization will allow for discretized observations of the dependent variable and for piecewise constant time-varying explanatory variables. Copyright (C) The Author(s). Journal compilation (C) Royal Economic Society 2010.


... 3 This is in contrast to the fixed-effects panel transformation models (Abrevaya, 1999a, 2000; Chen, 2002, 2010), or the cross-sectional transformation models (Abrevaya, 1999b; Shin, 2008; Honoré and Hu, 2010). 4 For the Box-Cox power transformation, $h(y, \lambda) = \frac{1}{\lambda}(y^{\lambda} - 1)$ if $\lambda \neq 0$ and $h(y, \lambda) = \log y$ if $\lambda = 0$, with $y > 0$, which has a bounded range unless $\lambda = 0$ or an odd integer, the small-$\sigma$ approximation makes the transformation function and the near-normality assumption more compatible. ...
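The Box-Cox transformation quoted in the excerpt above can be written out directly. The following is a minimal sketch; the function name `box_cox` is an illustrative choice and does not come from the cited paper. It also prints the lower bound -1/λ that the transform approaches as y tends to zero when λ > 0, which is one instance of the "bounded range" the excerpt mentions.

```python
import math


def box_cox(y: float, lam: float) -> float:
    """Box-Cox power transformation h(y, lam), defined here for y > 0 only."""
    if y <= 0:
        raise ValueError("this sketch requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# For lam > 0 the transform stays above -1/lam as y approaches 0 from the right.
for lam in (0.5, 1.0, 2.0):
    print(lam, box_cox(1e-12, lam), -1.0 / lam)
```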
Article
This paper investigates the asymptotic properties of quasi-maximum likelihood (QML) estimators for random-effect panel data transformation models where both the response and (some of) the covariates are subject to transformations for inducing normality, flexible functional form, homoskedasticity, and simple model structure. We develop a quasi-maximum-likelihood-type procedure for model estimation and inference. We prove the consistency and asymptotic normality of the QML estimators, and propose a simple bootstrap procedure that leads to a robust estimate of the variance-covariance matrix. Monte Carlo results reveal that these estimates perform well in finite samples, and that the gains from using the bootstrap procedure for inference can be enormous.
Article
We propose a hazard model where dependence between events is achieved by assuming dependence between covariates. This model allows for correlated variables specific to observations as well as macro variables which all observations share. This setup better fits many economic and financial applications where events are not independent. Nonparametric estimation of the hazard function is then studied. Kernel estimators proposed in Nielsen and Linton (1995) and Linton et al. (2003) are shown to have asymptotic properties similar to those in the i.i.d. case. Mixing conditions ensure the asymptotic results follow. These results depend on adjustments to bandwidth conditions. Simulations are conducted which verify the impact of dependence on estimators. Bandwidth selection accounting for dependence is shown to improve performance. In an empirical application, trade intensity in high-frequency financial data is estimated.
Article
The objective of this study is to investigate the simultaneity between farm couples' decisions on labor allocation and production efficiency. Using an unbalanced panel data set of Norwegian farm households (1989-2008), we estimate off-farm labor supply of married farm couples and farm efficiency in a three-equation system of jointly determined endogenous variables. We address the issue of latent heterogeneity between households. We solve the problem by two-stage OLS and GLS estimation where state dependence is accounted for in the reduced form equations. We compare the results against simpler model specifications where we suppress censoring of off-farm labor hours and endogeneity of regressors, respectively. In the reduced form specification, a large number of parameters are statistically significant. The Davidson-MacKinnon test of exogeneity confirms that both the operator's and the spouse's off-farm labor supply should be treated as endogenous in estimating farming efficiency. The parameter estimates seem robust across model specifications. Off-farm labor supply of farm operators and spouses is jointly determined. Off-farm work by farm operators and spouses positively affects farming efficiency. Farming efficiency increases with the operator's age, farm size, agricultural subsidies, and the share of current investment in total farm capital stock.
Article
This paper presents a new estimator for counterfactuals in duration models. The counterfactual in a duration model is the length the spell would have had if the regressor had been different. We introduce the structural duration function, which gives these counterfactuals. The advantage of focusing on counterfactuals is that one does not need to identify the mixed proportional hazard model. In particular, we present examples in which the mixed proportional hazard model is unidentified or has a singular information matrix but our estimator for counterfactuals still converges at rate N^{1/2}, where N is the number of observations. We apply the structural duration function to simulate important policy effects, including a change in welfare benefits.
Article
We used survey data collected from cotton producers in eleven U.S. states to address the issues of correlated events and individual heterogeneity in the adoption of multiple precision technologies. Results from a conditional frailty model indicated that younger, better educated cotton producers adopted precision technology quickly once those technologies were available. Further, farm size and farm income have a positive influence on the chance of technology adoption by cotton farmers. Moreover, the conditional frailty model accounts for both heterogeneity and event dependence, allowing different baseline hazards for each group of precision technology adopters.
Article
This chapter reviews the recent developments in the estimation of panel data models in which some variables are only partially observed. Specifically we consider the issues of censoring, sample selection, attrition, missing data, and measurement error in panel data models. Although most of these issues, except attrition, occur in cross-sectional or time series data as well, panel data models introduce some particular challenges due to the presence of persistent individual effects. The past two decades have seen many stimulating developments in the econometric and statistical methods dealing with these problems. This review focuses on two strands of research in the rapidly growing literature on semiparametric and nonparametric methods for panel data models: (i) estimation of panel models with discrete or limited dependent variables and (ii) estimation of panel models based on nonparametric deconvolution methods.
Article
We examine fluctuations in the predicted educational attainment of newly arrived legal U.S. immigrants between 1972 and 1999 by combining data from the U.S. Immigration and Naturalization Service with the Current Population Survey. A mid-1980s decline gave way to a noticeable improvement in the skill base of the immigrant population between 1987 and 1993. A short decline in the quality of immigrant skills—less severe than that of the mid-1980s—took place in the mid-1990s. In 1998, the trend reverses once more: The labor market quality of new legal U.S. immigrants improves. The primary sources of the fluctuations include changes in the quality and quantity of immigrants obtaining an adjustment and variations in the distribution of source regions and entry class types among new legal U.S. immigrants.
Article
This paper investigates the responsiveness of household portfolios to tax incentives by exploiting a substantial tax reform that altered after-tax returns and cost of debt for a large number of households. An extraordinary panel data set that covers two years before and after the reform is used for the analysis. Our empirical findings suggest that households reshuffle their balance sheets in the case of a partial deductibility phase-out. In particular, heavily taxed, interest-bearing assets are used to pay off mortgage debt. Furthermore, we find that taxes have a significant impact on the structure of household portfolios even after controlling for unobserved heterogeneity.
Article
Han's maximum rank correlation estimator is shown to be square-root n-consistent and asymptotically normal. The proof rests on a general method for determining the asymptotic distribution of a maximization estimator, a simple U-statistic decomposition, and a uniform bound for degenerate U-processes. A consistent estimator of the asymptotic covariance matrix is provided, along with a result giving the explicit form of this matrix for any model within the scope of the maximum rank correlation estimator. The latter result is applied to the binary choice model, and it is found that the maximum rank correlation estimator does not achieve the semiparametric efficiency bound. Copyright 1993 by The Econometric Society.
Article
This paper tests the effects of the level and length of unemployment insurance benefits on unemployment durations. The paper particularly studies individual behavior during the weeks just prior to when benefits lapse. Higher unemployment insurance benefits are found to have a strong negative effect on the probability of leaving unemployment. However, the probability of leaving unemployment rises dramatically just prior to when benefits lapse. Individual data are used with accurate information of spell durations, and the level and length of benefits. The semiparametric estimation techniques used in the paper yield more plausible estimates than conventional approaches and provide useful diagnostics. Copyright 1990 by The Econometric Society.
Article
This paper considers the formulation and estimation of continuous time social science duration models. The focus is on new issues that arise in applying statistical models developed in biostatistics to analyze economic data and formulate economic models. Both single spell and multiple spell models are discussed. In addition, we present a general time inhomogeneous multiple spell model which contains a variety of useful models as special cases. Four distinctive features of social science duration analysis are emphasized: (1) Because of the limited size of samples available in economics and because of an abundance of candidate observed explanatory variables and plausible omitted explanatory variables, standard nonparametric procedures used in biostatistics are of limited value in econometric duration analysis. It is necessary to control for observed and unobserved explanatory variables to avoid biasing inference about underlying duration distributions. Controlling for such variables raises many new problems not discussed in the available literature. (2) The environments in which economic agents operate are not the time homogeneous laboratory environments assumed in biostatistics and reliability theory. Ad hoc methods for controlling for time inhomogeneity produce badly biased estimates. (3) Because the data available to economists are not obtained from the controlled experimental settings available to biologists, doing econometric duration analysis requires accounting for the effect of sampling plans on the distributions of sampled spells. (4) Econometric duration models that incorporate the restrictions produced by economic theory only rarely can be represented by the models used by biostatisticians. The estimation of structural econometric duration models raises new statistical and computational issues. Because of (1) it is necessary to parameterize econometric duration models to control for both observed and unobserved explanatory variables. 
Economic theory only provides qualitative guidance on the matter of selecting a functional form for a conditional hazard, and it offers no guidance at all on the matter of choosing a distribution of unobservables. This is unfortunate because empirical estimates obtained from econometric duration models are very sensitive to assumptions made about the functional forms of these model ingredients. In response to this sensitivity we present criteria for inferring qualitative properties of conditional hazards and distributions of unobservables from raw duration data sampled in time homogeneous environments; i.e. from unconditional duration distributions. No parametric structure need be assumed to implement these procedures. We also note that current econometric practice overparameterizes duration models. Given a functional form for a conditional hazard determined up to a finite number of parameters, it is possible to consistently estimate the distribution of unobservables nonparametrically. We report on the performance of such an estimator and show that it helps to solve the sensitivity problem. We demonstrate that in principle it is possible to identify both the conditional hazard and the distribution of unobservables without assuming parametric functional forms for either. Tradeoffs in assumptions required to secure such model identification are discussed. Although under certain conditions a fully nonparametric model can be identified, the development of a consistent fully nonparametric estimator remains to be done. We also discuss conditions under which access to multiple spell data aids in solving the sensitivity problem. A superficially attractive conditional likelihood approach produces inconsistent estimators, but the practical significance of this inconsistency is not yet known. 
Conditional inference schemes for eliminating unobservables from multiple spell duration models that are based on sufficient or ancillary statistics require unacceptably strong assumptions about the functional forms of conditional hazards and so are not robust. Contrary to recent claims, they offer no general solution to the model sensitivity problem. The problem of controlling for time inhomogeneous environments (Point (2)) remains to be solved. Failure to control for time inhomogeneity produces serious biases in estimated duration models. Controlling for time inhomogeneity creates a potential identification problem. For single spell data it is impossible to separate the effect of duration dependence from the effect of time inhomogeneity by a fully nonparametric procedure. Although it is intuitively obvious that access to multiple spell data aids in the solution of this identification problem, the development of precise conditions under which this is possible is a topic left for future research. We demonstrate how sampling schemes distort the functional forms of sample duration distributions away from the population duration distributions that are the usual object of econometric interest (Point (3)). Inference based on misspecified duration distributions is in general biased. New formulae for the densities of commonly used duration measures are produced for duration models with unobservables in time inhomogeneous environments. We show how access to spells that begin after the origin date of a sample aids in solving econometric problems created by the sampling schemes that are used to generate economic duration data. We also discuss new issues that arise in estimating duration models explicitly derived from economic theory (Point (4)). For a prototypical search unemployment model we discuss and resolve new identification problems that arise in attempting to recover structural economic parameters. 
We also consider nonstandard statistical problems that arise in estimating structural models that are not treated in the literature. Imposing or testing the restrictions implied by economic theory requires duration models that do not appear in the received literature and often requires numerical solution of implicit equations derived from optimizing theory.
Article
This paper proposes a class of estimators of the semiparametric censored regression model under the assumption that the error terms are i.i.d. and independent of the covariates. The estimators exploit the fact that, for a pair of observations, a particular transformation of the censored variables (depending upon the parameter vector and both covariate vectors) will be identically distributed. Therefore, the difference in the transformed dependent variables will be symmetrically distributed around zero when the transformation is evaluated at the true parameter vector. An analogous class of estimators for the truncated regression model includes the Bhattacharya, Chernoff, and Yang (1983) estimator as a special case. The estimators are defined as minimizers of U-processes, so the large-sample theory for classical M-estimators is extended to this case. Conditions are given to ensure root-n-consistency and asymptotic normality of the estimators, and a small-scale simulation study for the censored regression estimator suggests that it will be well-behaved in finite samples.
Article
Dynamic discrete choice panel data models have received a great deal of attention. In those models, the dynamics are usually handled by including the lagged outcome as an explanatory variable. In this paper we consider an alternative model in which the dynamics are handled by using the duration in the current state as a covariate. We propose estimators that allow for group-specific effects in parametric and semiparametric versions of the model. The proposed method is illustrated by an empirical analysis of job durations allowing for firm-level effects.
Article
This paper shows that the objective function for the maximum rank correlation estimator can be evaluated in O(n log n) calculations (where n is the sample size). Previously, O(n^2) calculations were thought necessary for computation of the objective function.
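To make the complexity claim concrete, here is a minimal sketch of one way a pairwise rank objective can be evaluated without visiting every pair. It assumes the objective is the number of pairs (i, j) with y_i > y_j and index_i > index_j, and that there are no ties in either variable; under those assumptions the count equals n(n-1)/2 minus the number of inversions, which a merge sort counts in O(n log n) time. The function names are illustrative, and this is not necessarily the algorithm used in the paper.

```python
def count_inversions(seq):
    """Return (sorted copy, number of pairs i < j with seq[i] > seq[j])."""
    n = len(seq)
    if n <= 1:
        return list(seq), 0
    mid = n // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged = []
    inv = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every element still waiting in left.
            merged.append(right[j])
            j += 1
            inv += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv


def rank_objective_fast(y, index):
    """Count concordant pairs of (y, index); assumes no ties in either list."""
    n = len(y)
    order = sorted(range(n), key=lambda k: index[k])
    y_by_index = [y[k] for k in order]
    _, discordant = count_inversions(y_by_index)
    return n * (n - 1) // 2 - discordant


y = [3.1, 0.2, 5.7, 1.4, 4.4]
index = [0.9, 0.1, 0.3, 0.5, 1.2]
print(rank_objective_fast(y, index))
```

With ties, concordant and discordant pairs no longer add up to n(n-1)/2, so the subtraction step would need adjusting.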
Article
The paper considers estimation of a model yi = D · F(x′iβ0, ui), where the composite transformation D · F is only specified that is non-degenerate monotonic and is strictly monotonic in each of its variables. The paper thus generalizes standard data analysis which assumes that the functional form of D · F is known and additive. The estimator which it proposes is the maximum rank correlation estimator which is non-parametric in the functional form of D · F and non-parametric in the distribution of the error terms, ui. The estimator is shown to be strongly consistent for the parameters β0 up to a scale coefficient.
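The idea in the abstract above can be illustrated with a small simulation. The sketch below assumes the rank objective counts pairs with y_i > y_j and x_i'β > x_j'β, uses two regressors, and searches over unit-length β on a grid of angles, since β is only identified up to scale. The data-generating process, the grid search, and all function names are illustrative assumptions, not taken from the paper.

```python
import math
import random


def mrc_objective(y, X, beta):
    """Count ordered pairs (i, j) with y_i > y_j and x_i'beta > x_j'beta."""
    index = [sum(b * x for b, x in zip(beta, row)) for row in X]
    n = len(y)
    total = 0
    for i in range(n):
        for j in range(n):
            if y[i] > y[j] and index[i] > index[j]:
                total += 1
    return total


def mrc_grid_search(y, X, n_angles=72):
    """Maximise the objective over beta = (cos a, sin a) on a grid of angles."""
    best_beta, best_val = None, -1
    for k in range(n_angles):
        a = 2.0 * math.pi * k / n_angles
        beta = (math.cos(a), math.sin(a))
        val = mrc_objective(y, X, beta)
        if val > best_val:
            best_beta, best_val = beta, val
    return best_beta, best_val


def simulate(n=100, seed=0):
    """Hypothetical design: y = exp(x1 + 2*x2 + u), so beta0 is proportional to (1, 2)."""
    rng = random.Random(seed)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    y = [math.exp(x1 + 2.0 * x2 + rng.gauss(0, 0.5)) for x1, x2 in X]
    return y, X


y, X = simulate()
beta_hat, value = mrc_grid_search(y, X)
print(beta_hat, value)
```

The estimator never uses the fact that the transformation is exponential; only the ordering of y enters the objective, which is why the functional form can be left unspecified.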
Article
In this paper we propose estimators for the regression coefficients in censored duration models which are distribution free, impose no parametric specification on the baseline hazard function, and can accommodate general forms of censoring. The estimators are shown to have desirable asymptotic properties and Monte Carlo simulations demonstrate good finite sample performance. Among the data features the new estimators can accommodate are covariate-dependent censoring, double censoring, and fixed (individual or group specific) effects. We also examine the behavior of the estimator in an empirical illustration.
Article
Generalizing the mode regression of Lee (1989) with the rectangular kernel (RME), we try a quadratic kernel (QME), smoothing the rectangular kernel. Like RME, QME is most useful when the dependent variable is truncated. QME is better than RME in that it gives a root-N-consistent estimator and an asymptotic distribution which parallels that of Powell's (1986) symmetrically trimmed least squares (STLS). In general, the symmetry requirement of QME is weaker than that of STLS and stronger than that of RME. Estimation of the covariance matrices of both QME and STLS requires density estimation. But a variation of QME can provide an upper bound of the covariance matrix without the burden of density estimation. The upper bound can be made tight at the cost of computation time.
Article
The paper develops the bootstrap theory and extends the asymptotic theory of rank estimators, such as the Maximum Rank Correlation Estimator (MRC) of Han (1987), Monotone Rank Estimator (MR) of Cavanagh and Sherman (1998) or Pairwise-Difference Rank Estimators (PDR) of Abrevaya (2003). It is known that under general conditions these estimators have asymptotic normal distributions, but the asymptotic variances are difficult to find. Here we prove that the quantiles and the variances of the asymptotic distributions can be consistently estimated by the nonparametric bootstrap. We investigate the accuracy of inference based on the asymptotic approximation and the bootstrap, and provide bounds on the associated error. In the case of MRC and MR, the bound is a function of the sample size of order close to n^{-1/6}. The PDR estimators belong to a special subclass of rank estimators for which the bound is vanishing with the rate close to n^{-1/2}. The theoretical findings are illustrated with Monte-Carlo experiments and a real data example.
Article
Use of the proportional hazards regression model (Cox 1972) substantially liberalized the analysis of censored survival data with covariates. Available procedures for estimation of the relative risk parameter, however, do not adequately handle grouped survival data, or large data sets with many tied failure times. The grouped data version of the proportional hazards model is proposed here for such estimation. Asymptotic likelihood results are given, both for the estimation of the regression coefficient and the survivor function. Some special results are given for testing the hypothesis of a zero regression coefficient which leads, for example, to a generalization of the log-rank test for the comparison of several survival curves. Application to breast cancer data, from the National Cancer Institute-sponsored End Results Group, indicates that previously noted race differences in breast cancer survival times are explained to a large extent by differences in disease extent and other demographic characteristics at diagnosis.
Article
This paper presents a new estimator for the mixed proportional hazard model that allows for a nonparametric baseline hazard and time-varying regressors. In particular, this paper allows for discrete measurement of the durations as happens often in practice. The integrated baseline hazard and all parameters are estimated at the regular rate, root N, where N is the number of individuals. A hazard model is a natural framework for time-varying regressors. In particular, if a flow or a transition probability depends on a regressor that changes with time, a hazard model avoids the curse of dimensionality that would arise from interacting the regressors at each point in time with one another. This paper also presents a new test to detect unobserved heterogeneity.
Article
This paper introduces rank estimators for a general transformation model with observable truncation points. The estimators, which are modified versions of the rank estimators of A. K. Han [J. Econom. 35, 303-316 (1987; Zbl 0638.62063)] and C. Cavanagh and R. P. Sherman [J. Econom. 84, No. 2, 351-381 (1998; Zbl 0952.62105)], are asymptotically normal and require no bandwidth choice. Log-concavity of the error disturbance’s survival function is a sufficient condition for the associated monotonicity conditions. Monte Carlo simulations investigate the estimators’ behavior for various sample sizes.
Article
This paper considers estimation of truncated and censored regression models with fixed effects. Up until now, no estimator has been shown to be consistent as the cross-section dimension increases with the time dimension fixed. Trimmed least absolute deviations and trimmed least squares estimators are proposed for the case where the panel is of length two, and it is proven that they are consistent and asymptotically normal. It is not necessary to maintain parametric assumptions on the error terms to obtain this result. A small scale Monte Carlo study demonstrates that these estimators can perform well in small samples. Copyright 1992 by The Econometric Society.
Article
This paper explores the robustness of the essential economic conclusions of the Roy model of self-selection and income inequality to relaxation of its normality assumptions. A log concave version of the model reproduces most of the main results. Log convex cases offer counterexamples. The authors show that in a Roy economy, random assignment is inegalitarian and Pareto inefficient. They consider nonparametric identifiability of latent skill distributions with cross-section and panel data. The authors' analysis proves nonparametric identifiability for the closely related competing risks model. Copyright 1990 by The Econometric Society.
Khan, S. and E. Tamer (2007). Partial rank estimation of duration models with general forms of censoring. Journal of Econometrics 136, 251–280.
Lee, M. J. (1993). Quadratic mode regression. Journal of Econometrics 57, 1–19.