This Technical Report introduces a new multivariate difference-estimator for complex sample-surveys. It is an alternative to conventional model-assisted estimators that use specific inference.
Model-assisted estimators and the new difference-estimator both reduce variance in population estimates for M study-variables by using population statistics for J correlated auxiliary-variables, where M and J can number in the hundreds or thousands. Both are closely related to the difference-estimator offered by Särndal et al. (1992), although the new difference-estimator uses a different stochastic model. Both employ linear transformations of design-based estimators (e.g., Horvitz-Thompson). Both choose coefficients for an M×(M+J) transformation matrix that minimize variance of population estimates for each study-variable, where the degree of variance-reduction depends upon the specific correlation between each study-variable and each auxiliary-variable. Both estimators support expansion factors, which facilitate small-area estimators.
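The role of the M×(M+J) transformation matrix can be illustrated with a small numerical sketch. It assumes the textbook minimum-variance coefficients K = V_yx V_xx^{-1}, which may differ in detail from the coefficients derived in this report. The covariance values and the names `sigma`, `K`, and `T` are hypothetical and chosen only for illustration, with M = 2 study-variables and J = 1 auxiliary-variable.

```python
import numpy as np

M, J = 2, 1  # two study-variables, one auxiliary-variable

# Hypothetical joint covariance of the design-based estimates, ordered as
# [y1_hat, y2_hat, x_hat - x_ref]; the numbers are made up for illustration.
sigma = np.array([[4.0, 1.0, 1.6],
                  [1.0, 9.0, 0.3],
                  [1.6, 0.3, 1.0]])

v_yx = sigma[:M, M:]   # covariance between study and auxiliary estimates
v_xx = sigma[M:, M:]   # covariance among auxiliary estimates

# Assumed minimum-variance coefficients K = V_yx V_xx^{-1}
# (v_xx is symmetric, so solving against v_yx.T and transposing gives K).
K = np.linalg.solve(v_xx, v_yx.T).T

# M x (M+J) transformation matrix applied to the stacked estimate vector.
T = np.hstack([np.eye(M), -K])

reduced_cov = T @ sigma @ T.T
original_var = np.diag(sigma)[:M]
reduced_var = np.diag(reduced_cov)

# Correlation of each study-variable with the single auxiliary-variable.
rho = v_yx[:, 0] / np.sqrt(original_var * v_xx[0, 0])

print("variance ratio after transformation:", reduced_var / original_var)
print("1 - rho^2:                          ", 1 - rho**2)
```

With a single auxiliary-variable, the variance ratio equals 1 − ρ² for each study-variable. The first study-variable (ρ = 0.8) retains roughly 36% of its variance, while the second (ρ = 0.1) retains roughly 99%.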
The new difference-estimator introduces a novel approach to variance-reduction with auxiliary data. Unlike model-assisted estimators, which require known population parameters for the J auxiliary-variables, the new estimator accommodates sample-survey estimates of those population parameters. Therefore, the new difference-estimator can directly use population estimates for auxiliary-variables from more complex sample-surveys, including components such as multi-phase and multi-stage sampling-designs, cluster plots, interpenetrating panels, and supplemental surveys.
The new difference-estimator introduces numerical advances. The model-assisted estimator with specific inference requires inversion of the J×J covariance matrix for population estimates of the J auxiliary-variables, and that inverse is infeasible or numerically unstable if the covariance matrix is rank-deficient or ill-conditioned. The new difference-estimator incorporates a recursive method that replaces the J×J matrix inverse with up to J scalar inverses. The jth step in the recursion minimizes the variances of all M study-variables with the jth scalar auxiliary-residual, and it removes any collinearity between the jth auxiliary-variable and all remaining auxiliary-variables. The recursion ceases if the jth scalar inverse is numerically unstable (i.e., division by a very small number). Such instability suggests that the J×J covariance matrix has rank (j−1) and that all auxiliary information is essentially exhausted after recursions with the first (j−1) auxiliary-variables.
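The recursion described above can be sketched as a sequence of scalar "sweep" updates on the joint covariance matrix. This is a minimal illustration under stated assumptions, not the report's implementation: the function name `recursive_difference_estimate`, the relative tolerance `rel_tol`, and the simulated data are all hypothetical. The vector `z` stacks the M study-variable estimates followed by the J auxiliary residuals (estimate minus reference value).

```python
import numpy as np

def recursive_difference_estimate(z, cov, n_study, rel_tol=1e-10):
    """Sweep out auxiliary residuals one at a time using scalar inverses.

    z       : stacked vector [study estimates, auxiliary residuals]
    cov     : joint covariance matrix of z
    n_study : number of study-variables (M)
    rel_tol : hypothetical threshold for an unstable scalar inverse
    """
    z = np.array(z, dtype=float)
    cov = np.array(cov, dtype=float)
    scale = np.diag(cov).copy()          # original variances, for the tolerance
    steps = 0
    for p in range(n_study, len(z)):
        pivot = cov[p, p]                # remaining variance of this residual
        if pivot <= rel_tol * scale[p]:
            break                        # scalar inverse unstable: stop recursing
        gain = cov[:, p] / pivot         # the only inverse is a scalar division
        z = z - gain * z[p]              # adjust every remaining element
        cov = cov - np.outer(gain, cov[p, :])  # row and column p become zero
        steps += 1
    return z[:n_study], cov[:n_study, :n_study], steps

# Simulated example (assumption): M = 2 study-variables, J = 3 auxiliary-variables.
rng = np.random.default_rng(0)
M, J = 2, 3
data = rng.normal(size=(200, M + J))
data[:, :M] += data[:, M:] @ rng.normal(size=(J, M))  # induce correlation
cov = np.cov(data, rowvar=False)
z = rng.normal(size=M + J)

y_rec, v_rec, steps = recursive_difference_estimate(z, cov, M)

# Reference result using a full J x J solve, for comparison only.
K = np.linalg.solve(cov[M:, M:], cov[M:, :M]).T
y_dir = z[:M] - K @ z[M:]
v_dir = cov[:M, :M] - K @ cov[M:, :M]

# Rank-deficient case: append an exact copy of the first auxiliary-variable.
data2 = np.column_stack([data, data[:, M]])
cov2 = np.cov(data2, rowvar=False)
z2 = np.append(z, z[M])
y_rec2, v_rec2, steps2 = recursive_difference_estimate(z2, cov2, M)

print("steps (full rank):     ", steps)
print("steps (rank-deficient):", steps2)
```

In the full-rank case the recursion reproduces the result of the direct J×J solve. In the rank-deficient case the duplicated auxiliary-variable has essentially zero remaining variance once the original has been swept, so the recursion stops before using it and the estimates are unchanged.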
The recursive method used in the new difference-estimator simplifies nonlinear estimation procedures, such as inequality constraints on population estimates for each study-variable and protection from negative variance estimates. The recursive method easily implements procedures that mitigate risks from outliers and overfitting with numerous auxiliary-variables. The recursive method supports stepwise covariate-selection among the auxiliary-variables, which reduces variance for the most important study-variables as identified by the analyst.
Internal consistency in statistical tables requires that the sum of population estimates in each row or column equals the population estimate for the corresponding margin in that table. Model-assisted estimators with a generic weight produce internal consistency, but at the cost of statistical efficiency. The new difference-estimator provides an alternative that does not compromise statistical efficiency; it uses recursive raking to sequentially impose equality constraints on each row and column of a statistical table.
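One way to picture sequentially imposed equality constraints is the generic minimum-variance update sketched below, in which each constraint needs only one scalar inverse. It is offered as an assumption-laden illustration and is not necessarily the recursive raking procedure developed in this report. The function name `impose_constraints`, the 2×2 table, and all numeric values are hypothetical.

```python
import numpy as np

def impose_constraints(z, cov, constraints, rel_tol=1e-10):
    """Impose linear equality constraints c @ z = 0 one at a time.

    Each step adjusts the estimates using the scalar variance of the
    constraint residual c @ z; no matrix inverse is required.
    """
    z = np.array(z, dtype=float)
    cov = np.array(cov, dtype=float)
    scale = np.max(np.diag(cov))         # reference scale for the tolerance
    for c in constraints:
        c = np.asarray(c, dtype=float)
        cov_c = cov @ c
        resid_var = c @ cov_c            # variance of the constraint residual
        if resid_var <= rel_tol * scale:
            continue                     # constraint already (nearly) satisfied
        z = z - cov_c * (c @ z) / resid_var
        cov = cov - np.outer(cov_c, cov_c) / resid_var
    return z, cov

# Hypothetical 2x2 table. Element order:
# [a11, a12, a21, a22, row1, row2, col1, col2]
z = np.array([30.0, 20.0, 25.0, 25.0, 52.0, 48.0, 54.0, 47.0])

# Made-up positive-definite covariance matrix for the eight estimates.
rng = np.random.default_rng(1)
B = rng.normal(size=(8, 8))
cov = B @ B.T + 8.0 * np.eye(8)

# Each row encodes "sum of cells minus margin equals zero".
C = np.array([
    [1, 1, 0, 0, -1, 0, 0, 0],   # a11 + a12 = row1
    [0, 0, 1, 1, 0, -1, 0, 0],   # a21 + a22 = row2
    [1, 0, 1, 0, 0, 0, -1, 0],   # a11 + a21 = col1
    [0, 1, 0, 1, 0, 0, 0, -1],   # a12 + a22 = col2
], dtype=float)

z_new, cov_new = impose_constraints(z, cov, C)

print("constraint residuals before:", C @ z)
print("constraint residuals after: ", C @ z_new)
```

Because each update zeroes the covariance between the adjusted estimates and the constraint just imposed, later steps do not disturb earlier constraints, and no element's variance increases.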