Article

Comparison of the performance of best linear unbiased predictors (BLUP)


Abstract

Four best linear unbiased predictors (the cluster mean, the mixed model predictor, Scott & Smith's predictor and the random permutation model predictor) of selected important public health variables were evaluated in practical settings via simulation studies. The variables corresponded to measures of diet, physical activity, and other biological measures. The study evaluated and compared the mean squared errors (MSE) of the four predictors. It estimated the variances between subjects and between days, and the response error variance, for parameters defined over a one-year period, based on data from a large-scale longitudinal study, the Season Study. It then evaluated, from theoretical results, the relative increase in MSE between predictors of a subject's true mean in various settings. In addition, a simulation compared the theoretical and the simulated MSE for all four predictors. The differences in MSE between predictors were illustrated in 2D plots.
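The kind of comparison described above can be sketched in a few lines. The following is a minimal illustration, not the authors' simulation: it covers only two of the four predictors (the cluster mean and a mixed-model shrinkage predictor), and it assumes a balanced design, normally distributed effects, known variance components, and an infinite population. The function name and all parameter values are hypothetical.

```python
import random


def simulate_mse(n_subjects=200, m_days=4, var_between=1.0, var_within=4.0,
                 mu=10.0, n_reps=200, seed=0):
    """Compare the simulated MSE of two predictors of a subject's true mean.

    Assumptions (not from the source): balanced data, normal effects,
    known variance components, infinite population.
    """
    rng = random.Random(seed)
    # Shrinkage constant of the mixed-model predictor.
    k = var_between / (var_between + var_within / m_days)
    sse_mean = 0.0
    sse_blup = 0.0
    for _ in range(n_reps):
        # Latent true subject means.
        true_means = [rng.gauss(mu, var_between ** 0.5) for _ in range(n_subjects)]
        # Observed subject means over m_days noisy daily measurements.
        ybar = [
            sum(rng.gauss(t, var_within ** 0.5) for _ in range(m_days)) / m_days
            for t in true_means
        ]
        grand = sum(ybar) / n_subjects  # estimate of the population mean
        for t, y in zip(true_means, ybar):
            sse_mean += (y - t) ** 2                        # cluster mean
            sse_blup += (grand + k * (y - grand) - t) ** 2  # shrunken mean
    total = n_reps * n_subjects
    return {
        "k": k,
        "mse_cluster_mean": sse_mean / total,
        "mse_blup": sse_blup / total,
        # Theoretical values; the second one treats the population mean as
        # known, so the simulated value is expected to be slightly larger.
        "theory_cluster_mean": var_within / m_days,
        "theory_blup": k * var_within / m_days,
    }


if __name__ == "__main__":
    for name, value in simulate_mse().items():
        print(f"{name}: {value:.3f}")
```

With these hypothetical variance components the shrinkage constant is 0.5, so the shrunken predictor should roughly halve the MSE of the cluster mean. The gain shrinks as the number of days per subject grows or as the between-subject variance dominates.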

Article
In animal breeding, Best Linear Unbiased Prediction, or BLUP, is a technique for estimating genetic merits. In general, it is a method of estimating random effects. It can be used to derive the Kalman filter, the method of Kriging used for ore reserve estimation, credibility theory used to work out insurance premiums, and Hoadley's quality measurement plan used to estimate a quality index. It can be used for removing noise from images and for small-area estimation. This paper presents the theory of BLUP, some examples of its application and its relevance to the foundations of statistics. Understanding of procedures for estimating random effects should help people to understand some complicated and controversial issues about fixed and random effects models and also help to bridge the apparent gulf between the Bayesian and Classical schools of thought.
Article
The authors examined sources of variance in self-reported physical activity in a cohort of healthy adults ( n = 580) from Worcester, Massachusetts (the Seasonal Variation of Blood Cholesterol Study, 1994–1998). Fifteen 24-hour physical activity recalls of total, occupational, and nonoccupational activity (metabolic equivalent-hours/day) were obtained over 12 months. Random effects models were employed to estimate variance components for subject, season, day of the week, and residual error, from which the number of days of assessment required to achieve 80% reliability was estimated. The largest proportional source of variance in total and nonoccupational activity was within-subject variance (50–60% of the total). Differences between subjects accounted for 20–30% of the overall variance in total activity, and seasonal and day-of-the-week effects accounted for 6% and 15%, respectively. For total activity, 7–10 days of assessment in men and 14–21 days of assessment in women were required to achieve 80% reliability. For nonoccupational activity, 21–28 days of assessment were required. This study is among the first to have examined the sources of variance in daily physical activity levels in a large population of adults using 24-hour physical activity recall. These findings provide insight for understanding the strengths and limitations of short term and long term physical activity assessments employed in epidemiologic studies.
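The link between variance components and the number of assessment days can be illustrated with the standard reliability formula for the mean of d days, R = var_between / (var_between + var_within / d). The sketch below is an assumption about the general approach, not the authors' exact method: it ignores the separate season and day-of-week components, and the variance shares passed in are hypothetical round numbers chosen within the ranges quoted above.

```python
def reliability(var_between, var_within, n_days):
    """Reliability of a subject's mean over n_days of assessment."""
    return var_between / (var_between + var_within / n_days)


def days_for_reliability(var_between, var_within, target=0.80, max_days=3650):
    """Smallest number of days whose mean reaches the target reliability."""
    for n_days in range(1, max_days + 1):
        if reliability(var_between, var_within, n_days) >= target:
            return n_days
    raise ValueError("target reliability not reached within max_days")


if __name__ == "__main__":
    # Hypothetical shares: 25% between-subject, 55% within-subject variance.
    print(days_for_reliability(0.25, 0.55))
```

Only the ratio of within-subject to between-subject variance matters here, so the inputs can be given as proportions of total variance or in original units.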
Article
The intraindividual variances in serum/plasma cholesterol levels from a variety of sources have been examined. It is apparent that these are very substantial with mean coefficients of variation usually between 5% and 10%, even when the diet is controlled in metabolic studies. Some subjects show extreme variability from one blood sample to the next. Thus, it is very difficult to assess the degree of risk of individuals according to the guidelines provided by the Consensus Conference on lowering blood cholesterol levels to prevent heart disease, and many individuals will be misclassified unless particular attention is paid to this problem.
Article
In many situations there is interest in parameters (e.g., mean) associated with the response distribution of individual clusters in a finite clustered population. We develop predictors of such parameters using a two-stage sampling probability model with response error. The probability model stems directly from finite population sampling without additional assumptions and thus is design-based. The predictors are closely related to best linear unbiased predictors (BLUP) that arise from common mixed-model methods, as well as to model-based predictors obtained via super population approaches for survey sampling. The context assumes clusters of equal size and equal size sampling of units within clusters. Target parameters may correspond to clusters realized in the sample, as well as nonrealized clusters. In either case, the predictors are linear and unbiased, and minimize the expected mean squared error. They correspond to the sum of predictors of responses for realized and nonrealized units in the cluster, accounting directly for the second-stage sampling fraction. In contrast, the BLUP commonly used in mixed models can be interpreted as predicting only the responses of second-stage units not observed for a cluster, not the cluster mean. The development reveals that two-stage sampling does not give rise to a more general variance structure often assumed in superpopulation models, even when variances within clusters are heterogeneous. With response error present, we predict target random variables defined as an expected (or average) response over units in a cluster.
Article
A superpopulation model is proposed for two-stage sampling from a finite population and we consider the problem of estimating a linear function of the finite population elements. We find the estimate with smallest mean squared error among linear estimates with bounded mean squared error without any assumption about the form of the super-population distribution. If the superpopulation is assumed to be normal this estimate is the mean of the posterior distribution. The estimate is compared with standard results for the special case of the finite population mean.
Article
The linear least-squares prediction approach is applied to some problems in two-stage sampling from finite populations. A theorem giving the optimal (BLU) estimator and its error-variance under a general linear “superpopulation” model for a finite population is stated. This theorem is then applied to a model describing many populations whose elements are grouped naturally in clusters. Next, the probability model is used to analyze various conventional estimators and certain estimators suggested by the theory as alternatives to the conventional ones. Problems of design are considered, as are some consequences of regression-model failure.
Article
When interdependence of disturbances is present in a regression model, the pattern of sample residuals contains information which is useful in prediction of post-sample drawings. This information, which is often overlooked, is exploited in the best linear unbiased predictor derived here. The gain in efficiency associated with using this predictor instead of the usual expected value estimator may be substantial.
Article
Intraindividual blood pressure (BP) and heart rate (HR) variations and their possible correlation with sex and age were evaluated in 271 healthy adults, adolescents and children divided into equivalent groups of both sexes. BP and HR were measured every minute for 14 minutes with an automatic device using the oscillometric method. A second measurement session was held two weeks later. In most subjects, BP decreased during the first minutes towards a point of relative stability, which was reached after the eighth measurement. The mean difference between the first and the fourteenth minutes was about 12 mmHg for systolic BP and 18 mmHg for diastolic BP. The mean values of the 14 determinations on day 1 and day 15 showed a significant correlation from 65 to 75 according to different groups. The within-subject variability index was similar in all groups. A comparison of the variability indices observed on day 1 and day 15 showed a significant correlation only for systolic BP in male subjects and for HR in female subjects. Reproducibility of variability appears very inconsistent. The results suggest that this procedure cannot characterize subjects with a high BP variability. This study suggests that fourteen repeated measurements of BP over a fortnight allow a better estimate of resting BP level. They are not equivalent to a 24-hour BP recording for estimating variability.
Article
To obtain the best estimates of the average intraindividual biological variability (CVb) in the concentrations of total cholesterol (TC), low-density lipoprotein cholesterol (LDLC), high-density lipoprotein cholesterol (HDLC), and triglyceride serum lipids in a person's blood, we evaluated results from 30 studies published from 1970 to 1992. The usually more applicable random-effects model estimated an average CVb of 6.1% for TC, 7.4% for HDLC, 9.5% for LDLC, and 22.6% for triglyceride. Composite estimates of the average CVb from all evaluated published studies by different models of estimation ranged from 6.0% to 6.4% for TC, from 6.2% to 7.5% for HDLC, from 7.0% to 9.6% for LDLC, and from 22.4% to 22.9% for triglyceride. Two important factors influenced the reported biological variation of the study subjects: (a) the magnitude of the variability of the analytical method used and (b) the design characteristics of the study--primarily the number of subjects, the sampling interval, and the number of measurements per subject. For TC, we found a statistically significant positive correlation between the reported mean CVb and both the number of study subjects and the analytical variation. For TC and LDLC we estimate CVb as a function of the study design features. The number of patient specimens required to obtain reliable estimates for serum lipid concentrations are determined from the CVb and the current analytical variation.
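How a specimen count follows from CVb and analytical variation can be sketched with a commonly used formula, n = (z * sqrt(CVa^2 + CVb^2) / D)^2, where D is the acceptable percentage deviation of the mean from the person's true level. This is an assumption about the general approach, not necessarily the calculation used in the study. The CVb value is the random-effects estimate for TC quoted above; the analytical CV, the tolerance D, and the z value are hypothetical.

```python
import math


def specimens_required(cv_biological, cv_analytical, tolerance_pct, z=1.96):
    """Number of specimens so that the mean lies within +/- tolerance_pct
    of the true level with the confidence implied by z (all CVs in percent).
    """
    total_cv_sq = cv_biological ** 2 + cv_analytical ** 2
    n = (z ** 2) * total_cv_sq / tolerance_pct ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # CVb = 6.1% for TC (from the abstract); CVa = 3% and D = 5% are hypothetical.
    print(specimens_required(6.1, 3.0, 5.0))
```

Because the two coefficients of variation add in quadrature, reducing analytical variation helps little once it is well below the biological variation.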
Article
To assess reliability in terms of inter-observer agreement of blood pressure (BP) readings. Various health professionals and measuring systems. Influence of observer's experience. Observational, descriptive, cross-sectional study. Urban health centre, Córdoba. 131 hypertensive, randomised patients, belonging to a functional care unit. 11 were excluded. To reduce variability: course on the right way to take blood pressure, otoscope and verification of visual sharpness of observers, calibration and validation of measuring devices, limited time and blinding of measurements. 4 BP measurements per patient: 3 with mercury sphygmomanometer (2 simultaneously, one individual) and one with an automatic device. Descriptive, clinical and somatometric variables were gathered. Inter-observer agreement was evaluated through the intraclass correlation coefficient (ICC), the mean of differences method (MDM) and the simple concordance index (CI). An ICC > 0.75 was thought acceptable. A difference > 5 mmHg was thought clinically relevant (MDM and CI). Acceptable consistency for MDM: alone, systolic and diastolic pressure of OBS 1/ OBS 2, bi-auricular, -6.1/+8.9 mmHg and -6.8/+5.8 mmHg. Less favourable results: for systolic and diastolic pressure: OBS 1/AUTO -20.9/25.0 and -16.4/15.1; OBS 2/AUTO -22.8/24.4 and -16.6/15.2. Remaining intervals always > 10 mmHg; CI > 0.75 in all comparisons except diastolic pressure OBS 1/AUTO and diastolic pressure OBS 2/AUTO (0.69 in both cases). 41% of comparisons were > 5 mmHg. No differences in less expert professionals were found. Inaccuracy of the standard BP measurement method (mercury sphygmomanometer) for MDM and CI. Contradictory conclusions according to method of measurement. Differences not clinically acceptable.
Article
To examine the effects of physical activity, body posture and sleep quality on the reproducibility of continuous ambulatory blood pressure monitoring. Measurements were performed in 35 subjects (18 hypertensive, 11 male), mean +/- standard deviation age 49 +/- 13 years. Blood pressure (BP) was measured in the brachial artery, and beat-to-beat values of systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure and heart rate (HR) were computed. Physical activity and posture were continuously measured with five accelerometers. Subjective quality of sleep was assessed with a questionnaire. Reproducibility was expressed as an intraclass correlation coefficient and as the standard deviation of the within-subject differences. Posture and activity significantly influenced BP and HR. From lying to sitting, the SBP, DBP and HR increased 6 mmHg, 8 mmHg and 8 beats/min, respectively. From sitting to standing these respective increases were 4 mmHg, 2 mmHg and 13 beats/min. A further rise in activity (from standing to moving generally or walking) increased the SBP by 7 mmHg and the HR by 7 beats/min, and decreased the DBP by 8 mmHg. For daytime SBP, DBP and HR, the intraclass correlation coefficient (standard deviation of the within-subject differences) values were 0.93 (7.2 mmHg), 0.94 (3.8 mmHg) and 0.90 (4.1 beats/min). For night-time these respective values were 0.98 (4.4 mmHg), 0.97 (2.5 mmHg) and 0.96 (2.2 beats/min). Correction for physical activity level and posture hardly improved the reproducibility of daytime BP and HR. Reproducibility of night-time BP and HR was not improved by correction for physical activity, supine position or self-reported sleep quality. Within-subject differences between ambulatory BP recordings cannot be explained by differences in physical activity and body posture.
Article
Predictors of random effects are usually based on the popular mixed effects model developed under the assumption that the sample is obtained from a conceptual infinite population; such predictors are employed even when the actual population is finite. Two alternatives that incorporate the finite nature of the population are obtained from the superpopulation model proposed by Scott and Smith (1969, JASA, 64: 830-840) or from the finite population mixed model recently proposed by Stanek and Singer (2004, JASA, 99:1119-1130). Predictors derived under the latter model, with the additional assumptions that all variance components are known and that within-cluster variances are equal, have smaller mean squared error than the competitors based on either the mixed effects or Scott and Smith's models. As population variances are rarely known, we propose method-of-moments estimators to obtain empirical predictors and conduct a simulation study to evaluate their performance. The results suggest that the finite population mixed model empirical predictor is more stable than its competitors since, in terms of mean squared error, it is either the best or the second best and, when second best, its performance lies within acceptable limits. When both cluster and unit intra-class correlation coefficients are very high (e.g., 0.95 or more), the performance of the empirical predictors derived under the three models is similar.
Samaniego, F. 2003. Comments to "Predicting Random effects from finite population Clustered samples with response errors authored by Stanek and Singer".
SAS Institute Inc. 1999. SAS/STAT User's Guide. Version 8.0. Cary, NC.
Stanek, E. J. III, A. Well, and I. Ockene. 1999. Why not routinely use best linear unbiased predictors (BLUPs) as estimates of cholesterol, percent fat from kcal and physical activity? Statistics in Medicine 18: 2943-2959.
Stanek, E. J. III. 2003. Evaluating the MSE of Predictors in Balanced Two Stage Predictors of Realized Random Cluster Means With Response Error. (http://www.umass.edu/cluster/ed/biblio-papers.html).
Stanek, E. 2003a. Notation used to Construct Predictors and Estimates of Predictors in the Simulation Study for Performance of Balanced Two Stage Predictors of Realized Random Cluster Means. (http://www.umass.edu/cluster/ed/biblio-papers.html).
Stanek, E. 2003b. Estimating the Variance in a Simulation Study of Balanced Two Stage Predictors of Realized Random Cluster Means. (http://www.umass.edu/cluster/ed/bibliopapers.html).