American Journal of Epidemiology

Copyright © 1999 by The Johns Hopkins University School of Hygiene and Public Health

All rights reserved

Vol. 150, No. 4

Printed in U.S.A.

Underestimation of Risk Associations Due to Regression Dilution in Long-

term Follow-up of Prospective Studies

Robert Clarke,1 Martin Shipley,2 Sarah Lewington,1 Linda Youngman,1 Rory Collins,1 Michael Marmot,2 and

Richard Peto1

In prospective studies, disease rates during follow-up are typically analyzed with respect to the values of

factors measured during an initial baseline survey. However, because of "regression dilution," this generally tends

to underestimate the real associations of disease rates with the "usual" levels of such risk factors during some

particular exposure period. The "regression dilution ratio" describes the ratio of the steepness of the uncorrected

association to that of the real association. To assess the relevance of the usual value of a risk factor during

particular exposure periods (e.g., first, second, and third decades) to disease risks, regression dilution ratios can

be derived by relating baseline measurements of the risk factor to replicate measurements from a reasonably

representative sample of study participants after an interval equivalent to about the midpoint of each exposure

period (e.g., at 5, 15, and 25 years, respectively). This report illustrates the impact of this time interval on the

magnitude of the regression dilution ratios for blood pressure and blood cholesterol. The analyses were based

on biennial remeasurements over 30 years for participants in the Framingham Study (Framingham, Massachusetts)

and a 26-year resurvey for a sample of men in the Whitehall Study (London, England). They show that

uncorrected associations of disease risk with baseline measurements underestimate the strength of the real

associations with usual levels of these risk factors during the first decade of exposure by about one-third, the

second decade by about one-half, and the third decade by about two-thirds. Hence, to correct appropriately for

regression dilution, replicate measurements of such risk factors may be required at varying intervals after

baseline for at least a sample of participants. Am J Epidemiol 1999;150:341-53.

prospective studies; risk analysis; risk assessment

Much of our knowledge about the importance of risk

factors such as blood pressure and blood cholesterol

comes from prospective studies, in which various

characteristics of a cohort are recorded at a "baseline"

survey and the baseline characteristics of persons who

subsequently develop a particular disease are then

compared with those who do not (1, 2). However,

because of the combined effects of measurement errors

and longer-term fluctuations or changes within per-

sons, this procedure does not reliably indicate the real

association between the "usual" level of some risk fac-

tor during a particular period and the disease rate dur-

ing that same (or some later) period (3-7). The uncor-

rected association between the baseline measurement

Received for publication July 21, 1997, and accepted for publica-

tion December 16, 1998.

1 Clinical Trial Service Unit and Epidemiological Studies Unit

(CTSU), Nuffield Department of Clinical Medicine, Radcliffe

Infirmary, Oxford, England.

2 Department of Epidemiology and Public Health, University

College London Medical School, London, England.

Reprint requests to Dr. Robert Clarke, Clinical Trial Service Unit

and Epidemiological Studies Unit, Nuffield Department of Clinical

Medicine, Radcliffe Infirmary, Oxford, OX2 6HE, England.

of a risk factor and subsequent disease rates is gener-

ally shallower than the corrected association between

the usual value of that risk factor during a particular

period and disease (although exceptions are possible).

Replicate measurements of the risk factor in a rea-

sonably representative sample of persons in a prospec-

tive study can be used to estimate the importance of,

and correct for, this "regression dilution" effect (6).

However, if the aim is to relate the disease rates to the

usual risk factor levels during some much later expo-

sure period, then corrections based on remeasurements

made within just a few years of baseline may not take

appropriate account of the effects of within-person vari-

ability over more prolonged periods. To assess the rele-

vance of the usual levels of a risk factor during some

particular exposure period (e.g., the second decade of

follow-up) to disease risk, correction factors may need

to be based on remeasurements made after an interval

approximately equivalent to the midpoint of the rele-

vant period (e.g., after about 15 years of follow-up).

This report describes appropriate ways of correcting

prospective study results for regression dilution. Other

aims were to 1) assess quantitatively the effects of

regression dilution on blood pressure and cholesterol,

according to interval of follow-up, by using repeated

measurements of these factors during 30 years in the

Framingham Study (1, 8); 2) determine, for these par-

ticular factors, whether regression dilution is impor-

tantly different for men and women or for people of

different ages; 3) compare the findings from the

Framingham Study after an interval of 26 years (i.e.,

the approximate midpoint of the third decade of fol-

low-up) with those from the Whitehall Study of

London civil servants after a similar interval (9-11);

and 4) assess the extent to which, for these factors, the

progressive increase in the importance of regression

dilution with increasing duration of follow-up may be

explained by the early deaths of those at highest risk.

MATERIALS AND METHODS

Study populations

The prospective Framingham Study initially

included 5,209 men and women from Framingham,

Massachusetts, who were aged 30-62 years when first

examined between 1948 and 1952 (1, 8). Every 2 years

during 16 consecutive surveys, the remaining partici-

pants were invited to have their blood pressure mea-

sured and a blood sample collected from a forearm

vein. During each visit, a physician used a standard

mercury sphygmomanometer to make two consecutive

recordings of systolic and diastolic blood pressure, but

only the first of the two measurements was used in our

analyses. (Using the average of these two blood pres-

sure measurements rather than just the first would have

increased the correlations between measurements at

different times by only about 0.03.) Total cholesterol

was measured by using the colorimetric method of

Sperry (12) during the first examination, but the Abell-

Kendall method was used for all subsequent measure-

ments (13).

The prospective Whitehall Study initially included

19,019 male civil servants from London, England, who

were aged 40-69 years when first examined between

1967 and 1970 (9-11). During that baseline visit, a spe-

cially trained nurse made a single blood pressure mea-

surement on the left arm by using the London School of

Hygiene sphygmomanometer. Systolic blood pressure

was recorded at the first appearance of the arterial flow

sounds and diastolic blood pressure was recorded at

both the muffling (phase IV) and the disappearance

(phase V) of the sounds, but only the phase IV values

were used in our analyses. During this first examination,

blood samples were collected from earlobes into 2 mm

glass tubing, and total cholesterol was measured by

using a Technicon method (14); those values are thought

to have been about 10 percent too low because the collection

method used caused dilution of samples (10, 11).

In 1995, after an average interval of 26 years, a ran-

dom sample of 401 participants in the Whitehall Study

not known by the Office of National Statistics

(London, England) to have died was selected for resur-

vey (15). Seventy-three percent completed a postal

questionnaire; 61 percent had their blood pressure

recorded twice after 5 minutes in the seated position,

with a 2-minute interval between readings, and blood

taken from a forearm vein at the office of the family

physician. Whole-blood samples were mailed to the

study laboratory (77 percent arrived within 24 hours

and 92 percent within 48 hours), where total cholesterol

was measured with a Beckman auto-analyzer

(Beckman Inc., Fullerton, California) by using an

enzymatic method. Transport studies have shown that

blood cholesterol levels can be measured reliably from

samples collected in this way (16).

Statistical methods

Consider some adverse outcome, such as mortality

from a certain disease during a particular time period

(the "risk period"), whose rate of occurrence is to be

related to the "usual" level of a certain risk factor dur-

ing some particular time period (the "exposure

period"). The risk period may well be the same as the

exposure period, but it does not have to be: for exam-

ple, mortality rates during the second decade of follow-

up could be related to usual levels of the risk factor dur-

ing the first decade. The magnitude of the regression

dilution associated with a particular exposure period

can be estimated either nonparametrically or parametri-

cally from pairs of measurements made in the same

persons on different occasions separated by an appro-

priate time interval. For example, to infer the real asso-

ciations between the usual level of some particular

exposure during the first decade of follow-up and dis-

ease rates (during the first decade or some later period),

correction factors could be based on pairs of measure-

ments separated by about 5 years (i.e., the length of

time from baseline to the midpoint of the exposure

period).

With the nonparametric method of estimating and

correcting for the regression dilution associated with a

particular exposure period, pairs of measurements of

the relevant risk factor that are separated by an appro-

priate interval (e.g., about 5 years if the exposure

period is the first decade) are subdivided into a few

groups according to the value of just the first measure-

ment in each pair. The range of the initial values (r) is

defined as the difference between the means of these

first measurements in the groups with the lowest and

highest values. Even though the value of the second

measurement in each pair did not determine in which

group that pair belonged, the means of the second

measurements provide unbiased estimates of the

"usual" levels of the risk factor in each group during

the particular exposure period, against which the dis-

ease rates can be compared. Generally, the range of

these mean usual values ($r_u$) will be substantially narrower

than the range of the initial values ($r$), so the ratio of these

two ranges ($R = r_u/r$) will be substantially less than 1. This

"regression dilution ratio" provides an assumption-free estimate of the importance of

regression dilution during the exposure period that is

valid no matter what the sources of variation might

have been over the particular time period (e.g., random

error, measurement error, selective survival, or the

"horse-racing" effect (17)) and is approximately inde-

pendent of the number of groups chosen and of the

boundaries used to define these groups. Moreover, the

regression coefficient relating disease risk to the usual

levels of the factor during the exposure period can be

estimated as $1/R$ times the "uncorrected" regression

coefficient relating risk to the baseline levels of the

risk factor (6).
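
The nonparametric procedure described above can be sketched in code. The following is a minimal illustration, not the authors' software: the function names, the choice of five groups, the simulated data, and the uncorrected coefficient of 0.02 are all hypothetical, and the simulation assumes a stable usual level plus independent noise on each occasion.

```python
import numpy as np


def ratio_of_ranges(first, second, n_groups=5):
    """Nonparametric regression dilution ratio R from paired measurements.

    Pairs are grouped by the FIRST measurement only. R is the range of the
    group means of the second measurements divided by the range of the group
    means of the first measurements (top group minus bottom group).
    """
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    keep = ~(np.isnan(first) | np.isnan(second))  # drop incomplete pairs
    first, second = first[keep], second[keep]

    order = np.argsort(first, kind="stable")
    groups = np.array_split(order, n_groups)  # similar-sized groups
    bottom, top = groups[0], groups[-1]

    range_initial = first[top].mean() - first[bottom].mean()
    range_usual = second[top].mean() - second[bottom].mean()
    return range_usual / range_initial


def simulate_pairs(n=20000, sd_usual=15.0, sd_noise=12.0, seed=0):
    """Hypothetical data: a stable 'usual' level plus independent noise."""
    rng = np.random.default_rng(seed)
    usual = rng.normal(135.0, sd_usual, n)
    first = usual + rng.normal(0.0, sd_noise, n)
    second = usual + rng.normal(0.0, sd_noise, n)
    return first, second


first, second = simulate_pairs()
R = ratio_of_ranges(first, second)
uncorrected_slope = 0.02  # hypothetical coefficient for the baseline values
corrected_slope = uncorrected_slope / R  # 1/R times the uncorrected slope
print(f"R = {R:.2f}, corrected slope = {corrected_slope:.4f}")
```

Because grouping uses only the first measurement of each pair, the second measurements are free of the selection effect that inflates the initial range, which is why their group means can stand in for usual levels.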

Under special circumstances, "parametric" statistical

methods can also be used to estimate the regression

dilution ratio. For example, when the variances of the

recorded values at baseline and at remeasurement are

similar (as for blood pressure and cholesterol in our

study; refer to Results), the correlation coefficient

between pairs of measurements separated by a particu-

lar interval (i.e., the "self-correlation") is about the same

as the ratio of the ranges for that interval (6). Estimates

of the importance of regression dilution that are based

on self-correlations tend to have slightly less random

variation than those based on the ratios of ranges and so

may be preferred when only a limited amount of data is

available (as in various subgroups by gender, age, and

year of first measurement in our analyses). However,

since the nonparametric method requires no assump-

tions about constant variances, the ratio of the ranges

may generally be more appropriate than self-correlation

coefficients for estimating the magnitude of regression

dilution during longer-term follow-up.
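
As a rough illustration of the parametric alternative, the sketch below computes the self-correlation between paired measurements together with the ratio of their standard deviations, which indicates whether the similar-variance condition holds. The function names and the simulated cholesterol-like data are hypothetical and are not taken from the study.

```python
import numpy as np


def _complete_pairs(first, second):
    """Return only those pairs in which both measurements are present."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    keep = ~(np.isnan(first) | np.isnan(second))
    return first[keep], second[keep]


def self_correlation(first, second):
    """Parametric estimate: Pearson correlation between paired measurements."""
    first, second = _complete_pairs(first, second)
    return float(np.corrcoef(first, second)[0, 1])


def sd_ratio(first, second):
    """SD at remeasurement divided by SD at baseline.

    A value near 1 suggests the variances are similar, the condition under
    which the self-correlation approximates the ratio of ranges.
    """
    first, second = _complete_pairs(first, second)
    return float(np.std(second, ddof=1) / np.std(first, ddof=1))


# Hypothetical data: usual level plus independent noise on each occasion.
rng = np.random.default_rng(1)
usual = rng.normal(5.8, 0.9, 10000)
first = usual + rng.normal(0.0, 0.6, 10000)
second = usual + rng.normal(0.0, 0.6, 10000)

r = self_correlation(first, second)
s = sd_ratio(first, second)
print(f"self-correlation = {r:.2f}, SD ratio = {s:.2f}")
```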

Because measurements were made every 2 years in

the Framingham Study, it was possible to classify mea-

surements for each participant into multiple pairs sep-

arated by the same time interval. For example, to cal-

culate the regression dilution ratio from measurements

separated by a 6-year interval (i.e., the approximate

midpoint of the first decade), up to 13 pairs of mea-

surements per person were constructed by using data

from the first (i.e., baseline) and fourth (i.e., year 6)

examinations, the second and fifth, the third and sixth,

and so on, up to the 13th and 16th examinations. Thus,

the 5,209 persons initially in the Framingham Study

contributed 44,000 pairs of measurements with a 6-

year interval between them for assessment of regres-

sion dilution during the first decade of exposure. Pairs

separated by 16- and 26-year intervals (i.e., the

approximate midpoints of the second and third decades

of exposure) were constructed in a similar way. (A

missing value in either one of a pair of measurements

resulted in a missing value for that pair.)
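
The construction of multiple pairs per person can be sketched as follows. This assumes, purely for illustration, that the data are held as one row per person and one column per biennial examination, with missing values coded as NaN; the function names and the small example array are hypothetical.

```python
import numpy as np


def lagged_pairs(exams, lag):
    """Build all complete pairs of measurements that are `lag` exams apart.

    exams: 2-D array, rows = persons, columns = successive examinations,
    with NaN for a missing measurement. A pair is dropped if either member
    is missing.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1 examination")
    exams = np.asarray(exams, dtype=float)
    first = exams[:, :-lag].ravel()
    second = exams[:, lag:].ravel()
    keep = ~(np.isnan(first) | np.isnan(second))
    return first[keep], second[keep]


def max_pairs_per_person(n_exams, interval_years, years_between_exams=2):
    """Largest number of pairs one person can contribute for an interval."""
    lag = interval_years // years_between_exams
    return n_exams - lag


# With 16 biennial examinations: intervals of 6, 16, and 26 years.
print([max_pairs_per_person(16, years) for years in (6, 16, 26)])

# Tiny hypothetical example: two persons, five exams, one missing value.
demo = np.array([[1, 2, 3, 4, 5], [10, np.nan, 30, 40, 50]])
demo_first, demo_second = lagged_pairs(demo, lag=2)
print(len(demo_first), "complete pairs")
```

Note that pairs from the same person are not independent, so a sketch like this is suitable for point estimates; interval estimates would need to allow for the clustering.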

To explore the influence of gender, age, and secu-

lar trends on the regression dilution ratio, the self-

correlation coefficients for blood pressure and blood

cholesterol in the Framingham Study were calculated

separately for men and women, for different age

groups at the time of first measurement (ages 30-39,

40-49, 50-59, 60-69, and 70-79 years), and for pairs

in which the first measurement was made in the first,

second, and third decade of follow-up (i.e., at exami-

nations 1-5, 6-10, and 11-15). The impact of early

death and of antihypertensive medication use on the

regression dilution ratios for blood pressure and blood

cholesterol was investigated by considering separately

those who survived to at least year 26 and those who

did not receive such treatment. Estimates of the regres-

sion dilution ratio associated with the third decade of

follow-up were also calculated from the Whitehall

Study by comparing the baseline measurements made

in 1967-1970 with the repeat measurements made

about 26 years later, in 1995, on the same persons.

RESULTS

Characteristics of the study populations

Table 1 provides the mean values of selected charac-

teristics of 2,336 men and 2,873 women at baseline

(year 0) and after 6, 16, and 26 years of follow-up in the

Framingham Study and of 19,019 men at baseline and

243 men after 26 years of follow-up in the Whitehall

Study. The increase in mean age with follow-up was

slightly smaller than the increased duration of follow-

up because of higher mortality among older persons.

There was little change in mean cholesterol, height, and

weight, but a modest decline occurred in mean diastolic

blood pressure and, in the Whitehall Study, in mean

systolic blood pressure. (However, the absolute values

of the initial cholesterol measurements in the Whitehall

Study might well have been somewhat too low; refer to

Materials and Methods.)

Differences in blood pressure and cholesterol

over time

Table 2 shows the mean values of blood pressure and

cholesterol at baseline and after 6, 16, and 26 years of

follow-up among men in Framingham who were aged

30-39, 40-49, and 50-59 years at the baseline survey

TABLE 1. Selected characteristics* of the study populations at baseline (year 0) and at later remeasurements in the Framingham and Whitehall studies

| Characteristic | Framingham Study: year 0 | Framingham Study: year 6 | Framingham Study: year 16 | Framingham Study: year 26 | Whitehall Study: year 0 | Whitehall Study: year 26† |
|---|---|---|---|---|---|---|
| No. of persons‡: men | 2,336 | 2,240 | 1,929 | 1,458 | 19,019 | 243 |
| No. of persons‡: women | 2,873 | 2,818 | 2,592 | 2,182 | | |
| Age (years): men | 44 (9) | 50 (9) | 59 (8) | 67 (8) | 52 (7) | 75 (7) |
| Age (years): women | 44 (9) | 50 (9) | 59 (8) | 68 (8) | | |
| Systolic blood pressure (mmHg): men | 136 (20) | 132 (20) | 138 (21) | 135 (19) | 136 (21) | 131 (17) |
| Systolic blood pressure (mmHg): women | 135 (24) | 134 (25) | 140 (24) | 137 (21) | | |
| Diastolic blood pressure (mmHg): men | 86 (12) | 84 (12) | 82 (11) | 78 (10) | 85 (14) | 81 (12) |
| Diastolic blood pressure (mmHg): women | 84 (13) | 83 (12) | 82 (11) | 76 (10) | | |
| Total cholesterol (mmol/liter): men | 5.7 (1.1) | 6.1 (1.1) | 6.0 (1.1) | 5.6 (1.0) | 5.1 (1.2) | 5.1 (1.3) |
| Total cholesterol (mmol/liter): women | 5.7 (1.2) | 6.2 (1.2) | 6.5 (1.2) | 6.2 (1.1) | | |
| Height (m): men | 1.72 (0.07) | | | 1.70 (0.07) | 1.76 (0.07) | 1.73 (0.07) |
| Height (m): women | 1.59 (0.06) | | | 1.57 (0.06) | | |
| Weight (kg): men | 76 (11) | 77 (11) | 77 (12) | 77 (12) | 76 (11) | 76 (11) |
| Weight (kg): women | 64 (12) | 65 (11) | 65 (12) | 64 (12) | | |
| No. of paired comparisons‡: men | | 18,813 | 9,874 | 3,030 | | 243 |
| No. of paired comparisons‡: women | | 25,210 | 13,751 | 4,492 | | |

* Values expressed as total number or as mean (standard deviation).

† Mean values after 26 years in the Whitehall Study are based on responses by 243 persons to a pilot resurvey.

‡ Each person in the Framingham Study was to be seen biennially for 30 years and so could contribute up to 13 paired comparisons with a 6-year interval between measurements and up to 8 and 3 paired comparisons with intervals of 16 and 26 years, respectively.

(left side of table 2, all men; right side, all men with

these measurements from all of these resurveys). Either

for all men initially screened at baseline or for just those

with measurements at all surveys, comparisons can be

made between people of different ages at the same sur-

vey or between people in the same birth cohort at dif-

ferent surveys. However, none of these comparisons

provides a simple description of the ways in which a

person's blood pressure or cholesterol typically changes

with age. For example, cross-sectional comparisons of

all men who attended a particular survey (i.e., within

columns in table 2) showed some differences between

those in different age groups, but such comparisons

were between different generations with different life

experiences rather than between the same persons.

Also, the prior effect of selective mortality among those

with high blood pressure or cholesterol would have

been greater for those aged 50-59 years at the time of

the survey than for those aged 30-39 years (particularly

if, as in the right half of table 2, attention was restricted

to men still alive at year 26 of follow-up).

Comparisons within birth cohorts of all who attended

the baseline survey with those who attended later sur-

veys (i.e., within rows in the left half of table 2) might

likewise have been biased by selective removal of those

with high blood pressure or cholesterol. This bias can

be avoided by restricting such comparisons to those

persons who survived to attend all four surveys (i.e.,

within rows in the right side of table 2). However, even

these longitudinal comparisons among the same per-

sons might be biased by changes in measurement tech-

niques (including subject habituation) and by differ-

ences over time in risk factor management.

Thus, in terms of the natural history of blood pres-

sure, it was difficult to interpret the observation that

among the 1,091 persons who attended all surveys,

systolic blood pressure decreased slightly between

years 0 and 6, increased by an average of 8 mmHg

between years 6 and 16, and then remained constant

between years 16 and 26, whereas diastolic blood pres-

sure hardly changed between baseline and 16 years

and had then decreased by an average of 5 mmHg at

TABLE 2. Effect of age on blood pressure and cholesterol at baseline (year 0) and at 6, 16, and 26 years of follow-up* among all men in the Framingham Study and among those men with these measurements from all four of the resurvey visits

| Measure | Age at baseline (years) | All men: year 0 | All men: year 6 | All men: year 16 | All men: year 26 | Subset with all four measurements: year 0 | Subset: year 6 | Subset: year 16 | Subset: year 26 |
|---|---|---|---|---|---|---|---|---|---|
| Systolic blood pressure (mmHg) | No. of men | 2,259 | 1,954 | 1,619 | 1,169 | 1,091 | 1,091 | 1,091 | 1,091 |
| | 30-39 | 130 | 127 | 132 | 133 | 129 | 126 | 131 | 133 |
| | 40-49 | 135 | 133 | 140 | 138 | 132 | 129 | 138 | 138 |
| | 50-59 | 141 | 138 | 144 | 138 | 132 | 131 | 141 | 138 |
| Diastolic blood pressure (mmHg) | No. of men | 2,259 | 1,954 | 1,619 | 1,169 | 1,091 | 1,091 | 1,091 | 1,091 |
| | 30-39 | 83 | 83 | 83 | 79 | 83 | 82 | 83 | 79 |
| | 40-49 | 87 | 85 | 83 | 77 | 85 | 84 | 83 | 78 |
| | 50-59 | 88 | 85 | 81 | 74 | 84 | 83 | 81 | 74 |
| Total cholesterol (mmol/liter) | No. of men | 1,440 | 1,908 | 1,583 | 628 | 628 | 628 | 628 | 628 |
| | 30-39 | 5.6 | 6.0 | 6.0 | 5.7 | 5.5 | 6.1 | 5.9 | 5.7 |
| | 40-49 | 5.8 | 6.1 | 6.0 | 5.6 | 5.7 | 6.2 | 5.9 | 5.7 |
| | 50-59 | 5.8 | 6.1 | 5.9 | 5.3 | 5.7 | 6.1 | 5.8 | 5.2 |

* Mean values at each follow-up visit are shown for each age group.

year 26. Likewise, among those with measurements

from all surveys, it was difficult to interpret either the

average increase of 0.5 mmol/liter in blood total cho-

lesterol by year 6 or the later 0.6 mmol/liter decrease.

These uncertainties about absolute values and trends

do not, however, affect the epidemiologic comparisons

of risk between persons of the same age at the same

survey, which were the chief concern of this report.

Shrinkage of ranges with increasing intervals

between measurements

In contrast to the relatively modest changes in the

overall mean values of blood pressure and cholesterol

measurements that occurred over time in the

Framingham Study (table 2), large artifactual differ-

ences were observed in the mean values at subsequent

follow-up of persons subdivided into five similar-sized

groups according to the value of just the first of each

pair of measurements (figure 1). In general, there was a

sharp convergence by year 2 due to the combined

effects of measurement error and of short-term fluctua-

tions in the baseline measurements, which was followed

by a more modest convergence over the next few

decades. For example, the mean systolic blood pressure

of persons who were in the top fifth initially (i.e., those

with a systolic blood pressure value of 146 mmHg or

more at the first measurement) declined from 162

mmHg at year 0 to 156 mmHg after 2 years, 155 mmHg

after 6 years, 153 mmHg after 16 years, and 147 mmHg

after 26 years (figure 1 and left side of table 3).

Similarly, the mean systolic blood pressure of persons in

the bottom fifth initially (i.e., those with a value of less

than 118 mmHg at the first measurement) increased

from 111 mmHg at year 0 to 117 mmHg after 2 years,

120 mmHg after 6 years, 126 mmHg after 16 years, and

130 mmHg after 26 years. The absolute difference, or

range, between the mean values in the top and bottom

groups declined throughout the three successive

decades of follow-up from 51 mmHg at year 0 to 35

mmHg after 6 years, 27 mmHg after 16 years, and only

18 mmHg after 26 years. Similarly, for diastolic blood

pressure and total cholesterol, there was a sharp change

after 2 years, followed by more modest shrinkage in the

absolute differences between the mean values in the top

and bottom baseline-defined groups. After an interval of

26 years, the changes in these ranges were similar in the

Framingham and Whitehall studies (table 3).
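
As a worked illustration of how these shrinking ranges translate into regression dilution ratios, the rounded systolic blood pressure ranges quoted above (51, 35, 27, and 18 mmHg) can be divided by the range at year 0. The subscripted symbols below are introduced here only for this illustration:

```latex
\[
R_{6} \approx \frac{35}{51} \approx 0.69, \qquad
R_{16} \approx \frac{27}{51} \approx 0.53, \qquad
R_{26} \approx \frac{18}{51} \approx 0.35
\]
```

These figures differ slightly from the ratios of 0.68, 0.52, and 0.34 reported in table 4, presumably because the ranges quoted in the text are rounded to whole numbers.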

Changes in regression dilution ratios with

increasing intervals between measurements

The convergent patterns in figure 1 (and table 3)

illustrate the progressive increase over time in the

importance of regression dilution for certain measure-

ments. Table 4 provides nonparametric estimates of the

[Figure 1: three panels (Systolic Blood Pressure, Diastolic Blood Pressure, Blood Cholesterol); horizontal axis in each panel: interval between paired measurements (years); one line per group defined by initial values.]

FIGURE 1. Serial shrinkage of the ranges for blood pressure and cholesterol values in groups defined by the first of a pair of measurements in the Framingham Study. Mean values initially and at subsequent follow-up are plotted for persons subdivided into five similar-sized groups according to the value of just the first of each pair of measurements. The size of each square is proportional to the quantity of statistical information available, and the 95% confidence intervals for the top and bottom groups are represented by the vertical bars. Mean values in the top and bottom groups, and the absolute differences (ranges) between them, are given at year 0 (i.e., the initial measurement of the pairs) and at years 6, 16, and 26 (i.e., the approximate midpoints of the first, second, and third decades, respectively, after the initial measurement).

TABLE 3. Mean values of blood pressure and cholesterol after different intervals between pairs of measurements among men in the Framingham and Whitehall studies

| Measure | Baseline fifth | Framingham Study: 0 years between measurements | Framingham Study: 6 years | Framingham Study: 16 years | Framingham Study: 26 years | Whitehall Study: 0 years between measurements | Whitehall Study: 26 years |
|---|---|---|---|---|---|---|---|
| Systolic blood pressure (mmHg) | No. of measurements | 18,813* | 18,813 | 9,874 | 3,030 | 243 | 243 |
| | I | 111 | 120 | 126 | 130 | 112 | 141 |
| | II | 122 | 128 | 134 | 133 | 124 | 146 |
| | III | 130 | 134 | 140 | 139 | 134 | 152 |
| | IV | 140 | 142 | 145 | 143 | 145 | 149 |
| | V | 162 | 155 | 153 | 147 | 164 | 158 |
| | Difference (V - I) | 51 | 35 | 27 | 18 | 52 | 17 |
| Diastolic blood pressure (mmHg) | No. of measurements | 18,813* | 18,813 | 9,874 | 3,030 | 243 | 243 |
| | I | 69 | 75 | 75 | 74 | 68 | 77 |
| | II | 78 | 79 | 78 | 76 | 77 | 82 |
| | III | 83 | 82 | 81 | 77 | 83 | 84 |
| | IV | 89 | 86 | 84 | 80 | 90 | 82 |
| | V | 101 | 93 | 87 | 82 | 103 | 88 |
| | Difference (V - I) | 32 | 18 | 12 | 8 | 35 | 11 |
| Total cholesterol (mmol/liter) | No. of measurements | 12,013* | 12,013 | 5,604 | 1,687 | 239 | 239 |
| | I | 4.6 | 5.1 | 4.9 | 5.1 | 3.5 | 5.3 |
| | II | 5.4 | 5.6 | 5.4 | 5.5 | 4.4 | 5.3 |
| | III | 5.9 | 6.0 | 5.7 | 5.7 | 5.0 | 5.9 |
| | IV | 6.5 | 6.4 | 5.9 | 6.0 | 5.7 | 6.0 |
| | V | 7.6 | 7.2 | 6.5 | 6.4 | 6.9 | 6.2 |
| | Difference (V - I) | 3.0 | 2.1 | 1.6 | 1.3 | 3.4 | 1.0 |

* Values for the Framingham Study are based on the means of multiple pairs of measurements with the same interval between measurements, grouped according to the value of the first measurement in each pair (refer to Materials and Methods). The values at year 0 are the means of the first measurements for those pairs with second measurements also available after a 6-year interval (and would differ slightly if based only on pairs with a 16- or 26-year interval).

regression dilution ratio derived from the ratio of the

range of values in the baseline-defined groups after a

particular interval to the range at baseline (refer to

Materials and Methods). For blood pressure and cho-

lesterol, the regression dilution ratios became substan-

tially more extreme with increasing duration of follow-

up, whereas the ratios for height and weight remained

close to 1 (as might be expected, since these latter mea-

sures involve substantially less within-person variabil-

ity). Table 4 also shows parametric estimates of the

regression dilution ratios derived from correlation coef-

ficients between pairs of measurements separated by a

particular interval ("self-correlations"; refer to

Materials and Methods). In general, the nonparametric

and parametric methods yielded fairly similar results.

For example, for systolic blood pressure in the

Framingham Study, the regression dilution ratio associ-

ated with measurements made 16 years apart (i.e., the

approximate midpoint of the second decade of expo-

sure) was estimated to be 0.52 by using the ratio of the

ranges and 0.45 by using the self-correlation. With

either method, this value of about 0.5 implies that the

regression coefficient relating risk in the second decade

or later to a baseline measurement of systolic blood

pressure is only about half as steep as the corrected

regression coefficient relating this risk to the usual sys-

tolic blood pressure during the second decade.
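
The size of the correction implied by a given regression dilution ratio can be illustrated with a short sketch. The three ratios are the ratio-of-ranges estimates for systolic blood pressure in the Framingham Study from table 4; the function names and the uncorrected relative risk of 1.5 are hypothetical, and the conversion to a relative risk assumes a log-linear model in which the regression coefficient is a log relative risk.

```python
import math


def correct_coefficient(uncorrected, rdr):
    """Scale an uncorrected regression coefficient by 1/R."""
    if rdr <= 0:
        raise ValueError("regression dilution ratio must be positive")
    return uncorrected / rdr


def corrected_relative_risk(uncorrected_rr, rdr):
    """Assumes a log-linear model: the coefficient is log(relative risk)."""
    return math.exp(correct_coefficient(math.log(uncorrected_rr), rdr))


# Ratio-of-ranges estimates for systolic blood pressure (Framingham, table 4).
rdr_by_decade = {"first": 0.68, "second": 0.52, "third": 0.34}
hypothetical_rr = 1.5  # illustrative uncorrected relative risk, not from the study

for decade, rdr in rdr_by_decade.items():
    shortfall = 1.0 - rdr  # fraction by which the uncorrected slope falls short
    rr = corrected_relative_risk(hypothetical_rr, rdr)
    print(f"{decade} decade: R = {rdr:.2f}, shortfall = {shortfall:.0%}, "
          f"corrected RR = {rr:.2f}")
```

The shortfalls of roughly one-third, one-half, and two-thirds correspond to the degrees of underestimation summarized in the abstract for the first, second, and third decades of exposure.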

Gender, age, and the regression dilution ratio

For systolic blood pressure, diastolic blood pressure

(not shown), and blood cholesterol, the decrease in the

TABLE 4. Nonparametric and parametric estimates of regression dilution for blood pressure, cholesterol, height, and weight among men in the Framingham and Whitehall studies

| Measure | Estimate | Framingham Study: 6 years between measurements | Framingham Study: 16 years | Framingham Study: 26 years | Whitehall Study: 26 years between measurements |
|---|---|---|---|---|---|
| Systolic blood pressure | Ratio of follow-up to baseline ranges | 0.68 | 0.52 | 0.34 | 0.32 |
| | Correlation with initial measurement | 0.63 | 0.45 | 0.31 | 0.26 |
| Diastolic blood pressure | Ratio of follow-up to baseline ranges | 0.58 | 0.38 | 0.26 | 0.29 |
| | Correlation with initial measurement | 0.57 | 0.38 | 0.27 | 0.30 |
| Cholesterol | Ratio of follow-up to baseline ranges | 0.70 | 0.52 | 0.43 | 0.28 |
| | Correlation with initial measurement | 0.68 | 0.53 | 0.46 | 0.38 |
| Height | Ratio of follow-up to baseline ranges | 0.97 | 1.01 | 0.97 | 0.83 |
| | Correlation with initial measurement | 0.96 | 0.97 | 0.96 | 0.84 |
| Weight | Ratio of follow-up to baseline ranges | 0.93 | 0.85 | 0.79 | 0.86 |
| | Correlation with initial measurement | 0.91 | 0.82 | 0.73 | 0.75 |

regression dilution ratio with increasing duration of fol-

low-up was similar for men and women in the

Framingham Study (figure 2). For example, the regres-

sion dilution ratios derived from the self-correlation

coefficients for men decreased from 0.68 after 6 years

to 0.53 after 16 years to 0.46 after 26 years; in women,

they decreased from 0.68 to 0.52 to 0.42, respectively.

A similar decrease in the regression dilution ratios with

increasing duration of follow-up was also observed in

the five 10-year age groups at baseline (figure 3). For

example, the regression dilution ratio estimates for cho-

lesterol after an interval of 6 years were 0.71 for those

aged 30-39 years, 0.65 for those aged 40-49 years,

0.69 for those aged 50-59 years, 0.70 for those aged

60-69 years, and 0.71 for those aged 70-79 years.

Similar patterns, but with slightly more random varia-

tion, were obtained by using regression dilution ratios

derived from the ratios of ranges (data not shown).

Secular trends and the regression dilution ratio

For systolic blood pressure, the regression dilution

ratio derived from the correlation coefficient between

measurements made at 6-year intervals was 0.72 if the

first of the two measurements was made in the 1950s

(visits 1-5), 0.65 if made in the 1960s (visits 6-10),

and 0.51 if made in the 1970s (visits 11-15). This find-

ing suggests that there may have been a somewhat

lower signal-to-noise ratio during the 1970s than dur-

ing the 1950s (with, in this context, the signal being

the extent to which people's usual blood pressures

really differ from each other, and the noise being the

extent to which particular measurements of blood

pressure are liable to differ from that person's usual

blood pressure). By contrast, there was no such trend

in the regression dilution ratios for blood cholesterol:

self-correlation coefficients between measurements

made at 4-year intervals were 0.74 if the first of the

two measurements was made in the 1950s, 0.72 if

made in the 1960s, and 0.71 if made in the 1970s.

(Cholesterol was not measured at visits 11, 12, or 16,

so estimates of the regression dilution ratios for 6-year

intervals could not be derived for the 1970s, but they

were 0.72 for the 1950s and 0.68 for the 1960s.)
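
To make the signal-to-noise interpretation concrete, the sketch below converts the self-correlations quoted above for systolic blood pressure into implied signal-to-noise variance ratios. It assumes a simple classical error model (equal variances at both visits, with noise independent of the usual level and between visits), under which the self-correlation equals signal variance divided by total variance. That model is an assumption of this illustration, not something stated in the text, and the function name is hypothetical.

```python
def signal_to_noise(self_correlation):
    """Implied ratio of between-person (signal) variance to within-person
    (noise) variance, assuming r = signal / (signal + noise)."""
    if not 0 < self_correlation < 1:
        raise ValueError("self_correlation must lie strictly between 0 and 1")
    return self_correlation / (1 - self_correlation)


# Self-correlations at 6-year intervals for systolic blood pressure,
# by decade of the first measurement (values quoted in the text).
sbp_by_decade = {"1950s": 0.72, "1960s": 0.65, "1970s": 0.51}

for decade, r in sbp_by_decade.items():
    print(f"{decade}: r = {r:.2f}, signal-to-noise = {signal_to_noise(r):.2f}")
```

Under these assumptions the implied variance ratio falls from roughly 2.6 in the 1950s to roughly 1.0 in the 1970s, which is one way of expressing the "somewhat lower signal-to-noise ratio" described above.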

Early deaths, treatment, and the regression

dilution ratio

Table 5 shows the extent to which estimates of the

regression dilution ratio during the third decade of

follow-up in the Framingham and Whitehall studies

may have been influenced by differential mortality

across the baseline-defined groups. The mean values at

baseline for the total study population were compared

with the mean values at baseline for those persons who

were still alive after 26 years. For blood cholesterol,

these mean values were similar in the top group and in

the bottom group, whereas for systolic blood pressure

the range was somewhat narrower among the survivors.


FIGURE 2. Serial changes in the regression dilution ratios for systolic blood pressure and blood cholesterol with increasing duration of follow-up among men and women in the

Framingham Study. Parametric estimates of regression dilution ratios, derived from the correlation coefficients between pairs of measurements in persons (self-correlations), are plotted

against the intervals between measurements. Symbols and conventions as shown in figure 1.

[Figure not reproduced. Panels: Systolic Blood Pressure; Blood Cholesterol. Horizontal axis: interval between paired measurements (years). Vertical axis: regression dilution ratio, 0.00 to 1.00.]


FIGURE 3. Serial changes in the regression dilution ratios for systolic blood pressure and blood cholesterol for participants at different ages in the Framingham Study. Parametric

estimates of the regression dilution ratios, derived from the correlation coefficients between pairs of measurements in persons (self-correlations), are plotted against the estimated ages

at the times of remeasurement. Symbols and conventions as shown in figure 1.

[Figure not reproduced. Panels: Systolic Blood Pressure; Blood Cholesterol. Separate curves for baseline age groups 30-39, 40-49, 50-59, 60-69, and 70-79 years. Horizontal axis: midpoint of age at subsequent measurements (years). Vertical axis: regression dilution ratio, 0.00 to 1.00.]


TABLE 5. Mean values of baseline measurements: influence of selective loss due to early deaths on systolic blood pressure and cholesterol among men in the Framingham and Whitehall studies

                         Framingham Study            Whitehall Study
                         Total        Survivors      Total        Survivors
                         population   at year 26     population   at year 26

Blood pressure (mmHg)
  No. of men             2,336        1,458          19,013       9,839
  Baseline fifths
    I                    112          111            111          111
    II                   123          121            125          124
    III                  132          129            134          134
    IV                   142          137            145          145
    V                    166          152            168          165
  Difference (V-I)       54           42             57           54

Total cholesterol (mmol/liter)
  No. of men             1,502        933            18,310       9,490
  Baseline fifths
    I                    4.3          4.2            3.5          3.5
    II                   5.1          4.9            4.4          4.4
    III                  5.7          5.5            5.0          5.0
    IV                   6.3          6.1            5.6          5.6
    V                    7.5          7.3            6.9          6.9
  Difference (V-I)       3.2          3.1            3.4          3.4

However, these differences between the total population

and the survivors were much less extreme than the con-

vergence in the ranges shown in table 3, which suggests

that selective mortality of persons with high blood pres-

sure or cholesterol does not account for much of the

shrinkage over 26 years.
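
The size of the selective-mortality effect can be checked with simple arithmetic on the "Difference (V-I)" rows of table 5. The sketch below divides the baseline range among 26-year survivors by the baseline range in the total population; the dictionary layout and function name are illustrative choices, and only the numbers come from the table.

```python
# (range in total population, range among survivors at year 26), taken from
# the "Difference (V-I)" rows of table 5.
table5_ranges = {
    "Framingham systolic blood pressure (mmHg)": (54, 42),
    "Whitehall systolic blood pressure (mmHg)": (57, 54),
    "Framingham total cholesterol (mmol/liter)": (3.2, 3.1),
    "Whitehall total cholesterol (mmol/liter)": (3.4, 3.4),
}


def survivor_range_ratio(total_range, survivor_range):
    """Baseline range among survivors as a fraction of the baseline range
    in the total population (1.0 means no narrowing)."""
    return survivor_range / total_range


ratios = {
    name: survivor_range_ratio(total, survivors)
    for name, (total, survivors) in table5_ranges.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

On these figures the survivor ranges are between about 0.78 and 1.00 of the total-population ranges. For comparison, the 26-year regression dilution ratio quoted in this section for systolic blood pressure in the Framingham Study, derived from self-correlations, is 0.37.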

In the Framingham Study, any treatment with a

diuretic or other antihypertensive medication was

recorded from year 6 onward. The proportion of sur-

vivors who received such treatment increased through-

out the follow-up period, but treatment did not materi-

ally affect the progressive decrease in the regression

dilution ratio for blood pressure with increasing dura-

tion of follow-up. For example, the estimates for sys-

tolic blood pressure, derived from self-correlations, of

0.64, 0.50, and 0.37 for all survivors after 6, 16, and 26

years, respectively, were similar to the corresponding

values of 0.61, 0.47, and 0.33 for those survivors who

did not receive any antihypertensive therapy.

Other factors and the regression dilution ratio

Although the multiple pairs of measurements from

the Framingham Study are not independent, estimates

of the regression dilution ratio derived from these

multiple pairs were similar to those obtained when the

baseline measurement was always the first of a rele-

vant pair (i.e., when the data from any one visit were

used only once). For example, for systolic blood pres-

sure, the regression dilution ratio derived from the

self-correlation at 6 years was 0.71 when year 6 was

compared with baseline for 2,009 men, whereas the

ratio was 0.63 when obtained from the 18,813 pairs

with a 6-year interval between measurements (table

4). Estimates of the regression dilution ratio were also

largely unaffected by the change in the method used to

measure cholesterol after baseline in the Framingham

Study. For example, among men, the correlation

between pairs of measurements with a 6-year interval

between them was 0.69 for year 6 compared with

baseline (the only examination for which the Sperry

method (12) was used; refer to Materials and

Methods) and 0.71 for year 8 compared with year 2.

Similarly, the estimates of the regression dilution ratio

were unaffected by the initial decrease in mean sys-

tolic blood pressure in the Framingham Study (table

2). For example, among men, the correlation between

pairs of measurements with a 6-year interval between

them was 0.71 for year 6 compared with baseline and

0.68 for year 8 compared with year 2.

DISCUSSION

The present analyses addressed the extent to which

within-person variability in risk factor measurements

distorts assessment of the strength of the association


between the usual level of a risk factor during some

particular exposure period and the incidence of disease

during the same or a later period. Fluctuations in the

values of risk factors may be due to measurement error

and to short-term biologic variability (such as diurnal

or seasonal variation), or they may be due to longer-

term systematic changes. The effects of the former can

be limited by repeating baseline measurements several

times and by controlling the conditions under which

measurements are taken. However, underestimation of

the strength of the association of the usual level of a

risk factor during some later exposure period with dis-

ease will still occur unless replicate measurements

made some years later are used to correct directly for

regression dilution. The effects of shifts in mean blood

pressure or cholesterol due to long-term physiologic

changes (such as the progressive increase in systolic

blood pressure with age or the decrease due to the

onset of disease or to the effects of treatment) are

included in what is referred to as "regression dilution,"

as are the effects of selective mortality.

In our analyses of the Framingham and Whitehall

studies, the progressively smaller ranges of mean val-

ues in the baseline-defined groups for blood pressure

and cholesterol with longer intervals of follow-up

(table 3) indicate that the effects of within-person vari-

ability become progressively more important with

longer follow-up. It has been suggested that this find-

ing might be due largely to selective death of those at

highest risk (18), but analyses restricted to survivors in

these studies indicated that only a small fraction of the

shrinkage of these ranges could have been due to

selective mortality (and, in any case, this would still

need to be allowed for in the way demonstrated

above). Moreover, we found that unless appropriate

corrections are made for the regression dilution associ-

ated with particular exposure periods, the strength of

associations of disease with usual blood pressure or

cholesterol levels might be underestimated by about

one-third during the first decade, one-half during the

second decade, and two-thirds during the third decade

of exposure. The corrections that are needed are inde-

pendent of any assumptions about the constancy of rel-

ative risks over time or about the biologic mechanisms

by which exposures affect disease (and are not much

affected by age or gender).
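
As a rough illustration of what such corrections imply, the sketch below divides an observed log relative risk by a regression dilution ratio. The ratios of two-thirds, one-half, and one-third correspond to the approximate underestimation quoted above for the first, second, and third decades. The observed relative risk of 1.30 is hypothetical, and applying the correction on the log scale assumes a log-linear association; both are assumptions of this example rather than results from either study.

```python
import math


def corrected_relative_risk(observed_rr, rdr):
    """Relative risk for the same difference in usual level, obtained by
    dividing the observed log relative risk by the regression dilution ratio."""
    if observed_rr <= 0:
        raise ValueError("observed_rr must be positive")
    if not 0 < rdr <= 1:
        raise ValueError("rdr must lie in (0, 1]")
    return math.exp(math.log(observed_rr) / rdr)


# Approximate regression dilution ratios implied by underestimation of about
# one-third, one-half, and two-thirds in successive decades of exposure.
approx_rdr = {"first decade": 2 / 3, "second decade": 1 / 2, "third decade": 1 / 3}

observed_rr = 1.30  # hypothetical uncorrected relative risk per unit of baseline value

corrected = {
    period: corrected_relative_risk(observed_rr, rdr)
    for period, rdr in approx_rdr.items()
}

for period, rr in corrected.items():
    print(f"{period}: corrected relative risk = {rr:.2f}")
```

With these illustrative inputs, the same observed association corresponds to a progressively steeper association with usual levels the later the exposure period considered.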

Both the nonparametric and parametric statistical

methods that have been used to estimate the magnitude

of the regression dilution ratio should provide similar

results when the variances of the values recorded at

baseline and at remeasurement are similar. However,

since the nonparametric method based on the ratio of

ranges requires no such assumptions, it may be more

suitable for longer intervals of follow-up. Appropriate

adjustment for regression dilution in prospective stud-

ies requires remeasurements to be available at intervals

during follow-up from at least a subsample of the sur-

vivors. If particular studies lack such data, then apply-

ing the present results from the Framingham and

Whitehall studies for blood pressure and cholesterol

may provide reasonable approximations for the correc-

tions needed for each of the first three decades of

exposure.

ACKNOWLEDGMENTS

This study was supported by grants from the British Heart

Foundation and Medical Research Council, London,

England.

Dr. Paul Sorlie from the National Heart, Lung, and Blood

Institute of the National Institutes of Health, Bethesda,

Maryland, facilitated use of data from the Framingham

Study. Drs. David Leon and Astrid Fletcher facilitated use of

data from the Whitehall Study and provided helpful com-

ments, as did Drs. Richard Doll and Cathie Sudlow. Paul

Sherliker and Paul Appleby produced the figures.

REFERENCES

1. Gordon T, Sorlie P, Kannel WB. Coronary heart disease,

atherothrombotic brain infarction, intermittent claudication—a

multivariate analysis of some factors related to their incidence:

Framingham study, 16 year follow-up. US Department of

Health, Education and Welfare. Bethesda, MD: National

Institutes of Health, 1971. (Publication no. 74-599, section 27).

2. Martin MJ, Hulley S, Browner WS, et al. Serum cholesterol,

blood pressure, and mortality: implications from a cohort of

361,662 men. Lancet 1986;2:933-6.

3. Gardner MJ, Heady JA. Some effects of within-person vari-

ability in epidemiological studies. J Chronic Dis 1973;26:

781-95.

4. Davis CE, Rifkind BM, Brenner H, et al. A single cholesterol

measurement underestimates the risk of coronary heart dis-

ease. An empirical example from the Lipid Research Clinics

Mortality Follow-up Study. JAMA 1990;264:3044-6.

5. Törnberg SA, Jakobsson KF, Eklund GA. Stability and validity

of a single serum cholesterol measurement in a prospective

cohort study. Int J Epidemiol 1988;17:797-803.

6. MacMahon S, Peto R, Cutler J, et al. Blood pressure, stroke,

and coronary heart disease. Part 1, Prolonged differences in

blood pressure: prospective observational studies corrected for

regression dilution bias. Lancet 1990;335:765-74.

7. Hughes MD, Pocock SJ. Within-subject diastolic blood pres-

sure variability: implications for risk assessment and screen-

ing. J Clin Epidemiol 1992;45:985-8.

8. Anderson KM, Castelli WP, Levy D. Cholesterol and mortal-

ity. 30 years of follow-up from the Framingham study. JAMA

1987;257:2176-80.

9. Reid DD, Brett GZ, Hamilton PJ, et al. Cardiorespiratory dis-

ease and diabetes among middle-aged male civil servants.

Lancet 1974;1:469-73.

10. Smith GD, Shipley MJ, Marmot MG, et al. Plasma cholesterol

concentration and mortality. The Whitehall study. JAMA

1992;267:70-6.

11. Shipley MJ, Pocock SJ, Marmot MG. Does plasma cholesterol


concentration predict mortality from coronary heart disease in

elderly people? 18 year follow up in the Whitehall study. BMJ

1991;303:89-92.

12. Schoenheimer R, Sperry WM. A micromethod for the determi-

nation of free and combined cholesterol. J Biol Chem 1934;

106:745-60.

13. Abell LL, Levy BB, Brodie BB, et al. Simplified method for

estimation of total cholesterol in serum and demonstration of

its specificity. J Biol Chem 1952;193:357-66.

14. Block WD, Jarrett KJ, Levine JB. An improved automated

determination of serum total cholesterol with single colour

reagent. Clin Chem 1966;10:681-9.

15. Clarke R, Breeze E, Sherliker P, et al. Design, objectives, and

lessons from a pilot 25 year follow up re-survey of survivors

in the Whitehall study of London Civil Servants. J Epidemiol

Community Health 1998;52:364-9.

16. Youngman LD, Lyon V, Collins R, et al. Problems with mailed

blood in large scale epidemiological studies and methods of

correction. (Abstract). FASEB J 1993; 1:377.

17. Peto R. The horse-racing effect: statistical peculiarities of rates

of change of medical parameters observed in prospective stud-

ies, with special reference to lung disease. Proceedings of the

9th International Biometric Conference, Boston, MA, 1976.

Vol. n, pp 1-14.

18. de Vries CL, Feskens EJ, de Lezenne Coulander C, et al.

Repeated measurements of serum cholesterol and blood pres-

sure in relation to long-term incidence of myocardial infarc-

tion. The Zutphen Study. Cardiology 1993;82:89-99.
