promoting access to White Rose research papers

Universities of Leeds, Sheffield and York

http://eprints.whiterose.ac.uk/

This is an author produced version of a paper published in Journal of Biopharmaceutical

Statistics.

White Rose Research Online URL for this paper:

http://eprints.whiterose.ac.uk/3411/

Published paper

Bland, J. Martin and Altman, Douglas G. (2007) Agreement between methods of

measurement with multiple observations per individual. Journal of

Biopharmaceutical Statistics, 17 (4). 571-582.

White Rose Research Online

eprints@whiterose.ac.uk

Post-referee version, 22 December 2006

Agreement between methods of measurement with

multiple observations per individual

J Martin Bland1* and Douglas G Altman2

1Professor of Health Statistics

Dept. of Health Sciences

University of York

York, UK

2Professor of Statistics in Medicine

Centre for Statistics in Medicine

Wolfson College Annexe

Linton Road

Oxford, UK

* Correspondence to J M Bland

Professor of Health Statistics

Dept. of Health Sciences

University of York

York YO10 5DD

email: mb55@york.ac.uk

Abstract

Limits of agreement provide a straightforward and intuitive approach to agreement between

different methods for measuring the same quantity. When pairs of observations using the two

methods are independent, i.e. on different subjects, the calculations are very simple and

straightforward. Some authors collect repeated data, either as repeated pairs of measurements

on the same subject, whose true value of the measured quantity may be changing, or more

than one measurement by one or both methods of an unchanging underlying quantity. In this

paper we describe methods for analysing such clustered observations, both when the

underlying quantity is assumed to be changing and when it is not.

Introduction

The limits of agreement (LoA) method (Altman and Bland 1983, Bland and Altman 1986) for

assessing the agreement between two methods of medical measurement is widely used

(Bland and Altman 1993, Ryan and Woodall 2005). We obtain the differences between

measurements by the two methods for each individual and calculate the mean and standard

deviation. We then estimate the 95% limits of agreement as the two values mean minus 1.96

standard deviations and mean plus 1.96 standard deviations. These limits are expected to

contain the difference between measurements by the two methods for 95% of pairs of future

measurements on similar individuals.
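
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' own code: the function name `limits_of_agreement`, the use of the sample standard deviation (n - 1 denominator), and the five pairs of measurements are all assumptions made for the example.

```python
from statistics import mean, stdev

def limits_of_agreement(x, y, z=1.96):
    """Return (mean difference, lower limit, upper limit) for paired data.

    x and y hold one measurement per individual by methods X and Y.
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of measurements")
    diffs = [a - b for a, b in zip(x, y)]
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample SD of the differences (assumed n - 1 denominator)
    return d_bar, d_bar - z * s_d, d_bar + z * s_d

# Hypothetical data: one measurement by each method on five individuals.
x = [10.0, 12.0, 9.0, 11.0, 13.0]
y = [9.0, 12.5, 8.0, 11.5, 12.0]

bias, lower, upper = limits_of_agreement(x, y)
print(f"mean difference {bias:.2f}, 95% limits {lower:.2f} to {upper:.2f}")
```

The sketch treats every pair as coming from a different individual, which is the independence condition discussed in the next paragraph.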

The motivating scenario for the LoA method is the case where each individual has one

measurement made by each of the methods X and Y. It is valuable, however, to obtain

replicate measurements by each method on each individual so that the repeatability of the two

methods can be compared (Bland and Altman 1999). Such data comprise a mixture of

between and within-individual information on the differences between methods. We did not

state in early publications that the LoA method assumes independent observations [Altman

and Bland (1983), Bland and Altman (1986)], as this important requirement is not specific to

the LoA approach but rather applies to all types of statistical analyses. If each pair of X and Y

measurements is treated as if from a different individual the structure of the data is ignored

and incorrect estimates are likely; specifically, the interval between the limits of agreement

may be too narrow.

In this paper we look at how to apply the LoA method when we have repeated measurements

on each of a group of subjects. We consider separately two somewhat different situations.

Concepts

The key principle of the LoA method is to examine the average difference between the

methods, and also to consider the variability in those differences across individuals. It is an

implicit assumption that the difference between the two methods is reasonably stable across

the range of measurements, and we will assume this condition holds for the purpose of this

paper. We have discussed elsewhere possible strategies when this condition is not met,

including transformation of the data (Bland and Altman 1999).

Table 1 and Figure 1 show some typical data in which pairs of measurements were made

sequentially on each of a group of subjects. Here 60 pairs of measurements of cardiac

ejection fraction by two methods were made on 12 individuals, with 3-7 replicates per

individual. First, we might ignore the replication and treat these as 60 independent pairs of

measurements and calculate the mean and standard deviation of their differences. As noted

above, these limits of agreement could be too narrow. An alternative would be to average all

the observations on the same subject. The limits of agreement calculated in this way would

be for the average of several measurements and would be too narrow for a single

measurement. This approach is appropriate only when the usual clinical measurement is the

average of that number of observations.
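
The two simple approaches just described, both of which the text cautions against for single measurements, can be set side by side in a short Python sketch. The data, subject labels and helper name `loa` are hypothetical and are not taken from Table 1. The sketch only shows how each set of limits would be computed, not which is correct.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical replicated data: (subject, method X, method Y).
pairs = [
    ("A", 60.0, 58.0), ("A", 62.0, 59.0), ("A", 61.0, 59.5),
    ("B", 45.0, 47.0), ("B", 44.0, 47.5), ("B", 46.0, 47.0),
    ("C", 70.0, 69.0), ("C", 71.0, 69.5), ("C", 69.0, 68.5),
]

def loa(diffs, z=1.96):
    """Mean difference minus and plus z sample standard deviations."""
    d_bar, s_d = mean(diffs), stdev(diffs)
    return d_bar - z * s_d, d_bar + z * s_d

# Approach 1: ignore the replication and treat all pairs as independent.
pooled = [x - y for _, x, y in pairs]
naive_limits = loa(pooled)

# Approach 2: average the differences within each subject first.
by_subject = defaultdict(list)
for subject, x, y in pairs:
    by_subject[subject].append(x - y)
subject_means = [mean(d) for d in by_subject.values()]
averaged_limits = loa(subject_means)

print("ignoring replication:", naive_limits)
print("using subject means: ", averaged_limits)
```

With equal replication, as here, both approaches give the same mean difference. They differ only in the standard deviation used, and neither standard deviation is that of a single pair of measurements on a new subject.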

As an extreme example, Barry et al. (1997) reported a comparison of two methods of measuring

cardiac output, bioimpedance and continuous thermodilution, using 2390 observations from just 7

patients.

A somewhat different problem is shown in a study by Almén et al. (1991) who reported the

glomerular filtration rate (GFR) in the left and right kidneys of 20 patients using both a

gamma camera and computed tomography (CT). They presented the GFR of each kidney as a

percentage of the total GFR for that patient. Unfortunately, they used data from both kidneys

in their comparison of the two methods, and so effectively analysed all the data twice

for each patient, as the difference between methods for the left kidney is minus that for the

right kidney. Their plot displays point symmetry as a consequence of plotting each point as

both (X,Y) and (100-X, 100-Y). Had they calculated limits of agreement they would have

found that the mean difference was exactly zero.
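
A small Python sketch shows why the mean difference must be exactly zero when each kidney is expressed as a percentage of the patient's total. The percentages below are hypothetical and are not the data of Almén et al. The variable names are likewise chosen only for illustration.

```python
# Hypothetical left-kidney GFR as a percentage of total GFR, by each method.
left_x = [48, 55, 40, 52]  # method X, e.g. gamma camera
left_y = [50, 53, 43, 51]  # method Y, e.g. CT

# The right kidney is the remainder of the total for the same patient.
right_x = [100 - v for v in left_x]
right_y = [100 - v for v in left_y]

left_diffs = [x - y for x, y in zip(left_x, left_y)]
right_diffs = [x - y for x, y in zip(right_x, right_y)]

# (100 - X) - (100 - Y) = -(X - Y): each right-kidney difference is the
# negative of the corresponding left-kidney difference.
assert right_diffs == [-d for d in left_diffs]

all_diffs = left_diffs + right_diffs
mean_diff = sum(all_diffs) / len(all_diffs)
print(mean_diff)  # prints 0.0
```

Because every difference appears once with each sign, the pooled differences sum to zero whatever the true agreement between the methods, so the second kidney adds no information about bias.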

There are two different situations to consider for replicated data. We can think of the

observations for the same subject as a series of measurements of a quantity that does not vary

over the period of observation. An example is measurements of carotid artery stenosis taken

on the same day. Or we can think of them as pairs of measurements by two methods of a

changing quantity, where it is the instantaneous measurement for the subject which we want

to capture. This second situation could arise either when the quantity being measured is

unstable, such as blood pressure or daily excretion of some chemical, or when observations

are made under different conditions – e.g. before and after exercise. The distinction is

important, as it determines whether we need to consider pairing of observations by the two

methods. Indeed, for the first (constant) case we do not require equal replication of each

method for each individual, whereas this is a requirement for the second (non-constant) case.

In Bland and Altman (1986) we described how to deal with the constant case, where the true

value of the quantity is not changing, but only for the simple case when the number of
