Publications (132) · 323.88 Total impact
ABSTRACT: Had I anticipated that my article on the sidedness of P(1) would excite such a vigorous and sometimes adverse correspondence, would I have submitted it? Probably yes, for they do say that adverse references to one's papers improve one's scientific citation index no end! It has belatedly occurred to me to find out how often I have used one-sided tests in my own lifetime of publications. Google Scholar tells me that the answer is 'never'. Well, hardly ever, because I then remembered that, in 1962, Beale and I published a study on the presence or absence of femoral venous valves in those with, or without, a family history of varicose veins.(2) We took the 2×2 table that summarized our results to George Spears, a professional statistician
Clinical and Experimental Pharmacology and Physiology 06/2013 · 2.41 Impact Factor
ABSTRACT: P stands for the probability, ranging in value from 0 to 1, that results from a test of significance. It can also be regarded as the strength of evidence against the statistical null hypothesis (H0). When H0 is evaluated by statistical tests based on distributions such as t, normal, or chi-squared, P can be derived from one tail of the distribution: one-sided or one-tailed P. Or it can be derived from both tails: two-sided or two-tailed P. Distinguished statisticians, the authors of statistical texts, the authors of guidelines for human and animal experimentation, and the editors of biomedical journals give confusing advice, or none, about the choice between one-sided and two-sided P values. Such a choice is available only when there are no more than two groups to be compared. I argue that the choice depends on the alternative hypothesis (H1), which corresponds to the scientific hypothesis. If H1 is nonspecific and merely states that the means or proportions in the two groups are unequal, then a two-sided P is appropriate. But if H1 is specific and, for instance, states that the mean or proportion of group A is greater than that of group B, then a one-sided P may be used. The form which H1 will take if H0 is rejected must be stipulated a priori, before the experiment is conducted. It is essential that authors state whether the P values resulting from their tests of significance are one- or two-sided.
Clinical and Experimental Pharmacology and Physiology 03/2013 · 2.41 Impact Factor
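A minimal sketch of the one-sided/two-sided distinction, assuming the test statistic z is standard normal under H0 and that a specific H1 predicts a positive difference (both are assumptions for illustration, and the function name is hypothetical). It uses only the standard library: the upper-tail area beyond z is 0.5·erfc(z/√2).

```python
import math

def p_values_from_z(z: float) -> dict:
    """One- and two-sided P values for a standard-normal test statistic z.

    The one-sided P assumes H1 was stated in advance as a *positive*
    difference (e.g. mean of group A greater than mean of group B).
    """
    # Upper-tail area of N(0, 1) beyond z: P(Z >= z) = 0.5 * erfc(z / sqrt(2))
    one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    # Area in both tails beyond |z|
    two_sided = math.erfc(abs(z) / math.sqrt(2))
    return {"one_sided": one_sided, "two_sided": two_sided}

print(p_values_from_z(1.96))   # two-sided about 0.05, one-sided about 0.025
print(p_values_from_z(-1.96))  # one-sided P now exceeds 0.5
```

If z comes out in the direction opposite to the one stipulated by H1, the one-sided P exceeds 0.5, which is one way to see why the form of H1 has to be fixed before the data are collected.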
ABSTRACT: A survey of five journals of physiology or pharmacology for the year 2011 showed that Fisher's exact test was used three times as frequently as Pearson's chi-squared test. I shall argue that neither test is appropriate for analysing 2 × 2 tables of frequency in biomedical research. Pearson's test requires that random samples are taken from defined populations. The resultant 2 × 2 table is described as unconditional because neither the row nor column marginal totals are fixed in advance. Fisher's test requires the rare condition that both row and column marginal totals are fixed in advance. The resultant 2 × 2 table is described as doubly conditioned. However, the commonest design of biomedical studies is that a sample of convenience is taken and divided randomly into two groups of predetermined size. The groups are then exposed to different sets of conditions. The binomial outcome is not fixed in advance but depends on the result of the study. Thus only the column (group) marginal totals are fixed in advance, and the table is described as singly conditioned. Singly conditioned 2 × 2 tables are best analysed by tests of null hypotheses on the odds ratio (OR = 1), or by tests on proportions (p), such as the relative risk (RR = p(2)/p(1) = 1) or the difference between proportions (p(2) - p(1) = 0). One enormous advantage of these procedures is that they test specific hypotheses. They should be executed in an exact fashion by permutation.
Clinical and Experimental Pharmacology and Physiology 01/2013 · 2.41 Impact Factor
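The three quantities named above can be computed from a singly conditioned table roughly as follows. The cell layout (columns = the two randomized groups, rows = event / no event) and all names are assumptions for illustration. The sketch gives point estimates only; it does not implement the exact permutation tests the abstract recommends.

```python
from dataclasses import dataclass

@dataclass
class SinglyConditionedTable:
    """2 x 2 table whose column (group) totals were fixed by the design.

    Hypothetical layout:
                   group 1   group 2
        event         a         b
        no event      c         d
    """
    a: int
    b: int
    c: int
    d: int

    def proportions(self):
        p1 = self.a / (self.a + self.c)  # event proportion in group 1
        p2 = self.b / (self.b + self.d)  # event proportion in group 2
        return p1, p2

    def odds_ratio(self):
        # Odds of the event in group 2 divided by odds in group 1; H0: OR = 1
        return (self.b / self.d) / (self.a / self.c)

    def relative_risk(self):
        p1, p2 = self.proportions()
        return p2 / p1  # H0: RR = p2 / p1 = 1

    def risk_difference(self):
        p1, p2 = self.proportions()
        return p2 - p1  # H0: p2 - p1 = 0

t = SinglyConditionedTable(a=4, b=12, c=16, d=8)  # invented counts, 20 per group
print(t.odds_ratio(), t.relative_risk(), t.risk_difference())
```

Each estimate corresponds to a different, specific null hypothesis, which is the advantage the abstract points to over a general test of association.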
Article: Is there still a place for Pearson's chi-squared test and Fisher's exact test in surgical research?
ANZ Journal of Surgery 12/2011; 81(12):923-6 · 1.50 Impact Factor
ABSTRACT: 1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific-purpose computer programs; and (iii) general-purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific-purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general-purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis, and I give step-by-step instructions in the Supplementary Information on how to use loss functions.
Clinical and Experimental Pharmacology and Physiology 11/2011; 39(4):329-35 · 2.41 Impact Factor
ABSTRACT: 1. There are two reasons for wanting to compare measurers or methods of measurement. One is to calibrate one method or measurer against another; the other is to detect bias. Fixed bias is present when one method gives higher (or lower) values across the whole range of measurement. Proportional bias is present when one method gives values that diverge progressively from those of the other. 2. Linear regression analysis is a popular method for comparing methods of measurement, but the familiar ordinary least squares (OLS) method is rarely acceptable. The OLS method requires that the x values are fixed by the design of the study, whereas it is usual that both y and x values are free to vary and are subject to error. In this case, special regression techniques must be used. 3. Clinical chemists favour techniques such as major axis regression ('Deming's method'), the Passing-Bablok method or the bivariate least median squares method. Other disciplines, such as allometry, astronomy, biology, econometrics, fisheries research, genetics, geology, physics and sports science, have their own preferences. 4. Many Monte Carlo simulations have been performed to try to decide which technique is best, but the results are almost uninterpretable. 5. I suggest that pharmacologists and physiologists should use ordinary least products regression analysis (geometric mean regression, reduced major axis regression): it is versatile, can be used for calibration or to detect bias, and can be executed by hand-held calculator or by using the loss function in popular, general-purpose statistical software.
Clinical and Experimental Pharmacology and Physiology 03/2010; 37(7):692-9 · 2.41 Impact Factor
ABSTRACT: Twenty-four-hour ambulatory blood pressure thresholds have been defined for the diagnosis of mild hypertension but not for its treatment or for other blood pressure thresholds used in the diagnosis of moderate to severe hypertension. We aimed to derive age- and sex-related ambulatory blood pressure equivalents to clinic blood pressure thresholds for diagnosis and treatment of hypertension. We collated 24-hour ambulatory blood pressure data, recorded with validated devices, from 11 centres across six Australian states (n=8575). We used least product regression to assess the relation between these measurements and clinic blood pressure measured by trained staff and, in a smaller cohort, by doctors (n=1693). Mean age of participants was 56 years (SD 15), with mean body mass index 28.9 (5.5) and mean clinic systolic/diastolic blood pressure 142/82 mm Hg (19/12); 4626 (54%) were women. Average clinic measurements by trained staff were 6/3 mm Hg higher than daytime ambulatory blood pressure and 10/5 mm Hg higher than 24-hour blood pressure, but 9/7 mm Hg lower than clinic values measured by doctors. Daytime ambulatory equivalents derived from trained staff clinic measurements were 4/3 mm Hg less than the 140/90 mm Hg clinic threshold (lower limit of grade 1 hypertension), 2/2 mm Hg less than the 130/80 mm Hg threshold (target upper limit for patients with associated conditions), and 1/1 mm Hg less than the 125/75 mm Hg threshold. Equivalents were 1/2 mm Hg lower for women and 3/1 mm Hg lower in older people compared with the combined group. Our study provides daytime ambulatory blood pressure thresholds that are slightly lower than equivalent clinic values. Clinic blood pressure measurements taken by doctors were considerably higher than those taken by trained staff and therefore gave inappropriate estimates of ambulatory thresholds. These results provide a framework for the diagnosis and management of hypertension using ambulatory blood pressure values.
BMJ (online) 01/2010; 340:c1104 · 17.22 Impact Factor
ABSTRACT: Background: Twenty-four-hour ambulatory blood pressure (ABP) thresholds exist for the diagnosis of mild hypertension but not for other blood pressure (BP) thresholds used either in the diagnosis of moderate-severe hypertension or in its treatment. We aimed to derive age- and sex-differentiated ABP equivalents for both hypertension diagnosis and treatment thresholds. Methods: Twenty-four-hour ABP data recorded with validated devices were collated from 11 centres across six Australian States (n=8,575) and related, using least product regression, to clinic systolic BP (SBP) and diastolic BP (DBP) measured by health professional staff or by physicians. Results: Subjects had a mean age of 56 years (54% female), with body mass index 28.9 kg/m2 and clinic SBP/DBP of 142/82 mmHg. Average clinic measures were 6/3 mmHg higher than daytime ABP and 10/5 mmHg higher than 24-hour ABP. Staff-measured clinic BP was 9/7 mmHg lower than physician-measured BP. Daytime ABP equivalents were 4/3 mmHg lower at the 140/90 mmHg threshold (lower limit of grade 1 hypertension), 2/2 mmHg lower at 130/80 mmHg (upper limit with associated conditions), and 1/1 mmHg lower at 125/75 mmHg. Equivalents were 3/2 mmHg lower for females and 2-4 mmHg lower in older subjects. Conclusions: Our study provides daytime ABP thresholds for target clinic BP which are slightly below clinic values when the latter are measured by professional staff. Physician measurements of clinic BP were considerably higher, which inappropriately modified estimates of ABP treatment thresholds. These results provide a framework for the diagnosis and management of hypertension using ABP.
Brit Med J 01/2010; 340:c1104
ABSTRACT: 1. Altman and Bland argue that the virtue of plotting differences against averages in method-comparison studies is that 95% confidence limits for the differences can be constructed. These allow authors and readers to judge whether one method of measurement could be substituted for another. 2. The technique is often misused, so I have set out, by statistical argument and worked examples, to advise pharmacologists and physiologists how best to construct these limits. 3. First, construct a scattergram of differences on averages, then calculate the line of best fit for the linear regression of differences on averages. If the slope of the regression is shown to differ from zero, there is proportional bias. 4. If there is no proportional bias and if the scatter of differences is uniform (homoscedasticity), construct 'classical' 95% confidence limits. 5. If there is proportional bias yet homoscedasticity, construct hyperbolic 95% confidence limits (prediction interval) around the line of best fit. 6. If there is proportional bias and the scatter of values for differences increases progressively as the average values increase (heteroscedasticity), log-transform the raw values from the two methods and replot differences against averages. If this eliminates proportional bias and heteroscedasticity, construct 'classical' 95% confidence limits. Otherwise, construct horizontal V-shaped 95% confidence limits around the line of best fit of differences on averages or around the weighted least products line of best fit to the original data. 7. In designing a method-comparison study, consult a qualified biostatistician, obey the rules of randomization and make replicate observations.
Clinical and Experimental Pharmacology and Physiology 09/2009; 37(2):143-9 · 2.41 Impact Factor
ABSTRACT: THE CLINICAL PROBLEM: If a surgeon has performed a particular operation on n consecutive patients without major complications, what is the long-term risk of major complications after performing many more such operations? Examples of such operations are endoscopic cholecystectomy, nephrectomy and sympathectomy. THE STATISTICAL PROBLEM AND SOLUTIONS: This general problem has exercised the minds of theoretical statisticians for more than 80 years. They agree only that the long-term risk is best expressed as the upper bound of a 95% confidence interval. We consider many proposed solutions, from those that involve complex statistical theory to the empirical 'rule of three', popular among clinicians, in which the percentage risk is given by the formula 100 × (3/n). OUR CONCLUSIONS: The 'rule of three' grossly underestimates the future risks and can be applied only when the initial complication rate is zero (that is, 0/n). If the initial complication rate is greater than zero, then no simple 'rule' suffices. We give the results of applying the more popular statistical models, including their coverage. The 'exact' Clopper-Pearson interval has greater coverage across all proportions than its nominal 95%, and is, thus, too conservative. The Wilson score confidence interval gives about 95% coverage on average over all population proportions, except very small ones, so we prefer it to the Clopper-Pearson method. Unlike all the other intervals, Bayesian intervals with uniform priors yield exactly 95% coverage at any observed proportion. Thus, we strongly recommend Bayesian intervals and provide free software for executing them.
ANZ Journal of Surgery 08/2009; 79(7-8):565-70 · 1.50 Impact Factor

Heart Lung and Circulation 01/2009; 18(1):6666
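For the 'rule of three' article above, the closed-form pieces of the comparison can be sketched for the zero-complication case (0/n) with the standard library. This is not the authors' free software, and the function names are hypothetical. The Bayesian and Clopper-Pearson limits are restricted to x = 0 because only then do they have a simple closed form (a general x needs a beta-distribution quantile). The Bayesian limit is taken as the upper end of an equal-tailed 95% interval under a uniform Beta(1, 1) prior, an assumption about which interval is meant.

```python
import math

def rule_of_three(n):
    """Empirical upper bound on risk after 0 complications in n cases."""
    return 3 / n

def wilson_interval(x, n, z=1.96):
    """Wilson score 95% interval for a binomial proportion x/n."""
    p_hat = x / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def bayes_uniform_upper_zero(n, alpha=0.05):
    """Upper limit of an equal-tailed Bayesian interval when x = 0.

    Uniform prior + 0 events in n trials gives a Beta(1, n + 1) posterior,
    whose CDF is 1 - (1 - p)**(n + 1); set it equal to 1 - alpha/2.
    """
    return 1 - (alpha / 2) ** (1 / (n + 1))

def clopper_pearson_upper_zero(n, alpha=0.05):
    """Two-sided 'exact' Clopper-Pearson upper limit when x = 0."""
    return 1 - (alpha / 2) ** (1 / n)

for n in (10, 20, 50):
    print(n, rule_of_three(n), wilson_interval(0, n)[1],
          bayes_uniform_upper_zero(n), clopper_pearson_upper_zero(n))
```

For n = 20 the rule of three gives 0.15; printing the other columns next to it shows how much the choice of interval matters even in this simplest case.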
ANZ Journal of Surgery 01/2009; 79(1-2) · 1.50 Impact Factor
Clinical and Experimental Pharmacology and Physiology 11/2008; 35(10):1271-4; author reply 1274 · 2.41 Impact Factor
ABSTRACT: BACKGROUND: Biomedical investigators often use unsuitable statistical techniques for analysing the 2 × 2 tables that result from their experimental observations. This is because they are confused by the conflicting, and sometimes inaccurate, advice they receive from statistical texts or statistical consultants. METHODS: These consist of a review of published work, and the use of five different statistical procedures to analyse a 2 × 2 table, executed by StatXact 8.0, Testimate 6.0, Stata 10.0, SAS 9.1 and SPSS 16.0. DISCUSSION AND CONCLUSIONS: It is essential to classify a 2 × 2 table before embarking on its analysis. A useful classification is into: (i) Independence trials (doubly conditioned). These almost never occur in biomedical research because they involve predetermining both the column and row totals in a 2 × 2 table. The Fisher exact test is the best method for analysing these trials. (ii) Comparative trials (singly conditioned). These correspond to the usual experimental design in biomedical work, in which a sample of convenience is randomized into two treatment groups, so that the group (column) totals are fixed in advance. The proper tests of significance are exact tests on the odds ratio, on the ratio of proportions (relative risk and risk ratio) or on the difference between proportions. (iii) Double dichotomy trials (unconditional). In these, a genuine random sample is taken from a defined population. Thus, neither column nor row totals are fixed in advance. The only practicable test is Pearson's chi-squared test. In analysing any of the above trials, exact tests are much to be preferred to asymptotic (approximate) tests. The different commercial software packages use different algorithms for exact tests, and can give different outcomes in terms of P values and confidence intervals. The most useful are StatXact and Testimate.
International Journal of Epidemiology 09/2008; 37(6):1430-5 · 6.98 Impact Factor
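The two classical tests named in (i) and (iii) can be sketched with the standard library, as a rough illustration rather than a substitute for the packages the article compares. Fisher's test conditions on both margins and sums hypergeometric probabilities. The two-sided rule used here (add up every table no more probable than the observed one) is one common convention; as the abstract notes, packages differ in such algorithmic choices. The Pearson statistic is shown without continuity correction, with its asymptotic P from the chi-squared distribution on 1 df. Function names and data are hypothetical.

```python
import math

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact P for the table [[a, b], [c, d]].

    Both margins are treated as fixed (doubly conditioned table).
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric P(top-left cell = x) under H0
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum all tables with the same margins that are no more probable than
    # the observed one (small tolerance guards against float round-off)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) and asymptotic P, 1 df."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Chi-squared on 1 df is Z**2, so P(chi2_1 > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

print(fisher_exact_2x2(3, 1, 1, 3), pearson_chi2_2x2(3, 1, 1, 3))
```

With only eight observations the exact and asymptotic P values differ noticeably, which illustrates the abstract's preference for exact tests in small samples.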
ABSTRACT: 1. The problems of, and best solutions for, outlying observations and missing values are very dependent on the sizes of the experimental groups. For original articles published in Clinical and Experimental Pharmacology and Physiology during 2006-2007, group sizes ranged from three to 44 ('small groups'). In surveys, epidemiological studies and clinical trials, the group sizes range from 100s to 1000s ('large groups'). 2. How can one detect outlying (extreme) observations? The best methods are graphical, for instance: (i) a scatterplot, often with mean ± 2 s; and (ii) a box-and-whisker plot. Even with these, it is a matter of judgement whether observations are truly outlying. 3. It is permissible to delete or replace outlying observations if an independent explanation for them can be found. This may be, for instance, failure of a piece of measuring equipment or human error in operating it. If the observation is deleted, it can then be treated as a missing value. Rarely, the appropriate portion of the study can be repeated. 4. It is decidedly not permissible to delete unexplained extreme values. Some of the acceptable strategies for handling them are: (i) transform the data and proceed with conventional statistical analyses; (ii) use the mean for location, but use permutation (randomization) tests for comparing means; and (iii) use robust methods for describing location (e.g. median, geometric mean, trimmed mean), for indicating dispersion (range, percentiles), for comparing locations and for regression analysis. 5. What can be done about missing values? Some strategies are: (i) ignore them; (ii) replace them by hand if the data set is small; and (iii) use computerized imputation techniques to replace them if the data set is large (e.g. regression or EM (conditional Expectation, Maximum likelihood estimation) methods). 6. If the missing values are ignored, or even if they are replaced, it is essential to test whether the individuals with missing values are otherwise indistinguishable from the remainder of the group. If the missing values have not occurred at random, but are associated with some property of the individuals being studied, the subsequent analysis may be biased.
Clinical and Experimental Pharmacology and Physiology 06/2008; 35(5-6):670-8 · 2.41 Impact Factor
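The two graphical screens in point 2 can be turned into simple flags, sketched below with the standard library. The 1.5 × interquartile-range fences are Tukey's usual whisker convention, an assumption here because the abstract does not specify one. `statistics.quantiles` defaults to its 'exclusive' method, and other quartile definitions will move the fences slightly. The data and function name are hypothetical. Consistent with points 3 and 4, the function only flags values and deletes nothing.

```python
import statistics

def flag_outliers(values, k_sd=2.0, k_iqr=1.5):
    """Flag candidate outliers by two screening rules; nothing is removed."""
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    # Rule (i): outside mean +/- 2 s
    by_sd = [v for v in values if abs(v - mean) > k_sd * sd]
    # Rule (ii): outside box-plot fences Q1 - 1.5 IQR and Q3 + 1.5 IQR
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    by_box = [v for v in values if v < q1 - k_iqr * iqr or v > q3 + k_iqr * iqr]
    # A robust location estimate for comparison (point 4 (iii))
    return {"mean_2sd": by_sd, "box_fences": by_box, "median": statistics.median(values)}

sample = [10, 12, 11, 13, 12, 11, 14, 12, 13, 60]
print(flag_outliers(sample))
```

For these invented data both rules flag only 60, while the median (12) is unaffected by it. Whether 60 may be removed still depends on finding an independent explanation for it.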
ABSTRACT: The conventional method for estimating survival over time following an episode of disease or treatment is the Kaplan-Meier (KM) technique, which results in a step-down survival plot, with upper and lower bounds of 1.0 and 0, respectively. The mirror image of this plot represents the cumulative incidence of an adverse event, such as death, with lower and upper bounds of 0 and 1.0, respectively. However, if there are two competing events that can occur during follow-up, such as death or relapse, the KM technique gives a false picture of the cumulative incidence of either one of these events. This occurs because patients who have died cannot subsequently relapse. When there are two competing events, another technique must be used, which is known, variously, as cumulative incidence analysis, or 'actual' (as opposed to actuarial) incidence analysis. An example is given in which there are two competing adverse events following haemopoietic stem cell transplantation for a haematological malignancy: (i) relapse or (ii) transplant-related death. Our analysis of the example shows that the cumulative probability of relapse is progressively inflated if the traditional KM product-limit method is used rather than actual cumulative incidence analysis. We show how KM and actual cumulative survival or incidence analyses can be executed by a hand-held calculator for small datasets or by formulae within a computer spreadsheet for large datasets. Surgical investigators should not use the KM technique to predict cumulative survival or risk if there are two competing adverse events. They should use, instead, the technique of actual cumulative survival or incidence analysis.
ANZ Journal of Surgery 04/2008; 78(3):204-10 · 1.50 Impact Factor
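The contrast can be sketched with the usual product-limit formulas on five invented patients; the times, event codes and function name are hypothetical, not the article's transplant data. The naive approach treats deaths as censored and reports 1 − KM for relapse. The cumulative incidence function instead weights each relapse by the probability of having been free of any event just before it. Everyone with a time of at least t is counted as at risk at t, an assumption for illustration.

```python
def competing_risks(data, cause=1):
    """Compare naive 1 - KM with the cumulative incidence function (CIF).

    data: list of (time, code); code 0 = censored, `cause` = event of
    interest (e.g. relapse), any other positive code = competing event.
    Returns (time, naive_1_minus_km, cif) at each event time.
    """
    event_times = sorted({t for t, code in data if code != 0})
    surv_any = 1.0  # KM probability of being free of *any* event
    km_cause = 1.0  # KM for the cause, treating competing events as censored
    cif = 0.0
    rows = []
    for t in event_times:
        at_risk = sum(1 for s, _ in data if s >= t)
        d_cause = sum(1 for s, c in data if s == t and c == cause)
        d_any = sum(1 for s, c in data if s == t and c != 0)
        cif += surv_any * d_cause / at_risk  # uses event-free survival just before t
        surv_any *= 1 - d_any / at_risk
        km_cause *= 1 - d_cause / at_risk
        rows.append((t, 1 - km_cause, cif))
    return rows

# (time, code): 1 = relapse, 2 = transplant-related death, 0 = censored
patients = [(1, 2), (2, 1), (3, 2), (4, 1), (5, 0)]
for row in competing_risks(patients):
    print(row)
```

Here 1 − KM puts the cumulative probability of relapse at 0.625 by time 4, whereas the cumulative incidence is 0.4. Running the same function with `cause=2` gives a naive death curve that, added to the naive relapse curve, exceeds 1, which is impossible for two mutually exclusive first events.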
Article: Statistics in biomedical laboratory and clinical science: applications, issues and pitfalls.
ABSTRACT: This review is directed at biomedical scientists who want to gain a better understanding of statistics: what tests to use, when, and why. In my view, even during the planning stage of a study it is very important to seek the advice of a qualified biostatistician. When designing and analyzing a study, it is important to construct and test global hypotheses, rather than to make multiple tests on the data. If the latter cannot be avoided, it is essential to control the risk of making false-positive inferences by applying multiple comparison procedures. For comparing two means or two proportions, it is best to use exact permutation tests rather than the better known, classical ones. For comparing many means, analysis of variance, often of a complex type, is the most powerful approach. The correlation coefficient should never be used to compare the performances of two methods of measurement, or two measures, because it does not detect bias. Instead, the Altman-Bland method of differences or least-products linear regression analysis should be preferred. Finally, the educational value to investigators of interaction with a biostatistician, before, during and after a study, cannot be overemphasized.
Medical Principles and Practice 02/2008; 17(1):1-13 · 0.96 Impact Factor

ANZ Journal of Surgery 01/2008; 59(3):267-267 · 1.50 Impact Factor
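The exact permutation test recommended in the Medical Principles and Practice review for comparing two means can be sketched by brute-force enumeration. This is a minimal illustration, not the author's own procedure, and the function name is hypothetical. Every split of the pooled data into groups of the original sizes is generated, and P is the fraction of splits whose mean difference is at least as extreme as the observed one. The number of splits grows as C(n1 + n2, n1), so full enumeration suits only small groups.

```python
from itertools import combinations
from statistics import fmean

def exact_permutation_test(group1, group2):
    """Two-sided exact permutation P for a difference between two means."""
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    observed = fmean(group2) - fmean(group1)
    count = total = 0
    # Each choice of n1 indices defines one relabelling of the pooled data
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = fmean(g2) - fmean(g1)
        total += 1
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for float round-off
            count += 1
    return count / total

print(exact_permutation_test([1, 2, 3], [4, 5, 6]))
```

With three observations per group there are 20 possible splits, and only the observed split and its mirror image are as extreme as the data, so P is 2/20.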
ANZ Journal of Surgery 01/2008; 61(5):329-331 · 1.50 Impact Factor

Article: SOFTWARE REVIEW
Clinical and Experimental Pharmacology and Physiology 01/2008; 35(1). · 2.41 Impact Factor
Publication Stats
2k Citations
323.88 Total Impact Points
Institutions

1993–2013

Victoria University Melbourne
Melbourne, Victoria, Australia


1992–2013

University of Melbourne
 Department of Surgery
Melbourne, Victoria, Australia


1989–2008

Royal Melbourne Hospital
Melbourne, Victoria, Australia


2002

Royal Perth Hospital
Perth City, Western Australia, Australia


2000–2001

Diabetes Australia, Victoria
Melbourne, Victoria, Australia


1991

Monash University (Australia)
 Department of Medicine
Melbourne, Victoria, Australia


1990

University of Vic
Vic, Catalonia, Spain


1980–1982

University of Adelaide
Tarndarnya, South Australia, Australia


1978

University of Milan
Milano, Lombardy, Italy
