Statistics in Medicine

Published by Wiley
Online ISSN: 1097-0258
Publications
Article
The standard analysis of variance (ANOVA) method is usually applied to analyse continuous data from cross-over studies. The method, however, is known not to be robust under a general variance-covariance structure. The simple empirical generalized least squares (EGLS) method, proposed in an attempt to improve the precision of the standard ANOVA method under a general variance-covariance structure, is usually insufficient for small-sample cross-over trials. In this paper we use simulation to compare the following commonly used or recent approaches in terms of robustness and power when applied to small-sample cross-over studies (say, fewer than 40 subjects) over a variety of variance-covariance structures: the standard ANOVA method; the simple EGLS method; a modified ANOVA method derived from a modified approximate F-distribution; and a modified EGLS method adjusted by the Kenward and Roger procedure. We find that the unconditional modified ANOVA method performs robustly for all of the simulated small-sample cross-over studies over the various variance-covariance structures, and has power comparable with that of the standard ANOVA method whenever the two are comparable in type I error rate. The EGLS method (simple or modified) is not reliable when the sample size of a cross-over study is too small, say fewer than 24 in the simulation, unless a simple covariance structure is correctly assumed. Given a relatively larger sample size, the modified EGLS method, assuming an unstructured covariance matrix, demonstrates robust performance over the various variance-covariance structures in the simulation and provides more powerful tests than those of the modified (or standard) ANOVA method.
 
Article
The assumption that comparative effectiveness research will provide timely, relevant evidence rests on changing the current framework for assembling evidence. In this commentary, we provide the background of how coverage decisions for new medical technologies are currently made in the United States. We focus on the statistical issues regarding how to use the ensemble of information for inferring comparative effectiveness. It is clear that a paradigm shift is required in how clinical information is integrated in real-world settings to establish effectiveness.
 
Article
The data with which Student illustrated the application of his famous distribution are examined from a number of aspects. Central to the discussion is the within-patient clinical trial at Kalamazoo whose results were published by Cushny and Peebles and misquoted by Student and Fisher. This trial is discussed from historical, pharmacological and statistical perspectives. Student's and Fisher's analyses and a more modern analysis by Preece are considered, as is Cushny and Peebles's interpretation. Brief biographies of the five physicians involved in running the trial are presented.
 
Article
The Cancer Research UK study CR0720-11 is a trial to determine the tolerability and effect on survival of using two agents in combination in patients with advanced pancreatic cancer. In particular, the trial is designed first to identify the most suitable combination of doses of the two agents in terms of the incidence of dose-limiting toxicities. Then, the survival of all patients who have received that dose combination in the study so far, together with additional patients assigned to that dose combination to ensure that the total number is sufficient, will be analysed. If the survival outcomes show promise, then a definitive randomised study of that dose combination will be recommended. The first two patients in the trial will be treated with the lowest doses of each agent in combination. An adaptive Bayesian procedure based only on monotonicity constraints concerning the risks of toxicity at different dose levels will then be used to suggest dose combinations for subsequent patients. The survival analysis will concern only patients who received the chosen dose combination, and will compare observed mortality with that expected from an exponential model based on the known survival rates associated with current treatment. In this paper, the Bayesian dose-finding procedure is described and illustrated, and its properties are evaluated through simulation. Computation of the appropriate sample size for the survival investigation is also discussed.
 
Article
We propose a practical group sequential method, a conditional sequential sampling procedure, to test whether a drug of interest (D) leads to an elevated risk for an adverse event E compared with a comparison drug C. The method is designed for prospective drug safety surveillance studies, in which, for each considered drug, a summary table of the exposed person-times and the associated numbers of adverse events, summed within strata defined by several potential confounders, is collected and updated periodically using the health plans' administrative claims data. This new approach can be applied to test for elevated relative risk whenever the data are updated. Our approach adjusts for multiple testing to preserve the overall type I error with any specified alpha-spending function. Furthermore, it automatically adjusts for temporal trend and population heterogeneity across strata by conditioning on the numbers of adverse events within each stratum during each time period. Therefore, this approach is very flexible and applies to a wide class of settings. We conduct a simulation study to evaluate its performance under various scenarios. The approach is also applied to an example to examine whether Rofecoxib leads to an increased relative risk for acute myocardial infarction (AMI) compared with its two counterparts Diclofenac and Naproxen, respectively. We end with a discussion.
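As an illustration of the conditioning idea only (not the authors' full sequential procedure, which also spreads the type I error over repeated looks via an alpha-spending function), the sketch below computes a one-sided Monte Carlo p-value for an elevated rate on drug D: within each stratum, given the stratum's total number of events, the count attributable to D is binomial with probability equal to D's share of person-time under the null. All summary figures are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

# hypothetical stratum-level summaries: person-time on D, person-time on C,
# events on D, events on C
pt_d = np.array([1200.0, 800.0, 1500.0])
pt_c = np.array([1000.0, 900.0, 1400.0])
ev_d = np.array([15, 9, 20])
ev_c = np.array([8, 10, 14])

n_k = ev_d + ev_c                      # stratum totals conditioned on
p_k = pt_d / (pt_d + pt_c)             # null probability that an event falls on D
observed = ev_d.sum()

# convolve the per-stratum conditional binomials by Monte Carlo
sims = rng.binomial(n_k, p_k, size=(100_000, len(n_k))).sum(axis=1)
p_value = (sims >= observed).mean()    # one-sided: elevated risk on D
print(f"observed D events = {observed}, one-sided Monte Carlo p = {p_value:.4f}")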
 
Article
A method of analysis is presented for estimating the magnitude of a treatment effect among compliers in a clinical trial; the method is asymptotically unbiased and respects the randomization. The approach is valid even when compliers have a different baseline risk than non-compliers. Adjustments for contamination (use of the treatment by individuals in the control arm) are also developed. When the baseline failure rates in non-compliers and contaminators are the same as those in patients who accept their allocated treatment, the method produces larger treatment effects than an 'intent-to-treat' analysis, but the confidence limits are also wider, and (even without this assumption) asymptotically the efficiencies are the same. In addition to providing a better estimate of the true effect of a treatment in compliers, the method also provides a more realistic confidence interval, which can be especially important for trials aimed at showing the equivalence of two treatments. In this case the intent-to-treat analysis can give unrealistically narrow confidence intervals if substantial numbers of patients elect to have the treatment they were not randomized to receive.
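The paper's estimator is not reproduced here; the familiar instrumental-variable form below, with hypothetical figures, conveys the basic idea of scaling an intention-to-treat risk difference by the difference in treatment-receipt rates between arms to obtain an effect among compliers.

# A minimal numerical sketch (not the authors' exact estimator) of adjusting an
# intention-to-treat risk difference for non-compliance and contamination:
#   effect among compliers = (ITT risk difference) /
#                            (treatment-receipt rate in arm T - rate in arm C)
# All figures below are hypothetical.
n_t, n_c = 1000, 1000          # randomized to treatment / control
events_t, events_c = 80, 120   # outcome events in each arm
received_t = 0.85              # proportion of treatment arm actually treated
received_c = 0.10              # contamination: control-arm patients treated

itt_rd = events_t / n_t - events_c / n_c
complier_effect = itt_rd / (received_t - received_c)
print(f"ITT risk difference              = {itt_rd:+.3f}")
print(f"Effect among compliers (IV form) = {complier_effect:+.3f}")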
 
Article
Floating absolute risks are an alternative way of presenting relative risk estimates for polychotomous risk factors. Instead of choosing one level of the risk factor as a reference category, each level is assigned a 'floated' variance which describes the uncertainty in risk without reference to another level. In this paper, a method for estimating the floated variances is presented that improves on the previously proposed 'heuristic' method. The estimates may be calculated iteratively with a simple algorithm. A benchmark for validating the floated variance estimates is also proposed and an interpretation of floating confidence intervals is given.
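The defining use of a floated variance is that a comparison between any two levels, neither of which need be the reference category, has variance approximately equal to the sum of the two floated variances. The small sketch below, with invented floated estimates and variances, shows the resulting floating confidence interval for such a comparison.

import numpy as np

levels = ["never", "former", "light", "heavy"]            # hypothetical factor levels
floated_logrr = np.array([0.00, 0.18, 0.35, 0.62])        # floated log relative risks
floated_var = np.array([0.010, 0.015, 0.020, 0.030])      # floated variances

i, j = 3, 1                                               # compare 'heavy' with 'former'
diff = floated_logrr[i] - floated_logrr[j]
se = np.sqrt(floated_var[i] + floated_var[j])             # variances add across levels
lo, hi = np.exp(diff - 1.96 * se), np.exp(diff + 1.96 * se)
print(f"RR {levels[i]} vs {levels[j]} = {np.exp(diff):.2f} (95% CI {lo:.2f} to {hi:.2f})")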
 
Article
I derive the exact distribution of the unpaired t-statistic computed when the data actually come from a paired design. I use this to prove a result Diehr et al. obtained by simulation, namely that the type I error rate of this procedure is no greater than alpha regardless of the sample size. I provide a formula to use in computation of power and type I error rate.
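A quick Monte Carlo check of the result discussed above, with illustrative settings: the two-sample (unpaired) t-test applied to data that are actually paired and positively correlated rejects no more often than the nominal alpha.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, rho, alpha, reps = 15, 0.6, 0.05, 20_000
cov = np.array([[1.0, rho], [rho, 1.0]])

rejections = 0
for _ in range(reps):
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    _, p = stats.ttest_ind(x, y)       # (incorrect) unpaired analysis of paired data
    rejections += p < alpha
print(f"empirical type I error = {rejections / reps:.4f} (nominal {alpha})")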
 
Article
This paper considers the statistical complexities that arise due to outcome related drop-outs in longitudinal clinical trials of the randomized parallel groups design with fixed assessment times and an explanatory aim. The shortcomings of currently popular methods of coping with the problem of drop-outs are discussed. It is proposed that progress can be made by applying the modern methodology that was primarily developed for sample surveys with non-response and for observational studies. A practical application using the Hamilton Rating Scale for Depression is presented.
 
Article
The standard phase II trial problem is to decide whether or not to continue the testing of a new agent (or combination). Typically, one tests the null hypothesis H0: p = p0 against the alternative HA: p = pA, where p is the probability of response. A variety of two-stage phase II designs is available, including optimal designs according to various criteria. Practical considerations in the conduct of multicentre trials, however, make it difficult to follow designs precisely. We investigate several approaches to adapting stopping rules when the attained sample size is not the planned size. We find that a simple approach of testing HA: p = pA at the 0.02 level at the first stage and H0: p = p0 at the 0.055 level at the second stage works well across a variety of powers and values of p0 and pA.
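The simple rule recommended above can be read as two one-sided exact binomial tests; the sketch below is one such reading, with hypothetical attained sample sizes and response counts (p0, pA and the counts are assumptions of this illustration).

from scipy import stats

p0, pA = 0.20, 0.40            # null and alternative response probabilities (hypothetical)

def stage1_stop_for_futility(x1, n1, alpha1=0.02):
    # stop if x1 responses among the n1 stage-1 patients are improbably low under p = pA
    return stats.binom.cdf(x1, n1, pA) <= alpha1

def stage2_promising(x_total, n_total, alpha2=0.055):
    # declare activity if the total responses are improbably high under p = p0
    return stats.binom.sf(x_total - 1, n_total, p0) <= alpha2

print(stage1_stop_for_futility(x1=3, n1=21))     # e.g. 3/21 responses at the attained stage-1 size
print(stage2_promising(x_total=13, n_total=43))  # e.g. 13/43 responses overall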
 
Article
This paper reviews statistical methods for the analysis of discrete and continuous longitudinal data. The relative merits of longitudinal and cross-sectional studies are discussed. Three approaches, marginal, transition and random effects models, are presented with emphasis on the distinct interpretations of their coefficients in the discrete data case. We review generalized estimating equations for inferences about marginal models. The ideas are illustrated with analyses of a 2 x 2 crossover trial with binary responses and a randomized longitudinal study with a count outcome.
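A minimal, self-contained sketch of a marginal (population-averaged) analysis of a 2 x 2 crossover with binary responses via generalized estimating equations, using simulated data and an exchangeable working correlation. The variable names, model formula and data-generating values are illustrative, not taken from the paper.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_subjects = 100
rows = []
for i in range(n_subjects):
    sequence = i % 2                        # 0: AB, 1: BA
    u = rng.normal(0.0, 1.0)                # subject effect inducing within-subject correlation
    for period in (0, 1):
        treatment = (period + sequence) % 2
        eta = -0.5 + 0.8 * treatment + 0.2 * period + u
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
        rows.append({"subject": i, "period": period, "treatment": treatment, "y": y})
df = pd.DataFrame(rows)

model = smf.gee("y ~ treatment + period", groups="subject", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())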
 
Article
Meta-regression has become a commonly used tool for investigating whether study characteristics may explain heterogeneity of results among studies in a systematic review. However, such explorations of heterogeneity are prone to misleading false-positive results. It is unclear how many covariates can reliably be investigated, and how this might depend on the number of studies, the extent of the heterogeneity and the relative weights awarded to the different studies. Our objectives in this paper are two-fold. First, we use simulation to investigate the type I error rate of meta-regression in various situations. Second, we propose a permutation test approach for assessing the true statistical significance of an observed meta-regression finding. Standard meta-regression methods suffer from substantially inflated false-positive rates when heterogeneity is present, when there are few studies and when there are many covariates. These are typical of situations in which meta-regressions are routinely employed. We demonstrate in particular that fixed effect meta-regression is likely to produce seriously misleading results in the presence of heterogeneity. The permutation test appropriately tempers the statistical significance of meta-regression findings. We recommend its use before a statistically significant relationship is claimed from a standard meta-regression analysis.
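The permutation approach can be sketched as follows (a simplified fixed effect version with simulated data; the paper's implementation may differ in detail): the study-level covariate is permuted across studies, the inverse-variance weighted meta-regression is refitted each time, and the observed test statistic is referred to the resulting permutation distribution.

import numpy as np

rng = np.random.default_rng(3)
k = 10                                        # number of studies
v = rng.uniform(0.02, 0.2, size=k)            # within-study variances
x = rng.normal(size=k)                        # study-level covariate
theta = rng.normal(0.3, np.sqrt(v + 0.05))    # heterogeneous true effects, unrelated to x

def wls_z(y, x, w):
    # inverse-variance weighted least squares slope and its conventional z statistic
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(w)
    xtwx_inv = np.linalg.inv(X.T @ W @ X)
    beta = xtwx_inv @ X.T @ W @ y
    return beta[1] / np.sqrt(xtwx_inv[1, 1])

obs = abs(wls_z(theta, x, 1.0 / v))
perm = np.array([abs(wls_z(theta, rng.permutation(x), 1.0 / v)) for _ in range(5000)])
print(f"permutation p-value = {(perm >= obs).mean():.3f}")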
 
Article
The needs for statistical surveillance in different areas of medicine are described. The predictive value of an alarm and other measures for the evaluation of alarm procedures are suggested. The measures are used to evaluate some common methods of continual surveillance of time series. It is demonstrated that some methods have about the same properties at the start of the surveillance period as later, but this is not the case for all methods. For some methods the consequence of an early alarm should be quite different from that of a late one.
 
Article
We discuss the implementation of a criterion due to Prentice for the statistical validation of intermediate endpoints for chronic disease. The criterion involves examining in a cohort or intervention study whether an exposure or intervention effect, adjusted for the intermediate endpoint, is reduced to zero. For example, to examine whether serum cholesterol level is an intermediate endpoint for coronary heart disease (CHD), we may investigate the effect of the cholesterol-lowering drug cholestyramine on CHD incidence adjusted for serum cholesterol levels. We show that use of this criterion will usually demand some form of model selection. When the unadjusted exposure or treatment effect is less than four times its standard error, the analysis can usually lead only to a weak form of validation, a conclusion that the data are not inconsistent with the validation criterion. More significant unadjusted exposure effects offer the potential for stronger types of validation statement such as 'the intermediate endpoint explains at least 50 per cent (or 75 per cent) of the exposure effect'.
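As a sketch of the criterion in, say, a proportional hazards setting (our notation, not taken from the paper): with exposure or treatment indicator Z and candidate intermediate endpoint S, one fits

\[
\lambda(t \mid Z, S) = \lambda_0(t)\,\exp(\beta Z + \gamma S),
\]

and the validation criterion asks whether the S-adjusted effect \(\beta\) is reduced to zero while the unadjusted effect of Z on the clinical endpoint remains clearly nonzero.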
 
Article
We describe an adaptive dose escalation scheme for use in cancer phase I clinical trials. The method is fully adaptive, makes use of all the information available at the time of each dose assignment, and directly addresses the ethical need to control the probability of overdosing. It is designed to approach the maximum tolerated dose as fast as possible subject to the constraint that the predicted proportion of patients who receive an overdose does not exceed a specified value. We conducted simulations to compare the proposed method with four up-and-down designs, two stochastic approximation methods, and with a variant of the continual reassessment method. The results showed the proposed method to be effective as a means of controlling the frequency of overdosing. Relative to the continual reassessment method, our scheme overdosed a smaller proportion of patients, exhibited fewer toxicities and estimated the maximum tolerated dose with comparable accuracy. When compared with the non-parametric schemes, our method treated fewer patients at either subtherapeutic or severely toxic dose levels, treated more patients at optimal dose levels and estimated the maximum tolerated dose with smaller average bias and mean squared error. Hence, the proposed method is a promising alternative to currently used cancer phase I clinical trial designs.
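A heavily simplified sketch of the overdose-control idea, not the authors' design: under a one-parameter working model for toxicity, the posterior over the model parameter is updated after each cohort, and the next dose is the highest level whose posterior probability of exceeding the target toxicity rate stays below a feasibility bound. The skeleton, prior, target rate and bound are all assumptions of this illustration.

import numpy as np

skeleton = np.array([0.05, 0.10, 0.20, 0.35, 0.50])   # working prior toxicity guesses per dose
target, feasibility_bound = 0.25, 0.25

a_grid = np.linspace(-3, 3, 601)                       # grid over the model parameter
prior = np.exp(-0.5 * a_grid**2 / 1.34**2)             # N(0, 1.34^2) prior, unnormalized
prior /= prior.sum()

def posterior(tox, n_treated):
    # tox, n_treated: per-dose toxicity counts and numbers treated so far
    p = skeleton[None, :] ** np.exp(a_grid)[:, None]   # p_i(a) = skeleton_i ^ exp(a)
    loglik = (tox * np.log(p) + (n_treated - tox) * np.log(1 - p)).sum(axis=1)
    post = prior * np.exp(loglik - loglik.max())
    return post / post.sum(), p

def next_dose(tox, n_treated):
    post, p = posterior(tox, n_treated)
    prob_overdose = (post[:, None] * (p > target)).sum(axis=0)   # P(toxicity rate > target) per dose
    admissible = np.where(prob_overdose <= feasibility_bound)[0]
    return int(admissible.max()) if admissible.size else 0

# e.g. after treating 3 patients at each of the two lowest doses, with one toxicity:
print("recommended next dose level:",
      next_dose(np.array([0, 1, 0, 0, 0]), np.array([3, 3, 0, 0, 0])))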
 
Article
Results from external studies often play an important role in many aspects of a clinical trial. Their incorporation into the decision making process of a trial, however, is rarely conducted in a formal manner. This conference will address what formal role, if any, meta-analytic summaries of external results should have in the design and monitoring of an ongoing trial. This introductory presentation describes in detail the example from obstetric research which motivated the conference topic, and, in a Bayesian framework, summarizes the general implications of formally incorporating meta-analytic results into the design and analysis of a new trial.
 
Article
A new exponentially weighted moving average (EWMA) control chart well suited for 'online' routine surveillance of medical procedures is introduced. The chart is based on inter-event counts for failures recorded when the failures occur. The method can be used for many types of hospital procedures and activities, such as problems or errors in surgery, hospital-acquired infections, erroneous handling or prescription of medicine, and deviations from scheduled treatments that cause inconvenience to patients. The construction, use and effectiveness of the control chart are demonstrated with two well-known examples concerning wound infection in orthopaedic surgery and neonatal arterial switch surgery. The method is easy to implement and apply; it illustrates, estimates and tests the current failure rate. Comparisons with two examples from the literature indicate that its ability to quickly detect an increased failure rate is comparable to that of other well-established methods.
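A generic sketch in the spirit of the chart (not the authors' exact construction or control limits): the counts of procedures between consecutive failures are smoothed exponentially, and an alarm is raised if the smoothed value falls below a lower limit derived from the in-control geometric distribution. The failure rates, smoothing weight and limit multiplier are illustrative; whether and when an alarm occurs depends on the simulated counts.

import numpy as np

rng = np.random.default_rng(7)
p0, p1, lam = 0.02, 0.06, 0.1                    # in-control rate, shifted rate, EWMA weight
# procedures between failures; failures 31 onwards occur at the higher rate
counts = np.concatenate([rng.geometric(p0, 30), rng.geometric(p1, 30)])

mean0, var0 = 1.0 / p0, (1.0 - p0) / p0**2       # in-control mean/variance of the counts
lcl = mean0 - 2.5 * np.sqrt(var0 * lam / (2.0 - lam))   # lower control limit (illustrative constant)

alarms, z = [], mean0
for i, x in enumerate(counts, start=1):
    z = lam * x + (1.0 - lam) * z                # EWMA update
    if z < lcl:
        alarms.append(i)
print("failures at which the EWMA fell below the LCL:", alarms or "none")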
 
Article
For meta-analysis, substantial uncertainty remains about the most appropriate statistical methods for combining the results of separate trials. An important issue for meta-analysis is how to incorporate heterogeneity, defined as variation among the results of individual trials beyond that expected from chance, into summary estimates of treatment effect. Another consideration is which 'metric' to use to measure treatment effect; for trials with binary outcomes, there are several possible metrics, including the odds ratio (a relative measure) and risk difference (an absolute measure). To examine empirically how assessment of treatment effect and heterogeneity may differ when different methods are utilized, we studied 125 meta-analyses representative of those performed by clinical investigators. There was no meta-analysis in which the summary risk difference and odds ratio were discrepant to the extent that one indicated significant benefit while the other indicated significant harm. Further, for most meta-analyses, summary odds ratios and risk differences agreed in statistical significance, leading to similar conclusions about whether treatments affected outcome. Heterogeneity was common regardless of whether treatment effects were measured by odds ratios or risk differences. However, risk differences usually displayed more heterogeneity than odds ratios. Random effects estimates, which incorporate heterogeneity, tended to be less precisely estimated than fixed effects estimates. We present two exceptions to these observations, which derive from the weights assigned to individual trial estimates. We discuss the implications of these findings for selection of a metric for meta-analysis and incorporation of heterogeneity into summary estimates.
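For concreteness, the sketch below computes the quantities being compared, fixed effect and DerSimonian-Laird random effects summaries of both the log odds ratio and the risk difference, together with Cochran's Q as the heterogeneity statistic, for a handful of made-up 2 x 2 tables. It illustrates the metrics only and is not the paper's analysis of the 125 meta-analyses.

import numpy as np

# events / totals in treatment and control arms of each trial (hypothetical)
e_t = np.array([12, 30, 7, 25]);  n_t = np.array([100, 150, 60, 120])
e_c = np.array([20, 28, 15, 40]); n_c = np.array([100, 140, 60, 125])

def summarize(y, v):
    w = 1.0 / v
    fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - fixed) ** 2)                               # Cochran's Q
    tau2 = max(0.0, (q - (len(y) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_re = 1.0 / (v + tau2)                                        # DerSimonian-Laird weights
    return fixed, np.sum(w_re * y) / np.sum(w_re), q, tau2

# log odds ratios and their approximate variances
log_or = np.log((e_t * (n_c - e_c)) / (e_c * (n_t - e_t)))
v_or = 1.0/e_t + 1.0/(n_t - e_t) + 1.0/e_c + 1.0/(n_c - e_c)
# risk differences and their approximate variances
p_t, p_c = e_t / n_t, e_c / n_c
rd, v_rd = p_t - p_c, p_t*(1-p_t)/n_t + p_c*(1-p_c)/n_c

for name, y, v in [("log OR", log_or, v_or), ("risk difference", rd, v_rd)]:
    fe, re, q, tau2 = summarize(y, v)
    print(f"{name}: fixed={fe:+.3f}, random={re:+.3f}, Q={q:.2f}, tau^2={tau2:.4f}")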
 
Article
Clinicians often wish to use data from clinical trials or hospital databases to study disease natural history. Of particular interest are estimated survival and prognostic factors. In this context, it may be appropriate to measure survival from diagnosis or some other time origin, possibly prior to study entry. We describe the application of methods for truncated survival data, and compare these with the standard product limit estimator and proportional hazards models in the measurement of survival from entry. Theoretical considerations suggest that analysis of survival from entry may under- or overestimate the survival distribution of interest, depending on the shape of the true underlying hazard. Analogous results hold for the coefficients from a proportional hazards model. We illustrate our findings with data from a multicenter clinical trial and a hospital database.
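A hand-coded sketch of the delayed-entry (left-truncated) product-limit estimator referred to above, on simulated data: time is measured from diagnosis, but each subject contributes to the risk set only after study entry. The variable names and data-generating mechanism are illustrative.

import numpy as np

rng = np.random.default_rng(4)
n = 300
time_from_diagnosis = rng.exponential(5.0, n)          # true survival times from diagnosis
entry = rng.uniform(0.0, 3.0, n)                       # time from diagnosis to study entry
observed = time_from_diagnosis > entry                 # only subjects alive at entry are seen
entry, t = entry[observed], time_from_diagnosis[observed]
censor = rng.uniform(entry, entry + 8.0)               # censoring times after entry
exit_time = np.minimum(t, censor)
event = t <= censor

def truncated_km(entry, exit_time, event):
    # product-limit estimator with delayed entry: risk set requires entry < t <= exit
    surv, out = 1.0, []
    for tj in np.sort(np.unique(exit_time[event])):
        at_risk = np.sum((entry < tj) & (exit_time >= tj))
        deaths = np.sum(event & (exit_time == tj))
        surv *= 1.0 - deaths / at_risk
        out.append((tj, surv))
    return out

for tj, s in truncated_km(entry, exit_time, event)[::40]:
    print(f"t = {tj:6.2f}  S(t) = {s:.3f}")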
 
Article
In medical research, continuous variables are often converted into categorical variables by grouping values into two or more categories. We consider in detail issues pertaining to creating just two groups, a common approach in clinical research. We argue that the simplicity achieved is gained at a cost; dichotomization may create rather than avoid problems, notably a considerable loss of power and residual confounding. In addition, the use of a data-derived 'optimal' cutpoint leads to serious bias. We illustrate the impact of dichotomization of continuous predictor variables using as a detailed case study a randomized trial in primary biliary cirrhosis. Dichotomization of continuous data is unnecessary for statistical analysis and in particular should not be applied to explanatory variables in regression models.
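A small simulation in the spirit of the power argument (effect size, sample size and the use of the sample median as cutpoint are arbitrary choices of this sketch): the power to detect an association when the predictor is analysed as measured versus dichotomized at the median.

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, beta, reps, alpha = 100, 0.3, 5000, 0.05
power_cont = power_dich = 0

for _ in range(reps):
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)
    # continuous analysis: test the Pearson correlation
    power_cont += stats.pearsonr(x, y)[1] < alpha
    # dichotomized analysis: two-sample t-test across a median split
    high = x > np.median(x)
    power_dich += stats.ttest_ind(y[high], y[~high])[1] < alpha

print(f"power, continuous predictor: {power_cont / reps:.3f}")
print(f"power, median-dichotomized:  {power_dich / reps:.3f}")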
 
Article
In general, intraclass correlation coefficients (ICCs) are designed to assess consistency or conformity between two or more quantitative measurements. They are claimed to handle a wide range of problems, including questions of reliability, reproducibility and validity. It is shown that care must be taken in choosing a suitable ICC with respect to the underlying sampling theory. For this purpose a decision tree is developed. It may be used to choose a coefficient which is appropriate for a specific study setting. We demonstrate that different ICCs may result in quite different values for the same data set, even under the same sampling theory. Other general limitations of ICCs are also addressed. Potential alternatives are presented and discussed, and some recommendations are given for the use of an appropriate method.
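A compact numerical illustration of the point that different ICCs can differ markedly on the same data, using the Shrout and Fleiss single-rating forms ICC(1,1), ICC(2,1) and ICC(3,1) as the example coefficients; the simulated ratings include a systematic rater effect, which the three sampling models treat differently.

import numpy as np

rng = np.random.default_rng(8)
n, k = 20, 3                                        # targets (subjects) x raters
subject = rng.normal(0.0, 2.0, size=(n, 1))
rater_bias = np.array([[-1.0, 0.0, 1.5]])           # fixed differences between raters
X = subject + rater_bias + rng.normal(0.0, 1.0, size=(n, k))

grand = X.mean()
bms = k * np.sum((X.mean(axis=1) - grand) ** 2) / (n - 1)            # between-targets MS
jms = n * np.sum((X.mean(axis=0) - grand) ** 2) / (k - 1)            # between-raters MS
wms = np.sum((X - X.mean(axis=1, keepdims=True)) ** 2) / (n * (k - 1))  # within-targets MS
ems = (np.sum((X - grand) ** 2) - (n - 1) * bms - (k - 1) * jms) / ((n - 1) * (k - 1))  # residual MS

icc11 = (bms - wms) / (bms + (k - 1) * wms)
icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
icc31 = (bms - ems) / (bms + (k - 1) * ems)
print(f"ICC(1,1)={icc11:.3f}  ICC(2,1)={icc21:.3f}  ICC(3,1)={icc31:.3f}")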
 
Article
The 13C-urea breath test (UBT) is currently regarded as one of the most important noninvasive diagnostic methods for detecting Helicobacter pylori (H. pylori) infection in adults and children. However, for infants and young children, the standard for UBT interpretation has not been validated, and its reliability for diagnosing H. pylori infection in this group has not been established. The primary outcome data from the UBT consist of mixture data, which come from subjects whose H. pylori infection classifications are unconfirmed. In this paper, we propose a finite mixture distribution method to identify a reliable UBT cut-off value in a large baseline sample in which gastric biopsy is not available to confirm H. pylori infection in younger children. Maximum likelihood estimators of the parameters in the mixture model were obtained using an expectation maximization (EM) algorithm. The standard deviation of the cut-off point was estimated by bootstrap methods. We applied the same analytical methods to the UBT results from the follow-up, as well as to the overall UBT results in the longitudinal cohort data. The cut-off points from those UBT data sets are similar. The advantage of the finite mixture model is that it may be used to calculate sensitivity and specificity in the absence of other diagnostic tests.
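The general idea, though not the authors' exact model, can be sketched with a two-component normal mixture fitted by EM, taking as the cut-off the point where the posterior probability of the upper (infected) component crosses 0.5. The simulated values and the 0.5 rule are assumptions of this illustration.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
# hypothetical breath-test values on a log scale: uninfected and infected components
negatives = rng.normal(0.5, 0.4, 800)
positives = rng.normal(2.5, 0.6, 200)
x = np.concatenate([negatives, positives]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(x)   # fitted by EM
upper = int(np.argmax(gm.means_.ravel()))                     # index of the 'infected' component

grid = np.linspace(x.min(), x.max(), 2000).reshape(-1, 1)
post_upper = gm.predict_proba(grid)[:, upper]
cutoff = grid[np.argmax(post_upper >= 0.5)][0]                # first grid point past 0.5
print(f"estimated cut-off = {cutoff:.2f}")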
 
Article
Armstrong and Sloan have reviewed two types of ordinal logistic models for epidemiologic data: the cumulative-odds model and the continuation-ratio model. I review here certain aspects of these models not emphasized previously, and describe a third type, the stereotype model, which in certain situations offers greater flexibility coupled with interpretational advantages. I illustrate the models in an analysis of pneumoconiosis among coal miners.
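For reference, the three models discussed are commonly written as follows, for an ordinal response Y with levels 0, ..., K and covariate vector x (one common parameterization; sign conventions and the ordering constraint on the stereotype scores vary between authors):

\begin{align*}
\text{cumulative odds:}\quad &
  \operatorname{logit} \Pr(Y \le k \mid x) = \alpha_k - \beta^{\top} x,\\
\text{continuation ratio:}\quad &
  \operatorname{logit} \Pr(Y = k \mid Y \ge k, x) = \alpha_k + \beta_k^{\top} x,\\
\text{stereotype:}\quad &
  \log \frac{\Pr(Y = k \mid x)}{\Pr(Y = 0 \mid x)} = \alpha_k + \phi_k\, \beta^{\top} x,
  \qquad 1 = \phi_K \ge \dots \ge \phi_1 \ge \phi_0 = 0.
\end{align*}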
 
Article
Conventionally a confidence interval (CI) for the standardized mortality ratio is set using the conservative CI for a Poisson expectation, mu. Employing the mid-P argument we present alternative CIs that are shorter than the conventional ones. The mid-P intervals do not guarantee the nominal confidence level, but the true coverage probability is only lower than the nominal level for a few short ranges of mu. The implications for mid-P confidence intervals of various proposed definitions of two-sided tests for discrete data are discussed.
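The mid-P limits can be obtained numerically alongside the conventional exact limits; the sketch below does so for a hypothetical observed count and expected value, dividing by the expectation to give SMR intervals (the observed and expected counts are assumptions of this illustration).

from scipy.stats import poisson
from scipy.optimize import brentq

def exact_ci(x, alpha=0.05):
    # conventional exact limits: P(X >= x | lo) = alpha/2, P(X <= x | hi) = alpha/2
    lo = 0.0 if x == 0 else brentq(lambda m: poisson.sf(x - 1, m) - alpha / 2, 1e-10, 10 * x + 10)
    hi = brentq(lambda m: poisson.cdf(x, m) - alpha / 2, 1e-10, 10 * x + 20)
    return lo, hi

def midp_ci(x, alpha=0.05):
    # mid-P limits: halve the probability of the observed count in each tail
    lo = 0.0 if x == 0 else brentq(
        lambda m: poisson.sf(x, m) + 0.5 * poisson.pmf(x, m) - alpha / 2, 1e-10, 10 * x + 10)
    hi = brentq(
        lambda m: poisson.cdf(x - 1, m) + 0.5 * poisson.pmf(x, m) - alpha / 2, 1e-10, 10 * x + 20)
    return lo, hi

x, expected = 8, 5.2                      # observed and expected deaths (hypothetical)
for name, (lo, hi) in [("exact", exact_ci(x)), ("mid-P", midp_ci(x))]:
    print(f"{name}: mu in ({lo:.2f}, {hi:.2f}),  SMR in ({lo/expected:.2f}, {hi/expected:.2f})")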
 
Top-cited authors
Frank E Harrell
  • Vanderbilt University
Kerry L Lee
  • Duke University
Patrick Royston
  • University College London
Peter Austin
  • Institute for Clinical Evaluative Sciences
Angela Wood
  • University of Cambridge