ABSTRACT: We propose an extension of the expectation-maximization (EM) algorithm, called the hyperpenalized EM (HEM) algorithm, that maximizes a penalized log-likelihood, for which some data are missing or unavailable, using a data-adaptive estimate of the penalty parameter. This is potentially useful in applications for which the analyst is unable or unwilling to choose a single value of a penalty parameter but instead can posit a plausible range of values. The HEM algorithm is conceptually straightforward and also very effective, and we demonstrate its utility in the analysis of a genomic data set. Gene expression measurements and clinical covariates were used to predict survival time. However, many survival times are censored, and some observations only contain expression measurements derived from a different assay, which together constitute a difficult missing data problem. It is desired to shrink the genomic contribution in a data-adaptive way. The HEM algorithm successfully handles both the missing data and shrinkage aspects of the problem.
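To make the penalized-EM idea concrete, here is a toy numerical sketch, not the authors' HEM algorithm: a ridge regression with missing responses, where the E-step imputes the missing y values from the current fit and the M-step re-fits the ridge solution with the penalty re-chosen each iteration from a plausible grid. The function name `em_ridge`, the use of generalized cross-validation (GCV) as the data-adaptive criterion, and the grid search are all assumptions of this sketch.

```python
import numpy as np

def em_ridge(X, y, lam_grid, n_iter=50):
    """Toy EM for ridge regression with missing responses (NaN in y).

    The penalty parameter is re-chosen each iteration from a plausible
    grid by minimizing GCV on the completed data -- an illustrative
    stand-in for HEM's data-adaptive penalty, not the HEM criterion.
    """
    miss = np.isnan(y)
    y_full = y.copy()
    y_full[miss] = np.nanmean(y)          # crude initial imputation
    n, p = X.shape
    beta, lam = np.zeros(p), lam_grid[0]
    for _ in range(n_iter):
        # E-step: replace missing responses with current fitted values
        y_full[miss] = X[miss] @ beta
        # M-step: ridge fit, with lambda chosen by GCV over the grid
        best = None
        for l in lam_grid:
            H = X @ np.linalg.solve(X.T @ X + l * np.eye(p), X.T)
            resid = y_full - H @ y_full
            gcv = n * (resid @ resid) / (n - np.trace(H)) ** 2
            if best is None or gcv < best[0]:
                best = (gcv, l)
        lam = best[1]
        beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y_full)
    return beta, lam
```

Note the GCV step is computed on completed data, in which imputed responses sit exactly on the fitted surface, so this toy version tends to favor lighter penalties than a criterion built on the observed data alone would.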
Statistics in Biosciences 06/2015; DOI:10.1007/s12561-015-9132-x
ABSTRACT: We consider the problem of using permutation-based methods to test for treatment-covariate interactions from randomized clinical trial data. Testing for interactions is common in the field of personalized medicine, as subgroups with enhanced treatment effects arise when treatment-by-covariate interactions exist. Asymptotic tests can often be performed for simple models, but in many cases more complex methods are used to identify subgroups and non-standard test statistics are proposed, so asymptotic results may be difficult to obtain. In such cases, it is natural to consider permutation-based tests, which shuffle selected parts of the data in order to remove one or more associations of interest; however, in the case of interactions, it is generally not possible to remove only the associations of interest by simple permutations of the data. We propose a number of alternative permutation-based methods designed to remove only the associations of interest while preserving the other associations. These methods estimate the interaction term in a model, then create data that "looks like" the original data except that the interaction term has been permuted. The proposed methods are shown to outperform traditional permutation methods in a simulation study. In addition, the proposed methods are illustrated using data from a randomized clinical trial of patients with hypertension.
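The difficulty the abstract describes, permuting away the interaction while keeping the main effects, can be illustrated with a classical residual-permutation scheme in the same spirit (a Freedman-Lane-style construction; the paper's own proposals differ and are not reproduced here). The reduced main-effects model is fitted, and its residuals are permuted so that the z and x main effects are preserved while any interaction association is broken:

```python
import numpy as np

def interaction_stat(y, z, x):
    """|t|-type statistic for the z*x interaction in a linear model."""
    D = np.column_stack([np.ones_like(y), z, x, z * x])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    sigma2 = resid @ resid / (len(y) - D.shape[1])
    cov = sigma2 * np.linalg.inv(D.T @ D)
    return abs(coef[3]) / np.sqrt(cov[3, 3])

def interaction_perm_test(y, z, x, n_perm=500, seed=0):
    """Residual-permutation p-value for the interaction: residuals from
    the reduced (main-effects-only) model are shuffled, preserving the
    z and x main effects while breaking the interaction association."""
    rng = np.random.default_rng(seed)
    D0 = np.column_stack([np.ones_like(y), z, x])   # reduced model
    coef0, *_ = np.linalg.lstsq(D0, y, rcond=None)
    fitted = D0 @ coef0
    resid = y - fitted
    obs = interaction_stat(y, z, x)
    hits = 0
    for _ in range(n_perm):
        y_perm = fitted + rng.permutation(resid)
        hits += interaction_stat(y_perm, z, x) >= obs
    return (hits + 1) / (n_perm + 1)
```

A naive alternative, shuffling the raw data, would also destroy the main effects, which is precisely the problem the abstract raises.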
Statistics in Biosciences 03/2015; DOI:10.1007/s12561-015-9125-9
ABSTRACT: Objective: Given the long natural history of prostate cancer, we assessed differing graphical formats for imparting knowledge about the longitudinal risks of prostate cancer recurrence with or without 'hormone' or 'androgen deprivation' therapy.
Methods: Male volunteers without a history of prostate cancer were randomized to 1 of 8 risk communication instruments that depicted the likelihood of prostate cancer returning or spreading over 1, 2, and 3 years. The tools differed in format (line, pie, bar, or pictograph) and in whether the graph included no numbers, 1 number (indicating the number of affected individuals), or 2 numbers (indicating both the number affected and the number unaffected). The main outcome variables evaluated were graphical preference and knowledge.
Results: A total of 420 men were recruited. Respondents were least familiar and experienced with pictographs (P < 0.0001), and only 10% preferred this particular format. Overall accuracy ranged from 79% to 92%, and when assessed across all graphical subtypes, the addition of numerical information did not improve verbatim knowledge (P = 0.1). Self-reported numeracy was a strong predictor of accuracy of responses (odds ratio [OR] = 2.6, P = 0.008), and the impact of high numeracy varied across graphical type, having a greater impact on line (OR = 5.1; 95% confidence interval [CI] = 1.6-16; P = 0.04) and pie charts (OR = 7.1; 95% CI = 2.6-19; P = 0.01), without an impact on pictographs (OR = 0.4; 95% CI = 0.1-1.7; P = 0.17) or bar charts (OR = 0.5; 95% CI = 0.1-1.8; P = 0.24).
Conclusions: For longitudinal presentation of risk, baseline numeracy was strongly prognostic for outcome. However, the addition of numbers to risk graphs improved the delivery of verbatim knowledge only for subjects with lower numeracy. Although subjects reported the least familiarity with pictographs, pictographs were among the most effective means of transferring information regardless of numeracy.
Medical Decision Making 10/2014; 35(1). DOI:10.1177/0272989X14551639
ABSTRACT: Because of the time and expense required to obtain clinical outcomes of interest, such as functional limitations or death, clinical trials often focus on the effects of treatment on earlier and more easily obtained surrogate markers. Preliminary work to define surrogates focused on the fraction of a treatment effect "explained" by a marker in a regression model, but as notions of causality have been formalized in the statistical setting, formal definitions of high-quality surrogate markers have been developed in the causal inference framework, using either the "causal effect" or "causal association" setting. In the causal effect setting, a high-quality surrogate marker is one for which a large fraction of the total treatment effect is explained by the effect of the treatment on the marker, net of the direct effect of the treatment on the outcome. In the causal association setting, high-quality surrogate markers have large treatment effects on the outcome when there are large treatment effects on the marker, and small effects on the outcome when there are small effects on the marker. A particularly important feature of a surrogate marker is that the direction of the treatment effect be the same for both the marker and the outcome. Settings in which the marker and outcome are positively associated, yet the treatment effect is beneficial for one and harmful for the other, have been referred to as "surrogate paradoxes". If this discordance occurred in every trial, it would be readily detected and hence less problematic; however, as correlations among the outcome, the marker, and their treatment effects weaken, it may occur for some trials and not for others, leading to potentially incorrect conclusions, and real-life examples that shortened thousands of lives are unfortunately available.
We propose measures for assessing the risk of the surrogate paradox using the meta-analytic causal association framework. These measures allow us to focus on the probability that a given treatment will yield treatment effects in different directions for the marker and the outcome, and to determine the size of the beneficial effect of the treatment on the marker required to minimize the risk of a harmful effect of the treatment on the outcome. We provide simulations and consider two applications.
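A Monte-Carlo sketch of such a risk measure, under the simplifying assumption (not taken from the paper) that trial-level treatment effects on the marker and outcome follow a bivariate normal meta-analytic distribution, could look as follows; the function names and the 5% tolerance are illustrative choices:

```python
import numpy as np

def paradox_risk(mu, Sigma, n_draws=200_000, seed=0):
    """Monte-Carlo probability that trial-level treatment effects on the
    marker (beta_S) and outcome (beta_T) point in opposite directions,
    under an assumed bivariate-normal meta-analytic distribution."""
    rng = np.random.default_rng(seed)
    eff = rng.multivariate_normal(mu, Sigma, size=n_draws)
    beta_s, beta_t = eff[:, 0], eff[:, 1]
    return np.mean((beta_s > 0) & (beta_t < 0))

def required_marker_effect(mu, Sigma, tol=0.05, n_draws=200_000, seed=0):
    """Smallest marker effect s such that, among trials with beta_S > s,
    the fraction with a harmful outcome effect falls below tol."""
    rng = np.random.default_rng(seed)
    eff = rng.multivariate_normal(mu, Sigma, size=n_draws)
    beta_s, beta_t = eff[:, 0], eff[:, 1]
    for s in np.linspace(0, 3, 61):          # grid of candidate thresholds
        sel = beta_s > s
        if sel.any() and np.mean(beta_t[sel] < 0) < tol:
            return s
    return np.inf
```

As the abstract suggests, weakening the correlation between the two treatment effects raises the discordance probability, while demanding a larger observed marker effect drives the conditional risk of a harmful outcome effect down.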
ABSTRACT: A recent article (Zhang et al., 2012, Biometrics 168, 1010–1018) compares regression-based and inverse-probability-based methods of estimating an optimal treatment regime and shows, for a small number of covariates, that inverse-probability-weighted methods are more robust to model misspecification than regression methods. We demonstrate that using models that fit the data better reduces the concern about non-robustness for the regression methods. We extend the simulation study of Zhang et al. (2012), also considering the situation of a larger number of covariates, and show that incorporating random forests into both the regression-based and the inverse-probability-weighted methods improves their properties.
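The regression-based template for estimating an optimal regime can be sketched as follows. For brevity this sketch uses plain least squares with treatment-covariate interactions as the outcome model; the paper's point is that a more flexible learner such as a random forest can be slotted into the same role, and the function names here are illustrative:

```python
import numpy as np

def fit_outcome_model(X, a, y):
    """Regression estimate of E[Y | X, A] for a binary treatment A.

    Least squares with treatment-covariate interactions is used for
    brevity; any flexible learner (e.g. a random forest) fits the same
    template of modeling the outcome given covariates and treatment.
    """
    D = np.column_stack([np.ones(len(y)), X, a, X * a[:, None]])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    def predict(Xnew, anew):
        Dn = np.column_stack([np.ones(len(anew)), Xnew, anew,
                              Xnew * anew[:, None]])
        return Dn @ coef
    return predict

def estimated_regime(predict, X):
    """Estimated optimal regime: treat when the predicted outcome under
    treatment exceeds the predicted outcome under control."""
    ones, zeros = np.ones(len(X)), np.zeros(len(X))
    return (predict(X, ones) > predict(X, zeros)).astype(int)
```

The robustness question in the abstract then amounts to how badly `fit_outcome_model` can be misspecified before `estimated_regime` degrades, which is where a flexible learner helps.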
ABSTRACT: With challenges in data harmonization and environmental heterogeneity across various data sources, meta-analysis of gene–environment interaction studies can often involve subtle statistical issues. In this paper, we study the effect of environmental covariate heterogeneity (within and between cohorts) on two approaches for fixed-effect meta-analysis: the standard inverse-variance weighted meta-analysis and a meta-regression approach. Akin to the results in Simmonds and Higgins (), we obtain analytic efficiency results for both methods under certain assumptions. The relative efficiency of the two methods depends on the ratio of within versus between cohort variability of the environmental covariate. We propose to use an adaptively weighted estimator (AWE), between meta-analysis and meta-regression, for the interaction parameter. The AWE retains the full efficiency of the joint analysis using individual-level data under certain natural assumptions. Lin and Zeng (, b) showed that a multivariate inverse-variance weighted estimator retains the full efficiency of a joint analysis using individual-level data if the estimates with full covariance matrices for all the common parameters are pooled across all studies. We show consistency of our work with Lin and Zeng (, b). Without sacrificing much efficiency, the AWE uses only univariate summary statistics from each study, and bypasses issues with sharing individual-level data or full covariance matrices across studies. We compare the performance of the methods both analytically and numerically. The methods are illustrated through meta-analysis of interaction between Single Nucleotide Polymorphisms in the FTO gene and body mass index on high-density lipoprotein cholesterol data from a set of eight studies of type 2 diabetes.
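The two ingredients being weighed against each other, standard inverse-variance weighted (IVW) meta-analysis and meta-regression on the cohort-mean covariate, can each be written in a few lines. This is a sketch of the two standard building blocks only, not of the AWE itself, and the function names are illustrative:

```python
import numpy as np

def ivw_meta(est, se):
    """Fixed-effect inverse-variance weighted pooled estimate and SE,
    combining study-specific estimates (within-cohort signal)."""
    est, w = np.asarray(est), 1.0 / np.asarray(se) ** 2
    pooled = np.sum(w * est) / np.sum(w)
    return pooled, np.sqrt(1.0 / np.sum(w))

def meta_regression(est, se, cohort_mean):
    """Weighted least-squares slope of study-level effect estimates on
    the cohort-mean environmental covariate (between-cohort signal)."""
    est, w = np.asarray(est), 1.0 / np.asarray(se) ** 2
    W = np.diag(w)
    D = np.column_stack([np.ones(len(est)), cohort_mean])
    coef = np.linalg.solve(D.T @ W @ D, D.T @ W @ est)
    return coef[1]
```

The AWE described in the abstract then combines two such estimates of the interaction parameter, with weights driven by the within- versus between-cohort variability of the covariate; the combination rule is the paper's contribution and is not reproduced here.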
ABSTRACT: In this paper, we consider the problem of constructing confidence intervals (CIs) for G independent normal population means subject to linear ordering constraints. For this problem, CIs based on asymptotic distributions, likelihood ratio tests and bootstraps do not have good properties, particularly when some of the population means are close to each other. We propose a new method based on defining intermediate random variables that are related to the original observations and using the CIs of the means of these intermediate random variables to restrict the original CIs from the separate groups. The coverage rates of the intervals are shown to exceed, but be close to, the nominal level for two groups, when the ratio of the variances is assumed known. Simulation studies show that the proposed CIs have coverage rates close to nominal levels with reduced average widths. Data on half-lives of an antibiotic are analyzed to illustrate the method.
ABSTRACT: In clinical trials, a surrogate outcome variable (S) can be measured before the outcome of interest (T) and may provide early information regarding the treatment (Z) effect on T. Using the principal surrogacy framework introduced by Frangakis and Rubin (2002. Principal stratification in causal inference. Biometrics 58, 21-29), we consider an approach that has a causal interpretation and develop a Bayesian estimation strategy for surrogate validation when the joint distribution of potential surrogate and outcome measures is multivariate normal. From the joint conditional distribution of the potential outcomes of T given the potential outcomes of S, we propose surrogacy validation measures. As the model is not fully identifiable from the data, we propose some reasonable prior distributions and assumptions that can be placed on weakly identified parameters to aid in estimation. We explore the relationship between our surrogacy measures and the surrogacy measures proposed by Prentice (1989. Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 8, 431-440). The method is applied to data from a macular degeneration study and an ovarian cancer study.