Trivellore E Raghunathan

Harvard Medical School, Boston, Massachusetts, United States

Publications (200) · 888.97 Total impact

  • Hanzhi Zhou · Michael R. Elliott · Trivellore E. Raghunathan
    ABSTRACT: Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in “Delta-V,” a key crash severity measure.
    No preview · Article · Jan 2016
  • Source
    Niko A Kaciroti · Trivellore E Raghunathan · Jeremy M G Taylor · Stevo Julius

    Full-text · Dataset · Jul 2015
  • Source
    Niko A Kaciroti · Trivellore E Raghunathan · Jeremy M G Taylor · Stevo Julius

    Full-text · Dataset · May 2015
  • Qi Dong · Michael R. Elliott · Trivellore E. Raghunathan
    ABSTRACT: This manuscript describes the use of multiple imputation to combine information from multiple surveys of the same underlying population. We use a newly developed method to generate synthetic populations nonparametrically using a finite population Bayesian bootstrap that automatically accounts for complex sample designs. We then analyze each synthetic population with standard complete-data software for simple random samples and obtain valid inference by combining the point and variance estimates using extensions of existing combining rules for synthetic data. We illustrate the approach by combining data from the 2006 National Health Interview Survey (NHIS) and the 2006 Medical Expenditure Panel Survey (MEPS).
    No preview · Article · Dec 2014 · Survey methodology
  • Source
    ABSTRACT: Fat-soluble vitamin (FSV) deficiency is a well-recognized consequence of cholestatic liver disease and reduced intestinal intraluminal bile acid. We hypothesized that serum bile acid (SBA) would predict biochemical FSV deficiency better than serum total bilirubin (TB) level in infants with biliary atresia. Infants enrolled in the Trial of Corticosteroid Therapy in Infants with Biliary Atresia after hepatoportoenterostomy were the subjects of this investigation. Infants received standardized FSV supplementation and monitoring of TB, SBA, and vitamin levels at 1, 3, and 6 months. A logistic regression model was used with the binary indicator variable insufficient/sufficient as the outcome variable. Linear and nonparametric correlations were made between specific vitamin measurement levels and either TB or SBA. The degree of correlation for any particular vitamin at a specific time point was higher with TB than with SBA (higher for TB in 31 circumstances vs 3 circumstances for SBA). Receiver operating characteristic curve shows that TB performed better than SBA (area under the curve 0.998 vs 0.821). Including both TB and SBA did not perform better than TB alone (area under the curve 0.998). We found that TB was a better predictor of FSV deficiency than SBA in infants with biliary atresia. The role of SBA as a surrogate marker of FSV deficiency in other cholestatic liver diseases, such as progressive familial intrahepatic cholestasis, α-1-antitrypsin deficiency, and Alagille syndrome in which the pathophysiology is dominated by intrahepatic cholestasis, warrants further study.
    Full-text · Article · Dec 2014 · Journal of Pediatric Gastroenterology and Nutrition
  • Source
    Niko A. Kaciroti · Trivellore Raghunathan
    ABSTRACT: Pattern-mixture models (PMM) and selection models (SM) are alternative approaches for statistical analysis when faced with incomplete data and a nonignorable missing-data mechanism. Both models make empirically unverifiable assumptions and need additional constraints to identify the parameters. Here, we first introduce intuitive parameterizations to identify PMM for different types of outcome with distribution in the exponential family; then we translate these to their equivalent SM approach. This provides a unified framework for performing sensitivity analysis under either setting. These new parameterizations are transparent, easy-to-use, and provide dual interpretation from both the PMM and SM perspectives. A Bayesian approach is used to perform sensitivity analysis, deriving inferences using informative prior distributions on the sensitivity parameters. These models can be fitted using software that implements Gibbs sampling. Copyright © 2014 John Wiley & Sons, Ltd.
    Full-text · Article · Nov 2014 · Statistics in Medicine
  • Source

    Full-text · Dataset · Oct 2014
  • ABSTRACT: To analyze the prevalence of acute asymptomatic group A and C rotavirus (RV-A and RV-C) infection in neonates with cholestasis. Participants were infants <180 days of age with cholestasis (serum direct or conjugated bilirubin >20% of total and ≥2 mg/dL) enrolled in the Childhood Liver Disease Research and Education Network during RV season (December-May). Forty infants with biliary atresia (BA), age 62 ± 29 days (range, 4.7-13 weeks) and 38 infants with cholestasis, age 67 ± 44 days (range, 3-15.8 weeks) were enrolled. At enrollment, RV-A IgM positivity rates did not differ between infants with BA (10%) vs those without (18%) (P = .349). RV-C IgM was positive in 0% of infants with BA vs 3% in those without BA (P = .49). RV-A IgG was lower in infants with BA: 51 ± 39 vs 56 ± 44 enzyme-linked immunoassay unit, P = .045 but this difference may lack biological relevance as maternal RV-A IgG titers were similar between groups. Infant RV-A IgM titers at 2-6 months follow-up increased markedly vs at presentation in both infants with BA (50 ± 30 vs 9 ± 9) and those without (43 ± 18 vs 16 ± 20 enzyme-linked immunoassay unit) (P < .0001), without differences between groups. RV-A infection in the first 6 months of life is common in infants with cholestasis of any cause. RV-A could have different pathogenetic effects by initiating different hepatic immune responses in infants with vs without BA or could lack pathogenetic significance. Copyright © 2014 Elsevier Inc. All rights reserved.
    No preview · Article · Oct 2014 · Journal of Pediatrics
  • Source
    ABSTRACT: BACKGROUND: Air pollution is linked to low lung function and to respiratory events, yet little is known of associations with lung structure. OBJECTIVES: We examined associations of particulate matter (PM2.5, PM10) and nitrogen oxides (NOx) with percent emphysema-like lung on computed tomography (CT). METHODS: The Multi-Ethnic Study of Atherosclerosis (MESA) recruited participants (45-84 years of age) in six U.S. states. Percent emphysema was defined as lung regions < -910 Hounsfield Units on cardiac CT scans acquired following a highly standardized protocol. Spirometry was also conducted on a subset. Individual-level 1- and 20-year average air pollution exposures were estimated using spatiotemporal models that included cohort-specific measurements. Multivariable regression was conducted to adjust for traditional risk factors and study location. RESULTS: Among 6,515 participants, we found evidence of an association between percent emphysema and long-term pollution concentrations in an analysis leveraging between-city exposure contrasts. Higher concentrations of PM2.5 (5 μg/m³) and NOx (25 ppb) over the previous year were associated with 0.6 (95% CI: 0.1, 1.2%) and 0.5 (95% CI: 0.1, 0.9%) higher average percent emphysema, respectively. However, after adjustment for study site the associations were -0.6% (95% CI: -1.5, 0.3%) for PM2.5 and -0.5% (95% CI: -1.1, 0.02%) for NOx. Lower lung function measures (FEV1 and FVC) were associated with higher PM2.5 and NOx levels in 3,791 participants before and after adjustment for study site, though most associations were not statistically significant. CONCLUSIONS: Associations between ambient air pollution and percentage of emphysema-like lung were inconclusive in this cross-sectional study, thus longitudinal analyses may better clarify these associations with percent emphysema.
    Full-text · Article · Oct 2014 · Environmental Health Perspectives
  • ABSTRACT: Prior studies suggest that circulating n-3 and trans-fatty acids influence the risk of sudden cardiac arrest (SCA). Yet, while other fatty acids also differ in their membrane properties and biological activities which may influence SCA, little is known about the associations of other circulating fatty acids with SCA. The aim of this study was to investigate the associations of 17 erythrocyte membrane fatty acids with SCA risk. We used data from a population-based case-control study of SCA in the greater Seattle, Washington, area. Cases, aged 25–74 years, were out-of-hospital SCA patients, attended by paramedics (n=265). Controls, matched to cases by age, sex and calendar year, were randomly identified from the community (n=415). All participants were free of prior clinically-diagnosed heart disease. Blood was obtained at the time of cardiac arrest by attending paramedics (cases) or at the time of an interview (controls). Higher levels of erythrocyte very long-chain saturated fatty acids (VLSFA) were associated with lower risk of SCA. After adjustment for risk factors and levels of n-3 and trans-fatty acids, higher levels of 20:0 corresponding to 1 SD were associated with 30% lower SCA risk (13%-43%, p=0.001). Higher levels of 22:0 and 24:0 were associated with similar lower SCA risk (ORs for 1 SD-difference: 0.71 [95% CI: 0.57–0.88, p=0.002] for 22:0; and 0.79 [95% CI: 0.63–0.98, p=0.04] for 24:0). These novel findings support the need for investigation of biologic effects of circulating VLSFA and their determinants.
    No preview · Article · Oct 2014 · Prostaglandins Leukotrienes and Essential Fatty Acids
  • Jian Zhu · Trivellore E. Raghunathan
    ABSTRACT: A sequential regression or chained equations imputation approach uses a Gibbs sampling type iterative algorithm which imputes the missing values using a sequence of conditional regression models. It is a flexible approach for handling different types of variables and complex data structures. Many simulation studies have shown that the multiple imputation inferences based on this procedure have desirable repeated sampling properties. However, a theoretical weakness of this approach is that the specification of a set of conditional regression models may not be compatible with a joint distribution of the variables being imputed. Hence, the convergence properties of the iterative algorithm are not well understood. This paper develops conditions for convergence and assesses the properties of inferences from both compatible and incompatible sequences of regression models. The results are established for the missing data pattern where each subject may be missing a value on at most one variable. The sequence of regression models is assumed to be an empirically good fit for the data, chosen by the imputer based on appropriate model diagnostics. The results are used to develop criteria for the choice of regression models.
    No preview · Article · Aug 2014 · Journal of the American Statistical Association
  • Qi Dong · Michael R. Elliott · Trivellore E. Raghunathan
    ABSTRACT: Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs.
    No preview · Article · Jun 2014 · Survey methodology
  • Joseph W. Sakshaug · Trivellore E. Raghunathan
    ABSTRACT: Small area statistics obtained from sample survey data provide a critical source of information used to study health, economic, and sociological trends. However, most large-scale sample surveys are not designed for the purpose of producing small area statistics. Moreover, data disseminators are prevented from releasing public-use microdata for small geographic areas for disclosure reasons, thus limiting the utility of the data they collect. This research evaluates a synthetic data method, intended for data disseminators, for releasing public-use microdata for small geographic areas based on complex sample survey data. The method replaces all observed survey values with synthetic (or imputed) values generated from a hierarchical Bayesian model that explicitly accounts for complex sample design features, including stratification, clustering, and sampling weights. The method is applied to restricted microdata from the National Health Interview Survey and synthetic data are generated for both sampled and non-sampled small areas. The analytic validity of the resulting small area inferences is assessed by direct comparison with the actual data, a simulation study, and a cross-validation study.
    No preview · Article · Apr 2014 · Journal of Applied Statistics
  • Joseph W. Sakshaug · Trivellore Raghunathan
    ABSTRACT: Small area estimates provide a critical source of information used to study local populations. Statistical agencies regularly collect data from small areas but are prevented from releasing detailed geographical identifiers in public-use data sets due to disclosure concerns. Alternative data dissemination methods used in practice include releasing summary/aggregate tables, suppressing detailed geographic information in public-use data sets, and accessing restricted data via Research Data Centers. This research examines an alternative method for disseminating microdata that contains more geographical details than are currently being released in public-use data files. Specifically, the method replaces the observed survey values with imputed, or synthetic, values simulated from a hierarchical Bayesian model. Confidentiality protection is enhanced because no actual values are released. The method is demonstrated using restricted data from the 2005-2009 American Community Survey. The analytic validity of the synthetic data is assessed by comparing small area estimates obtained from the synthetic data with those obtained from the observed data.
    No preview · Article · Apr 2013 · SSRN Electronic Journal
  • PT Baltrus · JW Lynch · SA Everson-Rose · TE Raghunathan · GA Kaplan
    ABSTRACT: PT, Race Ethnicity, Life-Course Socioeconomic Position, 2005.pdf
    No preview · Article · Jan 2013
  • Wei Chen · Debashis Ghosh · Trivellore E Raghunathan · Maxim Norkin · Daniel J Sargent · Gerold Bepler
    ABSTRACT: Providing personalized treatments designed to maximize benefits and minimize harms is of tremendous current medical interest. One problem in this area is the evaluation of the interaction between the treatment and other predictor variables. Treatment effects in subgroups having the same direction but different magnitudes are called quantitative interactions, whereas those having opposite directions in subgroups are called qualitative interactions (QIs). Identifying QIs is challenging because they are rare and usually unknown among many potential biomarkers. Meanwhile, subgroup analysis reduces the power of hypothesis testing and multiple subgroup analyses inflate the type I error rate. We propose a new Bayesian approach to search for QI in a multiple regression setting with adaptive decision rules. We consider various regression models for the outcome. We illustrate this method in two examples of phase III clinical trials. The algorithm is straightforward and easy to implement using existing software packages. We provide sample code in Appendix A. Copyright © 2012 John Wiley & Sons, Ltd.
    No preview · Article · Dec 2012 · Statistics in Medicine
  • Source
    Brisa N Sánchez · Meihua Wu · Trivellore E Raghunathan · Ana V Diez-Roux
    ABSTRACT: In many studies, it has been hypothesized that stress and its biologic consequences may contribute to disparities in rates of cardiovascular disease. However, understanding of the most appropriate statistical methods to analyze biologic markers of stress, such as salivary cortisol, remains limited. The authors explore the utility of various statistical methods in modeling daily cortisol profiles in population-based studies. They demonstrate that the proposed methods allow additional insight into the cortisol profile compared with commonly used summaries of the profiles based on raw data. For instance, one can gain insights regarding the shape of the population average curve, characterize the types of individual-level departures from the average curve, and better understand the relation between covariates and attained cortisol levels or slopes at various points of the day, in addition to drawing inferences regarding common features of the cortisol profile, such as the cortisol awakening response and the area under the curve. The authors compare the inference and interpretations drawn from these methods and use data collected as part of the Multi-Ethnic Study of Atherosclerosis to illustrate them.
    Preview · Article · Oct 2012 · American journal of epidemiology
  • Yulei He · Trivellore E. Raghunathan
    ABSTRACT: Multiple imputation has emerged as a popular approach to handling data sets with missing values. For incomplete continuous variables, imputations are usually produced using multivariate normal models. However, this approach might be problematic for variables with a strong non-normal shape, as it would generate imputations incoherent with actual distributions and thus lead to incorrect inferences. For non-normal data, we consider a multivariate extension of Tukey's gh distribution/transformation [38] to accommodate skewness and/or kurtosis and capture the correlation among the variables. We propose an algorithm to fit the incomplete data with the model and generate imputations. We apply the method to a national data set for hospital performance on several standard quality measures, which are highly skewed to the left and substantially correlated with each other. We use Monte Carlo studies to assess the performance of the proposed approach. We discuss possible generalizations and give some advice to practitioners on how to handle non-normal incomplete data.
    No preview · Article · Oct 2012 · Journal of Applied Statistics
  • Source
    ABSTRACT: Background Accurate estimates of hypertension prevalence are critical for assessment of population health and for planning and implementing prevention and health care programs. While self-reported data is often more economically feasible and readily available compared to clinically measured HBP, these reports may underestimate clinical prevalence to varying degrees. Understanding the accuracy of self-reported data and developing prediction models that correct for underreporting of hypertension in self-reported data can be critical tools in the development of more accurate population level estimates, and in planning population-based interventions to reduce the risk of, or more effectively treat, hypertension. This study examines the accuracy of self-reported survey data in describing prevalence of clinically measured hypertension in two racially and ethnically diverse urban samples, and evaluates a mechanism to correct self-reported data in order to more accurately reflect clinical hypertension prevalence. Methods We analyze data from the Detroit Healthy Environments Partnership (HEP) Survey conducted in 2002 and the National Health and Nutrition Examination (NHANES) 2001–2002 restricted to urban areas and participants 25 years and older. We re-calibrate measures of agreement within the HEP sample drawing upon parameter estimates derived from the NHANES urban sample, and assess the quality of the adjustment proposed within the HEP sample. Results Both self-reported and clinically assessed prevalence of hypertension were higher in the HEP sample (29.7 and 40.1, respectively) compared to the NHANES urban sample (25.7 and 33.8, respectively). In both urban samples, self-reported and clinically assessed prevalence is higher than that reported in the full NHANES sample in the same year (22.9 and 30.4, respectively). 
Sensitivity, specificity and accuracy between clinical and self-reported hypertension prevalence were ‘moderate to good’ within the HEP sample and ‘good to excellent’ within the NHANES sample. Agreement between clinical and self-reported hypertension prevalence was ‘moderate to good’ within the HEP sample (kappa =0.65; 95% CI = 0.63-0.67), and ‘good to excellent’ within the NHANES sample (kappa = 0.75; 95%CI = 0.73-0.80). Application of a ‘correction’ rule based on prediction models for clinical hypertension using the national sample (NHANES) allowed us to re-calibrate sensitivity and specificity estimates for the HEP sample. The adjusted estimates of hypertension in the HEP sample based on two different correction models, 38.1% and 40.5%, were much closer to the observed hypertension prevalence of 40.1%. Conclusions Application of a simple prediction model derived from national NHANES data to self-reported data from the HEP (Detroit based) sample resulted in estimates that more closely approximated clinically measured hypertension prevalence in this urban community. Similar correction models may be useful in obtaining more accurate estimates of hypertension prevalence in other studies that rely on self-reported hypertension.
    Preview · Article · Sep 2012 · BMC Health Services Research
  • Source
    ABSTRACT: Cholestasis predisposes to fat-soluble vitamin (FSV) deficiencies. A liquid multiple FSV preparation made with tocopheryl polyethylene glycol-1000 succinate (TPGS) is frequently used in infants with biliary atresia (BA) because of ease of administration and presumed efficacy. In this prospective multicenter study, we assessed the prevalence of FSV deficiency in infants with BA who received this FSV/TPGS preparation. Infants received FSV/TPGS coadministered with additional vitamin K as routine clinical care in a randomized double-blinded, placebo-controlled trial of corticosteroid therapy after hepatoportoenterostomy (HPE) for BA (identifier NCT 00294684). Levels of FSV, retinol binding protein, total serum lipids, and total bilirubin (TB) were measured 1, 3, and 6 months after HPE. Ninety-two infants with BA were enrolled in this study. Biochemical evidence of FSV insufficiency was common at all time points for vitamin A (29%-36% of patients), vitamin D (21%-37%), vitamin K (10%-22%), and vitamin E (16%-18%). Vitamin levels were inversely correlated with serum TB levels. Biochemical FSV insufficiency was much more common (15%-100% for the different vitamins) in infants whose TB was ≥2 mg/dL. At 3 and 6 months post HPE, only 3 of 24 and 0 of 23 infants, respectively, with TB >2 mg/dL were sufficient in all FSV. Biochemical FSV insufficiency is commonly observed in infants with BA and persistent cholestasis despite administration of a TPGS containing liquid multiple FSV preparation. Individual vitamin supplementation and careful monitoring are warranted in infants with BA, especially those with TB >2 mg/dL.
    Full-text · Article · Aug 2012 · PEDIATRICS

Publication Stats

9k Citations
888.97 Total Impact Points


  • 2014
    • Harvard Medical School
      • Department of Medicine
      Boston, Massachusetts, United States
  • 1995-2012
    • University of Michigan
      • Department of Biostatistics
      • Institute for Social Research
      Ann Arbor, Michigan, United States
  • 2009
    • Concordia University–Ann Arbor
      Ann Arbor, Michigan, United States
  • 2008
    • Portland State University
      • School of Community Health
      Portland, OR, United States
  • 2007
    • Salt Lake City Community College
      Salt Lake City, Utah, United States
    • Queensland University of Technology
      Brisbane, Queensland, Australia
    • Duke University
      Durham, North Carolina, United States
  • 1988-2006
    • University of Washington Seattle
      • Department of Medicine
      • Department of Biostatistics
      Seattle, Washington, United States
  • 2005
    • Morehouse School of Medicine
      • National Center for Primary Care
      Atlanta, Georgia, United States
    • Columbia University
      • Department of Political Science
      New York, New York, United States
  • 2001-2005
    • University of North Carolina at Chapel Hill
      • Department of Epidemiology
      Chapel Hill, NC, United States
  • 2002
    • Leiden University Medical Centre
      • Department of Clinical Epidemiology
      Leiden, South Holland, Netherlands
  • 1998
    • Kaiser Permanente
      Oakland, California, United States
  • 1997
    • Leiden University
      Leiden, South Holland, Netherlands