
Roderick J Little
- Ph.D.
- University of Michigan
About
166
Publications
23,719
Reads
25,486
Citations
Publications (166)
Accidents are a leading cause of deaths in U.S. active duty personnel. Understanding accident deaths during wartime could facilitate future operational planning and inform risk prevention efforts. This study expands prior research, identifying health risk factors associated with U.S. Army accident deaths during the Afghanistan and Iraq war.
Militar...
When sample sizes are small, a useful alternative approach to maximum likelihood (ML) estimation is to add a prior distribution for the parameters and compute the posterior distribution of the parameters of interest. As with ML estimation with a general pattern of missing values, Bayes simulation requires iteration. The iterative simulation methods discussed...
Missing values in predictors are a common problem in survival analysis. In this paper, we review estimation methods for accelerated failure time models with missing predictors, and apply a new method called subsample ignorable likelihood (IL) (Little and Zhang, J R Stat Soc 60:591-605, 2011) to this class of models. The approach applies a likelihood...
Imputations are means or draws from a predictive distribution of the missing values, and require a method of creating a predictive distribution for the imputation based on the observed data. There are two generic approaches to generating this distribution: Explicit modeling: the predictive distribution is based on a formal statistical model, and he...
The estimate is computed as part of the Newton-Raphson algorithm for Maximum Likelihood (ML) estimation, and computed as part of the scoring algorithm. This chapter considers methods for computing standard errors that do not require computation and inversion of an information matrix. Another method for calculating large-sample covariance matrices i...
This chapter considers alternative distributions to the t distribution for robust inference, and robust inference for multivariate data sets with missing values. It describes a general mixture model for robust estimation of a univariate sample that includes the t and contaminated normal distributions as special cases. The case of multivariate data...
This article summarizes recommendations on the design and conduct of clinical trials of a National Research Council study on missing data in clinical trials. Key findings of the study are that (a) substantial missing data is a serious problem that undermines the scientific credibility of causal conclusions from clinical trials; (b) the assumption t...
Missing data in clinical trials can have a major effect on the validity of the inferences that can be drawn from the trial. This article reviews methods for preventing missing data and, failing that, dealing with data that are missing.
Covariate measurement error is common in epidemiologic studies. Current methods for correcting measurement error with information from external calibration samples are insufficient to provide valid adjusted inferences. We consider the problem of estimating the regression of an outcome Y on covariates X and Z, where Y and Z are observed, X is unobse...
Gene sequences are routinely used to determine the topologies of unrooted phylogenetic trees, but many of the most important questions in evolution require knowing both the topologies and the roots of trees. However, general algorithms for calculating rooted trees from gene and genomic sequences in the absence of gene paralogs are few. Using the pr...
Summary: We consider the linear regression of outcome Y on regressors W and Z with some values of W missing, when our main interest is the effect of Z on Y, controlling for W. Two common approaches to regression with missing covariates are (i) complete-case analysis (CC), which discards the incomplete cases, and (ii) ignorable likelihood methods,...
This pragmatic randomized trial evaluated the effectiveness of a tailored educational intervention on oral health behaviors and new untreated carious lesions in low-income African-American children in Detroit, Michigan.
Participating families were recruited in a longitudinal study of the determinants of dental caries in 1021 randomly selected child...
Rejoinder of "Calibrated Bayes, for Statistics in General, and Missing Data
in Particular" by R. Little [arXiv:1108.1917]
We consider the estimation of the regression of an outcome Y on a covariate X, where X is unobserved, but a variable W that measures X with error is observed. A calibration sample that measures pairs of values of X and W is also available; we consider calibration samples where Y is measured (internal calibration) and not measured (external calibrat...
It is argued that the Calibrated Bayesian (CB) approach to statistical
inference capitalizes on the strength of Bayesian and frequentist approaches to
statistical inference. In the CB approach, inferences under a particular model
are Bayesian, but frequentist methods are useful for model development and
model checking. In this article the CB approa...
Two common approaches to regression with missing covariates are complete-case analysis and ignorable likelihood methods. We review these approaches and propose a hybrid class, called subsample ignorable likelihood methods, which applies an ignorable likelihood method to the subsample of observations that are complete on one set of variables, but po...
In this paper, the authors describe a simple method for making longitudinal comparisons of alternative markers of a subsequent event. The method is based on the aggregate prediction gain from knowing whether or not a marker has occurred at any particular age. An attractive feature of the method is the exact decomposition of the measure into 2 compo...
We consider assessment of nonresponse bias for the mean of a survey variable Y subject to nonresponse. We assume that there is a set of covariates observed for nonrespondents and respondents. To reduce dimensionality and for simplicity we reduce the covariates to a proxy variable X that has the highest correlation with Y, estimated from a regress...
In clinical trials, a biomarker (S) that is measured after randomization and is strongly associated with the true endpoint (T) can often provide information about T and hence the effect of a treatment (Z) on T. A useful biomarker can be measured earlier than T and cost less than T. In this article, we consider the use of S as an auxiliary variabl...
In their valuable commentary, Drs. Ghosh and Castle (1) reinforce the points made in our article (2). Specifically, they emphasize the utility of combining measures of prevalence and predictive ability and show how the idea applies to another important epidemiologic measure, population attributable risk. They also describe applications of these ide...
Two major ideas in the analysis of missing data are (a) the EM algorithm
[Dempster, Laird and Rubin, J. Roy. Statist. Soc. Ser. B 39 (1977) 1--38] for
maximum likelihood (ML) estimation, and (b) the formulation of models for the
joint distribution of the data ${Z}$ and missing data indicators ${M}$, and
associated "missing at random" (MAR) conditi...
We propose a regression-based hot-deck multiple imputation method for gaps of missing data in longitudinal studies, where subjects experience a recurrent event process and a terminal event. Examples are repeated asthma episodes and death, or menstrual periods and menopause, as in our motivating application. Research interest concerns the onset time...
In longitudinal studies of developmental and disease processes, participants are followed prospectively with intermediate milestones identified as they occur. Frequently, studies enroll participants over a range of ages including ages at which some participants' milestones have already passed. Ages at milestones that occur prior to study entry are...
The Internet provides us with tools (user metrics or paradata) to evaluate how users interact with online interventions. Analysis of these paradata can lead to design improvements.
The objective was to explore the qualities of online participant engagement in an online intervention. We analyzed the paradata in a randomized controlled trial of alter...
This work is motivated by a quantitative Magnetic Resonance Imaging study of the differential tumor/healthy tissue change in contrast uptake induced by radiation. The goal is to determine the time in which there is maximal contrast uptake (a surrogate for permeability) in the tumor relative to healthy tissue. A notable feature of the data is its sp...
Disclosure limitation is an important consideration in the release of public use data sets. It is particularly challenging for longitudinal data sets, since information about an individual accumulates over time. We consider problems created by high ages in cohort studies. Because of the risk of disclosure, ages of very old respondents can often not...
Disclosure limitation is an important consideration in the release of public use data sets. It is particularly challenging for longitudinal data sets, since information about an individual accumulates with repeated measures over time. Research on disclosure limitation methods for longitudinal data has been very limited. We consider here problems cr...
The goal of the present study was to quantify the population-based background serum concentrations of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) by using data from the reference population of the 2005 University of Michigan Dioxin Exposure Study (UMDES) and the 2003-2004 National Health and Nutrition Examination Survey (NHANES).
Multiple imputation...
Raw data on the relationship between known and measured values of an analyte are collected and analyzed to determine the limit of quantification (LOQ) of an assay. In most LOQ problems, the researcher is given an observed value for the marker of interest if this value is greater than the LOQ, and a missing value (<LOQ) otherwise. From a statistical...
Repeated neuropsychological measurements, such as mini-mental state examination (MMSE) scores, are frequently used in Alzheimer’s disease (AD) research to study change in cognitive function of AD patients. A question of interest among dementia researchers is whether some AD patients exhibit transient “plateaus” of cognitive function in the course o...
Data analysis for randomized trials including multi-treatment arms is often complicated by subjects who do not comply with their treatment assignment. We discuss here methods of estimating treatment efficacy for randomized trials involving multi-treatment arms subject to non-compliance. One treatment effect of interest in the presence of non-compli...
Hot deck imputation is a method for handling missing data in which each missing value is replaced with an observed response from a “similar” unit. Despite being used extensively in practice, the theory is not as well developed as that of other imputation methods. We have found that no consensus exists as to the best way to apply the hot deck and ob...
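As an illustration of the basic idea described in the abstract above (each missing value replaced by an observed response from a "similar" unit), here is a minimal sketch. It is not the authors' method: it assumes that "similar" means "in the same adjustment cell" and that donors are drawn at random within the cell. The function name `hot_deck_impute` and the field names are hypothetical.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, cell_key, target, seed=0):
    """Return copies of `records` with missing `target` values filled in.

    Assumption for illustration: a "similar" unit is any unit in the same
    adjustment cell (same value of `cell_key`), and the donor is chosen
    at random from the observed values in that cell.
    """
    rng = random.Random(seed)

    # Collect observed (donor) values within each adjustment cell.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[rec[cell_key]].append(rec[target])

    imputed = []
    for rec in records:
        new_rec = dict(rec)
        if new_rec[target] is None:
            pool = donors.get(new_rec[cell_key])
            if pool:  # leave the value missing if the cell has no donors
                new_rec[target] = rng.choice(pool)
        imputed.append(new_rec)
    return imputed


sample = [
    {"cell": "A", "income": 30},
    {"cell": "A", "income": 50},
    {"cell": "A", "income": None},
    {"cell": "B", "income": 70},
    {"cell": "B", "income": None},
    {"cell": "C", "income": None},  # no donors available in cell C
]
completed = hot_deck_impute(sample, "cell", "income")
```

Because the donor is a real observed value, imputed values always lie in the observed range of the cell, which is one practical appeal of the hot deck. This sketch ignores sampling weights and does not propagate imputation uncertainty.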
This work is motivated by a quantitative Magnetic Resonance Imaging study of the relative change in tumor vascular permeability during the course of radiation therapy. The differences in tumor and healthy brain tissue physiology and pathology constitute a notable feature of the image data: spatial heterogeneity with respect to its contrast uptake pr...
Asthma is a serious problem for low-income preteens living in disadvantaged communities. Among the chronic diseases of childhood and adolescence, asthma has the highest prevalence and related health care use. School-based asthma interventions have proven successful for older and younger students, but results have not been demonstrated for those in...
The objective of this study was to evaluate the existence of cognitive plateaus in some individuals during the course of Alzheimer's disease (AD).
Data came from the historical patient group collected via the Consortium to Establish a Registry for Alzheimer's Disease (CERAD, Duke University, 1988-1996). Data reduction was performed by using princip...
A common strategy for handling item nonresponse in survey sampling is hot deck imputation, where each missing value is replaced with an observed response from a "similar" unit. We discuss here the use of sampling weights in the hot deck. The naive approach is to ignore sample weights in creation of adjustment cells, which effectively imputes the un...
Little and An (2004, Statistica Sinica 14, 949-968) proposed a penalized spline of propensity prediction (PSPP) method of imputation of missing values that yields robust model-based inference under the missing at random assumption. The propensity score for a missing variable is estimated and a regression model is fitted that includes the spline of...
Parametric model-based regression imputation is commonly applied to missing-data problems, but is sensitive to misspecification of the imputation model. Little and An (2004) proposed a semiparametric approach called penalized spline propensity prediction (PSPP), where the variable with missing values is modeled by a penalized spline (P-Spline) of t...
Selection models and pattern-mixture models are often used to deal with nonignorable dropout in longitudinal studies. These two classes of models are based on different factorizations of the joint distribution of the outcome process and the dropout process. We consider a new class of models, called mixed-effect hybrid models (MEHMs), where the join...
Health behavior intervention studies have focused primarily on comparing new programs and existing programs via randomized controlled trials. However, numbers of possible components (factors) are increasing dramatically as a result of developments in science and technology (e.g., Web-based surveys). These changes dictate the need for alternative me...
Quantitative Magnetic Resonance Imaging (qMRI) provides researchers insight into pathological and physiological alterations of living tissue, with the help of which researchers hope to predict (local) therapeutic efficacy early and determine optimal treatment schedule. However, the analysis of qMRI has been limited to ad-hoc heuristic methods. Our...
Consider a meta-analysis of studies with varying proportions of patient-level missing data, and assume that each primary study has made certain missing data adjustments so that the reported estimates of treatment effect size and variance are valid. These estimates of treatment effects can be combined across studies by standard meta-analytic methods...
Quantitative Magnetic Resonance Imaging (qMRI) provides researchers insight into pathological and physiological alterations of living tissue, with the help of which, researchers hope to predict (local) therapeutic efficacy early and determine optimal treatment schedule. However, the analysis of qMRI has been limited to ad-hoc heuristic methods. Our...
Although the randomized, controlled trial (RCT) is considered the gold standard in research for determining the efficacy of health education interventions, such trials may be vulnerable to "preference effects"; that is, differential outcomes depending on whether an individual is randomized to his or her preferred treatment. In this study, we review...
We consider the analysis of clinical trials that involve randomization to an active treatment (T = 1) or a control treatment (T = 0), when the active treatment is subject to all-or-nothing compliance. We compare three approaches to estimating treatment efficacy in this situation: as-treated analysis, per-protocol analysis, and instrumental variable...
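The three approaches named in the abstract above can be contrasted on a toy data set. The sketch below uses the textbook forms of the as-treated, per-protocol, and instrumental variable (Wald ratio) estimators; it is an assumption that these match the definitions used in the paper, whose abstract is truncated here. The function name and the data are hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def compare_estimators(data):
    """data: list of (z, d, y) with z = assigned arm, d = treatment received,
    y = outcome. Returns textbook versions of three estimators.
    """
    # As-treated: compare by treatment actually received, ignoring assignment.
    as_treated = (mean(y for z, d, y in data if d == 1)
                  - mean(y for z, d, y in data if d == 0))

    # Per-protocol: keep only subjects who received what they were assigned.
    per_protocol = (mean(y for z, d, y in data if z == 1 and d == 1)
                    - mean(y for z, d, y in data if z == 0 and d == 0))

    # Instrumental variable (Wald) estimator: intention-to-treat effect on
    # the outcome divided by the effect of assignment on treatment received.
    itt_y = (mean(y for z, d, y in data if z == 1)
             - mean(y for z, d, y in data if z == 0))
    itt_d = (mean(d for z, d, y in data if z == 1)
             - mean(d for z, d, y in data if z == 0))
    iv = itt_y / itt_d

    return {"as_treated": as_treated, "per_protocol": per_protocol, "iv": iv}


# Hypothetical trial: half of each arm are compliers. Compliers have
# outcome 10 if treated and 4 if not; non-compliers always have outcome 2.
toy = [
    (1, 1, 10), (1, 1, 10), (1, 0, 2), (1, 0, 2),  # assigned to treatment
    (0, 0, 4), (0, 0, 4), (0, 0, 2), (0, 0, 2),    # assigned to control
]
estimates = compare_estimators(toy)
```

In this constructed example the effect among compliers is 6 by design. Only the ratio estimator recovers it, because the other two compare groups with different mixes of compliers and non-compliers.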
Initial trials of web-based smoking-cessation programs have generally been promising. The active components of these programs, however, are not well understood. This study aimed to (1) identify active psychosocial and communication components of a web-based smoking-cessation intervention and (2) examine the impact of increasing the tailoring depth...
Patient preference may influence intervention effects, but has not been extensively studied. Randomized controlled design (N=1075) assessed outcomes when women (60 years+) were given a choice of two formats of a program to enhance heart disease management.
Randomization to "no choice" or "choice" study arms. Further randomization of "no choice" to:...
Web-based programs for health promotion, disease prevention, and disease management often experience high rates of attrition. There are 3 questions which are particularly relevant to this issue. First, does engagement with program content predict long-term outcomes? Second, which users are most likely to drop out or disengage from the program? Thir...
Criteria for staging the menopausal transition are not established. This article evaluates five bleeding criteria for defining early transition and provides empirically based guidance regarding optimal criteria.
Prospective menstrual calendar data from four population-based cohorts: TREMIN, Melbourne Women's Midlife Health Project (MWMHP), Seattle...
We consider the analysis of longitudinal data sets that include times of recurrent events, where interest lies in variables that are functions of the number of events and the time intervals between events for each individual, and where some cases have gaps when the information was not recorded. Discarding cases with gaps results in a loss of the re...
This article concerns item nonresponse adjustment for two-stage cluster samples. Specifically, we focus on two types of nonignorable nonresponse: nonresponse depending on covariates and underlying cluster characteristics, and depending on covariates and the missing outcome. In these circumstances, standard weighting and imputation adjustments are l...
In a previous study, we validated a polysomnographic assessment for REM sleep behavior disorder (RBD). The method proved to be reliable but required slow, labor-intensive visual scoring of surface electromyogram (EMG) activity. We therefore developed a computerized metric to assess EMG variance and compared the results to those previously published...
Comment: Struggles with Survey Weighting and Regression Modeling [arXiv:0710.5005]. Published at http://dx.doi.org/10.1214/088342307000000186 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Top coding of extreme values of variables like income is a common method of statistical disclosure control, but it creates problems for the data analyst. The paper proposes two alternative methods to top coding for statistical disclosure control that are based on multiple imputation. We show in simulation studies that the multiple-imputation method...
We propose new model-based methods for unit non-response in two-stage survey samples. A commonly used design-based adjustment weights respondents by the inverse of the estimated response rate in each cluster (method WT). This approach is consistent if the response probabilities are constant within clusters but is potentially inefficient when the es...
Scintigraphic imaging with (123)I-metaiodobenzylguanidine ((123)I-MIBG) has demonstrated extensive losses of cardiac sympathetic neurons in idiopathic Parkinson's disease (IPD). In contrast, normal cardiac innervation has been observed in (123)I-MIBG studies of multiple-system atrophy (MSA) and progressive supranuclear palsy (PSP). Consequently, it...
The current criterion for onset of late menopausal transition is amenorrhea of 90 d or more. The Stages of Reproductive Aging Workshop proposed alternative criteria based on a shorter period of amenorrhea. Empirical data comparing proposed criteria are not available.
This paper evaluates the several bleeding criteria that served as the basis of the...
The Stages of Reproductive Aging Workshop proposed bleeding and hormonal criteria for the menopausal transition, but operational definitions of hormone parameters were not specified.
This paper investigates the longitudinal relationship of annual serum FSH levels with four proposed bleeding criteria for the late menopausal transition in two cohort...
The lack of an agreed inferential basis for statistics makes life "interesting" for academic statisticians, but at the price of negative implications for the status of statistics in industry, science, and government. The practice of our discipline will mature only when we can come to a basic agreement about how to apply statistics to real problems....
OBJECTIVE AND CONTEXT: Our objective was to examine predictability of reproductive hormone concentrations for bone mineral density (BMD) loss during the menopausal transition.
We conducted a longitudinal (five annual examinations), multiple-site (n = 5) cohort study, the Study of Women's Health Across the Nation (SWAN).
Participants included, at ba...
Recent studies suggest that the wide variability in type, detail, and reliability of online information motivates expert searchers to develop procedural search knowledge. In contrast to prior research that has focused on finding relevant sources, procedural search knowledge focuses on how to order multiple relevant sources with the goal of retrievi...
It has been speculated that gender differences in cardiovascular disease (CVD) mortality can be attributed to the effects of estrogens on inflammation and hemostatic marker profiles. Therefore, we evaluated endogenous hormone concentrations, menopause transition stages, and adoption of exogenous hormone use in relation to hemostatic and inflammatio...
The goal of this study was to relate annually measured endogenous androgens to hemostatic and inflammation markers in women longitudinally.
A total of 3302 participants from the Study of Women's Health Across the Nation, aged 42-52 yr at baseline and self-identified as African-American (28%), Caucasian (47%), Chinese (8%), Hispanic (8%), or Japanes...
Missing data are a common problem in the social and behavioral sciences. Here we present an overview of the problem and possible solutions. We begin by distinguishing between the pattern of missing data and the mechanism that creates the missing data. We then consider common, but limited, approaches: complete-cases, available cases, weighting analy...
Rapid eye movement (REM) sleep behavior disorder (RBD) was described more than 2 decades ago, but only 1 report on 5 patients and 5 normal subjects has tested the effectiveness of a method by which relevant polysomnographic findings can be quantified. We sought to validate this method in a larger sample of patients and control subjects.
Cross-secti...
Introduction; Full synthesis; SMIKe and MIKe; Analysis of synthetic samples; An application; Conclusions
Accurate, early differentiation of dementias will become increasingly important as new therapies are introduced. Differential diagnosis by standard clinical criteria has limited accuracy. PET offers the potential to increase diagnostic accuracy. (18)F-FDG studies detect metabolic abnormalities in demented patients, but with limited specificity. PET...
We compared the relative utility of neuropsychological testing and positron emission tomography (PET) with [18F]fluorodeoxyglucose ([18F]FDG) in differentiating Alzheimer's disease (AD) from dementia with Lewy bodies (DLB). We studied 25 patients with AD, 20 with DLB, and 19 normal elderly controls. There was no difference between patient groups fo...
Demographic analysis of data on births, deaths, and migration, together with coverage measurement surveys that use capture-recapture methods, have established that U.S. Census counts are flawed for certain subpopulations. Previous work using 1990 Census data in African-Americans age 30-49 proposed a hierarchical Bayesian model that assembled Census...
We wanted to identify what factors promote career development in patient-oriented clinical research (POCR).
We used a survey questionnaire covering areas relevant to the training of subspecialty fellows and the career development of POCR faculty.
Pursuit of an academic career after fellowship correlated with completion of a clinical project, availa...
Over 3,000 subjects were recruited in 3 U.S. regions for a randomized experiment of an online weight management intervention. Participants were sent invitations to web survey reassessments after 3, 6, and 12 months. High and increasing nonresponse to the three follow-up surveys created the potential for nonresponse bias in key program outcomes. A...
Nonresponse weighting is a common method for handling unit nonresponse in surveys and is aimed at reducing nonresponse bias. Because the method can be accompanied by an increase in variance, the efficacy of weighting adjustments is often seen as a bias-variance trade-off. This view is an oversimplification, because weighting can reduce variance as...
Noncompliance is a common problem in experiments involving randomized assignment of treatments, and standard analyses based on intention-to-treat or treatment received have limitations. An attractive alternative is to estimate the Complier-Average Causal Effect (CACE), which is the average treatment effect for the subpopulation of subjects who woul...
We used positron emission tomography (PET) with (+)-[(11)C]dihydrotetrabenazine ([+]-[(11)C]DTBZ) to examine striatal monoaminergic presynaptic terminal density in 20 patients with dementia with Lewy bodies (DLB), 25 with Alzheimer's disease (AD), and 19 normal elderly controls. Six DLB patients developed parkinsonism at least 1 year before dementi...
Serum reproductive hormone concentrations were measured longitudinally in a community-based, multiethnic population of midlife women to assess whether ethnic differences exist in the patterns of change in estradiol (E2) and FSH and, if so, whether these differences are explained by host characteristics. We studied 3257 participants from seven clini...
Finite population sampling is perhaps the only area of statistics in which the primary mode of analysis is based on the randomization distribution, rather than on statistical models for the measured variables. This article reviews the debate between design-based and model-based inference. The basic features of the two approaches are illustrated usi...
Samplers often distrust model-based approaches to survey inference due to concerns about model misspecification when applied to large samples from complex populations. We suggest that the model-based paradigm can work very successfully in survey settings, provided models are chosen that take into account the sample design and avoid str...
To explore the neurochemical basis of REM sleep behavior disorder (RBD) in multiple-system atrophy (MSA).
In 13 patients with probable MSA, nocturnal, laboratory-based polysomnography was used to rate the severity of REM atonia loss by the percentage of REM sleep with tonically increased electromyographic (EMG) activity and the percentage of REM sl...
To explore the neurochemical basis of obstructive sleep apnea (OSA) in multiple-system atrophy (MSA).
In 13 patients with probable MSA, nocturnal, laboratory-based polysomnography was used to rate the severity of OSA using the apnea-hypopnea index during sleep. SPECT with (-)-5-[123I]iodobenzovesamicol ([123I]IBVM) was utilized to measure the densi...
Introduction and Modeling Framework; Adjustment-cell Models for Unit Nonresponse; Item Nonresponse; Nonignorable Missing Data; Conclusion
Acknowledgements
Introduction; Modeling the Selection Mechanism
A basic estimation strategy in sample surveys is to weight units inversely proportional to the probability of selection and response. Response weights in this method are usually estimated by the inverse of the sample-weighted response rate in an adjustment cell, that is, the ratio of the sum of the sampling weights of respondents in a cell to the s...
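The response weight described in the abstract above can be sketched in a few lines. The abstract is truncated before the denominator is stated; the sketch assumes the usual definition, in which the sample-weighted response rate in a cell is the sum of the sampling weights of respondents divided by the sum of the sampling weights of all sampled units in that cell. The function and field names are hypothetical.

```python
from collections import defaultdict


def response_weights(units):
    """Map each adjustment cell to a nonresponse adjustment factor.

    units: list of dicts with keys 'cell', 'weight' (sampling weight) and
    'responded' (bool). The factor is the inverse of the sample-weighted
    response rate in the cell (assumed definition, see the note above).
    """
    respondent_weight = defaultdict(float)
    sampled_weight = defaultdict(float)
    for unit in units:
        sampled_weight[unit["cell"]] += unit["weight"]
        if unit["responded"]:
            respondent_weight[unit["cell"]] += unit["weight"]

    # Cells with no respondents have no defined factor and are skipped.
    return {
        cell: sampled_weight[cell] / respondent_weight[cell]
        for cell in sampled_weight
        if respondent_weight[cell] > 0
    }


sampled_units = [
    {"cell": "A", "weight": 2.0, "responded": True},
    {"cell": "A", "weight": 2.0, "responded": False},
    {"cell": "B", "weight": 1.0, "responded": True},
    {"cell": "B", "weight": 3.0, "responded": False},
]
factors = response_weights(sampled_units)
```

A respondent's final weight would then be its sampling weight multiplied by the factor for its cell, so that respondents in each cell carry the total sampling weight of the cell.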
Current search tools on the Web, such as general-purpose search engines (e.g. Google) and domain-specific portals (e.g. MEDLINEplus), do not provide search procedures that guide users to form appropriately ordered sub-goals. The lack of such procedural knowledge often leads users searching in unfamiliar domains to retrieve incomplete information. I...