Biostatistics (BIOSTATISTICS)
Description
Among the important scientific developments of the 20th century is the explosive growth in statistical reasoning and methods for application to studies of human health. Examples include developments in likelihood methods for inference, epidemiologic statistics, clinical trials, survival analysis, and statistical genetics. Substantive problems in public health and biomedical research have fueled the development of statistical methods, which in turn have improved our ability to draw valid inferences from data. The continued growth of biostatistics as a field has motivated Oxford University Press, in association with the Biometrika Trust, to create a new journal entitled Biostatistics, to be published quarterly beginning in April 2000.
- Impact factor: 2.14
- Website: Biostatistics website
- Other titles: Biostatistics (Oxford, England: Online)
- ISSN: 1465-4644
- OCLC: 45169916
- Material type: Document, Periodical, Internet resource
- Document type: Internet Resource, Computer File, Journal / Magazine / Newspaper
Publisher details
Pre-print
- Author can archive a pre-print version
Post-print
- Author cannot archive a post-print version
Restrictions
- 12-month embargo on science, technology, and medicine articles
- 24-month embargo on arts and humanities articles
- Some titles may have different embargoes
Conditions
- Pre-print can only be posted prior to acceptance
- Pre-print must be accompanied by set statement (see link)
- Pre-print must not be replaced with post-print; instead, a link to the published version with an amended set statement should be made
- Pre-print may be posted on a personal website, employer website, free public server, or pre-print server in the subject area
- Post-print may be posted in institutional or central repositories
- Publisher version cannot be used except for Nucleic Acids Research articles
- Published source must be acknowledged
- Must link to publisher version
- Set phrase to accompany archived copy (see policy)
- Articles in some journals can be made Open Access on payment of an additional charge
- Eligible UK authors may deposit in OpenDepot
- Publisher will deposit to PubMed Central on behalf of NIH-funded authors; Nucleic Acids Research authors must pay their fee first
- Some titles may use different policies
Classification: yellow
Publications in this journal
Article: Robust combination of multiple diagnostic tests for classifying censored event times.
ABSTRACT: Recent advances in technology promise to yield a multitude of tests for disease diagnosis and prognosis. When there are multiple sources of information available, it is often of interest to construct a composite score that can provide better classification accuracy than any individual measurement. In this paper, we consider robust procedures for optimally combining tests when test results are measured prior to disease onset and disease status evolves over time. To account for censoring of disease onset time, the most commonly used approach to combining tests to detect subsequent disease status is to fit a proportional hazards model (Cox, 1972) and use the estimated risk score. However, simulation studies suggested that such a risk score may have poor accuracy when the proportional hazards assumption fails. We propose the use of a nonparametric transformation model (Han, 1987) as a working model to derive an optimal composite score with theoretical justification. We demonstrate that the proposed score is the optimal score when the model holds and is optimal "on average" among linear scores even if the model fails. Time-dependent sensitivity, specificity, and receiver operating characteristic curve functions are used to quantify the accuracy of the resulting composite score. We provide consistent and asymptotically Gaussian estimators of these accuracy measures. A simple model-free resampling procedure is proposed to obtain consistent estimates of all the variances. We illustrate the new proposals with simulation studies and an analysis of a breast cancer gene expression data set. Biostatistics 05/2008; 9(2):216-33.
Article: Stochastic segmentation models for array-based comparative genomic hybridization data analysis.
ABSTRACT: Array-based comparative genomic hybridization (array-CGH) is a high-throughput, high-resolution technique for studying the genetics of cancer. Analysis of array-CGH data typically involves estimating the underlying chromosome copy numbers from the log fluorescence ratios and segmenting the chromosome into regions with the same copy number at each location. For the analysis of array-CGH data, we propose a new stochastic segmentation model and an associated estimation procedure that has attractive statistical and computational properties. An important benefit of this Bayesian segmentation model is that it yields explicit formulas for posterior means, which can be used to estimate the signal directly without performing segmentation. Other quantities relating to the posterior distribution that are useful for providing confidence assessments of any given segmentation can also be estimated by using our method. We propose an approximation method whose computation time is linear in sequence length, which makes our method practically applicable to the new higher-density arrays. Simulation studies and applications to real array-CGH data illustrate the advantages of the proposed approach. Biostatistics 05/2008; 9(2):290-307.
Article: Bayesian modeling of embryonic growth using latent variables.
ABSTRACT: In a growth model, individuals move progressively through a series of states in which each state is indicative of developmental status. Interest lies in estimating the rate of progression through each state while incorporating covariates that might affect the transition rates. We develop a Bayesian discrete-time multistate growth model for inference from cross-sectional data with unknown initiation times. For each subject, data are collected at only one time point at which we observe the state as well as covariates that measure developmental progress. We link the developmental progress variables to an underlying latent growth variable that can also affect the state transition rates. A subject with slow latent growth will then have relatively small developmental progress covariates and move through state transitions slowly. We then examine the association between latent growth and the probability of future events in a novel study of embryonic development and pregnancy loss. Using a Markov chain Monte Carlo (MCMC) algorithm for posterior computation, we found evidence in favor of a previously hypothesized but unproven association between slow growth early in pregnancy and increased risk of future spontaneous abortion. Biostatistics 05/2008; 9(2):373-89.
Article: Probability of detecting disease-associated single nucleotide polymorphisms in case-control genome-wide association studies.
ABSTRACT: Some case-control genome-wide association studies (CCGWASs) select promising single nucleotide polymorphisms (SNPs) by ranking corresponding p-values, rather than by applying the same p-value threshold to each SNP. For such a study, we define the detection probability (DP) for a specific disease-associated SNP as the probability that the SNP will be "T-selected," namely have one of the T largest chi-square values (or smallest p-values) for trend tests of association. The corresponding proportion positive (PP) is the fraction of selected SNPs that are true disease-associated SNPs. We study DP and PP analytically and via simulations, both for fixed and for random effects models of genetic risk, that allow for heterogeneity in genetic risk. DP increases with genetic effect size and case-control sample size and decreases with the number of nondisease-associated SNPs, mainly through the ratio of T to N, the total number of SNPs. We show that DP increases very slowly with T, and the increment in DP per unit increase in T declines rapidly with T. DP is also diminished if the number of true disease SNPs exceeds T. For a genetic odds ratio per minor disease allele of 1.2 or less, even a CCGWAS with 1000 cases and 1000 controls requires T to be impractically large to achieve an acceptable DP, leading to PP values so low as to make the study futile and misleading. We further calculate the sample size of the initial CCGWAS that is required to minimize the total cost of a research program that also includes follow-up studies to examine the T-selected SNPs. A large initial CCGWAS is desirable if genetic effects are small or if the cost of a follow-up study is large. Biostatistics 05/2008; 9(2):201-15.
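Under the ranking rule described above, DP can be approximated by simulation. The sketch below is illustrative only and is not the authors' analytic derivation. It assumes independent SNPs, a single disease-associated SNP whose trend-test statistic is approximated as noncentral chi-square with 1 degree of freedom and noncentrality `ncp` (a hypothetical parameter standing in for effect size and sample size), and N - 1 null SNPs with central 1-df chi-square statistics. Rather than simulating all N statistics, it averages the exact binomial probability that fewer than T null statistics exceed the disease SNP's statistic. All function names are hypothetical.

```python
import math
import random


def null_tail(c: float) -> float:
    """P(X > c) for X ~ central chi-square with 1 df (X = Z**2, Z standard normal)."""
    return math.erfc(math.sqrt(c / 2.0))


def prob_fewer_than(t: int, n: int, p: float) -> float:
    """P(Binomial(n, p) < t), summed term by term (fine for small t)."""
    return sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(t))


def detection_probability(ncp: float, n_snps: int, top_t: int,
                          n_reps: int = 4000, seed: int = 1) -> float:
    """Monte Carlo estimate of DP: the chance that the disease SNP has one of
    the top_t largest statistics among n_snps independent SNPs (assumed model)."""
    rng = random.Random(seed)
    shift = math.sqrt(ncp)
    total = 0.0
    for _ in range(n_reps):
        # Noncentral chi-square (1 df) draw for the disease-associated SNP.
        disease_stat = (rng.gauss(0.0, 1.0) + shift) ** 2
        # Each null SNP beats it independently with probability null_tail(...);
        # the SNP is T-selected when fewer than top_t null SNPs beat it.
        total += prob_fewer_than(top_t, n_snps - 1, null_tail(disease_stat))
    return total / n_reps


for ncp in (0.0, 4.0, 16.0):
    print(ncp, round(detection_probability(ncp, n_snps=1000, top_t=10), 3))
```

With `ncp = 0` the disease SNP is exchangeable with the null SNPs, so the estimate should sit near T/N = 0.01, which is a useful sanity check. Raising `n_snps` while holding `top_t` fixed lowers the estimate, consistent with the dependence on T relative to N described above.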
Article: Small-sample estimation of negative binomial dispersion, with applications to SAGE data.
ABSTRACT: We derive a quantile-adjusted conditional maximum likelihood estimator for the dispersion parameter of the negative binomial distribution and compare its performance, in terms of bias, to various other methods. Our estimation scheme outperforms all other methods in very small samples, typical of those from serial analysis of gene expression (SAGE) studies, the motivating data for this study. The impact of dispersion estimation on hypothesis testing is studied. We derive an "exact" test that outperforms the standard approximate asymptotic tests. Biostatistics 05/2008; 9(2):321-32.
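For orientation, the sketch below implements one of the simplest possible comparators: a method-of-moments estimate of the dispersion phi under the parameterization Var(Y) = mu + phi * mu^2, together with a gamma-Poisson simulator for checking bias at different sample sizes. It is not the authors' quantile-adjusted conditional maximum likelihood estimator. The parameterization, the truncation at zero, and all function names are assumptions made for illustration.

```python
import math
import random
import statistics


def mom_dispersion(counts):
    """Method-of-moments NB dispersion: phi = (s^2 - mean) / mean^2, assuming
    Var(Y) = mu + phi * mu**2; truncated at 0 for under-dispersed samples."""
    mean = statistics.fmean(counts)
    if mean == 0:
        return 0.0
    var = statistics.variance(counts)  # unbiased (n - 1) sample variance
    return max(0.0, (var - mean) / mean**2)


def rpoisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def rnegbin(mu, phi, rng):
    """NB draw as a gamma-Poisson mixture: lambda ~ Gamma(shape 1/phi, scale mu*phi)."""
    return rpoisson(rng.gammavariate(1.0 / phi, mu * phi), rng)


def average_estimate(mu, phi, n, n_reps=2000, seed=7):
    """Average MoM estimate over n_reps simulated samples of size n (n >= 2)."""
    rng = random.Random(seed)
    return statistics.fmean(
        mom_dispersion([rnegbin(mu, phi, rng) for _ in range(n)])
        for _ in range(n_reps)
    )


print(mom_dispersion([2, 4, 6, 8]))  # (20/3 - 5) / 25 = 1/15
print(average_estimate(10, 0.5, n=4), average_estimate(10, 0.5, n=200, n_reps=200))
```

Comparing the average estimate at a very small sample size (here n = 4) with a large one gives a quick feel for the small-sample bias problem the abstract addresses; the exact numbers depend on the seed and are not results from the paper.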
Article: Efficient resampling methods for nonsmooth estimating functions.
ABSTRACT: We propose a simple and general resampling strategy to estimate variances for parameter estimators derived from nonsmooth estimating functions. This approach applies to a wide variety of semiparametric and nonparametric problems in biostatistics. It does not require solving estimating equations and is thus much faster than the existing resampling procedures. Its usefulness is illustrated with heteroscedastic quantile regression and censored data rank regression. Numerical results based on simulated and real data are provided. Biostatistics 05/2008; 9(2):355-63.
Article: The 2-sample problem for failure rates depending on a continuous mark: an application to vaccine efficacy.
ABSTRACT: The efficacy of an HIV vaccine to prevent infection is likely to depend on the genetic variation of the exposing virus. This paper addresses the problem of using data on the HIV sequences that infect vaccine efficacy trial participants to (1) test for vaccine efficacy more powerfully than procedures that ignore the sequence data and (2) evaluate the dependence of vaccine efficacy on the divergence of infecting HIV strains from the HIV strain that is contained in the vaccine. Because hundreds of amino acid sites in each HIV genome are sequenced, it is natural to treat the genetic divergence as a continuous mark variable that accompanies each failure (infection) time. Problems (1) and (2) can then be approached by testing whether the ratio of the mark-specific hazard functions for the vaccine and placebo groups is unity or independent of the mark. We develop nonparametric and semiparametric tests for these null hypotheses and nonparametric techniques for estimating the mark-specific relative risks. The asymptotic properties of the procedures are established. In addition, the methods are studied in simulations and are applied to HIV genetic sequence data collected in the first HIV vaccine efficacy trial. Biostatistics 05/2008; 9(2):263-76.
Article: Principal stratification with predictors of compliance for randomized trials with 2 active treatments.
ABSTRACT: In behavioral medicine trials, such as smoking cessation trials, 2 or more active treatments are often compared. Noncompliance by some subjects with their assigned treatment poses a challenge to the data analyst. The principal stratification framework permits inference about causal effects among subpopulations characterized by potential compliance. However, in the absence of prior information, there are 2 significant limitations: (1) the causal effects cannot be point identified for some strata and (2) individuals in the subpopulations (strata) cannot be identified. We propose to use additional information, namely compliance-predictive covariates, to help identify the causal effects and to help describe characteristics of the subpopulations. The probability of membership in each principal stratum is modeled as a function of these covariates. The model is constructed using marginal compliance models (which are identified) and a sensitivity parameter that captures the association between the 2 marginal distributions. We illustrate our methods in both a simulation study and an analysis of data from a smoking cessation trial. Biostatistics 05/2008; 9(2):277-89.
Article: Joint inference for nonlinear mixed-effects models and time to event at the presence of missing data.
ABSTRACT: In many longitudinal studies, the individual characteristics associated with the repeated measures may be possible covariates of the time to an event of interest, and thus, it is desirable to model the time-to-event process and the longitudinal process jointly. Statistical analyses in such studies may be further complicated by missing data, such as informative dropouts. This article considers a nonlinear mixed-effects model for the longitudinal process and the Cox proportional hazards model for the time-to-event process. We provide a method for simultaneous likelihood inference on the 2 models and allow for nonignorable missing data. The approach is illustrated with a recent AIDS study by jointly modeling HIV viral dynamics and time to viral rebound. Biostatistics 05/2008; 9(2):308-20.
Article: A penalized latent class model for ordinal data.
ABSTRACT: Latent class models provide a useful framework for clustering observations based on several features. Application of latent class methodology to correlated, high-dimensional ordinal data poses many challenges. Unconstrained analyses may not result in an estimable model. Thus, information contained in ordinal variables may not be fully exploited by researchers. We develop a penalized latent class model to facilitate analysis of high-dimensional ordinal data. By stabilizing maximum likelihood estimation, we are able to fit an ordinal latent class model that would otherwise not be identifiable without application of strict constraints. We illustrate our methodology in a study of schwannoma, a peripheral nerve sheath tumor, that included 3 clinical subtypes and 23 ordinal histological measures. Biostatistics 05/2008; 9(2):249-62.
Article: Cross-study validation and combined analysis of gene expression microarray data.
ABSTRACT: Investigations of transcript levels on a genomic scale using hybridization-based arrays have led to formidable advances in our understanding of the biology of many human illnesses. At the same time, these investigations have generated controversy because of the probabilistic nature of the conclusions and the surfacing of noticeable discrepancies between the results of studies addressing the same biological question. In this article, we present simple and effective data analysis and visualization tools for gauging the degree to which the findings of one study are reproduced by others and for integrating multiple studies in a single analysis. We describe these approaches in the context of studies of breast cancer and illustrate that it is possible to identify a substantial biologically relevant subset of the human genome within which hybridization results are reliable. The subset generally varies with the platforms used, the tissues studied, and the populations being sampled. Despite important differences, it is also possible to develop simple expression measures that allow comparison across platforms, studies, laboratories, and populations. Important biological signals are often preserved or enhanced. Cross-study validation and combination of microarray results require careful, but not overly complex, statistical thinking and can become a routine component of genomic analysis. Biostatistics 05/2008; 9(2):333-54.
Article: A modified sign test for comparing paired ROC curves.
ABSTRACT: We develop a permutation test for assessing a difference in the areas under the curve (AUCs) in a paired setting where both modalities are given to each diseased and nondiseased subject. We propose that permutations be made between subjects, specifically by shuffling the diseased/nondiseased labels of the subjects within each modality. As these permutations are made within modality, the permutation test is valid even if the 2 modalities are measured on different scales. We show that our permutation test is a sign test for the symmetry of an underlying discrete distribution whose size remains valid under the assumption of equal AUCs. We demonstrate the operating characteristics of our test via simulation and show that our test is equal in power to a permutation test recently proposed by Bandos and others (2005). Biostatistics 05/2008; 9(2):364-72.
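The AUCs compared by such a test are commonly estimated nonparametrically with the Mann-Whitney statistic. The sketch below computes that estimate for each modality and their paired difference, which is the quantity under test; it deliberately does not implement the authors' label-shuffling permutation scheme or their sign-test formulation. Function names are hypothetical.

```python
from typing import Sequence


def empirical_auc(diseased: Sequence[float], nondiseased: Sequence[float]) -> float:
    """Mann-Whitney AUC: P(X_d > X_n) + 0.5 * P(X_d == X_n), over all pairs."""
    score = 0.0
    for x in diseased:
        for y in nondiseased:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(diseased) * len(nondiseased))


def paired_auc_difference(mod_a: Sequence[float], mod_b: Sequence[float],
                          is_diseased: Sequence[bool]) -> float:
    """AUC(A) - AUC(B) when each subject is measured under both modalities.
    Each AUC uses only comparisons within one modality, so A and B may be
    recorded on different scales."""
    def split(values):
        d = [v for v, flag in zip(values, is_diseased) if flag]
        n = [v for v, flag in zip(values, is_diseased) if not flag]
        return d, n

    return empirical_auc(*split(mod_a)) - empirical_auc(*split(mod_b))


# Modality B is on a different scale; only within-modality orderings matter.
print(paired_auc_difference([5, 6, 1, 2], [50, 10, 20, 30], [True, True, False, False]))  # 0.5
```

Because each AUC depends only on within-modality orderings, rescaling one modality by any increasing transformation leaves the difference unchanged, which mirrors the scale-invariance point made in the abstract.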
Article: Regression analysis of multivariate panel count data.
ABSTRACT: We consider panel count data, which are frequently obtained in prospective studies involving recurrent events that are only detected and recorded at periodic assessment times. The data take the form of counts of the cumulative number of events detected at each inspection time, along with explanatory covariates. Examples arise in diverse areas such as epidemiological studies, medical follow-up studies, reliability studies, and tumorigenicity experiments. This article is concerned with regression analysis of multivariate panel count data, which arise if more than one type of recurrent event is of interest and individuals are only observed intermittently. We present a class of marginal mean models that leave the dependence structures for related types of recurrent events completely unspecified. Estimating equations are developed for regression parameters, and the resulting estimates are shown to be consistent and asymptotically normal. Simulation studies show that the proposed estimation procedures work well for practical situations. The methodology is applied to a motivating study of patients with psoriatic arthritis in which the events of interest are the onset of joint damage according to 2 different criteria. Biostatistics 05/2008; 9(2):234-48.
Article: Group additive regression models for genomic data analysis.
ABSTRACT: One important problem in genomic research is to identify genomic features such as gene expression data or DNA single nucleotide polymorphisms (SNPs) that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group features that are related to the phenotypes. In addition, the prediction mean square errors are smaller than those of the component-wise boosting procedure. We demonstrate the application of the methods to pathway-based analysis of microarray gene expression data of breast cancer. Results from analysis of a breast cancer microarray gene expression data set indicate that the pathways of metalloendopeptidases (MMPs) and MMP inhibitors, as well as cell proliferation, cell growth, and maintenance, are important to breast cancer-specific survival. Biostatistics 02/2008; 9(1):100-13.
Article: An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown.
ABSTRACT: Multivariate meta-analysis models can be used to synthesize multiple, correlated endpoints such as overall and disease-free survival. A hierarchical framework for multivariate random-effects meta-analysis includes both within-study and between-study correlation. The within-study correlations are assumed known, but they are usually unavailable, which limits the multivariate approach in practice. In this paper, we consider synthesis of 2 correlated endpoints and propose an alternative model for bivariate random-effects meta-analysis (BRMA). This model maintains the individual weighting of each study in the analysis but includes only one overall correlation parameter, rho, which removes the need to know the within-study correlations. Further, the only data needed to fit the model are those required for a separate univariate random-effects meta-analysis (URMA) of each endpoint, currently the common approach in practice. This makes the alternative model immediately applicable to a wide variety of evidence synthesis situations, including studies of prognosis and surrogate outcomes. We examine the performance of the alternative model through analytic assessment, a realistic simulation study, and application to data sets from the literature. Our results show that, unless rho is very close to 1 or -1, the alternative model produces appropriate pooled estimates with little bias that (i) are very similar to those from a fully hierarchical BRMA model where the within-study correlations are known and (ii) have better statistical properties than those from separate URMAs, especially given missing data. The alternative model is also less prone to estimation at parameter space boundaries than the fully hierarchical model and thus may be preferred even when the within-study correlations are known. It also suitably estimates a function of the pooled estimates and their correlation; however, it only provides an approximate indication of the between-study variation. 
The alternative model greatly facilitates the utilization of correlation in meta-analysis and should allow an increased application of BRMA in practice. Biostatistics 02/2008; 9(1):172-86.
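Because the alternative model needs only the inputs of a separate URMA for each endpoint (an estimate and its within-study variance for every study), it may help to see what such a URMA computes. The sketch below uses the DerSimonian-Laird moment estimator of the between-study variance tau^2; this is one common choice, assumed here for illustration and not necessarily the estimator used in the paper, and it does not fit the bivariate model or its overall correlation rho. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(estimates: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float, float]:
    """Univariate random-effects meta-analysis.
    Returns (pooled estimate, its standard error, between-study variance tau^2)."""
    k = len(estimates)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


# Toy numbers (not from the paper): two disagreeing studies, each with variance 1.
print(dersimonian_laird([0.0, 2.0], [1.0, 1.0]))  # (1.0, 1.0, 1.0)
```

Running this once per endpoint gives the two separate URMAs; the alternative BRMA model described above additionally estimates the single overall correlation rho linking the endpoints.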
Article: Identification of SNP interactions using logic regression.
ABSTRACT: Interactions of single nucleotide polymorphisms (SNPs) are assumed to be responsible for complex diseases such as sporadic breast cancer. Important goals of studies concerned with such genetic data are thus to identify combinations of SNPs that lead to a higher risk of developing a disease and to measure the importance of these interactions. There are many approaches based on classification methods such as CART and random forests that allow measuring the importance of single variables. However, none of these methods enables the importance of combinations of variables to be quantified directly. In this paper, we show how logic regression can be employed to identify SNP interactions that explain disease status in a case-control study and propose 2 measures for quantifying the importance of these interactions for classification. These approaches are then applied both to simulated data sets and to the SNP data of the GENICA study, a study dedicated to the identification of genetic and gene-environment interactions associated with sporadic breast cancer. Biostatistics 02/2008; 9(1):187-98.
Article: A hybrid model for reducing ecological bias.
ABSTRACT: A major drawback of epidemiological ecological studies, in which the association between area-level summaries of risk and exposure is used to make inference about individual risk, is the difficulty in characterizing within-area variability in exposure and confounder variables. To avoid ecological bias, samples of individual exposure/confounder data within each area are required. Unfortunately, these may be difficult or expensive to obtain, particularly if large samples are required. In this paper, we propose a new approach suitable for use with small samples. We combine a Bayesian nonparametric Dirichlet process prior with an estimating-functions approach and show that this model gives a compromise between 2 previously described methods. The method is investigated using simulated data, and a practical illustration is provided through an analysis of lung cancer mortality and residential radon exposure in counties of Minnesota. We conclude that good-quality prior information about the exposure/confounder distributions and a large between- to within-area variability ratio are required for an ecological study to be feasible using only small samples of individual data. Biostatistics 02/2008; 9(1):1-17.
Article: Inverse sampling of controls in a matched case control study.
ABSTRACT: A method of inverse sampling of controls in a matched case-control study is described in which, for each case, controls are sampled until a discordant set is achieved. For a binary exposure, inverse sampling is used to determine the number of controls for each case. When most individuals in a population have the same exposure, standard case-control sampling may result in many case-control sets being concordant with respect to exposure and thus uninformative in the conditional logistic analysis. The method using inverse control sampling is proposed as a solution to this problem in situations where it is practically feasible. In many circumstances, inverse control sampling is found to offer improved statistical efficiency relative to a comparable study with a fixed number of controls per case. Biostatistics 02/2008; 9(1):152-8.
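The sampling rule above can be simulated in a few lines. This sketch assumes a binary exposure with prevalence `p_exposed` among potential controls and independent control draws; `max_controls` is a hypothetical practical cap (not from the paper) for settings where reaching discordance would take too long. Because the case fixes one exposure level, the set becomes discordant at the first control with the opposite exposure, so the number of controls is geometric, with mean 1/(1 - p) for an exposed case and 1/p for an unexposed case. The function name is hypothetical.

```python
import random
from typing import List, Optional


def inverse_sample_controls(case_exposed: bool, p_exposed: float,
                            rng: random.Random,
                            max_controls: Optional[int] = None) -> List[bool]:
    """Sample control exposures one at a time until one differs from the
    case's exposure (making the matched set discordant), or until the
    optional cap is reached. Returns the sampled control exposures."""
    controls: List[bool] = []
    while max_controls is None or len(controls) < max_controls:
        exposed = rng.random() < p_exposed
        controls.append(exposed)
        if exposed != case_exposed:
            break  # the set now contains both exposure levels
    return controls


rng = random.Random(2008)
sizes = [len(inverse_sample_controls(False, 0.1, rng)) for _ in range(5000)]
print(sum(sizes) / len(sizes))  # expected value is 1 / 0.1 = 10 for unexposed cases
```

With a rare exposure, unexposed cases need many controls on average while exposed cases need very few, which is why a practical cap (or a feasibility check) may be needed.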
Article: Model-based clustering on the unit sphere with an illustration using gene expression profiles.
ABSTRACT: We consider model-based clustering of data that lie on a unit sphere. Such data arise in the analysis of microarray experiments when the gene expressions are standardized so that they have mean 0 and variance 1 across the arrays. We propose to model the clusters on the sphere with inverse stereographic projections of multivariate normal distributions. The corresponding model-based clustering algorithm is described. This algorithm is applied, first, to simulated data sets to assess the performance of several criteria for determining the number of clusters and to compare its performance with existing methods, and second, to a real reference data set of standardized gene expression profiles. Biostatistics 02/2008; 9(1):66-80.
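Two geometric facts behind this approach can be checked directly. First, centering a profile and scaling it to unit (population) variance places it on a sphere of radius sqrt(n); dividing by that radius gives a point on the unit sphere. Second, the inverse stereographic projection maps any point of R^d onto the unit sphere in R^(d+1), so pushing a multivariate normal draw through it yields a point on the sphere. The sketch assumes projection from the north pole (0, ..., 0, 1) and an isotropic normal for the demonstration; the paper's exact parameterization and its clustering algorithm are not reproduced, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def to_unit_sphere(profile: Sequence[float]) -> List[float]:
    """Center to mean 0 and scale to unit Euclidean norm (equivalently, unit
    population variance followed by division by sqrt(n))."""
    mean = sum(profile) / len(profile)
    centered = [x - mean for x in profile]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("a constant profile cannot be standardized")
    return [c / norm for c in centered]


def inverse_stereographic(x: Sequence[float]) -> List[float]:
    """Map x in R^d to the unit sphere in R^(d+1), projecting from the north pole."""
    s = sum(v * v for v in x)
    return [2.0 * v / (1.0 + s) for v in x] + [(s - 1.0) / (1.0 + s)]


def stereographic(y: Sequence[float]) -> List[float]:
    """Inverse of the map above (undefined at the north pole, y[-1] == 1)."""
    return [v / (1.0 - y[-1]) for v in y[:-1]]


# A draw from an isotropic normal in R^3, mapped to a point on the unit sphere in R^4.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(3)]
y = inverse_stereographic(x)
print(sum(v * v for v in y))  # 1.0 up to rounding
```

The origin of R^d maps to the south pole, and points far from the origin crowd toward the north pole, so the location and spread of the underlying normal control where a cluster sits on the sphere and how concentrated it is.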
Article: Penalized logistic regression for detecting gene interactions.
ABSTRACT: We propose using a variant of logistic regression (LR) with L2-regularization to fit gene-gene and gene-environment interaction models. Studies have shown that many common diseases are influenced by interaction of certain genes. LR models with quadratic penalization not only correctly characterize the influential genes along with their interaction structures but also yield additional benefits in handling high-dimensional, discrete factors with a binary response. We illustrate the advantages of using an L2-regularization scheme and compare its performance with that of "multifactor dimensionality reduction" and "FlexTree," 2 recent tools for identifying gene-gene interactions. Through simulated and real data sets, we demonstrate that our method outperforms other methods in the identification of the interaction structures as well as prediction accuracy. In addition, we validate the significance of the factors selected through bootstrap analyses. Biostatistics 02/2008; 9(1):30-50.
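A quadratic (L2) penalty adds a multiple of the squared coefficient norm to the negative log-likelihood, which keeps the estimates finite even when the two response classes are perfectly separable, as can happen with many high-dimensional discrete factors. The sketch below fits such a model by plain gradient descent. The penalty scaling (lam / 2 times the squared norm, added to the average negative log-likelihood), the unpenalized intercept, the step size, and the function names are assumptions for illustration; the paper's interaction coding and its choice of penalty strength are not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_l2_logistic(X: Sequence[Sequence[float]], y: Sequence[int], lam: float,
                    lr: float = 0.1, n_iter: int = 2000) -> Tuple[float, List[float]]:
    """Minimize -(1/n) * loglik(b0, b) + (lam / 2) * sum(b_j ** 2) by gradient
    descent; the intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n
        b = [bj - lr * (gj / n + lam * bj) for bj, gj in zip(b, g)]
    return b0, b


# Perfectly separable toy data: unpenalized ML would diverge, the L2 fit does not.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
for lam in (0.1, 1.0):
    print(lam, fit_l2_logistic(X, y, lam))
```

A larger `lam` shrinks the slope toward zero. In a gene-gene interaction model, the columns of `X` would typically include products of coded genotype (or environment) indicators, so the same penalty also stabilizes the interaction coefficients.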
Data provided are for informational purposes only. Although carefully collected, accuracy cannot be guaranteed. The impact factor represents a rough estimation of the journal's impact factor and does not reflect the actual current impact factor. Publisher conditions are provided by RoMEO. Differing provisions from the publisher's actual policy or licence agreement may be applicable.
Related Journals
Drug Metabolism and Pharmacokinetics
Nihon Yakubutsu Dōtai Gakkai
ISSN: 1880-0920, Impact factor: 2.32
The Journal of steroid biochemistry and molecular biology
Elsevier
ISSN: 1879-1220, Impact factor: 2.66
Acta psychologica
ScienceDirect (Service en ligne),...
ISSN: 1873-6297, Impact factor: 2.19
Computer methods and programs in biomedicine
Elsevier
ISSN: 1872-7565, Impact factor: 1.14
Topics in Cognitive Science
Blackwell Publishing
ISSN: 1756-8765, Impact factor: 2.88
Pharmacogenetics and Genomics
Lippincott, Williams & Wilkins
ISSN: 1744-6880, Impact factor: 3.48
European Journal of Forest Research
Springer Verlag
ISSN: 1612-4669, Impact factor: 1.98
Pharmaceutical Research
American Association of...
ISSN: 1573-904X, Impact factor: 4.09