Article

Sample size for biomarker studies: more subjects or more measurements per subject?


Abstract

In molecular epidemiologic studies, optimizing the use of available biological specimens while minimizing cost is always a challenge. This is particularly true in pilot studies, which often have limited funding and involve numbers of biological samples too small for the assessment of recently developed biomarkers. In this study we examined several statistical approaches for determining how many experimental subjects to use in a biomarker study and how many repeated measurements to make on each sample, given specific funding considerations and the correlated nature of the repeated measurements. A molecular epidemiology study of DNA repair and aging in basal cell carcinoma is used to illustrate the application of the proposed statistical methods. Our methods extend traditional repeated-measurement designs for biomarker studies to include funding constraints.
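
The subjects-versus-replicates trade-off can be made concrete. Under a compound-symmetry assumption with intraclass correlation ρ, the variance of the overall mean when n subjects each contribute m correlated replicate assays is σ²(ρ + (1 − ρ)/m)/n. A minimal sketch (the budget, costs, and ρ below are illustrative assumptions, not figures from the paper) that searches for the best (n, m) under a fixed budget:

```python
def var_of_grand_mean(n, m, rho, sigma2=1.0):
    # Variance of the overall mean when each of n subjects contributes
    # m correlated replicate assays (compound symmetry, ICC rho):
    # Var = sigma^2 * (rho + (1 - rho) / m) / n
    return sigma2 * (rho + (1.0 - rho) / m) / n

def best_design(budget, c_subject, c_assay, rho, max_m=20):
    """Grid-search the (n, m) pair that minimizes variance subject to
    the cost constraint n * (c_subject + m * c_assay) <= budget."""
    best = None
    for m in range(1, max_m + 1):
        n = int(budget // (c_subject + m * c_assay))  # subjects affordable
        if n < 2:
            continue
        v = var_of_grand_mean(n, m, rho)
        if best is None or v < best[2]:
            best = (n, m, v)
    return best

# Example: recruiting a subject costs 10x an extra assay; ICC = 0.5.
n, m, v = best_design(budget=1000, c_subject=100, c_assay=10, rho=0.5)
```

Because n must be an integer, the grid optimum can differ slightly from the continuous closed-form answer; with the figures above the search settles on 7 subjects with 4 replicates each.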


... Despite observing a nominal association between maternal DNAm signatures of ΣAs exposure and offspring insulin resistance, all but one of the CpGs failed to remain significant after adjustment for multiple comparisons. Another limitation was the small sample size given our inclusion criteria, in particular the modest number of mothers with DNAm data and paired diabetes-free children enrolled in the subsequent SHFS (Lai et al., 2003). Future work should seek to validate these results in larger populations, including additional studies in American Indian communities. ...
Article
Exposure to low to moderate arsenic (As) levels has been associated with type 2 diabetes (T2D) and other chronic diseases in American Indian communities. Prenatal exposure to As may also increase the risk for T2D in adulthood, and maternal As has been associated with adult offspring metabolic health measurements. We hypothesized that T2D-related outcomes in adult offspring born to women exposed to low to moderate As can be evaluated utilizing a maternally-derived molecular biosignature of As exposure. Herein, we evaluated the association of maternal DNA methylation with incident T2D and insulin resistance (Homeostatic model assessment of insulin resistance [HOMA2-IR]) in adult offspring. For DNA methylation, we used 20 differentially methylated cytosine-guanine dinucleotides (CpG) previously associated with the sum of inorganic and methylated As species (ΣAs) in urine in the Strong Heart Study (SHS). Of these 20 CpGs, we found six CpGs nominally associated (p
... Sample size calculations for biomarker discovery studies are complex and subject to numerous unobservable factors [33,34]. Some authors suggest that a multifactorial biomarker discovery design with a likely heterogeneous disease profile could require a sample size on the order of 200 patients [35]. Other guidelines suggest that the sample size varies with the potential predictive power of the biomarker set: strong biomarker sets require about 1 case per variable, while weaker sets require as many as 9 cases per variable. ...
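
The cases-per-variable heuristic quoted above translates directly into a back-of-the-envelope calculation (the 20-biomarker panel below is an arbitrary illustration, not from the excerpt):

```python
def cases_needed(n_variables, cases_per_variable):
    # Rule of thumb from the guidelines above: strong biomarker sets
    # need ~1 case per candidate variable, weak sets as many as ~9.
    return n_variables * cases_per_variable

strong = cases_needed(20, 1)  # 20 cases suffice for a strong panel
weak = cases_needed(20, 9)    # 180 cases needed for a weak panel
```
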
Article
Optimal timing and procedure selection that define staged treatment strategies can affect outcomes dramatically and remain an area of major debate in the treatment of multiply injured orthopaedic trauma patients. Decisions regarding timing and choice of orthopaedic procedure(s) are currently based on the physiologic condition of the patient, resource availability, and the expected magnitude of the intervention. Surgical decision-making algorithms rarely rely on precision-type data that account for demographics, magnitude of injury, and the physiologic/immunologic response to injury on a patient-specific basis. This study is a multicenter prospective investigation that will work toward developing a precision medicine approach to managing multiply injured patients by incorporating patient-specific indices that quantify (1) mechanical tissue damage volume; (2) cumulative hypoperfusion; (3) immunologic response; and (4) demographics. These indices will formulate a precision injury signature, unique to each patient, which will be explored for correspondence to outcomes and response to surgical interventions. The impact of the timing and magnitude of initial and staged surgical interventions on patient-specific physiologic and immunologic responses will be evaluated and described. The primary goal of the study will be the development of data-driven models that will inform clinical decision-making tools that can be used to predict outcomes and guide intervention decisions.
... These considerations should take into account both the financial constraints and the correlation between measurements. One design approach is to minimize the variance subject to the financial constraint, which maximizes power, since the power to detect a difference under the alternative is inversely related to the variance [111]. This calls for well-established, reliable phenotypic assays that can be performed consistently with a minimal coefficient of variation. ...
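
The inverse relationship between variance and power invoked above can be sketched with a simple normal approximation (the effect size and variances below are arbitrary illustrative values):

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided(delta, variance, alpha=0.05):
    # Normal-approximation power to detect a true difference `delta`
    # with an estimator of the given variance; power rises as variance falls.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta / sqrt(variance) - z)

p_high_var = power_two_sided(delta=0.5, variance=0.04)  # SE = 0.20
p_low_var = power_two_sided(delta=0.5, variance=0.02)   # SE ~ 0.14
# Halving the variance raises power from ~0.71 to ~0.94.
```
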
Article
The rise of advanced technologies for characterizing human populations at the molecular level, from sequence to function, is shifting disease prevention paradigms toward personalized strategies. Because minimization of adverse outcomes is a key driver for treatment decisions for diseased populations, developing personalized therapy strategies represents an important dimension of both precision medicine and personalized prevention. In this commentary, we highlight recently developed enabling technologies in the field of DNA damage, DNA repair, and mutagenesis. We propose that omics approaches and functional assays can be integrated into population studies that fuse basic, translational and clinical research with commercial expertise in order to accelerate personalized prevention and treatment of cancer and other diseases linked to aberrant responses to DNA damage. This collaborative approach is generally applicable to efforts to develop data-driven, individualized prevention and treatment strategies for other diseases. We also recommend strategies for maximizing the use of biological samples for epidemiological studies, and for applying emerging technologies to clinical applications.
... Larger sample sizes generally provide increased precision when estimating unknown factors; however, collecting large quantities of experimental data can be expensive. In practice, the sample size used in a study is often determined by the data collection cost, the availability of samples, and the need for sufficient statistical power [61]. Commonly used packages for biomarker discovery include LIMMA from Bioconductor [95] and SAM from Stanford University [99]. ...
Conference Paper
In this paper, we summarize the challenges and promises of the study of immune biomarkers. We review key concepts in biomarker discovery and discuss a framework for applying these concepts to the study of the immune system and its effects on disease: cancer, infection, allergy, immunodeficiency, and autoimmunity. The immune system plays a special role in biomarker discovery because it interacts with all other systems in the human body, making immune biomarkers relevant to a large number of diseases.
... The cost incurred by an additional subject usually differs from the cost of an additional measurement. Several papers have provided sample size estimates for repeated-measurement studies that incorporate the costs of recruiting and measuring subjects (Bloch 1986; Lai et al. 2003; Lui and Cumberland 1992; Winkens et al. 2006, 2007). The determination of n and m is further complicated in the presence of missing data and different types of correlation structures among the repeated measurements. ...
Article
Budget constraint is a challenge faced by investigators in planning almost every clinical trial. For a repeated measurement study, investigators need to decide whether to increase the number of participating subjects or to increase the number of repeated measurements per subject, with the ultimate goal of maximizing power for a given financial constraint. This financially constrained design problem is further complicated when taking into account things such as missing data and various correlation structures among the repeated measurements. We propose an approach that combines a GEE estimator of slope coefficients with the cost constraint. In the case where we have no missing data and the compound symmetric correlation structure, the optimal design is derived analytically. In the case where we have missing data or other correlation structures, the optimal design is identified through numerical search. We present an extensive simulation study to explore the impacts of cost ratio, missing pattern, dropout rate, and correlation structure. We also present an application example.
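
For the compound-symmetry case the abstract mentions, the continuous-m optimum has a closed form. Minimizing Var ∝ (ρ + (1 − ρ)/m)(c₁ + c₂m), where c₁ is the per-subject recruitment cost and c₂ the per-measurement cost, gives m* = sqrt((c₁/c₂)(1 − ρ)/ρ). This is the textbook result for estimating a mean; the paper's GEE slope setting is analogous but not identical, so treat this as a sketch of the idea rather than the paper's derivation:

```python
import math

def m_star(cost_ratio, rho):
    # Closed-form optimum of (rho + (1 - rho)/m) * (cost_ratio + m):
    # more replicates pay off when assays are cheap and correlation is low.
    return math.sqrt(cost_ratio * (1.0 - rho) / rho)

def m_star_numeric(cost_ratio, rho):
    # Brute-force check over a fine grid of continuous m in [1, 100]
    f = lambda m: (rho + (1.0 - rho) / m) * (cost_ratio + m)
    return min((f(1 + i / 100.0), 1 + i / 100.0) for i in range(9901))[1]

print(m_star(10, 0.5))  # ~3.16: a few replicates per subject
print(m_star(10, 0.9))  # ~1.05: reproducible assay, spend on subjects instead
```
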
... [93] Fifth, some statistical issues, such as the minimal sample size for a pilot study and the number of assays per sample, need to be resolved before embarking on a large study [94]. Notably, the genetic basis of DNA phenotypic assays needs to be elucidated, preferably by genome-wide scanning [95], which may provide the rationale for large-scale screening in the general population in a high-throughput manner. ...
Article
DNA repair is a complicated biological process, consisting of several distinct pathways, that plays a fundamental role in the maintenance of genomic integrity. The very important field of DNA repair and cancer risk has developed rapidly in the past decades. In this review of selected published data from our laboratory, we describe mostly our work on the study of phenotypic markers of nucleotide excision repair (NER), as measured by the benzo(a)pyrene diol epoxide (BPDE)/ultraviolet (UV)-induced mutagen sensitivity assays, BPDE-induced adduct assay, host cell reactivation (HCR)-DNA repair capacity (DRC) assay, reverse transcription-polymerase chain reaction (RT-PCR) assay and reverse-phase protein lysate microarray (RPP) assay, by using peripheral blood lymphocytes in a series of molecular epidemiological studies. Results of our studies suggest that individuals with reduced DRC have an elevated cancer risk. This finding needs additional validation by other investigators, and we also discussed issues in conducting this kind of research in the future.
... However, it is uncertain whether the precision and cost of a study with two CRP measurements per subject would be better than those of a study with one measurement and a larger sample size [2]. Stepwise regression is prone to biased selection of the variables retained in the final model and to biased estimates of the effects of those variables. Bias occurs because statistical tests are used for variable selection with little consideration of subject-matter and validity issues. ...
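
The one-versus-two-measurement question in the excerpt has a quick arithmetic answer once a budget and a within-subject correlation are assumed (the dollar figures and ρ below are hypothetical, not from the excerpt):

```python
def var_for_budget(m, rho, c_subject, c_assay, budget):
    # Variance (in units of sigma^2) of the mean CRP estimate when each
    # subject is assayed m times (compound-symmetry correlation rho) and
    # the whole budget is spent: Var = (rho + (1 - rho)/m) / n.
    n = budget / (c_subject + m * c_assay)  # subjects affordable
    return (rho + (1.0 - rho) / m) / n

# Hypothetical costs: $50 recruitment, $5 per CRP assay, $5,500 total.
v_one = var_for_budget(1, rho=0.9, c_subject=50, c_assay=5, budget=5500)
v_two = var_for_budget(2, rho=0.9, c_subject=50, c_assay=5, budget=5500)
# With a highly reproducible assay (rho = 0.9), the duplicate measurement
# loses: v_one < v_two, so the budget is better spent on extra subjects.
```

With a noisier assay (lower ρ), the comparison flips and the duplicate measurement wins, which is exactly why the question cannot be settled without the correlation and the cost split.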
Article
Full-text available
The multisite trial, widely used in mental health research and education, enables experimenters to assess the average impact of a treatment across sites, the variance of treatment impact across sites, and the moderating effect of site characteristics on treatment efficacy. Key design decisions include the sample size per site and the number of sites. To consider power implications, this article proposes a standardized hierarchical linear model and uses rules of thumb similar to those proposed by J. Cohen (1988) for small, medium, and large effect sizes and for small, medium, and large treatment-by-site variance. Optimal allocation of resources within and between sites as a function of variance components and costs at each level are also considered. The approach generalizes to quasiexperiments with a similar structure. These ideas are illustrated with newly developed software.
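
The optimal within-site allocation the abstract refers to has a familiar square-root form. In a standardized model with treatment-by-site variance τ and residual variance σ², a balanced design (half of each site's n subjects treated) gives Var(average treatment effect) ≈ (τ + 4σ²/n)/J across J sites; minimizing this against the budget J(c_site + n·c_subject) yields n* = sqrt(4σ²·c_site/(τ·c_subject)). A sketch with illustrative costs and variance components (assumed values, not taken from the article):

```python
import math

def optimal_n_per_site(sigma2, tau, c_site, c_subject):
    # Continuous optimum of n minimizing
    # (tau + 4*sigma2/n) * (c_site + n*c_subject):
    # costly sites and small treatment-by-site variance favor larger sites.
    return math.sqrt(4.0 * sigma2 * c_site / (tau * c_subject))

# Standardized effect-size scale (sigma2 = 1), a 'medium' treatment-by-site
# variance of tau = 0.10 (assumed), site start-up cost 50x per-subject cost:
n = optimal_n_per_site(sigma2=1.0, tau=0.10, c_site=500, c_subject=10)
# n ~ 45 subjects per site
```
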