Book · PDF Available

Clinical Trial Data Analysis using R

Authors:
  • Georgia Southern University, Jiann-Ping Hsu College of Public Health

Abstract

Too often in biostatistical research and clinical trials, a knowledge gap exists between developed statistical methods and the applications of these methods. Filling this gap, Clinical Trial Data Analysis Using R provides a thorough presentation of biostatistical analyses of clinical trial data and shows step by step how to implement the statistical methods using R. The book’s practical, detailed approach draws on the authors’ 30 years of real-world experience in biostatistical research and clinical development. Each chapter presents examples of clinical trials based on the authors’ actual experiences in clinical drug development. Various biostatistical methods for analyzing the data are then identified. The authors develop analysis code step by step using appropriate R packages and functions. This approach enables readers to gain an understanding of the analysis methods and R implementation so that they can use R to analyze their own clinical trial data. With step-by-step illustrations of R implementations, this book shows how to easily use R to simulate and analyze data from a clinical trial. It describes numerous up-to-date statistical methods and offers sound guidance on the processes involved in clinical trials.
... Screening and baseline data are reported elsewhere.[23] The sample comprised mostly obese (BMI 32 (29–35) kg/m²), non-smoking (98%), well-educated (85% post-school qualifications) females (74%) of median age 45 (37–51) years. They also suffered from anxiety (26.8%) and depression (33.7%) and were treated for hypertension (25%). ...
... [23,32]. Specifically, participants will be encouraged to be physically active almost every day. ...
... Outcomes were analysed using mixed models. The primary outcome variable, weight, was analysed using a published model building procedure.[32] Initially a simple model with main effects and a group-by-time interaction was considered. ...
... The primary outcome variable, weight, was analysed using a published model building procedure.[32] Initially a simple model with main effects and a group-by-time interaction was considered. Initial data exploration suggested quadratic and cubic terms may be needed, and these were added in turn and tested with likelihood ratio tests to determine improvement in model fit. ...
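The model-building step described in this excerpt — adding a polynomial term and checking it with a likelihood ratio test — reduces to a chi-square comparison of nested models. A minimal sketch for a single added parameter (1 degree of freedom), with made-up log-likelihood values; this is not the cited study's actual code:

```python
from math import sqrt
from statistics import NormalDist

def lrt_1df(ll_simple, ll_extended):
    """Likelihood ratio test for one added parameter (1 df).

    Twice the log-likelihood gain is chi-square(1) under the null;
    for 1 df the survival function equals 2 * (1 - Phi(sqrt(stat))).
    """
    stat = 2.0 * (ll_extended - ll_simple)
    p = 2.0 * (1.0 - NormalDist().cdf(sqrt(stat)))
    return stat, p

# e.g. does adding a quadratic time term improve fit? (illustrative values)
stat, p = lrt_1df(ll_simple=-1052.3, ll_extended=-1045.8)
```

A small p value here would justify keeping the quadratic term before testing a cubic term in the same way.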
Article
Full-text available
Objective To determine the effectiveness of a novel interdisciplinary treatment compared with usual care on weight loss in overweight and obese adult volunteers. Design Single-blinded controlled trial. Participants randomly assigned to usual care (C, general guideline-based diet and exercise advice), intervention (I, interdisciplinary protocol) or intervention + a healthy food supplement (30 g walnuts/day) (IW). Setting Community-based study, Illawarra region, south of Sydney, Australia. Participants Generally well volunteer adult residents, 25–54 years, body mass index (BMI) 25–40 kg/m², were eligible. At baseline 439 were assessed, 377 were randomised, 298 completed the 3-month intensive phase and 178 completed the 12-month follow-up. Interventions Treatment was provided at clinic visits intensively (0, 1, 2 and 3 months), then quarterly to 12 months. Support phone calls were quarterly. All participants underwent blinded assessments for diet, exercise and psychological status. Primary and secondary measures The primary outcome was the difference in weight loss between baseline and 12 months (clinically relevant target: 5% loss). Secondary outcomes were changes in blood pressure, fasting blood glucose and lipids, and changes in diet, exercise and psychological parameters. Results At 12 months, differences in weight loss were identified (p<0.001). The I group lost more than controls at 3 months (−1.11 (−2.23, −0.00), p<0.05) and the IW more than controls at 3 months (−1.25 (−2.35, −0.15), p<0.05) and 6 months (−2.20 (−3.90, −0.49), p<0.01). The proportion achieving 5% weight loss was significantly different at 3, 6 and 9 months (p=0.04, p=0.03, p=0.03), due to fewer controls on target at 3, 6 and 9 months and more IW participants at 6 months. Reductions in secondary outcomes (systolic blood pressure, blood glucose/lipid parameters and lifestyle measures) followed the pattern of weight loss.
Conclusions An interdisciplinary intervention produced greater and more clinically significant and sustained weight loss compared with usual care. The intensive phase was sufficient to reach clinically relevant targets, but long-term management plans may be required. Trial registration number ANZCTRN 12614000581662; Post-results.
... Following selection of DEGs, DEMs and differentially expressed lncRNAs (DELs) between diabetic PaC patients and non-diabetic PaC patients, the survival information for each sample was extracted to perform Cox analysis with the survival package.[21] Kaplan–Meier (KM) curves were used to visually display the survival results. ...
Article
Full-text available
Pancreatic cancer (PaC) is highly associated with diabetes mellitus (DM), but the underlying mechanisms remain insufficiently understood. The study aimed to uncover the regulatory mechanisms underlying diabetic PaC and to find novel biomarkers for disease prognosis. Two RNA-sequencing (RNA-seq) datasets, GSE74629 and GSE15932, as well as relevant data in TCGA, were utilized. After pretreatment, differentially expressed genes (DEGs), miRNAs (DEMs) and lncRNAs (DELs) between diabetic PaC and non-diabetic PaC patients were identified and further examined for their correlations with clinical information. Prognostic RNAs were selected using KM curves. The optimal gene set for classification of the different samples was identified by a support vector machine. A protein-protein interaction (PPI) network was constructed for the DEGs based on protein databases. Interactions among the three kinds of RNAs were revealed in a ‘lncRNA-miRNA-mRNA’ competing endogenous RNA (ceRNA) network. A group of 32 feature genes was identified that could distinguish diabetic PaC from non-diabetic PaC, such as CCDC33, CTLA4 and MAP4K1. This classifier had high prediction accuracy. Seven lncRNAs were associated with prognosis of diabetic PaC, especially UCA1. In addition, crucial DEMs were selected, such as hsa-miR-214 (predicted targets: MAP4K1 and CCDC33) and hsa-miR-429 (predicted target: CTLA4). Notably, the interactions ‘HOTAIR-hsa-miR-214-CCDC33’ and ‘CECR7-hsa-miR-429-CTLA4’ were highlighted in the ceRNA network. Several biomarkers were identified for diagnosis of diabetic PaC, such as HOTAIR, CECR7, UCA1, hsa-miR-214, hsa-miR-429, CCDC33 and CTLA4. The ‘HOTAIR-hsa-miR-214-CCDC33’ and ‘CECR7-hsa-miR-429-CTLA4’ regulatory axes might be two important mechanisms of disease progression.
... Estimated statistical power is an established metric for evaluating proposed medical studies, and statistical power analysis is a major topic of interest in the field of statistics research. [12,13] Different study designs require different statistical power analyses. In the study shown in this article, the statistical power is connected to the sample size through other values such as the desired significance level α, the data variance, the frequency of data acquisitions, and the effect size (essential difference) that is scientifically or operationally important to detect. ...
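The dependence this excerpt describes — sample size driven by significance level, variance, and the effect size worth detecting — can be made concrete with the standard normal-approximation formula for comparing two group means. This is a generic sketch, not the cited study's actual protocol:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sample comparison
    of means: delta is the smallest difference worth detecting,
    sigma the common standard deviation (two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# detecting a half-standard-deviation difference at 80% power:
n = n_per_group(delta=0.5, sigma=1.0)  # 63 per group
```

Halving the detectable difference roughly quadruples the required sample size, which is why the effect size must be fixed before the protocol is costed.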
Conference Paper
Full-text available
The reliability of photovoltaic (PV) technology systems is a major concern to the PV industry and the focus of much research activity. To ensure that these efforts do not result in wasted resources, it is critical that attention be paid to the statistical significance of the generated data. With pre-knowledge of certain aspects of a proposed study, data science study protocols may be specified that aim to determine the sample size required to adequately address the research objective. We describe the process of designing such a study protocol, based upon expected uncertainties calculated from a pilot study. This represents a methodological approach to defining scientific studies that balances cost against potential information yield. Index Terms — photovoltaic systems, regression analysis, enterprise resource planning, knowledge management
... Growth chamber survival data were analyzed using the Kaplan–Meier estimator method (Chen and Peace 2011) for survival analysis to compare the longevity of insects reared on different hosts until pupation, followed by the Wilcoxon rank sum test (Dalgaard 2008) to adjust for multiple comparisons, and means were separated using the Tukey–Kramer mean separation test using SAS 9.3 (SAS Institute 2001). Overall length of survival was measured in degree-days accumulated until death. ...
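The Kaplan–Meier estimator cited here (Chen and Peace 2011) is the product-limit estimate of the survival function. A minimal pure-Python sketch for illustration only — the study itself used SAS:

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- observed times (death or censoring)
    events -- 1 if the matching time is a death, 0 if censored
    Returns [(time, S(time))] at each death time.
    """
    data = sorted(zip(times, events))
    curve, s = [], 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for tt, _ in data if tt >= t)
        deaths = sum(e for tt, e in data if tt == t)
        if deaths:
            s *= 1.0 - deaths / at_risk
            curve.append((t, s))
    return curve

# five insects, two censored (still alive at last observation):
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Censored records count toward the risk set at earlier times but never trigger a drop in the curve, which is exactly what distinguishes this estimator from a naive fraction-surviving calculation.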
Article
Full-text available
The European corn borer, Ostrinia nubilalis (Hübner), was introduced into North America in the early 1900s and became a major pest of corn. After its introduction, it was found on > 200 other plant hosts, but corn remained its primary host. Early life history studies indicated that the European corn borer had the potential for a wide host range. For nearly 80 yr before the introduction of Bt corn, the European corn borer was a major pest of corn in North America. This study investigated the growth and survivorship of the Z-pheromone race of the European corn borer on a range of hosts that vary in defensive chemistries and historic degree of infestation, to better understand the current host plant range of the Z-pheromone race of O. nubilalis. The plants tested included sweet corn, cry1F Bt field corn, non-Bt corn, cucumber, tomato, and green bean. Experiments were conducted in the growth chamber, greenhouse, and field to determine survival under different conditions. In most cases, results supported the expected outcome, with significantly higher survival on non-Bt corn hosts than on the other hosts tested. Neonate larvae fed exclusively on leaves of green bean exhibited intermediate survival, as did third instars fed only on leaves of cucumber. Larvae on Bt corn and tomato had comparably low survival rates, suggesting that the defensive features of tomato are about as effective as Cry1F Bt corn. Non-Bt corn was found to be the most suitable host plant overall for the European corn borer among those tested.
Article
Full-text available
Background Diffusion tensor imaging (DTI) is sensitive to white matter (WM) microstructural damage and has been suggested as a surrogate marker for phase 2 clinical trials in cerebral small vessel disease (SVD). The study’s objective was to establish the best way to analyse diffusion-weighted imaging data in SVD for this purpose. The ideal method would be sensitive to change and predict dementia conversion, but also be straightforward to implement and ideally automated. As part of the OPTIMAL collaboration, we evaluated five different DTI analysis strategies across six cohorts with differing SVD severity. Methods The five strategies were: (1) a conventional mean diffusivity WM histogram measure (MD median), (2) a principal component-derived measure based on conventional WM histogram measures (PC1), (3) peak width of skeletonized mean diffusivity (PSMD), (4) diffusion tensor image segmentation θ (DSEG θ) and (5) a WM measure of global network efficiency (Geff). The association between each measure and cognitive function was tested using a linear regression model adjusted for clinical markers. Changes in the imaging measures over time were determined. In three cohort studies, repeated imaging data together with data on incident dementia were available. The associations between the baseline measure, the change measure and incident dementia conversion were examined using Cox proportional-hazards regression or logistic regression models. Sample size estimates for a hypothetical clinical trial were furthermore computed for each DTI analysis strategy. Results There was a consistent cross-sectional association between the imaging measures and impaired cognitive function across all cohorts. All baseline measures predicted dementia conversion in severe SVD. In mild SVD, PC1, PSMD and Geff predicted dementia conversion. In MCI, all markers except Geff predicted dementia conversion. Baseline DTI measures were significantly higher in patients converting to vascular dementia than in those converting to Alzheimer’s disease.
Significant change in all measures was associated with dementia conversion in severe but not in mild SVD. The automatic and semi-automatic measures PSMD and DSEG θ required the lowest minimum sample sizes for a hypothetical clinical trial in single-centre sporadic SVD cohorts. Conclusion DTI parameters obtained from all analysis methods predicted dementia, and there was no clear winner amongst the different analysis strategies. The fully automated analysis provided by PSMD offers advantages particularly for large datasets.
Article
Full-text available
Background: Case-based learning (CBL) is a distinct classroom-based teaching format. We compare learning and retention using a CBL teaching strategy vs simulation-based learning (SBL) on the topic of malignant hyperthermia. Methods: In this study, 54 anesthesia residents were assigned to either a CBL or an SBL experience. All residents had prior simulation experience, and both groups received a pretest and a lecture on rare diseases with emphasis on malignant hyperthermia, followed by a CBL or SBL session. Test questions were validated for face and construct validity. Postsession testing occurred on the same day and at 4 months. Results: Twenty-seven residents completed all components of the study. The CBL group had 10 residents, and the SBL group had 17 residents. Most residents (80%) had previous exposure to malignant hyperthermia education. ANOVA for repeated measures demonstrated superior learning and long-term retention in the CBL group. In addition, our cost analysis reveals the cost of SBL to be approximately 17 times higher per learner than CBL. Conclusions: We found that CBL promoted learning and long-term retention for the topic of malignant hyperthermia, and that it is a more affordable teaching method. Affordability and effectiveness evidence may guide some programs toward CBL, particularly if access to simulation is limited. The number of participants and the incomplete validation of the examination questions are limitations of the study. Further studies are required to validate these findings.
Article
Full-text available
The cost-effectiveness of interventions (e.g. new medical therapies or health care technologies) is often evaluated in randomized clinical trials where individuals are nested within clusters, for instance patients within general practices. In such two-level cost-effectiveness trials, one can randomly assign treatments to individuals within clusters (multicentre trial) or to entire clusters (cluster randomized trial). Such trials need careful planning to evaluate the cost-effectiveness of interventions within the available research resources. The optimal number of clusters and the optimal number of subjects per cluster for both types of cost-effectiveness trials can be determined using optimal design theory. However, the construction of the optimal design requires information on model parameters, which may be unknown at the planning stage of a trial. To overcome this problem, a maximin strategy is employed. We have developed SamP2CeT, an R program, to perform these sample size calculations. SamP2CeT provides a graphical user interface which enables researchers to optimize the numbers of clusters and subjects per cluster in their cost-effectiveness trial as a function of study costs and outcome variances. In case of insufficient knowledge of model parameters, SamP2CeT also provides safe numbers of clusters and subjects per cluster, based on a maximin strategy. SamP2CeT can be used to calculate the smallest budget needed for a user-specified power level and the largest power attainable with a user-specified budget, and it also has the facility to calculate the power of a user-specified design. Recent methodological developments on sample size and power calculation for two-level cost-effectiveness trials have been implemented in SamP2CeT. The program is user-friendly, as illustrated for two published cost-effectiveness trials.
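SamP2CeT's maximin optimization is considerably more involved, but the basic trade-off between the number of clusters and cluster size can be illustrated with the familiar design-effect inflation 1 + (m − 1)ρ, where ρ is the intraclass correlation. A simplified sketch with assumed inputs, not SamP2CeT's algorithm:

```python
from math import ceil
from statistics import NormalDist

def clusters_per_arm(delta, sigma, m, icc, alpha=0.05, power=0.80):
    """Clusters per arm for a cluster-randomized comparison of means:
    take the individually-randomized sample size, inflate it by the
    design effect 1 + (m - 1) * icc, and divide by cluster size m."""
    z = NormalDist()
    n_individual = 2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power))
                        * sigma / delta) ** 2
    design_effect = 1 + (m - 1) * icc
    return ceil(n_individual * design_effect / m)

# 20 patients per practice, modest clustering (icc = 0.05):
k = clusters_per_arm(delta=0.5, sigma=1.0, m=20, icc=0.05)  # 7 clusters per arm
```

Because the design effect grows with m, enlarging clusters eventually buys little power, which is why programs like SamP2CeT optimize clusters and cluster size jointly against study costs.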
Chapter
In audits, as in all experiments, researchers are confronted with choices about whether to collect and analyze repeated measures on the unit of analysis. In typical social science practice, this decision often involves consideration of whether to send single or multiple auditors to test for discrimination at a site that represents the unit of analysis, such as employers, landlords, or schools. In this chapter, we provide tools for researchers considering the statistical and substantive implications of this decision. For the former, we show how sample size and statistical efficiency questions hinge in large part on the expected concordance of outcomes when testers are sent to the same unit or site. For the latter, we encourage researchers to think carefully about what is gained and lost via matched and non-matched designs, particularly regarding the finite nature of certain populations, resource constraints, and the likelihood of detection in the field. For both approaches, we make recommendations for the appropriate statistical analysis in light of the given design and direct readers to software and code that may be helpful in informing design choices.
Article
Full-text available
To quantify the HIV epidemic, the classical population-based prevalence and incidence rates (P rates) are the two measures most commonly used for policy interventions. However, these P rates ignore the heterogeneity in the size of the geographic region where a population resides. Intuitively, with the same P rates, HIV is much more likely to spread in a population residing in a crowded small urban area than in a population of the same size residing in a large rural area. To address this limitation, Chen and Wang (2017) proposed geographic area-based rates (G rates) to complement the classical P rates. They analyzed the 2000–2012 US data on new HIV infections and persons living with HIV and found that, compared with other methods, using G rates enables researchers to detect increases in HIV rates more quickly. This capacity to reveal increasing rates in a more efficient and timely manner is a crucial methodological contribution to HIV research. To enhance this newly proposed concept of G rates, this article discusses three areas for further development of this important concept: (1) analysis of global HIV epidemic data using the newly proposed G rates to capture changes globally; (2) development of associated population density-based rates (D rates) to incorporate heterogeneity in both geographic area and the total population at risk; and (3) development of methods to calculate variances and confidence intervals for the P rates, G rates, and D rates to capture the variability of these indices.
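Under one plausible reading of the definitions (the exact formulas in Chen and Wang (2017) may differ; these are assumptions for illustration), a P rate normalizes cases by population while a G rate normalizes by land area, so two regions with identical P rates can have very different G rates:

```python
def p_rate(cases, population, per=100_000):
    """Classical population-based rate (cases per `per` persons)."""
    return cases / population * per

def g_rate(cases, area_km2):
    """Geographic area-based rate (assumed form: cases per km^2)."""
    return cases / area_km2

# same caseload and population, very different areas:
urban = (500, 1_000_000, 100)      # cases, population, km^2
rural = (500, 1_000_000, 10_000)
# P rates are identical (50 per 100,000); G rates differ 100-fold
```

The 100-fold gap in G rates is exactly the urban/rural contrast the article uses to motivate complementing P rates with area-based measures.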