Paul R Burton

University of Bristol, Bristol, England, United Kingdom

Publications (163) · 1491.5 Total impact

  • Source
    ABSTRACT: The data that put the 'evidence' into 'evidence-based medicine' are central to developments in public health, primary and hospital care. A fundamental challenge is to site such data in repositories that can easily be accessed under appropriate technical and governance controls which are effectively audited and are viewed as trustworthy by diverse stakeholders. This demands socio-technical solutions that may easily become enmeshed in protracted debate and controversy as they encounter the norms, values, expectations and concerns of diverse stakeholders. In this context, the development of what are called 'Data Safe Havens' has been crucial. Unfortunately, the origins and evolution of the term have led to a range of different definitions being assumed by different groups. There is, however, an intuitively meaningful interpretation that is often assumed by those who have not previously encountered the term: a repository in which useful but potentially sensitive data may be kept securely under governance and informatics systems that are fit-for-purpose and appropriately tailored to the nature of the data being maintained, and may be accessed and utilized by legitimate users undertaking work and research contributing to biomedicine, health and/or to ongoing development of healthcare systems. This review explores a fundamental question: "what are the specific criteria that ought reasonably to be met by a data repository if it is to be seen as consistent with this interpretation and viewed as worthy of being accorded the status of 'Data Safe Haven' by key stakeholders?" We propose 12 such criteria. Contact: paul.burton@bristol.ac.uk. © The Author 2015. Published by Oxford University Press.
    Bioinformatics 06/2015; DOI:10.1093/bioinformatics/btv279 · 4.62 Impact Factor
  • Source
    Amadou Gaye, Thomas W Y Burton, Paul R Burton
    ABSTRACT: Very large studies are required to provide sufficiently big sample sizes for adequately powered association analyses. This can be an expensive undertaking and it is important that an accurate sample size is identified. For more realistic sample size calculation and power analysis, the impact of unmeasured aetiological determinants and the quality of measurement of both outcome and explanatory variables should be taken into account. Conventional methods to analyse power use closed-form solutions that are not flexible enough to cater for all of these elements easily. They often result in a potentially substantial overestimation of the actual power. In this article, we describe the ESPRESSO tool that allows assessment errors in power calculation under various biomedical scenarios to be incorporated. We also report a real world analysis where we used this tool to answer an important strategic question for an existing cohort. The software is available for online calculation and downloads at http://espresso-research.org. The code is freely available at https://github.com/ESPRESSO-research. © The Author(s) 2015. Published by Oxford University Press.
    Bioinformatics 04/2015; DOI:10.1093/bioinformatics/btv219 · 4.62 Impact Factor
  • Source
    ABSTRACT: Cathie Sudlow and colleagues describe the UK Biobank, a large population-based prospective study, established to allow investigation of the genetic and non-genetic determinants of the diseases of middle and old age.
    PLoS Medicine 03/2015; 12(3):e1001779. DOI:10.1371/journal.pmed.1001779 · 14.00 Impact Factor
  • Source
    ABSTRACT: Research in modern biomedicine and social science requires sample sizes so large that they can often only be achieved through a pooled co-analysis of data from several studies. But the pooling of information from individuals in a central database that may be queried by researchers raises important ethico-legal questions and can be controversial. In the UK this has been highlighted by recent debate and controversy relating to the UK's proposed 'care.data' initiative, and these issues reflect important societal and professional concerns about privacy, confidentiality and intellectual property. DataSHIELD provides a novel technological solution that can circumvent some of the most basic challenges in facilitating the access of researchers and other healthcare professionals to individual-level data.
    International Journal of Epidemiology 09/2014; DOI:10.1093/ije/dyu188 · 9.20 Impact Factor
  • Source
    ABSTRACT: Background: Asthma and chronic obstructive pulmonary disease (COPD) are heterogeneous diseases. Objective: We sought to determine, in terms of their sputum cellular and mediator profiles, the extent to which they represent distinct or overlapping conditions supporting either the “British” or “Dutch” hypotheses of airway disease pathogenesis. Methods: We compared the clinical and physiological characteristics and sputum mediators between 86 subjects with severe asthma and 75 with moderate-to-severe COPD. Biological subgroups were determined using factor and cluster analyses on 18 sputum cytokines. The subgroups were validated on independent severe asthma (n = 166) and COPD (n = 58) cohorts. Two techniques were used to assign the validation subjects to subgroups: linear discriminant analysis, or the best identified discriminator (single cytokine) in combination with subject disease status (asthma or COPD). Results: Discriminant analysis distinguished severe asthma from COPD completely using a combination of clinical and biological variables. Factor and cluster analyses of the sputum cytokine profiles revealed 3 biological clusters: cluster 1: asthma predominant, eosinophilic, high TH2 cytokines; cluster 2: asthma and COPD overlap, neutrophilic; cluster 3: COPD predominant, mixed eosinophilic and neutrophilic. Validation subjects were classified into 3 subgroups using discriminant analysis, or disease status with a binary assessment of sputum IL-1β expression. Sputum cellular and cytokine profiles of the validation subgroups were similar to the subgroups from the test study. Conclusions: Sputum cytokine profiling can determine distinct and overlapping groups of subjects with asthma and COPD, supporting both the British and Dutch hypotheses. These findings may contribute to improved patient classification to enable stratified medicine.
    The Journal of Allergy and Clinical Immunology 08/2014; 135(1). DOI:10.1016/j.jaci.2014.06.035 · 11.25 Impact Factor
  • Source
    ABSTRACT: Background: Errors, introduced through poor assessment of physical measurement or because of inconsistent or inappropriate standard operating procedures for collecting, processing, storing or analysing haematological and biochemistry analytes, have a negative impact on the power of association studies using the collected data. A dataset from UK Biobank was used to evaluate the impact of pre-analytical variability on the power of association studies. Methods: First, we estimated the proportion of the variance in analyte concentration that may be attributed to delay in processing using variance component analysis. Then, we captured the proportion of heterogeneity between subjects that is due to variability in the rate of degradation of analytes, by fitting a mixed model. Finally, we evaluated the impact of delay in processing on the power of a nested case-control study using a power calculator that we developed and which takes into account uncertainty in outcome and explanatory variables measurements. Results: The results showed that (i) the majority of the analytes investigated in our analysis, were stable over a period of 36 h and (ii) some analytes were unstable and the resulting pre-analytical variation substantially decreased the power of the study, under the settings we investigated. Conclusions: It is important to specify a limited delay in processing for analytes that are very sensitive to delayed assay. If the rate of degradation of an analyte varies between individuals, any delay introduces a bias which increases with increasing delay. If pre-analytical variation occurring due to delays in sample processing is ignored, it affects adversely the power of the studies that use the data.
    International Journal of Epidemiology 08/2014; 43(5). DOI:10.1093/ije/dyu127 · 9.20 Impact Factor
  • Source
    ABSTRACT: Background: Severe refractory asthma is a heterogeneous disease. We sought to determine statistical clusters from the British Thoracic Society Severe refractory Asthma Registry and to examine cluster-specific outcomes and stability. Methods: Factor analysis and statistical cluster modelling was undertaken to determine the number of clusters and their membership (N = 349). Cluster-specific outcomes were assessed after a median follow-up of 3 years. A classifier was programmed to determine cluster stability and was validated in an independent cohort of new patients recruited to the registry (n = 245). Findings: Five clusters were identified. Cluster 1 (34%) were atopic with early onset disease, cluster 2 (21%) were obese with late onset disease, cluster 3 (15%) had the least severe disease, cluster 4 (15%) were the eosinophilic with late onset disease and cluster 5 (15%) had significant fixed airflow obstruction. At follow-up, the proportion of subjects treated with oral corticosteroids increased in all groups with an increase in body mass index. Exacerbation frequency decreased significantly in clusters 1, 2 and 4 and was associated with a significant fall in the peripheral blood eosinophil count in clusters 2 and 4. Stability of cluster membership at follow-up was 52% for the whole group with stability being best in cluster 2 (71%) and worst in cluster 4 (25%). In an independent validation cohort, the classifier identified the same 5 clusters with similar patient distribution and characteristics. Interpretation: Statistical cluster analysis can identify distinct phenotypes with specific outcomes. Cluster membership can be determined using a classifier, but when treatment is optimised, cluster stability is poor.
    PLoS ONE 07/2014; 9(7):e102987. DOI:10.1371/journal.pone.0102987 · 3.53 Impact Factor
  • Source
    ABSTRACT: Background: Data from individual collections, such as biobanks and cohort studies, are now being shared in order to create combined datasets which can be queried to ask complex scientific questions. But this sharing must be done with due regard for data protection principles. DataSHIELD is a new technology that queries nonaggregated, individual-level data in situ but returns query data in an anonymous format. This raises questions of the ability of DataSHIELD to adequately protect participant confidentiality. Methods: An ethico-legal analysis was conducted that examined each step of the DataSHIELD process from the perspective of UK case law, regulations, and guidance. Results: DataSHIELD reaches agreed UK standards of protection for the sharing of biomedical data. All direct processing of personal data is conducted within the protected environment of the contributing study; participating studies have scientific, ethics, and data access approvals in place prior to the analysis; studies are clear that their consents conform with this use of data, and participants are informed that anonymisation for further disclosure will take place. Conclusion: DataSHIELD can provide a flexible means of interrogating data while protecting the participants' confidentiality in accordance with applicable legislation and guidance. © 2014 S. Karger AG, Basel.
    Public Health Genomics 03/2014; 17(3). DOI:10.1159/000360255 · 2.46 Impact Factor
  •
    ABSTRACT: Health and social policies relevant to improving the lives of children draw on understanding of early life developmental trajectories and the social and material environments in which children are born and grow up. These policies draw on information from the UK's unique birth cohorts (comprising cohorts from 1946, 1958, 1970, and 2000 that have been assessed repeatedly to the present day) and other longitudinal studies. The importance of intergenerational and intragenerational effects on child health and development in the UK is increasingly recognised. A cross-disciplinary approach to the lifecourse is needed to meet this pressing scientific public health and policy challenge, which is sensitive to social, gender, and ethnic inequalities and incorporates biomedical, clinical, and social sciences from the outset. We will create a longitudinal data resource to address questions and hypotheses relevant to improving the lives of children, both now and in their futures. Five major research themes will be explored: inequalities, diversity, and social mobility; early life antecedents of school readiness and later educational performance; developmental origins of health and ill-health in childhood; social, emotional, and behavioural development: the interplay between infant and parent; and neighbourhoods and environment: effects on child and family.
    The Lancet 11/2013; 382:S31. DOI:10.1016/S0140-6736(13)62456-3 · 45.22 Impact Factor
  • Source
    ABSTRACT: Interindividual variation in mean leukocyte telomere length (LTL) is associated with cancer and several age-associated diseases. We report here a genome-wide meta-analysis of 37,684 individuals with replication of selected variants in an additional 10,739 individuals. We identified seven loci, including five new loci, associated with mean LTL (P < 5 × 10(-8)). Five of the loci contain candidate genes (TERC, TERT, NAF1, OBFC1 and RTEL1) that are known to be involved in telomere biology. Lead SNPs at two loci (TERC and TERT) associate with several cancers and other diseases, including idiopathic pulmonary fibrosis. Moreover, a genetic risk score analysis combining lead variants at all 7 loci in 22,233 coronary artery disease cases and 64,762 controls showed an association of the alleles associated with shorter LTL with increased risk of coronary artery disease (21% (95% confidence interval, 5-35%) per standard deviation in LTL, P = 0.014). Our findings support a causal role of telomere-length variation in some age-related diseases.
    Nature Genetics 03/2013; 45(4):422-427. DOI:10.1038/ng.2528 · 29.65 Impact Factor
  • Source
    ABSTRACT: Gene-environment interaction studies offer the prospect of robust causal inference through both gene identification and instrumental variable approaches. As such they are a major and much needed development. However, conducting these studies using traditional methods, which require direct participant contact, is resource intensive. The ability to conduct gene-environment interaction studies remotely would reduce costs and increase capacity. To develop a platform for the remote conduct of gene-environment interaction studies. A random sample of 15,000 men and women aged 50+ years and living in Cardiff, South Wales, of whom 6,012 were estimated to have internet connectivity, were mailed inviting them to visit a web-site to join a study of successful ageing. Online consent was obtained for questionnaire completion, cognitive testing, re-contact, record linkage and genotyping. Cognitive testing was conducted using the Cardiff Cognitive Battery. Bio-sampling was randomised to blood spot, buccal cell or no request. A heterogeneous sample of 663 (4.5% of mailed sample and 11% of internet connected sample) men and women (47% female) aged 50-87 years (median = 61 yrs) from diverse backgrounds (representing the full range of deprivation scores) was recruited. Bio-samples were donated by 70% of those agreeing to do so. Self report questionnaires and cognitive tests showed comparable distributions to those collected using face-to-face methods. Record linkage was achieved for 99.9% of participants. This study has demonstrated that remote methods are suitable for the conduct of gene-environment interaction studies. Up-scaling these methods provides the opportunity to increase capacity for large-scale gene-environment interaction studies.
    PLoS ONE 01/2013; 8(1):e54331. DOI:10.1371/journal.pone.0054331 · 3.53 Impact Factor
  • Source
    ABSTRACT: Recent genetic association studies have made progress in uncovering components of the genetic architecture of the body mass index (BMI). We used the ITMAT-Broad-Candidate Gene Association Resource (CARe) (IBC) array comprising up to 49 320 single nucleotide polymorphisms (SNPs) across ∼2100 metabolic and cardiovascular-related loci to genotype up to 108 912 individuals of European ancestry (EA), African-Americans, Hispanics and East Asians, from 46 studies, to provide additional insight into SNPs underpinning BMI. We used a five-phase study design: Phase I focused on meta-analysis of EA studies providing individual level genotype data; Phase II performed a replication of cohorts providing summary level EA data; Phase III meta-analyzed results from the first two phases; associated SNPs from Phase III were used for replication in Phase IV; finally in Phase V, a multi-ethnic meta-analysis of all samples from four ethnicities was performed. At an array-wide significance (P < 2.40E-06), we identify novel BMI associations in loci translocase of outer mitochondrial membrane 40 homolog (yeast) - apolipoprotein E - apolipoprotein C-I (TOMM40-APOE-APOC1) (rs2075650, P = 2.95E-10), sterol regulatory element binding transcription factor 2 (SREBF2, rs5996074, P = 9.43E-07) and neurotrophic tyrosine kinase, receptor, type 2 [NTRK2, a brain-derived neurotrophic factor (BDNF) receptor gene, rs1211166, P = 1.04E-06] in the Phase IV meta-analysis. Of 10 loci with previous evidence for BMI association represented on the IBC array, eight were replicated, with the remaining two showing nominal significance. Conditional analyses revealed two independent BMI-associated signals in BDNF and melanocortin 4 receptor (MC4R) regions. Of the 11 array-wide significant SNPs, three are associated with gene expression levels in both primary B-cells and monocytes; with rs4788099 in SH2B adaptor protein 1 (SH2B1) notably being associated with the expression of multiple genes in cis. 
    These multi-ethnic meta-analyses expand our knowledge of BMI genetics.
    Human Molecular Genetics 01/2013; 22(1):184-201. DOI:10.1093/hmg/dds396 · 6.68 Impact Factor
  •
    ABSTRACT: BACKGROUND: Electrocardiographic (ECG) traits are important, substantially heritable, determinants of risk of arrhythmias and sudden cardiac death. METHODS AND RESULTS: In this study, three population-based cohorts (n=10,526) genotyped with the Illumina HumanCVD Beadchip and four quantitative ECG traits (PR interval, QRS axis, QRS duration and QTc interval) were evaluated for single nucleotide polymorphism (SNP) associations. Six gene regions contained SNPs associated with these traits at p < 10(-6), including SCN5A (PR interval and QRS duration), CAV1-CAV2 locus (PR interval), CDKN1A (QRS duration), NOS1AP, KCNH2 and KCNQ1 (QTc interval). Expression QTL analyses of top associated SNPs were undertaken in human heart and aortic tissues. NOS1AP, SCN5A, IGFBP3, CYP2C9 and CAV1 showed evidence of differential allelic expression. We modelled the effects of ion channel activity on ECG parameters, estimating the change in gene expression that would account for our observed associations, thus relating epidemiological observations and eQTL data to a systems model of the ECG. CONCLUSIONS: These association results replicate and refine the mapping of previous genome-wide association study (GWAS) findings for ECG traits, whilst the expression analysis and modelling approaches offer supporting evidence for a functional role of some of these loci in cardiac excitation/conduction.
    Circulation: Cardiovascular Genetics 11/2012; 5(6). DOI:10.1161/CIRCGENETICS.112.962852 · 5.34 Impact Factor
  •
    ABSTRACT: To further investigate susceptibility loci identified by genome-wide association studies, we genotyped 5,500 SNPs across 14 associated regions in 8,000 samples from a control group and 3 diseases: type 2 diabetes (T2D), coronary artery disease (CAD) and Graves' disease. We defined, using Bayes theorem, credible sets of SNPs that were 95% likely, based on posterior probability, to contain the causal disease-associated SNPs. In 3 of the 14 regions, TCF7L2 (T2D), CTLA4 (Graves' disease) and CDKN2A-CDKN2B (T2D), much of the posterior probability rested on a single SNP, and, in 4 other regions (CDKN2A-CDKN2B (CAD) and CDKAL1, FTO and HHEX (T2D)), the 95% sets were small, thereby excluding most SNPs as potentially causal. Very few SNPs in our credible sets had annotated functions, illustrating the limitations in understanding the mechanisms underlying susceptibility to common diseases. Our results also show the value of more detailed mapping to target sequences for functional studies.
    Nature Genetics 10/2012; 44(12). DOI:10.1038/ng.2435 · 29.65 Impact Factor
  • Source
    ABSTRACT: Genome-wide association studies (GWASs) have identified many SNPs underlying variations in plasma-lipid levels. We explore whether additional loci associated with plasma-lipid phenotypes, such as high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), total cholesterol (TC), and triglycerides (TGs), can be identified by a dense gene-centric approach. Our meta-analysis of 32 studies in 66,240 individuals of European ancestry was based on the custom ∼50,000 SNP genotyping array (the ITMAT-Broad-CARe array) covering ∼2,000 candidate genes. SNP-lipid associations were replicated either in a cohort comprising an additional 24,736 samples or within the Global Lipid Genetic Consortium. We identified four, six, ten, and four unreported SNPs in established lipid genes for HDL-C, LDL-C, TC, and TGs, respectively. We also identified several lipid-related SNPs in previously unreported genes: DGAT2, HCAR2, GPIHBP1, PPARG, and FTO for HDL-C; SOCS3, APOH, SPTY2D1, BRCA2, and VLDLR for LDL-C; SOCS3, UGT1A1, BRCA2, UBE3B, FCGR2A, CHUK, and INSIG2 for TC; and SERPINF2, C4B, GCK, GATA4, INSR, and LPAL2 for TGs. The proportion of explained phenotypic variance in the subset of studies providing individual-level data was 9.9% for HDL-C, 9.5% for LDL-C, 10.3% for TC, and 8.0% for TGs. This large meta-analysis of lipid phenotypes with the use of a dense gene-centric approach identified multiple SNPs not previously described in established lipid genes and several previously unknown loci. The explained phenotypic variance from this approach was comparable to that from a meta-analysis of GWAS data, suggesting that a focused genotyping approach can further increase the understanding of heritability of plasma lipids.
    The American Journal of Human Genetics 10/2012; 91(5). DOI:10.1016/j.ajhg.2012.08.032 · 10.99 Impact Factor
  • Norsk Epidemiologi 04/2012; 21(2).
  • Source
    ABSTRACT: To identify genetic factors contributing to type 2 diabetes (T2D), we performed large-scale meta-analyses by using a custom ∼50,000 SNP genotyping array (the ITMAT-Broad-CARe array) with ∼2000 candidate genes in 39 multiethnic population-based studies, case-control studies, and clinical trials totaling 17,418 cases and 70,298 controls. First, meta-analysis of 25 studies comprising 14,073 cases and 57,489 controls of European descent confirmed eight established T2D loci at genome-wide significance. In silico follow-up analysis of putative association signals found in independent genome-wide association studies (including 8,130 cases and 38,987 controls) performed by the DIAGRAM consortium identified a T2D locus at genome-wide significance (GATAD2A/CILP2/PBX4; p = 5.7 × 10(-9)) and two loci exceeding study-wide significance (SREBF1, and TH/INS; p < 2.4 × 10(-6)). Second, meta-analyses of 1,986 cases and 7,695 controls from eight African-American studies identified study-wide-significant (p = 2.4 × 10(-7)) variants in HMGA2 and replicated variants in TCF7L2 (p = 5.1 × 10(-15)). Third, conditional analysis revealed multiple known and novel independent signals within five T2D-associated genes in samples of European ancestry and within HMGA2 in African-American samples. Fourth, a multiethnic meta-analysis of all 39 studies identified T2D-associated variants in BCL2 (p = 2.1 × 10(-8)). Finally, a composite genetic score of SNPs from new and established T2D signals was significantly associated with increased risk of diabetes in African-American, Hispanic, and Asian populations. In summary, large-scale meta-analysis involving a dense gene-centric approach has uncovered additional loci and variants that contribute to T2D risk and suggests substantial overlap of T2D association signals across multiple ethnic groups.
    The American Journal of Human Genetics 02/2012; 90(3):410-25. DOI:10.1016/j.ajhg.2011.12.022 · 10.99 Impact Factor
  •
    ABSTRACT: Contemporary bioscience is seeing the emergence of a new data economy: with data as its fundamental unit of exchange. While sharing data within this new 'economy' provides many potential advantages, the sharing of individual data raises important social and ethical concerns. We examine ongoing development of one technology, DataSHIELD, which appears to elide privacy concerns about sharing data by enabling shared analysis while not actually sharing any individual-level data. We combine presentation of the development of DataSHIELD with presentation of an ethnographic study of a workshop to test the technology. DataSHIELD produced an application of the norm of privacy that was practical, flexible and operationalizable in researchers' everyday activities, and one which fulfilled the requirements of ethics committees. We demonstrated that an analysis run via DataSHIELD could precisely replicate results produced by a standard analysis where all data are physically pooled and analyzed together. In developing DataSHIELD, the ethical concept of privacy was transformed into an issue of security. Development of DataSHIELD was based on social practices as well as scientific and ethical motivations. Therefore, the 'success' of DataSHIELD would, likewise, be dependent on more than just the mathematics and the security of the technology.
    Public Health Genomics 01/2012; 15(5):243-53. DOI:10.1159/000336673 · 2.46 Impact Factor

Publication Stats

13k Citations
1,491.50 Total Impact Points

Institutions

  • 2014–2015
    • University of Bristol
      Bristol, England, United Kingdom
  • 2000–2014
    • University of Leicester
      • Department of Health Sciences
      • Department of Cardiovascular Sciences
      Leicester, England, United Kingdom
  • 2011
    • McGill University
      • Department of Epidemiology, Biostatistics and Occupational Health
      Montréal, Quebec, Canada
  • 2009
    • Massachusetts General Hospital
      • Center for Human Genetic Research
      Boston, Massachusetts, United States
  • 2007
    • University of Aberdeen
      • Institute of Medical Sciences
      Aberdeen, Scotland, United Kingdom
  • 2006
    • University of Nottingham
      • Division of Primary Care
      Nottingham, England, United Kingdom
  • 1999
    • University of Western Australia
      • Centre for Genetic Epidemiology and Biostatistics
      Perth City, Western Australia, Australia
  • 1994–1999
    • Western Australia Health
      Perth City, Western Australia, Australia
    • Western Research Institute
      Laramie, Wyoming, United States
  • 1996
    • Cook Children's Health Care System
      Fort Worth, Texas, United States
  • 1995
    • The Princess Margaret Hospital
      Toronto, Ontario, Canada