Genetic Basis of Autoantibody Positive and Negative Rheumatoid Arthritis Risk in a Multi-ethnic Cohort Derived from Electronic Health Records

Division of Rheumatology, Immunology, and Allergy and Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA
The American Journal of Human Genetics (Impact Factor: 10.93). 01/2011; 88(1):57-69. DOI: 10.1016/j.ajhg.2010.12.007
Source: PubMed


Discovering and following up on genetic associations with complex phenotypes require large patient cohorts. This is particularly true for patient cohorts of diverse ancestry and clinically relevant subsets of disease. The ability to mine the electronic health records (EHRs) of patients followed as part of routine clinical care provides a potential opportunity to efficiently identify affected cases and unaffected controls for appropriately sized genetic studies. Here, we demonstrate, as a proof of concept, that EHR data linked with biospecimens can be used to establish a multi-ethnic case-control cohort for genetic research of a complex disease, rheumatoid arthritis (RA). In 1,515 EHR-derived RA cases and 1,480 controls matched for both genetic ancestry and disease-specific autoantibodies (anti-citrullinated protein antibodies [ACPA]), we demonstrate that the odds ratios and aggregate genetic risk score (GRS) of known RA risk alleles measured in individuals of European ancestry within our EHR cohort are nearly identical to those derived from a genome-wide association study (GWAS) of 5,539 autoantibody-positive RA cases and 20,169 controls. We extend this approach to other ethnic groups and identify a large overlap in the GRS among individuals of European, African, East Asian, and Hispanic ancestry. We also demonstrate that the distribution of a GRS based on 28 non-HLA risk alleles in ACPA+ cases partially overlaps with that of the ACPA- subgroup of RA cases. Our study demonstrates that the genetic basis of rheumatoid arthritis risk is similar among cases of diverse ancestry divided into subsets based on ACPA status and emphasizes the utility of linking EHR clinical data with biospecimens for genetic studies.
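
As a rough illustration of the aggregate genetic risk score mentioned above, the sketch below uses one common construction: each risk allele is weighted by the natural log of its odds ratio, and the weights are summed over the copies an individual carries. The abstract does not state the exact weighting, so this formula is an assumption, and the function name and all numeric values are hypothetical.

```python
import math

def genetic_risk_score(allele_counts, odds_ratios):
    """Weighted GRS sketch: sum of log(odds ratio) * risk-allele count.

    allele_counts: copies of the risk allele at each SNP (0, 1, or 2).
    odds_ratios:   per-allele odds ratio for the same SNPs, in the same order.
    The log-OR weighting is an assumed, commonly used scheme.
    """
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("allele_counts and odds_ratios must have equal length")
    score = 0.0
    for count, odds_ratio in zip(allele_counts, odds_ratios):
        if count not in (0, 1, 2):
            raise ValueError("risk-allele count must be 0, 1, or 2")
        if odds_ratio <= 0:
            raise ValueError("odds ratio must be positive")
        score += math.log(odds_ratio) * count
    return score

# Hypothetical individual genotyped at three SNPs (values are illustrative only).
example_score = genetic_risk_score([1, 2, 0], [2.0, 1.5, 1.2])
```

Under this weighting, an individual carrying no risk alleles scores 0, and alleles with larger odds ratios contribute more per copy, so score distributions can be compared across ancestry groups or ACPA subsets.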

Available from: Fina A S Kurreeman
  • Source
    • "The Partners HealthCare Institutional Review Board approved the protocol. In this analysis, we restrict to cases and controls of European descent, and require that cases are anti-CCP positive, because previous studies have reported associations in this setting; extensions to other populations and anti-CCP negative cases have also been considered (Kurreeman et al. 2011). During algorithm development, it was determined that p̂_D should be thresholded at p_95 = 0.53 to maintain 95% specificity."
    ABSTRACT: To reduce costs and improve clinical relevance of genetic studies, there has been increasing interest in performing such studies in hospital-based cohorts by linking phenotypes extracted from electronic medical records (EMRs) to genotypes assessed in routinely collected medical samples. A fundamental difficulty in implementing such studies is extracting accurate information about disease outcomes and important clinical covariates from large numbers of EMRs. Recently, numerous algorithms have been developed to infer phenotypes by combining information from multiple structured and unstructured variables extracted from EMRs. Although these algorithms are quite accurate, they typically do not provide perfect classification due to the difficulty in inferring meaning from the text. Some algorithms can produce for each patient a probability that the patient is a disease case. This probability can be thresholded to define case-control status, and this estimated case-control status has been used to replicate known genetic associations in EMR-based studies. However, using the estimated disease status in place of true disease status results in outcome misclassification, which can diminish test power and bias odds ratio estimates. We propose to instead directly model the algorithm-derived probability of being a case. We demonstrate how our approach improves test power and effect estimation in simulation studies, and we describe its performance in a study of rheumatoid arthritis. Our work provides an easily implemented solution to a major practical challenge that arises in the use of EMR data, which can facilitate the use of EMR infrastructure for more powerful, cost-effective, and diverse genetic studies.
    Preview · Article · Jul 2014 · Human Genetics
  • Source
    • "Studies employing large-scale EHR data have begun to appear [14–19], and most of them employ this two-step approach. The state of the art in feature extraction is to use a heuristic, iterative approach to generate queries that run across the entire EHR database."
    ABSTRACT: The national adoption of electronic health records (EHR) promises to make an unprecedented amount of data available for clinical research, but the data are complex, inaccurate, and frequently missing, and the record reflects complex processes aside from the patient's physiological state. We believe that the path forward requires studying the EHR as an object of interest in itself, and that new models, learning from data, and collaboration will lead to efficient use of the valuable information currently locked in health records.
    Full-text · Article · Sep 2012 · Journal of the American Medical Informatics Association
  • Source
    ABSTRACT: The mechanisms maintaining homosexual behaviour in animals are not well understood. In fruit flies, where male–male courtship is prevalent, it has been suggested that young males gain from being courted by mature males, perhaps through learning. I conducted two series of experiments to critically examine why mature males court immature males and what immature males may gain from such courtship. The results indicate that mature males do not identify the sex of the sexually ambiguous immature males and find them attractive even after substantial experience. These findings agree with recent research indicating that males initially court a broad range of potential mating targets and then narrow their courtship focus based on their experience of being rejected by classes of flies such as recently mated females or heterospecific females, which are clearly identified by their distinct pheromonal profiles. Compared to inexperienced males, males that either had received courtship when immature or had been housed with mature males when young did not have either higher mating frequencies or shorter mating latencies but spent more time courting females. Such higher courtship intensities may translate into a mating advantage in some settings. It appears that young males that interact with mature males develop into a sexually aggressive phenotype, which could better prepare them to compete for females.
    Preview · Article · Nov 2010 · Animal Behaviour
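
The thresholding step described in the first citing article (Human Genetics, Jul 2014), where an algorithm-derived case probability is cut at a value chosen to maintain 95% specificity, can be sketched as follows. This is a minimal illustration and not the authors' procedure: it assumes a set of individuals known to be non-cases is available, picks the cutoff as an empirical quantile of their probabilities, and calls anyone strictly above the cutoff a case. The function names and data are hypothetical.

```python
import math

def threshold_for_specificity(control_probs, specificity=0.95):
    """Return a cutoff t so that at least `specificity` of known non-cases
    have predicted probability <= t (and are therefore not called cases)."""
    if not control_probs:
        raise ValueError("need at least one known non-case")
    if not 0.0 < specificity <= 1.0:
        raise ValueError("specificity must be in (0, 1]")
    ordered = sorted(control_probs)
    # Number of non-cases that must fall at or below the cutoff; the small
    # epsilon guards against floating-point overshoot in the product.
    k = max(1, math.ceil(specificity * len(ordered) - 1e-9))
    return ordered[k - 1]

def classify(probs, threshold):
    """Call an individual a case when the probability exceeds the cutoff
    (strict inequality is an assumption of this sketch)."""
    return [p > threshold for p in probs]

# Hypothetical probabilities for 20 known non-cases.
known_noncase_probs = [i / 20 for i in range(20)]
cutoff = threshold_for_specificity(known_noncase_probs)
calls = classify([0.10, 0.60, 0.95], cutoff)
```

As that abstract notes, replacing the probability with a thresholded case-control label introduces outcome misclassification, which is why its authors propose modeling the probability directly.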