ABSTRACT: Upgrades to electronic health record (EHR) systems scheduled to be introduced in the USA in 2014 will advance document interoperability between care providers. Specifically, the second stage of the federal incentive program for EHR adoption, known as Meaningful Use, requires use of the Consolidated Clinical Document Architecture (C-CDA) for document exchange. In an effort to examine and improve C-CDA-based exchange, the SMART (Substitutable Medical Applications and Reusable Technology) C-CDA Collaborative brought together a group of certified EHR and other health information technology vendors.
Journal of the American Medical Informatics Association 06/2014; · 3.93 Impact Factor
ABSTRACT: Analysis of large-scale systems of biomedical data provides a perspective on neuropsychiatric disease that may be otherwise elusive. Described here is an analysis of three large-scale systems of data from autism spectrum disorder (ASD) and of ASD research as an exemplar of what might be achieved from study of such data. First is the biomedical literature that highlights the fact that there are two very successful but quite separate research communities and findings pertaining to genetics and the molecular biology of ASD. There are those studies positing ASD causes that are related to immunological dysregulation and those related to disorders of synaptic function and neuronal connectivity. Second is the emerging use of electronic health record systems and other large clinical databases that allow the data acquired during the course of care to be used to identify distinct subpopulations, clinical trajectories, and pathophysiological substructures of ASD. These systems reveal subsets of patients with distinct clinical trajectories, some of which are immunologically related and others which follow pathologies conventionally thought of as neurological. The third is genome-wide genomic and transcriptomic analyses which show molecular pathways that overlap neurological and immunological mechanisms. The convergence of these three large-scale data perspectives illustrates the scientific leverage that large-scale data analyses can provide in guiding researchers in an approach to the diagnosis of neuropsychiatric disease that is inclusive and comprehensive.
ABSTRACT: The length of the huntingtin (HTT) CAG repeat is strongly correlated with both age at onset of Huntington's disease (HD) symptoms and age at death of HD patients. Dichotomous analysis comparing HD to controls is widely used to study the effects of HTT CAG repeat expansion. However, a potentially more powerful approach is a continuous analysis strategy that takes advantage of all of the different CAG lengths, to capture effects that are expected to be critical to HD pathogenesis.
We used continuous and dichotomous approaches to analyze microarray gene expression data from 107 human control and HD lymphoblastoid cell lines. Of all probes found to be significant in a continuous analysis by CAG length, only 21.4% were so identified by a dichotomous comparison of HD versus controls. Moreover, of probes significant by dichotomous analysis, only 33.2% were also significant in the continuous analysis. Simulations revealed that the dichotomous approach would require substantially more than 107 samples to either detect 80% of the CAG-length correlated changes revealed by continuous analysis or to reduce the rate of significant differences that are not CAG length-correlated to 20% (n = 133 or n = 206, respectively). Given the superior power of the continuous approach, we calculated the correlation structure between HTT CAG repeat lengths and gene expression levels and created a freely available searchable website, "HD CAGnome," that allows users to examine continuous relationships between HTT CAG and expression levels of ∼20,000 human genes.
Our results reveal limitations of dichotomous approaches compared to the power of continuous analysis to study a disease where human genotype-phenotype relationships strongly support a role for a continuum of CAG length-dependent changes. The compendium of HTT CAG length-gene expression level relationships found at the HD CAGnome now provides convenient routes for discovery of candidates influenced by the HD mutation.
PLoS ONE 04/2014; 9(4):e95556. · 3.53 Impact Factor
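The contrast between the continuous and dichotomous strategies described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the sample values are invented, the cutoff of 36 repeats used for the two-group split is an assumption that does not appear in the abstract, and the helper names `pearson_r` and `welch_t` are hypothetical.

```python
from math import sqrt

# Assumed cutoff for the dichotomous split; not taken from the abstract.
HD_THRESHOLD = 36


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two groups."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    return (ma - mb) / sqrt(va / na + vb / nb)


# Hypothetical (CAG length, expression level) pairs, invented for illustration.
samples = [(17, 5.1), (19, 5.0), (22, 5.3), (28, 5.4),
           (41, 5.9), (44, 6.0), (52, 6.4), (70, 7.1)]

cag = [c for c, _ in samples]
expr = [e for _, e in samples]

# Continuous analysis: use every CAG length as a quantitative predictor.
r = pearson_r(cag, expr)

# Dichotomous analysis: collapse CAG length into two groups and compare means.
controls = [e for c, e in samples if c < HD_THRESHOLD]
hd = [e for c, e in samples if c >= HD_THRESHOLD]
t = welch_t(hd, controls)

print(f"continuous: r = {r:.3f}; dichotomous: t = {t:.3f}")
```

The continuous statistic keeps the ordering of repeat lengths within each group, which the two-group comparison discards; that lost information is one plausible reading of why the abstract reports greater power for the continuous approach.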
ABSTRACT: There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data was donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance.
A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization.
The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.
ABSTRACT: Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.
Omics: a journal of integrative biology 01/2014; 18(1):10-4. · 2.29 Impact Factor
ABSTRACT: Newly released definitions of autism spectrum disorder demonstrate the need for precise diagnoses informed by the integration of clinical, molecular, and biochemical characteristics in a patient-information commons.
Science translational medicine 10/2013; 5(209):209ed18. · 14.41 Impact Factor
ABSTRACT: Autism spectrum disorder (ASD) is one of the most prevalent neurodevelopmental disorders with high heritability, yet a majority of genetic contribution to pathophysiology is not known. Siblings of individuals with ASD are at increased risk for ASD and autistic traits, but the genetic contribution for simplex families is estimated to be less when compared to multiplex families. To explore the genomic (dis-) similarity between proband and unaffected sibling in simplex families, we used genome-wide gene expression profiles of blood from 20 proband-unaffected sibling pairs and 18 unrelated control individuals. The global gene expression profiles of unaffected siblings were more similar to those from probands as they shared genetic and environmental background. A total of 189 genes were significantly differentially expressed between proband-sib pairs (nominal p < 0.01) after controlling for age, sex, and family effects. Probands and siblings were distinguished into two groups by cluster analysis with these genes. Overall, unaffected siblings were equally distant from the centroid of probands and from that of unrelated controls with the differentially expressed genes. Interestingly, five of 20 siblings had gene expression profiles that were more similar to unrelated controls than to their matched probands. In summary, we found a set of genes that distinguished probands from the unaffected siblings, and a subgroup of unaffected siblings who were more similar to probands. The pathways that characterized probands compared to siblings using peripheral blood gene expression profiles were the up-regulation of ribosomal, spliceosomal, and mitochondrial pathways, and the down-regulation of neuroreceptor-ligand, immune response and calcium signaling pathways.
Further integrative study with structural genetic variations such as de novo mutations, rare variants, and copy number variations would clarify whether these transcriptomic changes are structural or environmental in origin.
ABSTRACT: In Huntington's disease (HD), the size of the expanded HTT CAG repeat mutation is the primary driver of the processes that determine age at onset of motor symptoms. However, correlation of cellular biochemical parameters also extends across the normal repeat range, supporting the view that the CAG repeat represents a functional polymorphism with dominant effects determined by the longer allele. A central challenge to defining the functional consequences of this single polymorphism is the difficulty of distinguishing its subtle effects from the multitude of other sources of biological variation. We demonstrate that an analytical approach based upon continuous correlation with CAG size was able to capture the modest (∼21%) contribution of the repeat to the variation in genome-wide gene expression in 107 lymphoblastoid cell lines, with alleles ranging from 15 to 92 CAGs. Furthermore, a mathematical model from an iterative strategy yielded predicted CAG repeat lengths that were significantly positively correlated with true CAG allele size and negatively correlated with age at onset of motor symptoms. Genes negatively correlated with repeat size were also enriched in a set of genes whose expression was CAG-correlated in human HD cerebellum. These findings both reveal the relatively small but detectable impact of variation in the CAG allele in global data in these peripheral cells and provide a strategy for building multi-dimensional data-driven models of the biological network that drives the HD disease process by continuous analysis across allelic panels of neuronal cells vulnerable to the dominant effects of the HTT CAG repeat.
Human Molecular Genetics 04/2013; · 6.68 Impact Factor
ABSTRACT: The study by Ritchie et al. [1] in this issue employs electronic health record data and DNA biobanks to identify several genomic variants previously implicated [2, 3] in the variation of ECG parameters of cardiac conduction and diseases of cardiac conduction. So why is this study worthy of note? Ever since Einthoven first named the QRS complex [4], investigators have sought to define what constitutes a normal complex and the diagnostic and prognostic significance of deviations from the norm. The growing understanding that there is no categorical set of normal values prompted population studies of (typically white and male) subjects numbering in the hundreds [5] and eventually tens of thousands [6]. These studies did generate a more robust set of reference values and did emphasize that the notion of normal vs. abnormal QRS was not appropriate and argued for "an index of the possibility of normals or abnormals occurring at various levels" and "variations in electrocardiograms ... considerably greater than the present standards would lead one to expect..." [5]. Subsequent, larger population studies including clinical trial populations [7, 8] with broader age and gender distributions revealed that variation in QRS characteristics in healthy individuals was larger than suspected. In parallel, several studies analyzed the clinical correlates of ECG features. For example, in 1967, Pipberger et al. [9] conducted what might today be called a "phenome scan" [10, 11]. For each of the identified ECG measures, they scanned multiple constitutional features (e.g. obesity) and ethnicity to assess bias and correlation. Among their findings were the significant differences in QRS measures in African Americans, even when correcting for differences in the other constitutional features.
ABSTRACT: OBJECTIVE: To quantify the impact of citalopram and other selective serotonin reuptake inhibitors on corrected QT interval (QTc), a marker of risk for ventricular arrhythmia, in a large and diverse clinical population.
DESIGN: A cross sectional study using electrocardiographic, prescribing, and clinical data from electronic health records to explore the relation between antidepressant dose and QTc. Methadone, an opioid known to prolong QT, was included to demonstrate assay sensitivity.
SETTING: A large New England healthcare system comprising two academic medical centres and outpatient clinics.
PARTICIPANTS: 38 397 adult patients with an electrocardiogram recorded after prescription of antidepressant or methadone between February 1990 and August 2011.
MAIN OUTCOME MEASURES: Relation between antidepressant dose and QTc interval in linear regression, adjusting for potential clinical and demographic confounding variables. For a subset of patients, change in QTc after a change in drug dose was also examined.
RESULTS: Dose-response association with QTc prolongation was identified for citalopram (adjusted beta 0.10 (SE 0.04), P<0.01), escitalopram (adjusted beta 0.58 (0.15), P<0.001), and amitriptyline (adjusted beta 0.11 (0.03), P<0.001), but not for other antidepressants examined. An association with QTc shortening was identified for bupropion (adjusted beta 0.02 (0.01) P<0.05). Within-subject paired observations supported the QTc prolonging effect of citalopram (10 mg to 20 mg, mean QTc increase 7.8 (SE 3.6) ms, adjusted P<0.05; and 20 mg to 40 mg, mean QTc increase 10.3 (4.0) ms, adjusted P<0.01).
CONCLUSIONS: This study confirmed a modest prolongation of QT interval with citalopram, and identified additional antidepressants with similar observed risk. Pharmacovigilance studies using electronic health record data may be a useful method of identifying potential risk associated with treatments.
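The dose-response relation in this abstract is a regression slope: the estimated change in QTc (ms) per unit of dose. A minimal sketch of that idea follows, assuming a single predictor and no covariates, whereas the study adjusted for clinical and demographic confounders. The dose and QTc values are invented for illustration and the function name `ols_slope_intercept` is hypothetical.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x (one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical daily doses (mg) and QTc readings (ms); not data from the study.
dose_mg = [10, 10, 20, 20, 40, 40]
qtc_ms = [418, 423, 424, 429, 433, 440]

beta, intercept = ols_slope_intercept(dose_mg, qtc_ms)
print(f"estimated QTc change per mg of dose: {beta:.3f} ms")
```

A positive slope corresponds to QTc prolongation with increasing dose and a negative slope to shortening. An adjusted beta, as reported in the abstract, would come from a model that also includes the confounding variables as predictors.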
ABSTRACT: BACKGROUND: Psychiatric co-morbidity, in particular major depression and anxiety, is common in patients with Crohn's disease (CD) and ulcerative colitis (UC). Prior studies examining this may be confounded by the co-existence of functional bowel symptoms. Limited data exist examining an association between depression or anxiety and disease-specific endpoints such as bowel surgery. AIMS: To examine the frequency of depression and anxiety (prior to surgery or hospitalisation) in a large multi-institution electronic medical record (EMR)-based cohort of CD and UC patients; to define the independent effect of psychiatric co-morbidity on risk of subsequent surgery or hospitalisation in CD and UC, and to identify the effects of depression and anxiety on healthcare utilisation in our cohort. METHODS: Using a multi-institution cohort of patients with CD and UC, we identified those who also had co-existing psychiatric co-morbidity (major depressive disorder or generalised anxiety). After excluding those diagnosed with such co-morbidity for the first time following surgery, we used multivariate logistic regression to examine the independent effect of psychiatric co-morbidity on IBD-related surgery and hospitalisation. To account for confounding by disease severity, we adjusted for a propensity score estimating likelihood of psychiatric co-morbidity influenced by severity of disease in our models. RESULTS: A total of 5405 CD and 5429 UC patients were included in this study; one-fifth had either major depressive disorder or generalised anxiety. In multivariate analysis, adjusting for potential confounders and the propensity score, presence of mood or anxiety co-morbidity was associated with a 28% increase in risk of surgery in CD (OR: 1.28, 95% CI: 1.03-1.57), but not UC (OR: 1.01, 95% CI: 0.80-1.28). Psychiatric co-morbidity was associated with increased healthcare utilisation.
CONCLUSIONS: Depressive disorder or generalised anxiety is associated with a modestly increased risk of surgery in patients with Crohn's disease. Interventions addressing this may improve patient outcomes.
ABSTRACT: Common variations at the loci harboring the fat mass and obesity gene (FTO), MC4R, and TMEM18 are consistently reported as being associated with obesity and body mass index (BMI), especially in adult populations. To confirm this effect in a pediatric population, five European ancestry cohorts from the pediatric eMERGE-II network (CCHMC-BCH) were evaluated. Method: Data on 5049 samples of European ancestry were obtained from the electronic medical records (EMRs) of two large academic centers in five different genotyped cohorts. For all available samples, gender, age, height, and weight were collected and BMI was calculated. To account for age and sex differences in BMI, BMI z-scores were generated using the 2000 Centers for Disease Control and Prevention (CDC) growth charts. A genome-wide association study (GWAS) was performed with BMI z-score. After removing missing data and outliers based on principal components (PC) analyses, 2860 samples were used for the GWAS. The association between each single nucleotide polymorphism (SNP) and BMI was tested using linear regression adjusting for age, gender, and PC by cohort. The effects of SNPs were modeled assuming additive, recessive, and dominant effects of the minor allele. Meta-analysis was conducted using a weighted z-score approach. Results: The mean age of subjects was 9.8 years (range 2-19). The proportion of male subjects was 56%. In these cohorts, 14% of samples had a BMI ≥ 95th percentile and 28% ≥ 85th percentile. Meta-analyses produced a signal at the 16q12 genomic region with the best result of p = 1.43 × 10⁻⁷ [p(rec) = 7.34 × 10⁻⁸] for the SNP rs8050136 in the first intron of the FTO gene (z = 5.26) and with no heterogeneity between cohorts (p = 0.77). Under a recessive model, another published SNP at this locus, rs1421085, generates the best result [z = 5.782, p(rec) = 8.21 × 10⁻⁹].
Imputation in this region using dense 1000-Genome and HapMap CEU samples revealed 71 SNPs with p < 10⁻⁶, all in the first intron of the FTO locus. When heterogeneity was permitted between cohorts, signals were also obtained in other previously identified loci, including MC4R (rs12964056, p = 6.87 × 10⁻⁷, z = -4.98), cholecystokinin CCK (rs8192472, p = 1.33 × 10⁻⁶, z = -4.85), interleukin 15 (rs2099884, p = 1.27 × 10⁻⁵, z = 4.34), low density lipoprotein receptor-related protein 1B [LRP1B (rs7583748, p = 0.00013, z = -3.81)] and near transmembrane protein 18 (TMEM18) (rs7561317, p = 0.001, z = -3.17). We also detected a novel locus on chromosome 3 at COL6A5 [best SNP = rs1542829, minor allele frequency (MAF) of 5%, p = 4.35 × 10⁻⁹, z = 5.89]. Conclusion: An EMR-linked cohort study demonstrates that BMI z-score measurements can be successfully extracted and linked to genomic data with meaningful confirmatory results. We verified the high prevalence of childhood overweight and obesity in our cohort (28%). In addition, our data indicate that genetic variants in the first intron of FTO, a known adult genetic risk factor for BMI, are also robustly associated with BMI in a pediatric population.
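The "weighted z-score approach" named in the Method section is commonly implemented as Stouffer's method with square-root-of-sample-size weights. The sketch below assumes that variant, since the abstract does not state the weights used. The per-cohort z-scores and sample sizes are hypothetical, and the function names are illustrative.

```python
from math import erfc, sqrt


def weighted_z(z_scores, sample_sizes):
    """Combine per-cohort z-scores using sqrt(n) weights (Stouffer's method)."""
    weights = [sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return erfc(abs(z) / sqrt(2))


# Hypothetical per-cohort results for one SNP; the abstract does not report these.
cohort_z = [2.1, 2.6, 1.9, 2.8, 2.3]
cohort_n = [400, 700, 350, 900, 510]

z_meta = weighted_z(cohort_z, cohort_n)
print(f"combined z = {z_meta:.2f}, two-sided p = {two_sided_p(z_meta):.2e}")
```

As a consistency check on the reported figures, `two_sided_p(5.26)` evaluates to roughly 1.4 × 10⁻⁷, in line with the p-value the abstract gives for rs8050136 at z = 5.26.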
ABSTRACT: BACKGROUND: Numerous linkage studies have been performed in pedigrees of Autism Spectrum Disorders, and these studies point to diverse loci and etiologies of autism in different pedigrees. The underlying pattern may be identified by an integrative approach, especially since ASD is a complex disorder manifested through many loci. METHOD: Autism spectrum disorder (ASD) was studied through two different and independent genome-scale measurement modalities. We analyzed the results of copy number variation in autism and triangulated these with linkage studies. RESULTS: Consistently across both genome-scale measurements, the same two molecular themes emerged: immune/chemokine pathways and developmental pathways. CONCLUSION: Linkage studies in aggregate do indeed share a thematic consistency, one which structural analyses recapitulate with high significance. These results also show for the first time that genomic profiling of pathways using a recombination distance metric can capture pathways that are consistent with those obtained from copy number variations (CNV).
PLoS ONE 12/2012; 7(12):e48835. · 3.53 Impact Factor
ABSTRACT: For certain research questions related to long-term outcomes or to rare disorders, designing prospective studies is impractical or prohibitively expensive. Such studies could instead utilize clinical and magnetic resonance imaging (MRI) data collected as part of routine clinical care and stored in the electronic medical record (EMR). Using major depressive disorder (MDD) as a disease model, we examined the feasibility of studying brain morphology and associations with remission using clinical and MRI data exclusively drawn from the EMR. Advanced automated tools were used to select MDD patients and controls from the EMR who had brain MRI data, but no diagnosed brain pathology. MDD patients were further assessed for remission status by review of clinical charts. Twenty MDD patients (eight full-remitters, six partial-remitters, and six non-remitters), and 15 healthy control subjects met all study criteria for advanced morphometric analyses. Compared to controls, MDD patients had significantly smaller right rostral-anterior cingulate volume, and level of non-remission was associated with smaller left hippocampus and left rostral-middle frontal gyrus volume. The use of EMR data for psychiatric research may provide a timely and cost-effective approach with the potential to generate large study samples reflective of the real population with the illness studied.
Psychiatry Research Neuroimaging 11/2012; · 2.83 Impact Factor
ABSTRACT: We are entering an era in which the cost of clinical whole-genome and targeted sequencing tests is no longer prohibitive to their application. However, currently the infrastructure is not in place to support both the patient and the physicians that encounter the resultant data. Here, we ask five experts to give their opinions on whether clinical data should be treated differently from other medical data, given the potential use of these tests, and on the areas that must be developed to improve patient outcome.
ABSTRACT: OBJECTIVE: It has been suggested that there is a mechanism by which nonsteroidal anti-inflammatory drugs (NSAIDs) may interfere with antidepressant response, and poorer outcomes among NSAID-treated patients were reported in the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study. To attempt to confirm this association in an independent population-based treatment cohort and explore potential confounding variables, the authors examined use of NSAIDs and related medications among 1,528 outpatients in a New England health care system. METHOD: Treatment outcomes were classified using a validated machine learning tool applied to electronic medical records. Logistic regression was used to examine the association between medication exposure and treatment outcomes, adjusted for potential confounding variables. To further elucidate confounding and treatment specificity of the observed effects, data from the STAR*D study were reanalyzed. RESULTS: NSAID exposure was associated with a greater likelihood of depression classified as treatment resistant compared with depression classified as responsive to selective serotonin reuptake inhibitors (odds ratio=1.55, 95% CI=1.21-2.00). This association was apparent in the NSAIDs-only group but not in those using other agents with NSAID-like mechanisms (cyclooxygenase-2 inhibitors and salicylates). Inclusion of age, sex, ethnicity, and measures of comorbidity and health care utilization in regression models indicated confounding; association with outcome was no longer significant in fully adjusted models. Reanalysis of STAR*D results likewise identified an association in NSAIDs but not NSAID-like drugs, with more modest effects persisting after adjustment for potential confounding variables.
CONCLUSIONS: These results support an association between NSAID use and poorer antidepressant outcomes in major depressive disorder but indicate that some of the observed effect may be a result of confounding.
American Journal of Psychiatry 10/2012; 169(10):1065-72. · 14.72 Impact Factor
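An odds ratio from logistic regression is usually reported with a confidence interval that is symmetric on the log scale, so the standard error and an approximate Wald z-statistic can be recovered from the published numbers. The sketch below applies that arithmetic to the interval in this abstract (odds ratio 1.55, 95% CI 1.21-2.00). It assumes a Wald-type interval with a critical value of 1.96, which the abstract does not state, and the function names are illustrative.

```python
from math import erfc, log, sqrt


def log_or_standard_error(lower, upper, z_crit=1.96):
    """Standard error of log(OR), assuming a Wald interval exp(log OR ± z_crit * SE)."""
    return (log(upper) - log(lower)) / (2 * z_crit)


def wald_z(odds_ratio, lower, upper, z_crit=1.96):
    """Approximate Wald z-statistic implied by an odds ratio and its interval."""
    return log(odds_ratio) / log_or_standard_error(lower, upper, z_crit)


# Values reported in the abstract.
or_nsaid, ci_low, ci_high = 1.55, 1.21, 2.00

se = log_or_standard_error(ci_low, ci_high)
z = wald_z(or_nsaid, ci_low, ci_high)
p = erfc(abs(z) / sqrt(2))  # two-sided p-value under the normal approximation

# For a log-symmetric interval the point estimate sits near the geometric mean of the bounds.
midpoint = sqrt(ci_low * ci_high)

print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.4f}, geometric midpoint = {midpoint:.2f}")
```

Because the interval excludes 1, the implied z-statistic is well above 1.96. This describes only the estimate quoted in the abstract, which also notes that the association was no longer significant in fully adjusted models.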
ABSTRACT: A set of principles is proposed for sponsors and developers of research computing applications that can increase the likelihood of successful adoption by researchers.
Science translational medicine 08/2012; 4(149):149fs32. · 14.41 Impact Factor