ABSTRACT: Allergic rhinitis is a common disease whose genetic basis is incompletely explained. We report an integrated genomic analysis of allergic rhinitis. We performed genome-wide association studies (GWAS) of allergic rhinitis in 5633 ethnically diverse North American subjects. Next, we profiled gene expression in disease-relevant tissue (CD4+ lymphocytes) collected from subjects who had been genotyped. We then integrated the GWAS and gene expression data using expression single nucleotide polymorphism (eSNP), coexpression network, and pathway approaches to identify the biologic relevance of our GWAS.
BMC Medical Genomics 08/2014; 7:48. · 3.47 Impact Factor
ABSTRACT: We compared an electronic health record-based influenza-like illness (ILI) surveillance system with manual sentinel surveillance and virologic data to evaluate the utility of the automated system for routine ILI surveillance.
We obtained weekly aggregate ILI reports from the Electronic medical record Support for Public Health (ESP) disease-detection and reporting system, which used an automated algorithm to identify ILI visits among a patient population of about 700,000 in Eastern Massachusetts. The percentage of total visits for ILI ("percent ILI") in ESP, percent ILI in the Massachusetts Department of Public Health's sentinel surveillance system, and percentage of laboratory specimens submitted to participating Massachusetts laboratories that tested positive for influenza were compared for the period October 2007-September 2011. We calculated Spearman's correlation coefficients and compared ESP and sentinel surveillance systems qualitatively, in terms of simplicity, flexibility, data quality, acceptability, timeliness, and usefulness.
ESP and sentinel surveillance percent ILI always peaked within one week of each other. There was 80% correlation between the two and 71%-73% correlation with laboratory data. Sentinel surveillance percent ILI was higher than ESP percent ILI during influenza seasons. The amplitude of variation in ESP percent ILI was greatest for 5- to 49-year-olds and typically peaked for the 5- to 24-year-old age group before the others.
The ESP system produces percent ILI data of similar quality to sentinel surveillance and offers the advantages of shifting disease reporting burden from clinicians to information systems, allowing tracking of disease by age group, facilitating efficient surveillance for very large populations, and producing consistent and timely reports.
Public Health Reports 01/2014; 129(1):55-63. · 1.42 Impact Factor
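The Spearman's correlation comparison described in the abstract above can be illustrated with a minimal sketch. It ranks two weekly percent-ILI series (averaging tied ranks) and computes the Pearson correlation of the ranks. The weekly values and the function names are hypothetical, chosen only for illustration; they are not data or code from the study.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical weekly percent-ILI values (not taken from the paper).
esp_pct_ili = [0.4, 0.6, 1.1, 2.3, 1.8, 0.9]
sentinel_pct_ili = [0.7, 0.9, 1.9, 3.5, 2.2, 1.5]
rho = spearman(esp_pct_ili, sentinel_pct_ili)
```

Because the coefficient depends only on ranks, a system that reports consistently higher percent ILI (as sentinel surveillance did during influenza seasons) can still correlate strongly with the other system as long as the two rise and fall together.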
ABSTRACT: Reversibility of airway obstruction in response to β2-agonists is highly variable among asthmatics, which is partially attributed to genetic factors. In a genome-wide association study of acute bronchodilator response (BDR) to inhaled albuterol, 534 290 single-nucleotide polymorphisms (SNPs) were tested in 403 white trios from the Childhood Asthma Management Program using five statistical models to determine the most robust genetic associations. The primary replication phase included 1397 polymorphisms in three asthma trials (pooled n=764). The second replication phase tested 13 SNPs in three additional asthma populations (n=241, n=215 and n=592). An intergenic SNP on chromosome 10, rs11252394, proximal to several excellent biological candidates, significantly replicated (P=1.98 × 10^-7) in the primary replication trials. An intronic SNP (rs6988229) in the collagen (COL22A1) locus also provided strong replication signals (P=8.51 × 10^-6). This study applied a robust approach for testing the genetic basis of BDR and identified novel loci associated with this drug response in asthmatics. The Pharmacogenomics Journal advance online publication, 19 March 2013; doi:10.1038/tpj.2013.5.
The Pharmacogenomics Journal 03/2013; · 5.13 Impact Factor
ABSTRACT: OBJECTIVE
To create surveillance algorithms to detect diabetes and classify type 1 versus type 2 diabetes using structured electronic health record (EHR) data.
RESEARCH DESIGN AND METHODS
We extracted 4 years of data from the EHR of a large, multisite, multispecialty ambulatory practice serving ∼700,000 patients. We flagged possible cases of diabetes using laboratory test results, diagnosis codes, and prescriptions. We assessed the sensitivity and positive predictive value of novel combinations of these data to classify type 1 versus type 2 diabetes among 210 individuals. We applied an optimized algorithm to a live, prospective, EHR-based surveillance system and reviewed 100 additional cases for validation.
RESULTS
The diabetes algorithm flagged 43,177 patients. All criteria contributed unique cases: 78% had diabetes diagnosis codes, 66% fulfilled laboratory criteria, and 46% had suggestive prescriptions. The sensitivity and positive predictive value of ICD-9 codes for type 1 diabetes were 26% (95% CI 12-49) and 94% (83-100) for type 1 codes alone; 90% (81-95) and 57% (33-86) for two or more type 1 codes plus any number of type 2 codes. An optimized algorithm incorporating the ratio of type 1 versus type 2 codes, plasma C-peptide and autoantibody levels, and suggestive prescriptions flagged 66 of 66 (100% [96-100]) patients with type 1 diabetes. On validation, the optimized algorithm correctly classified 35 of 36 patients with type 1 diabetes (raw sensitivity, 97% [87-100], population-weighted sensitivity, 65% [36-100], and positive predictive value, 88% [78-98]).
CONCLUSIONS
Algorithms applied to EHR data detect more cases of diabetes than claims codes and reasonably discriminate between type 1 and type 2 diabetes.
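The sensitivity and positive predictive value figures in the abstract above follow from simple count ratios. The sketch below reproduces the raw sensitivity from the counts the abstract does report (35 of 36 and 66 of 66). The abstract does not give the counts behind the positive predictive value or the population weighting, so the `ppv` call uses clearly hypothetical counts and the weighted estimate is not reproduced.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of actual cases that the algorithm flagged."""
    return true_positives / (true_positives + false_negatives)

def ppv(true_positives, false_positives):
    """Fraction of flagged patients who are actual cases."""
    return true_positives / (true_positives + false_positives)

# Counts stated in the abstract: 35 of 36 type 1 patients correctly classified
# on validation, and 66 of 66 flagged in the development sample.
validation_sensitivity = sensitivity(35, 1)
development_sensitivity = sensitivity(66, 0)

# Hypothetical counts, for illustration only (not from the paper).
example_ppv = ppv(true_positives=90, false_positives=10)

print(f"validation sensitivity: {validation_sensitivity:.0%}")
print(f"development sensitivity: {development_sensitivity:.0%}")
print(f"example PPV (hypothetical counts): {example_ppv:.0%}")
```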
ABSTRACT: Electronic medical record (EMR) systems have rich potential to improve integration between primary care and the public health system at the point of care. EMRs make it possible for clinicians to contribute timely, clinically detailed surveillance data to public health practitioners without changing their existing workflows or incurring extra work. New surveillance systems can extract raw data from providers' EMRs, analyze them for conditions of public health interest, and automatically communicate results to health departments. The current paper describes a model EMR-based public health surveillance platform called Electronic Medical Record Support for Public Health (ESP). The ESP platform provides live, automated surveillance for notifiable diseases, influenza-like illness, and diabetes prevalence, care, and complications. Results are automatically transmitted to state health departments.
American journal of preventive medicine 06/2012; 42(6 Suppl 2):S154-62. · 4.24 Impact Factor
ABSTRACT: Electronic medical record (EMR) systems have rich potential to improve integration between primary care and the public health system at the point of care. EMRs make it possible for clinicians to contribute timely, clinically detailed surveillance data to public health practitioners without changing their existing workflows or incurring extra work. New surveillance systems can extract raw data from providers' EMRs, analyze them for conditions of public health interest, and automatically communicate results to health departments. We describe a model EMR-based public health surveillance platform called Electronic Medical Record Support for Public Health (ESP). The ESP platform provides live, automated surveillance for notifiable diseases, influenza-like illness, and diabetes prevalence, care, and complications. Results are automatically transmitted to state health departments.
American Journal of Public Health 06/2012; 102 Suppl 3:S325-32. · 3.93 Impact Factor
ABSTRACT: The response to treatment for asthma is characterized by wide interindividual variability, with a significant number of patients who have no response. We hypothesized that a genomewide association study would reveal novel pharmacogenetic determinants of the response to inhaled glucocorticoids.
We analyzed a small number of statistically powerful variants selected on the basis of a family-based screening algorithm from among 534,290 single-nucleotide polymorphisms (SNPs) to determine changes in lung function in response to inhaled glucocorticoids. A significant, replicated association was found, and we characterized its functional effects.
We identified a significant pharmacogenetic association at SNP rs37972, replicated in four independent populations totaling 935 persons (P=0.0007), which maps to the glucocorticoid-induced transcript 1 gene (GLCCI1) and is in complete linkage disequilibrium (i.e., perfectly correlated) with rs37973. Both rs37972 and rs37973 are associated with decrements in GLCCI1 expression. In isolated cell systems, the rs37973 variant is associated with significantly decreased luciferase reporter activity. Pooled data from treatment trials indicate reduced lung function in response to inhaled glucocorticoids in subjects with the variant allele (P=0.0007 for pooled data). Overall, the mean (±SE) increase in forced expiratory volume in 1 second in the treated subjects who were homozygous for the mutant rs37973 allele was only about one third of that seen in similarly treated subjects who were homozygous for the wild-type allele (3.2±1.6% vs. 9.4±1.1%), and their risk of a poor response was significantly higher (odds ratio, 2.36; 95% confidence interval, 1.27 to 4.41), with genotype accounting for about 6.6% of overall inhaled glucocorticoid response variability.
A functional GLCCI1 variant is associated with substantial decrements in the response to inhaled glucocorticoids in patients with asthma.
New England Journal of Medicine 09/2011; 365(13):1173-83. · 51.66 Impact Factor
ABSTRACT: Network modeling of whole transcriptome expression data enables characterization of complex epistatic (gene-gene) interactions that underlie cellular functions. Though numerous methods have been proposed and successfully implemented to develop these networks, there are no formal methods for comparing differences in network connectivity patterns as a function of phenotypic trait.
Here we describe a novel approach for quantifying the differences in gene-gene connectivity patterns across disease states based on Graphical Gaussian Models (GGMs). We compare the posterior probabilities of connectivity for each gene pair across two disease states, expressed as a posterior odds-ratio (postOR) for each pair, which can be used to identify network components most relevant to disease status. The method can also be generalized to model differential gene connectivity patterns within previously defined gene sets, gene networks and pathways. We demonstrate that the GGM method reliably detects differences in network connectivity patterns in datasets of varying sample size. Applying this method to two independent breast cancer expression data sets, we identified numerous reproducible differences in network connectivity across histological grades of breast cancer, including several published gene sets and pathways. Most notably, our model identified two gene hubs (MMP12 and CXCL13) that each exhibited differential connectivity to more than 30 transcripts in both datasets. Both genes have been previously implicated in breast cancer pathobiology, but are not themselves differentially expressed by histologic grade in either dataset, and would thus not have been identified using traditional differential gene expression testing approaches. In addition, 16 curated gene sets demonstrated significant differential connectivity in both data sets, including the matrix metalloproteinases, PPAR alpha sequence targets, and the PUFA synthesis pathway.
Our results suggest that GGM can be used to formally evaluate differences in global interactome connectivity across disease states, and can serve as a powerful tool for exploring the molecular events that contribute to disease at a systems level.
BMC Systems Biology 05/2011; 5:89. · 2.98 Impact Factor
ABSTRACT: Reducing the impact of seasonal influenza epidemics and other pandemics such as H1N1 is of paramount importance for public health authorities. Studies have shown that effective interventions can be taken to contain epidemics if they are detected early. The traditional approach employed by the Centers for Disease Control and Prevention (CDC) involves collecting influenza-like illness (ILI) activity data from “sentinel” medical practices. Typically there is a 1-2 week delay between the time a patient is diagnosed and the moment that data point becomes available in aggregate ILI reports. In this paper we present the Social Network Enabled Flu Trends (SNEFT) framework, which monitors messages posted on Twitter with a mention of flu indicators to track and predict the emergence and spread of an influenza epidemic in a population. Based on the data collected during 2009 and 2010, we find that the volume of flu-related tweets is highly correlated with the number of ILI cases reported by CDC. We further devise auto-regression models to predict the ILI activity level in a population. The models predict data collected and published by CDC, as the percentage of visits to “sentinel” physicians attributable to ILI in successive weeks. We test models with previous CDC data, with and without measures of Twitter data, showing that Twitter data can substantially improve the models' prediction accuracy. Therefore, Twitter data provides real-time assessment of ILI activity.
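To make the idea of an auto-regression model with a Twitter term concrete, here is a minimal sketch of one possible form: percent ILI in week t modeled as a constant plus a multiple of percent ILI in week t-1 plus a multiple of tweet volume in week t, fitted by ordinary least squares. The model order, the fitting method, the function names, and the synthetic data are all assumptions for illustration; the abstract does not specify the SNEFT model structure.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)

def fit_arx(ili, tweets):
    """Fit ili[t] = a + b * ili[t-1] + c * tweets[t]; returns [a, b, c]."""
    rows = [[1.0, ili[t - 1], tweets[t]] for t in range(1, len(ili))]
    targets = [ili[t] for t in range(1, len(ili))]
    return fit_least_squares(rows, targets)

def predict_next(coeffs, last_ili, current_tweets):
    """One-step-ahead prediction from the fitted coefficients."""
    a, b, c = coeffs
    return a + b * last_ili + c * current_tweets

# Synthetic data generated from known coefficients (not real CDC or Twitter data).
tweets = [100, 120, 90, 200, 260, 180, 150, 300, 220, 130]
ili = [1.0]
for t in range(1, len(tweets)):
    ili.append(0.2 + 0.5 * ili[t - 1] + 0.01 * tweets[t])

coeffs = fit_arx(ili, tweets)
```

Dropping the tweet column from `rows` gives the ILI-only baseline, which is the kind of with-and-without comparison the abstract describes.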
ABSTRACT: Prior studies suggest a role for a variant (rs5743836) in the promoter of toll-like receptor 9 (TLR9) in asthma and other inflammatory diseases. We performed detailed genetic association studies of the functional variant rs5743836 with asthma susceptibility and asthma-related phenotypes in three independent cohorts.
rs5743836 was genotyped in two family-based cohorts of children with asthma and a case-control study of adult asthmatics. Association analyses were performed using chi square, family-based and population-based testing. A luciferase assay was performed to investigate whether rs5743836 genotype influences TLR9 promoter activity.
Contrary to prior reports, rs5743836 was not associated with asthma in any of the three cohorts. Marginally significant associations were found with FEV1 and FVC (p = 0.003 and p = 0.008, respectively) in one of the family-based cohorts, but these associations were not significant after correcting for multiple comparisons. Higher promoter activity of the CC genotype was demonstrated by luciferase assay, confirming the functional importance of this variant.
Although rs5743836 confers regulatory effects on TLR9 transcription, this variant does not appear to be an important asthma-susceptibility locus.
BMC Medical Genetics 02/2011; 12:26. · 2.54 Impact Factor
ABSTRACT: Electronic medical record (EMR) systems are a rich potential source for detailed, timely, and efficient surveillance of large populations. We created the Electronic medical record Support for Public Health (ESP) system to facilitate and demonstrate the potential advantages of harnessing EMRs for public health surveillance. ESP organizes and analyzes EMR data for events of public health interest and transmits electronic case reports or aggregate population summaries to public health agencies as appropriate. It is designed to be compatible with any EMR system and can be customized to different states' messaging requirements. All ESP code is open source and freely available. ESP currently has modules for notifiable disease, influenza-like illness syndrome, and diabetes surveillance. An intelligent presentation system for ESP called the RiskScape is under development. The RiskScape displays surveillance data in an accessible and intelligible format by automatically mapping results by zip code, stratifying outcomes by demographic and clinical parameters, and enabling users to specify custom queries and stratifications. The goal of RiskScape is to provide public health practitioners with rich, up-to-date views of health measures that facilitate timely identification of health disparities and opportunities for targeted interventions. ESP installations are currently operational in Massachusetts and Ohio, providing live, automated surveillance on over 1 million patients. Additional installations are underway at two more large practices in Massachusetts.
Online journal of public health informatics. 01/2011; 3(3).
ABSTRACT: Pathogens have represented an important selective force during the adaptation of modern human populations to changing social and other environmental conditions. The evolution of the immune system has therefore been influenced by these pressures. Genomic scans have revealed that the immune system is one of the functions enriched with genes under adaptive selection.
Here, we describe how the innate immune system has responded to these challenges, through the analysis of resequencing data for 132 innate immunity genes in two human populations. Results are interpreted in the context of the functional and interaction networks defined by these genes. Nucleotide diversity is lower in the adaptors and modulators functional classes, and is negatively correlated with the centrality of the proteins within the interaction network. We also produced a list of candidate genes under positive or balancing selection in each population detected by neutrality tests and showed that some functional classes are preferential targets for selection.
We found evidence that the role of each gene in the network conditions its capacity to evolve (its evolvability): genes at the core of the network are more constrained, while adaptation mostly occurred at particular positions at the network edges. Interestingly, the functional classes containing most of the genes with signatures of balancing selection are involved in autoinflammatory and autoimmune diseases, suggesting a counterbalance between the beneficial and deleterious effects of the immune response.
ABSTRACT: Genome-wide association studies of human gene expression promise to identify functional regulatory genetic variation that contributes to phenotypic diversity. However, it is unclear how useful this approach will be for the identification of disease-susceptibility variants. We generated gene expression profiles for 22 184 mRNA transcripts using RNA derived from peripheral blood CD4+ lymphocytes, and genome-wide genotype data for 516 512 autosomal markers in 200 subjects. We screened for cis-acting variants by testing variants mapping within 50 kb of expressed transcripts for association with transcript abundance using generalized linear models. Significant associations were identified for 1585 genes at a false discovery rate of 0.05 (corresponding to P-values ranging from 1 × 10^-91 to 7 × 10^-4). Importantly, we identified evidence of regulatory variation for 119 previously mapped disease genes, including 24 examples where the variant with the strongest evidence of disease-association demonstrates strong association with specific transcript abundance. The prevalence of cis-acting variants among disease-associated genes was 63% higher than the genome-wide rate in our data set (P = 6.41 × 10^-6), and although many of the implicated loci were associated with immune-related diseases (including asthma, connective tissue disorders and inflammatory bowel disease), associations with genes implicated in non-immune-related diseases including lipid profiles, anthropomorphic measurements, cancer and neurologic disease were also observed. Genetic variants that confer inter-individual differences in gene expression represent an important subset of variants that contribute to disease susceptibility. Population-based integrative genetic approaches can help identify such variation and enhance our understanding of the genetic basis of complex traits.
Human Molecular Genetics 12/2010; 19(23):4745-57. · 7.69 Impact Factor
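Selecting significant associations at a false discovery rate of 0.05, as in the abstract above, can be illustrated with the Benjamini-Hochberg step-up procedure. This is an assumption: the abstract does not say which FDR procedure the authors used. The function name and the P-values below are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of hypotheses rejected at the given FDR.

    Step-up rule: find the largest rank k (1-based, ascending P-values) with
    p_(k) <= (k / m) * fdr, then reject the k smallest P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical P-values for eight variant-transcript tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
significant = benjamini_hochberg(p_values, fdr=0.05)
```

With thousands of tests, the largest P-value that still passes can be far below 0.05, which is consistent with the abstract reporting an upper bound of 7 × 10^-4 among its significant associations.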
ABSTRACT: Genetic variation at the MYH9 locus is linked to the high incidence of focal segmental glomerulosclerosis (FSGS) and non-diabetic end-stage renal disease among African Americans. To further define risk alleles for FSGS, we performed a genome-wide association analysis using more than one million single-nucleotide polymorphisms in 56 African-American and 61 European-American patients with biopsy-confirmed FSGS. Results were compared to 1641 European Americans and 1800 African Americans as unselected controls. While no association was observed in the cohort of European Americans, the case-control comparison of African Americans found variants within a 60 kb region of chromosome 22 containing part of the APOL1 and MYH9 genes associated with increased risk of FSGS. This region spans different linkage disequilibrium blocks, and variants associating with disease within this region are in linkage disequilibrium with variants which have shown signals of natural selection. APOL1 is a strong candidate for a gene that has undergone recent natural selection and is known to be involved in the infection by Trypanosoma brucei, a parasite common in Africa that has recently adapted to infect human hosts. Further studies will be required to establish which variants are causally related to kidney disease, what mutations caused the selective sweep, and to ultimately determine if these are the same.
Kidney International 10/2010; 78(7):698-704. · 8.52 Impact Factor
ABSTRACT: Comparative effectiveness research, medical product safety evaluation, and quality measurement will require the ability to use electronic health data held by multiple organizations. There is no consensus about whether to create regional or national combined (eg, "all payer") databases for these purposes, or distributed data networks that leave most Protected Health Information and proprietary data in the possession of the original data holders.
To demonstrate functions of a distributed research network that supports research needs and also addresses data holders' concerns about participation. Key design functions included strong local control of data uses and a centralized web-based querying interface.
We implemented a pilot distributed research network and evaluated the design considerations, utility for research, and the acceptability to data holders of methods for menu-driven querying. We developed and tested a central, web-based interface with supporting network software. Specific functions assessed include query formation and distribution, query execution and review, and aggregation of results.
This pilot successfully evaluated temporal trends in medication use and diagnoses at 5 separate sites, demonstrating some of the possibilities of using a distributed research network. The pilot demonstrated the potential utility of the design, which addressed the major concerns of both users and data holders. No serious obstacles were identified that would prevent development of a fully functional, scalable network.
Distributed networks are capable of addressing nearly all anticipated uses of routinely collected electronic healthcare data. Distributed networks would obviate the need for centralized databases, thus avoiding numerous obstacles.
Medical care 06/2010; 48(6 Suppl):S45-51. · 3.24 Impact Factor
ABSTRACT: Several family-based studies have identified genetic linkage for lung function and airflow obstruction to chromosome 2q.
We hypothesized that merging results of high-resolution single nucleotide polymorphism (SNP) mapping in four separate populations would lead to the identification of chronic obstructive pulmonary disease (COPD) susceptibility genes on chromosome 2q.
Within the chromosome 2q linkage region, 2,843 SNPs were genotyped in 806 COPD cases and 779 control subjects from Norway, and 2,484 SNPs were genotyped in 309 patients with severe COPD from the National Emphysema Treatment Trial and 330 community control subjects. Significant associations from the combined results across the two case-control studies were followed up in 1,839 individuals from 603 families from the International COPD Genetics Network (ICGN) and in 949 individuals from 127 families in the Boston Early-Onset COPD Study.
Merging the results of the two case-control analyses, 14 of the 790 overlapping SNPs had a combined P < 0.01. Two of these 14 SNPs were consistently associated with COPD in the ICGN families. The association with one SNP, located in the gene XRCC5, was replicated in the Boston Early-Onset COPD Study, with a combined P = 2.51 × 10^-5 across the four studies, which remains significant when adjusted for multiple testing (P = 0.02). Genotype imputation confirmed the association with SNPs in XRCC5.
By combining data from COPD genetic association studies conducted in four independent patient samples, we have identified XRCC5, an ATP-dependent DNA helicase, as a potential COPD susceptibility gene.
American Journal of Respiratory and Critical Care Medicine 05/2010; 182(5):605-13. · 11.04 Impact Factor
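One common way to obtain a single combined P-value from several independent studies is Fisher's method, sketched below. This is an assumed technique shown for illustration only: the abstract above does not state how its combined P-values were computed, and the input P-values here are hypothetical. The statistic X = -2 Σ ln(p_i) follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom the upper tail has the closed form exp(-X/2) Σ_{i<k} (X/2)^i / i!.

```python
from math import exp, log

def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method.

    Uses the closed-form chi-square survival function for 2k degrees of
    freedom, so no external statistics library is needed.
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # X / 2, with X = -2 * sum(ln p)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # (X/2)^i / i!
        total += term
    return exp(-half_stat) * total

# Hypothetical per-study P-values for one SNP across four studies.
study_p_values = [0.03, 0.04, 0.02, 0.10]
combined_p = fisher_combined_p(study_p_values)
```

Fisher's method ignores effect direction and sample size, so analyses that mix case-control and family-based designs often prefer weighted alternatives; the sketch only conveys the general idea of pooling evidence.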
ABSTRACT: Researchers, policy makers and others commonly use electronic data routinely collected during the delivery of, or payment for, medical care to study the "real world" effectiveness, comparative effectiveness, safety, and costs of medical interventions. However, even very large individual healthcare data resources are often not big enough to adequately conduct post-marketing evidence studies. To improve the ability to use multiple distributed data resources, like the HMORN's Virtual Data Warehouse, the Developing Evidence to Inform Decisions about Effectiveness (DEcIDE) centers at the HMORN Center for Education and Research on Therapeutics (CERT) and the University of Pennsylvania are developing a design for a scalable distributed research network. Funded by AHRQ's Effective Health Care Program, the project intends to develop the framework for a distributed research network that will help close the knowledge gap regarding the "real world" effectiveness, comparative effectiveness, and safety of medical technologies. The network will have these important attributes: distributed architecture, strong local control of data uses, and federated querying. Constructing the network is a challenge, even among sites with fully operational local virtual data warehouse (VDW) installations, because of differences in computing environments and information systems, the need for responsible stewardship of clinical records, concerns related to data privacy, software development obstacles, the use of proprietary data, and the need to develop effective governance mechanisms. To investigate some of these issues, the project includes implementation of a proof of concept prototype involving these HMORN sites: Kaiser Permanente Northern California, Kaiser Permanente Colorado, Harvard Pilgrim Health Care, Group Health Cooperative, and Geisinger Health System.
The goals of the proof of concept work are to distribute a query to separately stored data resources, execute the query remotely, and present aggregated results to an authorized system user. The proof of concept will be designed with a single network portal that manages the network software, including security, authorization, authentication, messaging, and query aggregation. We will present our findings from the proof of concept, including information on specific architecture and design used, the success of the proof of concept, and lessons learned from implementation.
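The query flow described above (distribute a query to separately stored data, execute it at each site, and return aggregated results to an authorized user) can be sketched in a few lines. Everything here is a hypothetical illustration: the class and function names, the record layout, and the per-site authorization list are assumptions, not the project's actual architecture or software.

```python
from collections import Counter

class Site:
    """A data holder that runs queries locally and returns only aggregates."""

    def __init__(self, name, records, allowed_users):
        self.name = name
        self._records = records            # row-level data never leaves the site
        self._allowed_users = set(allowed_users)

    def execute(self, query, user):
        # Local control: each site decides whether to answer a given user.
        if user not in self._allowed_users:
            return None
        return query(self._records)

def dispensings_by_year(drug):
    """Build a query that counts records for one drug, grouped by year."""
    def query(records):
        return Counter(r["year"] for r in records if r["drug"] == drug)
    return query

def run_network_query(sites, query, user):
    """Portal role: send the query to every site and sum the returned counts."""
    totals = Counter()
    responding = []
    for site in sites:
        result = site.execute(query, user)
        if result is not None:
            totals.update(result)
            responding.append(site.name)
    return dict(totals), responding

# Hypothetical sites and records, for illustration only.
site_a = Site("site_a", [{"year": 2008, "drug": "statin"},
                         {"year": 2009, "drug": "statin"},
                         {"year": 2009, "drug": "ssri"}], ["analyst"])
site_b = Site("site_b", [{"year": 2009, "drug": "statin"}], ["analyst"])
site_c = Site("site_c", [{"year": 2008, "drug": "statin"}], [])  # declines

totals, responding = run_network_query(
    [site_a, site_b, site_c], dispensings_by_year("statin"), "analyst")
```

Only per-year counts cross the site boundary in this sketch, which mirrors the stated goal of keeping data with the original data holders while still supporting network-wide questions.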