
Peggy L Peissig, Ph.D., MBA
- Managing Director at Marshfield Clinic
About
Publications: 202
Reads: 33,716
Citations: 7,266
Additional affiliations
January 2015 - January 2017
Publications (202)
Objective
Diverticular disease (DD) is one of the most prevalent conditions encountered by gastroenterologists, affecting ~50% of Americans before the age of 60. Our aim was to identify genetic risk variants and clinical phenotypes associated with DD, leveraging multiple electronic health record (EHR) data sources of 91,166 multi-ancestry participa...
Introduction
Currently, one of the commonly used methods for disseminating electronic health record (EHR)-based phenotype algorithms is providing a narrative description of the algorithm logic, often accompanied by flowcharts. A challenge with this mode of dissemination is the potential for under-specification in the algorithm definition, which lea...
Objective
Natural language processing (NLP) systems convert unstructured text into analyzable data. Here, we describe the performance measures of NLP to capture granular details on nodules from thyroid ultrasound (US) reports and reveal critical issues with reporting language.
Methods
We iteratively developed NLP tools using clinical Text Analysis...
Chronic Kidney Disease (CKD) represents a slowly progressive disorder that is typically silent until late stages, but early intervention can significantly delay its progression. We designed a portable and scalable electronic CKD phenotype to facilitate early disease recognition and empower large-scale observational and genetic studies of kidney tra...
Assumptions are made about the genetic model of single nucleotide polymorphisms (SNPs) when choosing a traditional genetic encoding: additive, dominant, and recessive. Furthermore, SNPs across the genome are unlikely to demonstrate identical genetic models. However, running SNP-SNP interaction analyses with every combination of encodings raises the...
Accurate estimation of healthcare costs is crucial for healthcare systems to plan and effectively negotiate with insurance companies regarding the coverage of patient-care costs. Greater accuracy in estimating healthcare costs would provide mutual benefit for both health systems and the insurers that support these systems by better aligning payment...
Background and aims
Diverticular disease is among the most prevalent conditions encountered by gastroenterologists, affecting ∼50% of Americans before the age of 60. Our aim was to identify genetic risk variants and clinical phenotypes associated with diverticular disease, utilizing the electronic health record (EHR) with Natural Language Processin...
We propose temporal Poisson square root graphical models (TPSQRs), a generalization of Poisson square root graphical models (PSQRs) specifically designed for modeling longitudinal event data. By estimating the temporal relationships for all possible pairs of event types, TPSQRs can offer a holistic perspective about whether the occurrences of any g...
Adverse drug reactions (ADRs) are detrimental and unexpected clinical incidents caused by drug intake. The increasing availability of massive quantities of longitudinal event data such as electronic health records (EHRs) has redefined ADR discovery as a big data analytics problem, where data-hungry deep neural networks are especially suitable becau...
Resting-state white blood cell (WBC) count is a marker of inflammation and immune system health. There is evidence that WBC count is not fixed over time and there is heterogeneity in WBC trajectory that is associated with morbidity and mortality. Latent class mixed modeling (LCMM) is a method that can identify unobserved heterogeneity in longitudin...
With the rapid expansion of applied 3D computational vision, shape descriptors have become increasingly important for a wide variety of applications and objects from molecules to planets. Appropriate shape descriptors are critical for accurate (and efficient) shape retrieval and 3D model classification. Several spectral-based shape descriptors have...
The widespread digitization of patient data via electronic health records (EHRs) has created an unprecedented opportunity to use machine learning algorithms to better predict disease risk at the patient level. Although predictive models have previously been constructed for a few important diseases, such as breast cancer and myocardial infarction, w...
Background:
Implementing clinical phenotypes across a network is labor intensive and potentially error prone. Use of a common data model may facilitate the process.
Methods:
Electronic Medical Records and Genomics (eMERGE) sites implemented the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership...
Uterine fibroids affect up to 77% of women by menopause and account for up to $34 billion in healthcare costs each year. Although fibroid risk is heritable, genetic risk for fibroids is not well understood. We conducted a two-stage case-control meta-analysis of genetic variants in European and African ancestry women with and without fibroids classi...
Doctors do not know whether treatment of high parathyroid hormone levels is linked to better outcomes in their patients with kidney disease. In this study, lower parathyroid hormone levels at baseline were linked to lower risk of fracture, vascular events, and death in people with kidney disease.
Purpose
Chronic kidney disease (CKD) affects ~20%...
Electronic health records (EHR) are valuable for defining phenotype selection algorithms used to identify cohorts of patients for sequencing or genome-wide association studies (GWAS). To date, the electronic medical records and genomics (eMERGE) network institutions have developed and applied such algorithms to identify cohorts with associated DNA samp...
Epidemiological studies identifying biological markers of disease state are valuable, but can be time-consuming, expensive, and require extensive intuition and expertise. Furthermore, not all hypothesized markers will be borne out in a study, suggesting that higher quality initial hypotheses are crucial. In this work, we propose a high-throughput p...
Benign prostatic hyperplasia (BPH) results in a significant public health burden due to the morbidity caused by the disease and many of the available remedies. As much as 70% of men over 70 will develop BPH. Few studies have been conducted to discover the genetic determinants of BPH risk. Understanding the biological basis for this condition may pr...
Objective
To develop machine learning models for classifying the severity of opioid overdose events from clinical data.
Materials and methods
Opioid overdoses were identified by diagnosis codes from the Marshfield Clinic population and assigned a severity score via chart review to form a gold standard set of labels. Three primary feature sets were...
Purpose
As a preliminary evaluation of the outcomes of implementing pharmacogenetic testing within a large rural healthcare system, patients who received pre-emptive pharmacogenetic testing and warfarin dosing were monitored until June 2017.
Summary
Over a 20-month period, 749 patients were genotyped for VKORC1 and CYP2C9 as part of the electronic...
We present the baseline regularization model for computational drug repurposing using electronic health records (EHRs). In EHRs, prescriptions of various drugs are recorded over time for various patients. At the same time, numeric physical measurements (e.g., fasting blood glucose level) are also recorded. Baseline regularization uses st...
The predictive capability of combining demographic risk factors, germline genetic variants, and mammogram abnormality features for breast cancer risk prediction is poorly understood. We evaluated the predictive performance of combinations of demographic risk factors, high risk single nucleotide polymorphisms (SNPs), and mammography features for wom...
Background and Objectives: Chronic kidney disease (CKD) affects ~20% of older adults and secondary hyperparathyroidism (HPT) is a common condition in these patients. Studies have linked HPT to a greater risk of fractures, vascular events and mortality. However, the optimal parathyroid hormone (PTH) level needed to minimize these events remains unce...
Background:
Proteomic approaches allow measurement of thousands of proteins in a single specimen, which can accelerate biomarker discovery. However, applying these technologies to massive biobanks is not currently feasible because of the practical barriers and costs of implementing such assays at scale. To overcome these challenges, we used a "vir...
Blindness or vision impairment, one of the top ten disabilities among men and women, targets more than 7 million Americans of all ages. Accessible visual information is of paramount importance to improve independence and safety of blind and visually impaired people, and there is a pressing need to develop smart automated systems to assist their nav...
Defining the full spectrum of human disease associated with a biomarker is necessary to advance the biomarker into clinical practice. We hypothesize that associating biomarker measurements with electronic health record (EHR) populations based on shared genetic architectures would establish the clinical epidemiology of the biomarker. We use Bayesian...
Electronic health record (EHR) algorithms for defining patient cohorts are commonly shared as free-text descriptions that require human intervention both to interpret and implement. We developed the Phenotype Execution and Modeling Architecture (PhEMA, http://projectphema.org) to author and execute standardized computable phenotype algorithms. With...
Calciphylaxis is a disorder that results in necrotic cutaneous lesions with a high rate of mortality. Due to its rarity and complexity, the risk factors for and the disease mechanism of calciphylaxis are not fully understood. This work focuses on the use of machine learning to both predict disease risk and model the contributing factors learned fro...
Background
Coronary heart disease (CHD) is a leading cause of death globally. Although therapy with HMG-CoA reductase inhibitors (statins) decreases circulating levels of low-density lipoprotein cholesterol (LDL-C) and the incidence of CHD, additional events occur despite statin therapy in some individuals. The genetic determinants of this residua...
A fully-labeled image dataset provides a unique resource for reproducible research inquiries and data analyses in several computational fields, such as computer vision, machine learning and deep learning machine intelligence. With the present contribution, a large-scale fully-labeled image dataset is provided, and made publicly and freely available...
The study of adverse vaccine reactions (AVRs), whether mild or serious complications, has been a longstanding topic in the biomedical literature. In recent years, a vast number of scientific articles have been published daily; however, the use of this wealth of data for adverse vaccine reaction analyses has been very limited so far, and lit...
Improved prediction of the "most harmful" breast cancers that cause the most substantive morbidity and mortality would enable physicians to target more intense screening and preventive measures at those women who have the highest risk; however, such prediction models for the "most harmful" breast cancers have rarely been developed. Electronic healt...
Every single day, a large amount of text data is generated by different medical data sources, such as scientific literature, medical web pages, health related social media posts, clinical notes, and drug reviews. Processing this data in an efficient manner is a really daunting task without the help of clever computational strategies, and it makes t...
Background:
The study of adverse drug events (ADEs) is a tenured topic in medical literature. In recent years, increasing numbers of scientific articles and health-related social media posts have been generated and shared daily, albeit with very limited use for ADE study, and with little known about the content with respect to ADEs.
Objective:...
Introduction:
Several different types of drugs acting on the central nervous system (CNS) have previously been associated with an increased risk of suicide and suicidal ideation (broadly referred to as suicide). However, a differential association between brand and generic CNS drugs and suicide has not been reported.
Objectives:
This study compa...
What is known and objective:
Some public scepticism exists about generics in terms of whether brand and generic drugs produce identical outcomes. This study explores whether adverse event (AE) reporting patterns are similar between brand and generic drugs, using authorized generics (AGs) as a control for possible generic drug perception biases.
M...
Genome-wide, imputed, sequence, and structural data are now available for exceedingly large sample sizes. The needs for data management, handling population structure and related samples, and performing associations have largely been met. However, the infrastructure to support analyses involving complexity beyond genome-wide association studies is...
Background:
The US Food and Drug Administration Adverse Event Reporting System (FAERS), a post-marketing safety database, can be used to differentiate brand versus generic safety signals.
Objective:
To explore the methods for identifying and analyzing brand versus generic adverse event (AE) reports.
Methods:
Public release FAERS data from Janu...
Background:
The capture and integration of structured ophthalmologic data into electronic health records (EHRs) has historically been a challenge. However, the importance of this activity for patient care and research is critical.
Objective:
The purpose of this study was to develop a prototype of a context-driven dynamic extensible markup langua...
Several prominent public health incidents that occurred at the beginning of this century due to adverse drug events (ADEs) have raised international awareness of governments and industries about pharmacovigilance (PhV), the science and activities to monitor and prevent adverse events caused by pharmaceutical products after they are introduced to th...
Machine learning as an advanced computational technology has been around for several years in discovering patterns from diverse biomedical data sources and providing excellent capabilities ranging from gene annotation to predictive phenotyping. However, machine learning strategies remain underused in small and medium-scale biomedical research labs...
Objective
Despite the cost saving role of generic anti-epileptic drugs (AEDs), debate exists as to whether generic substitution of branded AEDs may lead to therapeutic failure and increased toxicity. This study compared adverse event (AE) reporting rates for brand vs. authorized generic (AG) vs. generic AEDs. Since AGs are pharmaceutically identica...
Background:
One potential use for the PR interval is as a biomarker of disease risk. We hypothesized that quantifying the shared genetic architectures of the PR interval and a set of clinical phenotypes would identify genetic mechanisms contributing to PR variability and identify diseases associated with a genetic predictor of PR variability.
Met...
Resistant hypertension is defined as high blood pressure that remains above treatment goals in spite of the concurrent use of three antihypertensive agents from different classes. Despite the important health consequences of resistant hypertension, few studies of resistant hypertension have been conducted. To perform a genome-wide association study...
Genome-wide association study of European Americans with resistant hypertension versus controlled hypertensives.
A total of 2,530,150 SNPs were tested for an association with resistant hypertension (1,719 cases and 708 controls) among Europeans from the eMERGE I and II Network. Tests of association were performed using logistic regression assuming...
LocusZoom plot of genome-wide significant result in ESR1 in European Americans with resistant hypertension versus controlled hypertensives.
A genome-wide association study was performed for 1,719 European American cases of resistant hypertension and 708 controls from the eMERGE Network adjusted for sex, decade of birth, median body mass index, geno...
Medication classes considered in defining resistant hypertension status in the electronic MEdical Records & GEnomics (eMERGE) Network.
(DOCX)
Resistant hypertension cases and controls for discovery, by eMERGE study site and population.
Counts within parentheses represent number of additional samples identified as cases or controls but not included in the final analyses due to model convergence issues.
(DOCX)
Likely false-positive association in genome-wide association analysis of individuals with resistant hypertension versus controlled hypertensives.
A total of 876 cases of resistant hypertension and 2,830 controlled hypertensives from the eMERGE I and II Network were available for analysis. Single-SNP tests of association were performed for 2,530,150...
Q-Q plot of genome-wide association analysis of individuals with resistant hypertension versus controlled hypertensives.
The most significant finding (ESR1 rs9479122) is likely a false-positive due to poor genotyping prior to imputation and was removed.
(DOCX)
Genome-wide association study of European Americans with resistant hypertension versus controlled hypertensives.
A total of 2,530,150 SNPs were tested for an association with resistant hypertension (1,719 cases and 708 controls) among Europeans from the eMERGE I and II network. After removal of ESR1 rs9479122, tests of association were performed us...
electronic MEdical Records & GEnomics (eMERGE) Network Resistant Hypertension electronic health record (EHR) study inclusion and exclusion criteria.
(DOCX)
Exclusions from case and control definitions of resistant hypertension in the eMERGE Network.
Individuals were excluded from case or control status based on ICD-9-CM codes and situations as described. In addition to exclusions based on codes, individuals were excluded from case status if there was evidence of chronic kidney disease within six month...
Q-Q plot of genome-wide association study of European Americans with resistant hypertension versus controlled hypertensives.
A total of 2,530,150 SNPs were tested for an association with resistant hypertension (1,719 cases and 708 controls) among Europeans from the eMERGE I and II Network. Tests of association were performed using logistic regressi...
Variants previously identified in GWAS of blood pressure or hypertension in current GWAS of resistant hypertension.
The “Published GWAS” columns represent SNPs previously associated with blood pressure, systolic blood pressure, diastolic blood pressure, or hypertension among adults at genome-wide significance drawn from the NHGRI European Bioinform...
Q-Q plot of genome-wide association study of European Americans with resistant hypertension versus controlled hypertensives.
A total of 2,530,150 SNPs were tested for an association with resistant hypertension (1,719 cases and 708 controls) among Europeans from the eMERGE I and II network. After removal of ESR1 rs9479122, tests of association were...
Height is a highly heritable, classic polygenic trait with approximately 700 common associated variants identified through genome-wide association studies so far. Here, we report 83 height-associated coding variants with lower minor-allele frequencies (in the range of 0.1–4.8%) and effects of up to 2 centimetres per allele (such as those in IHH, ST...
Study objective:
Generic drugs contain the same active ingredients as their corresponding brand drugs and are pharmaceutically equivalent and bioequivalent, whereas authorized generic drugs (AGs) contain the same active and inactive ingredients as their corresponding brand drugs but are marketed as generics. This study compares generic-to-b...
Rationale: Abdominal aortic aneurysm (AAA) is a complex disease with both genetic and environmental risk factors. Together, 6 previously identified risk loci only explain a small proportion of the heritability of AAA.
Objective: To identify additional AAA risk loci using data from all available genome-wide association studies (GWAS).
Methods and Re...
Authorized generics are identical in formulation to brand drugs, manufactured by the brand company but marketed as a generic. Generics, marketed by generic manufacturers, are required to demonstrate pharmaceutical and bioequivalence to the brand drug, but repetition of clinical trials is not required. This retrospective cohort study compared outcom...
Optical character recognition (OCR) as a classic machine learning challenge has been a longstanding topic in a variety of applications in healthcare, education, insurance, and legal industries to convert different types of electronic documents, such as scanned documents, digital images, and PDF files into fully editable and searchable text data. Th...
Predicting breast cancer risk has long been a goal of medical research in the pursuit of precision medicine. The goal of this study is to develop novel penalized methods to improve breast cancer risk prediction by leveraging structure information in electronic health records. We conducted a retrospective case-control study, garnering 49 mammography...
Background
Community associated methicillin-resistant Staphylococcus aureus (CA-MRSA) is one of the most common causes of skin and soft tissue infections in the United States, and a variety of genetic host factors are suspected to be risk factors for recurrent infection. Based on the CDC definition, we have developed and validated an electronic hea...
Background:
Continued reductions in morbidity and mortality attributable to ischemic heart disease (IHD) require an understanding of the changing epidemiology of this disease. We hypothesized that we could use genetic correlations, which quantitate the shared genetic architectures of phenotype pairs, and extant risk factors from a historical pros...
Objectives:
To understand opinions and perceptions on the state of information resources specifically targeted to genomics, and approaches to delivery in clinical practice.
Methods:
We conducted a survey of genomic content use and its clinical delivery from representatives across eight institutions in the electronic Medical Records and Genomics...
Primary open angle glaucoma (POAG) is a complex disease and is one of the leading causes of blindness worldwide. Genome-wide association studies have successfully identified several common variants associated with glaucoma; however, most of these variants only explain a small proportion of the genetic risk. Apart from the standard approach to...
Phenotypic algorithm for POAG from eMERGE.
Flowchart and pseudocode for the phenotypic algorithm that extracts samples from electronic health record data and classifies them as cases or controls for POAG.
(DOC)
Power calculation results.
Power calculation as performed by Quanto, considering odds ratios in the range of 1–3. Two separate sheets cover disease risks of 0.0001 and 0.018.
(XLSX)
All replicated results between eMERGE and NEIGHBOR datasets.
All SNP-SNP interaction results replicated in the eMERGE and NEIGHBOR analyses. This table lists p-values for the analysis of all eMERGE samples and for the analysis of European American samples only. P-values for each variable in the model as well as likelihood ratio test p-value for interactive effec...