Yonghui Wu

University of Florida | UF · Department of Health Outcomes and Policy

PhD

About

176
Publications
44,829
Reads
4,815
Citations
Additional affiliations
January 2016 - present
The University of Texas Health Science Center at Houston
Position
  • Research Assistant Professor
November 2005 - March 2010
March 2014 - December 2015
The University of Texas Health Science Center at Houston
Position
  • Researcher
Education
July 2005 - July 2010
Harbin Institute of Technology
Field of study
  • Computer Science
July 2003 - July 2005
Harbin Institute of Technology
Field of study
  • Computer Science

Publications (176)
Article
Full-text available
Background Integrating advanced machine-learning (ML) algorithms into clinical practice is challenging and requires interdisciplinary collaboration to develop transparent, interpretable, and ethically sound clinical decision support (CDS) tools. We aimed to design an ML-driven CDS tool to predict opioid overdose risk and gather feedback for its inte...
Article
Aim To develop an automated computable phenotype (CP) algorithm for identifying diabetes cases in children and adolescents using electronic health records (EHRs) from the UF Health System. Materials and Methods The CP algorithm was iteratively derived based on structured data from EHRs (UF Health System 2012–2020). We randomly selected 536 presume...
Chapter
Hospital-acquired falls are a continuing clinical concern. The emergence of advanced analytical methods, including NLP, has created opportunities to leverage nurse-generated data, such as clinical notes, to better address the problem of falls. In this nurse-driven study, we employed an iterative process for expert manual annotation of RNs' clinical...
Preprint
Full-text available
Automatic generation of discharge summaries presents significant challenges due to the length of clinical documentation, the dispersed nature of patient information, and the diverse terminology used in healthcare. This paper presents a hybrid solution for generating discharge summary sections as part of our participation in the "Discharge Me!" Chal...
Article
Full-text available
INTRODUCTION Alzheimer's disease (AD) is often misclassified in electronic health records (EHRs) when relying solely on diagnosis codes. This study aimed to develop a more accurate, computable phenotype (CP) for identifying AD patients using structured and unstructured EHR data. METHODS We used EHRs from the University of Florida Health (UFHealth)...
Preprint
Full-text available
Recent advancements in large language models (LLMs) such as ChatGPT and LLaMA have hinted at their potential to revolutionize medical applications, yet their application in clinical settings often reveals limitations due to a lack of specialized training on medical-specific data. In response to this challenge, this study introduces Me-LLaMA, a nove...
Preprint
Background We aim to use Natural Language Processing (NLP) to automate the extraction and classification of thyroid cancer risk factors from pathology reports. Methods We analyzed 1,410 surgical pathology reports from adult papillary thyroid cancer patients at Mayo Clinic, Rochester, MN, from 2010 to 2019. Structured and non-structured reports were...
Article
This study aimed to review the application of natural language processing (NLP) in thyroid-related conditions and to summarize current challenges and potential future directions. We performed a systematic search of databases for studies describing NLP applications in thyroid conditions published in English between January 1, 2012 and November 4, 20...
Article
Full-text available
Pulmonary nodules and nodule characteristics are important indicators of lung nodule malignancy. However, nodule information is often documented as free text in clinical narratives such as radiology reports in electronic health record systems. Natural language processing (NLP) is the key technology to extract and standardize patient information fro...
Article
Objective To solve major clinical natural language processing (NLP) tasks using a unified text-to-text learning architecture based on a generative large language model (LLM) via prompt tuning. Methods We formulated 7 key clinical NLP tasks as text-to-text learning and solved them using one unified generative clinical LLM, GatorTronGPT, developed u...
Preprint
Full-text available
Objective: Personal and family history of suicidal thoughts and behaviors (PSH and FSH, respectively) are significant risk factors associated with future suicide events. These are often captured in narrative clinical notes in electronic health records (EHRs). Collaboratively, Weill Cornell Medicine (WCM), Northwestern Medicine (NM), and the Univers...
Preprint
Full-text available
A comprehensive view of factors associated with AD/ADRD will significantly aid in studies to develop new treatments for AD/ADRD and identify high-risk populations and patients for prevention efforts. In our study, we summarized the risk factors for AD/ADRD by reviewing existing meta-analyses and review articles on risk and preventive factors for AD...
Preprint
INTRODUCTION: Alzheimer's Disease (AD) is often misclassified in electronic health records (EHRs) when relying solely on diagnostic codes. This study aims to develop a more accurate, computable phenotype (CP) for identifying AD patients by using both structured and unstructured EHR data. METHODS: We used EHRs from the University of Florida Health...
Article
Full-text available
The benefits and harms of lung cancer screening (LCS) for patients in the real-world clinical setting have been debated. Recently, discriminative prediction modeling of lung cancer with stratified risk factors has been developed to investigate the real-world effectiveness of LCS from observational data. However, most of these studies were conducted...
Article
Full-text available
Objective: To summarize the recent methods and applications that leverage real-world data such as electronic health records (EHRs) with social determinants of health (SDoH) for public and population health and health equity and identify successes, challenges, and possible solutions. Methods: In this opinion review, grounded on a social-ecological-m...
Article
Full-text available
Background Studying successful cognitive aging presents an opportunity to identify factors that may mitigate risk for Alzheimer’s disease and related disorders to inform public health initiatives, community interventions, and policy. Yet, identifying these cohorts is challenging, especially at advanced ages. Computable phenotype algorithms using el...
Article
Full-text available
There is enormous enthusiasm for, and concern about, applying large language models (LLMs) to healthcare. Yet current assumptions are based on general-purpose LLMs such as ChatGPT, which are not developed for medical use. This study develops a generative clinical LLM, GatorTronGPT, using 277 billion words of text including (1) 82 billion words of clinical...
Article
Background Hospital-induced delirium is one of the most common and costly iatrogenic conditions, and its incidence is predicted to increase as the population of the United States ages. An academic and clinical interdisciplinary systems approach is needed to reduce the frequency and impact of hospital-induced delirium. Objective The long-term goal...
Article
Full-text available
INTRODUCTION Little is known about the heterogeneous treatment effects of metformin on dementia risk in people with type 2 diabetes (T2D). METHODS Participants (≥ 50 years) with T2D and normal cognition at baseline were identified from the National Alzheimer's Coordinating Center database (2005–2021). We applied a doubly robust learning approach t...
Article
Full-text available
Objective Having sufficient population coverage from the electronic health records (EHRs)-connected health system is essential for building a comprehensive EHR-based diabetes surveillance system. This study aimed to establish an EHR-based type 1 diabetes (T1D) surveillance system for children and adolescents across racial and ethnic groups by ident...
Preprint
Alzheimer’s disease (AD) is a complex heterogeneous neurodegenerative disease that requires an in-depth understanding of its progression pathways and contributing factors to develop effective risk stratification and prevention strategies. In this study, we proposed an outcome-oriented model to identify progression pathways from mild cognitive impai...
Article
Full-text available
Although lung cancer is a leading cause of death among people living with HIV (PLWH), limited research exists characterizing real-world lung cancer screening adherence among PLWH. Our objective was to compare low-dose computed tomography (LDCT) adherence among PLWH to those without HIV treated at one integrated health system. Using the University o...
Article
Background: Previous studies have reported conflicting results on the association between metformin and risk of dementia which might be caused by varied drug responses of metformin in different population subgroups. We aimed to examine the heterogeneous treatment effects (HTEs) of metformin on dementia risk in the population with type 2 diabetes (T...
Article
Background Individuals experiencing socioeconomic disadvantages are disproportionally affected by dementia. Social and behavioral determinants of health (SBDH) are major contributors to health inequities in dementia risk and outcomes. We aimed to determine the heterogeneous effect of SBDH on dementia risk in older adults. Method Using data from th...
Article
Objective: To develop a natural language processing system that solves both clinical concept extraction and relation extraction in a unified prompt-based machine reading comprehension (MRC) architecture with good generalizability for cross-institution applications. Methods: We formulate both clinical concept extraction and relation extraction us...
Article
Background: The use of Artificial intelligence (AI) in healthcare has grown exponentially with the promise of facilitating biomedical research and enhancing diagnosis, treatment, monitoring, disease prevention and healthcare delivery. We aim to examine the current state, limitations, and future directions of AI in thyroidology. Summary: AI has b...
Article
Full-text available
Objective: To establish, apply, and evaluate a computable phenotype for the recruitment of individuals with successful cognitive aging. Participants and methods: Interviews with 10 aging experts identified electronic health record (EHR)-available variables representing successful aging among individuals aged 85 years and older. On the basis of t...
Article
Objective: To develop and validate an approach that identifies patients eligible for lung cancer screening (LCS) by combining structured and unstructured smoking data from the electronic health record (EHR). Methods: We identified patients aged 50-80 years who had at least one encounter in a primary care clinic at Vanderbilt University Medical C...
Preprint
Full-text available
There is enormous enthusiasm for, and concern about, using large language models (LLMs) in healthcare, yet current assumptions are all based on general-purpose LLMs such as ChatGPT. This study develops a clinical generative LLM, GatorTronGPT, using 277 billion words of mixed clinical and English text with a GPT-3 architecture of 20 billion parameters. Gato...
Article
We evaluated low-dose computed tomography (LDCT) adherence among people with HIV (PWH) treated at University of Florida (UF). From the UF Health Integrated Data Repository, we identified PWH who underwent at least one LDCT procedure (01/01/2012-10/31/2021). Lung cancer screening adherence was defined as having a second LDCT within recommended obser...
Preprint
BACKGROUND Hospital-induced delirium is one of the most common and costly iatrogenic conditions, and its incidence is predicted to increase as the population of the United States ages. An academic and clinical interdisciplinary systems approach is needed to reduce the frequency and impact of hospital-induced delirium. OBJECTIVE The long-term goal...
Article
Objective: To develop a natural language processing (NLP) system to extract medications and contextual information that help understand drug changes. This project is part of the 2022 n2c2 challenge. Materials and methods: We developed NLP systems for medication mention extraction, event classification (indicating medication changes discussed or...
Preprint
Full-text available
Delirium is an acute decline or fluctuation in attention, awareness, or other cognitive function that can lead to serious adverse outcomes. Despite the severe outcomes, delirium is frequently unrecognized and uncoded in patients' electronic health records (EHRs) due to its transient and diverse nature. Natural language processing (NLP), a key techn...
Preprint
The ultrasound characteristics of thyroid nodules guide the evaluation of thyroid cancer in patients with thyroid nodules. However, the characteristics of thyroid nodules are often documented in clinical narratives such as ultrasound reports. Previous studies have examined natural language processing (NLP) methods in extracting a limited number of...
Preprint
Objective: To develop a natural language processing (NLP) system to extract medications and contextual information that help understand drug changes. This project is part of the 2022 n2c2 challenge. Materials and methods: We developed NLP systems for medication mention extraction, event classification (indicating medication changes discussed or not...
Article
Full-text available
Introduction: This study aims to explore machine learning (ML) methods for early prediction of Alzheimer's disease (AD) and related dementias (ADRD) using the real-world electronic health records (EHRs). Methods: A total of 23,835 ADRD and 1,038,643 control patients were identified from the OneFlorida+ Research Consortium. Two ML methods were us...
Article
Background: Preclinical studies have suggested potential beneficial effects of newer glucose-lowering drugs (GLDs) including dipeptidyl peptidase (DPP)-4 inhibitors, glucagon-like peptide-1 receptor agonists (GLP-1RAs), and sodium glucose co-transporter-2 (SGLT2) inhibitors, in protecting humans against cognitive decline and dementia. However, pop...
Article
Full-text available
There is an increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, there are few clinical language models, the largest...
Article
Alzheimer’s Disease (AD) and Related Dementias (ADRD) are a class of progressive, fatal degenerative diseases that present multiple causations and outcomes related to their biological, medical, and social determinants of health (SDoH). Studies have extensively explored biological and medical factors; however, SDoH, which are known to contribute to stru...
Preprint
Full-text available
Objective: We aim to develop an open-source natural language processing (NLP) package, SODA (i.e., SOcial DeterminAnts), with pre-trained transformer models to extract social determinants of health (SDoH) for cancer patients, examine the generalizability of SODA to a new disease domain (i.e., opioid use), and evaluate the extraction rate of SDoH us...
Article
Background: Cognitive tests and biomarkers are the key information to assess the severity and track the progression of Alzheimer's disease (AD) and AD-related dementias (AD/ADRD), yet both are often only documented in clinical narratives of patients' electronic health records (EHRs). In this work, we aim to (1) assess the documentation of cognit...
Article
Full-text available
Despite the availability of efficacious direct-acting antiviral (DAA) therapy, the number of people infected with hepatitis C virus (HCV) continues to rise, and HCV remains a leading cause of liver-related morbidity, liver transplantation, and mortality. We developed and validated machine learning (ML) algorithms to predict DAA treatment failure. U...
Preprint
BACKGROUND Social determinants of health (SDoH), such as geographic neighborhoods, access to healthcare, education, and social structure are important factors affecting people’s health and health outcomes. SDoH of patients are scarcely documented in a discrete format in electronic health records (EHRs) but are often available in free-text clinical...
Article
Full-text available
Background Social determinants of health (SDoH), such as geographic neighborhoods, access to health care, education, and social structure, are important factors affecting people’s health and health outcomes. The SDoH of patients are scarcely documented in a discrete format in electronic health records (EHRs) but are often available in free-text cli...
Article
Full-text available
Background Diabetic retinopathy (DR) is a leading cause of blindness in American adults. If detected, DR can be treated to prevent further damage causing blindness. There is an increasing interest in developing artificial intelligence (AI) technologies to help detect DR using electronic health records. The lesion-related information documented in f...
Article
Full-text available
Suicide is a leading cause of death in the US. Patients with pain conditions have higher suicidal risks. In a systematic review searching observational studies from multiple sources (e.g., MEDLINE) from 1 January 2000 to 12 September 2020, we evaluated existing suicide prediction models’ (SPMs) performance and identified risk factors and their derived...
Article
The Assessing the Burden of Diabetes by Type in Children, Adolescents, and Young Adults (DiCAYA) Network, a CDC/NIDDK-funded collaborative, aims to create a multi-site electronic health record (EHR)-based diabetes surveillance system. Foundational to the network's efforts is the development of a computable phenotype (CP) algorithm that can identif...
Conference Paper
This study aims to develop a natural language processing (NLP) tool to extract the pulmonary nodules and nodule characteristics information from free-text clinical narratives. We identified a cohort of 3,080 patients who received low dose computed tomography (LDCT) at the University of Florida health system and collected their clinical narratives i...
Article
Objectives A landscape scan of the methods that are used to either assess or mitigate biases when using social media data for public health surveillance, through a scoping review. Materials and Methods Following best practices, we searched two literature databases (i.e., PubMed and Web of Science) and covered literature published up to July 2021....
Article
Full-text available
Syndromic surveillance involves the near-real-time collection of data from a potential multitude of sources to detect outbreaks of disease or adverse health events earlier than traditional forms of public health surveillance. The purpose of the present study is to elucidate the role of syndromic surveillance during mass gathering scenarios. In the...
Article
Full-text available
Social determinants of health (SDoH) are important factors associated with cancer risk and treatment outcomes. There is an increasing interest in exploring SDoH captured in electronic health records (EHRs) to assess cancer risk and outcomes; however, most SDoH are only captured in free-text clinical narratives such as physicians' notes that are not...
Article
Objectives. To estimate the prevalence rates of Alzheimer’s disease and related dementias (ADRD) and their risk factors in the transgender population and compare the rates to those in cisgender adults. Methods. We identified 1784 transgender adults in the linked electronic health records and claims data between 2012 and 2020 from the OneFlorida Cli...
Article
Background: Sodium-glucose cotransporter-2 inhibitors (SGLT2i) and glucagon-like peptide-1 receptor agonists (GLP1a) are two newer glucose-lowering drugs (GLD) that can reduce cardiovascular disease (CVD) risk in patients with type 2 diabetes (T2D). Yet, uptake of these newer drugs remains low, especially among racial minority groups. Pri...
Preprint
Full-text available
Objective To develop a large pretrained clinical language model from scratch using transformer architecture; systematically examine how transformer models of different sizes could help 5 clinical natural language processing (NLP) tasks at different linguistic levels. Methods We created a large corpus with >90 billion words from clinical narratives...
Article
Full-text available
Social and behavioral determinants of health (SBDoH) have important roles in shaping people's health. In clinical research studies, especially comparative effectiveness studies, failure to adjust for SBDoH factors will potentially cause confounding issues and misclassification errors in either statistical analyses or machine learning-based models....
Article
During the coronavirus disease 2019 (COVID-19) pandemic, social media platforms such as Twitter have become a venue for individuals, health professionals, and government agencies to share COVID-19 information. Twitter has been a popular source of data for researchers, especially for public health studies. However, the use of Twitter data for research al...
Preprint
BACKGROUND Non-alcoholic steatohepatitis (NASH), advanced fibrosis, and subsequent cirrhosis and hepatocellular carcinoma are becoming the most common etiology for liver failure and liver transplantation and yet they can only be diagnosed at these potentially reversible stages with a liver biopsy, which is associated with various complications and...
Article
Full-text available
Background: Nonalcoholic steatohepatitis (NASH), advanced fibrosis, and subsequent cirrhosis and hepatocellular carcinoma are becoming the most common etiology for liver failure and liver transplantation; however, they can only be diagnosed at these potentially reversible stages with a liver biopsy, which is associated with various complications a...
Article
Background and Aims We aimed to develop and validate machine learning algorithms to predict direct‐acting antiviral (DAA) treatment failure among patients with HCV infection. Approach and Results We used HCV‐TARGET registry data to identify HCV‐infected adults receiving all‐oral DAA treatment and having virologic outcome. Potential pretreatment pr...
Article
Introduction Although the National Lung Screening Trial (NLST) has proven low-dose computed tomography (LDCT) is effective for lung cancer screening (LCS), little is known about complication rates from invasive diagnostic procedures (IDPs) after LDCT in real-world settings. In this study, we used the real-world data from a large clinical research n...
Article
Background Alzheimer’s disease and related dementias (ADRD) are extremely prevalent and expected to significantly rise in the future. However, prevalence rates differ across ADRD sub‐types with Alzheimer’s disease (AD), vascular dementia (VaD), and dementia with Lewy Bodies (LBD) as the three most common. Aside from self‐report surveys, there is mi...
Preprint
Full-text available
Social and behavioral determinants of health (SBDoH) have important roles in shaping people's health. In clinical research studies, especially comparative effectiveness studies, failure to adjust for SBDoH factors will potentially cause confounding issues and misclassification errors in either statistical analyses or machine learning-based models....
Article
Full-text available
Objective: Social determinants of health (SDoH) are non-clinical dispositions that impact patient health risks and clinical outcomes. Leveraging SDoH in clinical decision making can potentially improve diagnosis, treatment planning, and patient outcomes. Despite increased interest in capturing SDoH in electronic health records (EHRs), such informat...