About
176
Publications
44,829
Reads
4,815
Citations
Introduction
Additional affiliations
January 2016 - present
November 2005 - March 2010
March 2014 - December 2015
Education
July 2005 - July 2010
July 2003 - July 2005
Publications (176)
Background
Integrating advanced machine-learning (ML) algorithms into clinical practice is challenging and requires interdisciplinary collaboration to develop transparent, interpretable, and ethically sound clinical decision support (CDS) tools. We aimed to design an ML-driven CDS tool to predict opioid overdose risk and gather feedback for its inte...
Aim
To develop an automated computable phenotype (CP) algorithm for identifying diabetes cases in children and adolescents using electronic health records (EHRs) from the UF Health System.
Materials and Methods
The CP algorithm was iteratively derived based on structured data from EHRs (UF Health System 2012–2020). We randomly selected 536 presume...
Hospital-acquired falls are a continuing clinical concern. The emergence of advanced analytical methods, including NLP, has created opportunities to leverage nurse-generated data, such as clinical notes, to better address the problem of falls. In this nurse-driven study, we employed an iterative process for expert manual annotation of RNs clinical...
Automatic generation of discharge summaries presents significant challenges due to the length of clinical documentation, the dispersed nature of patient information, and the diverse terminology used in healthcare. This paper presents a hybrid solution for generating discharge summary sections as part of our participation in the "Discharge Me!" Chal...
INTRODUCTION
Alzheimer's disease (AD) is often misclassified in electronic health records (EHRs) when relying solely on diagnosis codes. This study aimed to develop a more accurate, computable phenotype (CP) for identifying AD patients using structured and unstructured EHR data.
METHODS
We used EHRs from the University of Florida Health (UFHealth)...
Recent advancements in large language models (LLMs) such as ChatGPT and LLaMA have hinted at their potential to revolutionize medical applications, yet their application in clinical settings often reveals limitations due to a lack of specialized training on medical-specific data. In response to this challenge, this study introduces Me-LLaMA, a nove...
Background We aim to use Natural Language Processing (NLP) to automate the extraction and classification of thyroid cancer risk factors from pathology reports. Methods We analyzed 1,410 surgical pathology reports from adult papillary thyroid cancer patients at Mayo Clinic, Rochester, MN, from 2010 to 2019. Structured and non-structured reports were...
This study aimed to review the application of natural language processing (NLP) in thyroid-related conditions and to summarize current challenges and potential future directions. We performed a systematic search of databases for studies describing NLP applications in thyroid conditions published in English between January 1, 2012 and November 4, 20...
Pulmonary nodules and nodule characteristics are important indicators of lung nodule malignancy. However, nodule information is often documented as free text in clinical narratives such as radiology reports in electronic health record systems. Natural language processing (NLP) is the key technology to extract and standardize patient information fro...
Objective
To solve major clinical natural language processing (NLP) tasks using a unified text-to-text learning architecture based on a generative large language model (LLM) via prompt tuning.
Methods
We formulated 7 key clinical NLP tasks as text-to-text learning and solved them using one unified generative clinical LLM, GatorTronGPT, developed u...
Objective: Personal and family history of suicidal thoughts and behaviors (PSH and FSH, respectively) are significant risk factors associated with future suicide events. These are often captured in narrative clinical notes in electronic health records (EHRs). Collaboratively, Weill Cornell Medicine (WCM), Northwestern Medicine (NM), and the Univers...
A comprehensive view of factors associated with AD/ADRD will significantly aid in studies to develop new treatments for AD/ADRD and identify high-risk populations and patients for prevention efforts. In our study, we summarized the risk factors for AD/ADRD by reviewing existing meta-analyses and review articles on risk and preventive factors for AD...
INTRODUCTION: Alzheimer's disease (AD) is often misclassified in electronic health records (EHRs) when relying solely on diagnostic codes. This study aims to develop a more accurate, computable phenotype (CP) for identifying AD patients by using both structured and unstructured EHR data.
METHODS: We used EHRs from the University of Florida Health...
The benefits and harms of lung cancer screening (LCS) for patients in the real-world clinical setting have been debated. Recently, discriminative prediction modeling of lung cancer with stratified risk factors has been developed to investigate the real-world effectiveness of LCS from observational data. However, most of these studies were conducted...
Objective: To summarize the recent methods and applications that leverage real-world data such as electronic health records (EHRs) with social determinants of health (SDoH) for public and population health and health equity and identify successes, challenges, and possible solutions.
Methods: In this opinion review, grounded on a social-ecological-m...
Background
Studying successful cognitive aging presents an opportunity to identify factors that may mitigate risk for Alzheimer’s disease and related disorders to inform public health initiatives, community interventions, and policy. Yet, identifying these cohorts is challenging, especially at advanced ages. Computable phenotype algorithms using el...
There is enormous enthusiasm for, and concern about, applying large language models (LLMs) to healthcare. Yet current assumptions are based on general-purpose LLMs such as ChatGPT, which are not developed for medical use. This study develops a generative clinical LLM, GatorTronGPT, using 277 billion words of text including (1) 82 billion words of clinical...
Background
Hospital-induced delirium is one of the most common and costly iatrogenic conditions, and its incidence is predicted to increase as the population of the United States ages. An academic and clinical interdisciplinary systems approach is needed to reduce the frequency and impact of hospital-induced delirium.
Objective
The long-term goal...
INTRODUCTION
Little is known about the heterogeneous treatment effects of metformin on dementia risk in people with type 2 diabetes (T2D).
METHODS
Participants (≥ 50 years) with T2D and normal cognition at baseline were identified from the National Alzheimer's Coordinating Center database (2005–2021). We applied a doubly robust learning approach t...
Objective
Having sufficient population coverage from the electronic health records (EHRs)-connected health system is essential for building a comprehensive EHR-based diabetes surveillance system. This study aimed to establish an EHR-based type 1 diabetes (T1D) surveillance system for children and adolescents across racial and ethnic groups by ident...
Alzheimer’s disease (AD) is a complex heterogeneous neurodegenerative disease that requires an in-depth understanding of its progression pathways and contributing factors to develop effective risk stratification and prevention strategies. In this study, we proposed an outcome-oriented model to identify progression pathways from mild cognitive impai...
Although lung cancer is a leading cause of death among people living with HIV (PLWH), limited research exists characterizing real-world lung cancer screening adherence among PLWH. Our objective was to compare low-dose computed tomography (LDCT) adherence among PLWH to those without HIV treated at one integrated health system. Using the University o...
Background: Previous studies have reported conflicting results on the association between metformin and risk of dementia, which might be caused by varied drug responses to metformin in different population subgroups. We aimed to examine the heterogeneous treatment effects (HTEs) of metformin on dementia risk in the population with type 2 diabetes (T...
Background
Individuals experiencing socioeconomic disadvantages are disproportionally affected by dementia. Social and behavioral determinants of health (SBDH) are major contributors to health inequities in dementia risk and outcomes. We aimed to determine the heterogeneous effect of SBDH on dementia risk in older adults.
Method
Using data from th...
Objective:
To develop a natural language processing system that solves both clinical concept extraction and relation extraction in a unified prompt-based machine reading comprehension (MRC) architecture with good generalizability for cross-institution applications.
Methods:
We formulate both clinical concept extraction and relation extraction us...
Background:
The use of artificial intelligence (AI) in healthcare has grown exponentially with the promise of facilitating biomedical research and enhancing diagnosis, treatment, monitoring, disease prevention, and healthcare delivery. We aim to examine the current state, limitations, and future directions of AI in thyroidology.
Summary:
AI has b...
Objective:
To establish, apply, and evaluate a computable phenotype for the recruitment of individuals with successful cognitive aging.
Participants and methods:
Interviews with 10 aging experts identified electronic health record (EHR)-available variables representing successful aging among individuals aged 85 years and older. On the basis of t...
Objective:
To develop and validate an approach that identifies patients eligible for lung cancer screening (LCS) by combining structured and unstructured smoking data from the electronic health record (EHR).
Methods:
We identified patients aged 50-80 years who had at least one encounter in a primary care clinic at Vanderbilt University Medical C...
There is enormous enthusiasm for, and concern about, using large language models (LLMs) in healthcare, yet current assumptions are all based on general-purpose LLMs such as ChatGPT. This study develops a clinical generative LLM, GatorTronGPT, using 277 billion words of mixed clinical and English text with a GPT-3 architecture of 20 billion parameters. Gato...
We evaluated low-dose computed tomography (LDCT) adherence among people with HIV (PWH) treated at University of Florida (UF). From the UF Health Integrated Data Repository, we identified PWH who underwent at least one LDCT procedure (01/01/2012-10/31/2021). Lung cancer screening adherence was defined as having a second LDCT within recommended obser...
Objective:
To develop a natural language processing (NLP) system to extract medications and contextual information that help understand drug changes. This project is part of the 2022 n2c2 challenge.
Materials and methods:
We developed NLP systems for medication mention extraction, event classification (indicating medication changes discussed or...
Delirium is an acute decline or fluctuation in attention, awareness, or other cognitive function that can lead to serious adverse outcomes. Despite the severe outcomes, delirium is frequently unrecognized and uncoded in patients' electronic health records (EHRs) due to its transient and diverse nature. Natural language processing (NLP), a key techn...
The ultrasound characteristics of thyroid nodules guide the evaluation of thyroid cancer in patients with thyroid nodules. However, the characteristics of thyroid nodules are often documented in clinical narratives such as ultrasound reports. Previous studies have examined natural language processing (NLP) methods in extracting a limited number of...
Objective: To develop a natural language processing (NLP) system to extract medications and contextual information that help understand drug changes. This project is part of the 2022 n2c2 challenge. Materials and methods: We developed NLP systems for medication mention extraction, event classification (indicating medication changes discussed or not...
Introduction:
This study aims to explore machine learning (ML) methods for early prediction of Alzheimer's disease (AD) and related dementias (ADRD) using the real-world electronic health records (EHRs).
Methods:
A total of 23,835 ADRD and 1,038,643 control patients were identified from the OneFlorida+ Research Consortium. Two ML methods were us...
Background:
Preclinical studies have suggested potential beneficial effects of newer glucose-lowering drugs (GLDs) including dipeptidyl peptidase (DPP)-4 inhibitors, glucagon-like peptide-1 receptor agonists (GLP-1RAs), and sodium glucose co-transporter-2 (SGLT2) inhibitors, in protecting humans against cognitive decline and dementia. However, pop...
There is an increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, there are few clinical language models, the largest...
Alzheimer’s Disease (AD) and Related Dementias (ADRD) are a class of progressive, fatal degenerative diseases that present multiple causes and outcomes related to their biological, medical, and social determinants of health (SDoH). Studies have extensively explored biological and medical factors; however, SDoH, which are known to contribute to stru...
Objective: We aim to develop an open-source natural language processing (NLP) package, SODA (i.e., SOcial DeterminAnts), with pre-trained transformer models to extract social determinants of health (SDoH) for cancer patients, examine the generalizability of SODA to a new disease domain (i.e., opioid use), and evaluate the extraction rate of SDoH us...
Background:
Cognitive tests and biomarkers provide key information for assessing the severity and tracking the progression of Alzheimer's disease (AD) and AD-related dementias (AD/ADRD), yet both are often documented only in clinical narratives of patients' electronic health records (EHRs). In this work, we aim to (1) assess the documentation of cognit...
Despite the availability of efficacious direct-acting antiviral (DAA) therapy, the number of people infected with hepatitis C virus (HCV) continues to rise, and HCV remains a leading cause of liver-related morbidity, liver transplantation, and mortality. We developed and validated machine learning (ML) algorithms to predict DAA treatment failure. U...
BACKGROUND
Social determinants of health (SDoH), such as geographic neighborhoods, access to healthcare, education, and social structure are important factors affecting people’s health and health outcomes. SDoH of patients are scarcely documented in a discrete format in electronic health records (EHRs) but are often available in free-text clinical...
Background
Social determinants of health (SDoH), such as geographic neighborhoods, access to health care, education, and social structure, are important factors affecting people’s health and health outcomes. The SDoH of patients are scarcely documented in a discrete format in electronic health records (EHRs) but are often available in free-text cli...
Background
Diabetic retinopathy (DR) is a leading cause of blindness in American adults. If detected, DR can be treated to prevent further damage causing blindness. There is an increasing interest in developing artificial intelligence (AI) technologies to help detect DR using electronic health records. The lesion-related information documented in f...
Suicide is a leading cause of death in the US. Patients with pain conditions have higher suicidal risks. In a systematic review searching observational studies from multiple sources (e.g., MEDLINE) from 1 January 2000–12 September 2020, we evaluated existing suicide prediction models’ (SPMs) performance and identified risk factors and their derived...
The Assessing the Burden of Diabetes by Type in Children, Adolescents, and Young Adults (DiCAYA) Network, a CDC/NIDDK-funded collaborative, aims to create a multi-site electronic health record (EHR)-based diabetes surveillance system. Foundational to the network's efforts is the development of a computable phenotype (CP) algorithm that can identif...
This study aims to develop a natural language processing (NLP) tool to extract information on pulmonary nodules and nodule characteristics from free-text clinical narratives. We identified a cohort of 3,080 patients who received low-dose computed tomography (LDCT) at the University of Florida health system and collected their clinical narratives i...
Objectives
A landscape scan of the methods that are used to either assess or mitigate biases when using social media data for public health surveillance, through a scoping review.
Materials and Methods
Following best practices, we searched two literature databases (i.e., PubMed and Web of Science) and covered literature published up to July 2021....
Syndromic surveillance involves the near-real-time collection of data from a potential multitude of sources to detect outbreaks of disease or adverse health events earlier than traditional forms of public health surveillance. The purpose of the present study is to elucidate the role of syndromic surveillance during mass gathering scenarios. In the...
Social determinants of health (SDoH) are important factors associated with cancer risk and treatment outcomes. There is an increasing interest in exploring SDoH captured in electronic health records (EHRs) to assess cancer risk and outcomes; however, most SDoH are only captured in free-text clinical narratives such as physicians' notes that are not...
Objectives. To estimate the prevalence rates of Alzheimer’s disease and related dementias (ADRD) and their risk factors in the transgender population and compare the rates to those in cisgender adults.
Methods. We identified 1784 transgender adults in the linked electronic health records and claims data between 2012 and 2020 from the OneFlorida Cli...
Background: Sodium-glucose cotransporter-2 inhibitors (SGLT2i) and glucagon-like peptide-1 receptor agonists (GLP1a) are two newer glucose-lowering drugs (GLD) that can reduce cardiovascular disease (CVD) risk in patients with type 2 diabetes (T2D). Yet, uptake of these newer drugs remains low, especially among racial minority groups. Pri...
Objective
To develop a large pretrained clinical language model from scratch using transformer architecture; systematically examine how transformer models of different sizes could help 5 clinical natural language processing (NLP) tasks at different linguistic levels.
Methods
We created a large corpus with >90 billion words from clinical narratives...
Social and behavioral determinants of health (SBDoH) have important roles in shaping people's health. In clinical research studies, especially comparative effectiveness studies, failure to adjust for SBDoH factors will potentially cause confounding issues and misclassification errors in either statistical analyses or machine learning-based models....
During the coronavirus disease pandemic (COVID-19), social media platforms such as Twitter have become a venue for individuals, health professionals, and government agencies to share COVID-19 information. Twitter has been a popular source of data for researchers, especially for public health studies. However, the use of Twitter data for research al...
BACKGROUND
Non-alcoholic steatohepatitis (NASH), advanced fibrosis, and subsequent cirrhosis and hepatocellular carcinoma are becoming the most common etiology for liver failure and liver transplantation and yet they can only be diagnosed at these potentially reversible stages with a liver biopsy, which is associated with various complications and...
Background:
Nonalcoholic steatohepatitis (NASH), advanced fibrosis, and subsequent cirrhosis and hepatocellular carcinoma are becoming the most common etiology for liver failure and liver transplantation; however, they can only be diagnosed at these potentially reversible stages with a liver biopsy, which is associated with various complications a...
Background and Aims
We aimed to develop and validate machine learning algorithms to predict direct‐acting antiviral (DAA) treatment failure among patients with HCV infection.
Approach and Results
We used HCV‐TARGET registry data to identify HCV‐infected adults receiving all‐oral DAA treatment and having virologic outcome. Potential pretreatment pr...
Introduction
Although the National Lung Screening Trial (NLST) has proven low-dose computed tomography (LDCT) is effective for lung cancer screening (LCS), little is known about complication rates from invasive diagnostic procedures (IDPs) after LDCT in real-world settings. In this study, we used the real-world data from a large clinical research n...
Background
Alzheimer’s disease and related dementias (ADRD) are extremely prevalent and expected to significantly rise in the future. However, prevalence rates differ across ADRD sub‐types with Alzheimer’s disease (AD), vascular dementia (VaD), and dementia with Lewy Bodies (LBD) as the three most common. Aside from self‐report surveys, there is mi...
Objective: Social determinants of health (SDoH) are non-clinical dispositions that impact patient health risks and clinical outcomes. Leveraging SDoH in clinical decision making can potentially improve diagnosis, treatment planning, and patient outcomes. Despite increased interest in capturing SDoH in electronic health records (EHRs), such informat...