Hongfang Liu

  • PhD
  • Faculty Member at The University of Texas Health Science Center at Houston

About

786
Publications
138,703
Reads
17,005
Citations
Introduction
Dr. Hongfang Liu's primary research focus is to facilitate the use of real-world data for clinical and translational science research and health care delivery improvement using data science, artificial intelligence, and informatics approaches.
Additional affiliations
January 2006 - January 2011
Georgetown University
Position
  • Professor (Assistant)
May 2005 - December 2011
National Cancer Institute (USA), National Institutes of Health
Position
  • Bioinformatician
January 2005 - December 2012
National Institutes of Health
Position
  • Visiting Scientist

Publications

Article
Full-text available
Background There is a significant delay between symptom onset and diagnosis of childhood asthma, but the impact of this delay on asthma outcomes has not been well understood. Objectives We sought to study the association of delayed diagnosis of asthma with asthma exacerbations (AEs) in children. Methods Using the Mayo Clinic birth cohort, we iden...
Preprint
Full-text available
Understanding the disease trajectories of specific diseases can provide important clinical insights. In this paper, we aimed to discover signature disease trajectories of 3 rare cancer types: pancreatic cancer, soft tissue sarcoma (STS) of the trunk and extremity (STS-TE), and STS of the abdomen and retroperitoneum (STS-AR), leveraging IQVIA Oncolo...
Preprint
Full-text available
Electronic health record (EHR) data are a rich and invaluable source of real-world clinical information, enabling detailed insights into patient populations, treatment outcomes, and healthcare practices. The availability of large volumes of EHR data is critical for advancing translational research and developing innovative technologies such as art...
Preprint
Full-text available
Objective: Suicide is a critical medical and public health challenge, particularly among individuals with mental illnesses in safety-net hospitals. To uncover insights about suicidality embedded in unstructured clinical notes, we propose to annotate, analyze, and model a corpus for suicidality understanding and lifesaving. Methods: A multidisciplin...
Preprint
BACKGROUND Complementary therapies are being increasingly used by cancer patients. As a channel for customers to share their feelings, outcomes, ideas, and perceived knowledge about the products purchased from e-commerce platforms, Amazon online reviews are a valuable real-world data source for health care studies. OBJECTIVE In this study, we aim...
Preprint
Full-text available
Obesity affects approximately 34% of adults and 15–20% of children and adolescents in the U.S., and poses significant economic and psychosocial burdens. Due to the multifaceted nature of obesity, patient responses to any single anti-obesity medication (AOM) currently vary significantly, highlighting the need for developing approaches to obesity deep...
Preprint
Full-text available
Background: Over the past two decades, the Food and Drug Administration (FDA) has significantly increased the approval of anti-obesity medications (AOMs) for obesity management. Both FDA-approved AOMs (F-AOMs) and off-label AOMs (O-AOMs) have gained popularity and demonstrated promising results in randomized clinical trials (RCTs). However, their e...
Preprint
Full-text available
Idiopathic pulmonary fibrosis (IPF) is a rare disease that is challenging to diagnose. Patients with IPF often spend years awaiting a diagnosis after the onset of initial respiratory symptoms, and only a small percentage receive antifibrotic treatment. In this study, we examine the associations between social determinants of health (SDoH) and two c...
Article
Full-text available
Introduction While incidentally discovered covert cerebrovascular diseases (id-CCD) are associated with future stroke, it is not known if patients with id-CCD are prescribed statins. Methods Patients age ≥50 with id-CCD on neuroimaging from 2009 to 2019 with no prior ischaemic stroke, transient ischaemic attack or dementia were identified using na...
Preprint
Diagnosis prediction is a critical task in healthcare, where timely and accurate identification of medical conditions can significantly impact patient outcomes. Traditional machine learning and deep learning models have achieved notable success in this domain but often lack interpretability which is a crucial requirement in clinical settings. In th...
Preprint
Accurate identification and categorization of suicidal events can yield better suicide precautions, reduce operational burden, and improve care quality in high-acuity psychiatric settings. Pre-trained language models offer promise for identifying suicidality from unstructured clinical narratives. We evaluated the performance of four BERT-based...
Article
Reoperation is the most significant complication following any surgical procedure. Developing machine learning methods that predict the need for reoperation will allow for improved shared surgical decision making and patient-specific and preoperative optimisation. Yet, no precise machine learning models have been published to perform well in predic...
Preprint
Full-text available
Perioperative organ injury represents a significant clinical challenge, leading to severe and often long-term complications for patients undergoing surgery. Despite its critical importance, the basic science underlying these injuries remains underexplored. This study employs bibliometric analysis to assess the research landscape of perioperative or...
Article
Full-text available
Background The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) that is developed and maintained by the Observational Health Data Sciences and Informatics (OHDSI) community supports large scale cancer research by enabling distributed network analysis. As the number of studies using the OMOP CDM for cancer research increases...
Article
Full-text available
Traditional biomedical artificial intelligence (AI) models, designed for specific tasks or modalities, often exhibit limited flexibility in real-world deployment and struggle to utilize holistic information. Generalist AI holds the potential to address these limitations due to its versatility in interpreting different data types and generating tail...
Article
Full-text available
Background Despite their growing use in health care, pretrained language models (PLMs) often lack clinical relevance due to insufficient domain expertise and poor interpretability. A key strategy to overcome these challenges is integrating external knowledge into PLMs, enhancing their adaptability and clinical usefulness. Current biomedical knowled...
Preprint
Large Language Models (LLMs) have revolutionized various sectors, including healthcare where they are employed in diverse applications. Their utility is particularly significant in the context of rare diseases, where data scarcity, complexity, and specificity pose considerable challenges. In the clinical domain, Named Entity Recognition (NER) stand...
Article
Full-text available
Objectives The surge in patient portal messages (PPMs) with increasing needs and workloads for efficient PPM triage in healthcare settings has spurred the exploration of AI-driven solutions to streamline the healthcare workflow processes, ensuring timely responses to patients to satisfy their healthcare needs. However, there has been less focus on...
Preprint
Full-text available
Medical Large Language Models (LLMs) such as ClinicalCamel 70B and Llama3-OpenBioLLM 70B have demonstrated impressive performance on a wide variety of medical NLP tasks. However, there is still no large language model (LLM) specifically designed for the cancer domain. Moreover, these LLMs typically have billions of parameters, making them computationally...
Preprint
Full-text available
The consistent and persuasive evidence illustrating the influence of social determinants on health has prompted a growing realization throughout the health care sector that enhancing health and health equity will likely depend, at least to some extent, on addressing detrimental social determinants. However, detailed social determinants of health (S...
Article
Full-text available
Background Error analysis plays a crucial role in clinical concept extraction, a fundamental subtask within clinical natural language processing (NLP). The process typically involves a manual review of error types, such as contextual and linguistic factors contributing to their occurrence, and the identification of underlying causes to refine the N...
Preprint
Full-text available
Background: Postoperative ileus (POI) after colorectal surgery leads to increased morbidity, costs, and hospital stays. Identifying POI risk for early intervention is important for improving surgical outcomes especially given the increasing trend towards early discharge after surgery. While existing studies have assessed POI risk with regression mo...
Article
Objectives Generative large language models (LLMs) are a subset of transformers-based neural network architecture models. LLMs have successfully leveraged a combination of an increased number of parameters, improvements in computational efficiency, and large pre-training datasets to perform a wide spectrum of natural language processing (NLP) tasks...
Article
Introduction Letters of recommendation (LOR) are an integral component of physical therapy residency applications. Identifying the influence of applicant and writer gender in LOR will help identify whether potential implicit gender bias exists in physical therapy residency application processes. Review of Literature Several medical and surgical re...
Article
Full-text available
The use of digital twins (DTs) has proliferated across various fields and industries, with a recent surge in the healthcare sector. The concept of digital twin for health (DT4H) holds great promise to revolutionize the entire healthcare system, including management and delivery, disease treatment and prevention, and health well-being maintenance, u...
Article
Full-text available
Covert cerebrovascular disease (CCD) is frequently reported on neuroimaging and associates with increased dementia and stroke risk. We aimed to determine how incidentally-discovered CCD during clinical neuroimaging in a large population associates with mortality. We screened CT and MRI reports of adults aged ≥50 in the Kaiser Permanente Southern Ca...
Article
We investigate risk factors for severe COVID-19 in persons living with HIV (PWH), including among racialized PWH, using the U.S. population-sampled National COVID Cohort Collaborative (N3C) data released from January 1, 2020 to October 10, 2022. We defined severe COVID-19 as hospitalized with invasive mechanical ventilation, extracorporeal membrane...
Preprint
BACKGROUND Despite their growing use in health care, pretrained language models (PLMs) often lack clinical relevance due to insufficient domain expertise and poor interpretability. A key strategy to overcome these challenges is integrating external knowledge into PLMs, enhancing their adaptability and clinical usefulness. Current biomedical knowled...
Article
Full-text available
Advances in genetic technology have led to the increasing use of genomic panels in precision oncology practice, with panels ranging from a couple to hundreds of genes. However, the clinical utilization and utility of oncology genomic panels, especially among vulnerable populations, is unclear. We examined the association of panel size with socioeco...
Article
Full-text available
Clinical information retrieval (IR) plays a vital role in modern healthcare by facilitating efficient access and analysis of medical literature for clinicians and researchers. This scoping review aims to offer a comprehensive overview of the current state of clinical IR research and identify gaps and potential opportunities for future studies in th...
Article
Full-text available
To address the current and long‐term unmet health needs of the growing population of non‐Hodgkin lymphoma (NHL) patients, we established the Lymphoma Epidemiology of Outcomes (LEO) cohort study (NCT02736357; https://leocohort.org/). A total of 7735 newly diagnosed patients aged 18 years and older with NHL were prospectively enrolled from 7/1/2015 t...
Article
Full-text available
Background: Covert cerebrovascular disease (CCD) includes white matter disease (WMD) and covert brain infarction (CBI). Incidentally-discovered CCD is associated with increased risk of subsequent symptomatic stroke. However, it is unknown whether the severity of WMD or the location of CBI predicts risk. Objectives: To examine the association of inc...
Conference Paper
Full-text available
"The Great Resignation" has become a concern for many in healthcare since the pandemic. Inspired by the literature on social network analysis (SNA), we applied SNA techniques to analyze the impact of the Great Resignation on a large health research center. We found that although the Great Resignation has caused evident turbulence among inter-program and...
Article
Traditional trial designs have well-recognized inefficiencies and logistical barriers to participation. Decentralized trials and digital health solutions have been suggested as potential solutions and have certainly risen to the challenge during the pandemic. Clinical trial designs are now increasingly data driven. The use of distributed clinical d...
Article
Full-text available
Objective The effects of Coronavirus disease 2019 (COVID-19) infection and altered processes of care on nonelective coronary artery bypass grafting (CABG) outcomes remain unknown. We hypothesized that patients with COVID-19 infection would have longer hospital lengths of stay and greater mortality compared with COVID-negative patients, but that the...
Article
Full-text available
Introduction We tested the ability of our natural language processing (NLP) algorithm to identify delirium episodes in a large-scale study using real-world clinical notes. Methods We used the Rochester Epidemiology Project to identify persons ≥ 65 years who were hospitalized between 2011 and 2017. We identified all persons with an International Cl...
Article
Full-text available
Objectives To define pregnancy episodes and estimate gestational age within electronic health record (EHR) data from the National COVID Cohort Collaborative (N3C). Materials and Methods We developed a comprehensive approach, named Hierarchy and rule-based pregnancy episode Inference integrated with Pregnancy Progression Signatures (HIPPS), and app...
Preprint
Full-text available
Accurate prediction models for individual-level endpoints and time-to-endpoints are crucial in clinical practice. In this study, we propose a novel approach, GRU-D-Weibull, which combines gated recurrent units with decay (GRU-D) to model the Weibull distribution. Our method enables real-time individualized endpoint prediction and population-level r...
Article
Full-text available
Despite recent methodology advancements in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. Concurrently, these factors also dramatically increase the difficulty in developing NLP models in multi-site set...
Article
Full-text available
Objective: Transformer-based language models are prevailing in the clinical domain due to their excellent performance on clinical NLP tasks. The generalizability of those models is usually ignored during the model development process. This study evaluated the generalizability of CancerBERT, a Transformer-based clinical NLP model, along with classi...
Article
Full-text available
Clinical phenotyping is often a foundational requirement for obtaining datasets necessary for the development of digital health applications. Traditionally done via manual abstraction, this task is often a bottleneck in development due to time and cost requirements, therefore raising significant interest in accomplishing this task via in-silico mea...
Preprint
Integrating machine learning (ML) models into clinical practice presents a challenge of maintaining their efficacy over time. While existing literature offers valuable strategies for detecting declining model performance, there is a need to document the broader challenges and solutions associated with the real-world development and int...
Article
Integrating machine learning (ML) models into clinical practice presents a challenge of maintaining their efficacy over time. While existing literature offers valuable strategies for detecting declining model performance, there is a need to document the broader challenges and solutions associated with the real-world development and integration of m...
Article
Full-text available
Background Extractive question-answering (EQA) is a useful natural language processing (NLP) application for answering patient-specific questions by locating answers in their clinical notes. Realistic clinical EQA can yield multiple answers to a single question and multiple focus points in 1 question, which are lacking in existing data sets for the...
Article
Gender stereotyping is the practice of assigning or ascribing specific characteristics, differences, or identities to a person solely based on their gender. Biased conceptions of gender can create barriers to equality and need to be proactively identified and addressed. In biomedical education, letters of recommendation (LOR) are considered an impo...
Chapter
The rapid proliferation and implementation of electronic health record (EHR) systems have reshaped the documentation and management of patient data. This transformation has facilitated and accelerated the secondary use of EHRs for clinical research. A common approach to leveraging EHRs is via manual chart review, a process of reviewing or extractin...
Preprint
BACKGROUND A wealth of clinically relevant information is only obtainable within unstructured clinical narratives, leading to great interest in clinical natural language processing (NLP). While a multitude of approaches to NLP exist, current algorithm development approaches have limitations that can slow the development process. These limitations a...
Article
Background A wealth of clinically relevant information is only obtainable within unstructured clinical narratives, leading to great interest in clinical natural language processing (NLP). While a multitude of approaches to NLP exist, current algorithm development approaches have limitations that can slow the development process. These limitations a...
Article
Objective: Social determinants of health (SDoH) play critical roles in health outcomes and well-being. Understanding the interplay of SDoH and health outcomes is critical to reducing healthcare inequalities and transforming a "sick care" system into a "health-promoting" system. To address the SDoH terminology gap and better embed relevant elements...
Preprint
Full-text available
In this paper, we introduce a unified and generalist Biomedical Generative Pre-trained Transformer (BiomedGPT) model, which leverages self-supervision on large and diverse datasets to accept multi-modal inputs and perform a range of downstream tasks. Our experiments demonstrate that BiomedGPT delivers expansive and inclusive representations of biom...
Article
Full-text available
In 2020, the CoViD-19 pandemic spread worldwide in an unexpected way and suddenly modified many life issues, including social habits, social relationships, teaching modalities, and more. Such changes were also observable in many different healthcare and medical contexts. Moreover, the CoViD-19 pandemic acted as a stress test for many research endea...
Article
Family history (FH) is important for disease risk assessment and prevention. However, incorporating FH information derived from electronic health records (EHRs) for downstream analytics is challenging due to the lack of standardization. We aimed to automatically align FH concepts derived from a clinical corpus to disease category resources popularl...
Article
A gold standard annotated corpus is usually indispensable when developing natural language processing (NLP) systems. Building a high-quality annotated corpus for clinical NLP requires considerable time and domain expertise during the annotation process. Existing annotation tools may provide powerful features to cover various needs of text annotatio...
Article
Full-text available
Chronic pain (CP) lasts for more than 3 months, causing prolonged physical and mental burdens to patients. According to the US Centers for Disease Control and Prevention, CP contributes to more than 500 billion US dollars yearly in direct medical cost plus the associated productivity loss. CP is complex in etiology and can occur anywhere in the bod...
Preprint
BACKGROUND A patient’s family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized way to capture FH information in electronic health records (EHR) and a substantial portion of FH information is frequently embedded in clinical notes. This renders FH information difficult to us...
Article
Background A patient’s family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized method to capture FH information in electronic health records and a substantial portion of FH information is frequently embedded in clinical notes. This renders FH information difficult to use i...
Article
Full-text available
Exogenous estrogen is associated with reduced COVID mortality in non-immunosuppressed/immunocompromised (non-ISC) post-menopausal females. Here, we examined the association of estrogen or testosterone hormone replacement therapy (HRT) with COVID outcomes in solid organ transplant recipients (SOTR) compared to non-ISC individuals, given known differ...
Article
Full-text available
Background The incorporation of information from clinical narratives is critical for computational phenotyping. The accurate interpretation of clinical terms highly depends on their associated context, especially the corresponding clinical section information. However, the heterogeneity across different Electronic Health Record (EHR) systems poses...
Preprint
Full-text available
Background: Clinical information retrieval (IR) plays a vital role in modern healthcare by facilitating efficient access and analysis of medical literature for clinicians and researchers. This scoping review aims to offer a comprehensive overview of the current state of clinical IR research and identify gaps and potential opportunities for future s...
Article
Multiplex immunofluorescence (MxIF) images provide detailed information of cell composition and spatial context for biomedical research. However, compromised data quality could lead to research biases. Comprehensive image quality checking (QC) is essential for reliable downstream analysis. As a reliable and specific staining of cell nuclei, 4',6-di...
Preprint
Full-text available
Objective: The generalizability of clinical large language models is usually ignored during the model development process. This study evaluated the generalizability of BERT-based clinical NLP models across different clinical settings through a breast cancer phenotype extraction task. Materials and Methods: Two clinical corpora of breast cancer pati...
Preprint
Full-text available
Gender stereotyping is the practice of assigning or ascribing specific characteristics, differences, or identities to a person solely based on their gender. Biased conceptions of gender can create barriers to equality and need to be proactively identified and addressed. In biomedical education, letters of recommendation (LOR) are considered an impo...
Article
Importance: The effectiveness of triplet therapy compared with androgen pathway inhibitor (API) doublets in a heterogeneous patient population with metastatic castration-sensitive prostate cancer (mCSPC) is unknown. Objective: To assess the comparative effectiveness of contemporary systemic treatment options for patients with mCSPC across clinic...
Article
Clinical documentation in electronic health records contains crucial narratives and details about patients and their care. Natural language processing (NLP) can unlock the information conveyed in clinical notes and reports, and thus plays a critical role in real-world studies. The NLP Working Group at the Observational Health Data Sciences and Info...
Preprint
Depression is a widespread mental health issue, affecting an estimated 3.8% of the global population. It is also one of the main contributors to disability worldwide. Recently it is becoming popular for individuals to use social media platforms (e.g., Reddit) to express their difficulties and health issues (e.g., depression) and seek support from o...
Article
Context Metformin is the first-line drug for treating diabetes but has a high failure rate. Objective To identify demographic and clinical factors available in the electronic health record (EHR) that predict metformin failure. Methods A cohort of patients with at least one abnormal diabetes screening test that initiated metformin was identified a...
Article
Introduction: Despite the decline in overall mortality and incidence of cancer in the US population, disparities in cancer care still largely exist within certain groups. The proportion of racial and ethnic minorities recruited to participate in cancer research is persistently lower than the US population. The rapid adoption of electronic health re...
Article
Full-text available
An increasing number of studies have reported using natural language processing (NLP) to assist observational research by extracting clinical information from electronic health records (EHRs). Currently, no standardized reporting guidelines for NLP‐assisted observational studies exist. The absence of detailed reporting guidelines may create ambigui...
Article
Full-text available
Background Covert cerebrovascular disease (CCD) has been shown to be associated with dementia in population‐based studies with magnetic resonance imaging (MRI) screening, but dementia risk associated with incidentally discovered CCD is not known. Methods and Results Individuals aged ≥50 years enrolled in the Kaiser Permanente Southern California h...
Preprint
Full-text available
Real-time individual endpoint prediction has always been a challenging task but is of great clinical utility for both patients and healthcare providers. With 6,879 chronic kidney disease stage 4 (CKD4) patients as a use case, we explored the feasibility and performance of gated recurrent units with decay that model the Weibull probability density function...
Preprint
Social determinants of health (SDoH) have a significant impact on health outcomes and well-being. Addressing SDoH is the key to reducing healthcare inequalities and transforming a sick care system into a health-promoting system. To address the SDoH terminology gap and better embed relevant elements in advanced biomedical informatics, we propose an...
Preprint
BACKGROUND Aspirin exacerbated respiratory disease (AERD) is an acquired inflammatory condition characterized by the presence of asthma, chronic rhinosinusitis with nasal polyposis, and respiratory hypersensitivity reactions on ingestion of aspirin or other nonsteroidal anti-inflammatory drugs (NSAIDs). Despite AERD having a classic constellation o...
Article
Full-text available
Background Aspirin-exacerbated respiratory disease (AERD) is an acquired inflammatory condition characterized by the presence of asthma, chronic rhinosinusitis with nasal polyposis, and respiratory hypersensitivity reactions on ingestion of aspirin or other nonsteroidal anti-inflammatory drugs (NSAIDs). Despite AERD having a classic constellation o...
Article
Full-text available
Background The current COVID-19 pandemic and the previous SARS/MERS outbreaks of 2003 and 2012 have resulted in a series of major global public health crises. We argue that in the interest of developing effective and safe vaccines and drugs and to better understand coronaviruses and associated disease mechanisms it is necessary to integrate the lar...
Article
Motivation: Despite the increasing evidence of utility of genomic medicine in clinical practice, systematically integrating genomic medicine information and knowledge into clinical systems with a high-level of consistency, scalability, and computability remains challenging. A comprehensive terminology is required for relevant concepts and the asso...
Article
Full-text available
Background As one of the key criteria to differentiate benign vs. malignant tumors in ovarian and other solid cancers, tumor-stroma reaction (TSR) has long been observed by pathologists and has been found to correlate with patient prognosis. However, few studies aim to overcome subjective bias or automate TSR evaluation for enabling association analy...
