Juan M. Banda

Stanford University | SU · Stanford Center for Biomedical Informatics Research

Ph.D. in Computer Science - Data Mining

About

129
Publications
41,301
Reads
2,885
Citations
Additional affiliations
March 2016 - present
Stanford University
Position
  • Researcher
July 2011 - March 2014
Montana State University
Position
  • Postdoctoral Research Associate
March 2014 - February 2016
Stanford University
Position
  • PostDoc Position
Education
July 2007 - May 2011
Montana State University
Field of study
  • Computer Science
January 2005 - May 2007
Eastern New Mexico University
Field of study
  • Mathematics
July 2000 - December 2004
Universidad Autónoma de Chihuahua
Field of study
  • Computer Science

Publications

Article
Full-text available
Observational research promises to complement experimental research by providing large, diverse populations that would be infeasible for an experiment. Observational research can test its own clinical hypotheses, and observational studies also can contribute to the design of experiments and inform the generalizability of experimental research. Unde...
Article
Recent adverse event reports have raised the question of increased angioedema risk associated with exposure to levetiracetam. To help address this question, the Observational Health Data Sciences and Informatics research network conducted a retrospective observational new-user cohort study of seizure patients exposed to levetiracetam (n = 276,665)...
Article
Full-text available
Background and objective: Several studies have demonstrated the ability to detect adverse events potentially related to multiple drug exposure via data mining. However, the number of putative associations produced by such computational approaches is typically large, making experimental validation difficult. We theorized that those potential associ...
Conference Paper
Full-text available
Over the years several studies have demonstrated the ability to identify potential drug-drug interactions via data mining from the literature (MEDLINE), electronic health records, public databases (Drugbank), etc. While each one of these approaches is properly statistically validated, they do not take into consideration the overlap between them as...
Article
Full-text available
We detail the investigation of the first application of several dissimilarity measures for large-scale solar image data analysis. Using a solar-domain-specific benchmark dataset that contains multiple types of phenomena, we analyzed combinations of image parameters with different dissimilarity measures to determine the combinations that will allow...
Article
Full-text available
Electronic phenotyping involves a detailed analysis of both structured and unstructured data, employing rule-based methods, machine learning, natural language processing, and hybrid approaches. Currently, the development of accurate phenotype definitions demands extensive literature reviews and clinical experts, rendering the process time-consuming...
Conference Paper
Full-text available
For the past nine years, the Social Media Mining for Health Applications (#SMM4H) shared tasks have promoted community-driven development and evaluation of advanced natural language processing systems to detect, extract, and normalize health-related information in publicly available user-generated content. This year, #SMM4H included seven shared ta...
Preprint
Full-text available
Electronic phenotyping involves a detailed analysis of both structured and unstructured data, employing rule-based methods, machine learning, natural language processing, and hybrid approaches. Currently, the development of accurate phenotype definitions demands extensive literature reviews and clinical experts, rendering the process time-consuming...
Preprint
Full-text available
Background: The integration of large language models (LLMs) in healthcare offers immense opportunity to streamline healthcare tasks, but also carries risks such as response accuracy and bias perpetration. To address this, we conducted a red-teaming exercise to assess LLMs in healthcare and developed a dataset of clinically relevant scenarios for fu...
Article
Objective The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. In this paper, we present the annotated corpora, a technical summary of participa...
Chapter
The COVID-19 pandemic has presented significant challenges to the healthcare industry and society as a whole. With the rapid development of COVID-19 vaccines, social media platforms have become a popular medium for discussions on vaccine-related topics. Identifying vaccine-related tweets and analyzing them can provide valuable insights for public h...
Preprint
Full-text available
The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. The eighth iteration of the #SMM4H shared tasks was hosted at the AMIA 2023 Annual Symposiu...
Article
Full-text available
Objective: Biases within probabilistic electronic phenotyping algorithms are largely unexplored. In this work, we characterize differences in subgroup performance of phenotyping algorithms for Alzheimer's disease and related dementias (ADRD) in older adults. Materials and methods: We created an experimental framework to characterize the performa...
Article
Full-text available
Common data models solve many challenges of standardizing electronic health record (EHR) data but are unable to semantically integrate all of the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneo...
Preprint
Full-text available
Despite growing interest in using large language models (LLMs) in healthcare, current explorations do not assess the real-world utility and safety of LLMs in clinical settings. Our objective was to determine whether two LLMs can serve information needs submitted by physicians as questions to an informatics consultation service in a safe and concord...
Article
Clinical documentation in electronic health records contains crucial narratives and details about patients and their care. Natural language processing (NLP) can unlock the information conveyed in clinical notes and reports, and thus plays a critical role in real-world studies. The NLP Working Group at the Observational Health Data Sciences and Info...
Article
Objective: Observational studies can impact patient care but must be robust and reproducible. Nonreproducibility is primarily caused by unclear reporting of design choices and analytic procedures. This study aimed to: (1) assess how the study logic described in an observational study could be interpreted by independent researchers and (2) quantify...
Article
Full-text available
This study presents the outcomes of the shared task competition BioCreative VII (Task 3) focusing on the extraction of medication names from a Twitter user's publicly available tweets (the user's 'timeline'). In general, detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from...
Preprint
Full-text available
Background: Common data models solve many challenges of standardizing electronic health record (EHR) data, but are unable to semantically integrate all the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of h...
Preprint
Full-text available
Objective: Biases within probabilistic electronic phenotyping algorithms are largely unexplored. In this work, we characterize differences in sub-group performance of phenotyping algorithms for Alzheimer's Disease and Related Dementias (ADRD) in older adults. Materials and methods: We created an experimental framework to characterize the performanc...
Preprint
Full-text available
Supervised learning algorithms are heavily reliant on annotated datasets to train machine learning models. However, the curation of the annotated datasets is laborious and time consuming due to the manual effort involved and has become a huge bottleneck in supervised learning. In this work, we apply the theory of noisy learning to generate weak sup...
Preprint
Full-text available
Social media is often utilized as a lifeline for communication during natural disasters. Traditionally, natural disaster tweets are filtered from the Twitter stream using the name of the natural disaster and the filtered tweets are sent for human annotation. The process of human annotation to create labeled sets for machine learning models is labor...
Article
Full-text available
Purpose Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) Characterizing Health Associated Risks and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework for standardisati...
Article
Full-text available
Colombia announced the first case of severe acute respiratory syndrome coronavirus 2 on March 6, 2020. Since then, the country has reported a total of 5,002,387 cases and 127,258 deaths as of October 31, 2021. The aggressive transmission dynamics of SARS-CoV-2 motivate an investigation of COVID-19 at the national and regional levels in Colombia. We...
Article
Full-text available
Twitter has been a remarkable resource for research in pharmacovigilance in the last decade. Traditionally, rule- or lexicon-based methods have been utilized for automatically extracting drug tweets for human annotation. The process of human annotation to create labeled sets for machine learning models is laborious, time consuming and not scalable....
Article
Full-text available
The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the coronavirus disease 2019 (COVID-19) pandemic, researchers have turned to more non-traditional sources of clinical data to characterize the disease in near-real time, study the societal implications of interventions, as well as...
Article
Full-text available
The COVID-19 pandemic hit society hard, strongly affecting people's emotions and wellbeing. It is difficult to measure how the pandemic has affected people's sentiment, not to mention how people responded to the dramatic events that took place during the pandemic. This study contributes to this discussion by showing that the negat...
Preprint
Full-text available
Colombia announced the first case of severe acute respiratory syndrome coronavirus 2 on March 6, 2020. Since then, the country has reported a total of 4,240,982 cases and 106,544 deaths as of June 30, 2021. This motivates an investigation of the SARS-CoV-2 transmission dynamics at the national and regional level using case incidence data. Mathemati...
Article
Full-text available
The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the COVID-19 pandemic, researchers have turned to more nontraditional sources of clinical data to characterize the disease in near real-time, study the societal implications of interventions, as well as the sequelae that recovered...
Preprint
Full-text available
The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the COVID-19 pandemic, researchers have turned to more nontraditional sources of clinical data to characterize the disease in near real-time, study the societal implications of interventions, as well as the sequelae that recovered...
Article
Full-text available
Mexico has experienced one of the highest COVID-19 mortality rates in the world. A delayed implementation of social distancing interventions in late March 2020 and a phased reopening of the country in June 2020 have facilitated sustained disease transmission in the region. In this study we systematically generate and compare 30-day ahead forecasts u...
Preprint
Full-text available
As the SARS-CoV-2 virus (COVID-19) continues to affect people across the globe, there is limited understanding of the long term implications for infected patients. While some of these patients have documented follow-ups on clinical records, or participate in longitudinal surveys, these datasets are usually designed by clinicians, and not granular e...
Article
The rapid evolution of the COVID-19 pandemic has underscored the need to quickly disseminate the latest clinical knowledge during a public-health emergency. One surprisingly effective platform for healthcare professionals (HCPs) to share knowledge and experiences from the front lines has been social media (for example, the "#medtwitter" community o...
Article
Full-text available
Background Low testing rates and delays in reporting hinder the estimation of the mortality burden associated with the COVID-19 pandemic. During a public health emergency, estimating all cause excess deaths above an expected level of death can provide a more reliable picture of the mortality burden. Here, we aim to estimate the absolute and relativ...
Article
Full-text available
Objective To propose a paradigm for a scalable time-aware clinical data search, and to describe the design, implementation and use of a search engine realizing this paradigm. Materials and Methods The Advanced Cohort Engine (ACE) uses a temporal query language and in-memory datastore of patient objects to provide a fast, scalable, and expressive t...
Preprint
Background: News media coverage of anti-mask protests, COVID-19 conspiracies, and pandemic politicization has overemphasized extreme views, but does little to represent views of the general public. Investigating the public’s response to various pandemic restrictions can provide a more balanced assessment of current views, allowing policymakers to c...
Preprint
BACKGROUND News media coverage of anti-mask protests, COVID-19 conspiracies, and pandemic politicization has overemphasized extreme views, but does little to represent views of the general public. Investigating the public’s response to various pandemic restrictions can provide a more balanced assessment of current views, allowing policymakers to cr...
Article
Full-text available
Background: News media coverage of anti-mask protests, COVID-19 conspiracies, and pandemic politicization has overemphasized extreme views, but does little to represent views of the general public. Investigating the public's response to various pandemic restrictions can provide a more balanced assessment of current views, allowing policymakers to...
Article
Full-text available
Despite the significant health impacts of adverse events associated with drug-drug interactions, no standard models exist for managing and sharing evidence describing potential interactions between medications. Minimal information models have been used in other communities to establish community consensus around simple models capable of communicati...
Preprint
Full-text available
Background: Low testing rates, compounded by reporting delays, hinder the estimation of the mortality burden associated with the COVID-19 pandemic based on surveillance data alone. A more reliable picture of the effect of the COVID-19 pandemic on mortality can be derived by estimating excess deaths above an expected level of death. In this study w...
Preprint
Full-text available
Background: Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response [1,2]. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) [3] Characterizing Health Associated Risks, and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework fo...
Preprint
Full-text available
The rapid evolution of the COVID-19 pandemic has underscored the need to quickly disseminate the latest clinical knowledge during a public-health emergency. One surprisingly effective platform for healthcare professionals (HCPs) to share knowledge and experiences from the front lines has been social media (for example, the "#medtwitter" community o...
Article
The normalization of clinical documents is essential for health information management with the enormous amount of clinical documentation generated each year. The LOINC Document Ontology (DO) is a universal clinical document standard in a hierarchical structure. The objective of this study is to investigate the feasibility and generalizability of L...
Preprint
Full-text available
The ongoing coronavirus pandemic reached Mexico in late February 2020. Since then, Mexico has observed a sustained elevation in the number of COVID-19 deaths. Mexico's delayed response to the COVID-19 pandemic until late March 2020 hastened the spread of the virus in the following months. However, the government followed a phased reopening of the cou...
Article
Full-text available
Physicians' beliefs and attitudes about COVID-19 are important to ascertain because of their central role in providing care to patients during the pandemic. Identifying topics and sentiments discussed by physicians and other healthcare workers can lead to identification of gaps relating to the COVID-19 pandemic response within the healthcare system....
Article
Full-text available
Objectives Concern has been raised in the rheumatology community regarding recent regulatory warnings that HCQ used in the coronavirus disease 2019 pandemic could cause acute psychiatric events. We aimed to study whether there is risk of incident depression, suicidal ideation or psychosis associated with HCQ as used for RA. Methods We performed a...
Article
Full-text available
Comorbid conditions appear to be common among individuals hospitalised with coronavirus disease 2019 (COVID-19) but estimates of prevalence vary and little is known about the prior medication use of patients. Here, we describe the characteristics of adults hospitalised with COVID-19 and compare them with influenza patients. We include 34,128 (US: 8...
Article
Full-text available
Adherence to guidelines for phototherapy initiation in preterm infants was 39% in our academic NICU (61% of phototherapy was initiated at total bilirubin (TB) levels below recommended thresholds). We hypothesized that adoption of an electronic health record integrated clinical decision support (CDS) tool would improve adherence to phototherapy guid...
Article
Full-text available
Background Hydroxychloroquine, a drug commonly used in the treatment of rheumatoid arthritis, has received much negative publicity for adverse events associated with its authorisation for emergency use to treat patients with COVID-19 pneumonia. We studied the safety of hydroxychloroquine, alone and in combination with azithromycin, to determine the...
Preprint
Full-text available
As the COVID-19 virus continues to infect people across the globe, there is little understanding of the long-term implications for recovered patients. There have been reports of persistent symptoms in patients after confirmed infections, even three months after initial recovery. While some of these patients have documented follow-ups on clinical...
Preprint
Full-text available
Since the classification of COVID-19 as a global pandemic, there have been many attempts to treat and contain the virus. Although there is no specific antiviral treatment recommended for COVID-19, there are several drugs that can potentially help with symptoms. In this work, we mined a large Twitter dataset of 424 million tweets of COVID-19 chatter...
Preprint
Full-text available
Objectives Concern has been raised in the rheumatological community regarding recent regulatory warnings that hydroxychloroquine used in the COVID-19 pandemic could cause acute psychiatric events. We aimed to study whether there is risk of incident depression, suicidal ideation, or psychosis associated with hydroxychloroquine as used for rheumatoid...
Article
Full-text available
There has been a dramatic increase in the popularity of utilizing social media data for research purposes within the biomedical community. In PubMed alone, there have been nearly 2,500 publication entries since 2014 that deal with analyzing social media data from Twitter and Reddit. However, the vast majority of those works do not share their code...
Preprint
Since the classification of COVID-19 as a global pandemic, there have been many attempts to treat and contain the virus. Although there is no specific antiviral treatment recommended for COVID-19, there are several drugs that can potentially help with symptoms. In this work, we mined a large Twitter dataset of 280 million tweets of COVID-19 chatter t...
Article
In the last few years, Twitter has become an important resource for the identification of Adverse Drug Reactions (ADRs), monitoring flu trends, and other pharmacovigilance and general research applications. Most researchers spend their time crawling Twitter, buying expensive pre-mined datasets, or tediously and slowly building datasets using the li...
Preprint
Full-text available
Automated methods for granular categorization of large corpora of text documents have become increasingly important given the rate at which scientific, news, medical, and web documents have grown in the last few years. Automatic keyphrase extraction (AKE) aims to automatically detect a small set of single- or multi-word phrases from within a single textual doc...
Article
Full-text available
Objective: Accurate electronic phenotyping is essential to support collaborative observational research. Supervised machine learning methods can be used to train phenotype classifiers in a high-throughput manner using imperfectly labeled data. We developed 10 phenotype classifiers using this approach and evaluated performance across multiple sites...
Preprint
Full-text available
Background To better understand the profile of individuals with severe coronavirus disease 2019 (COVID-19), we characterised individuals hospitalised with COVID-19 and compared them to individuals previously hospitalised with influenza. Methods We report the characteristics (demographics, prior conditions and medication use) of patients hospitalise...
Preprint
Full-text available
As the COVID-19 pandemic continues its march around the world, an unprecedented amount of open data is being generated for genetics and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experience...
Article
Full-text available
As the COVID-19 pandemic continues its march around the world, an unprecedented amount of open data is being generated for genetics and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experience...