Juan M. Banda

Stanford University | SU · Stanford Center for Biomedical Informatics Research

Ph.D. in Computer Science - Data Mining

About

104
Publications
32,594
Reads
1,697
Citations
Additional affiliations
March 2016 - present
Stanford University
Position
  • Researcher
March 2014 - February 2016
Stanford University
Position
  • PostDoc Position
July 2011 - March 2014
Montana State University
Position
  • Postdoctoral Research Associate
Education
July 2007 - May 2011
Montana State University
Field of study
  • Computer Science
January 2005 - May 2007
Eastern New Mexico University
Field of study
  • Mathematics
July 2000 - December 2004
Universidad Autónoma de Chihuahua
Field of study
  • Computer Science

Publications (104)
Article
Full-text available
Observational research promises to complement experimental research by providing large, diverse populations that would be infeasible for an experiment. Observational research can test its own clinical hypotheses, and observational studies also can contribute to the design of experiments and inform the generalizability of experimental research. Unde...
Article
Recent adverse event reports have raised the question of increased angioedema risk associated with exposure to levetiracetam. To help address this question, the Observational Health Data Sciences and Informatics research network conducted a retrospective observational new-user cohort study of seizure patients exposed to levetiracetam (n = 276,665)...
Article
Full-text available
Background and objective: Several studies have demonstrated the ability to detect adverse events potentially related to multiple drug exposure via data mining. However, the number of putative associations produced by such computational approaches is typically large, making experimental validation difficult. We theorized that those potential associ...
Conference Paper
Full-text available
Over the years several studies have demonstrated the ability to identify potential drug-drug interactions via data mining from the literature (MEDLINE), electronic health records, public databases (Drugbank), etc. While each one of these approaches is properly statistically validated, they do not take into consideration the overlap between them as...
Article
Full-text available
We detail the investigation of the first application of several dissimilarity measures for large-scale solar image data analysis. Using a solar-domain-specific benchmark dataset that contains multiple types of phenomena, we analyzed combinations of image parameters with different dissimilarity measures to determine the combinations that will allow...
Preprint
Full-text available
Social media is often utilized as a lifeline for communication during natural disasters. Traditionally, natural disaster tweets are filtered from the Twitter stream using the name of the natural disaster and the filtered tweets are sent for human annotation. The process of human annotation to create labeled sets for machine learning models is labor...
Article
Full-text available
Purpose: Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) Characterizing Health Associated Risks and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework for standardisa...
Article
Full-text available
Colombia announced the first case of severe acute respiratory syndrome coronavirus 2 on March 6, 2020. Since then, the country has reported a total of 5,002,387 cases and 127,258 deaths as of October 31, 2021. The aggressive transmission dynamics of SARS-CoV-2 motivate an investigation of COVID-19 at the national and regional levels in Colombia. We...
Article
Full-text available
Twitter has been a remarkable resource for research in pharmacovigilance in the last decade. Traditionally, rule- or lexicon-based methods have been utilized for automatically extracting drug tweets for human annotation. The process of human annotation to create labeled sets for machine learning models is laborious, time consuming and not scalable....
Article
Full-text available
The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the coronavirus disease 2019 (COVID-19) pandemic, researchers have turned to more non-traditional sources of clinical data to characterize the disease in near-real time, study the societal implications of interventions, as well as...
Article
Full-text available
The COVID-19 pandemic hit society hard, strongly affecting people's emotions and wellbeing. It is difficult to measure how the pandemic has affected the sentiment of the people, not to mention how people responded to the dramatic events that took place during the pandemic. This study contributes to this discussion by showing that the negat...
Preprint
Full-text available
Colombia announced the first case of severe acute respiratory syndrome coronavirus 2 on March 6, 2020. Since then, the country has reported a total of 4,240,982 cases and 106,544 deaths as of June 30, 2021. This motivates an investigation of the SARS-CoV-2 transmission dynamics at the national and regional level using case incidence data. Mathemati...
Article
The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the COVID-19 pandemic, researchers have turned to more nontraditional sources of clinical data to characterize the disease in near real-time, study the societal implications of interventions, as well as the sequelae that recovered...
Preprint
Full-text available
The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the COVID-19 pandemic, researchers have turned to more nontraditional sources of clinical data to characterize the disease in near real-time, study the societal implications of interventions, as well as the sequelae that recovered...
Article
Full-text available
Mexico has experienced one of the highest COVID-19 mortality rates in the world. A delayed implementation of social distancing interventions in late March 2020 and a phased reopening of the country in June 2020 have facilitated sustained disease transmission in the region. In this study we systematically generate and compare 30-day ahead forecasts u...
Preprint
Full-text available
As the SARS-CoV-2 virus (COVID-19) continues to affect people across the globe, there is limited understanding of the long term implications for infected patients. While some of these patients have documented follow-ups on clinical records, or participate in longitudinal surveys, these datasets are usually designed by clinicians, and not granular e...
Article
The rapid evolution of the COVID-19 pandemic has underscored the need to quickly disseminate the latest clinical knowledge during a public-health emergency. One surprisingly effective platform for healthcare professionals (HCPs) to share knowledge and experiences from the front lines has been social media (for example, the "#medtwitter" community o...
Article
Full-text available
Background Low testing rates and delays in reporting hinder the estimation of the mortality burden associated with the COVID-19 pandemic. During a public health emergency, estimating all cause excess deaths above an expected level of death can provide a more reliable picture of the mortality burden. Here, we aim to estimate the absolute and relativ...
Article
Full-text available
Objective To propose a paradigm for a scalable time-aware clinical data search, and to describe the design, implementation and use of a search engine realizing this paradigm. Materials and Methods The Advanced Cohort Engine (ACE) uses a temporal query language and in-memory datastore of patient objects to provide a fast, scalable, and expressive t...
Preprint
Background: News media coverage of anti-mask protests, COVID-19 conspiracies, and pandemic politicization has overemphasized extreme views, but does little to represent views of the general public. Investigating the public’s response to various pandemic restrictions can provide a more balanced assessment of current views, allowing policymakers to c...
Preprint
BACKGROUND News media coverage of anti-mask protests, COVID-19 conspiracies, and pandemic politicization has overemphasized extreme views, but does little to represent views of the general public. Investigating the public’s response to various pandemic restrictions can provide a more balanced assessment of current views, allowing policymakers to cr...
Article
Full-text available
Background: News media coverage of anti-mask protests, COVID-19 conspiracies, and pandemic politicization has overemphasized extreme views, but does little to represent views of the general public. Investigating the public's response to various pandemic restrictions can provide a more balanced assessment of current views, allowing policymakers to...
Preprint
Full-text available
Background: Low testing rates, compounded by reporting delays, hinder the estimation of the mortality burden associated with the COVID-19 pandemic based on surveillance data alone. A more reliable picture of the effect of the COVID-19 pandemic on mortality can be derived by estimating excess deaths above an expected level of death. In this study w...
Preprint
Full-text available
Background: Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response [1,2]. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) [3] Characterizing Health Associated Risks, and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework fo...
Article
Full-text available
Despite the significant health impacts of adverse events associated with drug-drug interactions, no standard models exist for managing and sharing evidence describing potential interactions between medications. Minimal information models have been used in other communities to establish community consensus around simple models capable of communicati...
Preprint
Full-text available
The rapid evolution of the COVID-19 pandemic has underscored the need to quickly disseminate the latest clinical knowledge during a public-health emergency. One surprisingly effective platform for healthcare professionals (HCPs) to share knowledge and experiences from the front lines has been social media (for example, the "#medtwitter" community o...
Article
The normalization of clinical documents is essential for health information management with the enormous amount of clinical documentation generated each year. The LOINC Document Ontology (DO) is a universal clinical document standard in a hierarchical structure. The objective of this study is to investigate the feasibility and generalizability of L...
Preprint
Full-text available
The ongoing coronavirus pandemic reached Mexico in late February 2020. Since then, Mexico has observed a sustained elevation in the number of COVID-19 deaths. Mexico's delayed response to the COVID-19 pandemic until late March 2020 hastened the spread of the virus in the following months. However, the government followed a phased reopening of the cou...
Article
Physicians' beliefs and attitudes about COVID-19 are important to ascertain because of their central role in providing care to patients during the pandemic. Identifying topics and sentiments discussed by physicians and other healthcare workers can lead to identification of gaps relating to the COVID-19 pandemic response within the healthcare system....
Article
Full-text available
Objectives Concern has been raised in the rheumatology community regarding recent regulatory warnings that HCQ used in the coronavirus disease 2019 pandemic could cause acute psychiatric events. We aimed to study whether there is risk of incident depression, suicidal ideation or psychosis associated with HCQ as used for RA. Methods We performed a...
Article
Full-text available
Comorbid conditions appear to be common among individuals hospitalised with coronavirus disease 2019 (COVID-19) but estimates of prevalence vary and little is known about the prior medication use of patients. Here, we describe the characteristics of adults hospitalised with COVID-19 and compare them with influenza patients. We include 34,128 (US: 8...
Article
Full-text available
Adherence to guidelines for phototherapy initiation in preterm infants was 39% in our academic NICU (61% of phototherapy was initiated at total bilirubin (TB) levels below recommended thresholds). We hypothesized that adoption of an electronic health record integrated clinical decision support (CDS) tool would improve adherence to phototherapy guid...
Article
Full-text available
Background Hydroxychloroquine, a drug commonly used in the treatment of rheumatoid arthritis, has received much negative publicity for adverse events associated with its authorisation for emergency use to treat patients with COVID-19 pneumonia. We studied the safety of hydroxychloroquine, alone and in combination with azithromycin, to determine the...
Preprint
Full-text available
As the COVID-19 virus continues to infect people across the globe, there is little understanding of the long term implications for recovered patients. There have been reports of persistent symptoms after confirmed infections on patients even after three months of initial recovery. While some of these patients have documented follow-ups on clinical...
Preprint
Full-text available
Since the classification of COVID-19 as a global pandemic, there have been many attempts to treat and contain the virus. Although there is no specific antiviral treatment recommended for COVID-19, there are several drugs that can potentially help with symptoms. In this work, we mined a large Twitter dataset of 424 million tweets of COVID-19 chatter...
Preprint
Full-text available
Objectives Concern has been raised in the rheumatological community regarding recent regulatory warnings that hydroxychloroquine used in the COVID-19 pandemic could cause acute psychiatric events. We aimed to study whether there is risk of incident depression, suicidal ideation, or psychosis associated with hydroxychloroquine as used for rheumatoid...
Article
Full-text available
There has been a dramatic increase in the popularity of utilizing social media data for research purposes within the biomedical community. In PubMed alone, there have been nearly 2,500 publication entries since 2014 that deal with analyzing social media data from Twitter and Reddit. However, the vast majority of those works do not share their code...
Preprint
Since the classification of COVID-19 as a global pandemic, there have been many attempts to treat and contain the virus. Although there is no specific antiviral treatment recommended for COVID-19, there are several drugs that can potentially help with symptoms. In this work, we mined a large Twitter dataset of 280 million tweets of COVID-19 chatter t...
Preprint
Full-text available
Automated methods for granular categorization of large corpora of text documents have become increasingly important given the rate at which scientific, news, medical, and web documents have been growing in the last few years. Automatic keyphrase extraction (AKE) aims to automatically detect a small set of single or multi-words from within a single textual doc...
Article
Full-text available
Objective: Accurate electronic phenotyping is essential to support collaborative observational research. Supervised machine learning methods can be used to train phenotype classifiers in a high-throughput manner using imperfectly labeled data. We developed 10 phenotype classifiers using this approach and evaluated performance across multiple sites...
Preprint
Full-text available
Background To better understand the profile of individuals with severe coronavirus disease 2019 (COVID-19), we characterised individuals hospitalised with COVID-19 and compared them to individuals previously hospitalised with influenza. Methods We report the characteristics (demographics, prior conditions and medication use) of patients hospitalise...
Preprint
Full-text available
As the COVID-19 pandemic continues its march around the world, an unprecedented amount of open data is being generated for genetics and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experience...
Article
Full-text available
As the COVID-19 pandemic continues its march around the world, an unprecedented amount of open data is being generated for genetics and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experience...
Preprint
Full-text available
There has been a dramatic increase in the popularity of utilizing social media data for research purposes within the biomedical community. In PubMed alone, there have been nearly 2,500 publication entries since 2014 that deal with analyzing social media data from Twitter and Reddit. However, the vast majority of those works do not share their code...
Preprint
Full-text available
With the increase in popularity of deep learning models for natural language processing (NLP) tasks, in the field of Pharmacovigilance, more specifically for the identification of Adverse Drug Reactions (ADRs), there is an inherent need for large-scale social-media datasets aimed at such tasks. With most researchers allocating large amounts of time...
Preprint
Full-text available
In the last few years Twitter has become an important resource for the identification of Adverse Drug Reactions (ADRs), monitoring flu trends, and other pharmacovigilance and general research applications. Most researchers spend their time crawling Twitter, buying expensive pre-mined datasets, or tediously and slowly building datasets using the lim...
Preprint
Full-text available
With the advent of deep learning for computer vision tasks, the need for accurately labeled data in large volumes is vital for any application. The increasingly available large amounts of solar image data generated by the Solar Dynamic Observatory (SDO) mission make this domain particularly interesting for the development and testing of deep learni...
Article
Full-text available
Background Cardiovascular outcomes for people with familial hypercholesterolaemia can be improved with diagnosis and medical management. However, 90% of individuals with familial hypercholesterolaemia remain undiagnosed in the USA. We aimed to accelerate early diagnosis and timely intervention for more than 1·3 million undiagnosed individuals with...
Article
Large healthcare datasets of Electronic Health Record data have become indispensable in clinical research. Data quality in such datasets has recently become a focus of many distributed research networks. Despite the fact that data quality is specific to a given research question, many existing data quality platforms prove that general data quality assessment...
Article
Full-text available
The usage of controlled biomedical vocabularies is the cornerstone that enables seamless interoperability when using a common data model across multiple data sites. The Observational Health Data Science and Informatics (OHDSI) initiative combines over 100 controlled vocabularies into its own. However, the OHDSI vocabulary is limited in the sense th...
Preprint
Full-text available
Objective: Accurate electronic phenotyping is essential to support collaborative observational research. Supervised machine learning methods can be used to train phenotype classifiers in a high-throughput manner using imperfectly labeled data. We developed ten phenotype classifiers using this approach and evaluated performance across multiple sites...
Article
Full-text available
Predicting the impact of natural disasters such as hurricanes on the transmission dynamics of infectious diseases poses significant challenges. In this paper, we put forward a simple modelling framework to investigate the impact of heavy rainfall events (HREs) on mosquito-borne disease transmission in temperate areas of the world such as the southe...
Article
Full-text available
Familial hypercholesterolemia (FH) is an underdiagnosed dominant genetic condition affecting approximately 0.4% of the population and has up to a 20-fold increased risk of coronary artery disease if untreated. Simple screening strategies have false positive rates greater than 95%. As part of the FH Foundation's FIND FH initiative, we developed a cl...
Article
Full-text available
Cancer stage is rarely captured in structured form in the electronic health record (EHR). We evaluate the performance of a classifier, trained on structured EHR data, in identifying prostate cancer patients with metastatic disease. Using EHR data for a cohort of 5,861 prostate cancer patients mapped to the Observational Health Data Sciences and Inf...
Article
Over 75 million Americans have multiple concurrent chronic conditions and medical decision making for these patients is mostly based on retrospective cohort studies. Current methods to generate cohorts of patients with comorbidities are neither scalable nor generalizable. We propose a supervised machine learning algorithm for learning comorbidity p...
Preprint
Full-text available
The authors of this report met on 28-30 March 2018 at the New Jersey Institute of Technology, Newark, New Jersey, for a 3-day workshop that brought together a group of data providers, expert modelers, and computer and data scientists, in the solar discipline. Their objective was to identify challenges in the path towards building an effective frame...
Preprint
Full-text available
Nanopublications are a Linked Data format for scholarly data publishing that has received considerable uptake in the last few years. In contrast to the common Linked Data publishing practice, nanopublications work at the granular level of atomic information snippets and provide a consistent container format to attach provenance and metadata at this...
Article
Full-text available
Importance Consensus around an efficient second-line treatment option for type 2 diabetes (T2D) remains ambiguous. The availability of electronic medical records and insurance claims data, which capture routine medical practice, accessed via the Observational Health Data Sciences and Informatics network presents an opportunity to generate evidence...