Jason Alan Fries

Verified
Jason verified their affiliation via an institutional email.
  • Doctor of Philosophy
  • Senior Researcher at Stanford University

About

99 Publications
25,595 Reads
2,949 Citations
Introduction
Jason Fries is a computer scientist at Stanford University's Center for Biomedical Informatics Research. His work focuses on methods that enable domain experts to rapidly build and modify machine learning models in complex domains such as medicine, where obtaining large-scale, expert-labeled training data is a significant challenge. His research spans weakly supervised machine learning, foundation models, and methods for data-centric AI.
Current institution
Stanford University
Current position
  • Senior Researcher
Additional affiliations
July 2015 - December 2017
Stanford University
Position
  • Postdoc

Publications (99)
Article
Full-text available
In the electronic health record, using clinical notes to identify entities such as disorders and their temporality (e.g. the order of an event relative to a time index) can inform many important analyses. However, creating training data for clinical entity tasks is time consuming and sharing labeled data is challenging due to privacy concerns. The...
Preprint
Full-text available
Training and evaluating language models increasingly requires the construction of meta-datasets -- diverse collections of curated data with clear provenance. Natural language prompting has recently led to improved zero-shot generalization by transforming existing, supervised datasets into a diversity of novel pretraining tasks, highlighting the ben...
Preprint
Full-text available
While the general machine learning (ML) community has benefited from public datasets, tasks, and models, the progress of ML in healthcare has been hampered by a lack of such shared assets. The success of foundation models creates new challenges for healthcare ML by requiring access to shared pretrained models to validate performance benefits. We he...
Preprint
Full-text available
The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for e...
Preprint
Full-text available
With the rise of medical foundation models and the growing availability of imaging data, scalable pretraining techniques offer a promising way to identify imaging biomarkers predictive of future disease risk. While current self-supervised methods for 3D medical imaging models capture local structural features like organ morphology, they fail to lin...
Preprint
Full-text available
While large language models (LLMs) achieve near-perfect scores on medical licensing exams, these evaluations inadequately reflect the complexity and diversity of real-world clinical practice. We introduce MedHELM, an extensible evaluation framework for assessing LLM performance for medical tasks with three key contributions. First, a clinician-vali...
Article
Full-text available
Red teaming, the practice of adversarially exposing unexpected or undesired model behaviors, is critical towards improving equity and accuracy of large language models, but non-model creator-affiliated red teaming is scant in healthcare. We convened teams of clinicians, medical and engineering students, and technical professionals (80 participants...
Preprint
Full-text available
Large language models (LLMs) have emerged as promising tools for assisting in medical tasks, yet processing Electronic Health Records (EHRs) presents unique challenges due to their longitudinal nature. While LLMs' capabilities to perform medical tasks continue to improve, their ability to reason over temporal dependencies across multiple patient vi...
Preprint
Full-text available
The fourth Machine Learning for Health (ML4H) symposium was held in person on December 15th and 16th, 2024, in the traditional, ancestral, and unceded territories of the Musqueam, Squamish, and Tsleil-Waututh Nations in Vancouver, British Columbia, Canada. The symposium included research roundtable sessions to foster discussions between participant...
Conference Paper
Full-text available
The 4th Machine Learning for Health (ML4H) symposium was held in person on December 15-16, 2024, in the traditional, ancestral, and unceded territories of the Musqueam, Squamish, and Tsleil-Waututh Nations in Vancouver, British Columbia, Canada. The symposium included research roundtable sessions to foster discussions between participants and senio...
Article
Full-text available
In this study, we investigate the performance of computer vision AI algorithms in predicting patient disposition from the emergency department (ED) using short video clips. Clinicians often use “eye-balling” or clinical gestalt to aid in triage, based on brief observations. We hypothesize that AI can similarly use patient appearance for disposition...
Preprint
Full-text available
Verifying factual claims is critical for using large language models (LLMs) in healthcare. Recent work has proposed fact decomposition, which uses LLMs to rewrite source text into concise sentences conveying a single piece of information, as an approach for fine-grained fact verification. Clinical documentation poses unique challenges for fact deco...
Preprint
Full-text available
Foundation Models (FMs) trained on Electronic Health Records (EHRs) have achieved state-of-the-art results on numerous clinical prediction tasks. However, most existing EHR FMs have context windows of <1k tokens. This prevents them from modeling full patient EHRs which can exceed 10k's of events. Recent advancements in subquadratic long-context arc...
Article
Importance Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas. Objective To summarize existing evaluations of LLMs in health care in terms of 5 components: (1) evaluation data type, (2) health care task, (3) natural language proc...
Preprint
Full-text available
The growing demand for machine learning in healthcare requires processing increasingly large electronic health record (EHR) datasets, but existing pipelines are not computationally efficient or scalable. In this paper, we introduce meds_reader, an optimized Python package for efficient EHR data processing that is designed to take advantage of many...
Article
Full-text available
Foundation models are transforming artificial intelligence (AI) in healthcare by providing modular components adaptable for various downstream tasks, making AI development more scalable and cost-effective. Foundation models for structured electronic health records (EHR), trained on coded medical records from millions of patients, demonstrated benef...
Preprint
Full-text available
Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current radiologist shortage, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies. Prior state-of-the-art approaches for automat...
Preprint
Full-text available
Importance: Large Language Models (LLMs) can assist in a wide range of healthcare-related activities. Current approaches to evaluating LLMs make it difficult to identify the most impactful LLM application areas. Objective: To summarize the current evaluation of LLMs in healthcare in terms of 5 components: evaluation data type, healthcare task, Natu...
Article
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited. Rather than apply the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework. To create a classifier, we first prompt the model to answer mul...
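A toy sketch (in Python) of the general strategy the abstract above describes: prompted model answers are treated as weak labeling votes and combined, rather than used directly as predictions. The ask_llm function, prompts, and label mapping here are hypothetical placeholders, not the paper's implementation.

    # Schematic: turn yes/no answers from a prompted model into weak "votes"
    # and combine them, instead of using a single output as the label.
    from collections import Counter

    def ask_llm(prompt: str) -> str:
        # Hypothetical stand-in for a real LLM call; a trivial keyword rule
        # is used here only so the sketch runs end to end.
        return "yes" if "pain" in prompt.lower() else "no"

    PROMPTS = [
        "Does the following note describe physical discomfort? {text}",
        "Is this patient symptomatic? {text}",
    ]
    ANSWER_TO_LABEL = {"yes": 1, "no": 0}  # 1 = positive class, 0 = negative

    def weak_label(text: str) -> int:
        # Each prompt acts like one labeling function; a majority vote combines them.
        # (A real system would weight sources by estimated accuracy instead.)
        votes = [ANSWER_TO_LABEL[ask_llm(p.format(text=text))] for p in PROMPTS]
        return Counter(votes).most_common(1)[0][0]

    print(weak_label("Patient reports chest pain radiating to left arm."))
    print(weak_label("Follow-up scheduled in two weeks."))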
Preprint
Full-text available
Background: The integration of large language models (LLMs) in healthcare offers immense opportunity to streamline healthcare tasks, but also carries risks such as response accuracy and bias perpetuation. To address this, we conducted a red-teaming exercise to assess LLMs in healthcare and developed a dataset of clinically relevant scenarios for fu...
Article
Background With the capability to render prediagnoses, consumer wearables have the potential to affect subsequent diagnoses and the level of care in the health care delivery setting. Despite this, postmarket surveillance of consumer wearables has been hindered by the lack of codified terms in electronic health records (EHRs) to capture wearable use...
Article
Full-text available
The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for e...
Article
Full-text available
Background Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. The primary objective was to describe lab- and diagnosis-...
Conference Paper
Full-text available
Synthesizing information from various data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified l...
Preprint
Full-text available
Objective We sought to develop a weak supervision-based approach to demonstrate feasibility of post-market surveillance of wearable devices that render AF pre-diagnosis. Materials and Methods Two approaches were evaluated to reduce clinical note labeling overhead for creating a training set for a classifier: one using programmatic codes, and the ot...
Article
Introduction: With the advent of consumer-facing devices that can render atrial fibrillation (AF) pre-diagnosis, medical wearables now have the potential to affect diagnosis rates and medical care. Post-market surveillance is necessary to understand the impact of wearables on patient outcomes and health care utilization, but is hindered by the lack...
Article
Objective: Development of electronic health records (EHR)-based machine learning models for pediatric inpatients is challenged by limited training data. Self-supervised learning using adult data may be a promising approach to creating robust pediatric prediction models. The primary objective was to determine whether a self-supervised model trained...
Article
Full-text available
Objective To describe the infrastructure, tools, and services developed at Stanford Medicine to maintain its data science ecosystem and research patient data repository for clinical and translational research. Materials and Methods The data science ecosystem, dubbed the Stanford Data Science Resources (SDSR), includes infrastructure and tools to c...
Article
Full-text available
The success of foundation models such as ChatGPT and AlphaFold has spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models’ capabilities. In this narrative review, we examine 84...
Article
Full-text available
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technolog...
Preprint
Full-text available
The successes of foundation models such as ChatGPT and AlphaFold have spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models' capabilities. We review over 80 foundation models t...
Preprint
Full-text available
Importance Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. Objective Primary objective was to describe lab- and dia...
Preprint
Full-text available
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technolog...
Article
Full-text available
Temporal distribution shift negatively impacts the performance of clinical prediction models over time. Pretraining foundation models using self-supervised learning on electronic health records (EHR) may be effective in acquiring informative global patterns that can improve the robustness of task-specific models. The objective was to evaluate the u...
Article
Background Temporal dataset shift can cause degradation in model performance as discrepancies between training and deployment data grow over time. The primary objective was to determine whether parsimonious models produced by specific feature selection methods are more robust to temporal dataset shift as measured by out-of-distribution (OOD) perfor...
Article
Full-text available
Objectives: To evaluate whether one summary metric of calculator performance sufficiently conveys equity across different demographic subgroups, as well as to evaluate how calculator predictive performance affects downstream health outcomes. Study design: We evaluate 3 commonly used clinical calculators - Model for End-Stage Liver Disease (MELD),...
Preprint
Full-text available
Time-to-event models (also known as survival models) are used in medicine and other fields for estimating the probability distribution of the time until a particular event occurs. While providing many advantages over traditional classification models, such as naturally handling censoring, time-to-event models require more parameters and are challen...
Article
Full-text available
Although individual psychotherapy is generally effective for a range of mental health conditions, little is known about the moment-to-moment language use of effective therapists. Increased access to computational power, coupled with a rise in computer-mediated communication (telehealth), makes feasible the large-scale analyses of language use durin...
Article
Full-text available
Background Given the costs of machine learning implementation, a systematic approach to prioritizing which models to implement into clinical practice may be valuable. Objective The primary objective was to determine the health care attributes respondents at 2 pediatric institutions rate as important when prioritizing machine learning model impleme...
Preprint
Full-text available
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technolog...
Article
Full-text available
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technolog...
Preprint
Full-text available
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited. Rather than apply the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework. To create a classifier, we first prompt the model to answer mul...
Preprint
Full-text available
Although individual psychotherapy is generally effective for a range of mental health conditions, little is known about the moment-to-moment language use of effective therapists. Increased access to computational power, coupled with a rise in computer-mediated communication (telehealth), makes feasible the large-scale analyses of language use durin...
Preprint
Full-text available
Background Temporal distribution shift negatively impacts the performance of clinical prediction models over time. Pretraining foundation models using self-supervised learning on electronic health records (EHR) may be effective in acquiring informative global patterns that can improve the robustness of task-specific models. Objective To evaluate t...
Article
Full-text available
Temporal dataset shift associated with changes in healthcare over time is a barrier to deploying machine learning-based clinical decision support systems. Algorithms that learn robust models by estimating invariant properties across time periods for domain generalization (DG) and unsupervised domain adaptation (UDA) might be suitable to proactively...
Preprint
Full-text available
PromptSource is a system for creating, sharing, and using natural language prompts. Prompts are functions that map an example from a dataset to a natural language input and target output. Using prompts to train and query language models is an emerging area in NLP that requires new tools that let users develop and refine these prompts collaborativel...
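A rough sketch of how such prompts are used in practice, assuming the open-source promptsource Python package and its bundled templates; the dataset name, template selection, and example fields below are illustrative, not taken from the paper.

    # Apply a PromptSource template to a raw dataset example
    # (pip install promptsource). Templates map an example to a
    # natural-language (input, target) pair.
    from promptsource.templates import DatasetTemplates

    # Load the community-written templates bundled for the AG News dataset
    # (assumes "ag_news" templates ship with the installed version).
    ag_news_templates = DatasetTemplates("ag_news")
    template_name = ag_news_templates.all_template_names[0]
    template = ag_news_templates[template_name]

    # A toy example with AG News-style fields: "text" and an integer "label".
    example = {"text": "Stocks rallied after the central bank held rates steady.", "label": 2}

    result = template.apply(example)  # typically [prompted_input, target_text]
    print(template_name)
    print(result)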
Preprint
Full-text available
Despite the routine use of electronic health record (EHR) data by radiologists to contextualize clinical history and inform image interpretation, the majority of deep learning architectures for medical imaging are unimodal, i.e., they only learn features from pixel-level information. Recent research revealing how race can be recovered from pixel da...
Preprint
Full-text available
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks. It has been hypothesized that this is a consequence of implicit multitask learning in language model training. Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale...
Article
Objective The change in performance of machine learning models over time as a result of temporal dataset shift is a barrier to machine learning-derived models facilitating decision-making in clinical practice. Our aim was to describe technical procedures used to preserve the performance of machine learning models in the presence of temporal dataset...
Preprint
Full-text available
Importance: Temporal dataset shift associated with changes in healthcare over time is a barrier to deploying machine learning-based clinical decision support systems. Algorithms that learn robust models by estimating invariant properties across time periods for domain generalization (DG) and unsupervised domain adaptation (UDA) might be suitable to...
Article
Full-text available
Importance: Implant registries provide valuable information on the performance of implants in a real-world setting, yet they have traditionally been expensive to establish and maintain. Electronic health records (EHRs) are widely used and may include the information needed to generate clinically meaningful reports similar to a formal implant regis...
Article
Full-text available
Widespread adoption of electronic health records (EHRs) has fueled the development of using machine learning to build prediction models for various clinical outcomes. However, this process is often constrained by having a relatively small number of patient records for training the model. We demonstrate that using patient representation schemes insp...
Article
Full-text available
Accurate transcription of audio recordings in psychotherapy would improve therapy effectiveness, clinician training, and safety monitoring. Although automatic speech recognition software is commercially available, its accuracy in mental health settings has not been well described. It is unclear which metrics and thresholds are appropriate for diffe...
Article
Full-text available
There is substantial interest in using presenting symptoms to prioritize testing for COVID-19 and establish symptom-based surveillance. However, little is currently known about the specificity of COVID-19 symptoms. To assess the feasibility of symptom-based screening for COVID-19, we used data from tests for common respiratory viruses and SARS-CoV-...
Article
Full-text available
Background - The aortic valve is an important determinant of cardiovascular physiology and anatomic location of common human diseases. Methods - From a sample of 34,287 white British-ancestry participants, we estimated functional aortic valve area by planimetry from prospectively obtained cardiac MRI sequences of the aortic valve. Aortic valve area...
Preprint
Full-text available
Motivation: Recognizing named entities (NER) and their associated attributes like negation are core tasks in natural language processing. However, manually labeling data for entity tasks is time consuming and expensive, creating barriers to using machine learning in new medical applications. Weakly supervised learning, which automatically builds im...
Article
Full-text available
Motivation: Recognizing named entities (NER) and their associated attributes like negation are core tasks in natural language processing. However, manually labeling data for entity tasks is time consuming and expensive, creating barriers to using machine learning in new medical applications. Weakly supervised learning, which automatically builds i...
Article
Full-text available
Objective: Responding to the COVID-19 pandemic requires accurate forecasting of health system capacity requirements using readily available inputs. We examined whether testing and hospitalization data could help quantify the anticipated burden on the health system given shelter-in-place (SIP) order. Materials and methods: 16,103 SARS-CoV-2 RT-PC...
Article
Full-text available
A third of adults in America use the Internet to diagnose medical concerns, and online symptom checkers are increasingly part of this process. These tools are powered by diagnosis models similar to clinical decision support systems, with the primary difference being the coverage of symptoms and diagnoses. To be useful to patients and physicians, th...
Article
Full-text available
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies...
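A minimal sketch of the labeling-function workflow described above, written against the open-source snorkel package; the toy task, heuristics, and label names are invented for illustration and are not the system's shipped examples.

    # Weak supervision with labeling functions (pip install snorkel pandas).
    # Toy task: label short clinical-style notes as MENTIONS_PAIN (1) vs
    # OTHER (0); ABSTAIN (-1) means a heuristic has no opinion.
    import pandas as pd
    from snorkel.labeling import labeling_function, PandasLFApplier
    from snorkel.labeling.model import LabelModel

    ABSTAIN, OTHER, MENTIONS_PAIN = -1, 0, 1

    @labeling_function()
    def lf_keyword_pain(x):
        # Heuristic: the word "pain" suggests the positive class.
        return MENTIONS_PAIN if "pain" in x.text.lower() else ABSTAIN

    @labeling_function()
    def lf_keyword_denies(x):
        # Heuristic: "denies" often signals a negated finding.
        return OTHER if "denies" in x.text.lower() else ABSTAIN

    df = pd.DataFrame({"text": [
        "Patient reports chest pain radiating to left arm.",
        "Patient denies pain or shortness of breath.",
        "Follow-up scheduled in two weeks.",
    ]})

    # Apply the labeling functions to get a (num_examples x num_LFs) label
    # matrix, then fit the generative LabelModel to denoise and combine votes.
    L_train = PandasLFApplier([lf_keyword_pain, lf_keyword_denies]).apply(df)
    label_model = LabelModel(cardinality=2, verbose=False)
    label_model.fit(L_train, n_epochs=200, seed=123)
    print(label_model.predict_proba(L_train))  # probabilistic training labels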
Preprint
Full-text available
Responding to the COVID-19 pandemic requires accurate forecasting of health system capacity requirements using readily available inputs. We examined whether testing and hospitalization data could help quantify the anticipated burden on the health system given shelter-in-place (SIP) order. We find a marked slowdown in the hospitalization rate within...
Preprint
The aortic valve is an important determinant of cardiovascular physiology and anatomic location of common human diseases. From a sample of 26,142 European-ancestry participants, we estimated functional aortic valve area by planimetry from prospectively obtained cardiac MRI sequences of the aortic valve. A genome-wide association study of aortic val...
Preprint
Full-text available
Widespread adoption of electronic health records (EHRs) has fueled development of clinical outcome models using machine learning. However, patient EHR data are complex, and how to optimally represent them is an open question. This complexity, along with often small training set sizes available to train these clinical outcome models, are two core ch...
Preprint
Full-text available
A third of adults in America use the Internet to diagnose medical concerns, and online symptom checkers are increasingly part of this process. These tools are powered by diagnosis models similar to clinical decision support systems, with the primary difference being the coverage of symptoms and diagnoses. To be useful to patients and physicians, th...
Preprint
Full-text available
Since manually labeling training data is slow and expensive, recent industrial and scientific research efforts have turned to weaker or noisier forms of supervision sources. However, existing weak supervision approaches fail to model multi-resolution sources for sequential data, like video, that can assign labels to individual elements or collectio...
Article
Full-text available
Post-market medical device surveillance is a challenge facing manufacturers, regulatory agencies, and health care providers. Electronic health records are valuable sources of real-world evidence for assessing device safety and tracking device-related patient outcomes over time. However, distilling this evidence remains challenging, as info...
Article
Full-text available
Biomedical repositories such as the UK Biobank provide increasing access to prospectively collected cardiac imaging, however these data are unlabeled, which creates barriers to their use in supervised machine learning. We develop a weakly supervised deep learning model for classification of aortic valve malformations using up to 4,000 unlabeled car...
Preprint
Full-text available
Post-market medical device surveillance is a challenge facing manufacturers, regulatory agencies, and health care providers. Electronic health records are valuable sources of real world evidence to assess device safety and track device-related patient outcomes over time. However, distilling this evidence remains challenging, as information is fract...
Preprint
This volume represents the accepted submissions from the Machine Learning for Health (ML4H) workshop at the conference on Neural Information Processing Systems (NeurIPS) 2018, held on December 8, 2018 in Montreal, Canada.
Preprint
Full-text available
Recent releases of population-scale biomedical repositories such as the UK Biobank have enabled unprecedented access to prospectively collected medical imaging data. Applying machine learning methods to analyze these data holds great promise in facilitating new insights into the genetic and epidemiological associations between anatomical structures...
Article
Full-text available
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies...
Preprint
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies...
Article
Full-text available
In healthcare applications, temporal variables that encode movement, health status and longitudinal patient evolution are often accompanied by rich structured information such as demographics, diagnostics and medical exam data. However, current methods do not jointly optimize over structured covariates and time series in the feature extraction proc...
Article
Full-text available
We present SwellShark, a framework for building biomedical named entity recognition (NER) systems quickly and without hand-labeled data. Our approach views biomedical resources like lexicons as function primitives for autogenerating weak supervision. We then use a generative model to unify and denoise this supervision and construct large-scale, pro...
Conference Paper
Full-text available
Populating large-scale structured databases from unstructured sources is a critical and challenging task in data analytics. As automated feature engineering methods grow increasingly prevalent, constructing sufficiently large labeled training sets has become the primary hurdle in building machine learning information extraction systems. In light of...
Conference Paper
Full-text available
We submitted two systems to the SemEval-2016 Task 12: Clinical TempEval challenge, participating in Phase 1, where we identified text spans of time and event expressions in clinical notes and Phase 2, where we predicted a relation between an event and its parent document creation time. For temporal entity extraction, we find that a joint inference-...
Preprint
We submitted two systems to the SemEval-2016 Task 12: Clinical TempEval challenge, participating in Phase 1, where we identified text spans of time and event expressions in clinical notes and Phase 2, where we predicted a relation between an event and its parent document creation time. For temporal entity extraction, we find that a joint inference-...
Conference Paper
This paper describes a method of using Craigslist personal ads to better understand the movement behavior of anonymous, casual sex-seeking individuals within the men-who-have-sex-with-men community. Given recent dramatic increases in HIV and sexually transmitted disease within this community, gaining insight into how sexual networks connect neighbo...
Article
Full-text available
To explore how hand hygiene observer scheduling influences the number of events and unique individuals observed. We deployed a mobile sensor network to capture detailed movement data for 6 categories of healthcare workers over a 2-week period.   University of Iowa Hospital and Clinic medical intensive care unit (ICU). We recorded 33,721 time-stampe...
Conference Paper
Full-text available
Background: One of the most common causes of preventable healthcare-associated infections is poor hand-hygiene. To improve compliance, many hospitals track hand-hygiene rates and feed them back to healthcare workers (HCW). Recently, several groups have introduced computer-based, mobile recording systems to increase the accuracy and ease of recordin...
Conference Paper
Background: To understand how infectious diseases spread in a clinical setting, one must understand the movements and interactions of workers and patients. In outbreak settings, contacts between people are retrospectively studied by interviewing healthcare workers and/or examining patient assignment records. We have developed a wireless method for...
Conference Paper
Background: The failure of healthcare workers to perform appropriate hand hygiene is an important preventable cause of healthcare associated infections. Yet, hand-hygiene rates among workers remain unacceptably low. Feeding hand-hygiene rates back to healthcare workers can lead to improvements in effective hand hygiene. Thus measuring hand hygiene...
Article
Full-text available
Nosocomial (i.e., hospital-acquired) infections are a major cause of morbidity and mortality in the United States and throughout the world. Therefore, understanding, mediating, and limiting contagious infections are important problems, even in clinical settings. Contact networks of healthcare workers and patients provide a vehicle for modeling t...
