Emily B. Fox’s research while affiliated with Stanford University and other places


Publications (101)


Figure 6: Illustrative TOC curve.
Learning Explainable Treatment Policies with Clinician-Informed Representations: A Practical Approach
  • Preprint

November 2024

·

35 Reads

Johannes O. Ferstad

·

Emily B. Fox

·

Ramesh Johari

Digital health interventions (DHIs) and remote patient monitoring (RPM) have shown great potential in improving chronic disease management through personalized care. However, barriers like limited efficacy and workload concerns hinder adoption of existing DHIs, while limited sample sizes and lack of interpretability limit the effectiveness and adoption of purely black-box algorithmic DHIs. In this paper, we address these challenges by developing a pipeline for learning explainable treatment policies for RPM-enabled DHIs. We apply our approach in the real-world setting of RPM using a DHI to improve glycemic control of youth with type 1 diabetes. Our main contribution is to reveal the importance of clinical domain knowledge in developing state and action representations for effective, efficient, and interpretable targeting policies. We observe that policies learned from clinician-informed representations are significantly more efficacious and efficient than policies learned from black-box representations. This work emphasizes the importance of collaboration between ML researchers and clinicians for developing effective DHIs in the real world.


How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities

September 2024

·

77 Reads

·

24 Citations

Charlotte Bunne

·

Yusuf Roohani

·

Yanay Rosen

·

[...]

·

Stephen R Quake

The cell is arguably the most fundamental unit of life and is central to understanding biology. Accurate modeling of cells is important for this understanding as well as for determining the root causes of disease. Recent advances in artificial intelligence (AI), combined with the ability to generate large-scale experimental data, present novel opportunities to model cells. Here we propose a vision of leveraging advances in AI to construct virtual cells, high-fidelity simulations of cells and cellular systems under different conditions that are directly learned from biological data across measurements and scales. We discuss desired capabilities of such AI Virtual Cells, including generating universal representations of biological entities across scales, and facilitating interpretable in silico experiments to predict and understand their behavior using Virtual Instruments. We further address the challenges, opportunities and requirements to realize this vision including data needs, evaluation strategies, and community standards and engagement to ensure biological accuracy and broad utility. We envision a future where AI Virtual Cells help identify new drug targets, predict cellular responses to perturbations, as well as scale hypothesis exploration. With open science collaborations across the biomedical ecosystem that includes academia, philanthropy, and the biopharma and AI industries, a comprehensive predictive understanding of cell mechanisms and interactions has come into reach.


How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities

September 2024

·

208 Reads

·

1 Citation

The cell is arguably the smallest unit of life and is central to understanding biology. Accurate modeling of cells is important for this understanding as well as for determining the root causes of disease. Recent advances in artificial intelligence (AI), combined with the ability to generate large-scale experimental data, present novel opportunities to model cells. Here we propose a vision of AI-powered Virtual Cells, where robust representations of cells and cellular systems under different conditions are directly learned from growing biological data across measurements and scales. We discuss desired capabilities of AI Virtual Cells, including generating universal representations of biological entities across scales, and facilitating interpretable in silico experiments to predict and understand their behavior using Virtual Instruments. We further address the challenges, opportunities and requirements to realize this vision including data needs, evaluation strategies, and community standards and engagement to ensure biological accuracy and broad utility. We envision a future where AI Virtual Cells help identify new drug targets, predict cellular responses to perturbations, as well as scale hypothesis exploration. With open science collaborations across the biomedical ecosystem that includes academia, philanthropy, and the biopharma and AI industries, a comprehensive predictive understanding of cell mechanisms and interactions is within reach.


Using a linear dynamic system to measure functional connectivity from M/EEG

Objective. Measures of functional connectivity (FC) can elucidate which cortical regions work together in order to complete a variety of behavioral tasks. This study’s primary objective was to expand a previously published model of measuring FC to include multiple subjects and several regions of interest. While FC has been more extensively investigated in vision and other sensorimotor tasks, it is not as well understood in audition. The secondary objective of this study was to investigate how auditory regions are functionally connected to other cortical regions when attention is directed to distinct auditory stimuli. Approach. This study implements a linear dynamic system (LDS) to measure the structured time-lagged dependence across several cortical regions in order to estimate their FC during a dual-stream auditory attention task. Results. The model’s output shows consistent functionally connected regions across different listening conditions, indicative of an auditory attention network that engages regardless of endogenous switching of attention or different auditory cues being attended. Significance. The LDS in this study uses a multivariate autoregression to infer FC across cortical regions during an auditory attention task. This study shows how a first-order autoregressive function can reliably measure functional connectivity from M/EEG data. Additionally, the study shows how auditory regions engage with the supramodal attention network outlined in the visual attention literature.
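
The first-order multivariate autoregression mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' model or code: the function names (`fit_var1`, `simulate_var1`), the simulated three-region signal, and the plain least-squares fit are all assumptions made for illustration, and the published LDS includes structure (multiple subjects, M/EEG observation model) that this sketch omits.

```python
import numpy as np


def fit_var1(x):
    """Least-squares estimate of A in x[t] = A @ x[t-1] + noise.

    x has shape (n_samples, n_regions), one row per time sample.
    Entry A[i, j] is the lagged influence of region j on region i.
    """
    past, future = x[:-1], x[1:]
    # Solve past @ A.T ~= future in the least-squares sense.
    a_transposed, *_ = np.linalg.lstsq(past, future, rcond=None)
    return a_transposed.T


def simulate_var1(a, n_samples, noise_scale=0.1, seed=0):
    """Simulate a first-order autoregression (hypothetical test signal)."""
    rng = np.random.default_rng(seed)
    n_regions = a.shape[0]
    x = np.zeros((n_samples, n_regions))
    for t in range(1, n_samples):
        x[t] = a @ x[t - 1] + noise_scale * rng.standard_normal(n_regions)
    return x


# Assumed ground truth: region 1 drives region 0; region 2 is independent.
true_a = np.array([[0.5, 0.3, 0.0],
                   [0.0, 0.5, 0.0],
                   [0.0, 0.0, 0.5]])
x = simulate_var1(true_a, n_samples=5000)
est_a = fit_var1(x)
```

In this toy setting, a large off-diagonal entry of `est_a` is read as time-lagged dependence between two regions, which is the sense in which an autoregression coefficient matrix can serve as a connectivity estimate.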


Smart Start — Designing Powerful Clinical Trials Using Pilot Study Data

January 2024

·

21 Reads

·

1 Citation

NEJM Evidence

BACKGROUND: Digital health interventions may be optimized before evaluation in a randomized clinical trial. Although many digital health interventions are deployed in pilot studies, the data collected are rarely used to refine the intervention and the subsequent clinical trials. METHODS: We leverage natural variation in patients eligible for a digital health intervention in a remote patient-monitoring pilot study to design and compare interventions for a subsequent randomized clinical trial. RESULTS: Our approach leverages patient heterogeneity to identify an intervention with twice the estimated effect size of an unoptimized intervention. CONCLUSIONS: Optimizing an intervention and clinical trial based on pilot data may improve efficacy and increase the probability of success. (Funded by the National Institutes of Health and others; ClinicalTrials.gov number, NCT04336969.)


Figure 1: Data Flow Chart of The Statin Therapy and Global Outcomes in Older Persons Pragmatic Clinical Trial (STAGE PCT).
The Evolving Role of Data & Safety Monitoring Boards for Real-World Clinical Trials

August 2023

·

57 Reads

·

2 Citations

Journal of Clinical and Translational Science

Introduction: Clinical trials provide the “gold standard” evidence for advancing the practice of medicine, even as they evolve to integrate real-world data sources. Modern clinical trials are increasingly incorporating real-world data sources – data not intended for research and often collected in free-living contexts. We refer to trials that incorporate real-world data sources as real-world trials. Such trials may have the potential to enhance the generalizability of findings, facilitate pragmatic study designs, and evaluate real-world effectiveness. However, key differences in the design, conduct, and implementation of real-world vs traditional trials have ramifications in data management that can threaten their desired rigor. Methods: Three examples of real-world trials that leverage different types of data sources – wearables, medical devices, and electronic health records – are described. Key insights applicable to all three trials in their relationship to Data and Safety Monitoring Boards (DSMBs) are derived. Results: Insight and recommendations are given on four topic areas: A. Charge of the DSMB; B. Composition of the DSMB; C. Pre-launch Activities; and D. Post-launch Activities. We recommend stronger and additional focus on data integrity. Conclusions: Clinical trials can benefit from incorporating real-world data sources, potentially increasing the generalizability of findings and overall trial scale and efficiency. The data, however, present a level of informatic complexity that relies heavily on a robust data science infrastructure. The nature of monitoring the data and safety must evolve to adapt to new trial scenarios to protect the rigor of clinical trials.


Multimodal decision-support Timely Interventions for Diabetes Excellence (TIDE) care model integrating continuous glucose monitoring (CGM) and physical activity (Garmin Vivosmart 4 or Venu Sq) metrics.
Adding glycemic and physical activity metrics to a multimodal algorithm-enabled decision-support tool for type 1 diabetes care: Keys to implementation and opportunities

January 2023

·

148 Reads

·

6 Citations

Algorithm-enabled patient prioritization and remote patient monitoring (RPM) have been used to improve clinical workflows at Stanford and have been associated with improved glucose time-in-range in newly diagnosed youth with type 1 diabetes (T1D). This novel algorithm-enabled care model currently integrates continuous glucose monitoring (CGM) data to prioritize patients for weekly reviews by the clinical diabetes team. The use of additional data may help clinical teams make more informed decisions around T1D management. Regular exercise and physical activity are essential to increasing cardiovascular fitness, increasing insulin sensitivity, and improving overall well-being of youth and adults with T1D. However, exercise can lead to fluctuations in glycemia during and after the activity. Future iterations of the care model will integrate physical activity metrics (e.g., heart rate and step count) and physical activity flags to help identify patients whose needs are not fully captured by CGM data. Our aim is to help healthcare professionals improve patient care with a better integration of CGM and physical activity data. We hypothesize that incorporating exercise data into the current CGM-based care model will produce specific, clinically relevant information such as identifying whether patients are meeting exercise guidelines. This work provides an overview of the essential steps of integrating exercise data into an RPM program and the most promising opportunities for the use of these data.



Abstract 13358: A Platform for the Personalized Management of Diabetes and Cardiovascular Disease at Population Scale With Data From Multiple Sensors

November 2022

·

16 Reads

·

2 Citations

Circulation

Introduction: The American Heart Association has identified the design of mobile health (mHealth) tools to improve the management of diabetes and cardiovascular disease and to prevent unplanned hospital readmissions as a priority. Methods: We developed HEART, an open source platform for personalized telehealth at population scale based on data from internet-connected health sensors and data extracted from the electronic medical record (EMR). Health sensors include scales, blood pressure monitors, activity monitors, continuous glucose monitors, and insulin pumps. EMR data include gender, age, race, reason for admission, all surgical procedures performed, and all medications prescribed upon discharge. We collected admission and readmission data for all cardiology patients at an academic pediatric medical center over the last 7 years. Results: HEART has five independent modules (Figure): 1) a data processing module that pulls data in from a variety of devices and EHR feeds, 2) an algorithm module that identifies reasons why a provider may want to contact a patient and ranks patients for contact, 3) a visual interface module that summarizes population data and provides additional details for patients selected by the provider, 4) an intervention module that facilitates messages being sent to providers and documented in the EMR, and 5) a user tracking module that monitors providers' use. HEART ranks the entire population cared for by a clinic and allows providers to identify easily those patients who may benefit most from a secure message, telemedicine contact, or in-person visit. There were 5,011 admissions to cardiology among 2,916 patients, with 1,000 urgent readmissions for 512 patients and 35,300 outpatient clinic visits. Conclusions: HEART is a population-level framework to facilitate personalized mHealth with data from a variety of sensors and EMR data feeds. It may facilitate outpatient monitoring as part of an effort to prevent unplanned hospital readmission.


1009-P: The Association between Patient Characteristics and the Efficacy of Remote Patient Monitoring and Messaging

June 2022

·

13 Reads

Diabetes

In the Pilot 4T study (n=135), remote patient monitoring (RPM) was associated with improved time in range (TIR) and lower A1c. Measuring differences in how patients respond to messages from the care team is essential for the design of effective, personalized care models. We analyzed electronic health record (EHR) and continuous glucose monitor (CGM) data to estimate the week-over-week impact of RPM messages on patients’ TIR. We applied statistical clustering methods to divide patients who received messages into two groups based on their changes in TIR and compared the characteristics of these groups. Receiving a message was associated with a greater mean week-to-week improvement in TIR after a low-TIR week [4.9 percentage points (pp) after a message vs. 2.5 pp without a message; p < 0.001]. A patient’s TIR improvement after receiving a message is greater on average if the same patient’s TIR improved following previous messages. We identified two groups of patients with significantly different responses to messages. The group with greater mean TIR improvement after messages contains a larger proportion of non-white, non-English speaking, and publicly insured patients. We found that messages were associated with different magnitudes of TIR improvement across two clusters of patients. Identifying patients who benefit more from RPM could facilitate the personalization of management strategies. Disclosure: J. Ferstad: None. P. Prahalad: None. D.M. Maahs: Advisory Panel; Abbott Diabetes, Eli Lilly and Company, Medtronic, Novo Nordisk, Sanofi, Consultant; Aditx Therapeutics, Inc., Biospex. E. Fox: None. R. Johari: None. D. Scheinker: None. Funding: Helmsley Charitable Trust, ISPAD-JDRF Fellowship, NIH R18DK122422, Stanford Diabetes Research Center, LPCH Auxiliaries, Stanford Maternal and Child Health Research Institute. The Stanford REDCap platform (http://redcap.stanford.edu) is developed and operated by Stanford Medicine Research IT team.
The REDCap platform services at Stanford are subsidized by a) Stanford School of Medicine Research Office, and b) the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through grant UL1 TR001085.


Citations (74)


... However, accurate inference of gene regulatory networks is challenging. The possible space for genetic interactions is vast [Bunne et al., 2024], the networks to be inferred are highly context-dependent, different cell types and tissue types exhibit different regulatory networks and exhibit significant variations across donors [Chen and Dahl, 2024]. Moreover, the data required to study gene regulatory networks for a specific disease is usually limited and highly specialized, often plagued by experimental artifacts [Hicks et al., 2018]. ...

Reference:

TEDDY: A Family Of Foundation Models For Understanding Single Cell Biology
How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities
  • Citing Article
  • September 2024

... Since the number of patients meeting the criteria for review may vary from review period to review period, our proposed randomization method would include additional patients only in those periods where there is sufficient capacity for them. Numerous statistical design and analysis methods are now available to leverage the partial randomization functionality enabled by platforms such as Tidepool-TIDE [34,35,44]. The combination of the partial randomization functionality of Tidepool-TIDE together with appropriate statistical design and analysis has the potential to enable a virtuous feedback cycle for continuous improvement. ...

Smart Start — Designing Powerful Clinical Trials Using Pilot Study Data
  • Citing Article
  • January 2024

NEJM Evidence

... • Transformation techniques to enable sampling on constrained parameter spaces (Brosse et al., 2017; Bubeck et al., 2018; Hsieh et al., 2018). • Exploiting short-range dependencies and other model structures to allow subsampling for time series and network data (Li et al., 2016; Ma et al., 2017; Aicher et al., 2023). • Leveraging alternative stochastic processes with desired invariant distributions as samplers for models with complex data and parameter structures (Baker et al., 2018). ...

Stochastic Gradient MCMC for Nonlinear State Space Models

Bayesian Analysis

... For example, when considering patients with severe cardiovascular disease, deterioration in cardiac function would be indicated by a combination of variables. While increased resting heart rate and decreased physical activity might not indicate severe risk separately, their combination with weight gain from water retention might suggest critical health [3], [10]. This case can have multiple structures, but we consider the following two sets, which correspond to the $\ell_1$ and $\ell_\infty$ norm, respectively: $H_C = \{h \mid \sum_{1 \le i \le n} h^{(i)} \le c\}$ and $H_C = \{h \mid \max_{1 \le i \le n} \{h^{(i)}\} \le c\}$. ...

Abstract 13358: A Platform for the Personalized Management of Diabetes and Cardiovascular Disease at Population Scale With Data From Multiple Sensors
  • Citing Article
  • November 2022

Circulation

... Transparency: Hybrid modeling promotes transparency by maintaining a clear distinction between the physically understood parts of the model and the data-driven components [158,159]. This separation allows for better interpretability of the model's behavior, making it easier to diagnose issues, understand the contributions of different parts of the model, and communicate the model's workings to stakeholders [160]. ...

Breiman's Two Cultures: You Don't Have to Choose Sides
  • Citing Article
  • January 2021

Observational Studies

... 12 The growing use of CGM is driving the demand for algorithms to manage data and inform care delivery, increasing the complexity of clinic operations. 13 To our knowledge, no clinician-facing quantitative framework is available to track how algorithm-directed care impacts clinical workload, patient glucose management, and timeliness of care. Such quantitative frameworks may be helpful for clinics that: remotely access patient data; provide RPM-based care; and employ algorithms to direct care delivery by, for example, identifying or prioritizing patients requiring care. ...

Adding glycemic and physical activity metrics to a multimodal algorithm-enabled decision-support tool for type 1 diabetes care: Keys to implementation and opportunities

... Note that we did not correct for these sources of noise when assessing their detrimental effects on epidemic controllability. While several studies have focussed on estimating and compensating for under-reporting [11] and reporting delays [40,41,42], these approaches often require additional knowledge about the reporting process or orthogonal data sources [43]. It is often the case that these are not available or only become available later in epidemics so we preferred to characterise performance under the more practical scenario that little else is known about the epidemic than its time series of cases. ...

Statistical Deconvolution for Inference of Infection Time Series

Epidemiology

... The cross-lagged paths capture the causal relations (i.e. Granger causality, 3 (Shojaie & Fox, 2022)) between the variables, indicating the extent to which a change in attitude (e.g. a change in attitude toward foreigners) at T1 influences a Schwartz value (e.g. universalism) at T2, and vice versa. ...

Granger Causality: A Review and Recent Advances
  • Citing Article
  • March 2022

Annual Review of Statistics and Its Application

... The strong agreement between model predictions and age-stratified surveillance and serological data for adults and seniors supports the use of mobility data as an effective approach for modeling workplace-related contacts during a pandemic. Previous studies observed a strong correlation between mobility data and COVID-19 spread in the early stages of the outbreak [43][44][45], which weakened over time as behavior changes and preventive measures (e.g., masking) became widespread [46][47][48][49]. This diminishing correlation exposed the limitations of a simplistic use of mobility data in . ...

It’s complicated: characterizing the time-varying relationship between cell phone mobility and COVID-19 spread in the US

npj Digital Medicine

... In addition to developing novel algorithms and architectures for training heterogeneous ML, new evaluation procedures need to be introduced to define the success criteria of heterogeneous ML. Commonly used metrics defined on the overall population (e.g., prediction accuracy, area under the curve, R²) need to be placed into the context of heterogeneous ML to quantify variation in predictive power across individuals (i.e., stratified performance evaluations) (131,132). Therefore, more informative might be metrics such as the correlation between an individual's age and prediction error or an analysis of variance test of area under the curve for different racial/ethnic groups. These metrics should be subject to convergent validity based on external dataset validation, longitudinal assessment, and different types of data sources (e.g., structural or functional neuroimaging, inflammatory biomarkers). ...

Model-based metrics: Sample-efficient estimates of predictive model subpopulation performance
  • Citing Preprint
  • April 2021