Aviv Regev’s research while affiliated with Genentech and other places


Publications (808)


Figure 4. PRESAGE's attention mechanism reveals dataset-specific and perturbation-specific utilization of prior knowledge sources
Gene-embedding-based prediction and functional evaluation of perturbation expression responses with PRESAGE
  • Preprint

June 2025 · 4 Reads · Russell Littman · Jacob Levine · Sepideh Maleki · [...] · Jan-Christian Hütter

Understanding the impact of genetic perturbations on cellular behavior is crucial for biological research, but comprehensive experimental mapping remains infeasible. We introduce PRESAGE (Perturbation Response EStimation with Aggregated Gene Embeddings), a simple, modular, and interpretable framework that predicts perturbation-induced expression changes by integrating diverse knowledge sources via gene embeddings. PRESAGE transforms gene embeddings through an attention-based model to predict perturbation expression outcomes. To assess model performance, we introduce a comprehensive evaluation suite with novel functional metrics that move beyond traditional regression tasks, including measures of accuracy in effect size prediction, in identifying perturbations with similar expression profiles (phenocopy), and in prediction of perturbations with the strongest impact on specific gene set scores. PRESAGE outperforms existing methods in both classical regression metrics and our novel functional evaluations. Through ablation studies, we demonstrate that knowledge source selection is more critical for predictive performance than architectural complexity, with cross-system Perturb-seq data providing particularly strong predictive power. We also find that performance saturates quickly with training set size, suggesting that experimental design strategies might benefit from collecting sparse perturbation data across multiple biological systems rather than exhaustive profiling of individual systems. Overall, PRESAGE establishes a robust framework for advancing perturbation response prediction and facilitating the design of targeted biological experiments, significantly improving our ability to predict cellular responses across diverse biological systems.
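The abstract above describes combining gene embeddings from several knowledge sources through an attention-based model. The following is a minimal sketch of that general idea only, not the PRESAGE implementation: it assumes each source's embedding of the perturbed gene has already been projected to a common dimension, and it scores each source by a dot product with a query vector. The function names, the source names, and the query vector are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_embeddings(source_embeddings, query):
    """Attention-style pooling over knowledge sources (illustrative assumption).

    source_embeddings: dict mapping a source name to that source's embedding
        of one perturbed gene; all embeddings share the query's dimension.
    query: vector used to score each source by dot product.
    Returns (weights per source, pooled embedding).
    """
    names = sorted(source_embeddings)
    scores = [
        sum(q * x for q, x in zip(query, source_embeddings[name]))
        for name in names
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * source_embeddings[name][i] for w, name in zip(weights, names))
        for i in range(len(query))
    ]
    return dict(zip(names, weights)), pooled


# Hypothetical two-source example for a single perturbed gene.
embeddings = {"other_source": [1.0, 0.0], "perturb_seq": [0.0, 2.0]}
weights, pooled = aggregate_embeddings(embeddings, query=[0.0, 1.0])
```

In a sketch like this, the per-source weights are the quantity one would inspect to see which knowledge source the model relies on for a given perturbation; a trained model would learn the query and projections rather than take them as fixed inputs.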


Deep cell atlas of the human lung reveals the presence of ionocytes in both human proximal and distal airways and proximal and distal ALI cultures
A Regional sampling for a deep lung cell atlas. Numbered circles represent sampled locations. B Lung cell atlas. Uniform manifold approximation and projection (UMAP) embedding of cell profiles (dots) from the large airways (left) and lung lobe regions (right) colored by cell type annotation. C–F Epithelial lung and ALI cell profiles. UMAP embeddings of epithelial cell profiles from the proximal airway (C), distal lung lobe (D), and ALI cultures generated from large airway basal cells isolated from primary bronchus (E) or from small airway basal cells isolated from microdissected small airway less than 2 mm in diameter (F). G–I Ionocyte abundance in the human proximal and distal airways and human proximal and distal ALI cultures. G Number of BSND+ mature ionocytes per ALI (y axis) in Large ALI and Small ALI cultures (x axis). n = 3 ALIs averaged from 3 separate donors; two-tailed unpaired t-test. Error bars are standard deviation. H Whole mount images of dissected large (left) and small (right) airways stained for BSND (magenta) and acetylated Tubulin (green). Insets: Representative examples of BSND+ ionocytes (magenta). I Number of BSND+ mature ionocytes per mm² (y axis) in microdissected large airways and small airways (x axis). n = 26 for large airways and 33 for small airways across three normal human lungs (Hu66, Hu67, and Hu68). One-way ANOVA (Sidak’s multiple comparisons). Error bars are standard deviation. Elements of 1A were created with BioRender: https://BioRender.com/0ruztco.
Transcriptional and chromatin accessibility profiles reveal a replicative rare cell progenitor and a pre-ionocyte state
A POU2F3+ tuft-like cells are predicted to be progenitors of mature ionocytes. UMAP embedding of scRNA-seq profiles (dots) of rare epithelial cells in our deep lung cell atlas, colored by cell annotation (left) and showing RNA velocity vectors (right) directed from tuft-like cells to ionocytes. B Human large airways and human Large ALI cultures both contain POU2F3+ cells. Antibody staining of a section of the right primary bronchus for POU2F3 (red) and DAPI (blue) and LAE ALI for POU2F3 (green) and DAPI (blue). This staining has been repeated in 3 separate samples. C–E POU2F3+ tuft-like cells include replicating and non-replicating cells. C Mean expression (dot color, relative expression) and percentage of cells (dot size) expressing selected cell identity and cell proliferation markers (columns) in different rare epithelial cell subsets (rows). D UMAP embedding of scRNA-seq profiles (dots) of rare cells, colored by cell cycle classification (left) or cell type annotation (right). E Large ALI cultures co-stained with POU2F3 (yellow, top) and MKI67 (purple, middle). The bottom panel shows cells expressing both markers (arrows). This staining was repeated in 3 samples. F Distinct chromatin state marks POU2F3+ progenitors. UMAP embedding of large airways epithelial cell scATAC-seq profiles (dots) colored by de novo cell type annotation. Zoom of boxed rare cells highlights chromatin accessibility at select gene loci associated with tuft cells, ionocytes, and progenitor cells.
Type 2 pathway cytokines induce mature tuft cell differentiation
A Neural and immune genes are induced in mature tuft cells. Significance (signed-log10 (q-value), x axis) of enrichment of the four functional gene sets (y axis) in the REACTOME database, most enriched in genes up-regulated in mature tuft cells (positive values) or ionocytes (negative values). B IL-13 treatment shifts the Large ALI cell composition. UMAP embedding of scRNA-seq profiles (dots) from control (left; same plot as in Fig. 1E, reproduced here for convenience) and IL-13-treated (right) LAE ALIs, colored by cell subset annotation. C–E Mature tuft cells are induced in IL-13-treated LAE ALIs. Zoom of a portion of the UMAP embedding in IL-13 treated LAE ALIs (from B, right) colored by scores for rare cell marker gene signatures (Supplementary Data 2) (C) or by expression of rare cell marker genes (D). E Number of antibody-stained cells (y axis) for GNAT3 expressing tuft cells (n = 2 ALIs (Hu19, Hu67)) and BSND expressing ionocytes (n = 3 ALIs (Hu19, Hu62, Hu67)) in LAE ALI treated with PBS or IL13 (10 ng/ml) (x axis). (All experimental treatments were done in parallel; Methods). F Mature SAE ALIs are treated for 96 h with PBS (control) or IL13 (20 ng/ml). IL-13 treatment shifts SAE ALI cell composition. UMAP embedding of scRNA-seq profiles (dots) from control (left; as in Fig. 1F) and IL-13-treated (right) SAE ALIs, colored by cell type annotation. G Mature tuft cells are induced in IL-13-treated SAE ALIs. Zoom of a portion of the UMAP embedding in IL-13 treated SAE ALIs (from G, right) colored by expression of rare cell marker genes.
Type 2 pathway cytokines redirect the lineage of bipotent Tuft-Ionocyte Progenitor (TIP) cells
A, B Sorting strategy to enrich rare cells. A Mean expression (dot color, relative expression) and proportion of cells (dot size) expressing genes encoding the cell surface proteins NCAM1 and KIT in human in vivo scRNA-seq data from Fig. 1B. B Expression level (ΔΔCT, qPCR, y axis) of key rare cell marker genes (marked on top left) in sorted cell populations from human ALI cultures (x axis, labeled by sorting marker). n = 3 technical replicates. Error bars are standard deviation. Based on this expression data, dissociated cells were stained for anti-human CD45–BV421 (1:100; BioLegend 368522), anti-human CD31–BV421 (1:100; BioLegend 303124), anti-human CD326(EPCAM)–APC (1:100; BioLegend 324208), CD117(KIT)–FITC (1:100; BioLegend 313231) and anti-human CD56(NCAM)–BV711 (1:100; BD Biosciences 563169). A negative sort was performed for CD45 (immune cells) and CD31 (endothelial cells), with positive selection for CD326 (epithelial cells), CD56 (NCAM – rare cell marker), and CD117 (KIT – rare cell marker). Please see Supplementary Fig. 9 for gating strategy. C, D Experimental strategy. C Left: Model of differentiating ALI. Right: POU2F3 mRNA expression (ΔΔCT, qPCR, y axis) at different time points (x axis) during ALI differentiation. n = 3 technical replicates. Error bars are standard deviation. D Schematic of experimental time course, where starting at ALI D3 (top; the time point at which TIP cells are first present) cultures were treated with IL13 (10 ng/ml) or PBS control for 5 days and then CD45- CD31- EPCAM+ NCAM1+, CD45- CD31- EPCAM+ KIT+ and CD45- CD31- EPCAM+ KIT+ NCAM1+ rare cells were collected, pooled, and profiled using scRNA-seq. E–H TIP cells give rise to ionocytes via defined transition states in control LAE ALI cultures.
UMAP embedding of scRNA-seq profiles (dots) from PBS-treated (control) LAE ALI cultures colored by cell type annotation (E), overlaid RNA velocity vectors (F), cell cycle phase classification (G, left), G1/S (G, middle), and G2/M (G, right) gene signature scores. H Mean expression (dot color) and fraction of cells (dot size) expressing different lineage markers (columns) in each cell subset along the default lineage transition of TIP cells towards ionocytes (rows). I–L TIP cells are diverted towards mature tuft cell fate following IL-13 treatment. UMAP embedding of scRNA-seq profiles (dots) from IL-13-treated LAE ALI cultures colored by cell subset annotation (I), overlaid RNA velocity vectors (J), cell cycle phase classification (K, left), G1/S (K, middle), and G2/M (K, right) gene signature scores. L Mean expression (dot color) and fraction of cells (dot size) expressing different lineage markers (columns, same genes as in H) in each cell subset (rows) along the IL13-induced lineage transition of TIP cells towards mature tuft cells. M Left: Experimental setup schematic of differentiating ALI treated continuously with IL-13 for 25 days. Right: Number of BSND+ cells (ionocytes, y axis, n = 3 ALIs (Hu19, Hu60, Hu67)), error bars are standard deviation, and GNAT3+ cells (tuft cells, y axis, n = 2 ALIs (Hu60, Hu67)) in PBS and IL-13 conditions (x axis). N RNA velocity analysis of the pooled PBS and IL13-treated ALIs (clustering shown in Supplementary Fig. 10C, D) demonstrates that IL13 redirects TIP cell differentiation towards mature tuft cells and away from the default pathway of ionocyte differentiation. O Schematic depicting the observed expression of lineage-specifying TFs in TIP cell descendants that are differentiating towards either ionocyte or mature tuft cell fate. Orange arrows indicate default differentiation (PBS) and blue arrows indicate IL13-induced differentiation. P Proposed model of cytokine-mediated TIP cell lineage switching.
Illustrations 4 C, M, O, P were created with BioRender: https://BioRender.com/dbkmcyx.
IL17A promotes tuft cell differentiation and asthmatic airway epithelium contains mature tuft cells
A–D Tuft cell abundance increases following IL17A treatment in LAE ALI. A, C Overview schematic of IL17A treatment experiments in mature LAE ALI treated with PBS (control) or IL17A (50 ng/ml; from D39 to D44) (A) and differentiating ALI at D3 (the first time point at which TIP cells are present) treated with IL17A (50 ng/ml) or PBS from D3 to D24 (C). B Number of GNAT3+ mature tuft cells (quantified by immunohistochemistry, y axis, n = 2 ALIs (Hu19, Hu67)), when mature LAE ALIs are treated with PBS or IL17A (50 ng/ml; from D39 to D44) (x axis). D Number of GNAT3+ mature tuft cells (quantified by immunohistochemistry, y axis, n = 2 ALIs, (Hu19, Hu67)), in LAE ALIs when treated with PBS or IL17A from D3 to D24. E, F Tuft cells are increased in ALIs and airways from asthmatic patients. E Number of GNAT3+ mature tuft cells (y axis) in LAE ALIs derived from two asthmatic individuals (Hu70 and Hu78) and in patients with no history of lung diseases (Hu66, Hu67) (x axis), n = 1 ALI per donor. F Whole mount staining of dissected airways from a patient with asthma exacerbation. Staining in the top panel with GNAT3 (green) shows a mature tuft cell, surrounded by ciliated cells (Atub, white). Staining with additional mature tuft cell marker ALOX5AP (magenta) reveals the characteristic bipolar morphology (arrows) associated with mature tuft cells (bottom panels). Due to the availability of tissue, immunostaining was performed on only 1 donor. Illustrations 4B and D were created with BioRender: https://BioRender.com/g37wh81.
Single cell profiling of human airway identifies tuft-ionocyte progenitor cells displaying cytokine-dependent differentiation bias in vitro

June 2025 · 36 Reads

Human airways contain specialized rare epithelial cells including CFTR-rich ionocytes that regulate airway surface physiology and chemosensory tuft cells that produce asthma-associated inflammatory mediators. Here, using a lung cell atlas of 311,748 single cell RNA-Seq profiles, we identify 687 ionocytes (0.45%). In contrast to prior reports claiming a lack of ionocytes in the small airways, we demonstrate that ionocytes are present in small and large airways in similar proportions. Surprisingly, we find only 3 mature tuft cells (0.002%), and demonstrate that previously annotated tuft-like cells are instead highly replicative progenitor cells. These tuft-ionocyte progenitor (TIP) cells produce ionocytes as a default lineage. However, Type 2 and Type 17 cytokines divert TIP cell lineage in vitro, resulting in the production of mature tuft cells at the expense of ionocyte differentiation. Our dataset thus provides an updated understanding of airway rare cell composition, and further suggests that clinically relevant cytokines may skew the composition of disease-relevant rare cells.


Biomni: A General-Purpose Biomedical AI Agent

June 2025 · 71 Reads

Biomedical research underpins progress in our understanding of human health and disease, drug discovery, and clinical care. However, with the growth of complex lab experiments, large datasets, many analytical tools, and expansive literature, biomedical research is increasingly constrained by repetitive and fragmented workflows that slow discovery and limit innovation, underscoring the need for a fundamentally new way to scale scientific expertise. Here, we introduce Biomni, a general-purpose biomedical AI agent designed to autonomously execute a wide spectrum of research tasks across diverse biomedical subfields. To systematically map the biomedical action space, Biomni first employs an action discovery agent to create the first unified agentic environment -- mining essential tools, databases, and protocols from tens of thousands of publications across 25 biomedical domains. Built on this foundation, Biomni features a generalist agentic architecture that integrates large language model (LLM) reasoning with retrieval-augmented planning and code-based execution, enabling it to dynamically compose and carry out complex biomedical workflows -- entirely without relying on predefined templates or rigid task flows. Systematic benchmarking demonstrates that Biomni achieves strong generalization across heterogeneous biomedical tasks -- including causal gene prioritization, drug repurposing, rare disease diagnosis, microbiome analysis, and molecular cloning -- without any task-specific prompt tuning. Real-world case studies further showcase Biomni's ability to interpret complex, multi-modal biomedical datasets and autonomously generate experimentally testable protocols. Biomni envisions a future where virtual AI biologists operate alongside and augment human scientists to dramatically enhance research productivity, clinical insight, and healthcare. 
Biomni is ready to use at https://biomni.stanford.edu, and we invite scientists to explore its capabilities, stress-test its limits, and co-create the next era of biomedical discoveries.



PerTurboAgent: A Self-Planning Agent for Boosting Sequential Perturb-seq Experiments

May 2025 · 5 Reads

Understanding how genetic interventions affect a phenotype is key to revealing causal gene regulatory mechanisms and finding novel drug targets. Pooled, high-content perturbation screening methods like Perturb-seq allow us to assess the impact of each of a large number of genetic interventions on a rich cellular profile of RNA or other features, facilitating such discoveries. However, the overall scale of perturbations, especially when considering combinations of genes for perturbation, cannot be tackled exhaustively in the lab. An alternative is to use an iterative design: By leveraging the modularity and sparsity of gene circuits along with prior biological knowledge, we can predict the impact of unseen genetic perturbations on these profiles and group genes with similar effects into co-functional modules, followed by a new round of Perturb-Seq to test these predictions and improve the overall performance of the model. These iterative cycles of experiment and prediction allow prioritizing genes for testing, maximizing the knowledge gleaned from fixed experimental resources, and opening the way to learn general predictive models. Designing these experiments requires a system that can analyze a cellular system, incorporate new and existing knowledge, use statistical tools, predict the effects of unseen perturbations, and prioritize the set of perturbations for the next iteration. These can be time-consuming tasks for scientists and require multiple different skills. Here, we developed PerTurboAgent, an LLM-based agent that excels in predicting candidate gene panels for iterative Perturb-Seq experiments through self-directed data analysis and knowledge retrieval. We evaluated PerTurboAgent based on its ability to identify genes with a phenotypic impact on gene expression upon perturbation in genome-scale perturbation data. PerTurboAgent outperforms existing agent-based and active learning strategies, offering an efficient and understandable approach to designing sequential perturbation experiments.


Joint Embedding vs Reconstruction: Provable Benefits of Latent Space Prediction for Self Supervised Learning

May 2025 · 3 Reads

Reconstruction and joint embedding have emerged as two leading paradigms in Self Supervised Learning (SSL). Reconstruction methods focus on recovering the original sample from a different view in input space. On the other hand, joint embedding methods align the representations of different views in latent space. Both approaches offer compelling advantages, yet practitioners lack clear guidelines for choosing between them. In this work, we unveil the core mechanisms that distinguish each paradigm. By leveraging closed form solutions for both approaches, we precisely characterize how the view generation process, e.g. data augmentation, impacts the learned representations. We then demonstrate that, unlike supervised learning, both SSL paradigms require a minimal alignment between augmentations and irrelevant features to achieve asymptotic optimality with increasing sample size. Our findings indicate that in scenarios where these irrelevant features have a large magnitude, joint embedding methods are preferable because they impose a strictly weaker alignment condition compared to reconstruction based methods. These results not only clarify the trade offs between the two paradigms but also substantiate the empirical success of joint embedding approaches on real world challenging datasets.


SlideCNA methodology schematic and benchmarking on in silico Slide-seq-like data generated from snRNA-seq data of two MBC biopsies. a SlideCNA methodology involves calculating bead distances by a combined spatial- and expression-based pseudo-distance, binning beads based on pseudo-distance, and determining malignant clones based on CNA profiles. SlideCNA heat map (amplification > 1, deletion < 1), spatial plot of bins colored by assigned cluster, and boxplot of number of reads per bin after filtering for beads with > 300 counts across all genes for the in silico non-malignant-separated dataset (b) and non-malignant-mixed dataset (c) with UMI counts downsampled to 10%. Comparison of average SlideCNA CNA scores per chromosome arm of each SlideCNA-defined cluster for in silico data with non-malignant separation (d) and mixing (f) with UMI counts downsampled to 10% to the InferCNV profiles from the original snRNA-seq data. Pairwise Spearman correlation of CNA profiles as in d and f for in silico data with non-malignant separation (e) and mixing (g) with UMI counts downsampled to 10%
SlideCNA identifies spatial CNA patterns in Slide-seq MBC samples. a–f refer to sample HTAPP-895-SMP-7359 and g–l refer to sample HTAPP-944-SMP-7479. a, g Spatial plot of beads colored by mean CNA score across the indicated chromosomes, selected to demonstrate a range of spatial CNA patterns. b, h Spatial plot of beads annotated as non-malignant (blue) or malignant (pink) with non-malignant beads serving as reference for SlideCNA. c, i H&E stains of consecutive sections matching the Slide-seq samples with histopathological annotations. d, j Spatial plot of binned beads colored by SlideCNA-defined cluster designation. e, k Top DEGs for each cluster detected from the SlideCNA profile. DEGs were colored by average log2 cluster expression and sized by the percent of beads expressing that gene in the cluster (negative binomial generalized linear model p-adj < 0.05). f, l SlideCNA heat map of malignant and non-malignant binned beads annotated with cluster assignment. Comparison of average SlideCNA, InferCNV, CopyKAT, and ABSOLUTE CNA scores per chromosome arm for Slide-seq, snRNA-seq, and WES data for HTAPP-895-SMP-7359 (m) and HTAPP-944-SMP-7479 (n). Pairwise Spearman correlation of CNA profiles as in m and n for HTAPP-895-SMP-7359 (o) and HTAPP-944-SMP-7479 (p)
SlideCNA: spatial copy number alteration detection from Slide-seq-like spatial transcriptomics data

May 2025 · 27 Reads

Genome Biology

Solid tumors are spatially heterogeneous in their genetic, molecular, and cellular composition, but recent spatial profiling studies have mostly charted genetic and RNA variation in tumors separately. To leverage the potential of RNA to identify copy number alterations (CNAs), we develop SlideCNA, a computational tool to extract CNA signals from sparse spatial transcriptomics data with near single cellular resolution. SlideCNA uses expression-aware spatial binning to overcome sparsity limitations while maintaining spatial signal to recover CNA patterns. We test SlideCNA on simulated and real Slide-seq data of (metastatic) breast cancer and demonstrate its potential for spatial subclone detection. Supplementary Information The online version contains supplementary material available at 10.1186/s13059-025-03573-y.
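The figure caption for this entry mentions a combined spatial- and expression-based pseudo-distance that is then used to bin beads. The sketch below illustrates that idea under stated assumptions and is not the SlideCNA code: the pseudo-distance is taken to be a weighted sum of two Euclidean distances (the weight `alpha` is hypothetical), and binning is done by a simple greedy nearest-neighbor grouping chosen here only for brevity. All function and field names are illustrative.

```python
import math


def pseudo_distance(bead_a, bead_b, alpha=0.5):
    """Weighted sum of spatial and expression distance (assumed form)."""
    spatial = math.dist(bead_a["xy"], bead_b["xy"])
    expression = math.dist(bead_a["expr"], bead_b["expr"])
    return alpha * spatial + (1.0 - alpha) * expression


def bin_beads(beads, bin_size, alpha=0.5):
    """Greedy binning: each seed bead takes its nearest unassigned neighbors.

    This greedy grouping is a stand-in for illustration, not the published
    binning procedure. Returns a list of bins, each a list of bead indices.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    unassigned = list(range(len(beads)))
    bins = []
    while unassigned:
        seed = unassigned.pop(0)
        # Order the remaining beads by pseudo-distance to the seed.
        unassigned.sort(key=lambda j: pseudo_distance(beads[seed], beads[j], alpha))
        bins.append([seed] + unassigned[: bin_size - 1])
        unassigned = unassigned[bin_size - 1:]
    return bins


# Hypothetical toy data: two spatially and transcriptionally distinct groups.
beads = [
    {"xy": (0.0, 0.0), "expr": (1.0, 0.0)},
    {"xy": (10.0, 10.0), "expr": (0.0, 1.0)},
    {"xy": (0.0, 1.0), "expr": (1.0, 0.0)},
    {"xy": (10.0, 11.0), "expr": (0.0, 1.0)},
]
bins = bin_beads(beads, bin_size=2)
```

Pooling counts within each bin is what would let sparse per-bead expression support a copy number estimate; the weight between the two distance terms controls how much spatial coherence is traded for expression similarity.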





Citations (50)


... Because 9p and 14q losses can co-occur in our cohort (Fisher's test P-value < 0.001), we further interrogated the specificity of this association using single-cell RNA-seq data from 954,023 cells, curated from 8 previously published studies (see "Methods"; refs. 8, 13–19). By applying InferCNV, we detected individual cells with 9p and/or 14q loss, including 5 samples with subclonal loss of 9p and no 14q loss and 13 samples with subclonal 14q loss without 9p loss. ...

Reference:

Tracking Nongenetic Evolution from Primary to Metastatic ccRCC: TRACERx Renal
Tumor and immune reprogramming during immunotherapy in advanced renal cell carcinoma
  • Citing Article
  • June 2025

Cancer Cell

... Correspondingly, the four GBM cell states in Plxnb2-cKO hosts also displayed a pronounced expansion of NPC-like (88% versus 43%) and a reduction in both AC-like (8% versus 41%) and MES-like (1% versus 12%) states compared to control hosts (Fig. 7g). Because identical KR158 cells were transplanted, this drastic shift of GBM cell states illustrates stromal influence [51]. GSEA of the MG transcriptome in Plxnb2 cKO further supported a link of bulk expansion with inflammation (inflammatory response and TNF signaling) and hypoxia (Extended Data Fig. 8c). ...

Interactions between cancer cells and immune cells drive transitions to mesenchymal-like states in glioblastoma
  • Citing Article
  • May 2025

Cancer Cell

... Recently, some evidence for Kras G12D-mediated loss of lineage identity and reprogramming of type II cells into diverse cell states with distinct transcriptional signature was provided by utilizing single-cell RNA sequencing (scRNA-seq) in genetically engineered mouse models (GEMM; refs. 18–20). Each of these studies demonstrates loss of type II cell markers and upregulation of type I cell markers. ...

Epigenomic State Transitions Characterize Tumor Progression in Mouse Lung Adenocarcinoma
  • Citing Article
  • May 2025

Cancer Cell

... Recent advancements in artificial intelligence (AI), particularly through large language models (LLMs) and Agentic AI systems, have significantly accelerated the automation of scientific research [1–20]. AI tools now assist with literature review, experiment design, computational and wet-lab experiments, data analysis, and even manuscript drafting. Platforms such as the AI Scientist [5,20], AI co-scientist [12], Agent Laboratory [17], and AgentRxiv [18] have demonstrated the ability of LLM agents to autonomously work across the whole research pipeline, representing a major advance in end-to-end research automation. ...

SpatialAgent: An Autonomous AI Agent for Spatial Biology

... In the near term, most real-world applications will likely follow a semi-automated model, in which LLM outputs are reviewed by human experts. Artificial intelligence agents powered by LLMs offer a promising avenue for overcoming some of these barriers, helping optimize and streamline data processes [88]. One example of success is TrialGPT, a recently introduced LLM-based matching tool for clinical trials, which reportedly achieves an 87.3% accuracy rate [89]. ...

How AI agents will change cancer research and oncology
  • Citing Article
  • December 2024

Nature Cancer

... To further explore the clinical relevance of these findings and elucidate connections between lipid homeostasis and lymphoid specification in human samples, we analyzed a recently described human single-HSPC transcriptome dataset spanning human development and maturation [20,21] to characterize ELOVL2 expression and its correlation with lymphoid cell markers at single-cell resolution in human HSPCs across gestation, maturation, and aging. These analyses revealed the presence of a subset of CD34 + HSPCs that express ELOVL2 in healthy adult bone marrow. ...

The dynamics of hematopoiesis over the human lifespan

Nature Methods

... With the myriad of possible applications of scRNA-seq technologies in precision medicine [39], scRNA-seq reference atlases should be diverse from their inception to maximize the global benefit to all populations [90]. Limitations of the study: One caveat for our analyses is that human demographics such as age, sex, self-reported ethnicity, and genetic ancestry can be confounded by correlated sociocultural, environmental (e.g., exposure to infectious agents), and lifestyle (e.g., diet) factors, all of which can contribute to phenotypic variation. Furthermore, technical variation across study sites can introduce biases that are challenging to disentangle from biological variation between population groups. ...

The commitment of the human cell atlas to humanity

... Although not virtual cells per se, several large-scale initiatives have sought to advance our understanding of human biology by mapping the relationships among biological components. Examples include the IUPS Human Physiome Project, which is developing a multi-scale framework for the hierarchical modeling of physiological function (Hunter et al., 2002; Hunter & Borg, 2003), the Human Cell Atlas, which is producing comprehensive reference maps for all human cells (Regev et al., 2018; Rood et al., 2025), the Human Proteome Project, which aims to map all expressed proteins in the human body (Hanash & Celis, 2002; Omenn et al., 2024), and the Human Connectome Project, which aims to map all neural connections in the human brain (Van Essen et al., 2013). ...

The Human Cell Atlas from a cell census to a unified foundation model
  • Citing Article
  • November 2024

Nature

... Given these advantages, our DRUG-seq dataset represents a valuable resource for training and benchmarking emerging AI models designed to predict cellular mechanisms and perturbation-response relationships (Bunne et al., 2024; Lotfollahi et al., 2023; Qi et al., 2024; Roohani et al., 2024). These models aim to generalize drug effects across unseen cell types or conditions, which can be critical for applications in MOA inference, drug repurposing, and personalized medicine. ...

How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities
  • Citing Article
  • September 2024

... As a result, constructing and interpreting these models and identifying their failure points become more challenging [22]. Additionally, in settings in which training occurs based on the experimentally observed activity of short sequences (such as in massively parallel reporter assays (MPRAs), discussed subsequently), CNN models can outperform transformer-based models [23,24]. Consequently, recently published and preprint studies that use genome-wide measurements still prefer simpler models such as CNNs [25–28]. ...

A community effort to optimize sequence-based deep learning models of gene regulation

Nature Biotechnology