Nir Yosef’s research while affiliated with Weizmann Institute of Science and other places

Publications (338)


High-resolution spatial mapping of cell state and lineage dynamics in vivo with PEtracer
  • Preprint

June 2025 · 2 Reads

Luke W. Koblan · Pu Zheng · [...] · Jonathan S. Weissman

Charting the spatiotemporal dynamics of cell fate determination in development and disease is a long-standing objective in biology. Here we present the design, development, and extensive validation of PEtracer, a prime editing-based, evolving lineage tracing technology compatible with both single-cell sequencing and multimodal imaging methodologies to jointly profile cell state and lineage in dissociated cells or while preserving cellular context in tissues with high spatial resolution. Using PEtracer coupled with MERFISH spatial transcriptomic profiling in a syngeneic mouse model of tumor metastasis, we reconstruct the growth of individually-seeded tumors in vivo and uncover distinct modules of cell-intrinsic and cell-extrinsic factors that coordinate tumor growth. More generally, PEtracer enables systematic characterization of cell state and lineage relationships in intact tissues over biologically-relevant temporal and spatial scales.


Figure S2: Hotspot analysis of the mouse brain dataset. Local gene-gene correlations grouped by modules, computed with Hotspot applied to scVIVA latent.
scVIVA: a probabilistic framework for representation of cells and their environments in spatial transcriptomics
  • Preprint

June 2025 · 10 Reads

Spatial transcriptomics provides a significant advance over studies of dissociated cells in that it reveals the environment in which cells reside, thus opening the way for a more complete description of their state and function. However, most current methods for embedding and discovery of cell states rely only on the cells' own gene expression profile, thus raising the need for ways to account for the neighboring cells as well. Here, we introduce scVIVA, a deep generative model that leverages both cell-intrinsic and neighboring gene expression profiles to output stochastic embeddings of cell states as well as normalized gene expression profiles. We demonstrate that scVIVA produces informative fine-grained partitions of cells that reflect both their internal state and the surrounding tissue and that its generative model facilitates the testing of hypotheses of differential expression between tissue niches. We leverage these properties of scVIVA to uncover a spatially-restricted tumor-promoting endothelial population in breast cancer and niche-associated T cell states that are shared across multiple cancers. scVIVA is available as open source software within scvi-tools.


(A) Single‐cell multi‐omics provide a more comprehensive view of the cell by profiling different layers of the central dogma of biology. After initial preprocessing of single‐cell RNA‐seq (scRNA‐seq) fastq files, a matrix of UMI counts is generated for each gene in every cell. CITE‐seq enhances this by adding a proteomics layer, resulting in a matrix of antibody‐derived tags (ADT) counts per cell. Spatial transcriptomics provides gene expression data along with spatial coordinates of cells or grids, depending on the technology used. Single‐cell ATAC‐seq profiles the epigenome by reporting the number of Tn5 integration sites per peak. G&T‐seq integrates genomic and transcriptomic data by providing copy number variations across chromosome regions. Perturb‐seq adds a functional dimension by providing UMI counts in addition to the identity of perturbed genes in each cell. (B) The central VAE scheme illustrates how latent representation and decoded expression data can be utilised in single‐cell and spatial transcriptomics. Tasks framed in yellow are primarily derived from the latent space, focusing on data interpretation and dimensionality reduction, while tasks in pink are driven by the decoder, emphasising predictive and integrative capabilities.
Towards the Next Generation of Data‐Driven Therapeutics Using Spatially Resolved Single‐Cell Technologies and Generative AI

Recent advances in multi‐omics and spatially resolved single‐cell technologies have revolutionised our ability to profile millions of cellular states, offering unprecedented opportunities to understand the complex molecular landscapes of human tissues in both health and disease. These developments hold immense potential for precision medicine, particularly in the rational design of novel therapeutics for treating inflammatory and autoimmune diseases. However, the vast, high‐dimensional data generated by these technologies present significant analytical challenges, such as distinguishing technical variation from biological variation or defining relevant questions that leverage the added spatial dimension to improve our understanding of tissue organisation. Generative artificial intelligence (AI), specifically variational autoencoder‐ or transformer‐based latent variable models, provides a powerful and flexible approach to addressing these challenges. These models make inferences about a cell's intrinsic state by effectively identifying complex patterns, reducing data dimensionality and modelling the biological variability in single‐cell datasets. This review explores the current landscape of single‐cell and spatial multi‐omics technologies, the application of generative AI in data analysis and modelling and their transformative impact on our understanding of autoimmune diseases. By combining spatial and single‐cell data with advanced AI methodologies, we highlight novel insights into the pathogenesis of autoimmune disorders and outline future directions for leveraging these technologies to achieve the goal of AI‐powered personalised medicine.


ResolVI - addressing noise and bias in spatial transcriptomics

January 2025 · 3 Reads · 1 Citation

Technologies for estimating RNA expression at high throughput, in intact tissue slices, and with high spatial resolution (spatial transcriptomics; ST) shed new light on how cells communicate and tissues function. A fundamental step common to all ST protocols is quantification, namely segmenting the plane into regions, each approximating a cell, and then collating the molecules inside each region to estimate the cellular expression profile. Despite many advances in this area, a persisting problem is that of wrong assignment of molecules to cells, which limits most current applications to the level of a priori defined cell subsets and complicates the discovery of novel cell states. Here, we develop resolVI, a model that operates downstream of any segmentation algorithm to generate a probabilistic representation, correcting for misassignment of molecules, as well as for batch effects and other nuisance factors. We demonstrate that resolVI improves our ability to distinguish between cell states, to identify subtle expression changes in space, and to perform integrated analysis across datasets. ResolVI is available as open source software within scvi-tools.
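The quantification step described above (collating the molecules that fall inside each segmented region into a per-cell expression profile) can be sketched as follows. This is a minimal illustration and not resolVI itself: the function name `collate_counts`, the integer encoding of cells and genes, and the convention that `-1` marks an unassigned molecule are all assumptions made for the example.

```python
import numpy as np

def collate_counts(cell_ids, gene_ids, n_cells, n_genes):
    """Build a cell-by-gene count matrix from per-molecule assignments.

    cell_ids[i] is the (hypothetical) segmented region that molecule i
    falls into, or -1 if the molecule lies outside every region.
    gene_ids[i] is the integer index of the gene that molecule i encodes.
    """
    cell_ids = np.asarray(cell_ids)
    gene_ids = np.asarray(gene_ids)
    keep = cell_ids >= 0  # drop unassigned molecules
    counts = np.zeros((n_cells, n_genes), dtype=np.int64)
    # np.add.at accumulates correctly when the same (cell, gene) pair repeats.
    np.add.at(counts, (cell_ids[keep], gene_ids[keep]), 1)
    return counts

# Toy example: six molecules, two cells, three genes.
cell_ids = [0, 0, 1, -1, 1, 1]
gene_ids = [2, 2, 0, 1, 0, 2]
counts = collate_counts(cell_ids, gene_ids, n_cells=2, n_genes=3)
print(counts)
```

A molecule assigned to the wrong region ends up in the wrong row of this matrix, which is the kind of misassignment the abstract says resolVI is designed to correct downstream.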


Framework of popV for automatic cell type annotation
PopV takes an unannotated query dataset and an annotated reference dataset as input. Each expert algorithm predicts the label on the query dataset to yield a cell-type annotation. The certainty of the respective label transfer can be quantified by scoring the agreement of those methods. The workflow yields a sample report to provide the user with insights into the annotated labels.
PopV prediction on LCA and TS lung as reference is accurate and interpretable
a, UMAP embedding after scANVI integration of TS reference cells, LCA query cells labeled with the ground-truth label and LCA query cells labeled with predicted label. b, Ontology accuracy (Methods) for the various methods computed on the query cells. c, Ontology accuracy for the prediction scores in popV. d, Highlighted cells with a consensus score of 4 or less (low consensus). e, Zoomed-in view of endothelial cells in the LCA with popV-predicted labels and ground-truth labels displayed. The zoomed-in picture is rotated by 90° to allow readability of all labels. Alveolar capillary type 2 endothelial cell is the Cell Ontology term for capillary aerocytes. The LCA annotated additional cell types between capillary aerocytes and capillary endothelial cells. TS, Tabula Sapiens; LCA, Lung Cell Atlas.
PopV identifies thymocytes as query-specific cell types and yields highly interpretable consensus scores
a, UMAP embedding after scANVI integration of reference cells (TS) and query cells (thymus cells across different age groups) labeled by popV prediction and original annotation. b, PopV prediction score overlaid on the UMAP plot. The prediction score is low for thymocytes and higher for most other cell types. c, The prediction accuracy of the popV prediction highlights the low accuracy in developing thymocytes. d, The prediction accuracy of the popV prediction in adult thymus cells in the query shows high accuracy except for CD8 T cells. e, Left, PopV accuracy and consensus score are well correlated in all thymus cells with high accuracy for predictions with a consensus score of 7 and 8. Right, All methods show a low accuracy on fetal cells. f, Left, PopV accuracy and consensus score are also well correlated when subsetting to cells from adult donors. Right, PopV shows the highest accuracy when subsetting to adult cells; most methods show similarly high accuracy.
Consensus prediction of cell type labels in single-cell data with popV

November 2024 · 28 Reads · 13 Citations

Nature Genetics

Cell-type classification is a crucial step in single-cell sequencing analysis. Various methods have been proposed for transferring a cell-type label from an annotated reference atlas to unannotated query datasets. Existing methods for transferring cell-type labels lack proper uncertainty estimation for the resulting annotations, limiting interpretability and usefulness. To address this, we propose popular Vote (popV), an ensemble of prediction models with an ontology-based voting scheme. PopV achieves accurate cell-type labeling and provides uncertainty scores. In multiple case studies, popV confidently annotates the majority of cells while highlighting cell populations that are challenging to annotate by label transfer. This additional step helps to reduce the load of manual inspection, which is often a necessary component of the annotation process, and enables one to focus on the most problematic parts of the annotation, streamlining the overall annotation process.
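The idea of scoring label certainty by agreement between methods can be illustrated with a plain majority vote. This sketch is an assumption-laden simplification: popV uses an ontology-based voting scheme, which is not reproduced here, and the function name `majority_vote` and the method names `m1` to `m3` are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-method labels into a consensus label and score per cell.

    predictions maps a method name to a list of predicted labels, one per
    cell. The score is the number of methods that agree with the winning
    label (plain majority vote, not an ontology-aware vote).
    """
    per_method = list(predictions.values())
    n_cells = len(per_method[0])
    results = []
    for i in range(n_cells):
        votes = Counter(labels[i] for labels in per_method)
        label, score = votes.most_common(1)[0]
        results.append((label, score))
    return results

# Toy example: three hypothetical methods annotating two cells.
predictions = {
    "m1": ["T cell", "B cell"],
    "m2": ["T cell", "NK cell"],
    "m3": ["T cell", "B cell"],
}
consensus = majority_vote(predictions)
print(consensus)
```

Cells with a low score are the ones a user would flag for manual inspection, in the spirit of the workflow described in the abstract.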


VI-VS: calibrated identification of feature dependencies in single-cell multiomics

November 2024 · 51 Reads

Genome Biology

Unveiling functional relationships between various molecular cell phenotypes from data using machine learning models is a key promise of multiomics. Existing methods either use flexible but hard-to-interpret models or simpler, misspecified models. VI-VS (Variational Inference for Variable Selection) balances flexibility and interpretability to identify relevant feature relationships in multiomic data. It uses deep generative models to identify conditionally dependent features, with false discovery rate control. VI-VS is available as an open-source Python package, providing a robust solution to identify features more likely representing genuine causal relationships.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-024-03419-z.


Spatiotemporal lineage tracing reveals the dynamic spatial architecture of tumor growth and metastasis

October 2024 · 59 Reads · 1 Citation

Tumor progression is driven by dynamic interactions between cancer cells and their surrounding microenvironment. Investigating the spatiotemporal evolution of tumors can provide crucial insights into how intrinsic changes within cancer cells and extrinsic alterations in the microenvironment cooperate to drive different stages of tumor progression. Here, we integrate high-resolution spatial transcriptomics and evolving lineage tracing technologies to elucidate how tumor expansion, plasticity, and metastasis co-evolve with microenvironmental remodeling in a Kras;p53-driven mouse model of lung adenocarcinoma. We find that rapid tumor expansion contributes to a hypoxic, immunosuppressive, and fibrotic microenvironment that is associated with the emergence of pro-metastatic cancer cell states. Furthermore, metastases arise from spatially-confined subclones of primary tumors and remodel the distant metastatic niche into a fibrotic, collagen-rich microenvironment. Together, we present a comprehensive dataset integrating spatial assays and lineage tracing to elucidate how sequential changes in cancer cell state and microenvironmental structures cooperate to promote tumor progression.


Citations (72)


... For example, it is widely studied that macrophages found in tumours, denoted as tumour-associated macrophages (TAMs), can be educated by the cancer cells to promote their resistance to immune attacks and therapies, favouring tumour growth (Mantovani et al., 2022; Noy and Pollard, 2014; Sheban et al., 2025). The activation of the pro-tumoral phenotype in TAMs leads to the secretion of several cytokines, such as CXCL12/13, IL-10, and IL-6/IL-8, which are reported to promote tumour development by protecting the cancer cells (Cassetta and Pollard, 2018). ...

Reference:

Time-series RNA-Seq and data-driven network inference unveil dynamics of cell activation, survival and crosstalk in Chronic Lymphocytic Leukaemia in vitro models
ZEB2 is a master switch controlling the tumor-associated macrophage program
  • Citing Article
  • April 2025

Cancer Cell

... Uses CL as both input and output. (Ergen et al. 2024) confident of the precise identity of the cells being annotated. On the CxG Discover platform, users can find datasets by the cell types they contain via a faceted browsing interface, which supports searching for cell types by name and browsing via a simplified version of CL hierarchy (Figure 3C). ...

Consensus prediction of cell type labels in single-cell data with popV

Nature Genetics

... These populations shared canonical residency signatures, characterized by elevated expression of ITGAE (CD103), ITGA1 (CD49a), SPRY1, CXCR6, CD101, and CCR9, and reduced expression of ITGB1 and SELL, relative to other T cells (Supplementary Fig. 2d). γδ Trm cells also showed higher expression of KLRC2 and ENTPD1, markers previously linked with intestinal γδ T cells 11. Among CD4+ Trms, one subset highly expressed KLRB1 (CD161) and CCR5 (Supplementary Fig. 2d), consistent with a pro-inflammatory Trm population described in CD patients 12. ...

Human γδ T cells in diverse tissues exhibit site-specific maturation dynamics across the life span
  • Citing Article
  • June 2024

Science Immunology

... Large scale 'omics technologies have been used to explore new dimensions of signalling with notable success [6][7][8][9][10][11], and the rate at which new data is generated is increasing. These data are suited to analysis with machine learning, which can handle large feature sets effectively 8,12,13. However, this approach comes with a caveat: the sacrifice of interpretability in favour of predictive accuracy. ...

PERCEPTION predicts patient response and resistance to treatment using single-cell transcriptomics of their tumors

Nature Cancer

... Following training, the model and a corresponding AnnData file containing all cells passing full filters (including plate 14) were minified to enable interactive analysis (Ergen et al. 2024). In its minified format, the model provides gene expression estimates for 62,710 genes across 95,624,334 cells, utilizing 41 GB of storage. ...

Scvi-hub: an actionable repository for model-driven single cell analysis

... Despite progress in the field, this diversity complicates end users' ability to select the most suitable VEP and poses challenges for unbiased assessment, as new predictors often claim superiority over others [8]. Recent efforts have focused on independent benchmarking [9][10][11][12], but the sheer number of methods, their inconsistent naming (e.g., predictors of "variant effect," "variant impact," "functional effect," "deleteriousness," "pathogenicity," or "mutational impact"), and the effort required to access predictions hinder identification and evaluation. Fair assessment also demands clear knowledge of training data, which is often poorly detailed in publications. ...

CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods

Genome Biology

... MPRAs allow us to measure the activity of thousands of regulatory sequences at once. We used our previously collected perturbation MPRA datasets [7,41,42] to validate our predicted enhancers. Perturbation MPRA is a strategy for demonstrating the regulatory effects of TF-binding motifs. ...

Optimizing sequence design strategies for perturbation MPRAs: a computational evaluation framework

Nucleic Acids Research

... Previous studies have provided valuable insights into tissue-specific immune variation by comparing the transcriptional profiles of blood and mucosal immune cells in healthy individuals [2][3][4]. While some studies have analyzed paired blood and tissue samples from IBD patients 5-8, these have been limited in sample size and have not systematically characterised tissue-specific immune variation across cell types. ...

Multimodal profiling reveals tissue-directed signatures of human immune cells altered with age
  • Citing Preprint
  • January 2024

... Close interactions within FOLR2+ macrophage niches appear to be a major factor, as epithelial cells could induce necroptosis in these macrophages during cancer progression via direct contact [67]. Advanced techniques, such as spatial omics, time-resolved single-cell transcriptomics and medical imaging, serve as promising strategies to track cancer evolution and depict elaborate cell communications [77,78]. Despite these advances, distinguishing FOLR2+ TAMs from FOLR2+ macrophages in the normal tissues using several markers remains challenging. ...

Time-resolved single-cell transcriptomics defines immune trajectories in glioblastoma
  • Citing Article
  • December 2023

Cell

... Time-scaled lineage trees, or chronograms, additionally hold information on the timing of lineage segregation events and, further, the underlying population dynamics of the studied cell populations [32]. Towards that goal, generating such time-scaled trees was made possible with the formulation of mechanistic models of lineage recording data, used in a Bayesian or maximum-likelihood inference framework [33][34][35]. In a previous paper [33], we developed a Bayesian inference method (TideTree) for lineage tracing data where mutations in the barcode, or scars, are acquired in a strictly irreversible fashion at independent genomic sites. ...

ConvexML: Scalable and accurate inference of single-cell chronograms from CRISPR/Cas9 lineage tracing data