Figure - available from: Nature Communications
Projection refines spatial expression patterns observed in enterocytes of the intestinal villus
a UMAP plot of the single cell atlas (circles) and projected LCM samples (squares) across the zones of the intestinal villus. Single cells are colored by their zone assignment from Moor et al. b Heatmap of the spatial expression patterns of the top 3000 highly variable genes, obtained with the spatial inference approach of Moor et al. (left) and after projecting the LCM samples with scProjection (right). Three marker genes (rows) are labeled: Ada, Slc2a2 and Reg1. c Schematic of a single intestinal villus, with the expected dominant zone of expression for Ada, Slc2a2 and Reg1. Below the villus is the measured expression pattern of Ada, Slc2a2 and Reg1 in the LCM data of the five zones. d Line plots comparing the measured and projected expression of the top zonated genes across the intestinal villus.
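Panel d compares zonation profiles across the five zones. Below is a minimal sketch of one way to summarize such a profile. It assumes a samples x genes expression matrix with integer zone labels 0 to 4. The function names and toy data are hypothetical, and this is not the normalization used by Moor et al. or by scProjection.

```python
import numpy as np


def zonation_profiles(expr, zones, n_zones):
    """Mean expression per zone, scaled so that each gene peaks at 1.

    expr:  (n_samples, n_genes) non-negative expression matrix
    zones: (n_samples,) integer zone labels in [0, n_zones)
    Returns an (n_zones, n_genes) array of relative expression.
    """
    expr = np.asarray(expr, dtype=float)
    zones = np.asarray(zones)
    profiles = np.zeros((n_zones, expr.shape[1]))
    for z in range(n_zones):
        in_zone = zones == z
        if in_zone.any():  # empty zones are left at zero
            profiles[z] = expr[in_zone].mean(axis=0)
    peak = profiles.max(axis=0)
    peak[peak == 0] = 1.0  # genes never expressed stay at zero instead of NaN
    return profiles / peak


def peak_zone(profiles):
    """Index of the zone with the highest relative expression for each gene."""
    return profiles.argmax(axis=0)


# Hypothetical toy data: two samples per zone, three genes peaking in zones 0, 2 and 4.
zones = np.repeat(np.arange(5), 2)
expr = np.column_stack([
    [5, 5, 4, 4, 2, 2, 1, 1, 0, 0],
    [0, 0, 2, 2, 6, 6, 2, 2, 0, 0],
    [0, 0, 1, 1, 2, 2, 3, 3, 8, 8],
])
profiles = zonation_profiles(expr, zones, n_zones=5)
print(peak_zone(profiles))  # [0 2 4]
```

Scaling each gene to its own maximum puts genes with very different absolute expression on a common 0 to 1 scale. Heatmaps and line plots of zonated genes are commonly drawn this way.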

Source publication
Article
Multi-modal single cell RNA assays capture RNA content as well as other data modalities, such as spatial cell position or the electrophysiological properties of cells. Compared to dedicated scRNA-seq assays, however, they may unintentionally capture RNA from multiple adjacent cells, exhibit lower RNA sequencing depth, or lack g...
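One way to picture the mixing problem mentioned above is to treat a sample that captured RNA from several adjacent cells as a non-negative combination of cell-type reference profiles, then estimate the weights. The sketch below does this with plain projected-gradient non-negative least squares on hypothetical data. It illustrates the general idea only and is not the scProjection algorithm. The function name and reference matrix are assumptions.

```python
import numpy as np


def estimate_proportions(sample, reference, n_iter=5000):
    """Non-negative weights w such that reference.T @ w approximates sample.

    sample:    (n_genes,) expression of one sample that may contain several cells
    reference: (n_types, n_genes) mean expression profile of each cell type
    Returns the weights rescaled to sum to 1 (cell-type proportions).
    """
    A = np.asarray(reference, dtype=float).T      # genes x cell types
    b = np.asarray(sample, dtype=float)
    w = np.full(A.shape[1], 1.0 / A.shape[1])     # start from uniform proportions
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b)                  # gradient of 0.5 * ||A w - b||^2
        w = np.maximum(w - step * grad, 0.0)      # gradient step, then project onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w


# Hypothetical reference: three cell types, four genes.
reference = np.array([
    [10.0, 0.0, 0.0, 1.0],
    [0.0, 10.0, 0.0, 1.0],
    [0.0, 0.0, 10.0, 1.0],
])
mixed = reference.T @ np.array([0.6, 0.3, 0.1])  # a sample mixing the three types
print(np.round(estimate_proportions(mixed, reference), 3))  # [0.6 0.3 0.1]
```

With a step size of one over the largest squared singular value of the reference, each projected step does not increase the least-squares loss. Real data would also need count normalization and noise modeling, which this sketch omits.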

Citations

... Johansen et al. [2023], Lopez et al. [2019], Biancalani et al. [2021]. The experimental setup employed a leave-one-gene-out strategy, in which the expression of a single gene was masked across all cells and the models were tasked with predicting its expression pattern from the remaining genes. ...
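The leave-one-gene-out protocol described above can be sketched as follows. It assumes a reference dataset in which every gene is observed and a target dataset whose held-out gene is used only for scoring. A closed-form ridge regression stands in for the compared models, which are not linear regressors. The function names, data and Pearson-correlation score are illustrative assumptions.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef


def leave_one_gene_out(ref, target, alpha=1.0):
    """Pearson r between held-out and predicted expression for every gene.

    ref:    (n_ref_cells, n_genes) data in which all genes are observed
    target: (n_target_cells, n_genes) data whose held-out gene is only used for scoring
    """
    n_genes = ref.shape[1]
    scores = np.empty(n_genes)
    for g in range(n_genes):
        others = np.arange(n_genes) != g                      # mask gene g
        coef, intercept = ridge_fit(ref[:, others], ref[:, g], alpha)
        pred = target[:, others] @ coef + intercept           # gene g not used as input
        scores[g] = np.corrcoef(pred, target[:, g])[0, 1]
    return scores


# Hypothetical data: six genes driven by two shared latent factors plus noise.
rng = np.random.default_rng(0)
W = np.array([[1.0, 0.0, 1.0, 1.0, 0.5, 2.0],
              [0.0, 1.0, 1.0, -1.0, 1.5, 0.5]])
ref = rng.normal(size=(300, 2)) @ W + 0.05 * rng.normal(size=(300, 6))
target = rng.normal(size=(200, 2)) @ W + 0.05 * rng.normal(size=(200, 6))
print(np.round(leave_one_gene_out(ref, target), 3))
```

Because genes here share two latent factors, each masked gene is recoverable from the others. In real data, genes without correlated partners would score poorly under any method.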
Preprint
Single-cell RNA sequencing (scRNA-seq) enables high-resolution exploration of cellular diversity and gene regulation, yet analyzing such data remains challenging due to technical and methodological limitations. Existing task-specific deep generative models such as the Variational Auto-Encoder (VAE) and its variants struggle to incorporate external biological knowledge, while transformer-based foundational large Language Models (LLMs or large LaMs) face limitations in computational cost and applicability to tabular gene expression data. Here, we introduce sciLaMA (single-cell interpretable Language Model Adapter), a novel representation learning framework that bridges these gaps by integrating static gene embeddings from multimodal LaMs with scRNA-seq tabular data through a paired-VAE architecture. Our approach generates context-aware representations for both cells and genes and outperforms state-of-the-art methods in key single-cell downstream tasks, including batch effect correction, cell clustering, and cell-state-specific gene marker and module identification, while maintaining computational efficiency. sciLaMA offers a computationally efficient, unified framework for comprehensive single-cell data analysis and biologically interpretable gene module discovery.
... We reasoned that shallow sequencing depth would also effectively reduce the number of samples available to train the encoders of scPair or other cell state inference methods. The encoders, being a form of dimensionality reduction in the case of scPair and other cell state inference methods, intuitively use the covariance structure in both the input features and non-linear transformations of them in order to reduce the dimensionality of each data modality [77-79]. However, robust covariance estimation can require many more samples (cells) than input features, which can be a challenge since data modalities such as ATAC-seq can measure millions of features [3,13,60], and even RNA-seq can have tens of thousands of features [29]. ...
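The sample-size argument above can be made concrete. A sample covariance estimated from n cells has rank at most n - 1, so with fewer cells than features it is singular. The sketch shows this on random data and adds a simple shrinkage toward the diagonal as one generic remedy. Shrinkage is an illustrative technique chosen here; it is not how scPair handles the problem, since its abstract describes an implicit feature selection approach instead. Function names and sizes are hypothetical.

```python
import numpy as np


def sample_covariance(X):
    """Unbiased sample covariance of a (n_cells, n_features) matrix."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (X.shape[0] - 1)


def shrink_to_diagonal(cov, lam):
    """Convex combination of the sample covariance and its diagonal (0 <= lam <= 1)."""
    return (1.0 - lam) * cov + lam * np.diag(np.diag(cov))


rng = np.random.default_rng(0)
n_cells, n_features = 20, 100          # fewer cells than features
X = rng.normal(size=(n_cells, n_features))

cov = sample_covariance(X)
print(np.linalg.matrix_rank(cov))      # 19: at most n_cells - 1, so cov is singular
print(np.linalg.matrix_rank(shrink_to_diagonal(cov, 0.1)))  # 100: invertible after shrinkage
```

Centering removes one degree of freedom, which is why the rank is n_cells - 1 rather than n_cells. Adding a positive diagonal to a positive semidefinite matrix makes it positive definite.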
Article
Multimodal single-cell assays profile multiple sets of features in the same cells and are widely used for identifying and mapping cell states between chromatin and mRNA and linking regulatory elements to target genes. However, the high dimensionality of input features and shallow sequencing depth compared to unimodal assays pose challenges in data analysis. Here we present scPair, a multimodal single-cell data framework that overcomes these challenges by employing an implicit feature selection approach. scPair uses dual encoder-decoder structures trained on paired data to align cell states across modalities and predict features from one modality to another. We demonstrate that scPair outperforms existing methods in accuracy and execution time, and facilitates downstream tasks such as trajectory inference. We further show scPair can augment smaller multimodal datasets with larger unimodal atlases to increase statistical power to identify groups of transcription factors active during different stages of neural differentiation.
... However, correlation coefficients were modest, and correlations between other genes and physiology features were inconsistent (Figure S3B), as expected based on the considerable measurement noise associated with the detection of individual genes in single cells [43,44]. ...
Article
The distinctive physiology of striatal medium spiny neurons (MSNs) underlies their ability to integrate sensory and motor input. In rodents, MSNs have a hyperpolarized resting potential and low input resistance. When activated, they have a delayed onset of spiking and regular spike rate. Here, we show that in the macaque putamen, latency to spike is reduced and spike rate adaptation is increased relative to mouse. We use whole-cell brain slice recordings and recover single-cell gene expression using Patch-seq to distinguish macaque MSN cell types. Species differences in the expression of ion channel genes including the calcium-activated chloride channel, ANO2, and an auxiliary subunit of the A-type potassium channel, DPP10, are correlated with species differences in spike rate adaptation and latency to the first spike, respectively. These surprising divergences in physiology better define the strengths and limitations of mouse models for understanding neuronal and circuit function in the primate basal ganglia.
... We additionally computed the Average Silhouette Width (ASW) from the ground-truth cell-type labels and latent embeddings, without using any additional clustering algorithm. We also compared scMoE to several published spatial deconvolution methods, including the deep-learning-based models Tangram [21] and scProjection [22], as well as RCTD [23], a probabilistic deconvolution method. ...
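For each cell, the silhouette is s = (b - a) / max(a, b). Here a is the mean distance to the other cells with the same label, and b is the smallest mean distance to the cells of any other label. ASW is the mean of s over all cells. The sketch below computes it from labels and an embedding with plain NumPy and Euclidean distance. The distance metric used in the snippet above is not stated, so that part is an assumption. scikit-learn's `silhouette_score` computes the same quantity.

```python
import numpy as np


def average_silhouette_width(emb, labels):
    """Mean silhouette over all cells, using Euclidean distances and given labels.

    Requires at least two distinct labels. Cells in singleton groups get a
    silhouette of 0, the same convention scikit-learn uses.
    """
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels)
    dist = np.sqrt(((emb[:, None, :] - emb[None, :, :]) ** 2).sum(axis=-1))
    groups = np.unique(labels)
    s = np.zeros(len(emb))
    for i in range(len(emb)):
        same = labels == labels[i]
        n_others = same.sum() - 1
        if n_others == 0:
            continue  # singleton: silhouette stays 0
        a = dist[i, same].sum() / n_others          # mean distance within own group (self excluded)
        b = min(dist[i, labels == k].mean()          # mean distance to the nearest other group
                for k in groups if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()


# Hypothetical 2-D embedding with two well-separated groups.
emb = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(f"{average_silhouette_width(emb, [0, 0, 1, 1]):.3f}")  # 0.900
print(f"{average_silhouette_width(emb, [0, 1, 0, 1]):.3f}")  # -0.448
```

The full pairwise distance matrix makes this O(n^2) in memory. It is fine for illustration but not for atlas-scale data.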
... We compared scMoE and scMoE-H (i.e., without hierarchy) against 3 existing spatial deconvolution methods: scProjection [22], RCTD [23], and Tangram [21]. We benchmarked on simulated data based on 3 MERFISH single-cell resolution ST datasets: Mouse Medial Preoptic Area (MPOA) [43], Mouse Brain Aging Spatial Atlas (MBASA) [44], and Mouse Spatial Kidney (MSK) [45]. ...
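Simulated deconvolution benchmarks built on single-cell-resolution data such as MERFISH commonly aggregate neighboring cells into pseudo-spots with known cell-type proportions. The snippet does not say how the simulation was done. The sketch below assumes simple square binning of cell coordinates. The function name, spot size and toy data are hypothetical.

```python
import numpy as np


def simulate_spots(coords, expr, cell_types, n_types, spot_size):
    """Aggregate single cells into square pseudo-spots.

    coords:     (n_cells, 2) x/y positions of cells
    expr:       (n_cells, n_genes) expression counts
    cell_types: (n_cells,) integer labels in [0, n_types)
    spot_size:  side length of each square spot, in coordinate units
    Returns spot grid indices, summed spot expression and true cell-type proportions.
    """
    coords = np.asarray(coords, dtype=float)
    expr = np.asarray(expr, dtype=float)
    cell_types = np.asarray(cell_types)
    grid = np.floor((coords - coords.min(axis=0)) / spot_size).astype(int)
    spot_ids, spot_of_cell = np.unique(grid, axis=0, return_inverse=True)
    spot_of_cell = spot_of_cell.ravel()  # guard against version-dependent inverse shapes
    spot_expr = np.zeros((len(spot_ids), expr.shape[1]))
    np.add.at(spot_expr, spot_of_cell, expr)                # sum counts per spot
    counts = np.zeros((len(spot_ids), n_types))
    np.add.at(counts, (spot_of_cell, cell_types), 1)        # cells of each type per spot
    return spot_ids, spot_expr, counts / counts.sum(axis=1, keepdims=True)


# Hypothetical toy tissue: four cells falling into two 4 x 4 spots.
coords = [[0, 0], [1, 1], [5, 5], [6, 6]]
expr = [[1, 0], [2, 0], [0, 3], [0, 4]]
spot_ids, spot_expr, props = simulate_spots(coords, expr, [0, 1, 1, 1], n_types=2, spot_size=4)
print(spot_expr)  # summed counts per spot: [[3, 0], [0, 7]]
print(props)      # true proportions: [[0.5, 0.5], [0, 1]]
```

The returned proportions serve as ground truth against which deconvolution estimates for each spot can be scored.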
Preprint
Advancements in single-cell transcriptomics methods have resulted in a wealth of single-cell RNA sequencing (scRNA-seq) data. Methods that learn cell representations from atlas-level scRNA-seq data across diverse tissues can shed light on cell functions implicated in diseases such as cancer. However, integrating large-scale and heterogeneous scRNA-seq data is challenging due to the disparity of cell types and batch effects. We present single-cell Mixture of Experts (scMoE), a hierarchical mixture-of-experts single-cell topic model. Our key contributions are the cell-type-specific experts, which explicitly align topics with cell types, and the integration of hierarchical cell-type lineages and domain knowledge. scMoE is both transferable and highly interpretable. We benchmarked scMoE's performance on 9 single-cell RNA-seq datasets for clustering and 3 simulated spatial datasets for spatial deconvolution. We additionally show that our model, using single-cell references, yields meaningful biological results by deconvolving 3 cancer bulk RNA-seq datasets and 2 spatial transcriptomics datasets. scMoE is able to identify cell types of survival importance, find cancer-subtype-specific deconvolution patterns, and capture meaningful spatially distinct cell-type distributions.
... Cell2location [9] builds a Bayesian model to resolve fine-grained cell types in complex tissues and create cellular maps of diverse tissues. scProjection [10] trains variational autoencoders on scRNA-seq data to capture within-cell-type variation in expression. BayesTME [11] models spatial variation at multiple scales in ST data using a single hierarchical probabilistic model. ...
Preprint
In the era of single-cell genomics, deciphering cellular heterogeneity is paramount for understanding complex biological systems. Providing correct cell type annotations is a crucial task for enabling downstream analysis in Spatial Transcriptomics (ST). Untargeted ST technologies offer wide insight into the gene landscape, but suffer from gene dropouts, which prevent complete capture of the genes present in each cell. Common approaches for cell type annotation map high-quality reference scRNA-seq gene expression profiles and cell types onto a lower-quality ST dataset. However, the accuracy and performance of cell type annotation are not objectively evaluated, and existing methods lack the capacity to handle large, sparse datasets produced by high-resolution technologies such as Slide-seq v2 and Stereo-seq. We present CoDi, an innovative tool designed for precise cell type annotation that leverages contrastive learning and advanced distance calculation methods using reference single-cell datasets. CoDi represents a significant advancement, demonstrating superior performance and scalability compared to existing solutions on several evaluation metrics, including the highest retention rate of marker genes. By harnessing the intrinsic structure of the data, CoDi effectively captures subtle features that characterize distinct cell types, resulting in enhanced annotation accuracy that can detect rare cell types such as neurons in the heart. In summary, CoDi is a valuable tool that contributes to our understanding of cellular heterogeneity and offers insights into the specificity of various cell types within diverse tissue structures.
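The reference-mapping idea above can be illustrated with a much simpler baseline than CoDi. Each query spot is assigned the reference cell type whose mean profile is most similar by cosine similarity. This is not CoDi's contrastive-learning method. The sketch assumes both matrices share the same genes in the same order, and the function name and data are hypothetical.

```python
import numpy as np


def annotate_by_centroid(ref_expr, ref_labels, query_expr):
    """Assign each query cell the reference cell type with the most similar centroid.

    Assumes ref_expr and query_expr share the same gene columns in the same order.
    Returns predicted labels and the (n_query, n_types) cosine-similarity matrix.
    """
    ref_expr = np.asarray(ref_expr, dtype=float)
    ref_labels = np.asarray(ref_labels)
    query_expr = np.asarray(query_expr, dtype=float)
    types = np.unique(ref_labels)
    centroids = np.stack([ref_expr[ref_labels == t].mean(axis=0) for t in types])

    def unit_rows(m):
        norms = np.linalg.norm(m, axis=1, keepdims=True)
        return m / np.where(norms == 0, 1.0, norms)   # all-zero rows stay zero

    sim = unit_rows(query_expr) @ unit_rows(centroids).T  # cosine similarity
    return types[sim.argmax(axis=1)], sim


# Hypothetical reference (two types, three genes) and two sparse query spots.
ref = [[5, 0, 1], [6, 0, 0], [0, 4, 1], [0, 5, 0]]
labels = ["A", "A", "B", "B"]
query = [[1, 0, 0], [0, 2, 0.5]]
pred, _ = annotate_by_centroid(ref, labels, query)
print(pred)  # ['A' 'B']
```

Cosine similarity ignores overall magnitude, which makes it somewhat tolerant of the lower sequencing depth of ST data. It does not by itself address gene dropouts or rare cell types.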
Article
Although growing evidence supports a link between metabolic syndrome (MS) and ischemic stroke (IS), the molecular mechanisms and genetic association between them have not been investigated. Here, we combined existing single-cell RNA sequencing (scRNA-seq) data and Mendelian randomization (MR) for stroke to understand the role of dysregulated metabolism in stroke. The shared hub genes were identified with machine learning and WGCNA. A total of six upregulated and five downregulated DEGs were selected for subsequent analyses. Nine genes were finally identified with random forest, Lasso regression, and XGBoost as candidates for a potential diagnostic model. The scRNA-seq data also show abnormal glycolysis levels in most cell clusters in stroke, associated with the expression levels of the hub genes. The genetic relationship between IS and MS was verified with MR analysis. Our study reveals the common molecular profile and genetic association between ischemic stroke and metabolic syndrome.