Figure - available from: Nature Communications
This content is subject to copyright. Terms and conditions apply.
Augmenting multiomic analysis with unimodal datasets enhances coverage of transient states in trajectory inference
a-b UMAP visualizations of the ATAC cell state spaces learned by scPair, with 2,141 10x scMultiome cells used to train scPair (a), or with the 14,605 unimodal scATAC-seq cells (b). Cells are colored based on estimated pseudotime trajectories via Palantir, with labels (R, B1, B2, B3) indicating the trajectory root and branch terminals. Arrows mark the initial fork points.
c UMAP visualizations of unimodal scATAC-seq cell state learned by scPair, each corresponding to the predicted expression pattern of specific marker genes: Fabp7 (starting pluripotent state), Maf, Zic1, and Ebf1 (markers of branch 1, 2 and 3, respectively).
d Line plots show RNA expression predicted by scPair from the unimodal scATAC-seq data for the four markers from (c), as a function of pseudotime. Error bands represent one standard deviation.
e Heatmaps compare chromatin accessibility patterns along inferred pseudotime (x-axis) for each branch in the trajectory, using the 2,141 10x scMultiome cells (top) versus the 14,605 unimodal scATAC-seq cells (bottom). Rows represent features (peaks), and columns represent 0.05 pseudotime intervals. In each heatmap, the order of rows from top to bottom is based on “feature pseudotime” (Methods) in ascending order.
f Same as (d), except comparing measured RNA expression from the 10x scMultiome cells (top) and predicted RNA expression by scPair using the unimodal scATAC-seq cells (bottom).
g Pseudotime-specific enrichments of transcription factor binding motifs along trajectory. Motifs found to be enriched in accessible regions of transient states were categorized as either (1) enriched in the trajectory trunk; (2) enriched in both trunk and projection neuron precursor branches; (3) mainly in branch 1 corresponding to interneuron precursors; and (4) projection neuron precursor branches only (branches 2 and 3). Example motifs were selected for visualization for each of the four categories.
h Heatmap displays motif enrichment along trajectory, with vertical arrows marking the fork and branch terminals indicated in (b). Rows represent the enriched motifs and columns represent pseudotime.
Source data are provided as a Source Data file.
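
The binning described for the line plots and heatmaps (0.05 pseudotime intervals, one-standard-deviation bands, rows sorted by "feature pseudotime") could be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, pseudotime is assumed to be scaled to [0, 1], and because the Methods definition of "feature pseudotime" is not reproduced here, the signal-weighted mean pseudotime used below is only a stand-in assumption.

```python
import numpy as np

def bin_by_pseudotime(values, pseudotime, width=0.05):
    """Summarize per-cell feature values within fixed-width pseudotime bins.

    values: (n_cells, n_features) array; pseudotime: (n_cells,) array in [0, 1].
    Returns (bin_means, bin_sds, bin_centers); bins with no cells are NaN.
    """
    values = np.asarray(values, dtype=float)
    pseudotime = np.asarray(pseudotime, dtype=float)
    n_bins = int(round(1.0 / width))
    # Map each cell to a bin index; pseudotime == 1.0 goes into the last bin.
    idx = np.minimum((pseudotime / width).astype(int), n_bins - 1)
    means = np.full((n_bins, values.shape[1]), np.nan)
    sds = np.full_like(means, np.nan)
    for b in range(n_bins):
        members = values[idx == b]
        if len(members):
            means[b] = members.mean(axis=0)
            sds[b] = members.std(axis=0)  # width of a one-SD error band
    centers = (np.arange(n_bins) + 0.5) * width
    return means, sds, centers

def order_features(bin_means, centers):
    """Sort features by an ASSUMED feature pseudotime.

    Assumption: feature pseudotime = signal-weighted mean of the bin centers.
    The paper's Methods may define it differently.
    """
    filled = np.nan_to_num(bin_means, nan=0.0)
    totals = filled.sum(axis=0)
    totals[totals == 0] = 1.0  # features with no signal get pseudotime 0
    feature_pt = (centers[:, None] * filled).sum(axis=0) / totals
    return np.argsort(feature_pt, kind="stable"), feature_pt
```

With `means, sds, centers = bin_by_pseudotime(X, pt)`, a line plot in the style of panel (d) would draw `means[:, j]` against `centers` with a band of `± sds[:, j]`, and a heatmap in the style of panel (e) would show `means[:, order].T`, where `order` comes from `order_features(means, centers)`.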

Source publication
Article
Multimodal single-cell assays profile multiple sets of features in the same cells and are widely used for identifying and mapping cell states between chromatin and mRNA and linking regulatory elements to target genes. However, the high dimensionality of input features and shallow sequencing depth compared to unimodal assays pose challenges in data...

Citations

... However, scATAC-seq data faces two core challenges. Firstly, high sparsity, as there are few opportunities to capture open sites in the diploid genome, and the number of reads per cell is limited, resulting in an extremely low probability of capturing specific sites [22,23]. Secondly, it has high dimensionality caused by the complexity of chromatin structure and state. ...
Article
Background: Cell type annotation serves as the cornerstone for downstream analysis of single cell data. Nevertheless, scATAC-seq data is characterized by high sparsity and dimensionality, presenting significant challenges to its annotation process.
Results: We introduce a novel method based on language model, named annATAC, which is designed for the automatic annotation of cell types in scATAC-seq data. This method primarily consists of three stages. During the pre-training stage, by training on a vast amount of unlabeled data, the model can learn the interaction relationships between peaks, thus building a preliminary understanding of the data features. Subsequently, in the fine-tuning stage, a small quantity of labeled data is utilized to conduct secondary training on the model, which enables the model to identify cell types accurately. Finally, in the prediction stage, the trained model is applied to annotate scATAC-seq data.
Conclusions: Compared with other automatic annotation methods across multiple datasets, annATAC demonstrates superiority on the annotation performance. Further experiments have validated that annATAC holds great potential in identifying marker peaks and marker motifs. It is expected that annATAC will provide more profound and precise analysis outcomes for scATAC-seq research. As a result, it will effectively promote the progress of relevant biomedical research.
... Among these, MultiVI (21) applies unsupervised variational autoencoders for cross-modality alignment, while scPair (22) implements a supervised learning approach that links chromatin accessibility and gene expression end to end. These frameworks are powerful for learning biologically meaningful low-dimensional representations of the data across modalities. ...
Preprint
Recent development of single-cell technology across multiple omics platforms has provided new ways to obtain holistic views of cells to study disease pathobiology. Alzheimer's disease (AD) is the most common form of dementia worldwide, yet the detailed understanding of its cellular and molecular mechanisms remains limited. In this study, we analyzed paired single-cell transcriptomic (scRNA-seq) and chromatin accessibility (scATAC-seq) data from the Seattle Alzheimer's Disease Brain Cell Atlas (SEA-AD) Consortium to investigate the molecular mechanisms of AD at a cell-subpopulation-specific resolution focusing on glial cells. We benchmarked various multi-omics integration methods using diverse metrics and built an analytic workflow that enabled effective batch correction and cross-modality alignment, creating a unified cell state space. Through integrative analysis of 26 human brain samples, we uncovered AD-associated gene expression and pathway changes in glial subpopulations and highlighted important transcriptomic and epigenomic signatures via functional inference and interpretable machine learning paradigms, discovering the profound involvement of the Solute Carrier proteins (SLC) family genes in multiple glial cell types. We also identified glial cell-specific regulatory programs mediated by key transcription factors such as JUN and FOSL2 in astrocytes, the Zinc Finger (ZNF) family genes in microglia, and the SOX family of transcription factors in oligodendrocytes. Our study provides a comprehensive workflow and a high-resolution view of how glial regulatory programs are disrupted in AD. Our findings offer novel insights into disease-related changes in gene regulation and suggest potential targets for further research and therapy.