Hongru Hu’s research while affiliated with University of California System and other places


Publications (4)


sciLaMA: A Single-Cell Representation Learning Framework to Leverage Prior Knowledge from Large Language Models
Preprint · February 2025 · 11 Reads

Hongru Hu · Shuwen Zhang · Yongin Choi · [...] · Gerald Quon

Single-cell RNA sequencing (scRNA-seq) enables high-resolution exploration of cellular diversity and gene regulation, yet analyzing such data remains challenging due to technical and methodological limitations. Existing task-specific deep generative models like Variational Auto-Encoder (VAE) and its variants struggle to incorporate external biological knowledge, while transformer-based foundational large Language Models (LLMs or large LaMs) face limitations in computational cost and applicability to tabular gene expression data. Here, we introduce sciLaMA (single-cell interpretable Language Model Adapter), a novel representation learning framework that bridges these gaps by integrating static gene embeddings from multimodal LaMs with scRNA-seq tabular data through a paired-VAE architecture. Our approach generates context-aware representations for both cells and genes and outperforms state-of-the-art methods in key single-cell downstream tasks, including batch effect correction, cell clustering, and cell-state-specific gene marker and module identification, while maintaining computational efficiency. sciLaMA offers a computationally efficient, unified framework for comprehensive single-cell data analysis and biologically interpretable gene module discovery.


Overview of the scPair framework for single cell multimodal analysis
a scPair uses dual feedforward neural networks to predict each modality from the other. The last hidden layer of each network encodes a modality-specific cell state space, and the bidirectional networks learn mappings between the modality-specific state spaces. (Cartoons of the single cell and assays were created with BioRender.com: Created in BioRender. Hu, H. (2024) BioRender.com/r97r180). b We use UMAP to visualize modality-specific cell state spaces learned by scPair. In this figure, the data is from the sci-CAR multimodal cell line dataset⁵⁵, where cells are colored by the cell type labels from the original study. Lines connect the modality-specific states of the same cell. c A visualization of the bidirectional map trained by scPair. Given a multimodal single cell sample, scPair is in part evaluated based on how well it can predict the ground truth (measured) ATAC cell state (bottom), given only the RNA profile to predict the ATAC state of a cell (top). Lines connect each cell’s predicted ATAC cell state to its ground truth ATAC cell state; vertical lines indicate high prediction accuracy. d Same as (c), but visualizing the ground truth (measured) RNA cell state (bottom) and the predicted RNA state from ATAC (top). Source data are provided as a Source Data file.
ScPair robustly aligns single cell multiomic data modalities
a Benchmark of RNA→ATAC mapping performance of scPair and other single cell multiomic methods. All methods were provided with the same training and held-out data sets for evaluation. Box plots compare the mapping performance as measured by the Fraction Of Samples Closer Than the True Match metric (1-FOSCTTM), where larger values indicate better performance. In the box plots, the minima, maxima, centerline, bounds of box, and whiskers represent the minimum value in the data, maximum, median, upper and lower quartiles, and 1.5x interquartile range, respectively. b Same as (a), except measuring ATAC→RNA performance of all methods. c UMAP visualizations of the ATAC (ground truth) and RNA→ATAC (predicted) cell state spaces learned by scPair on single cell multiomic datasets. Each point represents a single cell, and lines connect each cell’s measured ATAC and predicted ATAC (via mapping RNA→ATAC) cell states. Colors correspond to cell type labels from the original studies³²,³³,⁵⁴⁻⁵⁶ (datasets from left to right: 10X Genomics scMultiome human PBMCs, 10X Genomics scMultiome mouse brain, SHARE-seq mouse skin, and multi-species SNARE-seq cortex datasets). d Same as (c), but visualizing the RNA (ground truth) and ATAC→RNA (predicted) cell states learned by scPair. Source data are provided as a Source Data file.
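The caption above reports mapping quality as 1-FOSCTTM. The following is a minimal sketch of how such a score could be computed, not the authors' implementation. It assumes two embeddings with matched rows (row i of each describes the same cell), Euclidean distance, normalization by n - 1, and averaging over both directions; the function name `foscttm` and all of these choices are assumptions made for illustration.

```python
import numpy as np

def foscttm(x, y):
    """Fraction Of Samples Closer Than the True Match (lower is better).

    x, y: arrays of shape (n_cells, dim); row i of x and row i of y are
    assumed to be the same cell embedded from two modalities.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    # Pairwise squared Euclidean distances; d[i, j] compares x[i] with y[j].
    d = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    true_match = np.diag(d)
    # For each x[i]: fraction of other y samples strictly closer than y[i].
    frac_x = (d < true_match[:, None]).sum(axis=1) / (n - 1)
    # For each y[j]: fraction of other x samples strictly closer than x[j].
    frac_y = (d < true_match[None, :]).sum(axis=0) / (n - 1)
    return float((frac_x.mean() + frac_y.mean()) / 2)

# Toy example: identical embeddings are perfectly aligned.
cells = np.array([[0.0], [1.0], [2.0]])
aligned_score = 1.0 - foscttm(cells, cells)          # value plotted as 1-FOSCTTM
shuffled_score = 1.0 - foscttm(cells, cells[::-1])   # mismatched pairing scores lower
```

The full distance matrix takes O(n²) memory, so for large held-out sets a chunked computation would be needed.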
Prediction of individual data features from the other data modality
a The ranking of RNA expression prediction accuracy, measured as Pearson correlation, across held-out data from seven datasets. Yellow stars indicate the best performing methods. b Same as (a), except the ranking of ATAC opening prediction accuracy, measured as area under the Receiver Operating Characteristic curve (auROC). c Held-out ground truth RNA expression from each cell type from the SNARE-seq multimodal adult mouse cortex dataset. Rows are differentially expressed genes and columns are cells clustered by type. d Predicted RNA expression based on the held-out ATAC profiles from the SNARE-seq multimodal adult mouse cortex dataset. Rows are differentially expressed genes and columns are cells clustered by type, in the same order as (c). e UMAP of scPair’s predicted ATAC cell state space (based solely on the RNA measurement of held-out samples), where cells are colored by cell types that have been defined in the SNARE-seq multimodal adult mouse cortex dataset. f Aggregated held-out ground truth accessibility tracks for the example marker peaks, which are identified as those that differ between cortical layer 2-3 (E2Rasgrf2, E3Rorb) and layer 5-6 (E5Sulf1, E6Tie4) excitatory neurons, within each corresponding cell type shown in (e). g UMAPs showing the predicted accessibility of peaks in (f), based on held-out RNA profiles. Color indicates opening probability. Source data are provided as a Source Data file.
Inference of developmental trajectories in ATAC space
a UMAP visualizations of cell state spaces learned from RNA (top) and ATAC (bottom) by various methods on the neonatal mouse cortex SNARE-seq data. Colors indicate cell types as defined in the original study. Below, a diagram indicating the expected linear developmental trajectory³³. b Swarm plots illustrate the individual pseudotimes assigned to each cell of each cell type, which are inferred using the cell state spaces learned by MultiVI (left) and scPair (right). The order of cell type on the y-axis (from top to bottom) follows the developmental path observed in the original study. * represents significance (p-value < 0.05) using a two-sided t-test, and n.s. represents non-significance. c Heatmaps of developmental state marker expression along pseudotime (x-axis) inferred from the MultiVI (left) and scPair (right) ATAC spaces. Gene order on the y-axis follows the expected order of expression according to maturation time from the original study. Source data are provided as a Source Data file.
Augmenting multiomic analysis with unimodal datasets enhances coverage of transient states in trajectory inference
a-b UMAP visualizations of the ATAC cell state spaces learned by scPair, with 2,141 10x scMultiome cells used to train scPair (a), or with the 14,605 unimodal scATAC-seq cells (b). Cells are colored based on estimated pseudotime trajectories via Palantir, with labels (R, B1, B2, B3) indicating the trajectory root and branch terminals. Arrows mark the initial fork points. c UMAP visualizations of unimodal scATAC-seq cell state learned by scPair, each corresponding to the predicted expression pattern of specific marker genes: Fabp7 (starting pluripotent state), Maf, Zic1, and Ebf1 (markers of branch 1, 2 and 3, respectively). d Line plots show RNA expression predicted by scPair from the unimodal scATAC-seq data for the four markers from (c), as a function of pseudotime. Error bands represent one standard deviation. e Heatmaps compare chromatin accessibility patterns along inferred pseudotime (x-axis) for each branch in the trajectory, using the 2,141 10x scMultiome cells (top) versus the 14,605 unimodal scATAC-seq cells (bottom). Rows represent features (peaks), and columns represent 0.05 pseudotime intervals. In each heatmap, the order of rows from top to bottom is based on “feature pseudotime” (Methods) in ascending order. f Same as (d), except comparing measured RNA expression from the 10x scMultiome cells (top) and predicted RNA expression by scPair using the unimodal scATAC-seq cells (bottom). g Pseudotime-specific enrichments of transcription factor binding motifs along trajectory. Motifs found to be enriched in accessible regions of transient states were categorized as either (1) enriched in the trajectory trunk; (2) enriched in both trunk and projection neuron precursor branches; (3) mainly in branch 1 corresponding to interneuron precursors; and (4) projection neuron precursor branches only (branches 2 and 3). Example motifs were selected for visualization for each of the four categories. 
h Heatmap displays motif enrichment along trajectory, with vertical arrows marking the fork and branch terminals indicated in (b). Rows represent the enriched motifs and columns represent pseudotime. Source data are provided as a Source Data file.


scPair: Boosting single cell multimodal analysis by leveraging implicit feature selection and single cell atlases

November 2024 · 31 Reads · 2 Citations

Multimodal single-cell assays profile multiple sets of features in the same cells and are widely used for identifying and mapping cell states between chromatin and mRNA and linking regulatory elements to target genes. However, the high dimensionality of input features and shallow sequencing depth compared to unimodal assays pose challenges in data analysis. Here we present scPair, a multimodal single-cell data framework that overcomes these challenges by employing an implicit feature selection approach. scPair uses dual encoder-decoder structures trained on paired data to align cell states across modalities and predict features from one modality to another. We demonstrate that scPair outperforms existing methods in accuracy and execution time, and facilitates downstream tasks such as trajectory inference. We further show scPair can augment smaller multimodal datasets with larger unimodal atlases to increase statistical power to identify groups of transcription factors active during different stages of neural differentiation.


Projecting RNA measurements onto single cell atlases to extract cell type-specific expression profiles using scProjection

August 2023 · 84 Reads · 7 Citations

Multi-modal single cell RNA assays capture RNA content as well as other data modalities, such as spatial cell position or the electrophysiological properties of cells. Compared to dedicated scRNA-seq assays however, they may unintentionally capture RNA from multiple adjacent cells, exhibit lower RNA sequencing depth compared to scRNA-seq, or lack genome-wide RNA measurements. We present scProjection, a method for mapping individual multi-modal RNA measurements to deeply sequenced scRNA-seq atlases to extract cell type-specific, single cell gene expression profiles. We demonstrate several use cases of scProjection, including identifying spatial motifs from spatial transcriptome assays, distinguishing RNA contributions from neighboring cells in both spatial and multi-modal single cell assays, and imputing expression measurements of un-measured genes from gene markers. scProjection therefore combines the advantages of both multi-modal and scRNA-seq assays to yield precise multi-modal measurements of single cells.


Cell adhesion molecules play subclass-specific roles in electrophysiological response and Schizophrenia risk

November 2022 · 88 Reads · 1 Citation

Multimodal assays such as Patch-seq that simultaneously profile molecular and cellular phenotypes of cells enable the identification of molecular underpinnings of electrophysiological response patterns in neurons. Here we analyzed Patch-seq measurements of thousands of mouse interneurons to identify subclass-specific genes associated with different electrophysiological features. We found extensive subclass specificity: even for the same ephys feature, largely unique sets of genes are associated with that feature in different subclasses. Well established ephys genes such as Reln demonstrated subclass specificity that was previously not reported. Surprisingly, we found that ion channels explained significantly less variation in ephys response across interneurons compared to other genes; in particular, gene sets enriched in cell adhesion genes were amongst the most associated. We found our gene sets associated with action potential dV/dt measurements explained significant heritability of Schizophrenia risk, suggesting a novel role of single neuron electrophysiology in Schizophrenia risk. Finally, we observed significant ephys function switching of cell adhesion molecules across subclasses; the same adhesion molecule was observed to associate with different functional ephys measurements in distinct subclasses and co-express with different genes, suggesting re-purposing of adhesion molecules in different subclasses. Overall, our results yield novel insight into the specificity of roles that individual genes and adhesion molecules play in both single neuron ephys response and Schizophrenia risk.

Citations (3)


... Among these, MultiVI (21) applies unsupervised variational autoencoders for cross-modality alignment, while scPair (22) implements a supervised learning approach that links chromatin accessibility and gene expression end to end. These frameworks are powerful for learning biologically meaningful low-dimensional representations of the data across modalities. ...

Reference:

Single-cell multiomics reveals disrupted glial gene regulatory programs in Alzheimer's disease via interpretable machine learning
scPair: Boosting single cell multimodal analysis by leveraging implicit feature selection and single cell atlases

... Johansen et al. [2023], Lopez et al. [2019], Biancalani et al. [2021]. The experimental setup employed a leave-one-gene-out strategy, where the expression of a single gene was masked across all cells, and the models were tasked with predicting its expression pattern based on the remaining genes. ...

Projecting RNA measurements onto single cell atlases to extract cell type-specific expression profiles using scProjection
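The leave-one-gene-out evaluation described in the excerpt above can be sketched as follows. This is an illustrative outline under stated assumptions, not the protocol of the citing paper: it assumes a reference expression matrix in which every gene is measured and a query matrix in which one gene at a time is masked, it uses ridge regression as a stand-in for the models being compared, and it scores predictions with Pearson correlation. The names `ridge_fit_predict` and `leave_one_gene_out` are hypothetical.

```python
import numpy as np

def ridge_fit_predict(x_train, y_train, x_test, alpha=1.0):
    """Stand-in model (assumption): ridge regression with an intercept."""
    mu_x = x_train.mean(axis=0)
    mu_y = y_train.mean()
    xc = x_train - mu_x
    gram = xc.T @ xc + alpha * np.eye(xc.shape[1])
    w = np.linalg.solve(gram, xc.T @ (y_train - mu_y))
    return (x_test - mu_x) @ w + mu_y

def leave_one_gene_out(reference, query, alpha=1.0):
    """Mask each gene in `query` in turn and predict it from the other genes.

    reference, query: arrays of shape (cells, genes) with the same gene order.
    Returns one Pearson correlation per gene (predicted vs. measured).
    """
    n_genes = query.shape[1]
    scores = np.empty(n_genes)
    for g in range(n_genes):
        rest = np.delete(np.arange(n_genes), g)  # indices of the unmasked genes
        pred = ridge_fit_predict(
            reference[:, rest], reference[:, g], query[:, rest], alpha
        )
        scores[g] = np.corrcoef(pred, query[:, g])[0, 1]
    return scores

# Synthetic usage: five genes driven by two latent factors plus small noise.
rng = np.random.default_rng(0)
loadings = np.array([[1.0, 0.0, 1.0, 1.0, 2.0],
                     [0.0, 1.0, 1.0, -1.0, 1.0]])
reference = rng.normal(size=(200, 2)) @ loadings + 0.01 * rng.normal(size=(200, 5))
query = rng.normal(size=(50, 2)) @ loadings + 0.01 * rng.normal(size=(50, 5))
per_gene_correlation = leave_one_gene_out(reference, query)
```

Any model exposing the same fit-then-predict shape could be substituted for the ridge stand-in, so that all methods are scored on the same masked genes.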

... Single cell assays have been developed to capture diverse aspects of genome regulation, including gene expression [1,2], chromatin accessibility [3], and methylation profiling [4,5], among others [6,7]. These single modality assays that capture a single data type have been widely deployed on a variety of tissues and species to catalog cell types and states [8–16], identify genomic features that activate at specific steps along cellular trajectories [17–23], and infer regulatory networks cataloging interactions of genes, open chromatin regions or methylation sites [24–26]. A common step of single cell data analysis is cell state inference: the inference of a low dimensional representation of a single cell data modality, that is subsequently used for 2D data visualization [27,28], clustering to identify discrete cell types and states, and trajectory inference tasks [29,30]. ...

Cell adhesion molecules play subclass-specific roles in electrophysiological response and Schizophrenia risk