Hassaan Maan’s research while affiliated with University Health Network and other places

Publications (16)


SARS-CoV-2 Illumina GeNome Assembly Line (SIGNAL), a Snakemake workflow for rapid and bulk analysis of Illumina sequencing of SARS-CoV-2 genomes
  • Article

December 2024 · 15 Reads · NAR Genomics and Bioinformatics

Finlay Maguire · Kendrick M Smith · [...] · Andrew G McArthur

The incorporation of sequencing technologies in frontline healthcare and public health settings was vital in developing virus surveillance programs during the Coronavirus Disease 2019 (COVID-19) pandemic caused by transmission of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, increased data acquisition poses challenges for both rapid and accurate analyses. To overcome these hurdles, we developed the SARS-CoV-2 Illumina GeNome Assembly Line (SIGNAL) for quick bulk analyses of Illumina short-read sequencing data. SIGNAL is a Snakemake workflow that seamlessly manages parallel tasks to process large volumes of sequencing data. A series of outputs is generated, including consensus genomes, variant calls, lineage assessments and identified variants of concern (VOCs). Compared to other existing SARS-CoV-2 sequencing workflows, SIGNAL is one of the fastest-performing analysis tools while maintaining high accuracy. The source code is publicly available (github.com/jaleezyy/covid-19-signal) and is optimized to run on various systems, with software compatibility and resource management all handled within the workflow. Overall, SIGNAL illustrated its capacity for high-volume analyses through several contributions to publicly funded government public health surveillance programs. It can be a valuable tool for continuing SARS-CoV-2 Illumina sequencing efforts and will inform the development of similar strategies for rapid viral sequence assessment.


Overview of the Iniquitate pipeline and analysis results
a, To determine the effects of dataset imbalance in scRNA-seq integration, two control balanced datasets and four complex datasets with imbalance already present were integrated using current state-of-the-art scRNA-seq integration techniques. A total of 2,600 integration experiments involving downsampling across datasets were performed, and the effects of imbalance on integration results as well as downstream analyses (clustering, DGE, cell type classification, query-to-reference prediction and trajectory inference) were quantified. b, Two key data characteristics were found to contribute to altered downstream results in imbalanced settings: aggregate cell type support (cell type imbalance) and minimum cell type center distance (transcriptomic similarity). c, To account for imbalanced scRNA-seq integration scenarios in evaluation and benchmarking, typically used metrics and scores were reformulated to reweigh disproportionate cell types, which includes the bARI, bAMI, Balanced Homogeneity Score, Balanced Completeness and Balanced V-measure.
Perturbation analysis of controlled PBMC dataset and effects on cell-type-specific integration
a,b, The cell type and batch representations of the balanced two batch PBMC dataset. c, The perturbation setup for the balanced PBMC data. In each iteration, one batch and one cell type are randomly selected, and the cell type is randomly either downsampled to 10% of its original number or ablated. Control experiments are also performed where no downsampling occurs. d, KNN classification within the integrated embedding space in control, downsampling and ablation experiments (n = 800 independent integration experiments) and across methods. The F1 scores are indicated for the same cell type that was downsampled. e, Hierarchical clustering of similar cell types in the balanced two-batch PBMC data. f, Cell-type-specific integration results using a KNN classifier after hierarchical clustering across perturbation experiments (n = 800 independent integration experiments) with the same setup as d. The cell types here are based on the label after hierarchical clustering from e. Box plots (d,f) indicate median values across experiments; hinges are the 25th and 75th percentile values; and whiskers indicate the 1.5× interquartile range (IQR) values from the hinges.
Quantification of the effects of dataset imbalance on downstream analyses
a, After integration of the balanced PBMC dataset in different perturbation scenarios (type) and based on the cell type downsampled, the number of unsupervised clusters from the results of each method based on Leiden clustering across experiments. b, The average marker gene ranking change in DGE (average marker gene perturbation score) for cell types downsampled in the balanced PBMC dataset, across methods. c, The average marker gene ranking change in DGE, for the ‘ablation’ experiment type in the balanced PBMC dataset. d,e, The cell-type-specific L1 annotation (coarse-grained) (d) and L2 annotation (fine-grained) (e) accuracy scores across experiments for query-to-reference results for individual batches in the balanced PBMC dataset, based on experiment type (control, downsampling and ablation) and cell type downsampled. f,g, The L1 predictions (f) and L2 predictions (g) by proportion across experiment types and experiments for CD4⁺ T cells and CD8⁺ T cells. h, The Spearman correlation between the estimated pseudotime for cells in the unintegrated data compared to the integrated data for the different methods in the balanced mesenchymal organogenesis dataset. A total of n = 800 integration experiments involving control, downsampling and ablation subsets were done for each analysis. Box plots (a,d,e,h) indicate median values across experiments; hinges are the 25th and 75th percentile values; and whiskers indicate the 1.5× interquartile range (IQR) values from the hinges. All values are overlaid on the box plots in a.
Compartment-wise perturbation experiments for eight batches of PDAC biopsy samples
a, Overview of the experimental setup. To determine the effects of dataset imbalance across epithelial cell compartments, various microenvironment cells were collapsed into the ‘microenvironment’ compartment, normal ductal and acinar cells into the ‘epithelial normal’ compartment and malignant ductal and acinar cells into the ‘epithelial tumor’ compartment. The perturbation experiments involved downsampling (10% of a compartment) and ablation (complete removal of a compartment) for four of eight randomly selected batches (n = 200 independent integration experiments). Note that all batches are integrated at once using each method. b, Number of cells in each compartment after cell type collapse, across batches/biopsy samples in the PDAC data. c, F1 classification score for KNN classification after integration, specific to each compartment when compared to the compartment that was downsampled or ablated, across experiments and methods used for integration. Box plots (c) indicate median values across experiments; hinges are the 25th and 75th percentile values; and whiskers indicate the 1.5× interquartile range (IQR) values from the hinges.
Benchmarking single-cell data integration using balanced clustering metrics
a, Cell-type-labeled UMAP plot for the balanced two-batch PBMC data with FCGR3A⁺ monocytes and CD4⁺ T cells downsampled to 10% of their original proportion in one batch, after integration with the tested methods as well as an unintegrated representation. b, Batch-labeled UMAP plot for the integrated and unintegrated downsampled two-batch PBMC data. c, Unsupervised clustering-labeled UMAP plot for Leiden clustering in the embedding space of the integrated and unintegrated results for the downsampled two-batch PBMC data. Note that each method/subset has its own unsupervised clusters, and they do not overlap. d,e, Scoring and ranking of integration results, when considering concordance of the unsupervised clustering labels and ground truth cell type labels for each integration method and the unintegrated subset, using the average results of the base (imbalanced) clustering metrics (d) (ARI, AMI, Completeness and Homogeneity) and average of the balanced clustering metrics (e) (bARI, bAMI, Balanced Completeness, Balanced Homogeneity and Balanced V-measure).

Characterizing the impacts of dataset imbalance on single-cell data integration
  • Article

March 2024 · 154 Reads · 15 Citations · Nature Biotechnology

Computational methods for integrating single-cell transcriptomic data from multiple samples and conditions do not generally account for imbalances in the cell types measured in different datasets. In this study, we examined how differences in the cell types present, the number of cells per cell type and the cell type proportions across samples affect downstream analyses after integration. The Iniquitate pipeline assesses the robustness of integration results after perturbing the degree of imbalance between datasets. Benchmarking of five state-of-the-art single-cell RNA sequencing integration techniques in 2,600 integration experiments indicates that sample imbalance has substantial impacts on downstream analyses and the biological interpretation of integration results. Imbalance perturbation led to statistically significant variation in unsupervised clustering, cell type classification, differential expression and marker gene annotation, query-to-reference mapping and trajectory inference. We quantified the impacts of imbalance through newly introduced properties—aggregate cell type support and minimum cell type center distance. To better characterize and mitigate impacts of imbalance, we introduce balanced clustering metrics and imbalanced integration guidelines for integration method users.
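The perturbation step described for the balanced PBMC experiments (control, downsampling of one cell type in one batch to 10% of its original number, or complete ablation) can be illustrated with a minimal sketch. This is not the Iniquitate implementation: the function name `perturb`, its parameters, and the toy data are hypothetical, and the random choice of batch, cell type and mode is left to the caller.

```python
import random
from collections import Counter


def perturb(cells, mode, batch, cell_type, keep_fraction=0.1, seed=0):
    """Return a perturbed copy of `cells`, a list of (batch, cell_type) pairs.

    mode is "control" (no change), "downsample" (keep `keep_fraction` of the
    selected cell type in the selected batch) or "ablate" (remove it entirely).
    All names here are illustrative assumptions, not the pipeline's API.
    """
    if mode not in ("control", "downsample", "ablate"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "control":
        return list(cells)

    # Indices of the cells that belong to the selected batch and cell type.
    target = [i for i, (b, t) in enumerate(cells) if b == batch and t == cell_type]
    if mode == "downsample":
        rng = random.Random(seed)
        kept = set(rng.sample(target, round(len(target) * keep_fraction)))
    else:  # "ablate"
        kept = set()

    dropped = set(target) - kept
    return [cell for i, cell in enumerate(cells) if i not in dropped]


# Toy input: two batches, two cell types, 100 cells of each (made-up data).
cells = [(b, t) for b in ("batch1", "batch2")
         for t in ("B cell", "CD4 T cell") for _ in range(100)]

control = perturb(cells, "control", "batch1", "B cell")
down = perturb(cells, "downsample", "batch1", "B cell")
ablated = perturb(cells, "ablate", "batch1", "B cell")
print(Counter(down)[("batch1", "B cell")])     # 10
print(Counter(ablated)[("batch1", "B cell")])  # 0
```

Only the selected batch and cell type are touched, so every other population keeps its original size; each perturbed dataset would then be passed to an integration method and compared against the control.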

scGPT: toward building a foundation model for single-cell multi-omics using generative AI

February 2024 · 589 Reads · 313 Citations · Nature Methods

Generative pretrained models have achieved remarkable success in various domains such as language and computer vision. Specifically, the combination of large-scale diverse datasets and pretrained transformers has emerged as a promising approach for developing foundation models. Drawing parallels between language and cellular biology (in which texts comprise words; similarly, cells are defined by genes), our study probes the applicability of foundation models to advance cellular biology and genetic research. Using burgeoning single-cell sequencing data, we have constructed a foundation model for single-cell biology, scGPT, based on a generative pretrained transformer across a repository of over 33 million cells. Our findings illustrate that scGPT effectively distills critical biological insights concerning genes and cells. Through further adaptation of transfer learning, scGPT can be optimized to achieve superior performance across diverse downstream applications. This includes tasks such as cell type annotation, multi-batch integration, multi-omic integration, perturbation response prediction and gene network inference.


DeepVelo: deep learning extends RNA velocity to multi-lineage systems with cell-specific kinetics

January 2024 · 85 Reads · 20 Citations · Genome Biology

Existing RNA velocity estimation methods strongly rely on predefined dynamics and cell-agnostic constant transcriptional kinetic rates, assumptions often violated in complex and heterogeneous single-cell RNA sequencing (scRNA-seq) data. Using a graph convolution network, DeepVelo overcomes these limitations by generalizing RNA velocity to cell populations containing time-dependent kinetics and multiple lineages. DeepVelo infers time-varying cellular rates of transcription, splicing, and degradation, recovers each cell’s stage in the differentiation process, and detects functionally relevant driver genes regulating these processes. Application to various developmental and pathogenic processes demonstrates DeepVelo’s capacity to study complex differentiation and lineage decision events in heterogeneous scRNA-seq data. Supplementary information: the online version contains supplementary material available at 10.1186/s13059-023-03148-9.


Figure 5: (a) HLA antigen network from zero-shot scGPT. (b) CD8A gene neighbors from zero-shot and finetuned scGPT models, ranked by embedding similarity colored by ground-truth signalling pathway from Reactome. (c) CD antigen network from zero-shot and finetuned scGPT on the Immune Human dataset. (d) Differential expressions among scGPT-extracted gene programs by cell types in the Immune Human dataset. The interactivity between transcription factors, cofactors and target genes underlying a Gene
Results of perturbation prediction
scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI

May 2023 · 2,249 Reads · 53 Citations

Generative pre-trained models have achieved remarkable success in various domains such as natural language processing and computer vision. Specifically, the combination of large-scale diverse datasets and pre-trained transformers has emerged as a promising approach for developing foundation models. While texts are made up of words, cells can be characterized by genes. This analogy inspires us to explore the potential of foundation models for cell and gene biology. By leveraging the exponentially growing single-cell sequencing data, we present the first attempt to construct a single-cell foundation model through generative pre-training on over 10 million cells. We demonstrate that the generative pre-trained transformer, scGPT, effectively captures meaningful biological insights into genes and cells. Furthermore, the model can be readily fine-tuned to achieve state-of-the-art performance across a variety of downstream tasks, including multi-batch integration, multi-omic integration, cell-type annotation, genetic perturbation prediction, and gene network inference. The scGPT codebase is publicly available at https://github.com/bowang-lab/scGPT.


scFormer: A Universal Representation Learning Approach for Single-Cell Data Using Transformers

November 2022 · 210 Reads · 6 Citations

Single-cell sequencing has emerged as a promising technique to decode cellular heterogeneity and analyze gene functions. With the high throughput of modern techniques and resulting large-scale sequencing data, deep learning has been used extensively to learn representations of individual cells for downstream tasks. However, most existing methods rely on fully connected networks and are unable to model complex relationships between both cell and gene representations. We hereby propose scFormer, a novel transformer-based deep learning framework to jointly optimize cell and gene embeddings for single-cell biology in an unsupervised manner. By drawing parallels between natural language processing and genomics, scFormer applies self-attention to learn salient gene and cell embeddings through masked gene modelling. scFormer provides a unified framework to readily address a variety of downstream tasks such as data integration, analysis of gene function, and perturbation response prediction. Extensive experiments using scFormer show state-of-the-art performance on seven datasets across the relevant tasks. The scFormer model implementation is available at https://github.com/bowang-lab/scFormer.


The differential impacts of dataset imbalance in single-cell data integration

October 2022 · 123 Reads · 4 Citations

Single-cell transcriptomic data measured across distinct samples has led to a surge in computational methods for data integration. Few studies have explicitly examined the common case of cell-type imbalance between datasets to be integrated, and none have characterized its impact on downstream analyses. To address this gap, we developed the Iniquitate pipeline for assessing the stability of single-cell RNA sequencing (scRNA-seq) integration results after perturbing the degree of imbalance between datasets. Through benchmarking 5 state-of-the-art scRNA-seq integration techniques in 1600 perturbed integration scenarios for a multi-sample peripheral blood mononuclear cell (PBMC) cohort, our results indicate that dataset imbalance has significant impacts on downstream analyses and the biological interpretation of integration results. We observed significant variation in clustering, cell-type classification, marker-gene-based annotation, and query-to-reference mapping in imbalanced settings. Two key factors were found to lead to quantitation differences after scRNA-seq integration: the cell-type imbalance within and between samples (relative cell-type support) and the relatedness of cell-types across samples (minimum cell-type center distance). To account for evaluation gaps in imbalanced contexts, we developed novel clustering metrics robust to sample imbalance, including the balanced Adjusted Rand Index (bARI) and balanced Adjusted Mutual Information (bAMI). Our analysis quantifies biologically-relevant effects of dataset imbalance in integration scenarios, and introduces guidelines and novel metrics for integration of disparate datasets. The Iniquitate pipeline and balanced clustering metrics are available at https://github.com/hsmaan/Iniquitate and https://github.com/hsmaan/balanced-clustering, respectively.
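To give a feel for why a clustering metric might be reweighted for imbalance, the sketch below computes an Adjusted Rand Index from a contingency table whose rows are optionally rescaled so that every ground-truth cell type carries the same total weight. This rescaling is an assumption chosen for illustration; it is not taken from the abstract and may differ from the bARI definition in the authors' balanced-clustering repository. The function names and toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def _pairs(x):
    # Generalized "x choose 2" that also accepts non-integer weighted counts.
    return x * (x - 1) / 2.0


def adjusted_rand_index(true_labels, cluster_labels, balanced=False):
    """ARI from a contingency table; optionally give each true class equal weight."""
    n = len(true_labels)
    table = dict(Counter(zip(true_labels, cluster_labels)))
    if balanced:
        # Illustrative assumption: rescale each true class to n / (number of classes).
        class_sizes = Counter(true_labels)
        target = n / len(class_sizes)
        table = {(t, c): v * target / class_sizes[t] for (t, c), v in table.items()}

    rows, cols = defaultdict(float), defaultdict(float)
    for (t, c), v in table.items():
        rows[t] += v
        cols[c] += v

    sum_cells = sum(_pairs(v) for v in table.values())
    sum_rows = sum(_pairs(v) for v in rows.values())
    sum_cols = sum(_pairs(v) for v in cols.values())
    expected = sum_rows * sum_cols / _pairs(n)
    maximum = (sum_rows + sum_cols) / 2.0
    if maximum == expected:
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Toy labels: an abundant and a rare cell type (made-up data).
truth = ["T cell"] * 90 + ["monocyte"] * 10
perfect = [0] * 90 + [1] * 10
# Most of the rare cell type is absorbed into the large cluster.
merged = [0] * 90 + [0] * 8 + [1] * 2

plain = adjusted_rand_index(truth, merged)
reweighted = adjusted_rand_index(truth, merged, balanced=True)
print(plain, reweighted)
```

In this toy case the reweighted score is lower than the plain score, because the errors fall on the rare cell type and the rescaling makes those errors count as much as errors on the abundant one.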


Colony stimulating factor-1 producing endothelial cells and mesenchymal stromal cells maintain monocytes within a perivascular bone marrow niche

May 2022 · 124 Reads · 41 Citations · Immunity

Macrophage colony stimulating factor-1 (CSF-1) plays a critical role in maintaining myeloid lineage cells. However, congenital global deficiency of CSF-1 (Csf1op/op) causes severe musculoskeletal defects that may indirectly affect hematopoiesis. Indeed, we show here that osteolineage-derived Csf1 prevented developmental abnormalities but had no effect on monopoiesis in adulthood. However, ubiquitous deletion of Csf1 conditionally in adulthood decreased monocyte survival, differentiation, and migration, independent of its effects on bone development. Bone histology revealed that monocytes reside near sinusoidal endothelial cells (ECs) and leptin receptor (Lepr)-expressing perivascular mesenchymal stromal cells (MSCs). Targeted deletion of Csf1 from sinusoidal ECs selectively reduced Ly6C⁻ monocytes, whereas combined depletion of Csf1 from ECs and MSCs further decreased Ly6Chi cells. Moreover, EC-derived CSF-1 facilitated recovery of Ly6C⁻ monocytes and protected mice from weight loss following induction of polymicrobial sepsis. Thus, monocytes are supported by distinct cellular sources of CSF-1 within a perivascular BM niche.


Figure 4: Velocity and pseudotime plots for pancreatic endocrinogenesis [2]. (a) The pseudotime prediction from DeepVelo assigns alpha and beta cells to accurate developmental timepoints. In particular, the progenitor cell cluster is correctly located in the upper left quadrant of the UMAP projection. The difference between the terminal alpha and beta cells is well captured, where alpha cells were developed earlier at E12.5 and beta cells appeared later until E15.5. (b) Velocity values derived from DeepVelo are projected onto the UMAP-based embedding and visualized. DeepVelo successfully captures the main structure of EP cells developing into the terminal cell types of alpha, beta and delta cells. (c),(d) For comparison, the latent time and velocity computed by the dynamical model from scVelo. (e) Distribution of the overall RNA velocity consistency scores for DeepVelo and scVelo. (f,g) The histogram of pseudotime predictions for beta and alpha cells, by DeepVelo and the scVelo dynamical model, respectively. Beta cells are expected to have a larger percentage of cells with higher pseudotime values, which is true of the DeepVelo predicted values.
Figure S2: The PCA projection of cell-specific kinetic rates at various training epochs. (a-f) Scatter plot of the first two PCA dimensions at training epochs 10, 20, 30, 60, 90, 120. DeepVelo learns to predict similar kinetic rates for cells of the same cell type. For example, the kinetic rates of endothelial cells (outlined) are gradually clustered together and are located away from the unrelated granule lineage.
DeepVelo: Deep Learning extends RNA velocity to multi-lineage systems with cell-specific kinetics

April 2022 · 79 Reads · 6 Citations

The introduction of RNA velocity in single-cell studies has opened new ways of examining cell differentiation and tissue development. Existing RNA velocity estimation methods are based on strong assumptions of either complete observation of cells in steady states or a predefined dynamics pattern parameterized by constant coefficients. These assumptions are violated in complex and heterogeneous single-cell sequencing datasets and thus limit the application of these techniques. Here we present DeepVelo, a novel method that predicts the cell-specific dynamics of splicing kinetics using Graph Convolution Networks (GCNs). DeepVelo generalizes RNA velocity to cell populations containing time-dependent kinetics and multiple lineages, which are common in developmental and pathological systems. We applied DeepVelo to disentangle multifaceted kinetics in the processes of dentate gyrus neurogenesis, pancreatic endocrinogenesis, and hindbrain development. DeepVelo infers time-varying cellular rates of transcription, splicing and degradation, recovers each cell's stage in the underlying differentiation process and detects putative driver genes regulating these processes. DeepVelo relaxes the constraints of previous techniques and facilitates the study of more complex differentiation and lineage decision events in heterogeneous single-cell RNA sequencing data.
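The difference between constant and cell-specific kinetic rates can be made concrete with the standard RNA velocity model, in which the spliced abundance s of a gene changes as ds/dt = βu − γs for unspliced abundance u, splicing rate β and degradation rate γ. The sketch below is only an illustration under that assumed model: the rates are hand-picked numbers, whereas DeepVelo predicts them per cell with a graph convolution network, which is not reproduced here. The function and variable names are hypothetical.

```python
def spliced_velocity(unspliced, spliced, beta, gamma):
    """Velocity of spliced mRNA for one gene in one cell: ds/dt = beta*u - gamma*s."""
    return beta * unspliced - gamma * spliced


# Two cells with identical counts for a gene (made-up values).
cells = [
    {"u": 2.0, "s": 1.0, "beta": 1.0, "gamma": 0.5},  # slow degradation
    {"u": 2.0, "s": 1.0, "beta": 1.0, "gamma": 3.0},  # fast degradation
]

# Constant-rate assumption: one (beta, gamma) pair shared by all cells.
constant = [spliced_velocity(c["u"], c["s"], beta=1.0, gamma=0.5) for c in cells]

# Cell-specific rates: each cell uses its own (beta, gamma).
cell_specific = [spliced_velocity(c["u"], c["s"], c["beta"], c["gamma"]) for c in cells]

print(constant)       # [1.5, 1.5]
print(cell_specific)  # [1.5, -1.0]
```

With shared rates the two cells are indistinguishable, while cell-specific rates let the same counts correspond to up-regulation in one cell and down-regulation in the other.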


Three methods of selecting top drug candidates based on the GCN and MatchMaker prediction. Twenty-six drugs were selected for experimental testing by applying the predictions made by Node2Vec/GCN and MatchMaker models using three different methods. For Method 1, ten drugs were selected directly from the top 60 drugs that were predicted to be most proximal to COVID-19 based on the GCN alone. For Methods 2 and 3, first, ten human protein targets were selected from the list of the top 100 proteins that are most proximal to COVID-19 based on the GCN prediction. Following that, for Method 2, 14 drug candidates were selected from the ten top-ranking PolypharmDB (i.e., 10,224 drugs from DrugBank⁹⁵ screened against 8525 human proteins; see Materials and Methods) candidates for each protein (i.e., ten single-target panels resulting in 100 candidates in total). For Method 3, nine out of the top 25 ranking PolypharmDB candidates for the ten-targets panel were selected as candidates, seven of which were already present in the list of candidates selected with Method 2. Please see Table 1 and Materials and Methods for further details.
Screening of predicted compounds identifies capmatinib and other drugs as host-targeted compounds with antiviral activity against human coronaviruses. Graph depicting mean ± SE and individual measurements of 229E Spike protein expression as measured by IF assay upon incubation with drugs as per Table S4. Results are expressed as mean 229E Spike expression relative to the DMSO vehicle (control) condition. (bottom) Representative images showing S protein expression (magenta) or DAPI (cyan) of the DMSO vehicle (control) or capmatinib (10 µM) treated conditions. Scale, 100 µm. Also shown are structures of palbociclib, polidocanol, capmatinib and anidulafungin, compounds that showed antiviral activity.
Capmatinib has a broad range of antiviral activity against human coronaviruses. (A) Quantification of 229E Spike protein abundance in MRC-5 cells treated with increasing doses of capmatinib in the IF assay (48 h infection), as mean ± SE (n = 3) expressed relative to DMSO (vehicle) control. *P < 0.05 relative to control. (B) (left) Representative images of 229E plaques observed in MRC-5 cells treated with 10 µM capmatinib or DMSO (vehicle) for 6 days. (right) Quantification of viral titer from the DMSO (vehicle) control or capmatinib plaque assays, expressed as PFU/mL. *P < 0.05 relative to control. (C) (left) Representative images of NL63 plaques observed in LLC-MK2 treated with 10 µM capmatinib or DMSO (vehicle) control for 5 days. (right) Quantification of NL63 PFU/mL in LLC-MK2 cells treated with the indicated doses of capmatinib. *P < 0.05 relative to control. (D) Relative NL63 N protein RNA abundance 3 days post-infection in LLC-MK2 cells treated with 10 µM capmatinib relative to DMSO (vehicle) control. *P < 0.05 relative to control, n = 3 experimental replicates. (E) (left) Representative images of OC43 plaques observed in LLC-MK2 cells treated with 10 µM capmatinib or DMSO (vehicle) for 5 days. (right) Quantification of OC43 PFU/mL in LLC-MK2 cells treated with capmatinib or DMSO (vehicle) control. *P < 0.05 relative to control.
The antiviral activity of capmatinib is not attributed to its canonical role as an inhibitor of MET. (A) (left) Representative images of plaques in LLC-MK2 cells treated with 10 µM capmatinib, 10 µM AMG-337, or DMSO (vehicle) control and infected with NL63 for 5 days and (right) quantification of NL63 viral titer, shown as mean PFU ± SE (n = 3 with 3 technical replicates per experiment). (B) (left) Representative images of plaques in LLC-MK2 cells treated with DMSO (vehicle) control, 10 µM capmatinib, or 10 µM AMG-337, and infected with OC43 and (right) quantification of OC43 viral titer shown as mean PFU ± SE (n = 3 with 3 technical replicates per experiment). (C) (left) Representative images from IF assay of MRC-5 cells treated with 10 µM capmatinib or AMG-337 and infected with 229E. (right) Quantification of 229E S protein expression (20 images per condition, n = 3). (D) Representative neutralization curves from n = 5 independent experiments showing the relative antiviral activity of capmatinib vs. AMG-337 in pseudovirus assays performed with the SARS-CoV-1 and SARS-CoV-2 Spike protein. (E) Structures of capmatinib and AMG-337. (F) (left) Representative images of plaques in LLC-MK2 cells treated with 10 µM JH-I-25 and infected with OC43 for 5 days and (right) quantification of the OC43 viral titer shown as mean PFU ± SE (n = 3 with 3 technical replicates per experiment). (G) (left) Representative images from MRC-5 cells treated with 10 µM JH-I-25 and infected with 229E for 2 days and (right) quantification of 229E Spike protein expression (10 images per condition, n = 3). (H) (left) Representative images of MRC-5 cells transfected with IRAK1/4 siRNA or control and infected with 229E for 2 days. (right) Quantification of 229E Spike protein expression shown as mean ± SE, expressed relative to DMSO (vehicle) control (n = 3). *P < 0.05.
Multiscale interactome analysis coupled with off-target drug predictions reveals drug repurposing candidates for human coronavirus disease

December 2021 · 387 Reads · 14 Citations

The COVID-19 pandemic has highlighted the urgent need for the identification of new antiviral drug therapies for a variety of diseases. COVID-19 is caused by infection with the human coronavirus SARS-CoV-2, while other related human coronaviruses cause diseases ranging from severe respiratory infections to the common cold. We developed a computational approach to identify new antiviral drug targets and repurpose clinically-relevant drug compounds for the treatment of a range of human coronavirus diseases. Our approach is based on graph convolutional networks (GCN) and involves multiscale host-virus interactome analysis coupled to off-target drug predictions. Cell-based experimental assessment reveals several clinically-relevant drug repurposing candidates predicted by the in silico analyses to have antiviral activity against human coronavirus infection. In particular, we identify the MET inhibitor capmatinib as having potent and broad antiviral activity against several coronaviruses in a MET-independent manner, as well as novel roles for host cell proteins such as IRAK1/4 in supporting human coronavirus infection, which can inform further drug discovery studies.


Citations (13)


... The integration of multiple scRNA-seq studies requires the evaluation and correction of batch effects 50,51 . In the present study, we integrated almost equal numbers of cells from control and atherosclerotic tissue, resulting in high batch mixture and broad recapitulation of the major cell types from individual studies. ...

Reference:

Encompassing view of spatial and single-cell RNA sequencing renews the role of the microvasculature in human atherosclerosis
Characterizing the impacts of dataset imbalance on single-cell data integration

Nature Biotechnology

... 1. Domain-specific foundation models. These models, such as ProGen [16] and ESM3 [17] for proteins, DNABERT [18] and Evo [19] for DNA sequences, scGPT [20] for single-cell data, and chemical language models [21,22] for small molecules, are trained specifically on token sequence representations for individual scientific domains. 2. Fine-tuned general-purpose models. ...

scGPT: toward building a foundation model for single-cell multi-omics using generative AI

Nature Methods

... This concept was formalised through the calculation of "RNA velocity", using measurements of spliced and unspliced transcripts at a single point in time to predict the future transcriptomic state of single cells, and by reference to the transcriptomes of other cells, to infer future phenotypic states 15 . Since its original description, RNA velocity methods have been refined and widely applied to understand dynamic and developmental relationships between cellular states across numerous biological systems [20][21][22][23][24] . ...

DeepVelo: deep learning extends RNA velocity to multi-lineage systems with cell-specific kinetics

Genome Biology

... Accompanied with the widespread wave of pre-trained large language models in the artificial intelligence field, we introduced the BERT paradigm with innovative designs to unleash the power of BERT in single-cell data analysis area, inspiring a number of following single-cell large models [11][12][13][14][15][16][17][18][19] . As the pioneer, we collected millions of public data for pre-training and benchmarked the model on cell-type annotation task against various state-of-the-art methods. ...

scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI

... Accompanied with the widespread wave of pre-trained large language models in the artificial intelligence field, we introduced the BERT paradigm with innovative designs to unleash the power of BERT in single-cell data analysis area, inspiring a number of following single-cell large models [11][12][13][14][15][16][17][18][19] . As the pioneer, we collected millions of public data for pre-training and benchmarked the model on cell-type annotation task against various state-of-the-art methods. ...

scFormer: A Universal Representation Learning Approach for Single-Cell Data Using Transformers

... In contrast, AtacAnnoR not only treats each reference cell type equally but also treats each query cell as an independent sample to compare with each reference cell type, rendering the annotation independent of the proportions of cell types in the reference or query datasets. Studies have shown that the imbalance between datasets could significantly impact the performance of methods on dataset integration [42,43]. Here, all competing methods except MASTERO and CellWalkR, which are marker-gene-based methods, rely on dataset integration in order to conduct the annotation. ...

The differential impacts of dataset imbalance in single-cell data integration

... This protein was not detected in the proteome. CSF1 secretion is associated with the maintenance of normal hematopoiesis and bone remodeling [55]. A CSF1 deficiency found in the AML-MSCs secretome could negatively affect osteogenesis and hematopoiesis. ...

Colony stimulating factor-1 producing endothelial cells and mesenchymal stromal cells maintain monocytes within a perivascular bone marrow niche
  • Citing Article
  • May 2022

Immunity

... We present TopicVelo, a method and framework for RNA velocity that improves on the state of the art and conceptually complements other approaches. Existing methods typically include genes based on their fit to a velocity model (7, 19-21), making strong assumptions about a globally determined steady state and potentially excluding genes that are informative for locally dynamic processes. In contrast, by using topic modeling to discover biologically relevant gene programs or processes ("topics") and the cells in which their activity levels are relatively high, TopicVelo hones in on genes that are informative for the kinetic parameters for different processes, while preventing cells that are not associated with a process from distorting its parameter estimates. ...

DeepVelo: Deep Learning extends RNA velocity to multi-lineage systems with cell-specific kinetics

... A commercially available covalent chemical library was screened to identify new IRP2-targeting compounds through a structure-based, AI-augmented in silico discovery system Ligand Design platform driven by MatchMaker™ (Cyclica Inc. Toronto, Canada) as described previously [39]. After lining up top candidate compounds with their AI-ranks and EV Scores (the MatchMaker's prediction of the compound specificity by scoring positively from on-target (IRP2) versus negatively from anti-target (IRP1), Fig. 1A), top 10 compounds were first evaluated for their inhibitory activities of the IRP-IRE system in the SW480 5′ ferritin IRE-luciferase reporter cells as described below. ...

Multiscale interactome analysis coupled with off-target drug predictions reveals drug repurposing candidates for human coronavirus disease

... Samples with Ct values less than 30 were included for whole genome sequencing, which was performed as described by Kotwa et al. [25], Nasir et al. [26], and Quick et al. [27]. In brief, cDNA was amplified using the ARTIC protocol. ...

A Comparison of Whole Genome Sequencing of SARS-CoV-2 Using Amplicon-Based Sequencing, Random Hexamers, and Bait Capture