Xuegong Zhang

Verified
Xuegong verified their affiliation via an institutional email.
  • PhD
  • Professor at Tsinghua University

About

427
Publications
68,447
Reads
17,138
Citations
Introduction
Xuegong Zhang is currently a Professor at the Department of Automation, School of Information Science and Technology, and an Adjunct Professor at the School of Life Sciences and School of Medicine, Tsinghua University. Xuegong does research in Information Science, Machine Learning and Bioinformatics.
Current institution
Tsinghua University
Current position
  • Professor
Additional affiliations
October 2012 - July 2019
Tsinghua University
Position
  • Professor

Publications (427)
Preprint
Full-text available
The rise of large language models and multi-agent systems has sparked growing interest in AI scientists capable of autonomous biological research. However, existing benchmarks either focus on reasoning without data or on data analysis with predefined statistical answers, lacking realistic, data-driven evaluation settings. Here, we introduce the Bio...
Preprint
Learning spatial context of cells through pretraining on spatial transcriptomics (ST) data may empower us to systematically decipher tissue organization and cellular interactions. Yet, transformer-based generative models often focus on modeling individual cells, neglecting the intricate spatial relationships within them. We develop GeST, a deep tra...
Article
Reprogramming cell state transitions provides the potential for cell engineering and regenerative therapy for many diseases. Finding the reprogramming transcription factors (TFs) and their combinations that can direct the desired state transition is crucial for the task. Computational methods have been developed to identify such reprogramming TFs....
Article
In single-cell transcriptomics, inconsistent cell type annotations due to varied naming conventions and hierarchical granularity impede data integration, machine learning applications, and meaningful evaluations. To address this challenge, we developed the unified Hierarchical Annotation Framework (uHAF), which includes organ-specific hierarchical...
Preprint
Full-text available
Single-cell multi-omics data have a high potential for deciphering complex cellular mechanisms. But simultaneously measuring multi-omics data from the same cells is still challenging, which calls for computational methods to integrate data of multiple modalities and generate unobserved data. In this paper, we present scDiffusion-X, a latent diffusi...
Article
High‐throughput single‐cell RNA‐seq (scRNA‐seq) data contains an excess of zero values, which can be contributed by unexpressed genes and detection signal dropouts. Existing imputation methods fail to distinguish between these two types of zeros. In this study, we introduce a statistical framework that effectively differentiates true zeros (lack of...
Article
Full-text available
Background Atrial fibrillation Better Care (ABC) pathway is recommended by guidelines on atrial fibrillation (AF) and exerts a protective role against adverse outcomes of AF patients. But the possible differences in its effectiveness across the diverse range of patients in China have not been systematically evaluated. We aim to comprehensively eval...
Article
Full-text available
Cells are regulated at multiple levels, from regulations of individual genes to interactions across multiple genes. Some recent neural network models can connect molecular changes to cellular phenotypes, but their design lacks modeling of regulatory mechanisms, limiting the decoding of regulations behind key cellular events, such as cell state tran...
Article
Full-text available
Rare diseases, affecting ~350 million people worldwide, pose significant challenges in clinical diagnosis due to the lack of experienced physicians and the complexity of differentiating between numerous rare diseases. To address these challenges, we introduce PhenoBrain, a fully automated artificial intelligence pipeline. PhenoBrain utilizes a BERT...
Preprint
Full-text available
Understanding perturbations at the single-cell level is essential for unraveling cellular mechanisms and their implications in health and disease. The growing availability of biological data has driven the development of a variety of in silico perturbation methods designed for single-cell analysis, which offer a means to address many inherent limit...
Preprint
Full-text available
The human lung is a complex organ susceptible to various diseases. Single-cell transcriptomic studies provide rich data for targeting specific research questions. Here, we present uniLUNG, the largest lung transcriptomic cell atlas, comprising over 10 million cells across 20 disease states and healthy controls. We ensembled a universal hierarchical anno...
Article
Full-text available
Understanding tumor cell heterogeneity and plasticity is crucial for overcoming drug resistance. Single-cell technologies enable analyzing cell states at a given condition, but catenating static cell snapshots to characterize dynamic drug responses remains challenging. Here, we propose scStateDynamics, an algorithm to infer tumor cell state dynamic...
Article
Investigating mutations, including single nucleotide variations (SNVs), gene fusions, alternative splicing and copy number variations (CNVs), is fundamental to cancer study. Recent computational methods and biological research have demonstrated the reliability and biological significance of detecting mutations from single-cell transcriptomic data....
Preprint
The functional or structural spatial regions within tissues, referred to as spatial niches, are elements for illustrating the spatial contexts of multicellular organisms. A key challenge is querying shared niches across diverse tissues, which is crucial for achieving a comprehensive understanding of the organization and phenotypes of cell populatio...
Article
Recent development of large language models (LLMs) in AI has inspired scientists to develop a few large-scale AI foundation models for single-cell transcriptomics or large cellular models (LCMs) pretrained on massive single-cell RNA-seq data. They illustrated superior performances on a wide spectrum of tasks although the models were only pretrained...
Article
Motivation Single-cell RNA sequencing (scRNA-seq) data are important for studying the laws of life at single-cell level. However, it is still challenging to obtain enough high-quality scRNA-seq data. To mitigate the limited availability of data, generative models have been proposed to computationally generate synthetic scRNA-seq data. Nevertheless,...
Preprint
Full-text available
Molecular property prediction is a crucial foundation for drug discovery. In recent years, pre-trained deep learning models have been widely applied to this task. Some approaches that incorporate prior biological domain knowledge into the pre-training framework have achieved impressive results. However, these methods heavily rely on biochemical exp...
Article
Full-text available
A universal coordinate system that can ensemble the huge number of cells and capture their heterogeneities is of vital importance for constructing large-scale cell atlases as references for molecular and cellular studies. Studies have shown that cells exhibit multifaceted heterogeneities in their transcriptomic features at multiple resolutions. Thi...
Preprint
Full-text available
Neoepitope-based cancer immunotherapy depends on accurate prediction of patient-specific neoepitopes. Many candidate neoepitopes can be identified but their prioritization is challenging, resulting in poor effectiveness of existing methods. NeoGuider, our neoepitope prediction pipeline, detects neoepitope candidates from sequencing data and utilize...
Article
In single-cell studies, Principal Component Analysis (PCA) is widely used to reduce the dimensionality of a dataset and visualize it in 2D or 3D PC plots. Scientists often focus on different clusters within the PC plot, overlooking specific phenomena, such as the horseshoe-like effect, that may reveal hidden knowledge about the underlying biological dataset....
Article
Full-text available
While single-cell technologies have greatly advanced our comprehension of human brain cell types and functions, studies including large numbers of donors and multiple brain regions are needed to extend our understanding of brain cell heterogeneity. Integrating atlas-level single-cell data presents a chance to reveal rare cell types and cellular het...
Article
Full-text available
Objectives To develop computed tomography (CT)-based models to increase the prediction accuracy of spread through air spaces (STAS) in clinical-stage T1N0 lung adenocarcinoma. Methods Three cohorts of patients with stage T1N0 lung adenocarcinoma (n = 1258) were analyzed retrospectively. Two models using radiomics and deep neural networks (DNNs) we...
Article
Full-text available
The complex diagnostic criteria for gliomas pose great challenges for making accurate diagnoses with computational pathology methods. There are no in-depth analyses of the accuracy, reliability and auxiliary capability of present approaches from a clinical perspective. Previous studies have overlooked the exploration of molecular and morphological...
Article
Full-text available
The immunological mechanisms underlying chronic colitis are poorly understood. T follicular helper (TFH) cells are critical in helping B cells during germinal center reactions. In a T cell transfer colitis model, a lymphoid structure composed of mature dendritic cells (DCs) and TFH cells was found within T cell zones of colonic lymphoid follicles....
Article
Full-text available
Large pretrained models have become foundation models leading to breakthroughs in natural language processing and related fields. Developing foundation models for deciphering the ‘languages’ of cells and facilitating biomedical research is promising yet challenging. Here we developed a large pretrained model scFoundation, also named ‘xTrimoscFounda...
Article
Full-text available
Myeloid cells, particularly dendritic cells (DCs) and macrophages, play pivotal roles during asthma exacerbation by producing various cytokines and chemokines. However, the heterogeneity and function of these cells remain inadequately explored. In this study, we used an established HDM-induced allergic asthma mouse model and generated a sin...
Preprint
Full-text available
With large amounts of unlabeled RNA sequence data produced by high-throughput sequencing technologies, pre-trained RNA language models have been developed to estimate the semantic space of RNA molecules, which facilitates the understanding of the grammar of RNA language. However, existing RNA language models overlook the impact of structure when modeling t...
Article
Full-text available
Node importance estimation (NIE) is the task of inferring the importance scores of the nodes in a graph. Due to the availability of richer data and knowledge, recent research interests of NIE have been dedicated to knowledge graphs (KGs) for predicting future or missing node importance scores. Existing state-of-the-art NIE methods train the model b...
Article
Full-text available
Large-scale transcriptomic data are crucial for understanding the molecular features of hepatocellular carcinoma (HCC). By integrating 15 transcriptomic datasets of HCC clinical samples, the first version of HCCDB (HCC database) was released in 2018. Through the meta-analysis of differentially expressed genes and prognosis-related genes across multiple...
Preprint
Full-text available
Gene expression could be perceived as a form of cell language, with underlying regulatory mechanisms akin to biological grammar. Decoding this "language" is critical in understanding cellular functions and behaviors, but presents significant challenges. Several works have attempted to learn the biological language by pre-training large foundation m...
Article
Full-text available
Complicated molecular alterations in tumors generate various mutant peptides. Some of these mutant peptides can be presented to the cell surface and then elicit immune responses, and such mutant peptides are called neoantigens. Accurate detection of neoantigens could help to design personalized cancer vaccines. Although some computational framework...
Article
Full-text available
Single-cell clustered regularly interspaced short palindromic repeats-sequencing (scCRISPR-seq) is an emerging high-throughput CRISPR screening technology where the true cellular response to perturbation is coupled with infected proportion bias of guide RNAs (gRNAs) across different cell clusters. The mixing of these effects introduces noise into s...
Article
Recent advancements in single-cell sequencing technologies have generated extensive omics data in various modalities and revolutionized cell research, especially in the single-cell RNA and ATAC data. The joint analysis across scRNA-seq data and scATAC-seq data has paved the way to comprehending the cellular heterogeneity and complex cellular regula...
Preprint
Full-text available
Cell state transitions are complicated processes that occur in various life activities. Understanding and artificially manipulating them have been longstanding challenges. Substantial experiments reveal that the transitions could be directed by several key transcription factors (TFs). Here we present scDirect, a computational framework to identify...
Article
Full-text available
Profiling spatial variations of cellular composition and transcriptomic characteristics is important for understanding the physiology and pathology of tissues. Spatial transcriptomics (ST) data depict spatial gene expression, but the currently dominant high-throughput technology is not yet at single-cell resolution. Single-cell RNA-sequencing (SC)...
Article
Full-text available
Background Pituitary neuroendocrine tumors (PitNETs) are one of the most common types of intracranial tumors. Currently, the cellular characteristics of normal pituitary and various other types of PitNETs are still not completely understood. Methods We performed single-cell RNA sequencing (scRNA-seq) on 4 normal samples and 24 PitNET samples for c...
Preprint
Full-text available
Understanding the heterogeneity and dynamic plasticity of tumor cells is crucial for overcoming drug resistance. Single-cell technologies enable the analysis of cell states at a given condition or time point, but it is still challenging to catenate static tumor cell snapshots to characterize their dynamic responses after drug treatment. Here, we pr...
Preprint
Full-text available
The liver performs several vital functions such as metabolism, toxin removal and glucose storage through the coordination of various cell types. The cell type compositions and cellular states undergo significant changes in abnormal conditions such as fatty liver, cirrhosis and liver cancer. As the recent breakthrough of the single-cell/single-nucle...
Preprint
Full-text available
Background: Machine learning methods have recently been shown to be powerful in discovering knowledge from scientific data, offering promising prospects for discovery learning. Meanwhile, Deep Generative Models like Generative Adversarial Networks (GANs) have excelled in generating synthetic data close to real data. GANs have been extensively empl...
Preprint
Full-text available
Innate lymphoid cells (ILCs) are crucial for maintaining tissue homeostasis. The dynamic composition of ILC subsets during ontogeny has been observed for over a decade, yet the underlying mechanisms remain incompletely understood. Here, we combined differentiation assay and scRNA-seq analysis to compare the fetal and adult ILC development, and asse...
Article
Rapid advances in spatial transcriptomics (ST) have revolutionized the interrogation of spatial heterogeneity and increased the demand for comprehensive methods to effectively characterize spatial domains. As a prerequisite for ST data analysis, spatial domain characterization is a crucial step for downstream analyses and biological implications. He...
Article
Full-text available
The rapid development of biological technology (BT) and information technology (IT) especially of genomics and artificial intelligence (AI) is bringing great potential for revolutionizing future medicine. We propose the concept and framework of Digital Life Systems or dLife as a new paradigm to unleash this potential. It includes the multi‐scale an...
Article
Cell-cell communication events (CEs) are mediated by multiple ligand-receptor (LR) pairs. Usually only a particular subset of CEs directly works for a specific downstream response in a particular microenvironment. We name them as functional communication events (FCEs) of the target responses. Decoding FCE-target gene relations is important for und...
Article
Full-text available
Idiopathic pulmonary fibrosis (IPF) is a chronic interstitial lung disease with a high mortality rate and unclear aetiology. The immune response is elaborately regulated during the progression of IPF, but immune cell subsets are complex and have not been described in detail during IPF progression. Therefore, in the current study, we sought to...
Preprint
Full-text available
Single-cell technologies greatly accelerated our understanding of the human brain cell types and their functions. But most studies focused on only a single or a couple of brain regions in a limited number of donors. Integration of atlas-level single-cell data can offer opportunities in revealing the cell type difference among brain regions, thus re...
Preprint
Full-text available
The heart maintains its essential role in human life by the highly orchestrated functioning of specialized cell types. Recent advances in single-cell and single-nuclei RNA sequencing (scRNA-seq and snRNA-seq) provide the possibility of profiling the molecular and cellular characteristics of heart cells. We collected scRNA-seq and snRNA-seq data of...
Article
Full-text available
Motivation: Single-cell chromatin accessibility sequencing (scCAS) technology provides an epigenomic perspective to characterize gene regulatory mechanisms at single-cell resolution. With an increasing number of computational methods proposed for analyzing scCAS data, a powerful simulation framework is desirable for evaluation and validation of th...
Preprint
Full-text available
Large-scale transcriptomic data are crucial for understanding the molecular features of hepatocellular carcinoma (HCC). By integrating 15 transcriptomic datasets of HCC clinical samples, the first version of HCCDB was released in 2018. The meta-analysis of differentially expressed genes and prognosis-related genes across multiple datasets provides...
Preprint
Full-text available
The immunological mechanisms underlying chronic colitis are poorly understood. T follicular helper (Tfh) cells are critical in helping B cells during germinal center reactions. In a T cell transfer colitis model, a lymphoid structure composed of mature type 2 conventional dendritic cells (cDC2s) and Tfh cells was found within T cell zones of colon...
Preprint
Full-text available
Chromatin accessibility profiling methods such as assay for transposase-accessible chromatin using sequencing (ATAC-seq) have been promoting the identification of gene regulatory elements and the characterization of epigenetic landscapes. Unlike gene expression data, there is no consistent reference for chromatin accessibility data, which hinders l...
Preprint
Large-scale pretrained models have become foundation models, leading to breakthroughs in natural language processing and related fields. Developing foundation models in life science, aimed at deciphering the "languages" of cells and facilitating biomedical research, is challenging yet promising. We developed a large-scale pretrained model, scFounda...
Article
Full-text available
Discovering DNA regulatory sequence motifs and their relative positions is vital to understanding the mechanisms of gene expression regulation. Although deep convolutional neural networks (CNNs) have achieved great success in predicting cis-regulatory elements, the discovery of motifs and their combinatorial patterns from these CNN models has remai...
Preprint
Single-cell chromatin accessibility sequencing (scCAS) technology provides an epigenomic perspective to characterize gene regulatory mechanisms at single-cell resolution. With an increasing number of computational methods proposed for analyzing scCAS data, a powerful simulation framework is desirable for evaluation and validation of these methods....
Article
Lung cancer is the leading cause of cancer-related deaths worldwide. Medical imaging technologies such as computed tomography (CT) and positron emission tomography (PET) are routinely used for non-invasive lung cancer diagnosis. In clinical practice, physicians investigate the characteristics of tumors such as the size, shape and location from CT a...
Article
Full-text available
Background Musculoskeletal tissue degeneration impairs the life quality and function of many people. Meniscus degeneration is a major origin of knee osteoarthritis and a common threat to athletic ability, but its cellular mechanism remains elusive. Methods We built a cell atlas of 12 healthy or degenerated human meniscus samples from the inner and...
Article
Full-text available
Unfolding the “black-box” associations between genotype and phenotype is essential for understanding the molecular mechanisms of complex human diseases. Here, we describe the use of GRPath to uncover putative causal paths (pcPaths) from genetic variants to disease phenotypes. GRPath takes multiple omics data and summary statistics as input and iden...
Preprint
Full-text available
Rapid advances in spatial transcriptomics (ST) have revolutionized the interrogation of spatial heterogeneity and increased the demand for comprehensive methods to effectively characterize spatial domains. As a prerequisite for ST data analysis, spatial domain characterization is a crucial step for downstream analyses and biological implications. H...
Preprint
Full-text available
Controlling total mRNA content differences between cell populations is critical in comparative transcriptomic measurements. Due to poor compatibility with ERCC, a good control for droplet-based scRNA-seq is yet to be discovered. Normalizing cells to a common count distribution has been adopted as a silent compromise. Such practice profoundly confou...
Preprint
Full-text available
Computationally integrating spatial transcriptomics (ST) and single-cell transcriptomics (SC) greatly benefits biomedical research such as cellular organization, embryogenesis and tumorigenesis, and could further facilitate therapeutic developments. We proposed a transfer learning model, STEM, to learn spatially-aware embeddings from gene expressio...
Article
Full-text available
Human Ensemble Cell Atlas (hECA) provides a unified informatics framework and the cell-centric-assembled single-cell transcriptome data of 1,093,299 labeled human cells from 116 published datasets. In this protocol, we provide three applications of hECA: “quantitative portraiture” exploration with websites, customizable reference creation for autom...
Article
Full-text available
Objectives To quantify intra-tumor heterogeneity (ITH) in non-small cell lung cancer (NSCLC) from computed tomography (CT) images. Methods We developed a quantitative ITH measurement, ITHscore, by integrating local radiomic features and global pixel distribution patterns. The associations of ITHscore with tumor phenotypes, genotypes, and patient’s pro...
Article
Full-text available
Complex traits such as cardiovascular diseases (CVD) are the results of complicated processes jointly affected by genetic and environmental factors. Genome-wide association studies (GWAS) identified genetic variants associated with diseases but usually did not reveal the underlying mechanisms. There could be many intermediate steps at epigenetic, t...
Article
Visual representation extraction is a fundamental problem in the field of computational histopathology. Considering the powerful representation capacity of deep learning and the scarcity of annotations, self-supervised learning has emerged as a promising approach to extract effective visual representations from unlabeled histopathological images. A...
Preprint
Cell-cell communication events (CEs) mediated by multiple ligand-receptor pairs construct a complex intercellular signaling network. Usually only a subset of CEs directly works for a specific downstream response in certain microenvironments. We call them functional communication events (FCEs). Spatial transcriptomic methods can profile the spatial...
Article
Federated learning (FL) is a privacy-preserving paradigm for multi-institutional collaborations, where the aggregation is an essential procedure after training on the local datasets. Conventional aggregation algorithms often apply a weighted averaging of the updates generated from distributed machines to update the global model. However, while the...
Preprint
Full-text available
Background Musculoskeletal tissue degeneration impairs the life quality and function of many people. Meniscus degeneration is a major origin of knee osteoarthritis and a common threat to athletic ability, but its cellular mechanism remains elusive. Methods We built a cell atlas of healthy/degenerated human meniscus using scRNA-seq to investigate m...
Article
Motivation Single-cell technologies have played a crucial role in revolutionizing biological research over the past decade, strengthening our understanding of cell differentiation, development, and regulation from a single-cell perspective. Single-cell RNA sequencing (scRNA-seq) is one of the most common single-cell technologies, which enables p...
Article
Full-text available
The accumulation of massive single-cell omics data provides growing resources for building biomolecular atlases of all cells of human organs or the whole body. The true assembly of a cell atlas should be cell-centric rather than file-centric. We developed a unified informatics framework for seamless cell-centric data assembly and built the human En...
Article
Full-text available
Recent advances in single-cell technologies have enabled the characterization of epigenomic heterogeneity at the cellular level. Computational methods for automatic cell type annotation are urgently needed given the exponential growth in the number of cells. In particular, annotation of single-cell chromatin accessibility sequencing (scCAS) data, w...
Article
Full-text available
Isogenic cells growing in identical environments show cell-to-cell variations because of the stochasticity in gene expression. High levels of variation or noise can disrupt robust gene expression and result in tremendous consequences for cell behaviors. In this work, we showed evidence from single-cell RNA-sequencing data analysis that microRNAs (m...
Article
Full-text available
This perspective discusses the need and directions for developing a unified information framework to enable assembling cell atlases and revolutionizing medical research on the virtual body of assembled cell systems.
Preprint
Full-text available
The goal of big projects like Human Cell Atlas (HCA) and Human BioMedical Atlas Program (HuBMAP) is to build maps that comprehensively define and describe all cell types and their molecular features in a healthy human being. Just like geographical maps must have coordinates, a key task in building cell maps is to provide coordinate systems for cell...
Article
Full-text available
Quantifying cell proportions, especially for rare cell types in some scenarios, is of great value in tracking signals associated with certain phenotypes or diseases. Although some methods have been proposed to infer cell proportions from multicomponent bulk data, they are substantially less effective for estimating the proportions of rare cell type...
Preprint
Full-text available
Single-cell omics data can characterize multifaceted features of massive cells and bring significant insights to biomedical research. The accumulation of single-cell data provides growing resources for constructing atlases for all cells of a human organ or the whole body. The true assembly of a cell atlas should be cell-centric rather than file-c...
Article
Motivation: Recent developments of spatial transcriptomic sequencing technologies provide powerful tools for understanding cells in the physical context of tissue microenvironments. A fundamental task in spatial gene expression analysis is to identify genes with spatially variable expression patterns, or spatially variable genes (SVgenes). Several...
Article
Clustering is a key step in revealing heterogeneities in single-cell data. Most existing single-cell clustering methods output a fixed number of clusters without the hierarchical information. Classical hierarchical clustering provides dendrograms of cells, but cannot scale to large datasets due to high computational complexity. We present HGC, a fa...
Article
Full-text available
Chromatin accessibility, as a powerful marker of active DNA regulatory elements, provides valuable information for understanding regulatory mechanisms. The revolution in high-throughput methods has accumulated massive chromatin accessibility profiles in public repositories. Nevertheless, utilization of these data is hampered by cumbersome collectio...
Preprint
Full-text available
Quantifying the cell proportions, especially for rare cell types in some scenarios, is of great value to track signals related to certain phenotypes or diseases. Although some methods have been proposed to infer cell proportions from multi-component bulk data, they are substantially less effective for estimating rare cell type proportions since th...
Article
Full-text available
Freshwater lakes are threatened by harmful cyanobacterial blooms, whose basic unit is Cyanobacterial Aggregate (CA). CA-attached bacteria play a significant role through different blooming stages with substantial variation of their taxonomic structure. However, little is known about their functional variations and functional links with cyanobacteri...
Preprint
Full-text available
Clustering is a key step in revealing heterogeneities in single-cell data. Cell heterogeneity can be explored at different resolutions and the resulted varying cell states are inherently nested. However, most existing single-cell clustering methods output a fixed number of clusters without the hierarchical information. Classical hierarchical cluste...
Article
Full-text available
Background Efficient regulation of bacterial genes in response to the environmental stimulus results in unique gene clusters known as operons. Lack of complete operonic reference and functional information makes the prediction of metagenomic operons a challenging task; thus, opening new perspectives on the interpretation of the host-microbe interac...
Article
Recent advances of long-term time-lapse microscopy have made it easy for researchers to quantify cell behavior and molecular dynamics at single-cell resolution. However, the lack of easy-to-use software tools optimized for customized research is still a major challenge for quantitatively understanding biological processes through microscopy images....
Article
Full-text available
Background High throughput single-cell transcriptomic technology produces massive high-dimensional data, enabling high-resolution cell type definition and identification. To uncover the expressional patterns beneath the big data, a transcriptional landscape searching algorithm at a single-cell level is desirable. Results We explored the feasibilit...
Preprint
Full-text available
Recent developments of spatial transcriptomic sequencing technologies provide powerful tools for understanding cells in the physical context of tissue micro-environments. A fundamental task in spatial gene expression analysis is to identify genes with spatially variable expression patterns, or spatially variable genes (SVgenes). Several computation...
Article
Full-text available
Objective We used data from twins and their families to probe the genetic factors contributing to microtia-atresia, in particular, early post-twinning variations that potentially account for the discordant phenotypes of monozygotic twin pairs. Methods Six families of monozygotic twins discordant for congenital microtia-atresia were recruited for s...
Preprint
Full-text available
Background Freshwater lakes are threatened by harmful cyanobacterial blooms, whose basic unit is Cyanobacterial Aggregate (CA). Community variations of CA-attached bacteria are substantial during different blooming stages. However, little is known about their transcriptional and metabolic variations. Most bacterial genomes in CA were not constructe...
Article
Colorectal cancer (CRC) progression is associated with cancer cell dedifferentiation and stemness acquisition. Several methods have been developed to identify stemness signatures in CRCs. However, studies that directly measured the degree of dedifferentiation in CRC tissues are limited. It is unclear how the differentiation states change during C...
Article
Full-text available
Expectations of machine learning (ML) are high for discovering new patterns in high-throughput biological data, but most such practices are accustomed to relying on existing knowledge conditions to design experiments. Investigations of the power and limitation of ML in revealing complex patterns from data without the guide of existing knowledge hav...
Article
Full-text available
Background: With the rapid development of single-cell genomics, technologies for parallel sequencing of the transcriptome and genome in each single cell are being explored in several labs and are becoming available. This brings us the opportunity to uncover associations between genotypes and gene expression phenotypes at the single-cell level by eQTL ana...
