Publications (116)
Motivation:
PIWI-interacting RNAs (piRNAs) are a class of small non-coding RNAs that are highly abundant in the germline. One important role of piRNAs is to defend genome integrity by guiding PIWI proteins to silence transposable elements (TEs), which have a high potential to cause deleterious effects on their host. The mechanism of piRNA-mediated...
The piRNA machinery is known for its role in mediating epigenetic silencing of transposons. Recent studies suggest that this function also involves piRNA-guided cleavage of transposon-derived transcripts. As many piRNAs also appear to have the capacity to target diverse mRNAs, this raises the intriguing possibility that piRNAs may act extensively a...
piRNAs are a class of small RNAs that is most abundantly expressed in the animal germ line. Presently, substantial research is going on to reveal the functions of piRNAs in the epigenetic and post-transcriptional regulation of transposons and genes. A piRNA database for collection, annotation and structuring of these data will be a valuable contrib...
Metazoan microRNAs (miRNAs) are commonly encoded by primary mRNA-like characteristics (mlRNAs). To investigate whether mlRNAs are subject to miRNA control, we compared the expression of mlRNAs to that of tissue-specific miRNAs. We show that, like mRNAs, the expression levels of predicted mlRNA targets are significantly reduced in tissues where a ta...
The NONCODE database is an integrated knowledge database designed for the analysis of non-coding RNAs (ncRNAs). Since NONCODE was first released 3 years ago, the number of known ncRNAs has grown rapidly, and there is growing recognition that ncRNAs play important regulatory roles in most organisms. In the updated version of NONCODE (NONCODE v2.0),...
Human leukocyte antigen (HLA) genes play a crucial role in the adaptation of human populations to the dynamic pathogenic environment. Despite their significance, investigating the pathogen-driven evolution of HLAs and the implications for autoimmune diseases presents considerable challenges. Here, we genotyped over twenty HLA genes at 3-field resol...
The National Genomics Data Center (NGDC), which is a part of the China National Center for Bioinformation (CNCB), offers a comprehensive suite of database resources to support the global scientific community. Amidst the unprecedented accumulation of multi-omics data, CNCB-NGDC is committed to continually evolving and updating its core database reso...
Lycium barbarum, a member of the Solanaceae family, represents an important eudicot lineage with homology of food and medicine. Lycium barbarum pectin polysaccharides (LBPPs) are key bioactive ingredients of Lycium barbarum, and are among the few polysaccharides with both biocompatibility and biomedical activity. While previous studies have primari...
Variable number tandem repeat (VNTR) is a pervasive and highly mutable genetic feature that varies in both length and repeat sequence. Despite the well-studied copy-number variants, the functional impacts of repeat motif polymorphisms remain unknown. Here, we present the largest genome-wide VNTR polymorphism map to date, with over 2.5 million VNTR...
The brain is the central hub of the entire nervous system. Its development is a lifelong process guided by a genetic blueprint. Understanding how genes influence brain development is critical for deciphering the formation of human cognitive functions and the underlying mechanisms of neurological disorders. Recent advances in multi-omics techniques...
piRNA (PIWI-interacting RNA) is a kind of small noncoding RNA mainly expressed in germ cells and bound to PIWI proteins. It plays an important role in germ cells and gene regulation. Due to the huge amount and complexity of related data resources and the lack of systematic integration, a large number of data have not been fully mined and utilized,...
Cis-regulatory elements (CREs) have an important role in human adaptation to the living environment. However, the lag in population genomic cohort studies and epigenomic studies hinders the adaptive analysis of CREs in human populations. In this study, we collected 4,013 unrelated individuals and performed a comprehensive analysis...
The National Genomics Data Center (NGDC), which is a part of the China National Center for Bioinformation (CNCB), provides a family of database resources to support the global academic and industrial communities. With the rapid accumulation of multi-omics data at an unprecedented pace, CNCB-NGDC continuously expands and updates core database resour...
Cancer is driven by both germline and somatic genetic changes. Efforts have been devoted to characterizing essential genetic variations in cancer initiation and development. Most attention has been given to mutations in protein‐coding genes and associated regulatory elements such as promoters and enhancers. The development of sequencing technologie...
Characterizing natural selection signatures and relationships with phenotype spectra is important for understanding human evolution and both biological and pathological mechanisms. Here, we identified 24 genetic loci under recent selection by analyzing rare singletons in 3946 high-depth whole-genome sequencing data of Han Chinese. The loci include...
Recently, Worobey et al. (2022) published a report with the title “The Huanan Seafood Wholesale Market in Wuhan was the early epicenter of the COVID-19 pandemic” that succinctly summarizes their study. A pre-print version of this study had earlier elicited a series of high-profile media coverages. All these reports deliver a social-political messag...
Short tandem repeats (STRs) are abundant and highly mutagenic in the human genome. Many STR loci have been associated with a range of human genetic disorders. However, most population-scale studies on STR variation in humans have focused on European ancestry cohorts or are limited by sequencing depth. Here, we depicted a comprehensive map of 366,01...
The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a family of database resources to support global academic and industrial communities. With the explosive accumulation of multi-omics data generated at an unprecedented rate, CNCB-NGDC constantly expands and updates core database resources...
Noncoding RNAs (ncRNAs) play key regulatory roles in biological processes by interacting with other biomolecules. With the development of high-throughput sequencing and experimental technologies, extensive ncRNA interactions have been accumulated. Therefore, we updated the NPInter database to a fifth version to document these interactions. ncRNA in...
The redox homeostasis system regulates many biological processes, intracellular antioxidant production and redox signaling. However, long noncoding RNAs (lncRNAs) involved in redox regulation have rarely been reported. Herein, we reported that downregulation of MAGI2-AS3 decreased the superoxide level in Human fibroblasts (Fbs), a replicative aging...
Variation of vertical distribution of bacterial communities through the whole water column of the trench regions has received more and more attention. However, it is still unclear whether there are unique microbial communities in trenches. The study investigated the bacterial composition and diversity in three size-fractions (0.3-0.7μm, 0.7–2.7μm,...
Background: The inactivation of tumor-suppressor p53 plays an important role in second generation anti-androgens (SGAs) drug resistance and neuroendocrine differentiation in castration-resistant prostate cancer (CRPC). The reactivation of p53 by blocking the MDM2–p53 interaction represents an attractive therapeutic remedy in cancers with wild-type...
Chemotherapeutic agents, such as 5-fluorouracil (5-FU) and oxaliplatin (Oxi), can not only kill the cancer cell but also influence the proliferation of gut microbiota; however, the interaction between these drugs and gut microbiota remains poorly understood. In this study, we developed a powerful framework for taxonomy composition and genomic varia...
Mobile element insertions (MEIs) are a major class of structural variants (SVs) and have been linked to many human genetic disorders, including hemophilia, neurofibromatosis, and various cancers. However, human MEI resources from large-scale genome sequencing are still lacking compared to those for SNPs and SVs. Here, we report a comprehensive map...
Piwi-interacting RNAs are a type of small noncoding RNA that have various functions. piRBase is a manually curated resource focused on assisting piRNA functional analysis. piRBase release v3.0 is committed to providing more comprehensive piRNA related information. The latest release covers >181 million unique piRNA sequences, including 440 datasets...
The lack of haplotype reference panels and whole-genome sequencing resources specific to the Chinese population has greatly hindered genetic studies in the world’s largest population. Here, we present the NyuWa genome resource, based on deep (26.2×) sequencing of 2,999 Chinese individuals, and construct a NyuWa reference panel of 5,804 haplotypes a...
The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a family of database resources to support global research in both academia and industry. With the explosively accumulated multi-omics data at ever-faster rates, CNCB-NGDC is constantly scaling up and updating its core database resources t...
Small proteins specifically refer to proteins consisting of fewer than 100 amino acids translated from small open reading frames (sORFs), which were usually missed in previous genome annotation. The significance of small proteins has been revealed in recent years, along with the discovery of their diverse functions. However, systematic annotation o...
RNA-binding proteins (RBPs) have essential functions during germline and early embryo development. However, current methods are unable to identify the in vivo targets of a RBP in these low-abundance cells. Here, by coupling RBP-mediated reverse transcription termination with linear amplification of complementary DNA ends and sequencing, we present...
Figs and fig pollinators are one of the few classic textbook examples of obligate pollination mutualism. The specific dependence of fig pollinators on the relatively safe living environment with sufficient food sources in the enclosed fig syconia implies that they are vulnerable to habitat changes. However, there is still no extensive genomic evide...
NONCODE (http://www.noncode.org/) is a comprehensive database of collection and annotation of noncoding RNAs, especially long non-coding RNAs (lncRNAs) in animals. NONCODEV6 is dedicated to providing the full scope of lncRNAs across plants and animals. The number of lncRNAs in NONCODEV6 has increased from 548 640 to 644 510 since the last update in...
The lack of Chinese population specific haplotype reference panel and whole genome sequencing resources has greatly hindered the genetics studies in the world’s largest population. Here we presented the NyuWa genome resource of 71.1M SNPs and 8.2M indels based on deep (26.2X) sequencing of 2,999 Chinese individuals, and constructed NyuWa reference...
Retrotransposons are populated in vertebrate genomes, and when active, are thought to cause genome instability with potential benefit to genome evolution. Retrotransposon-derived RNAs are also known to give rise to small endo-siRNAs to help maintain heterochromatin at their sites of transcription; however, as not all heterochromatic regions are equ...
Highly structured RNA molecules usually interact with each other, and associate with various RNA-binding proteins, to regulate critical biological processes. However, RNA structures and interactions in intact cells remain largely unknown. Here, by coupling proximity ligation mediated by RNA-binding proteins with deep sequencing, we report an RNA in...
Noncoding RNAs (ncRNAs) play crucial regulatory roles in a variety of biological circuits. To document regulatory interactions between ncRNAs and biomolecules, we previously created the NPInter database (http://bigdata.ibp.ac.cn/npinter). Since the last version of NPInter was issued, a rapidly growing number of studies have reported novel interacti...
Retrotransposons are extensively populated in vertebrate genomes, which, when active, are thought to cause genome instability with potential benefit to genome evolution. Retrotransposon-derived RNAs are also known to give rise to small endo-siRNAs to help maintain heterochromatin at their sites of transcription; however, as not all heterochromatic...
A long-standing question in the field of embryogenesis is how the zygotic genome is precisely activated by maternal factors, allowing normal early embryonic development. We have previously shown that N6-methyladenine (6mA) DNA modification is highly dynamic in early Drosophila embryos and forms an epigenetic mark. However, little is known about how...
RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences, collating information on ncRNA sequences of all types from a broad range of organisms. We have recently added a new genome mapping pipeline that identifies genomic locations for ncRNA sequences in 296 species. We have also added several new types of functional annotations,...
PIWI-interacting RNAs are a class of small RNAs that is most abundantly expressed in animal germline. Substantial research is going on to reveal the functions of piRNAs in the epigenetic and post-transcriptional regulation of transposons and genes. To collect and annotate these data, we developed piRBase, a database assisting piRNA functional study...
Mutations in several general pre-mRNA splicing factors have been linked to myelodysplastic syndromes (MDSs) and solid tumors. These mutations have generally been assumed to cause disease by the resultant splicing defects, but different mutations appear to induce distinct splicing defects, raising the possibility that an alternative common mechanism...
Whole genome sequencing technology has facilitated the discovery of a large number of somatic mutations in enhancers (SMEs), whereas the utility of SMEs in tumorigenesis has not been fully explored. Here we present Ennet, a method to comprehensively investigate SMEs enriched networks (SME-networks) in cancer by integrating SMEs, enhancer-gene inter...
Biological processes, especially developmental processes, are often dynamic. Previous BodyMap projects for human and mouse have provided researchers with portals to tissue-specific gene expression, but these efforts have not included dynamic gene expression patterns. Over the past few years, substantial progress in our understanding of the molecula...
"Small proteins" is the general term for proteins shorter than 100 amino acids. Identification and functional studies of small proteins have advanced rapidly in recent years, and several studies have shown that small proteins play important roles in diverse functions including development, muscle contraction and DNA repair. Identification...
Long noncoding RNAs (lncRNAs) are essential in many molecular pathways, and are frequently associated with disease, but the mechanisms of most lncRNAs have not yet been characterized. Genetic variations, including SNPs and structural variations, are widely distributed in the genome, including lncRNA gene regions. As the number of studies on lncRNAs...
Amphibian populations are experiencing catastrophic declines driven by the fungal pathogen Batrachochytrium dendrobatidis (Bd). Although horizontal gene transfer (HGT) facilitates the evolution and adaptation in many fungi by conferring novel function genes to the recipient fungi, inter-kingdom HGT in Bd remains largely unexplored. In this study, o...
Exon number for each horizontally transferred gene in the two strains of Bd.
Phylogenetic analyses of horizontally transferred genes in Bd derived from bacteria and oomycete. The Bayesian inference tree is shown unrooted. The Bayesian tree is virtually identical to ML and NJ trees. Numbers at nodes represent bayesian posterior probabilities (left) and bootstrap values of maximum likelihood (middle) and neighbor-joining (rig...
The code (script) and parameters used in the bioinformatic pipeline.
Piwi-interacting RNA (piRNA) is the largest class of small non-coding RNA and is usually expressed in germline cells. piRNA has a role in transposon silencing and contributes to maintaining genome integrity. In C. elegans, piRNA has a special role in the memory of previous gene expression. The discovery of piRNA in somatic cells and cancers showed the functi...
We here present BioCircos.js, an interactive and lightweight JavaScript library especially for biological data interactive visualization. BioCircos.js facilitates the development of web-based applications for circular visualization of various biological data, such as genomic features, genetic variations, gene expression and biomolecular interaction...
Female fig wasps differ phenotypically from conspecific males to the extent that often they cannot be associated with one another. Weighted gene co-expression network analysis (WGCNA) of the genome and transcriptomes of one such fig wasp, Ceratosolen solmsi, generated five expression modules, which were flagged as blue, turquoise, brown, green and...
Invertebrates can acquire functional genes via horizontal gene transfer (HGT) from bacteria but fishes are not known to do so. We provide the first reliable evidence of one HGT event from marine bacteria to fishes. The HGT appears to have occurred after emergence of the teleosts. The transferred gene is expressed and regulated developmentally. Its...
To date, biologists have discovered a large amount of valuable information from assembled genomes, but the abundant microbial data that is hidden in the raw genomic sequence data of plants and animals is usually ignored. In this study, the richness and composition of fungal community were determined in the raw genomic sequence data of Ceratosolen s...
Fig wasps exhibit extreme intraspecific morphological divergence in the wings, compound eyes, antennae, body color, and size. Correspondingly, the behaviors and lifestyles of the two sexes also differ: females can emerge from the fig and fly to other fig trees to oviposit and pollinate, while males live inside the fig for their entire lifetime. Genet...
DNA N(6)-methyladenine (6mA) modification is commonly found in microbial genomes and plays important functions in regulating numerous biological processes in bacteria. However, whether 6mA occurs and what its potential roles are in higher-eukaryote cells remain unknown. Here, we show that 6mA is present in Drosophila genome and that the 6mA modific...
PIWI-interacting RNAs (piRNAs) are endogenous small RNAs (sRNAs), which play roles in resisting exogenous gene invasion and transposon mobility. More than 16 000 piRNAs are found in Caenorhabditis elegans, and the piRNA loci share a conserved upstream sequence. New piRNAs can be predicted based on this conserved upstream sequence. C. elegans are synch...
Background
In metazoans, Piwi-related Argonaute proteins play important roles in maintaining germline integrity and fertility and have been linked to a class of germline-enriched small RNAs termed piRNAs. Caenorhabditis elegans encodes two Piwi family proteins called PRG-1 and PRG-2, and PRG-1 interacts with the C. elegans piRNAs (21U-RNAs). Previo...
Fig pollinating wasps form obligate symbioses with their fig hosts. This mutualism arose approximately 75 million years ago. Unlike many other intimate symbioses, which involve vertical transmission of symbionts to host offspring, female fig wasps fly great distances to transfer horizontally between hosts. In contrast, male wasps are wingless and c...
Many studies have reported horizontal gene transfer (HGT) events from eukaryotes, especially fungi. However, only a few investigations summarized multiple interkingdom HGTs involving important phytopathogenic species of Pyrenophora and few have investigated the genetic contributions of HGTs to fungi. We investigated HGT events in P. teres and P. tr...
The values of four codon bias indices (CAI, CBI, Fop and ENC) for HGT genes, P. tritici-repentis and top-hit species in non-fungal groups. (A) CAI value. (B) CBI value. (C) Fop value. (D) ENC value. The CAI, CBI, Fop and ENC values of P. tritici-repentis are the mean values of all the CDS. Gene 1–14 refers to the genes coding leucine rich repeat protein, meth...
List of 93 fungi used in the analyses. Species shown in red font associate with plants in their lifestyles.
(XLS)
The phylogenetic trees of 16 types HGT genes in Pyrenophora species. Bayesian trees are shown; the ML trees and NJ trees exhibited substantially the same topologies. Nodal support values ≥50 shown (BI/ML/NJ). Asterisks (*) indicate support values <50. Pyrenophora sequences are indicated in red, while HGT gene sequences from other fungi are indicate...
GC and GC3s content of horizontally transferred genes in P. tritici-repentis and top-hit species in non-fungal groups. GC3s and GC content of P. tritici-repentis are the mean values of the complete coding sequence (CDS). Gene 1–14 refers to the genes coding leucine-rich repeat protein, methyltransferase MppJ, beta-galactosidase, UDP-glucosyltransfe...
Eukaryotic horizontal gene transfer (HGT) events are increasingly being discovered yet few reports have summarized multiple occurrences in a wide range of species. We systematically investigated HGT events in the order Lepidoptera by employing a series of filters. Bombyx mori, Danaus plexippus and Heliconius melpomene had 13, 12 and 12 HGTs, respec...
Horizontal gene transfer (HGT) is one of the major mechanisms contributing to microbial genome diversification. A number of computational methods for finding horizontally transferred genes have been proposed in the past decades; however none of them has provided a reliable detector yet. In existing parametric approaches, only one single composition...
The human body harbors numerous microbes, and there exists a close relationship between microbes and human health. The Human Microbiome Project has generated whole genome sequences of several hundred human microbes. In this study, we identified horizontal gene transfer (HGT) events in human microbes and tried to elucidate the relationships between t...
Upwards of 1200 miRNA loci have hitherto been annotated in the human genome. The specific features defining a miRNA precursor and deciding its recognition and subsequent processing are not yet exhaustively described and miRNA loci can thus not be computationally identified with sufficient confidence.
We rendered pre-miRNA and non-pre-miRNA hairpins...
Specific combinations of nucleotide and structural information. A. Frequency of co-occurring nucleotide and structural notations. B. Three significantly enriched “neighbouring” nucleotide-structure notations among the pre-miRNA ss-motifs.
(TIF)
Supplementary methods and results.
(DOC)
SVM pre-miRNA prediction with increasing number (N) of features.
(DOC)
Three miRNA families used for ss-motif similarity analysis.
(DOC)
SVM pre-miRNA prediction after adjusting for sequence similarity. The table shows prediction accuracy (ACC) after filtering out pre-miRNAs with sequence identity >70% or 80%, respectively. N denotes the number of features (i.e., ss-motifs) included in the SVM.
(DOC)
Normalisation of ss-motif positions in a pre-miRNA sequence. x1–x4 indicate ss-motif positions. Red sections indicate the positions of the mature miRNA/miRNA* sequences.
(TIF)
Distribution of nucleotide notations. Light hues (pink, light green) indicate the positive and negative randomly selected sequences (RSS). Darker hues (red, green) indicate the actual ss-motifs derived from the positive (pre-miRNA) and negative (CDS hairpin) training sets.
(TIF)
Statistical evaluation of the ss-motif information content.
(XLS)
Flow chart for the process of ncRNA identification in human fetal brain. Pipeline of is-ncRNA identification and confirmation in human fetal brain, as indicated in the figure.
(TIF)
Clustered expression profiles of is-ncRNAs during human fetal brain development. Expression patterns of 326 clustered ncRNAs (the figure includes both novel and known ncRNAs) during human fetal brain development.
(TIF)
Northern blot analysis of 58 ncRNAs in human fetal brain. As indicated, 58 is-ncRNAs identified in human fetal brain were confirmed by Northern blot analysis. Most show a single band within the expected size range. In the cases with multiple bands, at least one band lies within the expected size range.
(TIF)
RT-PCR analysis of 31 ncRNAs in human fetal brain. RT-PCR products of 31 ncRNAs on PAGE gels, as indicated. RT+ indicates the reaction with reverse transcriptase and RT- indicates omission of reverse transcriptase from the reaction, to exclude possible contamination by genomic DNA. All the RT-PCR products are within the expected size range.
(TIF)
Clustered expression profiles of is-ncRNAs in tumor cell lines. Expression patterns of 326 clustered ncRNAs (the figure includes both novel and known ncRNAs) in glioma cell line U251 and neuroblastoma cell line SH-SY5Y, as compared with normal brain tissue.
(TIF)
Genomic location of the novel ncRNA genes. The length and chromosome location of all the novel ncRNAs are presented in the table.
(DOC)
Predicted snoRNAs or scaRNAs of novel ncRNAs. Fourteen ncRNAs with clear snoRNA or scaRNA characteristics were identified. As indicated, four ncRNAs were identified as C/D box snoRNAs, nine as H/ACA box snoRNAs, and one transcript (nc089) which showed both C/D box and H/ACA box characteristics is a likely scaRNA candidate.
(DOC)
Novel ncRNAs involved in axon guidance pathway. Four ncRNAs, their host genes and annotations are shown in the table.
(DOC)
Oligos used and all novel ncRNA sequences.
(DOC)
BlastN sequence alignments of novel is-ncRNAs in primate genomes. (A) Alignments of all novel is-ncRNAs in primate genomes. (B) Alignments of ‘Primate specific’ novel is-ncRNAs in primate genomes.
(TIF)
Clustered expression profiles of is-ncRNAs in different tissues. Expression patterns of 326 clustered ncRNAs (the figure includes both novel and known ncRNAs) in human fetal brain, liver, spleen, lung and heart tissues.
(TIF)
Distribution of sequenced clones. Distribution of sequenced library clones on different RNA species and categories. Sequenced clone numbers and percentage of novel and known ncRNAs are indicated in the table.
(DOC)
The involvement of noncoding RNAs (ncRNAs) in the development of the human brain remains largely unknown. Applying a cloning strategy for detection of intermediate size (50-500 nt) ncRNAs (is-ncRNAs) we have identified 82 novel transcripts in human fetal brain tissue. Most of the novel is-ncRNAs are not well conserved in vertebrates, and several tr...