ABSTRACT: Two distinct myoglobin (mb) transcripts have been reported in common carp, Cyprinus carpio, a hypoxia-tolerant fish living in habitats with greatly fluctuating dissolved oxygen levels. Recombinant protein analysis has shown functional specialization of the two mb transcripts. In this work, analysis of mb-containing bacterial artificial chromosome (BAC) clones indicated different genomic loci for the common carp myoglobin-1 (mb-1) and myoglobin-2 (mb-2) genes. Fluorescence in situ hybridization (FISH) revealed that mb-1 and mb-2 are located on separate chromosomes. In both the mb-1- and mb-2-containing BAC clones, gene synteny was well conserved with the homologous region on zebrafish chromosome 1, supporting the view that the common carp-specific mb-2 gene originated from the recent whole genome duplication event in the cyprinid lineage. A search for transcription factor binding sites indicated that both common carp mb genes lack the specificity protein 1 (Sp1) and myocyte enhancer factor-2 (MEF2) binding sites that mediate muscle-specific and calcium-dependent expression in well-studied mb promoters. Potential hypoxia response elements (HREs) were predicted in the regulatory regions of the common carp mb genes. These characteristics of the regulatory regions are consistent with the hypoxia-inducible, non-muscle expression pattern of mb-1. In the case of mb-2, a 10 bp insertion in the binding site of upstream stimulatory factor (USF), a co-factor of hypoxia inducible factor (HIF), might explain why mb-2 does not respond to hypoxia treatment. The duplication of the common carp mb gene and the subsequent differentiation in expression pattern and protein function provide an example of adaptive evolution toward aquatic hypoxia tolerance.
ABSTRACT: Both sexual reproduction and unisexual reproduction are adaptive strategies for species survival and evolution. Unisexual animals have originated largely by hybridization, which tends to elevate their heterozygosity. However, the extent of genetic diversity resulting from hybridization and the genomic differences that determine the type of reproduction are poorly understood. In Carassius auratus, sexual diploids and unisexual triploids coexist. These two forms are similar morphologically but differ markedly in their modes of reproduction. Investigation of their genomic differences will be useful to study genome diversity and the development of reproductive mode. We generated transcriptomes for the unisexual and sexual populations. Genes were identified using homology searches and an ab initio method. Estimation of the synonymous substitution rate in the orthologous pairs indicated that the hybridization of gibel carp occurred 2.2 million years ago. Microsatellite genotyping in each individual from the gibel carp population indicated that most gibel carp genes were not tri-allelic. Molecular function and pathway comparisons suggested few gene expansions between them, except for the progesterone-mediated oocyte maturation pathway, which is enriched in gibel carp. Differential expression analysis identified highly expressed genes in gibel carp. The transcriptomes provide information on genetic diversity and genomic differences, which should assist future studies in functional genomics.
International Journal of Molecular Sciences 06/2014; 15(6):9386-406. · 2.46 Impact Factor
ABSTRACT: The guppy (Poecilia reticulata), a member of the Poeciliidae family, is one of the most popular aquarium fish. Here, we report the complete mitochondrial genome of P. reticulata. The genome is 16,570 bp in length and includes 13 protein-coding genes, 22 transfer RNA genes and 2 ribosomal RNA genes. The structure of the non-coding control region was also analyzed. Comparing the mitochondrial genome of P. reticulata with that of the related poeciliid Xiphophorus maculatus revealed high sequence similarity and an identical gene structure. The complete mitochondrial genome of the guppy will help in studying the evolution of the Poeciliidae family.
ABSTRACT: The kissing gourami (Helostoma temminkii) belongs to the labyrinth fishes (Perciformes: Anabantoidei), which exhibit a wide variety of behavioral traits. In this study, the complete mitogenome of H. temminkii was determined to be 16,740 bp in length. It contains 13 protein-coding genes, 22 transfer RNA genes and 2 ribosomal RNA genes. The sequence structure of the non-coding control region was also analyzed. Comparing the mitochondrial genome of H. temminkii with that of its close relative Colisa lalia showed a sequence similarity of 78%. The complete mitochondrial genome of H. temminkii provides a resource for phylogenetic analysis of Anabantoidei.
ABSTRACT: The complete mitochondrial genome of sterlet (Acipenser ruthenus) was determined in this study. The mitogenome is 16,790 bp in length and contains 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and 2 non-coding regions (the control region and the putative origin of the light strand replication) with a typical vertebrate mitochondrial gene arrangement. The overall base composition of the heavy strand is 30.26% for A, 29.00% for C, 16.23% for G and 24.51% for T, with a slight AT bias of 54.77%.
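The AT bias quoted above is the sum of the A and T percentages (30.26 + 24.51 = 54.77). A minimal Python sketch of that arithmetic follows; the function names `base_composition` and `at_content` are illustrative and do not come from the study, and the toy sequence is not real data.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return the percentage of A, C, G and T in a nucleotide sequence.

    Characters other than A/C/G/T (for example N) are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(composition: dict) -> float:
    """AT content is the A percentage plus the T percentage."""
    return composition["A"] + composition["T"]


# Percentages reported in the abstract for the sterlet heavy strand.
reported = {"A": 30.26, "C": 29.00, "G": 16.23, "T": 24.51}
reported_at = at_content(reported)

# Toy sequence (hypothetical) to show the calculation from raw bases.
toy_at = at_content(base_composition("AATTGCGC"))
```

With the reported percentages, `reported_at` reproduces the 54.77% figure up to floating-point rounding.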
ABSTRACT: Generation of large mate-pair libraries is necessary for de novo genome assembly, but the procedure is complex and time-consuming. Furthermore, in some complex genomes, it is hard to increase the N50 length even with large mate-pair libraries, which leads to low transcript coverage. Thus, it is necessary to develop other, simpler scaffolding approaches that at least extend transcribed fragments.
We describe L_RNA_scaffolder, a novel genome scaffolding method that uses long transcriptome reads to order, orient and combine genomic fragments into larger sequences. To demonstrate the accuracy of the method, the zebrafish genome was scaffolded. With expanded human transcriptome data, the N50 of the human genome was doubled, and L_RNA_scaffolder outperformed most scaffolding results from existing scaffolders that employ mate-pair libraries. In these two examples, the transcript coverage was almost complete, especially for long transcripts. We applied L_RNA_scaffolder to the highly polymorphic pearl oyster draft genome, and the gene model length increased significantly.
The simplicity and high throughput of RNA-seq data make this approach suitable for genome scaffolding. L_RNA_scaffolder is available at http://www.fishbrowser.org/software/L_RNA_scaffolder.
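The N50 statistic used above to judge scaffolding is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The following is a minimal Python sketch of that standard definition, not code from L_RNA_scaffolder; the fragment lengths are made-up toy values.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths): joining fragments pairwise, as a
# scaffolder would, raises the N50 without changing the total length.
before = [100, 100, 100, 100]
after = [200, 200]
n50_before = n50(before)
n50_after = n50(after)
```

In this toy case, the N50 doubles after the fragments are joined, which is the kind of change the abstract reports for the human genome.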
ABSTRACT: Pike perch (Sander canadensis) is a member of Perciformes, the largest order of Osteichthyes, and is an ecologically and economically important freshwater species distributed in the Ili River and Ergis River of Xinjiang Province, China. In this study, we sequenced the whole mitochondrial genome of pike perch and analyzed its similarity with related species. The mitochondrial genome of S. canadensis is 16,542 bp in length with 55.05% AT content and contains 13 protein-coding genes, 22 tRNA genes, 2 ribosomal RNA genes and an 892 bp non-coding region. In the control region, 6 CSBs (CSB-1, CSB-2, CSB-3, CSB-D, CSB-E and CSB-F), one potential TAS and one poly-T region were identified. In a comparison of all protein-coding genes and the whole genome sequence with 4 species of Perciformes (three species of Percidae, Perca flavescens, Percina macrolepida and Etheostoma radiosum, and one outgroup, Oreochromis sp. red tilapia), the ND3 gene had the highest mutation rate, and S. canadensis showed higher similarity to Perca flavescens than to the others. The mitochondrial genomic sequence will help in studying the conservation genetics and evolution of Percidae.
ABSTRACT: The complete mitochondrial genome of Amur sturgeon (Acipenser schrenckii) was determined in this study. The mitogenome is 16,684 bp in length and contains 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and 2 non-coding regions (the control region and the putative origin of the light strand replication) with a typical vertebrate mitochondrial gene arrangement. The overall base composition of the heavy strand is 30.07% for A, 29.36% for C, 16.44% for G and 24.13% for T, with a slight AT bias of 54.20%.
ABSTRACT: Chinese sleeper (Perccottus glenii) belongs to the largest vertebrate order, Perciformes. In this study, the complete mitochondrial genome of P. glenii was determined to be 16,487 bp in length, including 13 protein-coding genes, 22 transfer RNA genes and 2 ribosomal RNA genes. We also analysed the sequence structure of the non-coding control region. Comparing the mitochondrial genome of P. glenii with that of the related species Rhyacichthys aspro showed sequence similarity and an identical gene arrangement. The complete mitochondrial genome of Chinese sleeper provides a basis for studies of Perciformes evolution and conservation.
ABSTRACT: MicroRNAs (miRNAs) exist pervasively across viruses, plants and animals and play important roles in the post-transcriptional regulation of genes. In the common carp, miRNA targets have not been investigated. In model species, single-nucleotide polymorphisms (SNPs) have been reported to impair or enhance miRNA regulation as well as to alter miRNA biogenesis. SNPs are often associated with diseases or traits. To date, no studies into the effects of SNPs on miRNA biogenesis and regulation in the common carp have been reported.
Using homology-based prediction combined with small RNA sequencing, we have identified 113 common carp mature miRNAs, including 92 conserved miRNAs and 21 common carp-specific miRNAs. The conserved miRNAs had significantly higher expression levels than the specific miRNAs. The miRNAs were clustered into three phylogenetic groups. In total, 394 potential miRNA binding sites in 206 target mRNAs were predicted for 83 miRNAs. We identified 13 SNPs in the miRNA precursors. Among them, nine SNPs had the potential to either increase or decrease the energy of the predicted secondary structures of the precursors. Further, two SNPs in the 3' untranslated regions of target genes were predicted to either disturb or create miRNA-target interactions.
The common carp miRNAs and their target genes reported here will help further our understanding of the role of miRNAs in gene regulation. The analysis of the miRNA-related SNPs and their effects provided insights into the effects of SNPs on miRNA biogenesis and function. The resource data generated in this study will help advance the study of miRNA function and phenotype-associated miRNA identification.
ABSTRACT: Natural antisense transcripts (NATs) exist ubiquitously in mammalian genomes and play roles in the regulation of gene expression. However, both the existence of bidirectional antisense RNA regulation and the possibility of protein-coding genes that function as antisense RNAs remain speculative. Here, we found that the protein-coding gene, deoxyhypusine synthase (DHPS), as the NAT of WDR83, concordantly regulated the expression of WDR83 mRNA and protein. Conversely, WDR83 also regulated DHPS by antisense pairing in a concordant manner. WDR83 and DHPS were capable of forming an RNA duplex at overlapping 3' untranslated regions and this duplex increased their mutual stability, which was required for the bidirectional regulation. As a pair of protein-coding cis-sense/antisense transcripts, WDR83 and DHPS were upregulated simultaneously and correlated positively in gastric cancer (GC), driving GC pathophysiology by promoting cell proliferation. Furthermore, the positive relationship between WDR83 and DHPS was also observed in other cancers. The bidirectional regulatory relationship between WDR83 and DHPS not only enriches our understanding of antisense regulation, but also provides a more complete understanding of their functions in tumor development.
Cell Research 04/2012; 22(9):1374-89. · 11.98 Impact Factor
ABSTRACT: Common carp (Cyprinus carpio) is thought to have undergone one extra round of genome duplication compared to zebrafish. Transcriptome analysis has been used to study the existence and timing of genome duplication in species for which genome sequences are incomplete. Large-scale transcriptome data for the common carp genome should help reveal the timing of the additional duplication event.
We have sequenced the transcriptome of common carp using 454 pyrosequencing. After assembling the 454 contigs and the published common carp sequences together, we obtained 49,669 contigs and identified genes using homology searches and an ab initio method. We identified 4,651 orthologous pairs between common carp and zebrafish and found 129,984 paralogous pairs within the common carp. An estimation of the synonymous substitution rate in the orthologous pairs indicated that common carp and zebrafish diverged 120 million years ago (MYA). We identified one round of genome duplication in common carp and estimated that it had occurred 5.6 to 11.3 MYA. In zebrafish, no genome duplication event after speciation was observed, suggesting that, compared to zebrafish, common carp had undergone an additional genome duplication event. We annotated the common carp contigs with Gene Ontology terms and KEGG pathways. Compared with zebrafish gene annotations, we found that a set of biological processes and pathways were enriched in common carp.
The assembled contigs helped us to estimate the time of the fourth round of genome duplication in common carp. The resource that we have built as part of this study will help advance functional genomics and genome annotation studies in the future.
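Dating a divergence or duplication event from synonymous substitutions is commonly done with the molecular-clock relation T = Ks / (2r), where Ks is the number of synonymous substitutions per synonymous site between two sequences and r is the substitution rate per site per year. The abstract does not state the rate or the exact procedure used, so the Python sketch below is only an illustration of that general relation; the function name and the numeric values are hypothetical and are not taken from the study.

```python
def divergence_time_years(ks: float, rate_per_site_per_year: float) -> float:
    """Estimate the time since two sequences diverged.

    Uses T = Ks / (2 * r): both lineages accumulate substitutions
    independently, hence the factor of two.
    """
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    if rate_per_site_per_year <= 0:
        raise ValueError("the substitution rate must be positive")
    return ks / (2.0 * rate_per_site_per_year)


# Illustrative values only (assumptions, not data from the abstract).
example_ks = 0.5
example_rate = 2.5e-9  # substitutions per synonymous site per year
example_time = divergence_time_years(example_ks, example_rate)
```

The same relation applies to paralogous pairs within one genome, in which case T estimates the age of the duplication rather than of a speciation event.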
ABSTRACT: Next-generation DNA sequencing technologies generate tens of millions of sequencing reads in one run. These technologies are now widely used in biology research such as in genome-wide identification of polymorphisms, transcription factor binding sites, methylation states, and transcript expression profiles. Mapping the sequencing reads to reference genomes efficiently and effectively is one of the most critical analysis tasks. Although several tools have been developed, their performance suffers when both multiple substitutions and insertions/deletions (indels) occur together.
We report a new algorithm, Basic Oligonucleotide Alignment Tool (BOAT), that can accurately and efficiently map sequencing reads back to the reference genome. BOAT can handle several substitutions and indels simultaneously, a useful feature for identifying SNPs and other genomic structural variations in functional genomic studies. For better handling of low-quality reads, BOAT supports a "3'-end Trimming Mode" to build locally optimized alignments for sequencing reads, further improving sensitivity. BOAT calculates an E-value for each hit as a quality assessment and provides customizable post-mapping filters for further mapping quality control.
Evaluations on both real and simulated datasets suggest that BOAT is capable of mapping large volumes of short reads to reference sequences with better sensitivity and lower memory requirements than other currently existing algorithms. The source code and pre-compiled binary packages of BOAT are publicly available for download at http://boat.cbi.pku.edu.cn under the GNU General Public License (GPL). BOAT can be a useful new tool for functional genomics studies.
ABSTRACT: Natural antisense transcripts are at least partially complementary to their sense transcripts. Cis-Sense/Antisense pairs (cis-SAs) have been extensively characterized and are known to play diverse regulatory roles, whereas trans-Sense/Antisense pairs (trans-SAs) in animals are poorly studied. We identified long trans-SAs in human and nine other animals, using ESTs to increase coverage significantly over previous studies. The percentage of transcriptional units (TUs) involved in trans-SAs among all TUs was as high as 4.13%. In particular, 2896 human TUs (or 2.89% of all human TUs) were involved in 3327 trans-SAs. Sequence complementarities over multiple segments with predicted RNA hybridization indicated that some trans-SAs might have sophisticated RNA-RNA pairing patterns. One-fourth of human trans-SAs involved noncoding TUs, suggesting that many noncoding RNAs may function by a trans-acting antisense mechanism. TUs in trans-SAs were statistically significantly enriched in nucleic acid binding, ion/protein binding and transport and signal transduction functions and pathways; a significant number of human trans-SAs showed concordant or reciprocal expression patterns; and a significant number of human trans-SAs were conserved in mouse. This evidence suggests important regulatory functions of trans-SAs. In 30 cases, trans-SAs were related to cis-SAs through paralogues, suggesting a possible mechanism for the origin of trans-SAs. All trans-SAs are available at http://trans.cbi.pku.edu.cn/.
Nucleic Acids Research 09/2008; 36(15):4833-44. · 8.81 Impact Factor
ABSTRACT: Common carp is one of the most important aquaculture species in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. Common carp is a tetraploid teleost fish with an estimated haploid genome size of about 1700 Mbp and 2n = 100 chromosomes. Although molecular genetic studies of common carp had been conducted productively in China for breeding and genetic improvement purposes, genomic resources for the common carp remained relatively underdeveloped until the Common Carp Genome Project (CCGP) was initiated in December 2009. The project aims at building the necessary tools and resources, identifying commercially important genes or traits and boosting the aquaculture industry of common carp. In addition, CCGP will provide valuable genome data for the study of vertebrate genome evolution and phylogeny, as common carp may have undergone an additional fourth round of whole genome duplication. CCGP currently includes the following tasks: whole genome shotgun deep sequencing and draft assembly employing next-generation sequencing technology, BAC library construction and genome-wide BAC-end sequencing, BAC-based physical mapping, genome-wide marker development and high-density linkage mapping, FISH-based chromosome mapping, transcriptome deep sequencing and genome annotation, functional genomics, bioinformatics and database construction. To date, whole genome sequencing has been completed using multiple platforms including Roche 454, Illumina, SOLiD and the traditional Sanger method with various library insert lengths, generating 175-fold coverage of the common carp genome. The assembled genome has a scaffold N50 length of 1.14 Mb and a contig N50 length of 31.1 kb, covering 1710 Mb of the common carp genome. Transcriptome sequences have been collected from both the Roche 454 and Illumina HiSeq 2000 platforms.
Genome annotation was performed based on transcriptome homology and de novo prediction methods. A total of 31404 genes were annotated in the genome, with an average length of 1917 bp. Fingerprints from over 90,000 BACs, representing 7x genome equivalents, have been collected, and the first BAC-based physical map of common carp has been constructed, containing 3696 BAC contigs with an N50 length of 688 kb. Over 800,000 SNPs and 13,000 microsatellite loci have been identified from common carp cDNA and BAC-end sequences for high-density linkage mapping and map integration. The high-density linkage map of common carp has been constructed with approximately 6000 markers, including around 5000 SNP markers and 1000 SSR markers. The completion of CCGP will lead to better management of breeding selection, disease prevention and transgenic studies in the common carp, as well as in other Cyprinidae species.
International Plant and Animal Genome Conference XX, 2012.