ABSTRACT: With the rapid growth of genome sequencing projects, the genome browser is becoming indispensable, not only as a visualization system but also as an interactive platform to support open data access and collaborative work. A customizable genome browser framework with rich functions and flexible configuration is therefore needed to facilitate various genome research projects.
Based on next-generation web technologies, we have developed ABrowse, a general-purpose genome browser framework that provides an interactive browsing experience, open data access and collaborative work support. By supporting Google-map-like smooth navigation, ABrowse offers end users a highly interactive browsing experience. To facilitate further data analysis, multiple data access approaches are supported so that external platforms can retrieve data from ABrowse. To promote collaborative work, an online user-space is provided for end users to create, store and share comments, annotations and landmarks. For data providers, ABrowse is highly customizable and configurable. The framework provides a set of utilities to import annotation data conveniently. To build ABrowse on existing annotation databases, data providers can specify SQL statements according to the database schema. Customized pages that display detailed information on annotation entries can also be plugged in easily. For developers, new drawing strategies can be integrated into ABrowse for new types of annotation data. In addition, a standard web service is provided for remote data retrieval, offering an underlying machine-oriented programming interface for open data access.
The ABrowse framework is valuable for end users, data providers and developers because it provides rich user functions and flexible customization approaches. The source code is published under the GNU Lesser General Public License v3.0 and is accessible at http://www.abrowse.org/. To demonstrate all the features of ABrowse, a live demo for the Arabidopsis thaliana genome has been built at http://arabidopsis.cbi.edu.cn/.
ABSTRACT: The concurrent release of rice genome sequences for two subspecies (Oryza sativa L. ssp. japonica and Oryza sativa L. ssp. indica) facilitates rice studies at the whole genome level. Since the advent of high-throughput analysis, huge amounts of functional genomics data have been delivered rapidly, making an integrated online genome browser indispensable for scientists to visualize and analyze these data. Based on next-generation web technologies and high-throughput experimental data, we have developed Rice-Map, a novel genome browser for researchers to navigate, analyze and annotate the rice genome interactively.
More than one hundred annotation tracks (81 for japonica and 82 for indica) have been compiled and loaded into Rice-Map. These pre-computed annotations cover gene models, transcript evidence, expression profiling, epigenetic modifications, inter-species and intra-species homologies, genetic markers and other genomic features. In addition to these pre-computed tracks, registered users can interactively add comments and research notes to Rice-Map as User-Defined Annotation entries. By smoothly scrolling, dragging and zooming, users can browse various genomic features simultaneously at multiple scales. On-the-fly analysis of selected entries can be performed through dedicated bioinformatic analysis platforms such as WebLab and Galaxy. Furthermore, a BioMart-powered data warehouse, "Rice Mart", is offered for advanced users to fetch bulk datasets based on complex criteria.
Rice-Map delivers abundant up-to-date japonica and indica annotations, providing a valuable resource for both computational and bench biologists. Rice-Map is publicly accessible at http://www.ricemap.org/, with all data available for free downloading.
ABSTRACT: We updated the plant transcription factor (TF) database to version 2.0 (PlantTFDB 2.0, http://planttfdb.cbi.pku.edu.cn), which contains 53,319 putative TFs predicted from 49 species. These TFs were classified into 58 newly defined families by a computational approach combined with manual inspection, and annotated in detail with general information, domain features, gene ontology, expression patterns and ortholog groups, as well as cross-references to various databases and literature citations. Multiple sequence alignments and phylogenetic trees for each family can be shown as Weblogo pictures or downloaded as text files. We have redesigned the user interface in the new version. Users can search TFs with much more flexibility through the improved advanced search page, and the search results can be exported into various formats for further analysis. In addition, we now provide a web service for advanced users to access PlantTFDB 2.0 more efficiently.
ABSTRACT: Transcription factors (TFs) play an important role in gene regulation. Computational identification and annotation of TFs at genome scale are the first step toward understanding the mechanism of gene expression and regulation. We started to construct the database of Arabidopsis TFs in 2005 and developed a pipeline for systematic identification of plant TFs from genomic and transcript sequences. In the following years, we built a database of plant TFs (PlantTFDB, http://planttfdb.cbi.pku.edu.cn) which contains putative TFs identified from 22 species, including five model organisms and 17 economically important plants with available EST sequences. To provide comprehensive information for the putative TFs, we made extensive annotation at both the family and gene levels. A brief introduction and key references are presented for each family. Functional domain information and cross-references to various well-known public databases are available for each identified TF. In addition, we predicted putative orthologs of the TFs in other species. PlantTFDB has a simple interface that allows users to run text queries or BLAST searches and to download TF sequences for local analysis. We hope that PlantTFDB will provide the user community with a useful resource for studying the function and evolution of transcription factors.
ABSTRACT: Next-generation DNA sequencing technologies generate tens of millions of sequencing reads in one run. These technologies are now widely used in biological research, for example in genome-wide identification of polymorphisms, transcription factor binding sites, methylation states, and transcript expression profiles. Mapping the sequencing reads to reference genomes efficiently and effectively is one of the most critical analysis tasks. Although several tools have been developed, their performance suffers when multiple substitutions and insertions/deletions (indels) occur together.
We report a new algorithm, the Basic Oligonucleotide Alignment Tool (BOAT), that can accurately and efficiently map sequencing reads back to the reference genome. BOAT can handle several substitutions and indels simultaneously, a useful feature for identifying SNPs and other genomic structural variations in functional genomic studies. For better handling of low-quality reads, BOAT supports a "3'-end Trimming Mode" that builds locally optimized alignments for sequencing reads, further improving sensitivity. BOAT calculates an E-value for each hit as a quality assessment and provides customizable post-mapping filters for further mapping quality control.
Evaluations on both real and simulated datasets suggest that BOAT is capable of mapping large volumes of short reads to reference sequences with better sensitivity and lower memory requirements than other existing algorithms. The source code and pre-compiled binary packages of BOAT are publicly available for download at http://boat.cbi.pku.edu.cn under the GNU General Public License (GPL). BOAT can be a useful new tool for functional genomics studies.
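To make concrete what mapping a read with both substitutions and indels involves, the following is a minimal sketch of textbook semi-global alignment by dynamic programming. It is not BOAT's algorithm, which the abstract does not describe; the function name `best_hit` and the toy sequences are hypothetical, and the quadratic running time would be impractical for real read volumes.

```python
def best_hit(read: str, reference: str) -> tuple[int, int]:
    """Return (edit distance, end position in reference) of the best match of `read`.

    Illustrative only: a textbook semi-global alignment, not BOAT's method.
    """
    # prev[j] = cheapest cost of aligning the read prefix seen so far so that
    # the alignment ends at reference[:j]. Row 0 is all zeros, so the read
    # may start anywhere in the reference at no cost.
    prev = [0] * (len(reference) + 1)
    for i, r in enumerate(read, start=1):
        # Aligning i read bases against an empty reference prefix costs i.
        curr = [i] + [0] * len(reference)
        for j, g in enumerate(reference, start=1):
            curr[j] = min(
                prev[j - 1] + (r != g),  # match (0) or substitution (1)
                prev[j] + 1,             # extra base in the read (insertion)
                curr[j - 1] + 1,         # base missing from the read (deletion)
            )
        prev = curr
    # The read may also end anywhere, so take the minimum over the last row.
    distance = min(prev)
    return distance, prev.index(distance)


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCA"  # toy sequence, invented for illustration
    print(best_hit("CGTTAGC", reference))  # exact match: (0, 12)
    print(best_hit("CGTAGC", reference))   # read with one base deleted
    print(best_hit("CGTAAGC", reference))  # read with one substitution
```

Production mappers avoid this full table by first locating candidate positions with an index and only then aligning, which is where the speed and memory trade-offs mentioned in the abstract arise.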
ABSTRACT: With the rapid progress of biological research, there is great demand for integrative knowledge-sharing systems that efficiently support collaboration among biological researchers from various fields. To fulfill this requirement, we have developed WebLab, a data-centric knowledge-sharing platform for biologists to fetch, analyze, manipulate and share data through an intuitive web interface. Dedicated space is provided for users to store their input data and analysis results. Users can upload local data or fetch public data from remote databases, and then perform analysis using more than 260 integrated bioinformatic tools. These tools can be further organized into customized analysis workflows to accomplish complex tasks automatically. In addition to conventional biological data, WebLab also provides rich support for scientific literature, such as searching the full text of uploaded literature and exporting citations to well-known citation managers such as EndNote and BibTeX. To facilitate teamwork among colleagues, WebLab provides a powerful and flexible sharing mechanism, which allows users to share input data, analysis results, scientific literature and customized workflows with specified users or groups under sophisticated privilege settings. WebLab is publicly available at http://weblab.cbi.pku.edu.cn, with all source code released as Free Software.
Nucleic Acids Research 06/2009; 37(Web Server issue):W33-9. DOI:10.1093/nar/gkp428 · 9.11 Impact Factor
ABSTRACT: Genome-wide duplication is ubiquitous during diversification of the angiosperms, and gene duplication is one of the most important mechanisms for evolutionary novelties. As an indicator of functional evolution, the divergence of expression patterns following duplication events has drawn great attention in recent years. Using large-scale whole-genome microarray data, we systematically analyzed expression divergence patterns of rice genes from block, tandem and dispersed duplications.
We found a significant difference in expression divergence patterns among the three types of duplicated gene pairs. Expression correlation is significantly higher for gene pairs from block and tandem duplications than for those from dispersed duplications. Furthermore, a significant correlation was observed between expression divergence and the synonymous substitution rate, which is an approximate proxy for divergence time. Thus, both duplication type and divergence time influence the difference in expression divergence. Using a linear model, we investigated the influence of these two variables and found that the difference in expression divergence between block and dispersed duplicates is attributable largely to their different divergence times. In addition, the difference in expression divergence between tandem duplicates and the other two types is attributable to both divergence time and duplication type.
Consistent with previous studies on Arabidopsis, our results revealed a significant difference in expression divergence between the types of duplicated genes and a significant correlation between expression divergence and synonymous substitution rate. The contribution of duplication mode to expression divergence implies different evolutionary courses for duplicated genes.
ABSTRACT: We performed genome-wide analyses to explore the evolutionary process of the SBP-box gene family. We identified 120 SBP-box genes from nine species representing the main green plant lineages: green alga, moss, lycophyte, gymnosperm and angiosperm. A maximum-likelihood phylogenetic tree was constructed using the protein sequences of the DNA-binding domain of SBP-box genes (SBP-domain). Our results revealed that all SBP-box genes of green alga clustered into a single clade (CR group), while all genes from land plants fell into two distinct groups. Group I had a single copy in each species except for poplar, while group II had several members in each species and could be divided into several subgroups. The SBP-domain encoded by all SBP-box genes possesses two zinc fingers. The C-terminal zinc finger of both group I and group II had the same C2HC motif, while their N-terminal zinc fingers showed different signatures: C4 in group I and C3H in group II. The patterns of exon-intron structure in Arabidopsis and rice SBP-box genes were consistent with the phylogenetic results. A target site of microRNA miR156 was highly conserved among land-plant SBP-box genes. Our results suggest that the SBP-box gene family might have originated from a common ancestor of green plants, followed by duplication and divergence in each lineage, including exon-intron loss processes.
ABSTRACT: As a hepatitis B virus (HBV) envelope domain, preS plays significant roles in receptor recognition and viral infection. However, the regions critical for maintaining a stable and functional conformation of preS are still unclear and require further investigation. To identify these regions, serially truncated fragments of preS were constructed and expressed in Escherichia coli. Their solubility, stability, secondary structure, and affinity to polyclonal antibodies and hepatocytes were examined. The results showed that amino acids 31-36 were vital for a stable conformation, and that the absence of amino acids 10-36 significantly reduced binding to polyclonal antibodies as well as hepatocytes. The most stable fragment, 1-120 (preS1 + N-terminal 12 amino acids of preS2), which is perhaps the core of preS, bound to HepG2 cells most tightly. Moreover, the availability of large amounts of well-folded and stable preS1-120 enables us to carry out further structural determination and mechanistic study of HBV infection.
ABSTRACT: Transcription factors (TFs) play key roles in controlling gene expression. Systematic identification and annotation of TFs, followed by construction of TF databases, may provide useful resources for studying the function and evolution of transcription factors. We developed a comprehensive plant transcription factor database, PlantTFDB (http://planttfdb.cbi.pku.edu.cn), which contains 26,402 TFs predicted from 22 species, including five model organisms with available whole genome sequences and 17 plants with available EST sequences. To provide comprehensive information for these putative TFs, we made extensive annotation at both the family and gene levels. A brief introduction and key references are presented for each family. Functional domain information and cross-references to various well-known public databases are available for each identified TF. In addition, we predicted putative orthologs of these TFs among the 22 species. PlantTFDB has a simple interface that allows users to search the database by ID or free text, to run sequence similarity searches against TFs of all or individual species, and to download TF sequences for local analysis.
ABSTRACT: Hypoxanthine-guanine phosphoribosyltransferase (HGPRT) is a potential target for structure-based inhibitor design for the treatment of parasitic diseases. We created point mutants of Thermoanaerobacter tengcongensis HGPRT and tested their activities to identify side chains that were important for function. Mutating residues Leu160 and Lys133 substantially diminished the activity of HGPRT, confirming their importance in catalysis. All 11 HGPRT mutants were subjected to crystallization screening. The crystal structure of one mutant, L160I, was determined at 1.7 Å resolution. Surprisingly, the active site is occupied by a peptide from the N-terminus of a neighboring tetramer. These crystal contacts suggest an alternate strategy for structure-based inhibitor design.
ABSTRACT: NAD(P) has long been known as an essential energy-carrying molecule in cells. Recent data, however, indicate that NAD(P) also plays critical signaling roles in regulating cellular functions. The crystal structure of a human protein, HSCARG, whose functions were previously unknown, has been determined to 2.4-Å resolution. The structure reveals that HSCARG can form an asymmetrical dimer with one subunit occupied by one NADP molecule and the other empty. Restructuring of its NAD(P)-binding Rossmann fold upon NADP binding changes an extended loop to an alpha-helix to restore the integrity of the Rossmann fold. The previously unobserved restructuring suggests that HSCARG may assume a resting state when the level of NADP(H) is normal within the cell. When the NADP(H) level passes a threshold, an extensive restructuring of HSCARG would result in the activation of its regulatory functions. Immunofluorescent imaging shows that HSCARG redistributes from being associated with intermediate filaments in the resting state to being dispersed in the nucleus and the cytoplasm. The structural change of HSCARG upon NADP(H) binding could be a new regulatory mechanism that responds only to a significant change of NADP(H) levels. One of the functions regulated by HSCARG may be that of argininosuccinate synthetase, which is involved in NO synthesis.
Proceedings of the National Academy of Sciences 06/2007; 104(21):8809-14. DOI:10.1073/pnas.0700480104 · 9.67 Impact Factor
ABSTRACT: The use of oseltamivir, widely stockpiled as one of the drugs for use in a possible avian influenza pandemic, has been reported to be associated with neuropsychiatric disorders and severe skin reactions, primarily in Japan. Here we identified a nonsynonymous SNP (single nucleotide polymorphism) in the dbSNP database, R41Q, near the enzymatic active site of human cytosolic sialidase, a homologue of the viral neuraminidase that is the target of oseltamivir. This SNP occurred in 9.29% of the Asian population and in none of the European and African American populations. Our structural analyses and Ki measurements using in vitro sialidase assays indicated that this SNP could increase the unintended binding affinity of human sialidase to oseltamivir carboxylate, the active form of oseltamivir, thus reducing sialidase activity. In addition, this SNP itself results in an enzyme with an intrinsically lower sialidase activity, as shown by its increased Km and decreased Vmax values. Theoretically, administration of oseltamivir to people with this SNP might further reduce their sialidase activity. We note the similarity between the reported neuropsychiatric side effects of oseltamivir and the known symptoms of human sialidase-related disorders. We propose that this Asian-enriched sialidase variation caused by the SNP, likely in homozygous form, may be associated with certain severe adverse reactions to oseltamivir.
Cell Research 04/2007; 17(4):357-62. DOI:10.1038/cr.2007.27 · 12.41 Impact Factor
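The kinetic reasoning in the oseltamivir abstract can be written out with the standard Michaelis-Menten rate law. The second equation assumes competitive inhibition, which is an assumption for illustration only, since the abstract does not state the inhibition mode. Here $[S]$ is the substrate concentration and $[I]$ the concentration of oseltamivir carboxylate.

```latex
% Uninhibited rate: a larger K_m or a smaller V_max lowers v at any fixed [S].
v = \frac{V_{\max}\,[S]}{K_m + [S]}

% Assumed competitive inhibition: a smaller K_i (tighter binding of the
% inhibitor) raises the apparent K_m and therefore lowers v further.
v = \frac{V_{\max}\,[S]}{K_m\left(1 + \dfrac{[I]}{K_i}\right) + [S]}
```

Under these expressions, the two effects described in the abstract act in the same direction: the variant's increased Km and decreased Vmax reduce the uninhibited rate, and a lower Ki would reduce it further when the drug is present.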
ABSTRACT: Hepatitis B virus (HBV) infection is a serious health problem worldwide. Treatment recommendation and response are mainly indicated by viral load, e antigen (HBeAg) seroconversion, and ALT levels. S antigen (HBsAg) seroconversion is much less frequent. Since HBeAg can be negative in the presence of high viral replication, preS antigen (HBpreSAg) might be a useful indicator in the management of chronic HBV infection.
A new double-antibody sandwich ELISA was established to detect preS antigens. Sera of 104 HBeAg-negative and 50 HBeAg-positive chronic hepatitis B patients were studied, and 23 HBeAg-positive patients were enrolled in a treatment follow-up study. 70% of the HBeAg-positive patients and 47% of the HBeAg-negative patients were HBpreSAg positive. In particular, among the HBeAg-negative patients, 30 out of 47 HBpreSAg-positive patients showed no evidence of viral replication based on HBV DNA copies. A comparison with HBV DNA copies demonstrated that the overall accuracy of the HBpreSAg test could reach 72% for active HBV replication. HBpreSAg changes were well correlated with changes in HBsAg, HBV DNA and ALT levels during the course of IFN-alpha treatment and follow-up. HBeAg-positive patients responded well to treatment when the reduction of HBpreSAg levels was more pronounced.
Our results suggest that HBpreSAg can be detected effectively and correlates well with HBsAg and HBV DNA copies. The reduction of HBpreSAg levels in conjunction with HBV DNA copies appears to be an improved predictor of treatment outcome.
ABSTRACT: DRTF contains 2025 putative transcription factors (TFs) in Oryza sativa L. ssp. indica and 2384 in ssp. japonica, distributed in 63 families, identified by computational prediction and manual curation. It includes detailed annotations of each TF including sequence features, functional domains, Gene Ontology assignment, chromosomal localization, EST and microarray expression information, as well as multiple sequence alignment of the DNA-binding domains for each TF family. The database can be browsed and searched with a user-friendly web interface. AVAILABILITY: DRTF is available at http://drtf.cbi.pku.edu.cn
ABSTRACT: CARP is a novel pro-apoptotic protein that was cloned and characterized in our previous report. Previous studies showed that suppression of CARP expression results in cell proliferation in several mammalian cell lines and that over-expression of CARP leads to apoptosis and inhibition of proliferation in seven tumor cell lines [Liu et al., CARP is a novel caspase recruitment domain containing pro-apoptotic protein, Biochem. Biophys. Res. Commun. 293 (2002) 1396]. To obtain a soluble and active form of CARP protein for further functional and structural studies, we expressed CARP in Escherichia coli using the Gateway cloning system. Optimal induction and expression conditions were also studied. Recombinant histidine-tagged CARP was expressed in E. coli when the carp gene was subcloned into the Gateway expression vector pET21-DEST. The partially soluble recombinant CARP protein was purified to near homogeneity by a two-step FPLC procedure, Ni2+ affinity chromatography followed by gel-filtration chromatography, which yielded about 10 mg protein/L culture with at least 95% purity. Two peaks were detected in the analytical gel-filtration chromatogram, while only one peak, corresponding to the monomer of the CARP protein, remained after adding 2 mM dithiothreitol (DTT). The polymers observed are likely due to the formation of intermolecular disulfide bridges. These results suggest that adding DTT is a good way to prevent the formation of disulfide bonds and to stabilize the protein. The successful growth of crystals from the purified CARP protein also showed that well-folded CARP protein can be produced in E. coli.
Protein Expression and Purification 03/2006; 45(2):329-34. DOI:10.1016/j.pep.2005.07.011 · 1.70 Impact Factor
ABSTRACT: Human HSCARG has been annotated as a possible cancer-related protein. Amino acid homology, although at a low percentage, suggested that HSCARG contains an NmrA domain and might be a member of the short-chain dehydrogenase/reductase superfamily. To investigate its structure and function, the HSCARG gene was successfully expressed in E. coli and the protein was purified. HSCARG was crystallized and diffracted to a resolution of 2.4 Å on a Mar225 CCD detector at the SER-CAT 22BM synchrotron source. The crystals belong to space group F23, with unit cell parameters a = b = c = 223.30 Å, α = β = γ = 90°. There are two molecules per asymmetric unit.
Protein and Peptide Letters 02/2006; 13(9):955-7. DOI:10.2174/092986606778256135 · 1.07 Impact Factor
ABSTRACT: Human secreted proteins play a very important role in signal transduction. To study all potential secreted proteins identified from the human genome sequence, systematic production of large amounts of biologically active secreted proteins is a prerequisite. We selected 25 novel genes as a trial case for establishing a reliable expression system to produce active human secreted proteins in Escherichia coli. Expression of proteins with or without signal peptides was examined and compared in E. coli strains. The results indicated that deletion of signal peptides can, to a certain extent, improve the expression of these proteins and their solubility. More importantly, expression conditions such as induction temperature and N-terminal fusion peptides need to be optimized in order to express adequate amounts of soluble proteins. These recombinant proteins were characterized as well-folded proteins. This system enables us to rapidly obtain soluble and highly purified human secreted proteins for further functional studies.
Biochemical and Biophysical Research Communications 08/2005; 332(2):593-601. DOI:10.1016/j.bbrc.2005.04.163 · 2.30 Impact Factor
ABSTRACT: Double-barreled (DB) data have been widely used for the assembly of large genomes. Based on the experience of building the whole-genome working draft of Oryza sativa L. ssp. indica, we present here the prevailing and improved uses of DB data in the assembly procedure and report on novel applications in the subsequent data-mining processes, such as acquiring precise insert fragment information for each clone across the genome and a new kind of low-cost whole-genome microarray. With the increasing number of organisms being sequenced, we believe that DB data will play an important role both in other assembly procedures and in future genomic studies.
Science in China Series C Life Sciences 07/2005; 48(3):300-6. DOI:10.1007/BF03183625 · 1.61 Impact Factor