Yujun Han

University of Georgia, Athens, GA, USA

Publications (23) · 193.31 Total impact

  • Article: Comparison of class 2 transposable elements at superfamily resolution reveals conserved and distinct features in cereal grass genomes.
    ABSTRACT: BACKGROUND: Class 2 transposable elements (TEs) are the predominant elements in and around plant genes where they generate significant allelic diversity. Using the complete sequences of four grasses, we have performed a novel comparative analysis of class 2 TEs. To ensure consistent comparative analyses, we re-annotated class 2 TEs in Brachypodium distachyon, Oryza sativa (rice), Sorghum bicolor and Zea mays and assigned them to one of the five cut-and-paste superfamilies found in plant genomes (Tc1/mariner, PIF/Harbinger, hAT, Mutator, CACTA). We have focused on noncoding elements because of their abundance, and compared superfamily copy number, size and genomic distribution as well as correlation with the level of nearby gene expression. RESULTS: Our comparison revealed both unique and conserved features. First, the average length or size distribution of elements in each superfamily is largely conserved, with the shortest always being Tc1/mariner elements, followed by PIF/Harbinger, hAT, Mutator and CACTA. This order also holds for the ratio of the copy numbers of noncoding to coding elements. Second, with the exception of CACTAs, noncoding TEs are enriched within and flanking genes, where they display conserved distribution patterns, having the highest peak in the promoter region. Finally, our analysis of microarray data revealed that genes associated with Tc1/mariner and PIF/Harbinger noncoding elements have significantly higher expression levels than genes without class 2 TEs. In contrast, genes with CACTA elements have significantly lower expression than genes without class 2 TEs. CONCLUSIONS: We have achieved the most comprehensive annotation of class 2 TEs to date in these four grass genomes. Comparative analysis of this robust dataset led to the identification of several previously unknown features of each superfamily related to copy number, element size, genomic distribution and correlation with the expression levels of nearby genes. 
    These results highlight the importance of distinguishing TE superfamilies when assessing their impact on gene and genome evolution.
    BMC Genomics 01/2013; 14(1):71. · 4.07 Impact Factor
  • Article: A complete sequence and comparative analysis of a SARS-associated virus (Isolate BJ01)
    ABSTRACT: The genome sequence of the Severe Acute Respiratory Syndrome (SARS)-associated virus provides essential information for the identification of pathogen(s), exploration of etiology and evolution, interpretation of transmission and pathogenesis, development of diagnostics, prevention by future vaccination, and treatment by developing new drugs. We report the complete genome sequence and comparative analysis of an isolate (BJ01) of the coronavirus that has been recognized as a pathogen for SARS. The genome is 29725 nt in size and has 11 ORFs (Open Reading Frames). It is composed of a stable region encoding an RNA-dependent RNA polymerase (composed of 2 ORFs) and a variable region representing 4 CDSs (coding sequences) for viral structural genes (the S, E, M, N proteins) and 5 PUPs (putative uncharacterized proteins). Its gene order is identical to that of other known coronaviruses. The sequence alignment with all known RNA viruses places this virus as a member in the family of Coronaviridae. Thirty putative substitutions have been identified by comparative analysis of the 5 SARS-associated virus genome sequences in GenBank. Fifteen of them lead to possible amino acid changes (non-synonymous mutations) in the proteins. Three amino acid changes, with predicted alteration of physical and chemical features, have been detected in the S protein that is postulated to be involved in the immunoreactions between the virus and its host. Two amino acid changes have been detected in the M protein, which could be related to viral envelope formation. Phylogenetic analysis suggests the possibility of non-human origin of the SARS-associated viruses but provides no evidence that they are man-made. Further efforts should focus on identifying the etiology of the SARS-associated virus and ruling out conclusively the existence of other possible SARS-related pathogen(s).
    Keywords: Severe Acute Respiratory Syndrome (SARS), coronavirus, genome, phylogeny
    Chinese Science Bulletin 04/2012; 48(10):941-948. · 1.32 Impact Factor
  • Article: MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences.
    Yujun Han, Susan R Wessler
    ABSTRACT: Miniature inverted-repeat transposable elements (MITEs) are a special type of Class 2 non-autonomous transposable element (TE) that are abundant in the non-coding regions of the genes of many plant and animal species. The accurate identification of MITEs has been a challenge for existing programs because they lack coding sequences and, as such, evolve very rapidly. Because of their importance to gene and genome evolution, we developed MITE-Hunter, a program pipeline that can identify MITEs as well as other small Class 2 non-autonomous TEs from genomic DNA data sets. The output of MITE-Hunter is composed of consensus TE sequences grouped into families that can be used as a library file for homology-based TE detection programs such as RepeatMasker. MITE-Hunter was evaluated by searching the rice genomic database and comparing the output with known rice TEs. It discovered most of the previously reported rice MITEs (97.6%), and found sixteen new elements. MITE-Hunter was also compared with two other MITE discovery programs, FINDMITE and MUST. Unlike MITE-Hunter, neither of these programs can search large genomic data sets including whole genome sequences. More importantly, MITE-Hunter is significantly more accurate than either FINDMITE or MUST as the vast majority of their outputs are false-positives.
    Nucleic Acids Research 09/2010; 38(22):e199. · 8.03 Impact Factor
  • Article: The B73 Maize Genome: Complexity, Diversity, and Dynamics
    ABSTRACT: We report an improved draft nucleotide sequence of the 2.3-gigabase genome of maize, an important crop plant and model for biological research. Over 32,000 genes were predicted, of which 99.8% were placed on reference chromosomes. Nearly 85% of the genome is composed of hundreds of families of transposable elements, dispersed nonuniformly across the genome. These were responsible for the capture and amplification of numerous gene fragments and affect the composition, sizes, and positions of centromeres. We also report on the correlation of methylation-poor regions with Mu transposon insertions and recombination, and copy number variants with insertions and/or deletions, as well as how uneven gene losses between duplicated regions were involved in returning an ancient allotetraploid to a genetically diploid state. These analyses inform and set the stage for further investigations to improve our understanding of the domestication and agricultural improvements of maize.
    Science 11/2009; 326(5956):1112-1115. · 31.20 Impact Factor
  • Article: Detailed analysis of a contiguous 22-Mb region of the maize genome.
    ABSTRACT: Most of our understanding of plant genome structure and evolution has come from the careful annotation of small (e.g., 100 kb) sequenced genomic regions or from automated annotation of complete genome sequences. Here, we sequenced and carefully annotated a contiguous 22 Mb region of maize chromosome 4 using an improved pseudomolecule for annotation. The sequence segment was comprehensively ordered, oriented, and confirmed using the maize optical map. Nearly 84% of the sequence is composed of transposable elements (TEs) that are mostly nested within each other, of which most families are low-copy. We identified 544 gene models using multiple levels of evidence, as well as five miRNA genes. Gene fragments, many captured by TEs, are prevalent within this region. Elimination of gene redundancy from a tetraploid maize ancestor that originated a few million years ago is responsible in this region for most disruptions of synteny with sorghum and rice. Consistent with other sub-genomic analyses in maize, small RNA mapping showed that many small RNAs match TEs and that most TEs match small RNAs. These results, performed on approximately 1% of the maize genome, demonstrate the feasibility of refining the B73 RefGen_v1 genome assembly by incorporating optical map, high-resolution genetic map, and comparative genomic data sets. Such improvements, along with those of gene and repeat annotation, will serve to promote future functional genomic and phylogenomic research in maize and other grasses.
    PLoS Genetics 11/2009; 5(11):e1000728. · 8.69 Impact Factor
  • Article: TARGeT: a web-based pipeline for retrieving and characterizing gene and transposable element families from genomic sequences.
    ABSTRACT: Gene families compose a large proportion of eukaryotic genomes. The rapidly expanding genomic sequence database provides a good opportunity to study gene family evolution and function. However, most gene family identification programs are restricted to searching protein databases where data are often lagging behind the genomic sequence data. Here, we report a user-friendly web-based pipeline, named TARGeT (Tree Analysis of Related Genes and Transposons), which uses either a DNA or amino acid 'seed' query to: (i) automatically identify and retrieve gene family homologs from a genomic database, (ii) characterize gene structure and (iii) perform phylogenetic analysis. Due to its high speed, TARGeT is also able to characterize very large gene families, including transposable elements (TEs). We evaluated TARGeT using well-annotated datasets, including the ascorbate peroxidase gene family of rice, maize and sorghum and several TE families in rice. In all cases, TARGeT rapidly recapitulated the known homologs and predicted new ones. We also demonstrated that TARGeT outperforms similar pipelines and has functionality that is not offered elsewhere.
    Nucleic Acids Research 06/2009; 37(11):e78. · 8.03 Impact Factor
  • Article: ReAS: Recovery of ancestral sequences for transposable elements from the unassembled reads of a whole genome shotgun.
    ABSTRACT: We describe an algorithm, ReAS, to recover ancestral sequences for transposable elements (TEs) from the unassembled reads of a whole genome shotgun. The main assumptions are that these TEs must exist at high copy numbers across the genome and must not be so old that they are no longer recognizable in comparison to their ancestral sequences. Tested on the japonica rice genome, ReAS was able to reconstruct all of the high copy sequences in the Repbase repository of known TEs, and increase the effectiveness of RepeatMasker in identifying TEs from genome sequences.
    PLoS Computational Biology 10/2005; 1(4):e43. · 5.22 Impact Factor
  • Article: Applications of the double-barreled data in whole-genome shotgun sequence assembly and analysis.
    ABSTRACT: Double-barreled (DB) data have been widely used for the assembly of large genomes. Based on the experience of building the whole-genome working draft of Oryza sativa L. ssp. Indica, we present here the prevailing and improved uses of DB data in the assembly procedure and report on novel applications during the following data-mining processes such as acquiring precise insert fragment information of each clone across the genome, and a new kind of low-cost whole-genome microarray. With the increasing number of organisms being sequenced, we believe that DB data will play an important role both in other assembly procedures and in future genomic studies.
    Science in China Series C Life Sciences 07/2005; 48(3):300-6. · 1.61 Impact Factor
  • Article: The Genomes of Oryza sativa: a history of duplications.
    ABSTRACT: We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000-40,000. Only 2%-3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.
    PLoS Biology 03/2005; 3(2):e38. · 11.45 Impact Factor
  • Article: Complete genome sequence of Yersinia pestis strain 91001, an isolate avirulent to humans.
    ABSTRACT: Genomics provides an unprecedented opportunity to probe in minute detail into the genomes of the world's most deadly pathogenic bacteria- Yersinia pestis. Here we report the complete genome sequence of Y. pestis strain 91001, a human-avirulent strain isolated from the rodent Brandt's vole-Microtus brandti. The genome of strain 91001 consists of one chromosome and four plasmids (pPCP1, pCD1, pMT1 and pCRY). The 9609-bp pPCP1 plasmid of strain 91001 is almost identical to the counterparts from reference strains (CO92 and KIM). There are 98 genes in the 70,159-bp range of plasmid pCD1. The 106,642-bp plasmid pMT1 has slightly different architecture compared with the reference ones. pCRY is a novel plasmid discovered in this work. It is 21,742 bp long and harbors a cryptic type IV secretory system. The chromosome of 91001 is 4,595,065 bp in length. Among the 4037 predicted genes, 141 are possible pseudo-genes. Due to the rearrangements mediated by insertion elements, the structure of the 91001 chromosome shows dramatic differences compared with CO92 and KIM. Based on the analysis of plasmids and chromosome architectures, pseudogene distribution, nitrate reduction negative mechanism and gene comparison, we conclude that strain 91001 and other strains isolated from M. brandti might have evolved from ancestral Y. pestis in a different lineage. The large genome fragment deletions in the 91001 chromosome and some pseudogenes may contribute to its unique nonpathogenicity to humans and host-specificity.
    DNA Research 07/2004; 11(3):179-97. · 5.16 Impact Factor
  • Article: Evolution and variation of the SARS-CoV genome.
    ABSTRACT: Knowledge of the evolution of pathogens is of great medical and biological significance to the prevention, diagnosis, and therapy of infectious diseases. In order to understand the origin and evolution of the SARS-CoV (severe acute respiratory syndrome-associated coronavirus), we collected complete genome sequences of all viruses available in GenBank, and made comparative analyses with the SARS-CoV. Genomic signature analysis demonstrates that the coronaviruses all take the TGTT as their richest tetranucleotide except the SARS-CoV. A detailed analysis of the forty-two complete SARS-CoV genome sequences revealed the existence of two distinct genotypes, and showed that these isolates could be classified into four groups. Our manual analysis of the BLASTN results demonstrates that the HE (hemagglutinin-esterase) gene exists in the SARS-CoV, and many mutations made it unfamiliar to us.
    Genomics Proteomics & Bioinformatics 09/2003; 1(3):216-25.
  • Article: Complete genome sequences of the SARS-CoV: the BJ Group (Isolates BJ01-BJ04).
    ABSTRACT: Beijing has been one of the epicenters attacked most severely by the SARS-CoV (severe acute respiratory syndrome-associated coronavirus) since the first patient was diagnosed in one of the city's hospitals. We now report complete genome sequences of the BJ Group, including four isolates (Isolates BJ01, BJ02, BJ03, and BJ04) of the SARS-CoV. It is remarkable that all members of the BJ Group share a common haplotype, consisting of seven loci that differentiate the group from other isolates published to date. Among 42 substitutions uniquely identified from the BJ group, 32 are non-synonymous changes at the amino acid level. Rooted phylogenetic trees, proposed on the basis of haplotypes and other sequence variations of SARS-CoV isolates from Canada, USA, Singapore, and China, gave rise to different paradigms but positioned the BJ Group, together with the newly discovered GD01 (GD-Ins29) in the same clade, followed by the H-U Group (from Hong Kong to USA) and the H-T Group (from Hong Kong to Toronto), leaving the SP Group (Singapore) more distant. This result appears to suggest a possible transmission path from Guangdong to Beijing/Hong Kong, then to other countries and regions.
    Genomics Proteomics & Bioinformatics 09/2003; 1(3):180-92.
  • Article: Genome organization of the SARS-CoV.
    ABSTRACT: Annotation of the genome sequence of the SARS-CoV (severe acute respiratory syndrome-associated coronavirus) is indispensable to understand its evolution and pathogenesis. We have performed a full annotation of the SARS-CoV genome sequences by using annotation programs publicly available or developed by ourselves. Totally, 21 open reading frames (ORFs) of genes or putative uncharacterized proteins (PUPs) were predicted. Seven PUPs had not been reported previously, and two of them were predicted to contain transmembrane regions. Eight ORFs partially overlapped with or embedded into those of known genes, revealing that the SARS-CoV genome is a small and compact one with overlapped coding regions. The most striking discovery is that an ORF locates on the minus strand. We have also annotated non-coding regions and identified the transcription regulating sequences (TRS) in the intergenic regions. The analysis of TRS supports the minus strand extending transcription mechanism of coronavirus. The SNP analysis of different isolates reveals that mutations of the sequences do not affect the prediction results of ORFs.
    Genomics Proteomics & Bioinformatics 09/2003; 1(3):226-35.
  • Article: The structural characterization and antigenicity of the S protein of SARS-CoV.
    ABSTRACT: The corona-like spikes or peplomers on the surface of the virion under electronic microscope are the most striking features of coronaviruses. The S (spike) protein is the largest structural protein, with 1,255 amino acids, in the viral genome. Its structure can be divided into three regions: a long N-terminal region in the exterior, a characteristic transmembrane (TM) region, and a short C-terminus in the interior of a virion. We detected fifteen substitutions of nucleotides by comparisons with the seventeen published SARS-CoV genome sequences, eight (53.3%) of which are non-synonymous mutations leading to amino acid alternations with predicted physiochemical changes. The possible antigenic determinants of the S protein are predicted, and the result is confirmed by ELISA (enzyme-linked immunosorbent assay) with synthesized peptides. Another profound finding is that three disulfide bonds are defined at the C-terminus with the N-terminus of the E (envelope) protein, based on the typical sequence and positions, thus establishing the structural connection with these two important structural proteins, if confirmed. Phylogenetic analysis reveals several conserved regions that might be potent drug targets.
    Genomics Proteomics & Bioinformatics 06/2003; 1(2):108-17.
  • Article: The M protein of SARS-CoV: basic structural and immunological properties.
    ABSTRACT: We studied structural and immunological properties of the SARS-CoV M (membrane) protein, based on comparative analyses of sequence features, phylogenetic investigation, and experimental results. The M protein is predicted to contain a triple-spanning transmembrane (TM) region, a single N-glycosylation site near its N-terminus that is in the exterior of the virion, and a long C-terminal region in the interior. The M protein harbors a higher substitution rate (0.6% correlated to its size) among viral open reading frames (ORFs) from published data. The four substitutions detected in the M protein, which cause non-synonymous changes, can be classified into three types. One of them results in changes of pI (isoelectric point) and charge, affecting antigenicity. The second changes hydrophobicity of the TM region, and the third one relates to hydrophilicity of the interior structure. Phylogenetic tree building based on the variations of the M protein appears to support the non-human origin of SARS-CoV. To investigate its immunogenicity, we synthesized eight oligopeptides covering 69.2% of the entire ORF and screened them by using ELISA (enzyme-linked immunosorbent assay) with sera from SARS patients. The results confirmed our predictions on antigenic sites.
    Genomics Proteomics & Bioinformatics 06/2003; 1(2):118-30.
  • Article: The R protein of SARS-CoV: analyses of structure and function based on four complete genome sequences of isolates BJ01-BJ04.
    ABSTRACT: The R (replicase) protein is the uniquely defined non-structural protein (NSP) responsible for RNA replication, mutation rate or fidelity, regulation of transcription in coronaviruses and many other ssRNA viruses. Based on our complete genome sequences of four isolates (BJ01-BJ04) of SARS-CoV from Beijing, China, we analyzed the structure and predicted functions of the R protein in comparison with 13 other isolates of SARS-CoV and 6 other coronaviruses. The entire ORF (open-reading frame) encodes for two major enzyme activities, RNA-dependent RNA polymerase (RdRp) and proteinase activities. The R polyprotein undergoes a complex proteolytic process to produce 15 function-related peptides. A hydrophobic domain (HOD) and a hydrophilic domain (HID) are newly identified within NSP1. The substitution rate of the R protein is close to the average of the SARS-CoV genome. The functional domains in all NSPs of the R protein give different phylogenetic results that suggest their different mutation rate under selective pressure. Eleven highly conserved regions in RdRp and twelve cleavage sites by 3CLP (chymotrypsin-like protein) have been identified as potential drug targets. Findings suggest that it is possible to obtain information about the phylogeny of SARS-CoV, as well as potential tools for drug design, genotyping and diagnostics of SARS.
    Genomics Proteomics & Bioinformatics 06/2003; 1(2):155-65.
  • Article: A genome sequence of novel SARS-CoV isolates: the genotype, GD-Ins29, leads to a hypothesis of viral transmission in South China.
    ABSTRACT: We report a complete genomic sequence of rare isolates (minor genotype) of the SARS-CoV from SARS patients in Guangdong, China, where the first few cases emerged. The most striking discovery from the isolate is an extra 29-nucleotide sequence located at the nucleotide positions between 27,863 and 27,864 (referred to the complete sequence of BJ01) within an overlapped region composed of BGI-PUP5 (BGI-postulated uncharacterized protein 5) and BGI-PUP6 upstream of the N (nucleocapsid) protein. The discovery of this minor genotype, GD-Ins29, suggests a significant genetic event and differentiates it from the previously reported genotype, the dominant form among all sequenced SARS-CoV isolates. A 17-nt segment of this extra sequence is identical to a segment of the same size in two human mRNA sequences that may interfere with viral replication and transcription in the cytosol of the infected cells. It provides a new avenue for the exploration of the virus-host interaction in viral evolution, host pathogenesis, and vaccine development.
    Genomics Proteomics & Bioinformatics 06/2003; 1(2):101-7.
  • Article: A statistical approach designed for finding mathematically defined repeats in shotgun data and determining the length distribution of clone-inserts.
    ABSTRACT: The large amount of repeats, especially high copy repeats, in the genomes of higher animals and plants makes whole genome assembly (WGA) quite difficult. In order to solve this problem, we tried to identify repeats and mask them prior to assembly even at the stage of genome survey. It is known that repeats of different copy number have different probabilities of appearance in shotgun data, so based on this principle, we constructed a statistical model and inferred criteria for mathematically defined repeats (MDRs) at different shotgun coverages. According to these criteria, we developed software MDRmasker to identify and mask MDRs in shotgun data. With repeats masked prior to assembly, the speed of assembly was increased with lower error probability. In addition, clone-insert size affect the accuracy of repeat assembly and scaffold construction, we also designed length distribution of clone-inserts using our model. In our simulated genomes of human and rice, the length distribution of repeats is different, so their optimal length distributions of clone-inserts were not the same. Thus with optimal length distribution of clone-inserts, a given genome could be assembled better at lower coverage.
    Genomics Proteomics & Bioinformatics 03/2003; 1(1):43-51.

Institutions

  • 2009–2010
    • University of Georgia
      • Department of Plant Biology
      Athens, GA, USA
    • Iowa State University
      Ames, IA, USA
  • 2003–2005
    • Peking University
      • School of Life Sciences
      Beijing, Beijing Shi, China
    • Chinese Academy of Medical Sciences
      Beijing, Beijing Shi, China
    • Beijing Centers for Disease Control and Prevention
      Beijing, Beijing Shi, China
  • 2001–2005
    • Beijing Genomics Institute
      Shenzhen, Guangdong Sheng, China
  • 2002
    • BGI Human Genome Center
      Beijing, Beijing Shi, China
    • Chinese Academy of Sciences
      Beijing, Beijing Shi, China