-
ABSTRACT: We created a visualization tool called Circos to facilitate the identification and analysis of similarities and differences arising from comparisons of genomes. Our tool is effective in displaying variation in genome structure and, generally, any other kind of positional relationships between genomic intervals. Such data are routinely produced by sequence alignments, hybridization arrays, genome mapping, and genotyping studies. Circos uses a circular ideogram layout to facilitate the display of relationships between pairs of positions by the use of ribbons, which encode the position, size, and orientation of related genomic elements. Circos is capable of displaying data as scatter, line, and histogram plots, heat maps, tiles, connectors, and text. Bitmap or vector images can be created from GFF-style data inputs and hierarchical configuration files, which can be easily generated by automated tools, making Circos suitable for rapid deployment in data analysis and reporting pipelines.
Genome Research 07/2009; 19(9):1639-45. · 13.61 Impact Factor
-
Colin T Kelleher,
Readman Chiu,
Heesun Shin,
Ian E Bosdet,
Martin I Krzywinski,
Chris D Fjell,
Jennifer Wilkin,
Tongming Yin,
Stephen P DiFazio,
Johar Ali, [......],
Jeremy Schmutz,
Daniel Rokhsar,
Steven J M Jones,
Marco A Marra,
Gerald A Tuskan,
Jörg Bohlmann,
Brian E Ellis,
Kermit Ritland,
Carl J Douglas,
Jacqueline E Schein
ABSTRACT: As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the few such maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2802 contigs, was constructed from fingerprinted bacterial artificial chromosome (BAC) clones. The map represents approximately 9.4-fold coverage of the Populus genome, which has been estimated from the genome sequence assembly to be 485 ± 10 Mb in size. BAC ends were sequenced to assist long-range assembly of whole-genome shotgun sequence scaffolds and to anchor the physical map to the genome sequence. Simple sequence repeat-based markers were derived from the end sequences and used to initiate integration of the BAC and genetic maps. A total of 2411 physical map contigs, representing 97% of all clones assigned to contigs, were aligned to the sequence assembly (JGI Populus trichocarpa, version 1.0). These alignments represent a total coverage of 384 Mb (79%) of the entire poplar sequence assembly and 295 Mb (96%) of linkage group sequence assemblies. A striking result of the physical map contig alignments to the sequence assembly was the co-localization of multiple contigs across numerous regions of the 19 linkage groups. Targeted sequencing of BAC clones and genetic analysis in a small number of representative regions showed that these co-aligning contigs represent distinct haplotypes in the heterozygous individual sequenced, and revealed the nature of these haplotype sequence differences.
The Plant Journal 07/2007; 50(6):1063-78. · 6.16 Impact Factor
-
11/2005; ISBN: 9780470011539
-
Siemon H S Ng,
Carlo G Artieri,
Ian E Bosdet,
Readman Chiu,
Roy G Danzmann,
William S Davidson,
Moira M Ferguson,
Christopher D Fjell,
Bjorn Hoyheim,
Steven J M Jones, [......],
Ruth B Phillips,
Matthew L Rise,
Kristian R von Schalburg,
Jacqueline E Schein,
Heesun Shin,
Asim Siddiqui,
Jim Thorsen,
Natasja Wye,
George Yang,
Baoli Zhu
ABSTRACT: A physical map of the Atlantic salmon (Salmo salar) genome was generated based on HindIII fingerprints of a publicly available BAC (bacterial artificial chromosome) library constructed from DNA isolated from a Norwegian male. Approximately 11.5 haploid genome equivalents (185,938 clones) were successfully fingerprinted. Contigs were first assembled via FPC at high stringency (cutoff 1e-16), and then end-to-end joins yielded 4354 contigs and 37,285 singletons. The accuracy of the contig assembly was verified by hybridization and PCR analysis using genetic markers. A subset of the BACs in the library contained few or no HindIII recognition sites in their insert DNA. BglI digestion fragment patterns of these BACs allowed us to identify three classes: (1) BACs containing histone genes, (2) BACs containing rDNA-repeating units, and (3) those that do not have BglI recognition sites. End-sequence analysis of selected BACs representing these three classes confirmed the identification of the first two classes and suggested that the third class contained highly repetitive DNA corresponding to tRNAs and related sequences.
Genomics 11/2005; 86(4):396-404. · 3.02 Impact Factor
-
Brendan J Loftus,
Eula Fung,
Paola Roncaglia,
Don Rowley,
Paolo Amedeo,
Dan Bruno,
Jessica Vamathevan,
Molly Miranda,
Iain J Anderson,
James A Fraser, [......],
Terry R Utterback,
Brian L Wickes,
Jennifer R Wortman,
Natasja H Wye,
James W Kronstad,
Jennifer K Lodge,
Joseph Heitman,
Ronald W Davis,
Claire M Fraser,
Richard W Hyman
ABSTRACT: Cryptococcus neoformans is a basidiomycetous yeast ubiquitous in the environment, a model for fungal pathogenesis, and an opportunistic human pathogen of global importance. We have sequenced its approximately 20-megabase genome, which contains approximately 6500 intron-rich gene structures and encodes a transcriptome abundant in alternatively spliced and antisense messages. The genome is rich in transposons, many of which cluster at candidate centromeric regions. The presence of these transposons may drive karyotype instability and phenotypic variation. C. neoformans encodes unique genes that may contribute to its unusual virulence properties, and comparison of two phenotypically distinct strains reveals variation in gene content in addition to sequence polymorphisms between the genomes.
Science 03/2005; 307(5713):1321-4. · 31.20 Impact Factor
-
Daniela S Gerhard,
Lukas Wagner,
Elise A Feingold,
Carolyn M Shenmen,
Lynette H Grouse,
Greg Schuler,
Steven L Klein,
Susan Old,
Rebekah Rasooly,
Peter Good, [......],
Angelique Schnerch,
Jacqueline E Schein,
Steven J M Jones,
Robert A Holt,
Agnes Baross,
Marco A Marra,
Sandra Clifton,
Kathryn A Makowski,
Stephanie Bosak,
Joel Malek
ABSTRACT: The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5'-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline.
Genome Research 11/2004; 14(10B):2121-7. · 13.61 Impact Factor
-
Daniel R Fuhrmann,
Martin I Krzywinski,
Readman Chiu,
Parvaneh Saeedi,
Jacqueline E Schein,
Ian E Bosdet,
Asif Chinwalla,
LaDeana W Hillier,
Robert H Waterston,
John D McPherson,
Steven J M Jones,
Marco A Marra
ABSTRACT: Here we describe software tools for the automated detection of DNA restriction fragments resolved on agarose fingerprinting gels. We present a mathematical model for the location and shape of the restriction fragments as a function of fragment size, with model parameters determined empirically from "marker" lanes containing molecular size standards. Automated identification of restriction fragments involves several steps, including: image preprocessing, to put the data in a form consistent with a linear model; marker lane analysis, for determination of the model parameters; and data lane analysis, a procedure for detecting restriction fragment multiplets while simultaneously determining the amplitude curve that describes restriction fragment amplitude as a function of mobility. In validation experiments conducted on fingerprinted and sequenced Bacterial Artificial Chromosome (BAC) clones, sensitivity and specificity of restriction fragment identification exceeded 96% on restriction fragments ranging in size from 600 base pairs (bp) to 30,000 bp. The integrated suite of software tools, written in MATLAB and collectively called BandLeader, is in use at the BC Cancer Agency Genome Sciences Centre (GSC) and the Washington University Genome Sequencing Center, and has been provided to the Wellcome Trust Sanger Institute and the Whitehead Institute. Employed in a production mode at the GSC, BandLeader has been used to perform automated restriction fragment identification for more than 850,000 BAC clones for mouse, rat, bovine, and poplar fingerprint mapping projects.
Genome Research 06/2003; 13(5):940-53. · 13.61 Impact Factor
-
Robert L Strausberg,
Elise A Feingold,
Lynette H Grouse,
Jeffery G Derge,
Richard D Klausner,
Francis S Collins,
Lukas Wagner,
Carolyn M Shenmen,
Gregory D Schuler,
Stephen F Altschul, [......],
Jeremy Schmutz,
Richard M Myers,
Yaron S N Butterfield,
Martin I Krzywinski,
Ursula Skalska,
Duane E Smailus,
Angelique Schnerch,
Jacqueline E Schein,
Steven J M Jones,
Marco A Marra
ABSTRACT: The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multiinstitutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. ESTs were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which then were sequenced to high accuracy. The MGC has currently sequenced and verified the full ORF for a nonredundant set of >9,000 human and >6,000 mouse genes. Candidate full-ORF clones for an additional 7,800 human and 3,500 mouse genes also have been identified. All MGC sequences and clones are available without restriction through public databases and clone distribution networks (see http://mgc.nci.nih.gov).
Proceedings of the National Academy of Sciences 12/2002; 99(26):16899-903. · 9.68 Impact Factor
-
Yaron S N Butterfield,
Marco A Marra,
Jennifer K Asano,
Susanna Y Chan,
Ranabir Guin,
Martin I Krzywinski,
Soo Sen Lee,
Kim W K MacDonald,
Carrie A Mathewson,
Teika E Olson, [......],
Anna-Liisa Prabhu,
Angelique Schnerch,
Ursula Skalska,
Duane E Smailus,
Jeff M Stott,
Miranda I Tsai,
George S Yang,
Scott D Zuyderduyn,
Jacqueline E Schein,
Steven J M Jones
ABSTRACT: We describe an efficient high-throughput method for accurate DNA sequencing of entire cDNA clones. Developed as part of our involvement in the Mammalian Gene Collection full-length cDNA sequencing initiative, the method has been used and refined in our laboratory since September 2000. The method is amenable to large-scale projects: we have used it to generate >7 Mb of accurate sequence from 3695 candidate full-length cDNAs. Sequencing is accomplished through the insertion of the Mu transposon into cDNAs, followed by sequencing reactions primed with Mu-specific sequencing primers. Transposon insertion reactions are not performed with individual cDNAs but rather on pools of up to 96 clones. This pooling strategy reduces the number of transposon insertion sequencing libraries that would otherwise be required, reducing the costs and enhancing the efficiency of the transposon library construction procedure. Sequences generated using transposon-specific sequencing primers are assembled to yield the full-length cDNA sequence, with sequence editing and other sequence finishing activities performed as required to resolve sequence ambiguities. Although analysis of 22,785 sequenced Mu transposon insertion events revealed a weak sequence preference for Mu insertion, we observed insertion of the Mu transposon into 1015 of the 1024 possible 5-mer candidate insertion sites.
Nucleic Acids Research 07/2002; 30(11):2460-8. · 8.03 Impact Factor