
geneCo: A visualized comparative genomic method to analyze multiple genome structures

Genome analysis
Jaehee Jung 1, Jong Im Kim 2 and Gangman Yi 3,*
1 Department of Information and Communication Engineering, Myongji University, Yongin, Gyeonggi-do 17058, Korea, 2 Department of Biology, Chungnam National University, Daejeon 34134, Korea and 3 Department of Multimedia Engineering, Dongguk University, Seoul 04620, Korea
*To whom correspondence should be addressed.
Associate Editor: John Hancock
Received on May 6, 2019; revised on July 1, 2019; editorial decision on July 19, 2019; accepted on July 24, 2019
Abstract
Summary: In comparative and evolutionary genomics, a detailed comparison of common features between organisms is essential to evaluate genetic distance. However, identifying differences in matched and mismatched genes among multiple genomes is difficult with current comparative genomic approaches, because their methodologies are complicated or their results carry little information. This study describes a visualized software tool, geneCo (gene Comparison), for comparing genome structure and gene arrangements between various organisms. User data are aligned, gene information is recognized, and genome structures are compared based on user-defined GenBank files. geneCo provides information regarding inversion, gain, loss, duplication and gene rearrangement among the multiple organisms being compared, and it uses a web-based interface that users can easily access without any need to consider the computational environment.
Availability and implementation: Users can freely use the software at https://bigdata.dongguk.edu/geneCo. The main module of geneCo is implemented in Python and the web-based user interface is built with PHP, HTML and CSS to support all browsers.
Contact: gangman@dongguk.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
1 Introduction
Comparative genomics is mainly focused on creating highly detailed visualizations of the common features between organisms. The major principle of comparative genomics is to compare basic biological similarities or differences in genomic features resulting from DNA sequences between organisms at the genetic level. Genomic features include DNA sequences, gene contents, gene order, regulatory sequences and other genomic structures. Therefore, comparative genomic approaches are needed to specifically align genome sequences and to compare genomic features among organisms. Comparative genomics also provides powerful tools to study evolutionary relationships among organisms and to identify genes that are either conserved or represent unique genomic features.
Several comparative genomic tools, such as Artemis comparison tool (ACT) (Barrell et al., 2005), Mauve (Darling et al., 2004), BLAST ring image generator (BRIG) (Alikhan et al., 2011) and Circos (Schnable et al., 2009), have been developed for multiple genome assemblies. ACT visualizes comparisons of two or more genomes and is most useful for comparing a few DNA sequences, making it easy to spot and zoom in on regions of difference. This tool displays sequence similarities from various allowed input formats, such as GenBank entries or FASTA sequences. Mauve aligns whole genomes and shows output in the form of SNPs, regions of difference and homologous blocks, among others. Mauve can also be used to assess assembly quality against a reference using Mauve Contig Metrics. BRIG gives a global view of whole genome comparisons by visualizing BLAST comparisons with elaborate circular figures. BRIG is suitable for comparing multiple genomes; however, it is difficult to compare more than a dozen or so because each genome must be entered through the GUI. Circos uses plain text files for both input data and configuration, with the latter controlling the placement and format of each data track. The function to generate both data and configuration files automatically makes Circos highly amenable to incorporation in web-based database mining and visualization.

© The Author(s) 2019. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Bioinformatics, 35(24), 2019, 5303–5305
doi: 10.1093/bioinformatics/btz596
Advance Access Publication Date: 27 July 2019
Applications Note

Despite the many bioinformatic approaches to compare genomes, the development of tools for comparative DNA analysis remains a challenge, since the identification of mismatched genes, which may be non-conserved sequences, is also meaningful in terms of evolution. A visualization and comparative genomic tool, geneCo, is proposed to align and compare multiple genome structures resulting from user-defined data in the GenBank file format. geneCo provides information regarding inversion, gain, loss, duplication and gene rearrangement among the multiple organisms being compared. Another purpose of geneCo is to provide a web-based user interface that can be comfortably used by biologists, offering easy-to-use options and displaying the results in a web browser.
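
To make the event types listed above concrete, the following minimal sketch classifies gene-level differences between two genomes. It is an illustration only and not geneCo's implementation: the function name `classify_events`, the representation of a genome as a list of `(gene_name, strand)` tuples, and the approximation of an inversion as a strand flip of a single-copy gene are all assumptions made for this example.

```python
from collections import Counter


def classify_events(left, right):
    """Compare two genomes given as lists of (gene_name, strand) tuples.

    Strand is +1 or -1. Returns gene names grouped by event type, read as
    changes going from `left` to `right`. Hypothetical helper, not geneCo code.
    """
    left_counts = Counter(name for name, _ in left)
    right_counts = Counter(name for name, _ in right)

    # Loss: present on the left, absent on the right. Gain: the reverse.
    lost = sorted(set(left_counts) - set(right_counts))
    gained = sorted(set(right_counts) - set(left_counts))

    # Duplication: shared gene with more copies on the right than on the left.
    duplicated = sorted(
        name for name in right_counts
        if name in left_counts and right_counts[name] > left_counts[name]
    )

    # Inversion (approximated here): a single-copy shared gene whose strand flips.
    left_strand = {n: s for n, s in left if left_counts[n] == 1}
    right_strand = {n: s for n, s in right if right_counts[n] == 1}
    inverted = sorted(
        name for name, strand in left_strand.items()
        if name in right_strand and right_strand[name] != strand
    )

    return {"lost": lost, "gained": gained,
            "duplicated": duplicated, "inverted": inverted}
```

Detecting rearrangements additionally requires comparing gene order, which this sketch does not attempt.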
2 Application
Figure 1 shows an overview of geneCo, including the web-based interface, user options and output results. Figure 1a is a screenshot of the web page. The main engine of geneCo is implemented in Python, and the web-based interface is implemented in PHP, Python, JavaScript and Bootstrap (Spurlock, 2013). Figure 1b explains the option values. User configurations and usage are described in detail in the Supplementary Data. Figure 1c shows representative results for both the construction of a genome map and map comparisons. The output generated by geneCo varies according to the user options.
The main functions of geneCo can be divided into two categories. The first function is the ‘construction of a genome map’. GenBank files can be used as inputs to generate single and multiple genome maps. OrganellarGenomeDRAW (Lohse et al., 2013) functions in a similar manner, but it generates only one gene map based on the plastid or mitochondrial sequence of the organelle. When comparing several genes, OrganellarGenomeDRAW has to generate several sets of individual data. In contrast, geneCo permits a comparison of genome maps in the order in which they appear in GenBank. Its outputs are designed by the user and can be customized by adjusting genome lengths, intervals, output file formats and the user-defined functional categories set in configuration files.
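
As a rough illustration of the input step, the sketch below pulls gene names, coordinates and strands out of the feature table of a GenBank flat file. It is a simplified, standard-library-only example and not the parser used by geneCo: the function `parse_genes` and the `SAMPLE` record are hypothetical, only `gene` features carrying a `/gene` qualifier are read, and compound locations such as `join(...)` are reduced to their outermost coordinates. For real data a dedicated parser (for example Biopython's `Bio.SeqIO` with the `"genbank"` format) is the safer choice.

```python
import re

# A feature key starts after five spaces; qualifiers are indented further.
GENE_LINE = re.compile(r"^ {5}gene\s+(\S+)")
GENE_QUALIFIER = re.compile(r'^\s+/gene="([^"]+)"')

# Tiny made-up record used only to demonstrate the function.
SAMPLE = "\n".join([
    "LOCUS       TEST",
    "FEATURES             Location/Qualifiers",
    "     source          1..900",
    "     gene            10..200",
    '                     /gene="cox1"',
    "     CDS             10..200",
    '                     /gene="cox1"',
    "     gene            complement(300..450)",
    '                     /gene="nad1"',
    "ORIGIN",
])


def parse_genes(genbank_text):
    """Return a list of {name, start, end, strand} dicts for `gene` features."""
    genes = []
    pending = None        # (start, end, strand) waiting for its /gene qualifier
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN"):
            break
        if not in_features:
            continue
        feature = GENE_LINE.match(line)
        if feature:
            location = feature.group(1)
            positions = [int(p) for p in re.findall(r"\d+", location)]
            if positions:
                strand = -1 if location.startswith("complement") else 1
                pending = (min(positions), max(positions), strand)
            continue
        qualifier = GENE_QUALIFIER.match(line)
        if qualifier and pending is not None:
            start, end, strand = pending
            genes.append({"name": qualifier.group(1), "start": start,
                          "end": end, "strand": strand})
            pending = None
        elif line[5:6].strip():
            # Another feature key (e.g. CDS) begins: drop any unnamed gene.
            pending = None
    return genes
```

The resulting list preserves the order in which genes appear in the file, which is the order a linear genome map would draw them in.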
The second function is a ‘genome map comparison’ between genes from two different genomes, based on genomic input sequences in the GenBank flat file format. It distinguishes the genes matched in both genomes from the genes that exist in only one of them. Existing tools either require each genome structure to be constructed and compared manually or do not include gene annotation, making it difficult to distinguish between matched and mismatched genes in genome structures at a glance. In contrast, geneCo takes GenBank input files and arranges them as left-to-right genome pairs according to the order of the input files. Thus, each of the n loop steps compares a pair of GenBank files, the left and right genomes, and draws them in accordance with the input settings.
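
The pairing described above can be sketched as follows. This is a minimal example under stated assumptions, not the geneCo source: it assumes that pairs are formed from consecutive input files (first with second, second with third, and so on), and the names `consecutive_pairs` and `compare_pair` are hypothetical. Each genome is represented as a `(label, gene_names)` tuple.

```python
def consecutive_pairs(genomes):
    """Return (left, right) pairs in input order: (g0, g1), (g1, g2), ..."""
    return list(zip(genomes, genomes[1:]))


def compare_pair(left_genes, right_genes):
    """Split gene names into matched, left-only and right-only, keeping order."""
    left_set, right_set = set(left_genes), set(right_genes)
    return {
        "matched": [g for g in left_genes if g in right_set],
        "left_only": [g for g in left_genes if g not in right_set],
        "right_only": [g for g in right_genes if g not in left_set],
    }


def compare_all(genomes):
    """Run compare_pair over every consecutive pair of (label, genes) tuples."""
    return {
        (left_label, right_label): compare_pair(left_genes, right_genes)
        for (left_label, left_genes), (right_label, right_genes)
        in consecutive_pairs(genomes)
    }
```

With three input files this yields two comparisons, so the order of upload decides which genome acts as the reference in each step.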
The comparative genomic method is a modified local sequence alignment, based on a dynamic programming algorithm, that aligns genes by matching gene names. This method compares the genes from two genomes in the order defined by the user and analyzes missed genes that are not conserved and the genomic features of different genomes in terms of their biological similarities and genetic levels. Matched and mismatched genes are distinguished and identified by geneCo. In addition, geneCo enables biologists to repeatedly change input parameters in order to return desired outputs, especially when there is a large amount of genome data to be analyzed. The geneCo method is described in detail in Supplementary Section S2.
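
For readers unfamiliar with the underlying technique, the sketch below applies a textbook local alignment (Smith-Waterman style dynamic programming) to two ordered lists of gene names instead of nucleotides. It is not the modified algorithm of Supplementary Section S2; the function name `local_align_genes` and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration.

```python
def local_align_genes(left, right, match=2, mismatch=-1, gap=-1):
    """Locally align two lists of gene names.

    Returns (best_score, alignment), where alignment is a list of
    (left_gene, right_gene) pairs and None marks a gap on that side.
    """
    rows, cols = len(left) + 1, len(right) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if left[i - 1] == right[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    alignment = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if left[i - 1] == right[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            alignment.append((left[i - 1], right[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            alignment.append((left[i - 1], None))   # gene missing on the right
            i -= 1
        else:
            alignment.append((None, right[j - 1]))  # gene missing on the left
            j -= 1
    alignment.reverse()
    return best, alignment
```

In the returned alignment, pairs with equal names correspond to matched genes, while pairs containing `None` or two different names mark candidates for mismatched genes.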
Multiple genome maps can be created by geneCo using the title and input files with additional gene alignment options. The final output is immediately generated in a web browser and supports various vector types as outputs. In addition, different display options are supported for users who want more precise adjustments and greater customization based on their preferences. For example, users can either set the color of the map to the default option or change it manually. Furthermore, users can define keywords and set functional categories to match the keywords using various colors to improve visualization. The legend in the output shows the specified functional categories. For analysis within a specific range, users can specify the start and end base-pair in the output using the zoom-in option. Moreover, geneCo also supports various output options for almost all objects.
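
The keyword-based coloring and the zoom-in option can be pictured with the short sketch below. It only illustrates the idea and does not reflect geneCo's configuration format: the helper names `color_for_gene` and `zoom`, the default color, and the rule that the first matching keyword wins are assumptions.

```python
DEFAULT_COLOR = "#999999"  # assumed fallback for genes outside every category


def color_for_gene(name, categories, default=DEFAULT_COLOR):
    """Pick a color for a gene from (keyword, color) pairs.

    The first keyword found in the gene name (case-insensitive) wins.
    """
    lowered = name.lower()
    for keyword, color in categories:
        if keyword.lower() in lowered:
            return color
    return default


def zoom(genes, start, end):
    """Keep genes that overlap the inclusive base-pair window [start, end]."""
    return [g for g in genes if g["end"] >= start and g["start"] <= end]
```

A drawing routine could call `zoom` first to restrict the map to the requested range and then `color_for_gene` for each remaining gene, reusing the same category list to build the legend.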
3 Evaluation
Tools such as BRIG, Mauve, ACT and Circos can be used for comparative genomics. BRIG takes several query genomes in FASTA, GenBank or EMBL format, selects only one as a reference and then compares the other genomes against that reference. In contrast, geneCo can set multiple references, determined by the order of the uploaded GenBank files, so that mismatched genes between two compared genomes are easily found. To evaluate the performance of geneCo, several mitochondrial, nucleomorph chromosome 1 and plastid genomes from multiple species were used as the test dataset (Supplementary Tables S2 and S3). Supplementary Figures S2–S9 show geneCo outputs with different input options. Supplementary Tables S4 and S5 show the results obtained with other applications. The key feature of geneCo is the identification of mismatched genes found by comparing two related genomes. Further details are described in the Supplementary Data.

Fig. 1. Overview of the geneCo system: (a) geneCo web page, (b) option values and (c) results of the two different types in geneCo
4 Conclusion
Comparative genomics aims to find the common function between genomes to study the evolution of the genome. Such study requires tools for comparing and visualizing genomes. The proposed geneCo method is implemented as Python-based software that can compare and analyze various genome maps. In the past, users have had to construct individual gene maps manually to compare genome structures. With geneCo, users can easily compare and analyze the position of genes, find common genes between genomes, and find genes that exist in only one genome, using GenBank files as input data with user-defined settings. Various options are available for visualizing elaborate genome structures and generating results specific to the objective of the user.
Funding
This research was supported by the National Research Foundation (NRF) of Korea funded by the Ministry of Science, ICT & Future Planning, Basic Science Research Program [MSIP; NRF-2016R1C1B1007929] to J.J.; the Ministry of Education [2018R1D1A1B07050727] to J.I.K.; [NRF-2016R1D1A1A09919318, NRF-2019R1F1A1064019] to G.Y.
Conflict of Interest: none declared.
References
Alikhan,N.F. et al. (2011) BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics, 12, 402.
Barrell,B.G. et al. (2005) ACT: the Artemis comparison tool. Bioinformatics, 21, 3422–3423.
Darling,A.C.E. et al. (2004) Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res., 14, 1394–1403.
Lohse,M. et al. (2013) OrganellarGenomeDRAW: a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic Acids Res., 41, W575–W581.
Schnable,P.S. et al. (2009) The B73 maize genome: complexity, diversity, and dynamics. Science, 326, 1112–1115.
Spurlock,J. (2013) Bootstrap: Responsive Web Development. O’Reilly Media, USA.
... Various online/web applications can provides comparative analyses at both the genomic and genic levels tools, such as BRIG (Alikhan et al. 2011), Mauve (Darling et al. 2004), Artemis Comparison Tool (ACT) (Carver et al. 2005), geneCo (Jung et al. 2019) etc. can be used for comparative genomics apart from various tools provided various databases. At Nicotiana attenuata data hub, genes of 11 published dicot species were compared and found to cluster into 23,340 homologous groups (HG) based on their sequence similarity with at least two homolog sequences. ...
Chapter
Tobacco is a commercial crop cultivated globally in about 3.62 million hectares mainly in semi-arid and rain-fed areas. The crop is often confronted with various forms of abiotic stresses viz., drought, flood, high temperature (heat), cold, salinity, ozone, low and high light intensity, chlorides, heavy metals, ultraviolet radiation etc. Further, climate change in terms of higher temperature, and changing rainfall patterns is going to have remarkable effect on tobacco productivity and quality. The abiotic stresses usually play a negative role in the growth and development of tobacco plant. Tobacco plant respond to stresses in various ways with alterations at morphological, molecular, physiological and cellular levels involving switching on or off stress responsive genes. Inherent capacity of tobacco genotypes need to be improved for achieving higher and stable yields under different abiotic stresses. Degree of abiotic stress and crop stage of stress occurrence vary from year to year and place to place making it difficult to breed resistant varieties. Currently, very fewer number of abiotic stress resistant tobacco varieties are developed through conventional breeding. The Complicated nature of abiotic stresses, lack of suitable morphology based screening techniques, polygenic nature of resistance mechanisms, lack of sources of resistance etc. are limiting the progress that could be made in abiotic stress breeding. These limitations can be successfully overcome through molecular breeding and genome designing strategies. Hence, in this chapter, an attempt made to summarize the available knowledge about tobacco genetic resources, germplasm characterization through molecular markers, molecular maps, markers and QTLs linked to various abiotic stresses, omics resources and databases, abiotic resistant genes studied, etc. for their utilization in designing tobacco genomes for abiotic stress resistance. 
Further, recent technological advances in marker-assisted breeding, gene editing and genome designing technologies were discussed for effectively utilizing them in developing abiotic stress resistant tobacco genotypes.KeywordsTobacco Nicotiana Abiotic stressMolecularMarkersMapsGenome sequencingLinked markersQTLGenome designingMASGene editingCloningDatabases
... The previously published nucleomorph genome sequence of C. paramecium CCAP977/2a was downloaded from GenBank [12]. For structural and synteny comparisons, genomes were aligned using GeneCo [51] with default settings. In order to visualize high-level gene order conservation at the intra-or inter-species level, Circos plots were created with Circa (http:// omgen omics. ...
Article
Full-text available
Background Cryptophytes are ecologically important algae of interest to evolutionary cell biologists because of the convoluted history of their plastids and nucleomorphs, which are derived from red algal secondary endosymbionts. To better understand the evolution of the cryptophyte nucleomorph, we sequenced nucleomorph genomes from two photosynthetic and two non-photosynthetic species in the genus Cryptomonas . We performed a comparative analysis of these four genomes and the previously published genome of the non-photosynthetic species Cryptomonas paramecium CCAP977/2a. Results All five nucleomorph genomes are similar in terms of their general architecture, gene content, and gene order and, in the non-photosynthetic strains, loss of photosynthesis-related genes. Interestingly, in terms of size and coding capacity, the nucleomorph genome of the non-photosynthetic species Cryptomonas sp. CCAC1634B is much more similar to that of the photosynthetic C. curvata species than to the non-photosynthetic species C. paramecium . Conclusions Our results reveal fine-scale nucleomorph genome variation between distantly related congeneric taxa containing photosynthetic and non-photosynthetic species, including recent pseudogene formation, and provide a first glimpse into the possible impacts of the loss of photosynthesis on nucleomorph genome coding capacity and structure in independently evolved colorless strains.
... Comparative genomics is the approach by which we can compare such biological information between species/organisms. It is a powerful tool for studying evolutionary relationships, identifying conserved or unique genomic features among organisms [6]. ...
Chapter
Full-text available
The class Actinobacteria contains many bacteria relevant to human health and industry; it includes both the most deadly pathogen and also organisms that are very important for antibiotic production. Hence, Actinomycetes have historically been a leading source for organic products called as secondary metabolites. These secondary metabolites usually originate from regions located nearby in the genome and are referred to as biosynthetic gene clusters (BGC). But there have been no systematic studies to date, on whether a BGC in one species is also likely to be in a second species. Keeping this in mind, classification and comparison of these BGCs may thus systematically catalog the extent of natural product diversity. A comparative genomic approach might be a step forward. So, in this chapter, we try to summarize the work done in comparative genomics with a focus on BGCs and also try to shed some light in answering why this appears to be a difficult bioinformatic task.
... Though many comparative genomics tools are available to date for the analysis of bacterial replicons, such as GeneCO (Jung et al., 2019), BRIG (Alikhan et al., 2011) or MAUVE (Darling et al., 2004), most are developed for comparing full genomes and detecting evolutionary events such as insertions, deletions or rearrangements across those genomes, and are therefore limited in the number of genomes that are feasible to compare. Furthermore, these tools are not developed for comparative analysis of particular gene loci, which may be more suitable when researching e.g. the evolutionary history or taxonomic distribution of single bacterial genes. ...
Article
Full-text available
Comparing genomic loci of a given bacterial gene across strains and species can provide insights into their evolution, including information on e.g. acquired mobility, the degree of conservation between different taxa, or indications of horizontal gene transfer events. While thousands of bacterial genomes are available to date, there is no software that facilitates comparisons of individual gene loci for a large number of genomes. GEnView is a Python based pipeline for the comparative analysis of gene-loci in a large number of bacterial genomes, providing users with automated, taxon-selective access to the >840.000 genomes and plasmids currently available in the NCBI Assembly and RefSeq databases, and is able to process local genomes that are not deposited at NCBI, enabling searches for genomic sequences and to analyze their genetic environments through the interactive visualization and extensive metadata files created by GEnView. Availability and Implementation GEnView is implemented in Python 3. Instructions for download and usage can be found at https://github.com/EbmeyerSt/GEnView under GLP3. Contact To contact the developers, report a bug at https://github.com/EbmeyerSt/GEnView, or write to stefan.ebmeyer@gu.se or joakim.larsson@fysiologi.gu.se. Supplementary information Supplementary data are available at Bioinformatics online.
... The pairwise comparison of average nucleotide identity (ANI) between ASFV genomes was performed by ANI Calculator (12) (https:// www.ezbiocloud.net/tools/ani). Other tools for genomic visualization and classification of multigene family (MGF) proteins in ASFV were geneCo (13) and MGFC (14), respectively. All bioinformatic tools were run with default parameter settings. ...
Article
Full-text available
This study reports the genome sequence of an isolated African swine fever (ASF) virus (VNUA-ASFV-05L1/HaNam) obtained at the fourth passage on pulmonary alveolar macrophages. The virus was isolated during a typical acute ASF outbreak in pigs in a northern province of Vietnam in 2020.
Chapter
Tobacco is one of the important commercial crops in the world and is cultivated in more than 120 countries. Various biotic stresses viz. pests, diseases and parasitic weeds infect tobacco from seedling stage to leaf harvest and during post-harvest leaf management there-by severely affecting its leaf yield and quality. Development and deployment of host plant resistance is a sustainable option to minimize these losses. A number of varieties resistant to major biotic stresses viz. TMV, back shank, brown spot, blue mold, powdery mildew, root-knot nematodes, cater pillar, aphid etc. infecting tobacco were developed through conventional breeding. However, lack of reliable sources of resistance, narrow genetic variability, natural barriers of crossing among existing species, longer period required for developing stable homogeneous lines, undesirable associations between the resistant gene and yield and quality contributing characters either due to pleiotropic effects of the resistance gene per se, or due to linkage drag effects caused by the presence of deleterious genes linked to resistant gene, laborious process of screening/phenotyping segregating generations, etc. are slowing down the progress in developing tobacco varieties resistant to biotic stresses through traditional breeding. These limitations can be successfully overcome through molecular breeding and genome designing strategies. In this chapter, the current knowledge about genetic resources, the status of utilization of molecular markers in germplasm characterization and development of molecular maps, identified linked markers and quantitative trait loci (QTLs) to various biotic stresses, omics resources characterized, resistant genes cloned, accessible genomic resource databases etc. were summarized for their effective utilization in designing tobacco genome for higher yields and biotic stress resistance. 
In addition, advances in Marker-assisted selection (MAS) strategies, gene editing technologies and other genome designing strategies, and their possible utilization in designing tobacco genotypes for biotic stress resistance were also discussed.KeywordsTobacco Nicotiana DiseasesPestsStressResistanceGenetic mapsDiversityGenome sequencingMarkersGenomic resourcesQTLGenome designingMASGene editingCloningDatabases
Article
Full-text available
Crocus istanbulensis (B.Mathew) Rukšāns is one of the most endangered Crocus species in the world and has an extremely limited distribution range in Istanbul. Our recent field work indicates that no more than one hundred individuals remain in the wild. In the present study, we used genome skimming to determine the complete chloroplast (cp) genome sequences of six C . istanbulensis individuals collected from the locus classicus . The cp genome of C . istanbulensis has 151,199 base pairs (bp), with a large single-copy (LSC) (81,197 bp), small single copy (SSC) (17,524 bp) and two inverted repeat (IR) regions of 26,236 bp each. The cp genome contains 132 genes, of which 86 are protein-coding (PCGs), 8 are rRNA and 38 are tRNA genes. Most of the repeats are found in intergenic spacers of Crocus species. Mononucleotide repeats were most abundant, accounting for over 80% of total repeats. The cp genome contained four palindrome repeats and one forward repeat. Comparative analyses among other Iridaceae species identified one inversion in the terminal positions of LSC region and three different gene ( psbA , rps3 and rpl22 ) arrangements in C . istanbulensis that were not reported previously. To measure selective pressure in the exons of chloroplast coding sequences, we performed a sequence analysis of plastome-encoded genes. A total of seven genes ( accD , rpoC2 , psbK , rps12 , ccsA , clpP and ycf2 ) were detected under positive selection in the cp genome. Alignment-free sequence comparison showed an extremely low sequence diversity across naturally occurring C . istanbulensis specimens. All six sequenced individuals shared the same cp haplotype. In summary, this study will aid further research on the molecular evolution and development of ex situ conservation strategies of C . istanbulensis .
Article
Full-text available
The Raphidophyceae is an ecologically important eukaryotic lineage of primary producers and predators that inhabit marine and freshwater environments worldwide. These organisms are of great evolutionary interest because their plastids are the product of eukaryote-eukaryote endosymbiosis. To obtain deeper insight into the evolutionary history of raphidophycean plastids, we sequenced and analyzed the plastid genomes of three freshwater and three marine species. Our comparison of these genomes, together with the previously reported plastid genome of Heterosigma akashiwo, revealed unexpected variability in genome structure. Unlike the genomes of other analyzed species, the plastid genome of Gonyostomum semen was found to contain only a single rRNA operon, presumably due to the loss of genes from the inverted repeat (IR) region found in most plastid genomes. In contrast, the marine species Fibrocapsa japonica contains the largest IR region and overall plastid genome for any raphidophyte examined thus far, mainly due to the presence of four large gene-poor regions and foreign DNA. Two plastid genes, tyrC in F. japonica and He. akashiwo and serC in F. japonica, appear to have arisen via lateral gene transfer (LGT) from diatoms, and several raphidophyte open reading frames are demonstrably homologous to sequences in diatom plasmids and plastid genomes. A group II intron in the F. japonica psbB gene also appears to be derived by LGT. Our results provide important insights into the evolutionary history of raphidophyte plastid genomes via LGT from the plastids and plasmid DNAs of diatoms.
Article
Aim: Bacteriophages are effective natural antimicrobial agents against drug-resistant pathogens. Therefore, identification and detailed characterization of bacteriophages become essential to explore their therapeutic potential. This study aims to isolate and characterize a lytic bacteriophage against drug-resistant Pseudomonas aeruginosa. Methods and results: The Pseudomonas phage AIIMS-Pa-A1, isolated from the river Ganga water against drug-resistant P. aeruginosa, showed clear lytic zone on spot assay. The phage revealed icosahedral head (58.20 nm diameter) and small tail (6.83 nm) under transmission electron microscope. The growth kinetics showed adsorption constant of 1.5×10-9 phage particles cell-1 ml-1 minute-1 and latent period of approximately 15 minutes with the burst size of 27 phages per infected cell. The whole genome sequencing depicted a GC-rich genome of 40.97kb having a lysis cassette of holin, endolysin, and Rz protein, with features of the family Autographiviridae. The comparative genome analysis, Ortho-average nucleotide identity value, and phylogenetic analysis indicated the novelty of the phage AIIMS-Pa-A1. Conclusions: The study concludes that the Pseudomonas phage AIIMS-Pa-A1 is a novel member of the Autographiviridae family, truly lytic in nature for drug-resistant P. aeruginosa. Significance and impact of study: The Pseudomonas phage AIIMS-Pa-A1 is having promising potential for future therapeutic intervention to treat drug-resistant P. aeruginosa infections.
Article
Full-text available
Mitochondria and plastids (chloroplasts) are cell organelles of endosymbiotic origin that possess their own genetic information. Most organellar DNAs map as circular double-stranded genomes. Across the eukaryotic kingdom, organellar genomes display great size variation, ranging from ∼15 to 20 kb (the size of the mitochondrial genome in most animals) to >10 Mb (the size of the mitochondrial genome in some lineages of flowering plants). We have developed OrganellarGenomeDraw (OGDRAW), a suite of software tools that enable users to create high-quality visual representations of both circular and linear annotated genome sequences provided as GenBank files or accession numbers. Although all types of DNA sequences are accepted as input, the software has been specifically optimized to properly depict features of organellar genomes. A recent extension facilitates the plotting of quantitative gene expression data, such as transcript or protein abundance data, directly onto the genome map. OGDRAW has already become widely used and is available as a free web tool (http://ogdraw.mpimp-golm.mpg.de/). The core processing components can be downloaded as a Perl module, thus also allowing for convenient integration into custom processing pipelines.
Article
Full-text available
Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically. 
There is a clear need for a user-friendly program that can produce genome comparisons for a large number of prokaryote genomes with an emphasis on rapidly utilising unfinished or unassembled genome data. Here we present BRIG, a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface. BRIG is freely available for all operating systems at http://sourceforge.net/projects/brig/.
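The "sliding scale" colouring of BLAST matches described above can be illustrated with a minimal sketch. This is not BRIG's code: the function name `identity_to_rgb`, the 70% lower threshold, and the blue base colour are assumptions chosen only for illustration.

```python
def identity_to_rgb(identity, lower=70.0, upper=100.0, base=(0, 0, 255)):
    """Map a BLAST percent identity to an RGB colour on a linear scale.

    Hypothetical helper: matches below `lower` are not drawn (None);
    the colour fades from white at `lower` to `base` at `upper`.
    """
    if identity < lower:
        return None  # below the assumed threshold: leave the ring blank
    # Fraction of the way from the lower to the upper identity bound
    fraction = min((identity - lower) / (upper - lower), 1.0)
    # Interpolate each channel between white (255) and the base colour
    return tuple(round(255 + (channel - 255) * fraction) for channel in base)


if __name__ == "__main__":
    for pct in (65.0, 70.0, 90.0, 100.0):
        print(pct, identity_to_rgb(pct))
```

A real ring image would apply such a mapping to every match along the reference sequence; the thresholds would normally be user-configurable.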
We report an improved draft nucleotide sequence of the 2.3-gigabase genome of maize, an important crop plant and model for biological research. Over 32,000 genes were predicted, of which 99.8% were placed on reference chromosomes. Nearly 85% of the genome is composed of hundreds of families of transposable elements, dispersed nonuniformly across the genome. These were responsible for the capture and amplification of numerous gene fragments and affect the composition, sizes, and positions of centromeres. We also report on the correlation of methylation-poor regions with Mu transposon insertions and recombination, and copy number variants with insertions and/or deletions, as well as how uneven gene losses between duplicated regions were involved in returning an ancient allotetraploid to a genetically diploid state. These analyses inform and set the stage for further investigations to improve our understanding of the domestication and agricultural improvements of maize.
As genomes evolve, they undergo large-scale evolutionary processes that present a challenge to sequence comparison not posed by short sequences. Recombination causes frequent genome rearrangements, horizontal transfer introduces new sequences into bacterial chromosomes, and deletions remove segments of the genome. Consequently, each genome is a mosaic of unique lineage-specific segments, regions shared with a subset of other genomes and segments conserved among all the genomes under consideration. Furthermore, the linear order of these segments may be shuffled among genomes. We present methods for identification and alignment of conserved genomic DNA in the presence of rearrangements and horizontal transfer. Our methods have been implemented in a software package called Mauve. Mauve has been applied to align nine enterobacterial genomes and to determine global rearrangement structure in three mammalian genomes. We have evaluated the quality of Mauve alignments and drawn comparison to other methods through extensive simulations of genome evolution.
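To make the idea of shuffled segment order concrete, the sketch below counts breakpoints between two genomes represented as signed orders of conserved blocks. This is a standard textbook measure, not the Mauve algorithm; the function names and the toy genomes are assumptions for illustration, and genomes are treated as linear.

```python
def adjacencies(order):
    """Return all oriented block adjacencies in a signed block order.

    A negative number denotes a block on the reverse strand.
    """
    pairs = set()
    for left, right in zip(order, order[1:]):
        pairs.add((left, right))
        pairs.add((-right, -left))  # same adjacency read from the other strand
    return pairs


def count_breakpoints(reference, other):
    """Count reference adjacencies that are not preserved in `other`."""
    other_adjacencies = adjacencies(other)
    return sum(
        1
        for pair in zip(reference, reference[1:])
        if pair not in other_adjacencies
    )


if __name__ == "__main__":
    genome_a = [1, 2, 3, 4, 5]
    genome_b = [1, -3, -2, 4, 5]  # blocks 2 and 3 inverted as one segment
    print(count_breakpoints(genome_a, genome_b))
```

An inversion of a contiguous segment breaks the two adjacencies at its ends, while a genome read entirely from the opposite strand yields no breakpoints.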
The Artemis Comparison Tool (ACT) allows an interactive visualisation of comparisons between complete genome sequences and associated annotations. The comparison data can be generated with several different programs: BLASTN, TBLASTX or Mummer comparisons between genomic DNA sequences, or orthologue tables generated by reciprocal FASTA comparison between protein sets. It is possible to identify regions of similarity, insertions and rearrangements at any level from the whole genome to base-pair differences. ACT uses Artemis components to display the sequences and so inherits powerful searching and analysis tools. ACT is part of the Artemis distribution and is similarly open source, written in Java, and can run on any Java-enabled platform, including UNIX, Macintosh and Windows. Availability: ACT is freely available (under a GPL licence) for download from the Sanger Institute web site, http://www.sanger.ac.uk. Contact: artemis{at}sanger.ac.uk
Darling,A.C.E. et al. (2004) Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res., 14, 1394-1403.
Spurlock,J. (2013) Bootstrap: Responsive Web Development. O'Reilly Media, USA.