GARD: A genetic algorithm for recombination detection

Department of Pathology, University of California San Diego, La Jolla, CA 92093, USA.
Bioinformatics (Impact Factor: 4.98). 01/2007; 22(24):3096-8. DOI: 10.1093/bioinformatics/btl474
Source: PubMed


Phylogenetic and evolutionary inference can be severely misled if recombination is not accounted for, hence screening for it should be an essential component of nearly every comparative study. The evolution of recombinant sequences cannot be properly explained by a single phylogenetic tree, but several phylogenies may be used to correctly model the evolution of non-recombinant fragments.
We developed a likelihood-based model selection procedure that uses a genetic algorithm to search multiple sequence alignments for evidence of recombination breakpoints and to identify putative recombinant sequences. GARD is an extensible and intuitive method that can be run efficiently in parallel. Extensive simulation studies show that the method nearly always outperforms other available tools in terms of both power and accuracy, and that using GARD to screen sequences for recombination ensures good statistical properties for methods aimed at detecting positive selection.
Freely available
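
The idea of a genetic algorithm searching for breakpoints under a likelihood-based criterion can be illustrated with a toy sketch. This is not the GARD implementation: GARD scores candidate breakpoints using phylogenetic likelihoods on an alignment, whereas this sketch assumes a simple 0/1 sequence with a Bernoulli likelihood per segment and an AICc-style penalty. All function names and parameters (`segment_loglik`, `aicc`, `ga_search`, `pop_size`, `generations`) are hypothetical and chosen only for illustration.

```python
import math
import random


def segment_loglik(sites):
    """Bernoulli log-likelihood of a 0/1 segment at its ML frequency (toy surrogate)."""
    n, k = len(sites), sum(sites)
    if k == 0 or k == n:
        return 0.0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)


def aicc(sites, breakpoints):
    """AICc-style score of a piecewise model; lower is better (assumed criterion)."""
    bounds = [0, *sorted(breakpoints), len(sites)]
    loglik = sum(segment_loglik(sites[a:b]) for a, b in zip(bounds, bounds[1:]))
    n = len(sites)
    # One frequency per segment plus one parameter per breakpoint location.
    p = (len(bounds) - 1) + len(breakpoints)
    if n - p - 1 <= 0:
        return math.inf
    return -2 * loglik + 2 * p * n / (n - p - 1)


def mutate(bps, n, rng):
    """Shift, drop, or add one breakpoint; positions stay within 1..n-1."""
    bps = set(bps)
    move = rng.random()
    if bps and move < 0.4:
        b = rng.choice(sorted(bps))
        bps.discard(b)
        bps.add(min(n - 1, max(1, b + rng.randint(-5, 5))))
    elif bps and move < 0.6:
        bps.discard(rng.choice(sorted(bps)))
    else:
        bps.add(rng.randint(1, n - 1))
    return frozenset(bps)


def crossover(a, b, n, rng):
    """Take breakpoints left of a random cut from one parent, the rest from the other."""
    cut = rng.randint(1, n - 1)
    return frozenset({x for x in a if x < cut} | {x for x in b if x >= cut})


def ga_search(sites, pop_size=30, generations=60, seed=0):
    """Evolve sets of breakpoints, keeping the best third of each generation."""
    rng = random.Random(seed)
    n = len(sites)
    # Start from the no-breakpoint model plus random single-breakpoint models.
    pop = [frozenset()] + [
        frozenset({rng.randint(1, n - 1)}) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=lambda bps: aicc(sites, bps))
        elite = pop[: pop_size // 3]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), n, rng), n, rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return min(pop, key=lambda bps: aicc(sites, bps))
```

For example, `ga_search([0] * 100 + [1] * 100)` returns a set of breakpoint positions whose score is no worse than that of the no-breakpoint model, since the best candidates are carried over unchanged between generations. Scoring each candidate is independent of the others, which is the step that would be distributed in a parallel run.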



Available from: Simon D W Frost, Oct 07, 2015
    • "The alignment was manually edited in Bioedit, version 7.05 to preserve frame insertions and deletions if present. Because recombination may confound the results of phylogeographic inference [Schierup and Hein, 2000], all data sets for phylogeographic analyses were verified and tested negative for recombination using RIP 3.0 [Siepel et al., 1995] and GARD [Pond et al., 2006]. The sequences of each subject have been submitted to GenBank (GenBank accession number: KP796426 -KP797835). "
    ABSTRACT: The cellular source of HIV RNA circulating in blood plasma remains unclear. Here, we investigated whether sequence analysis of HIV RNA populations circulating before combination antiretroviral therapy (cART) and HIV DNA populations in cellular subsets (CS) after cART could identify the cellular sources of circulating HIV RNA. Blood was collected from five subjects at cART initiation and again 6 months later. Naïve CD4+ T cells, resting central memory and effector memory CD4+ T cells, activated CD4+ T cells, monocytes, and natural killer cells were sorted using a fluorescence-activated cell sorter. HIV-1 env C2V3 sequences from HIV RNA in blood plasma and HIV DNA in CSs were generated using single genome sequencing. Sequences were evaluated for viral compartmentalization (Fst test) and migration events (MEs; Slatkin-Maddison and cladistic measures) between blood plasma and each CS. Viral compartmentalization was observed in 88% of all cellular subset comparisons (range: 77-100% for each subject). Most observed MEs were directed from blood plasma to CSs (52 MEs, 85.2%). In particular, there was only viral movement from plasma to NK cells (15 MEs), monocytes (7 MEs) and naïve cells (5 MEs). We observed a total of 9 MEs from activated CD4 cells (2/9 MEs), central memory T cells (3/9 MEs) and effector memory T cells (4/9 MEs) to blood plasma. Our results revealed that the HIV RNA population in blood plasma plays an important role in seeding various cellular reservoirs and that the cellular source of the HIV RNA population is activated central memory and effector memory T cells. This article is protected by copyright. All rights reserved.
    Journal of Medical Virology 09/2015; DOI:10.1002/jmv.24375 · 2.35 Impact Factor
    • "In order to increase the effective sample size (ESS > 200), the analysis was run in duplicate for each dataset, with 50 million chain lengths and sampled every 1000 states. In order to estimate the rate dN/dS and to investigate the selective pressure acting on specific codons in VP1 gene, SLAC, FEL (Kosakovsky Pond and Frost, 2005), FUBAR (Murrell et al., 2013) and IFEL (Pond et al., 2006) methods were used, provided from Datamonkey website. Almost the complete genome sequence of strain EIS6B (from 1 to 7365 nt) has been deposited to GenBank under the accession number KM024043. "
    ABSTRACT: Echovirus 3 (E3) serotype has been related with several neurologic diseases, although it constitutes one of the rarely isolated serotypes, with no report of epidemics in Europe. The aim of the present study was to provide insights into the molecular epidemiology and evolution of this enterovirus serotype, while an E3 strain was isolated from sewage in Greece, four years after the initial isolation of the only reported E3 strain in the same geographical region. Phylogenetic analysis of the complete VP1 genomic region of that E3 strain and of those available in GenBank suggested three main genogroups that were further subdivided into seven subgenogroups. Further evolutionary analysis suggested that VP1 genomic region of E3 was dominated by purifying selection, as the vast majority of genetic diversity presumably occurred through synonymous nucleotide substitutions and the substitution rate for complete and partial VP1 sequences was calculated to be 8.13 × 10^-3 and 7.72 × 10^-3 substitutions/site/year, respectively. The partial VP1 sequence analysis revealed the composite epidemiology of this serotype, as the strains of the three genogroups presented different epidemiological characteristics. Copyright © 2015. Published by Elsevier B.V.
    Infection, genetics and evolution: journal of molecular epidemiology and evolutionary genetics in infectious diseases 03/2015; 32. DOI:10.1016/j.meegid.2015.03.008 · 3.02 Impact Factor
    • "There were too few sequences available for CpYDV, CpRLV and CpYV for us to perform selection analyses on these species. Accounting for recombination breakpoints identified with the GARD method (Kosakovsky Pond et al., 2006) these "
    ABSTRACT: In Sudan Chickpea chlorotic dwarf virus (CpCDV, genus Mastrevirus, family Geminiviridae) is an important pathogen of pulses that are grown both for local consumption, and for export. Although a few studies have characterised CpCDV genomes from countries in the Middle East, Africa and the Indian subcontinent, little is known about CpCDV diversity in any of the major chickpea production areas in these regions. Here we analyse the diversity of 147 CpCDV isolates characterised from pulses collected across the chickpea growing regions of Sudan. Although we find that seven of the twelve known CpCDV strains are present within the country, strain CpCDV-H alone accounted for ∼73% of the infections analysed. Additionally we identified four new strains (CpCDV-M, -N, -O and -P) and show that recombination has played a significant role in the diversification of CpCDV, at least in this region. Accounting for observed recombination events, we use the large amounts of data generated here to compare patterns of natural selection within protein coding regions of CpCDV and other dicot-infecting mastrevirus species. Copyright © 2014 Elsevier B.V. All rights reserved.
    Infection, genetics and evolution: journal of molecular epidemiology and evolutionary genetics in infectious diseases 01/2015; 29:203-2015. DOI:10.1016/j.meegid.2014.11.024 · 3.02 Impact Factor