Rapid DNA Barcoding Analysis of Large Datasets Using CV Method

Article in BMC Bioinformatics 10(Suppl 14):S8 · November 2009
DOI: 10.1186/1471-2105-10-S14-S8 · Source: PubMed
Background: Sequence alignment is the rate-limiting step in constructing profile trees for DNA barcoding purposes. We recently demonstrated the feasibility of using unaligned rRNA sequences as barcodes based on a composition vector (CV) approach without sequence alignment (Bioinformatics 22:1690). Here, we further explored the grouping effectiveness of the CV method in large DNA barcode datasets (COI, 18S and 16S rRNA) from a variety of organisms, including birds, fishes, nematodes and crustaceans.

Results: Our results indicate that the grouping of taxa at the genus/species levels based on the CV/NJ approach is invariably consistent with the trees generated by traditional approaches, although in some cases the clustering among higher groups might differ. Furthermore, the CV method is always much faster than the K2P method routinely used in constructing profile trees for DNA barcoding. For instance, the alignment of 754 COI sequences (average length 649 bp) from fishes took more than ten hours to complete, while the whole tree construction process using the CV/NJ method required no more than five minutes on the same computer.

Conclusion: The CV method performs well in grouping effectiveness of DNA barcode sequences, as compared to K2P analysis of aligned sequences. It was also able to reduce the time required for analysis by over 15-fold, making it a far superior method for analyzing large datasets. We conclude that the CV method is a fast and reliable method for analyzing large datasets for DNA barcoding purposes.
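
The abstract does not spell out how a composition vector is computed, so the following is only a minimal sketch under stated assumptions. It follows the commonly described CV formulation: overlapping k-mer frequencies are corrected by a Markov-model background estimated from the (k-1)- and (k-2)-mer frequencies, and two vectors are compared through their cosine correlation. The function names, the default word length `k=5` and the handling of ambiguous bases are hypothetical choices for illustration, not the settings used in the paper.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_freqs(seq, k):
    """Relative frequencies of overlapping k-mers; words with non-ACGT symbols are skipped."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    words = [w for w in words if set(w) <= set(ALPHABET)]
    total = len(words)
    if total == 0:
        return {}
    return {w: c / total for w, c in Counter(words).items()}


def composition_vector(seq, k=5):
    """Background-corrected k-mer vector of length 4**k (assumed CV formulation)."""
    if k < 3:
        raise ValueError("k must be at least 3 so that (k-2)-mers exist")
    seq = seq.upper()
    f_k = kmer_freqs(seq, k)
    f_k1 = kmer_freqs(seq, k - 1)
    f_k2 = kmer_freqs(seq, k - 2)
    vec = []
    for letters in product(ALPHABET, repeat=k):
        w = "".join(letters)
        middle = f_k2.get(w[1:-1], 0.0)
        # Markov-model prediction of the k-mer frequency from shorter words.
        expected = f_k1.get(w[:-1], 0.0) * f_k1.get(w[1:], 0.0) / middle if middle else 0.0
        # Relative deviation of the observed frequency from the prediction.
        vec.append((f_k.get(w, 0.0) - expected) / expected if expected else 0.0)
    return vec


def cv_distance(u, v):
    """Distance (1 - C) / 2, where C is the cosine correlation of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    if norm == 0:
        return 0.5  # assumption: treat an all-zero vector as uncorrelated
    return (1.0 - dot / norm) / 2.0
```

Computing `cv_distance` for every pair of unaligned barcode sequences would give a distance matrix that can be passed to any neighbour-joining (NJ) implementation, which is the step where this approach avoids the multiple sequence alignment required before K2P distances can be calculated.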


Available from: Ka Hou Chu
    • "In conclusion, despite the problems of sampling size [104]–[105] and the criticisms on methodological [106], theoretical [107] and empirical grounds [1], [108]–[110], the prospect of DNA barcoding is still promising if it is based on solid foundations of comprehensive taxonomy. The exploration on new analytical methods [59], [93], [99], [105], [111]–[117] and the use of nuclear genes as additional effective DNA barcodes [60], [103] will certainly promote the progress in DNA barcoding. "
    ABSTRACT: We tested the performance of DNA barcoding in Acridoidea and attempted to solve species boundary delimitation problems in selected groups using COI barcodes. Three analysis methods were applied to reconstruct the phylogeny. K2P distances were used to assess the overlap range between intraspecific variation and interspecific divergence. "Best match (BM)", "best close match (BCM)", "all species barcodes (ASB)" and "back-propagation neural networks (BP-based method)" were utilized to test the success rate of species identification. Phylogenetic species concept and network analysis were employed to delimitate the species boundary in eight selected species groups. The results demonstrated that the COI barcode region performed better in phylogenetic reconstruction at genus and species levels than at higher-levels, but showed a little improvement in resolving the higher-level relationships when the third base data or both first and third base data were excluded. Most overlaps and incorrect identifications may be due to imperfect taxonomy, indicating the critical role of taxonomic revision in DNA barcoding study. Species boundary delimitation confirmed the presence of oversplitting in six species groups and suggested that each group should be treated as a single species.
    Full-text · Article · Dec 2013
    • "A match will be considered correct if it is the top unambiguous hit in BLAST, or the nearest implement, is actually quite complex, involving small fragments of the total sequence (wordsize), and extension in a dynamic programming environment. Thus, although now more than 20 years old, some of the algorithm concepts are similar to those implemented in newer methods involving sequence fragments such as nucleotide frequency ranges [20], k-mer spectra [21] and composition vectors [22]. Although BLAST is complex, so too are the algorithms involved in multiple sequence alignment, performed prior to NJ and BL analyses. "
    ABSTRACT: Stipoid grasses (Poaceae, tribe Stipeae) include many species that are highly invasive. In Australia, several species are problematic environmental and economic weeds, degrading pastures, injuring livestock, and invading native grasslands. An accurate means of identification is the first line of defense against importation and establishment of potentially invasive stipoid grasses. Traditional morphological identification relies on floret characters, and because these characters are rarely available in juvenile or fragmentary material, DNA barcodes provide an alternative and rapid means of identification. Although barcodes themselves are tested to ensure appropriate discriminatory variation for identifying query sequences, there are few studies that report the testing of sequence matching algorithms. This limits the utility of sequence databases for DNA barcoding purposes. Therefore, in this study, we tested the efficacy of three sequence matching algorithms for stipoid grasses to determine the method and barcode that worked best. Using several sequence matching algorithms - BLAST, Neighbour Joining and Bayesian Likelihood - we assessed the success of identifying an “unknown” query sequence against a reference database of 206 specimens. The highest accuracy was achieved using the ITS (internal transcribed spacers) barcode region and the BLAST algorithm. The poorest performing barcode and analysis were rbcL and the Bayesian Likelihood analysis. However, the BLAST method was only slightly more successful than Neighbour Joining. Increasing the number of query sequences would further indicate whether this trend is significant for stipoid grasses.
    Full-text · Article · Jan 2013
    • "hods commonly used in barcoding are Neighbour-Joining and Maximum Likelihood; they identify query sequences by their position within a clade (see e.g. the Statistic Assignment Package, Munch et al. 2008; pplacer, Matsen et al. 2010; or the Evolutionary Placement Algorithm, Berger et al. 2011) or their status as sister-branch (see Ross et al. 2008). Chu et al. (2009) proposed a phylogeny-based method, but without alignment and using composition vectors. Generally presented as an alternative to the distance-based methods, character-based methods do not reduce DNA barcodes to distances but rely instead on the presence of diagnostic character states to assign sequences to species. This technique was fi"
    Full-text · Dataset · Nov 2012 · DNA Barcodes