Chapter

The Oxford Nanopore MinION as a Versatile Technology for the Diagnosis and Characterization of Emerging Plant Viruses

Abstract

The emergence of novel viral epidemics that could affect major crops represents a serious threat to global food security. The early and accurate identification of the causative viral agent is the most important step for a rapid and effective response to disease outbreaks. In recent years, the Oxford Nanopore Technologies (ONT) MinION sequencer has been proposed as an effective diagnostic tool for the early detection and identification of emerging viruses in plants, providing many advantages compared with other high-throughput sequencing (HTS) technologies. Here, we provide a step-by-step protocol that we optimized to obtain the virome of “Lamon bean” plants (Phaseolus vulgaris L.), an agricultural product with Protected Geographical Indication (PGI) in north-eastern Italy, which is frequently subject to multiple infections caused by different RNA viruses. The conversion of viral RNA into ds-cDNA enabled the use of the Genomic DNA Ligation Sequencing Kit and the Native Barcoding DNA Kit, which were originally developed for DNA sequencing. This allowed the simultaneous diagnosis of both DNA- and RNA-based pathogens, providing a more versatile alternative to the use of direct RNA and/or direct cDNA sequencing kits.

Article
Complete and accurate identification of genetic variants associated with specific phenotypes can be challenging when there is a high level of genomic divergence between individuals in a study and the corresponding reference genome. We have applied the Cas9-mediated enrichment coupled to nanopore sequencing to perform a targeted de novo assembly and accurately reconstruct a genomic region of interest. This approach was used to reconstruct a 250-kbp target region on chromosome 5 of the common bean genome (Phaseolus vulgaris) associated with the shattering phenotype. Comparing a non-shattering cultivar (Midas) with the reference genome revealed many single-nucleotide variants and structural variants in this region. We cut five 50-kbp tiled sub-regions of Midas genomic DNA using Cas9, followed by sequencing on a MinION device and de novo assembly, generating a single contig spanning the whole 250-kbp region. This assembly increased the number of Illumina reads mapping to genes in the region, improving their genotypability for downstream analysis. The Cas9 tiling approach for target enrichment and sequencing is a valuable alternative to whole-genome sequencing for the assembly of ultra-long regions of interest, improving the accuracy of downstream genotype–phenotype association analysis.
Article
Myotonic dystrophy type 2 (DM2) is caused by CCTG repeat expansions in the CNBP gene, comprising 75 to >11,000 units and featuring extensive mosaicism, making it challenging to sequence fully-expanded alleles. To overcome these limitations, we used PCR-free Cas9-mediated nanopore sequencing to characterize CNBP repeat expansions at the single-nucleotide level in nine DM2 patients. The length of normal and expanded alleles can be assessed precisely using this strategy, agreeing with traditional methods and revealing the degree of mosaicism. We also sequenced an entire ~50 kbp expansion, which has not been achieved previously for DM2 or any other repeat-expansion disorder. Our approach precisely counted the repeats and identified the repeat pattern for both short interrupted and uninterrupted alleles. Interestingly, in the expanded alleles, only two DM2 samples featured the expected pure CCTG repeat pattern, while the other seven also presented TCTG blocks at the 3′ end, which have not been reported before in DM2 patients but were confirmed here with orthogonal methods. The demonstrated approach simultaneously determines repeat length, structure/motif and the extent of somatic mosaicism, promising to improve the molecular diagnosis of DM2 and achieve more accurate genotype-phenotype correlations for the better stratification of DM2 patients in clinical trials.
Article
Plant viruses threaten crop yield and quality; thus, efficient and accurate pathogen diagnostics are critical for crop disease management and control. Recent advances in sequencing technology have revolutionized plant virus research. Metagenomics sequencing technology, represented by next-generation sequencing (NGS), has greatly enhanced the development of virus diagnostics research because of its high sensitivity, high throughput and non-sequence dependence. However, NGS-based virus identification protocols are limited by their high cost, labor intensiveness, and bulky equipment. In recent years, Oxford Nanopore Technologies and advances in third-generation sequencing technology have enabled direct, real-time sequencing of long DNA or RNA reads. Oxford Nanopore Technologies exhibit versatility in plant virus detection through their portable sequencers and flexible data analyses, and are thus widely used in plant virus surveillance, identification of new viruses, viral genome assembly, and evolution research. In this review, we discuss the applications of nanopore sequencing in plant virus diagnostics, as well as their limitations.
Article
‘Lamon bean’ is a protected geographical indication (PGI) for a product of four varieties of bean (Phaseolus vulgaris L.) grown in a specific area of production, which is located in the Belluno district, Veneto region (N.E. of Italy). In the last decade, the ‘Lamon bean’ has been threatened by severe virus epidemics that have compromised its profitability. In this work, the full virome of seven bean samples showing different foliar symptoms was obtained by MinION sequencing. Evidence that emerged from sequencing was validated through RT-PCR and ELISA in a large number of plants, including different ecotypes of Lamon bean and wild herbaceous hosts that may represent a virus reservoir in the field. Results revealed the presence of bean common mosaic virus (BCMV), cucumber mosaic virus (CMV), peanut stunt virus (PSV), and bean yellow mosaic virus (BYMV), which often occurred as mixed infections. Moreover, both CMV and PSV were reported in association with strain-specific satellite RNAs (satRNAs). In conclusion, this work sheds light on the cause of the severe diseases affecting the ‘Lamon bean’ by exploitation of MinION sequencing.
Article
Rapid and sensitive assays for the identification of plant pathogens are necessary for the effective management of crop diseases. The main limitation of current diagnostic testing is the inability to combine broad and sensitive pathogen detection with the identification of key strains, pathovars, and subspecies. Such discrimination is necessary for quarantine pathogens, whose management is strictly dependent on genotype identification. To address these needs, we have established and evaluated a novel all-in-one diagnostic assay based on nanopore sequencing for the detection and simultaneous characterization of quarantine pathogens, using Xylella fastidiosa as a case study. The assay proved to be at least as sensitive as standard diagnostic tests and the quantitative results agreed closely with qPCR-based analysis. The same sequencing results also allowed discrimination between subspecies when present either individually or in combination. Pathogen detection and typing were achieved within 13 min of sequencing owing to the use of an internal control that allowed sequencing to be stopped once sufficient data had accumulated. These advantages, combined with the use of portable equipment, will facilitate the development of next-generation diagnostic assays for the efficient monitoring of other plant pathogens.
Article
Xylella fastidiosa is a vector-borne plant vascular pathogen that has caused devastating disease outbreaks in diverse agricultural crops worldwide. A major global quarantine pathogen, X. fastidiosa can infect hundreds of plant species and can be transmitted by many different xylem sap-feeding insects. Several decades of research have revealed a complex lifestyle dependent on adaptation to the xylem and insect environments and interactions with host plant tissues.
Article
Traditional methods for the analysis of repeat expansions, which underlie genetic disorders such as fragile X syndrome (FXS), lack single-nucleotide resolution in repeat analysis and the ability to characterize causative variants outside the repeat array. These drawbacks can be overcome by long-read and short-read sequencing, respectively. However, the routine application of next-generation sequencing in the clinic requires target enrichment, and none of the available methods allows parallel analysis of long DNA fragments using both sequencing technologies. In this study, we investigated the use of indirect sequence capture (Xdrop technology) coupled to Nanopore and Illumina sequencing to characterize FMR1, the gene responsible for FXS. We achieved the efficient enrichment (> 200×) of large target DNA fragments (~60–80 kbp) encompassing the entire FMR1 gene. The analysis of Xdrop-enriched samples by Nanopore long-read sequencing allowed the complete characterization of repeat lengths in samples with normal, pre-mutation, and full mutation status (> 1 kbp), and correctly identified repeat interruptions relevant for disease prognosis and transmission. Single-nucleotide variants (SNVs) and small insertions/deletions (indels) could be detected in the same samples by Illumina short-read sequencing, completing the mutational testing through the identification of pathogenic variants within the FMR1 gene when no typical CGG repeat expansion is detected. The study successfully demonstrated the parallel analysis of repeat expansions and SNVs/indels in the FMR1 gene at single-nucleotide resolution by combining Xdrop enrichment with two next-generation sequencing approaches. With the appropriate optimization necessary for clinical settings, the system could facilitate the study of genotype–phenotype correlation in FXS and enable more efficient diagnosis and genetic counseling for patients and their relatives.
Article
The adoption of Oxford Nanopore Technologies (ONT) sequencing as a tool in plant virology has been relatively slow despite its promise in more recent years to yield large quantities of long nucleotide sequences in real time without the need for prior amplification. The portability of the MinION and Flongle platforms combined with lowering costs and continued improvements in read accuracy make ONT an attractive method for both low- and high-scale virus diagnostics. Here, we provide a detailed step-by-step protocol using the ONT Flongle platform that we have developed for the routine application on a range of symptomatic post-entry quarantine and domestic surveillance plant samples. The aim of this methods paper is to highlight ONT’s feasibility as a valuable component to the diagnostician’s toolkit and to hopefully stimulate other laboratories towards the eventual goal of integrating high-throughput sequencing technologies as validated plant virus diagnostic methods in their own right.
Article
Since its introduction, nanopore sequencing has enhanced our ability to study complex microbial samples through the possibility to sequence long reads in real time using inexpensive and portable technologies. The use of long reads has made it possible to address several previously unsolved issues in the field, such as the resolution of complex genomic structures, and has facilitated access to metagenome-assembled genomes (MAGs). Furthermore, the low cost and portability of the platforms, together with the development of rapid protocols and analysis pipelines, have established nanopore technology as an attractive and ever-growing tool for real-time in-field sequencing for environmental microbial analysis. This review provides an up-to-date summary of the experimental protocols and bioinformatic tools for the study of microbial communities using nanopore sequencing, highlighting the most important and recent research in the field with a major focus on infectious diseases. An overview of the main approaches including targeted and shotgun approaches, metatranscriptomics, epigenomics, and epitranscriptomics is provided, together with an outlook on the major challenges and perspectives for the use of this technology in microbial studies.
Article
The world’s staple food crops, and other food crops that optimize human nutrition, suffer from global virus disease pandemics and epidemics that greatly diminish their yields and/or produce quality. This situation is becoming increasingly serious because of the human population’s growing food requirements and increasing difficulties in managing virus diseases effectively arising from global warming. This review provides historical and recent information about virus disease pandemics and major epidemics that originated within different world regions, spread to other continents, and now have very wide distributions. Because they threaten food security, all are cause for considerable concern for humanity. The pandemic disease examples described are six (maize lethal necrosis, rice tungro, sweet potato virus, banana bunchy top, citrus tristeza, plum pox). The major epidemic disease examples described are seven (wheat yellow dwarf, wheat streak mosaic, potato tuber necrotic ringspot, faba bean necrotic yellows, pepino mosaic, tomato brown rugose fruit, and cucumber green mottle mosaic). Most examples involve long-distance virus dispersal, albeit inadvertent, by international trade in seed or planting material. With every example, the factors responsible for its development, geographical distribution and global importance are explained. Finally, an overall explanation is given of how to manage global virus disease pandemics and epidemics effectively.
Article
The reconstruction of individual haplotypes can facilitate the interpretation of disease risks; however, high costs and technical challenges still hinder their assessment in clinical settings. Second-generation sequencing is the gold standard for variant discovery but, due to the production of short reads covering small genomic regions, allows only indirect haplotyping based on statistical methods. In contrast, third-generation methods such as the nanopore sequencing platform developed by Oxford Nanopore Technologies (ONT) generate long reads that can be used for direct haplotyping, with fewer drawbacks. However, robust standards for variant phasing in ONT-based target resequencing efforts are not yet available. In this study, we presented a streamlined proof-of-concept workflow for variant calling and phasing based on ONT data in a clinically relevant 12-kb region of the APOE locus, a hotspot for variants and haplotypes associated with aging-related diseases and longevity. Starting with sequencing data from simple amplicons of the target locus, we demonstrated that ONT data allow for reliable single-nucleotide variant (SNV) calling and phasing from as few as 60 reads, although the recognition of indels is less efficient. Even so, we identified the best combination of ONT read sets (600) and software (BWA/Minimap2 and HapCUT2) that enables full haplotype reconstruction when both SNVs and indels have been identified previously using a highly accurate sequencing platform. In conclusion, we established a rapid and inexpensive workflow for variant phasing based on ONT long reads. This allowed for the analysis of multiple samples in parallel and can easily be implemented in routine clinical practice, including diagnostic testing.
Article
Virus disease pandemics and epidemics that occur in the world’s staple food crops pose a major threat to global food security, especially in developing countries with tropical or subtropical climates. Moreover, this threat is escalating rapidly due to increasing difficulties in controlling virus diseases as climate change accelerates and the need to feed the burgeoning global population escalates. One of the main causes of these pandemics and epidemics is the introduction to a new continent of food crops domesticated elsewhere, and their subsequent invasion by damaging virus diseases they never encountered before. This review focusses on providing historical and up-to-date information about pandemics and major epidemics initiated by spillover of indigenous viruses from infected alternative hosts into introduced crops. This spillover requires new encounters at the managed and natural vegetation interface. The principal virus disease pandemic examples described are two (cassava mosaic, cassava brown streak) that threaten food security in sub-Saharan Africa (SSA), and one (tomato yellow leaf curl) doing so globally. A further example describes a virus disease pandemic threatening a major plantation crop producing a vital food export for West Africa (cacao swollen shoot). Also described are two examples of major virus disease epidemics that threaten SSA’s food security (rice yellow mottle, groundnut rosette). In addition, brief accounts are provided of two major maize virus disease epidemics (maize streak in SSA, maize rough dwarf in Mediterranean and Middle Eastern regions), a major rice disease epidemic (rice hoja blanca in the Americas), and damaging tomato tospovirus and begomovirus disease epidemics of tomato that impair food security in different world regions. For each pandemic or major epidemic, the factors involved in driving its initial emergence, and its subsequent increase in importance and geographical distribution, are explained. Finally, clarification is provided over what needs to be done globally to achieve effective management of severe virus disease pandemics and epidemics initiated by spillover events.
Article
Long-read sequencing technologies have substantially improved the assemblies of many isolate bacterial genomes as compared to fragmented short-read assemblies. However, assembling complex metagenomic datasets remains difficult even for state-of-the-art long-read assemblers. Here we present metaFlye, which addresses important long-read metagenomic assembly challenges, such as uneven bacterial composition and intra-species heterogeneity. First, we benchmarked metaFlye using simulated and mock bacterial communities and show that it consistently produces assemblies with better completeness and contiguity than state-of-the-art long-read assemblers. Second, we performed long-read sequencing of the sheep microbiome and applied metaFlye to reconstruct 63 complete or nearly complete bacterial genomes within single contigs. Finally, we show that long-read assembly of human microbiomes enables the discovery of full-length biosynthetic gene clusters that encode biomedically important natural products.
Article
Viruses cause epidemics on all major crops of agronomic importance, and timely and accurate identification is essential for control. High-throughput sequencing (HTS) is a technology that allows the identification of all viruses without prior knowledge of the targeted pathogens. In this paper, we used an HTS technique for the detection and identification of different viral species occurring in single and mixed infections in plants in Poland. We analysed various host plants representing different families. Within the 20 tested samples, we identified a total of 13 different virus species, including those whose presence has not been reported in Poland before: clover yellow mosaic virus (ClYMV) and melandrium yellow fleck virus (MYFV). Due to this new finding, the obtained sequences were compared with others retrieved from GenBank. In addition, cucurbit aphid-borne yellows virus (CABYV) was also detected, and due to the recent occurrence of this virus in Poland, a phylogenetic analysis of these new isolates was performed. The analysis revealed that the CABYV population is highly diverse and that the Polish isolates of CABYV belong to two different phylogenetic groups. Our results showed that HTS-based technology is a valuable diagnostic tool for the identification of different virus species originating from variable hosts, and can provide rapid information about the spectrum of plant viruses previously not detected in a region.
Article
Assessment of bacterial diversity through sequencing of 16S ribosomal RNA (16S rRNA) genes has been an approach widely used in environmental microbiology, particularly since the advent of high-throughput sequencing technologies. An additional innovation introduced by these technologies was the need to develop new strategies to manage and investigate the massive amount of sequencing data generated. This situation stimulated the rapid expansion of the field of bioinformatics with the release of new tools to be applied to the downstream analysis and interpretation of sequencing data mainly generated using Illumina technology. In recent years, a third generation of sequencing technologies has been developed and has been applied in parallel with, and as a complement to, the former sequencing strategies. In particular, Oxford Nanopore Technologies (ONT) introduced nanopore sequencing, which has become very popular among molecular ecologists. Nanopore technology offers a low price, portability and fast sequencing throughput. This powerful technology has been recently tested for 16S rRNA analyses, showing promising results. However, compared with previous technologies, there is a scarcity of bioinformatic tools and protocols designed specifically for the analysis of Nanopore 16S sequences. Due to its notable characteristics, researchers have recently started assessing the suitability of MinION for 16S rRNA sequencing studies, and have obtained remarkable results. Here we present a review of the state of the art of MinION technology applied to microbiome studies, its current possible applications, and the main challenges for its use in 16S rRNA metabarcoding.
Article
Although Kraken's k-mer-based approach provides a fast taxonomic classification of metagenomic sequence data, its large memory requirements can be limiting for some applications. Kraken 2 improves upon Kraken 1 by reducing memory usage by 85%, allowing greater amounts of reference genomic data to be used, while maintaining high accuracy and increasing speed fivefold. Kraken 2 also introduces a translated search mode, providing increased sensitivity in viral metagenomics analysis.
Article
Genetic markers (DNA barcodes) are often used to support and confirm species identification. Barcode sequences can be generated in the field using portable systems based on the Oxford Nanopore Technologies (ONT) MinION sequencer. However, to achieve a broader application, current proof-of-principle workflows for on-site barcoding analysis must be standardized to ensure a reliable and robust performance under suboptimal field conditions without increasing costs. Here, we demonstrate the implementation of a new on-site workflow for DNA extraction, PCR-based barcoding, and the generation of consensus sequences. The portable laboratory features inexpensive instruments that can be carried as hand luggage and uses standard molecular biology protocols and reagents that tolerate adverse environmental conditions. Barcodes are sequenced using MinION technology and analyzed with ONTrack, an original de novo assembly pipeline that requires as few as 1000 reads per sample. ONTrack-derived consensus barcodes have a high accuracy, ranging from 99.8 to 100%, despite the presence of homopolymer runs. The ONTrack pipeline has a user-friendly interface and returns consensus sequences in minutes. The remarkable accuracy and low computational demand of the ONTrack pipeline, together with the inexpensive equipment and simple protocols, make the proposed workflow particularly suitable for tracking species under field conditions.
Article
Background: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results: We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit ( https://github.com/caporaso-lab/tax-credit-data ). Conclusions: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
Article
Here we describe NanoPack, a set of tools developed for visualization and processing of long read sequencing data from Oxford Nanopore Technologies and Pacific Biosciences. Availability and implementation: The NanoPack tools are written in Python3 and released under the GNU GPL3.0 License. The source code can be found at https://github.com/wdecoster/nanopack, together with links to separate scripts and their documentation. The scripts are compatible with Linux, Mac OS and the MS Windows 10 subsystem for Linux and are available as a graphical user interface, a web service at http://nanoplot.bioinf.be and command line tools. Contact: wouter.decoster@molgen.vib-ua.be. Supplementary information: Supplementary tables and figures are available at Bioinformatics online.
Article
As virus diseases cannot be controlled by traditional plant protection methods, the risk of their spread has to be minimized on vegetatively propagated plants, such as grapevine. Metagenomic approaches used for virus diagnostics offer a unique opportunity to reveal the presence of all viral pathogens in the investigated plant, which is why their application can reduce the risk of using infected material for a new plantation. Here we used a special branch of this high-throughput method, deep sequencing of virus-derived small RNAs, for virus diagnostics, and determined viromes of vineyards in Hungary. With NGS of virus-derived small RNAs we could detect not only the viruses tested routinely, but also new ones, which had never been described in Hungary before. Virus presence did not correlate with the age of the plantation; moreover, phylogenetic analysis of the identified virus isolates suggests that infections are mostly caused by the use of infected propagating material. Our results, validated by other molecular methods, raised further questions to be answered before this method can be introduced as a routine, reliable test for grapevine virus diagnostics.
Article
We developed a portable system for 16S rDNA analyses consisting of a nanopore technology-based sequencer, the MinION, and laptop computers, and assessed its potential ability to determine bacterial compositions rapidly. We tested our protocols for time effectiveness and accuracy using a mock bacterial community that contained equimolar 16S rDNA and a pleural effusion from a patient with empyema. MinION sequencing targeting 16S rDNA detected all 20 of the bacterial species present in the mock bacterial community. Time course analysis indicated that the sequence data obtained during the first 5 minutes of sequencing (1,379 bacterial reads) were enough to detect all 20 bacteria in the mock sample and to determine species composition, consistent with the results obtained from 4 hours of sequencing (24,202 reads). Additionally, using a clinical sample extracted from the empyema patient’s pleural effusion, we could identify major bacterial pathogens in that effusion using our rapid sequencing and analysis protocol. All results are comparable to conventional 16S rDNA sequencing results using an IonPGM sequencer. Our results suggest that rapid sequencing and bacterial composition determination are possible within 2 hours after obtaining a DNA sample.
Article
Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either PacBio or Oxford Nanopore technologies, and achieves a contig NG50 of greater than 21 Mbp on both human and Drosophila melanogaster PacBio datasets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.
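The MinHash idea mentioned in this abstract can be illustrated with a minimal sketch. It shows plain, unweighted MinHash for estimating the k-mer Jaccard similarity of two sequences; it is not Canu's tf-idf weighted variant and does not reproduce Canu's overlapper. The function names, the k-mer size, and the number of hash functions are hypothetical choices made for illustration only.

```python
import hashlib


def kmer_set(seq, k):
    """Return the set of all k-mers in seq (seq must be at least k long)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """64-bit hash of a k-mer; the seed selects one of many hash functions."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(seq, k=5, num_hashes=64):
    """Keep, for each hash function, the smallest hash over all k-mers."""
    kmers = kmer_set(seq, k)
    return [min(_hash(km, seed) for km in kmers) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates the Jaccard index."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    read_a = "ACGTACGGTCAGTTACGATCGATCGGA"
    read_b = "ACGTACGGTCAGTTACGATCGTTCGGA"  # one substitution
    print(estimate_jaccard(minhash_signature(read_a), minhash_signature(read_b)))
```

Comparing two short signatures is much cheaper than comparing full k-mer sets, which is why sketching of this kind is attractive for finding candidate overlaps among many noisy long reads.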
Article
Background: VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010) for which the source code is not publicly available, algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use. Methods: When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences in order to quickly identify similar sequences; a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads. Results: VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion. VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-end read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end read merging and dereplication. VSEARCH is available at https://github.com/torognes/vsearch under either the BSD 2-clause license or the GNU General Public License version 3.0. Discussion: VSEARCH has been shown to be a fast, accurate and full-fledged alternative to USEARCH. A free and open-source versatile tool for sequence analysis is now available to the metagenomics community.
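The two-stage search described in the VSEARCH abstract (a shared-word prefilter followed by optimal global alignment with full dynamic programming) can be sketched as follows. This is a simplified illustration, not VSEARCH's code: the function names, the word length, and the scoring values (match, mismatch, and a linear gap penalty) are assumptions, and no vectorisation or threading is attempted.

```python
def shared_words(query, target, k=3):
    """Count distinct length-k words that occur in both sequences."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    t = {target[i:i + k] for i in range(len(target) - k + 1)}
    return len(q & t)


def global_alignment_score(a, b, match=2, mismatch=-4, gap=-5):
    """Needleman-Wunsch score with a linear gap penalty (hypothetical values).

    Only the previous row of the dynamic programming matrix is kept,
    so memory use is proportional to len(b).
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def best_hit(query, targets, k=3):
    """Pick the target sharing the most words, then align it globally."""
    candidate = max(targets, key=lambda t: shared_words(query, t, k))
    return candidate, global_alignment_score(query, candidate)
```

The prefilter keeps the expensive quadratic alignment limited to promising candidates, which is the general idea the abstract describes.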
Article
The revolution of genome sequencing is continuing after the successful second-generation sequencing (SGS) technology. The third-generation sequencing (TGS) technology, led by Pacific Biosciences (PacBio), is progressing rapidly, moving from a technology once only capable of providing data for small genome analysis, or for performing targeted screening, to one that promises high quality de novo assembly and structural variation detection for human-sized genomes. In 2014, the MinION, the first commercial sequencer using nanopore technology, was released by Oxford Nanopore Technologies (ONT). MinION identifies DNA bases by measuring the changes in electrical conductivity generated as DNA strands pass through a biological pore. Its portability, affordability, and speed in data production make it suitable for real-time applications, and the release of this long-read sequencer has thus generated much excitement and interest in the genomics community. Whilst de novo genome assemblies can be cheaply produced from SGS data, assembly continuity is often relatively poor, due to the limited ability of short reads to handle long repeats. Assembly quality can be greatly improved by using TGS long reads, since repetitive regions can be more readily resolved with longer reads, despite their higher error rates at the base level. The potential of nanopore sequencing has been demonstrated by various studies in genome surveillance at locations where rapid and reliable sequencing is needed, but where resources are limited.
Article
DNA-based taxonomic and functional profiling is widely used for the characterization of organismal communities across a rapidly increasing array of research areas that include the role of microbiomes in health and disease, biomonitoring, and estimation of both microbial and metazoan species richness. Two principal approaches are currently used to assign taxonomy to DNA sequences: DNA metabarcoding and metagenomics. When initially developed, each of these approaches mandated their own particular methods for data analysis; however, with the development of high-throughput sequencing (HTS) techniques they have begun to share many aspects in data set generation and processing. In this review we aim to define the current characteristics, goals and boundaries of each field, and describe the different software used for their analysis. We argue that an appreciation of the potential and limitations of each method can help underscore the improvements required by each field so as to better exploit the richness of current HTS-based data sets. © The Author 2015. Published by Oxford University Press.
Article
All species are hierarchically related to one another, and we use taxonomic names to label the nodes in this hierarchy. Taxonomic data is becoming increasingly available on the web, but scientists need a way to access it in a programmatic fashion that’s easy and reproducible. We have developed taxize, an open-source software package (freely available from http://cran.r-project.org/web/packages/taxize/index.html) for the R language. taxize provides simple, programmatic access to taxonomic data for 13 data sources around the web. We discuss the need for a taxonomic toolbelt in R, and outline a suite of use cases for which taxize is ideally suited (including a full workflow as an appendix). The taxize package facilitates open and reproducible science by allowing taxonomic data collection to be done in the open-source R platform.
Article
Emerging infectious diseases (EIDs) caused by plant pathogens can develop into unexpected and very serious epidemics, owing to the influence of various characteristics of the pathogen, host and environment. Devastating epidemics, having social implications by increasing the rate of urbanization, occurred in the past in Europe, and many other EIDs still occur with high frequency in developing countries. Although the ability to diagnose diseases and the technologies available for their control are far greater than in the past, EIDs are still able to cause tremendous crop losses, the economic and social impact of which, in developing countries, is often underestimated. In the present article, four of the most important EIDs in developing countries are considered from the standpoint of their origin, characteristics, symptoms, mode of spread, possible control strategies, economic impact and the socio-economic consequences of their dissemination. They are Cassava Mosaic Virus Disease, capable of reducing yields by 80–90% and causing the suspension of cassava cultivation in many areas of East Africa; Striga hermonthica, a parasitic weed affecting cereals in an area of at least 5 million hectares in Sub-Saharan Africa; Xanthomonas Wilt of Banana, a bacterial disease that caused around 50% yield losses at the beginning of 21st century in Uganda and is threatening the food security of about 70 million people owing to its impact on an important staple crop; and race Ug99 of the rust fungus Puccinia graminis f. sp. tritici, which is having a tremendous impact on wheat in Uganda, and is also threatening most of the wheat-growing countries of the world.
Article
A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license, and available from: http://krona.sourceforge.net.
Article
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
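The maximal segment pair (MSP) score named in the BLAST abstract is the highest score of any pair of equal-length, ungapped segments taken from the two sequences. The sketch below computes it exhaustively by scanning every diagonal with a running maximum; it illustrates the quantity that BLAST approximates, not BLAST's own heuristic. The function name and the match and mismatch values are assumptions chosen for illustration.

```python
def msp_score(a, b, match=5, mismatch=-4):
    """Exhaustive maximal segment pair score for two sequences.

    Each diagonal (fixed offset between positions in a and b) is scanned
    once, tracking the best-scoring contiguous run of aligned positions.
    An empty segment pair scores 0, so the result is never negative.
    """
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        i, j = max(offset, 0), max(-offset, 0)
        running = 0
        while i < len(a) and j < len(b):
            step = match if a[i] == b[j] else mismatch
            # Either extend the current segment or start a new one here.
            running = max(running, 0) + step
            best = max(best, running)
            i += 1
            j += 1
    return best
```

This exhaustive scan takes time proportional to the product of the two sequence lengths, which is the cost that a heuristic such as BLAST is designed to avoid.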
Article
A vast number of plant pathogens from viroids of a few hundred nucleotides to higher plants cause diseases in our crops. Their effects range from mild symptoms to catastrophes in which large areas planted to food crops are destroyed. Catastrophic plant disease exacerbates the current deficit of food supply in which at least 800 million people are inadequately fed. Plant pathogens are difficult to control because their populations are variable in time, space, and genotype. Most insidiously, they evolve, often overcoming the resistance that may have been the hard-won achievement of the plant breeder. In order to combat the losses they cause, it is necessary to define the problem and seek remedies. At the biological level, the requirements are for the speedy and accurate identification of the causal organism, accurate estimates of the severity of disease and its effect on yield, and identification of its virulence mechanisms. Disease may then be minimized by the reduction of the pathogen's inoculum, inhibition of its virulence mechanisms, and promotion of genetic diversity in the crop. Conventional plant breeding for resistance has an important role to play that can now be facilitated by marker-assisted selection. There is also a role for transgenic modification with genes that confer resistance. At the political level, there is a need to acknowledge that plant diseases threaten our food supplies and to devote adequate resources to their control.
Article
Over the last decade, virologists have discovered an unprecedented number of viruses using high throughput sequencing (HTS), which led to the advancement of our knowledge on the diversity of viruses in nature, particularly unraveling the virome of many agricultural crops. However, these new virus discoveries have often widened the gaps in our understanding of virus biology; the forefront of which is the actual role of a new virus in disease, if any. Yet, when used critically in etiological studies, HTS is a powerful tool to establish disease causality between the virus and its host. Conversely, with globalization, movement of plant material is increasingly more common and often a point of dispute between countries. HTS could potentially resolve these issues given its capacity to detect and discover. Although many pipelines are available for plant virus discovery, all share a common backbone. A description of the process of plant virus detection and discovery from HTS data is presented, along with a summary of the different pipelines available to researchers.
Article
Reliable detection and identification of plant pathogens are essential for disease control strategies. Diagnostic methods commonly used to detect plant pathogens have limitations: they require prior knowledge of the genome sequence, have low sensitivity, and are limited in their ability to detect several pathogens simultaneously. The development of advanced DNA sequencing technologies has enabled determination of total nucleic acid content in biological samples. The possibility of using the single molecule sequencing platform of Oxford Nanopore as a general method for diagnosis of plant diseases was examined. It was tested by sequencing DNA or RNA isolated from symptomatic tissues of plants of several families inoculated with known pathogens (e.g., bacteria, viruses, fungi, phytoplasma). Additionally, samples of groups of 200 seeds containing one infected seed of each of two or three pathogens, as well as symptomatic samples with unidentified pathogens, were tested. Sequencing results were analyzed with Nanopore data-analysis tools. In all the inoculated plants the pathogens were identified in real time within one to two hours of running the Nanopore sequencer, and were classified to the species or pathovar level. DNA sequencing or direct RNA sequencing results for samples with unidentified disease agents were validated by conventional diagnostic procedures (e.g., PCR, ELISA, Koch test), which supported the results obtained by Nanopore sequencing. The advantages of this technology include: long read lengths, fast run times, portability, low cost and the possibility of use in every laboratory. This study indicates that adoption of the Nanopore platform will be greatly advantageous for routine laboratory diagnosis.
Article
Motivation: Recent advances in sequencing technologies promise ultra-long reads of ∼100 kilobases (kb) on average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs either cannot process such data or do so inefficiently at scale, which presses for the development of new alignment algorithms. Results: Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions (INDELs) and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. Availability and implementation: https://github.com/lh3/minimap2. Contact: hengli@broadinstitute.org.
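One common way to obtain a concave gap cost of the kind mentioned in the abstract is to take the minimum of two affine penalties, so that long gaps are charged at a lower per-base rate than short ones. The sketch below illustrates that idea only; the function name and the parameter values are assumptions for illustration and should not be read as a statement of Minimap2's internals.

```python
def two_piece_gap_cost(length, open1=4, ext1=2, open2=24, ext2=1):
    """Cost of a gap of the given length under a two-piece affine model.

    The first piece (cheap to open, expensive to extend) dominates for
    short gaps; the second piece (expensive to open, cheap to extend)
    dominates for long gaps. Taking the minimum yields a concave function.
    """
    if length <= 0:
        return 0
    return min(open1 + length * ext1, open2 + length * ext2)


def is_concave(costs):
    """Check that successive increments never grow."""
    steps = [b - a for a, b in zip(costs, costs[1:])]
    return all(later <= earlier for earlier, later in zip(steps, steps[1:]))
```

With these illustrative values the two pieces cross at a gap length of 20, after which each extra gap base adds 1 instead of 2 to the cost.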
Article
The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
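The adaptive-seed idea in the abstract (choosing matches by rareness instead of by a fixed length) can be sketched as follows: starting at each query position, the match is lengthened until it occurs no more than a chosen number of times in the reference. This is a naive illustration using repeated string searches, not LAST's implementation, which would rely on an index structure; the function names and the `max_occurrences` parameter are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def adaptive_seeds(query, reference, max_occurrences=1):
    """Return (query_start, seed_length, occurrences) for each rare match.

    For every query start position, the seed grows one base at a time
    until it is rare enough in the reference. Positions whose match
    disappears before becoming rare produce no seed.
    """
    seeds = []
    for start in range(len(query)):
        for end in range(start + 1, len(query) + 1):
            n = count_occurrences(reference, query[start:end])
            if n == 0:
                break  # no match of this length, so longer ones cannot exist
            if n <= max_occurrences:
                seeds.append((start, end - start, n))
                break
    return seeds
```

For example, with reference `"ACACACGT"` and query `"ACG"`, the seed starting at the first base must grow to length 3 before it becomes unique, whereas the seed starting at `G` is already unique at length 1. Repetitive regions therefore get longer seeds and unique regions get shorter ones.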
Article
Emerging infectious diseases (EIDs) pose threats to conservation and public health. Here, we apply the definition of EIDs used in the medical and veterinary fields to botany and highlight a series of emerging plant diseases. We include EIDs of cultivated and wild plants, some of which are of significant conservation concern. The underlying cause of most plant EIDs is the anthropogenic introduction of parasites, although severe weather events are also important drivers of disease emergence. Much is known about crop plant EIDs, but there is little information about wild-plant EIDs, suggesting that their impact on conservation is underestimated. We conclude with recommendations for improving strategies for the surveillance and control of plant EIDs.
Anderson PK, Cunningham AA, Patel NG et al Emerging infectious diseases of plants: pathogen pollution, climate change and agrotechnology drivers
Chamberlain SA, Szöcs E taxize: taxonomic search and retrieval in R. F1000Research 2:191
Marcolungo L, Passera A, Maestri S et al (2022) Real-time on-site diagnosis of quaran…
Tarquini G, Martini M, Maestri SF et al (2022) The virome of 'Lamon Bean': application of … Lamon Area, Italy. Plants 11:779
Koren S, Walenz BP, Berlin K et al (2017) Canu: scalable and accurate long-read …