Coverage requirements for de novo genome with Illumina + PacBio?
I'd like to sequence the genome of the gopher tortoise. The genomes of congeners are ~2.4 Gb. I'm trying to decide how much coverage is necessary; we plan to run the sample on a portion of a NovaSeq run and on at least one PacBio SMRT Cell. I'm trying to evaluate the benefits of additional sequencing effort: my starting point would be something like 30x coverage from the 2x150 NovaSeq run and one PacBio SMRT Cell (~8x coverage with HiFi reads? Not so sure about this), but I'm wondering how necessary a second PacBio cell, or additional Illumina reads, would be for assembling a nice genome.
We don't have any tissues available for transcriptomics, and the immediate application will be to map whole-genome methylation seq reads to the genome.
I'm pretty new to all of this, so any suggestions or references to guidelines are most welcome! Thanks!
Sequencing the genome of the gopher tortoise is a complex task that requires careful planning and consideration of various factors, such as sequencing coverage and the use of different sequencing platforms.
Given a genome size of about 2.4 Gb and your plan to run the sample on a portion of a NovaSeq run and at least one PacBio SMRT Cell, 30x coverage from the 2x150 NovaSeq run and 8x coverage with HiFi reads from one SMRT Cell should be a good starting point for genome assembly.
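To make those coverage figures concrete, here is a minimal sketch of the underlying arithmetic (coverage = total sequenced bases / genome size). The function names are made up for illustration, and the 2.4 Gb genome size is the congener-based estimate from the question, not a measured value. Actual yield per lane or per SMRT Cell varies by instrument and library, so check with your sequencing facility.

```python
import math

def bases_needed(genome_size_bp: int, coverage: float) -> int:
    """Total sequenced bases required to reach a target mean coverage."""
    return math.ceil(genome_size_bp * coverage)

def read_pairs_needed(genome_size_bp: int, coverage: float, read_len: int = 150) -> int:
    """Number of paired-end read pairs (2 x read_len) for a target coverage."""
    return math.ceil(bases_needed(genome_size_bp, coverage) / (2 * read_len))

GENOME_SIZE = 2_400_000_000  # ~2.4 Gb, assumed from congeners

illumina_bases = bases_needed(GENOME_SIZE, 30)       # 30x Illumina
illumina_pairs = read_pairs_needed(GENOME_SIZE, 30)  # 2x150 read pairs
hifi_bases = bases_needed(GENOME_SIZE, 8)            # 8x HiFi

print(f"30x Illumina: {illumina_bases / 1e9:.1f} Gb, {illumina_pairs / 1e6:.0f} M read pairs")
print(f"8x HiFi:      {hifi_bases / 1e9:.1f} Gb")
```

So 30x on a 2.4 Gb genome is about 72 Gb of Illumina data (roughly 240 million 2x150 read pairs), and 8x HiFi is about 19.2 Gb.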
However, the optimal sequencing coverage for genome assembly can vary depending on the complexity of the genome, the quality of the sample, and the desired level of accuracy. In general, a higher sequencing coverage will result in a more accurate and complete genome assembly, but it also comes with additional costs and resources.
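One rough way to see why more coverage helps is the classic Lander-Waterman (Poisson) approximation, sketched below. It assumes reads land uniformly at random, which real libraries do not, and it ignores repeats entirely, so treat it as an idealized lower bound on the problem rather than a prediction for this genome. The function names are illustrative.

```python
import math

def fraction_uncovered(mean_coverage: float) -> float:
    """Expected fraction of bases with zero reads under a Poisson model."""
    return math.exp(-mean_coverage)

def fraction_covered_at_least(mean_coverage: float, min_depth: int) -> float:
    """Expected fraction of bases covered by at least min_depth reads."""
    below = sum(
        math.exp(-mean_coverage) * mean_coverage**k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - below

for cov in (8, 16, 30):
    print(
        f"{cov}x: uncovered={fraction_uncovered(cov):.2e}, "
        f"depth>=5: {fraction_covered_at_least(cov, 5):.4f}"
    )
```

Under this model almost every base is touched at least once even at 8x, but the fraction of the genome with enough depth for confident consensus or overlap detection rises noticeably between 8x and 16x.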
To judge whether your coverage was sufficient, you can evaluate the resulting assembly with a tool such as QUAST (Quality Assessment Tool for Genome Assemblies), which reports contiguity statistics and, when a reference is available, completeness and accuracy metrics. Note that QUAST assesses an assembly after the fact; it is not a coverage calculator.
Another option to consider is to use long-read sequencing technologies like PacBio or Oxford Nanopore. These technologies provide long reads that can span repeat regions and help to improve the accuracy and completeness of the genome assembly.
In terms of the additional sequencing effort, it depends on the specific goals of your project and the resources available. A second PacBio SMRT Cell or additional Illumina reads could help to improve the accuracy and completeness of the genome assembly, but this will also depend on the specific characteristics of the sample, such as its quality and degree of complexity.
Since you don't have any tissue available for transcriptomics, and the immediate application will be to map whole-genome methylation-seq reads to the genome, you should also take the requirements of that application into account.
Institute of Bioorganic Chemistry Polish Academy of Science
I know people are trying to help, but pasting what ChatGPT produces without filtering the output isn't the best idea.
First, I would advise you to estimate the level of repeats in this particular genome from existing data. The more repetitive it is, the more challenging it's going to be, even for PacBio. I would trade Illumina coverage for far more PacBio HiFi data and, specifically, ask the sequencing facility to enrich for long DNA molecules before building the SMRTbell library.
As an example, you can google the latest assembly of the planarian genome (Schmidtea mediterranea). It is highly repetitive, around the same size as your organism, and was comprehensively assembled only recently, so the majority of the specs used there should apply.