How many lanes of an Illumina sequencer flow cell would be necessary to study a soil metagenome using shotgun DNA sequencing?

I was thinking of a paired-end system that gives me 250 bp reads, which would be joined into contigs of 500 bp.
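Two paired reads only become one longer sequence when they overlap, so the joined length equals the insert (fragment) size rather than always 500 bp. Below is a minimal sketch of that merging step. It assumes uppercase sequences and exact-match overlaps. Real read mergers such as FLASH or PEAR also weigh base qualities and tolerate mismatches.

```python
# Minimal sketch: join an overlapping read pair into one longer fragment.
# Assumptions: uppercase A/C/G/T/N sequences and an exact-match overlap;
# real mergers also use base qualities and tolerate mismatches.
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 20) -> Optional[str]:
    """Merge read 1 with the reverse complement of read 2.

    Tries the longest possible overlap first and returns None when no exact
    overlap of at least `min_overlap` bases exists (the pair stays unmerged).
    """
    r2_rc = reverse_complement(r2)
    for overlap in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-overlap:] == r2_rc[:overlap]:
            # The merged length equals the original fragment (insert) length.
            return r1 + r2_rc[overlap:]
    return None
```

Take a 400 bp insert read with 2x250 bp reads. The reads overlap by 100 bp, and the merged sequence is 400 bp, not 500 bp. Inserts longer than about 480 bp (500 minus `min_overlap`) cannot be merged at all. Because this naive longest-first search takes the first exact match, low-complexity or repetitive sequence can produce a false overlap.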


  • Daniel Kumazawa Morais · Universidade Federal de Viçosa (UFV)
    Thank you for the answer! I thought it would be easier to decide.
  • Jay Siddharth · Nestlé S.A.
    It depends on:
    1) which Illumina platform and protocol you use;
    2) most importantly, the experimental design and objective;
    3) whether the target niche is expected to be rich in diversity;

    and much more. Unfortunately, there is no rule of thumb.
  • Chris Hoffmann · University of Pennsylvania
    Hello Daniel,
    Although the points raised are valid, you can probably get enough data to do good work from a relatively "small" dataset.
    On projects where we compare several samples (e.g. time series or cross-sectional studies), we routinely sequence up to 12 samples per HiSeq 2000 lane, and this gives us plenty of data.
    You may also consider the MiSeq platform: it has only one lane and yields fewer reads than a HiSeq lane, but the low cost is really attractive.
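The arithmetic behind "12 samples per lane" is simple: divide the lane yield by the number of multiplexed samples. This rough sketch assumes about 40 Gb per lane, which is the 2x100 bp HiSeq figure quoted elsewhere in this thread. It also assumes perfectly even pooling. The per-sample target depth is a hypothetical input that you would have to choose for your own soil samples. Real demultiplexed yields vary from sample to sample.

```python
# Back-of-the-envelope multiplexing arithmetic; every input is an assumption.
import math


def per_sample_gb(lane_yield_gb: float, samples_per_lane: int) -> float:
    """Data per sample when `samples_per_lane` libraries share one lane evenly."""
    return lane_yield_gb / samples_per_lane


def lanes_needed(n_samples: int, target_gb_per_sample: float,
                 lane_yield_gb: float) -> int:
    """Smallest whole number of lanes that gives every sample the target depth."""
    total_gb = n_samples * target_gb_per_sample
    return math.ceil(total_gb / lane_yield_gb)
```

For example, `per_sample_gb(40.0, 12)` comes to about 3.3 Gb per sample. With a hypothetical target of 5 Gb each, 24 samples need `lanes_needed(24, 5.0, 40.0)`, which is 3 lanes. A highly diverse soil community may justify a much larger target than a less diverse sample.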
  • Daniel Kumazawa Morais · Universidade Federal de Viçosa (UFV)
    Hi Chris!
    That was the kind of answer I was looking for. Are you working with 16S PCR products or with whole DNA extracted from the soil? And how do the assembly and the metabolic annotation against databases like KEGG work out?
    Thank you for your answer.
  • Chris Hoffmann · University of Pennsylvania
    Hi Daniel,

    We are currently working with gut microbiome samples, so there is loads of diversity.

    Assembly is... complicated, to say the least. We usually spend some time at the beginning just trying different assembly strategies on a few samples, to get good-quality, long contigs, before running the whole plate through.
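A common way to compare assembly runs is by contig length statistics such as N50. The minimal sketch below computes N50 from the contigs in a FASTA file. It does not come from any of the tools mentioned in this thread. Note that N50 alone says nothing about misassemblies.

```python
# Minimal sketch: contig lengths from FASTA lines, and their N50.
from typing import Iterable, List, Optional


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Sequence lengths from FASTA-formatted lines (multi-line records allowed)."""
    lengths: List[int] = []
    current: Optional[int] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return lengths[-1]  # only reached if lengths are non-positive
```

Usage: `with open("contigs.fa") as fh: print(n50(fasta_lengths(fh)))`. The file name here is hypothetical. A higher N50 at a similar total assembly size usually means longer contigs.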

    Our preferred assembler right now is IDBA-UD,
    and we map the reads back to the contigs using Bowtie 2.

    Annotation can be done with a variety of tools. CAMERA has a nice pipeline that you can use online (if you don't have too much data), or you can contact them to install it locally. It includes KEGG and COG annotations and 16S prediction/searches. Just remember that it probably uses an outdated version of KEGG, as the current database isn't free anymore, but it is still pretty good.

    Incidentally, we use a lot of custom-made code to annotate our data.
    Best of luck,
  • Daniel Kumazawa Morais · Universidade Federal de Viçosa (UFV)
    Thank you, fellow! It's a great help. I'm going to try CAMERA.
    Thank you.
  • Archana Chauhan · University of Tennessee
    Hi Daniel,
    I agree with Chris; your questions are valid and important. The Illumina HiSeq platform gives you a large amount of data: for a 2x100 bp DNA run you can get up to 40 Gb of good-quality data. MiSeq (especially the v2 hardware) offers a cheap alternative that still gives you sufficient data to get good results. I am most satisfied with the MiSeq 2x150 kit in comparison to the 2x250; the 2x250 drastically loses quality after 10 bp or so.
    Since you are interested in 16S amplicons, a MiSeq 2x10 run should give you enough data and coverage to answer your questions.

    Best wishes
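You can check on your own data where a kit's quality falls off along the read. The minimal sketch below averages Phred scores per cycle from an uncompressed FASTQ file. It assumes Phred+33 encoding (Illumina 1.8+ output) and strict four-line records. The Q30 cut-off is an arbitrary, hypothetical choice. FastQC's per-base sequence quality plot reports the same kind of profile.

```python
# Minimal sketch: mean Phred quality at each read position (cycle) in a FASTQ.
# Assumes Phred+33 encoding and strict 4-line records (header, seq, '+', qual).
from typing import Iterable, List, Optional


def mean_quality_per_cycle(fastq_lines: Iterable[str], offset: int = 33) -> List[float]:
    totals: List[int] = []
    counts: List[int] = []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # only the 4th line of each record holds qualities
            continue
        for pos, char in enumerate(line.rstrip("\r\n")):
            if pos == len(totals):  # first read this long: extend the tallies
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def first_cycle_below(means: List[float], threshold: float = 30.0) -> Optional[int]:
    """1-based cycle at which the mean quality first drops below `threshold`."""
    for cycle, mean in enumerate(means, start=1):
        if mean < threshold:
            return cycle
    return None
```

Run it on a subsample of the R1 and R2 files from each kit, e.g. `with open("sample_R1.fastq") as fh: means = mean_quality_per_cycle(fh)` (hypothetical file name). Then compare where `first_cycle_below(means)` lands for the 2x150 and 2x250 runs.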
  • Daniel Kumazawa Morais · Universidade Federal de Viçosa (UFV)
    Hi Archana.

    Thank you for the answer. That was useful information about the loss of quality in the MiSeq 2x250 kit; I didn't know that.

    Best regards.

  • Archana Chauhan · University of Tennessee
    Correction: Daniel, please read 10 bp as 100 bp. Sorry for the typo.

