Content uploaded by Matthieu Leray
Author content
All content in this area was uploaded by Matthieu Leray on Aug 02, 2016
Content may be subject to copyright.
Sarah J. Bourlat (ed.), Marine Genomics: Methods and Protocols, Methods in Molecular Biology, vol. 1452,
DOI 10.1007/978-1-4939-3774-5_15, © Springer Science+Business Media New York 2016
Chapter 15
Visualizing Patterns of Marine Eukaryotic Diversity
from Metabarcoding Data Using QIIME
Matthieu Leray and Nancy Knowlton
Abstract
PCR amplification followed by deep sequencing of homologous gene regions is increasingly used to characterize the diversity and taxonomic composition of marine eukaryotic communities. This approach may generate millions of sequences for hundreds of samples simultaneously. Therefore, tools that researchers can use to visualize complex patterns of diversity for these massive datasets are essential. Efforts by microbiologists to understand the Earth and human microbiomes using high-throughput sequencing of the 16S rRNA gene have led to the development of several user-friendly, open-source software packages that can be similarly used to analyze eukaryotic datasets. Quantitative Insights Into Microbial Ecology (QIIME) offers some of the most helpful data visualization tools. Here, we describe functionalities to import OTU tables generated with any molecular marker (e.g., 18S, COI, ITS) and associated metadata into QIIME. We then present a range of analytical tools implemented within QIIME that can be used to obtain insights about patterns of alpha and beta diversity for marine eukaryotes.
Key words Metabarcoding, QIIME, Alpha diversity, Beta diversity, Principal component analysis, Rarefaction
1 Introduction
The world’s oceans harbor an immense diversity of life, estimated at between 0.3 and 2.2 million species belonging to 31 phyla [1, 2]. Yet, with only about 0.25 million formally described species to date, a considerable portion of that diversity remains either unknown to science or without a formal description. In addition, diagnostic morphological characters used to differentiate species can be very subtle in some invertebrate groups or even absent among microscopic soft-bodied taxa (e.g., nematodes). This taxonomic impediment has dramatically limited our understanding of the way ocean ecosystems function and respond to environmental changes, with most studies focusing on just a few indicator metazoan taxa [3]. The realization that we might not be able to study ocean diversity using morphology alone has led more and more researchers toward DNA approaches to characterize and monitor diversity [4, 5].
DNA sequencing expands the taxonomic coverage of ecological studies by providing rapid and reliable species identifiers that do not depend on having taxonomic expertise. Building upon DNA barcode resources and taking advantage of the availability of affordable High-Throughput Sequencing (HTS) technologies, the concept of DNA metabarcoding is now revolutionizing our understanding of patterns of marine diversity. Most commonly, general primer sets are used to mass amplify (via Polymerase Chain Reaction or PCR) a short hypervariable DNA fragment from a collection of organisms mixed together. Reads obtained using an HTS platform are then sorted bioinformatically to delineate Operational Taxonomic Units (OTUs) and provide estimates of richness and community composition [6]. This metabarcoding approach [7] is now widely used, and the scientific community has converged toward using standard DNA markers such as the 18S nuclear small subunit (nSSU) ribosomal RNA and the mitochondrial cytochrome c oxidase subunit I (COI) genes to target a wide taxonomic range of organisms.
The ability to obtain community profiles from hundreds of samples offers the potential for an unprecedented understanding of marine diversity. While powerful analysis tools are essential to handle very large sequence datasets, visualizing patterns of diversity for complex datasets also represents a major challenge. For example, researchers need to be able to see how community profiles vary in relation to each other and in relation to various metadata variables. Quantitative Insights Into Microbial Ecology (QIIME) [8] is one of the most powerful data visualization tools. Like many other sequence data analysis workflows (e.g., Mothur [9]), QIIME was developed to analyze microbial 16S rRNA datasets, but its functionalities can be used to analyze OTU tables obtained with any molecular marker.
QIIME combines many separate programs into a user-friendly software package to perform analysis from the raw data generated by any HTS platform to the graphical representation of the data. Functionalities available within QIIME for sequence processing include sample demultiplexing, OTU picking, phylogenetic analysis, and taxonomic assignments, all of which are also implemented in other software packages (e.g., Mothur [9], CloVR [10], LotuS [11]). On the other hand, QIIME offers some unique interactive graphic tools to explore patterns of diversity in relation to metadata variables (e.g., three-dimensional visualizations with EMPeror [12]). In this chapter, we first describe how to import an OTU table and associated metadata into QIIME. We then present a QIIME tutorial to represent the taxonomic composition (e.g., histograms, heatmaps), plot OTU diversity (alpha diversity), and plot dissimilarities in OTU composition between samples (beta diversity). We illustrate community analysis using amplicon data comparing OTU composition between sessile communities collected on settlement plates in Florida and Virginia (USA). Samples were characterized using HTS of the mitochondrial cytochrome c oxidase subunit I (COI) region [5].
2 Materials
2.1 QIIME Packages Available
To avoid installing each program used within QIIME separately, the development team provides several full installation packages.
1. The precompiled MacQIIME software package for Mac OS X (http://www.wernerlab.org/software/macqiime). The following tutorial assumes that the user is working with MacQIIME.
2. The QIIME Virtual Box, which can be installed and run on Mac OS X, Windows, and Linux. Instructions are provided at http://qiime.org/install/virtual_box.html
3. The QIIME virtual machine on Amazon Web Services.
2.2 Getting Ready with MacQIIME 1.9.1
MacQIIME is a precompiled, easy-to-install, and easy-to-use version of QIIME that is maintained by the Werner lab (http://www.wernerlab.org). Version 1.9.1 requires Mac OS X version 10.7 (Lion) or later.
1. Download the compressed tar archive of MacQIIME from http://www.wernerlab.org/software/macqiime
2. Unarchive MacQIIME 1.9.1. Open your terminal and type:
$ cd ~/Downloads
$ tar -xvf MacQIIME_1.9.1-20150604_OS10.7.tar
3. Install MacQIIME 1.9.1.
$ cd MacQIIME_1.9.1-20150604_OS10.7
$ ./install.s
4. Launch MacQIIME 1.9.1.
$ macqiime
2.3 BIOM Formatted OTU Table (Input File Required)
Regardless of the target gene and the bioinformatics pipeline used for sample demultiplexing, quality filtering, OTU clustering, and taxonomic assignment, the final product of any metabarcoding study is a “sample by observation contingency table” displaying the number of reads per OTU and per sample. The file format traditionally used to represent a raw contingency table is a tab-delimited text file, termed a classic OTU table. A classic OTU table can be easily visualized and manipulated in Excel or TextWrangler, but it has been deemed an inefficient file format for cross-study comparisons, for transferring data between software packages, and for optimizing the use of disk space [13]. As a result, QIIME now requires classic OTU tables to be converted into the Biological Observation Matrix (BIOM) file format.
1. Prepare your classic OTU table as shown in Table 1 and save it as a tab-delimited text file called “otu_table.txt” in a directory called “qiime_analysis.”
2. Open your terminal and navigate to the directory “qiime_analysis” where the classic OTU table is located.
$ cd ~/qiime_analysis
3. Assuming that MacQIIME is already open (see Subheading 2.2), type the following command to convert your classic OTU table to a BIOM formatted table (see Note 1).
$ biom convert -i otu_table.txt -o otu_table.biom --process-obs-metadata taxonomy \
    --table-type "OTU table" --to-json
where the parameter -i is used to specify the input file and the parameter -o is used to specify the output file (see Note 2).
4. Check the conversion by summarizing the count information.
$ biom summarize-table -i otu_table.biom -o otu_table_summary_counts.txt --qualitative
With --qualitative, the number of OTUs per sample is reported. Remove --qualitative from the command to summarize the number of sequences per sample instead.
5. Filter out singletons.
OTUs represented by a single sequence are often considered less reliable because they may result from sequencing errors. The following command discards all OTUs that are not represented by at least two sequences.
$ filter_otus_from_otu_table.py -i otu_table.biom -o otu_table_nosingleton.biom -n 2
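Before converting a classic OTU table, it can help to check its layout and to see what the singleton filter does to it. The following is a minimal Python sketch, not part of QIIME: the function names read_classic_otu_table and drop_rare_otus are hypothetical, and the filter assumes that "-n 2" means a total count of at least two reads across all samples. The two rows are taken from Table 1.

```python
import csv
import io


def read_classic_otu_table(handle):
    """Parse a tab-delimited classic OTU table into sample IDs and rows."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header[0] != "#OTU ID" or header[-1] != "taxonomy":
        raise ValueError("expected '#OTU ID' first and 'taxonomy' last")
    samples = header[1:-1]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # int() rejects thousands separators such as "10,272"
        counts = [int(value) for value in row[1:-1]]
        table[row[0]] = (counts, row[-1])
    return samples, table


def drop_rare_otus(table, min_count=2):
    """Keep OTUs whose total count across samples is at least min_count."""
    return {otu: rec for otu, rec in table.items() if sum(rec[0]) >= min_count}


# Two rows copied from Table 1 (OTU 517 is a singleton).
example = (
    "#OTU ID\tML.0136\tML.0137\tML.0138\ttaxonomy\n"
    "13\t8\t1\t1\tRoot;k__Animalia;p__Chordata;c__Ascidiacea;"
    "o__Stolidobranchia;f__Styelidae\n"
    "517\t0\t1\t0\tRoot;k__Animalia;p__Porifera;c__Demospongiae;"
    "o__Halichondrida\n"
)
samples, table = read_classic_otu_table(io.StringIO(example))
filtered = drop_rare_otus(table)
print(samples, sorted(filtered))
```

A check of this kind catches formatting problems (a missing taxonomy column, non-integer counts) before biom convert is run; it is not a substitute for the QIIME commands above.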
2.4 Metadata Mapping File (Input File Required)
The mapping file contains information about each sample present in the OTU table. It can be generated by hand using Excel and saved as a tab-delimited text file. The columns “#SampleID,” “BarcodeSequence,” “LinkerPrimerSequence,” and “Description” are mandatory and should be presented in this order. Additional metadata columns may be added between “LinkerPrimerSequence” and “Description” (e.g., Location, Site; Table 2). When QIIME is used for downstream data analysis and visualization only, as is the case in this tutorial, the “BarcodeSequence” and “LinkerPrimerSequence” columns can be left empty (but keep the tabs) or filled with “NA” (Table 2).
QIIME has a functionality to test the validity of a mapping file. The -p and -b parameters need to be specified if no barcode and primer sequences are provided. The command generates a log file listing potential warnings and errors detected in the mapping file (e.g., invalid characters, duplicated sample IDs). In the following command, the parameter -m specifies the mapping file, here named “Metadata_map.txt.”
$ validate_mapping_file.py -m Metadata_map.txt -p -b
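The rules stated above for the mapping file (mandatory columns in a fixed order, sample identifiers limited to alphanumeric characters and periods as in Note 3, no duplicated identifiers) can be sketched in a few lines of Python. This is an illustrative check only and does not reproduce what validate_mapping_file.py does internally; the function name check_mapping is hypothetical, and the "ML_0146" row is an invented bad example.

```python
import re

REQUIRED_FIRST = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]
REQUIRED_LAST = "Description"
VALID_ID = re.compile(r"^[A-Za-z0-9.]+$")  # alphanumeric characters and periods


def check_mapping(lines):
    """Return a list of problems found in a tab-delimited mapping file."""
    problems = []
    header = lines[0].rstrip("\n").split("\t")
    if header[:3] != REQUIRED_FIRST:
        problems.append("first three columns must be " + ", ".join(REQUIRED_FIRST))
    if header[-1] != REQUIRED_LAST:
        problems.append("last column must be " + REQUIRED_LAST)
    seen = set()
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        sample_id = fields[0]
        if len(fields) != len(header):
            problems.append(f"{sample_id}: expected {len(header)} fields")
        if not VALID_ID.match(sample_id):
            problems.append(f"{sample_id}: invalid characters in sample ID")
        if sample_id in seen:
            problems.append(f"{sample_id}: duplicated sample ID")
        seen.add(sample_id)
    return problems


good = [
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tLocation\tSite\tDescription\n",
    "ML.0136\tNA\tNA\tVirginia\tSite1\tVirginia.Site1\n",
    "ML.0145\tNA\tNA\tFlorida\tSite1\tFlorida.Site1\n",
]
# Invented faulty rows: a repeated identifier and an underscore in an identifier.
bad = good + [
    "ML.0145\tNA\tNA\tFlorida\tSite1\tFlorida.Site1\n",
    "ML_0146\tNA\tNA\tFlorida\tSite1\tFlorida.Site1\n",
]
problems = check_mapping(bad)
print(problems)
```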
Table 1
Example of a classic OTU table representing three samples (ML.0136-ML.0138) with a total of 15 OTUs
#OTU ID  ML.0136  ML.0137  ML.0138  taxonomy
171      7900     3809     4328     Root;k__Animalia;p__Bryozoa;c__Gymnolaemata;o__Cheilostomatida;f__Schizoporellidae
768      2201     1864     5967     Root;k__Animalia;p__Cnidaria;c__Hydrozoa;o__Leptothecata;f__Campanulariidae
1031     10272    5771     3249     Root;k__Animalia;p__Cnidaria;c__Hydrozoa
13       8        1        1        Root;k__Animalia;p__Chordata;c__Ascidiacea;o__Stolidobranchia;f__Styelidae
185      1        3        2        Root;k__Animalia;p__Chordata;c__Ascidiacea;o__Phlebobranchia;f__Ascidiidae
978      9        6        13       Root;k__Animalia;p__Porifera;c__Demospongiae;o__Homosclerophorida;f__Oscarellidae
5        935      1547     2543     Root;k__Animalia;p__Bryozoa;c__Gymnolaemata;o__Cheilostomatida;f__Bugulidae
388      463      595      904      Root;k__Animalia;p__Chordata
971      670      1433     645      Root;k__Animalia;p__Arthropoda
3        555      829      865      Root;k__Animalia;p__Bryozoa;c__Gymnolaemata;o__Cheilostomatida;f__Membraniporidae
589      1397     1449     207      Root;k__Animalia;p__Arthropoda
919      1043     1107     195      Root;k__Animalia;p__Arthropoda;c__Malacostraca;o__Decapoda;f__Panopeidae
517      0        1        0        Root;k__Animalia;p__Porifera;c__Demospongiae;o__Halichondrida
516      3        3        1        Root;k__Animalia;p__Arthropoda;c__Maxillopoda;o__Calanoida
312      3        2        2        Root;k__Animalia;p__Annelida;c__Polychaeta;o__Sabellida;f__Sabellidae
This is a portion of a larger OTU table obtained using high-throughput sequencing of the mitochondrial cytochrome c oxidase subunit I (COI) region [5]. The last column of the table presents the taxonomic assignment of each OTU (see Note 3). In the file itself, columns are separated by tabs and counts are written without thousands separators
2.5 Phylogenetic Tree (Input File Optional)
A phylogenetic tree is used to calculate phylogenetic alpha diversity metrics (e.g., phylogenetic diversity) and beta diversity metrics (e.g., UniFrac). The QIIME script “make_phylogeny.py” generates a tree using various methods (default: FastTree). Alternatively, a tree can be imported into QIIME as a Newick formatted tree file. In the following tutorial, we do not use phylogenetic alpha and beta metrics.
2.6 Alpha Parameter File (Input File Optional)
A parameter file is used to specify one or a set of values of a parameter within a QIIME script. Each line contains the name of the script (e.g., alpha_diversity), a colon, the name of the parameter (e.g., metrics), a tab, and finally the value of the parameter (e.g., observed_species,chao1). The following line indicates that the observed number of species and Chao1 should be used to calculate alpha diversity. It should be saved as a text file, here named “alpha_params.txt” (see Note 5 for a list of alpha metrics implemented in QIIME).
alpha_diversity:metrics observed_species,chao1
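To make the layout of a parameter file concrete, the sketch below parses the line shown above into its three parts (script, parameter, values). It is an illustration written for this chapter's example, not QIIME's own parser; the function name parse_params is hypothetical, and splitting on any whitespace rather than strictly on a tab is an assumption of this sketch.

```python
def parse_params(text):
    """Parse 'script:parameter<whitespace>value' lines into a nested dict."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        key, value = line.split(None, 1)
        script, option = key.split(":", 1)
        params.setdefault(script, {})[option] = value.split(",")
    return params


# Content of alpha_params.txt as described in the text (tab before the value).
alpha_params = "alpha_diversity:metrics\tobserved_species,chao1\n"
parsed = parse_params(alpha_params)
print(parsed)
```

The same structure applies to the beta parameter file, with beta_diversity as the script name.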
Table 2
Example of a metadata mapping file containing information about 18 samples analyzed in Leray and Knowlton [5]
#SampleID  BarcodeSequence  LinkerPrimerSequence  Location  Site   Description
ML.0136    NA               NA                    Virginia  Site1  Virginia.Site1
ML.0137    NA               NA                    Virginia  Site1  Virginia.Site1
ML.0138    NA               NA                    Virginia  Site1  Virginia.Site1
ML.0139    NA               NA                    Virginia  Site3  Virginia.Site3
ML.0140    NA               NA                    Virginia  Site3  Virginia.Site3
ML.0141    NA               NA                    Virginia  Site3  Virginia.Site3
ML.0142    NA               NA                    Virginia  Site2  Virginia.Site2
ML.0143    NA               NA                    Virginia  Site2  Virginia.Site2
ML.0144    NA               NA                    Virginia  Site2  Virginia.Site2
ML.0145    NA               NA                    Florida   Site1  Florida.Site1
ML.0146    NA               NA                    Florida   Site1  Florida.Site1
ML.0147    NA               NA                    Florida   Site1  Florida.Site1
ML.0148    NA               NA                    Florida   Site2  Florida.Site2
ML.0149    NA               NA                    Florida   Site2  Florida.Site2
ML.0150    NA               NA                    Florida   Site2  Florida.Site2
ML.0151    NA               NA                    Florida   Site3  Florida.Site3
ML.0152    NA               NA                    Florida   Site3  Florida.Site3
ML.0153    NA               NA                    Florida   Site3  Florida.Site3
Each sample represents a community of sessile organisms collected on settlement plates at three sites in Florida and Virginia (see Note 4)
2.7 Beta Parameter File (Input File Optional)
The beta parameter file specifies the beta metrics to use. Its format is the same as that of the alpha parameter file described in Subheading 2.6. It should also be saved as a text file, here named “beta_params.txt” (see Note 6 for a list of beta metrics implemented in QIIME).
beta_diversity:metrics bray_curtis,binary_jaccard
3 Methods
Diversity analyses presented in the following section require four input files placed in the same directory called “qiime_analysis”:
– otu_table_nosingleton.biom
– Metadata_map.txt
– alpha_params.txt
– beta_params.txt
See Subheading 2 for details about input file format.
3.1 Rarefy OTU Table
High-throughput sequencing experiments often result in unequal numbers of reads between samples. Differences in sequencing depth affect estimates of alpha and beta diversity because, as more sequences are obtained, more OTUs are detected. The goal is therefore to scale the larger samples down to the number of sequences contained in the smallest sample of the dataset. The following QIIME script creates a single OTU table “otu_table_nosingleton_rarefied.biom” that has been subsampled down to 11,982 reads for all samples.
$ single_rarefaction.py -i otu_table_nosingleton.biom -o otu_table_nosingleton_rarefied.biom -d 11982
3.2 Plot Rank Abundance Curves
Rank abundance curves display OTU richness and evenness. OTU richness in a sample can be viewed as the number of ranks that a curve reaches. OTU evenness corresponds to the slope of the line (Fig. 1). A steep line means that a few OTUs dominate the sample in terms of abundance (low evenness). In the following command line, -s is used to specify the name of the sample to plot; '*' means that all samples should be presented on the same plot. We also specify -a to use absolute counts and -x to represent a linear x-axis scale.
$ plot_rank_abundance_graph.py -i otu_table_nosingleton_rarefied.biom -s '*' \
    -o Rank_abundance_plots.pdf -a -x
3.3 Visualize Taxonomic Composition
Various graphical representations of taxonomic composition are implemented in QIIME. Histograms and heatmaps drawn at various taxonomic levels are particularly useful for interpreting patterns of alpha and beta diversity.
Fig. 1 Rank abundance curves for 18 communities of sessile organisms collected on settlement plates in Virginia (ML.0136-ML.0144) and Florida (ML.0145-ML.0153). Each community was characterized using HTS of COI amplicons [5]
1. Calculate the relative abundance of taxonomic groups within each sample of the rarefied OTU table. The following script creates one table per taxonomic level (e.g., kingdom, phylum, class). They are later referred to as taxonomy tables.
$ summarize_taxa.py -i otu_table_nosingleton_rarefied.biom -o taxa_summary_relative_abundance
2. Generate taxonomy tables with absolute abundance rather than relative abundance using the parameter -a.
$ summarize_taxa.py -i otu_table_nosingleton_rarefied.biom -o taxa_summary_absolute_abundance -a
3. Plot histograms displaying relative sequence abundance within each sample at the phylum (Fig. 2) and class levels. Open the html files to visualize the plots. QIIME also produces each plot in pdf format.
$ plot_taxa_summary.py -i taxa_summary_relative_abundance/otu_table_nosingleton_rarefied_L3.txt \
    -o taxa_summary_plots/plots_phylum
$ plot_taxa_summary.py -i taxa_summary_relative_abundance/otu_table_nosingleton_rarefied_L4.txt \
    -o taxa_summary_plots/plots_class
4. Plot heatmaps with relative sequence abundance to further explore differences in composition between groups of samples. Like histograms, they can be produced at all taxonomic levels represented in taxonomy tables (computed by the summarize_taxa.py script). In the following, we specify --no_log_transform because the input file contains relative abundances.
$ make_otu_heatmap.py -i taxa_summary_relative_abundance/otu_table_nosingleton_rarefied_L3.biom \
    -o taxa_summary_plots/plots_phylum/phylum_heatmap.pdf --no_log_transform --absolute_abundance
$ make_otu_heatmap.py -i taxa_summary_relative_abundance/otu_table_nosingleton_rarefied_L4.biom \
    -o taxa_summary_plots/plots_class/class_heatmap.pdf --no_log_transform --absolute_abundance
5. Plot heatmaps with log-transformed absolute sequence abundance at the phylum (Fig. 3) and class levels. Because a few OTUs dominate the sequence counts in most datasets, transforming the data often helps to visualize differences in taxonomic composition.
$ make_otu_heatmap.py -i taxa_summary_absolute_abundance/otu_table_nosingleton_rarefied_L3.biom \
    -o taxa_summary_plots/plots_phylum/phylum_heatmap_log.pdf --absolute_abundance
$ make_otu_heatmap.py -i taxa_summary_absolute_abundance/otu_table_nosingleton_rarefied_L4.biom \
    -o taxa_summary_plots/plots_class/class_heatmap_log.pdf --absolute_abundance
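The taxonomy tables produced in steps 1 and 2 amount to summing OTU counts that share the same lineage down to a given rank, then optionally dividing by the sample total. The Python sketch below illustrates that idea on four rows of Table 1; it is not the summarize_taxa.py implementation. The function names are hypothetical, and padding short lineages with "Other" is a choice made for this sketch. With the taxonomy strings used here, level 3 corresponds to phylum.

```python
def summarize_taxa(rows, level):
    """Sum per-sample counts of OTUs sharing the first `level` taxonomy ranks.

    rows: list of (counts, taxonomy) tuples, one per OTU.
    """
    summary = {}
    for counts, taxonomy in rows:
        ranks = [rank.strip() for rank in taxonomy.split(";")]
        # Sketch-specific choice: pad lineages that stop above `level`.
        ranks += ["Other"] * (level - len(ranks))
        lineage = ";".join(ranks[:level])
        totals = summary.setdefault(lineage, [0] * len(counts))
        for i, count in enumerate(counts):
            totals[i] += count
    return summary


def to_relative(summary):
    """Convert absolute counts to proportions within each sample."""
    n_samples = len(next(iter(summary.values())))
    depths = [sum(v[i] for v in summary.values()) for i in range(n_samples)]
    return {
        lineage: [c / d if d else 0.0 for c, d in zip(v, depths)]
        for lineage, v in summary.items()
    }


# Four OTUs from Table 1 (samples ML.0136, ML.0137, ML.0138).
rows = [
    ([7900, 3809, 4328], "Root;k__Animalia;p__Bryozoa;c__Gymnolaemata;"
                         "o__Cheilostomatida;f__Schizoporellidae"),
    ([2201, 1864, 5967], "Root;k__Animalia;p__Cnidaria;c__Hydrozoa;"
                         "o__Leptothecata;f__Campanulariidae"),
    ([10272, 5771, 3249], "Root;k__Animalia;p__Cnidaria;c__Hydrozoa"),
    ([463, 595, 904], "Root;k__Animalia;p__Chordata"),
]
phylum_counts = summarize_taxa(rows, 3)
phylum_relative = to_relative(phylum_counts)
print(phylum_counts["Root;k__Animalia;p__Cnidaria"])
```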
Fig. 2 Histogram summarizing the relative number of COI amplicons within each community of sessile organisms collected in Virginia (ML.0136-ML.0144) and Florida (ML.0145-ML.0153). Relative abundance is presented at the phylum level. See the metadata mapping file (Table 2) for more information about each sample
Fig. 3 Heatmap representing log-transformed numbers of COI amplicons within each community of sessile organisms collected in Virginia (ML.0136-ML.0144) and Florida (ML.0145-ML.0153)
3.4 Calculate Alpha Diversity
Alpha diversity represents the diversity within each sample (see Note 5 for a list of alpha metrics implemented in QIIME). The following command line creates a table with the observed number of OTUs and Chao1 values for each sample.
$ alpha_diversity.py -i otu_table_nosingleton_rarefied.biom -m observed_otus,chao1 -o alpha_diversity.txt
3.5 Plot Alpha Rarefaction Curves
Unlike species accumulation curves, individual-based rarefaction curves are built using a resampling approach by randomly selecting sequences at increasing levels of accumulation (e.g., 1000, 2000, 3000 reads) until all sequences have been accumulated. Many resampling iterations are done at each level to calculate the mean and standard deviation of the curve.
1. Plot rarefaction curves using alpha metrics specified in the alpha_params.txt file (see Subheading 2.6). An interactive .html output file can be opened with a web browser to visualize curves built with the different alpha metrics. Because a mapping file is also specified, rarefaction curves averaged per group of samples are also displayed (e.g., Virginia vs. Florida).
$ alpha_rarefaction.py -i otu_table_nosingleton_rarefied.biom -m Metadata_map.txt \
    -o alpha_rarefaction -p alpha_params.txt
If no alpha parameter file is specified, the default metrics are used, among which PD_whole_tree requires a phylogenetic tree.
2. Plot rarefaction plots in .pdf format (Fig. 4; also see Note 7).
$ make_rarefaction_plots.py -i alpha_rarefaction/alpha_div_collated -o alpha_rarefaction/pdfs \
    -g pdf -m Metadata_map.txt
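The resampling logic behind an individual-based rarefaction curve, together with the two metrics used in Subheading 3.4, can be sketched as follows. This is a simplified illustration rather than QIIME's code: the function names are hypothetical, Chao1 is written in its bias-corrected form S_obs + F1(F1 - 1) / (2(F2 + 1)) (assumed here; F1 and F2 are the numbers of OTUs seen once and twice), and the depths and number of iterations are arbitrary. The counts are the ML.0137 column of Table 1.

```python
import random
from collections import Counter


def observed_otus(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate (assumed form, see lead-in)."""
    f1 = sum(1 for c in counts if c == 1)  # OTUs seen exactly once
    f2 = sum(1 for c in counts if c == 2)  # OTUs seen exactly twice
    return observed_otus(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement; return per-OTU counts."""
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
    drawn = Counter(rng.sample(pool, depth))
    return [drawn.get(otu, 0) for otu in range(len(counts))]


def rarefaction_curve(counts, depths, iterations=10, seed=0):
    """Mean and standard deviation of observed OTUs at each depth."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        values = [observed_otus(subsample(counts, depth, rng))
                  for _ in range(iterations)]
        mean = sum(values) / iterations
        sd = (sum((v - mean) ** 2 for v in values) / iterations) ** 0.5
        curve.append((depth, mean, sd))
    return curve


# Column ML.0137 of Table 1 (15 OTUs, 18,420 reads in this excerpt).
ml_0137 = [3809, 1864, 5771, 1, 3, 6, 1547, 595, 1433, 829, 1449, 1107, 1, 3, 2]
curve = rarefaction_curve(ml_0137, depths=[1000, 5000, sum(ml_0137)])
for depth, mean, sd in curve:
    print(depth, round(mean, 2), round(sd, 2))
```

At the final depth every read is drawn, so the curve necessarily ends at the 15 OTUs present in the excerpt with no variation between iterations.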
Fig. 4 Alpha rarefaction curves representing the observed number of OTUs as a function of the number of resampled sequences in the OTU table. Curves were averaged per location (red: Florida; blue: Virginia)
3.6 Calculate Beta Diversity
Beta diversity is a measure of dissimilarity in species composition between samples. QIIME supports both qualitative and quantitative metrics of beta diversity. Qualitative metrics (e.g., binary_jaccard) measure changes in communities driven by presence/absence of OTUs, whereas quantitative metrics (e.g., bray_curtis) measure differences in relative abundance of OTUs between communities. The following command line calculates distance matrices using the Jaccard and Bray-Curtis metrics.
$ beta_diversity.py -i otu_table_nosingleton_rarefied.biom -m binary_jaccard,bray_curtis -o beta_diversity
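The difference between a qualitative and a quantitative metric is easiest to see on a pair of samples. The sketch below computes the Bray-Curtis dissimilarity (sum of absolute count differences divided by the sum of all counts) and the binary Jaccard distance (one minus shared OTUs over all OTUs present) in plain Python. The function names and the two count vectors are hypothetical, chosen only for illustration.

```python
def bray_curtis(u, v):
    """Quantitative dissimilarity based on OTU counts."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


def binary_jaccard(u, v):
    """Qualitative distance based on presence/absence of OTUs."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    shared = present_u & present_v
    return 1 - len(shared) / len(present_u | present_v)


# Hypothetical counts for four OTUs in two samples.
sample_a = [8, 1, 0, 3]
sample_b = [1, 3, 1, 0]
print(bray_curtis(sample_a, sample_b), binary_jaccard(sample_a, sample_b))
```

Both functions assume that at least one OTU is present across the two samples; two empty samples would cause a division by zero.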
3.7 Visualize Beta Diversity
In the following section, we provide command lines to represent dissimilarities in OTU composition calculated between samples using both the qualitative Jaccard metric and the quantitative Bray-Curtis metric.
1. Plot pairwise distances within and between categories (Fig. 5).
$ make_distance_boxplots.py -d beta_diversity/binary_jaccard_otu_table_nosingleton_rarefied.txt \
    -m Metadata_map.txt -f "Location" -o beta_boxplot/binary_jaccard
$ make_distance_boxplots.py -d beta_diversity/bray_curtis_otu_table_nosingleton_rarefied.txt \
    -m Metadata_map.txt -f "Location" -o beta_boxplot/bray_curtis
Fig. 5 Boxplot representing Bray-Curtis distances within and between locations where sessile communities were collected. These plots show that, based on COI amplicon data, the taxonomic composition of sessile communities is more similar within Florida and within Virginia than it is between the two locations
2. Calculate principal coordinate axes.
$ principal_coordinates.py -i beta_diversity/binary_jaccard_otu_table_nosingleton_rarefied.txt \
    -o PC_binary_jaccard.txt
$ principal_coordinates.py -i beta_diversity/bray_curtis_otu_table_nosingleton_rarefied.txt \
    -o PC_bray_curtis.txt
3. Plot interactive three-dimensional Principal Coordinate Analysis (PCoA) using EMPeror [12] (Fig. 6).
$ make_emperor.py -i PC_binary_jaccard.txt -m Metadata_map.txt -o 3D_PCoA/binary_jaccard/
$ make_emperor.py -i PC_bray_curtis.txt -m Metadata_map.txt -o 3D_PCoA/bray_curtis/
Fig. 6 Three-dimensional PCoA visualized using EMPeror in a web browser. EMPeror is used to color samples by category in the metadata mapping file. Here, the PCoA was built using the Bray-Curtis distance matrix
4. Plot two-dimensional PCoA using the Jaccard and Bray-Curtis distance matrices.
$ make_2d_plots.py -i PC_binary_jaccard.txt -m Metadata_map.txt -o 2D_PCoA_binary_jaccard
$ make_2d_plots.py -i PC_bray_curtis.txt -m Metadata_map.txt -o 2D_PCoA_bray_curtis
5. Plot a hierarchical clustering tree using the Unweighted Pair Group Method with Arithmetic mean (UPGMA). The resulting tree can be visualized in FigTree (http://tree.bio.ed.ac.uk/software/figtree/).
$ upgma_cluster.py -i beta_diversity/binary_jaccard_otu_table_nosingleton_rarefied.txt \
    -o UPGMA_binary_jaccard.tre
$ upgma_cluster.py -i beta_diversity/bray_curtis_otu_table_nosingleton_rarefied.txt \
    -o UPGMA_bray_curtis.tre
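The boxplots of step 1 compare two pools of pairwise distances: those between samples of the same category and those between samples of different categories. A minimal Python sketch of that grouping is given below. It does not reproduce make_distance_boxplots.py; the function name within_between is hypothetical, and the distances are invented values for four of the samples listed in Table 2.

```python
from itertools import combinations


def within_between(distances, category):
    """Split pairwise distances into within- and between-category lists.

    distances: {frozenset((sample_a, sample_b)): distance}
    category:  {sample: category value, e.g. Location}
    """
    within, between = [], []
    for a, b in combinations(sorted(category), 2):
        d = distances[frozenset((a, b))]
        if category[a] == category[b]:
            within.append(d)
        else:
            between.append(d)
    return within, between


location = {
    "ML.0136": "Virginia", "ML.0137": "Virginia",
    "ML.0145": "Florida", "ML.0146": "Florida",
}
# Invented distances, for illustration only.
distances = {
    frozenset(("ML.0136", "ML.0137")): 0.2,
    frozenset(("ML.0136", "ML.0145")): 0.8,
    frozenset(("ML.0136", "ML.0146")): 0.9,
    frozenset(("ML.0137", "ML.0145")): 0.7,
    frozenset(("ML.0137", "ML.0146")): 0.85,
    frozenset(("ML.0145", "ML.0146")): 0.3,
}
within, between = within_between(distances, location)
print(within, between)
```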
3.8 Evaluate Robustness of Beta Diversity Estimates to Sequencing Effort
The reliability of beta diversity estimates is measured by resampling random subsets of the OTU table several times, a process called jackknifing. Beta diversity is then calculated for each resampled dataset and compared to the value obtained for the entire dataset.
1. Jackknife the entire dataset and compare beta diversity values. Here, we resample 8986 sequences from each sample (which corresponds to 75% of the smallest sample) to ensure that the smallest sample is also randomly resampled.
$ jackknifed_beta_diversity.py -i otu_table_nosingleton.biom -m Metadata_map.txt \
    -o beta_jack -e 8986 -p beta_params.txt
This workflow creates a three-dimensional PCoA with a confidence ellipsoid around each sample. It also calculates support values for the UPGMA tree.
2. Add support values to the UPGMA hierarchical clustering tree (Fig. 7).
$ make_bootstrapped_tree.py -m beta_jack/binary_jaccard/upgma_cmp/master_tree.tre \
    -s beta_jack/binary_jaccard/upgma_cmp/jackknife_support.txt \
    -o beta_jack/binary_jaccard/upgma_cmp/jackknife_named_nodes.pdf
$ make_bootstrapped_tree.py -m beta_jack/bray_curtis/upgma_cmp/master_tree.tre \
    -s beta_jack/bray_curtis/upgma_cmp/jackknife_support.txt \
    -o beta_jack/bray_curtis/upgma_cmp/jackknife_named_nodes.pdf
Fig. 7 UPGMA hierarchical clustering tree with branch support calculated using jackknifing. Branch colors represent the level of support: red for 75-100% support; yellow for 50-75% support; green for 25-50% support; blue for <25% support. Here, the tree shows a high level of support for differences in community composition between sessile communities in Virginia (ML.0136-ML.0144) and Florida (ML.0145-ML.0153). It also shows strong support for community structuring at each location
3. Plot two-dimensional PCoA with a confidence ellipsoid around each sample estimated using jackknifing.
$ make_2d_plots.py -i beta_jack/binary_jaccard/pcoa/ -m Metadata_map.txt -b 'Location&&Site' \
    --ellipsoid_opacity=0.2 -o beta_jack/binary_jaccard/pcoa/
$ make_2d_plots.py -i beta_jack/bray_curtis/pcoa/ -m Metadata_map.txt -b 'Location&&Site' \
    --ellipsoid_opacity=0.2 -o beta_jack/bray_curtis/pcoa/
4. Calculate the mean, median, and standard deviation of the distance matrices previously created by jackknifing the OTU table.
$ dissimilarity_mtx_stats.py -i beta_jack/binary_jaccard/rare_dm/ -o beta_jack/binary_jaccard/rare_dm/
$ dissimilarity_mtx_stats.py -i beta_jack/bray_curtis/rare_dm/ -o beta_jack/bray_curtis/rare_dm/
5. Create boxplots to visualize variations in beta distances within and between categories. Here, we specify the category “Location.”
$ make_distance_boxplots.py -m Metadata_map.txt -o beta_jack/binary_jaccard/distance_boxplot/ \
    -d beta_jack/binary_jaccard/rare_dm/means.txt -f Location --save_raw_data
$ make_distance_boxplots.py -m Metadata_map.txt -o beta_jack/bray_curtis/distance_boxplot/ \
    -d beta_jack/bray_curtis/rare_dm/means.txt -f Location --save_raw_data
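The resampling depth passed to -e in step 1 follows directly from the size of the smallest sample. As a worked example, the sketch below derives it from a dictionary of per-sample read counts. The function name jackknife_depth and the sample labels and counts other than 11,982 (the smallest sample, taken from Subheading 3.1) are hypothetical.

```python
def jackknife_depth(sample_depths, fraction=0.75):
    """Return the number of reads to resample: a fraction of the smallest sample."""
    return int(min(sample_depths.values()) * fraction)


# Only 11982 comes from the text; the other labels and counts are invented.
sample_depths = {"sample_A": 11982, "sample_B": 15320, "sample_C": 20411}
depth = jackknife_depth(sample_depths)
print(depth)
```

Rounding down (here from 8986.5) keeps the depth strictly below the size of the smallest sample, so that sample is also subsampled rather than used whole.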
3.9 Test for Differences in Beta Diversity Between Groups of Samples
Several statistical approaches are implemented within QIIME to test for differences in beta diversity between groups of samples (see Note 8). Among them, the Permutational Multivariate Analysis of Variance (PERMANOVA) is a nonparametric analog of ANOVA that tests for differences in the position of sets of objects in multivariate space. The parameter -c specifies the metadata categories to be compared.
$ compare_categories.py -i beta_diversity/binary_jaccard_otu_table_nosingleton_rarefied.txt \
    -m Metadata_map.txt -c Location -o beta_tests/binary_jaccard/ --method permanova
$ compare_categories.py -i beta_diversity/bray_curtis_otu_table_nosingleton_rarefied.txt \
    -m Metadata_map.txt -c Location -o beta_tests/bray_curtis/ --method permanova
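For readers who want to see what a PERMANOVA computes, the sketch below implements a pseudo-F statistic from a distance matrix and a permutation p-value in plain Python. It assumes the usual formulation (total and within-group sums of squared distances, p = (number of permuted F >= observed F + 1) / (permutations + 1)) and is not the code run by compare_categories.py. The function names are hypothetical and the 4 x 4 matrix contains invented distances; a dataset this small cannot yield a meaningful p-value.

```python
import random
from itertools import combinations


def pseudo_f(dm, labels):
    """Pseudo-F from a square distance matrix and one group label per sample."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dm[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for group in groups:
        members = [i for i, label in enumerate(labels) if label == group]
        ss_within += sum(dm[i][j] ** 2
                         for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dm, labels, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dm, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # reassign group labels at random
        if pseudo_f(dm, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Invented distances between two Virginia and two Florida samples.
dm = [
    [0.0, 0.2, 0.8, 0.9],
    [0.2, 0.0, 0.7, 0.85],
    [0.8, 0.7, 0.0, 0.3],
    [0.9, 0.85, 0.3, 0.0],
]
labels = ["Virginia", "Virginia", "Florida", "Florida"]
f_stat, p_value = permanova(dm, labels)
print(round(f_stat, 3), p_value)
```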
4 Notes
1. If file conversion fails and QIIME returns an error message, it is most likely due to the presence of special characters in your classic OTU table that are not supported by the BIOM format. To detect and remove these characters, open your classic OTU table in TextWrangler, and select “Zap Gremlins…” in the “Text” menu.
2. The syntax of QIIME commands is standardized. For example, -i is used to specify the input file and -o to specify the output file (or directory) in all command lines.
3. Each sample identifier should only contain alphanumeric characters and periods (.).
4. Sample identifiers should match between the OTU table and the mapping file.
5. The following alpha metrics are supported in QIIME: ace, berger_parker_d, brillouin_d, chao1, chao1_ci, dominance, doubles, enspie, equitability, esty_ci, fisher_alpha, gini_index, goods_coverage, heip_e, kempton_taylor_q, margalef, mcintosh_d, mcintosh_e, menhinick, michaelis_menten_fit, observed_otus, observed_species, osd, simpson_reciprocal, robbins, shannon, simpson, simpson_e, singles, strong, PD_whole_tree. Further information about each metric can be found at http://scikit-bio.org/docs/latest/generated/skbio.diversity.alpha.html
6. The following nonphylogenetic beta metrics are supported in QIIME: abund_jaccard, binary_dist_chisq, binary_dist_chord, binary_dist_euclidean, binary_dist_hamming, binary_dist_jaccard, binary_dist_lennon, binary_dist_ochiai, binary_otu_gain, binary_dist_pearson, binary_dist_sorensen_dice, dist_bray_curtis, dist_canberra, dist_chisq, dist_chord, dist_euclidean, dist_gower, dist_hellinger, dist_kulczynski, dist_manhattan, dist_morisita_horn, dist_pearson, dist_soergel, dist_spearman_approx, dist_specprof. Phylogenetic metrics that require a phylogenetic tree are the following: dist_unifrac_G, dist_unifrac_G_full_tree, dist_unweighted_unifrac, dist_unweighted_unifrac_full_tree, dist_weighted_normalized_unifrac, dist_weighted_unifrac.
7. The legend is not displayed on the .pdf files but can be seen in the html file.
8. Statistical methods available in QIIME to test for differences in composition between categories of samples are as follows: ANOSIM, BIO-ENV, Moran’s I, MRPP, PERMANOVA, PERMDISP, db-RDA. Further information about each test can be found at http://qiime.org/scripts/compare_categories.html
Acknowledgments
We thank Sarah Bourlat for inviting this submission. This work was
supported by the Sant Chair and the Smithsonian Tennenbaum
Marine Observatories Network, for which this is Contribution No. 3.
References
1. Mora C, Tittensor DP, Adl S et al (2011) How many species are there on Earth and in the Ocean? PLoS Biol 9:e1001127. doi:10.1371/journal.pbio.1001127
2. Appeltans W, Ahyong ST, Anderson G et al (2012) The magnitude of global marine species diversity. Curr Biol 22:2189–202. doi:10.1016/j.cub.2012.09.036
3. Tittensor DP, Mora C, Jetz W et al (2010) Global patterns and predictors of marine biodiversity across taxa. Nature 466:1098–101. doi:10.1038/nature09329
4. Fonseca VG, Carvalho GR, Sung W et al (2011) Second-generation environmental sequencing unmasks marine metazoan biodiversity. Nat Commun 1:1–8. doi:10.1038/ncomms1095
5. Leray M, Knowlton N (2015) DNA barcoding and metabarcoding of standardized samples reveal patterns of marine benthic diversity. Proc Natl Acad Sci U S A 112:2076–2081. doi:10.1073/pnas.1424997112
6. Leray M, Yang JY, Meyer CP et al (2013) A new versatile primer set targeting a short fragment of the mitochondrial COI region for metabarcoding metazoan diversity: application for characterizing coral reef fish gut contents. Front Zool 10:34. doi:10.1186/1742-9994-10-34
7. Taberlet P, Coissac E, Pompanon F et al (2012) Towards next-generation biodiversity assessment using DNA metabarcoding. Mol Ecol 21:2045–50. doi:10.1111/j.1365-294X.2012.05470.x
8. Caporaso JG, Kuczynski J, Stombaugh J et al (2010) QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7:335–6. doi:10.1038/nmeth.f.303
9. Schloss PD, Westcott SL, Ryabin T et al (2009) Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75:7537–7541. doi:10.1128/aem.01541-09
10. Angiuoli SV, Matalka M, Gussman A et al (2011) CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing. BMC Bioinformatics 12:356. doi:10.1186/1471-2105-12-356
11. Hildebrand F, Tadeo R, Voigt A et al (2014) LotuS: an efficient and user-friendly OTU processing pipeline. Microbiome 2:30. doi:10.1186/2049-2618-2-30
12. Vázquez-Baeza Y, Pirrung M, Gonzalez A, Knight R (2013) EMPeror: a tool for visualizing high-throughput microbial community data. Gigascience 2:16. doi:10.1186/2047-217X-2-16
13. McDonald D, Clemente JC, Kuczynski J et al (2012) The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome. Gigascience 1:7. doi:10.1186/2047-217X-1-7