The genome of common long-arm octopus Octopus minor
Bo-Mi Kim a,†, Seunghyun Kang a,†, Do-Hwan Ahn a,†, Seung-Hyun Jung b,†,
Hwanseok Rhee c,†, Jong Su Yoo b, Jong-Eun Lee c, SeungJae Lee c, Yong-Hee
Han d, Kyoung-Bin Ryu d, Sung-Jin Cho d,*, Hyun Park a,e,*, Hye Suck An b,*
Affiliations
a Unit of Polar Genomics, Korea Polar Research Institute (KOPRI), Incheon 21990,
Korea
b Department of Genetic Resources Research, National Marine Biodiversity
Institute of Korea (MABIK), Janghang-eup, Seochun-gun, Chungchungnam-do
33662, Korea
c Genomics Lab, Cluster Center, DNA Link, Inc., 150, Bugahyeon-ro, Seodaemun-
gu, Seoul 03759, Korea
d School of Biological Sciences, College of Natural Sciences, Chungbuk National
University, Cheongju, Chungbuk 28644, Korea
e Polar Sciences, University of Science & Technology, Yuseong-gu, Daejeon 34113,
Korea
Downloaded from https://academic.oup.com/gigascience/advance-article-abstract/doi/10.1093/gigascience/giy119/5106932 by guest on 26 September 2018
*Co-corresponding author:
School of Biological Sciences, College of Natural Sciences, Chungbuk National
University, Cheongju, Chungbuk 28644, Korea; E-mail address:
sjchobio@chungbuk.ac.kr (S. Cho)
Unit of Polar Genomics, Korea Polar Research Institute, Incheon 21990, Korea; E-
mail address: hpark@kopri.re.kr (H. Park)
Department of Genetic Resources Research, National Marine Biodiversity Institute
of Korea (MABIK), Janghang-eup, Seochun-gun, Chungchungnam-do 33662,
Korea; E-mail address: mgran@mabik.re.kr (H.S. An)
† These authors contributed equally to this work.
Abstract
Background: The common long-arm octopus (
Octopus minor
) is found in
mudflats of subtidal zones and faces numerous environmental challenges. The
ability to adapt its morphology and behavioural repertoire to diverse
environmental conditions makes the species a promising model to understand
genomic adaptation and evolution in cephalopods.
Findings: The final genome assembly of O. minor is 5.09 Gb, with a contig N50 of 197 kb and a longest contig of 3.027 Mb, generated from a total of 419 Gb of raw reads sequenced on the PacBio RS II platform. We identified 30,010 genes, and 44.43% of the genome is composed of repeat elements. A genome-wide phylogenetic tree based on single-copy orthologous genes placed the divergence between O. minor and O. bimaculoides at an estimated 43 million years ago (Mya). In total, 178 gene families are expanded in O. minor in a comparison of 14 bilaterian species.
Conclusion: We found that the O. minor genome is larger than that of the closely related O. bimaculoides, and this difference could be explained by enlarged introns and recently diversified transposable elements. The high-quality O. minor genome assembly provides a valuable resource for understanding octopus genome evolution and the molecular basis of adaptations to mudflats.
Key words:
Octopus genome, Cephalopods, adaptation and evolution, long-read sequencing
Background
Cephalopods (e.g. cuttlefish, nautilus, octopus, and squid) belong to the phylum Mollusca, one of the most diverse phyla within Lophotrochozoa. Despite their evolutionary, biological and economic significance, genome information for these animals is still limited to a few species[1,2,3,4].
Cephalopods have interesting biological characteristics, such as extraordinary life-history plasticity, rapid growth, a short lifespan, a large brain, and sophisticated sense organs with a complex nervous system[5]. The ability to adapt their morphology and behavioural repertoire to diverse environmental conditions and the capacity for learning and memory are common traits in cephalopods but have rarely been observed in other invertebrates[6]. Many cephalopod species are of interest to fisheries and are promising candidates for aquaculture. There are an estimated 1,000 cephalopod species (~700 known marine-living species), and octopods, with over 150 species worldwide, are among the best-known representatives of the class[7]. Studies have evaluated the biological machinery underlying fundamental nervous system functions, strong behavioural plasticity, and learning ability in octopods[8, 9].
Octopus minor (Sasaki, 1920) (NCBI:txid515824), also known as the common long-arm octopus, is a benthic littoral species and a major commercial fishery product with a high annual yield[10]. O. minor is relatively small and has a shorter life cycle (approximately 1 year), thinner arms, and a lower ratio between head size and arm length than other octopus species (Fig. 1a and 1b). The species is widely distributed in Northeast Asia, particularly in coastal regions of South Korea, China, and Japan (Fig. 1c). O. minor mainly inhabits mud and mud-sand bottoms in well-developed coastal mudflats, where it spawns in holes that it digs into the mudflat with its whole body. It is therefore subjected to the harsh environmental conditions of mudflats, including diurnal temperature changes, steep salinity and pH gradients, desiccation, wave action and tides, fluctuating oxygen availability, and interrupted feeding. Because O. minor tolerates such environmental fluctuations, it is a promising organism for studying the molecular basis of plasticity and the mechanisms underlying adaptation to harsh environmental conditions, although relevant information is still scarce. To make full use of this emerging cephalopod model system and to understand the distinctive features of O. minor, including its plasticity in mudflats and its genetic evolution, a high-quality reference genome is required.
The published genome and multiple transcriptomes of the California two-spot octopus Octopus bimaculoides have provided valuable information on genomic traits (e.g. gene family expansion, genome rearrangements, and transposable element activity) related to the evolution of neural complexity and morphological innovations[3]. In this study, we report a high-quality genome assembly and annotation for O. minor. We compare the genomes of O. minor and O. bimaculoides and provide evidence that the expansion of genes and/or gene families is related to adaptation to the harsh environmental conditions of mudflats.
Data description
Genome sequencing and annotation
O. minor genomic DNA was extracted from leg muscle tissues. Genomic DNA libraries were sequenced on the PacBio RS II platform using P6-C4 chemistry, giving an average SMRT sequencing coverage of ~76-fold and an average subread length of 9.2 kb (Supplementary Table S1). For genome size estimation, k-mer analysis was performed using Jellyfish ver. 2.1.3 (Jellyfish, RRID:SCR_005491)[11] with paired-end sequences of the genomic DNA libraries. The O. minor genome was estimated to be 5.1 Gb (Supplementary Figs. S1 and S2). The de novo assembly generated using the FALCON-Unzip assembler ver. 0.4 (Falcon, RRID:SCR_016089)[12] was 5.09 Gb with 41,584 contigs. Finally, genome completeness was evaluated using BUSCO ver. 1.22 (BUSCO, RRID:SCR_015008)[13] (Table 1).
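The two assembly statistics mentioned above, the k-mer-based genome size and the contig N50, can be illustrated with a short sketch. It assumes the k-mer histogram is a mapping from depth to the number of distinct k-mers at that depth (the kind of table `jellyfish histo` writes out) and applies the textbook estimate genome size ≈ total k-mer occurrences / depth of the main coverage peak. The function names and the error cut-off `min_depth` are hypothetical choices for illustration, not the authors' settings.

```python
# Minimal sketch (hypothetical helper names; not the authors' scripts).


def estimate_genome_size(histo, min_depth=5):
    """Estimate genome size as total k-mers / depth of the main coverage peak.

    histo: {depth: number_of_distinct_kmers_at_that_depth}
    min_depth: depths below this are treated as sequencing-error k-mers
               (an illustrative cut-off, not a value from the paper).
    """
    usable = {d: n for d, n in histo.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers above the error cut-off")
    peak_depth = max(usable, key=usable.get)             # tallest remaining bin
    total_kmers = sum(d * n for d, n in usable.items())  # k-mer occurrences
    return total_kmers / peak_depth


def n50(contig_lengths):
    """Largest length L such that contigs of length >= L hold >= 50% of bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty contig list")


# Toy usage with made-up numbers:
print(estimate_genome_size({1: 1000, 2: 200, 10: 50, 20: 100, 21: 80, 40: 10}))  # 229.0
print(n50([5, 4, 3, 2, 1]))  # 4
```

For a heterozygous genome the histogram usually shows a second peak at half the main depth, so taking the single tallest bin above the cut-off is only a first approximation.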
Total RNA was extracted from 13 tissues (brain, branchial heart, buccal mass, eye, heart, kidney, liver, ovary, poison gland, siphon, skin, and suckers) using the RNeasy Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. RNA quality was confirmed using an Agilent Bioanalyzer. Isoform sequencing (Iso-Seq) was performed using pooled RNA from the 13 tissues. Library construction and sequencing were performed using PacBio RS II (Supplementary Table S2). The SMRTbell library for Iso-Seq was sequenced using 16 SMRT cells (1–2 kb, three cells; 2–3 kb, six cells; and 3–6 kb, seven cells). Reads were classified using the RS_IsoSeq.1 classification protocol of SMRT Analysis ver. 2.3. Full-length reads derived from the same isoform were clustered, and consensus sequences were polished using the TOFU pipeline (isoseq-tofu)[14]. Finally, chimeric consensus sequences generated during the experiments or by the TOFU pipeline were removed using an in-house script.
MAKER ver. 2.28 (MAKER, RRID:SCR_005309)[15] was used for genome annotation. First, repetitive elements were identified using RepeatMasker ver. 4.0.7 (RepeatMasker, RRID:SCR_012954)[16]. A de novo repeat library was constructed with default parameters using RepeatModeler ver. 1.0.3 (RepeatModeler, RRID:SCR_015027), which incorporates RECON ver. 1.08[17] and RepeatScout ver. 1.0.5 (RepeatScout, RRID:SCR_014653)[18]. Consensus sequences and classification information were generated for each repeat family, and tandem repeats, including simple repeats, satellites, and low-complexity repeats, were predicted using Tandem Repeats Finder[19]. The masked genome sequence was used for ab initio gene prediction with SNAP (SNAP - SNP Annotation and Proxy Search, RRID:SCR_002127)[20]; subsequently, alignments of expressed sequence tags from BLASTn ver. 2.2.28+ (BLASTN, RRID:SCR_001598) and protein evidence from tBLASTx ver. 2.2.28+ (TBLASTX, RRID:SCR_011823) were incorporated. The O. minor de novo repeat library from RepeatModeler was used for RepeatMasker, and proteins from sequenced molluscs (L. gigantea, C. gigas, and Aplysia californica) and another octopus species (O. bimaculoides) were included in the analysis. Transcriptome assembly results were used as expressed sequence tags. Next, MAKER polished the alignments using Exonerate, which provided integrated evidence for SNAP annotation, and MAKER then selected and revised the final gene models considering all of this evidence. A total of 30,010 O. minor genes were predicted. The Infernal software package ver. 1.1 (Infernal, RRID:SCR_011809)[21] and covariance models from the Rfam database (Rfam, RRID:SCR_007891)[22] were used to identify other non-coding RNAs in the O. minor assembly. Putative tRNA genes were identified using tRNAscan-SE ver. 1.4 (tRNAscan-SE, RRID:SCR_010835)[23], which uses covariance models to score candidates based on both their sequence and their predicted secondary structure.
The mean size of O. minor genes was 23.6 kb, with an average intron length of 5.4 kb (4.2 introns per gene) (Supplementary Table S3). The O. minor genome contained 30,010 protein-coding genes (Table 2), of which 96% were annotated based on known proteins in public databases, and 79% were similar to O. bimaculoides genes (Supplementary Table S4).
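Gene-structure summaries like the mean gene size, mean intron length and introns per gene above can be derived from an annotation file. The sketch below assumes a standard tab-separated GFF3 with `gene` and `exon` features, where each exon's `Parent` attribute names its transcript, and treats the gap between consecutive exons of a transcript as one intron. It counts introns per transcript; with one transcript per gene this equals the per-gene figure. The function name and the toy records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def gene_stats(gff3_lines):
    """Mean gene length, mean intron length and mean introns per transcript.

    Assumes tab-separated GFF3 with 'gene' and 'exon' features, where each
    exon's Parent attribute names its transcript (1-based, closed coordinates).
    """
    gene_lengths = []
    exons = defaultdict(list)  # transcript ID -> [(start, end), ...]
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end, attrs = cols[2], int(cols[3]), int(cols[4]), cols[8]
        if ftype == "gene":
            gene_lengths.append(end - start + 1)
        elif ftype == "exon":
            fields = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
            for parent in fields.get("Parent", "").split(","):
                if parent:
                    exons[parent].append((start, end))

    intron_lengths, introns_per_tx = [], []
    for coords in exons.values():
        coords.sort()
        # An intron is the gap between consecutive exons of one transcript.
        gaps = [nxt[0] - prev[1] - 1 for prev, nxt in zip(coords, coords[1:])]
        intron_lengths.extend(gaps)
        introns_per_tx.append(len(gaps))

    return {
        "mean_gene_length": mean(gene_lengths) if gene_lengths else 0.0,
        "mean_intron_length": mean(intron_lengths) if intron_lengths else 0.0,
        "mean_introns_per_transcript": mean(introns_per_tx) if introns_per_tx else 0.0,
    }


toy = [
    "##gff-version 3",
    "ctg1\tmaker\tgene\t1\t1000\t.\t+\t.\tID=g1",
    "ctg1\tmaker\tmRNA\t1\t1000\t.\t+\t.\tID=t1;Parent=g1",
    "ctg1\tmaker\texon\t1\t100\t.\t+\t.\tID=e1;Parent=t1",
    "ctg1\tmaker\texon\t301\t400\t.\t+\t.\tID=e2;Parent=t1",
    "ctg1\tmaker\texon\t901\t1000\t.\t+\t.\tID=e3;Parent=t1",
]
print(gene_stats(toy))  # gene length 1000, introns of 200 and 500 bp
```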
Comparative genomic analyses and duplicate genes
To resolve gene family evolution in the O. minor genome, we classified orthologous gene clusters (Supplementary Table S5) from 14 species and found evidence for the recent expansion of low-copy gene duplicates and the expansion of large gene families. Orthologous groups were identified using both OrthoMCL ver. 2.0.9[24] and Pfam (Pfam, RRID:SCR_004726)[25] domain assignments. OrthoMCL builds a graph of sequence similarity relationships among proteins from multiple eukaryotic genomes, which is then divided into subgraphs (clusters) using the Markov Clustering Algorithm (MCL)[24]. The default parameters and options of OrthoMCL were used for all steps, with the genomes of 14 species (Supplementary Table S5); for O. minor, the coding sequences from the MAKER annotation pipeline were used. To construct a phylogenetic tree and estimate divergence times, 202 one-to-one (1:1) single-copy orthologous genes were used. Protein-coding genes were aligned with the codon alignment option of the Probabilistic Alignment Kit (PRANK) ver. 140603[26], and poorly aligned, gap-rich regions were eliminated using Gblocks ver. 0.91b[27] with a codon model. A maximum-likelihood tree was built using RAxML ver. 8.2.4 (RAxML, RRID:SCR_006086)[28] with 1,000 bootstrap replicates, and divergence times were calibrated using TimeTree[29]. Gene family gains and losses were inferred using CAFE ver. 4.0[30] with a significance threshold of p < 0.05.
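The clustering step can be illustrated with a minimal, pure-Python sketch of MCL: columns of a similarity matrix are normalised into transition probabilities, then expansion (squaring the matrix) and inflation (raising every entry to the power `inflation` and renormalising) alternate until the matrix stops changing. The input matrix, the `inflation` value of 2.0 and the way clusters are read off are illustrative assumptions; OrthoMCL itself hands its BLAST-derived weights to the separate `mcl` program rather than to code like this.

```python
def _normalise_columns(m):
    """Scale every column of a square matrix so it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov Clustering on a symmetric, non-negative similarity matrix.

    Returns clusters as sorted lists of node indices.
    """
    n = len(adjacency)
    # Self-loops avoid parity-driven oscillation; then make columns stochastic.
    m = _normalise_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion: random walks of length 2
        inflated = _normalise_columns(
            [[x ** inflation for x in row] for row in expanded]  # inflation
        )
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # At convergence each non-zero row (an "attractor") lists one cluster.
    clusters = []
    for i in range(n):
        members = sorted(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two unconnected triangles of "similar proteins" (toy data).
adj = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
print(mcl(adj))  # [[0, 1, 2], [3, 4, 5]]
```

Higher inflation values sharpen the contrast between strong and weak flows and therefore tend to yield more, smaller clusters.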
Sequence divergence was estimated by calculating dS values using the yn00 program from the PAML package ver. 4.7a (PAML, RRID:SCR_014932)[31]. Observed nucleotide differences were converted to Jukes–Cantor distances using the formula

$$d_{XY} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right),$$

where D is the proportion of nucleotide differences between sequences X and Y. Divergence times were calibrated by assuming that a dS of ~1 corresponds to 135 million years[7].
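The Jukes–Cantor correction above can be checked numerically with a minimal sketch. It assumes two already-aligned DNA sequences, ignores gapped positions when computing D, and refuses values of D at or above 0.75, where the logarithm is undefined. The function names are hypothetical, and the sketch does not reproduce the dS calculation of yn00.

```python
import math


def p_distance(seq_x, seq_y):
    """Proportion D of differing sites over aligned positions without gaps."""
    pairs = [(a, b) for a, b in zip(seq_x.upper(), seq_y.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(d_obs):
    """Jukes–Cantor corrected distance d_XY = -(3/4) ln(1 - (4/3) D)."""
    if not 0.0 <= d_obs < 0.75:
        # At D >= 0.75 the log argument is <= 0: the sequences look saturated.
        raise ValueError("D must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * d_obs / 3.0)


D = p_distance("ACGTACGTAC", "ACGTACGTAA")  # 1 difference in 10 sites
print(D, round(jukes_cantor(D), 4))          # 0.1 0.1073
```

The corrected distance is always at least the raw proportion D, because it accounts for multiple substitutions at the same site.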
Gene families of particular interest were manually curated. Target genes or gene families identified in the genomes of O. bimaculoides, Crassostrea gigas, Lottia gigantea, Capitella teleta, and Homo sapiens were mapped directly to the O. minor genome database by local BLAST searches. Alignments were generated using Clustal Omega (ClustalO) ver. 1.2.4[32] and Multiple Sequence Comparison by Log-Expectation (MUSCLE) ver. 3.8.31 (MUSCLE, RRID:SCR_011812)[33], and phylogenetic trees were built using FastTree[34] or RAxML with 1,000 bootstrap replicates.
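Mapping reference genes to a genome by local BLAST usually ends with filtering the hit table. The sketch below assumes BLAST+ tabular output in its default 12-column `-outfmt 6` layout and keeps, for every query, the subject with the highest bit score among hits passing an E-value cut-off. The cut-off value, the function name and the example identifiers are hypothetical.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines, max_evalue=1e-5):
    """Keep, for each query, the subject with the highest bit score.

    lines: iterable of tab-separated outfmt-6 rows.
    max_evalue: illustrative significance cut-off (an assumption).
    Returns {query_id: (subject_id, bitscore)}.
    """
    best = {}
    for rec in csv.DictReader(lines, fieldnames=OUTFMT6, delimiter="\t"):
        if float(rec["evalue"]) > max_evalue:
            continue
        query, bits = rec["qseqid"], float(rec["bitscore"])
        if query not in best or bits > best[query][1]:
            best[query] = (rec["sseqid"], bits)
    return best


rows = [
    "pcdhA\tctg1\t80.0\t300\t60\t0\t1\t300\t1000\t1900\t1e-50\t400",
    "pcdhA\tctg2\t70.0\t300\t90\t0\t1\t300\t50\t950\t1e-30\t250",
    "hsp70\tctg3\t30.0\t50\t35\t0\t1\t50\t10\t160\t0.5\t20",
]
print(best_hits(rows))  # {'pcdhA': ('ctg1', 400.0)}
```

Keeping only the top hit suits single-copy mapping; for expanded families such as protocadherins, all significant, non-overlapping hits would need to be retained instead.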
Gene gain-loss analysis identified 178 gene families that were significantly expanded in O. minor compared with the other species, e.g. interleukin-17, G protein-coupled receptors (GPCRs), C2H2-type zinc fingers, heat shock protein 70 (HSP70), and cadherin-like domains (Supplementary Tables S6–S8). The divergence time between O. minor and O. bimaculoides was estimated to be 43 Mya based on single-copy orthologous genes (Fig. 2a). Further, Pfam domain and EggNOG metazoan database searches consistently showed expansion of gene families, including those with cadherin and protocadherin domains and interleukin-17 (Fig. 2b and Supplementary Tables S9 and S10).
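Expansion screens of this kind start from a per-family, per-species gene count table. The sketch below builds such a table from ortholog-group lines, assuming the `group: taxon|gene taxon|gene ...` layout commonly used for OrthoMCL group files, with hypothetical taxon prefixes such as `omin`. It then applies a naive fold-change filter, which is only an illustration and not a substitute for CAFE, which tests expansions under a stochastic birth–death model along the species tree.

```python
from collections import Counter


def family_counts(group_lines, species):
    """Build {family: [gene count per species]} from 'FAM: taxon|gene ...' lines."""
    table = {}
    for line in group_lines:
        line = line.strip()
        if not line:
            continue
        family, members = line.split(":", 1)
        per_taxon = Counter(member.split("|", 1)[0] for member in members.split())
        table[family.strip()] = [per_taxon.get(sp, 0) for sp in species]
    return table


def naive_expansions(table, species, focal, min_fold=2.0):
    """Families where `focal` has >= min_fold x the largest count in any other species.

    Illustrative screen only; it ignores phylogeny and the statistical model used by CAFE.
    """
    idx = species.index(focal)
    hits = []
    for family, counts in table.items():
        others = [c for i, c in enumerate(counts) if i != idx]
        if counts[idx] >= min_fold * max(max(others, default=0), 1):
            hits.append(family)
    return hits


species = ["omin", "obim", "lgig"]  # hypothetical taxon prefixes
groups = ["OG1: omin|g1 omin|g2 omin|g3 omin|g4 obim|b1 lgig|l1",
          "OG2: omin|g5 obim|b2 lgig|l2"]
table = family_counts(groups, species)
print(table)                                     # {'OG1': [4, 1, 1], 'OG2': [1, 1, 1]}
print(naive_expansions(table, species, "omin"))  # ['OG1']
```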
Previously, 168 protocadherin (pcdhs) genes were annotated in the O. bimaculoides genome, the largest number among sequenced metazoan genomes[3], and approximately 1,800 genes of the C2H2 zinc finger family were annotated in the same genome. Similarly drastic expansions were observed in the O. minor genome, in which 303 pcdhs and 2,289 C2H2 zinc finger genes were annotated.
We assume that these expansion patterns are unique to the genus Octopus, as they were not detected in squid, and the pcdhs appear to have expanded after octopuses diverged from squid (~135 Mya)[3]. Because we estimated that O. minor and O. bimaculoides diverged within the genus Octopus only ~43 Mya, and both species carry these expansions, the extraordinary expansions of both gene families are presumably Octopus-specific.
Transposable element annotation and expansions
The O. minor genome (5.1 Gb) is composed of 44% repetitive sequences and 0.68% coding sequences, while the O. bimaculoides genome (2.7 Gb) consists of 35% repetitive sequences and 1.08% coding sequences. Repeats were dominated by simple repeats (14.7% of the genome) and transposable elements (TEs), especially DNA transposons and long interspersed elements (LINEs), which were more abundant in the O. minor genome than in the O. bimaculoides genome (Supplementary Tables S11–S13). When genic (exon and intron) and intergenic regions were analysed separately, TEs were found mainly in intergenic regions in both species (Supplementary Fig. S4). In particular, TE accumulation in intergenic regions was significantly greater in O. minor than in O. bimaculoides. The larger gene size and higher repeat content may explain the larger genome of O. minor compared with O. bimaculoides.
TEs are major components of animal genomes and play important roles in genome rearrangement and evolution. Based on their mechanism of transposition, TEs are grouped into two main classes: class I retrotransposons, which are subdivided into long terminal repeat (LTR) and non-LTR retrotransposons (e.g. LINEs and short interspersed elements [SINEs]), and class II DNA transposons[35]. We detected more TEs in the larger genome of O. minor than in the smaller genome of O. bimaculoides. Nearly half of the O. minor genome was composed of TEs (11,547,325 TEs; 44% of the genome), compared with about one-third of the O. bimaculoides genome (3,887,025 TEs; 35%) (Supplementary Table S11).
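Genome fractions such as these are typically computed from repeat annotation coordinates after merging overlapping hits, so that nested or overlapping elements are not counted twice. The sketch below assumes the whitespace-separated RepeatMasker `.out` layout, in which columns 5 to 7 hold the query sequence, start and end, and column 11 holds the class/family; it reports the fraction of the genome covered by each top-level class and by all repeats together. The function names and toy records are hypothetical, and this is not the authors' pipeline.

```python
from collections import defaultdict


def merged_length(intervals):
    """Base pairs covered by 1-based closed intervals once overlaps are merged."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end + 1:
            cur_end = max(cur_end, end)           # overlapping or adjacent: extend
        else:
            if cur_end is not None:
                total += cur_end - cur_start + 1  # close the previous block
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def repeat_fractions(out_lines, genome_size):
    """Genome fraction masked per top-level repeat class, plus 'all'."""
    by_class = defaultdict(lambda: defaultdict(list))  # class -> seq -> intervals
    everything = defaultdict(list)                     # seq -> intervals
    for line in out_lines:
        f = line.split()
        if len(f) < 11 or not f[0].isdigit():
            continue  # header, blank or malformed line
        seq, start, end = f[4], int(f[5]), int(f[6])
        top_class = f[10].split("/")[0]  # e.g. "LINE/L2" -> "LINE"
        by_class[top_class][seq].append((start, end))
        everything[seq].append((start, end))
    fractions = {
        cls: sum(merged_length(iv) for iv in seqs.values()) / genome_size
        for cls, seqs in by_class.items()
    }
    fractions["all"] = sum(merged_length(iv) for iv in everything.values()) / genome_size
    return fractions


out = [
    "   SW   perc perc perc  query     position in query",
    "score   div. del. ins.  sequence  begin  end  (left)",
    "",
    "  1000  10.0  0.0  0.0  ctg1  1  100  (900)  +  L2  LINE/L2  1  100  (0)  1",
    "   500  12.0  0.0  0.0  ctg1  51  150  (850)  C  Rex  LINE/Rex-Babar  (0)  100  1  2",
    "   300   5.0  0.0  0.0  ctg1  401  500  (500)  +  hAT  DNA/hAT  1  100  (0)  3",
]
print(repeat_fractions(out, genome_size=1000))  # LINE 0.15, DNA 0.1, all 0.25
```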
The majority of class I retrotransposons in the O. minor genome were LINEs (10%), as was also the case in O. bimaculoides (9%), and the proportion of DNA transposons in O. minor (13%) was comparable to that in O. bimaculoides (12%).
Interestingly, the O. minor genome had fewer SINEs (1,540 copies; 0.01%) and more rolling-circle (RC) Helitrons (121,101 copies; 3.7%) than the O. bimaculoides genome (SINEs: 115,169 copies, 1.8%; RC-Helitrons: 43,735 copies, 0.7%). A Kimura distance analysis revealed that the most frequent TE sequence divergence relative to the TE consensus sequences was ~7–10% in O. minor, with an additional peak at 3% (Fig. 3a), compared with 16–17% in the O. bimaculoides genome (Fig. 3b and Supplementary Table S11).
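A Kimura distance separates transitions (A↔G, C↔T) from transversions before correcting for multiple hits. The sketch below computes the plain Kimura two-parameter distance between a TE copy and its consensus, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), with P and Q the proportions of transitions and transversions among compared sites. Repeat-landscape tools commonly use a CpG-adjusted variant, and this unadjusted version is an assumption for illustration, not the exact computation behind Fig. 3.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(copy_seq, consensus):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Gaps and ambiguous bases are skipped. Saturated alignments
    (1 - 2P - Q <= 0 or 1 - 2Q <= 0) make math.log raise ValueError.
    """
    transitions = transversions = sites = 0
    for a, b in zip(copy_seq.upper(), consensus.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


print(round(kimura_2p("GAAAAAAAAA", "AAAAAAAAAA"), 4))  # one transition: 0.1116
print(round(kimura_2p("CAAAAAAAAA", "AAAAAAAAAA"), 4))  # one transversion: 0.1085
```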
A more recent expansion of LINEs, without an increase in SINEs, was detected in the O. minor genome, while ancient copies of all four types of TEs and an ancient transposition burst of DNA transposons were observed in O. bimaculoides.
To date the recent TE expansion in the O. minor genome, we related Jukes–Cantor distances to dS and identified two expansion waves, at distances of 0.04 and 0.09, that are not seen in the distribution of O. bimaculoides TEs (Supplementary Figs. S5 and S6). This suggests that a major expansion of TEs in the O. minor genome occurred 11 to 25 Mya, after the divergence of O. minor and O. bimaculoides.
Conclusions
O. minor has developed morphological and physiological adaptations to its unique mudflat habitat. In summary, we generated a high-quality genome assembly for O. minor to help elucidate the molecular mechanisms underlying these adaptations. In a direct comparison between the genomes of O. minor and O. bimaculoides, we found that they evolved recently and independently within the octopus lineage during the successful transition from an aquatic habitat to mudflats. We also found evidence suggesting that speciation in the genus Octopus is closely related to gene family expansion associated with environmental adaptation. Finally, in addition to providing insights into genome size increase via gene family expansion, the O. minor genome sequence provides an essential resource for studies of cephalopod evolution.
Availability of supporting data
The octopus (O. minor) genome project was deposited at NCBI under BioProject number PRJNA421033. The whole-genome sequencing data were deposited in the Sequence Read Archive (SRA) database under accession number SRX3462978, and the PacBio isoform sequencing data were deposited in the SRA under accession numbers SRX3478495 and SRX3478496. Other supporting data, including annotations, alignments, and BUSCO results, are available in the GigaScience repository, GigaDB[36].
Ethics Statement
No specific permits were required for the described field studies, no specific
permissions were required for these locations/activities and the field studies did
not involve endangered or protected species.
Abbreviations
Gb: gigabases; GPCR: G protein-coupled receptor; HSP: heat shock protein; LINEs: long interspersed elements; LTR: long terminal repeat; MCL: Markov Clustering Algorithm; MUSCLE: Multiple Sequence Comparison by Log-Expectation; Mya: million years ago; PRANK: Probabilistic Alignment Kit; SINEs: short interspersed elements; TEs: transposable elements.
Additional files
Fig. S1. Estimation of the genome size of O. minor based on the 17-mer frequency distribution in raw sequencing reads.
Fig. S2. Genome size determination by flow cytometry. Genome size was estimated from propidium iodide (PI) staining. Using a haploid genome size of 2.81 Gb for mouse (assembly GRCm38.p6) as the reference, we estimate the genome size of O. minor to be 5.38 Gb.
Fig. S3. BLAST top-hit distribution.
Fig. S4. Composition of transposable elements in the regions of gene and
intergenic sequence.
Fig. S5. Jukes–Cantor distance distribution of transposable elements.
Fig. S6. Jukes–Cantor distance distribution of transposable elements in O. minor.
Table S1. Statistics for SMRT sequencing of the O. minor genome.
Table S2. Isoform sequencing summary of the transcriptome analysis of O. minor using PacBio RS II.
Table S3. Brief summary of gene statistics.
Table S4. Functional annotation statistics of transcriptome assembly.
Table S5. Summary of orthologous gene clusters analyzed in 14 species.
Table S6. CAFE gene family analysis results.
Table S7. Examples of the top 30 significantly expanded gene families in the CAFE analysis.
Table S8. Examples of the top 30 significantly contracted gene families in the CAFE analysis.
Table S9. Top 30 expanded Pfam domains.
Table S10. Top 30 expanded EggNOG domains.
Table S11. Statistics of repeat analysis of the O. minor genome.
Table S12. Classifications and frequencies of transposable elements and other
repeats.
Table S13. Classifications and frequencies of simple repeats.
Supplementary text commands
Acknowledgements
We thank Jong Won Han and Ha Yeun Song of the National Marine Biodiversity Institute of Korea (MABIK) for sampling the 18 tissues used for transcriptome assembly, as well as Keekwang Kim of Chungnam National University and Kun-Hee Kim of Chonnam National University for their efforts in estimating the genome size of O. minor by flow cytometry. We also thank the Jeollanam-Do Oceans & Fisheries Science Institute for providing octopus embryos.
Funding
This work was supported by grants (2018M00900) from MABIK.
Competing interests
The authors declare that they have no competing interests.
Author contributions
H.S.A., H.P., and J.L. conceived the study. H.P., B.K., S.K., D.A., S.J., J.L., H.R., and S.L.
performed genome sequencing, assembly, and annotation. S.J., Y.H., K.R., and S.C.
performed experiments. J.S.Y., H.S.A., H.P., S.J., and J.L. advised and coordinated
the study. B.K., S.K., D.A., and H.P. mainly wrote the paper. All authors contributed
to writing and editing the manuscript and supplementary information and
producing the figures.
References
1. Takeuchi T, Kawashima T, Koyanagi R, Gyoja F, Tanaka M, Ikuta T, et al. Draft genome of the pearl oyster Pinctada fucata: a platform for understanding bivalve biology. DNA Research. 2012:dss005.
2. Zhang G, Fang X, Guo X, Li L, Luo R, Xu F, et al. The oyster genome reveals stress adaptation and complexity of shell formation. Nature. 2012;490(7418):49.
3. Albertin CB, Simakov O, Mitros T, Wang ZY, Pungor JR, Edsinger-Gonzales E, et al. The octopus genome and the evolution of cephalopod neural and morphological novelties. Nature. 2015;524(7564):220-4.
4. Luo Y-J, Takeuchi T, Koyanagi R, Yamada L, Kanda M, Khalturina M, et al. The Lingula genome provides insights into brachiopod evolution and the origin of phosphate biomineralization. Nature Communications. 2015;6:8301.
5. Boyle P and Rodhouse P. Cephalopods: ecology and fisheries. Oxford: Blackwell Science Ltd; 2005.
6. Hanlon RT and Messenger JB. Cephalopod behaviour. Cambridge: Cambridge University Press; 1998.
7. Guzik MT, Norman MD and Crozier RH. Molecular phylogeny of the benthic shallow-water octopuses (Cephalopoda: Octopodinae). Mol Phylogenet Evol. 2005;37(1):235-48.
8. Hochner B, Shomrat T and Fiorito G. The octopus: a model for a comparative analysis of the evolution of learning and memory mechanisms. Biol Bull. 2006;210(3):308-17.
9. Mather JA. Cephalopod consciousness: behavioural evidence. Conscious Cogn. 2008;17(1):37-48.
10. MIFAFF. Food, Agriculture, Forestry and Fisheries statistical yearbook. Seoul: Forestry and Fisheries (MIFAFF) Press; 2012.
11. Marçais G and Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27(6):764-70.
12. Chin C-S, Peluso P, Sedlazeck FJ, Nattestad M, Concepcion GT, Clum A, et al. Phased diploid genome assembly with single molecule real-time sequencing. Nat Methods. 2016;13(12):1050.
13. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV and Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210-2.
14. Gordon SP, Tseng E, Salamov A, Zhang J, Meng X, Zhao Z, et al. Widespread polycistronic transcripts in fungi revealed by single-molecule mRNA sequencing. PLoS ONE. 2015;10(7):e0132628.
15. Holt C and Yandell M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics. 2011;12(1):491.
16. Smit AFA, Hubley R and Green P. RepeatMasker Open-3.0. 1996-2004 (http://www.RepeatMasker.org).
17. Bao Z and Eddy SR. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Research. 2002;12(8):1269-76.
18. Price AL, Jones NC and Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21(suppl_1):i351-i8.
19. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research. 1999;27(2):573.
20. Korf I. Gene finding in novel genomes. BMC Bioinformatics. 2004;5(1):59.
21. Nawrocki EP, Kolbe DL and Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009;25(10):1335-7.
22. Gardner PP, Daub J, Tate J, Moore BL, Osuch IH, Griffiths-Jones S, et al. Rfam: Wikipedia, clans and the “decimal” release. Nucleic Acids Res. 2010;39(suppl_1):D141-D5.
23. Lowe TM and Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research. 1997;25(5):955.
24. Li L, Stoeckert CJ Jr and Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13(9):2178-89.
25. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, et al. Pfam: the protein families database. Nucleic Acids Res. 2013;42(D1):D222-D30.
26. Löytynoja A and Goldman N. An algorithm for progressive multiple alignment of sequences with insertions. Proceedings of the National Academy of Sciences of the United States of America. 2005;102(30):10557-62.
27. Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution. 2000;17(4):540-52.
28. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312-3.
29. Hedges SB, Dudley J and Kumar S. TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics. 2006;22(23):2971-2.
30. Han MV, Thomas GW, Lugo-Martinez J and Hahn MW. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Molecular Biology and Evolution. 2013;30(8):1987-97.
31. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution. 2007;24(8):1586-91.
32. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology. 2011;7(1):539.
33. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32. doi:10.1093/nar/gkh340.
34. Price MN, Dehal PS and Arkin AP. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE. 2010;5(3):e9490.
35. Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, et al. A unified classification system for eukaryotic transposable elements. Nature Reviews Genetics. 2007;8(12):973-82.
36. Kim B, Kang S, Ahn D, Jung S, Rhee H, Yoo JS, Lee J, Lee S, Han Y, Ryu K, Cho S, Park H and An HS. Supporting data for "The genome of common long-arm octopus Octopus minor". GigaScience Database. 2018. http://dx.doi.org/10.5524/100503
Figure 1: Common long-arm octopus (Octopus minor). a Photograph of O. minor. b Habitat structure of mudflats and phenotypic differences between O. minor and O. bimaculoides. O. minor has a smaller body size and possesses longer, thinner arms than those of O. bimaculoides. c The distribution of O. minor is shown in dark red. The distribution map was updated from Roper et al. (1984).
Figure 2: Gene family analysis for 14 bilaterian species. a Divergence times estimated from genome sequences of 14 bilaterian species. b Heat map of expanded Pfam domains in the O. minor genome. OM, Octopus minor; OB, Octopus bimaculoides; LG, Lottia gigantea; CG, Crassostrea gigas; PF, Pinctada fucata; LA, Lingula anatina; CT, Capitella teleta; HR, Helobdella robusta; CE, Caenorhabditis elegans; DM, Drosophila melanogaster; DP, Daphnia pulex; SP, Strongylocentrotus purpuratus; MM, Mus musculus; HS, Homo sapiens.
Figure 3: Transposable element (TE) accumulation history in the Octopus genomes. Kimura distance-based copy divergence analysis of TEs for a, O. minor and b, O. bimaculoides. x-axis, K-value; y-axis, genome coverage for each type of TE.
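Figure 3 bins TE copies by their Kimura distance (K-value) from a consensus. As a minimal sketch, assuming each TE copy has already been aligned to its consensus and ignoring the CpG-site adjustment that some repeat-annotation pipelines apply, the Kimura two-parameter distance can be computed from the proportions of transitions (P) and transversions (Q) as K = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The function name `kimura_2p` is illustrative; the exact procedure behind Figure 3 is not described in this section.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def kimura_2p(copy: str, consensus: str) -> float:
    """Kimura two-parameter distance between an aligned TE copy and its consensus.

    Assumes both sequences come from the same alignment (equal length).
    Gap and ambiguity columns are skipped; no CpG adjustment is applied.
    """
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps ("-") and ambiguity codes such as N
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("divergence too high for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

Binning many copies by K (for example in 1% steps) and summing their aligned lengths per TE class, divided by genome size, would give a coverage-versus-divergence profile of the kind plotted in Figure 3, with low K values corresponding to younger insertions.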
... We identified complete lamin genes for five cephalopod species for which whole genome assemblies of high quality were available, namely Nautilus pompilius (Ogura et al. 2013; Zhang et al. 2021), of the two closely related species of the common Octopus complex (Gleadall 2016), i.e. Octopus sinensis and Octopus vulgaris (Zarrella et al. 2019), the gene of Octopus minor (Kim et al. 2018), and of Architeuthis dux (da Fonseca et al. 2020). We determined the exon-intron patterns for these five genes, three of which are shown (Fig. 3a, b, c). ...
Article
Nuclear lamins are the main components of the nuclear lamina in many eukaryotes. They are members of the intermediate filament (IF) protein family. Lamins differ from cytoplasmic IF proteins by the presence of a nuclear localisation sequence (NLS) and a C-terminal tetrapeptide, the CaaX motif. The CaaX motif is the target of post-translational modifications including isoprenylation, proteolytic processing, and carboxyl-methylation. These modifications, in conjunction with the NLS, direct lamins to the inner nuclear membrane where they assemble into filaments. Lamins lacking a CaaX motif are unable to associate independently with nuclear membranes and remain in the nucleoplasm. So far, three species have been reported to exclusively express CaaX-less lamins. All three belong to the lophotrochozoan lineage. To find out whether they represent rare exceptions, we analysed lamins of representatives of 17 lophotrochozoan phyla. Here we report that all four clades of Rotifera as well as individual taxa of Mollusca and Annelida lack CaaX-lamins, but express lamins with alternative C-termini. Of note, the respective mollusc and annelid groups occupy very different phylogenetic ranks. Most of these alternative C-termini are rich in aromatic residues. A possible function of these residues in membrane association is discussed. Alternative splicing of terebellid lamin transcripts gives rise to two lamin variants, one with a CaaX motif and one with an alternative C-terminus. A similar situation is found in Arenicolidae, Opheliidae, Capitellidae, and Echiura. This points to a way in which the switch from lamins carrying a CaaX motif to lamins with alternative C-termini may have occurred.
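The abstract above separates lamins by whether their C-terminus ends in a CaaX tetrapeptide. A minimal sketch of a first-pass screen, assuming protein sequences in one-letter code, flags any sequence whose fourth-to-last residue is a cysteine. This is deliberately looser than a strict "cysteine, aliphatic, aliphatic, any" rule, because the "a" positions of real CaaX boxes are not always aliphatic (human lamin A ends in CSIM); hits are only candidates that would still need confirmation. The names `has_caax` and `CAAX_CANDIDATE` are hypothetical.

```python
import re

# Cysteine in the fourth-to-last position followed by any three residues.
CAAX_CANDIDATE = re.compile(r"C[A-Z]{3}$")


def has_caax(protein: str) -> bool:
    """Return True if the sequence ends in a candidate CaaX tetrapeptide."""
    # Normalise case and drop a trailing stop symbol ("*") if present.
    seq = protein.strip().upper().rstrip("*")
    return CAAX_CANDIDATE.search(seq) is not None
```

For example, `has_caax("...CAIM")` is true for a lamin-like C-terminus, while a CaaX-less lamin ending in aromatic residues would return false.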
... The recently increased availability of genomic information has spurred molecular research on several cephalopod species, including Octopus vulgaris (Albertin et al., 2015; Kim et al., 2018; Zarrella et al., 2019; Li et al., 2020). O. vulgaris, or the common octopus, is a cosmopolitan species and has been the subject of many seminal studies of neural anatomy and behavior (Young, 1971, 1983; Fiorito et al., 1990; Amodio and Fiorito, 2013). ...
Article
Gene expression analysis has been instrumental to understand the function of key factors during embryonic development of many species. Marker analysis is also used as a tool to investigate organ functioning and disease progression. As these processes happen in three dimensions, the development of technologies that enable detection of gene expression in the whole organ or embryo is essential. Here, we describe an optimized protocol of whole mount multiplexed RNA in situ hybridization chain reaction version 3.0 (HCR v3.0) in combination with immunohistochemistry (IHC), followed by fructose-glycerol clearing and light sheet fluorescence microscopy (LSFM) imaging on Octopus vulgaris embryos. We developed code to automate probe design, which can be applied to designing HCR v3.0 type probe pairs for fluorescent in situ mRNA visualization. As proof of concept, neuronal (Ov-elav) and glial (Ov-apolpp) markers were used for multiplexed HCR v3.0. Neural progenitor (Ov-ascl1) and precursor (Ov-neuroD) markers were combined with immunostaining for phosphorylated-histone H3, a marker for mitosis. After comparing several tissue clearing methods, fructose-glycerol clearing was found optimal in preserving the fluorescent signal of HCR v3.0. The expression that was observed in whole mount octopus embryos matched with the previous expression data gathered from paraffin-embedded transverse sections. Three-dimensional reconstruction revealed additional spatial organization that had not been discovered using two-dimensional methods.
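The authors' probe-design code is not reproduced here. As a minimal sketch of the tiling step only, assuming the commonly described HCR v3.0 split-probe layout (two 25-nt target-binding arms separated by a 2-nt gap, so each pair covers 52 nt of the transcript), the function below walks along a target and returns the reverse-complement binding sequence for each arm of each pair. The `spacing` between pairs is an assumption, the function names are hypothetical, and initiator halves, GC-content filters and off-target checks are all omitted.

```python
from typing import List, Tuple

# Complement table; U is complemented to A so RNA input is also accepted.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA/RNA sequence, returned as DNA."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tile_probe_pairs(target: str, arm: int = 25, gap: int = 2,
                     spacing: int = 2) -> List[Tuple[int, str, str]]:
    """Tile non-overlapping split-probe pairs along a target transcript.

    Each pair covers arm + gap + arm nt of the target (52 nt by default).
    Returns (start, probe_a_binding, probe_b_binding), where each binding
    sequence is the reverse complement of the target stretch it hybridises to.
    Initiator halves are NOT attached here.
    """
    target = target.upper().replace("U", "T")
    span = 2 * arm + gap
    pairs = []
    start = 0
    while start + span <= len(target):
        left = target[start:start + arm]
        right = target[start + arm + gap:start + span]
        pairs.append((start, revcomp(left), revcomp(right)))
        start += span + spacing  # leave a few nt between consecutive pairs
    return pairs
```

A real design script would then filter these candidates and append the initiator halves chosen for each multiplexed channel.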
... Our assembly of the E. berryi genome totals 5.5 Gb and captures the 46 chromosomes found in decapods in a more contiguous fashion (N50 chromosome length: 113.96 Mb, N50 contig length: 827 kb) than other available cephalopod genomes (Albertin et al., 2022a, 2015b; Belcaid et al., 2019; Kim et al., 2018; Zarrella et al., 2019; Zhang et al., 2021). For instance, we recovered the massive 17 Mb Hox cluster as an intact locus on chromosome 9 (Figure S1G). ...
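The contiguity comparison above rests on N50, the same statistic reported for the O. minor assembly (contig N50 of 197 kb). A minimal sketch of the usual definition: sort sequence lengths from longest to shortest and return the length at which the running total first reaches half of the total assembly size. The function name is illustrative.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Smallest length L such that sequences of length >= L cover at least
    half of the total assembly size."""
    sizes = sorted(lengths, reverse=True)
    if not sizes:
        raise ValueError("no sequences")
    half = sum(sizes) / 2
    running = 0
    for size in sizes:
        running += size
        if running >= half:
            return size
    return sizes[-1]  # not reached: the full sum always covers half
```

Because N50 depends only on the length distribution, it says nothing about whether contigs are correctly joined, which is why chromosome-scale assemblies also report structural checks such as the intact Hox cluster.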
Preprint
Although the camera-type eyes of cephalopods and vertebrates are a canonical example of convergent morphological evolution, the cellular and molecular mechanisms underlying this convergence remain obscure. We used genomics and single cell transcriptomics to study these mechanisms in the visual system of the bobtail squid Euprymna berryi, an emerging cephalopod model. Analysis of 98,537 cellular transcriptomes from the squid visual and nervous system identified dozens of cell types that cannot be placed in simple correspondence with those of vertebrate or fly visual systems, as proposed by Ramón y Cajal and J.Z. Young. Instead, we find an unexpected diversity of neural types, dominated by dopamine, and previously uncharacterized glial cells. Surprisingly, we observe changes in cell populations and neurotransmitter usage during maturation and growth of the visual systems from hatchling to adult. Together these genomic and cellular findings shed new light on the parallel evolution of visual system complexity in cephalopods and vertebrates.
... Although no specific analysis was performed to identify full-length potentially active TEs and non-coding transcripts were discarded from the definition of the reference transcriptome, it was suggested that TEs are active in O. bimaculoides because a substantial fraction of RNA-seq reads overlapped TE fragments annotated in non-coding intergenic regions. Expansion of TEs has also been found in the genome of Octopus minor [29] and in other sequenced cephalopod species, such as Euprymna scolopes [30] and the giant squid Architeuthis dux [31]. Performing a survey of the O. vulgaris genome, we have confirmed the expansion of TEs also in this species [32]. ...
Article
Background Transposable elements (TEs) widely contribute to the evolution of genomes allowing genomic innovations, generating germinal and somatic heterogeneity, and giving birth to long non-coding RNAs (lncRNAs). These features have been associated with the evolution, functioning, and complexity of the nervous system at such a level that somatic retrotransposition of long interspersed element (LINE) L1 has been proposed to be associated with human cognition. Among invertebrates, octopuses are fascinating animals whose nervous system reaches a high level of complexity achieving sophisticated cognitive abilities. The sequencing of the genome of Octopus bimaculoides revealed a striking expansion of TEs which were proposed to have contributed to the evolution of its complex nervous system. We recently found a similar expansion also in the genome of Octopus vulgaris. However, a specific search for the existence and the transcription of full-length transpositionally competent TEs has not been performed in this genus. Results Here, we report the identification of LINE elements competent for retrotransposition in Octopus vulgaris and Octopus bimaculoides and show evidence suggesting that they might be transcribed and determine germline and somatic polymorphisms especially in the brain. Transcription and translation measured for one of these elements resulted in specific signals in neurons belonging to areas associated with behavioral plasticity. We also report the transcription of thousands of lncRNAs and the pervasive inclusion of TE fragments in the transcriptomes of both Octopus species, further testifying the crucial activity of TEs in the evolution of the octopus genomes. Conclusions The neural transcriptome of the octopus shows the transcription of thousands of putative lncRNAs and of a full-length LINE element belonging to the RTE class.
We speculate that a convergent evolutionary process involving retrotransposons activity in the brain has been important for the evolution of sophisticated cognitive abilities in this genus.
... Amphioctopus fangsiao and O. minor are abundant sympatric octopus species in the coastal China Seas (Fang et al., 2018); they share similar trophic levels in the ecosystem (Fang et al., 2021) and co-occur throughout the year. The lower proportion of O. minor relative to A. fangsiao in the cephalopod group is due to their preferences for different habitat types: O. minor digs holes and hides its whole body within the mudflats to tolerate the harsh environment (Kim et al., 2018), while A. fangsiao mostly appears on sand and mud substrates, which makes it easier to obtain by trawling. Spring (Mar-May) has been suggested to be the main spawning peak of the A. fangsiao population (Pang et al., 2020). ...
Article
Climate change and intensive fishing have affected not only population abundance, but also species composition. Cephalopods have been increasing in abundance in the world ocean under climate change due to their flexible life-history traits, including in the over-exploited China Seas. Despite the increasing importance of coastal cephalopods in the China Seas, there have been no reports of changes in either species composition or the ecological roles of species with different life-history traits. Thus, this study first presents the changes in species composition of coastal cephalopods throughout the China Seas as summarized from fishery-independent survey reports over the last six decades. This is followed by an investigation of species composition of cephalopods in Haizhou Bay in the Yellow Sea. The ecological roles of two currently targeted cephalopods, Amphioctopus fangsiao and Loliolus spp. (Loliolus beka and Loliolus japonicus), are evaluated using an ecosystem model. The species composition of coastal cephalopods in the China Seas has changed since the 1960s, from species of large size and high value to small-size, low-value species. Cephalopod species composition in Haizhou Bay shows great seasonality, which is probably due largely to the characteristics of their life cycle. The population abundances of A. fangsiao and Loliolus spp. appear to be affected by ambient water temperature, and population distribution of Loliolus spp. seems to correlate with water depth. Occupying the highest trophic level in this ecosystem, A. fangsiao potentially displays strong top-down control over other organisms. Loliolus spp. are keystone species showing higher keystoneness in the autumn, owing to a low abundance of fish species which normally prey on them. The species-specific life-history traits and ecological roles of cephalopods are therefore important factors to consider in order to manage them effectively.
Article
Cephalopods have been considered enigmatic animals that have attracted the attention of scientists from different areas of expertise. However, many questions remain about the way of life of these invertebrates. The aim of this study is to construct a reference transcriptome in Octopus vulgaris early life stages to enrich existing databases and provide a new dataset that can be reused by other researchers in the field. To that end, samples from different developmental stages were combined, including embryos, newly-hatched paralarvae, and paralarvae of 10, 20 and 40 days post-hatching. Additionally, different dietary and rearing conditions and pathogenic infections were tested. At least three biological replicates were analysed per condition and submitted to RNA-seq analysis. All sequencing reads from experimental conditions were combined in a single dataset to generate a reference transcriptome assembly that was functionally annotated. The number of reads aligned to this reference was counted to estimate the transcript abundance in each sample. This dataset compiled a complete reference for future transcriptomic studies in O. vulgaris.
Article
Background Many wild species have suffered drastic population size declines over the past centuries, which have led to ‘genomic erosion’ processes characterized by reduced genetic diversity, increased inbreeding, and accumulation of harmful mutations. Yet, genomic erosion estimates of modern-day populations often lack concordance with dwindling population sizes and conservation status of threatened species. One way to directly quantify the genomic consequences of population declines is to compare genome-wide data from pre-decline museum samples and modern samples. However, doing so requires computational data processing and analysis tools specifically adapted to comparative analyses of degraded, ancient or historical, DNA data with modern DNA data as well as personnel trained to perform such analyses. Results Here, we present a highly flexible, scalable, and modular pipeline to compare patterns of genomic erosion using samples from disparate time periods. The GenErode pipeline uses state-of-the-art bioinformatics tools to simultaneously process whole-genome re-sequencing data from ancient/historical and modern samples, and to produce comparable estimates of several genomic erosion indices. No programming knowledge is required to run the pipeline and all bioinformatic steps are well-documented, making the pipeline accessible to users with different backgrounds. GenErode is written in Snakemake and Python3 and uses Conda and Singularity containers to achieve reproducibility on high-performance compute clusters. The source code is freely available on GitHub (https://github.com/NBISweden/GenErode). Conclusions GenErode is a user-friendly and reproducible pipeline that enables the standardization of genomic erosion indices from temporally sampled whole genome re-sequencing data.
Article
Homeobox genes play essential roles in the early development of many animals. Although the repertoire of most homeobox genes, including three amino acid loop extension (TALE)‐type homeobox genes, is conserved in animals, spiralian‐TALE (SPILE) genes are a notable exception. In this study, SPILE genes were extracted from the genomic data of 22 mollusc species and classified into four clades (−A/C, ‐B, ‐D, and ‐E) to determine which SPILE genes exhibit dynamic repertoire changes. While SPILE‐D and ‐E duplications were rarely observed, SPILE‐B duplication was observed in the bivalve lineage and SPILE‐A/C duplication was observed in multiple clades. Conversely, most or all SPILE genes were lost in cephalopods and in some gastropod lineages. SPILE gene expression patterns were also analyzed in multiple mollusc species using publicly available RNA‐seq data. The majority of SPILE genes examined, particularly those in the A/C‐ and B‐clades, were specifically expressed during early development, suggesting that most SPILE genes exert specific roles in early development. This comprehensive cataloging and characterization revealed a dynamic evolutionary history, including SPILE‐A/C and ‐B gene duplications and the loss of SPILE genes in several lineages. Furthermore, this study provides a useful resource for studying the molecular mechanism of spiralian early development and the evolution of young and lineage‐specific transcription factors.
Article
Motivation Deep learning has become a prevalent method in identifying genomic regulatory sequences such as promoters. In a number of recent papers, the performance of deep learning models has continually been reported as an improvement over alternatives for sequence-based promoter recognition. However, the performance improvements in these models do not account for the different datasets that models are evaluated on. The lack of a consensus dataset and procedure for benchmarking purposes has made the comparison of each model’s true performance difficult to assess. Results We present a framework called Supervised Promoter Recognition Framework (‘SUPR REF’) capable of streamlining the complete process of training, validating, testing, and comparing promoter recognition models in a systematic manner. SUPR REF includes the creation of biologically relevant benchmark datasets to be used in the evaluation process of deep learning promoter recognition models. We showcase this framework by comparing the models’ performances on alternative datasets, and properly evaluate previously published models on new benchmark datasets. Our results show that the reliability of deep learning ab initio promoter recognition models on eukaryotic genomic sequences is still not at a sufficient level, as overall performance is still low. These results originate from a subset of promoters, the well-known RNA Polymerase II core promoters. Furthermore, given the observational nature of these data, cross-validation results from small promoter datasets need to be interpreted with caution.
Article
The evolutionary origins of lingulid brachiopods and their calcium phosphate shells have been obscure. Here we decode the 425-Mb genome of Lingula anatina to gain insights into brachiopod evolution. Comprehensive phylogenomic analyses place Lingula close to molluscs, but distant from annelids. The Lingula gene number has increased to ~34,000 by extensive expansion of gene families. Although Lingula and vertebrates have superficially similar hard tissue components, our genomic, transcriptomic and proteomic analyses show that Lingula lacks genes involved in bone formation, indicating an independent origin of their phosphate biominerals. Several genes involved in Lingula shell formation are shared by molluscs. However, Lingula has independently undergone domain combinations to produce shell matrix collagens with EGF domains and carries lineage-specific shell matrix proteins. Gene family expansion, domain shuffling and co-option of genes appear to be the genomic background of Lingula’s unique biomineralization. This Lingula genome provides resources for further studies of lophotrochozoan evolution.
Article
Coleoid cephalopods (octopus, squid and cuttlefish) are active, resourceful predators with a rich behavioural repertoire. They have the largest nervous systems among the invertebrates and present other striking morphological innovations including camera-like eyes, prehensile arms, a highly derived early embryogenesis and a remarkably sophisticated adaptive colouration system. To investigate the molecular bases of cephalopod brain and body innovations, we sequenced the genome and multiple transcriptomes of the California two-spot octopus, Octopus bimaculoides. We found no evidence for hypothesized whole-genome duplications in the octopus lineage. The core developmental and neuronal gene repertoire of the octopus is broadly similar to that found across invertebrate bilaterians, except for massive expansions in two gene families previously thought to be uniquely enlarged in vertebrates: the protocadherins, which regulate neuronal development, and the C2H2 superfamily of zinc-finger transcription factors. Extensive messenger RNA editing generates transcript and protein diversity in genes involved in neural excitability, as previously described, as well as in genes participating in a broad range of other cellular functions. We identified hundreds of cephalopod-specific genes, many of which showed elevated expression levels in such specialized structures as the skin, the suckers and the nervous system. Finally, we found evidence for large-scale genomic rearrangements that are closely associated with transposable element expansions. Our analysis suggests that substantial expansion of a handful of gene families, along with extensive remodelling of genome linkage and repetitive content, played a critical role in the evolution of cephalopod morphological innovations, including their large and complex nervous systems.
Article
Genes in prokaryotic genomes are often arranged into clusters and co-transcribed into polycistronic RNAs. Isolated examples of polycistronic RNAs were also reported in some higher eukaryotes but their presence was generally considered rare. Here we developed a long-read sequencing strategy to identify polycistronic transcripts in several mushroom forming fungal species including Plicaturopsis crispa, Phanerochaete chrysosporium, Trametes versicolor, and Gloeophyllum trabeum. We found genome-wide prevalence of polycistronic transcription in these Agaricomycetes, involving up to 8% of the transcribed genes. Unlike polycistronic mRNAs in prokaryotes, these co-transcribed genes are also independently transcribed. We show that polycistronic transcription may interfere with expression of the downstream tandem gene. Further comparative genomic analysis indicates that polycistronic transcription is conserved among a wide range of mushroom forming fungi. In summary, our study revealed, for the first time, the genome prevalence of polycistronic transcription in a phylogenetic range of higher fungi. Furthermore, we systematically show that our long-read sequencing approach and combined bioinformatics pipeline is a generic powerful tool for precise characterization of complex transcriptomes that enables identification of mRNA isoforms not recovered via short-read assembly.
Article
Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX, and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees. In addition, an up-to-date, 50 page user manual covering all new RAxML options is available. The code is available under GNU GPL at https://github.com/stamatak/standard-RAxML.
Article
We describe a program, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases. Two previously described tRNA detection programs are used as fast, first-pass prefilters to identify candidate tRNAs, which are then analyzed by a highly selective tRNA covariance model. This work represents a practical application of RNA covariance models, which are general, probabilistic secondary structure profiles based on stochastic context-free grammars. tRNAscan-SE searches at approximately 30 000 bp/s. Additional extensions to tRNAscan-SE detect unusual tRNA homologues such as selenocysteine tRNAs, tRNA-derived repetitive elements and tRNA pseudogenes.
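The two performance figures in this abstract translate directly into what to expect on a genome the size of the O. minor assembly (5.09 Gb). A back-of-the-envelope sketch, assuming the ~30,000 bp/s speed and the fewer-than-one-false-positive-per-15-Gb rate reported here still apply; current tRNAscan-SE versions and hardware will differ, and this single-process estimate ignores parallelisation.

```python
# Assumed inputs: the reported tRNAscan-SE speed and false-positive rate,
# applied to the 5.09 Gb O. minor assembly size.
GENOME_BP = 5.09e9
SCAN_SPEED_BP_PER_S = 30_000
FP_PER_BP = 1 / 15e9  # upper bound: < 1 false positive per 15 Gb

scan_hours = GENOME_BP / SCAN_SPEED_BP_PER_S / 3600
max_false_positives = GENOME_BP * FP_PER_BP

print(f"scan time ~ {scan_hours:.1f} h")
print(f"expected false positives < {max_false_positives:.2f}")
```

Under these assumptions a full scan takes roughly two days on one process, and the expected number of spurious tRNA calls stays well below one.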
Article
While genome assembly projects have been successful in many haploid and inbred species, the assembly of noninbred or rearranged heterozygous genomes remains a major challenge. To address this challenge, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We generate new reference sequences for heterozygous samples including an F1 hybrid of Arabidopsis thaliana, the widely cultivated Vitis vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata, samples that have challenged short-read assembly approaches. The FALCON-based assemblies are substantially more contiguous and complete than alternate short- or long-read approaches. The phased diploid assembly enabled the study of haplotype structure and heterozygosities between homologous chromosomes, including the identification of widespread heterozygous structural variation within coding sequences.
Article
Genomics has revolutionised biological research, but quality assessment of the resulting assembled sequences is complicated and remains mostly limited to technical measures like N50. We propose a measure for quantitative assessment of genome assembly and annotation completeness based on evolutionarily informed expectations of gene content. We implemented the assessment procedure in open-source software, with sets of Benchmarking Universal Single-Copy Orthologs, named BUSCO. Software implemented in Python and datasets available for download from http://busco.ezlab.org.
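BUSCO expresses completeness as the percentages of complete (single-copy or duplicated), fragmented and missing orthologs out of the n searched. The sketch below only reproduces that arithmetic and the familiar one-line summary format from category counts; it is not BUSCO's own code, the class name is hypothetical, and the counts in the example are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BuscoCounts:
    single: int      # complete, single-copy
    duplicated: int  # complete, duplicated
    fragmented: int
    missing: int

    @property
    def total(self) -> int:
        return self.single + self.duplicated + self.fragmented + self.missing

    def summary(self) -> str:
        """Format counts as a C[S,D],F,M,n summary line with percentages."""
        n = self.total
        if n == 0:
            raise ValueError("no BUSCOs searched")

        def pct(x: int) -> float:
            return 100 * x / n

        complete = self.single + self.duplicated
        return (f"C:{pct(complete):.1f}%[S:{pct(self.single):.1f}%,"
                f"D:{pct(self.duplicated):.1f}%],F:{pct(self.fragmented):.1f}%,"
                f"M:{pct(self.missing):.1f}%,n:{n}")


# Hypothetical counts, for illustration only.
print(BuscoCounts(single=800, duplicated=50, fragmented=100, missing=50).summary())
```

Keeping the duplicated fraction separate matters for assemblies of heterozygous or repeat-rich genomes, where a high D value can signal uncollapsed haplotypes rather than true gene duplication.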