All content in this area was uploaded by Dilip Lakshman on May 20, 2021
Content may be subject to copyright.
Available via license: CC BY-NC-ND 4.0
Pangenome analysis of the soil-borne fungal phytopathogen Rhizoctonia solani
and development of a comprehensive web resource: RsolaniDB
Kaushik, A.1, Roberts, D.P.2, Ramaprasad, A.1, Mfarrej, S.1, Mridul Nair1, Lakshman, D.K.*2 and
Pain, A.*1,3
1Biological & Environmental Science & Engineering Division, KAUST, Thuwal 23955-6900, Saudi Arabia
2Sustainable Agricultural Systems Laboratory, USDA-ARS, Beltsville, MD 20705, USA
3Research Center for Zoonosis Control, Global Institution for Collaborative Research and Education (GI-CoRE); Hokkaido University, Sapporo, 001-0020 Japan
* Corresponding authors
bioRxiv preprint doi: https://doi.org/10.1101/2020.12.18.423518; this version posted December 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Abstract

Rhizoctonia solani is a collective group of genetically and pathologically diverse
basidiomycetous fungi that damage economically important crops. Its isolates are classified
into 13 Anastomosis Groups (AGs) and subgroups having distinctive morphology and host
range. The genetic factors driving the unique features of R. solani pathology are not well
characterized due to the limited availability of its annotated genomes. Therefore, we performed
genome sequencing, assembly, annotation and functional analysis of 12 R. solani isolates
covering 7 AGs and selected subgroups (AG1-IA, AG1-IB, AG1-IC, AG2-2IIIB, AG3-PT
(isolates Rhs1AP and the hypovirulent Rhs1A1), AG3-TB, AG4-HG-I (isolates Rs23 and R118-11),
AG5, AG6, and AG8), of which six genomes are reported for the first time. In these genomes, we
discovered unique and shared secretomes, CAZymes, and effectors across the AGs. Using a
pangenome comparative analysis of 12 R. solani isolates and 15 other basidiomycetes, we also
elucidated the molecular factors potentially involved in determining AG-specific host
preference, and the attributes distinguishing R. solani from other Basidiomycetes. Finally, we
present the largest repertoire of R. solani genomes and their annotated components as a
comprehensive database, viz. RsolaniDB, with tools for large-scale data mining, functional
enrichment and sequence analysis not available with other state-of-the-art platforms, to assist
mycologists in formulating new hypotheses.
Introduction

Rhizoctonia solani Kühn (teleomorph: Thanatephorus cucumeris [Frank] Donk) is considered
one of the most destructive soil-borne plant pathogens, causing various diseases including pre-
and post-emergence damping-off of seedlings, crown and root rots, black scurf of potato, take-all
of wheat, sheath blight of rice and maize, brown patch of turf, and postharvest fruit rots (1, 2).
This necrotrophic fungus infects a wide range of economically important plant species,
belonging to more than 32 plant families and 188 genera, and is responsible for 15% to 50%
agricultural damage annually (3). Broadly, it is classified into 13 Anastomosis Groups (AGs)
with distinctive morphology, physiology, pathogenicity, host range, and highly divergent genetic
composition (4). Most R. solani AGs are further divided into subgroups, also called IntraSpecific
Groups (ISGs), which differ in pathogenicity, virulence, ability to form sclerotia, growth rate,
and host range preference (5). Although Rhizoctonia-infected plants in the field are usually
found to be infested with one or more AGs, each AG subgroup can still have its own host
preference. For instance, Arabidopsis thaliana was found to be susceptible to AG2-1 subgroup
isolates but resistant to AG8 isolates (6), which suggests that genetic divergence is an inherent
characteristic of Rhizoctonia species.

Over the last two decades, our understanding of the genetic divergence among different
R. solani AGs has improved to the point that it is now evident that all AGs and their subgroups
are genetically isolated, non-interbreeding populations (7). The rapid and relatively low-cost
generation of genomic sequences and other ‘omics’ datasets has played a significant role in
furthering our understanding of the host-pathogen interactions and ecology of Rhizoctonia
species (8–12). The analysis of these genomic sequences and functional components revealed
several novel or previously unrecognized classes of R. solani genes among different AGs that are
involved in pathogenesis in a host-specific manner, e.g. effector proteins and carbohydrate-active
enzymes (CAZymes) (13). Additionally, analysis of differentially expressed genes in different
isolates has enabled researchers to predict the adaptive behavior of this fungus in different hosts
and the associated virulence (14, 15). However, the majority of this information has come from
the analysis of isolates belonging to only a small number of AGs for which complete genome
and/or transcriptome sequences are available. In fact, until now, draft genome assemblies
belonging to only 4 of the 13 AGs have been reported, viz. AG1-IA (16), AG1-IB (17),
AG2-2IIIB (13), AG3-Rhs1AP (18), AG3-PT isolate Ben-3 (19) and AG8 (20). This limited
availability of genome sequences and predicted proteomes across the 13 AGs and
their subgroups is one of the important barriers hindering the understanding of functional
complexity and temporal dynamics in R. solani AGs and their subgroups.

In this study, we report whole-genome sequencing, assembly and annotation of 12
Rhizoctonia isolates from 7 AGs, of which the genome sequences of three AGs (AG4, AG5, and
AG6), two subgroups (AG1-IC and AG3-TB [or AG3-T5]) and a hypovirulent isolate (AG3-1A1)
of the subgroup AG3-PT are being reported for the first time. The draft genome of the
AG3-PT isolate 1AP (alternatively named Rhs1AP) was previously reported (Cubeta et al.,
2014) (18), but was re-sequenced for comparative purposes as AG3-1AP. Furthermore, to
understand genetic diversity among different R. solani isolates, we performed inter-proteome
comparative analyses, including ortholog analysis at the pangenome level and protein domain
profiling for secreted components, virulent proteins, and CAZymes in all 12 R. solani isolates.
To make these high-quality draft R. solani genomes and features readily accessible to a broad
audience of researchers, we built a comprehensive and dedicated web resource, viz. RsolaniDB,
for hosting and analyzing the available genomic information predicted at the transcript and
protein levels in different R. solani AGs. The presented web resource includes detailed
information on each R. solani isolate, such as the genome properties, predicted gene, transcript
and protein sequences, predicted gene function, and protein orthologues among other AG
subgroups, along with tools for Gene Ontology (GO) and pathway enrichment analysis, sequence
analysis, and visualization of gene models.

Materials and Methods

Isolation of genomic DNAs for sequencing

Details regarding the R. solani isolates used for sequence analyses are presented in Tables S1 and S2.
Fungal cultures were purified by the hyphal tip excision method (21) and maintained by
sub-culturing on potato dextrose agar (PDA, Sigma Aldrich catalog # P2182, St. Louis, MO, USA).
The PDA was amended with kanamycin (25 µg/ml) and streptomycin (50 µg/ml) to inhibit
bacterial growth. Isolates were grown in Potato Dextrose Broth (PDB, Sigma Aldrich catalog #
P6685) at 100 rpm and 25 °C for 4 to 6 days. Mycelia were collected by filtration through 2 layers
of sterile cheesecloth, washed twice with sterile distilled water, gently squeezed and placed on 4
layers of paper towel to remove surface water, and then snap-frozen in liquid nitrogen and stored
at -80 °C until use. Genomic DNA was extracted from mycelia using both the CTAB method (22)
and a protocol recommended by the manufacturer (User-Developed Protocol: Isolation of
genomic DNA from plants and filamentous fungi using the QIAGEN® Genomic-tip, Qiagen
Inc.). RNA was extracted from fungal isolates and from detached tobacco leaves infected with
the corresponding fungal isolates, using the Qiagen RNeasy Plant Mini Kit (Qiagen Inc.,
Germantown, MD, USA). Extracted genomic DNA and RNA were quantified with a Qubit Flex
Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). The AG and subgroup identity of the
fungal isolates was verified by ITS-PCR, sequencing and homology analysis with nucleotide
sequences available in the NCBI database (23).

RNA extraction

Nicotiana tabacum seedlings were raised to the four-leaf stage on potting mix (Pro-mix, Premier
Horticulture, USA) in the greenhouse at ambient temperature (22–24 °C) with four hours of
supplemental light from a mercury lamp. Two leaves were excised from each seedling and placed
on a tray on two pieces of wet paper towel. For inoculation, seven to eight agar plugs from the
margin of fresh R. solani growth on 1/4-strength PDA (potato dextrose agar) were
placed on the adaxial surface of each leaf. For the control, seven to eight plugs of 1/4-strength
PDA alone were placed. Each tray was closed with a lid and incubated on the lab bench at ambient
temperature and light.

After 5 days, yellow to necrotic symptoms were noticeable on R. solani-treated leaves, but no
symptoms appeared on control leaves around the plugs. The control and infected patches
were excised with a sterile scalpel, snap-frozen in liquid nitrogen and processed for RNA
extraction with the RNeasy Plus Mini Kit in RLC buffer (Qiagen Sciences Inc., Germantown, MD,
USA). The purified RNA was treated with DNase at 37 °C for 30 min, extracted with phenol and
phenol:chloroform, precipitated with ethanol, and dissolved in RNase-free water.

Construction of genomic and RNA libraries and sequencing

For genomic libraries, 500 ng of DNA from each sample was sheared on a Covaris instrument
(Covaris E series), and paired-end libraries were prepared for sequencing on
Illumina's HiSeq 2000 platform. The steps from end repair through adapter ligation and
purification were performed using the "Illumina library prep" protocol on the IP-Star
automated platform (Diagenode IP-Star) as per the manufacturer's protocol.
Post ligation, manual protocols were used for gel size selection and PCR amplification using the
standard Illumina PCR cycle (Kapa high-fidelity master mix). The prepared libraries were
analyzed on a Bioanalyzer and quantified using Qubit (Thermo Fisher). The normalized libraries
were pooled for sequencing (insert size of 500 bp) and submitted for HiSeq 2000 sequencing at
the Bioscience Core Laboratory of King Abdullah University of Science and Technology.

Strand-specific mRNA sequencing was performed from total RNA using the TruSeq Stranded
mRNA Sample Prep Kit LT (Illumina) according to the manufacturer's instructions. Briefly, polyA+
mRNA was purified from total RNA using oligo-dT Dynabead selection. First-strand cDNA was
synthesised using randomly primed oligos, followed by second-strand synthesis in which dUTPs
were incorporated to achieve strand specificity. The cDNA was adapter-ligated and the libraries
amplified by PCR. Libraries were sequenced on an Illumina HiSeq 2000 with paired-end 100 bp
read chemistry.

De novo assembly, genome annotation and bioinformatic analysis

Data preprocessing. Adapter sequences in genomic reads in FASTQ format were trimmed using
Trimmomatic (version 0.35) (24), followed by trimming of low-quality bases at read ends.
Read quality was evaluated using FastQC (version 0.11.8) (25). Reads with length < 20 bp
and average quality score < 30 were also removed. For genome heterogeneity analysis, k-mer
distribution analysis of the resulting DNAseq reads was performed using Jellyfish (version 2.2.10)
(26), which estimated the best k-mer length for each genome. Histogram distributions of k-mers
at the best k-mer length were plotted using the histo module of the Jellyfish program. In
addition, the available raw RNAseq paired-end reads (Table S2) were quality trimmed and
preprocessed using the same approach used for the DNAseq reads. The quality-trimmed reads
were then subjected to de novo assembly using Trinity, which predicted transcript sequences (27).
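
As a rough illustration of what a k-mer frequency histogram captures, the following minimal Python sketch counts canonical k-mers in a few toy reads and tabulates how many distinct k-mers occur at each multiplicity. It is not Jellyfish and not the pipeline used in this study; the function names (`canonical`, `kmer_histogram`) and the toy reads are assumptions introduced purely for illustration.

```python
from collections import Counter

# Translation table used to reverse-complement DNA strings.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k):
    """Map multiplicity -> number of distinct canonical k-mers seen that many times."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing N or any other non-ACGT symbol.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return dict(Counter(counts.values()))


# Hypothetical toy reads, k = 3 (real analyses use far longer k and millions of reads).
toy_reads = ["ACGTAC", "ACGTTT"]
histogram = kmer_histogram(toy_reads, 3)
for multiplicity in sorted(histogram):
    print(multiplicity, histogram[multiplicity])
```

Plotting multiplicity against the number of distinct k-mers gives the kind of curve in which a main coverage peak, and any secondary shoulder peak, can be inspected.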
Genome assembly. Quality-trimmed reads were subjected to de novo genome assembly using
SPAdes (version 3.7.0), in which a defined range of k-mer lengths (21, 33, 55, 65, 77, 101 and 111)
was used for contig formation (28). QUAST (version 4.5) was used for quality evaluation of
the predicted contigs (29). Scaffolds were subsequently predicted from contigs using SSPACE
(version 3.0) (30), and gaps in the assembled scaffolds were filled using five consecutive runs of
GapCloser (version 1.12) (31). For samples with an RNAseq dataset available, genome scaffolding
was further improved using the Rascaf program (32). Genome quality was evaluated with BUSCO
(version 3.0.1) (33), and scaffolds were subjected to ITSx (version 1.1) (34) for ITS sequence
prediction. Thereafter, a phylogenetic tree was constructed with MEGA X software (35) using the
neighbor-joining method (10,000 bootstraps), in which ITS2 sequences were aligned using
ClustalW (36). The resulting tree was saved in Newick format and visualized using Phylogeny.IO
(37) and the ETE toolkit (38). The Redundans Python script was then used to predict the
homozygous genome by reducing unwanted redundancy to improve draft genome quality (39).
The resulting scaffolds were aligned with mitochondrial genomes of R. solani and other
Basidiomycota using the blastn program (version 2.6.0; e-value ≤ 1e-5) (40), and mapped
mitochondrial contigs were removed to retain only the nuclear genome for subsequent annotation.
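
Two of the summary statistics commonly reported for draft assemblies, N50 and G+C content, are simple to compute from contig sequences. The sketch below is a minimal, self-contained Python illustration under the usual definition of N50; it is not QUAST, and the helper names (`n50`, `gc_content`) and the toy lengths are assumptions made for this example only.

```python
def n50(lengths):
    """Return N50: the contig length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


def gc_content(sequence):
    """Return the fraction of G and C among unambiguous (A, C, G, T) bases."""
    sequence = sequence.upper()
    unambiguous = sum(sequence.count(base) for base in "ACGT")
    if unambiguous == 0:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / unambiguous


# Hypothetical contig lengths (bp) for a toy assembly.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
print(gc_content("ACGTNN"))
```

Ambiguous bases such as N are excluded from the G+C denominator here; other tools may make a different choice, so values can differ slightly between programs.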
Genome annotation. The draft genome was annotated using the MAKER (version 2.31.8)
pipeline (41), which predicted intron/exon boundaries and transcript and protein sequences. For
the annotation, repeat regions were masked using RepeatMasker (version 4.0.5; model_org=fungi)
(42). Protein homology evidence was taken from UniProt protein sequences (Reviewed; family:
Basidiomycota) (43). For EST evidence, RNAseq reads were assembled into transcripts using
the Trinity de novo assembler (version 2.0.6) (27). For genomic datasets without corresponding
RNAseq datasets, EST sequences of alternate organisms were taken from previously
published R. solani genome annotations, viz. AG1-IA (16), AG1-IB (17), AG2-2IIIB (13),
AG3-Rhs1AP (18), AG3-PT isolate Ben-3 (19) and AG8 (20). The functional domains, PANTHER
pathways (44) and Gene Ontology (GO) terms (45) in the predicted protein sequences were
assigned using the InterProScan (version 5.45-80.0) standalone program (46). The functional
domains assigned to each protein included information from ProSiteProfiles (47), CDD (48),
Pfam (49) and TIGRFAMs (50), resulting in an annotated genome in GFF3 format produced with
the iprscan2gff3 and ipr_update_gff programs (46).
The fungal AROM protein sequences were identified by mapping the R. solani proteome onto
pentafunctional AROM polypeptide sequences from UniProt (organism: Fungi) using blastp
(e-value ≤ 0.001) (40). The resulting candidate AROM sequences in each R. solani proteome were
analyzed using the HMMER webserver (51). We also identified the predicted secreted proteins in
each of the R. solani proteomes using SignalP (version 5.0) (52). For identification of proteins
with a transmembrane domain, Phobius (version 1.01) (53) was used. We used TargetP (version
1.1) to predict proteins with mitochondrial signal peptides (54). However, since we had already
removed mitochondrial contigs from the assembled genomes, we did not observe any proteins with a
mitochondrial signal peptide. Effector proteins in each R. solani secretome were predicted using
the EffectorP webserver (version 2.0) (55). The Carbohydrate-Active enZymes (CAZymes) in the
R. solani proteomes were predicted using the dbCAN2 webserver, in which only proteins predicted
by at least two prediction methods were considered (56). The CAZyme family predicted by HMMER
was used for the selected proteins.
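
The consensus rule described for CAZyme selection (keep a protein only when at least two prediction methods agree that it is a CAZyme, then report the HMMER family) can be sketched as follows. This is an illustrative Python sketch, not the dbCAN2 implementation or its output format: the input dictionary layout, the function name `select_cazymes`, the labels `method_2` and `method_3`, and the toy predictions are all assumptions. How a protein supported by two methods but lacking an HMMER call was handled is not stated in the text; the sketch simply reports `None` for its family.

```python
def select_cazymes(predictions, min_methods=2):
    """Keep proteins supported by at least `min_methods` prediction methods.

    `predictions` maps protein ID -> {method name: family or None}.
    The returned dict maps each retained protein to its HMMER family
    (None when HMMER made no call; that choice is an assumption).
    """
    selected = {}
    for protein, calls in predictions.items():
        supporting = sum(1 for family in calls.values() if family)
        if supporting >= min_methods:
            selected[protein] = calls.get("HMMER")
    return selected


# Hypothetical predictions for three proteins.
toy_predictions = {
    "prot1": {"HMMER": "GH5", "method_2": "GH5", "method_3": None},
    "prot2": {"HMMER": "AA9", "method_2": None, "method_3": None},
    "prot3": {"HMMER": None, "method_2": "PL1", "method_3": "PL1"},
}
print(select_cazymes(toy_predictions))
```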
Orthology. Orthologous proteins across all proteomes were identified with OrthoMCL clustering
using the Synima program (57, 58), which identified core, unique and auxiliary regions in each
R. solani proteome. This program was also used for predicting genome synteny using
inter-proteome sequence similarity. ShinyCircos was used for circular visualization of synteny plots
(59, 60).
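
To make the core/auxiliary/unique partition concrete, the following Python sketch classifies ortholog clusters by how many isolates contribute proteins to them. It assumes the conventional pangenome definitions (core: present in every isolate; unique: present in exactly one; auxiliary: everything in between), which may differ in detail from the Synima output; the function name `classify_clusters`, the data layout and the toy cluster contents are hypothetical.

```python
def classify_clusters(clusters, isolates):
    """Count core, auxiliary and unique proteins per isolate.

    `clusters` maps cluster ID -> {isolate: [protein IDs]}.
    `isolates` lists every isolate included in the comparison.
    """
    n_isolates = len(isolates)
    summary = {iso: {"core": 0, "auxiliary": 0, "unique": 0} for iso in isolates}
    for members in clusters.values():
        present = {iso for iso, proteins in members.items() if proteins}
        if len(present) == n_isolates:
            label = "core"
        elif len(present) == 1:
            label = "unique"
        else:
            label = "auxiliary"
        for iso in present:
            summary[iso][label] += len(members[iso])
    return summary


# Hypothetical clusters across three isolates.
toy_isolates = ["AG1-IA", "AG5", "AG8"]
toy_clusters = {
    "c1": {"AG1-IA": ["a1"], "AG5": ["b1", "b2"], "AG8": ["c1"]},
    "c2": {"AG1-IA": ["a2"], "AG5": ["b3"]},
    "c3": {"AG8": ["c2", "c3"]},
}
for isolate, counts in classify_clusters(toy_clusters, toy_isolates).items():
    print(isolate, counts)
```

Proteins that fall into no cluster at all would also count as unique in practice; they are left out of this sketch for brevity.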

RsolaniDB database development

The RsolaniDB (RDB) database was built to host R. solani reference genomes and transcript and
protein sequences in FASTA format, along with genome annotations in GFF3 format.
For each genome, the information in the database was structured as entries, in which each entry
included a list of details about a given transcript and protein, i.e., intron-exon boundaries;
predicted functions; associated pathways and GO terms; predicted sequences; orthologs; and
functional protein sequence domains predicted from InterPro, ProSiteProfiles and Pfam. The
identifier for each entry (i.e., RDB ID) starts with ‘RS_’ and the AG subgroup name, followed
by a unique number. We also included five previously published annotated R. solani genome
sequences (i.e., AG1-IA, AG1-IB, AG2-2IIIB, AG3-PT and AG8) with their gene identifiers
converted into the RDB ID format. The database was implemented using DHTML, CGI-BIN Perl
and MySQL, allowing users to perform a range of tasks, including text-based searches across the
entire database or in an AG-specific manner. We also included a set of tools to assist users in
performing downstream analyses, including RDB ID to protein/transcript sequence
conversion; FASTA sequence-based BLAST searches against the entire database or in an
AG-specific manner; a tool to retrieve orthologs for a given set of RDB IDs; and tools for
functional enrichment analysis. The GO-based functional enrichment tool for gene set analysis of
given RDB IDs was built using the topGO R package (61). The pathway-based gene set analysis
was developed to predict significantly enriched PANTHER pathway IDs for a given set of RDB IDs.
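
The statistical test behind the pathway-based gene set analysis is not specified in the text. A common choice for this kind of over-representation analysis is the one-sided hypergeometric test, and the Python sketch below shows that approach under this assumption only; it should not be read as the RsolaniDB implementation. The function names, the pathway labels and the short gene identifiers are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def pathway_enrichment(query_ids, pathway_to_ids, background_ids):
    """Return (pathway, overlap, p-value) tuples sorted by p-value."""
    background = set(background_ids)
    query = set(query_ids) & background
    results = []
    for pathway, members in pathway_to_ids.items():
        members = set(members) & background
        overlap = len(query & members)
        if overlap == 0:
            continue  # pathways with no query genes are not reported
        p_value = hypergeom_pvalue(overlap, len(query), len(members), len(background))
        results.append((pathway, overlap, p_value))
    return sorted(results, key=lambda row: row[2])


# Hypothetical background of 10 genes and two toy pathways.
toy_background = [f"g{i}" for i in range(1, 11)]
toy_pathways = {"pathway_A": ["g1", "g2", "g3"], "pathway_B": ["g4", "g9"]}
toy_query = ["g1", "g2", "g4"]
for row in pathway_enrichment(toy_query, toy_pathways, toy_background):
    print(row)
```

In practice the resulting p-values would be adjusted for the number of pathways tested before any pathway is called significantly enriched.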

Results

Genome-wide comparative analysis of R. solani assemblies and its annotation

We performed high-depth sequencing, de novo genome assembly and annotation of 12 R.
solani isolates. For qualitative evaluation of these assemblies, we used the genome sequences of a
basidiomycetous mycorrhizal fungus, Tulasnella calospora (Joint Genome Institute fungal
genome portal MycoCosm; http://genome.jgi.doe.gov/Tulca1/Tulca1.home.htm), and R. solani
AG3-PT as negative and positive controls, respectively. Overall, the draft genome assemblies of
the R. solani isolates show remarkable differences in genome size, ranging from
the smaller AG1-IC (~33 Mbp) to the larger AG3-1A1 (~71 Mbp) isolate genomes (Table S3).
The number of contigs generated is also highly variable, ranging between 678 and 11,793, and
the newly reported assemblies of AG1-IC and AG3-T5 have the highest N50 lengths of 100,597 bp
and 196,000 bp, respectively (Table S3). The heterogeneity in genomic reads was predicted by
analyzing the distribution of different k-mers in the R. solani genomic sequencing reads. The
analysis reveals a shoulder peak alongside the major peak in k-mer frequencies for AG2-2IIIB,
AG3-1A1, AG3-1AP and AG8, indicating possible heterogeneity in the genomic reads of
these isolates (Figure S1). The G+C content ranged from 47.47% to 49.07%, with a mean of
48.43% (Table S3). The quality of these draft genomes was evaluated using BUSCO, with scores
ranging between ~88% and 96% (Table S3), indicating the completeness of essential fungal genes
in the predicted assemblies. To evaluate the reliability of the genome assemblies, we further
compared our draft genomes with previously published assemblies of R. solani isolates, i.e.,
AG1-IA, AG1-IB, AG2-2IIIB, and AG8 (Figure S2). The MUMmer plot (62) comparison shows
overall co-linearity and high similarity among similar assemblies, wherein the AG8 assemblies
are the least co-linear, possibly due to the heterokaryotic nature of the AG8 genome (20, 63). Among
the presented draft genome sequences, a large number of syntenic relationships
(length > 40,000 bp) are also identified (Figure 1A), wherein all the given isolates share at least
four highly similar syntenic regions, except T. calospora (outgroup), which does not share any
syntenic region with the R. solani isolates at the given threshold of > 40,000 bp (Figure 1B).
Similarly, our analysis shows that AG5, AG2-2IIIB and AG3-1A1 share comparatively fewer syntenic
regions, whereas AG3-PT (positive control) shares the highest number of syntenic regions with other
R. solani isolates. In fact, we observed that most of the closely related AGs share a large number
of syntenic relationships, e.g., high similarity among AG3 subgroups. Overall, the analysis
provides a first line of evidence for widespread collinearity and regions of large
similarity across genetically distinct isolates, with T. calospora as an outlier.

Subsequently, we performed an ITS2-based phylogeny to compare the ITS2 sequences of the 12
newly sequenced R. solani isolates with those of known R. solani tester strains (as positive
controls) and T. calospora as an outgroup (Figure 1C); for AG3-PT, we were not able
to predict the ITS sequences. The observed phylogenetic clusters of AGs reflect strong
similarity between the ITS2 sequences of the assembled genomes and those of the tester strains of
R. solani. For instance, the AG1-IA cluster includes four strains, all belonging to the same AG, i.e.,
AG1-IA. Similarly, ITS2 sequences of different AG3 and AG4 subgroups are clustered within
their respective clades, whereas the outgroup T. calospora shows a distinct architecture, providing
strong evidence in favor of the methods used for genome assembly. Intriguingly, the
ITS2 sequences of the AG8 subgroup show remarkable differences: the sequences of the tester
strain (i.e., AG-8-A68), the previously published genome sequence (i.e., AG8-01) and the
genome reported in this study (i.e., AG8-Rh89/T) are clustered in different clades of the
phylogenetic tree.

One of the important proteins known to be strongly associated with fungal evolution and
virulence is the penta-functional AROM protein, with five characteristic domains (64). Here,
we characterized the AROM protein sequences in the predicted proteome of all given assemblies
(Figure S3). We observed at least one penta-functional AROM sequence in each of the
assemblies, present in either complete or partial form. Interestingly,
AG3-1A1 has two complete penta-functional AROM protein sequences, a characteristic not
observed in any other AG. In AG5, two partial AROM sequences are observed that together
cover all five domains of the complete penta-functional AROM sequence (65).
Although all other assemblies are found to have contiguous AROM protein sequences, the partial
AROM sequences in AG5 may represent a fragmented region of the genome assembly and
therefore warrant further experimental investigation and genome assembly improvement.

Genome-wide orthologous protein clustering and functional analysis

Intron/exon and transcript boundaries were identified using the MAKER pipeline (see Materials and
Methods), which predicted 7,394 to 10,958 protein-coding transcripts per genome (excluding T.
calospora, Figure S4), of which the AG3-1A1 genome has the highest number of transcripts. Next,
using OrthoMCL, the translated protein sequences in all genomes were clustered into
orthologous groups, where each cluster represented a set of similar sequences likely
to represent a protein family. The similarities among the given isolates were enumerated by
measuring the proteins shared by different proteomes in the same OrthoMCL clusters (Figure 2A).
As expected, this analysis clearly separates T. calospora as an outgroup, indicating that it has a
different protein family composition than the R. solani isolates. Although the AG1 and AG4
subgroups, as well as AG3-1A1 and AG3-1AP, show expected similarities and share similar
clustering profiles, AG3-PT and AG5 show a divergent profile of protein families with respect to
the other AGs under study. Nevertheless, a large set of OrthoMCL clusters share proteins from all
or most of the R. solani isolates, which further indicates inherent similarities as well as unique
attributes across these pathologically diverse groups of fungi. For instance, more than 1,400
OrthoMCL clusters are composed of proteins belonging to only two AGs, whereas >1,500 clusters
are composed of proteins from all 13 R. solani isolates and T. calospora (Figure 2B). It is expected
that these conserved clusters are composed of proteins from core gene families with essential
functions, whereas other clusters may host proteins with unique AG-specific roles (Figure 2C). The
analysis reveals that AG1-IC, AG2-2IIIB, AG5, AG6-10EEA and AG8 are composed of a large
number of unique proteins (>1,000 proteins), whereas AG3-1A1 has the highest number of core
and auxiliary proteins. The pair-wise comparison of the number of clusters shared by any two
AGs highlighted that AG3-1AP shares the highest number of OrthoMCL clusters with AG3-1A1,
a sector-derived hypovirulent isolate of AG3-1AP (Figure 2D) (66). In fact, AG3-1A1 proteins
share a large number of clusters with a few other AG subgroups too, including AG1-IC,
AG6-10EEA, AG2-2IIIB and AG4-R118.

To investigate the functional composition of proteins using orthologous group
information, we performed InterPro domain family analysis of the proteomes from each AG (Figure
S5). Interestingly, the core proteome of most AGs is composed of ~2,000 InterPro domain
families, whereas the unique proteome per AG ranged from 101 (for AG3-PT) to 628 (for
AG3-1A1). The most common protein family in the unique proteome of R.
solani subgroups is “Cytochrome P450”, which is essential for fungal adaptation to diverse
ecological niches (67) (Figure 3). Similarly, proteins with WD40 repeats are among the
most common components of the unique proteome in most AGs. In addition, a few of the AG
subgroups are enriched with a protein family that is significantly associated with their unique
proteome only, possibly being involved in the survival of that AG in its respective hosts. For
instance, the AG1-IB unique proteome is enriched with “NADH:Flavin Oxidoreductase/NADH
oxidase (N-terminal)”; similarly, AG3-1A1 is enriched with “ABC transporter-like” and
“Aminoacyl-tRNA synthetase (class-II)” InterPro domains. AG3-PT is
uniquely enriched with “Ribosomal protein S4/S9”, and AG3-1A1 is uniquely enriched with
“Multicopper oxidase (Type 2)” and “Patatin like phospholipase domain”.
314
The predicted secretome and effector proteins
To facilitate host colonization, plant pathogens secrete proteins into host compartments that
modulate morphological changes in the host and establish fungal infection (68–70).
Therefore, we identified the comprehensive set of secreted proteins from all R. solani and T.
calospora genomes. Figure 4A shows the number of secreted proteins identified in each
genome; AG1-IC, AG3-1A1, AG6-10EEA and AG2-2IIIB contain a large
number of proteins in the predicted secretome (Supplementary file, sheet 1-2). However,
AG1-IC, AG2-2IIIB and AG8 contain a comparatively larger number of isolate-specific
secreted proteins (i.e., secreted proteins in the unique proteome), while AG3-1AP, AG3-1A1 and
AG3-PT contain a comparatively lower number of such proteins. Interestingly, InterPro
domain analysis of the secreted proteins suggests that the most enriched protein domain in the
predicted secretome is “cellulose binding domain – fungal” (Figure 4B), which is essential for
the fungal patho-system for the degradation of cellulose and xylans (71). In addition, the
secretomes are also enriched with proteins containing “Glycoside Hydrolase Family 61”,
“Pectate Lyase” and “multi-copper oxidase family” domains. Most of these proteins
include enzymes essential for degrading the plant host cell wall and breaking down the first
line of host defense. We observed that certain families of protein domains are
enriched within a few AGs only. For instance, proteins containing the “aspartic peptidase family A1”
domain, which are involved in diverse fungal metabolic processes, are mainly enriched in the AG2-2IIIB
isolate; similarly, “lysine-specific metallo-endopeptidase” domains are enriched in AG3-1AP, AG5 and
AG8. The AG4-R118 secretome is significantly enriched with proteins belonging to “Glycoside
Hydrolase Family 28” and “Peptidase S8 propeptide-proteinase inhibitor I9” domains, whereas
the AG4-RS23 secretome is composed of proteins with “NodB homology” and “alpha/beta hydrolase fold-1”
domains. Taken together, the analysis indicates that each AG secretome is
significantly enriched with a unique set of protein families that possibly allows the fungus
to perform a variety of biological functions in different host systems and patho-systems.
Next, to identify the unique and conserved attributes associated with R. solani, we
performed a comparative analysis of the secretome with 14 other fungi (excluding T. calospora),
which represented the major taxonomic, pathogenic, ecological, and commercially important
(edible fungi) groups within the Division Basidiomycota (Table S4). We hypothesized that a small
set of functionally important proteins, e.g., secreted proteins, in R. solani may have unique
attributes not observed within the other basidiomycetes. Therefore, we predicted the secretome
and analyzed the InterPro domains in the secreted proteins of 14 different basidiomycetes and
compared them with the secretome of R. solani AGs. We observed that the number of secreted
proteins predicted in R. solani AGs is not significantly different from the number of secreted
proteins in other basidiomycetes (p=0.0629; Figure 4C). However, the InterPro domains
enriched in the secretome of R. solani AGs and other basidiomycetes are
significantly different. We observed that only a limited number of InterPro terms are shared
between R. solani AGs and other basidiomycetes, and R. solani AGs are functionally closer to
each other than to other basidiomycetes (Figure 4D), which suggests that the R. solani secretome has
a unique domain profile that differs from those of other basidiomycetes. Overall, we
found 565 InterPro terms in the secretome of R. solani, whereas in other basidiomycetes
(including T. calospora), secretomes are enriched with 620 terms, of which 283 InterPro terms
are common to both groups of species. We observed 282 InterPro terms (50%) uniquely
associated with R. solani and not observed in the secretome of other basidiomycetes, whereas 337
InterPro terms are observed only in the secretome of other basidiomycetes. The 282 R.
solani-specific InterPro terms include several protein domains of diverse
functional significance, e.g., “Aspartic peptidase A1 family”, “Cysteine rich secretory protein
related” and “Polysaccharide lyase 8” domains. Among the domains commonly enriched across
both R. solani isolates and other basidiomycetes, we calculated the fold change in
domain occurrence in their secretomes and enumerated the protein domains with significant
differences between R. solani and other basidiomycetes (Supplementary file, sheet 1-2). Our
analysis suggests large differences in domain frequency: proteins with domains such as
“Pectate lyase”, “Serine amino-peptidase” and “Lysine-specific metallo-endopeptidase” are
significantly enriched in the R. solani secretome. Similarly, proteins with “Hydrophobin” and “Zinc
finger ring-type” domains are mainly enriched in other basidiomycetes. We believe that such a
large number of unique functional domains in the secreted proteome of R. solani may be
functionally relevant in allowing these fungi to survive in a diverse array of conditions, and thus
should be investigated experimentally to understand their role in survival.
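The shared and group-specific term counts reported above follow from simple set arithmetic, which can be sketched in Python as follows. The helper name and the placeholder term IDs are hypothetical; only the totals (565, 620 and 283) are taken from the text.

```python
def partition_terms(terms_a, terms_b):
    """Return (shared, only_a, only_b) for two collections of InterPro IDs."""
    a, b = set(terms_a), set(terms_b)
    return a & b, a - b, b - a

# Toy example with placeholder IDs (hypothetical, not from the study).
shared, only_rs, only_other = partition_terms(
    {"IPR_A", "IPR_B", "IPR_C"}, {"IPR_B", "IPR_D"}
)

# Consistency check on the counts reported in the text.
n_rsolani, n_other, n_shared = 565, 620, 283
unique_rsolani = n_rsolani - n_shared          # terms seen only in R. solani
unique_other = n_other - n_shared              # terms seen only in the other group
fraction_unique = unique_rsolani / n_rsolani   # share of R. solani terms that are unique
```

With these totals the sketch reproduces the 282 and 337 group-specific terms, and 282 of 565 is approximately 50%.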
Although these plant pathogenic fungi secrete a large number of proteins, only a small
proportion of these proteins have been implicated in fungal-plant
interactions, i.e., effector proteins (68–70). Effector proteins can strongly inhibit the activity of
host cellular proteases and allow pathogenic fungi to evade host defense mechanisms. Fungal
effector proteins are not known to share a conserved family of domains; these proteins
are typically short (300-400 amino acids) and have a higher cysteine content (55, 69, 72). Our
analysis reveals 75-134 predicted effector proteins in R. solani genomes, whereas T. calospora
contains 136 effector proteins (Figure 5A; supplementary file S1-S7).
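The two effector properties named above, short length and elevated cysteine content, can be turned into a simple screening heuristic. The Python sketch below is illustrative only and is not the effector prediction method used in this study; the 400-residue limit reuses the upper bound quoted in the text, while the cysteine-fraction cutoff, function names and sequences are assumptions.

```python
def cysteine_fraction(seq):
    """Fraction of residues that are cysteine (a trailing stop '*' is ignored)."""
    seq = seq.upper().rstrip("*")
    return seq.count("C") / len(seq) if seq else 0.0

def candidate_effectors(secreted, max_length=400, min_cys_fraction=0.02):
    """Return IDs of secreted proteins that are short and cysteine-rich.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    hits = []
    for protein_id, seq in secreted.items():
        length = len(seq.rstrip("*"))
        if length <= max_length and cysteine_fraction(seq) >= min_cys_fraction:
            hits.append(protein_id)
    return hits

# Toy sequences (hypothetical): one short cysteine-rich protein, one long
# protein, and one short protein without cysteine.
secreted = {
    "protA": "MKC" + "ACDE" * 20,
    "protB": "M" + "A" * 500,
    "protC": "MKLAAAGGSS" * 10,
}
hits = candidate_effectors(secreted)
```

A dedicated effector predictor would normally replace such a filter; the sketch only makes the two criteria explicit.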
The isolate from AG1-IC contains the highest number of effector proteins (n=134), whereas
the isolate from AG3-PT contains the lowest number (n=75). Nevertheless, all the isolates
contain approximately 100 effector proteins with a similar proportion of
cysteine residues (Figure 5B). Next, we investigated the
most enriched domains among all R. solani effector proteins, of which “Pectate lyase” is
the most enriched domain, followed by the “thaumatin family” domain (Figure 5C).
In comparison, the analysis of effector proteins in other basidiomycetes suggests that
they contain a similar number of effector proteins (p-value = 0.14;
Figure 5C-D; supplementary file; sheet 3-4). The effector proteins in R. solani AGs
include proteins belonging to 237 InterPro terms, whereas the effector proteins of other
basidiomycetes (including T. calospora) include proteins enriched with 119 terms. We found that 173
terms (72%) are uniquely associated with R. solani AGs, of which the most abundant terms include
IPR001283 (Cysteine-rich secretory protein-related). These unique effectors may play
decisive roles in host recognition and in the virulence of necrotrophic Rhizoctonia pathogens (73,
74). Moreover, we also observed 55 InterPro terms not observed in R. solani effector proteins,
including Zinc Finger and LysM domains. We also found 64 InterPro terms shared
by both groups of effector proteins, of which “Pectate lyase” and “Glycoside hydrolase
family 28” are mainly associated with effector proteins of R. solani AG subgroups, whereas
“Hydrophobin” is mainly associated with other basidiomycetes. The complete list of secretome and
effector proteins, the InterPro domains and associated information is available in the supplementary
file.
Carbohydrate-active enzymes
CAZymes are essential for degradation of host plant cells and fungal colonization of the host,
and are thus important for fungal bioactivity (75, 76). Using CAZy (Carbohydrate-Active
Enzymes database) (77), which contains classified information on enzymes involved in
complex carbohydrate metabolism, we annotated and compared the distribution of CAZymes in
all R. solani isolates. Overall, R. solani isolates contain 383-595 high-confidence
CAZymes, with AG3-1A1 having the largest number (Figure 6A). These predicted
CAZymes in R. solani AGs are mainly distributed across 177 CAZyme families that can be
broadly classified into six major classes, i.e., Glycoside Hydrolases (GH), Glycosyltransferases (GT),
Polysaccharide Lyases (PL), Carbohydrate Esterases (CE), Carbohydrate-Binding Modules (CBM)
and redox enzymes with Auxiliary Activities (AA). Our analysis reveals that GH forms the
major class of CAZymes in all fungal species, including T. calospora (Figure S6 and S7); these enzymes
hydrolyze the glycosidic bonds between carbohydrate and non-carbohydrate moieties or
between two or more carbohydrate moieties (78). In contrast, CBM forms the least abundant class
in the proteomes of the given isolates. Despite these differences, we observed a
similar distribution of enzyme counts in each class of CAZyme across all the given isolates.
We found that among the predicted 177 families, only 34 families are abundant (with total
enzyme count > 50 proteins; Figure 6B) across all the given isolates, i.e., Rhizoctonia species
and T. calospora. These 34 families have a distinct abundance profile in each AG; for instance,
proteins from the GH7 family are highly abundant in T. calospora as compared to the R. solani isolates.
Similarly, proteins belonging to PL1_4 are not observed in AG4-R118 and T. calospora. We
have divided these 34 families into three different groups with respect to their abundance profile
in R. solani isolates. Group-1 contains CAZymes belonging to the GH28, AA9, PL3_2 and
AA3_2 families, which form the highly abundant families (total enzyme count >200 proteins) of
enzymes in R. solani AGs. Similarly, Group-2 contains 11 CAZyme families with enzymes
moderately abundant in R. solani AGs, whereas Group-3 contains 19 families with sparsely
abundant CAZymes. We observed that in all three groups, AG3-1A1 contains the highest
number of CAZymes for most of the 34 families and is significantly enriched with all the members of
Group-1 families. In fact, the clustering analysis highlights the similar profiles of AG3-1A1 and
AG2-2IIIB, mainly due to a similar distribution of proteins belonging to GH28, AA9, AA3_2 and
GH7. In Group-1, although GH28-containing enzymes are abundant in most of the R. solani
isolates, AG8 contains a limited number of enzymes belonging to this family. Similarly, the AA9 and
PL3_2 families of enzymes are abundant in only 50% of the isolates, and thus may be relevant
for a unique set of functions associated with the respective isolates. In Group-2, however, we
observed a similar abundance profile across all the isolates except T. calospora,
which indicates their probable role in R. solani-specific functions. For example, CAZymes
belonging to AA5_1, GH18 and PL4_1 are enriched in most of the R. solani isolates, but not in
T. calospora. The conserved distribution of CAZyme families in the diverse proteomes of
different R. solani isolates signifies their essential role in fungal activity. On the other hand,
Group-3 CAZymes provide a unique and distinct profile to each AG, with a limited number of
families showing similar abundance profiles. T. calospora is distinctly
abundant in CAZymes belonging to GH5_5, which is not observed in R. solani isolates. These results
strongly suggest that R. solani isolates share a large proportion of carbohydrate-degrading
enzymes, within which an isolate-specific CAZyme profile can also be observed (mainly from
Group-3). To confirm whether this abundance profile is specific to R. solani isolates, we
compared it with the abundance profiles of 14 other basidiomycetes. The
analysis clearly reveals a CAZyme profile distinct from that of other basidiomycetes, in which R.
solani isolates can be phylogenetically grouped into a separate cluster (Figure S8). The analysis
highlights the families that are uniquely abundant in R. solani isolates compared with other basidiomycetes,
e.g., GH28, PL3_2, AA5_1, CE4, GH10, GH62, PL4_1, CE8, PL1_7, PL1_4 and AA7, and as
expected, most of these families belong to Group-1 and Group-2 of the previous analysis.
Among these families, we observed that the PL3_2, GH62 and CE8 families of proteins are
distinctly expressed in R. solani isolates. In addition, AG3-1A1 is exceptionally abundant in
AA9 and GH28, which is not observed in any other basidiomycete under investigation. In contrast,
AA3_2 (Group-1) is abundant in most of the basidiomycetes, including R. solani. In summary,
we have shown that members of CAZyme families belonging to Group-1 and Group-2 are
abundant in R. solani isolates and may also provide them with unique attributes (or functions) not
observed in the other basidiomycetes.
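The abundance thresholds used above (total enzyme count > 50 for an abundant family, > 200 for Group-1) can be expressed as a small Python sketch. The text does not state the cutoff separating Group-2 from Group-3, so the sketch leaves those two groups merged; the function name and the per-isolate counts are hypothetical.

```python
def group_families(counts, abundant_min=50, group1_min=200):
    """Assign each CAZyme family a label based on its total count across isolates.

    counts: family -> {isolate: number of enzymes}
    The Group-2/Group-3 boundary is not given in the text, so both are
    reported together here (an assumption of this sketch).
    """
    groups = {}
    for family, per_isolate in counts.items():
        total = sum(per_isolate.values())
        if total > group1_min:
            groups[family] = "Group-1"
        elif total > abundant_min:
            groups[family] = "Group-2/3"
        else:
            groups[family] = "not abundant"
    return groups

# Toy counts (hypothetical values, for illustration only).
counts = {
    "GH28": {"AG3-1A1": 150, "AG8": 60},
    "GH18": {"AG3-1A1": 40, "AG8": 30},
    "GH5_5": {"AG3-1A1": 2, "AG8": 1},
}
groups = group_families(counts)
```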
RsolaniDB: a Rhizoctonia solani pangenome database and its applications
RsolaniDB (RDB) is a large-scale, integrative repository hosting the R. solani pangenome project, with
emphasis on supporting data mining and analysis. The genomes and their components
can be accessed under three different categories, viz. genomic, ortholog and functional
assignment.
Genomes: The genomic content includes draft genome sequences of R. solani isolates in FASTA
format along with the gene-level annotation in GFF3 format. The annotation includes predicted
gene boundaries with introns and exons, as well as their locations on contigs or scaffolds. It
also includes the predicted transcribed cDNA sequences and translated protein sequences. This
information is vital for users looking for reference genomes and their annotated
components for mapping RNAseq reads. The draft genomes and their annotation can also be
downloaded and used for downstream local analysis, e.g., variant calling, SNP and eQTL analysis,
and other similar genomic analyses with different bioinformatics methods.
Orthologs: Using orthoMCL clustering on the proteomes of 18 R. solani isolates (including
previously published genome assemblies), protein sequences were compared and clustered into
groups of similar sequences. Sequences that were not part of any cluster, i.e., singletons
unique to their respective isolates, were categorized as “unique”, whereas the rest of the proteome was
categorized into either “core” or “auxiliary” groups of orthoMCL clusters. RDB allows users to
retrieve this information for each protein entry and also to retrieve the protein IDs of
other members of its ortholog cluster family, if any.
Functional assignment: This category includes the predicted InterPro protein domains associated
with each protein entry. RDB also includes GO information associated with each protein,
along with PANTHER pathway terms. This information helps in assigning a functional
description to each protein entry in the database.
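For readers who download the GFF3 annotation mentioned under the Genomes category, gene coordinates can be extracted with a few lines of Python. This is a minimal sketch that relies only on the standard nine-column, tab-separated GFF3 layout; the contig name, source label and gene IDs in the example are hypothetical and do not reflect the actual RDB identifier format.

```python
def parse_gff3_genes(lines):
    """Map gene ID -> (seqid, start, end, strand) from GFF3 lines."""
    genes = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # keep only well-formed 'gene' features
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        gene_id = attrs.get("ID")
        if gene_id:
            genes[gene_id] = (cols[0], int(cols[3]), int(cols[4]), cols[6])
    return genes

# Toy GFF3 content (hypothetical contig and IDs).
example = [
    "##gff-version 3",
    "contig_1\tsketch\tgene\t100\t900\t.\t+\t.\tID=gene0001;Name=hypothetical",
    "contig_1\tsketch\tmRNA\t100\t900\t.\t+\t.\tID=mrna0001;Parent=gene0001",
]
genes = parse_gff3_genes(example)
```

GFF3 coordinates are 1-based and inclusive, so the gene in this example spans 801 bases.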
The database is organized to include one unique RDB ID (or entry) for each gene
structure, with all of the above associated information. The RDB ID allows users to search the
genomic coordinates (intron/exon boundaries) with IGV visualization, the sequences and their
functional annotation for each gene in each R. solani isolate. All of this information can be
retrieved from the database via a “text-based” or “keyword-based” search, either in an AG-specific
manner or across the entire database. Users can also perform BLAST searches of their own
nucleotide or protein sequences against the entire database or against a given AG. Moreover, users
can retrieve a set of sequences in FASTA format for a given list of RDB IDs. One of the
important and unique features of RsolaniDB is a tool that allows users to perform functional or gene-set
enrichment analysis of given RDB IDs, e.g., Gene Ontology or pathway analysis. This feature is
especially useful for analyzing differentially expressed genes after RNAseq data analysis, as it
provides the statistical significance (as p-values) of different GO/pathway terms enriched in a
given set of differentially expressed genes. As far as we know, this feature is unique to RDB
among existing Rhizoctonia resources. However, it requires the user to use the
reference genome sequences and the annotation file from RDB in the
RNAseq data analysis pipeline. As an additional resource, RDB also incorporates previously
published (16, 18, 20, 79–81) genome- and transcriptome-level information in a single platform
with an RDB ID format. The database is publicly available to the scientific community and is
accessible at http://rsolanidb.kaust.edu.sa/RhDB/index.html.
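The text does not state which statistical test underlies the enrichment p-values. A common choice for gene-set enrichment is the one-sided hypergeometric test, sketched below in Python as an assumption rather than as the method implemented in RDB; the function name and the example numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the GO/pathway term
    sample:     number of genes in the query list (e.g., differentially expressed)
    hits:       query genes carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical example: 8 of 100 query genes carry a term that annotates
# 200 of 10,000 background genes.
p_value = hypergeom_enrichment_p(10000, 200, 100, 8)
```

When many terms are tested at once, the resulting p-values would normally be corrected for multiple testing before interpretation.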
Discussion
Rhizoctonia solani is considered one of the most destructive and diverse groups of soil-borne
plant pathogens, causing various diseases on a wide range of economically important crops. It is
classified into 13 AGs with distinctive pathogenic host ranges and responsiveness to disease
control measures. For example, AG1, AG2-2IIIB, and AG4 cause diseases mostly on cool-season
turfgrasses, whereas AG2-2LP, causing large patch disease, is predominantly seen on
warm-season turfgrasses (82, 83). Isolates from different AGs also vary in sensitivity to
fungicides, and no single fungicide is effective against all AGs (84). For example, AG5 isolates
are moderately sensitive to pencycuron, while other AGs are highly sensitive to this fungicide
(85). Our ability to control this pathogen is hampered by a lack of accurate molecular
identification of AGs and their subgroups, and a poor understanding of the genetic variation among
them. This genetic variation results in differing sensitivity to control measures, as well as the
pathogenic and ecological diversity in the population structures of the R. solani complex.
One of the primary reasons for this limited understanding of the R. solani complex is the lack of
genetic studies representative of its heterozygous and diverse AGs and sub-groups (13). Until
now, draft genome assemblies belonging to only four of the 13 AGs had been reported, viz.
AG1-IA (16), AG1-IB (17), AG2-2IIIB (13), AG3-Rhs1AP (18), AG3-PT isolate Ben-3 (19) and
AG8 (20). Here we expanded the scope of genetic analysis of the R. solani complex by
performing comprehensive genome sequencing, assembly, annotation and comparative analysis
of 12 R. solani isolates. This enabled us to extend pangenome analysis of R. solani to 7 AGs
(AG1, AG2, AG3, AG4, AG5, AG6, AG8), selected additional sub-groups (AG1-IC, AG3-TB),
and a hypovirulent isolate (AG3-1A1). Although the heterokaryotic and diploid nature of Rhizoctonia
species is expected to pose genome assembly challenges (13), in our analysis we observed
a large number of inter-group syntenic regions and ITS2-based similarities, which highlight
the high similarity among the given 13 R. solani isolates (including AG3-PT). The recognition
of conserved ITS2 sequences along with large syntenic regions, despite the physiological and
taxonomic differences among the given isolates, suggests the presence of essentially conserved regions
and the high quality of the draft genome sequences generated in this study.
Subsequently, to deduce the similarities as well as unique features in the given set of
predicted proteomes, we performed a series of comparative analyses that indicated the expected
heterogeneity among R. solani subgroups, with the orchid mycorrhizal fungus T. calospora as an
outlier. For example, both AG5 and AG2-2IIIB included a large set of unique proteomes as well
as secretomes, enriched with InterPro families of proteins that are abundant in these two AGs.
Additionally, the proteomes of R. solani isolates are uniquely and highly enriched with proteins
with “pectate lyase” domains, as compared to the other basidiomycetes. Another finding of
potential significance is that the highest number of orthoMCL clusters was shared between
AG3-1A1 and AG3-1AP, both isolates belonging to the AG3-PT subgroup. Isolate AG3-1A1 is
the sector-derived, hypovirulent isolate of the more virulent isolate, AG3-1AP. Intriguingly,
AG3-1A1 has been demonstrated to be a successful biocontrol agent of isolate AG3-1AP in the
field (86). Competitive niche exclusion is a demonstrated mechanism for biocontrol where the
biocontrol agent has a significant overlap in resource utilization with the pathogen and
outcompetes the pathogen for these necessary resources (87). A high degree of overlap in gene
function is consistent with the mechanism of biocontrol of AG3-1AP in the field by AG3-1A1
through competitive niche exclusion.
The sector-derived, hypovirulent isolate AG3-1A1, however, differed from the progenitor isolate
AG3-1AP, as well as the other R. solani isolates analyzed, in AROM sequences. AROM
sequences are known for their conserved profile across fungal species and encode the
penta-functional AROM polypeptide that catalyzes five consecutive enzymatic reactions in the
prechorismate steps of the shikimate pathway, leading to biosynthesis of the aromatic amino
acids tryptophan, tyrosine, and phenylalanine (65). The isolate AG3-1A1 contained two complete
penta-functional AROM protein sequences, while other isolates contained only one complete
sequence or partial AROM sequences. Sectoring as a means of phenotypic plasticity in fungi
may take place by genetic mutations, rearrangement of heterokaryotic nuclei, conversion from
heterokaryotic to homokaryotic mycelium, exchange of cytoplasmic factors, etc., resulting in
changes in morphology, virulence, mating type, sporulation, and ecological adaptations (88). It is
possible that the genetic event that led to duplication of the AROM sequences in AG3-1A1 led to
hypovirulence. Phenylacetic acid (PAA) has been demonstrated to be a virulence factor in the
progenitor isolate, AG3-1AP, and downregulation of the shikimate pathway occurs in
AG3-1A1, resulting in a reduction in production of PAA by AG3-1A1 (89). Moreover, it is
possible that one of the two arom genes in AG3-1A1 remains inactive due to methylation, or that the
gene duplication is an attempt to compensate for the suppressed shikimate pathway, as
documented in Aspergillus nidulans (65). However, further investigation is necessary to
determine whether any of these hypotheses is true.
Secretome analysis also revealed several interesting findings that provided unique
characteristics to each R. solani isolate; e.g., the secretomes of AG1-IB and AG3-TB, both of which
are known to cause foliar diseases, are uniquely and significantly enriched with three different
multi-copper oxidases (type 1/2/3). Nevertheless, despite the differences, most of the secretomes
have a similar composition of significantly enriched protein domains, which mainly include
“Cellulose-binding domain fungal”, “Glycoside hydrolase family 61” and “Pectate lyase”.
However, the composition is significantly different from that of the other basidiomycetes, and a
large number of the reported protein families are uniquely associated with multiple R. solani
isolates. We observed a similar finding for the effector proteins, wherein proteins containing
“Cysteine rich secretory proteins”, “Pectate lyase” and “Thaumatin” domains are distinctly abundant in R.
solani isolates, whereas “Hydrophobin” is abundant only in other basidiomycetes. Similarly, the
CAZyme analysis highlighted several unique attributes associated with each R. solani isolate,
especially AG3-1A1, which possesses the CBM1 family of proteins that are linked with
degradation of insoluble polysaccharides (90). It was observed that several families of these
CAZymes, e.g., GH28, PL3_2, AA5_1 and GH10, were not present in T. calospora, which is a
symbiotic mycorrhizal fungus, or in other basidiomycetes (91). Overall, data presented in this study
are consistent with the hypothesis that AGs and sub-groups of Rhizoctonia species are highly
heterogeneous, each with unique functional genomic properties, while being conserved in their
functional regions with respect to other groups. However, the unique secretome, effector and
CAZyme profiles of R. solani relative to other basidiomycetes may reflect the ecological
and host adaptation strategies, as well as the necrotrophic lifestyle, of the former, and call for
future research in the respective areas to better understand the biology and pathology of the species.
To further propel research on R. solani, we present our data as the web resource
RsolaniDB (RDB). This web resource includes detailed information on each R. solani isolate,
such as the genome properties, predicted transcript/protein sequences, predicted function, and
protein orthologues among other AG sub-groups, along with tools for Gene Ontology (GO) and
pathway enrichment analysis, ortholog and sequence analysis, and IGV visualization of gene
models. Also, by adding the previously published genome assemblies and their features,
RsolaniDB stands as a universal platform for accessing R. solani resources with a single
identifier format. Since none of the existing Rhizoctonia-specific databases host such a large
repertoire of genome assemblies and accessory web tools for functional enrichment analysis of
gene sets, e.g., differentially expressed genes, RsolaniDB stands as a valuable resource for
formulating new hypotheses and understanding the unique or conserved patho-system of R.
solani AGs and subgroups. The associated gene-set enrichment analysis tool further sets
RsolaniDB apart from the existing fungal databases that do not allow gene enrichment
analysis.
Finally, since each of the R. solani AGs or subgroups is characterized by a unique
heterogeneous profile, we strongly believe that the presented genome assemblies, annotation and
comparative analysis will help mycologists and plant pathologists gain a greater
understanding of its biology and ecology, and develop as well as improve existing R.
solani disease management projects, including drug target discovery and the design of future
diagnostic tools for rapid discrimination of R. solani AGs in indoor and outdoor farming
environments.
Data availability
All data are publicly available as error-corrected, processed FASTQ files at the European Nucleotide
Archive (ENA) at EMBL-EBI under primary accession ID PRJEB39881 (secondary accession:
ERP123449) (92, 93). Genome assemblies and corresponding annotations are available at the
RsolaniDB database (http://rsolanidb.kaust.edu.sa/RhDB/index.html).
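As a convenience for readers who wish to locate the deposited reads programmatically, the sketch below builds a file-report query for the study accession and parses a tab-separated response. It assumes the ENA Portal API `filereport` endpoint and its `run_accession` and `fastq_ftp` fields; the run accession and file locations in the example response are hypothetical placeholders, used so that the sketch runs offline.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint of the ENA Portal API file report service.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """Build a file-report query URL listing the FASTQ files of a study."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Map each run accession to its list of FASTQ locations."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["run_accession"]: [path for path in row["fastq_ftp"].split(";") if path]
        for row in reader
    }


url = filereport_url("PRJEB39881")
print(url)

# Hypothetical response body (placeholder accession and paths).
# A real response could be fetched with urllib.request.urlopen(url).
example_tsv = (
    "run_accession\tfastq_ftp\n"
    "ERR0000001\thost.example/ERR0000001_1.fastq.gz;host.example/ERR0000001_2.fastq.gz\n"
)
runs = parse_filereport(example_tsv)
print(runs)
```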
Funding
This project was funded by a USDA-ARS fund [Agreement #-58-8042-8-067-F USDA-KAUST
project] to DKL and a KAUST faculty baseline fund [BAS/1/1020-01-01] to AP.
Acknowledgements
The authors thank the members of the Bioscience Core Laboratory (BCL) in KAUST for
producing the raw DNA and RNA sequence datasets and Adnan (Ed) Ismaiel (USDA-ARS,
SASL) for DNA extraction and fungal culture maintenance. We also thank Drs. Ian Misner and
Nadim Alkharouf (Towson University, Towson, MD) for their help during the initial set-up of
the project.
Author contributions
A.K., D.K.L. and A.P. conceived the study, interpreted the results and wrote the manuscript;
A.K. performed the bioinformatics analysis and developed the computational pipelines and the
database; A.R., S.M. and M.N. conducted the molecular experiments, library preparation and
sequencing; D.P.R. collected and stored the materials and edited the manuscript; A.P. and D.K.L.
supervised the overall project.
References
1. Yang,G. and Li,C. (2012) General Description of Rhizoctonia Species Complex. INTECH
Open Access Publisher.
2. Amaradasa,B.S., Horvath,B.J., Lakshman,D.K. and Warnke,S.E. (2013) DNA fingerprinting
and anastomosis grouping reveal similar genetic diversity in Rhizoctonia species infecting
turfgrasses in the transition zone of USA. Mycologia, 105, 1190–1201.
3. Raaijmakers,J.M., Paulitz,T.C., Steinberg,C., Alabouvette,C. and Moënne-Loccoz,Y. (2009)
The rhizosphere: A playground and battlefield for soilborne pathogens and beneficial
microorganisms. Plant Soil, 321, 341–361.
4. González,D., Rodriguez-Carres,M., Boekhout,T., Stalpers,J., Kuramae,E.E., Nakatani,A.K.,
Vilgalys,R. and Cubeta,M.A. (2016) Phylogenetic relationships of Rhizoctonia fungi within
the Cantharellales. Fungal Biol., 120, 603–619.
5. Keijer,J., Korsman,M.G., Dullemans,A.M., Houterman,P.M., De Bree,J. and Van
Silfhout,C.H. (1997) In vitro analysis of host plant specificity in Rhizoctonia solani. Plant
Pathol., 46, 659–669.
6. Foley,R.C., Gleason,C.A., Anderson,J.P., Hamann,T. and Singh,K.B. (2013) Genetic and
Genomic Analysis of Rhizoctonia solani Interactions with Arabidopsis; Evidence of
Resistance Mediated through NADPH Oxidases. PLoS One, 8, e56814.
7. Gonzalez,D., Carling,D.E., Kuninaga,S., Vilgalys,R. and Cubeta,M.A. (2001) Ribosomal
DNA systematics of Ceratobasidium and Thanatephorus with Rhizoctonia anamorphs.
Mycologia, 93, 1138–1150.
8. Hane,J.K., Anderson,J.P., Williams,A.H., Sperschneider,J. and Singh,K.B. (2014) Genome
sequencing and comparative genomics of the broad host-range pathogen Rhizoctonia solani
AG8. PLoS Genet., 10, e1004281.
9. Hossain,M.K., Tze,O.S., Nadarajah,K., Jena,K., Bhuiyan,M.A.R. and Ratnam,W. (2014)
Identification and validation of sheath blight resistance in rice (Oryza sativa L.) cultivars
against Rhizoctonia solani. Can. J. Plant Pathol., 36, 482–490.
10. Copley,T., Bayen,S. and Jabaji,S. (2017) Biochar Amendment Modifies Expression of
Soybean and Rhizoctonia solani Genes Leading to Increased Severity of Rhizoctonia Foliar
Blight. Front. Plant Sci., 8, 221.
11. Anderson,J.P., Hane,J.K., Stoll,T., Pain,N., Hastie,M.L., Kaur,P., Hoogland,C., Gorman,J.J.
and Singh,K.B. (2016) Proteomic analysis of Rhizoctonia solani identifies infection-specific,
redox associated proteins and insight into adaptation to different plant hosts. Mol. Cell.
Proteomics, 15, 1188–1203.
12. Lakshman,D.K., Roberts,D.P., Garrett,W.M., Natarajan,S.S., Darwish,O., Alkharouf,N.,
Pain,A., Khan,F., Jambhulkar,P.P. and Mitra,A. (2016) Proteomic Investigation of
Rhizoctonia solani AG 4 Identifies Secretome and Mycelial Proteins with Roles in Plant
Cell Wall Degradation and Virulence. J. Agric. Food Chem., 64, 3101–3110.
13. Wibberg,D., Andersson,L., Tzelepis,G., Rupp,O., Blom,J., Jelonek,L., Pühler,A.,
Fogelqvist,J., Varrelmann,M., Schlüter,A., et al. (2016) Genome analysis of the sugar beet
pathogen Rhizoctonia solani AG2-2IIIB revealed high numbers in secreted proteins and cell
wall degrading enzymes. BMC Genomics, 17, 245.
14. Zhang,J., Chen,L., Fu,C., Wang,L., Liu,H., Cheng,Y., Li,S., Deng,Q., Wang,S., Zhu,J., et al.
(2017) Comparative transcriptome analyses of gene expression changes triggered by
Rhizoctonia solani AG1 IA infection in resistant and susceptible rice varieties. Front. Plant
Sci., 8, 1422.
15. Shu,C., Zhao,M., Anderson,J.P., Garg,G., Singh,K.B., Zheng,W., Wang,C., Yang,M. and
Zhou,E. (2019) Transcriptome analysis reveals molecular mechanisms of sclerotial
development in the rice sheath blight pathogen Rhizoctonia solani AG1-IA. Funct. Integr.
Genomics, 19, 743–758.
16. Nadarajah,K., Razali,N.M., Cheah,B.H., Sahruna,N.S., Ismail,I., Tathode,M. and Bankar,K.
(2017) Draft genome sequence of Rhizoctonia solani anastomosis group 1 subgroup 1A
strain 1802/KB isolated from rice. Genome Announc., 5.
17. Wibberg,D., Rupp,O., Blom,J., Jelonek,L., Kröber,M., Verwaaijen,B., Goesmann,A.,
Albaum,S., Grosch,R., Pühler,A., et al. (2015) Development of a Rhizoctonia solani AG1-IB
Specific Gene Model Enables Comparative Genome Analyses between Phytopathogenic
R. solani AG1-IA, AG1-IB, AG3 and AG8 Isolates. PLoS One, 10.
18. Cubeta,M.A., Thomas,E., Dean,R.A., Jabaji,S., Neate,S.M., Tavantzis,S., Toda,T.,
Vilgalys,R., Bharathan,N., Fedorova-Abrams,N., et al. (2014) Draft Genome Sequence of
the Plant-Pathogenic Soil Fungus Rhizoctonia solani Anastomosis Group 3 Strain Rhs1AP.
Genome Announc., 2.
19. Wibberg,D., Genzel,F., Verwaaijen,B., Blom,J., Rupp,O., Goesmann,A., Zrenner,R.,
Grosch,R., Pühler,A. and Schlüter,A. (2017) Draft genome sequence of the potato pathogen
Rhizoctonia solani AG3-PT isolate Ben3. Arch. Microbiol., 199, 1065–1068.
20. Hane,J.K., Anderson,J.P., Williams,A.H., Sperschneider,J. and Singh,K.B. (2014) Genome
Sequencing and Comparative Genomics of the Broad Host-Range Pathogen Rhizoctonia
solani AG8. PLoS Genet., 10.
21. Bills,G.F., Singleton,L.L., Mihail,J.D. and Rush,C.M. (1993) Methods for Research on
Soilborne Phytopathogenic Fungi. The American Phytopathological Society.
22. Carlson,J.E., Tulsieram,L.K., Glaubitz,J.C., Luk,V.W.K., Kauffeldt,C. and Rutledge,R.
(1991) Segregation of random amplified DNA markers in F1 progeny of conifers. Theor.
Appl. Genet., 10.1007/BF00226251.
23. Sayers,E.W., Beck,J., Brister,J.R., Bolton,E.E., Canese,K., Comeau,D.C., Funk,K., Ketter,A.,
Kim,S., Kimchi,A., et al. (2020) Database resources of the National Center for
Biotechnology Information. Nucleic Acids Res., 10.1093/nar/gkz899.
24. Bolger,A.M., Lohse,M. and Usadel,B. (2014) Trimmomatic: A flexible trimmer for Illumina
sequence data. Bioinformatics, 30, 2114–2120.
25. Andrews,S. (2020) Babraham Bioinformatics - FastQC A Quality Control tool for High
Throughput Sequence Data. Soil, 5, 47–81.
26. Marçais,G. and Kingsford,C. (2011) A fast, lock-free approach for efficient parallel counting
of occurrences of k-mers. Bioinformatics, 27, 764–770.
27. Grabherr,M.G., Haas,B.J., Yassour,M., Levin,J.Z., Thompson,D.A., Amit,I., Adiconis,X.,
Fan,L., Raychowdhury,R., Zeng,Q., et al. (2011) Full-length transcriptome assembly from
RNA-Seq data without a reference genome. Nat. Biotechnol., 29, 644–652.
28. Bankevich,A., Nurk,S., Antipov,D., Gurevich,A.A., Dvorkin,M., Kulikov,A.S., Lesin,V.M.,
Nikolenko,S.I., Pham,S., Prjibelski,A.D., et al. (2012) SPAdes: A new genome assembly
algorithm and its applications to single-cell sequencing. J. Comput. Biol., 19, 455–477.
29. Gurevich,A., Saveliev,V., Vyahhi,N. and Tesler,G. (2013) QUAST: Quality assessment tool
for genome assemblies. Bioinformatics, 29, 1072–1075.
30. Boetzer,M., Henkel,C.V., Jansen,H.J., Butler,D. and Pirovano,W. (2011) Scaffolding
pre-assembled contigs using SSPACE. Bioinformatics, 10.1093/bioinformatics/btq683.
31. Luo,R., Liu,B., Xie,Y., Li,Z., Huang,W., Yuan,J., He,G., Chen,Y., Pan,Q., Liu,Y., et al.
(2012) SOAPdenovo2: An empirically improved memory-efficient short-read de novo
assembler. Gigascience, 1, 18.
32. Song,L., Shankar,D.S. and Florea,L. (2016) Rascaf: Improving Genome Assembly with RNA
Sequencing Data. Plant Genome, 9, plantgenome2016.03.0027.
33. Seppey,M., Manni,M. and Zdobnov,E.M. (2019) BUSCO: Assessing genome assembly and
annotation completeness. In Methods in Molecular Biology. Humana Press Inc., Vol. 1962,
pp. 227–245.
34. Bengtsson-Palme,J., Ryberg,M., Hartmann,M., Branco,S., Wang,Z., Godhe,A., De Wit,P.,
Sánchez-García,M., Ebersberger,I., de Sousa,F., et al. (2013) Improved software detection
and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other
eukaryotes for analysis of environmental sequencing data. Methods Ecol. Evol., 4, 914–919.
35. Kumar,S., Stecher,G., Li,M., Knyaz,C. and Tamura,K. (2018) MEGA X: Molecular
evolutionary genetics analysis across computing platforms. Mol. Biol. Evol.,
10.1093/molbev/msy096.
36. Rédei,G.P. (2008) CLUSTAL W (improving the sensitivity of progressive multiple sequence
alignment through sequence weighting, position-specific gap penalties and weight matrix
choice). In Encyclopedia of Genetics, Genomics, Proteomics and Informatics.
37. Jovanovic,N. and Mikheyev,A.S. (2019) Interactive web-based visualization and sharing of
phylogenetic trees using phylogeny.IO. Nucleic Acids Res., 47, W266–W269.
38. Huerta-Cepas,J., Serra,F. and Bork,P. (2016) ETE 3: Reconstruction, Analysis, and
Visualization of Phylogenomic Data. Mol. Biol. Evol., 33, 1635–1638.
39. Pryszcz,L.P. and Gabaldón,T. (2016) Redundans: An assembly pipeline for highly
heterozygous genomes. Nucleic Acids Res., 44, e113.
40. Camacho,C., Coulouris,G., Avagyan,V., Ma,N., Papadopoulos,J., Bealer,K. and
Madden,T.L. (2009) BLAST+: Architecture and applications. BMC Bioinformatics, 10,
421.
41. Cantarel,B.L., Korf,I., Robb,S.M.C., Parra,G., Ross,E., Moore,B., Holt,C., Alvarado,A.S.
and Yandell,M. (2008) MAKER: An easy-to-use annotation pipeline designed for emerging
model organism genomes. Genome Res., 18, 188–196.
42. Tarailo-Graovac,M. and Chen,N. (2009) Using RepeatMasker to identify repetitive elements
in genomic sequences. Curr. Protoc. Bioinforma., 10.1002/0471250953.bi0410s25.
43. Bateman,A., Martin,M.J., O’Donovan,C., Magrane,M., Alpi,E., Antunes,R., Bely,B.,
Bingley,M., Bonilla,C., Britto,R., et al. (2017) UniProt: The universal protein
knowledgebase. Nucleic Acids Res., 10.1093/nar/gkw1099.
44. Mi,H. and Thomas,P. (2009) PANTHER pathway: an ontology-based pathway database
coupled with data analysis tools. Methods Mol. Biol., 563, 123–140.
45. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P.,
Dolinski,K., Dwight,S.S., Eppig,J.T., et al. (2000) Gene ontology: Tool for the unification
of biology. Nat. Genet., 25, 25–29.
46. Quevillon,E., Silventoinen,V., Pillai,S., Harte,N., Mulder,N., Apweiler,R. and Lopez,R.
(2005) InterProScan: Protein domains identifier. Nucleic Acids Res., 33.
47. Hulo,N. (2004) Recent improvements to the PROSITE database. Nucleic Acids Res.,
10.1093/nar/gkh044.
48. Marchler-Bauer,A., Zheng,C., Chitsaz,F., Derbyshire,M.K., Geer,L.Y., Geer,R.C.,
Gonzales,N.R., Gwadz,M., Hurwitz,D.I., Lanczycki,C.J., et al. (2013) CDD: Conserved
domains and protein three-dimensional structure. Nucleic Acids Res., 10.1093/nar/gks1243.
49. Finn,R.D., Bateman,A., Clements,J., Coggill,P., Eberhardt,R.Y., Eddy,S.R., Heger,A.,
Hetherington,K., Holm,L., Mistry,J., et al. (2014) Pfam: The protein families database.
Nucleic Acids Res., 10.1093/nar/gkt1223.
50. Haft,D.H., Selengut,J.D. and White,O. (2003) The TIGRFAMs database of protein families.
Nucleic Acids Res., 10.1093/nar/gkg128.
51. Potter,S.C., Luciani,A., Eddy,S.R., Park,Y., Lopez,R. and Finn,R.D. (2018) HMMER web
server: 2018 update. Nucleic Acids Res., 46, W200–W204.
52. Almagro Armenteros,J.J., Tsirigos,K.D., Sønderby,C.K., Petersen,T.N., Winther,O.,
Brunak,S., von Heijne,G. and Nielsen,H. (2019) SignalP 5.0 improves signal peptide
predictions using deep neural networks. Nat. Biotechnol., 37, 420–423.
53. Käll,L., Krogh,A. and Sonnhammer,E.L.L. (2007) Advantages of combined transmembrane
topology and signal peptide prediction-the Phobius web server. Nucleic Acids Res., 35.
54. Emanuelsson,O., Brunak,S., von Heijne,G. and Nielsen,H. (2007) Locating proteins in the
cell using TargetP, SignalP and related tools. Nat. Protoc., 2, 953–971.
55. Sperschneider,J., Gardiner,D.M., Dodds,P.N., Tini,F., Covarelli,L., Singh,K.B.,
Manners,J.M. and Taylor,J.M. (2016) EffectorP: Predicting fungal effector proteins from
secretomes using machine learning. New Phytol., 210, 743–761.
56. Zhang,H., Yohe,T., Huang,L., Entwistle,S., Wu,P., Yang,Z., Busk,P.K., Xu,Y. and Yin,Y.
(2018) dbCAN2: A meta server for automated carbohydrate-active enzyme annotation.
Nucleic Acids Res., 46, W95–W101.
57. Farrer,R.A. (2017) Synima: A Synteny imaging tool for annotated genome assemblies. BMC
Bioinformatics, 18, 507.
58. Li,L., Stoeckert,C.J. and Roos,D.S. (2003) OrthoMCL: Identification of ortholog groups for
eukaryotic genomes. Genome Res., 13, 2178–2189.
59. Yu,Y., Ouyang,Y. and Yao,W. (2018) ShinyCircos: An R/Shiny application for interactive
creation of Circos plot. Bioinformatics, 10.1093/bioinformatics/btx763.
60. Krzywinski,M., Schein,J., Birol,I., Connors,J., Gascoyne,R., Horsman,D., Jones,S.J. and
Marra,M.A. (2009) Circos: An information aesthetic for comparative genomics. Genome
Res., 10.1101/gr.092759.109.
61. Alexa,A. and Rahnenführer,J. (2009) Gene set enrichment analysis with topGO.
Bioconductor Improv, 27.
62. Marçais,G., Delcher,A.L., Phillippy,A.M., Coston,R., Salzberg,S.L. and Zimin,A. (2018)
MUMmer4: A fast and versatile genome alignment system. PLoS Comput. Biol., 14.
63. Cubeta,M.A., Thomas,E., Dean,R.A., Jabaji,S., Neate,S.M., Tavantzis,S., Toda,T.,
Vilgalys,R., Bharathan,N., Fedorova-Abrams,N., et al. (2014) Draft genome sequence of
the plant-pathogenic soil fungus Rhizoctonia solani anastomosis group 3 strain Rhs1AP.
Genome Announc., 10.1128/genomeA.01072-14.
64. Lakshman,D.K., Liu,C., Mishra,P.K. and Tavantzis,S. (2006) Characterization of the arom
gene in Rhizoctonia solani, and transcription patterns under stable and induced
hypovirulence conditions. Curr. Genet., 10.1007/s00294-005-0005-6.
65. Lamb,H.K., Van Den Hombergh,J.P.T.W., Newton,G.H., Moore,J.D., Roberts,C.F. and
Hawkins,A.R. (1992) Differential flux through the quinate and shikimate pathways:
Implications for the channelling hypothesis. Biochem. J., 10.1042/bj2840181.
66. Lakshman,D.K., Jian,J. and Tavantzis,S.M. (1998) A double-stranded RNA element from a
hypovirulent strain of Rhizoctonia solani occurs in DNA form and is genetically related to
the pentafunctional AROM protein of the shikimate pathway. Proc. Natl. Acad. Sci. U. S.
A., 95, 6425–6429.
67. Črešnar,B. and Petrič,Š. (2011) Cytochrome P450 enzymes in the fungal kingdom. Biochim.
Biophys. Acta - Proteins Proteomics, 10.1016/j.bbapap.2010.06.020.
68. Kim,K.T., Jeon,J., Choi,J., Cheong,K., Song,H., Choi,G., Kang,S. and Lee,Y.H. (2016)
Kingdom-wide analysis of fungal small secreted proteins (SSPs) reveals their potential role
in host association. Front. Plant Sci., 7.
69. McCotter,S.W., Horianopoulos,L.C. and Kronstad,J.W. (2016) Regulation of the fungal
secretome. Curr. Genet., 62, 533–545.
70. Li,T., Wu,Y., Wang,Y., Gao,H., Gupta,V.K., Duan,X., Qu,H. and Jiang,Y. (2019) Secretome
profiling reveals virulence-associated proteins of Fusarium proliferatum during interaction
with banana fruit. Biomolecules, 9.
71. Linder,M., Lindeberg,G., Reinikainen,T., Teeri,T.T. and Pettersson,G. (1995) The difference
in affinity between two fungal cellulose-binding domains is dominated by a single amino
acid substitution. FEBS Lett., 10.1016/0014-5793(95)00961-8.
72. Stergiopoulos,I. and de Wit,P.J.G.M. (2009) Fungal Effector Proteins. Annu. Rev.
Phytopathol., 47, 233–263.
73. Wei,M., Wang,A., Liu,Y., Ma,L., Niu,X. and Zheng,A. (2020) Identification of the Novel
Effector RsIA_NP8 in Rhizoctonia solani AG1 IA That Induces Cell Death and Triggers
Defense Responses in Non-Host Plants. Front. Microbiol., 10.3389/fmicb.2020.01115.
74. Yamamoto,N., Wang,Y., Lin,R., Liang,Y., Liu,Y., Zhu,J., Wang,L., Wang,S., Liu,H.,
Deng,Q., et al. (2019) Integrative transcriptome analysis discloses the molecular basis of a
heterogeneous fungal phytopathogen complex, Rhizoctonia solani AG-1 subgroups. Sci.
Rep., 9.
75. Kameshwar,A.K.S., Ramos,L.P. and Qin,W. (2019) CAZymes-based ranking of fungi
(CBRF): an interactive web database for identifying fungi with extrinsic plant biomass
degrading abilities. Bioresour. Bioprocess., 10.1186/s40643-019-0286-0.
76. Barrett,K., Jensen,K., Meyer,A.S., Frisvad,J.C. and Lange,L. (2020) Fungal secretome
profile categorization of CAZymes by function and family corresponds to fungal phylogeny
and taxonomy: Example Aspergillus and Penicillium. Sci. Rep.,
10.1038/s41598-020-61907-1.
77. Lombard,V., Golaconda Ramulu,H., Drula,E., Coutinho,P.M. and Henrissat,B. (2014) The
carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res.,
10.1093/nar/gkt1178.
78. Henrissat,B. (1991) A classification of glycosyl hydrolases based on amino acid sequence
similarities. Biochem. J., 10.1042/bj2800309.
79. Wibberg,D., Genzel,F., Verwaaijen,B., Blom,J., Rupp,O., Goesmann,A., Zrenner,R.,
Grosch,R., Pühler,A. and Schlüter,A. (2017) Draft genome sequence of the potato pathogen
Rhizoctonia solani AG3-PT isolate Ben3. Arch. Microbiol., 199, 1065–1068.
80. Wibberg,D., Andersson,L., Rupp,O., Goesmann,A., Pühler,A., Varrelmann,M., Dixelius,C.
and Schlüter,A. (2016) Draft genome sequence of the sugar beet pathogen Rhizoctonia
solani AG2-2IIIB strain BBA69670. J. Biotechnol., 10.1016/j.jbiotec.2016.02.001.
81. Wibberg,D., Rupp,O., Jelonek,L., Kröber,M., Verwaaijen,B., Blom,J., Winkler,A.,
Goesmann,A., Grosch,R., Pühler,A., et al. (2015) Improved genome sequence of the
phytopathogenic fungus Rhizoctonia solani AG1-IB 7/3/14 as established by deep mate-pair
sequencing on the MiSeq (Illumina) system. J. Biotechnol., 10.1016/j.jbiotec.2015.03.005.
82. Smiley,R.W., Dernoeden,P.H. and Clarke,B.B. (2005) Compendium of Turfgrass
Diseases, Third Edition.
83. Burpee,L.L. and Martin,S.B. (1996) Biology of Turfgrass Diseases Incited by Rhizoctonia
Species. In Rhizoctonia Species: Taxonomy, Molecular Biology, Ecology, Pathology and
Disease Control.
84. Amaradasa,B.S., Lakshman,D., Mccall,D.S. and Horvath,B.J. (2014) In Vitro Fungicide
Sensitivity of Rhizoctonia and Waitea Isolates Collected from Turfgrasses.
85. Campion,C., Chatot,C., Perraton,B. and Andrivon,D. (2003) Anastomosis groups,
pathogenicity and sensitivity to fungicides of Rhizoctonia solani isolates collected on potato
crops in France. Eur. J. Plant Pathol., 10.1023/B:EJPP.0000003829.83671.8f.
86. Bernard,E., Larkin,R.P., Tavantzis,S., Erich,M.S., Alyokhin,A., Sewell,G., Lannan,A. and
Gross,S.D. (2012) Compost, rapeseed rotation, and biocontrol agents significantly impact
soil microbial communities in organic and conventional potato production systems. Appl.
Soil Ecol., 10.1016/j.apsoil.2011.10.002.
87. Roberts,D.P. and Kobayashi,D.Y. (2011) Impact of Spatial Heterogeneity Within
Spermosphere and Rhizosphere Environments on Performance of Bacterial Biological
Control Agents. In Bacteria in Agrobiology: Crop Ecosystems.
88. Roper,M., Simonin,A., Hickey,P.C., Leeder,A. and Glass,N.L. (2013) Nuclear dynamics in a
fungal chimera. Proc. Natl. Acad. Sci. U. S. A., 10.1073/pnas.1220842110.
89. Liu,C., Lakshman,D.K. and Tavantzis,S.M. (2003) Quinic acid induces hypovirulence and
expression of a hypovirulence-associated double-stranded RNA in Rhizoctonia solani. Curr.
Genet., 10.1007/s00294-003-0375-6.
90. Van Bueren,A.L., Morland,C., Gilbert,H.J. and Boraston,A.B. (2005) Family 6 carbohydrate
binding modules recognize the non-reducing end of β-1,3-linked glucans by presenting a
unique ligand binding surface. J. Biol. Chem., 10.1074/jbc.M410113200.
91. Fochi,V., Chitarra,W., Kohler,A., Voyron,S., Singan,V.R., Lindquist,E.A., Barry,K.W.,
Girlanda,M., Grigoriev,I.V., Martin,F., et al. (2017) Fungal and plant gene expression in
the Tulasnella calospora–Serapias vomeracea symbiosis provides clues about nitrogen
pathways in orchid mycorrhizas. New Phytol., 10.1111/nph.14279.
92. Harrison,P.W., Alako,B., Amid,C., Cerdeño-Tárraga,A., Cleland,I., Holt,S., Hussein,A.,
Jayathilaka,S., Kay,S., Keane,T., et al. (2019) The European Nucleotide Archive in 2018.
Nucleic Acids Res., 10.1093/nar/gky1078.
93. Leinonen,R., Akhtar,R., Birney,E., Bower,L., Cerdeño-Tárraga,A., Cheng,Y., Cleland,I.,
Faruque,N., Goodgame,N., Gibson,R., et al. (2011) The European nucleotide archive.
Nucleic Acids Res., 10.1093/nar/gkq967.
Figure legends
Figure 1. A. Circos plot. The Circos plot represents the syntenic relationships between genomes
of the different AGs of Rhizoctonia solani Kühn. Each line represents a region of genomic
similarity predicted with Synima. Only regions with coverage > 40,000 bases were
enumerated and shown. B. The plot highlights the number of high-similarity syntenic regions
(coverage > 40,000 bp) shared between each pair of genomes, including T. calospora. The red
connections mark pairs of isolates that share a comparatively larger number of syntenic
relationships than other pairs of isolates. Self-hits are not shown. C. ITS2
phylogeny. ITS2 sequences of the tester strains were obtained from the NCBI database and were
clustered with ITS2 sequences from the assembled R. solani genomes (highlighted in blue and
marked with *), along with ITS2 sequences from previously published R. solani genome assemblies
(marked with **). The phylogenetic tree was constructed using MEGA X with 10,000
bootstrap replicates (see Methods), after which the resulting tree and corresponding alignment were
visualized together using Phylogeny.IO.
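The filtering and pairwise counting described for panels A and B can be sketched as follows. This is an illustrative outline only, not the authors' pipeline: the input is assumed to be a list of hypothetical `(genome_a, genome_b, coverage)` tuples rather than actual Synima output, and the function name `count_syntenic_pairs` is invented for this example.

```python
from collections import Counter

MIN_COVERAGE = 40_000  # bases; the threshold stated in the legend


def count_syntenic_pairs(regions, min_coverage=MIN_COVERAGE):
    """Count regions above the threshold for each unordered pair of genomes."""
    counts = Counter()
    for genome_a, genome_b, coverage in regions:
        if genome_a == genome_b:
            continue  # self-hits are not counted
        if coverage > min_coverage:
            # Sort the pair so (a, b) and (b, a) share one key.
            counts[tuple(sorted((genome_a, genome_b)))] += 1
    return counts


# Hypothetical regions, for illustration only.
example_regions = [
    ("AG1-IA", "AG1-IB", 52_000),
    ("AG1-IB", "AG1-IA", 41_000),
    ("AG1-IA", "AG8", 39_000),     # below the threshold
    ("AG3-PT", "AG3-PT", 90_000),  # self-hit
    ("AG3-PT", "AG3-TB", 40_000),  # not strictly above the threshold
]
pair_counts = count_syntenic_pairs(example_regions)
print(pair_counts)
```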
Figure 2. OrthoMCL clustering of the predicted proteomes in R. solani AGs. A. Heatmap
showing protein conservation across all sequenced R. solani AGs and T. calospora. Each row
represents one OrthoMCL cluster, and color is proportional to the number of protein members
shared within a given cluster from the given species (black: no member protein present; red:
large number of protein members present). The hierarchical clustering (hclust; method:
complete) analysis enumerates the similarities between different fungal isolates based on
proteins shared by them across all OrthoMCL clusters. B. Cluster frequency. The line plot
represents the number of OrthoMCL clusters shared by different fungal isolates used in this
study. For example, > 1,400 OrthoMCL clusters are shared by 14 different fungal isolates (including
positive and negative controls) used in this study. The bimodal nature of the plot reflects high
similarity across the independent proteomes, as a large number of clusters share protein members
from ≥ 13 fungal isolates. The red line represents the smoothed curve obtained by averaging the
number of clusters. C. Protein classification based on the OrthoMCL clusters. The “core”
proteins represent the subset of proteomes (from each R. solani AG and T. calospora) with a
conserved profile across all the isolates. Similarly, the “unique” sets represent the isolate-specific
protein subset. The rest of the protein subsets make up the “auxiliary” proteome, which is
conserved in a limited number of isolates. D. Shared OrthoMCL clusters. The number of
OrthoMCL clusters shared between any two isolates. A cluster is counted as shared when it
contains proteins from both isolates.
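The classification in panel C and the pairwise counts in panel D can be expressed compactly in code. The sketch below is a simplified reading of the legend, not the authors' implementation: it assumes clusters are given as a mapping from a cluster ID to `(isolate, protein)` pairs, treats a cluster found in a single isolate as "unique", and ignores proteins that fall outside any cluster. All identifiers and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def classify_clusters(clusters, n_isolates):
    """Label each cluster as core, auxiliary or unique by the isolates it contains."""
    labels = {}
    for cluster_id, members in clusters.items():
        isolates = {isolate for isolate, _protein in members}
        if len(isolates) == n_isolates:
            labels[cluster_id] = "core"       # present in every isolate
        elif len(isolates) == 1:
            labels[cluster_id] = "unique"     # isolate-specific
        else:
            labels[cluster_id] = "auxiliary"  # present in a subset of isolates
    return labels


def shared_cluster_counts(clusters):
    """Count clusters containing proteins from both isolates of each pair."""
    counts = Counter()
    for members in clusters.values():
        isolates = sorted({isolate for isolate, _protein in members})
        for pair in combinations(isolates, 2):
            counts[pair] += 1
    return counts


# Hypothetical clusters, for illustration only.
example_clusters = {
    "C1": [("AG1-IA", "p1"), ("AG8", "p2"), ("T.calospora", "p3")],
    "C2": [("AG1-IA", "p4"), ("AG8", "p5")],
    "C3": [("AG8", "p6"), ("AG8", "p7")],
}
labels = classify_clusters(example_clusters, n_isolates=3)
shared = shared_cluster_counts(example_clusters)
print(labels)
print(shared)
```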
Figure 3. InterPro domain analysis of the unique proteome. In the unique proteome of each
fungal isolate, InterPro protein domain families were predicted using InterProScan (Version
5.45-80.0). Only the top 5 most enriched protein families are shown. The numbers mark the
corresponding InterPro family domain annotations in the circular bar plot.
Figure 4. The Secreted Proteins. A. Number of predicted proteins in the secretome of each
fungal isolate (highlighted in yellow). The secreted proteins predicted in the unique proteome of
each isolate are highlighted in red. B. Comparative analysis of the six most highly enriched InterPro
domains in the secretome.
Figure 5. Effector Proteins. A. The number of cysteine-rich effector proteins predicted in the
predicted secretome of each fungal isolate. B. The proportion of cysteine observed across all the
effectors predicted in each isolate. C. Most enriched InterPro domains in effector proteins of
Rhizoctonia species (not T. calospora) and other basidiomycetes (including T. calospora). D.
Comparative analysis of the distribution of the number of effector proteins predicted in R. solani
AGs as compared to other basidiomycetes. The p-value is computed using the unpaired Wilcoxon
rank-sum test.
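For readers unfamiliar with the test used in panel D, the following is a minimal, dependency-free sketch of a two-sided Wilcoxon rank-sum test using the normal approximation. It is not the authors' code: ties receive average ranks with a tie-corrected variance, no continuity correction is applied, the samples are assumed not to be all identical, and the effector counts in the example are hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for x, p-value).
    """
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold tied values; their 1-based ranks are i+1..j.
        average_rank = (i + 1 + j) / 2
        group_size = j - i
        tie_term += group_size**3 - group_size
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    mean_u = n_x * n_y / 2
    var_u = n_x * n_y / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / sqrt(var_u)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return u_x, p_value


# Hypothetical effector counts per genome, for illustration only.
group_a = [120, 135, 150, 142, 128]
group_b = [60, 75, 90, 82, 110]
u_stat, p_value = rank_sum_test(group_a, group_b)
print(u_stat, p_value)
```

For small samples an exact test is usually preferable to the normal approximation used here.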
Figure 6. CAZymes. A. The number of carbohydrate-active enzymes (CAZymes)
predicted in the proteome of each fungal isolate. B. Heatmap showing CAZyme conservation
across all the R. solani AGs and T. calospora. Each row represents one CAZy family of proteins,
and color is proportional to the number of protein members shared within a given family from
the given species (black: no member protein present; red: large number of protein members
present). The hierarchical clustering (hclust; method: complete) enumerates the similarities
between different fungal isolates based on proteins shared by them across all CAZy families. For
simplicity, only the CAZyme families with more than 50 enzymes across all proteomes are
shown.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6