Machine learned-based visualization of the diversity of grapevine
genomes worldwide and in Armenia using SOMmelier
Kristina Margaryan1,2, Maria Nikoghosyan3,4, Anush Baloyan3, Hripsime Gasoyan3, Emma Hovhannisyan3, Levon Galstyan3,
Tomas Konecny3, Arsen Arakelyan4, and Hans Binder3,5
1Research Group of Plant Genomics, Institute of Molecular Biology of National Academy of Sciences RA, Yerevan 0014, Armenia
2Department of Genetics and Cytology, Yerevan State University, Yerevan 0025, Armenia
3Armenian Bioinformatics Institute (ABI), Yerevan 0014, Armenia
4Bioinformatics Group, Institute of Molecular Biology of National Academy of Sciences RA, Yerevan 0014, Armenia
5Interdisciplinary Centre for Bioinformatics, University of Leipzig, 04107 Leipzig, Germany
Abstract. This study addresses three major issues: first, the diversity of grapevine accessions worldwide and particularly in Armenia, a small country in the largely volcanic Armenian Highlands that is exceptionally rich in cultivated and especially wild grapes; second, the information hidden in their (whole) genomes, e.g., about the domestication history of grapevine over the last 11,000 years and about phenotypic traits such as cultivar utilization and putative resistance against powdery mildew; and third, machine learning methods to extract and visualize this information in an easily perceivable way. We briefly describe the self-organizing map (SOM) portrayal method called “SOMmelier” (as the vine-genome “waiter”) and illustrate its power by applying it to whole genome data of hundreds of grapevine accessions. We also give a short outlook on possible future directions of machine learning in grapevine transcriptomics and ampelography.
1 Introduction
The grapevine is one of the earliest domesticated fruit crops and has been widely cultivated and prized for its fruit and wine. According to a recent study [1], the roots of domestication reach deep into the Pleistocene, which ended almost 11.5 thousand years ago (ya), and lie in the region that includes the Armenian Highlands. Armenia is considered an ancient centre of grapevine domestication and wine-making, which is confirmed by remains of wild and cultivated grapes and wine-producing facilities found at archaeological sites in the country. The diverse climatic conditions, the unique geography and the presence of wild grapes were the main drivers of the extensive diversity of cultivated varieties and of the development of wine-making [2].
In the last decade, whole genome studies of grapevine genetic resources using high-throughput sequencing technologies have generated novel knowledge about the evolution of vine traits, genetic diversity, phylogenetic relatedness and historical origin, phenotype associations and migration paths of the vines. There has been rapid growth in the quality and quantity of data for grapevine genomes, but methods to interrogate these data are limited. At the same time, machine learning and artificial intelligence methods are revolutionising data analysis. The research presented here applies machine learning-based visualization and analysis to grapevine genomic data using the SOMmelier method to gain a better understanding of grapevine genomes, their diversity, function and evolution [3].
Self-organizing neural networks, mostly referred to as self-organizing maps (SOMs), were introduced by T. Kohonen at the beginning of the 1980s, who presented them as “a new, effective software tool for the visualization of high-dimensional data” [4]. The method has been further developed into a molecular portrayal method complemented by comprehensive downstream analysis options, including different visualization options, knowledge mining and feature selection tasks [5]. It has been applied mainly to different omics data in the human disease context (see, e.g., [6,7]) and was recently applied to a collection of SNP vine genome data [3]. Here we briefly introduce the method, illustrate its power by applying it to worldwide grapevine genomes to reconstruct the dissemination of viticulture, discuss the impact of wild and cultivated grapevines collected in Armenia and finally present first results of whole genome analyses of the Armenian grapevine gene pool using SOMmelier.
BIO Web of Conferences 68, 01009 (2023) https://doi.org/10.1051/bioconf/20236801009
44th World Congress of Vine and Wine
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution
License 4.0 (https://creativecommons.org/licenses/by/4.0/).
2 Machine learning of Vine genomes
Modern high-throughput technologies such as genome sequencing have revolutionized the molecular life sciences. A typical genome-wide data set consists of up to millions of items (e.g. single nucleotide polymorphisms, SNPs) for each of hundreds to thousands of samples, which serve as input for downstream bioinformatics analysis (Fig. 1). Visualization-based analysis and knowledge mining is a well-proven concept, but it is difficult to realize for genomic data because of their size and complexity. We make use of self-organizing maps (SOM), a clustering method that was developed more than forty years ago [4]. Our SOM “portrayal” approach uses a very high number of (micro-)clusters, largely exceeding the number of relevant dimensions of variation of the data [8].
In particular, for genomic data of vine the input matrix of size ~10⁸-10⁴ is reduced to ~10³-10⁴ micro-clusters, also called meta-SNPs, because each of them groups SNPs with similar profiles across the vine accessions under study. SOM clustering uses an iterative “machine-learning” algorithm to arrange the clusters in a specific two-dimensional similarity topology. Namely, it arranges them in a two-dimensional array under the condition that neighbouring micro-clusters are more similar (in terms of the Euclidean distance) than distant ones. The clustered data are then visualized for each sample by colouring each pixel of the two-dimensional array according to the eMAF (excess minor allele frequency) value of the meta-SNP, e.g. using a red-to-blue colour scale. Because of the self-organizing properties of the algorithm, the resulting colour patterns exhibit blue and red spot-like regions. They visualize the intrinsic covariance structure of the data, which is specific to each genome, and can be interpreted as individual genomic portraits of the accessions. In effect, they label each accession with a “fingerprint” image. The portraits are mutually comparable because the SNPs in each micro-cluster are identical across one SOM. Further, the portraits are interpretable in terms of biological functions, e.g., by applying previous knowledge about functional aspects of the genome and enrichment techniques.
Figure 1. SOM portrayal of grapevine genomes (see text).
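The portrayal steps described above can be sketched in a few lines of plain Python. This is a minimal, illustrative SOM and not the SOMmelier or oposSOM implementation: the toy data, the grid size, the learning-rate and neighbourhood schedules and the function names (`train_som`, `best_matching_unit`, `portrait`) are all assumptions chosen for brevity. Each SNP profile (one value per accession) is pulled towards its best-matching micro-cluster, and the portrait of one accession is the grid of meta-SNP values for that accession.

```python
import math
import random


def best_matching_unit(grid, x):
    """Grid cell whose prototype has the smallest Euclidean distance to x."""
    return min(grid, key=lambda cell: sum((a - b) ** 2
                                          for a, b in zip(grid[cell], x)))


def train_som(profiles, rows, cols, epochs=30, seed=0):
    """Train a tiny SOM; returns a dict mapping grid cell -> prototype (meta-SNP).

    profiles: list of SNP profiles, each a list with one value per accession.
    """
    rng = random.Random(seed)
    dim = len(profiles[0])
    # Initialise every micro-cluster prototype from a randomly chosen profile.
    grid = {(r, c): list(rng.choice(profiles))
            for r in range(rows) for c in range(cols)}
    steps = epochs * len(profiles)
    radius0 = max(rows, cols) / 2.0
    step = 0
    for _ in range(epochs):
        order = profiles[:]
        rng.shuffle(order)
        for x in order:
            frac = step / steps
            lr = 0.5 * (1.0 - frac)                      # decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)    # shrinking neighbourhood
            bmu = best_matching_unit(grid, x)
            for cell, w in grid.items():
                d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return grid


def portrait(grid, rows, cols, sample_index):
    """Portrait of one accession: its meta-SNP value in every grid cell."""
    return [[grid[(r, c)][sample_index] for c in range(cols)]
            for r in range(rows)]


# Toy data (hypothetical): 40 SNP profiles over 4 accessions, two opposing groups.
data_rng = random.Random(1)
profiles = []
for _ in range(20):
    profiles.append([0.4 + data_rng.uniform(-0.05, 0.05), 0.4, -0.4, -0.4])
    profiles.append([-0.4, -0.4 + data_rng.uniform(-0.05, 0.05), 0.4, 0.4])

som = train_som(profiles, rows=4, cols=4)
first = portrait(som, 4, 4, sample_index=0)
```

Because all accessions share the same trained grid, a given pixel always refers to the same micro-cluster of SNPs, which is what makes the portraits of different accessions directly comparable.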
In this study we analysed microarray SNP data of grapevine accessions collected around the world and whole genome sequencing data of Armenian wild and cultivated grapevines, taken from [9] and [1], respectively.
3 Genetic footprints of grapevine
cultivation
First, we analysed grapevine SNP and phenotype data of cultivars collected around the world. The data were taken from [9] and consist of 783 grapevine samples originating from 41 countries in nine geographic regions, ranging from Middle Asia to Iberia in the “Old World” and also including New World accessions (see legend in Fig. 2). We calculated country-wise SOM portraits, which reveal a high overall genetic diversity, although portraits of countries from the same region often resemble each other. One finds similarities between the portraits of neighbouring countries from Georgia via Russia, Ukraine and Moldova towards the Balkans in the western direction, and from Georgia and Armenia via Iran towards Tajikistan, Uzbekistan and Afghanistan in the Middle Asia region. Moreover, the textures change systematically between the regions, e.g., from the east (EMCA, MFEA) to the west (BALK, WCEU, ITAP and IBER), as visualized by the ‘metro-net’ lines linking similar country portraits (Fig. 2). Another route runs from the Caucasus via Lebanon and Israel towards North Africa (MAGH) and the Iberian Peninsula (IBER). The South Caucasus is also linked via Anatolia (Turkey), Cyprus and Greece with the Balkans. In the western part of Europe, portraits from Spain show similarities with those of North African countries (MAGH), and only partly with French and German portraits, which, in turn, show similarity links via Switzerland, Austria and the Czech Republic towards the Balkans. Mexican cultivars resemble Spanish ones according to their SNP portraits, while cultivars from the USA, Australia and Argentina, on average, show more similarities with grapevines from MFEA and EMCA.
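How the similarity links are actually computed is described in [3]. As a rough illustration only, the sketch below links flattened portraits by a minimum spanning tree over the distance 1 - r, where r is the Pearson correlation. Both the distance and the tree construction are assumptions made here, as are the names `pearson` and `metro_links` and the toy portraits, which are not taken from the data set.

```python
import math


def pearson(u, v):
    """Pearson correlation between two flattened portraits."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def metro_links(portraits):
    """Minimum spanning tree (Prim's algorithm) over the distance 1 - Pearson r."""
    names = sorted(portraits)
    in_tree = [names[0]]
    links = []
    while len(in_tree) < len(names):
        best = None
        for a in in_tree:
            for b in names:
                if b in in_tree:
                    continue
                d = 1.0 - pearson(portraits[a], portraits[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        links.append((best[1], best[2]))
        in_tree.append(best[2])
    return links


# Toy flattened portraits of four hypothetical countries.
toy = {
    "country_A": [0.9, 0.1, -0.5, -0.5],
    "country_B": [0.8, 0.2, -0.4, -0.6],
    "country_C": [-0.5, -0.5, 0.9, 0.1],
    "country_D": [-0.6, -0.4, 0.8, 0.2],
}
links = metro_links(toy)
```

A tree of this kind connects every portrait to its most similar neighbour without cycles, which gives the line-and-station appearance of a metro map.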
A recent, very large study of the whole genomes of more than 2,000 V. v. subsp. vinifera and about 1,000 V. v. subsp. sylvestris accessions disentangles the history of grape domestication and dissemination in much more detail [1]. Accordingly, grapevine was originally cultivated separately in a Caucasian and in a Western Asian cultivation centre around 11,000 ya (years ago), and afterwards cultivated grapevine accessions were disseminated across the Old World, mainly by Neolithic farmers. Overall, the dissemination stages refer to six clusters of cultivated grapes (CG1-6), where CG2 refers to the Caucasian origin (blue arrow in Fig. 2), while CG1 and CG3-6 refer to the Western Asian origin (red arrows). The regional similarity relations of the SOM portraits agree strikingly well with these CG clusters.
Figure 2. Metro map of the distribution of cultivated grapes collected from 41 countries around the Mediterranean Sea. Country-wise SOM portraits are linked by lines according to mutual similarity relations (see [3] for details). The map resembles the dissemination routes of cultivated grapes, which originated from two areas of primary cultivation in the Caucasus (blue, CG2) and the Near East (red, CG1) around 11,000 years ago [1]. CG1 spread towards the east and the west in the following thousand years as indicated (ya…years ago) and mainly constituted the diversity of grapes observed nowadays. The CG clusters agree well with the regions enclosed by the ellipses.
The metro map in Fig. 2 illustrates country- and region-wise similarities between the grapevine genomes as visualized using their SOM portraits. A complementary perspective in gene space can be obtained by summarizing the SOM portraits into a genetic overview landscape of grapevine cultivars (Fig. 3). It shows (red) mountain ranges, which refer to correlated clusters of SNPs with increased excess minor allele frequencies (eMAF) in the portraits of certain geographic regions, as indicated by the flags. Blue areas refer to genomic regions with low eMAF values on average. Interestingly, the topology of the genetic eMAF landscape resembles the geographic topology around the Mediterranean and the Black Sea areas, including the Caucasus. Namely, the eMAF ‘mountains’ order cultivars from the Caucasus along a ‘northern route’ via the Balkans towards Western Europe and along a ‘southern route’ via Palestine and the Maghreb towards the Iberian Peninsula. A central ‘blue valley’ of predominantly low eMAF values separates both routes. It can be interpreted geographically as the Mediterranean and Black Sea areas, which constitute areas of reduced genetic exchange. Interestingly, a large barrier is found between grapes from the Iberian Peninsula and Western Europe (France, Italy), while the Strait of Gibraltar appears only as a small side arm of the central ‘genetic’ valley, thus indicating a relatively moderate genetic barrier between North Africa and Iberia. Another moderate genetic barrier is found between grapes from the Balkans and Western Europe (Germany, Switzerland and Italy). According to these barriers, cultivars divide into four major groups at the coarsest level of classification, namely Western European and Italian grapes, Iberian grapes, and vine cultivars from the Eastern and from the Maghreb regions, which agree strikingly well with the CG6, CG5, CG4 (and partly CG2) and CG1 groups, respectively. Detailed inspection of the mountain range of ‘eastern’ grapes reveals a fine internal structure of valleys separating, e.g., Armenian from Georgian grapes and vines from Anatolia and Greece from Balkan ones. Additionally, we visualize grape utilization in terms of phenotype maps, which associate table, wine and double usage with different geographic regions (Fig. 3, lower part). Grapes for fresh consumption (table grapes) predominate in Asian and North African areas, while wine utilization is found mostly in Western Europe. Overall, SOM portrayal of grapevine genomes illustrates the specifics of individual grapevine accessions, the similarities between them in the historical context of vine cultivation and dissemination, as well as phenotypic traits such as grapevine utilization.
Figure 3. The SOM genetic landscape reflects the dissemination paths of cultivated grapes from the Near East and the Caucasus towards Western Europe and the Iberian Peninsula. Grape utilization maps show preferential wine making in Western Europe, Iberia and the Balkans, table grape usage in West and East Asia and double usage in the Maghreb and Iberia.
4 Diversity of wild and cultivated
grapevines in Armenia
4.1 Armenian gene pool of V. sylvestris
Armenia is an important place of origin of grapevine domestication, located within the dual domestication centres of grapevine evolution, which were governed by endemic wild grapevines [1]. The country is characterized by a high diversity of cultivated (Vitis vinifera L. subsp. vinifera) and wild (Vitis vinifera L. subsp. sylvestris) grapevines and has played a leading role in the centuries-long history of grapevine cultivation in the Near East.
Varying climatic conditions and the presence of wild grapes led to the emergence and spread of viticulture and winemaking, as evidenced by nearly 450 autochthonous varieties [2]. Hundreds of unique indigenous cultivars are still preserved in old vineyards and abandoned gardens, though most of them are threatened with extinction. Wild grapes, thriving along riverbanks, climbing rocks and embracing trees, can be found in the Vayots Dzor, Tavush, Lori and Syunik provinces and in Artsakh.
Protecting and conserving the genetic resources of the wild ancestor of grapevine, Vitis vinifera L. subsp. sylvestris, is of great importance for several reasons. Nowadays, the wild grape population has become relict due to several forms of human disturbance, such as habitat destruction and fragmentation, irregular management of the natural environment and pathogen spread, which has increased in the last decades, and due to a demanding reproductive strategy [10].
Studies on wild grapes have intensified in parallel with advances in molecular technologies, with the ultimate goals of preserving their biodiversity, clarifying their taxonomic status and identifying traits of interest for breeding programmes. The study of genetic relationships between the two subspecies of Vitis vinifera provided evidence of genetic relatedness between wild and cultivated grapes in Armenia (Margaryan et al., 2023, accepted for publication in VITIS). The applied hierarchical and non-hierarchical clustering methods differentiated between sylvestris and vinifera, but also demonstrated gene flow between wild and cultivated grapevines through overlaps and the presence of admixed ancestry values (see also below). High levels of genetic diversity, demonstrated by the effective number of alleles and the richness of private and new alleles, reflect the significant diversity both within and between the subspecies, suggesting that Armenia is an important centre of grape biodiversity.
4.2 Diversity as seen by microsatellite markers
Knowledge of the genetic diversity and relatedness among grapevine varieties and wild plants is important for characterizing the gene pool. For more than 10 years, one of the major goals of the Group of Plant Genomics at IMB NAS RA has been large-scale research to evaluate the level of, and relationships within, the existing genetic diversity of grapes across Armenia, aiming to identify genotypes that could provide genetic insights into the structure of the Armenian grapevine germplasm. It was confirmed that the Armenian grapevine germplasm is a blend of different genotypes exhibiting a high level of differentiation, resulting in higher-than-expected levels of heterozygosity. This is often observed in woody perennial crops, where varieties are selected for their vigour and crop performance, which indirectly favours high levels of heterozygosity.
Prospecting in traditional viticulture regions across Armenia provided insights into the huge grapevine genetic diversity existing in the country. A combination of nuclear microsatellite markers and ampelography proved useful for determining the identity of samples recovered from old vineyards and home gardens. Synonyms, homonyms, alternative spellings and misnomers were clarified. Well-identified and referenced grape genetic resources are a prerequisite for their utilization and for the management of germplasm repositories.
The high number of alleles, including rare and new alleles, the high observed and effective heterozygosity values, and the presence of the female APT3 allele 366, which is absent in western European cultivars, illustrate the huge diversity of the Armenian germplasm. Presumably, these findings are related to recurrent introgression of Vitis sylvestris into the cultivated compartment during domestication events. Instability of grapevine cultivars was also detected, with three and in some cases four alleles at one locus.
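The diversity measures named in this section follow standard population-genetic definitions, which can be illustrated for a single microsatellite locus. The sketch below is not the analysis pipeline used for the Armenian germplasm; the function name `locus_diversity` and the toy genotypes (allele sizes in base pairs) are hypothetical. It computes the observed heterozygosity, the expected heterozygosity 1 - Σp², which is the quantity usually reported alongside the observed one, and the effective number of alleles 1/Σp², where p are the allele frequencies.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Diversity statistics for one microsatellite locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid accession.
    """
    alleles = [a for g in genotypes for a in g]
    counts = Counter(alleles)
    total = len(alleles)
    freqs = {a: n / total for a, n in counts.items()}
    homozygosity = sum(p * p for p in freqs.values())  # sum of squared frequencies
    return {
        "n_alleles": len(counts),
        "observed_het": sum(a != b for a, b in genotypes) / len(genotypes),
        "expected_het": 1.0 - homozygosity,
        "effective_alleles": 1.0 / homozygosity,
    }


# Toy genotypes (hypothetical allele sizes in bp, for illustration only).
toy_locus = [(180, 184), (180, 180), (184, 190), (180, 190)]
stats = locus_diversity(toy_locus)
```

In this toy locus the observed heterozygosity (0.75) exceeds the expected value (0.625), the same direction of deviation as the higher-than-expected heterozygosity described for the Armenian germplasm.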
4.3 SOM portrayal of WGS data
In a new, ongoing project at the Armenian Bioinformatics Institute, a group of young researchers has started to analyse whole genome sequencing (WGS) data of wild and cultivated grapevines collected across the viticultural regions of Armenia. These data provided an essential contribution to the understanding of the evolutionary history of grapevine [1]. Phylogenetic clustering separates wild from cultivated grapevines, except for a small mixed cluster of both (Fig. 4). SOM analysis generated an individual portrait of the genome of each Armenian accession studied. The mean portraits of wild and cultivated grapevines show mirror-symmetrical patterns of a red and a blue spot, indicating antagonistic eMAF patterns, which confirms the separation of accessions in the clustering tree (Fig. 4, lower part).
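Mean portraits and their antagonism can be made concrete with a small sketch. It assumes that a portrait is stored as a two-dimensional list of eMAF values; the 2×2 toy portraits and the function names `mean_portrait` and `pattern_correlation` are hypothetical. A Pearson correlation close to -1 between two mean portraits corresponds to the mirror-symmetrical red and blue spots described above.

```python
import math


def mean_portrait(portraits):
    """Cell-wise mean over a list of equally sized two-dimensional portraits."""
    rows, cols = len(portraits[0]), len(portraits[0][0])
    n = len(portraits)
    return [[sum(p[r][c] for p in portraits) / n for c in range(cols)]
            for r in range(rows)]


def pattern_correlation(p, q):
    """Pearson correlation between two portraits, compared cell by cell."""
    u = [v for row in p for v in row]
    w = [v for row in q for v in row]
    mu, mw = sum(u) / len(u), sum(w) / len(w)
    cov = sum((a - mu) * (b - mw) for a, b in zip(u, w))
    norm = math.sqrt(sum((a - mu) ** 2 for a in u) *
                     sum((b - mw) ** 2 for b in w))
    return cov / norm


# Toy portraits (hypothetical eMAF values): two wild and two cultivated accessions.
wild = [[[0.4, 0.0], [0.0, -0.4]], [[0.6, 0.0], [0.0, -0.2]]]
cultivated = [[[-0.4, 0.0], [0.0, 0.4]], [[-0.6, 0.0], [0.0, 0.2]]]

mean_wild = mean_portrait(wild)
mean_cultivated = mean_portrait(cultivated)
# Difference portrait: positive where wild accessions carry higher eMAF.
difference = [[a - b for a, b in zip(rw, rc)]
              for rw, rc in zip(mean_wild, mean_cultivated)]
r = pattern_correlation(mean_wild, mean_cultivated)
```

The same cell-wise subtraction yields a difference portrait between any two groups of accessions, provided all portraits come from the same SOM.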
Figure 4. Phylogenetic cluster tree of Armenian vine accessions distinguishes V. sylvestris and V. vinifera. SOM portraits of all accessions are shown in the lower part.
4.3 Admixture and phenotypic traits
For a closer look we performed admixture analysis assuming K=2, 3, 4 and 6 genomic fractions (Fig. 5). For K=2 the genomic fractions divide into a (predominantly) V. vinifera (blue) and a V. sylvestris (red) fraction, in agreement with the phylogenetic tree clustering and SOM analysis (see previous subsection). Hybrid and feral vines show composite genomes with a dominant vinifera-related fraction. For K=3 and 4 one finds a binary, continuously varying composition of V. vinifera as well as a binary composition of V. sylvestris, which, however, is strongly intermixed with the V. vinifera component. Admixture for K=6 essentially confirms the K=4 analysis.
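The text does not state which admixture software or model was used. A common formulation, assumed here purely for illustration, treats the minor allele frequency of accession i at SNP j as a mixture of K ancestral frequencies weighted by the ancestry fractions of the accession, and the genotype (0, 1 or 2 minor alleles) as a binomial draw from that frequency. The names `expected_allele_freq` and `genotype_probability` and the toy numbers are hypothetical.

```python
import math


def expected_allele_freq(q_i, f_j):
    """Mixture frequency: sum over components k of q_ik * f_kj."""
    return sum(q * f for q, f in zip(q_i, f_j))


def genotype_probability(g, q_i, f_j):
    """Binomial probability of observing g minor alleles (g = 0, 1 or 2)."""
    p = expected_allele_freq(q_i, f_j)
    return math.comb(2, g) * p ** g * (1.0 - p) ** (2 - g)


# Toy example for K=2: an accession with 70 % of component 1 and 30 % of component 2.
q_accession = [0.7, 0.3]   # ancestry fractions, must sum to 1
f_snp = [0.1, 0.9]         # minor allele frequency of one SNP in each component
probs = [genotype_probability(g, q_accession, f_snp) for g in (0, 1, 2)]
```

Fitting an admixture model means choosing the ancestry fractions and component frequencies that maximize the product of such probabilities over all accessions and SNPs; the stacked bars of an admixture plot show the fitted fractions for each accession.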
Figure 5. Admixture plots for K=2, 3, 4 and 6 components of
Armenian wild (V. sylvestris) and cultivated (V. vinifera)
accessions. Selected individual genetic portraits are shown on
top of the admixture plot. Part below: Mean portraits over
phenotypic traits “utilization” and “berry skin colour”.
The individual genetic portraits reveal partly parallel changes of spot patterns and of admixture components, which suggest associations between both views but need further analysis. For example, the red eMAF spots rotate in counter-clockwise and clockwise direction for V. v. subsp. vinifera and V. v. subsp. sylvestris, respectively, which indicates their systematic variation along the x-coordinate of the admixture plot and, in turn, associates with phenotypic traits such as vine utilization and berry skin colour (see the colour bars in Fig. 5). Phenotype-stratified mean SOM portraits reveal distinct genetic differences between them. Hence, admixture analysis and SOM portrayal provide complementary information in terms of composite plots along the accession axis and of detailed genetic patterns of individual accessions, respectively.
4.4 Searching for resistance (R-)genes
Out of the 63 V. sylvestris accessions studied, 21 (33%) showed putative resistance against the fungal disease powdery mildew, which makes it possible to search for R-associated chromosomal loci by comparing resistant and non-resistant accessions on a genome-wide scale. GWAS analysis revealed four chromosomal loci significantly associated with resistance against powdery mildew (Fig. 6). SNPs on chromosome 13 are located in an intergenic region corresponding to the previously identified REN1 resistance locus [11], while the other SNPs suggest possible new R-loci. Mean SOM portraits of Armenian putatively resistant and non-resistant V. sylvestris accessions show the typical red spot in the upper left corner with slight differences. For higher resolution we generated the difference portrait, which, interestingly, shows red R-associated “spots” of increased eMAF in the upper right half of the map and blue spots of decreased eMAF in the lower left part. These patterns suggest a systematic effect of resistance on the vine genomes, particularly of increased and reduced MAF, respectively. Preliminary results show that selected significant R-SNPs are located in the blue spots of reduced eMAF in the left part of the difference portrait. In summary, potentially resistant V. sylvestris accessions provide a rich reservoir for the search for R-genes and the underlying molecular mechanisms, with potential impact on viticulture. Here, the combination of GWAS and SOM portrayal is a useful tool for identifying genomic R-loci and mechanisms.
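The GWAS model behind Fig. 6 is not specified in the text. As a deliberately simple stand-in, the sketch below runs an allelic chi-square test for one SNP on a 2×2 table of minor and major allele counts in resistant and non-resistant accessions; real GWAS pipelines usually also correct for population structure and relatedness. The function name and the allele counts are hypothetical, and only the group sizes (21 resistant and 42 non-resistant accessions, i.e. 42 and 84 alleles) follow the numbers given above.

```python
import math


def allelic_chi_square(resistant, susceptible):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Each argument is a tuple (minor_allele_count, major_allele_count).
    Returns (chi2, p_value).
    """
    a, b = resistant
    c, d = susceptible
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # monomorphic SNP or empty group: no evidence
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 21 resistant (42 alleles), 42 non-resistant (84 alleles).
chi2_hit, p_hit = allelic_chi_square((30, 12), (20, 64))
chi2_null, p_null = allelic_chi_square((10, 32), (20, 64))

# Height of the SNP in a Manhattan plot.
manhattan_score = -math.log10(p_hit)
```

Repeating the test for every SNP and plotting -log10(p) against chromosomal position gives a Manhattan plot of the kind shown in Fig. 6.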
Figure 6. Manhattan plot of the results of GWAS on resistant
and non-resistant V. sylvestris from Armenia. Mean portraits
and their difference are shown in the part below.
5 Future aspects of machine learning in
vine science
5.1 Functional genomics in the era of climate
change
Genetics isn’t everything! Biological function is governed by a bundle of molecular mechanisms under the effect of environmental conditions, including genomic regulation via an interplay of different omics levels such as transcriptomics and epigenomics. Hence, genetic analyses must be complemented by phenotypic and additional omics data for a deeper understanding of grape physiology, e.g. to handle environmental stress in the context of climate change. As an illustrative example, we applied SOM portrayal to transcriptomic data (RNAseq) extracted from leaves of grapevines kept under different temperature conditions simulating cold stress [12]. Different stress conditions (freeze stress, chill stress, freeze shock) induce distinct transcriptomic changes relative to the reference “warm” environmental state for five vine accessions (see Fig. 7, upper part). Their mean SOM portraits per condition reveal systematic changes of the transcriptional programs, which can be summarized into a merged transcriptional landscape (Fig. 7, lower part; red and blue areas indicate over- and under-expression, respectively). The red overexpression modules of co-regulated genes can be assigned to different biological functions specifically activated under the different temperature conditions along a stress trajectory (white arrow). Hence, SOM portrayal can be applied to different omics data beyond genetics as a generic clustering, visualization and analysis method for big and complex data collected for studying plant physiology under environmental stress.
5.2 Digital ampelography: Learning the shape
“Classical” ampelography generates another type of complex data with impact on the classification of grapevine accessions based, e.g., on the metrics of their leaves. It has a long history and can be seen as a sort of “classical” standard based on leaf shapes. We recently developed a SOM-learning method to classify human body shapes [13], which can be viewed as an analogous metric system based on a series of items per human body. Application to ampelographic measures opens one option to handle large-scale leaf data of hundreds of vine accessions using machine learning. Deep learning of leaf shapes represents another, very interesting option for developing and applying digital ampelography techniques to large collections of grapevine varieties [14]. Here, the whole shape of a large number of leaves is learned to classify vine varieties with high accuracy. Digital ampelography is currently at the proof-of-principle stage and needs larger consensus data sets for broad application. Our contribution will be a systematic gallery of leaf shapes of Armenian accessions as well as their machine learning using SOM and deep learning techniques. Furthermore, in a wider sense, the individual genetic SOM portraits of vine accessions as presented in Section 4 can be used for deep learning of genetic patterns to develop a “genetic ampelography”. A proof-of-principle study using deep learning on SOM portraits taken from another application [15] makes its application to vine genomes promising.
Figure 7. SOM portrayal of vine transcriptomics under temperature stress. The phylogenetic tree clusters the transcriptomes extracted from grapevine leaves (five accessions, three replicates) into disjoint clusters, each related to one of the four temperature conditions applied to the plants. Each cluster is characterized by its specific transcriptional state as visualized by its transcriptomic portrait. Lower part: the overview landscape summarizes the observed modules of overexpressed genes (red spots), which can be assigned to certain biological functions using previous knowledge and gene set enrichment techniques. The white arrow illustrates a “stress trajectory” pointing from the normal, “warm” reference state via cold (chill and freeze shock stress) towards freeze stress. RNAseq data were taken from [12].
6 Conclusions
Whole genome data on thousands of grapevine accessions open novel perspectives in viticulture. Machine learning and, particularly, SOMmelier molecular portrayal in combination with other bioinformatics methods offer interesting options for their intuitive analysis and understanding in terms of mutual similarities as well as of their functional impact. The detailed study of the richness of Armenian genetic resources is the focus of our research, which addresses the history of grapevine cultivation, resistance against fungal diseases and environmental stress in the context of climate change.
We acknowledge the support given by FAST (Foundation of
Armenian Science and Technology) in the frame of the
ADVANCE program and project 21T-1F076, SC of RA.
References
1. Dong, Y.; Duan, S.; Xia, Q.; Liang, Z.; Dong, X.;
Margaryan, K., . . . Chen, W., Dual domestications
and origin of traits in grapevine evolution. Science
2023 379(6635), 892-901
2. Margaryan, K.; Melyan, G.; Röckel, F.; Töpfer, R.;
Maul, E., Genetic Diversity of Armenian Grapevine
(Vitis vinifera L.) Germplasm: Molecular
Characterization and Parentage Analysis. Biology
2021 10(12), 1279
3. Nikoghosyan, M.; Schmidt, M.; Margaryan, K.;
Loeffler-Wirth, H.; Arakelyan, A.; Binder, H.,
SOMmelier: Intuitive Visualization of the Topology
of Grapevine Genome Landscapes Using Artificial
Neural Networks. Genes 2020 11(7), 817
4. Kohonen, T., Self-organized formation of
topologically correct feature maps. Biological
Cybernetics 1982 43(1), 59-69
5. Löffler-Wirth, H.; Kalcher, M.; Binder, H.,
oposSOM: R-package for high-dimensional
portraying of genome-wide expression landscapes on
bioconductor. Bioinformatics 2015 31(19), 3225-
3227
6. Loeffler-Wirth, H.; Kreuz, M.; Hopp, L.; Arakelyan,
A.; Haake, A.; Cogliatti, S. B., . . . Binder, H., A
modular transcriptome map of mature B cell
lymphomas. Genome Medicine 2019 11(1), 27
7. Schmidt, M.; Arshad, M.; Bernhart, S.H.; Hakobyan,
S.; Arakelyan, A.; Loeffler-Wirth, H.; Binder, H.,
The Evolving Faces of the SARS-CoV-2 Genome.
Viruses 2021 13(9), 1764
8. Wirth, H.; Löffler, M.; von Bergen, M.; Binder, H.,
Expression cartography of human tissues using self
organizing maps. BMC Bioinformatics 2011 12(1),
306
9. Laucou, V.; Launay, A.; Bacilieri, R.; Lacombe, T.;
Adam-Blondon, A.-F.; Bérard, A., . . . Boursiquot,
J.-M., Extended diversity analysis of cultivated
grapevine Vitis vinifera with 10K genome-wide
SNPs. PLOS ONE 2018 13(2), e0192540
10. Margaryan, K.; Maul, E.; Muradyan, Z.;
Hovhannisyan, A.; Melyan, G.; Aroutiounian, R.,
Evaluation of breeding potential of wild grape
originating from Armenia. BIO Web Conf. 2019 15,
01006
11. Sosa-Zuniga, V.; Vidal Valenzuela, Á.; Barba, P.;
Espinoza Cancino, C.; Romero-Romero, J.L.; Arce-
Johnson, P., Powdery Mildew Resistance Genes in
Vines: An Opportunity to Achieve a More
Sustainable Viticulture. Pathogens 2022 11(6), 703
12. Londo, J.P.; Kovaleski, A.P.; Lillis, J.A., Divergence
in the transcriptional landscape between low
temperature and freeze shock in cultivated grapevine
(Vitis vinifera). Horticulture Research 2018 5(1), 10
13. Löffler-Wirth, H.; Willscher, E.; Ahnert, P.;
Wirkner, K.; Engel, C.; Loeffler, M.; Binder, H.,
Novel Anthropometry Based on 3D-Bodyscans
Applied to a Large Population Based Cohort. PLOS
ONE 2016 11(7), e0159887
14. Magalhães, S.C.; Castro, L.; Rodrigues, L.; Padilha,
T.C.; Carvalho, F. d.; Santos, F.N. d., . . . Moreira,
A. P., Toward Grapevine Digital Ampelometry
Through Vision Deep Learning Models. IEEE
Sensors Journal 2023 23(9), 10132-10139
15. Loeffler-Wirth, H.; Kreuz, M.; Schmidt, M.; Ott, G.;
Siebert, R.; Binder, H., Classifying Germinal Center
Derived Lymphomas-Navigate a Complex
Transcriptional Landscape. Cancers 2022 14(14), 3434
7
BIO Web of Conferences 68, 01009 (2023) https://doi.org/10.1051/bioconf/20236801009
44th World Congress of Vine and Wine