Chapter

Genome Database (www.artichokegenome.unito.it)

Abstract

The first high-quality genome assembly of the globe artichoke was produced within the Compositae Genome Project. Four globe artichoke genotypes, representative of the core varietal types, and one genotype of the related taxon cultivated cardoon were subsequently resequenced. The Web site “www.artichokegenome.unito.it” hosts all the available genomic sequences together with their structural and functional annotations and project information. These are presented to users via the open-source tool JBrowse, which allows the analysis of collinearity and the discovery of genomic variants, making the site a one-stop resource for Cynara cardunculus genomics. Pseudomolecules as well as unmapped scaffolds were used for the bulk mining of SSR markers and for the construction of the first globe artichoke microsatellite marker database. This database, called “Cynara cardunculus MicroSatellite DataBase” (CyMSatDB), was developed to provide a searchable interface to the SSR data. CyMSatDB facilitates the retrieval of SSR markers, as well as suggested forward and reverse primers, on the basis of genomic location, genomic versus genic context, perfect versus imperfect repeat, motif type, motif sequence, and repeat number.

Keywords: Artichoke; Cardoon; Hybrids; Propagation; Pollination; Cynara cardunculus
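
The search criteria listed in the abstract (genomic location, genic versus genomic context, perfect versus imperfect repeat, motif type, motif sequence, repeat number) can be illustrated with a minimal Python sketch. The record layout, the field names, and the `search_ssrs` function are hypothetical and chosen only for illustration; they are not taken from CyMSatDB, whose actual schema and interface are not described here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class SSRRecord:
    """Hypothetical SSR record; field names are illustrative assumptions."""
    seq_id: str      # pseudomolecule or unmapped scaffold name
    start: int       # start coordinate on seq_id
    end: int         # end coordinate on seq_id
    motif: str       # repeated unit, e.g. "AT"
    repeats: int     # number of motif copies
    perfect: bool    # True for a perfect repeat, False for an imperfect one
    genic: bool      # True if the SSR lies inside an annotated gene


def search_ssrs(
    records: Iterable[SSRRecord],
    *,
    seq_id: Optional[str] = None,
    region: Optional[Tuple[int, int]] = None,
    genic: Optional[bool] = None,
    perfect: Optional[bool] = None,
    motif_length: Optional[int] = None,
    motif: Optional[str] = None,
    min_repeats: Optional[int] = None,
) -> List[SSRRecord]:
    """Return the records that satisfy every criterion that is not None."""
    hits = []
    for r in records:
        if seq_id is not None and r.seq_id != seq_id:
            continue
        # Keep only SSRs that lie entirely inside the requested interval.
        if region is not None and not (r.start >= region[0] and r.end <= region[1]):
            continue
        if genic is not None and r.genic != genic:
            continue
        if perfect is not None and r.perfect != perfect:
            continue
        # "Motif type" is taken here to mean motif length (2 = dinucleotide, ...).
        if motif_length is not None and len(r.motif) != motif_length:
            continue
        if motif is not None and r.motif != motif:
            continue
        if min_repeats is not None and r.repeats < min_repeats:
            continue
        hits.append(r)
    return hits
```

A call such as `search_ssrs(records, genic=True, perfect=True, motif_length=3)` would then return perfect trinucleotide SSRs located in genes, mirroring the kind of combined query the abstract describes.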

ResearchGate has not been able to resolve any citations for this publication.
Article
The recently acquired genome sequence of globe artichoke (Cynara cardunculus var. scolymus) has been used to catalog the genome’s content of simple sequence repeat (SSR) markers. More than 177,000 perfect SSRs were revealed, equivalent to an overall density across the genome of 244.5 SSRs/Mbp, but some 224,000 imperfect SSRs were also identified. About 21% of these SSRs were complex (two stretches of repeats separated by <100 nt). Some 73% of the SSRs were composed of dinucleotide motifs. The SSRs were categorized by number of repeats and overall length, and were allocated to their linkage group. A total of 4,761 perfect and 6,583 imperfect SSRs were present in 3,781 genes (14.11% of the total), corresponding to an overall density across the gene space of 32.5 and 44.9 SSRs/Mbp for perfect and imperfect motifs, respectively. A putative function has been assigned, using the gene ontology approach, to the set of genes harboring at least one SSR. The same search parameters were applied to reveal the SSR content of 14 other plant species for which genome sequence is available. Certain species-specific SSR motifs were identified, along with a hexanucleotide motif shared only with the other two Compositae species (sunflower (Helianthus annuus) and horseweed (Conyza canadensis)) included in the study. Finally, a database, called “Cynara cardunculus MicroSatellite DataBase” (CyMSatDB), was developed to provide a searchable interface to the SSR data. CyMSatDB facilitates the retrieval of SSR markers, as well as suggested forward and reverse primers, on the basis of genomic location, genomic vs genic context, perfect vs imperfect repeat, motif type, motif sequence and repeat number. The SSR markers were validated via an in silico PCR analysis using two available assembled transcriptomes, derived from contrasting globe artichoke accessions, as templates.
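
As a rough illustration of what cataloging perfect SSRs and reporting a density in SSRs/Mbp involves, the following Python sketch scans a sequence for perfect repeats with motifs of 1 to 6 nt. It is not the pipeline used in the study: the function names are hypothetical, and the minimum repeat numbers in `MIN_REPEATS` are assumed placeholder thresholds, since the abstract does not state the search parameters.

```python
from typing import Dict, List, Tuple

# Assumed placeholder thresholds (motif length -> minimum number of copies).
# The abstract does not give the actual search parameters.
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. not 'ATAT')."""
    k = len(motif)
    for d in range(1, k):
        if k % d == 0 and motif[:d] * (k // d) == motif:
            return False
    return True


def find_perfect_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS
                      ) -> List[Tuple[int, int, str, int]]:
    """Return (start, end, motif, copies) tuples; 0-based start, end exclusive."""
    seq = seq.upper()
    hits = []
    for k, threshold in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) - set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            # Count consecutive exact copies of the motif starting at i.
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= threshold:
                hits.append((i, i + copies * k, motif, copies))
                i += copies * k  # jump past the repeat just recorded
            else:
                i += 1
    return sorted(hits)


def ssr_density_per_mbp(n_ssrs: int, length_bp: int) -> float:
    """Density in SSRs per megabase pair of scanned sequence."""
    return n_ssrs / (length_bp / 1_000_000)
```

For example, `find_perfect_ssrs("GCC" + "AT" * 7 + "GCC" + "AAG" * 5 + "TCC")` reports one (AT)7 and one (AAG)5 repeat under the assumed thresholds. Imperfect and complex SSRs, which the abstract also counts, would need additional logic that this sketch does not attempt.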
Article
Background JBrowse is a fast and full-featured genome browser built with JavaScript and HTML5. It is easily embedded into websites or apps but can also be served as a standalone web page. Results Overall improvements to speed and scalability are accompanied by specific enhancements that support complex interactive queries on large track sets. Analysis functions can readily be added using the plugin framework; most visual aspects of tracks can also be customized, along with clicks, mouseovers, menus, and popup boxes. JBrowse can also be used to browse local annotation files offline and to generate high-resolution figures for publication. Conclusions JBrowse is a mature web application suitable for genome visualization and analysis.
Article
We examined the abundance of microsatellites with repeated unit lengths of 1–6 base pairs in several eukaryotic taxonomic groups: primates, rodents, other mammals, nonmammalian vertebrates, arthropods, Caenorhabditis elegans, plants, yeast, and other fungi. Distribution of simple sequence repeats was compared between exons, introns, and intergenic regions. Tri- and hexanucleotide repeats prevail in protein-coding exons of all taxa, whereas the dependence of repeat abundance on the length of the repeated unit shows a very different pattern as well as taxon-specific variation in intergenic regions and introns. Although it is known that coding and noncoding regions differ significantly in their microsatellite distribution, in addition we could demonstrate characteristic differences between intergenic regions and introns. We observed striking relative abundance of (CCG)n·(CGG)n trinucleotide repeats in intergenic regions of all vertebrates, in contrast to the almost complete lack of this motif from introns. Taxon-specific variation could also be detected in the frequency distributions of simple sequence motifs. Our results suggest that strand-slippage theories alone are insufficient to explain microsatellite distribution in the genome as a whole. Other possible factors contributing to the observed divergence are discussed.
Article
Transcription factors (TFs) are critical adaptor molecules that regulate many plant processes by controlling gene expression. The recent increase in the availability of TF data has made TFs a valuable resource for genic functional microsatellite marker development. In the present study, we developed TF gene-derived microsatellite (TFGM) markers for Medicago truncatula and assessed their cross-species transferability. A total of 203 SSRs were identified from 1467 M. truncatula TF coding sequences, 87.68% of which were trinucleotide repeats, followed by mono- (4.93%) and hexanucleotide repeats (1.48%). Further, 142 TFGM markers showed a high level of transferability to the leguminous (55.63%-85.21%) and non-leguminous (28.17%-50.00%) species. Polymorphisms of 27 TFGM markers were evaluated in 44 alfalfa accessions. The allele number per marker ranged from two to eight with an average of 4.41, and the PIC values ranged from 0.08 to 0.84 with an average of 0.60. Considering the high polymorphism, these TFGM markers developed in our study will be valuable for genetic relationship assessments, marker-assisted selection and comparative genomic studies in leguminous and non-leguminous species.
Article
We developed 1108 transcription factor gene-derived microsatellite (TFGMS) and 161 transcription factor functional domain-associated microsatellite (TFFDMS) markers from 707 TFs of chickpea. The robust amplification efficiency (96.5%) and high intra-specific polymorphic potential (34%) detected by markers suggest their immense utilities in efficient large-scale genotyping applications, including construction of both physical and functional transcript maps and understanding population structure. Candidate gene-based association analysis revealed strong genetic association of TFFDMS markers with three major seed and pod traits. Further, TFGMS markers in the 5' untranslated regions of TF genes showing differential expression during seed development had higher trait association potential. The significance of TFFDMS markers was demonstrated by correlating their allelic variation with amino acid sequence expansion/contraction in the functional domain and alteration of secondary protein structure encoded by genes. The seed weight-associated markers were validated through traditional bi-parental genetic mapping. The determination of gene-specific linkage disequilibrium (LD) patterns in desi and kabuli based on single nucleotide polymorphism-microsatellite marker haplotypes revealed extended LD decay, enhanced LD resolution and trait association potential of genes. The evolutionary history of a strong seed-size/weight-associated TF based on natural variation and haplotype sharing among desi, kabuli and wild unravelled useful information having implication for seed-size trait evolution during chickpea domestication.
Article
Despite their ubiquity and functional importance, microsatellites have been largely ignored in comparative genomics, mostly due to the lack of genomic information. In the current study, microsatellite distribution was characterized and compared in the whole genomes and both the coding and non-coding DNA sequences of the sequenced Brassica, Arabidopsis and other angiosperm species to investigate their evolutionary dynamics in plants. The variation in the microsatellite frequencies of these angiosperm species was much smaller than those for their microsatellite numbers and genome sizes, suggesting that microsatellite frequency may be relatively stable in plants. The microsatellite frequencies of these angiosperm species were significantly negatively correlated with both their genome sizes and transposable elements contents. The pattern of microsatellite distribution may differ according to the different genomic regions (such as coding and non-coding sequences). The observed differences in many important microsatellite characteristics (especially the distribution with respect to motif length, type and repeat number) of these angiosperm species were generally accordant with their phylogenetic distance, which suggested that the evolutionary dynamics of microsatellite distribution may be generally consistent with plant divergence/evolution. Importantly, by comparing these microsatellite characteristics (especially the distribution with respect to motif type) the angiosperm species (aside from a few species) all clustered into two obviously different groups that were largely represented by monocots and dicots, suggesting a complex and generally dichotomous evolutionary pattern of microsatellite distribution in angiosperms. 
Polyploidy may lead to a slight increase in microsatellite frequency in the coding sequences and a significant decrease in microsatellite frequency in the whole genome/non-coding sequences, but have little effect on the microsatellite distribution with respect to motif length, type and repeat number. Interestingly, several microsatellite characteristics seemed to be constant in plant evolution, which can be well explained by the general biological rules.
Article
GBrowse is a mature web-based genome browser that is suitable for deployment on both public and private web sites. It supports most genome browser features, including qualitative and quantitative (wiggle) tracks, track uploading, track sharing, interactive track configuration, semantic zooming and limited smooth track panning. As of version 2.0, GBrowse supports next-generation sequencing (NGS) data by providing for the direct display of SAM and BAM sequence alignment files. SAM/BAM tracks provide semantic zooming and support both local and remote data sources. This article provides step-by-step instructions for configuring GBrowse to display NGS data.
Article
A simple sequence repeat–functional domain marker (SSR-FDM) relies on development of molecular markers for putative functional domains using simple sequence repeats and in silico annotated information of those sequences using biological databases. A total of 148,921 tomato ESTs and 115,598 pepper ESTs were analyzed, resulting in the identification of 439 tomato SSR-FDMs and 489 pepper SSR-FDMs. Among them, 54 pepper SSR-FDMs were tested on pepper. Several genomic databases were used for the in silico annotation of the SSR-FDM sequences that revealed a wide range of candidate genes. This study demonstrates that SSR-FDMs provide information regarding transcribed genetic markers and putative function as a genomic resource database for Solanaceae. This system could be applied to the development of a functional marker database for any crop species. Keywords: Bioinformatic tool; Sequence analysis; Gene annotation; Biological database; Functional marker development
Article
Cucumber, Cucumis sativus L., is an important vegetable crop worldwide. Until very recently, cucumber genetic and genomic resources, especially molecular markers, have been very limited, impeding progress of cucumber breeding efforts. Microsatellites are short tandemly repeated DNA sequences, which are frequently favored as genetic markers due to their high level of polymorphism and codominant inheritance. Data from previously characterized genomes have shown that these repeats vary in frequency, motif sequence, and genomic location across taxa. During the last year, the genomes of two cucumber genotypes were sequenced, including the Chinese fresh market type inbred line '9930' and the North American pickling type inbred line 'Gy14'. These sequences provide a powerful tool for developing markers on a large scale. In this study, we surveyed and characterized the distribution and frequency of perfect microsatellites in 203 Mbp assembled Gy14 DNA sequences, representing 55% of its nuclear genome, and in cucumber EST sequences. Similar analyses were performed in genomic and EST data from seven other plant species, and the results were compared with those of cucumber. A total of 112,073 perfect repeats were detected in the Gy14 cucumber genome sequence, accounting for 0.9% of the assembled Gy14 genome, with an overall density of 551.9 SSRs/Mbp. While tetranucleotides were the most frequent microsatellites in genomic DNA sequence, dinucleotide repeats, which had more repeat units than any other SSR type, had the highest cumulative sequence length. Coding regions (ESTs) of the cucumber genome had fewer microsatellites compared to its genomic sequence, with trinucleotides predominating in EST sequences. AAG was the most frequent repeat in cucumber ESTs. Overall, AT-rich motifs prevailed in both genomic and EST data. 
Compared to the other species examined, cucumber genomic sequence had the highest density of SSRs (although comparable to the density of poplar, grapevine and rice), and was richest in AT dinucleotides. Using an electronic PCR strategy, we investigated the polymorphism between 9930 and Gy14 at 1,006 SSR loci, and found an unexpectedly high degree of polymorphism (48.3%) between the two genotypes. The level of polymorphism seems to be positively associated with the number of repeat units in the microsatellite. The in silico PCR results were validated empirically in 660 of the 1,006 SSR loci. In addition, primer sequences for more than 83,000 newly discovered cucumber microsatellites, and their exact positions in the Gy14 genome assembly, were made publicly available. The cucumber genome is rich in microsatellites; AT and AAG are the most abundant repeat motifs in genomic and EST sequences of cucumber, respectively. Considering all the species investigated, some commonalities were noted, especially within the monocot and dicot groups, although the distribution of motifs and the frequency of certain repeats were characteristic of the species examined. The large number of SSR markers developed from this study should be a significant contribution to the cucurbit research community.
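
The electronic (in silico) PCR comparison mentioned in this abstract can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the strategy used by the authors: primers must match exactly, only the forward strand of each template is searched, and the function names and the `max_size` limit are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, fwd: str, rev: str,
                    max_size: int = 1000) -> Optional[int]:
    """Predicted product size, or None if no product of at most max_size is found.

    Both primers are given 5'->3'. Assumes exact primer matches and uses the
    first forward-primer site and the first reverse-primer site downstream of it.
    """
    start = template.find(fwd)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its site on the
    # forward strand is its reverse complement.
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    size = end + len(rev_site) - start
    return size if size <= max_size else None


def is_polymorphic(template_a: str, template_b: str,
                   fwd: str, rev: str) -> Optional[bool]:
    """True if both templates give a product and the sizes differ.

    Returns None when either template yields no product, since the locus
    cannot then be compared.
    """
    size_a = amplicon_length(template_a, fwd, rev)
    size_b = amplicon_length(template_b, fwd, rev)
    if size_a is None or size_b is None:
        return None
    return size_a != size_b
```

In this sketch a locus counts as polymorphic when the two genotypes yield products of different length, which is how a change in repeat number at an SSR would show up. Tools used in practice typically tolerate primer mismatches and search both strands, which this example omits.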
Article
Expressed sequence tag (EST) projects have generated a vast amount of publicly available sequence data from plant species; these data can be mined for simple sequence repeats (SSRs). These SSRs are useful as molecular markers because their development is inexpensive, they represent transcribed genes and a putative function can often be deduced by a homology search. Because they are derived from transcripts, they are useful for assaying the functional diversity in natural populations or germplasm collections. These markers are valuable because of their higher level of transferability to related species, and they can often be used as anchor markers for comparative mapping and evolutionary studies. They have been developed and mapped in several crop species and could prove useful for marker-assisted selection, especially when the markers reside in the genes responsible for a phenotypic trait. Applications and potential uses of EST-SSRs in plant genetics and breeding are discussed.
Article
Microsatellites are tandemly repeated short DNA sequences that are favored as molecular-genetic markers due to their high polymorphism index. Plant genomes characterized to date exhibit taxon-specific differences in frequency, genomic location, and motif structure of microsatellites, indicating that extant microsatellites originated recently and turn over quickly. With the goal of using microsatellite markers to integrate the physical and genetic maps of Medicago truncatula, we surveyed the frequency and distribution of perfect microsatellites in 77 Mbp of gene-rich BAC sequences, 27 Mbp of nonredundant transcript sequences, 20 Mbp of random whole genome shotgun sequences, and 49 Mbp of BAC-end sequences. Microsatellites are predominantly located in gene-rich regions of the genome, with a density of one long (i.e., ≥ 20 nt) microsatellite every 12 kbp, while the frequency of individual motifs varied according to the genome fraction under analysis. A total of 1,236 microsatellites were analyzed for polymorphism between parents of our reference intraspecific mapping population, revealing that motifs (AT)n, (AG)n, (AC)n, and (AAT)n exhibit the highest allelic diversity. A total of 378 genetic markers could be integrated with sequenced BAC clones, anchoring 274 physical contigs that represent 174 Mbp of the genome and composing an estimated 70% of the euchromatic gene space.
Article
Microsatellites are extremely common in plant genomes, and in particular, they are significantly enriched in the 5' noncoding regions. Although some 5' noncoding microsatellites involved in gene regulation have been described, the general properties of microsatellites as regulatory elements are still unknown. To address the question of microsatellites associated with regulatory elements, we have analyzed the conserved noncoding microsatellite sequences (CNMSs) in the 5' noncoding regions by inter- and intragenomic phylogenetic footprinting in the Arabidopsis and Brassica genomes. We identified 247 Arabidopsis-Brassica orthologous and 122 Arabidopsis paralogous CNMSs, representing 491 CT/GA and CTT/GAA repeats, which accounted for 10.6% of these types located in the 500-bp regions upstream of coding sequences in the Arabidopsis genome. Among these identified CNMSs, 18 microsatellites show high conservation in the regulatory regions of both orthologous and paralogous genes, and some of them also appear in the corresponding positions of more distant homologs in Arabidopsis, as well as in other plants. A computational scan of CNMSs for known cis-regulatory elements showed that light responsive elements were clustered in the region of CT/GA repeats, as well as salicylic acid responsive elements in the (CTT)n/(GAA)n sequences. Patterns of gene expression revealed that 70-80% of CNMS (CTT)n/(GAA)n associated genes were regulated by salicylic acid, which was consistent with the prediction of regulatory elements in silico. Our analyses showed that some noncoding microsatellites were conserved in plants and appeared to be ancient. These CNMSs served as regulatory elements involved in light and salicylic acid responses. Our findings might have implications in the common features of the over-represented microsatellites for gene regulation in plant-specific pathways.
Article
Microsatellites are a ubiquitous class of simple repetitive DNA sequence. An excess of such repetitive tracts has been described in all eukaryotes analyzed and is thought to result from the mutational effects of replication slippage. Large-scale genomic and EST sequencing provides the opportunity to evaluate the abundance and relative distribution of microsatellites between transcribed and nontranscribed regions and the relationship of these features to haploid genome size. Although this has been studied in microbial and animal genomes, information in plants is limited. We assessed microsatellite frequency in plant species with a 50-fold range in genome size that is mostly attributable to the recent amplification of repetitive DNA. Among species, the overall frequency of microsatellites was inversely related to genome size and to the proportion of repetitive DNA but remained constant in the transcribed portion of the genome. This indicates that most microsatellites reside in regions pre-dating the recent genome expansion in many plants. The microsatellite frequency was higher in transcribed regions, especially in the untranslated portions, than in genomic DNA. Contrary to previous reports suggesting a preferential mechanism for the origin of microsatellites from repetitive DNA in both animals and plants, our findings show a significant association with the low-copy fraction of plant genomes.
Article
Microsatellites are tandemly repeated simple sequence DNA motifs widely prevalent in eukaryotic and prokaryotic genomes. In pathogenic bacteria, instability of these hypermutable loci through slipped-strand mispairing mediates the high-frequency reversible switching of phenotype expression, i.e., phase variation. Phase-variable expression of NadA, an outer membrane protein and adhesin of the pathogen Neisseria meningitidis, is mediated by changes in the number of TAAA repeats located upstream of the core promoter of nadA. Here we report that loss or gain of TAAA repeats affects the binding of the transcriptional regulatory protein IHF to the nadA promoter. Thus, phase-variable transcription of nadA potentially incorporates interplay between stochastic (mutational) and prescriptive (classical) mechanisms of gene regulation. Keywords: phase variation