The word landscape of the non-coding segments of the Arabidopsis thaliana genome

Bioinformatics Laboratory, School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, USA.
BMC Genomics (Impact Factor: 4.04). 10/2009; 10:463. DOI: 10.1186/1471-2164-10-463
Source: PubMed

ABSTRACT Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation of gene expression.
Using an enumerative word discovery approach, we investigated the frequencies and positional distributions of all 65,536 different 8-letter words in the genome of Arabidopsis thaliana. Focusing on promoter regions, introns, and 3' and 5' untranslated regions (3'UTRs and 5'UTRs), we compared word frequencies in these segments to genome-wide frequencies. The statistically interesting words in each segment were clustered with similar words to generate motif logos. We investigated whether words were clustered at particular locations or were distributed randomly within each genomic segment, and we classified the words using gene expression information from public repositories. Finally, we investigated whether particular sets of words appeared together more frequently than others.
Our studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome. Each segment contains a unique word-based signature. The respective signatures consist of the sets of enriched words, 'unwords', and word pairs within a segment, as well as the preferential locations and functional classifications for the signature words. Additionally, the positional distributions of enriched words within the segments highlight possible functional elements, and the co-associations of words in promoter regions likely represent the formation of higher order regulatory modules. This work is an important step toward fully cataloguing the functional elements of the Arabidopsis genome.
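The enumerative counting step described in the abstract can be illustrated with a small sketch. It counts every overlapping 8-letter word in a set of sequences and compares each word's frequency in a segment with its genome-wide frequency. The function names, the pseudocount, and the simple frequency ratio are assumptions made for illustration; the abstract does not state which statistic the authors used to select the "statistically interesting" words.

```python
from collections import Counter
from itertools import product

K = 8              # word length used in the study (4**8 = 65,536 words)
ALPHABET = "ACGT"


def count_words(sequences, k=K):
    """Count every overlapping k-letter word made only of A, C, G, T."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(word) <= set(ALPHABET):
                counts[word] += 1
    return counts


def enrichment(segment_counts, genome_counts, pseudocount=1.0, k=K):
    """Ratio of segment frequency to genome-wide frequency for each word.

    The pseudocount (an assumption of this sketch) keeps words that never
    occur from producing a division by zero.
    """
    n_words = len(ALPHABET) ** k
    seg_total = sum(segment_counts.values()) + pseudocount * n_words
    gen_total = sum(genome_counts.values()) + pseudocount * n_words
    ratios = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        seg_freq = (segment_counts[word] + pseudocount) / seg_total
        gen_freq = (genome_counts[word] + pseudocount) / gen_total
        ratios[word] = seg_freq / gen_freq
    return ratios
```

With real data, `count_words` would be called once on the sequences of a segment type (for example, all promoter regions) and once on the whole genome. Ratios above 1 suggest over-representation in the segment, and words with a raw count of zero are candidates for the 'unwords' mentioned in the abstract.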

  • Source
    ABSTRACT: Using bioinformatics, putative cis-regulatory sequences can be easily identified using pattern recognition programs on promoters of specific gene sets. The abundance of predicted cis-sequences is a major challenge to associate these sequences with a possible function in gene expression regulation. To identify a possible function of the predicted cis-sequences, a novel web tool designated ‘in silico expression analysis’ was developed that correlates submitted cis-sequences with gene expression data from Arabidopsis thaliana. The web tool identifies the A. thaliana genes harbouring the sequence in a defined promoter region and compares the expression of these genes with microarray data. The result is a hierarchy of abiotic and biotic stress conditions to which these genes are most likely responsive. When testing the performance of the web tool, known cis-regulatory sequences were submitted to the ‘in silico expression analysis’ resulting in the correct identification of the associated stress conditions. When using a recently identified novel elicitor-responsive sequence, a WT-box (CGACTTTT), the ‘in silico expression analysis’ predicts that genes harbouring this sequence in their promoter are most likely Botrytis cinerea induced. Consistent with this prediction, the strongest induction of a reporter gene harbouring this sequence in the promoter is observed with B. cinerea in transgenic A. thaliana. Database URL:
    Database The Journal of Biological Databases and Curation 01/2014; 2014:bau030. DOI:10.1093/database/bau030 · 4.46 Impact Factor
  • Source
    ABSTRACT: Maize endosperm is both a crucial contributor to world nutrition and a model system for plant developmental biology. For both aspects promoters with expression zones restricted to particular domains of the endosperm would be a valuable tool. The β-glucuronidase (Gus) reporter gene was fused to upstream fragments of three genes whose expression is limited to specific domains of the endosperm: Vacuolar pyrophosphatase 1 (Vpp1) encoding an H+ translocating vacuolar pyrophosphatase and expressed in the aleurone layer, Embryo surrounding region 6 (Esr6) encoding a defensin-like protein and expressed specifically in the embryo surrounding region (ESR) and Outer cell layer 4 (OCL4) encoding a transcription factor of the homeo domain-leucine zipper IV (HD-ZIP IV) family and expressed in the aleurone layer. The sequence analysis of the upstream fragments revealed several putative cis elements conserved either with the rice orthologues or with other maize genes sharing the same expression domain. The phenotypic characterisation of transgenic promoter-Gus lines revealed for Vpp1 a delay in the onset of GUS activity, as compared to in situ data, and a dynamic pattern shifting from the adaxial to the abaxial side of the aleurone layer. The two closely related genes Esr6a and Esr6b discovered during the cloning process diverged strongly in their upstream sequences but showed GUS signals very similar to each other and to the in situ signal: a strong, ESR-specific GUS signal at 8 days after pollination (DAP) and 16 DAP, which disappeared at 22 DAP. While Esr6b was likely the stronger promoter, a point-like signal outside the ESR on the abgerminal side at the junction of the basal endosperm transfer layer (BETL) and the aleurone layer was only found in prEsr6a-Gus plants. 
    The GUS signal of prOCL4-Gus plants only partially reflected the in situ data; whereas OCL4 in situ signal was seen in the epidermis of male and female flowers, embryos and endosperm, prOCL4-GUS signal was only observed in female flowers and endosperm where it confirmed the high specificity for the outer cell layer and exhibited a dynamic pattern similar to Vpp1. While the upstream fragment used likely contained only some but not all the cis elements governing OCL4 expression, the fortuitously obtained truncated promoter may be of greater biotechnological interest than the complete promoter because of its more restricted expression zone that still includes the highly interesting aleurone layer.
    Plant Science 07/2010; 179(1):86-96. DOI:10.1016/j.plantsci.2010.04.006 · 4.11 Impact Factor
  • Source
    Dataset: Paper 6
