LASAGNA: A novel algorithm for transcription factor binding site alignment

BMC Bioinformatics (Impact Factor: 2.67). 03/2013; 14(1):108. DOI: 10.1186/1471-2105-14-108
Source: PubMed

ABSTRACT Background: Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs). Most of the available tools rely on position-specific scoring matrices (PSSMs) constructed from aligned binding sites. Because of the resolutions of the assays used to obtain TFBSs, databases such as TRANSFAC, ORegAnno and PAZAR store unaligned variable-length DNA segments containing binding sites of a TF. These DNA segments need to be aligned to build a PSSM. While the TRANSFAC database provides scoring matrices for TFs, nearly 78% of the TFs in the public release do not have matrices available. As work on TFBS alignment algorithms has been limited, it is highly desirable to have an alignment algorithm tailored to TFBSs. Results: We designed a novel algorithm named LASAGNA, which is aware of the lengths of input TFBSs and utilizes position dependence. Results on 189 TFs of 5 species in the TRANSFAC database showed that our method significantly outperformed ClustalW2 and MEME. We further compared a PSSM method dependent on LASAGNA to an alignment-free TFBS search method. Results on 89 TFs whose binding sites can be located in genomes showed that our method is significantly more precise at fixed recall rates. Finally, we described LASAGNA-ChIP, a more sophisticated version for ChIP (chromatin immunoprecipitation) experiments. Under the one-per-sequence model, it showed performance comparable to MEME in discovering motifs in ChIP-seq peak sequences. Conclusions: We conclude that the LASAGNA algorithm is simple and effective in aligning variable-length binding sites. It has been integrated into a user-friendly webtool for TFBS search and visualization called LASAGNA-Search. The tool currently stores precomputed PSSM models for 189 TFs and 133 TFs built from TFBSs in the TRANSFAC Public database (release 7.0) and the ORegAnno database (08Nov10 dump), respectively. The webtool is available at:
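
To make the role of alignment concrete, the following is a minimal sketch of how a PSSM might be built from binding sites that have already been aligned to a common length, and then used to scan a sequence. It is not the LASAGNA algorithm, which addresses the preceding alignment step. The function names `build_pssm` and `best_hit`, the uniform background frequency of 0.25, the pseudocount of 1, and the example sites are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def build_pssm(aligned_sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per column) from equal-length sites.

    Assumes a uniform background and a fixed pseudocount (illustrative choices).
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("sites must be aligned to the same length")
    pssm = []
    for col in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in aligned_sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pssm.append(
            {base: math.log2(counts[base] / total / background) for base in BASES}
        )
    return pssm

def best_hit(pssm, sequence):
    """Return (score, offset) of the highest-scoring window in the sequence."""
    width = len(pssm)
    scored = [
        (sum(pssm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)

# Hypothetical aligned sites; real input would come from an alignment step.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pssm = build_pssm(sites)
score, offset = best_hit(pssm, "GGCGTATAATGCC")  # offset is 4
```

The sketch rejects sites of unequal length, which is exactly the situation the abstract describes for database entries: variable-length segments have to be aligned before column-wise counts are meaningful.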


Available from: Chun-Hsi Huang, Apr 25, 2015
  • Source
    ABSTRACT: Stress tolerance in plants is a coordinated action of multiple stress response genes that also cross talk with other components of the stress signal transduction pathways. The expression and regulation of stress-induced genes are largely regulated by specific transcription factors, families of which have been reported in several plant species, such as Arabidopsis, rice and Populus. In sorghum, the majority of such factors remain unexplored. We used 2DE refined with MALDI-TOF techniques to analyze drought stress-induced proteins in sorghum. A total of 176 transcription factors from the MYB, AUX_ARF, bZIP, AP2 and WRKY families of drought-induced proteins were identified. We developed a method based on semantic similarity of gene ontology terms (GO terms) to identify the transcription factors. A threshold value (≥ 90%) was applied to retrieve a total of 1,493 transcription factors with high semantic similarity from selected plant species. It could be concluded that the identified transcription factors regulate their target proteins with endogenous signals and environmental cues, such as light, temperature and drought stress. The regulatory network and cis-acting elements of the identified transcription factors in distinct families are involved in responsiveness to auxin, abscisic acid, defense, stress and light. These responses may be highly important in the modulation of plant growth and development.
    Cellular & Molecular Biology Letters 12/2014; DOI:10.2478/s11658-014-0223-3 · 1.78 Impact Factor
  • Source
    ABSTRACT: Transcriptional activation throughout the eukaryotic lineage has been tightly linked with disruption of nucleosome organization at promoters, enhancers, silencers, insulators and locus control regions due to transcription factor binding. Regulatory DNA thus coincides with open or accessible genomic sites of remodeled chromatin. Current chromatin accessibility assays are used to separate the genome by enzymatic or chemical means and isolate either the accessible or protected locations. The isolated DNA is then quantified using a next-generation sequencing platform. Wide application of these assays has recently focused on the identification of the instrumental epigenetic changes responsible for differential gene expression, cell proliferation, functional diversification and disease development. Here we discuss the limitations and advantages of current genome-wide chromatin accessibility assays, with particular attention to experimental precautions and sequence data analysis. We conclude with our perspective on future improvements necessary for moving the field of chromatin profiling forward.
    Epigenetics & Chromatin 11/2014; 7(1):33. DOI:10.1186/1756-8935-7-33 · 4.46 Impact Factor
  • Source
    ABSTRACT: Background: In previous studies on an Iberian x Landrace cross, we have provided evidence that supported the porcine ELOVL6 gene as the major causative gene of the QTL on pig chromosome 8 for palmitic and palmitoleic acid contents in muscle and backfat. The single nucleotide polymorphism (SNP) ELOVL6:c.-533C > T located in the promoter region of ELOVL6 was found to be highly associated with ELOVL6 expression and, accordingly, with the percentages of palmitic and palmitoleic acids in longissimus dorsi and adipose tissue. The main goal of the current work was to further study the role of ELOVL6 on these traits by analyzing the regulation of the expression of ELOVL6 and the implication of ELOVL6 polymorphisms on meat quality traits in pigs. Results: High-throughput sequencing of BAC clones that contain the porcine ELOVL6 gene coupled to RNAseq data re-analysis showed that two isoforms of this gene are expressed in liver and adipose tissue and that they differ in number of exons and 3’UTR length. Although several SNPs in the 3’UTR of ELOVL6 were associated with palmitic and palmitoleic acid contents, this association was lower than that previously observed with SNP ELOVL6:c.-533C > T. This SNP is in full linkage disequilibrium with SNP ELOVL6:c.-394G > A that was identified in the binding site for estrogen receptor alpha (ERα). Interestingly, the ELOVL6:c.-394G allele is associated with an increase in methylation levels of the ELOVL6 promoter and with a decrease of ELOVL6 expression. Therefore, ERα is clearly a good candidate to explain the regulation of ELOVL6 expression through dynamic epigenetic changes in the binding site of known regulators of ELOVL6 gene, such as SREBF1 and SP1. Conclusions: Our results strongly suggest the ELOVL6:c.-394G > A polymorphism as the causal mutation for the QTL on pig chromosome 8 that affects fatty acid composition in pigs.
    Genetics Selection Evolution 03/2015; 47(20). DOI:10.1186/s12711-015-0111-y · 3.75 Impact Factor