LASAGNA: A novel algorithm for transcription factor binding site alignment

BMC Bioinformatics (Impact Factor: 2.58). 03/2013; 14(1):108. DOI: 10.1186/1471-2105-14-108
Source: PubMed


Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs). Most of the available tools rely on position-specific scoring matrices (PSSMs) constructed from aligned binding sites. Because of the resolutions of assays used to obtain TFBSs, databases such as TRANSFAC, ORegAnno and PAZAR store unaligned variable-length DNA segments containing binding sites of a TF. These DNA segments need to be aligned to build a PSSM. While the TRANSFAC database provides scoring matrices for TFs, nearly 78% of the TFs in the public release do not have matrices available. As work on TFBS alignment algorithms has been limited, it is highly desirable to have an alignment algorithm tailored to TFBSs.
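To make the role of the alignment concrete, the following minimal Python sketch builds a log-odds PSSM from sites that are already aligned and uses it to scan a sequence. It is an illustration only, not the LASAGNA implementation: the function names (build_pssm, scan), the pseudocount of 1, the uniform background of 0.25 and the toy sites are all assumptions introduced here.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_pssm(aligned_sites, pseudocount=1.0, background=0.25):
    """Return one {base: log2-odds score} dict per alignment column.

    Assumes every site has the same length, i.e. is already aligned.
    """
    width = len(aligned_sites[0])
    if any(len(site) != width for site in aligned_sites):
        raise ValueError("sites must be aligned to a common length")
    total = len(aligned_sites) + pseudocount * len(BASES)
    pssm = []
    for col in range(width):
        counts = Counter(site[col] for site in aligned_sites)
        pssm.append({
            base: math.log2((counts[base] + pseudocount) / total / background)
            for base in BASES
        })
    return pssm

def scan(sequence, pssm):
    """Score every window of the sequence; return (best_offset, best_score)."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[j][sequence[start + j]] for j in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best

# Hypothetical aligned sites, made up for this example.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pssm = build_pssm(sites)
offset, score = scan("GGCCTGACTCATTGC", pssm)
```

The sketch only works because the input sites share a common length and register. Unaligned, variable-length segments such as those stored in the databases above would first have to be aligned, which is the step the paper addresses. Sequences containing characters other than A, C, G and T are not handled.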

We designed a novel algorithm named LASAGNA, which is aware of the lengths of input TFBSs and utilizes position dependence. Results on 189 TFs of 5 species in the TRANSFAC database showed that our method significantly outperformed ClustalW2 and MEME. We further compared a PSSM method dependent on LASAGNA to an alignment-free TFBS search method. Results on 89 TFs whose binding sites can be located in genomes showed that our method is significantly more precise at fixed recall rates. Finally, we described LASAGNA-ChIP, a more sophisticated version for chromatin immunoprecipitation (ChIP) experiments. Under the one-per-sequence model, its performance was comparable to that of MEME in discovering motifs in ChIP-seq peak sequences.
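As a rough illustration of what "precision at a fixed recall rate" means, the sketch below ranks candidate hits by score and reports the precision at the first cutoff where the target recall is reached. It is not the evaluation protocol of the paper: the function name precision_at_recall, the toy scores and labels, and the decision to ignore tied scores are assumptions made for this example.

```python
def precision_at_recall(scores, labels, target_recall):
    """Precision at the highest score cutoff whose recall >= target_recall.

    scores: one number per candidate hit (higher means more site-like).
    labels: 1 if the candidate is a true binding site, else 0.
    Tied scores are not treated specially in this sketch.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one true site is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for k, (_, is_site) in enumerate(ranked, start=1):
        true_pos += is_site
        if true_pos / n_pos >= target_recall:
            return true_pos / k
    # Only reached if target_recall is greater than 1.
    return true_pos / len(ranked)

# Hypothetical scores and labels, made up for this example.
scores = [9.1, 8.4, 7.7, 6.0, 5.2, 3.3]
labels = [1, 1, 0, 1, 0, 0]
p_half = precision_at_recall(scores, labels, 0.5)
p_full = precision_at_recall(scores, labels, 1.0)
```

Comparing two search methods at the same recall removes the dependence on each method's own score threshold, which is presumably why the comparison is reported that way.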

We conclude that the LASAGNA algorithm is simple and effective in aligning variable-length binding sites. It has been integrated into a user-friendly webtool for TFBS search and visualization called LASAGNA-Search. The tool currently stores precomputed PSSM models for 189 TFs and 133 TFs built from TFBSs in the TRANSFAC Public database (release 7.0) and the ORegAnno database (08Nov10 dump), respectively. The webtool is available at


Available from: Chun-Hsi Huang
  • Source
    • "The LASAGNA-ChIP algorithm (Lee and Huang, 2013a) was used to align the binding sites of a TF since some of the projects contain TFBSs identified by ChIP-seq and ChIP-chip experiments. As reported in (Johnson et al., 2007), about 94% of the actual binding sites can be located within 50 bases of signal peaks. "
    ABSTRACT: LASAGNA-Search 2.0 is an integrated webtool for transcription factor (TF) binding site search and visualization. The tool is based on the LASAGNA (Length-Aware Site Alignment Guided by Nucleotide Association) algorithm. It eliminates manual TF model collection and promoter sequence retrieval. Search results can be visualized locally or in the UCSC Genome Browser. Gene regulatory network inference based on the search results offers another way of visualization. A list of TFs and target genes is all a user needs to start using the tool. LASAGNA-Search 2.0 currently offers 1792 TF models and supports 15 species for automatic promoter retrieval and visualization in the UCSC Genome Browser. It is a user-friendly tool designed for non-bioinformaticians suitable for research and teaching. We describe important changes made since the initial release. LASAGNA-Search 2.0 is freely available without registration at,
    Full-text · Article · Feb 2014 · Bioinformatics
  • Source
    • "Users can elicit reports at any time during browsing or searching, and have the option of condensing the report or reporting by individual species/TFs. Instead of relying on pre-computed motif representations, motif associated sites are realigned dynamically with LASAGNA and displayed with WebLogo (19,20), providing a fluid representation of TF-binding motifs that incorporates all the available sources of evidence selected by the user (Figure 2). All report pages offer a detailed view of TF-binding sites in their genomic context with out-links to site description and gene NCBI accessions, as well as export options to FASTA, flat-file CSV and ARFF sequence formats and multiple position-specific matrix formats (Supplementary Figure S2). "
    ABSTRACT: The influx of high-throughput data and the need for complex models to describe the interaction of prokaryotic transcription factors (TF) with their target sites pose new challenges for TF-binding site databases. CollecTF compiles data on experimentally validated, naturally occurring TF-binding sites across the Bacteria domain, placing a strong emphasis on the transparency of the curation process, the quality and availability of the stored data and fully customizable access to its records. CollecTF integrates multiple sources of data automatically and openly, allowing users to dynamically redefine binding motifs and their experimental support base. Data quality and currency are fostered in CollecTF by adopting a sustainable model that encourages direct author submissions in combination with in-house validation and curation of published literature. CollecTF entries are periodically submitted to NCBI for integration into RefSeq complete genome records as link-out features, maximizing the visibility of the data and enriching the annotation of RefSeq files with regulatory information. Seeking to facilitate comparative genomics and machine-learning analyses of regulatory interactions, in its initial release CollecTF provides domain-wide coverage of two TF families (LexA and Fur), as well as extensive representation for a clinically important bacterial family, the Vibrionaceae.
    Full-text · Article · Nov 2013 · Nucleic Acids Research
  • Source
    ABSTRACT: Transcription factor CEBPA has been widely studied for its involvement in hematopoietic cell differentiation and causal role in hematological malignancies. We demonstrate here that it also plays a causal role in cytokine-induced apoptosis of pancreatic β cells. Treatment of two mouse pancreatic α and β cell lines (αTC1-6 and βTC1) with the proinflammatory cytokines IL-1β, IFN-γ and TNF-α, at doses that specifically induce apoptosis of βTC1, significantly increased the amount of mRNA and protein encoded by Cebpa and its proapoptotic targets, Arl6ip5 and Tnfrsf10b, in βTC1 but not in αTC1-6. Cebpa knockdown in βTC1 significantly decreased cytokine-induced apoptosis, together with the amount of Arl6ip5 and Tnfrsf10b. Analysis of the network comprising CEBPA, its targets, their first interactants and proteins encoded by genes known to regulate cytokine-induced apoptosis in pancreatic β cells (genes from the Apoptotic Machinery and from the MAPK and NFkB pathways) revealed that CEBPA, ARL6IP5, TNFRSF10B, TRAF2 and UBC are the top five central nodes. In silico analysis further suggests TRAF2 as a linking node between CEBPA and the NFkB pathway. Our results strongly suggest that Cebpa is a key regulator within the apoptotic network activated in pancreatic β cells during insulitis, and that Arl6ip5, Tnfrsf10b, Traf2 and Ubc are key executioners of this program.
    Full-text · Article · Jun 2014 · Molecular Biology of the Cell