Article

LASAGNA: A novel algorithm for transcription factor binding site alignment

BMC Bioinformatics (Impact Factor: 2.67). 03/2013; 14(1):108. DOI: 10.1186/1471-2105-14-108
Source: PubMed

ABSTRACT

Background: Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs). Most of the available tools rely on position-specific scoring matrices (PSSMs) constructed from aligned binding sites. Because of the resolutions of assays used to obtain TFBSs, databases such as TRANSFAC, ORegAnno and PAZAR store unaligned variable-length DNA segments containing binding sites of a TF. These DNA segments need to be aligned to build a PSSM. While the TRANSFAC database provides scoring matrices for TFs, nearly 78% of the TFs in the public release do not have matrices available. As work on TFBS alignment algorithms has been limited, it is highly desirable to have an alignment algorithm tailored to TFBSs.

Results: We designed a novel algorithm named LASAGNA, which is aware of the lengths of input TFBSs and utilizes position dependence. Results on 189 TFs of 5 species in the TRANSFAC database showed that our method significantly outperformed ClustalW2 and MEME. We further compared a PSSM method dependent on LASAGNA to an alignment-free TFBS search method. Results on 89 TFs whose binding sites can be located in genomes showed that our method is significantly more precise at fixed recall rates. Finally, we described LASAGNA-ChIP, a more sophisticated version for ChIP (Chromatin immunoprecipitation) experiments. Under the one-per-sequence model, it showed comparable performance with MEME in discovering motifs in ChIP-seq peak sequences.

Conclusions: We conclude that the LASAGNA algorithm is simple and effective in aligning variable-length binding sites. It has been integrated into a user-friendly webtool for TFBS search and visualization called LASAGNA-Search. The tool currently stores precomputed PSSM models for 189 TFs and 133 TFs built from TFBSs in the TRANSFAC Public database (release 7.0) and the ORegAnno database (08Nov10 dump), respectively. The webtool is available at: http://biogrid.engr.uconn.edu/lasagna_search/.
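The abstract notes that most TFBS search tools rely on a PSSM built from binding sites that have already been aligned. As a rough illustration of that downstream step only, the sketch below builds a log-odds PSSM from equal-length sites and scans a sequence for the best-scoring window. It is not the LASAGNA alignment algorithm. The function names, the pseudocount, the uniform background of 0.25 and the toy sites are all assumptions made for this example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumes uppercase DNA with no ambiguity codes


def build_pssm(aligned_sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM from sites that are ALREADY aligned.

    Hypothetical helper: pseudocount and uniform background are
    illustrative choices, not values taken from the paper.
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("sites must be aligned to equal length first")
    total = len(aligned_sites) + pseudocount * len(ALPHABET)
    pssm = []
    for pos in range(length):
        # Count each base observed in this alignment column.
        counts = Counter(site[pos] for site in aligned_sites)
        pssm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pssm


def best_hit(pssm, sequence):
    """Return (start, score) of the highest-scoring window, or None."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pssm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Toy, made-up sites used purely to exercise the functions.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pssm = build_pssm(sites)
start, score = best_hit(pssm, "CCGGTGACTCAGGCC")
print(start, round(score, 2))
```

The sketch requires equal-length input and raises an error otherwise, which is exactly the gap the abstract describes: variable-length segments from databases must first be aligned before a matrix like this can be built.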

Available from: Chun-Hsi Huang, Jul 29, 2015
  • Source
    ABSTRACT: The influx of high-throughput data and the need for complex models to describe the interaction of prokaryotic transcription factors (TF) with their target sites pose new challenges for TF-binding site databases. CollecTF (http://collectf.umbc.edu) compiles data on experimentally validated, naturally occurring TF-binding sites across the Bacteria domain, placing a strong emphasis on the transparency of the curation process, the quality and availability of the stored data and fully customizable access to its records. CollecTF integrates multiple sources of data automatically and openly, allowing users to dynamically redefine binding motifs and their experimental support base. Data quality and currency are fostered in CollecTF by adopting a sustainable model that encourages direct author submissions in combination with in-house validation and curation of published literature. CollecTF entries are periodically submitted to NCBI for integration into RefSeq complete genome records as link-out features, maximizing the visibility of the data and enriching the annotation of RefSeq files with regulatory information. Seeking to facilitate comparative genomics and machine-learning analyses of regulatory interactions, in its initial release CollecTF provides domain-wide coverage of two TF families (LexA and Fur), as well as extensive representation for a clinically important bacterial family, the Vibrionaceae.
    Nucleic Acids Research 11/2013; 42(Database issue). DOI:10.1093/nar/gkt1123 · 9.11 Impact Factor
  • ABSTRACT: LASAGNA-Search 2.0 is an integrated webtool for transcription factor (TF) binding site search and visualization. The tool is based on the LASAGNA (Length-Aware Site Alignment Guided by Nucleotide Association) algorithm. It eliminates manual TF model collection and promoter sequence retrieval. Search results can be visualized locally or in the UCSC Genome Browser. Gene regulatory network inference based on the search results offers another way of visualization. A list of TFs and target genes is all a user needs to start using the tool. LASAGNA-Search 2.0 currently offers 1792 TF models and supports 15 species for automatic promoter retrieval and visualization in the UCSC Genome Browser. It is a user-friendly tool designed for non-bioinformaticians and suitable for research and teaching. We describe important changes made since the initial release. LASAGNA-Search 2.0 is freely available without registration at http://biogrid.engr.uconn.edu/lasagna_search/. Contact: chihlee@engr.uconn.edu, huang@engr.uconn.edu.
    Bioinformatics 02/2014; 30(13). DOI:10.1093/bioinformatics/btu115 · 4.62 Impact Factor
  • Source
    ABSTRACT: Transcription factor CEBPA has been widely studied for its involvement in hematopoietic cell differentiation and its causal role in hematological malignancies. We demonstrate here that it also plays a causal role in cytokine-induced apoptosis of pancreatic β cells. Treatment of two mouse pancreatic α and β cell lines (αTC1-6 and βTC1) with the proinflammatory cytokines IL-1β, IFN-γ and TNF-α, at doses that specifically induce apoptosis of βTC1, significantly increased the amount of mRNA and protein encoded by Cebpa and its proapoptotic targets, Arl6ip5 and Tnfrsf10b, in βTC1 but not in αTC1-6. Cebpa knockdown in βTC1 significantly decreased cytokine-induced apoptosis, together with the amount of Arl6ip5 and Tnfrsf10b. Analysis of the network comprising CEBPA, its targets, their first interactants and proteins encoded by genes known to regulate cytokine-induced apoptosis in pancreatic β cells (genes from the Apoptotic Machinery and from the MAPK and NFkB pathways) revealed that CEBPA, ARL6IP5, TNFRSF10B, TRAF2 and UBC are the top five central nodes. In silico analysis further suggests TRAF2 as the linking (trait d'union) node between CEBPA and the NFkB pathway. Our results strongly suggest that Cebpa is a key regulator within the apoptotic network activated in pancreatic β cells during insulitis, and that Arl6ip5, Tnfrsf10b, Traf2 and Ubc are key executioners of this program.
    Molecular Biology of the Cell 06/2014; 25(16). DOI:10.1091/mbc.E14-02-0703 · 4.55 Impact Factor