
A reexamination of information theory-based methods for DNA-binding site identification.

Department of Biological Sciences, University of Maryland-Baltimore County, Baltimore, MD, USA.
BMC Bioinformatics (Impact Factor: 2.67). 03/2009; 10:57. DOI: 10.1186/1471-2105-10-57
Source: PubMed

ABSTRACT Searching for transcription factor binding sites in genome sequences is still an open problem in bioinformatics. Despite substantial progress, search methods based on information theory remain a standard in the field, even though the full validity of their underlying assumptions has only been tested in artificial settings. Here we use newly available data on transcription factors from different bacterial genomes to make a more thorough assessment of information theory-based search methods.
Our results reveal that conventional benchmarking against artificial sequence data frequently leads to overestimation of search efficiency. In addition, we find that sequence information by itself is often inadequate and must therefore be complemented by other cues, such as curvature, in real genomes. Furthermore, results on skewed genomes show that methods integrating skew information, such as Relative Entropy, are not effective because their assumptions may not hold in real genomes. The evidence suggests that binding sites tend to evolve towards genomic skew, rather than against it, and to maintain their information content through increased conservation. Based on these results, we identify several misconceptions about information theory as applied to binding sites, such as negative entropy, and we propose a revised paradigm to explain the observed results.
We conclude that, among information theory-based methods, the most unassuming search methods perform, on average, better than the alternatives, since heuristic corrections to these methods are prone to fail when working on real data. A reexamination of information content in binding sites reveals that information content is a compound measure of search and binding affinity requirements, a fact that has important repercussions for our understanding of binding site evolution.
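
As a rough illustration of the two quantities the abstract contrasts, the sketch below computes information content against a uniform background and relative entropy against an arbitrary (possibly skewed) background for a set of aligned sites. It uses the textbook definitions, which are an assumption here since the abstract gives no formulas; the function names, the toy sites, and the skewed background values are hypothetical, and no small-sample correction is applied.

```python
import math
from collections import Counter

BASES = "ACGT"

def column_frequencies(sites):
    """Per-column base frequencies for aligned sites of equal length."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        freqs.append({b: counts.get(b, 0) / len(sites) for b in BASES})
    return freqs

def information_content(freqs):
    """Sum over columns of (2 - Shannon entropy), in bits.

    Assumes a uniform background (0.25 per base); this is the simplest,
    'unassuming' variant with no correction for genomic skew.
    """
    total = 0.0
    for col in freqs:
        entropy = -sum(p * math.log2(p) for p in col.values() if p > 0)
        total += 2.0 - entropy
    return total

def relative_entropy(freqs, background):
    """Sum over columns of the Kullback-Leibler divergence, in bits,
    between the column frequencies and the given background frequencies."""
    total = 0.0
    for col in freqs:
        total += sum(p * math.log2(p / background[b])
                     for b, p in col.items() if p > 0)
    return total

# Hypothetical toy data: four aligned sites and an AT-rich background.
sites = ["AAGT", "ACGT", "AGGT", "ATGT"]
skewed = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}
freqs = column_frequencies(sites)
print(information_content(freqs), relative_entropy(freqs, skewed))
```

With a uniform background the two measures coincide; they diverge only when the background is skewed, which is the situation the results on skewed genomes address.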

    ABSTRACT: Epigenetic marks such as cytosine methylation are important determinants of cellular and whole-body phenotypes. However, the extent of, and reasons for, inter-individual differences in cytosine methylation, and their association with phenotypic variation, are poorly characterised. Here we present the first genome-wide study of cytosine methylation at single-nucleotide resolution in an animal model of human disease. We used whole-genome bisulfite sequencing in the spontaneously hypertensive rat (SHR), a model of cardiovascular disease, and the Brown Norway (BN) control strain, to define the genetic architecture of cytosine methylation in the mammalian heart and to test for association between methylation and pathophysiological phenotypes. Analysis of 10.6 million CpG dinucleotides identified 77,088 CpGs that were differentially methylated between the strains. In F1 hybrids we found 38,152 CpGs showing allele-specific methylation and 145 regions with parent-of-origin effects on methylation. Cis-linkage explained almost 60% of inter-strain variation in methylation at a subset of loci tested for linkage in a panel of recombinant inbred (RI) strains. Methylation analysis in isolated cardiomyocytes showed that in the majority of cases methylation differences in cardiomyocytes and non-cardiomyocytes were strain-dependent, confirming a strong genetic component for cytosine methylation. We observed preferential nucleotide usage associated with increased and decreased methylation that is remarkably conserved across species, suggesting a common mechanism for germline control of inter-individual variation in CpG methylation. In the RI strain panel, we found significant correlation of CpG methylation and levels of serum chromogranin B (CgB), a proposed biomarker of heart failure, which is evidence for a link between germline DNA sequence variation, CpG methylation differences and pathophysiological phenotypes in the SHR strain.
Together, these results will stimulate further investigation of the molecular basis of locally regulated variation in CpG methylation and provide a starting point for understanding the relationship between the genetic control of CpG methylation and disease phenotypes.
    PLoS Genetics 12/2014; 10(12):e1004813. DOI:10.1371/journal.pgen.1004813 · 8.17 Impact Factor
    ABSTRACT: The SOS response, which includes two main proteins, LexA and RecA, maintains the integrity of bacterial genomes after DNA damage due to metabolic or environmental assaults. Additionally, derepression of LexA-regulated genes can result in mutations, genetic exchange and expression of virulence factors. Here we present the first comprehensive description of the in silico LexA regulon in Clostridium difficile, an important human pathogen. We grouped thirty C. difficile strains from different ribotypes and toxinotypes into three clusters according to lexA gene/protein variability. We applied in silico analysis coupled to surface plasmon resonance spectroscopy (SPR) and determined 16 LexA binding sites in C. difficile. Our data indicate that strains within the cluster, as defined by LexA variability, harbour several specific LexA regulon genes. In addition to the core SOS genes lexA, recA, ruvCA and uvrBA, we identified a LexA binding site on the pathogenicity locus (PaLoc) and in the putative promoter region of several genes involved in housekeeping, sporulation and antibiotic resistance. Results presented here suggest that in C. difficile LexA is not merely a regulator of the DNA damage response genes but also controls the expression of a dozen genes involved in various other biological functions. Our in vitro results indicate that in C. difficile inactivation of the LexA repressor depends on the repressor's dissociation from the operators. We report that the repressor's dissociation rates from operators differ, and thus the determined LexA-DNA dissociation constants have implications for the timing of SOS gene expression in C. difficile.
    BMC Microbiology 04/2014; 14(1):88. DOI:10.1186/1471-2180-14-88 · 2.98 Impact Factor