Combining evidence using P‐values: application to sequence homology searches

San Diego Supercomputer Center, CA 92186-9784, USA.
Bioinformatics (Impact Factor: 4.98). 02/1998; 14(1):48-54. DOI: 10.1093/bioinformatics/14.1.48
Source: PubMed


MOTIVATION: To illustrate an intuitive and statistically valid method for combining independent sources of evidence into a single p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches.

RESULTS: In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate how likely it is that the sequence belongs to the class, given all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value and then use the product of these p-values as the measure of membership in the family. We derive a formula and an algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by the p-value of this product effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
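The abstract does not reproduce the formula itself. As a hedged illustration, assume each p-value is independent and uniformly distributed on (0, 1] under the null hypothesis. Then -ln Q, where Q is the product of n such p-values, is a sum of n independent unit exponentials (a Gamma(n, 1) variable), which gives the closed form P(Q <= q) = q * sum_{i=0}^{n-1} (-ln q)^i / i!. The Python sketch below evaluates that expression. The function name `combined_pvalue` is hypothetical, and this is a sketch of the standard result under the stated assumptions, not the authors' QFAST code.

```python
import math


def combined_pvalue(pvalues):
    """Hypothetical helper: p-value of the product of n independent p-values.

    Assumes every input is an independent p-value that is uniform on (0, 1]
    under the null hypothesis. Then x = -ln(product) is Gamma(n, 1), so
        P(product <= q) = q * sum_{i=0}^{n-1} (-ln q)^i / i!
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    # Sum logs instead of multiplying, so many small p-values do not underflow.
    x = -sum(math.log(p) for p in pvalues)  # x = -ln(product) >= 0

    # Accumulate sum_{i=0}^{n-1} x**i / i! incrementally.
    term = 1.0   # current term x**i / i!, starting at i = 0
    total = 1.0
    for i in range(1, n):
        term *= x / i
        total += term

    # exp(-x) is the product of the p-values.
    return math.exp(-x) * total


if __name__ == "__main__":
    print(combined_pvalue([0.01, 0.02]))
```

For example, two motif matches with p-values 0.01 and 0.02 have a product of 0.0002, but the combined p-value is 0.0002 * (1 + ln 5000), roughly 0.0019, about ten times larger than the raw product. One reason to rank by this value rather than by the raw product is that raw products computed from different numbers of p-values are not on a common scale.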



Available from: Michael Gribskov
  • "The parameters for the analysis were as follows: number of repetitions, 0 or 1; maximum number of motifs, 14; and optimum motif width, 6–100. The MAST program (Bailey and Gribskov, 1998) was used to search for each of the motifs in the AOP sequences."
    ABSTRACT: The glucosinolate biosynthetic gene AOP2 encodes an enzyme that plays a crucial role in catalysing the conversion of beneficial glucosinolates into anti-nutritional ones. In Brassica rapa, three copies of BrAOP2 have been identified, but their function in establishing the glucosinolate content of B. rapa is poorly understood. Here, we used phylogenetic and gene structure analyses to show that BrAOP2 proteins have evolved via a duplication process retaining two highly conserved domains at the N-terminal and C-terminal regions, while the middle part has experienced structural divergence. Heterologous expression and in vitro enzyme assays and Arabidopsis mutant complementation studies showed that all three BrAOP2 genes encode functional BrAOP2 proteins that convert the precursor methylsulfinyl alkyl glucosinolate to the alkenyl form. Site-directed mutagenesis showed that His356, Asp310, and Arg376 residues are required for the catalytic activity of one of the BrAOP2 proteins (BrAOP2.1). Promoter-β-glucuronidase lines revealed that the BrAOP2.3 gene displayed an overlapping but distinct tissue- and cell-specific expression profile compared with that of the BrAOP2.1 and BrAOP2.2 genes. Quantitative real-time reverse transcription-PCR assays demonstrated that BrAOP2.1 showed a slightly different pattern of expression in below-ground tissue at the seedling stage and in the silique at the reproductive stage compared with BrAOP2.2 and BrAOP2.3 genes in B. rapa. Taken together, our results revealed that all three BrAOP2 paralogues are active in B. rapa but have functionally diverged. © The Author 2015. Published by Oxford University Press on behalf of the Society for Experimental Biology.
    Journal of Experimental Botany 07/2015; DOI:10.1093/jxb/erv331 · 5.53 Impact Factor
  • "...was used for the search of consensus promoter sequences obtained from previously published mycobacterial promoters [24]. The MEME algorithm [25] was used to discover conserved palindromic DNA motifs among mycobacterial species and MAST [26] was used to identify the presence of these motifs in the genome of M. smegmatis mc²155."
    ABSTRACT: MSMEG_0307 is annotated as a transcriptional regulator belonging to the AraC protein family and is located adjacent to the arylamine N-acetyltransferase (nat) gene in Mycobacterium smegmatis, in a gene cluster, conserved in most environmental mycobacterial species. In order to elucidate the function of the AraC protein from the nat operon in M. smegmatis, two conserved palindromic DNA motifs were identified using bioinformatics and tested for protein binding using electrophoretic mobility shift assays with a recombinant form of the AraC protein. We identified the formation of a DNA:AraC protein complex with one of the motifs as well as the presence of this motif in 20 loci across the whole genome of M. smegmatis, supporting the existence of an AraC controlled regulon. To characterise the effects of AraC in the regulation of the nat operon genes, as well as to gain further insight into its function, we generated a ΔaraC mutant strain where the araC gene was replaced by a hygromycin resistance marker. The level of expression of the nat and MSMEG_0308 genes was down-regulated in the ΔaraC strain when compared to the wild type strain indicating an activator effect of the AraC protein on the expression of the nat operon genes.
    Tuberculosis 11/2014; 94(6). DOI:10.1016/ · 2.71 Impact Factor
  • "Comparison to motifs known to be recognised by RNA binding proteins [57] within MEME-ChIP failed to match motifs 0, 2 and 3 but motif 1 was found to be similar (p = 0.00017) to the motif identified for TRA2 [57], which regulates pre-mRNA splicing. MAST [58] shows that the motifs are often present at multiple copies within the bound RNAs and are distributed across the RNA molecules, rather than enriched towards the 5’ or 3’ ends (Additional file 3). Motif 0 is a long sequence with a striking periodic enrichment for C and G nucleotides."
    ABSTRACT: Background Human Immunodeficiency Virus 1 (HIV-1) exhibits a wide range of interactions with the host cell but whether viral proteins interact with cellular RNA is not clear. A candidate interacting factor is the trans-activator of transcription (Tat) protein. Tat is required for expression of virus genes but activates transcription through an unusual mechanism; binding to an RNA stem-loop, the transactivation response element (TAR), with the host elongation factor P-TEFb. HIV-1 Tat has also been shown to alter the expression of host genes during infection, contributing to viral pathogenesis but, whether Tat also interacts with cellular RNAs is unknown. Results Using RNA immunoprecipitation coupled with microarray analysis, we have discovered that HIV-1 Tat is associated with a specific set of human mRNAs in T cells. mRNAs bound by Tat share a stem-loop structural element and encode proteins with common biological roles. In contrast, we do not find evidence that Tat associates with microRNAs or the RNA-induced silencing complex (RISC). The interaction of Tat with cellular RNA requires an intact RNA binding domain and Tat RNA binding is linked to an increase in RNA abundance in cell lines and during infection of primary CD4+ T cells by HIV. Conclusions We conclude that Tat interacts with a specific set of human mRNAs in T cells, many of which show changes in abundance in response to Tat and HIV infection. This work uncovers a previously unrecognised interaction between HIV and its host that may contribute to viral alteration of the host cellular environment.
    Retrovirology 07/2014; 11(1):53. DOI:10.1186/1742-4690-11-53 · 4.19 Impact Factor