Combining evidence using p-values: application to sequence homology searches

San Diego Supercomputer Center, CA 92186-9784, USA.
Bioinformatics (Impact Factor: 4.62). 02/1998; 14(1):48-54. DOI: 10.1093/bioinformatics/14.1.48
Source: PubMed

ABSTRACT

MOTIVATION: To illustrate an intuitive and statistically valid method for combining independent sources of evidence that yields a p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches.

RESULTS: In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
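The idea of turning a product of p-values into a single p-value can be illustrated with a short sketch. It is not the paper's QFAST algorithm, whose details the abstract does not give. It uses the standard closed form for the product of n independent p-values that are each uniform on (0, 1) under the null hypothesis: if x = -ln(p_1 · ... · p_n), then P(product ≤ observed) = e^(-x) · Σ_{i=0}^{n-1} x^i / i!. The function name `combined_p_value` is hypothetical, chosen for this illustration only.

```python
import math


def combined_p_value(p_values):
    """Return the p-value of the product of n independent p-values.

    Assumes each input is uniform on (0, 1) under the null hypothesis
    (continuous, independent tests). Illustrative sketch only.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 <= p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in p_values):
        return 0.0  # the product is 0, the most extreme possible value

    # x = -ln(product); summing logs avoids underflow of the raw product.
    x = -sum(math.log(p) for p in p_values)

    # Accumulate sum_{i=0}^{n-1} x^i / i! term by term.
    term = 1.0
    total = 1.0
    for i in range(1, len(p_values)):
        term *= x / i
        total += term

    return min(1.0, math.exp(-x) * total)


if __name__ == "__main__":
    # Two motif matches, each individually only weakly significant.
    print(combined_p_value([0.05, 0.05]))
```

With a single p-value the function returns that value unchanged. With several, the result is always at least as large as the raw product, which reflects the fact that a product of n numbers in (0, 1) tends to be small even when no individual piece of evidence is significant.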

  • ABSTRACT: Tools of molecular biology and the evolving tools of genomics can now be exploited to study the genetic regulatory mechanisms that control cellular responses to a wide variety of stimuli. These responses are highly complex, and involve many genes and gene products. The main objectives of this paper are to describe a novel research program centered on understanding these responses by developing powerful graph algorithms that exploit the innovative principles of fixed parameter tractability in order to generate distilled gene sets; producing scalable, high performance parallel and distributed implementations of these algorithms utilizing cutting-edge computing platforms and auxiliary resources; employing these implementations to identify gene sets suggestive of co-regulation; and performing sequence analysis and genomic data mining to examine, winnow and highlight the most promising gene sets for more detailed investigation. As a case study, we describe our work aimed at elucidating genetic regulatory mechanisms that control cellular responses to low-dose ionizing radiation (IR). A low-dose exposure, as defined here, is an exposure of at most 10 cGy (rads). While the consequences of high doses of radiation are well known, the net outcome of low-dose exposures continues to be debated, with support in the literature for both detrimental and beneficial effects. We use genome-scale gene expression data collected in response to low-dose IR exposure in vivo to identify the pathways that are activated or repressed as a tissue responds to the radiation insult. The driving motivation is that knowledge of these pathways will help clarify and interpret physiological responses to IR, which will advance our understanding of the health consequences of low-dose radiation exposures.
  • ABSTRACT: Riboswitches present a ubiquitous genetic regulatory mechanism for prokaryotes and have been found in HIV1, fungi, plants, and even H. sapiens. We present an overview of approaches to predict riboswitch aptamers and, more generally, RNA conformational switches. © 2015 Elsevier Inc. All rights reserved.
  • ABSTRACT: Polycystic ovary syndrome (PCOS) is one of the most common female endocrine disorders and a leading cause of female subfertility. The mechanism underlying the pathophysiology of PCOS remains to be illustrated. Here, we identify two alternative splice variants (ASVs) of the androgen receptor (AR), insertion and deletion isoforms, in granulosa cells (GCs) in ∼62% of patients with PCOS. AR ASVs are strongly associated with remarkable hyperandrogenism and abnormalities in folliculogenesis, and are absent from all control subjects without PCOS. Alternative splicing dramatically alters genome-wide AR recruitment and androgen-induced expression of genes related to androgen metabolism and folliculogenesis in human GCs. These findings establish alternative splicing of AR in GCs as the major pathogenic mechanism for hyperandrogenism and abnormal folliculogenesis in PCOS.
    Proceedings of the National Academy of Sciences 03/2015; 112(15). DOI:10.1073/pnas.1418216112 · 9.81 Impact Factor
