Article

XRate: a fast prototyping, training and annotation tool for phylo-grammars.

Department of Bioengineering, University of California, Berkeley CA, USA.
BMC Bioinformatics (impact factor: 2.75). 02/2006; 7:428. DOI: 10.1186/1471-2105-7-428
Source: PubMed

ABSTRACT Recent years have seen the emergence of genome annotation methods based on the phylo-grammar, a probabilistic model combining continuous-time Markov chains and stochastic grammars. Previously, phylo-grammars have required considerable effort to implement, limiting their adoption by computational biologists.
We have developed an open source software tool, xrate, for working with reversible, irreversible or parametric substitution models combined with stochastic context-free grammars. xrate efficiently estimates maximum-likelihood parameters and phylogenetic trees using a novel "phylo-EM" algorithm that we describe. The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source. We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures.
Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools.
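
As a rough illustration of the continuous-time Markov chain component mentioned in the abstract, the sketch below computes substitution probabilities P(t) = exp(Qt) from a rate matrix Q. It is not xrate's code or file format. The Jukes-Cantor rate matrix, the function names and the parameter values are all assumptions chosen for this example, and the matrix exponential is a plain scaling-and-squaring routine written only for the sketch.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(Q*t) by a truncated Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes (Q*scale)^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        # undo the scaling: exp(A) = (exp(A / 2^s))^(2^s)
        result = mat_mul(result, result)
    return result

def jukes_cantor_rate_matrix(alpha):
    """Hypothetical example model: every nucleotide substitution has rate alpha."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)] for i in range(4)]

# Assumed illustrative values: rate alpha = 0.1, branch length t = 2.0
p = mat_exp(jukes_cantor_rate_matrix(0.1), 2.0)
```

Each row of `p` gives the probability of ending in each of the four nucleotides after time t, given the starting nucleotide. For this simple model the result can be checked against the closed form 1/4 + 3/4·exp(-4αt) for the diagonal entries. A phylo-grammar would attach matrices of this kind to the emissions of a stochastic grammar, which this sketch does not attempt to show.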


Keywords

annotate multiple sequence alignments; considerable effort; continuous-time Markov chains; design new grammars; estimate rate parameters; external configuration; genome annotation methods; irreversible; measure codon substitution rates; open source software tool; phylo-grammar; phylo-grammars; Recent years; RNA secondary structures; specialized tools; xrate; xrate estimates biologically meaningful rates

Peter S Klosterman