ABSTRACT: We have implemented a method that identifies the genomic origins of sample proteins by scanning their peptide-mass fingerprint against the theoretical translation and proteolytic digest of an entire genome. Unlike previously reported techniques, this method requires no predefined ORF or protein annotations. Fixed-size windows along the genome sequence are scored by an equation accounting for the number of matching peptides, the number of missed enzymatic cleavages in each peptide, the number of in-frame stop codons within a window, the adjacency between peptides, and duplicate peptide matches. Statistical significance of matching regions is assessed by comparing their scores to scores from windows matching randomly generated mass data. Tests with samples from Saccharomyces cerevisiae mitochondria and Escherichia coli have demonstrated the ability to produce statistically significant identifications, agreeing with two commonly used programs, PeptIdent and Mascot, in 86% of samples analyzed. This genome fingerprint scanning method has the potential to aid in genome annotation, identify proteins for which annotation is incorrect or missing, and handle cases where sequencing errors have caused framing mistakes in the databases. It might also aid in the identification of proteins in which recoding events such as frameshifting or stop-codon read-through have occurred, elucidating alternative translation mechanisms. The prototype is implemented as a client-server pair, allowing the distribution, among a set of cluster nodes, of a single or multiple genomes for concurrent analysis.
Proceedings of the National Academy of Sciences 02/2003; 100(1):20-5. DOI:10.1073/pnas.0136893100 · 9.67 Impact Factor
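The significance step described in this abstract (comparing a window's score with scores obtained from randomly generated mass data) can be illustrated with an empirical p-value. The following is only a minimal sketch under stated assumptions: the abstract does not give the scoring equation or the exact statistical procedure, so the function name `empirical_p_value` and the add-one correction are illustrative choices, not the authors' implementation.

```python
from typing import Sequence


def empirical_p_value(observed_score: float, random_scores: Sequence[float]) -> float:
    """Estimate how often random mass data scores at least as high as a real window.

    Hypothetical helper: the scoring equation itself is not reproduced here;
    the scores are assumed to have been computed elsewhere.
    """
    if not random_scores:
        raise ValueError("need at least one random score")
    # Count random-data scores that reach or exceed the observed score.
    at_least_as_high = sum(1 for s in random_scores if s >= observed_score)
    # Add-one correction (an assumption) keeps the estimate above zero.
    return (at_least_as_high + 1) / (len(random_scores) + 1)
```

For example, a window scoring 10.0 against random scores of 1, 2, 3 and 12 would receive an estimate of 0.4, since one of the four random scores is at least as high.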
ABSTRACT: An mRNA transcript contains many potential antisense oligodeoxynucleotide target sites. Identification of the most efficacious targets remains an important and challenging problem. Building on separate work that revealed a strong correlation between the inclusion of short sequence motifs and the activity level of an oligo, we have developed a predictive artificial neural network system for mapping tetranucleotide motif content to antisense oligo activity. Trained for high-specificity prediction, the system has been cross-validated against a database of 348 oligos from the literature and a larger proprietary database of 908 oligos. In cross-validation tests the system identified effective oligos (i.e. oligos capable of reducing target mRNA expression to <25% that of the control) with 53% accuracy, in contrast to the <10% success rates commonly reported for trial-and-error oligo selection, suggesting a possible 5-fold reduction in the in vivo screening required to find an active oligo. We have implemented a web interface to a trained neural network. Given an RNA transcript as input, the system identifies the most likely oligo targets and provides estimates of the probabilities that oligos targeted against these sites will be effective.
Nucleic Acids Research 10/2002; 30(19):4295-304. · 9.11 Impact Factor
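To make "tetranucleotide motif content" concrete, the sketch below turns an oligo sequence into a 256-element vector of tetranucleotide counts, the kind of input a neural network could consume. The abstract does not say how the authors encoded motif content, so the DNA alphabet, the count-based encoding, and the name `tetranucleotide_counts` are assumptions made for illustration.

```python
from itertools import product
from typing import List

# All 256 tetranucleotides over the DNA alphabet, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
TETRAMER_INDEX = {motif: i for i, motif in enumerate(TETRAMERS)}


def tetranucleotide_counts(oligo: str) -> List[int]:
    """Count overlapping tetranucleotides in an oligo (hypothetical encoding)."""
    counts = [0] * len(TETRAMERS)
    sequence = oligo.upper()
    for start in range(len(sequence) - 3):
        motif = sequence[start:start + 4]
        # Windows containing characters outside A, C, G, T are skipped.
        index = TETRAMER_INDEX.get(motif)
        if index is not None:
            counts[index] += 1
    return counts
```

An oligo of length n contributes n - 3 overlapping windows, so a 20-mer yields a vector whose entries sum to 17.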
ABSTRACT: In an effort to identify potential programmed frameshift sites by statistical analysis, we explore the hypothesis that selective pressure would have rendered such sites underabundant and underrepresented in protein-coding sequences. We developed a computer program to compare the frequencies of k-length subsequences of nucleotides with the frequencies predicted by a zero-order Markov chain determined by the codon bias of the same set of sequences. The program was used to calculate and evaluate the distribution of 7-base oligonucleotides in the 6000+ putative protein-coding sequences of S. cerevisiae preliminary to the laboratory testing of the most highly underrepresented oligos for frameshifting efficiency.
Among the most significant results is the finding that the heptanucleotides CUU-AGG-C and CUU-AGU-U, sites of the programmed +1 translational frameshifts required for the production in yeast of actin filament-binding protein ABP140 and telomerase subunit EST3, respectively, rank among the least represented of phase I heptanucleotides in the coding sequences of S. cerevisiae. Laboratory experiments demonstrated that other underrepresented heptanucleotides identified by the program, for example GGU-CAG-A, are also prone to significant translational frameshifting, suggesting the possibility that genes containing other underrepresented heptamers may also encode transframe products.
The program is available for download from http://www.gesteland.genetics.utah.edu/freqAnalysis
Complete results from the analysis of S. cerevisiae are available on http://www.gesteland.genetics.utah.edu/freqAnalysis
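The comparison described in this abstract, observed heptanucleotide frequencies against those predicted by a zero-order Markov chain built from codon usage, can be sketched roughly as follows. This is not the authors' program: it assumes codons are drawn independently according to their usage in the input sequences, considers only heptamers that start at a codon boundary (two full codons plus the first base of the next), and reports only heptamers that occur at least once. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def split_codons(cds: str) -> List[str]:
    """Split a coding sequence into complete codons, dropping any trailing bases."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def heptamer_ratios(sequences: Iterable[str]) -> Dict[str, float]:
    """Return observed/expected ratios for in-frame heptamers (illustrative model)."""
    codon_counts: Counter = Counter()
    observed: Counter = Counter()
    sites = 0
    for cds in sequences:
        codons = split_codons(cds.upper())
        codon_counts.update(codons)
        # Each heptamer spans two codons and the first base of a third.
        for i in range(len(codons) - 2):
            observed[codons[i] + codons[i + 1] + codons[i + 2][0]] += 1
            sites += 1
    total = sum(codon_counts.values())
    codon_freq = {codon: n / total for codon, n in codon_counts.items()}
    # Probability that a codon begins with a given base, under codon usage.
    first_base_freq: Counter = Counter()
    for codon, freq in codon_freq.items():
        first_base_freq[codon[0]] += freq
    ratios = {}
    for heptamer, count in observed.items():
        expected = (sites * codon_freq[heptamer[:3]] * codon_freq[heptamer[3:6]]
                    * first_base_freq[heptamer[6]])
        ratios[heptamer] = count / expected
    return ratios
```

Ratios well below 1 would flag candidate underrepresented heptamers under these assumptions. Heptamers that never occur are the most extreme case and would need separate handling, since this sketch only iterates over observed ones.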