Publications (4) · 9.68 Total impact
Chapter: A fast filtration algorithm for the substring matching problem
ABSTRACT: Given a text of length n and a query of length q, we present an algorithm for finding all locations of m-tuples in the text and in the query that differ by at most k mismatches. This problem is motivated by the dot-matrix constructions for sequence comparison and optimal oligonucleotide probe selection routinely used in molecular biology. In the case q = m, the problem coincides with the classical problem of approximate string matching with k mismatches. We present a new approach to this problem based on multiple filtration, which may have advantages over some sophisticated and theoretically efficient methods that have been proposed. This paper describes a two-stage process. The first stage (multiple filtration) uses a new technique to preselect roughly similar m-tuples. The second stage compares these m-tuples using an accurate method. We demonstrate the advantages of multiple filtration in comparison with other techniques for approximate pattern matching.
04/2006: pages 197-214
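The two-stage idea in this abstract (a cheap filter that preselects candidate m-tuples, followed by an exact comparison) can be illustrated with a minimal sketch. This is not the chapter's multiple-filtration technique, whose details the abstract does not give; it swaps in a generic pigeonhole filter, which relies on the fact that two m-tuples with at most k mismatches must agree exactly on at least one of k + 1 contiguous blocks. The names `hamming` and `similar_tuples` are hypothetical.

```python
from collections import defaultdict


def hamming(u, v):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(u, v))


def similar_tuples(text, query, m, k):
    """Return sorted (i, j) pairs such that text[i:i+m] and query[j:j+m]
    differ in at most k positions.

    Assumption: a plain pigeonhole filter stands in for the chapter's
    multiple filtration, which is not described in the abstract.
    """
    # Split the m positions into k + 1 contiguous blocks.
    bounds = [(b * m // (k + 1), (b + 1) * m // (k + 1)) for b in range(k + 1)]

    # Stage 1 (filtration): a pair is a candidate if some block matches exactly.
    candidates = set()
    for lo, hi in bounds:
        index = defaultdict(list)
        for i in range(len(text) - m + 1):
            index[text[i + lo:i + hi]].append(i)
        for j in range(len(query) - m + 1):
            for i in index.get(query[j + lo:j + hi], ()):
                candidates.add((i, j))

    # Stage 2 (verification): compare each candidate pair exactly.
    return sorted(
        (i, j)
        for i, j in candidates
        if hamming(text[i:i + m], query[j:j + m]) <= k
    )
```

The filter never discards a true match, so the output equals that of a brute-force comparison of all pairs; how much work it saves depends on how selective the exact block matches are.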
Chapter: Matrix longest common subsequence problem, duality and Hilbert bases
ABSTRACT: Although a number of efficient algorithms for the longest common subsequence (LCS) problem have been suggested since the 1970s, there is no duality theorem for the LCS problem. In the present paper a simple duality theorem is proved for the LCS problem and for a wide class of partial orders generalizing the notion of common subsequence. An algorithm for finding generalized LCS is suggested which has the classical dynamic programming algorithm as a special case. It is shown that the generalized LCS problem is closely associated with the minimal Hilbert basis problem. The Jeroslow-Schrijver characterization of minimal Hilbert bases gives an O(n) estimate for the number of elementary edit operations for generalized LCS.
01/2006: pages 79-89
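For reference, the classical dynamic programming algorithm that this abstract names as a special case can be sketched as below. This is the textbook LCS recurrence with a traceback, not the generalized algorithm or the duality construction from the chapter; the function name `lcs` is illustrative.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP)."""
    n, m = len(a), len(b)
    # dp[i][j] = length of an LCS of a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from dp[n][m] to recover one optimal subsequence.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

The table takes O(nm) time and space for inputs of lengths n and m.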
Article: Whole-genome shotgun assembly and comparison of human genome assemblies.
ABSTRACT: We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304-1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860-921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.
Proceedings of the National Academy of Sciences 03/2004; 101(7):1916-21. · 9.68 Impact Factor
Article: Selected Papers from RECOMB'97 - Preface.
Journal of Computational Biology. 01/1997; 4:215-216.
Institutions
2004–2006
University of Southern California
Department of Mathematics
Los Angeles, CA, USA