Article
Bioinformatics in China: a personal perspective.
Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, People's Republic of China.
PLoS Computational Biology (impact factor: 5.22). 05/2008; 4(4):e1000020. DOI:10.1371/journal.pcbi.1000020
Source: PubMed
Article: Measure representation and multifractal analysis of complete genomes.
ABSTRACT: This paper introduces the notion of measure representation of DNA sequences. Spectral analysis and multifractal analysis are then performed on the measure representations of a large number of complete genomes. The main aim of this paper is to discuss the multifractal property of the measure representation and the classification of bacteria. From the measure representations and the values of the D(q) spectra and related C(q) curves, it is concluded that these complete genomes are not random sequences. In fact, spectral analyses performed indicate that these measure representations, considered as time series, exhibit strong long-range correlation. Here the long-range correlation is for the K-strings with dictionary ordering, and it is different from the base pair correlations introduced by other people. For substrings with length K=8, the D(q) spectra of all organisms studied are multifractal-like and sufficiently smooth for the C(q) curves to be meaningful. With the decreasing value of K, the multifractality lessens. The C(q) curves of all bacteria resemble a classical phase transition at a critical point. But the "analogous" phase transitions of chromosomes of nonbacteria organisms are different. Apart from chromosome 1 of C. elegans, they exhibit the shape of double-peaked specific heat function. A classification of genomes of bacteria by assigning to each sequence a point in two-dimensional space (D(-1),D1) and in three-dimensional space (D(-1),D1,D(-2)) was given. Bacteria that are close phylogenetically are almost close in the spaces (D(-1),D1) and (D(-1),D1,D(-2)).
Physical Review E 10/2001; 64(3 Pt 1):031903. · 2.26 Impact Factor
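The idea of a measure over K-strings in dictionary order can be illustrated with a minimal Python sketch. It is not the authors' code: the base ordering `acgt`, the use of overlapping windows, and the function name `measure_representation` are assumptions made here for illustration, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product

ALPHABET = "acgt"  # assumed dictionary order of the four bases


def measure_representation(seq, k):
    """Return all K-strings in dictionary order and their relative frequencies."""
    seq = seq.lower()
    # Count every overlapping window of length k (an assumption of this sketch).
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # product() over a sorted alphabet yields the K-strings in dictionary order.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Windows containing symbols outside ALPHABET are ignored in the total.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid K-strings")
    return kmers, [counts[m] / total for m in kmers]


# Toy sequence, not real genome data.
kmers, mu = measure_representation("acgtacgtaa", 2)
```

The list `mu` can then be read as a series indexed by the position of each K-string in the dictionary, which is the kind of object the abstract analyses spectrally.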
Article: Whole proteome prokaryote phylogeny without sequence alignment: a K-string composition approach.
ABSTRACT: A systematic way of inferring evolutionary relatedness of microbial organisms from the oligopeptide content, i.e., frequency of amino acid K-strings in their complete proteomes, is proposed. The new method circumvents the ambiguity of choosing the genes for phylogenetic reconstruction and avoids the necessity of aligning sequences of essentially different length and gene content. The only "parameter" in the method is the length K of the oligopeptides, which serves to tune the "resolution power" of the method. The topology of the trees converges with K increasing. Applied to a total of 109 organisms, including 16 Archaea, 87 Bacteria, and 6 Eukarya, it yields an unrooted tree that agrees with the biologists' "tree of life" based on SSU rRNA comparison in a majority of basic branchings, and especially, in all lower taxa.
Journal of Molecular Evolution 02/2004; 58(1):1-11. · 2.27 Impact Factor
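As a rough illustration of comparing organisms by oligopeptide content without alignment, the sketch below computes K-string frequencies over a list of protein sequences and compares two such profiles. The cosine-based distance and the helper names `kstring_frequencies` and `cosine_distance` are assumptions of this sketch; the abstract does not state which dissimilarity measure the method uses, nor any further processing of the raw frequencies.

```python
from collections import Counter
from math import sqrt


def kstring_frequencies(proteins, k):
    """Relative frequency of each amino acid K-string across all proteins."""
    counts = Counter()
    for p in proteins:
        # Proteins shorter than k contribute no windows.
        counts.update(p[i:i + k] for i in range(len(p) - k + 1))
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}


def cosine_distance(f, g):
    """1 - cosine similarity of two frequency profiles (assumed measure)."""
    dot = sum(v * g.get(s, 0.0) for s, v in f.items())
    norm = sqrt(sum(v * v for v in f.values())) * sqrt(sum(v * v for v in g.values()))
    if norm == 0:
        raise ValueError("empty frequency profile")
    return 1.0 - dot / norm


# Toy "proteomes", not real data.
a = kstring_frequencies(["MKVLA", "MKVL"], 2)
b = kstring_frequencies(["GGGG"], 2)
d_ab = cosine_distance(a, b)
```

A matrix of such pairwise distances could be passed to any standard tree-building routine; larger `k` makes the profiles sparser and more specific, which matches the abstract's description of K as a resolution parameter.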
Article: An optimization approach to predicting protein structural class from amino acid composition
ABSTRACT: Proteins are generally classified into four structural classes: all-α proteins, all-β proteins, α+β proteins, and α/β proteins. In this article, a protein is expressed as a vector of 20-dimensional space, in which its 20 components are defined by the composition of its 20 amino acids. Based on this, a new method, the so-called maximum component coefficient method, is proposed for predicting the structural class of a protein according to its amino acid composition. In comparison with the existing methods, the new method yields a higher general accuracy of prediction. Especially for the all-α proteins, the rate of correct prediction obtained by the new method is much higher than that by any of the existing methods. For instance, for the 19 all-α proteins investigated previously by P.Y. Chou, the rate of correct prediction by means of his method was 84.2%, but the correct rate when predicted with the new method would be 100%! Furthermore, the new method is characterized by an explicable physical picture. This is reflected by the process in which the vector representing a protein to be predicted is decomposed into four component vectors, each of which corresponds to one of the norms of the four protein structural classes.
Protein Science 02/1992; 1(3):401-408. · 2.80 Impact Factor
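The 20-dimensional representation described in this abstract can be sketched as follows. This covers only the amino acid composition vector; the maximum component coefficient method itself is not reproduced, because the abstract does not give its details. The ordering of residues in `AMINO_ACIDS` and the function name `composition_vector` are assumptions of this sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, assumed ordering


def composition_vector(protein):
    """Return the fraction of each of the 20 amino acids in a protein sequence."""
    protein = protein.upper()
    counts = [protein.count(a) for a in AMINO_ACIDS]
    # Non-standard symbols are ignored so the components sum to 1.
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


# Toy sequence, not a real protein.
vec = composition_vector("MKVLAAGA")
```

Each protein becomes a point in 20-dimensional space, which is the input a composition-based classifier of structural class would work from.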