ABSTRACT: This work presents a novel detection method for three-dimensional domain swapping (DS), a mechanism for forming protein quaternary structures that can be visualized as if monomers had "opened" their "closed" structures and exchanged the opened portion to form intertwined oligomers. Since the first report of DS in the mid-1990s, an increasing number of identified cases has led to the postulation that DS might occur in a protein with an unconstrained terminus under appropriate conditions. DS may play important roles in the molecular evolution and functional regulation of proteins and the formation of depositions in Alzheimer's and prion diseases. Moreover, it is promising for designing auto-assembling biomaterials. Despite the increasing interest in DS, few related bioinformatics methods are available. Owing to the dramatic conformational difference between the monomeric/closed and oligomeric/open forms, conventional structural comparison methods are inadequate for detecting DS. Hence, there is also a lack of comprehensive datasets for studying DS. Based on angle-distance (A-D) image transformations of secondary structural elements (SSEs), specific patterns within A-D images can be recognized and classified for structural similarities. In this work, a matching algorithm to extract corresponding SSE pairs from A-D images and a novel DS score have been designed and demonstrated to be applicable to the detection of DS relationships. The Matthews correlation coefficient (MCC) and sensitivity of the proposed DS-detecting method were higher than 0.81 even when the sequence identities of the proteins examined were lower than 10%. On average, the alignment percentage and root-mean-square distance (RMSD) computed by the proposed method were 90% and 1.8 Å for a set of 1,211 DS-related pairs of proteins. Structural alignment performance remains high and stable for DS-related homologs with less than 10% sequence identity.
In addition, the quality of its hinge loop determination is comparable to that of manual inspection. This method has been implemented as a web-based tool, which takes two protein structures as input and determines the existence and/or type of DS relationship between them from the A-D image-based structural alignments and the DS score. The proposed method is expected to trigger large-scale studies of this interesting structural phenomenon and facilitate related applications.
PLoS ONE 10/2010; 5(10):e13361. DOI:10.1371/journal.pone.0013361 · 3.23 Impact Factor
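The abstract above reports detection quality as a Matthews correlation coefficient (MCC) and a sensitivity. As a reminder of how these two measures are usually computed from a binary confusion matrix, here is a minimal sketch. The function names and the example counts are illustrative assumptions, not values or code from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard MCC from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention,
    assumed here so the function never divides by zero).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true DS pairs that the detector recovers."""
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    tp, tn, fp, fn = 90, 85, 15, 10
    print(f"MCC = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
    print(f"sensitivity = {sensitivity(tp, fn):.3f}")
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it more informative than sensitivity alone when DS-related and unrelated pairs are unevenly represented.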
ABSTRACT: Protein data is growing explosively in both volume and diversity, yet many protein structures remain unresolved
and their functions remain to be identified. Conventional sequence alignment tools are insufficient for remote homology
detection, while current structural alignment tools cannot be applied to proteins of unresolved structure.
Here, we aimed to overcome the combination of two major obstacles to detecting remote homologous proteins: proteins with
unresolved structure, and proteins of low sequence identity but high structural similarity. We propose a novel method for
improving performance on the protein matching problem, especially for mining remote homologous proteins. In this study, existing
secondary structure prediction techniques were applied to provide the locations of secondary structure elements of proteins.
The proposed LESS (Length Encoded Secondary Structure) profile was then constructed for segment-based similarity comparison
using parallel computing. Compared with a conventional residue-based sequence alignment tool, detection of remote protein homologies
through the LESS profile is favourable in terms of speed and tolerance of high sequence diversity, and its accuracy and performance can compensate for
the deficiencies of traditional primary sequence alignment methodology. This method may further support biologists in
studying protein folding, evolution, and function prediction.
Algorithms and Architectures for Parallel Processing, 9th International Conference, ICA3PP 2009, Taipei, Taiwan, June 8-11, 2009. Proceedings; 01/2009
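The abstract above does not spell out how a LESS profile is built or scored, so the following is only a rough illustration of the general idea of segment-based (rather than residue-based) comparison. It assumes a predicted per-residue secondary-structure string over the letters H, E and C, run-length encodes it into (type, length) segments, and aligns two segment lists with a simple global dynamic program. The function names, the scoring rule and the gap penalty are all hypothetical choices for this sketch, not the authors' method.

```python
from itertools import groupby


def encode_segments(ss: str) -> list[tuple[str, int]]:
    """Run-length encode a secondary-structure string, e.g. 'CCHHHH' -> [('C', 2), ('H', 4)]."""
    return [(kind, len(list(run))) for kind, run in groupby(ss)]


def segment_score(s: tuple[str, int], t: tuple[str, int]) -> float:
    """Hypothetical score: length ratio for same-type segments, a fixed penalty otherwise."""
    if s[0] != t[0]:
        return -1.0
    return min(s[1], t[1]) / max(s[1], t[1])


def align_score(a: list[tuple[str, int]], b: list[tuple[str, int]], gap: float = -0.5) -> float:
    """Global (Needleman-Wunsch style) alignment score over segments instead of residues."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + segment_score(a[i - 1], b[j - 1]),  # pair two segments
                dp[i - 1][j] + gap,  # skip a segment of a
                dp[i][j - 1] + gap,  # skip a segment of b
            )
    return dp[-1][-1]


if __name__ == "__main__":
    # Made-up predicted secondary-structure strings, for illustration only.
    p = encode_segments("CCHHHHEEC")
    q = encode_segments("CCHHEEEEC")
    print(p, q, align_score(p, q))
```

Because the dynamic program runs over a handful of segments rather than hundreds of residues, each comparison is cheap and independent of the others, which is what makes this style of comparison easy to distribute across parallel workers.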
ABSTRACT: With the advancement of biological techniques, research in the fields of marine evolution, ecology, and aquaculture is growing explosively in both volume and diversity. Tens of thousands of genomic sequences are available for important marine species. However, most of the corresponding structures and functions remain unresolved and unknown. To discover the biological characteristics of the genomic sequences of a marine species, an efficient and effective method for detecting distantly related proteins, based on experimentally known functions from model species, becomes an important strategy. In this study, the Ensembl and NCBI genetic databases were employed to build a primitive database of selected marine species. The system contains an abundance of useful DNA, RNA and protein information, and was named the Marine Species Genome Database (MSGD). To identify remote proteins, we propose a novel LESS (length encoded secondary structure) profile to improve information retrieval, especially for identifying protein sequences without resolved structures and with low sequence identity. The matching algorithms apply several existing secondary structure prediction techniques and a feasible encoding mechanism based on the length distribution of secondary structures. Because the secondary structures of proteins are conserved in evolution, the proposed system proved suitable for similarity comparison of distantly related proteins, and MSGD retrieved several important protein sequences that well-known residue-based matching methods failed to identify.
Proceedings of the 2nd International Conference on BioMedical Engineering and Informatics, BMEI 2009, October 17-19, 2009, Tianjin, China; 01/2009