ABSTRACT: Results of inter-dataset training and testing of the proposed method for the identification of DS-related homologs. Only DS-related homologs were used as positive data in this experiment, in which common homologs and non-homologs were both regarded as negative data. Performance measures listed in this table include AUC, MCC, sensitivity and specificity.
(0.06 MB PDF)
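For readers unfamiliar with the measures named above, the following is a minimal sketch of how sensitivity, specificity and MCC are computed from confusion-matrix counts using their standard definitions. The function name `binary_metrics` and the counts in the usage line are hypothetical and are not taken from the table. AUC is omitted because it requires ranked scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts.

    Positives would be DS-related homologs; negatives would be common
    homologs and non-homologs, as described in the abstract.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts, for illustration only.
sens, spec, mcc = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.2f}")
```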
ABSTRACT: Results of the structural alignments and hinge loop determinations for DSCO pairs in Datasets L and M. The 1,093 DSCO pairs successfully identified by the proposed method are listed here, each with detailed information on the ranges of hinge loops determined by Eisenberg's and our methods, several structural similarity measures as well as the DS score defined in this work, and the virtual superimposition computed by our method. Structural superimpositions shown in this table were drawn using Jmol.
(9.67 MB PDF)
ABSTRACT: DS-detecting performance of DynDom assessed based on Eisenberg's DS dataset. Among the 39 query proteins, 12 are detected to possess hinge loops by DynDom. The locations and ranges of hinge loops determined by DynDom are compared to those reported by Eisenberg et al. in .
(0.11 MB PDF)
ABSTRACT: Performances of several protein structure/sequence comparison methods for the detection of global structural similarities between DS-related homologs with various sequence identities. An experiment that determines the simultaneous alignment qualities of the hinge loops, main domains and swapped domains for several protein structure/sequence comparison methods.
(0.53 MB PDF)
ABSTRACT: Examples of the A⋅D profile and related hinge loop detection procedure. (a) Crystallins with PDB identifiers 4gcrA and 1blbA, a quasi-domain swapping case. (b) Crystallins with PDB identifiers 4gcrA and 2a5mA, a pair of common global homologs. (c) Acetyltransferases with PDB identifiers 1s60A and 1b6bA, a pair of quasi-domain swapping homologs with a small C-terminal-swapped “domain”.
(1.56 MB PDF)
ABSTRACT: Structure-based sequence alignments for DSCO pairs in Datasets L and M performed by several protein structural comparison methods. The structure-based sequence alignments performed by TM-align, SARST and the proposed DS-detecting method, as well as the sequence alignments performed by BLAST, are listed here for the 1,093 DSCO pairs shown in Table S4.
(9.84 MB PDF)
ABSTRACT: Sensitivity and specificity of various alignment methods and structural similarity measures for the identification of common structural homologs and/or DS-related homologs. Sensitivity and specificity values of all alignment methods were determined based on S-div, except those of BLAST, which were determined based on a normalized sequence similarity score calculated according to Formula 8 in .
(0.07 MB XLS)
ABSTRACT: This work presents a novel detection method for three-dimensional domain swapping (DS), a mechanism for forming protein quaternary structures that can be visualized as if monomers had "opened" their "closed" structures and exchanged the opened portion to form intertwined oligomers. Since the first report of DS in the mid 1990s, an increasing number of identified cases has led to the postulation that DS might occur in a protein with an unconstrained terminus under appropriate conditions. DS may play important roles in the molecular evolution and functional regulation of proteins and the formation of depositions in Alzheimer's and prion diseases. Moreover, it is promising for designing auto-assembling biomaterials. Despite the increasing interest in DS, related bioinformatics methods are rarely available. Owing to a dramatic conformational difference between the monomeric/closed and oligomeric/open forms, conventional structural comparison methods are inadequate for detecting DS. Hence, there is also a lack of comprehensive datasets for studying DS. Based on angle-distance (A-D) image transformations of secondary structural elements (SSEs), specific patterns within A-D images can be recognized and classified for structural similarities. In this work, a matching algorithm to extract corresponding SSE pairs from A-D images and a novel DS score have been designed and demonstrated to be applicable to the detection of DS relationships. The Matthews correlation coefficient (MCC) and sensitivity of the proposed DS-detecting method were higher than 0.81 even when the sequence identities of the proteins examined were lower than 10%. On average, the alignment percentage and root-mean-square distance (RMSD) computed by the proposed method were 90% and 1.8 Å for a set of 1,211 DS-related pairs of proteins. The performance of the structural alignments remains high and stable for DS-related homologs with less than 10% sequence identity. In addition, the quality of its hinge loop determination is comparable to that of manual inspection. This method has been implemented as a web-based tool, which takes two protein structures as input and determines the existence and/or type of DS relationships between them according to the A-D image-based structural alignments and the DS score. The proposed method is expected to trigger large-scale studies of this interesting structural phenomenon and facilitate related applications.
ABSTRACT: The number of DS-related homologs, common homologs and non-homologs remaining in the test set from the experiments presented in Fig. 2 as the alignment ratio cutoff decreases. The alignment ratio cutoff applied in this study is designed to remove globally superimposable homologous protein pairs from the testing datasets. Since many common homologous pairs are globally superimposable, as this cutoff is lowered the number of common homologs decreases much more rapidly than the number of DS-related homologs, which are only partially superimposable. Meanwhile, the number of non-homologous pairs remains nearly unchanged. Interestingly, relative to the number of all homologs, including DS-related and common ones, the number of DS-related homologs remaining in the dataset increases as the alignment ratio cutoff becomes lower within the tested range.
(0.46 MB PDF)
ABSTRACT: Number of SSEs in the swapped domains. Here an SSE means an α-helix or a β-strand. The number of SSEs that a swapped domain contains roughly reflects the size of the domain. The ranges of SSEs were extracted from the PDB files according to the HELIX and SHEET records.
(0.07 MB PDF)
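As an illustration of how SSE ranges can be read from HELIX and SHEET records, here is a minimal sketch based on the fixed-column layout of the PDB file format. The function names `sse_ranges` and `count_sses_in_region` are hypothetical, insertion codes are ignored, and this is not necessarily how the authors performed the extraction.

```python
def sse_ranges(pdb_lines):
    """Collect (kind, chain, start, end) for each helix and strand.

    Column positions follow the PDB format: HELIX has the chain ID in
    column 20 and residue numbers in columns 22-25 and 34-37; SHEET has
    the chain ID in column 22 and residue numbers in columns 23-26 and
    34-37. A strand shared by two sheets is listed twice in the file, so
    a set is used to keep each range only once.
    """
    ranges = set()
    for line in pdb_lines:
        if line.startswith("HELIX"):
            kind, chain = "helix", line[19]
            start, end = line[21:25], line[33:37]
        elif line.startswith("SHEET"):
            kind, chain = "strand", line[21]
            start, end = line[22:26], line[33:37]
        else:
            continue
        ranges.add((kind, chain, int(start), int(end)))
    # Sort by chain and then by starting residue number.
    return sorted(ranges, key=lambda r: (r[1], r[2]))


def count_sses_in_region(ranges, chain, first, last):
    """Count SSEs lying entirely within residues first..last of one chain."""
    return sum(
        1 for _, c, s, e in ranges if c == chain and first <= s and e <= last
    )
```

Given the residue range of a swapped domain, `count_sses_in_region(sse_ranges(lines), chain, first, last)` would give the kind of per-domain SSE count described above.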
ABSTRACT: Stability evaluations of the discriminatory model of the proposed method by k-fold cross-validations. The stability of the discriminatory model applied in the proposed DS-scoring scheme was evaluated based on two datasets. (a) Evaluations based on Dataset L. (b) Evaluations based on Dataset M.
(1.34 MB PDF)
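The abstract does not state the value of k or how the folds were drawn, so the following is only a generic sketch of k-fold partitioning. The function name `k_fold_indices`, the shuffling and the seed are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then dealt into k folds;
    each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical usage: 10 protein pairs split into 5 folds.
for train, test in k_fold_indices(10, 5):
    print(sorted(test))
```

A model would be fitted on each `train` subset and scored on the matching `test` subset; similar scores across folds would indicate a stable model.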
ABSTRACT: Protein data are growing explosively in both volume and diversity, yet many protein structures remain unresolved
and their functions remain to be identified. Conventional sequence alignment tools are insufficient for remote homology
detection, while current structural alignment tools encounter difficulties with proteins of unresolved structure.
Here, we aimed to overcome the combination of two major obstacles to detecting remote homologous proteins: proteins with
unresolved structure, and proteins of low sequence identity but high structural similarity. We proposed a novel method for
improving the performance of protein matching, especially for mining remote homologous proteins. In this study, existing
secondary structure prediction techniques were applied to provide the locations of secondary structure elements of proteins.
The proposed LESS (Length Encoded Secondary Structure) profile was then constructed for segment-based similarity comparison
using parallel computing. Compared with a conventional residue-based sequence alignment tool, detection of remote protein homologies
through the LESS profile is favourable in terms of speed and tolerance of high sequence diversity, and its accuracy and performance can
compensate for the deficiencies of traditional primary sequence alignment methodology. This method may further support biologists in
protein folding, evolution, and function prediction.
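The abstract does not specify how the LESS profile encodes segment lengths. As one plausible reading, and purely as an assumption, a predicted secondary structure string can be collapsed into (state, length) segments by run-length encoding. The function name `segment_profile` and the H/E/C state letters are hypothetical choices for this sketch.

```python
from itertools import groupby


def segment_profile(ss_string):
    """Run-length encode a per-residue secondary structure string.

    Each run of identical states (for example H = helix, E = strand,
    C = coil) becomes one (state, length) segment. This is an
    illustrative encoding, not the published LESS definition.
    """
    return [(state, len(list(run))) for state, run in groupby(ss_string)]


# Hypothetical predicted string for a 10-residue peptide.
print(segment_profile("HHHHCCEEEC"))
```

Comparing proteins segment by segment rather than residue by residue shortens the sequences being compared, which is consistent with the speed advantage the abstract reports.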
ABSTRACT: With the advancement of biological techniques, research in marine evolution, ecology, and aquaculture is growing explosively in both volume and diversity. Tens of thousands of genomic sequences are now available for important marine species. However, most of the structures and corresponding functions remain unresolved and unknown. To discover the biological characteristics of genomic sequences of a marine species, an efficient and effective method for detecting distantly related proteins based on experimentally known functions from model species becomes an important strategy. In this study, the Ensembl and NCBI genetic databases were employed to build a primitive database of selected marine species. The system contains an abundance of useful DNA, RNA and protein information and was named the Marine Species Genome Database (MSGD). To identify distantly related proteins, we have proposed a novel LESS (length encoded secondary structure) profile to improve information retrieval, especially for identifying protein sequences without resolved structures and with low sequence identity. The matching algorithms applied several existing secondary structure prediction techniques and a feasible encoding mechanism with respect to the length distribution of secondary structures. Owing to the conservation of secondary structures of proteins in evolution, the proposed system demonstrated its suitability for similarity comparison of distantly related proteins, and several important protein sequences could be retrieved by MSGD that well-known residue-based matching methods failed to identify.