Kappa-alpha plot derived structural alphabet and BLOSUM-like substitution matrix for rapid search of protein structure database

Institute of Bioinformatics, National Chiao Tung University, Hsinchu, Taiwan.
Genome Biology (Impact Factor: 10.47). 02/2007; 8(3):R31. DOI: 10.1186/gb-2007-8-3-r31
Source: PubMed

ABSTRACT We present a novel protein structure database search tool, 3D-BLAST, that is useful for analyzing novel structures and can return a ranked list of alignments. This tool has the features of BLAST (for example, robust statistical basis, and effective and reliable search capabilities) and employs a kappa-alpha (κ, α) plot derived structural alphabet and a new substitution matrix. 3D-BLAST searches more than 12,000 protein structures in 1.2 s and yields good results in zones with low sequence similarity.
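
The abstract mentions a new BLOSUM-like substitution matrix over structural-alphabet letters. As a rough illustration of what "BLOSUM-like" means, the sketch below computes log-odds scores from counted aligned letter pairs. It is a minimal sketch under stated assumptions, not the 3D-BLAST procedure: the function name `log_odds_matrix`, the `scale` parameter, and the two-letter toy alphabet are all hypothetical, and the real matrix is built from a much larger set of structurally aligned pairs.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, scale=2.0):
    """Return BLOSUM-style scores s(a, b) = round(scale * log2(q_ab / e_ab)).

    aligned_pairs: iterable of (letter, letter) tuples taken from aligned
    positions (hypothetical input; any hashable, sortable letters work).
    """
    # Count unordered pairs so that (a, b) and (b, a) share one entry.
    pair_counts = Counter()
    for a, b in aligned_pairs:
        pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())

    # Marginal letter frequency: p_a = q_aa + sum over b != a of q_ab / 2.
    letter_freq = Counter()
    for (a, b), n in pair_counts.items():
        q = n / total
        if a == b:
            letter_freq[a] += q
        else:
            letter_freq[a] += q / 2
            letter_freq[b] += q / 2

    # Observed pair frequency versus the frequency expected by chance.
    scores = {}
    for (a, b), n in pair_counts.items():
        q = n / total
        if a == b:
            expected = letter_freq[a] ** 2
        else:
            expected = 2 * letter_freq[a] * letter_freq[b]
        scores[(a, b)] = round(scale * math.log2(q / expected))
    return scores


# Toy data: two made-up structural letters "A" and "B".
toy_pairs = [("A", "A")] * 6 + [("B", "B")] * 2 + [("A", "B")] * 2
toy_scores = log_odds_matrix(toy_pairs)
```

In this toy data the mismatch pair is observed less often than chance predicts, so it receives a negative score, while the rarer letter "B" scores higher on its diagonal than the common letter "A". Pairs that never occur are simply absent from the returned dictionary; a real matrix would need a pseudocount or a floor value for them.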

Available from: Jinn-Moon Yang, Jun 02, 2015
    • "The complete set of local structure prototypes defines a structural alphabet (Karchin et al., 2003; Offmann et al., 2007). The number of prototypes in a library is important for characterizing the local fold (Hunter and Subramaniam, 2003; Martin et al., 2008; Sander et al., 2006; Tung et al., 2007; Unger et al., 1989). The structural alphabet used in this study is composed of 16 local structure prototypes that are 5 residues long, called Protein Blocks (PBs, see Figure 1) (de Brevern et al., 2000; Joseph et al., 2010a; Joseph et al., 2010b). "
    ABSTRACT: Protein structure analysis and prediction methods are based on non-redundant data extracted from the available protein structures, regardless of the species from which the protein originates. Hence, these datasets represent the global knowledge on protein folds, which constitutes a generic distribution of amino acid sequence-protein structure (AAS-PS) relationships. In this study, we try to elucidate whether the AAS-PS relationship could possess specificities depending on the species. For this purpose, we have chosen three different species: Saccharomyces cerevisiae, Plasmodium falciparum and Arabidopsis thaliana. We analyzed the AAS-PS behaviors of the proteins from these three species and compared them to the "expected" distribution of a classical non-redundant databank. With the classical secondary structure description, only slight differences in amino acid preferences could be observed. With a more precise description of local protein structures (Protein Blocks), significant changes could be highlighted. S. cerevisiae's AAS-PS relationship is close to the general distribution, while striking differences are observed in the case of A. thaliana. P. falciparum is the most distant one. This study presents some interesting viewpoints on the AAS-PS relationship. Certain species exhibit unique preferences for amino acids to be associated with protein local structural elements. Thus, AAS-PS relationships are species dependent. These results can give useful insights for improving prediction methodologies that take species-specific information into account.
    Journal of Theoretical Biology 02/2011; 276(1):209-17. DOI:10.1016/j.jtbi.2011.01.047 · 2.30 Impact Factor
    • "Structural similarity searches using DALI (Holm et al., 2008), 3D-BLAST (Tung et al., 2007) and PISA (Krissinel and Henrick, 2007) identified Qua1 as part of a superfamily that uses helix-turn-helix motifs to form four-helix bundle dimers. Most members have a classical antiparallel orientation like the synthetic metal binding peptide DF1 (Maglio et al., 2003), or a parallel orientation like some ROP variants (Willis et al., 2000). "
    ABSTRACT: Posttranscriptional regulation of gene expression is an important mechanism for modulating protein levels in eukaryotes, especially in developmental pathways. The highly conserved homodimeric STAR/GSG proteins play a key role in regulating translation by binding bipartite consensus sequences in the untranslated regions of target mRNAs, but the exact mechanism remains unknown. Structures of STAR protein RNA binding subdomains have been determined, but structural information is lacking for the homodimerization subdomain. Here, we present the structure of the C. elegans GLD-1 homodimerization domain dimer, determined by a combination of X-ray crystallography and NMR spectroscopy, revealing a helix-turn-helix monomeric fold with the two protomers stacked perpendicularly. Structure-based mutagenesis demonstrates that the dimer interface is not easily disrupted, but the structural integrity of the monomer is crucial for GLD-1 dimerization. Finally, an improved model for STAR-mediated translational regulation of mRNA, based on the GLD-1 homodimerization domain structure, is presented.
    Structure 03/2010; 18(3):377-89. DOI:10.1016/j.str.2009.12.016 · 6.79 Impact Factor
    • "To overcome this limitation, several groups have proposed representing protein structures as a series of overlapping fragments, each labeled with a symbol, which defines a structural alphabet (SA) for proteins [8] [9] [10] [11] [12] [13]. Such an alphabet can be used to predict local structure [15–17], to reconstruct the full-atom representation [18], to identify structural motifs [19], to classify protein structures [20] and to search against a database [21, 22]. "
    ABSTRACT: Motivation: The protein sequence world is discrete, consisting of 20 amino acids (AA), while the structure world is continuous, though it can be discretized into structural alphabets (SA). To reveal the relationship between sequence and structure, it is interesting to consider both AA and SA in a joint space. However, such a space has too many parameters, so a reduction of the AA alphabet is necessary to bring the number of parameters down. Result: We have developed a simple but effective approach, called entropic clustering, based on selecting the reduction of AAs that has the best mutual information with SAs. The optimized reduction of AA into two groups yields hydrophobic and hydrophilic classes. Combined with our SA, namely the conformational letter (CL) alphabet of 17 letters, we get a joint alphabet called the hydropathy conformational letter (hp-CL). A joint substitution matrix with (17*2)*(17*2) entries is derived from FSSP. Moreover, we check the three coding systems (AA, CL and hp-CL) against a large database consisting of proteins from the family level to the fold level, measuring their performance by the TopK accuracy of both the similar fragment pair (SFP) and the neighbor of the aligned fragment pair (AFP). The TopK selection is made according to the score calculated from each coding system's substitution matrix. Finally, embedding hp-CL in a pairwise alignment algorithm, CLeFAPS, in place of the original CL, yields an improvement on the HOMSTRAD benchmark. Comment: 8 pages, 5 figures
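
The selection criterion named in this abstract, mutual information between a reduced amino acid alphabet and structural letters, can be illustrated with a small sketch. This is not the cited authors' entropic clustering procedure: the function names, the four-residue toy data, and the structural letters "h" and "c" are hypothetical, and the search over candidate groupings is left out. It only shows how one candidate grouping would be scored.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information I(X; Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed cells only.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def reduced_mi(observations, grouping):
    """Score a grouping of amino acids by its mutual information with the SA.

    observations: list of (amino_acid, structural_letter) pairs.
    grouping: dict mapping each amino acid to a group label.
    """
    return mutual_information([(grouping[aa], sa) for aa, sa in observations])


# Toy observations: hydrophobic residues seen in "h", hydrophilic ones in "c".
toy_obs = [("L", "h"), ("V", "h"), ("K", "c"), ("D", "c")]
by_hydropathy = {"L": 0, "V": 0, "K": 1, "D": 1}
mixed = {"L": 0, "K": 0, "V": 1, "D": 1}
mi_hydropathy = reduced_mi(toy_obs, by_hydropathy)
mi_mixed = reduced_mi(toy_obs, mixed)
```

On this toy data the hydropathy-based grouping retains all the information about the structural letter, while the mixed grouping retains none, so a selection based on mutual information would prefer the former.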