Context-specific amino acid substitution matrices and their use in the detection of protein homologs

Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892-4264, USA.
Proteins Structure Function and Bioinformatics (Impact Factor: 2.92). 05/2008; 71(2):910-9. DOI: 10.1002/prot.21775
Source: PubMed

ABSTRACT Sequence homology detection relies on score matrices, which reflect the frequency of amino acid substitutions observed in a dataset of homologous sequences. The substitution matrices in popular use today are usually constructed without consideration of the structural context in which the substitution takes place. Here, we present amino acid substitution matrices specific to the particular polar-nonpolar environment of the amino acid. As expected, these matrices [context-specific substitution matrices (CSSMs)] show striking differences from the popular BLOSUM62 matrix, which does not include structural information. When incorporated into BLAST and PSI-BLAST, CSSMs outperformed BLOSUM matrices as assessed by ROC curve analyses of the number of true and false hits and by the accuracy of the sequence alignments to the hit sequences. These findings are also relevant to profile-profile-based methods of homology detection, since CSSMs may help build a better profile. Profiles generated for protein sequences in PDB using CSSM-PSI-BLAST will be made available for searching via RPSBLAST through our web site.
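
The abstract does not spell out how the matrices were built, so the following is only a minimal sketch of the general idea: a BLOSUM-style log-odds score computed separately for each structural context. The function names, the `scale` parameter, and the context labels ("buried", "exposed") are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, scale=2.0):
    """Build log-odds scores from observed aligned residue pairs.

    This is a generic BLOSUM-style construction, not the authors' procedure.
    """
    pair_counts = Counter()
    for a, b in aligned_pairs:
        # Count each pair in both orders so that score(a, b) == score(b, a).
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())

    # Background frequency of each residue: the marginal of the pair table.
    background = Counter()
    for (a, _), n in pair_counts.items():
        background[a] += n

    scores = {}
    for (a, b), n in pair_counts.items():
        observed = n / total
        expected = (background[a] / total) * (background[b] / total)
        scores[(a, b)] = scale * math.log2(observed / expected)
    return scores


def context_specific_matrices(observations, scale=2.0):
    """Group (context, residue_a, residue_b) observations and score each group."""
    by_context = {}
    for context, a, b in observations:
        by_context.setdefault(context, []).append((a, b))
    return {c: log_odds_matrix(pairs, scale) for c, pairs in by_context.items()}


# Toy observations (made up): the context label is an assumed annotation.
toy = [
    ("buried", "L", "L"), ("buried", "L", "L"),
    ("buried", "L", "I"), ("buried", "I", "I"),
    ("exposed", "L", "K"), ("exposed", "K", "K"),
]
matrices = context_specific_matrices(toy)
```

With matrices kept per context, the same residue pair can receive a different score depending on the environment label, which is the property the abstract contrasts with a single context-free matrix such as BLOSUM62.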


Available from: BK Lee, Jul 29, 2014
  •
    ABSTRACT: Protein domains are fundamental units of protein structure, function and evolution, thus it is critical to gain a deep understanding of protein domain organization. Previous works have attempted to identify key residues involved in organization of domain architecture. Since one of the most important characteristics of domain architecture is the arrangement of secondary structure elements (SSEs), here we present a picture of domain organization through an integrated consideration of SSE arrangements and residue contact networks. In this work, by representing SSEs as main-chain scaffolds and side-chain interfaces and through construction of residue contact networks, we have identified the SSE interfaces well packed within protein domains as SSE packing clusters. 17334 SSE packing clusters were recognized from 9015 SCOP domains of less than 40% sequence identity. The similar SSE packing clusters were observed not only among domains of the same folds, but also among domains of different folds, indicating their roles as common scaffolds for organization of protein domains. Further analysis of 14 small single-domain proteins reveals a high correlation between the SSE packing clusters and the folding nuclei. Consistent with their important roles in domain organization, SSE packing clusters were found to be more conserved than other regions within the same proteins. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 05/2014; 30(17). DOI:10.1093/bioinformatics/btu327 · 4.62 Impact Factor
  •
    ABSTRACT: The attachment to host skin by Rhipicephalus microplus larvae induces a series of physiological events at the attachment site. The host-parasite interaction might induce a rejection of the larvae, as is frequently observed in Bos taurus indicus cattle, and under certain conditions in Bos taurus taurus cattle. Ticks deactivate the host rejection response by secreting specific proteins and lipids that play an essential role in manipulation of the host immune response. The available genomic information on the R. microplus tick was mined using bioinformatics approaches to identify R. microplus lipocalins (LRMs). This in silico examination revealed a total of 12 different putative R. microplus LRMs (LRM1-LRM12). Sequence identity within the LRM family was highly variable, ranging from 6% between LRM7 and LRM8 to 55.9% between LRM2 and LRM6. However, the three-dimensional structure of the lipocalin family was conserved in the LRMs. The B and T cell epitopes in these lipocalins were then predicted, and six of the LRMs (5, 6, 9, 10, 11 and 12) were used to examine the host immune interactions with sera and peripheral blood mononuclear cells (PBMCs) collected from tick-susceptible and tick-resistant cattle challenged with R. microplus. On days 28-60 after tick infestation, the anti-LRM titres were higher in the resistant group than in the susceptible cattle. After 60 days, the anti-LRM titres (except LRM9 and LRM11) decreased to zero in the sera of both the tick-resistant and tick-susceptible cattle. In cell proliferation assays, the PBMCs challenged with some of the predicted T cell epitopes (LRM1_T1, T2; LRM_T1, T2 and LRM12_T) exhibited a significantly higher number of IFN-γ-secreting cells (Th1) in tick-susceptible Holstein-Friesians compared with tick-resistant Brahman cattle. In contrast, expression of the Th2 cytokine (IL-4) was lower in Holstein-Friesian cattle compared with Brahman cattle. Moreover, this study found that LRM6, LRM9 and LRM11 play important roles in the mechanism by which R. microplus interferes with the host's haemostasis mechanisms.
    International journal for parasitology 06/2013; DOI:10.1016/j.ijpara.2013.04.005 · 3.40 Impact Factor