Context-specific amino acid substitution matrices and their use in the detection of protein homologs

Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892-4264, USA.
Proteins: Structure, Function, and Bioinformatics (Impact Factor: 2.92). 05/2008; 71(2):910-9. DOI: 10.1002/prot.21775
Source: PubMed

ABSTRACT Sequence homology detection relies on score matrices, which reflect the frequency of amino acid substitutions observed in a dataset of homologous sequences. The substitution matrices in popular use today are usually constructed without consideration of the structural context in which the substitution takes place. Here, we present amino acid substitution matrices specific to the polar-nonpolar environment of the amino acid. As expected, these context-specific substitution matrices (CSSMs) show striking differences from the popular BLOSUM62 matrix, which does not include structural information. When incorporated into BLAST and PSI-BLAST, CSSMs outperformed BLOSUM matrices as assessed by ROC curve analyses of the number of true and false hits and by the accuracy of the sequence alignments to the hit sequences. These findings are also relevant to profile-profile methods of homology detection, since CSSMs may help build better profiles. Profiles generated for protein sequences in the PDB using CSSM-PSI-BLAST will be made available for searching via RPSBLAST through our web site.
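
The idea of a score matrix built from substitution frequencies, kept separate for each structural environment, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `log_odds_matrix`, the two-letter alphabet, the "buried"/"exposed" labels, the half-bit scaling, and all counts are assumptions chosen for illustration. Each score is the scaled log of the ratio between the observed pair frequency and the frequency expected if the two residues were aligned by chance.

```python
import math


def log_odds_matrix(pair_counts, scale=2.0):
    """Turn symmetric, ordered pair counts into rounded log-odds scores.

    Assumes every count is positive; real data would need pseudocounts
    for pairs that were never observed.
    """
    total = sum(pair_counts.values())
    # Joint probability of seeing residue a aligned with residue b.
    q = {pair: n / total for pair, n in pair_counts.items()}
    # Background frequency of each residue is the marginal of the joint.
    p = {}
    for (a, _b), prob in q.items():
        p[a] = p.get(a, 0.0) + prob
    # Score = scale * log2(observed / expected by chance), rounded.
    return {
        (a, b): round(scale * math.log2(prob / (p[a] * p[b])))
        for (a, b), prob in q.items()
    }


# Hypothetical toy counts (L = Leu, K = Lys), tallied per environment.
# These numbers are made up and are not data from the paper.
counts_by_context = {
    "buried": {("L", "L"): 80, ("L", "K"): 5, ("K", "L"): 5, ("K", "K"): 10},
    "exposed": {("L", "L"): 20, ("L", "K"): 15, ("K", "L"): 15, ("K", "K"): 50},
}

# One matrix per environment, instead of a single context-free matrix.
cssm = {ctx: log_odds_matrix(c) for ctx, c in counts_by_context.items()}

for ctx, matrix in cssm.items():
    print(ctx, matrix)
```

With these toy counts, the Leu-to-Lys substitution is penalized more heavily in the buried environment than in the exposed one, which is the kind of difference a single context-free matrix cannot express.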


Available from: BK Lee, Jul 29, 2014
  • Source
    ABSTRACT: The attachment to host skin by Rhipicephalus microplus larvae induces a series of physiological events at the attachment site. The host-parasite interaction might induce a rejection of the larvae, as is frequently observed in Bos taurus indicus cattle, and under certain conditions in Bos taurus taurus cattle. Ticks deactivate the host rejection response by secreting specific proteins and lipids that play an essential role in manipulation of the host immune response. The available genomic information on the R. microplus tick was mined using bioinformatics approaches to identify R. microplus lipocalins (LRMs). This in silico examination revealed a total of 12 different putative R. microplus LRMs (LRM1 - LRM12). Sequence identity within the LRM family was highly variable, ranging from 6% between LRM7 and LRM8 to 55.9% between LRM2 and LRM6. However, the three-dimensional structure of the lipocalin family was conserved in the LRMs. The B and T cell epitopes in these lipocalins were then predicted, and six of the LRMs (5, 6, 9, 10, 11 and 12) were used to examine the host immune interactions with sera and peripheral blood mononuclear cells (PBMCs) collected from tick-susceptible and tick-resistant cattle challenged with R. microplus. On days 28 - 60 after tick infestation, the anti-LRM titres were higher in the resistant group than in the susceptible cattle. After 60 days, the anti-LRM titres (except LRM9 and LRM11) decreased to zero in the sera of both the tick-resistant and tick-susceptible cattle. In cell proliferation assays, the PBMCs challenged with some of the predicted T cell epitopes (LRM1_T1, T2; LRM_T1, T2 and LRM12_T) exhibited a significantly higher number of IFN-γ-secreting cells (Th1) in tick-susceptible Holstein-Friesian cattle than in tick-resistant Brahman cattle. In contrast, expression of the Th2 cytokine (IL-4) was lower in Holstein-Friesian cattle than in Brahman cattle. Moreover, this study found that LRM6, LRM9 and LRM11 play important roles in the mechanism by which R. microplus interferes with the host's haemostasis.
    International journal for parasitology 06/2013; DOI:10.1016/j.ijpara.2013.04.005 · 3.40 Impact Factor
  • Source
    ABSTRACT: MOTIVATION: Protein sequence searching and alignment are fundamental tools of modern biology. Alignments are assessed using their similarity scores, essentially the sum of substitution matrix scores over all pairs of aligned amino acids. We previously proposed a generative probabilistic method which yields scores that take the sequence context around each aligned residue into account. This method showed drastically improved sensitivity and alignment quality compared to standard, substitution matrix-based alignment. RESULTS: Here, we develop an alternative, discriminative approach to predict sequence context-specific substitution scores. We applied our approach to compute context-specific sequence profiles for BLAST and compared the new tool (CS-BLASTdis) to BLAST and the previous context-specific version (CS-BLASTgen). On a data set filtered to 20% maximum sequence identity, CS-BLASTdis is 51% more sensitive than BLAST and 17% more sensitive than CS-BLASTgen in detecting remote homologs at 10% false discovery rate. At 30% maximum sequence identity, its alignments contain 21% and 12% more correct residue pairs than those of BLAST and CS-BLASTgen, respectively. Clear improvements are also seen when the approach is combined with PSI-BLAST and HHblits. We believe the context-specific approach should replace substitution matrices wherever sensitivity and alignment quality are critical. AVAILABILITY: Source code (GPL) and benchmark data are available at CONTACT:
    Bioinformatics 10/2012; 28(24). DOI:10.1093/bioinformatics/bts622 · 4.62 Impact Factor
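
    The similarity score described in the abstract above, a sum of substitution matrix scores over all aligned residue pairs, can be sketched in a few lines of Python. This is an illustration only: the function name `alignment_score`, the toy matrix values, and the flat per-position gap penalty are assumptions, and tools such as BLAST use affine gap costs instead.

```python
def alignment_score(seq_a, seq_b, matrix, gap_penalty=-4):
    """Sum substitution scores over the columns of a pairwise alignment.

    seq_a and seq_b are already-aligned strings of equal length, with
    '-' marking a gap. The flat gap penalty is a simplification.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            score += gap_penalty      # column contains a gap
        else:
            score += matrix[(a, b)]   # column pairs two residues
    return score


# Toy two-letter matrix (L = Leu, K = Lys); values are illustrative.
toy_matrix = {
    ("L", "L"): 4, ("L", "K"): -2,
    ("K", "L"): -2, ("K", "K"): 5,
}

# Columns: L/L (+4), K/L (-2), L/L (+4), gap (-4).
print(alignment_score("LKL-", "LLLK", toy_matrix))
```

    A context-specific method changes what `matrix[(a, b)]` returns for a given column, making it depend on the neighbouring residues, while the summation itself stays the same.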
  • Source
    ABSTRACT: Knowledge of cattle tick (Rhipicephalus (Boophilus) microplus; Acari: Ixodidae) molecular and cellular pathways has been hampered by the lack of an annotated genome. In addition, most of the tick expressed sequence tags (ESTs) available to date consist of ∼50% unassigned sequences without predicted functions. The most common approach to address this has been the application of RNA interference (RNAi) methods to investigate genes and their pathways. This approach has been widely adopted in tick research despite minimal knowledge of the tick RNAi pathway and double-stranded RNA (dsRNA) uptake mechanisms. A strong knockdown phenotype of adult female ticks had previously been observed using a 594 bp dsRNA targeting the cattle tick homologue for the Drosophila Ubiquitin-63E gene leading to nil or deformed eggs. A NimbleGen cattle tick custom microarray based on the BmiGI.V2 database of R. microplus ESTs was used to evaluate the expression of mRNAs harvested from ticks treated with the tick Ubiquitin-63E 594 bp dsRNA compared with controls. A total of 144 ESTs including TC6372 (Ubiquitin-63E) were down-regulated with 136 ESTs up-regulated following treatment. The results obtained substantiated the knockdown phenotype with ESTs identified as being associated with ubiquitin proteolysis as well as oogenesis, embryogenesis, fatty acid synthesis and stress responses. A bioinformatics analysis was undertaken to predict off-target effects (OTE) resulting from the in silico dicing of the 594 bp Ubiquitin-63E dsRNA which identified 10 down-regulated ESTs (including TC6372) within the list of differentially expressed probes on the microarrays. Subsequent knockdown experiments utilising 196 and 109 bp dsRNAs, and a cocktail of short hairpin RNAs (shRNA) targeting Ubiquitin-63E, demonstrated similar phenotypes for the dsRNAs but nil effect following shRNA treatment. Quantitative reverse transcriptase PCR analysis confirmed differential expression of TC6372 and selected ESTs. Our study demonstrated the minimisation of predicted OTEs in the shorter dsRNA treatments (∼100-200 bp) and the usefulness of microarrays to study knockdown phenotypes.
    International journal for parasitology 06/2011; 41(9):1001-14. DOI:10.1016/j.ijpara.2011.05.003 · 3.40 Impact Factor