ABSTRACT: Most proteins from higher organisms are known to be multi-domain proteins and contain substantial numbers of intrinsically disordered (ID) regions. To analyse such protein sequences, those from human for instance, we developed a special protein-structure-prediction pipeline and accumulated the products in the Structure Atlas of Human Genome (SAHG) database at http://bird.cbrc.jp/sahg. With the pipeline, human proteins were examined by local alignment methods (BLAST, PSI-BLAST and Smith-Waterman profile-profile alignment), global-local alignment methods (FORTE), prediction tools for ID regions (POODLE-S) and homology modeling (MODELLER). Conformational changes of protein models upon ligand binding were predicted by simultaneous modeling using templates of apo and holo forms. When there were no suitable templates for holo forms and the apo models were accurate, we prepared holo models using prediction methods for ligand binding (eF-seek) and conformational change (the elastic network model and the linear response theory). Models are displayed as animated images. As of July 2010, SAHG contains 42,581 protein-domain models in approximately 24,900 unique human protein sequences from the RefSeq database. Annotation of models with functional information and links to other databases such as EzCatDB, InterPro or HPRD are also provided to facilitate understanding of protein structure-function relationships.
Nucleic Acids Research 11/2010; 39(Database issue):D487-93. · 8.81 Impact Factor
ABSTRACT: The cold shock protein (CSP) from the hyperthermophile Thermotoga maritima (TmCSP) is only marginally stable (ΔG(T(opt)) = 0.3 kcal/mol) at 353 K, the optimum environmental temperature (T(opt)) for T. maritima. In comparison, homologous CSPs from E. coli (ΔG(T(opt)) = 2.2 kcal/mol) and B. subtilis (ΔG(T(opt)) = 1.5 kcal/mol) are at least five times more stable at 310 K, the T(opt) for the mesophiles. Yet at room temperature, TmCSP is more stable (ΔG(T(R)) = 4.7 kcal/mol) than its homologues (ΔG(T(R)) = 3.0 kcal/mol for E. coli CSP and ΔG(T(R)) = 2.1 kcal/mol for B. subtilis CSP). This unique observation suggests that kinetic, rather than thermodynamic, barriers toward unfolding might help maintain the native structure of TmCSP at high temperatures. Consistently, the unfolding rate of TmCSP is considerably slower than that of its homologues. High-temperature (600 K) complete-unfolding molecular dynamics (MD) simulations of TmCSP support our hypothesis and reveal an unfolding scheme unique to TmCSP. For all the studied homologues of TmCSP, the unfolding process starts at the C-terminal region, and the N-terminal region unfolds last. But for TmCSP, both termini resist unfolding for consistently longer simulation times and, in the end, unfold simultaneously. In TmCSP, the C-terminal region is better fortified and has better interactions with the N-terminal region because the charged residues R2, E47, E49, H61, K63, and E66 are in spatial vicinity. The electrostatic interactions among these residues are unique to TmCSP. Consistently, the room-temperature MD simulations show that TmCSP is more rigid at its N- and C-termini than its homologues from E. coli, B. subtilis, and B. caldolyticus.
Proteins Structure Function and Bioinformatics 06/2008; 71(2):655-69. · 3.34 Impact Factor
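The stability comparison in the cold shock protein abstract above can be checked with a few lines of arithmetic. The sketch below only uses the ΔG values quoted in that abstract; the dictionary layout and the helper name `stability_ratio` are illustrative assumptions, not part of the published work.

```python
# Free energies of unfolding (kcal/mol) as quoted in the abstract.
# At T(opt): 353 K for TmCSP, 310 K for the two mesophilic homologues.
dg_topt = {"TmCSP": 0.3, "E. coli CSP": 2.2, "B. subtilis CSP": 1.5}
# At room temperature.
dg_room = {"TmCSP": 4.7, "E. coli CSP": 3.0, "B. subtilis CSP": 2.1}


def stability_ratio(values, reference="TmCSP"):
    """Return each homologue's ΔG divided by the reference protein's ΔG."""
    ref = values[reference]
    return {name: dg / ref for name, dg in values.items() if name != reference}


# Hypothetical helper applied to both temperature regimes.
ratios_topt = stability_ratio(dg_topt)
ratios_room = stability_ratio(dg_room)

for name in ratios_topt:
    print(f"{name}: {ratios_topt[name]:.1f}x at T(opt), "
          f"{ratios_room[name]:.2f}x at room temperature")
```

Ratios of 5 or more at T(opt) correspond to the "at least five times more stable" statement, while ratios below 1 at room temperature reflect the reversed ordering reported there.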
ABSTRACT: Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/), a comprehensive annotation resource for human genes and transcripts. H-InvDB, originally developed as an integrated database of the human transcriptome based on extensive annotation of large sets of full-length cDNA (FLcDNA) clones, now provides annotation for 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD), in addition to 54 978 human FLcDNAs, in the latest release H-InvDB_4.6. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci; 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. For all these transcripts and genes, we provide comprehensive annotation including gene structures, gene functions, alternative splicing variants, functional non-protein-coding RNAs, functional domains, predicted subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene expression profiles, orthologous genes, protein-protein interactions (PPI) and annotation for gene families. The current H-InvDB annotation resources consist of two main views (Transcript view and Locus view) and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group.
Nucleic Acids Research 02/2008; 36(Database issue):D793-9. · 8.81 Impact Factor
ABSTRACT: We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of full-length cDNAs. Applying both manual and computational analyses to 56,419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6,877 alternative splicing genes with 18,297 different alternative splicing variants. A total of 37,670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6,005 of the 6,877 genes. Notably, alternative splicing affected protein motifs in 3,015 genes, subcellular localizations in 2,982 genes and transmembrane domains in 1,348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or to have overlapping protein coding sequences (CDSs) in different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast-expanding universe of alternative splicing variants.
Nucleic Acids Research 02/2006; 34(14):3917-28. · 8.81 Impact Factor
ABSTRACT: This study is intended to construct a useful method for fold recognition, regardless of whether the proteins to be compared are evolutionarily related. We developed several descendants of our profile-profile comparison method to make use of known structural information for protein structure prediction. Our prediction strategy in CASP6 is simple. For every CASP6 target, we derived target-template alignments from several different versions of profile-profile comparisons. We then constructed and exhaustively evaluated 3D models based on those alignments. Subsequently, we selected the proper model(s) from among them. We specifically addressed the validation of our simple approach for protein structure prediction through CASP6 because the fold recognition results of CASP5 revealed areas for improvement in the selection of good models. Consequently, we applied a more stringent method for 3D model evaluation this time. All generated models were evaluated based on a structural quality score calculated by both the Verify3D and Prosa2003 programs. It turns out that the prediction results of our human group were supported by the results of three servers. The pipeline that we constructed for our human group prediction and human intervention were also greatly effective in improving prediction models, but the efficacy of our scheme for 3D model evaluation was obscure.
Proteins Structure Function and Bioinformatics 02/2005; 61 Suppl 7:114-21. · 3.34 Impact Factor
ABSTRACT: We investigated human alternative protein isoforms of >2600 genes based on full-length cDNA clones and SwissProt. We classified the isoforms and examined their co-occurrence for each gene. Further, we investigated potential relationships between these changes and differential subcellular localization. The two most abundant patterns were the one with different C-terminal regions and the one with an internal insertion, which together account for 43% of the total. Although changes of the N-terminal region are less common than those of the C-terminal region, extension of the C-terminal region is much less common than that of the N-terminal region, probably because of the difficulty of removing stop codons in one isoform. We also found that there are some frequently used combinations of co-occurrence in alternative isoforms. We interpret this as evidence that there is some structural relationship which produces a repertoire of isoformal patterns. Finally, many terminal changes are predicted to cause differential subcellular localization, especially in targeting either peroxisomes or mitochondria. Our study sheds new light on the enrichment of the human proteome through alternative splicing and related events. Our database of alternative protein isoforms is available through the internet.
Nucleic Acids Research 01/2005; 33(8):2355-63. · 8.81 Impact Factor
ABSTRACT: ve characterized the biophysical properties of the recombinant protein corresponding to residues 90–231. In this study, we parallelized and vectorized the molecular dynamics (MD) simulation programs AMBER and MolTreC. (These programs differ in the calculation used for long-range interactions.) Currently, the vectorization ratios are 98% and the parallelization ratios are 97.4%. We performed MD simulations on the wild type and a mutant (Pro102Leu) of HuPrP90–231 at 300 K. The dihedral angle phi of Pro102 was very stable, remaining within a small range throughout our simulation. In contrast, the dihedral angle phi of Leu102 varied considerably, allowing the possibility of interacting with other residues to form secondary structures, such as a beta sheet. This suggests that Pro102 is critical for preventing the transition from random structure to beta-sheet structure in the wild-type form. We plan further simulations of the wild-type and mutant hexameric prion protein comprising residues 90–231. We think Ear
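To give a sense of what a 97.4% parallelization ratio implies, the sketch below applies Amdahl's law. It assumes that "parallelization ratio" means the fraction of single-processor runtime that can run in parallel; the abstract does not define the term, so this reading and the function names `amdahl_speedup` and `amdahl_limit` are assumptions made for illustration only.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's law: ideal speedup on n processors for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


def amdahl_limit(parallel_fraction):
    """Upper bound on speedup as the processor count grows without limit."""
    if parallel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


# Assumed interpretation: 97.4% of the runtime is parallelizable.
ratio = 0.974
for n in (1, 8, 64, 512):
    print(f"{n:4d} processors: speedup {amdahl_speedup(ratio, n):.1f}")
print(f"asymptotic limit: {amdahl_limit(ratio):.1f}")
```

Under this reading, the remaining 2.6% serial portion caps the achievable speedup at roughly 38 regardless of processor count, which ignores communication overhead and vector-unit effects.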
ABSTRACT: A central theme in prion protein research is the detection of the process that underlies the conformational transition from the normal cellular prion form (PrP(C)) to its pathogenic isoform (PrP(Sc)). Although the three-dimensional structures of monomeric and dimeric human prion protein (HuPrP) have been revealed by NMR spectroscopy and x-ray crystallography, the process underlying the conformational change from PrP(C) to PrP(Sc) and the dynamics and functions of PrP(C) remain unknown. The dimeric form is thought to play an important role in the conformational transition. In this study, we performed molecular dynamics (MD) simulations on monomeric and dimeric HuPrP at 300 K and 500 K for 10 ns to investigate the differences in the properties of the monomer and the dimer from the perspective of dynamic and structural behaviors. Simulations were also undertaken with the Asp178Asn mutation and at acidic pH, both of which are known disease-associated factors. Our results indicate that the dynamics of the dimer and monomer were similar (e.g., denaturation of helices and elongation of the beta-sheet). However, additional secondary structure elements formed in the dimer might account for the differences in dynamics and properties between the monomer and dimer (e.g., the greater retention of tertiary structure in the dimer than in the monomer).