ABSTRACT: The Weismann barrier, or the impossibility of inheritance of acquired traits,
constitutes a foundation of modern biology, and it has been a major obstacle in
establishing the connection between evolution and ontogenesis. We propose the
cooperative model based on the assumption that evolution is achieved by a
cooperation between genetic mutations and acquired changes (phenotypic
plasticity). It is also assumed in this model that natural selection operates
on phenotypes, rather than genotypes, of individuals, and that the relationship
between phenotypes and genotypes is one-to-many. In the simulations based on
these assumptions, individuals exhibited phenotypic changes in response to an
environmental change, corresponding multiple genetic mutations were
increasingly accumulated in individuals in the population, and phenotypic
plasticity was gradually replaced with genetic mutations. This result suggests
that Lamarck's law of use and disuse can effectively hold without conflicting with
the Weismann barrier, and thus evolution can be logically connected with
ABSTRACT: Intrinsically disordered (ID) proteins (IDPs) are abundant in eukaryotes but are scarce in prokaryotes. Mitochondria, cellular organelles that descended from Rickettsia-like α-proteobacteria, are at the intersection between prokaryotes and eukaryotes. Although IDPs are reportedly as rare in mitochondria as in bacteria, these details remained to be clarified. Human mitochondrial proteins (n = 706) were obtained from the UniProt database, and information on ID regions of all human proteins was extracted from the DICHOT database. A BLAST search carried out against all α-proteobacterial proteins identified two types of mitochondrial proteins: those with (B) and without (E) bacterial homologues. The B-type proteins (n = 387) descended from a bacterial ancestor, whereas the E-type proteins (n = 319) were more recently added to the mitochondria via the host cell during the early evolution of eukaryotes. The average ID ratios of B-type/E-type proteins are 10.3% and 21.4%, respectively. The 706 proteins were further classified into four groups based on the mitochondrial subcompartment, namely, the matrix, intermembrane space, inner membrane, or outer membrane. The ID ratios in these different locations suggest that the frequency of IDPs in mitochondria might be due to the evolutionary origin (B-type/E-type) of the protein, rather than differences in its functional environment.
Genes to Cells 08/2012; 17(10):817-25. · 2.73 Impact Factor
ABSTRACT: De novo design of artificial proteins is an essential approach to elucidate the principles of protein architecture and to understand
specific functions of natural proteins and also to yield novel molecules for medical and industrial aims. We have designed
artificial sequences of 153 amino acids to fit the main-chain framework of the sperm whale myoglobin structure based on the
knowledge-based energy functions to evaluate the compatibility between protein tertiary structures and amino acid sequences.
The synthesized artificial globins bind a single heme per protein molecule as designed and show well-defined electrochemical
and spectroscopic features characteristic of proteins with a low-spin heme. Redox and ligand binding reactions of the artificial
heme proteins were investigated and these heme-related functions were found to vary with their structural uniqueness. Relationships
between the structural and functional properties are discussed.
Journal of Chemical Sciences 04/2012; 112(3):215-221. · 1.30 Impact Factor
ABSTRACT: The relationship between sequence polymorphisms and human disease has been studied mostly in terms of effects of single nucleotide polymorphisms (SNPs) leading to single amino acid substitutions that change protein structure and function. However, less attention has been paid to more drastic sequence polymorphisms which cause premature termination of a protein's sequence or large changes, insertions, or deletions in the sequence. We have analyzed a large set (n = 512) of insertions and deletions (indels) and single nucleotide polymorphisms causing premature termination of translation in disease-related genes. Prediction of protein-destabilization effects was performed by graphical presentation of the locations of polymorphisms in the protein structure, using the Genomes TO Protein (GTOP) database, and manual annotation with a set of specific criteria. Protein-destabilization was predicted for 44.4% of the nonsense SNPs, 32.4% of the frameshifting indels, and 9.1% of the non-frameshifting indels. A prediction of nonsense-mediated decay allowed us to infer which truncated proteins would actually be translated as defective proteins. These cases included the proteins linked to diseases inherited dominantly, suggesting a relation between these diseases and toxic aggregation. Our approach would be useful in identifying potentially aggregation-inducing polymorphisms that may have pathological effects.
PLoS ONE 01/2012; 7(11):e50445. · 3.53 Impact Factor
ABSTRACT: Proteins in general consist not only of globular structural domains (SDs), but also of intrinsically disordered regions (IDRs), i.e. those that do not assume unique three-dimensional structures by themselves. Although IDRs are especially prevalent in eukaryotic proteins, their functions are mostly unknown. To elucidate the functions of IDRs, we first divided eukaryotic proteins into subcellular localizations, identified IDRs by the DICHOT system that accurately divides entire proteins into SDs and IDRs, and examined charge and hydropathy characteristics. On average, mitochondrial proteins have IDRs more positively charged than SDs. Comparison of mitochondrial proteins with orthologous prokaryotic proteins showed that mitochondrial proteins tend to have segments attached at both N and C termini, high fractions of which are IDRs. Segments added to the N-terminus of mitochondrial proteins contain not only signal sequences but also mature proteins and exhibit a positive charge gradient, with the magnitude increasing toward the N-terminus. This finding is consistent with the notion that positively charged residues are added to the N-terminus of proteobacterial proteins so that the extended proteins can be chromosomally encoded and efficiently transported to mitochondria after translation. By contrast, nuclear proteins generally have positively charged SDs and negatively charged IDRs. Among nuclear proteins, DNA-binding proteins have enhanced charge tendencies. We propose that SDs in nuclear proteins tend to be positively charged because of the need to bind to negatively charged nucleotides, while IDRs tend to be negatively charged to interact with other proteins or other regions of the same proteins to avoid premature proteasomal degradation.
ABSTRACT: Although structural domains in proteins (SDs) are important, half of the regions in the human proteome are currently left with no SD assignments. These unassigned regions consist not only of novel SDs, but also of intrinsically disordered (ID) regions since proteins, especially those in eukaryotes, generally contain a significant fraction of ID regions. As ID regions can be inferred from amino acid sequences, a method that combines SD and ID region assignments can determine the fractions of SDs and ID regions in any proteome.
In contrast to other available ID prediction programs that merely identify likely ID regions, the DICHOT system we previously developed classifies the entire protein sequence into SDs and ID regions. Application of DICHOT to the human proteome revealed that residue-wise ID regions constitute 35%, SDs with similarity to PDB structures comprise 52%, while SDs with no similarity to PDB structures account for the remaining 13%. The last group consists of novel structural domains, termed cryptic domains, which serve as good targets of structural genomics. The DICHOT method applied to the proteomes of other model organisms indicated that eukaryotes generally have high ID contents, while prokaryotes do not. In human proteins, ID contents differ among subcellular localizations: nuclear proteins had the highest residue-wise ID fraction (47%), while mitochondrial proteins exhibited the lowest (13%). Phosphorylation and O-linked glycosylation sites were found to be located preferentially in ID regions. As O-linked glycans are attached to residues in the extracellular regions of proteins, the modification is likely to protect the ID regions from proteolytic cleavage in the extracellular environment. Alternative splicing events tend to occur more frequently in ID regions. We interpret this as evidence that natural selection is operating at the protein level in alternative splicing.
We classified entire regions of proteins into the two categories, SDs and ID regions and thereby obtained various kinds of complete genome-wide statistics. The results of the present study are important basic information for understanding protein structural architectures and have been made publicly available at http://spock.genes.nig.ac.jp/~genome/DICHOT.
ABSTRACT: O-glycosylation of mammalian proteins is an important posttranslational modification. We applied a support vector machine (SVM) to predict whether Ser or Thr is glycosylated, in order to elucidate the O-glycosylation mechanism. O-glycosylated sites were often found clustered along the sequence, whereas other sites were located sporadically. Therefore, we developed two types of SVMs for predicting clustered and isolated sites separately. We found that the amino acid composition was effective for predicting the clustered type, whereas the site-specific algorithm was effective for the isolated type. The highest prediction accuracy for the clustered type was 74%, while that for the isolated type was 79%. The existence frequency of amino acids around the O-glycosylation sites was different in the two types: namely, Pro, Val and Ala had high existence probabilities at each specific position relative to a glycosylation site, especially for the isolated type. Independent component analyses for the amino acid sequences around O-glycosylation sites showed the position-specific existences of the identified amino acids as independent components. The O-glycosylation sites were preferentially located within intrinsically disordered regions of extracellular proteins: particularly, more than 90% of the clustered O-GalNAc glycosylation sites were observed in intrinsically disordered regions. This feature could be the key for understanding the non-conservation property of O-glycosylation, and its role in functional diversity and structural stability.
International Journal of Molecular Sciences 01/2010; 11(12):4991-5008. · 2.46 Impact Factor
ABSTRACT: In addition to structural domains, most eukaryotic proteins possess intrinsically disordered (ID) regions. Although ID regions often play important functional roles, their accurate identification is difficult. As human transcription factors (TFs) constitute a typical group of proteins with long ID regions, we regarded them as a model of all proteins and attempted to accurately classify TFs into structural domains and ID regions. Although an extremely high fraction of ID regions besides DNA binding and/or other domains was detected in human TFs in our previous investigation, 20% of the residues were left unassigned. In this report, we exploit the generally higher sequence divergence in ID regions than in structural regions to completely divide proteins into structural domains and ID regions.
The new dichotomic system first identifies domains of known structures, followed by assignment of structural domains and ID regions with a combination of pre-existing tools and a newly developed program based on sequence divergence, taking un-aligned regions into consideration. The system was found to be highly accurate: its application to a set of proteins with experimentally verified ID regions had an error rate as low as 2%. Application of this system to human TFs (401 proteins) showed that 38% of the residues were in structural domains, while 62% were in ID regions. The preponderance of ID regions makes a sharp contrast to TFs of Escherichia coli (229 proteins), in which only 5% fell in ID regions. The method also revealed that 4.0% and 11.8% of the total length in human and E. coli TFs, respectively, are comprised of structural domains whose structures have not been determined.
The present system verifies that sequence divergence including information of unaligned regions is a good indicator of ID regions. The system for the first time estimates the complete fractioning of structured/un-structured regions in human TFs, also revealing structural domains without homology to known structures. These predicted novel structural domains are good targets of structural genomics. When applied to other proteins, the system is expected to uncover more novel structural domains.
ABSTRACT: The Genomes TO Protein Structures and Functions (GTOP) database (http://spock.genes.nig.ac.jp/~genome/gtop.html) freely provides an extensive collection of information on protein structures and functions obtained by application of various computational tools to the amino acid sequences of entirely sequenced genomes. GTOP contains annotations of 3D structures, protein families, functions, and other useful data of a protein of interest in user-friendly ways to give a deep insight into the protein structure. From the initial 1999 version, GTOP has been continually updated to reap the fruits of genome projects and augmented to supply novel information, in particular intrinsically disordered regions. As intrinsically disordered regions constitute a considerable fraction of proteins and often play crucial roles especially in eukaryotes, their assignments give important additional clues to the functionality of proteins. Additionally, we have incorporated the following features into GTOP: a platform independent structural viewer, results of HMM searches against SCOP and Pfam, secondary structure predictions, color display of exon boundaries in eukaryotic proteins, assignments of gene ontology terms, search tools, and master files.
Nucleic Acids Research 12/2008; 37(Database issue):D333-7. · 8.81 Impact Factor
ABSTRACT: Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/), a comprehensive annotation resource for human genes and transcripts. H-InvDB, originally developed as an integrated database of the human transcriptome based on extensive annotation of large sets of full-length cDNA (FLcDNA) clones, now provides annotation for 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD), in addition to 54 978 human FLcDNAs, in the latest release H-InvDB_4.6. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci; 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. For all these transcripts and genes, we provide comprehensive annotation including gene structures, gene functions, alternative splicing variants, functional non-protein-coding RNAs, functional domains, predicted subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene expression profiles, orthologous genes, protein-protein interactions (PPI) and annotation for gene families. The current H-InvDB annotation resources consist of two main views: Transcript view and Locus view and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group.
Nucleic Acids Research 02/2008; 36(Database issue):D793-9. · 8.81 Impact Factor
ABSTRACT: Using the information from the genome projects, recent comparative studies of thermostable proteins have revealed a certain trend of amino acid composition in which polar residues are scarce and charged residues are rich on the protein surface. To clarify experimentally the effect of the amino acid composition of surface residues on the thermostability of Escherichia coli Ribonuclease HI (RNase HI), we constructed six variants in which five to eleven polar residues were replaced by charged residues (5C, 7Ca, 7Cb, 9Ca, 9Cb and 11C). The thermal denaturation experiments indicated that all of the variant proteins are less stable than the wild-type protein, by 3.2-10.1 degrees C in Tm. The crystal structures of resultant protein variants 7Ca, 7Cb, 9Ca and 11C closely resemble that of E. coli RNase HI in their global fold, and several different hydrogen bonding and ion-pair interactions are formed by the mutations. Comparison of the crystal structures of these variant proteins with that of E. coli RNase HI reveals that thermal destabilization is apparently related to electrostatic repulsion of the charged residues with neighbours. This result suggests that charged residues of natural thermostable proteins are strictly positioned on the surface with optimal interactions and without repulsive interactions.
Journal of Biochemistry 11/2007; 142(4):507-16. · 3.07 Impact Factor
ABSTRACT: Multicanonical molecular dynamics (MD) is a powerful technique for sampling conformations on rugged potential surfaces such as those of proteins. However, it is notoriously difficult to estimate the multicanonical temperature effectively. Wang and Landau developed a convenient method for estimating the density of states based on a multicanonical Monte Carlo method. In their method, the density of states is calculated autonomously during a simulation. In this paper, we develop a set of techniques to effectively apply the Wang-Landau method to MD simulations. In the multicanonical MD, the estimation of the derivative of the density of states is critical. In order to estimate it accurately, we devise two original improvements. First, the correction for the density of states is made smooth by using the Gaussian distribution obtained by a short canonical simulation. Second, an approximation is applied to the derivative, which is based on the Gaussian distribution and the multiple weighted histogram technique. A test of this method was performed with small polypeptides, Met-enkephalin and Trp-cage, and it is demonstrated that Wang-Landau MD is consistent with replica exchange MD but can sample much larger conformational space.
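As background for the abstract above, the following is a minimal sketch of the original Monte Carlo form of the Wang-Landau scheme, not the MD variant the paper develops. It assumes a toy system of non-interacting two-state spins whose "energy" is the number of up spins, so the exact density of states is a binomial coefficient. The function name, parameters, batch size, and flatness threshold are all illustrative choices, not values from the paper.

```python
import math
import random

def wang_landau(n_spins=8, ln_f_final=1e-3, flatness=0.8, seed=0):
    """Estimate ln g(E) for a toy system where E = number of up spins."""
    rng = random.Random(seed)
    spins = [0] * n_spins
    energy = 0
    ln_g = [0.0] * (n_spins + 1)   # running estimate of ln g(E)
    hist = [0] * (n_spins + 1)     # visit histogram for the current stage
    ln_f = 1.0                     # modification factor, in log form
    while ln_f > ln_f_final:
        for _ in range(1000):
            i = rng.randrange(n_spins)
            new_energy = energy + (1 - 2 * spins[i])
            # Accept the flip with probability min(1, g(E) / g(E')).
            u = 1.0 - rng.random()  # uniform in (0, 1], safe for log
            if math.log(u) < ln_g[energy] - ln_g[new_energy]:
                spins[i] ^= 1
                energy = new_energy
            # Penalise the level currently occupied.
            ln_g[energy] += ln_f
            hist[energy] += 1
        # Once the histogram is roughly flat, refine the modification factor.
        if min(hist) > flatness * sum(hist) / len(hist):
            hist = [0] * len(hist)
            ln_f /= 2.0
    # Report ln g relative to the E = 0 level, where g(0) = 1 exactly.
    return [value - ln_g[0] for value in ln_g]

estimate = wang_landau()
exact = [math.log(math.comb(8, e)) for e in range(9)]
for e, (est, ref) in enumerate(zip(estimate, exact)):
    print(e, round(est, 2), round(ref, 2))
```

The density of states is built up during the random walk itself, which is the "autonomous" property the abstract refers to; the smoothing and derivative approximations described for the MD version are not reproduced here.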
ABSTRACT: A systematic survey of intrinsically disordered (ID) regions was carried out in 2109 human plasma membrane proteins with full assignment of the transmembrane topology with respect to the lipid bilayer. ID regions with 30 consecutive residues or more were detected in 41.0% of the human proteins, a much higher percentage than the corresponding figure (4.7%) for inner membrane proteins of Escherichia coli. The domain organization of each membrane protein in terms of transmembrane helices, structural domains, ID, and unassigned regions as well as the distinction of inside or outside of the cell was determined. Long ID regions constitute 13.3 and 3.5% of the human plasma membrane proteins on the inside and outside of the cell, respectively, showing that they preferentially occur on the cytoplasmic side. We interpret this phenomenon as a reflection of the general scarcity of ID regions on the extracellular side and their relative abundance on the cytoplasmic side in multicellular eukaryotic organisms.
Journal of Molecular Biology 06/2007; 368(3):902-13. · 3.91 Impact Factor
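The length criterion used in the survey above (ID regions of 30 consecutive residues or more) can be illustrated with a short sketch. It assumes a per-residue label string in which "D" marks a disordered residue and any other character marks an ordered one. That encoding, the function name, and the example sequence are hypothetical and are not taken from the paper.

```python
from itertools import groupby

def long_id_regions(labels, min_len=30, id_symbol="D"):
    """Return (start, end) half-open index pairs of ID runs of at least min_len residues."""
    regions = []
    pos = 0
    for symbol, run in groupby(labels):
        length = sum(1 for _ in run)
        if symbol == id_symbol and length >= min_len:
            regions.append((pos, pos + length))
        pos += length
    return regions

# Hypothetical labelling: a short ID run (ignored) and a long one (reported).
example = "S" * 50 + "D" * 12 + "S" * 20 + "D" * 35 + "S" * 10
print(long_id_regions(example))  # [(82, 117)]
```

A protein would count toward the 41.0% figure if this list is non-empty under the paper's own labelling, which may differ from the encoding assumed here.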
ABSTRACT: It is desirable to estimate a tree of life, a species tree including all available species in the 3 superkingdoms, Archaea, Bacteria, and Eukaryota, using not a limited number of genes but full-scale genome information. Here, we report a new method for constructing a tree of life based on protein domain organizations, that is, sequential order of domains in a protein, of all proteins detected in a genome of an organism. The new method is free from the identification of orthologous gene sets and therefore does not require the burdensome and error-prone computation. By pairwise comparisons of the repertoires of protein domain organizations of 17 archaeal, 136 bacterial, and 14 eukaryotic organisms, we computed evolutionary distances among them and constructed a tree of life. Our tree shows monophyly in Archaea, Bacteria, and Eukaryota and then monophyly in each of eukaryotic kingdoms and in most bacterial phyla. In addition, the branching pattern of the bacterial phyla in our tree is consistent with the widely accepted bacterial taxonomy and is very close to other genome-based trees. A couple of inconsistent aspects between the traditional trees and the genome-based trees including ours, however, may call for a revision of the conventional view, particularly on the phylogenetic positions of hyperthermophiles.
Molecular Biology and Evolution 06/2007; 24(5):1181-9. · 10.35 Impact Factor
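The pairwise comparison of domain-organization repertoires described above can be sketched as follows. The abstract does not state which distance measure was used, so the Jaccard distance here is an assumption chosen for illustration. The domain names and the two example genomes are hypothetical placeholders.

```python
def repertoire_distance(repertoire_a, repertoire_b):
    """Jaccard distance between two sets of domain organizations.

    Each domain organization is a tuple of domain names in N-to-C order.
    """
    a, b = set(repertoire_a), set(repertoire_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Hypothetical repertoires; domain order within a tuple matters.
genome_x = {("kinase",), ("SH3", "SH2", "kinase"), ("ABC_tran",)}
genome_y = {("kinase",), ("ABC_tran",), ("HTH", "response_reg")}
print(repertoire_distance(genome_x, genome_y))  # 0.5
```

Because whole organizations are compared as units, no orthologue assignment is needed, which is the property the abstract emphasizes. A matrix of such distances could then be passed to any standard distance-based tree-building method.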
ABSTRACT: The formation mechanism of operons remains unresolved: operons may form by rearrangements within a genome or by acquisition of genes from other species, that is, horizontal gene transfer (HGT). One hindrance to its elucidation is the unavailability of a method to accurately identify HGT, although it is generally considered to occur. It is critically important first to select horizontally transferred (HT) genes reliably and then to determine the extent to which HGT is involved in operon formation. For this purpose, we considered indels in terms of gene clusters instead of individual genes and chose candidates of HT genes in 8 species of Escherichia, Shigella, and Salmonella based on the minimization of indels. To select a benchmark set of positively HT genes against which we can evaluate the candidate set, we devised another procedure using intergenetic alignments. Comparison with the benchmark set demonstrated the absence of a significant number of false positives in the candidate set, showing the high reliability of the method. Analyses of Escherichia coli K-12 operons revealed that although approximately 20 operons were probably gained from the last common ancestor of the 8 gamma-proteobacteria, deletion of intervening genes accounts for the formation of no operons, whereas horizontal transfer expanded 2 operons and introduced 4 entire operons. Based on these observations and reasoning, we suggest that the main mechanism of operon gain is HGT rather than intragenomic rearrangements. We propose that genes with related essential functions tend to reside in conserved operons, whereas genes in nonconserved operons mostly confer slight advantage to the organisms and frequently undergo horizontal transfer and decay. HT genes constitute at least 5.5% of the genes in the 8 species, approximately 45% of which originate from other gamma-proteobacteria.
Genes involved in viral functions and mobile and extrachromosomal element functions are HT more often than expected. This finding indicates frequent mediation of HGT by bacteriophages. On the other hand, not only informational genes (those involved in transcription, translation, and related processes) but also operational genes (those involved in housekeeping) are HT less frequently than expected.
Molecular Biology and Evolution 04/2007; 24(3):805-13. · 10.35 Impact Factor
ABSTRACT: The formation mechanism of operons remains controversial despite the proposal of many models. Although acquisition of genes from other species, horizontal gene transfer, is considered to occur, definitive concrete cases have been unavailable. It is desirable to select horizontally transferred genes reliably and examine their relationship to operons. We here developed a method to identify candidates of horizontally transferred genes based on minimization of gene cluster insertions/deletions. To select a benchmark set of positively horizontally transferred genes against which the candidate set can be appraised, we devised another procedure using intergenetic alignments. Comparison with the benchmark set of horizontally transferred genes demonstrated the absence of a significant number of false positives in the candidates, showing that the method identifies horizontally transferred genes with a high degree of confidence. Horizontally transferred genes constitute at least 5.5% of the genes in Escherichia, Shigella, and Salmonella, ~46% of which originate from other gamma-proteobacteria. Not only informational genes, but also operational genes (those involved in housekeeping) are horizontally transferred less frequently than expected. A gene-cluster analysis of Escherichia coli K-12 operons revealed that horizontal transfer produced four entire operons and expanded two operons, but deletion of intervening genes accounts for the formation of no operons. We propose that operons generally form by horizontal gene transfer. We further suggest that genes with related essential functions tend to reside in conserved operons, while genes in nonconserved operons generally confer slight advantage to the organisms and frequently undergo horizontal transfer and decay.
ABSTRACT: The change in the structural stability of Escherichia coli ribonuclease HI (RNase HI) due to single amino acid substitutions has been estimated computationally by the stability profile of mutant protein (SPMP) [Ota, M., Kanaya, S., Nishikawa, K., 1995. Desk-top analysis of the structural stability of various point mutations introduced into ribonuclease H. J. Mol. Biol. 248, 733-738]. Also, an effective strategy using random mutagenesis and genetic selection has been developed to obtain E. coli RNase HI mutants with enhanced thermostability [Haruki, M., Noguchi, E., Akasako, A., Oobatake, M., Itaya, M., Kanaya, S., 1994. A novel strategy for stabilization of Escherichia coli ribonuclease HI involving a screen for an intragenic suppressor of carboxyl-terminal deletions. J. Biol. Chem. 269, 26904-26911]. In this study, both methods were combined: random mutations were individually introduced to Lys99-Val101 on the N-terminus of the alpha-helix IV and the preceding beta-turn, where substitutions of other amino acid residues were expected to significantly increase the stability from SPMP, and then followed by genetic selection. Val101 to Ala, Gln, and Arg mutations were selected by genetic selection. The Val101-->Ala mutation increased the thermal stability of E. coli RNase HI by 2.0 degrees C in Tm at pH 5.5, whereas the Val101-->Gln and Val101-->Arg mutations decreased the thermostability. Separately, the Lys99-->Pro and Asn100-->Gly mutations were also introduced directly. The Lys99-->Pro mutation increased the thermostability of E. coli RNase HI by 1.8 degrees C in Tm at pH 5.5, whereas the Asn100-->Gly mutation decreased the thermostability by 17 degrees C. In addition, the Lys99-->Pro mutation altered the dependence of the enzymatic activity on divalent metal ions.
Journal of Biotechnology 07/2006; 124(3):512-22. · 3.18 Impact Factor