Conference Paper

Evidence of post translational modification bias extracted from the tRNA and corresponding amino acid interplay across a set of diverse organisms

Abstract

A post-translational modification (PTM) is a step in protein biosynthesis that prepares a protein for a specific function. PTMs are involved in developing or customizing proteins to increase their functional diversity. In times of protein stress, PTMs may alter protein structures to improve the chances of survival. Once the stress condition has elapsed, PTMs are able to transform the protein's structure back to its original form for the continued survival of the protein. PTMs are not applied uniformly across organismal proteins, and differing PTM preferences and usages often exist between proteins of the same organism. Here, we study the frequencies of factors (PTM predominance and the associated active sites, tRNAs and amino acids) which likely influence a PTM bias. We extract and study these factor frequencies across both mitochondrial (Mt) and non-Mt proteins of nine diverse organisms (closely following two, Arabidopsis thaliana and Caenorhabditis elegans, due to space limitations) to illustrate their remarkable differences, which may strongly influence natural PTM selection. With this work, we offer evidence that this PTM bias may result from these factors, which combine in a poorly understood system to affect and control PTM interactions. Our analysis applies frequency information on PTMs, active sites, tRNAs and amino acids to create network models that clearly visualize the mechanisms of this natural PTM selection.
... Our system has two main applications: (1) to study the distances between MSs and all domains located in an organism's proteome, and (2) to study the distances between MSs and user-selected domains which are encountered in a wide variety of proteins (Mt and non-Mt). In Section II-A we detail this organism-specific study to show that organisms exhibit general trends of MS spacings which are likely an extended part of their PTM bias as described in [9], [10]. We study the MSs of sequences relative to the domains, at locations before (upstream of), inside and after (downstream of) the domains encountered in proteins. ...
... Furthermore, we suggest that understanding these PTM-domain relations may help to explain some of the reasons for a potential protein failure or disorder. Following our previous work on PTM involvement with proteomes [9], [10], we limited the current study to the same protein data of the 11 organisms listed in Table I. The protein data was downloaded from the UniProt Knowledge Base [11] protein database in March 2016. ...
... This part of the method allows for the study of all domains of an organism. Building from the work of [10], we note that this approach also describes any biases which may exist between organisms in connection to PTM usage. Other types of PTM usage biases have been uncovered and are discussed in [9]. ...
... In this article, we extend our original study in [18], which described some of the initial patterns of PTM bias inherent in some of the organisms of the present study. We studied the proteomes of 11 diverse organisms shown in Table 1 to show that each organism has unique PTM biases and an associated RS bias. ...
... The networks are read from the left-side PTMs, which interact with the RSs on the right side. We summarize the main results from the networks in [18]. ...
Article
Post-translational modifications (PTMs) are important steps in the biosynthesis of proteins. Aside from their integral contributions to protein development (i.e. specialized proteolytic cleavage of regulatory subunits, the covalent addition of functional groups to proteins or the degradation of entire proteins), PTMs also enable proteins to withstand and recover from temporary environmental stresses (heat shock, microgravity and many others). The literature reports thousands of recently discovered PTMs, many of which may contribute similarly (perhaps even interchangeably) to protein stress response. Although there are many PTM actors on the biological stage, our study determines that these PTMs are generally cast into organism-specific, preferential roles. In this work, we study the PTM compositions across the mitochondrial (Mt) and non-Mt proteomes of 11 diverse organisms to illustrate that each organism appears to have a unique list of PTMs and an equally unique list of PTM-associated residue reaction sites (RSs), where PTMs interact with proteins. Despite the limited PTM data currently available across different species, we apply existing protein data to illustrate particular organismal biases. We explore the relative frequencies of observed PTMs, the RSs and general amino-acid compositions of Mt and non-Mt proteomes. We apply these data to create networks and heatmaps to illustrate the evidence of bias. We show that the number of PTMs and RSs appears to grow along with organismal complexity, which may imply that environmental stress could play a role in this bias.
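The relative frequencies of PTMs and their reaction sites described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `ptm_site_edges`, the `(ptm_name, residue)` input format and the toy records are all assumptions made for illustration. Each weighted pair could serve as one edge in a PTM-to-RS network of the kind the abstract mentions.

```python
from collections import Counter


def ptm_site_edges(annotations):
    """Turn (ptm_name, residue) records into relative edge weights.

    `annotations` is an iterable of (ptm_name, residue) tuples, one per
    annotated reaction site. This input format is hypothetical.
    """
    counts = Counter(annotations)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Each weight is the share of all annotated sites held by one PTM/residue pair.
    return {edge: n / total for edge, n in counts.items()}


# Toy records invented for illustration only (not data from the study).
toy = [
    ("phosphorylation", "S"),
    ("phosphorylation", "S"),
    ("phosphorylation", "T"),
    ("acetylation", "K"),
]
edges = ptm_site_edges(toy)
for (ptm, residue), weight in sorted(edges.items()):
    print(f"{ptm} -> {residue}: {weight:.2f}")
```

Running the same function separately on Mt and non-Mt annotations would give two edge sets that can be compared side by side.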
... Investigators undertaking literature reviews for articles containing these keywords may consult these networks to begin some of their work. Because multiple PTMs likely work together in processes underlying disorders such as Parkinson's [28], as discussed in [29], [30], [20], investigators may also wish to consult PTM relationship networks, such as that of Figure 12 (glycosylation), to gain a fuller understanding of other PTMs that may work in tandem. ...
Chapter
Investigators in bioinformatics are often confronted with the difficult task of connecting ideas, which are found scattered around the literature, using robust keyword searches. It is customary to identify only a few keywords in a research article to facilitate search algorithms, usually in the absence of a general approach that would index all possible keywords of an article’s characteristic attributes. Based on only a handful of keywords, articles are therefore prioritized by search algorithms that point investigators to what are only subsets of the available knowledge. In addition, many articles escape algorithmic search strategies because their keywords were vague or have become unfashionable terms. In this case, the article, as well as its source of knowledge, may be lost to the community. Owing to the growing size of the literature, we introduce a text mining method and tool (BeagleTM) for knowledge harvesting from papers in a literature corpus without the use of article meta-data. Unlike other text mining tools that only highlight found keywords in articles, our method allows users to visually ascertain which keywords have been featured together with others in peer-reviewed work. Drawing from an arbitrarily sized corpus, BeagleTM creates visual networks describing interrelationships between user-defined terms to facilitate the discovery of connected or parallel studies. We report the effectiveness of BeagleTM by illustrating its ability to connect keywords for types of PTMs (post-translational modifications), stress factors and disorders according to their relationships. These relationships facilitate the discovery of connected studies, which are often challenging to find because the keywords tied to relevant articles are frequently unrelated to this type of information.
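The idea of linking user-defined terms that appear together in the same article can be sketched as follows. This is a simplified illustration and not BeagleTM's implementation: the function name `cooccurrence_edges`, the case-insensitive substring matching and the three toy sentences are assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents, keywords):
    """Count how many documents mention each pair of keywords together.

    Matching is a plain case-insensitive substring test, which is a
    simplifying assumption; a real tool would likely tokenize the text.
    """
    terms = [k.lower() for k in keywords]
    edges = Counter()
    for text in documents:
        lowered = text.lower()
        # Sort so that every pair is always stored in the same order.
        present = sorted({t for t in terms if t in lowered})
        edges.update(combinations(present, 2))
    return edges


# Toy abstracts invented for illustration only.
docs = [
    "Phosphorylation responds to oxidative stress in Alzheimer models.",
    "Oxidative stress alters glycosylation.",
    "Phosphorylation and glycosylation under oxidative stress.",
]
terms = ["phosphorylation", "glycosylation", "oxidative stress", "Alzheimer"]
network = cooccurrence_edges(docs, terms)
for pair, weight in network.most_common():
    print(pair, weight)
```

The resulting pair counts can be passed to any graph-drawing library as weighted edges between keyword nodes.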
... In our previous work [12], [13], [15], we used public databases to determine PTM interactions and their frequencies of occurrence in proteomes. Although the data of this study is from a corpus of disconnected literature, and not a database of observed PTM interactions with proteins, we note that there are several fundamental similarities between the observations of this current work and those of our previous studies. ...
Conference Paper
In the proteome, stresses may work against optimal protein function, and PTMs play roles in protein stress responses. Many peer-reviewed articles are available to bioinformatics research; however, the details of stresses, proteins and their PTM interactions are scattered throughout the literature and are mentioned only among the other details of the respective studies. Each publication, for instance, contains many small pieces of knowledge which could be combined to build a better understanding. Since it is impossible to harvest all of this available knowledge by manual means, text mining methods are an attractive approach to assembling ideas from articles in which these concepts may not have been a main focus. We present a text mining method to harvest and assemble a knowledge base on the relationships of stresses, proteins and PTMs from the literature. Although we also studied the stresses, proteins and PTMs associated with apoptosis, diabetes and Parkinson’s disease in the literature, we introduce our method by addressing these concepts as they relate to Alzheimer’s disease. We use our text mining tool to process article abstracts and build networks which suggest how functional proteins may be linked to environmental stresses and their PTMs. We discuss how networks of biologically relevant keywords may eventually be used to describe research directions which could be further explored to forecast new trends in studies. We also show how our method may help to predict stress, protein and PTM associations which may be included in these future studies.
Article
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (Bookshelf, PubMed Central (PMC) and PubReader); medical genetics (ClinVar, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen); genes and genomics (BioProject, BioSample, dbSNP, dbVar, Epigenomics, Gene, Gene Expression Omnibus (GEO), Genome, HomoloGene, the Map Viewer, Nucleotide, PopSet, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser, Trace Archive and UniGene); and proteins and chemicals (Biosystems, COBALT, the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB), Protein Clusters, Protein and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for many of these databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov.
Article
Animal mitochondrial DNA (mtDNA) is usually depicted as a small and very economically organized molecule with almost invariable gene content, stable gene order, a high rate of sequence evolution, and several unorthodox genetic features. Sampling across different animal phyla reveals that such a description applies primarily to mtDNA of bilaterian animals (such as arthropods or chordates). By contrast, mitochondrial genomes of nonbilaterian animals (phyla Cnidaria, Placozoa, and Porifera) display more variation in size and gene content and, in most cases, lack the genetic novelties associated with bilaterian mtDNA. Outside the Metazoa, mtDNA of the choanoflagellate Monosiga brevicollis, the closest unicellular out-group, is a much larger molecule that contains a large proportion of noncoding DNA, 1.5 times more genes, as well as several introns. Thus, changes in animal mtDNA organization appear to correlate with two main transitions in animal evolution: the origin of multicellularity and the origin of the Bilateria. Studies of mtDNA in nonbilaterian animals provide valuable insights into these transitions in the organization of mtDNA and also supply data for phylogenetic analyses of the relationships of early animals. Here I review recent progress in the understanding of nonbilaterian mtDNA and discuss the advantages and limitations of mitochondrial data sets for inferences about the phylogeny and evolution of animals.
Conference Paper
Occurring naturally along the genomes of many viruses and other pathogens, short palindromic restriction sites (<14 bp) are often exploited by bacterial restriction enzymes as autoimmune defenses to end pathogen threats. These motifs may also appear in the host's genome, where they are methylated so as not to attract restriction enzymes to the host's genetic material. Since these motifs in the host's genome may pose a significant danger, it is likely that their numbers have been reduced over evolutionary time because of possible failures of methylation. These palindromes are composed of bases likely containing information relating to codons used for protein translation. If palindromes are reduced in the genome, then the sequence composition making up the codons may also be found in reduced quantities. Furthermore, during translation codons are associated with tRNAs for protein fabrication, and these tRNAs may also occur in reduced numbers. We suggest a pathway of reduction that can be followed from the onset of these missing palindromes to the reduction (or absence) of specific tRNAs correlated with the codons from the palindromes. To create evidence for this pathway, we studied the bacterial genomes of Bacillus subtilis, Escherichia coli, Haemophilus influenzae, Methanococcus jannaschii, Mycoplasma genitalium, Synechocystis sp. and Marchantia polymorpha. Across these organisms, we applied statistical data from reduced palindromic populations (biological and non-relevant words) to regression models and performed an analysis of genomic tRNA presence from their compositions. We illustrate a pathway of reduction that extends from palindromes to tRNAs and which may follow from evolutionary pressures concerning restriction site handling.
Article
The 20 protein-coding amino acids are found in proteomes with different relative abundances. The most abundant amino acid, leucine, is nearly an order of magnitude more prevalent than the least abundant amino acid, cysteine. Amino acid metabolic costs differ similarly, constraining their incorporation into proteins. On the other hand, a diverse set of protein sequences is necessary to build functional proteomes. Here, we present a simple model for a cost-diversity trade-off postulating that natural proteomes minimize amino acid metabolic flux while maximizing sequence entropy. The model explains the relative abundances of amino acids across a diverse set of proteomes. We found that the data are remarkably well explained when the cost function accounts for amino acid chemical decay. More than 100 organisms reach comparable solutions to the trade-off by different combinations of proteome cost and sequence diversity. Quantifying the interplay between proteome size and entropy shows that proteomes can get optimally large and diverse.
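As a rough illustration of the two quantities this abstract relates, the sketch below computes amino-acid relative abundances and a Shannon entropy over that composition. Treating "sequence entropy" as the single-residue Shannon entropy is an assumption made here; the paper's cost function and exact entropy definition are not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 protein-coding amino acids


def composition(sequences):
    """Relative abundance of each amino acid over a set of protein sequences."""
    counts = Counter(
        ch for seq in sequences for ch in seq.upper() if ch in AMINO_ACIDS
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS if counts[aa]}


def shannon_entropy(freqs):
    """Shannon entropy in bits of a frequency table (single-residue level)."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


# Toy sequences invented for illustration only.
freqs = composition(["AC", "CA"])
print(freqs, shannon_entropy(freqs))
```

A uniform composition over all 20 amino acids gives the maximum of log2(20) bits under this definition, and a proteome built from a single residue gives zero.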
Article
On the premise that sequence reads and contigs often exhibit the same kinds of base usage as the sequences from which they are derived, we offer a base composition analysis tool. Our tool uses these natural patterns to determine relatedness across sequence data. We introduce spectrum sets (sets of motifs), which are permutations of bacterial restriction sites, and a base composition analysis framework to measure their proportional content in sequence data. We suggest that this framework will increase efficiency during the pre-processing stages of metagenome sequencing and assembly projects. Our method is able to differentiate organisms and their reads or contigs. The framework shows how to determine the relatedness between these reads or contigs by comparison of base composition. In particular, we show that two types of organismal-sequence data are fundamentally different by analyzing their spectrum set motif proportions (coverage). By applying each of the four possible spectrum sets, which together encompass all known restriction sites, we provide evidence that each set has a different ability to differentiate sequence data. Furthermore, we show that selecting a spectrum set that is relevant to one organism, but not to the others of the data set, will greatly improve the performance of sequence differentiation even if the read, contig or sequence fragment is short. We show the proof of concept of our method by its application to ten trials of two or three freshly selected sequence fragments (reads and contigs) for each experiment across the six organisms of our set. Here we describe a novel and computationally effective pre-processing step for metagenome sequencing and assembly tasks. Furthermore, our base composition method has applications in phylogeny, where it can be used to infer evolutionary distances between organisms based on the notion that related organisms often share much conserved code.
Article
Protein post-translational modifications (PTMs) allow the cell to regulate protein activity and play a crucial role in the response to changes in external conditions or internal states. Advances in mass spectrometry now enable proteome wide characterization of PTMs and have revealed a broad functional role for a range of different types of modifications. Here we review advances in the study of the evolution and function of PTMs that were spurred by these technological improvements. We provide an overview of studies focusing on the origin and evolution of regulatory enzymes as well as the evolutionary dynamics of modification sites. Finally, we discuss different mechanisms of altering protein activity via post-translational regulation and progress made in the large-scale functional characterization of PTM function.
Article
Exposure to microgravity conditions is detrimental to animal and human protein tissue and is linked to ailments associated with aging, disease and other disorders originating at the protein level. With exposure, dangerously low blood pressure resulting from diminished blood production forces the heart to beat at abnormal rates and causes damage. The heart, like the other muscles of the body, risks developing muscular atrophy from the reduced dependence on muscle use. Oxidative carbonylation, the addition of a CO to an amino acid chain, is a natural process used by the cell to degrade and remove proteins. This reaction may also cause many of the diseases associated with protein dysfunction (Alzheimer’s, muscular atrophy, Parkinson’s, sepsis, etc.). Although aging has been associated with similar ailments from protein degradation, the stress from weightlessness is thought to increase the rates of oxidative processes, impacting general health by upsetting protein function and structure. Carbonylation is an oxidative reaction whose composition in protein data can be explored using motifs high in R, K, P, T, E and S residues. Since mitochondria also apply oxidative processes to make energy, we hypothesize that this reaction is highly contained so as to minimize local oxidative damage. In this paper, we evaluate the coverage of motifs which are likely attractors of oxidative activity across mitochondrial and non-mitochondrial protein data of fourteen diverse organisms. Here we show that mitochondrial proteins generally have reduced amounts of the same oxidative carbonylation content which we found in abundance in the organism’s nuclear proteins. Furthermore, we show that this general finding is similar between two major profiling systems: oxidative carbonylation (RKPT-enriched sequences) and protein degradation (PEST sequences). We suggest a mitochondrial intolerance for motifs that may attract forms of oxidation.
Article
Choices of synonymous codons in unicellular organisms are here reviewed, and differences in synonymous codon usages between Escherichia coli and the yeast Saccharomyces cerevisiae are attributed to differences in the actual populations of isoaccepting tRNAs. There exists a strong positive correlation between codon usage and tRNA content in both organisms, and the extent of this correlation relates to the protein production levels of individual genes. Codon-choice patterns are believed to have been well conserved during the course of evolution. Examination of silent substitutions and tRNA populations in Enterobacteriaceae revealed that the evolutionary constraint imposed by tRNA content on codon usage decelerated rather than accelerated the silent-substitution rate, at least insofar as pairs of taxonomically related organisms were examined. Codon-choice patterns of multicellular organisms are briefly reviewed, and diversity in G+C percentage at the third position of codons in vertebrate genes, as well as a possible causative factor in the production of this diversity, is discussed.
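Codon usage, the quantity this abstract correlates with tRNA content, starts from a simple in-frame codon count. The sketch below shows only that first step; it does not model tRNA populations, and the function name `codon_usage`, the assumption that the input is already in reading frame, and the toy sequence are illustrative choices made here.

```python
from collections import Counter


def codon_usage(cds):
    """Relative frequency of each codon in a coding sequence.

    Assumes `cds` starts in frame; trailing bases that do not fill a
    complete codon are ignored. RNA input (U) is normalized to DNA (T).
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Toy sequence invented for illustration only: ATG GCT GCT TAA.
usage = codon_usage("ATGGCTGCTTAA")
for codon, freq in sorted(usage.items()):
    print(codon, f"{freq:.2f}")
```

Grouping these frequencies by the amino acid each codon encodes would give the synonymous codon preferences that the review compares across organisms.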
Article
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing pages, GeneMap’99, Davis Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP) pages, Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP) pages, SAGEmap, Online Mendelian Inheritance in Man (OMIM) and the Molecular Modeling Database (MMDB). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov