Article

Metabolism of Oxalate in Humans: A Potential Role Kynurenine Aminotransferase/Glutamine Transaminase/Cysteine Conjugate Beta-lyase Plays in Hyperoxaluria

Abstract

Hyperoxaluria, excessive urinary oxalate excretion, is a significant health problem worldwide. Disrupted oxalate metabolism has been implicated in hyperoxaluria; accordingly, an enzymatic disturbance in oxalate biosynthesis can result in primary hyperoxaluria. Alanine glyoxylate aminotransferase-1 and glyoxylate reductase, enzymes involved in the metabolism of glyoxylate (the precursor of oxalate), have been linked to primary hyperoxalurias. Some studies suggest that other enzymes, such as glycolate oxidase and alanine glyoxylate aminotransferase-2, might also be associated with primary hyperoxaluria, but the evidence linking clinical cases to gene mutations is not strong. Some hyperoxalurias remain idiopathic, and their etiologies require further study. Some aminotransferases, particularly kynurenine aminotransferases, can convert glyoxylate to glycine. Based on their biochemical and structural characteristics, expression levels, and subcellular localization, a number of aminotransferases appear able to catalyze the transamination of glyoxylate to glycine more efficiently than alanine glyoxylate aminotransferase-1. The aim of this minireview is to explore other underlying causes of primary hyperoxaluria and to stimulate research toward a comprehensive understanding of the mechanisms leading to the disease. Herein, we review all aminotransferases in the liver for their functions in glyoxylate metabolism. In particular, kynurenine aminotransferase-I and III are discussed in detail with regard to their biochemical and structural characteristics, cellular localization, and enzyme inhibition. Kynurenine aminotransferase-III is, so far, the most efficient putative mitochondrial enzyme for transaminating glyoxylate to glycine in mammalian livers; it may be an interesting enzyme to examine in the etiology of primary hyperoxaluria and should be carefully investigated for its involvement in oxalate metabolism.

... Mammalian tissues exhibit at least four enzymes capable of catalyzing transamination of L-kynurenine (KAT1 through KAT4). KAT4 is identical to mitochondrial aspartate aminotransferase (AspAT), whereas KAT1 and KAT3 are identical to GTK and GTL, respectively [2,3,31–33,35–40]. The L-glutamine transaminase activities of KAT1 and KAT3 are discussed in more detail below (Section 5). ...
... Failure to convert glyoxylate to glycine by an aminotransferase (peroxisomal alanine aminotransferase) mistargeted to mitochondria results in hyperoxaluria type 1 and potentially the formation of renal calculi (calcium oxalate stones) ([88] and references cited therein). The L-glutamine transaminases may play a role in preventing the accumulation of potentially toxic oxalate in the kidneys by removing the precursor, glyoxylate [40]. ...
Article
Full-text available
Simple Summary Many types of cancer cells utilize the common amino acid l-glutamine to maintain their metabolic demands for energy and the nitrogen required for DNA synthesis. Versatility to control energy metabolism from l-glutamine (anaplerosis) is promoted by the use of two distinct pathways. The first (pathway 1; the canonical pathway) is as follows: [l-glutamine → l-glutamate ⇆ α-ketoglutarate → tricarboxylic acid (TCA) cycle]. This pathway contrasts with the much less studied GTωA (glutamine transaminase—ω-amidase or glutaminase II) pathway (pathway 2): [l-glutamine ⇆ α-ketoglutaramate (KGM) → α-ketoglutarate → TCA cycle]. Our prior publications have emphasized the importance of regulation of both pathways, which enables cancer cells to maintain selective metabolic advantages and to conserve their reliance on glucose. This review summarizes the metabolic importance of the GTωA pathway in both cancerous and normal tissues and proposes that anti-cancer strategies, based on inhibition of l-glutamine metabolism, require consideration of both the canonical and GTωA pathways. Abstract Many cancers utilize l-glutamine as a major energy source. Often cited in the literature as “l-glutamine addiction”, this well-characterized pathway involves hydrolysis of l-glutamine by a glutaminase to l-glutamate, followed by oxidative deamination, or transamination, to α-ketoglutarate, which enters the tricarboxylic acid cycle. However, mammalian tissues/cancers possess a rarely mentioned, alternative pathway (the glutaminase II pathway): l-glutamine is transaminated to α-ketoglutaramate (KGM), followed by ω-amidase (ωA)-catalyzed hydrolysis of KGM to α-ketoglutarate. The name glutaminase II may be confused with the glutaminase 2 (GLS2) isozyme. Thus, we recently renamed the glutaminase II pathway the “glutamine transaminase—ω-amidase (GTωA)” pathway. 
Herein, we summarize the metabolic importance of the GTωA pathway, including its role in closing the methionine salvage pathway, and as a source of anaplerotic α-ketoglutarate. An advantage of the GTωA pathway is that there is no net change in redox status, permitting α-ketoglutarate production during hypoxia, diminishing cellular energy demands. We suggest that the ability to coordinate control of both pathways bestows a metabolic advantage to cancer cells. Finally, we discuss possible benefits of GTωA pathway inhibitors, not only as aids to studying the normal biological roles of the pathway but also as possible useful anticancer agents.
... Han et al., as we have pointed out in the current review, have also recently emphasized the importance of several aminotransferases in transaminating glyoxylate [79]. The authors especially single out a possible link between KAT3/GTL and primary hyperoxaluria. ...
... Glutamine has already been used clinically; for example, it is generally well tolerated in the treatment of sickle cell disease [80]. The possibility that interventions designed to increase kidney glutamine levels would be beneficial in primary hyperoxaluria needs to be evaluated [79]. We also suggest that, given the fact that AGT1 is an asparagine transaminase and that several other aminotransferases are active with asparagine and glyoxylate, strategies designed to increase the in vivo levels of asparagine may also have therapeutic value in some types of hyperoxaluria. ...
Article
The asparaginase II pathway consists of an asparagine transaminase [L-asparagine + α-keto acid ⇆ α-ketosuccinamate + L-amino acid] coupled to ω-amidase [α-ketosuccinamate + H2O → oxaloacetate + NH4⁺]. The net reaction is: L-asparagine + α-keto acid + H2O → oxaloacetate + L-amino acid + NH4⁺. Thus, in the presence of a suitable α-keto acid substrate, the asparaginase II pathway generates anaplerotic oxaloacetate at the expense of readily dispensable asparagine. Several studies have shown that the asparaginase II pathway is important in photorespiration in plants. However, since its discovery in rat tissues in the 1950s, this pathway has been almost completely ignored as a conduit for asparagine metabolism in mammals. Several mammalian transaminases can catalyze transamination of asparagine, one of which – alanine-glyoxylate aminotransferase type 1 (AGT1) – is important in glyoxylate metabolism. Glyoxylate is a precursor of oxalate which, in the form of its calcium salt, is a major contributor to the formation of kidney stones. Thus, transamination of glyoxylate with asparagine may be physiologically important for the removal of potentially toxic glyoxylate. Asparaginase has been the mainstay treatment for certain childhood leukemias. We suggest that an inhibitor of ω-amidase may potentiate the therapeutic benefits of asparaginase treatment.
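The net reaction quoted in this abstract follows from adding the two steps, because α-ketosuccinamate is produced by the transaminase and consumed by ω-amidase. A worked version is sketched below; the glyoxylate line at the end is an illustrative special case that assumes glyoxylate serves as the α-keto acid acceptor and is transaminated to glycine, as described elsewhere on this page.

```latex
\begin{align*}
\text{L-asparagine} + \alpha\text{-keto acid}
  &\rightleftharpoons \alpha\text{-ketosuccinamate} + \text{L-amino acid} \\
\alpha\text{-ketosuccinamate} + \mathrm{H_2O}
  &\rightarrow \text{oxaloacetate} + \mathrm{NH_4^+} \\
\intertext{Adding the two steps and cancelling $\alpha$-ketosuccinamate:}
\text{L-asparagine} + \alpha\text{-keto acid} + \mathrm{H_2O}
  &\rightarrow \text{oxaloacetate} + \text{L-amino acid} + \mathrm{NH_4^+} \\
\intertext{Special case with glyoxylate as the $\alpha$-keto acid (assumed acceptor):}
\text{L-asparagine} + \text{glyoxylate} + \mathrm{H_2O}
  &\rightarrow \text{oxaloacetate} + \text{glycine} + \mathrm{NH_4^+}
\end{align*}
```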
... Highly insoluble calcium oxalate is a major contributor to kidney stones. It has been suggested that glutamine transaminases/KATs may divert glyoxylate to glycine in the kidney thereby lowering the possibility of the conversion of glyoxylate to oxalate [35]. ...
Article
Full-text available
A large literature exists on the biochemistry, chemistry, metabolism, and clinical importance of the α-keto acid analogues of many amino acids. However, although glutamine is the most abundant amino acid in human tissues, and transamination of glutamine to its α-keto acid analogue (α-ketoglutaramate; KGM) was described more than seventy years ago, little information is available on the biological importance of KGM. Herein, we summarize the metabolic importance of KGM as an intermediate in the glutamine transaminase – ω-amidase (GTωA) pathway for the conversion of glutamine to anaplerotic α-ketoglutarate. We describe some properties of KGM, notably its occurrence as a lactam (2-hydroxy-5-oxoproline; 99.7% at pH 7.2), and its presence in normal tissues and body fluids. We note that the concentration of KGM is elevated in the cerebrospinal fluid of liver disease patients and that the urinary KGM/creatinine ratio is elevated in patients with an inborn error of the urea cycle and in patients with citrin deficiency. Recently, of the 607 urinary metabolites measured in a kidney disease study, KGM was noted to be one of five metabolites that was most significantly associated with uromodulin (a potential biomarker for tubular functional mass). Finally, we note that KGM is an intermediate in the breakdown of nicotine in certain organisms and is an important factor in nitrogen homeostasis in some microorganisms and plants. In conclusion, we suggest that biochemists and clinicians should consider KGM as (i) a key intermediate in nitrogen metabolism in all branches of life, and (ii) a biomarker, along with ω-amidase, in several diseases.
... According to previous studies, CCBL2 facilitated the clearance of nephrotoxic substances [26]. The expression of CCBL2 was also decreased in patients with hyperoxaluria [27]. Moreover, CCBL2 expression was positively correlated with the occurrence of hospital-acquired VTE [19]. ...
Article
Full-text available
Objective Cysteine conjugate beta-lyase 2 (CCBL2), also known as kynurenine aminotransferase 3 (KAT3) or glutamine transaminase L (GTL), plays an essential role in transamination and cytochrome P450. Its correlation with some other cancers has been explored, but not yet with breast cancer (BC). Methods The mRNA and protein expression of CCBL2 in BC cell lines and patient samples was detected by RT-qPCR and immunohistochemistry (IHC). BC patients’ clinical information and RNA-Seq expression data were acquired from The Cancer Genome Atlas (TCGA) database. Patients were categorized into high and low CCBL2 expression groups based on the optimal cutoff value (8.973) determined by a receiver operating characteristic (ROC) curve. We investigated the relationship between CCBL2 and clinicopathological characteristics using Chi-square tests, estimated diagnostic capacity using ROC curves, and drew survival curves using the Kaplan–Meier estimator. We compared survival differences using Cox regression and validated the results externally using the Gene Expression Omnibus (GEO) database. We evaluated enriched signaling pathways using gene set enrichment analysis (GSEA), explored the relationship between CCBL2 and relevant genes using the Tumor Immune Estimation Resource (TIMER) database, and used the Human Protein Atlas (HPA) for pan-cancer analysis and IHC. Results CCBL2 was more highly expressed in normal human cell lines and tissues. CCBL2 expression was lower in BC tissues (n = 1104) than in normal tissues (n = 114), a finding validated in the GEO database. Several clinicopathologic features were related to CCBL2, especially estrogen receptor (ER), progesterone receptor (PR) and clinical stage. The low expression group exhibited poor survival. CCBL2’s area under curve (AUC) analysis showed finite diagnostic capacity. Multivariate Cox regression analysis indicated that CCBL2 independently predicted BC survival. GSEA showed enriched pathways: early estrogen response, MYC and so on. CCBL2 positively correlated with estrogen, progesterone and androgen receptors. 
CCBL2 was downregulated in most cancers, including renal and ovarian cancers, and was associated with survival in them. Conclusions Low CCBL2 expression is a promising independent prognostic marker of poor survival in BC.
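The abstract above reports an "optimal cutoff value" taken from a ROC curve but does not say which criterion defined "optimal". One common choice is Youden's J statistic (sensitivity + specificity − 1). The sketch below is illustrative only: the criterion is an assumption, the function name `youden_cutoff` is hypothetical, and the numbers in the usage example are made up and are not data from the study.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A sample is called positive when value >= cutoff.
    `labels` holds 1 for positive cases and 0 for negative cases.
    Youden's J is an assumed criterion; the source does not name one.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")

    best_cutoff, best_j = None, -1.0
    # Every distinct observed value is a candidate threshold on the ROC curve.
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 0)
        sensitivity = tp / positives
        false_positive_rate = fp / negatives  # equals 1 - specificity
        j = sensitivity - false_positive_rate
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


if __name__ == "__main__":
    # Hypothetical expression values and class labels, for illustration only.
    expression = [1, 2, 3, 4, 5, 6]
    is_positive = [0, 0, 0, 1, 1, 1]
    print(youden_cutoff(expression, is_positive))
```

When the marker is lower in the positive class, as the abstract reports for CCBL2 in tumor tissue, the values can be negated (and the returned cutoff negated back) before calling the function.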
... Disrupted oxalate metabolism has been implicated in hyperoxaluria and accordingly, an enzymatic disturbance in oxalate biosynthesis can result in the primary hyperoxaluria. The review article entitled "Metabolism of Oxalate in Humans: A Potential Role Kynurenine Aminotransferase/Glutamine Transaminase/Cysteine Conjugate Beta-lyase Plays in Hyperoxaluria" by Professor Dr. J. Y Li et al. [10], reviewed all aminotransferases in the liver for their functions in glyoxylate metabolism. Particularly, kynurenine aminotransferase-I and III were carefully reviewed regarding their biochemical and structural characteristics, cellular localization, and enzyme inhibition. ...
Article
Full-text available
Oxalate is a metabolic end-product whose systemic concentrations are highly variable among individuals. Genetic (primary hyperoxaluria) and non-genetic (e.g., diet, microbiota, renal and metabolic disease) reasons underlie elevated plasma concentrations and tissue accumulation of oxalate, which is toxic to the body. A classic example is the triad of primary hyperoxaluria, nephrolithiasis, and kidney injury. Lessons learned from this example suggest further investigation of other putative factors associated with oxalate dysmetabolism, namely the identification of precursors (glyoxylate, aromatic amino acids, glyoxal and vitamin C), the regulation of the endogenous pathways that produce oxalate, or the microbiota’s contribution to oxalate systemic availability. The association between secondary nephrolithiasis and cardiovascular and metabolic diseases (hypertension, type 2 diabetes, and obesity) inspired the authors to perform this comprehensive review about oxalate dysmetabolism and its relation to cardiometabolic toxicity. This perspective may offer something substantial that helps advance understanding of effective management and draws attention to the novel class of treatments available in clinical practice.
Article
Kynurenic acid (KYNA), an unavoidable tryptophan metabolite formed during fermentation, is naturally present alongside alcohol in all alcoholic beverages. Thus, alcohol drinking inevitably results in co-intake of KYNA. The effects of alcohol or KYNA on human health have been widely studied. However, their combined effects remain unknown. Here we report that alcohol and KYNA have a synergistic impact on global gene expression, especially on gene sets related to tryptophan metabolism and cell signaling. Adult mice were exposed to alcohol (ethanol) and/or KYNA daily for a week. Transcriptomes of the brain, kidney and liver were profiled via bulk RNA sequencing. Results indicate that while KYNA alone largely promotes, and alcohol alone mostly inhibits, gene expression, co-administration of alcohol and KYNA inhibits global gene expression more strongly. Tryptophan metabolism is severely skewed towards the kynurenine pathway by a decrease in tryptophan hydroxylase 2 and an increase in tryptophan dioxygenase. Quantification of tryptophan metabolic enzymes corroborates the transcriptional changes of these enzymes. Furthermore, the co-administration greatly enhances the GnRH signaling pathway. This research provides critical data to better understand the combined effects of alcohol and KYNA on human health. other version: 10.2139/ssrn.4109815
Article
Full-text available
Schizophrenia (SCZ) is a devastating genetic mental disorder. Identification of the SCZ risk genes in brains is helpful to understand this disease. Thus, we first used the minimum Redundancy-Maximum Relevance (mRMR) approach to integrate the genome-wide sequence analysis results on SCZ and the expression quantitative trait locus (eQTL) data from ten brain tissues to identify the genes related to SCZ. Second, we adopted the variance inflation factor regression algorithm to identify their interacting genes in brains. Third, using multiple analysis methods, we explored and validated their roles. By means of the aforementioned procedures, we have found that (1) the cerebellum may play a crucial role in the pathogenesis of SCZ and (2) ITIH4 may be utilized as a clinical biomarker for the diagnosis of SCZ. These interesting findings may stimulate novel strategy for developing new drugs against SCZ. It has not escaped our notice that the approach reported here is of use for studying many other genome diseases as well.
Article
Full-text available
Meiotic recombination is caused by meiotic double-strand DNA breaks. In some regions the frequency of DNA recombination is relatively high, while in other regions it is lower: the former are usually called “recombination hotspots” and the latter “recombination coldspots”. Information about the hot and cold spots may provide important clues for understanding the mechanism of genome evolution. Therefore, it is important to predict these spots accurately. In this study, we rebuilt the benchmark dataset by unifying its samples to the same length (131 bp). On this foundation, and using an SVM (Support Vector Machine) classifier, a new predictor called “iRSpot-Pse6NC” was developed by incorporating the key hexamer features into the general PseKNC (Pseudo K-tuple Nucleotide Composition) via the binomial distribution approach. Rigorous cross-validations showed that the proposed predictor is superior to its counterparts in overall accuracy, stability, sensitivity and specificity. For the convenience of most experimental scientists, the web-server for iRSpot-Pse6NC has been established at http://lin-group.cn/server/iRSpot-Pse6NC, by which users can easily obtain their desired result without the need to go through the detailed mathematical equations involved.
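To make the "hexamer features" mentioned in this abstract concrete, the sketch below computes a plain 6-mer frequency vector for a DNA sequence. It is a simplified illustration, not the iRSpot-Pse6NC method: it omits the pseudo components of PseKNC, the binomial-distribution feature selection, and the SVM classifier. The function name `hexamer_frequencies` and the example sequence are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"
K = 6  # hexamers, as named in the abstract

# All 4**6 = 4096 possible hexamers in a fixed order, with a lookup table.
HEXAMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(HEXAMERS)}


def hexamer_frequencies(seq):
    """Return a 4096-long list of normalized hexamer frequencies for `seq`.

    Windows containing characters outside A/C/G/T (for example N) are skipped.
    """
    seq = seq.upper()
    counts = [0] * len(HEXAMERS)
    total = 0
    for start in range(len(seq) - K + 1):
        idx = INDEX.get(seq[start:start + K])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(HEXAMERS)
    return [c / total for c in counts]


if __name__ == "__main__":
    # Hypothetical 131-bp sequence, matching the unified sample length.
    example = "ACGT" * 32 + "ACG"
    features = hexamer_frequencies(example)
    print(len(example), len(features))
```

A vector like this could then be passed to any classifier; the choice of classifier and feature selection in the original work is described only at the level of detail given in the abstract.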
Article
Full-text available
RNA modifications are additions of chemical groups to nucleotides or their local structural changes. Knowledge about the occurrence sites of these modifications is essential for in-depth understanding of the biological functions and mechanisms and for treating some genomic diseases as well. With the avalanche of RNA sequences generated in the post-genomic age, many computational methods have been proposed for identifying various types of RNA modifications one by one. However, so far no method whatsoever has been developed for simultaneously identifying several different types of RNA modifications. To address such a challenge, we developed a predictor called “iRNA-3typeA,” by which we can simultaneously identify the occurrence sites of the following three most frequently observed modifications in RNA: (1) N¹-methyladenosine (m¹A), (2) N⁶-methyladenosine (m⁶A), and (3) adenosine to inosine (A-to-I). It has been shown via rigorous cross-validations for the RNA sequences from Homo sapiens and Mus musculus transcriptomes that the success rates achieved by the powerful new predictor are quite high. For the convenience of broad experimental scientists, a user-friendly web server for iRNA-3typeA has been established at http://lin-group.cn/server/iRNA-3typeA/. It is anticipated that iRNA-3typeA may become a useful high throughput tool for genome analysis.
Article
Full-text available
The massive technical and computational progress of biomolecular crystallography has generated some adverse side effects. Most crystal structure models, produced by crystallographers or well-trained structural biologists, constitute useful sources of information, but occasional extreme outliers remind us that the process of structure determination is not fail-safe. The occurrence of severe errors or gross misinterpretations raises fundamental questions: Why do such aberrations emerge in the first place? How did they evade the sophisticated validation procedures which often produce clear and dire warnings, and why were severe errors not noticed by the depositors themselves, their supervisors, referees, and editors? Once detected, what can be done to either correct, improve, or eliminate such models? How do incorrect models affect the underlying claims or biomedical hypotheses they were intended, but failed, to support? What is the long-range effect of the propagation of such errors? And finally, what mechanisms can be envisioned to restore the validity of the scientific record and, if necessary, retract publications that are clearly invalidated by the lack of experimental evidence? We suggest that cognitive bias and flawed epistemology are likely at the root of the problem. By using examples from the published literature and from public repositories such as the Protein Data Bank, we provide case summaries to guide correction or improvement of structural models. When strong claims are unsustainable because of a deficient crystallographic model, removal of such a model and even retraction of the affected publication are necessary to restore the integrity of the scientific record.
Article
Full-text available
Primary hyperoxaluria type I (PH1) is a rare disease caused by the deficit of liver alanine–glyoxylate aminotransferase (AGT). AGT prevents oxalate formation by converting peroxisomal glyoxylate to glycine. When the enzyme is deficient, progressive calcium oxalate stones deposit first in the urinary tract and then at the systemic level. Pyridoxal 5′-phosphate (PLP), the AGT coenzyme, exerts a chaperone role by promoting dimerization, as demonstrated by studies at protein and cellular level. Thus, variants showing a destabilized dimeric structure should, in principle, be responsive to vitamin B6, a precursor of PLP. However, models to predict the extent of responsiveness of each variant are missing. We examined the effects of pathogenic interfacial mutations by combining bioinformatic predictions with molecular and cellular studies on selected variants (R36H, G42E, I56N, G63R, and G216R), in both their holo- (i.e., with bound PLP) and apo- (i.e., without bound PLP) form. We found that all variants displayed structural alterations mainly related to the apoform and consisting of an altered tertiary and quaternary structure. G216R also shows a strongly reduced catalytic efficiency. Moreover, all but G216R respond to vitamin B6, as shown by their increased specific activity and expression level in a cellular disease model. A global analysis of data unraveled a possible inverse correlation between the degree of destabilization/misfolding induced by a mutation and the extent of B6 responsiveness. These results provide a first explanation of factors influencing B6 response in PH1, a model possibly valuable for other rare diseases caused by protein deficits.
Article
Full-text available
A two-level principal component predictor (2L-PCA) was proposed based on the principal component analysis (PCA) approach. It can be used to quantitatively analyze various compounds and peptides about their functions or potentials to become useful drugs. One level is for dealing with the physicochemical properties of drug molecules, while the other level is for dealing with their structural fragments. The predictor has the self-learning and feedback features to automatically improve its accuracy. It is anticipated that 2L-PCA will become a very useful tool for timely providing various useful clues during the process of drug development.
Article
Full-text available
Alzheimer's disease (AD), a neurodegenerative disorder, is one of the most devastating diseases. Acetylcholinesterase (AChE) is considered an important target for treating AD, and acetylcholinesterase inhibitors (AChEI) are considered to be among the effective drugs for its treatment. The aim of this study is to find a novel potential AChEI as a drug for the treatment of AD. Instead of using synthetic compounds, we used compounds extracted from plants and investigated the interaction between floribundiquinone B (FB) and AChE by means of both experimental approaches, such as fluorescence spectra, ultraviolet-visible (UV-vis) absorption spectrometry and circular dichroism (CD), and theoretical approaches, such as molecular docking. The findings reported here provide many useful clues for designing more effective and less toxic drugs against Alzheimer's disease.
Article
Full-text available
Primary hyperoxalurias (PH) are devastating, autosomal recessive diseases causing renal stones. Undifferentiated hyperoxaluria is seen in up to 43% of Pakistani paediatric stone patients. High rates of consanguinity in Pakistan suggest significant local prevalence. There is no detailed information regarding number of cases, clinical features, and genetics in Pakistan-origin (P-o) patients. We reviewed available information on P-o PH patients recorded in the literature as well as from two major PH registries, the Rare Kidney Stone Consortium PH Registry (RKSCPHR) and the OxalEurope PH Registry (OxER), and from the Aga Khan University Hospital in Pakistan. After excluding overlaps, we noted 217 P-o PH subjects (42 in OxER and 4 in RKSCPHR). Presentations were protean. Details of mutations were available for 94 of the 201 patients who had genetic analyses. Unique mutations were noted. Mutation [c.508G>A (p. Gly170Arg)] (present in up to 25% in the West) was reported in only one case. In one series, only 30% had mutations on exons 1, 4, and 7 of AGXT. Of 42 P-o patients in OxER, 52.4% were PH1, 45.2% PH2, and 2.4% PH3. Of concern is that diagnosis was made after renal transplant rejection (four cases) and on bone-marrow aspiration (five cases). Lack of consideration of PH as a diagnosis, late diagnosis, and loss of transplanted kidneys mandate that PH be searched for diligently. Mutation analysis will need to extend to all exons and include PH types 1, 2, and 3. There is a need to spread awareness and identify patients through a scoring or screening system that alerts physicians to consider a diagnosis of PH.
Article
Full-text available
Occurring at cytosine (C) of RNA, 5-methylcytosine (m5C) is an important post-transcriptional modification (PTCM). The modification plays significant roles in biological processes by regulating RNA metabolism in both eukaryotes and prokaryotes. It may also, however, cause cancers and other major diseases. Given an uncharacterized RNA sequence that contains many C residues, can we identify which one of them can be of m5C modification, and which one cannot? It is no doubt a crucial problem, particularly with the explosive growth of RNA sequences in the postgenomic age. Unfortunately, so far no user-friendly web-server whatsoever has been developed to address such a problem. To meet the increasingly high demand from most experimental scientists working in the area of drug development, we have developed a new predictor called iRNAm5C-PseDNC by incorporating ten types of physical-chemical properties into pseudo dinucleotide composition via the auto/cross-covariance approach. Rigorous jackknife tests show that its anticipated accuracy is quite high. For most experimental scientists' convenience, a user-friendly web-server for the predictor has been provided at http://www.jci-bioinfo.cn/iRNAm5C-PseDNC along with a step-by-step user guide, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the approach presented here can also be used to deal with many other problems in genome analysis.
Article
Full-text available
Disrupted kynurenine pathway (KP) metabolism has been implicated in the progression of neurodegenerative disease, psychiatric disorders and cancer. Modulation of enzyme activity along this pathway may therefore offer potential new therapeutic strategies for these conditions. Considering their prominent positions in the KP, the enzymes indoleamine 2,3-dioxygenase, kynurenine 3-monooxygenase and kynurenine aminotransferase, appear the most attractive targets. Already, increasing interest in this pathway has led to the identification of a number of potent and selective enzyme inhibitors with promising pre-clinical data and the elucidation of several enzyme crystal structures provides scope to rationalize the molecular mechanisms of inhibitor activity. The field seems poised to yield one or more inhibitors that should find clinical utility.
Article
Full-text available
Summary: Evolutionary information in the form of a Position-Specific Scoring Matrix (PSSM) is a widely used and highly informative representation of protein sequences. Accordingly, PSSM-based feature descriptors have been successfully applied to improve the performance of various predictors of protein attributes. Even though a number of algorithms have been proposed in previous studies, there is currently no universal web server or toolkit available for generating this wide variety of descriptors. Here, we present POSSUM (Position-Specific Scoring matrix-based feature generator for machine learning), a versatile toolkit with an online web server that can generate 21 types of PSSM-based feature descriptors, thereby addressing a crucial need for bioinformaticians and computational biologists. We envisage that this comprehensive toolkit will be widely used as a powerful tool to facilitate feature extraction, selection, and benchmarking of machine learning-based models, thereby contributing to a more effective analysis and modeling pipeline for bioinformatics research. Availability and implementation: http://possum.erc.monash.edu/. Contact: trevor.lithgow@monash.edu or jiangning.song@monash.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Full-text available
Pse-in-One 2.0 is a package of web-servers evolved from Pse-in-One (Liu, B., Liu, F., Wang, X., Chen, J. Fang, L. & Chou, K.C. Nucleic Acids Research, 2015, 43:W65-W71). To make it more flexible and comprehensive, as suggested by many users, the updated package incorporates 23 new pseudo component modes as well as a series of new feature analysis approaches. It is available at http://bioinformatics.hitsz.edu.cn/Pse-in-One2.0/. Moreover, for the convenience of users, a stand-alone version called “Pse-in-One-Analysis” is also provided, by which users can significantly speed up the analysis of massive sequences.
Article
Full-text available
Toxicity evaluation is an extremely important process during drug development. It is usually initiated by experiments on animals, which are time-consuming and costly. To speed up this process, a quantitative structure-activity relationship (QSAR) study was performed to develop a computational model correlating the structures of 581 aromatic compounds with their aquatic toxicity to Tetrahymena pyriformis. A set of 68 molecular descriptors derived solely from the structures of the aromatic compounds was calculated with Gaussian 03, HyperChem 7.5, and TSAR V3.3. A comprehensive feature selection method, the minimum Redundancy Maximum Relevance (mRMR)-genetic algorithm (GA)-support vector regression (SVR) method, was applied to select the best descriptor subset in the QSAR analysis. The SVR method was employed to model toxicity potency from a training set of 500 compounds. Five-fold cross-validation was used to optimize the parameters of the SVR model. The new SVR model was tested on an independent dataset of 81 compounds. Both high internal consistency and high external predictive rates were obtained, indicating that the SVR model is a promising tool for rapid toxicity detection.
Article
As a subset of glycosyltransferases, the family of sialyltransferases catalyzes the transfer of sialic acid (Sia) residues to terminal non-reducing positions on oligosaccharide chains of glycoproteins and glycolipids, utilizing CMP-Neu5Ac as the activated sugar nucleotide donor. Of the four known sialyltransferase families (ST3Gal, ST6Gal, ST6GalNAc and ST8Sia), the ST8Sia family catalyzes synthesis of α2,8-linked sialic/polysialic acid (polySia) chains according to their acceptor specificity. We have determined 3D structural models of the ST8Sia family members, designated ST8Sia I(1), II(2), IV(4), V(5), and VI(6), using the Phyre2 server. The accuracy of these predicted models is based on the ST8Sia III crystal structure as the calculated template. The common structural features of these models are: (1) Their parallel β-sheets and disulfide bonds are buried within the enzymes and are predominantly surrounded by helices; (2) The anti-parallel β-sheets are located at the N-terminal region of the enzymes; (3) The mono-sialyltransferases (mono-STs), ST8Sia I and VI, contain only a single pair of disulfide bonds, and there are no anti-parallel β-sheets in ST8Sia VI; (4) The N-terminal regions of all of the mono-STs are located some distance away from their core structure; (5) These conformational features show that the 3D structures of the mono-STs are less tightly packed than those of the two polySTs, ST8Sia II and IV, and the oligo-ST, ST8Sia III. These structural features relate to the catalytic specificity of the mono-STs; (6) In contrast, the tighter structural features of ST8Sia II, IV and III relate to their ability to catalyze the processive synthesis of oligo- (ST8Sia III) and polySia chains (ST8Sia II & IV); (7) Although ST8Sia II, III and IV have similar conformations in their corresponding polysialyltransferase domain (PSTD) and polybasic region (PBR) motifs, the structure of ST8Sia III is looser than that of ST8Sia II and IV, and the amino acid components of the several three-residue loops in these two motifs of ST8Sia III differ from those in ST8Sia II and IV. This is likely the structural basis for why ST8Sia III is an oligoST and not able to polysialylate; and (8) In contrast, essentially all amino acids within the three-residue loops in the PSTD of ST8Sia II and IV are highly conserved, and many amino acids in the loops and the helices of these two motifs are critical for NCAM polysialylation, as shown by mutational analysis and confirmed by our recent NMR results. In summary, these new findings provide further insights into the molecular mechanisms underlying polySTs-NCAM recognition, polySTs-polySA/oligoSA interactions, and polysialylation of NCAM.
Article
Involved in important cellular or gene functions and implicated in many kinds of cancers, piRNAs, or piwi-interacting RNAs, are small non-coding RNAs of around 19-33 nucleotides in length. Given a small non-coding RNA molecule, can we predict whether it is a piRNA according to its sequence information alone? Furthermore, there are two types of piRNA: one has the function of instructing target mRNA deadenylation, and the other does not. Can we discriminate one from the other? With the avalanche of RNA sequences emerging in the postgenomic age, it is urgent to address the two problems for both basic research and drug development. Unfortunately, to the best of our knowledge, no computational method exists so far that can deal with the second problem, let alone with the two problems together. Here, by incorporating the physicochemical properties of nucleotides into the pseudo K-tuple nucleotide composition (PseKNC), we propose a powerful predictor called 2L-piRNA. It is a two-layer ensemble classifier, in which the 1st layer identifies whether a query RNA molecule is a piRNA or a non-piRNA, and the 2nd layer identifies whether a piRNA has the function of instructing target mRNA deadenylation or not. Rigorous cross-validations have indicated that the success rates achieved by the proposed predictor are quite high. For the convenience of most biologists and drug development scientists, the web-server for 2L-piRNA has been established at http://bioinformatics.hitsz.edu.cn/2L-piRNA/, by which users can easily get their desired results without the need to go through the mathematical details.
Article
As recommended by the World Health Organization (WHO), drug compounds have been classified into 14 main ATC (Anatomical Therapeutic Chemical) classes according to their therapeutic and chemical characteristics. Given an uncharacterized compound, can we develop a computational method to quickly identify which ATC class or classes it belongs to? The information thus obtained will help adjust our focus and selection in a timely manner, significantly speeding up the drug development process. But this problem is by no means an easy one, since some drug compounds may belong to two or more ATC classes. To address this problem, using the DO (Drug Ontology) approach based on the ChEBI (Chemical Entities of Biological Interest) database, we developed a predictor called iATC-mDO. Subsequently, hybridizing it with an existing drug ATC classifier, we constructed a predictor called iATC-mHyb. Rigorous cross-validation, assessed with five different metrics, has demonstrated that iATC-mHyb is remarkably superior to the best existing predictor in identifying the ATC classes of drug compounds. For the convenience of most experimental scientists, a user-friendly web-server for iATC-mHyb has been established at http://www.jci-bioinfo.cn/iATC-mHyb, by which users can easily get their desired results without the need to go through the complicated mathematical equations involved.
Article
There are many different types of RNA modifications, which are essential for numerous biological processes. Knowledge about the occurrence sites of RNA modifications in its sequence is a key for in-depth understanding their biological functions and mechanism. Unfortunately, it is both time-consuming and laborious to determine these sites purely by experiments alone. Although some computational methods were developed in this regard, they each could only be used to deal with some type of modification individually. To our best knowledge, so far no method whatsoever has been developed that can identify the occurrence sites for several different types of RNA modifications with one seamless package or platform. To address such a challenge, a novel platform called “iRNA-PseColl” has been developed. It was formed by incorporating both the individual and collective features of the sequence elements into the general pseudo K-tuple nucleotide composition (PseKNC) of RNA via the chemicophysical properties and density distribution of its constituent nucleotides. Rigorous cross-validations have indicated that the anticipated success rates achieved by the proposed platform are quite high. To maximize the convenience for most experimental biologists, the platform’s web-server has been provided at http://lin.uestc.edu.cn/server/iRNA-PseColl along with a step-by-step user guide, by which users can easily get their desired results without the need to go through the mathematical details involved in this paper.
Article
Exploring the function of interleukin (IL) 17 and related cytokine interactions has proven useful toward understanding the role of inflammation in autoimmune diseases. Production of the inflammatory cytokine IL-23 by dendritic cells (DCs) has been shown to promote IL-17 expression by Th17 cells. It is well established that Th17 cells play an important role in several autoimmune diseases including psoriasis and alopecia. Our recent investigations have suggested that a kynurenine-rich environment can shift a pro-inflammatory response to an anti-inflammatory response, as is the case in the presence of the enzyme indoleamine 2,3-dioxygenase (IDO), the rate-limiting enzyme in tryptophan degradation and kynurenine (Kyn) production. In this study, we sought to explore the potential role of kynurenic acid (KynA) in modulating the expression of IL-23 and IL-17 by DCs and CD4⁺ cells, respectively. Flow cytometry demonstrated that the frequency of IL-23-producing DCs is reduced with 100 μg/ml of KynA as compared with that of LPS-stimulated DCs. Addition of KynA (100 μg/ml) to activated T cells significantly decreased the level of IL-17 mRNA and the frequency of IL-17⁺ T cells as compared to that of concanavalin (Con) A-activated T cells. To examine the mechanism of the suppressive role of KynA on IL-23/IL-17 in these cells, cells were treated with a 3 μM G-protein-coupled receptor 35 (GPCR35) inhibitor (CID) for 60 min. The result showed that the reduction of both adenylate cyclase (AC) and cyclic adenosine monophosphate (cAMP) by KynA is involved in suppression of LPS-induced IL-23p19 expression. Since GPCR35 is also detected on T cells, it is concluded that KynA plays an important role in modulating the expression of IL-23 and IL-17 in DCs and Th17 cells through inhibiting GPCR35 and downregulating both AC and cAMP.
Article
In this study we report two high-resolution structures of the pyridoxal 5' phosphate (PLP)-dependent enzyme kynurenine aminotransferase-I (KAT-I). One is the native structure, with the cofactor in the PLP form bound to Lys247, at 1.28 Å, the highest resolution yet available for KAT-I; the other has the general PLP-dependent aminotransferase inhibitor aminooxyacetate (AOAA) covalently bound to the cofactor, at 1.54 Å. Only small conformational differences are observed in the vicinity of the aldimine (oxime) linkage with which the PLP forms the Schiff base with Lys247 in the 1.28 Å resolution native structure, in comparison to other native PLP-bound structures. We also report the inhibition of KAT-I by AOAA and aminooxy-phenylpropionic acid (AOPP), with IC50 values of 13.1 and 5.7 μM, respectively. The crystal structure of the enzyme in complex with the inhibitor AOAA revealed that the cofactor is in the PLP form with the external aldimine linkage. The location of this oxime with the PLP, which forms in place of the native internal aldimine linkage of PLP of the native KAT-I, is away from the position of the native internal aldimine, with the free Lys247 substantially retaining the orientation of the native structure. Tyr101, at the active site, was observed in two conformations in both structures.
Article
To expedite genome/proteome analysis, we have developed a Python package called Pse-Analysis. The package can automatically complete the following five procedures: (1) sample feature extraction, (2) optimal parameter selection, (3) model training, (4) cross validation, and (5) evaluating prediction quality. All a user needs to do is input a benchmark dataset along with the query biological sequences concerned. Based on the benchmark dataset, Pse-Analysis will automatically construct an ideal predictor and then yield the predicted results for the submitted query samples. All the aforementioned tedious jobs are done automatically by the computer. Moreover, the multiprocessing technique was adopted to enhance computational speed about 6-fold. The Pse-Analysis Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/Pse-Analysis/, and can be directly run on Windows, Linux, and Unix.
Article
As the most abundant RNA modification, pseudouridine plays important roles in many biological processes. Occurring at the uridine site and catalyzed by pseudouridine synthase, the modification has been observed in nearly all kinds of RNA, including transfer RNA, messenger RNA, small nuclear or nucleolar RNA, and ribosomal RNA. Accordingly, its importance to basic research and drug development is self-evident. Although some experimental technologies have been developed to detect pseudouridine sites, they are both time-consuming and expensive. Facing the explosive growth of RNA sequences in the postgenomic age, we are challenged to address the problem by computational approaches: For an uncharacterized RNA sequence, can we predict which of its uridine sites can be modified as pseudouridine and which ones cannot? Here a predictor called “iRNA-PseU” was proposed by incorporating the chemical properties of nucleotides and their occurrence frequency density distributions into the general form of pseudo nucleotide composition (PseKNC). It has been demonstrated via the rigorous jackknife test, independent dataset test, and practical genome-wide analysis that the proposed predictor remarkably outperforms its counterpart. For the convenience of most experimental scientists, the web-server for iRNA-PseU was established at http://lin.uestc.edu.cn/server/iRNA-PseU, by which users can easily get their desired results without the need to go through the mathematical details.
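The combination of nucleotide chemical properties with an occurrence-density term can be sketched as follows. The abstract does not give the encoding itself, so the three-bit code used here (ring structure, functional group, hydrogen bonding) and the running-density definition are assumptions based on how such encodings are commonly described; `encode_rna` is a hypothetical name, not part of the iRNA-PseU server.

```python
from typing import List

# Assumed chemical-property code per nucleotide:
# (ring structure: purine=1, functional group: amino=1, hydrogen bonding: weak=1)
CHEMICAL_CODE = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_rna(seq: str) -> List[List[float]]:
    """Encode each nucleotide as three property bits plus a density value.

    The density at position i is the number of times the current
    nucleotide has occurred in seq[0..i] divided by (i + 1).
    """
    counts = {nt: 0 for nt in CHEMICAL_CODE}
    encoded = []
    for position, nt in enumerate(seq.upper(), start=1):
        if nt not in CHEMICAL_CODE:
            raise ValueError(f"unexpected nucleotide {nt!r}")
        counts[nt] += 1
        density = counts[nt] / position
        encoded.append([float(bit) for bit in CHEMICAL_CODE[nt]] + [density])
    return encoded


features = encode_rna("AUAU")
for row in features:
    print(row)
```

Each position yields four numbers, so a fixed-length window around a candidate uridine site becomes a fixed-length feature vector that can be passed to a classifier.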
Article
Catalyzed by adenosine deaminase (ADAR), the adenosine to inosine (A-to-I) editing in RNA is not only involved in various important biological processes, but also closely associated with a series of major diseases. Therefore, knowledge about the A-to-I editing sites in RNA is crucially important for both basic research and drug development. Given an uncharacterized RNA sequence that contains many adenosine (A) residues, can we identify which one of them can be of A-to-I editing, and which one cannot? Unfortunately, so far no computational method whatsoever has been developed to address such an important problem based on the RNA sequence information alone. To fill this empty area, we have proposed a predictor called iRNA-AI by incorporating the chemical properties of nucleotides and their sliding occurrence density distribution along a RNA sequence into the general form of pseudo nucleotide composition (PseKNC). It has been shown by the rigorous jackknife test and independent dataset test that the performance of the proposed predictor is quite promising. For the convenience of most experimental scientists, a user-friendly web-server for iRNA-AI has been established at http://lin.uestc.edu.cn/server/iRNA-AI/, by which users can easily get their desired results without the need to go through the mathematical details.
Article
As a prevalent post-transcriptional modification, N6-methyladenosine (m6A) plays key roles in a series of biological processes. Although experimental technologies have been developed and applied to identify m6A sites, they are still cost-ineffective for transcriptome-wide detection of m6A. As good complements to the experimental techniques, some computational methods have been proposed to identify m6A sites. However, their performance remains unsatisfactory. In this study, we first proposed a Euclidean distance-based method to construct a high-quality benchmark dataset. By encoding the RNA sequences using pseudo nucleotide composition, a new predictor called iRNA(m6A)-PseDNC was developed to identify m6A sites in the Saccharomyces cerevisiae genome. It has been demonstrated by 10-fold cross-validation tests that the performance of iRNA(m6A)-PseDNC is superior to that of the existing methods. Meanwhile, for the convenience of most experimental scientists, its web-server has been established at http://lin-group.cn/server/iRNA(m6A)-PseDNC.php, by which users can easily get their desired results without the need to go through the detailed mathematics. It is anticipated that iRNA(m6A)-PseDNC will become a useful high-throughput tool for identifying m6A sites in the S. cerevisiae genome.
Article
One of the hottest topics in molecular cell biology is to determine the subcellular localization of proteins from various different organisms. This is because it is crucially important for both basic research and drug development. Recently, a predictor called "pLoc-mGneg" was developed for identifying the subcellular localization of Gram-negative bacterial proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, called "multiplex proteins", may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mGneg was trained by an extremely skewed dataset in which some subset (subcellular location) was about 5 to 70 times the size of the other subsets. Accordingly, it cannot avoid the biased consequence caused by such an uneven training dataset. To alleviate such a consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mGneg by quasi-balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mGneg, the existing state-of-the-art predictor in identifying the subcellular localization of Gram-negative bacterial proteins. To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mGneg/, by which users can easily get their desired results without the need to go through the detailed mathematics.
Article
A cell contains numerous protein molecules. One of the fundamental goals in molecular cell biology is to determine their subcellular locations since this information is extremely important to both basic research and drug development. In this paper, we report a novel and very powerful predictor called "pLoc_bal-mHum" for predicting the subcellular localization of human proteins based on their sequence information alone. Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the new predictor is remarkably superior to the existing state-of-the-art predictor in identifying the subcellular localization of human proteins. To maximize the convenience for the majority of experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mHum/, by which users can easily get their desired results without the need to go through the detailed mathematics.
Article
Motivation A cell contains numerous protein molecules. One of the fundamental goals in cell biology is to determine their subcellular locations, which can provide useful clues about their functions. Knowledge of protein subcellular localization is also indispensable for prioritizing and selecting the right targets for drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desired to develop computational tools for timely and effectively identifying their subcellular localization based on the sequence information alone. Recently, a predictor called “pLoc-mAnimal” was developed for identifying the subcellular localization of animal proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with the multi-label systems in which some proteins, called “multiplex proteins”, may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mAnimal was trained by an extremely skewed dataset in which some subset (subcellular location) was about 128 times the size of the other subsets. Accordingly, such an uneven training dataset will inevitably cause a biased consequence. Results To alleviate such biased consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mAnimal by quasi-balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mAnimal, the existing state-of-the-art predictor, in identifying the subcellular localization of animal proteins. 
Availability To maximize the convenience for the vast majority of experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mAnimal/, by which users can easily get their desired results without the need to go through the complicated mathematics. Supplementary information Supplementary data are available at Bioinformatics online.
Article
Knowledge of protein subcellular localization is vitally important for both basic research and drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desirable to develop computational tools for timely and effective identification of their subcellular localization based on the sequence information alone. Recently, a predictor called "pLoc-mGpos" was developed for identifying the subcellular localization of Gram-positive bacterial proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, called "multiplex proteins", may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mGpos was trained by an extremely skewed dataset in which some subset (subcellular location) was over 11 times the size of the other subsets. Accordingly, it cannot avoid the biased consequence caused by such an uneven training dataset. To alleviate such a biased consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mGpos by quasi-balancing the training dataset. Rigorous target jackknife tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mGpos, the existing state-of-the-art predictor in identifying the subcellular localization of Gram-positive bacterial proteins. To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mGpos/, by which users can easily get their desired results without the need to go through the detailed mathematics.
Article
This study examines an accurate and efficient computational method for identifying 5-methylcytosine sites in RNA. The occurrence of 5-methylcytosine (m5C) plays a vital role in a number of biological processes. To better understand its biological functions and mechanism, it is necessary to recognize m5C sites in RNA precisely. Laboratory techniques and procedures are available to identify m5C sites in RNA, but they require a lot of time and resources. This study develops a new computational method for extracting the features of an RNA sequence. In this method, the RNA sequence is first encoded as a composite feature vector; then the minimum-redundancy-maximum-relevance algorithm is used to select discriminative features. Classification is then performed with a support vector machine and evaluated by the jackknife cross-validation test. The suggested method efficiently distinguishes m5C sites from non-m5C sites, achieving 93.33% with a sensitivity of 90.0 and a specificity of 96.66 on benchmark datasets. The results show that the proposed algorithm achieves significantly better identification performance than the existing computational techniques. This study extends the knowledge about the occurrence sites of RNA modification, which paves the way for a better comprehension of its biological uses and mechanism.
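The jackknife (leave-one-out) test mentioned here can be sketched in a few lines. Each sample is held out once, the model is built on the remaining samples, and the held-out sample is predicted. To keep the example dependency-free, a 1-nearest-neighbour rule stands in for the support vector machine used in the study; the data and function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, int]


def nearest_neighbor_predict(train: List[Sample], query: Vector) -> int:
    """Return the label of the training sample closest to `query`."""
    def squared_distance(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(train, key=lambda item: squared_distance(item[0], query))[1]


def jackknife_accuracy(
    data: List[Sample],
    predict: Callable[[List[Sample], Vector], int] = nearest_neighbor_predict,
) -> float:
    """Leave each sample out once and predict it from all the others."""
    if len(data) < 2:
        raise ValueError("need at least two samples")
    correct = 0
    for i, (features, label) in enumerate(data):
        training = data[:i] + data[i + 1:]  # everything except sample i
        if predict(training, features) == label:
            correct += 1
    return correct / len(data)


# Hypothetical two-class toy data: label 1 = modified site, 0 = unmodified.
toy = [
    ((0.0, 0.1), 0), ((0.2, 0.0), 0), ((0.1, 0.2), 0),
    ((1.0, 0.9), 1), ((0.9, 1.1), 1), ((1.1, 1.0), 1),
]
accuracy = jackknife_accuracy(toy)
print(accuracy)
```

Unlike k-fold cross-validation with random splits, this procedure gives the same result on every run for a given dataset, at the cost of training the model once per sample.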
Article
Among all the post-translational modifications (PTMs) of proteins, phosphorylation is known to be the most important and most frequently occurring PTM in eukaryotes and prokaryotes. It is an important regulatory mechanism required in most pathological and physiological processes, including neural activity and cell signal transduction. Threonine phosphorylation modifies threonine by adding a phosphoryl group to the polar side chain, generating phosphothreonine sites. The investigation and prediction of phosphorylation sites is important, and various methods have been developed based on high-throughput mass spectrometry, but such experiments are time-consuming and laborious. Therefore, an efficient and accurate novel method is proposed in this study for the prediction of phosphothreonine sites. The proposed method uses context-based data to calculate statistical moments. Position-relative statistical moments are combined to train neural networks. Using 10-fold cross-validation, an accuracy of 94.97% was obtained, whereas for jackknife testing, an accuracy of 96% was obtained. The overall accuracy of the system is 94.4%, with a sensitivity of 94% and a specificity of 94.6%. These results suggest that the proposed method may usefully complement the other existing methods for phosphothreonine site prediction.
Article
Motivation: DNA replication is the key to genetic information transmission, and it is initiated from the replication origins. Identifying the replication origins is crucial for understanding the mechanism of DNA replication. Although several discriminative computational predictors were proposed to identify DNA replication origins of yeast species, they could only be used to identify very tiny parts (250 bp or 300 bp) of the replication origins. Besides, none of the existing predictors could successfully capture the "GC asymmetry bias" of yeast species reported by experimental observations. Hence it is not surprising that their power is so limited. To capture the GC asymmetry feature and make the prediction able to cover the entire replication regions of yeast species, we developed a new predictor called "iRO-3wPseKNC". Results: Rigorous cross-validations on the benchmark datasets from four yeast species (Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis, and Pichia pastoris) have indicated that the proposed predictor is very powerful for predicting the entire DNA replication origins. Availability and implementation: The web-server for the iRO-3wPseKNC predictor is available at http://bioinformatics.hitsz.edu.cn/iRO-3wPseKNC/, by which users can easily get their desired results without the need to go through the mathematical details. Contact: bliu@hit.edu.cn or dshuang@tongji.edu.cn or kcchou@gordonlifescience.org. Supplementary information: Supplementary data are available at Bioinformatics online.
Article
As one of the most important and common protein post-translational modifications, citrullination plays a key role in regulating various biological processes and is associated with several human diseases. The accurate identification of citrullination sites is crucial for elucidating the underlying molecular mechanisms of citrullination and designing drugs for related human diseases. In this study, a novel bioinformatics tool named CKSAAP_CitrSite is developed for the prediction of citrullination sites. With the assistance of support vector machine algorithm, the highlight of CKSAAP_CitrSite is to adopt the composition of k-spaced amino acid pairs surrounding a query site as input. As illustrated by 10-fold cross-validation, CKSAAP_CitrSite achieves a satisfactory performance with a Sensitivity of 77.59%, a Specificity of 95.26%, an Accuracy of 89.37% and a Matthew's correlation coefficient of 0.7566, which is much better than those of the existing prediction method. Feature analysis shows that the N-terminal space containing pairs may play an important role in the prediction of citrullination sites, and the arginines close to N-terminus tend to be citrullinated. The conclusions derived from this study could offer useful information for elucidating the molecular mechanisms of citrullination and related experimental validations. A user-friendly web-server for CKSAAP_CitrSite is available at 123.206.31.171/CKSAAP_CitrSite/.
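The composition of k-spaced amino acid pairs can be illustrated with a short sketch. For each gap k, every residue pair separated by exactly k positions is counted and the counts are normalised by the number of such pairs in the peptide. The function name `cksaap`, the default `k_max`, and the toy peptide are assumptions for illustration and do not reproduce the CKSAAP_CitrSite implementation.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(peptide: str, k_max: int = 2) -> List[float]:
    """Return the k-spaced pair composition for k = 0..k_max.

    The output has 400 * (k_max + 1) entries: one block of 400 pair
    frequencies (AA, AC, ..., YY) for each gap size k.
    """
    peptide = peptide.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features: List[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        n_windows = len(peptide) - k - 1  # number of pairs with gap k
        for i in range(max(n_windows, 0)):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:  # skip non-standard residues
                counts[pair] += 1
        features.extend(
            counts[p] / n_windows if n_windows > 0 else 0.0 for p in pairs
        )
    return features


# Hypothetical 4-residue peptide; real inputs are windows around a query site.
vec = cksaap("ARRA", k_max=1)
print(len(vec))
```

With `k_max=1` the vector has 800 entries; the entry for the pair "AR" sits at index 14 in the k = 0 block and at index 414 in the k = 1 block.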
Article
N6-methyladenine (6mA) is one kind of post-replication modification (PTM or PTRM) occurring in a wide range of DNA sequences. Accurate identification of its sites will be very helpful for revealing the biological functions of 6mA, but it is time-consuming and expensive to determine them by experiments alone. Unfortunately, so far, no bioinformatics tool is available to do so. To fill in such an empty area, we have proposed a novel predictor called iDNA6mA-PseKNC that is established by incorporating nucleotide physicochemical properties into Pseudo K-tuple Nucleotide Composition (PseKNC). It has been observed via rigorous cross-validations that the predictor's sensitivity (Sn), specificity (Sp), accuracy (Acc), and stability (MCC) are 93%, 100%, 96%, and 0.93, respectively. For the convenience of most experimental scientists, a user-friendly web server for iDNA6mA-PseKNC has been established at http://lin-group.cn/server/iDNA6mA-PseKNC, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
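The four metrics quoted in abstracts of this kind (Sn, Sp, Acc, MCC) follow directly from the confusion-matrix counts. The sketch below uses the standard definitions; the counts passed in are hypothetical and are not taken from any of the studies listed here.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Compute sensitivity, specificity, accuracy and MCC from raw counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sn = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for a balanced test set of 100 positives and 100 negatives.
metrics = binary_metrics(tp=90, fn=10, tn=95, fp=5)
print(metrics)
```

MCC ranges from -1 to 1 and stays informative when the positive and negative sets differ greatly in size, which is why it is usually reported alongside accuracy.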
Article
Motivation: For in-depth understanding the functions of proteins in a cell, the knowledge of their subcellular localization is indispensable. The current study is focused on human protein subcellular location prediction based on the sequence information alone. Although considerable efforts have been made in this regard, the problem is far from being solved yet. Most existing methods can be used to deal with single-location proteins only. Actually, proteins with multi-locations may have some special biological functions that are particularly important for both basic research and drug design. Results: Using the multi-label theory, we present a new predictor called "pLoc-mHum" by extracting the crucial GO (Gene Ontology) information into the general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validations on a same stringent benchmark dataset have indicated that the proposed pLoc-mHum predictor is remarkably superior to iLoc-Hum, the state-of-the-art method in predicting the human protein subcellular localization. Availability: To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mHum/, by which users can easily get their desired results without the need to go through the complicated mathematics involved. Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Lysine crotonylation (Kcr) is an evolution-conserved histone posttranslational modification (PTM), occurring in both human somatic and mouse male germ cell genomes. It is important for male germ cell differentiation. Information of Kcr sites in proteins is very useful for both basic research and drug development. But it is time-consuming and expensive to determine them by experiments alone. Here, we report a novel predictor called iKcr-PseEns that is established by incorporating five tiers of amino acid pairwise couplings into the general pseudo amino acid composition. It has been observed via rigorous cross-validations that the new predictor's sensitivity (Sn), specificity (Sp), accuracy (Acc), and stability (MCC) are 90.53%, 95.27%, 94.49%, and 0.826, respectively. For the convenience of most experimental scientists, a user-friendly web-server for iKcr-PseEns has been established at http://www.jci-bioinfo.cn/iKcr-PseEns, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
Article
Motivation: Cells are deemed the basic unit of life. However, many important functions of cells as well as their growth and reproduction are performed via the protein molecules located at their different organelles or locations. Facing explosive growth of protein sequences, we are challenged to develop fast and effective methods to annotate their subcellular localization. However, this is by no means an easy task. Particularly, mounting evidence has indicated that proteins have a multi-label feature, meaning that they may simultaneously exist at, or move between, two or more different subcellular location sites. Unfortunately, most of the existing computational methods can only be used to deal with single-label proteins. Although the 'iLoc-Animal' predictor developed recently is quite powerful and can be used to deal with animal proteins with multiple locations as well, its prediction quality needs to be improved, particularly in enhancing the absolute true rate and reducing the absolute false rate. Results: Here we propose a new predictor called 'pLoc-mAnimal', which is superior to iLoc-Animal as shown by the following facts. When tested by the most rigorous cross-validation on the same high-quality benchmark dataset, the absolute true success rate achieved by the new predictor is 37% higher and the absolute false rate is four times lower in comparison with the state-of-the-art predictor. Availability and implementation: To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mAnimal/, by which users can easily get their desired results without the need to go through the complicated mathematics involved. Contact: xxiao@gordonlifescience.org or kcchou@gordonlifescience.org. Supplementary information: Supplementary data are available at Bioinformatics online.
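The abstract above names two multi-label metrics, the absolute true rate and the absolute false rate, without defining them. A minimal sketch follows, assuming one common definition from the multi-label literature: absolute true is the fraction of proteins whose predicted location set matches the true set exactly, and absolute false is the average number of wrongly assigned or missed labels divided by the total number of possible labels. The function names and toy data are hypothetical.

```python
def absolute_true(true_sets, pred_sets):
    """Fraction of samples whose predicted label set equals the true set exactly."""
    matches = sum(t == p for t, p in zip(true_sets, pred_sets))
    return matches / len(true_sets)


def absolute_false(true_sets, pred_sets, n_labels):
    """Average count of wrong or missed labels per sample, scaled by n_labels.

    len(t ^ p) is the size of the symmetric difference, i.e.
    |union| - |intersection| of the true and predicted sets.
    """
    errors = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return errors / (len(true_sets) * n_labels)


# Toy data, for illustration only: three proteins, four possible locations.
true_sets = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"mitochondrion"}]
pred_sets = [{"nucleus"}, {"cytoplasm"}, {"cytoplasm"}]

at = absolute_true(true_sets, pred_sets)
af = absolute_false(true_sets, pred_sets, n_labels=4)
print(at, af)
```

Under this definition a partially correct prediction (the second protein) earns no credit toward the absolute true rate, which is why that metric is regarded as the strictest one for multi-label predictors.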
Article
Information on the subcellular localization of proteins is crucially important for revealing their biological functions in a cell, the basic unit of life. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop computational tools for the timely identification of their subcellular locations based on the sequence information alone. The current study is focused on Gram-negative bacterial proteins. Although considerable efforts have been made in protein subcellular prediction, the problem is far from being solved yet. This is because mounting evidence has indicated that many Gram-negative bacterial proteins exist in two or more location sites. Unfortunately, most existing methods can be used to deal with single-location proteins only. Actually, proteins with multiple locations may have some special biological functions important for both basic research and drug design. In this study, by using the multi-label theory, we developed a new predictor called "pLoc-mGneg" for predicting the subcellular localization of Gram-negative bacterial proteins with both single and multiple locations. Rigorous cross-validation on a high-quality benchmark dataset indicated that the proposed predictor is remarkably superior to "iLoc-Gneg", the state-of-the-art predictor for the same purpose. For the convenience of most experimental scientists, a user-friendly web-server for the novel predictor has been established at http://www.jci-bioinfo.cn/pLoc-mGneg/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
Article
Motivation: A promoter is a short region of DNA responsible for initiating transcription of a particular gene in the genome. Promoters are of various types with different functions. Owing to their importance in biological processes, it is highly desirable to develop computational tools for the timely identification of promoters and their types. Such a challenge has become particularly critical and urgent in facing the avalanche of DNA sequences discovered in the postgenomic age. Although some prediction methods have been developed, they can only be used to discriminate a specific type of promoter from non-promoters. None of them has the ability to identify the types of promoters. This is due to the facts that different types of promoters may share quite similar consensus sequence patterns, and that promoters of the same type may have considerably different consensus sequences. Results: To overcome this difficulty, using the multi-window-based PseKNC (pseudo K-tuple nucleotide composition) approach to incorporate the short-, middle-, and long-range sequence information, we have developed a two-layer seamless predictor named "iPromoter-2L". The 1st layer serves to identify a query DNA sequence as a promoter or non-promoter, and the 2nd layer to predict which of the following six types the identified promoter belongs to: σ24, σ28, σ32, σ38, σ54, and σ70. Availability: For the convenience of most experimental scientists, a user-friendly and publicly accessible web-server for the powerful new predictor has been established at http://bioinformatics.hitsz.edu.cn/iPromoter-2L/. It is anticipated that iPromoter-2L will become a very useful high throughput tool for genome analysis. Contact: bliu@hit.edu.cn or dshuang@tongji.edu.cn or kcchou@gordonlifescience.org. Supplementary information: Supplementary data are available at Bioinformatics online.
Article
As one of the most important and common histone post-translational modifications, crotonylation plays a key role in regulating various biological processes. The accurate identification of crotonylation sites is crucial to elucidate the underlying molecular mechanisms of crotonylation. In this study, a novel bioinformatics tool named CKSAAP_CrotSite is developed to predict crotonylation sites. The highlight of CKSAAP_CrotSite is that it adopts the composition of k-spaced amino acid pairs as input encoding and employs a support vector machine as the classifier. As illustrated by the jackknife test, CKSAAP_CrotSite achieves a promising performance with a sensitivity of 92.45%, a specificity of 99.17%, an accuracy of 98.11% and a Matthews correlation coefficient of 0.9283, which is much better than those of the existing prediction methods. Feature analysis shows that some amino acid pairs such as 'KxG', 'KG' and 'PxP' may play an important role in the prediction of crotonylation sites. The results of analysis and prediction could offer useful information for elucidating the molecular mechanisms of crotonylation and related experimental validations. A user-friendly web-server for CKSAAP_CrotSite is available at 123.206.31.171/CKSAAP_CrotSite/.
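The "composition of k-spaced amino acid pairs" encoding mentioned above can be illustrated with a short sketch. It assumes the usual formulation: for each gap k, count every pair of residues separated by k positions and divide by the number of such pairs in the segment. The function name `cksaap`, the default `k_max`, and the toy segment are hypothetical; the abstract does not state the window length or the range of k used by CKSAAP_CrotSite.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cksaap(segment, k_max=1):
    """Encode a peptide segment as k-spaced amino acid pair frequencies.

    Keys follow the notation used in the abstract: 'KG' for k=0,
    'KxG' for k=1, 'KxxG' for k=2, and so on.
    """
    features = {}
    for k in range(k_max + 1):
        gap = "x" * k
        total = len(segment) - k - 1  # number of k-spaced pairs in the segment
        counts = {f"{a}{gap}{b}": 0 for a, b in product(AMINO_ACIDS, repeat=2)}
        for i in range(max(total, 0)):
            key = f"{segment[i]}{gap}{segment[i + k + 1]}"
            if key in counts:  # pairs with non-standard residues are skipped
                counts[key] += 1
        for key, count in counts.items():
            features[key] = count / total if total > 0 else 0.0
    return features


# Toy segment, for illustration only.
features = cksaap("KAGKG", k_max=1)
print(features["KG"], features["KxG"])
```

Each value of k contributes 400 features (20 × 20 pairs), so the resulting vector has 400 × (k_max + 1) entries that can be passed to a classifier such as a support vector machine.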
Article
Many efforts have been made in predicting the subcellular localization of eukaryotic proteins, but most of the existing methods have the following two limitations: (1) their coverage scope is less than ten locations and hence many organelles in a eukaryotic cell cannot be covered, and (2) they can only be used to deal with single-label systems in which each of the constituent proteins has one and only one location. Actually, proteins with multiple locations are particularly interesting since they may have some exceptional functions that are very important for an in-depth understanding of the biological processes in a cell and for selecting drug targets as well. Although several predictors (such as "Euk-mPLoc", "Euk-PLoc 2.0" and "iLoc-Euk") can cover up to 22 different location sites, and can also treat multi-labeled proteins, further efforts are needed to improve their prediction quality, particularly in enhancing the absolute true rate and in reducing the absolute false rate. Here we propose a new predictor called "pLoc-mEuk" by extracting the key GO (Gene Ontology) information into the general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validations on a high-quality and stringent benchmark dataset have indicated that the proposed pLoc-mEuk predictor is remarkably superior to iLoc-Euk, the best of the aforementioned three predictors. To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mEuk/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
Article
Knowledge of the subcellular locations of proteins is crucially important for an in-depth understanding of their functions in a cell. With the explosive growth of protein sequences generated in the postgenomic age, there is high demand for computational tools that can annotate their subcellular locations in a timely manner based on the sequence information alone. The current study is focused on virus proteins. Although considerable efforts have been made in this regard, the problem is far from being solved yet. Most existing methods can be used to deal with single-location proteins only. Actually, proteins with multiple locations may have some special biological functions. This kind of multiplex protein is particularly important for both basic research and drug design. Using the multi-label theory, we present a new predictor called "pLoc-mVirus" by extracting the optimal GO (Gene Ontology) information into the general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validation on the same stringent benchmark dataset indicated that the proposed pLoc-mVirus predictor is remarkably superior to iLoc-Virus, the state-of-the-art method in predicting virus protein subcellular localization. To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mVirus/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
Article
One of the fundamental goals in cellular biochemistry is to identify the functions of proteins in the context of compartments that organize them in the cellular environment. To realize this, it is indispensable to develop an automated method for fast and accurate identification of the subcellular locations of uncharacterized proteins. The current study is focused on plant protein subcellular location prediction based on the sequence information alone. Although considerable efforts have been made in this regard, the problem is far from being solved yet. Most of the existing methods can be used to deal with single-location proteins only. Actually, proteins with multiple locations may have some special biological functions. This kind of multiplex protein is particularly important for both basic research and drug design. Using the multi-label theory, we present a new predictor called “pLoc-mPlant” by extracting the optimal GO (Gene Ontology) information into Chou's general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validation on the same stringent benchmark dataset indicated that the proposed pLoc-mPlant predictor is remarkably superior to iLoc-Plant, the state-of-the-art method for predicting plant protein subcellular localization. To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mPlant/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
Article
Motivation: Given a compound, can we predict which anatomical therapeutic chemical (ATC) class/classes it belongs to? It is a challenging problem since the information thus obtained can be used to deduce its possible active ingredients, as well as its therapeutic, pharmacological and chemical properties. And hence the pace of drug development could be substantially expedited. But this problem is by no means an easy one. Particularly, some drugs or compounds may belong to two or more ATC classes.
Article
Objective: As a kind of post-transcriptional modification (PTCM) in RNA, 2'-O-methylation occurs in the processes of life development and disease formation as well. Accordingly, from the angles of both basic research and drug development, we are facing a challenging problem: given an uncharacterized RNA sequence formed by many nucleotides of A (adenine), C (cytosine), G (guanine), and U (uracil), which ones can undergo 2'-O-methylation, and which ones cannot? Unfortunately, so far no computational method whatsoever has been developed to address such a problem. Method: To fill this gap, we propose a predictor called iRNA-2methyl. It is formed by incorporating a series of sequence-coupled factors into the general PseKNC (pseudo nucleotide composition), followed by fusing 12 basic random forest classifiers into four ensemble predictors, each aimed at identifying the cases of A, C, G, and U along the RNA sequence concerned, respectively. Results: Rigorous jackknife cross-validations have indicated that the success rates are very high (>93%). For the convenience of most experimental scientists, a user-friendly web-server for iRNA-2methyl has been established at http://www.jci-bioinfo.cn/iRNA-2methyl, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. Conclusion: The proposed predictor iRNA-2methyl will become a very useful bioinformatics tool for medicinal chemistry, helping to design effective drugs against the diseases related to the 2'-O-methylation modification.
Article
Background: Occurring at Lys residues, PGK (lysine phosphoglycerylation) is a special kind of post-translational modification (PTM). It may invert the charge potential of the modified residue and change protein structures and functions, causing various diseases in the liver, brain, and kidney. Objective: From the angles of both basic research and drug development, we are facing a critical and challenging problem: for an uncharacterized protein sequence containing many Lys residues, which ones can be phosphoglycerylated, and which ones cannot? Method: To address this problem, we have developed a predictor called iPGK-PseAAC by incorporating into the general PseAAC (pseudo amino acid composition) four different tiers of amino acid pairwise coupling information, where tiers 1, 2, 3, and 4 refer to the amino acid pairwise couplings between all the 1st, 2nd, 3rd, and 4th most contiguous residues along a protein segment, respectively. Results: Rigorous cross-validations indicated that the proposed predictor remarkably outperformed its existing counterparts. Conclusion: The proposed predictor iPGK-PseAAC will become a very useful bioinformatics tool for medicinal chemistry. For the convenience of most experimental scientists, a user-friendly web-server for iPGK-PseAAC has been established at http://app.aporc.org/iPGK-PseAAC/, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
Article
Purpose: Chlorella vulgaris (C. vulgaris), a unicellular green microalga, has been widely used as a food supplement and reported to have antioxidant and anticancer properties. The current study was designed to assess the cytotoxic, apoptotic, and DNA-damaging effects of C. vulgaris growth factor (CGF), a hot-water extract of C. vulgaris, in the lung tumor cell lines A549 and NCI-H460. Methods: A549 cells, NCI-H460 cells, and normal human fibroblasts were treated with CGF at various concentrations (0-300 μg/ml) for 24 hr. The comet assay and γH2AX assay showed DNA damage in A549 and NCI-H460 cells upon CGF exposure. Evaluation of apoptosis by the TUNEL assay and DNA fragmentation analysis by agarose gel electrophoresis showed that CGF induced apoptosis in A549 and NCI-H460 cells. Results: Chlorella vulgaris hot water extract induced apoptosis and DNA damage in human lung carcinoma cells. Conclusion: CGF can thus be considered a potential cytotoxic or genotoxic drug for treatment of lung carcinoma.
Article
Purpose: Occurring at the cysteine residue in the C-terminal of a protein, prenylation is a special kind of post-translational modification (PTM), which may play a key role for statins in altering immune function. Therefore, knowledge of the prenylation sites in proteins is important for drug development as well as for an in-depth understanding of the biological processes concerned. Given a query protein whose C-terminal contains some cysteine residues, which ones can be prenylated, or can none of them be? Methods: To address this problem, we have developed a new predictor, called "iPreny-PseAAC", by incorporating two tiers of sequence pair coupling effects into the general form of PseAAC (pseudo amino acid composition). Results: It has been observed by four different cross-validation approaches that all the important indexes reflecting its prediction quality are quite high and fully consistent with each other. Conclusion: It is anticipated that the iPreny-PseAAC predictor holds very high potential to become a useful high throughput tool in identifying protein C-terminal cysteine prenylation sites and in other relevant areas. To maximize the convenience for most experimental biologists, the web-server for the new predictor has been established at http://app.aporc.org/iPreny-PseAAC/, by which users can easily get their desired results without needing to go through the mathematical details involved in this paper.
Article
The eternal or ultimate goal of medicinal chemistry is to find the most effective ways to treat various diseases and to extend human life as long as possible. A human being is a biological entity. To realize such an ultimate goal, the inputs or breakthroughs from advances in biological science are no doubt most important, and may even drive a revolution in medicinal science. In this review article, we address this from several different angles.
Article
Motivation: Given a compound, can we predict which ATC (Anatomical Therapeutic Chemical) class/classes it belongs to? It is a challenging problem since the information thus obtained can be used to deduce its possible active ingredients, as well as its therapeutic, pharmacological and chemical properties. And hence the pace of drug development could be substantially expedited. But this problem is by no means an easy one. Particularly, some drugs or compounds may belong to two or more ATC classes. Results: To address it, a multi-label classifier, called iATC-mISF, was developed by incorporating the information of chemical-chemical interaction, the information of the structural similarity, and the information of the fingerprint similarity. Rigorous cross-validations showed that the proposed predictor achieved remarkably higher prediction quality than its counterparts for the same purpose, particularly in the absolute true rate, the most important and harshest metric for multi-label systems. Availability: The web-server for iATC-mISF is accessible at http://www.jci-bioinfo.cn/iATC-mISF. Furthermore, to maximize the convenience for most experimental scientists, a step-by-step guide was provided, by which users can easily get their desired results without needing to go through the complicated mathematical equations. Their inclusion in this paper is just for the integrity of the new method and for stimulating more powerful methods to deal with various multi-label systems in biology. Contact: xxiao@gordonlifescience.org. Supplementary information: Supplementary data are available at Bioinformatics online.