Article

Gene ontology: Tool for the unification of biology

... To investigate the gene dosage effects during prenatal DS brain development, we performed differential gene expression (DGE) analysis between all DS and euploid cell subtypes from the snRNA-seq dataset 44 (Supplementary Fig. 1f). While we focused our primary analysis on NPCs to identify drivers of abnormal corticogenesis, all DGE results are provided in Supplementary Table 2. Pathway analysis using the Gene Ontology (GO) database revealed several key themes of mis-regulated gene expression during prenatal human DS brain development 45,46. Both oRG and NB ExN showed downregulation of cell cycle, translation, protein homeostasis, and programmed cell death (Fig. 2a), while NB ExN also displayed reduced microtubule cytoskeleton organization and cell migration (Fig. 2b). ...
... In the present study, we conducted proteomic analysis on seven prenatal human forebrain samples (n = 4 DS, n = 3 euploid) spanning the second trimester of gestation (Fig. 4b). Pathway analysis of biological processes using GO 45,46 revealed that among the upregulated proteins, phagocytosis, cell killing, proteolysis, inflammatory response, and innate and adaptive immune responses were enriched. Notably, the latter three enriched biological processes align with our findings from the human prenatal snRNA-seq dataset. ...
Preprint
Full-text available
Down syndrome (DS, or Trisomy 21) is one of the most common genetic causes of intellectual disability. DS results in both abnormal neurodevelopment and accelerated neurodegeneration, but the molecular mechanisms underlying abnormal cortical construction and aging are incompletely understood. To gain molecular insight into the prenatal neurobiology of DS, we performed single-nucleus sequencing, spatial transcriptomics, and proteomics on mid-gestational prenatal human brain tissue. We captured altered expression dynamics of lineage commitment genes and pronounced de-repression of transposable elements in DS neural progenitor cells, which suggest changes to the fate and functionality of neuronal and glial cells. Given the importance of linking human and model system pathobiology, we also performed highly multiplexed RNA in situ spatial transcriptomics on a well-established trisomic mouse model (Ts65Dn) to study the cellular landscape of the trisomic brain during early life and aging. We profiled the spatial transcriptome of > 240,000 cells in the mouse brain and identified trisomy-associated gene expression patterns in the molecular control of neurogenesis and gliogenesis. Together, our study provides a comprehensive cross-species understanding of the complex multicellular processes underlying DS neurodevelopment.
... Three of the seventeen loci showed significant association (p < 5e-8) with at least one original trait. For the leading SNPs of CPA, 54 loci were reported for traits like cognitive ability (42 loci) and intelligence (39). Only one SNP (rs111959380) was new in the current analysis (Supplementary Data 6). ...
... Notably, 513 and 445 significant genes were identified for CPS and CPA, respectively (Supplementary Data 13-14, Bonferroni-corrected p < 0.05). Enrichment analysis revealed one significant Gene Ontology (GO) term 39 for CPS, namely, postsynaptic membrane (b = 0.30, se = 0.035, Bonferroni-corrected p < 0.05, Fig. 5a, Supplementary Data 15). Additionally, two significant terms were identified for CPA, namely, the generation of neurons and neurogenesis (b = 0.032 and 0.031, se = 0.028 and 0.026, Bonferroni-corrected p < 0.05, Fig. 5a, Supplementary Data 16). ...
Article
Full-text available
Since the birth of cognitive science, researchers have used reaction time and accuracy to measure cognitive ability. Although recognition of these two measures is often based on empirical observations, the underlying consensus is that most cognitive behaviors may be along two fundamental dimensions: cognitive processing speed (CPS) and cognitive processing accuracy (CPA). In this study, we used genome-wide association studies (GWAS) data from 14 cognitive traits to show the presence of those two factors and revealed the specific neurobiological basis underlying them. We identified that CPS and CPA had distinct brain phenotypes (e.g. white matter microstructure), neurobiological bases (e.g. postsynaptic membrane), and developmental periods (i.e. late infancy). Moreover, those two factors showed differential associations with other health-related traits such as screen exposure and sleep status, and a significant causal relationship with psychiatric disorders such as major depressive disorder and schizophrenia. Utilizing an independent cohort from the Adolescent Brain Cognitive Development (ABCD) study, we also uncovered the distinct contributions of those two factors on the cognitive development of young adolescents. These findings reveal two fundamental factors underlying various cognitive abilities, elucidate the distinct brain structural fingerprint and genetic architecture of CPS and CPA, and hint at the complex interrelationship between cognitive ability, lifestyle, and mental health.
... Task 1 (Term correctness verification): This task is to verify the biomedical terms that make up a generated association, where we use biomedical ontology, such as GO, DOID, ChEBI, and Symptoms ontology [26,27,28,29,30,31,32], as the ground truth to verify the term's identity. When a term is not found in an ontology, it is said to be "unverified." ...
Preprint
Full-text available
The generative capabilities of LLMs present opportunities to accelerate tasks but also raise concerns about the authenticity of the knowledge they produce. To address these concerns, we present a computational approach that systematically evaluates the factual accuracy of biomedical knowledge that an LLM has been prompted to generate. Our approach encompasses two processes: the generation of disease-centric associations and the verification of them using the semantic knowledge of the biomedical ontologies. Using ChatGPT as the selected LLM, we designed a set of prompt-engineering processes to generate linkages between diseases, drugs, symptoms, and genes to establish grounds for assessments. Experimental results demonstrate high accuracy in identifying disease terms (88%-97%), drug names (90%-91%), and genetic information (88%-98%). The symptom term identification accuracy was notably lower (49%-61%), as verified against the DOID, ChEBI, SYMPTOM, and GO ontologies accordingly. The verification of associations reveals literature coverage rates of 89%-91% among disease-drug and disease-gene associations. The low identification accuracy for symptom terms also contributed to the low verification rate of symptom-related associations (49%-62%).
... The genes with the parameter of false discovery rate (FDR) below 0.05 and absolute fold change > 2 were selected. Enrichment analysis of DEGs against the GO and KEGG databases was performed to discern significantly affected biological functions and pathways [47][48][49]. ...
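The selection rule quoted above (FDR below 0.05 and absolute fold change above 2) can be sketched in a few lines of Python. This is a minimal illustration, not the cited authors' pipeline: the snippet does not say which FDR procedure was used, so Benjamini-Hochberg is assumed here, "absolute fold change > 2" is read as a ratio above 2 or below 1/2, and the function names and gene records are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, fdr_cutoff=0.05, fold_cutoff=2.0):
    """genes: list of (name, fold_change_ratio, raw_p_value) tuples.

    Keeps genes whose adjusted p-value is below fdr_cutoff and whose
    fold change is above fold_cutoff in either direction.
    """
    fdr = benjamini_hochberg([p for _, _, p in genes])
    threshold = math.log2(fold_cutoff)
    return [
        name
        for (name, fold, _), q in zip(genes, fdr)
        if q < fdr_cutoff and abs(math.log2(fold)) > threshold
    ]


# Hypothetical records: (gene, fold change as a ratio, raw p-value).
example = [("g1", 4.0, 0.005), ("g2", 1.2, 0.01), ("g3", 0.2, 0.03), ("g4", 3.0, 0.5)]
selected = select_degs(example)
```

In this toy input, `g2` is dropped for its small fold change and `g4` for its large adjusted p-value; down-regulated genes such as `g3` are kept because the fold-change test is symmetric on the log2 scale.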
Article
Full-text available
Variations in disease resistance among pig breeds have been extensively documented, with Sertoli cells (SCs) playing a pivotal role in spermatogenesis. Infections can induce oxidative stress, which can lead to damage to these cells. This study aimed to compare the levels of oxidative stress in SCs from Rongchang and Landrace pig breeds following LPS challenge. SCs were isolated, cultured, and stimulated with LPS to assess cell viability and markers of oxidative stress. Cell viability was evaluated along with oxidative stress markers such as reactive oxygen species (ROS), mitochondrial superoxide, malondialdehyde, and antioxidant enzymes. Mitochondrial function was assessed using JC-1 and Calcein AM probes. Transcriptomic analysis identified differentially expressed genes (DEGs), while ingenuity pathway analysis (IPA) explored enriched pathways. IL20RA, identified through transcriptomics, was validated using the siRNA knockdown technique. The results showed that Rongchang SCs exhibited lower levels of oxidative stress compared to Landrace SCs along with higher activity of antioxidant enzymes. IL20RA emerged as a key regulator since its knockdown affected mitochondrial superoxide production and catalase secretion. The findings suggest that Rongchang SCs possess superior antioxidant capacity, possibly due to the IL20RA-mediated protection of mitochondria, thereby providing insights into breed-specific resistance against oxidative stress and highlighting the role of IL20RA in maintaining stem cell function.
... To better understand the functional relevance of the CpG sites selected for the clinical models, we performed a gene ontology (GO) enrichment analysis, a widely used method to specify molecular function, cellular localization, and biological processes 41,42. For each clinical outcome, we categorized CpG sites into positive and negative weights and performed two GO analyses using the top 5% of each category. ...
Preprint
Full-text available
The lack of accurate, cost-effective, and clinically relevant biomarkers remains a major barrier to incorporating omic data into clinical practice. Previous studies have shown that DNA methylation algorithms have utility as surrogate measures for selected proteins and metabolites. We expand upon this work by creating DNAm surrogates, termed epigenetic biomarker proxies (EBPs), across clinical laboratories, the metabolome, and the proteome. After screening >2,500 biomarkers, we trained and tested 1,694 EBP models and assessed their incident relationship with 12 chronic diseases and mortality, followed up to 15 years. We observe broad clinical relevance: 1) there are 1,292 and 4,863 FDR significant incident and prevalent associations, respectively, 2) most of these associations are replicated when looking at the lab-based counterpart, and > 62% of the shared associations have higher odds and hazard ratios to disease outcomes than their respective observed measurements, 3) EBPs of current clinical biochemistries detect deviations from normal with high sensitivity and specificity. Longitudinal EBPs also demonstrate significant changes corresponding to the changes observed in lab-based counterparts. Using two cohorts and > 30,000 individuals, we found that EBPs validate across healthy and sick populations. While further study is needed, these findings highlight the potential of implementing EBPs in a simple, low-cost, high-yield framework that benefits clinical medicine.
... To investigate the biological functions of the differentiation-related modules, we performed enrichment analysis, including Gene Ontology (GO) term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses 92,93. GO functional enrichment analysis was conducted using Goatools (https://github.com/ ...
Article
Full-text available
Many plants associate with endophytic microbes that improve root phosphorus (P) uptake. Understanding the interactions between roots and endophytes can enable efforts to improve P utilization. Here, we characterize the interactions between lateral roots of endophytes in a core collection of 50 rapeseed (Brassica napus L.) genotypes with differing sensitivities to low P conditions. With the correlation analysis result between bacterial abundance and plant physiological indices of rapeseeds, and inoculation experiments on plates and soil, we identify one Flavobacterium strain (C2) that significantly alleviates the P deficiency phenotype of rapeseeds. The underlying mechanisms are explored by performing the weighted gene coexpression network analysis (WGCNA), and conducting genome-wide association studies (GWAS) using Flavobacterium abundance as a quantitative trait. Under P-limited conditions, C2 regulates fatty acid and lipid metabolic pathways. For example, C2 improves metabolism of linoleic acid, which mediates root suberin biosynthesis, and enhances P uptake efficiency. In addition, C2 suppresses root jasmonic acid biosynthesis, which depends on α-linolenic acid metabolism, improving C2 colonization and activating P uptake. This study demonstrates that adjusting the endophyte composition can modulate P uptake in B. napus plants, providing a basis for developing agricultural microbial agents.
... Metascape (San Diego, CA, USA, v3.5) is an effective method to study the potential biological processes and related pathways of transcriptome and genome data [19]. In the key module, Metascape was used for gene ontology (GO) analysis [20] and KEGG pathway analysis [21][22][23]. ...
... Gene Ontology (GO) [26] is an international standardized gene functional classification system that offers a dynamically updated controlled vocabulary and a strictly defined concept to comprehensively describe the properties of genes and their products in any organism. GO has the following three ontologies: molecular functions, cellular components, and biological processes. ...
Article
Full-text available
Hypoxia is a common environmental stressor in aquatic ecosystems, and during the cultivation process, Megalobrama amblycephala is prone to death because it is hypoxia-intolerant, which brings huge economic losses to farmers. The pituitary gland is a crucial endocrine gland in fish, and it is mainly involved in the secretion, storage, and regulation of hormones. In the present study, we compared the transcriptional responses to serious hypoxia in the pituitary gland among hypoxia-sensitive (HS) and hypoxia-tolerant (HT) M. amblycephala and a control group that received a normal oxygen supply (C0). The fish were categorized according to the time required to lose balance during a hypoxia treatment. A total of 129,251,170 raw reads were obtained. After raw sequence filtering, 43,461,745, 42,609,567, and 42,730,282 clean reads were obtained for the HS, HT, and C0 groups, respectively. A transcriptomic comparison revealed 1234 genes that were differentially expressed in C0 vs. HS, while 1646 differentially expressed genes were obtained for C0 vs. HT. In addition, the results for HS vs. HT showed that 367 upregulated and 41 downregulated differentially expressed genes were obtained for a total of 408 differentially expressed genes. A KEGG analysis of C0 vs. HS, C0 vs. HT, and HS vs. HT identified 315, 322, and 219 enriched pathways, respectively. Similar hypoxia-induced transcription patterns suggested that the downregulated DEGs and enriched pathways were related to pathways of neurodegeneration in multiple diseases, pathways in cancer, thermogenesis, microRNAs in cancer, diabetic cardiomyopathy, and renin secretion. However, in the upregulated DEGs, the PI3K-Akt signaling pathway (C0 vs. HS), microRNAs in cancer (C0 vs. HT), and HIF-1 signaling pathway (HS vs. HT) were significantly enriched. There is a lack of clarity regarding the role of the pituitary gland in hypoxic stress. 
These results not only provide new insights into the mechanism by which pituitary tissue copes with hypoxia stress in M. amblycephala but also offer a basis for breeding M. amblycephala with hypoxia-resistant traits.
... We compared DSSI cross-linked proteins to those targeted by DSBU for the HEK293T cell line. For this, proteins cross-linked by DSSI and DSBU were analyzed by the clusterProfiler R package [28] and classified based on Gene Ontology (GO) [29]. DSSI- and DSBU-cross-linked proteins were associated with the same cellular compartments of HEK293T cells, consistent with similar properties of the two cross-linkers (Figure S8A-B). ...
Preprint
Full-text available
Disuccinimidyl dibutyric urea (DSBU) is a mass spectrometry (MS)-cleavable cross-linker that has multiple applications in structural biology, ranging from isolated protein complexes to comprehensive system-wide interactomics. DSBU facilitates a rapid and reliable identification of cross-links through the dissociation of its urea group in the gas-phase. In this study, we further advance the structural capabilities of DSBU by twisting the urea group into an imide, thus introducing a novel class of cross-linkers. This modification preserves the MS-cleavability of the amide bond, granted by the two acyl groups of the imide function. The central nitrogen atom enables the introduction of affinity purification tags. Here, we introduce disuccinimidyl disuccinic imide (DSSI) as prototype of this class of cross-linkers. It features a phosphonate handle for immobilized metal ion affinity chromatography (IMAC) enrichment. We detail DSSI synthesis and describe its behavior in solution and in the gas-phase while cross-linking isolated proteins and human cell lysates. DSSI and DSBU cross-links are compared at the same enrichment depths to bridge these two cross-linker classes. We validate DSSI cross-links by mapping them in high-resolution structures of large protein assemblies. The cross-links observed yield insights into the morphology of intrinsically disordered proteins (IDPs) and their complexes. The DSSI linker might spearhead a novel class of MS-cleavable and enrichable cross-linkers.
... To calculate the significance level (p-value) of each GO, Fisher's test was used and significant GO was associated with a p-value of <0.05. The top 20 terms were selected to draw the histogram [25]. The differential modified m6A genes were annotated on the basis of the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [26]; the significance level (p-value) of the pathway was calculated using Fisher's test, and the significant pathway term for m6A gene enrichment was screened at a p-value of <0.05. ...
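The per-term significance test described above can be illustrated with a one-sided Fisher's exact test, which for over-representation is the upper tail of the hypergeometric distribution. The following is a sketch under stated assumptions: the snippet does not say whether the test was one- or two-sided, the function name `enrichment_pvalue` and the counts used are hypothetical, and no multiple-testing correction is applied (the quoted text reports a raw p-value threshold of 0.05).

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    N: genes in the background set
    K: background genes annotated with the GO term
    n: genes in the list of interest
    k: genes in the list of interest annotated with the term

    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 10 background genes, 4 carry the term,
# 3 genes in the list of interest, 2 of which carry the term.
p = enrichment_pvalue(k=2, n=3, K=4, N=10)
is_significant = p < 0.05
```

With these toy counts the tail probability is 40/120, so the term would not pass the 0.05 threshold. The same quantity is what R's `phyper` returns for the upper tail when called with `k - 1` and `lower.tail = FALSE`.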
Article
Full-text available
Emerging evidence shows that N6-methyladenosine (m6A) is a post-transcriptional RNA modification that plays a vital role in regulation of gene expression, fundamental biological processes, and physiological functions. To explore the effect of starvation on m6A methylation modification in the liver of Larimichthys crocea (L. crocea) under low temperatures, the livers of L. crocea from cold and cold + fasting groups were subjected to MeRIP-seq and RNA-seq using the NovaSeq 6000 platform. Compared to the cryogenic group, the expression of RNA methyltransferases mettl3 and mettl14 was upregulated, whereas that of demethylase fto and alkbh5 was downregulated in the starved cryogenic group. A Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis showed that the differentially m6A-modified genes were mainly enriched in steroid biosynthesis, DNA replication, ribosome biogenesis in eukaryotes, PPAR, ECM-receptor interaction, lysine degradation, phosphatidylinositol, and the MAPK signaling pathway, suggesting that L. crocea responds to starvation under low-temperature stress through m6A methylation modification-mediated cell growth, proliferation, innate immunity, and the maintenance of lipid homeostasis. This study advances understanding of the physiological response mechanism exerted by m6A methylation modification in starved L. crocea at low temperatures.
... GO (20200615, http://geneontology.org) 74, KEGG (20191220, http://www.genome.jp/kegg) 75, SWISS-PROT (202005, http://ftp.ebi. ...
Article
Full-text available
Sciaenops ocellatus is among the most important artificially introduced farmed fish across 11 countries and regions. However, the frequent occurrence of extreme weather events and breeding escapes have placed great pressure on local marine biodiversity and ecosystems. We reported the de novo assembly and annotation with a contig N50 of 28.30 Mb using PacBio HiFi sequencing and Hi-C technologies, which resulted in a 283-fold increase in contig N50 length and improvement in continuity and quality in complex repetitive region for S. ocellatus compared to the previous version. In total, 257.36 Mb of repetitive sequences accounted for 35.48% of the genome, and 22,845 protein-coding genes associated with a BUSCO value of 98.32%, were identified by genome annotation. Moreover, 54 hub genes rapidly responding to hypoosmotic stress were identified by WGCNA. The high-quality chromosome-scale S. ocellatus genome and candidate resistance-related gene sets will not only provide a genomic basis for genetic improvement via molecular breeding, but will also lay an important foundation for investigating the molecular regulation of rapid responses to stress.
... (Consortium, 2000). Differentially Expressed Genes and Gene Ontology Enrichment Analysis. Of the genes that were assigned GO annotations, 1,000 functionally annotated genes were identified to be differentially expressed in the young leaf tissues and 612 in the mature leaf tissue. ... Results of the GO enrichment analyses suggest that transcriptomic activity of the young tissue is dominated by regulation of components associated with the cell cycle and cellular developmental patterning while overrepresented GOs in the mature tissue are primarily related to photosynthetic processes and cellular energetics. ...
Preprint
Full-text available
Efficient carbon capture by plants is crucial to meet the increasing demands for food, fiber, feed, and fuel worldwide. One potential strategy to improve photosynthetic performance of plants is the conversion of C3-type crops to C4-type crops, enabling them to perform photosynthesis at higher temperatures and with less water. C4-type crops, such as corn, possess a distinct Kranz anatomy, where photosynthesis occurs in two distinct cell types. Remarkably, Bienertia sinuspersici is one of the four known land plant species to perform C4 photosynthesis within a single cell, characterized by dimorphic chloroplasts and corresponding intracellular biochemistry. The young emerging leaves exhibit C3 anatomy, which differentiates into the unique single-cell C4 anatomy as the leaves mature. A comparative transcriptome analysis yielded a total of 72,820 unique transcripts in young and 72,253 transcripts in mature leaves of B. sinuspersici. In the young leaf, enrichment of processes associated with the cell cycle, cellular developmental patterning, and transcriptional regulatory mechanisms was observed. The mature leaf displayed enrichment of processes associated with photosynthesis, chloroplast components, translational components, and post-translational modifications. Notably, several transcription factors such as auxin response factor (ARF), basic helix-loop-helix (bHLH), GATA, homeodomain (HD), MYB, NAC, squamosa promoter-binding protein-like (SPL), and zinc finger (ZF) family were differentially expressed in the young leaf. These data expand our insights into the molecular basis of Bienertia's unique cellular compartmentalization, chloroplast dimorphism, and single-cell C4 biochemistry, and the information can be useful in the ongoing efforts to transform C3-type crops into C4-type.
... For yeast, however, we decided not to use MIPS and instead also consider the GO, so our results are not directly comparable to the results reported in the original Mashup paper for yeast. (Our yeast GO annotations are from the Gene Ontology Consortium [22] (downloaded from FuncAssociate3.0 [4] on 02/12/19).) The GO functional labels are grouped into three distinct functional hierarchies: Biological Process (BP), Molecular ...
Preprint
Several popular methods exist to predict function from multiple protein-protein association networks. For example, both the Mashup algorithm, introduced by Cho, Peng and Berger, and deepNF, introduced by Gligorijević, Barot, and Bonneau, analyze the diffusion in each network first, to characterize the topological context of each node. In Mashup the high-dimensional topological patterns in individual networks are canonically represented using low-dimensional vectors, one per gene or protein, to yield the multi-network embedding. In deepNF, a multimodal autoencoder is trained to extract common network features across networks that yield a low-dimensional embedding. Neither embedding takes into account known functional labels; rather, these are then used by the machine learning methods applied after embedding. We introduce MELISSA (MultiNetwork Embedding with Label Integrated Semi-Supervised Augmentation) which incorporates functional labels in the embedding stage. The function labels induce sets of “must link” and “cannot link” constraints which guide a further semi-supervised dimension reduction to yield an embedding that captures both the network topology and the information contained in the annotations. We find that the MELISSA embedding improves on the Mashup embedding and outperforms the deepNF embedding in creating more functionally enriched neighborhoods for predicting GO labels for multiplex association networks in both yeast and humans. Availability: MELISSA is available at https://github.com/XiaozheHu/melissa. ACM Reference Format: Kaiyi Wu, Di Zhou, Donna Slonim, Xiaozhe Hu, and Lenore Cowen. 2023. MELISSA: Semi-Supervised Embedding for Protein Function Prediction Across Multiple Networks. In 14th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (BCB ‘23), September 3–6, 2023, Houston, TX, USA. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3535508.3545542
... For gene set enrichment analyses and pathway analyses, we did not use shrunken log2 fold changes. Gene set enrichment analyses [65] were conducted using the Bioconductor package fgsea [66], for the gene sets Hallmark [67], Reactome [68], Gene Ontology (GO) [69], and the Kyoto Encyclopedia of Genes and Genomes [70]. Gene sets were acquired through the Molecular Signatures Database available from the Broad Institute [71] through the Bioconductor package msigdbr [72]. ...
Article
Full-text available
Background Astrocytes respond to injury and disease through a process known as reactive astrogliosis, of which inflammatory signaling is one subset. This inflammatory response is heterogeneous with respect to the inductive stimuli and the afflicted central nervous system region. This is of plausible importance in e.g. traumatic axonal injury (TAI), where lesions in the brainstem carry a particularly poor prognosis. In fact, astrogliotic forebrain astrocytes were recently suggested to cause neuronal death following axotomy. We therefore sought to assess if ventral brainstem- or rostroventral spinal astrocytes exert similar effects on motor neurons in vitro. Methods We derived brainstem/rostroventral spinal astrocyte-like cells (ES-astrocytes) and motor neurons using directed differentiation of mouse embryonic stem cells (ES). We activated the ES-astrocytes using the neurotoxicity-eliciting cytokines interleukin (IL)-1α and tumor necrosis factor (TNF)-α and clinically relevant inflammatory mediators. In co-cultures with reactive ES-astrocytes and motor neurons, we assessed neurotoxic ES-astrocyte activity, similarly to what has previously been shown for other central nervous system (CNS) regions. Results We confirmed the brainstem/rostroventral ES-astrocyte identity using RNA-sequencing, immunocytochemistry, and by comparison with primary subventricular zone-astrocytes. Following cytokine stimulation, the c-Jun N-terminal kinase pathway downstream product phosphorylated c-Jun was increased, thus demonstrating ES-astrocyte reactivity. These reactive ES-astrocytes conferred a contact-dependent neurotoxic effect upon co-culture with motor neurons. When exposed to IL-1β and IL-6, two neuroinflammatory cytokines found in the cerebrospinal fluid and serum proteome following human severe traumatic brain injury (TBI), ES-astrocytes exerted similar effects on motor neurons.
Activation of ES-astrocytes by these cytokines was associated with pathways relating to endoplasmic reticulum stress and altered regulation of MYC. Conclusions Ventral brainstem and rostroventral spinal cord astrocytes differentiated from mouse ES can exert neurotoxic effects in vitro. This highlights how neuroinflammation following CNS lesions can exert region- and cell-specific effects. Our in vitro model system, which uniquely portrays astrocytes and neurons from one niche, allows for a detailed and translationally relevant model system for future studies on how to improve neuronal survival in particularly vulnerable CNS regions following e.g. TAI.
... Further input fields allow the specification of a diverse set of filters described below. Input to the A- and B-lists may consist of NCBI GENE (23) IDs and symbols, UNIPROT (24) IDs, FAMPLEX (25) protein family and complex identifiers or names, HGNC (26) gene group names, or Gene Ontology (GO) (27) terms. Family and group query items will include their members in the result, whereas GO term queries will include genes that have been annotated with the respective terms. ...
Article
Full-text available
We present GePI, a novel Web server for large-scale text mining of molecular interactions from the scientific biomedical literature. GePI leverages natural language processing techniques to identify genes and related entities, interactions between those entities and biomolecular events involving them. GePI supports rapid retrieval of interactions based on powerful search options to contextualize queries targeting (lists of) genes of interest. Contextualization is enabled by full-text filters constraining the search for interactions to either sentences or paragraphs, with or without pre-defined gene lists. Our knowledge graph is updated several times a week ensuring the most recent information to be available at all times. The result page provides an overview of the outcome of a search, with accompanying interaction statistics and visualizations. A table (downloadable in Excel format) gives direct access to the retrieved interaction pairs, together with information about the molecular entities, the factual certainty of the interactions (as verbatim expressed by the authors), and a text snippet from the original document that verbalizes each interaction. In summary, our Web application offers free, easy-to-use, and up-to-date monitoring of gene and protein interaction information, in company with flexible query formulation and filtering options. GePI is available at https://gepi.coling.uni-jena.de/.
... We used the Interproscan software v5.19 (Zdobnov and Apweiler 2001; Finn et al. 2017) to annotate protein domains and motifs, gene ontologies (Consortium 2000) and pathways (Kanehisa and Goto 2000; Kanehisa et al. 2016). Briefly, Interproscan searches multiple databases for protein information including PRINTS, Pfam (Punta et al. 2012), ProDom (Bru et al. 2005), and PROSITE (Hulo et al. 2005). ...
Article
Full-text available
Genome size has been measurable since the 1940s but we still do not understand genome size variation. Caenorhabditis nematodes show strong conservation of chromosome number but vary in genome size between closely related species. Androdioecy, where populations are composed of males and self-fertile hermaphrodites, evolved from outcrossing, female-male dioecy, three times in this group. In Caenorhabditis, androdioecious genomes are 10-30% smaller than dioecious species, but in the nematode Pristionchus, androdioecy evolved six times and does not correlate with genome size. Previous hypotheses include genome size evolution through: 1) Deletions and 'genome shrinkage' in androdioecious species; 2) Transposable element (TE) expansion and DNA loss through large deletions (the 'accordion model'); and 3) Differing TE dynamics in androdioecious and dioecious species. We analyzed nematode genomes and found no evidence for these hypotheses. Instead, nematode genome sizes had strong phylogenetic inertia with increases in a few dioecious species, contradicting the 'genome shrinkage' hypothesis. TEs did not explain genome size variation with the exception of the DNA transposon Mutator which was twice as abundant in dioecious genomes. Across short and long evolutionary distances Caenorhabditis genomes evolved through small structural mutations including gene-associated duplications and insertions. Seventy-one protein families had significant, parallel decreases across androdioecious Caenorhabditis including genes involved in the sensory system, regulatory proteins and membrane-associated immune responses. Our results suggest that within a dynamic landscape of frequent small rearrangements in Caenorhabditis, reproductive mode mediates genome evolution by altering the precise fates of individual genes, proteins, and the phenotypes they underlie.
... The transcripts containing only one exon or short transcripts (less than 150 bp) were excluded. Novel genes were annotated by DIAMOND [18] against databases including NR [19], Swiss-Prot [20], COG [21], KOG [22], and KEGG [23]. ...
Article
Full-text available
Ganoderma (Ganodermaceae) is a genus of edible and medicinal mushrooms that create a diverse set of bioactive compounds. Ganoderma lingzhi has been famous in China for more than 2000 years for its medicinal properties. However, the genome information of G. lingzhi has not been characterized. Here, we characterized its 49.15-Mb genome, encoding 13,125 predicted genes which were sequenced by the Illumina and PacBio platform. A wide spectrum of carbohydrate-active enzymes, with a total number of 519 CAZymes were identified in G. lingzhi. Then, the genes involved in sexual recognition and ganoderic acid (GA, key bioactive metabolite) biosynthesis were characterized. In addition, we identified and deduced the possible structures of 20 main GA constituents by UPLC-ESI-MS/MS, including a new special ganochlearic acid A. Furthermore, 3996 novel transcripts were discovered, and 9276 genes were predicted to have the possibility of alternative splicing from RNA-Seq data. The alternative splicing genes were enriched for functional categories involved in protein processing, endocytosis, and metabolic activities by KEGG. These genomic, transcriptomic, and GA constituents’ resources would enrich the toolbox for biological, genetic, and secondary metabolic pathways studies in G. lingzhi.
... Analyses of the differentially expressed mRNAs by GO were performed using Gene Ontology database (http://www.geneontology.org/, accessed on 1 May 2021) [31,32], and KEGG pathway analysis was performed by KEGG database [33]. Pathway enrichment statistical analyses were performed using R package phyper (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Hypergeometric.html, ...
Article
Full-text available
In pigs, meat quality and production are two important traits affecting the pig industry and human health. Compared to lean-type pigs, fat-type pigs contain higher intramuscular fat (IMF) contents, better taste and nutritional value. To uncover genetic factors controlling differences related to IMF in pig muscle, we performed RNA-seq analysis on the transcriptomes of the Longissimus dorsi (LD) muscle of Laiwu pigs (LW, fat-type pigs) and commercial Duroc × Landrace × Yorkshire pigs (DLY, lean-type pigs) at 150 d to compare the expression profiles of mRNA, miRNA and lncRNA. A total of 225 mRNAs, 12 miRNAs and 57 lncRNAs were found to be differentially expressed at the criteria of |log2(foldchange)| > 1 and q < 0.05. The mRNA expression of LDHB was significantly higher in the LD muscle of LW compared to DLY pigs with log2(foldchange) being 9.66. Using protein interaction prediction method, we identified more interactions of estrogen-related receptor alpha (ESRRA) associated with upregulated mRNAs, whereas versican (VCAN) and proenkephalin (PENK) were associated with downregulated mRNAs in LW pigs. Integrated analysis on differentially expressed (DE) mRNAs and miRNAs in the LD muscle between LW and DLY pigs revealed two network modules: between five upregulated mRNA genes (GALNT15, FKBP5, PPARGC1A, LOC110258214 and LOC110258215) and six downregulated miRNA genes (ssc-let-7a, ssc-miR190-3p, ssc-miR356-5p, ssc-miR573-5p, ssc-miR204-5p and ssc-miR-10383), and between three downregulated DE mRNA genes (IFRD1, LOC110258600 and LOC102158401) and six upregulated DE miRNA genes (ssc-miR1379-3p, ssc-miR1379-5p, ssc-miR397-5p, ssc-miR1358-5p, ssc-miR299-5p and ssc-miR1156-5p) in LW pigs. 
Based on the mRNA and ncRNA binding site targeting database, we constructed a regulatory network with miRNA as the center and mRNA and lncRNA as the target genes, including GALNT15/ssc-let-7a/LOC100523888, IFRD1/ssc-miR1379-5p/CD99, etc., forming a ceRNA network in the LD muscles that are differentially expressed between LW and DLY pigs. Collectively, these data may provide resources for further investigation of molecular mechanisms underlying differences in meat traits between lean- and fat-type pigs.
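The snippet for this entry mentions hypergeometric pathway enrichment via R's phyper. As a minimal sketch of that kind of test (not the authors' code), the upper-tail probability can be computed in Python as below. The function name and the argument names N, K, n, k are illustrative assumptions; the result should correspond to phyper(k - 1, K, N - K, n, lower.tail = FALSE) in R.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: number of genes in the background (hypothetical name)
    K: background genes annotated to the pathway or GO term
    n: number of differentially expressed genes drawn from the background
    k: differentially expressed genes annotated to the term
    """
    upper = min(K, n)          # X cannot exceed either K or n
    total = comb(N, n)         # number of equally likely gene draws
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / total


# Example: 10 background genes, 5 annotated, 5 drawn, all 5 annotated.
p = hypergeom_enrichment_p(10, 5, 5, 5)
```

In practice the resulting p-values would still need multiple-testing correction across all terms tested.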
... GO annotation is a useful tool to classify genes based on their functions in cells including BP, MF, and CC [88]. The biological process, molecular function and cellular component are basic characteristics, which allow a basic understanding of the diverse molecular role of genes. ...
Article
Full-text available
Abiotic stress is an important limiting factor in crop growth and yield around the world. Owing to the continued genetic erosion of the upland cotton germplasm due to intense selection and inbreeding, attention has shifted towards wild cotton progenitors which offer unique traits that can be introgressed into the cultivated cotton to improve their genetic performance. The purpose of this study was to characterize the Pkinase gene family in a previously developed genetic map of the F2 population derived from a cross between two cotton species: Gossypium hirsutum (CCRI 12-4) and Gossypium darwinii (5-7). Based on phylogenetic analysis, Pkinase (PF00069) was found to be the dominant domain with 151 genes in three cotton species, categorized into 13 subfamilies. Structure analysis of G. hirsutum genes showed that a greater percentage of genes and their exons were highly conserved within the group. Syntenic analysis of gene blocks revealed 99 duplicated genes among G. hirsutum, Gossypium arboreum and Gossypium raimondii. Most of the genes were duplicated in segmental pattern. Expression pattern analysis showed that the Pkinase gene family possessed species-level variation in induction to salinity and G. darwinii had higher expression levels as compared to G. hirsutum. Based on RNA sequence analysis and preliminary RT-qPCR verification, we hypothesized that the Pkinase gene family, regulated by transcription factors (TFs) and miRNAs, might play key roles in salt stress tolerance. These findings inferred comprehensive information on possible structure and function of Pkinase gene family in cotton under salt stress.
Article
Motivation: A typical goal in gene expression studies is identifying certain gene sets enriched with significant genes. The measurement of many gene expression experiments for several concentrations or time points allows the modeling of the concentration/time–response relationship for each gene, and the subsequent estimation of a gene-wise alert. In this work, an approach is proposed to transfer the concept of alerts from single genes to gene sets, yielding a global significance statement and the respective concentration or time where the first enrichment of the gene set can be observed. The methodology is based on a Kolmogorov–Smirnov type test statistic for each gene set. Results: Simulations show that a majority of these sets can be identified especially for lower numbers of true gene sets with a signal. The false positive rate can be controlled by subsequent decorrelation approaches. Overall, the true gene set-wise alerts are rarely overestimated and rather tend to be underestimated. Availability and implementation: The code needed to reproduce the simulations and apply the AlertGS methodology is available at the GitHub repository: https://github.com/FKappenberg/AlertGS.
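To illustrate what a Kolmogorov–Smirnov type statistic for a gene set can look like, the sketch below computes the classical two-sample KS distance between gene-level scores inside a set and scores of the remaining genes. This is a generic illustration under assumed inputs, not the AlertGS implementation; the function name and the idea of comparing "in set" against "background" scores are assumptions made here.

```python
def ks_statistic(in_set, background):
    """Two-sample KS statistic: the largest absolute gap between the
    empirical CDFs of two non-empty lists of gene-level scores."""
    a, b = sorted(in_set), sorted(background)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Example with hypothetical scores: the two samples interleave.
d_example = ks_statistic([1, 3], [2, 4])
```

A significance statement would additionally require a null distribution, for example from permuting gene labels, which is not shown here.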
Article
Full-text available
Current understanding of viral dynamics of SARS-CoV-2 and host responses driving the pathogenic mechanisms in COVID-19 is rapidly evolving. Here, we conducted a longitudinal study to investigate gene expression patterns during acute SARS-CoV-2 illness. Cases included SARS-CoV-2 infected individuals with extremely high viral loads early in their illness, individuals having low SARS-CoV-2 viral loads early in their infection, and individuals testing negative for SARS-CoV-2. We could identify widespread transcriptional host responses to SARS-CoV-2 infection that were initially most strongly manifested in patients with extremely high initial viral loads, then attenuating within the patient over time as viral loads decreased. Genes correlated with SARS-CoV-2 viral load over time were similarly differentially expressed across independent datasets of SARS-CoV-2 infected lung and upper airway cells, from both in vitro systems and patient samples. We also generated expression data on the human nose organoid model during SARS-CoV-2 infection. The human nose organoid-generated host transcriptional response captured many aspects of responses observed in the above patient samples, while suggesting the existence of distinct host responses to SARS-CoV-2 depending on the cellular context, involving both epithelial and cellular immune responses. Our findings provide a catalog of SARS-CoV-2 host response genes changing over time and magnitude of these host responses were significantly correlated to viral load.
Preprint
Full-text available
How tick-borne pathogens interact with their hosts has been primarily studied in vertebrates where disease is observed. Comparatively less is known about pathogen interactions within the tick. Here, we report that Ixodes scapularis ticks infected with either Anaplasma phagocytophilum (causative agent of anaplasmosis) or Borrelia burgdorferi (causative agent of Lyme disease) show activation of the ATF6 branch of the unfolded protein response (UPR). Disabling ATF6 functionally restricts pathogen survival in ticks. When stimulated, ATF6 functions as a transcription factor, but is the least understood out of the three UPR pathways. To interrogate the Ixodes ATF6 transcriptional network, we developed a custom R script to query tick promoter sequences. This revealed stomatin as a potential gene target, which has roles in lipid homeostasis and vesicle transport. Ixodes stomatin was experimentally validated as a bona fide ATF6-regulated gene through luciferase reporter assays, pharmacological activators, and RNAi transcriptional repression. Silencing stomatin decreased A. phagocytophilum colonization in Ixodes and disrupted cholesterol dynamics in tick cells. Furthermore, blocking stomatin restricted cholesterol availability to the bacterium, thereby inhibiting growth and survival. Taken together, we have identified the Ixodes ATF6 pathway as a novel contributor to vector competence through Stomatin-regulated cholesterol homeostasis. Moreover, our custom, web-based transcription factor binding site search tool “ArthroQuest” revealed that the ATF6-regulated nature of stomatin is unique to blood-feeding arthropods. Collectively, these findings highlight the importance of studying fundamental processes in non-model organisms. IMPORTANCE: Host-pathogen interactions for tick-borne pathogens like Anaplasma phagocytophilum (causative agent of Anaplasmosis) have been primarily studied in mammalian hosts. Comparatively less is known about interactions within the tick. 
Herein, we find that tick-borne pathogens activate the cellular stress response receptor, ATF6, in Ixodes ticks. Upon activation, ATF6 is cleaved and the cytosolic portion translocates to the nucleus to function as a transcription factor that coordinates gene expression networks. Using a custom script in R to query the Ixodes ATF6 regulome, stomatin was identified as an ATF6-regulated target that supports Anaplasma colonization by facilitating cholesterol availability to the bacterium. Moreover, our custom, web-based tool “ArthroQuest” found that the ATF6-regulated nature of stomatin is unique to arthropods. Given that lipid hijacking is common among arthropod-borne microbes, ATF6-mediated induction of stomatin may be a mechanism that is exploited in many vector-pathogen relationships for the survival and persistence of transmissible microbes. Collectively, this study identified a novel contributor to vector competence and highlights the importance of studying molecular networks in non-model organisms.
Article
Full-text available
Leprosy is a chronic disease of the skin and peripheral nerves caused by Mycobacterium leprae. A major public health and clinical problem are leprosy reactions, which are inflammatory episodes that often contribute to nerve damage and disability. Type I reversal reactions (T1R) can occur after microbiological cure of leprosy and affect up to 50% of leprosy patients. Early intervention to prevent T1R and, hence, nerve damage, is a major focus of current leprosy control efforts. In a prospective study, we enrolled and collected samples from 32 leprosy patients before the onset of T1R. Whole blood aliquots were challenged with M. leprae sonicate or media and total RNA was extracted. After a three-year follow-up, the transcriptomic response was compared between cells from 22 patients who remained T1R-free and 10 patients who developed T1R during that period. Our analysis focused on differential transcript (i.e. isoform) expression and usage. Results showed that, at baseline, cells from T1R-destined and T1R-free subjects had no main difference in their transcripts expression and usage. However, the cells of T1R patients displayed a transcriptomic immune response to M. leprae antigens that was significantly different from the one of cells from leprosy patients who remained T1R-free. Transcripts with significantly higher upregulation in the T1R-destined group, compared to the cells from T1R-free patients, were enriched for pathways and GO terms involved in response to intracellular pathogens, apoptosis regulation and inflammatory processes. Similarly, transcript usage analysis pinpointed different transcript proportions in response to the in-vitro challenge of cells from T1R-destined patients. Hence, transcript usage in concert with transcript expression suggested a dysregulated inflammatory response including increased apoptosis regulation in the peripheral blood cells of T1R-destined patients before the onset of T1R symptoms. 
Combined, these results provided detailed insight into the pathogenesis of T1R.
Preprint
The neutral mutation rate is known to vary widely along human chromosomes, leading to mutational hot and cold regions. We provide evidence that categories of functionally-related genes reside preferentially in mutationally hot or cold regions, the size of which we have measured. Genes in hot regions are biased toward extra-cellular communication (surface receptors, cell adhesion, immune response, etc.) while those in cold regions are biased toward essential cellular processes (gene regulation, RNA processing, protein modification, etc.). From a selective perspective, this organization of genes could minimize the mutational load on genes that need to be conserved and allow fast evolution for genes that must frequently adapt. We also analyze the effect of gene duplication and chromosomal recombination, which contribute significantly to these biases for certain categories of hot genes. Overall, our results show that genes are located non-randomly with respect to hot and cold regions, offering the possibility that selection acts at the level of gene location in the human genome.
Article
Full-text available
Leishmania spp. commonly infects phagocytic cells of the immune system, particularly macrophages, employing various immune evasion strategies that enable their survival by altering the intracellular environment. In mammals, these parasites establish persistent infections by modulating gene expression in macrophages, thus interfering with immune signaling and response pathways, ultimately creating a favorable environment for the parasite’s survival and reproduction. In this study, our objective was to use data mining and subsequent filtering techniques to identify the genes that play a crucial role in the infection process of Leishmania spp. We aimed to pinpoint genes that have the potential to influence the progression of Leishmania infection. To achieve this, we exploited prior, curated knowledge from major databases and constructed 16 datasets of human molecular information consisting of coding genes and corresponding proteins. We obtained over 400 proteins, identifying approximately 200 genes. The proteins coded by these genes were subsequently used to build a network of protein–protein interactions, which enabled the identification of key players; we named this set Predicted Genes. Then, we selected approximately 10% of Predicted Genes for biological validation. THP-1 cells, a line of human macrophages, were infected with Leishmania major in vitro for the validation process. We observed that L. major has the capacity to impact crucial genes involved in the immune response, resulting in macrophage inactivation and creating a conducive environment for the survival of Leishmania parasites.
Article
Genetically modified organisms are commonly used in disease research and agriculture but the precise genomic alterations underlying transgenic mutations are often unknown. The position and characteristics of transgenes, including the number of independent insertions, influence the expression of both transgenic and wild-type sequences. We used Oxford Nanopore Technologies (ONT) long-read sequencing to sequence and assemble two transgenic strains of Caenorhabditis elegans commonly used in the research of neurodegenerative diseases: BY250 (pPdat-1::GFP) and UA44 (GFP and human α-synuclein), a model for Parkinson’s research. After scaffolding to the reference, the final assembled sequences were ∼102 Mb with N50s of 17.9 Mb and 18.0 Mb, respectively, and L90s of six contiguous sequences, representing chromosome-level assemblies. Each of the assembled sequences contained more than 99.2% of the Nematoda BUSCO genes found in the C. elegans reference and 99.5% of the annotated C. elegans reference protein-coding genes. We identified the locations of the transgene insertions and confirmed that all transgene sequences were inserted in intergenic regions, leaving the organismal gene content intact. The transgenic C. elegans genomes presented here will be a valuable resource for Parkinson’s research as well as other neurodegenerative diseases. Our work demonstrates that long-read sequencing is a fast, cost-effective way to assemble genome sequences and characterize mutant lines and strains.
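The N50 and L90 values quoted in this abstract are standard assembly contiguity metrics. As a rough sketch of how such values are usually derived from a list of sequence lengths (the function name and the toy lengths are assumptions, not data from the study): sort lengths from longest to shortest and accumulate until the chosen percentage of the total assembly length is reached.

```python
def nx_lx(lengths, x):
    """Return (Nx, Lx) for a list of contig or scaffold lengths.

    Nx: length of the shortest sequence in the smallest set of longest
        sequences that together cover at least x percent of the assembly.
    Lx: number of sequences in that set.
    """
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length, count
    raise ValueError("lengths must contain at least one positive value")


# Toy example with hypothetical lengths (total = 30).
n50, l50 = nx_lx([10, 8, 6, 4, 2], 50)
n90, l90 = nx_lx([10, 8, 6, 4, 2], 90)
```

Under this definition, an L90 of six means that the six longest sequences already account for at least 90% of the assembled bases.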
Article
Full-text available
Chronic HIV-1 infection is characterized by T-cell dysregulation that is partly restored by antiretroviral therapy. Autophagy is a critical regulator of T-cell function. Here, we demonstrate a protective role for autophagy in HIV-1 disease pathogenesis. Targeted analysis of genetic variation in core autophagy gene ATG16L1 reveals the previously unidentified rs6861 polymorphism, which correlates functionally with enhanced autophagy and clinically with improved survival of untreated HIV-1-infected individuals. T-cells carrying ATG16L1 rs6861(TT) genotype display improved antiviral immunity, evidenced by increased proliferation, revamped immune responsiveness, and suppressed exhaustion/immunosenescence features. In-depth flow-cytometric and transcriptional profiling reveal T-helper-cell-signatures unique to rs6861(TT) individuals with enriched regulation of pro-inflammatory networks and skewing towards immunoregulatory phenotype. Therapeutic enhancement of autophagy recapitulates the rs6861(TT)-associated T-cell traits in non-carriers. These data underscore the in vivo relevance of autophagy for longer-lasting T-cell-mediated HIV-1 control, with implications towards development of host-directed antivirals targeting autophagy to restore immune function in chronic HIV-1 infection.
Preprint
Full-text available
Amino acid scales are crucial for sequence-based protein prediction tasks, yet no gold standard scale set or simple scale selection methods exist. We developed AAclust, a wrapper for clustering models that require a pre-defined number of clusters k, such as k-means. AAclust obtains redundancy-reduced scale sets by clustering and selecting one representative scale per cluster, where k can either be optimized by AAclust or defined by the user. The utility of AAclust scale selections was assessed by applying machine learning models to 24 protein benchmark datasets. We found that top-performing scale sets were different for each benchmark dataset and significantly outperformed scale sets used in previous studies. Notably, model performance showed a strong positive correlation with the scale set size. AAclust enables a systematic optimization of scale-based feature engineering in machine learning applications. Availability and implementation: The AAclust algorithm is part of AAanalysis, a Python-based framework for interpretable sequence-based protein prediction, which will be made freely accessible in a forthcoming publication. Supplementary information: Further details on methods and results are provided in Supplementary Material.
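The core idea described here, clustering scales and keeping one representative per cluster, can be sketched as follows. This is not the AAclust implementation: the cluster labels are assumed to come from some upstream clustering step (for example k-means), and choosing the member closest to the cluster centroid is one plausible selection rule assumed for illustration. All names are hypothetical.

```python
def pick_representatives(scales, labels):
    """Pick one representative scale per cluster.

    scales: dict mapping scale name -> list of numeric values
    labels: dict mapping scale name -> cluster id (from any clustering step)
    Returns a dict mapping cluster id -> name of the member closest
    (squared Euclidean distance) to the cluster centroid.
    """
    clusters = {}
    for name, cid in labels.items():
        clusters.setdefault(cid, []).append(name)

    representatives = {}
    for cid, names in clusters.items():
        dim = len(scales[names[0]])
        centroid = [
            sum(scales[n][d] for n in names) / len(names) for d in range(dim)
        ]

        def sq_dist(name, centroid=centroid):
            return sum((v - c) ** 2 for v, c in zip(scales[name], centroid))

        representatives[cid] = min(names, key=sq_dist)
    return representatives


# Toy example with made-up two-dimensional "scales".
toy_scales = {"a": [0, 0], "b": [1, 1], "c": [2, 2], "d": [10, 10]}
toy_labels = {"a": 0, "b": 0, "c": 0, "d": 1}
reps = pick_representatives(toy_scales, toy_labels)
```

The reduced scale set is then simply the list of selected names, one per cluster, so its size equals the number of clusters k.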
Chapter
While conceptual data models offered a new machinery for modelling, they implicitly assume database and application design and are mostly just diagrams. What if we could create a model for a subject domain, where the model’s content holds across applications and we can make it do various things on the computer to make systems behave ‘intelligently’ or at least better than without such a model? The answer to that is: yes, we can do this, with ontologies. We will first look at what they are, including a gentle introduction to their formal underpinnings with logic and automated reasoning. This is followed by three examples of success stories and varied usages: data integration with the Gene Ontology, uncovering novel knowledge about enzymes outperforming the scientists, and automating educational question generation and marking of the exercises. Second, since their development is nontrivial, there are many methods and procedures to develop ontologies, including top-down and bottom-up approaches. We return to dance to illustrate a sample of those methods, by taking steps to improve a salsa dance ontology. It is possible to complain about limitations also for ontologies, with which we will close the chapter.
Chapter
Diagrams and models in biology, loosely called ‘biological models’, enjoy a freedom of notation unlike mind maps. They are ubiquitous in school books and all the way up into scientific publications. While they might look like cartoons to some, there is a lot more to them than meets the eye on cursory glance. By means of introduction in two illustrations with fermentation of sugars and plankton in the ocean, we’ll proceed to take a look at the systematics behind such models and scientific theories that can be embedded in them, as the cladists do in their cladograms. The second part of the chapter proposes a procedure for how to create such biology diagrams. This is then illustrated for dance, and dancing lyrebirds in particular. While biological models solve the limitations of mind maps, moving the goalposts brings to the fore new limitations and challenges, with which we close the chapter.
Article
Full-text available
Paramyxoviruses are negative-sense, single-stranded RNA viruses that are associated with numerous diseases in humans and animals. J paramyxovirus (JPV) was first isolated from moribund mice (Mus musculus) with hemorrhagic lung lesions in Australia in 1972. In 2016, JPV was classified into the newly established genus Jeilongvirus. Novel jeilongviruses are being discovered worldwide in wildlife populations. However, the effects of jeilongvirus infection on host gene expression remain uncharacterized. To address this, cellular RNA from JPV-infected mouse fibroblasts was collected at 2, 4, 8, 12, 16, 24, and 48 hours post-infection (hpi) and sequenced using single-end 75 base pairs (SE75) sequencing chemistry on an Illumina NextSeq platform. Differentially expressed genes (DEGs) between the virus-infected replicates and mock replicates at each timepoint were identified using the Tophat2-Cufflinks-Cuffdiff protocol. At 2 hpi, 11 DEGs were identified in JPV-infected cells, while 1,837 DEGs were detected at 48 hpi. A GO analysis determined that the genes at the earlier timepoints were involved in interferon responses, while there was a shift towards genes that are involved in antigen processing and presentation processes at the later timepoints. At 48 hpi, a KEGG analysis revealed that many of the DEGs detected were involved in pathways that are important for immune responses. qRT-PCR verified that Rtp4, Ifit3, Mx2, and Stat2 were all upregulated during JPV infection, while G0s2 was downregulated. After JPV infection, the expression of inflammatory and antiviral factors in mouse fibroblasts changes significantly. This study provides crucial insight into the different arms of host immunity that mediate Jeilongvirus infection. Understanding the pathogenic mechanisms of Jeilongvirus will lead to better strategies for the prevention and control of potential diseases that may arise from this group of viruses.
Article
The transcription factor, SOX10, plays an important role in the differentiation of neural crest precursors to the melanocytic lineage. Malignant transformation of melanocytes leads to the development of melanoma, and SOX10 promotes melanoma cell proliferation and tumor formation. SOX10 expression in melanomas is heterogeneous, and loss of SOX10 causes a phenotypic switch toward an invasive, mesenchymal-like cell state and therapy resistance; hence, strategies to target SOX10-deficient cells are an active area of investigation. The impact of cell state and SOX10 expression on antitumor immunity is not well understood but will likely have important implications for immunotherapeutic interventions. To this end, we tested whether SOX10 status affects the response to CD8+ T cell–mediated killing and T cell–secreted cytokines, TNFα and IFNγ, which are critical effectors in the cytotoxic killing of cancer cells. We observed that genetic ablation of SOX10 rendered melanoma cells more sensitive to CD8+ T cell–mediated killing and cell death induction by either TNFα or IFNγ. Cytokine-mediated cell death in SOX10-deficient cells was associated with features of caspase-dependent pyroptosis, an inflammatory form of cell death that has the potential to increase immune responses. Implications: These data support a role for SOX10 expression altering the response to T cell–mediated cell death and contribute to a broader understanding of the interaction between immune cells and melanoma cells.
Chapter
The thresholding problem is considered in the context of high-throughput biological data. Several approaches are reviewed, implemented, and tested over an assortment of transcriptomic data.
Article
Full-text available
Microbial fertilizers can activate and promote nutrient absorption and help inflorescence elongation. To understand the molecular mechanisms governing grape (Vitis vinifera) inflorescence elongation after microbial fertilizer application, we comprehensively analyzed the transcriptome dynamics of ‘Summer Black’ grape inflorescence at different leaf stages. With the development of ‘Summer Black’ grape inflorescence, gibberellic acid content gradually increased and was clearly higher in the microbial fertilizer group than in the corresponding control group. In addition, the microbial fertilizer and control groups had 291, 487, 490, 287, and 323 differentially expressed genes (DEGs) at the 4-, 6-, 8-, 10-, and 12-leaf stages, respectively. Kyoto Encyclopedia of Genes and Genomes pathway annotation revealed that most upregulated DEGs were enriched in starch and sucrose metabolism pathways at the 6-, 8-, and 10-leaf stages. Weighted gene coexpression network analysis identified stage-specific expression of most DEGs. In addition, multiple transcription factors and phytohormone signaling-related genes were found at different leaf stages, including basic helix-loop-helix proteins, CCCH zinc finger proteins, gibberellin receptor GID1A, 2-glycosyl hydrolases family 16, protein TIFY, MYB transcription factors, WRKY transcription factors, and ethylene response factor, suggesting that many transcription factors play important roles in inflorescence elongation at different developmental stages. These results provide valuable insights into the dynamic transcriptomic changes of inflorescence elongation at different leaf stages.
Article
SUMOylation plays an essential role in diverse physiological and pathological processes. Identification of wild-type SUMO1-modification sites by mass spectrometry is still challenging. In this study, we produced a monoclonal SUMO1C-K antibody recognizing SUMOylated peptides and proposed an efficient streamline for identification of SUMOylation sites. We identified 471 SUMOylation sites in 325 proteins from five raw data. These identified sites exhibit a high positive rate when evaluated by mutation-verified SUMOylation sites. We identified many SUMOylated proteins involved in mitochondrial metabolism and non-membrane-bounded organelles formation. We proposed a SUMOylation motif, ΨKXD/EP, where proline is required for efficient SUMOylation. We further revealed SUMOylation of TFII-I was stimulated by growth signals and was required for nucleus-localization of p-ERK1/2. Mutation of SUMOylation sites of TFII-I suppressed tumor cell growth in vitro and in vivo. Taken together, we provided a strategy for personalized identification of wild-type SUMO1-modification sites and revealed the physiological significance of TFII-I SUMOylation in this study.
Article
Full-text available
Since the entry into genome-enabled biology several decades ago, much progress has been made in determining, describing, and disseminating the functions of genes and their products. Yet, this information is still difficult to access for many scientists and for most genomes. To provide easy access and a graphical summary of the status of genome function annotation for model organisms and bioenergy and food crop species, we created a web application (https://genomeannotation.rheelab.org) to visualize, search, and download genome annotation data for 28 species. The summary graphics and data tables will be updated semi-annually, and snapshots will be archived to provide a historical record of the progress of genome function annotation efforts. Clear and simple visualization of up-to-date genome function annotation status, including the extent of what is unknown, will help address the grand challenge of elucidating the functions of all genes in organisms.
Article
Full-text available
Background: The growing prevalence of Alzheimer's disease (AD) is becoming a global health challenge without effective treatments. Defective mitochondrial function and mitophagy have recently been suggested as etiological factors in AD, in association with abnormalities in components of the autophagic machinery like lysosomes and phagosomes. Several large transcriptomic studies have been performed on different brain regions from AD and healthy patients, and their data represent a vast source of important information that can be utilized to understand this condition. However, large integration analyses of these publicly available data, such as AD RNA-Seq data, are still missing. In addition, large-scale focused analysis on mitophagy, which seems to be relevant for the aetiology of the disease, has not yet been performed. Methods: In this study, publicly available raw RNA-Seq data generated from healthy control and sporadic AD post-mortem human samples of the brain frontal lobe were collected and integrated. Sex-specific differential expression analysis was performed on the combined data set after batch effect correction. From the resulting set of differentially expressed genes, candidate mitophagy-related genes were identified based on their known functional roles in mitophagy, the lysosome, or the phagosome, followed by Protein-Protein Interaction (PPI) and microRNA-mRNA network analysis. The expression changes of candidate genes were further validated in human skin fibroblast and induced pluripotent stem cells (iPSCs)-derived cortical neurons from AD patients and matching healthy controls. Results: From a large dataset (AD: 589; control: 246) based on three different datasets (i.e., ROSMAP, MSBB, & GSE110731), we identified 299 candidate mitophagy-related differentially expressed genes (DEG) in sporadic AD patients (male: 195, female: 188). Among these, the AAA ATPase VCP, the GTPase ARF1, the autophagic vesicle forming protein GABARAPL1 and the cytoskeleton protein actin beta ACTB were selected based on network degrees and existing literature. Changes in their expression were further validated in AD-relevant human in vitro models, which confirmed their down-regulation in AD conditions. Conclusion: Through the joint analysis of multiple publicly available data sets, we identify four differentially expressed key mitophagy-related genes potentially relevant for the pathogenesis of sporadic AD. Changes in expression of these four genes were validated using two AD-relevant human in vitro models, primary human fibroblasts and iPSC-derived neurons. Our results provide a foundation for further investigation of these genes as potential biomarkers or disease-modifying pharmacological targets.
Article
Full-text available
Growth factors are the key regulators that promote tissue regeneration and healing processes. While the effects of individual growth factors are well documented, a combination of multiple secreted growth factors underlies stem cell–mediated regeneration. To avoid the potential dangers and labor-intensive individual approach of stem cell therapy while maintaining their regeneration-promoting effects based on multiple secreted growth factors, we engineered a “mix-and-match” combinatorial platform based on a library of cell lines producing growth factors. Treatment with a combination of growth factors secreted by engineered mammalian cells was more efficient than with individual growth factors or even stem cell–conditioned medium in a gap closure assay. Furthermore, we implemented in a mouse model a device for allogenic cell therapy for an in situ production of growth factors, where it improved cutaneous wound healing. Augmented bone regeneration was achieved on calvarial bone defects in rats treated with a cell device secreting IGF, FGF, PDGF, TGF-β, and VEGF. In both in vivo models, the systemic concentration of secreted factors was negligible, demonstrating the local effect of the regeneration device. Finally, we introduced a genetic switch that enables temporal control over combinations of trophic factors released at different stages of regeneration mimicking the maturation of natural wound healing to improve therapy and prevent scar formation.
Article
Data-driven drug discovery exploits a comprehensive set of big data to provide an efficient path for the development of new drugs. Currently, publicly available bioassay data sets provide extensive information regarding the bioactivity profiles of millions of compounds. Using these large-scale drug screening data sets, we developed a novel in silico method to virtually screen hit compounds against protein targets, named BEAR (Bioactive compound Enrichment by Assay Repositioning). The underlying idea of BEAR is to reuse bioassay data for predicting hit compounds for targets other than their originally intended purposes, i.e., "assay repositioning". The BEAR approach differs from conventional virtual screening methods in that (1) it relies solely on bioactivity data and requires no physicochemical features of either the target or the ligand; (2) accordingly, it predicts structurally diverse candidates, allowing for scaffold hopping; and (3) it shows stable performance across diverse target classes, suggesting its general applicability. Large-scale cross-validation of more than a thousand targets showed that BEAR accurately predicted known ligands (median area under the curve = 0.87), proving that BEAR maintained a robust performance even in the validation set with additional constraints. In addition, a comparative analysis demonstrated that BEAR outperformed other machine learning models, including a recent deep learning model for ABC transporter family targets. We predicted P-gp and BCRP dual inhibitors using the BEAR approach and validated the predicted candidates using in vitro assays. The intracellular accumulation effects of mitoxantrone, a well-known P-gp/BCRP dual substrate for cancer treatment, confirmed nine out of 72 dual inhibitor candidates preselected by primary cytotoxicity screening. Consequently, these nine hits are novel and potent dual inhibitors for both P-gp and BCRP, solely predicted by bioactivity profiles without relying on any structural information of targets or ligands.
Article
Full-text available
In adult mammals, spontaneous repair after spinal cord injury (SCI) is severely limited. By contrast, teleost fish successfully regenerate injured axons and produce new neurons from adult neural stem cells after SCI. The molecular mechanisms underlying this high regenerative capacity are largely unknown. The present study addresses this gap by examining the temporal dynamics of proteome changes in response to SCI in the brown ghost knifefish (Apteronotus leptorhynchus). Two-dimensional difference gel electrophoresis (2D DIGE) was combined with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) and tandem mass spectrometry (MS/MS) to collect data during early (1 day), mid (10 days), and late (30 days) phases of regeneration following caudal amputation SCI. Forty-two unique proteins with significant differences in abundance between injured and intact control samples were identified. Correlation analysis uncovered six clusters of spots with similar expression patterns over time and strong conditional dependences, typically within functional families or between isoforms. Significantly regulated proteins were associated with axon development and regeneration; proliferation and morphogenesis; neuronal differentiation and re-establishment of neural connections; promotion of neuroprotection, redox homeostasis, and membrane repair; and metabolism or energy supply. Notably, at all three time points examined, significant regulation of proteins involved in inflammatory responses was absent.
Article
Full-text available
This study aimed to examine the effects of loading different concentrations of metformin onto an α-hemihydrate calcium sulfate/nano-hydroxyapatite (α-CSH/nHA) composite. The material characteristics, biocompatibility, and bone formation were compared as functions of the metformin concentration. X-ray diffraction results indicated that the metformin loading had little influence on the phase composition of the composite. The hemolytic potential of the composite was found to be low, and a CCK-8 assay revealed only weak cytotoxicity. However, the metformin-loaded composite was found to enhance the osteogenic ability of MC3T3-E1 cells, as revealed by alkaline phosphatase and alizarin red staining, real-time PCR, and western blotting, and the optimal amount was 500 µM. RNA sequencing results also showed that the composite material increased the expression of osteogenic-related genes. Cranial bone lacks muscle tissue, and the low blood supply leads to poor bone regeneration. As most mammalian cranial and maxillofacial bones are membranous and of similar embryonic origin, the rat cranial defect model has become an ideal animal model for in vivo experiments in bone tissue engineering. Thus, we introduced a rat cranial defect with a diameter of 5 mm as an experimental defect model. Micro-computed tomography, hematoxylin and eosin staining, Masson staining, and immunohistochemical staining were used to determine the effectiveness of the composite as a scaffold in a rat skull defect model. The composite material loaded with 500 µM of metformin had the strongest osteoinduction ability under these conditions. These results are promising for the development of new methods for repairing craniofacial bone defects.
Article
Full-text available
Membrane contact sites (MCSs) link organelles to coordinate cellular functions across space and time. Although viruses remodel organelles for their replication cycles, MCSs remain largely unexplored during infections. Here, we design a targeted proteomics platform for measuring MCS proteins at all organelles simultaneously and define functional virus-driven MCS alterations by the ancient beta-herpesvirus human cytomegalovirus (HCMV). Integration with super-resolution microscopy and comparisons to herpes simplex virus (HSV-1), Influenza A, and beta-coronavirus HCoV-OC43 infections reveals time-sensitive contact regulation that allows switching anti- to pro-viral organelle functions. We uncover a stabilized mitochondria-ER encapsulation structure (MENC). As HCMV infection progresses, MENCs become the predominant mitochondria-ER contact phenotype and sequentially recruit the tethering partners VAP-B and PTPIP51, supporting virus production. However, premature ER-mitochondria tethering activates STING and interferon response, priming cells against infection. At peroxisomes, ACBD5-mediated ER contacts balance peroxisome proliferation versus membrane expansion, with ACBD5 impacting the titers of each virus tested.
Article
Full-text available
Dysregulation of gene expression in Alzheimer’s disease (AD) remains elusive, especially at the cell type level. Gene regulatory networks, a key molecular mechanism linking transcription factors (TFs) and regulatory elements to govern gene expression, can change across cell types in the human brain and thus serve as a model for studying gene dysregulation in AD. However, AD-induced regulatory changes across brain cell types remain uncharted. To address this, we integrated single-cell multi-omics datasets to predict the gene regulatory networks of four major cell types, excitatory and inhibitory neurons, microglia and oligodendrocytes, in control and AD brains. Importantly, we analyzed and compared the structural and topological features of networks across cell types and examined changes in AD. Our analysis shows that hub TFs are largely common across cell types and AD-related changes are relatively more prominent in some cell types (e.g., microglia). The regulatory logics of enriched network motifs (e.g., feed-forward loops) further uncover cell type-specific TF-TF cooperativities in gene regulation. The cell type networks are also highly modular and several network modules with cell-type-specific expression changes in AD pathology are enriched with AD-risk genes. The further disease-module-drug association analysis suggests cell-type candidate drugs and their potential target genes. Finally, our network-based machine learning analysis systematically prioritized cell type risk genes likely involved in AD. Our strategy is validated using an independent dataset, which showed that top ranked genes can predict clinical phenotypes (e.g., cognitive impairment) of AD with reasonable accuracy. Overall, this single-cell network biology analysis provides a comprehensive map linking genes, regulatory networks, cell types and drug targets and reveals cell-type gene dysregulation in AD.
Article
Full-text available
Background: The availability of chromosome-scale genome assemblies is fundamentally important to advance genetics and breeding in crops, as well as for evolutionary and comparative genomics. The improvement of long-read sequencing technologies and the advent of optical mapping and chromosome conformation capture technologies in the last few years have significantly promoted the development of chromosome-scale genome assemblies of model plants and crop species. In grasses, chromosome-scale genome assemblies recently became available for cultivated and wild species of the Triticeae subfamily. Development of state-of-the-art genomic resources in species of the Poeae subfamily, which includes important crops like fescues and ryegrasses, is lagging behind the progress in the cereal species. Results: Here, we report a new chromosome-scale genome sequence assembly for perennial ryegrass, obtained by combining PacBio long-read sequencing, Illumina short-read polishing, BioNano optical mapping and Hi-C scaffolding. More than 90% of the total genome size of perennial ryegrass (approximately 2.55 Gb) is covered by seven pseudo-chromosomes that show high levels of collinearity to the orthologous chromosomes of Triticeae species. The transposon fraction of perennial ryegrass was found to be relatively low, approximately 35% of the total genome content, which is less than half of the genome repeat content of cultivated cereal species. We predicted 54,629 high-confidence gene models, 10,287 long non-coding RNAs and a total of 8,393 short non-coding RNAs in the perennial ryegrass genome. Conclusions: The new reference genome sequence and annotation presented here are valuable resources for comparative genomic studies in grasses, as well as for breeding applications, and will expedite the development of productive varieties in perennial ryegrass and related species.
Article
Full-text available
Protein ubiquitylation is an important posttranslational modification affecting a wide range of cellular processes. Due to the low abundance of ubiquitylated species in biological samples, considerable effort has been spent on methods to purify and detect ubiquitylated proteins. We have developed and characterized a novel tool for ubiquitin detection and purification based on OtUBD, a high-affinity ubiquitin-binding domain (UBD) derived from an Orientia tsutsugamushi deubiquitylase (DUB). We demonstrate that OtUBD can be used to purify both monoubiquitylated and polyubiquitylated substrates from yeast and human tissue culture samples and compare its performance with existing methods. Importantly, we found conditions for either selective purification of covalently ubiquitylated proteins or co-isolation of both ubiquitylated proteins and their interacting proteins. As proof of principle for these newly developed methods, we profiled the ubiquitylome and ubiquitin-associated proteome of the budding yeast Saccharomyces cerevisiae. Combining OtUBD affinity purification with quantitative proteomics, we identified potential substrates for the E3 ligases Bre1 and Pib1. OtUBD provides a versatile, efficient, and economical tool for ubiquitin research with specific advantages over certain other methods, such as in efficiently detecting monoubiquitylation or ubiquitin linkages to noncanonical sites.