BMC Genomics

Published by Springer Nature
Online ISSN: 1471-2164
Recent publications
PCA plots for the complete dataset coloured by phenotype (A) and with an outlier sample (sample ID 3344zz_18_1_19) removed in panel B. The principal component analysis was conducted on normalised read counts
Expression (TPM, transcripts per million) of KDM6B for each temperature switch regime and control group at each sampling point regardless of phenotype. The variation in expression levels, mostly driven by the presence of different phenotypes, is reflected in the standard error bars
Expression (TPM, transcripts per million) of three male and three female sex-associated genes for all samples with ovotestes produced from different switching conditions (x-axis). Each point represents an individual (by colour) to depict the range of expression values different individuals can exhibit for a given gene. Sample IDs correspond to those provided in Supplementary File S16, and are as follows: A) 3603zz_18_1_4, B) 3603zz_18_1_7, C) 3603zz_18_1_8, D) 3603zz_18_2_4, E) 3632zz_18_2_9, F) 3632zz_18_2_19, G) 3632zz_18_2_20, H) 3232zz_18_1_19
Histology sections of four individuals demonstrating the range of morphological characteristics that ovotestes can exhibit in embryonic Pogona vitticeps. Specimens in A and B have the most typical ovotestes phenotype, where the cortex remains moderately thickened and rudimentary seminiferous tubules (indicated with arrows) are dispersed throughout the medulla. The level of degeneration in the medulla varies, with specimen A showing more degradation than specimen B. Specimen C shows very pronounced characteristics of both sexes, having a thick cortex and well-formed seminiferous tubules. Specimen D possesses a very thin cortex and a disorganised medulla with sparse seminiferous tubules.
Expression (transcripts per million, TPM) for a subset of genes uniquely upregulated in ovotestes mentioned in the text. For the full list of genes see supplementary file S15
Article
Background In some vertebrate species, gene-environment interactions can determine sex, driving bipotential gonads to differentiate into either ovaries or testes. In the central bearded dragon (Pogona vitticeps), the genetic influence of sex chromosomes (ZZ/ZW) can be overridden by high incubation temperatures, causing ZZ male to female sex reversal. Previous research showed that ovotestes, a rare gonadal phenotype with traits of both sexes, develop during sex reversal, leading to the hypothesis that sex reversal relies on high temperature feminisation to outcompete the male genetic cue. To test this, we conducted temperature switching experiments at key developmental stages, and analysed the effect on gonadal phenotypes using histology and transcriptomics. Results We found sexual fate is more strongly influenced by the ZZ genotype than temperature. Any exposure to low temperatures (28 °C) caused testes differentiation, whereas sex reversal required longer exposure to high temperatures. We revealed ovotestes exist along a spectrum of femaleness to maleness at the transcriptional level. We found inter-individual variation in gene expression changes following temperature switches, suggesting that both genetic sensitivity to the temperature cue and its timing and duration influence sex reversal. Conclusions These findings bring new insights to the mechanisms underlying sex reversal, improving our understanding of thermosensitive sex systems in vertebrates.
 
Identification and characterization of differentially expressed genes (DEGs) in different tissues. (A) The number of up-and down-regulated DEGs among T2 vs. T1, T3 vs. T2 and T3 vs. T1 comparisons. (B) Venn diagram comparison summarizing the number of differentially expressed genes among the three comparisons
Heatmaps of DEGs involved in phytohormone signaling including abscisic acid (ABA), jasmonic acid (JA), auxin, cytokinins and ethylene. Each column represents the mean expression value (log2 FPKM; T2 samples are divided by T1, and T3 samples are divided by T1 and T2, respectively) of three biological replicates obtained from RNA-Seq data
Heatmaps of DEGs encoding genes related to Ca²⁺ signal transduction during wound healing of potato tubers (A1) and the up-regulated fold of StCDPKs (A2) and StRBOHs (A3) in different comparisons. (Calcium-dependent protein kinase, CDPK; respiratory burst oxidase homologue, RBOH; calmodulin-binding protein, CaMCaL. Each column represents the mean expression value (log2 FPKM; T2 samples are divided by T1, and T3 samples are divided by T1 and T2, respectively) of three biological replicates obtained from RNA-Seq data)
Article
Background Wound healing is a representative phenomenon of potato tubers subjected to mechanical injuries. Our previous results found that benzo-(1,2,3)-thiadiazole-7-carbothioic acid S-methyl ester (BTH) promoted the wound healing of potato tubers. However, the molecular mechanism related to inducible wound healing remains unknown. Results Transcriptomic evaluation of healing tissues from potato tubers at three stages, namely, 0 d (nonhealing), 5 d (wounded tubers healed for 5 d) and 5 d (BTH-treated tubers healed for 5 d) using RNA-Seq and differentially expressed genes (DEGs) analysis showed that more than 515 million high-quality reads were generated and a total of 7665 DEGs were enriched, and 16 of these DEGs were selected by qRT-PCR analysis to further confirm the RNA sequencing data. Gene ontology (GO) enrichment analysis indicated that the most highly differentially expressed genes were involved in metabolic and cellular processes, and KEGG enrichment analysis indicated that a large number of DEGs were associated with plant hormones, starch and sugar metabolism, fatty acid metabolism, phenylpropanoid biosynthesis and terpenoid skeleton biosynthesis. Furthermore, a few candidate transcription factors, including MYB, NAC and WRKY, and genes related to Ca²⁺-mediated signal transduction were also found to be differentially expressed during wound healing. Most of these enriched DEGs were upregulated after BTH treatment. Conclusion This comparative expression profile provided useful resources for studies of the molecular mechanism via these promising candidates involved in natural or elicitor-induced wound healing in potato tubers.
 
E. coli OP50 and S. epidermidis avoidance behavior in N2 and CB4856 isolates. A Bacterial lawn avoidance assay. L4 worms were washed and then placed at the periphery of a bacterial lawn. At 1-hr and 24-hr timepoints the percentage of worms occupying the lawn was calculated. B-C Percentage of N2 and CB4856 animals on a lawn of E. coli OP50 (B) or S. epidermidis (C). In each experiment, 20 worms were placed on NGM plates and the number of worms on the bacterial lawn was recorded at regular intervals. Data presented as mean and standard deviation of five independent experiments. ***, p < 0.001; NS, p > 0.05
Expression differences in N2 and CB4856 animals after infection with pathogenic bacteria. A Intersection of differentially expressed genes and B correlation between log2 fold changes in N2 and CB4856 animals infected with pathogenic bacteria. Spearman's rho value and P-values are presented in the upper left corner. Gray (Non-DE) and black dots (DE Genes) represent expressed and differentially expressed genes, respectively
Expression differences in N2 and CB4856 animals infected with the gram-positive pathogens, S. epidermidis or S. aureus. A Intersection of differentially expressed genes (top) and correlation between log2 fold changes (bottom) in N2 and CB4856 animals infected with either S. epidermidis or B S. aureus. Spearman's rho value and P-values are presented in the upper left corner. Gray (Non-DE) and black dots (DE Genes) represent expressed and differentially expressed genes, respectively. C The distribution of up- and down-regulated genes that are differentially expressed in both N2 and CB4856 animals infected with either S. aureus or S. epidermidis
Differentially expressed genes (DEGs) in CB4856 animals following 24-hour exposure to microbial pathogens
Article
The soil-dwelling nematode Caenorhabditis elegans serves as a model system to study innate immunity against microbial pathogens. C. elegans have been collected from around the world, where they, presumably, adapted to regional microbial ecologies. Here we use survival assays and RNA-sequencing to better understand how two isolates from disparate climates respond to pathogenic bacteria. We found that, relative to N2 (originally isolated in Bristol, UK), CB4856 (isolated in Hawaii) was more susceptible to the Gram-positive microbe Staphylococcus epidermidis, but equally susceptible to Staphylococcus aureus as well as two Gram-negative microbes, Providencia rettgeri and Pseudomonas aeruginosa. We performed transcriptome analysis of infected worms and found gene-expression profiles were considerably different in an isolate-specific and microbe-specific manner. We performed GO term analysis to categorize differential gene expression in response to S. epidermidis. In N2, genes that encoded detoxification enzymes and extracellular matrix proteins were significantly enriched, while in CB4856, genes that encoded detoxification enzymes, C-type lectins, and lipid metabolism proteins were enriched, suggesting they have different responses to S. epidermidis, despite being the same species. Overall, discerning gene expression signatures in an isolate by pathogen manner can help us to understand the different possibilities for the evolution of immune responses within organisms.
 
Article
Background Prostate cancer (PCa) and breast cancer (BC) are, respectively, the second most commonly diagnosed cancer in men and the most commonly diagnosed cancer in women, and they account for a majority of cancer-related deaths world-wide. Cancer cells typically exhibit much-facilitated growth that necessitates upregulated glycolysis and augmented amino acid metabolism, that of glutamine and aspartate in particular, which is tightly coupled with an increased flux of the tricarboxylic acid (TCA) cycle. Epidemiological studies have exploited metabolomics to explore the etiology and found potentially effective biomarkers for early detection or progression of prostate and breast cancers. However, large randomized controlled trials (RCTs) to establish causal associations between amino acid metabolism and prostate and breast cancers have not been reported. Objective Utilizing two-sample Mendelian randomization (MR), we aimed to estimate how genetically predicted glutamate and aspartate levels could affect the development of prostate and breast cancers. Methods Single nucleotide polymorphisms (SNPs) associated with the serum levels of glutamate and aspartate were extracted as instrumental variables (IVs) from publicly available genome-wide association studies (GWASs), which were conducted to associate genetic variations with blood metabolite levels using comprehensive metabolite profiling in 1,960 adults; glutamate and aspartate were two of the 644 metabolites profiled. The summary statistics for the largest and latest GWAS datasets for prostate cancer (61,106 controls and 79,148 cases) were from the Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome (PRACTICAL) consortium, and datasets for breast cancer (113,789 controls and 133,384 cases) were from the Breast Cancer Association Consortium (BCAC). The study was performed using the two-sample MR method.
Results Causal estimates were expressed as odds ratios (OR) and 95% confidence interval (CI) per standard deviation increment in serum level of aspartate or glutamate. Aspartate was positively associated with prostate cancer (Effect = 1.043; 95% confidence interval, 1.003 to 1.084; P = 0.034) and breast cancer (Effect = 1.033; 95% confidence interval, 1.004 to 1.063; P = 0.028); however, glutamate was neither associated with prostate cancer nor with breast cancer. The potential causal associations were robust to the sensitivity analysis. Conclusions Our study found that the level of serum aspartate could serve as a risk factor that contributed to the development of prostate and breast cancers. Efforts on a detailed description of the underlying biochemical mechanisms would be extremely valuable in early assessment and/or diagnosis, and strategizing clinical intervention, of both cancers.
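As a quick consistency check on the estimates reported above, the standard error of a log odds ratio can be recovered from its 95% confidence interval, and a two-sided Wald p-value derived from it. The sketch below assumes a normal approximation on the log scale; the helper name `wald_p_from_ci` is hypothetical and is not part of the study's analysis pipeline.

```python
import math

def wald_p_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Two-sided Wald p-value implied by an odds ratio and its 95% CI.

    Assumes the CI is symmetric on the log scale (normal approximation).
    """
    log_or = math.log(odds_ratio)
    # The CI width on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Values reported in the abstract for aspartate.
p_prostate = wald_p_from_ci(1.043, 1.003, 1.084)
p_breast = wald_p_from_ci(1.033, 1.004, 1.063)
print(f"prostate: p = {p_prostate:.3f}, breast: p = {p_breast:.3f}")
```

Both implied p-values land within a few thousandths of the reported 0.034 and 0.028; small differences are expected because the published odds ratios and interval bounds are rounded to three decimals.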
 
Article
Background Structural variants (SV) are causative for some prominent phenotypic traits of livestock, such as different comb types in chickens or color patterns in pigs. Their effects on production traits are also increasingly studied. Nevertheless, accurately calling SV remains challenging. It is therefore of interest whether close-by single nucleotide polymorphisms (SNPs) are in strong linkage disequilibrium (LD) with SVs and can serve as markers. The literature comes to different conclusions on whether SVs are in LD with SNPs at the same level as SNPs with other SNPs. The present study aimed to generate a precise SV callset from whole-genome short-read sequencing (WGS) data for three commercial chicken populations and to evaluate LD patterns between the called SVs and surrounding SNPs. It is thereby the first study that assessed LD between SVs and SNPs in chickens. Results The final callset consisted of 12,294,329 bivariate SNPs, 4,301 deletions (DEL), 224 duplications (DUP), 218 inversions (INV) and 117 translocation breakpoints (BND). While average LD between DELs and SNPs was at the same level as between SNPs and SNPs, LD between other SVs and SNPs was strongly reduced (DUP: 40%, INV: 27%, BND: 19% of between-SNP LD). A main factor for the reduced LD was the presence of local minor allele frequency differences, which accounted for 50% of the difference between SNP–SNP and DUP–SNP LD. This was potentially accompanied by lower genotyping accuracies for DUP, INV and BND compared with SNPs and DELs. An evaluation of the presence of tag SNPs (SNP in highest LD to the variant of interest) further revealed DELs to be slightly less tagged by WGS SNPs than WGS SNPs by other SNPs. This difference, however, was no longer present when reducing the pool of potential tag SNPs to SNPs located on four different chicken genotyping arrays.
Conclusions The results implied that genomic variance due to DELs in the chicken populations studied can be captured by different SNP marker sets as well as variance from WGS SNPs, whereas separate SV calling might be advisable for DUP, INV, and BND effects.
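LD between a structural variant and a nearby SNP is often summarised as the squared correlation (r²) between genotype dosages. A minimal sketch of that calculation follows. It assumes unphased genotypes coded as 0, 1 or 2 copies of the alternative allele; the function name and the toy genotypes are illustrative and are not taken from the study.

```python
def genotype_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    if len(dosage_a) != len(dosage_b) or not dosage_a:
        raise ValueError("dosage vectors must be non-empty and of equal length")
    n = len(dosage_a)
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # r2 is undefined for a monomorphic variant
    return cov * cov / (var_a * var_b)

# Toy data: a deletion and two SNPs genotyped in six animals (hypothetical).
deletion = [0, 1, 2, 0, 1, 2]
snp_tagging = [0, 1, 2, 0, 1, 2]  # identical dosages, so r2 is 1
snp_weak = [0, 1, 0, 1, 2, 2]

print(genotype_r2(deletion, snp_tagging))
print(genotype_r2(deletion, snp_weak))
```

Under this definition, a tag SNP for a given variant is simply the SNP with the largest r² among the candidates considered.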
 
Schematic diagrams of IHHNV-EVE clusters in the draft Penaeus monodon WGS. A and B Sequence diagrams for Cluster 1 and 2, respectively, in PC35 with high homology to the non-infectious IHHNV-A query portion (1–3025 bp) of the GenBank record DQ228358. Some of the EVE sequences in the 2 clusters correspond to the same region of DQ228358 (i.e., the same color) but may differ in length and in reading direction indicated by arrowheads. Others are unique to each cluster. A zoom-in expansion of each EVE cluster is shown beneath. Colored numbers below indicate the nucleotide positions corresponding to GenBank record DQ228358. The portion of the record related to the host transposable-element portion of DQ228358 is indicated by a dark brown arrow. C Diagram of the IHHNV-EVE cluster in PC7 (GenBank accession no. JABERT010000007.1) with EVE showing high sequence identity (99%) to GenBank record AF218266 for infectious IHHNV. The numbers below the arrows represent the matching positions AF218266
Agarose gel showing that no PCR amplicon was obtained using archived P. monodon DNA from the Thai genome project as the template with primer set 98F/3762R designed for detection of a 3665 base portion (94%) of the IHHNV genome. N = negative control (without template); Pm = P. monodon DNA from the Thai genome project as a template; P = IHHNV genomic DNA as a positive control. The arrow indicates the 3665 bp-PCR amplicon from P
Agarose gel showing PCR amplicons obtained using archived P. monodon DNA from the Thai genome project as the template with the 309F/R primers recommended by OIE for detection of infectious IHHNV. Pm = P. monodon DNA from the Thai genome project as a template; P = IHHNV genomic DNA as a positive control. The arrow indicates the 309 bp-PCR amplicons from Pm and P
Diagram of the location of two PCR targets (R1 of 1,000 bp and R2 of 1,100 bp) within the IHHNV-EVE cluster in PC7. R1 contains two discontinuous fragments of the IHHNV genome in opposite reading directions while R2 contains two discontinuous fragments in the same reading direction. The accompanying agarose gels show amplicons of the predicted sizes (arrows). Details and sequence alignments are shown in Supplementary Fig. S5
Article
  • Suparat Taengchaiyaphum
  • Prapatsorn Wongkhaluang
  • Kanchana Sittikankaew
  • [...]
  • Kallaya Sritunyalucksana
Background Shrimp have the ability to accommodate viruses in long term, persistent infections without signs of disease. Endogenous viral elements (EVE) play a role in this process probably via production of negative-sense Piwi-interacting RNA (piRNA)-like fragments. These bind with Piwi proteins to dampen viral replication via the RNA interference (RNAi) pathway. We searched a genome sequence (GenBank record JABERT000000000) of the giant tiger shrimp (Penaeus monodon) for the presence of EVE related to a shrimp parvovirus originally named infectious hypodermal and hematopoietic necrosis virus (IHHNV). Results The shrimp genome sequence contained three piRNA-like gene clusters containing scrambled IHHNV EVE. Two clusters were located distant from one another in pseudochromosome 35 (PC35). Both PC35 clusters contained multiple sequences with high homology (99%) to GenBank records DQ228358 and EU675312 that were both called “non-infectious IHHNV Type A” (IHHNV-A) when originally discovered. However, our results and those from a recent Australian P. monodon genome assembly indicate that the relevant GenBank records for IHHNV-A are sequence-assembly artifacts derived from scrambled and fragmental IHHNV-EVE. Although the EVE in the two PC35 clusters showed high homology only to IHHNV-A, the clusters were separate and distinct with respect to the arrangement (i.e., order and reading direction) and proportional content of the IHHNV-A GenBank records. We conjecture that these 2 clusters may constitute independent allele-like clusters on a pair of homologous chromosomes. The third EVE cluster was found in pseudochromosome 7 (PC7). It contained EVE with high homology (99%) only to GenBank record AF218266 with the potential to protect shrimp against current types of infectious IHHNV. One disadvantage was that some EVE in PC7 can give false positive PCR test results for infectious IHHNV.
Conclusions Our results suggested the possibility of viral-type specificity in EVE clusters. Specificity is important because whole EVE clusters for one viral type would be transmitted to offspring as collective hereditary units. This would be advantageous if one or more of the EVE within the cluster were protective against the disease caused by the cognate virus. It would also facilitate gene editing for removal of non-protective EVE clusters or for transfer of protective EVE clusters to genetically improve existing shrimp breeding stocks that might lack them.
 
Article
Background Past selection events left footprints in the genome of domestic animals, which can be traced back by stretches of homozygous genotypes, designated as runs of homozygosity (ROHs). The analysis of common ROH regions within groups or populations displaying potential signatures of selection requires high-quality SNP data as well as carefully adjusted ROH-defining parameters. In this study, we used a simultaneous testing of rule- and model-based approaches to perform strategic ROH calling in genomic data from different pig populations to detect genomic regions under selection for specific phenotypes. Results Our ROH analysis using a rule-based approach offered by PLINK, as well as a model-based approach run by RZooRoH demonstrated a high efficiency of both methods. It underlined the importance of providing a high-quality SNP set as input as well as adjusting parameters based on dataset and population for ROH calling. Particularly, ROHs ≤ 20 kb were called in a high frequency by both tools, but to some extent covered different gene sets in subsequent analysis of ROH regions common for investigated pig groups. Phenotype associated ROH analysis resulted in regions under potential selection characterizing heritage pig breeds, known to harbour a long-established breeding history. In particular, the selection focus on fitness-related traits was underlined by various ROHs harbouring disease resistance or tolerance-associated genes. Moreover, we identified potential selection signatures associated with ear morphology, which confirmed known candidate genes as well as uncovered a missense mutation in the ABCA6 gene potentially supporting ear cartilage formation. Conclusions The results of this study highlight the strengths and unique features of rule- and model-based approaches as well as demonstrate their potential for ROH analysis in animal populations. 
We provide a workflow for ROH detection, evaluating the major steps from filtering for high-quality SNP sets to intersecting ROH regions. Formula-based estimations defining ROHs for rule-based method show its limits, particularly for efficient detection of smaller ROHs. Moreover, we emphasize the role of ROH detection for the identification of potential footprints of selection in pigs, displaying their breed-specific characteristics or favourable phenotypes.
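The rule-based idea can be illustrated with a deliberately simplified sketch: scan ordered SNPs and report stretches of consecutive homozygous genotypes that exceed a minimum SNP count and physical length. This is not PLINK's sliding-window procedure and not RZooRoH's model; the function name, the thresholds and the toy data are assumptions chosen purely for illustration.

```python
def find_roh(positions, is_homozygous, min_snps=3, min_length_bp=20_000):
    """Return (start_bp, end_bp, n_snps) for runs of homozygous SNPs.

    positions      -- SNP coordinates on one chromosome, sorted ascending
    is_homozygous  -- booleans, one per SNP, True if the genotype is homozygous
    """
    runs = []
    run_start = None  # index of the first SNP in the current run
    # A trailing False acts as a sentinel that closes a run ending at the last SNP.
    for i, hom in enumerate(list(is_homozygous) + [False]):
        if hom and run_start is None:
            run_start = i
        elif not hom and run_start is not None:
            n_snps = i - run_start
            length = positions[i - 1] - positions[run_start]
            if n_snps >= min_snps and length >= min_length_bp:
                runs.append((positions[run_start], positions[i - 1], n_snps))
            run_start = None
    return runs

# Hypothetical SNP positions and homozygosity calls for one animal.
positions = [1_000, 9_000, 18_000, 30_000, 41_000, 52_000, 60_000, 75_000]
homozygous = [True, True, True, True, False, True, True, False]
print(find_roh(positions, homozygous))
```

In this toy example only the first stretch (four SNPs spanning 29 kb) passes both thresholds; the second stretch has just two SNPs and is discarded, which shows how strongly the chosen parameters shape the resulting ROH set.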
 
Article
Background Type 2C protein phosphatase (PP2C) is a negative regulator of the ABA signaling pathway, which plays important roles in stress signal transduction in plants. However, little research has been conducted on the PP2C gene family of cucumber (Cucumis sativus L.), an important economic vegetable. Results This study conducted a genome-wide investigation of the CsPP2C gene family. Through bioinformatics analysis, 56 CsPP2C genes were identified in cucumber. Based on phylogenetic analysis, the PP2C genes of cucumber and Arabidopsis were divided into 13 groups. Gene structure and conserved motif analysis showed that CsPP2C genes in the same group had similar gene structure and conserved domains. Collinearity analysis showed that segmental duplication events played a key role in the expansion of the cucumber PP2C gene family. In addition, the expression of CsPP2Cs under different abiotic treatments was analyzed by qRT-PCR. The results reveal that CsPP2C family genes showed different expression patterns under ABA, drought, salt, and cold treatment, and that CsPP2C3, 11–17, 23, 45, 54 and 55 responded significantly to the four stresses. By predicting the cis-elements in the promoter, we found that all CsPP2C members contained ABA response elements and drought response elements. Additionally, the expression patterns of CsPP2C genes were specific in different tissues. Conclusions The results of this study provide a reference for the genome-wide identification of the PP2C gene family in other species and provide a basis for future studies on the function of PP2C genes in cucumber.
 
Simulation results of renin secretion network under fixed effecting nodes or edges. Type I error and power of both NeRiT and PMNT with data simulated based on renin secretion network under fixed effecting nodes or edges and four different between-node correlation patterns using DPR as the imputation model in TWAS. The red dotted line represents the significance level (α = 0.05). A Only node has effect; (B) Only edge has effect; the results for effecting node (C) or for effecting edge (D) when both node and edge change with changing node hanging on the edge; the results for effecting node (E) or for effecting edge (F) when both node and edge change with changing node not hanging on the edge
Simulation results of lipid and atherosclerosis network under fixed effecting nodes or edges. Type I error and power of both NeRiT and PMNT with data simulated based on lipid and atherosclerosis network under random effecting nodes or edges and four different between-node correlation patterns using DPR as the imputation model in TWAS. The red dotted line represents the significance level (α = 0.05). A Only node has effect; (B) Only edge has effect; the results for effecting node (C) or for effecting edge (D) when both node and edge change with changing node hanging on the edge; the results for effecting node (E) or for effecting edge (F) when both node and edge change with changing node not hanging on the edge
The network structure based on the renin secretion pathway from KEGG
The network structure based on the lipid and atherosclerosis pathway from KEGG
The network structure based on the aldosterone-regulated sodium reabsorption pathway from KEGG
Article
Background Transcriptome-wide association studies (TWASs) have shown great promise in interpreting the findings from genome-wide association studies (GWASs) and exploring disease mechanisms by integrating GWAS and eQTL mapping studies. Almost all TWAS methods focus on one gene at a time; the only two published multiple-gene methods nevertheless fail to account for the inter-dependence and the network structure among multiple genes, which may lead to power loss in TWAS analysis because complex diseases are often due to multiple genes that interact with each other as a biological network. We therefore developed a Network Regression method in a two-stage TWAS framework (NeRiT) to detect whether a given network is associated with the traits of interest. NeRiT adopts the flexible Bayesian Dirichlet process regression to obtain the gene expression prediction weights in the first stage, uses pointwise mutual information to represent the general between-node correlation in the second stage and can effectively take the network structure among different gene nodes into account. Results Comprehensive and realistic simulations indicated that NeRiT had calibrated type I error control for testing both the node effect and edge effect, and yielded higher power than existing methods, especially in testing the edge effect. The results were consistent regardless of the GWAS sample size, the gene expression prediction model in the first step of TWAS, the network structure and the correlation pattern among different gene nodes. Real data applications analyzing systolic blood pressure and diastolic blood pressure from UK Biobank showed that NeRiT can simultaneously identify the trait-related nodes as well as the trait-related edges. Conclusions NeRiT is a powerful and efficient network regression method in TWAS.
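For orientation, pointwise mutual information compares how often two events co-occur with what independence would predict: PMI(x, y) = log[p(x, y) / (p(x) p(y))]. The sketch below applies this textbook definition to binarised expression calls for two genes. How NeRiT itself estimates PMI from predicted expression is not described in the abstract, so the binarisation, the function name and the toy data are all assumptions.

```python
import math

def pmi(x, y, x_val=1, y_val=1):
    """Pointwise mutual information (natural log) of the events x == x_val and y == y_val."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    p_x = sum(1 for a in x if a == x_val) / n
    p_y = sum(1 for b in y if b == y_val) / n
    p_xy = sum(1 for a, b in zip(x, y) if a == x_val and b == y_val) / n
    if p_xy == 0:
        return float("-inf")  # the two events never co-occur
    return math.log(p_xy / (p_x * p_y))

# Hypothetical high (1) / low (0) expression calls for two genes in eight samples.
gene_a = [1, 1, 1, 1, 0, 0, 0, 0]
gene_b = [1, 1, 1, 0, 0, 0, 0, 1]
print(round(pmi(gene_a, gene_b), 4))
```

A positive value means the two genes are jointly high more often than independence would predict, a value of zero corresponds to independence, and negative values indicate that joint high expression is rarer than expected.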
 
Article
Background Cyclic nucleotide-gated ion channels (CNGCs) are calcium-permeable channels that participate in a variety of biological functions, such as signaling pathways, plant development, and environmental stress and stimulus responses. Nevertheless, there have been few studies on the CNGC gene family in cotton. Results In this study, a total of 114 CNGC genes were identified from the genomes of 4 cotton species. These genes clustered into 5 main groups: I, II, III, IVa, and IVb. Gene structure and protein motif analysis showed that CNGCs on the same branch were highly conserved. In addition, collinearity analysis showed that the CNGC gene family had expanded mainly by whole-genome duplication (WGD). Promoter analysis of the GhCNGCs showed that there were a large number of cis-acting elements related to abscisic acid (ABA). Combination of transcriptome data and the results of quantitative RT–PCR (qRT–PCR) analysis revealed that some GhCNGC genes were induced in response to salt and drought stress and to exogenous ABA. Virus-induced gene silencing (VIGS) experiments showed that the silencing of the GhCNGC32 and GhCNGC35 genes decreased the salt tolerance of cotton plants (TRV:00). Specifically, physiological indexes showed that the malondialdehyde (MDA) content in gene-silenced plants (TRV:GhCNGC32 and TRV:GhCNGC35) increased significantly under salt stress but that the peroxidase (POD) activity decreased. After salt stress, the expression level of ABA-related genes increased significantly, indicating that salt stress can trigger the ABA signal regulatory mechanism. Conclusions We comprehensively analyzed CNGC genes in four cotton species, and found that GhCNGC32 and GhCNGC35 genes play an important role in cotton salt tolerance. These results laid a foundation for the subsequent study of the involvement of cotton CNGC genes in salt tolerance.
 
Phylogenetic tree showing association between Mycobacterium tuberculosis lineages and drug resistance
Article
Background Mycobacterium tuberculosis presents several lineages, each with distinct characteristics of evolutionary status, transmissibility, drug resistance, host interaction, latency, and vaccine efficacy. Whole genome sequencing (WGS) has emerged as a new diagnostic tool to reliably inform the occurrence of phylogenetic lineages of Mycobacterium tuberculosis and examine their relationship with patient demographic characteristics and multidrug-resistance development. Methods 191 Mycobacterium tuberculosis isolates obtained from a 2017/2018 Tanzanian drug resistance survey were sequenced on the Illumina MiSeq platform at the Supranational Tuberculosis Reference Laboratory in Uganda. The resulting FASTQ files were imported into tools for resistance profiling and lineage inference (Kvarq v0.12.2, Mykrobe v0.8.1 and TBprofiler v3.0.5). Additionally, for phylogenetic tree construction, RAxML-NG v1.0.3 (25) was used to generate a maximum likelihood phylogeny with 800 bootstrap replicates. The resulting trees were plotted, annotated and visualized using ggtree v2.0.4. Results Most isolates [172 (90.0%)] were from newly treated pulmonary TB patients. Coinfection with HIV was observed in 33 (17.3%) TB patients. Of the 191 isolates, 22 (11.5%) were resistant to one or more commonly used first line anti-TB drugs (FLD), 9 (4.7%) isolates were MDR-TB while 3 (1.6%) were resistant to all the drugs. Of the 24 isolates with any resistance conferring mutations, 13 (54.2%) and 10 (41.6%) had mutations in genes associated with resistance to INH and RIF respectively. The findings also show four major lineages circulating in Tanzania: Lineage 3 [81 (42.4%)], followed by Lineage 4 [74 (38.7%)], Lineage 1 [23 (12.0%)] and Lineage 2 [13 (6.8%)]. Conclusion The findings in this study show that Lineage 3 is the most prevalent lineage in Tanzania whereas drug resistant mutations were more frequent among isolates that belonged to Lineage 4.
 
Article
Background Plants synthesize metabolites to adapt to a continuously changing environment. Metabolite biosynthesis often occurs in response to the tissue-specific combinatorial developmental cues that are transcriptionally regulated. Polyphyllins are the major bioactive components in Paris species that demonstrate hemostatic, anti-inflammatory and antitumor effects and have considerable market demands. However, the mechanisms underlying polyphyllin biosynthesis and regulation during plant development have not been fully elucidated. Results Tissue samples of P. polyphylla var. yunnanensis during the four dominant developmental stages were collected and investigated using high-performance liquid chromatography and RNA sequencing. Polyphyllin concentrations in the different tissues were found to be highly dynamic across developmental stages. Specifically, decreasing trends in polyphyllin concentration were observed in the aerial vegetative tissues, whereas an increasing trend was observed in the rhizomes. Consistent with the aforementioned polyphyllin concentration trends, different patterns of spatiotemporal gene expression in the vegetative tissues were found to be closely related with polyphyllin biosynthesis. Additionally, molecular dissection of the pathway components revealed 137 candidate genes involved in the upstream pathway of polyphyllin backbone biosynthesis. Furthermore, gene co-expression network analysis revealed 74 transcription factor genes and one transporter gene associated with polyphyllin biosynthesis and allocation. Conclusions Our findings outline the framework for understanding the biosynthesis and accumulation of polyphyllins during plant development and contribute to future research in elucidating the molecular mechanism underlying polyphyllin regulation and accumulation in P. polyphylla .
 
Article
Background Advancements in genomic sequencing continually improve personalized medicine, and recent breakthroughs generate multimodal data on a cellular level. We introduce MOSCATO, a technique for selecting features across multimodal single-cell datasets that relate to clinical outcomes. We summarize the single-cell data using tensors and perform regularized tensor regression to return clinically associated variable sets for each ‘omic’ type. Results Robustness was assessed over simulations based on available single-cell simulation methods, and applicability was assessed through an example using CITE-seq data to detect genes associated with leukemia. We find that MOSCATO performs favorably in selecting network features and is applicable to real multimodal single-cell data. Conclusions MOSCATO is a useful analytical technique for supervised feature selection in multimodal single-cell data. The flexibility of our approach enables future extensions on distributional assumptions and covariate adjustments.
 
Location of all CpGs with 5x coverage. Percentage of highly methylated (≥ 50%), moderately methylated (10–50%), and lowly methylated (≤ 10%) CpGs in various genome features
Percent methylation values for DML created using a Euclidean distance matrix. Samples in low pH conditions are represented by black, and samples in ambient pH conditions are represented by gray, with maturation stage along the bottom (0 = indeterminate, 3 = spawn-ready mature female). Darker colors indicate higher percent methylation, and a density plot depicts the distribution of percent methylation values for a panel. After excluding C->T SNPs, 1284 DML were identified using logistic regression with a chi-squared test and a 50% methylation difference cut-off
Distribution of DML in main chromosomes. Number of DML normalized by number of CpG in each chromosome (bars) and number of genes (line) in each chromosome. Additional DML were identified in scaffolds that were not mapped to any of the ten main linkage groups (Supplementary Table 3)
Location of CpGs with 5x coverage and DML. Percentage of 5x CpGs and DML found in various genome features
Article
Background There is a need to investigate mechanisms of phenotypic plasticity in marine invertebrates as negative effects of climate change, like ocean acidification, are experienced by coastal ecosystems. Environmentally induced changes to the methylome may regulate gene expression, but methylome responses can be species- and tissue-specific. Tissue-specificity has implications for gonad tissue, as gonad-specific methylation patterns may be inherited by offspring. We used the Pacific oyster (Crassostrea gigas), a model for understanding pH impacts on bivalve molecular physiology due to its genomic resources and importance in global aquaculture, to assess how low pH could impact the gonad methylome. Oysters were exposed to either low pH (7.31 ± 0.02) or ambient pH (7.82 ± 0.02) conditions for 7 weeks. Whole genome bisulfite sequencing was used to identify methylated regions in female oyster gonad samples. C->T single nucleotide polymorphisms were identified and removed to ensure accurate methylation characterization. Results Analysis of gonad methylomes revealed a total of 1284 differentially methylated loci (DML) found primarily in genes, with several genes containing multiple DML. Gene ontology terms for genes containing DML were related to development and stress response, suggesting methylation may promote gonad growth homeostasis in low pH conditions. Additionally, several of these genes were associated with cytoskeletal structure regulation, metabolism, and protein ubiquitination, all commonly observed responses to ocean acidification. Comparison of these DML with those of other Crassostrea spp. exposed to ocean acidification demonstrates that similar pathways, but not identical genes, are impacted by methylation. Conclusions Our work suggests DNA methylation may have a regulatory role in gonad and larval development, which would shape adult and offspring responses to low pH stress. 
Combined with existing molluscan methylome research, our work further supports the need for tissue- and species-specific studies to understand the potential regulatory role of DNA methylation.
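The thresholds quoted in the figure captions for this study (5x coverage, methylation classes at 10% and 50%, and a 50% methylation difference cut-off for DML) can be illustrated with a minimal Python sketch. It covers only the threshold arithmetic. The logistic regression and chi-squared test are not reproduced, the function names are hypothetical, and treating the 50% difference cut-off as inclusive is an assumption.

```python
def percent_methylation(methylated_reads, total_reads, min_coverage=5):
    """Percent methylation at one CpG, or None if coverage is below 5x."""
    if total_reads < min_coverage:
        return None
    return 100.0 * methylated_reads / total_reads


def methylation_class(percent):
    """Classify a CpG using the caption's bins: >= 50% high, <= 10% low."""
    if percent >= 50:
        return "high"
    if percent <= 10:
        return "low"
    return "moderate"


def passes_difference_cutoff(pct_low_ph, pct_ambient, cutoff=50.0):
    """True if the absolute methylation difference reaches the cut-off.

    Assumption: the cut-off is inclusive; the source does not say.
    """
    return abs(pct_low_ph - pct_ambient) >= cutoff


# Hypothetical read counts for a single CpG in two treatments.
low_ph = percent_methylation(methylated_reads=18, total_reads=20)
ambient = percent_methylation(methylated_reads=4, total_reads=20)
print(low_ph, methylation_class(low_ph))
print(ambient, methylation_class(ambient))
print(passes_difference_cutoff(low_ph, ambient))
```

A locus passing the difference cut-off would still need to be significant in the statistical test before being called a DML.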
 
Article
Background Protein-protein interaction (PPI) is very important for many biochemical processes. Therefore, accurate prediction of PPI can help us better understand the role of proteins in biochemical processes. Although there are many experimental methods to predict PPI in biology, they are time-consuming and lack accuracy, so it is necessary to build an efficient and accurate computational model for PPI prediction. Results We present a novel sequence-based computational approach called DCSE (Double-Channel-Siamese-Ensemble) to predict potential PPI. In the encoding layer, we treat each amino acid as a word and map it into an N-dimensional vector. In the feature extraction layer, we extract features from local and global perspectives using a Multilayer Convolutional Neural Network (MCN) and a Multilayer Bidirectional Gated Recurrent Unit with Convolutional Neural Networks (MBC). Finally, the output of the feature extraction layer is fed into the prediction layer, which outputs whether the input protein pair will interact with each other. The MCN and MBC are siamese- and ensemble-based networks, which can effectively improve the performance of the model. To demonstrate our model's performance, we compare it with four machine learning based and three deep learning based models. The results show that our method outperforms the other models on all evaluation criteria except Precision. The Accuracy, Precision, $F_{1}$, Recall and MCC of our model are 0.9303, 0.9091, 0.9268, 0.9452 and 0.8609, respectively. For the other seven models, the highest Accuracy, Precision, $F_{1}$, Recall and MCC are 0.9288, 0.9243, 0.9246, 0.9250 and 0.8572, respectively. We also test our model on an imbalanced dataset and transfer our model to another species. The results show that our model performs well in both settings. Conclusion Our model achieves the best overall performance in a comparison with seven other models. The NLP-based coding method has a good effect on the PPI prediction task. 
MCN and MBC extract protein sequence features from local and global perspectives and these two feature extraction layers are based on siamese and ensemble network structures. Siamese-based network structure can keep the features consistent and ensemble based network structure can effectively improve the accuracy of the model.
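The evaluation criteria named in this abstract have standard definitions, sketched below in Python. The confusion-matrix counts are hypothetical and are not taken from the study. Only the final check uses the reported Precision (0.9091) and Recall (0.9452), and it recovers the reported $F_{1}$ of 0.9268.

```python
import math


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
print(metrics)

# Consistency check using the Precision and Recall reported in the abstract.
reported_f1 = round(f1_from_precision_recall(0.9091, 0.9452), 4)
print(reported_f1)
```

Because MCC uses all four cells of the confusion matrix, it is the most informative of these metrics when classes are imbalanced, as in the imbalanced-dataset test the abstract mentions.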
 
Article
Background RNA preparations contaminated with genomic DNA (gDNA) are frequently disregarded by RNA-seq studies. Such contamination may generate false results; however, their effect on the outcomes of RNA-seq analyses is unknown. To address this gap in our knowledge, here we added different concentrations of gDNA to total RNA preparations and subjected them to RNA-seq analysis. Results We found that the contaminating gDNA altered the quantification of transcripts at relatively high concentrations. Differentially expressed genes (DEGs) resulting from gDNA contamination may therefore contribute to higher rates of false enrichment of pathways compared with analogous samples lacking numerous DEGs. A strategy was developed to correct gene expression levels in gDNA-contaminated RNA samples, which assessed the magnitude of contamination to improve the reliability of the results. Conclusions Our study indicates that caution must be exercised when interpreting results associated with low-abundance transcripts. The data provided here will likely serve as a valuable resource to evaluate the influence of gDNA contamination on RNA-seq analysis, particularly related to the detection of putative novel gene elements.
 
Article
Tetrodotoxin (TTX) is a deadly neurotoxin that usually accumulates in large amounts in the ovaries of pufferfish, whereas the testes are non-toxic or only weakly toxic. The molecular mechanism underlying the sexually dimorphic accumulation of TTX in the ovary and testis, and the relationship between TTX accumulation and the expression of sex-related genes, remain largely unknown. The present study investigated the effects of exogenous TTX treatment on Takifugu flavidus. Exogenous TTX administration significantly increased the TTX concentration in the kidney, cholecyst, skin, liver, heart, muscle, ovary and testis of the treatment group (TG) compared with the control group (CG). Transcriptome sequencing and analysis were performed to study the differential expression profiles of mRNA and piRNA in the ovary and testis after TTX administration. Compared with the female control group (FCG), the female TTX-treated group (FTG) showed 80 up-regulated and 23 down-regulated piRNAs and 126 up-regulated and 223 down-regulated genes; compared with the male control group (MCG), the male TTX-treated group (MTG) showed 286 up-regulated and 223 down-regulated piRNAs and 2 up-regulated and 443 down-regulated genes. The female-dominant genes cyp19a1, gdf9 and foxl2 were found to be up-regulated in the MTG. The piRNA uniq_554482, which targets cyp19a1, was down-regulated in the MTG, indicating feminization of gene expression in the testis after exogenous TTX administration. KEGG enrichment analysis revealed that differentially expressed genes (DEGs) and piRNAs (DEpiRNAs) in the MTG vs MCG comparison were more enriched in metabolic pathways, indicating that the testis engaged more metabolic pathways in response to exogenous TTX, which might be a reason for the sexual dimorphism of TTX distribution in gonads. 
In addition, TdT-mediated dUTP-biotin nick end labeling staining detected significant apoptosis in the MTG testis, further confirming the role of apoptotic pathways. Overall, our research revealed that the responses of the ovary and testis to TTX administration differed markedly: the ovary is more tolerant, whereas the testis is more sensitive to TTX. These data will deepen our understanding of the sexually dimorphic accumulation of TTX in Takifugu.
 
Article
Background The expression and biological functions of circular RNAs (circRNAs) in reproductive organs have been extensively reported. However, it is still unclear whether circRNAs are involved in sex change. To this end, RNA sequencing (RNA-seq) was performed in gonads at 5 sexual stages (ovary, early intersexual stage gonad, middle intersexual stage gonad, late intersexual stage gonad, and testis) of ricefield eel, and the expression profiles and potential functions of circRNAs were studied. Results Seven hundred twenty-one circRNAs were identified, and the expression levels of 10 circRNAs were verified by quantitative real-time PCR (qRT–PCR) and found to be in accordance with the RNA-seq data, suggesting that the RNA-seq data were reliable. Then, the sequence length, category, sequence composition and the relationship between the parent genes of the circRNAs were explored. A total of 147 circRNAs were differentially expressed in the sex change process, and GO and KEGG analyses revealed that some differentially expressed circRNAs (such as novel_circ_0000659, novel_circ_0004005 and novel_circ_0005865) were closely involved in sex change. Furthermore, expression pattern analysis demonstrated that both circSnd1 and foxl2 were downregulated in the process of sex change, whereas mal-miR-135b showed the opposite pattern. Finally, dual-luciferase reporter assays and RNA immunoprecipitation showed that circSnd1 and foxl2 can bind mal-miR-135b and mal-miR-135c. These data indicate that circSnd1 regulates foxl2 expression in the sex change of ricefield eel by acting as a sponge of mal-miR-135b/c. Conclusion Our results are the first to demonstrate that circRNAs have potential effects on sex change in ricefield eel, and that circSnd1 could regulate foxl2 expression in the sex change of ricefield eel by acting as a sponge of mal-miR-135b/c. These data will be useful for enhancing our understanding of sequential hermaphroditism and sex change in ricefield eel and other teleosts.
 
Article
Background In cold regions, low temperature is the main limiting factor affecting grape production. As an important breeding resource, V. amurensis Rupr. has played a crucial role in the discovery of genes that confer cold resistance in grapes. Thus far, many cold-resistance genes have been reported based on the study of V. amurensis. To identify more candidate genes related to cold resistance in V. amurensis, QTL mapping and RNA-seq were conducted in this study using a hybrid population and cultivars differing in cold resistance. Results The highly cold-resistant grape cultivar ‘Shuangyou’ (SY), which belongs to V. amurensis, and the cold-sensitive cultivar ‘Red Globe’ (RG), which belongs to Vitis vinifera L., were used to identify cold-resistance genes. Cold-resistance quantitative trait locus (QTL) mapping was performed on a genetic population constructed by interspecific crossing of ‘Shuangyou’ and ‘Red Globe’. Additionally, transcriptome analysis was conducted on the dormant buds of these two cultivars at different periods. Based on transcriptome analysis and QTL mapping, many new structural genes and transcription factors related to V. amurensis cold resistance were discovered, including CORs (VaCOR413IM), GSTs (VaGST-APIC, VaGST-PARB, VaGSTF9 and VaGSTF13), ARFs (VaIAA27 and VaSAUR71), ERFs (VaAIL1), MYBs (VaMYBR2, VaMYBLL and VaMYB3R-1) and bHLHs (VaICE1 and VabHLH30). Conclusions The discovery of these candidate cold-resistance genes will provide an important theoretical reference for research on grape cold-resistance mechanisms and for the future breeding of cold-resistant grape cultivars.
 
Article
Background The Qinba region is the transition region between Indica and Japonica varieties in China. It has a history of Indica rice planting of more than 7000 years and is also a planting area for fine-quality Indica rice. The aims of this study are to compare different genetic markers for the analysis of population structure and genetic diversity, and for the selection and optimization of molecular markers of Indica rice, thus providing more information for the protection and utilization of Indica rice germplasm resources. Methods Fifteen phenotypic traits, a core set of 48 SSR markers taken from the agricultural industry standard of the People's Republic of China (Ministry of Agriculture of the PRC, NY/T1433-2014, Protocol for identification of rice varieties-SSR marker method, 2014), and SNP data obtained by genotyping-by-sequencing (GBS, NlaIII and MseI digestion, referred to as SNPs-NlaIII and SNPs-MseI, respectively) on the Illumina HiSeq2000 sequencing platform were employed to explore the genetic diversity and population structure of a panel of 93 samples. Results The mean coefficient of variation (CV) and diversity index (He) were 29.72% (range 3.07% to 137.43%) and 1.83 (range 1.45 to 2.03), respectively. The correlation coefficients between the 15 phenotypic traits ranged from -0.604 to 0.984. The first four PCs accounted for 70.693% of the phenotypic variation. A total of 379 alleles were obtained using SSR markers, with an average of 8.0 alleles per primer. The percentage of polymorphic bands (PPB) and the polymorphism information content (PIC) were 88.65% and 0.77, respectively. The Mantel test showed that the correlation between the genetic distance matrices based on SNPs-NlaIII and SNPs-MseI was the largest (R²=0.88), and that between the matrices based on the 15 phenotypic traits and SSRs was the smallest (R²=0.09). 
The 93 samples could be clustered into two subgroups by all 3 types of genetic markers. Molecular variance analysis revealed that the genetic variation was 2% among populations and 98% within populations (the Nm was 0.16). Based on 72,824 SNPs, Tajima’s D was 1.66 and the FST between the two populations was 0.61. Conclusions The population genetic variation explained by SNPs was larger than that explained by SSRs. The gene flow of the 93 samples used in this study was larger than that of naturally self-pollinated crops, which may be caused by long-term breeding selection of Indica rice in the Qinba region. The genetic structure of the 93 samples was simple and lacked rare alleles.
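The reported gene flow (Nm = 0.16) and FST (0.61) are numerically consistent with Wright's island-model approximation, Nm ≈ (1 − FST) / (4 FST). The abstract does not state which estimator was used, so the Python sketch below is an assumption about the relationship rather than a description of the authors' method; the function name is illustrative.

```python
def migrants_per_generation(fst):
    """Estimate Nm from FST with Wright's island-model approximation.

    Assumption: Nm = (1 - FST) / (4 * FST); the source does not name
    the estimator it used.
    """
    if not 0 < fst < 1:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1 - fst) / (4 * fst)


nm = migrants_per_generation(0.61)
print(round(nm, 2))
```

Under this approximation, Nm below 1 is conventionally read as limited gene flow between the two subgroups.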
 
Article
Background Transcription factors (TFs) play important roles in plants. Among the major TFs, GATA plays a crucial role in plant development, growth, and stress responses. However, there have been few studies on the GATA gene family in foxtail millet (Setaria italica). The release of the foxtail millet reference genome presents an opportunity for the genome-wide characterization of these GATA genes. Results In this study, we identified 28 GATA genes in foxtail millet distributed on seven chromosomes. According to the classification method of GATA members in Arabidopsis, SiGATA was divided into four subfamilies, namely subfamilies I, II, III, and IV. Structural analysis of the SiGATA genes showed that subfamily III had more introns than other subfamilies, and a large number of cis-acting elements were abundant in the promoter region of the SiGATA genes. Three tandem duplications and five segmental duplications were found among SiGATA genes. Tissue-specific results showed that the SiGATA genes were mainly expressed in foxtail millet leaves, followed by peels and seeds. Many genes were significantly induced under the eight abiotic stresses, such as SiGATA10, SiGATA16, SiGATA18, and SiGATA25, which deserve further attention. Conclusions Collectively, these findings will be helpful for further in-depth studies of the biological function of SiGATA, and will provide a reference for the future molecular breeding of foxtail millet.
 
Article
Background OSCA (hyperosmolality-gated calcium-permeable channel) is a calcium-permeable cation channel protein that plays an important role in regulating plant signal transduction. It is involved in sensing changes in extracellular osmotic potential and increases in Ca²⁺ concentration. S. habrochaites is a good genetic resource for crop improvement against cold, late blight, planthopper and other stresses. To date, there is no report on OSCA in S. habrochaites. Thus, in this study, we performed a genome-wide screen to identify OSCA genes in S. habrochaites and characterized their responses to biotic and abiotic stresses. Results A total of 11 ShOSCA genes distributed on 8 chromosomes were identified. Subcellular localization analysis showed that all ShOSCA members localized to the plasma membrane, and the genes contained multiple stress-related cis-acting elements. We observed that whole-genome duplication (WGD) occurred in the genetic evolution of ShOSCA5 (Solhab04g250600) and ShOSCA11 (Solhab12g051500). In addition, repeat events played an important role in the expansion of the OSCA gene family. Time-course expression analysis of the S. habrochaites OSCA gene family by qRT-PCR indicated that OSCAs responded to biotic stress (Botrytis cinerea) and abiotic stresses (drought, low temperature and abscisic acid (ABA)). The expression of ShOSCAs changed significantly under all four stresses. Plants in which ShOSCA3 was silenced showed reduced resistance to the four stresses. Conclusion This study identified the OSCA gene family of S. habrochaites for the first time and found that ShOSCA3 contributes to resistance to low temperature, ABA and Botrytis cinerea stress. This study provides a theoretical basis for clarifying the biological function of OSCA and lays a foundation for tomato crop improvement.
 
Article
Background Green-fleshed radish (Raphanus sativus L.) is an economically important root vegetable of the Brassicaceae family, and chlorophyll accumulates in its root tissues. It was reported that the basic helix-loop-helix (bHLH) transcription factors play vital roles in the process of chlorophyll metabolism. Nevertheless, a comprehensive study on the bHLH gene family has not been performed in Raphanus sativus L. Results In this study, a total of 213 Raphanus sativus L. bHLH (RsbHLH) genes were screened in the radish genome, which were grouped into 22 subfamilies. 204 RsbHLH genes were unevenly distributed on nine chromosomes, and nine RsbHLH genes were located on the scaffolds. Gene structure analysis showed that 25 RsbHLH genes were intron-less. Collineation analysis revealed the syntenic orthologous bHLH gene pairs between radish and Arabidopsis thaliana/Brassica rapa/Brassica oleracea. 162 RsbHLH genes were duplicated and retained from the whole genome duplication event, indicating that the whole genome duplication contributed to the expansion of the RsbHLH gene family. RNA-seq results revealed that RsbHLH genes had a variety of expression patterns at five development stages of green-fleshed radish and white-fleshed radish. In addition, the weighted gene co-expression network analysis confirmed four RsbHLH genes closely related to chlorophyll content. Conclusions A total of 213 RsbHLH genes were identified, and we systematically analyzed their gene structure, evolutionary and collineation relationships, conserved motifs, gene duplication, cis-regulatory elements and expression patterns. Finally, four bHLH genes closely involved in chlorophyll content were identified, which may be associated with the photosynthesis of the green-fleshed radish. The current study would provide valuable information for further functional exploration of RsbHLH genes, and facilitate clarifying the molecular mechanism underlying photosynthesis process in green-fleshed radish.
 
Overview of study workflow. Gene-wise models are fitted for various outcome variables based on reported information or metabolomic surrogates, respectively
Comparison of association study result characteristics. Number of significant associations (based on bacon-corrected and FDR-corrected p-values, $p_{b_{adj}}<0.05$) (A), mean absolute effect sizes (based on bacon-corrected effect sizes) across all genes (B), and bias (C) and inflation (D) of test statistics (t-statistic) for alternative models per comparison and cohort. Horizontal lines for test-statistic mean = 0 and standard deviation = 1 of the theoretical null distribution were added. Comparisons are ordered by performance of metabolomic predictors for binary outcome measures. Type of outcome variable is indicated by color: reported or measured variable = black, metabolomic surrogate = orange. Mean values across four cohorts (two cohorts for hsCRP) are plotted as horizontal bars. Note the log10 scale on the y-axis of the upper plot
Pairwise comparisons of association study results. Absolute Pearson correlation coefficients (Pearson r) of bacon-adjusted regression coefficients of gene-wise linear models (limma/voom) for outcome variables in alternative models per comparison and cohort. Comparisons are ordered by performance (AUC) of metabolomic predictors for binary outcome measures. Mean values across four cohorts (two cohorts for hsCRP) are plotted as horizontal bars (gray)
Meta-analyses and replication studies. Number of meta-analyzed genes (significant associations, bacon-adjusted p-values FDR-adjusted for multiple testing, p<0.05) in leave-one-cohort meta-analyses (A) and percentage of genes replicated (significant associations, Bonferroni-adjusted for multiple testing, p<0.05) in replication cohort (B). Type of outcome variable is indicated by color: reported or measured variable = black, metabolomic surrogate = orange. Mean values across four meta-analyses/replication studies are plotted as horizontal bars. Note the log10 scale on the y-axis of the upper plot
Gene-set enrichment analyses of association study results. Numbers of significantly enriched (Bonferroni-adjusted p<0.05) pathways (Reactome) for each outcome found in each cohort and in meta-analysis of all four (two for hsCRP) cohorts (top). Values for each type of outcome variable are represented as colored bars: reported variable = black, metabolomic surrogate = orange, intersection, i.e., pathways found by all outcome variables = blue
Article
Population-scale expression profiling studies can provide valuable insights into biological and disease-underlying mechanisms. The availability of phenotypic traits is essential for studying clinical effects. Therefore, missing, incomplete, or inaccurate phenotypic information can make analyses challenging and prevent RNA-seq or other omics data from being reused. A possible solution is to use predictors that infer clinical or behavioral phenotypic traits from molecular data. While such predictors have been developed based on different omics data types and are being applied in various studies, metabolomics-based surrogates are less commonly used than predictors based on DNA methylation profiles. In this study, we inferred 17 traits, including diabetes status and exposure to lipid medication, using previously trained metabolomic predictors. We evaluated whether these metabolomic surrogates can be used as an alternative to reported information for studying the respective phenotypes using expression profiling data of four population cohorts. For the majority of the 17 traits, the metabolomic surrogates performed similarly to the reported phenotypes in terms of effect sizes, number of significant associations, replication rates, and significantly enriched pathways. The application of metabolomics-derived surrogate outcomes opens new possibilities for reuse of multi-omics data sets. In studies where availability of clinical metadata is limited, missing or incomplete information can be complemented by these surrogates, thereby increasing the size of available data sets. Additionally, such surrogates could be used to correct for potential biological confounding. In the future, it would be interesting to further investigate the use of molecular predictors across different omics types and cohorts.
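The figure captions for this study count significant associations after FDR correction at 0.05. As a hedged illustration of that step, the Python sketch below implements the Benjamini-Hochberg procedure. The source does not name the FDR method, so Benjamini-Hochberg is an assumption, and the bacon correction of test statistics is not reproduced. The p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical gene-wise p-values, for illustration only.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
n_significant = sum(p < 0.05 for p in adj_p)
print(adj_p, n_significant)
```

Counting `n_significant` per outcome variable and cohort gives the kind of quantity plotted in panel A of the comparison figure.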
 
Article
Background Scatophagus argus, an estuarine inhabitant, can rapidly adapt to different salinity environments. However, the molecular mechanisms underlying its strong salinity tolerance remain unclear. The gill, as the main osmoregulatory organ, plays a vital role in the salinity adaptation of the fish, and thus related studies can help reveal unique osmoregulatory mechanisms in S. argus. Results In the present study, iTRAQ coupled with nanoLC-MS/MS was employed to explore branchial osmoregulatory mechanisms in S. argus acclimated to different salinities. Among 1,604 identified proteins, 796 differentially expressed proteins (DEPs) were detected. To further assess osmoregulatory strategies in the gills under different salinities, DEPs related to osmoregulatory (22), non-directional (18), hypo- (52), and hypersaline (40) stress responses were selected. Functional annotation analysis of these selected DEPs indicated that cellular ion regulation (e.g. Na⁺-K⁺-ATPase [NKA] and Na⁺-K⁺-2Cl⁻ cotransporter 1 [NKCC1]) and ATP synthesis were deeply involved in the osmoregulatory process. As an osmoregulatory protein, NKCC1 expression was inhibited under hyposaline stress but showed the opposite trend in hypersaline conditions. The expression levels of NKA α1 and β1 were increased only under hypersaline challenge. However, hyposaline treatments enhanced branchial NKA activity, which was inhibited under hypersaline environments; correspondingly, reduced ATP content was observed in gill tissues exposed to hyposaline conditions, while ATP content was increased in hypersaline groups. In vitro experiments indicated that Na⁺, K⁺, and Cl⁻ ions were pumped out of branchial cells under hypoosmotic stress, whereas they were absorbed into cells under hyperosmotic conditions. 
Based on our results, we speculated that NKCC1-mediated Na⁺ influx was inhibited, and proper Na⁺ efflux was maintained by improving NKA activity under hyposaline stress, promoting the rapid adaptation of branchial cells to the hyposaline condition. Meanwhile, branchial cells prevented excessive loss of ions by increasing NKA internalization and reducing ATP synthesis. In contrast, excess ions in cells exposed to the hyperosmotic medium were excreted with sufficient energy supply, and reduced NKA activity and enhanced NKCC1-mediated Na⁺ influx were considered a compensatory regulation. Conclusions S. argus exhibited divergent osmoregulatory strategies in the gills when encountering hypoosmotic and hyperosmotic stresses, facilitating effective adaptation to a wide range of environmental salinity fluctuations.
 
Article
The immune repertoires of mollusks beyond commercially important organisms such as the Pacific oyster Crassostrea gigas or vectors for human pathogens like the bloodfluke planorb Biomphalaria glabrata are understudied. Despite Aplysia californica being an important model for neural aging and the role of inflammation in neuropathic pain, its immune repertoire is poorly understood. The recent discovery of a neurotropic nidovirus in Aplysia has highlighted the need for a better understanding of the Aplysia immunome. To address this gap in the literature, the Aplysia reference genome was mined using InterProScan and OrthoFinder for putative immune genes. The Aplysia genome encodes orthologs of all critical components of the classical Toll-like receptor (TLR) signaling pathway. The presence of many more TLRs and TLR-associated adapters than are known from vertebrates suggests as yet uncharacterized, novel TLR-associated signaling pathways. Aplysia also retains many nucleotide receptors and antiviral effectors known to play a key role in viral defense in vertebrates. However, the absence of the key antiviral signaling adapters MAVS and STING in the Aplysia genome suggests divergence from vertebrates and bivalves in these pathways. The resulting immune gene set of this in silico study provides a basis for interpretation of future immune studies in this important model organism.
 
Differentially expressed genes (DEGs) in shANXA2 and shCtrl HK2 cells. A The level of ANXA2 mRNA expression by qRT-PCR in shCtrl vs shANXA2 cells (n = 3, ***P < 0.001, calculated using Student’s t-test). B Volcano plots indicate gene expression in shCtrl vs shANXA2 cells. Genes with significant differential expression are shown in red (up-regulated) or blue (down-regulated), and genes that were not significantly differentially expressed are shown in grey. The filtering criteria (log2 fold change >1 or <-1 and FDR < 0.05) were set as the threshold to determine the DEGs. C Heatmaps for DEGs of shCtrl vs shANXA2 cells
ANXA2 regulated inflammatory gene mRNA and protein expression in HK2 cells. A Top ten GO biological process terms enriched by upregulated DEGs in shANXA2 cells vs shCtrl cells. B Top ten KEGG functional pathways enriched by upregulated DEGs in shANXA2 cells vs shCtrl cells. C Top ten KEGG functional pathways enriched by downregulated DEGs in shANXA2 cells vs shCtrl cells. D Validation of mRNA expression of CCL5, IFI6, IFI44, IFITM1, and LTB by qRT-PCR assay. E Validation of mRNA expression of IRF7 and ISG15 by qRT-PCR assay. F Representative images showing protein levels of ANXA2, CCL5, IFI6, IFI44, IFITM1, LTB, IRF7 and ISG15 in LV-shANXA2 group vs LV-shCtrl group. Results are represented as mean ± SD (n = 3, *P < 0.05, **P < 0.01, ***P < 0.001, calculated using Student’s t-test)
Identification and functional analysis of ANXA2-regulated splicing events. A Classification of regulated alternative splicing events (RASE) induced by ANXA2. B Analysis of the overlap between ANXA2-regulated DEGs and regulated alternative splicing genes (RASG). C Top ten GO biological processes terms of ANXA2-regulated alternative splicing genes. D Top ten KEGG functional pathways of ANXA2-regulated alternative splicing genes
Validation of ANXA2-regulated alternative splicing events in key genes of inflammation pathways. A RASEs in LITAF. B RASEs in UBA52. C RASEs in RBCK1. The altered ratio of alternative splicing (AS) events in RNA-seq was calculated using the formula: AS junction reads / (AS junction reads + Model junction reads), while the altered ratio of AS events in qRT-PCR was calculated using the formula: AS transcripts level / Model transcripts level (n = 3, *P < 0.05, **P < 0.01, ***P < 0.001, calculated using Student’s t-test)
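The two ratios defined in this caption can be written out directly. The sketch below follows the formulas as stated; the function names and the handling of zero denominators are assumptions made for illustration.

```python
def rnaseq_as_ratio(as_junction_reads: float, model_junction_reads: float) -> float:
    """AS junction reads / (AS junction reads + Model junction reads)."""
    total = as_junction_reads + model_junction_reads
    if total == 0:
        # Assumption: the ratio is treated as undefined without junction reads.
        raise ValueError("no junction reads support either isoform")
    return as_junction_reads / total


def qpcr_as_ratio(as_transcript_level: float, model_transcript_level: float) -> float:
    """AS transcripts level / Model transcripts level."""
    if model_transcript_level == 0:
        raise ValueError("model transcript level is zero")
    return as_transcript_level / model_transcript_level


if __name__ == "__main__":
    # Made-up counts: 30 AS junction reads and 70 model junction reads.
    print(rnaseq_as_ratio(30, 70))
    # Made-up relative transcript levels.
    print(qpcr_as_ratio(1.5, 3.0))
```

Note that the RNA-seq ratio is bounded between 0 and 1, whereas the qRT-PCR ratio is not, so the two are comparable in direction of change rather than in absolute value.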
Article
Background Renal inflammation plays a crucial role during the progression of chronic kidney disease (CKD), but there is limited research on hub genes involved in renal inflammation. Here, we aimed to explore the effects of Annexin A2 (ANXA2), a potential inflammatory regulator, on gene expression in human proximal tubular epithelial (HK2) cells. RNA-sequencing and bioinformatics analysis were performed on ANXA2-knockdown versus control HK2 cells to reveal the differentially expressed genes (DEGs) and regulated alternative splicing events (RASEs). Then the DEGs and RASEs were validated by qRT-PCR. Results A total of 220 upregulated and 171 downregulated genes related to ANXA2 knockdown were identified. Genes enriched in inflammatory response pathways, such as interferon-mediated signaling, cytokine-mediated signaling, and nuclear factor κB signaling, were under global transcriptional and alternative splicing regulation by ANXA2 knockdown. qRT-PCR confirmed ANXA2-regulated transcription of the chemokine gene CCL5, as well as the interferon-regulating genes ISG15, IFI6, IFI44, IFITM1, and IRF7, in addition to alternative splicing of the inflammatory genes UBA52, RBCK1, and LITAF. Conclusions The present study indicates that ANXA2 plays a role in the inflammatory response in HK2 cells that may be mediated via the regulation of transcription and alternative splicing of inflammation-related genes.
 
Article
Exposure to environmental mutagens increases the risk of cancer and genetic disorders. We used Duplex Sequencing (DS), a high-accuracy error-corrected sequencing technology, to analyze mutation induction across twenty 2.4 kb intergenic and genic targets in the bone marrow of MutaMouse males exposed to benzo(a)pyrene (BaP), a widespread environmental pollutant. DS revealed a linear dose-related induction of mutations across all targets with low intra-group variability. Heterochromatic and intergenic regions exhibited the highest mutation frequencies (MF). C:G > A:T transversions at CCA, CCC and GCC trinucleotides were enriched in BaP-exposed mice, consistent with the known etiology of BaP mutagenesis. However, GC-content had no effect on mutation susceptibility. A positive correlation was observed between DS and the “gold-standard” transgenic rodent gene mutation assay. Overall, we demonstrate that DS is a promising approach to study in vivo mutagenesis and yields critical insight into the genomic features governing mutation susceptibility, spectrum, and variability across the genome.
 
Life cycle of Anagrus nilaparvatae. Wasps were reared under a 14:10 h (L:D) photoperiod at 27 °C. A Female and male A. nilaparvatae are attracted to the volatiles of rice or planthoppers and fly to parasitize. B Female A. nilaparvatae laying eggs. C Late larval instars of A. nilaparvatae are pink. D In the late pupal stage, the morphological characters of A. nilaparvatae gradually emerge, and the sexes can be distinguished by their antennae. E After emergence, the adult A. nilaparvatae bites through the egg shell of the rice planthopper and flies out
Genome landscape of the parasitoid wasp Anagrus nilaparvatae. The letters and numbers outside the circle represent the scaffold label (scaffold length > 3 Mb). From outer to inner circles: heat map of repeat sequence density, heat map of gene density, density of single nucleotide variants, and GC content. The sliding window size is 200 kb. The innermost lines show the collinear genes within the genome, with each line connecting a pair of genes
Phylogenomic analyses of the parasitoid wasp Anagrus nilaparvatae and 12 related species. The maximum-likelihood phylogenetic tree was constructed for A. nilaparvatae and 12 other hymenopterans based on genome-wide single-copy orthologs. Apis mellifera was used as the outgroup. The black numbers on the nodes indicate divergence times (Mya), with error bars indicating 95% credibility intervals. The expansion (green) and contraction (red) of gene families are shown on the branches
Maximum-likelihood tree of OBPs of the parasitoid wasp Anagrus nilaparvatae and other Hymenoptera. Colors of tip nodes indicate species. Genes of A. nilaparvatae are highlighted with red shading
Maximum-likelihood tree of PPKs of Anagrus nilaparvatae and other Hymenoptera. The gene of A. nilaparvatae is highlighted with red shading. All gene names consist of the abbreviation of the species name plus the gene serial number; the gene serial numbers can be found in NCBI (https://www.ncbi.nlm.nih.gov/) or InsectBase 2.0 (http://v2.insect-genome.com/). Anil, A. nilaparvatae; Amel, Apis mellifera; Btre, Belonocnema treatae; Cflo, Copidosoma floridanum; Csol, Ceratosolen solmsi; Dall, Diachasma alloeum; Fari, Fopius arisanus; Mdem, Microplitis demolitor; Nvit, Nasonia vitripennis; Ppup, Pteromalus puparum; Tbra, Trichogramma brassicae; Tpre, Trichogramma pretiosum; Tsar, Trichomalopsis sarcophagae
Article
Background Mymaridae is an ancient insect group and is a basal lineage of the superfamily Chalcidoidea. Species of Mymaridae have great potential for biological control. Anagrus nilaparvatae, a representative species of Mymaridae, is ideal for controlling rice planthopper due to its high rate of parasitism and ability to find hosts efficiently in paddy ridges and fields. Results Using both PacBio single-molecule real-time and Illumina sequencing, we sequenced and assembled the whole genome of A. nilaparvatae, a first for the family Mymaridae. The assembly consists of 394 scaffolds, totaling 488.8 Mb. The assembly is of high continuity and completeness, indicated by the N50 value of 25.4 Mb and 98.2% mapping rate of Benchmarking Universal Single-Copy Orthologs. In total, 16,894 protein-coding genes in the genome were annotated. A phylogenomic tree constructed for A. nilaparvatae and 12 other species of Hymenoptera confirmed that the family Mymaridae is sister to all remaining chalcidoids. The divergence time between A. nilaparvatae and the other seven Chalcidoidea species was dated at ~ 126.9 Mya. Chemoreceptor and mechanoreceptor genes are important in explaining parasitic behavior. We identified 17 odorant binding proteins, 11 chemosensory proteins, four Niemann-Pick type C2 proteins, 88 olfactory receptors, 12 gustatory receptors, 22 ionotropic receptors and 13 sensory neuron membrane proteins in the genome of A. nilaparvatae, which are associated with the chemosensory functions. Strikingly, there are only one pickpocket receptor gene and nine transient receptor potential genes in the genome that have a mechanosensory function. Conclusions We obtained a high-quality genome assembly for A. nilaparvatae using PacBio single-molecule real-time sequencing, which provides phylogenomic insights into its evolutionary history. The small numbers of chemo- and mechanosensory genes in A. nilaparvatae indicate that the species-specific host detection and oviposition behavior of A. nilaparvatae might be regulated by relatively simple molecular pathways.
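For readers unfamiliar with the N50 statistic quoted in this abstract, the following is a minimal sketch of the standard definition: the length of the scaffold at which the cumulative length, counted from the longest scaffold downwards, first reaches half of the total assembly length. The function name and the toy scaffold lengths are illustrative and unrelated to the A. nilaparvatae assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running sum to the total to avoid float division.
        if running * 2 >= total:
            return length
    return ordered[-1]  # Unreachable for positive lengths; kept for safety.


if __name__ == "__main__":
    # Toy example: total = 20, half = 10; 6 + 5 = 11 reaches it, so N50 = 5.
    print(n50([2, 3, 4, 5, 6]))
```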
 
Article
Background Viola philippica Cav. is the only source plant of “Zi Hua Di Ding”, which is a Traditional Chinese Medicine (TCM) that is utilized as an antifebrile and detoxicant agent for the treatment of acute pyogenic infections. Historically, many Viola species with violet flowers have been misused in “Zi Hua Di Ding”. Viola has been recognized as a taxonomically difficult genus due to the highly similar morphological characteristics of its species. Here, all common V. philippica adulterants were sampled. A total of 24 complete chloroplast (cp) genomes were analyzed; of these, 5 cp genome sequences were downloaded from GenBank and 19 cp genomes, including 2 from “Zi Hua Di Ding” samples purchased from a local TCM pharmacy, were newly sequenced. Results The Viola cp genomes ranged from 156,483 bp to 158,940 bp in length. A total of 110 unique genes were annotated, including 76 protein-coding genes, 30 tRNAs, and four rRNAs. Sequence divergence analysis identified 16 highly diverged sequences; these could be used as markers for the identification of Viola species. The morphological, maximum likelihood and Bayesian inference trees of whole cp genome sequences and highly diverged sequences were divided into five monophyletic clades. The species in each of the five clades were identical in their positions within the morphological and cp genome trees. The shared morphological characters belonging to each clade were summarized. Interestingly, unique variable sites were found in ndhF, rpl22, and ycf1 of V. philippica, and these sites can be selected to distinguish V. philippica from all other Viola species, including its most closely related species. In addition, important morphological characteristics were proposed to assist the identification of V. philippica. 
We applied these methods to examine 2 “Zi Hua Di Ding” samples randomly purchased from the local TCM pharmacy, and this analysis revealed that the morphological and molecular characteristics were valid for the identification of V. philippica. Conclusions This study provides invaluable data for the improvement of species identification and germplasm of V. philippica that may facilitate the application of a super-barcode in TCM identification and enable future studies on phylogenetic evolution and safe medical applications.
 
Distribution of lncRNA transcripts in the RPE. Volcano plots of lncRNAs in BXS (A) and BXS-H2O2 (B). Log2 cytoplasm:nuclear fold change and corresponding log10 adjusted p-value are plotted for each transcript. Transcripts with fold change > 2 are colored blue, adjusted p < 0.01 are green, both fold change > 2 and adjusted p < 0.01 are yellow. Genes confirmed via FISH are red (A). Pie graphs of the distribution of lncRNAs within BXS (C) and BXS-H2O2 (D). The cytoplasmic categorization is indicated by orange with horizontal stripes, nuclear by blue with vertical stripes, and mixed by purple with crosshatched stripes. The total number of transcripts in each category and the percentage of the whole are indicated. E RNA-FISH images of iPSC-RPE confirming localization of NEAT1, MTND1P23, and SNHG16 (red) and counterstained with Hoechst solution (blue). Arrows indicate some of the localized RNAs. Scale bar is 5 µm
lncRNAs containing retention signal motifs are more nuclear localized. Pie graphs of the distribution of lncRNAs within BXS and BXS-H2O2. A distribution of transcripts with the 5’SS motif, without the 5’SS motif, and with a random version of the 5’SS motif. B distribution of transcripts with the BORG motif, without the BORG motif, and with a random version of the BORG motif. C distribution of transcripts with the SIRLOIN motif, without the SIRLOIN motif, and with a random version of the SIRLOIN motif. The cytoplasmic categorization is indicated by orange with horizontal stripes, nuclear by blue with vertical stripes, and mixed by purple with crosshatched stripes
The number of retention signal motifs is positively correlated with nuclear localization. Graphs plotting the number of 5’SS, BORG, and SIRLOIN motifs per transcript versus log2 cytoplasm:nuclear fold change from the BXS and BXS-H2O2 samples. Fold change corresponding to nuclear (nuc) and cytoplasmic (cyt) localization is indicated. The orange dotted lines plot the trendlines for the data
A-to-I RNA editing patterns change in response to oxidative stress. A Representative Sanger sequencing trace results depicting an instance of A-to-I editing within the RGR mRNA, where an inosine is read by the sequencer as a guanine. B Graph depicting the number of A-to-I edited lncRNAs within the cytoplasmic (cyt) and nuclear (nuc) fractions of BXS (solid yellow) and BXS-H2O2 (magenta with diagonal stripes). C Graph depicting the number of transcripts that were both localized to a given fraction (cytoplasmic [cyt] or nuclear [nuc]) and more highly edited in that fraction for the control BXS or BXS-H2O2 samples
Article
Background Long noncoding RNAs (lncRNAs) are emerging as a class of genes whose importance has yet to be fully realized. It is becoming clear that the primary function of lncRNAs is to regulate gene expression, and they do so through a variety of mechanisms that are critically tied to their subcellular localization. Although most lncRNAs are poorly understood, mapping lncRNA subcellular localization can provide a foundation for understanding these mechanisms. Results Here, we present an initial step toward uncovering the localization landscape of lncRNAs in the human retinal pigment epithelium (RPE) using high throughput RNA-Sequencing (RNA-Seq). To do this, we differentiated human induced pluripotent stem cells (iPSCs) into RPE, isolated RNA from nuclear and cytoplasmic fractions, and performed RNA-Seq on both. Furthermore, we investigated lncRNA localization changes that occur in response to oxidative stress. We discovered that, under normal conditions, most lncRNAs are seen in both the nucleus and the cytoplasm to a similar degree, but of the transcripts that are highly enriched in one compartment, far more are nuclear than cytoplasmic. Interestingly, under oxidative stress conditions, we observed an increase in lncRNA localization in both nuclear and cytoplasmic fractions. In addition, we found that nuclear localization was partially attributable to the presence of previously described nuclear retention motifs, while adenosine to inosine (A-to-I) RNA editing appeared to play a very minimal role. Conclusions Our findings map lncRNA localization in the RPE and provide two avenues for future research: 1) how lncRNAs function in the RPE, and 2) how one environmental factor, in isolation, may potentially play a role in retinal disease pathogenesis through altered lncRNA localization.
 
Article
Background Acyl carrier proteins (ACPs) constitute a highly conserved carrier protein family. Previous studies have found that ACPs not only take part in fatty acid synthesis in almost all organisms, but also participate in the regulation of plant growth, development, and metabolism, and help plants adapt to stresses. However, this gene family has not been systematically studied in sorghum. Results Nine ACP family members were identified in the sorghum genome, located on chromosomes 1, 2, 5, 7, 8 and 9. Evolutionary analysis among different species divided the ACP family into four subfamilies, showing that the SbACPs were more closely related to maize. The prediction results of subcellular localization showed that SbACPs were mainly distributed in chloroplasts and mitochondria, while fluorescence localization showed that SbACPs were mainly localized in chloroplasts in tobacco leaf. The analysis of gene structure revealed a relatively simple genetic structure, with 1–3 introns in the sorghum ACP family, and the gene structure within the same subfamily had high similarity. The SbACPs expanded mainly through large-fragment duplication, and SbACPs were more closely related to ACPs in maize and rice. In addition, three-dimensional structure analysis showed that all ACP proteins in sorghum contained four α helices, and the second helix was more conserved, implying a key role in function. Cis-acting element analysis indicated that the SbACPs might be involved in light response, plant growth and development regulation, biotic and abiotic stress response, plant hormone regulation, and other physiological processes. Moreover, qRT-PCR analysis revealed that some SbACPs might be involved in the adaptive regulation of drought and salt stresses, indicating a close relationship between fatty acids and resistance to abiotic stresses in sorghum. 
Conclusions In summary, these results provide a comprehensive overview of the SbACPs and a theoretical basis for further studies on the biological functions of SbACPs in sorghum growth, development and abiotic stress responses.
 
Distribution of serotypes and sequence types (STs) per year (A, B) and per county (C, D). Numbers above each bar indicate the total number of genomes. For visual clarity, only the most frequently detected serotypes and STs are shown. Serotype identity shown here is based on the agglutination serotyping method. Full list of serotypes and STs can be found in Supplementary Table 1
Midpoint-rooted maximum likelihood phylogenetic tree based on 3265 core genes. The scale bar represents the number of nucleotide substitutions per site. Serotype identity shown here is based on the agglutination serotyping method. The black stars on the tip of branches indicate the 33 isolates that had conflicting results from the agglutination test and SeqSero2. BAPS clusters (outermost ring) indicate the sequence clusters determined by RhierBAPS
Distribution of antimicrobial resistance genes. Gene presence-absence matrix showing the distribution of antimicrobial resistance genes across the phylogeny (tree is identical to that in Fig. 2). Black blocks indicate presence of gene listed to the right of the panel. The colored columns represent the STs. Names of the antimicrobial classes are indicated on the right of the resistance genes
Bayesian phylogeny and population dynamics of sequence cluster 1 (Enteritidis ST 11; n = 126 genomes). A Bayesian maximum clade credibility time-calibrated phylogeny based on non-recombining regions of the core genome. Blue bars indicate 95% confidence intervals. B Bayesian skygrowth plot that depicts changes in effective population size over time. Median is represented by a black line and 95% confidence intervals are in blue
Midpoint-rooted maximum likelihood phylogenetic tree of 1354 S. enterica genomes from the United States based on the alignment of 225,784 core SNPs. The scale bar represents the number of nucleotide substitutions per site. The black dots indicate the New Hampshire genomes. The full list of strain names, accession numbers and associated metadata can be found in Supplementary Table 6. The colors in the outer ring of the tree represent the 18 states from which the genomes came and correspond to the colors on the map
Article
Background The implementation of whole genome sequencing (WGS) by PulseNet, the molecular subtyping network for foodborne diseases, has transformed surveillance, outbreak detection, and public health laboratory practices in the United States. In 2017, the New Hampshire Public Health Laboratories, a member of PulseNet, commenced the use of WGS in tracking foodborne pathogens across the state. We present some of the initial results of New Hampshire’s initiative to transition to WGS in tracking Salmonella enterica, a bacterial pathogen that is responsible for non-typhoidal foodborne infections and enteric fever. We characterize the population structure and evolutionary history of 394 genomes of isolates recovered from human clinical cases in New Hampshire from 2017 to 2020. Results The New Hampshire S. enterica population is phylogenetically diverse, consisting of 78 sequence types (ST) and 67 serotypes. Six lineages dominate the population: ST 11 serotype Enteritidis, ST 19 Typhimurium, ST 32 Infantis, ST 118 Newport, ST 22 Braenderup, and ST 26 Thompson. Each lineage is derived from long ancestral branches in the phylogeny, suggesting their extended presence in the region and recent clonal expansion. We detected 61 genes associated with resistance to 14 antimicrobial classes. Of these, unique genes of five antimicrobial classes (aminocoumarins, aminoglycosides, fluoroquinolones, nitroimidazoles, and peptides) were detected in all genomes. Rather than a single clone carrying multiple resistance genes expanding in the state, we found multiple lineages carrying different combinations of independently acquired resistance determinants. We estimate the time to the most recent common ancestor of the predominant lineage ST 11 serotype Enteritidis (126 genomes) to be 1965 (95% highest posterior density interval: 1927–1982). Its population size expanded until 1978, followed by a population decline until 1990. This lineage has been expanding since then. 
Comparison with genomes from other states reveals a lack of geographical clustering, indicative of long-distance dissemination. Conclusions WGS studies of standing pathogen diversity provide critical insights into the population and evolutionary dynamics of lineages and antimicrobial resistance, which can be translated to effective public health action and decision-making. We highlight the need to strengthen efforts to implement WGS-based surveillance and genomic data analyses in state public health laboratories.
 
Article
Background Phosphatidylethanolamine-binding protein (PEBP) is widely present in animals, plants, and microorganisms. Plant PEBP genes are mainly involved in flowering transition and nutritional growth. These genes have been studied in several plants; however, to the best of our knowledge, no studies have explored them in Brassica juncea var. tumida. This study identified and characterized the entire PEBP gene family of Brassica juncea var. tumida. Results A total of 21 PEBP genes were identified from Brassica juncea var. tumida. Through phylogenetic analysis, the 21 corresponding proteins were classified into the following four clusters: TERMINAL FLOWER 1 (TFL1)-like proteins (n = 8), MOTHER OF FT AND TFL1 (MFT)-like proteins (n = 5), FLOWERING LOCUS T (FT)-like proteins (n = 6), and ybhB-like proteins (n = 2). A total of 18 genes contained four exons and had similar gene structures in each subfamily except BjMFT1, BjPYBHB1, and Arabidopsis thaliana CENTRORADIALIS homolog of Brassica juncea var. tumida (BjATC1). In the analysis of conserved motif composition, the BjPEBP genes exhibited similar characteristics, except for BjFT3, BjMFT1, BjPYBHB1, BjPYBHB2, and BjATC1. The BjPEBP promoters include multiple cis-acting elements such as the G-box and I-box elements that respond to light, ABRE and GARE-motif elements that respond to hormones, and MBSI and CAT-box elements that are associated with plant growth and development. Analysis of RNA-Seq data revealed that the expression of a few BjPEBP genes may be associated with the development of a tumorous stem. The results of qRT–PCR showed that BjTFL1 and BjPYBHB1 were highly expressed in the flower tissue, BjFT1 and BjATC1 were mainly expressed in the root, and BjMFT4 was highly expressed in the stem. The results of yeast two-hybrid screening suggested that BjFT interacts with Bj14-3-3. These results indicate that BjFT is involved in flowering regulation. 
Conclusions To the best of our knowledge, this study is the first to perform a genome-wide analysis of the PEBP gene family in Brassica juncea var. tumida. The findings of this study may help improve the yield and molecular breeding of Brassica juncea var. tumida.
 
Article
Background Genomic prediction (GP) and genome-wide association (GWA) analyses are currently being employed to accelerate breeding cycles and to identify alleles or genomic regions of complex traits in forest tree species. Here, 1490 interior lodgepole pine (Pinus contorta Dougl. ex. Loud. var. latifolia Engelm) trees from four open-pollinated progeny trials were genotyped with 25,099 SNPs, and phenotyped for 15 growth, wood quality, pest resistance, drought tolerance, and defense chemical (monoterpenes) traits. The main objectives of this study were to: (1) identify genetic markers associated with these traits and determine their genetic architecture, and compare the markers detected by single- (ST) and multiple-trait (MT) GWA models; (2) evaluate and compare the accuracy and control of bias of the genomic predictions for these traits under different ST and MT parametric and non-parametric GP methods. For GWA, ST and MT analyses were compared using a linear transformation of genomic breeding values from the respective genomic best linear unbiased prediction (GBLUP) model. For GP, ST and MT parametric and non-parametric (Reproducing Kernel Hilbert Spaces, RKHS) models were compared in terms of prediction accuracy (PA) and control of bias. Results MT-GWA analyses identified more significant associations than ST. Some SNPs showed potential pleiotropic effects. Averaging across traits, PA from the studied ST-GP models did not differ significantly from each other, with generally a slight superiority of the RKHS method. MT-GP models showed significantly higher PA (and lower bias) than the ST models, with the PA (bias) of the RKHS approach generally significantly higher (lower) than that of GBLUP. Conclusions The power of GWA and the accuracy of GP were improved when MT models were used in this lodgepole pine population. 
Given the number of GP and GWA models fitted and the traits assessed across four progeny trials, this work has produced the most comprehensive empirical genomic study across any lodgepole pine population to date.
 
Article
Background Ribosomally-synthesized cyclic peptides are widely found in plants and exhibit useful bioactivities for humans. The identification of cyclic peptide sequences and their precursor proteins is facilitated by the growing number of sequenced genomes. While previous research largely focused on the chemical diversity of these peptides across various species, little attention has been paid to the broader range of potential peptides that have not been chemically identified. Results A pioneering study was initiated to explore the genetic diversity of linusorbs, a group of cyclic peptides uniquely occurring in cultivated flax (Linum usitatissimum). Phylogenetic analysis clustered the 5 known linusorb precursor proteins into two clades and one singleton. A preliminary tBLASTn search of the published flax genome using the whole protein sequence as query could only retrieve its homologues within the same clade. This limitation was overcome using a profile-based mining strategy. After genome reannotation, a hidden Markov model (HMM)-based approach identified 58 repeats homologous to the linusorb-embedded repeats in 8 novel proteins, implying that they share common ancestry with the linusorb-embedded repeats. Subsequently, we developed a customized profile composed of a random linusorb-like domain (LLD) flanked by 5 conserved sites and used it for a string search of the proteome, which extracted 281 LLD-containing repeats (LLDRs) in 25 proteins. Comparative analysis of different repeat categories suggested that the 5 conserved flanking sites among the non-homologous repeats have undergone convergent evolution driven by functional selection. Conclusions The profile-based mining approach is suitable for analyzing repetitive sequences. The 25 LLDR proteins identified herein represent the potential diversity of cyclic peptides within the flax genome and lay a foundation for further studies on the functions and evolution of these protein tandem repeats.
 
Article
Background Sugarcane is the most important sugar crop, contributing > 80% of global sugar production. High sucrose content is a key target of sugarcane breeding, yet sucrose improvement in sugarcane has remained extremely slow for decades. Molecular breeding has the potential to break through the genetic bottleneck of sucrose improvement. Dissecting the molecular mechanism(s) and identifying the key genetic elements controlling sucrose accumulation will accelerate sucrose improvement by molecular breeding. In our previous work, a proteomics dataset based on 12 independent samples from high- and low-sugar genotypes treated with ethephon or water was established. However, in that study, which employed conventional analysis, only 25 proteins involved in sugar metabolism were identified. Results In this work, the proteomics dataset used in our previous study was reanalyzed by three different statistical approaches, namely a logistic marginal regression, a penalized multiple logistic regression named Elastic net, and a Bayesian multiple logistic regression method named Stochastic search variable selection (SSVS), to identify more sugar metabolism-associated proteins. A total of 507 differentially abundant proteins (DAPs) were identified from this dataset, 5 of which were validated by western blot. Among the DAPs, 49 proteins were found to participate in sugar metabolism-related processes including photosynthesis, carbon fixation as well as carbon, amino sugar, nucleotide sugar, starch and sucrose metabolism. Based on our studies, a putative network of key proteins regulating sucrose accumulation in sugarcane is proposed, with glucose-6-phosphate isomerase, 2-phospho-D-glycerate hydrolyase, malate dehydrogenase and phosphoglycerate kinase as hub proteins. Conclusions The sugar metabolism-related proteins identified in this work are potential candidates for sucrose improvement by molecular breeding. 
Further, this work provides an alternative solution for omics data processing.
 
Flowchart of the combinations of three sequencers and two variant calling pipelines for germline variants. Key processes for NGS data generation and analysis are shown on the left. Squares in the flowchart represent data files, and rhombuses indicate processes. NovaSeq means NovaSeq 6000, NextSeq means NextSeq 550
Comparison of variant calling performance in GenoLab M and NovaSeq 6000 from 33X and 22X coverage of the NA12878 sample. A SNP and B InDel on whole genome, C SNP and D InDel F-score on stratification region. Precision (positive predictive value) is the fraction of relevant instances among the retrieved instances. Recall (sensitivity) is the fraction of relevant instances that were retrieved. F-score is the harmonic mean of the precision and recall. chr 20 means chromosome 20, NIADR means Not in all Difficult Regions, SDR means Segmental Duplications Regions
Venn diagram of variant calling performance in WGS datasets. A SNP and B InDel
Comparison of variant calling performance in six WES datasets. A SNP and B InDel. Precision (positive predictive value) is the fraction of relevant instances among the retrieved instances. Recall (sensitivity) is the fraction of relevant instances that were retrieved. F-score is the harmonic mean of the precision and recall
Upset diagram of variant calling results of all combinations in WES datasets. A SNP and B InDel
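The three metrics defined in this caption can be computed from counts of true positive, false positive and false negative variant calls. The sketch below assumes those counts come from comparing a call set against a truth set; the function name, the zero-denominator convention and the example counts are illustrative assumptions, not details of the benchmark used in the study.

```python
from typing import NamedTuple


class CallMetrics(NamedTuple):
    precision: float
    recall: float
    f_score: float


def variant_call_metrics(tp: int, fp: int, fn: int) -> CallMetrics:
    """Precision, recall and F-score from variant call counts.

    tp: called variants present in the truth set
    fp: called variants absent from the truth set
    fn: truth-set variants that were not called
    Assumption: a metric with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return CallMetrics(precision, recall, 0.0)
    # Harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return CallMetrics(precision, recall, f_score)


if __name__ == "__main__":
    # Made-up counts for illustration.
    metrics = variant_call_metrics(tp=90, fp=10, fn=30)
    print(f"precision={metrics.precision:.3f} "
          f"recall={metrics.recall:.3f} F={metrics.f_score:.3f}")
```

Because the F-score is a harmonic mean, it is pulled toward the lower of the two values, which is why a platform with high precision but weak recall on InDels can still show a modest F-score.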
Article
Background GenoLab M is a recently developed next-generation sequencing (NGS) platform from GeneMind Biosciences. To establish the performance of GenoLab M, we present the first report to benchmark and compare the WGS and WES sequencing data of the GenoLab M sequencer with those of the NovaSeq 6000 and NextSeq 550 platforms in various types of analysis. For WGS, thirty-fold sequencing on the Illumina NovaSeq platform, processed with the GATK pipeline, is currently considered the gold standard. This dataset was therefore generated as a benchmark reference in this study. Results GenoLab M showed an average Q20 percentage of 94.62% for base quality, while that of NovaSeq was slightly higher at 96.97%. However, GenoLab M outperformed NovaSeq and NextSeq in duplication rate, suggesting more usable data after deduplication. For WGS short variant calling, GenoLab M showed a significant accuracy improvement over the same-depth dataset from NovaSeq, and at 22X depth reached accuracy similar to that of the NovaSeq 33X dataset. For 100X WES, the F-score and precision of GenoLab M were higher than those of NovaSeq or NextSeq, especially for InDel calling. Conclusions GenoLab M is a promising NGS platform for high-performance WGS and WES applications. For WGS, 22X depth on the GenoLab M sequencing platform offers a cost-effective alternative to the current mainstream 33X depth on Illumina.
 
Filtering of transcripts in pre-weaning and post-weaning rumen tissue samples. Once a consensus sequence was generated for each sample, a series of filtering steps was used to isolate candidate lncRNAs. Steps included removing known protein-coding transcripts, transcripts possessing coding potential, and transcripts that demonstrated nucleotide and protein sequence homology. In the pre-weaning rumen tissue sample, 404 transcripts remained, and 234 transcripts remained in the post-weaning rumen tissue sample
Length distribution of candidate lncRNA transcripts. A Length of all candidate lncRNA transcripts. The average transcript length was 674 base pairs, indicated by the red line. B Zoomed-in length distribution of all candidate lncRNA transcripts, excluding those longer than 2000 base pairs for clarity. C Length of pre-weaning transcripts, ranging from 200 to 17809 and averaging 466 nucleotides. D Length of post-weaning transcripts, ranging from 200 to 56626 and averaging 1033 nucleotides
Expression of lncRNA candidate transcripts. A FPKM values of transcripts expressed in pre-weaning tissue. Expression levels ranged from 0.17 to 46.81 FPKM, averaging 5.24 FPKM. The average length of transcripts is indicated by the red line. B FPKM values of transcripts expressed in post-weaning tissue. Expression levels ranged from 0.72 to 106 FPKM, averaging 7.89 FPKM. The average length of transcripts is indicated by the red line
PhastCons scores of pre- and post-weaning conditions at the whole-genome, intergenic region, and lncRNA levels. A Boxplot of PhastCons scores for all transcripts, all intergenic transcripts, and all lncRNA candidate transcripts. B Violin plot of all six profiles: all pre-weaning transcripts, pre-weaning intergenic regions, pre-weaning lncRNA transcripts, all post-weaning transcripts, post-weaning intergenic regions, and post-weaning lncRNA transcripts
Scatter plot of lncRNA PhastCons scores. Most lncRNAs show scores well below 0.50 with a small number being well conserved across many species. Pre-weaning scores ranged from 0.000873 to 0.879405, and post-weaning scores ranged from 0.000183 to 0.658853, with an outlier of 0.98854
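The expression values in the captions above are reported in FPKM (fragments per kilobase of transcript per million mapped fragments). As a minimal sketch of the standard FPKM definition, and not of the pipeline the authors actually ran, the normalisation can be written as follows. The function name and the input numbers are hypothetical.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    Equivalent to fragments / (length in kb) / (library size in millions).
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical transcript: 500 fragments on a 1000 bp transcript
# in a library of 20 million mapped fragments
value = fpkm(fragments=500, transcript_length_bp=1000, total_mapped_fragments=20_000_000)
print(f"{value:.2f} FPKM")
```

Dividing by transcript length matters for the comparison made here, since the pre-weaning and post-weaning candidates differ markedly in average length.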
Article
Background This study aimed to identify long non-coding RNAs (lncRNAs) in the rumen tissue of dairy cattle, explore their features, including expression and conservation levels, and reveal potential links between lncRNAs and complex traits that may indicate important functional impacts of rumen lncRNAs during the transition to weaning. Results Six cattle rumen samples were collected, three replicates before and three after weaning. Total RNA was extracted and sequenced, and lncRNAs were identified based on size, coding potential, sequence homology, and known protein domains. As a result, 404 and 234 rumen lncRNAs were identified before and after weaning, respectively. However, only nine of them were shared between the two conditions, with 395 lncRNAs found only in pre-weaning tissues and 225 only in post-weaning samples. Interestingly, none of the nine common lncRNAs were differentially expressed between the two weaning conditions. LncRNAs had shorter average length, lower expression, and lower conservation scores than the genome overall, which is consistent with general lncRNA characteristics. By integrating rumen lncRNAs before and after weaning with large-scale GWAS results in cattle, we found significant enrichment of both pre- and post-weaning lncRNAs for traits of economic importance, including production, reproduction, health, and body conformation phenotypes. Conclusions The majority of rumen lncRNAs are uniquely expressed in one of the two weaning conditions, indicating a functional role of lncRNAs in rumen development and the weaning transition. Notably, both pre- and post-weaning lncRNAs showed significant enrichment for a variety of complex traits in dairy cattle, suggesting the importance of rumen lncRNAs for cattle performance in the adult stage. These relationships should be investigated further to better understand the specific roles lncRNAs play in rumen development and cow performance.
 
Article
Background The MYB transcription factors (TFs) form one of the largest TF families in plants; they play essential roles in plant growth and development and are involved in responses to biotic and abiotic stress. However, there are few reports on the GsMYB7 gene in soybean under acidic aluminum stress, and its regulatory mechanism remains unclear. Results The GsMYB7 protein is localized in the nucleus and has transcriptional activation ability. Quantitative real-time PCR (qRT-PCR) results showed that GsMYB7 had a constitutive expression pattern, with enrichment in roots. At an AlCl₃ concentration of 25 µM, the total root surface area (SA) of GsMYB7 transgenic lines was 34.97% higher than that of wild-type Huachun 6 (HC6), while the accumulation of Al³⁺ in the root tips of transgenic plants after aluminum treatment was 17.39% lower than that of the wild type. RNA-sequencing analysis indicated that over 1181 genes were regulated by GsMYB7 and aluminum stress. Among the regulated genes, the expression levels of glutathione peroxidase, protein kinase, cytochrome and other genes were significantly higher in the transgenic lines than in the wild type under acidic aluminum stress. Bioinformatics and qRT-PCR results showed that 9 candidate genes, regulated indirectly and/or directly by GsMYB7, were induced under acidic aluminum stress. After AlCl₃ treatment, the transcript levels of these genes in GsMYB7 transgenic seedlings were significantly higher than those in wild-type HC6. Conclusions The results suggest that GsMYB7 may enhance soybean tolerance to acidic aluminum stress by regulating downstream genes.
 
Article
Background Genome-wide RNA-sequencing technologies are increasingly critical to a wide variety of diagnostic and research applications. RNA-seq users often first enrich for mRNA, with the most popular enrichment method being poly(A) selection. In many applications it is well-known that poly(A) selection biases the view of the transcriptome by selecting for longer tailed mRNA species. Results Here, we show that poly(A) selection biases Oxford Nanopore direct RNA sequencing. As expected, poly(A) selection skews sequenced mRNAs toward longer poly(A) tail lengths. Interestingly, we identify a population of mRNAs (> 10% of genes’ mRNAs) that are inconsistently captured by poly(A) selection due to highly variable poly(A) tails, and demonstrate this phenomenon in our hands and in published data. Importantly, we show poly(A) selection is dispensable for Oxford Nanopore’s direct RNA-seq technique, and demonstrate successful library construction without poly(A) selection, with decreased input, and without loss of quality. Conclusions Our work expands the utility of direct RNA-seq by validating the use of total RNA as input, and demonstrates important technical artifacts from poly(A) selection that inconsistently skew mRNA expression and poly(A) tail length measurements.
 
Article
Background Cashmere goats are heterogeneous hairy mammals. The fineness of cashmere can affect its economic value. Therefore, in this study, we used transcriptome sequencing to analyze the gene expression profiles of skin tissues from cashmere goats with different cashmere fineness. The selected candidate genes were functionally verified in secondary hair follicle dermal papilla cells (DPCs) of cashmere goats. Results We identified 479 DEGs, of which 238 mRNAs were up-regulated and 241 mRNAs were down-regulated in the fine velvet group. Based on functional annotation and protein interaction network analysis, we found several genes that may affect the fineness of cashmere, including SOX18, SOX4, WNT5A, IGFBP4, KAP8, KRT36, and FA2H. Using qRT-PCR, Western blot, CCK-8 cell viability detection, EdU cell proliferation detection, and flow cytometry, we found that overexpression of the FA2H gene could promote the proliferation of secondary hair follicle DPCs in cashmere goats. We also showed that FA2H could regulate the expression levels of the FGF5 and BMP2 genes in DPCs. Conclusion The results of this study provide a useful reference for the genetics and breeding of Jiangnan cashmere goats and for goat genome annotation, and provide an experimental basis for improving the cashmere quality of cashmere goats.
 
Article
Background Adaptive thermogenesis by brown adipose tissue (BAT) is important for maintaining body temperature in newborn mammals. Cold exposure activates gene expression and lipid metabolism to provide energy for BAT thermogenesis. However, knowledge of BAT metabolism in large animals after cold exposure is still limited. Results In this study, we found that cold exposure induced the expression of BAT thermogenesis genes and increased the protein levels of UCP1 and PGC1α. Pathway analysis showed that cold exposure activated BAT metabolism, which involved the cGMP-PKG, TCA cycle, fatty acid elongation, and degradation pathways. These changes were accompanied by decreased triglyceride (TG) content and increased phosphatidylcholine (PC) and phosphatidylethanolamine (PE) content in BAT. Conclusion These results demonstrate that cold exposure induces metabolites involved in glycerolipid and glycerophospholipid metabolism in BAT. The present study provides evidence for the lipid composition associated with adaptive thermogenesis in goat BAT and for the metabolic pathways regulated by cold exposure.
 
Article
Background In aquatic environments, pH, salinity, and ammonia concentration are extremely important for aquatic animals. NHE is a two-way ion exchange carrier protein that transports Na⁺ into cells in exchange for H⁺ and plays key roles in regulating intracellular pH, osmotic pressure, and ammonia concentration. Results In the present study, ten NHEs, the entire NHE gene family, were identified in the Coilia nasus genome and systematically analyzed via phylogenetic, structural, and synteny analyses. The expression patterns of C. nasus NHEs in multiple tissues indicated that NHE gene expression is tissue-specific. Expression patterns of C. nasus NHEs were related to ammonia excretion during multiple embryonic development stages. To explore their potential functions under salinity challenge and ammonia stress, expression levels of the ten NHEs were measured in C. nasus gills under hypotonic stress, hypertonic stress, and ammonia stress. Expression levels of all NHEs were upregulated during hypotonic stress and downregulated during hypertonic stress. NHE2 and NHE3 displayed higher expression levels in C. nasus larvae and juvenile gills under ammonia stress. Conclusions Our study revealed that NHE genes play distinct roles in embryonic development, salinity stress, and ammonia exposure. Syntenic analysis showed significant differences between stenohaline and euryhaline fishes. Our findings provide insight into the effects of the C. nasus NHE gene family on ion transport and ammonia tolerance and will be beneficial for healthy aquaculture of C. nasus.
 
Translocation of LAP-Smads from the cytoplasm to the nucleus upon stimulation with TGFβ-1, shown via high content imaging in MDA-MB-231 cells. Left, LAP-Smad2; right, LAP-Smad3. Cell nuclei were stained with red Hoechst stain (DAPI channel). Green GFP stain (FITC channel) showed predominantly cytoplasmic localization of LAP-Smads in the absence of TGFβ-1 stimulation, and translocation to the nucleus after TGFβ-1 stimulation
Characterization of Smad2 and Smad3 binding sites using DNAShapeR. A. Minor groove width (MGW) of Smad2-bound sites (left) and Smad3-bound sites (right). While Smad2 and Smad3 had similar MGW at the centers of the peaks, there was a marked difference in the MGW 100 base pairs upstream and downstream of the peak center, with Smad3-bound peaks narrower than Smad2-bound peaks. B. Electrostatic potential (EP) of Smad2- and Smad3-bound sites. Smad2-bound sites (left) had higher electrostatic potential than Smad3-bound sites across the full 200 base pairs of each binding site
Neural networks can classify Smad-bound sites as being Smad2- or Smad3-bound. A. Precision-recall curve of the CNN (blue) and CNN-LSTM (black) models, taking the average of 10 models for the final classification. An average precision of 0.95 was observed for the CNN model, compared with the slightly higher average precision of 0.96 for the CNN-LSTM model. B. Confusion matrix of the CNN model in classifying Smad2 and Smad3 sites. The model was better able to classify Smad3 (0.87) than Smad2 (0.7). C. Confusion matrix of the CNN-LSTM model. Like the CNN model, the CNN-LSTM model was better at classifying Smad3 (0.84) than Smad2 (0.78), but it performed better than the CNN model overall (as shown in A). D. The effect of ensemble learning on model performance, evaluated using AUCPR. We increased the number of models used from one to ten and observed an increase in AUCPR as the number of models increased. The standard deviation, indicative of stability, also decreased as more models were included in the final ensemble. E. Confusion matrix of Smad2/3 binding in hESC, showing model performance in a novel cell type that was not included in the training dataset
Architectures of neural networks used in this study. The CNN is made of two convolution stacks (convolution layer + max pooling). A filter size of five is used in the first convolution stack to serve as a motif detector. Thereafter, we used a larger filter size (32) in the next convolutional layer to capture larger patterns in the sequence. Following the convolution stacks, the features are flattened and batch normalized before passing through two dense layers with the ReLU activation function, which are connected by a dropout layer. Finally, the output from the dense layer is passed to an output layer with a sigmoid activation to produce a final prediction value. As in the CNN model, the CNN-LSTM model first uses a convolution layer with a filter size of five as a local motif detector. After max pooling, the output matrix is passed to an LSTM with 32 cells. Thereafter, the output from the LSTM is batch normalized and passed through two fully connected layers with the same configuration as in the CNN model
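The ensemble step mentioned in the classification caption (averaging the outputs of up to ten models for the final call) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function name, the 0.5 decision threshold, and the choice of Smad3 as the positive class of the sigmoid output are all assumptions, and the probabilities are made up.

```python
from statistics import mean


def ensemble_predict(per_model_probs: list[list[float]], threshold: float = 0.5):
    """Average per-site probabilities across models, then threshold.

    per_model_probs[m][i] is the sigmoid output of model m for site i.
    Assumption: outputs >= threshold are labelled "Smad3", otherwise "Smad2".
    """
    n_sites = len(per_model_probs[0])
    if any(len(probs) != n_sites for probs in per_model_probs):
        raise ValueError("every model must score the same number of sites")
    averaged = [mean(probs[i] for probs in per_model_probs) for i in range(n_sites)]
    labels = ["Smad3" if p >= threshold else "Smad2" for p in averaged]
    return averaged, labels


# Hypothetical outputs of three models for two binding sites
scores = [
    [0.9, 0.2],
    [0.8, 0.4],
    [0.7, 0.3],
]
averaged, labels = ensemble_predict(scores)
print(averaged, labels)
```

Averaging reduces the influence of any single model's random initialisation, which is consistent with the lower standard deviation the caption reports as more models are added.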
Article
Background The transforming growth factor beta-1 (TGFβ-1) cytokine exerts both pro-tumor and anti-tumor effects in carcinogenesis. An increasing body of literature suggests that the outcome of TGFβ-1 signaling depends partly on the regulatory targets of the downstream receptor-regulated Smad (R-Smad) proteins Smad2 and Smad3. However, the lack of Smad-specific antibodies for ChIP-seq hinders convenient identification of Smad-specific binding sites. Results In this study, we use localization and affinity purification (LAP) tags to identify Smad-specific binding sites in a cancer cell line. Using ChIP-seq data obtained from LAP-tagged Smad proteins, we develop a convolutional neural network with long short-term memory (CNN-LSTM) as a deep learning approach to classify a pool of Smad-bound sites as being Smad2- or Smad3-bound. Our data show that this approach can accurately classify Smad2- versus Smad3-bound sites. We use our model to dissect the role of each R-Smad in the progression of breast cancer using a previously published dataset. Conclusions Our results suggest that deep learning approaches can be used to dissect the binding site specificity of closely related transcription factors.
 
Article
Background Drought stress is a serious threat to land use efficiency and crop yields worldwide. Understanding the mechanisms that plants use to withstand drought stress will help breeders to develop drought-tolerant medicinal crops. Liquorice (Glycyrrhiza uralensis Fisch.) is an important medicinal crop in the legume family that is currently grown mostly in northwest China and is highly tolerant to drought. It is therefore considered an ideal crop for studying plant stress tolerance and can be used to identify drought-resistant proteins. To understand the effects of drought stress on protein levels in liquorice, we undertook a comparative proteomic analysis of liquorice seedlings grown for 10 days in soil with different relative water contents (SRWC of 80%, 65%, 50% and 35%, respectively). We used an integrated approach of Tandem Mass Tag labeling in conjunction with LC–MS/MS. Results A total of 7409 proteins were identified in this study, of which 7305 could be quantified. There were 837 differentially expressed proteins (DEPs) identified across the drought stress treatments. Compared with the control (CK), 123 DEPs (80 up-regulated and 43 down-regulated) were found under light stress (LS); 353 DEPs (254 up-regulated and 99 down-regulated) under moderate stress (MS); and 564 DEPs (312 up-regulated and 252 down-regulated) under severe stress (SS). The number of DEPs increased with increasing water stress, and the number of up-regulated proteins was higher than that of down-regulated proteins in all drought stress treatments compared with CK. Using systematic bioinformatics analysis of these data to identify informative proteins, we showed that osmolytes such as cottonseed sugars and proline accumulated under light drought stress and improved resistance. Under moderate and severe drought stress, oxidation of unsaturated fatty acids and accumulation of glucose and galactose increased. 
Under moderate and severe drought stress, synthesis of the terpene precursors pentacene 2,3-epoxide and β-coumarin was inhibited, and accumulation of triterpenoids (glycyrrhetinic acid) was also affected. Conclusions These data provide a baseline reference for further study of the downstream liquorice proteome in response to drought stress. Our data show that liquorice roots exhibit specific response mechanisms to different drought stresses.
 
Article
Background Repetitive sequences and mobile elements make up considerable fractions of individual genomes. While transposition events can be detrimental for organismal fitness, repetitive sequences form an enormous reservoir for molecular innovation. In this study, we aim to add repetitive elements to the annotation of the Pristionchus pacificus genome and assess their impact on novel gene formation. Results Different computational approaches define up to 24% of the P. pacificus genome as repetitive sequences. While retroelements are more frequently found at the chromosome arms, DNA transposons are distributed more evenly. We found multiple DNA transposons, as well as LTR and LINE elements with abundant evidence of expression as single-exon transcripts. When testing whether transposons disproportionately contribute towards new gene formation, we found that roughly 10–20% of genes across all age classes overlap transposable elements with the strongest trend being an enrichment of low complexity regions among the oldest genes. Finally, we characterized a horizontal gene transfer of Zisupton elements into diplogastrid nematodes. These DNA transposons invaded nematodes from eukaryotic donor species and experienced a recent burst of activity in the P. pacificus lineage. Conclusions The comprehensive annotation of repetitive elements in the P. pacificus genome builds a resource for future functional genomic analyses as well as for more detailed investigations of molecular innovations.
 
Article
Background Conogethes pinicolalis has been regarded as a Pinaceae-feeding variant of the yellow peach moth, Conogethes punctiferalis. The divergence of C. pinicolalis from the fruit-feeding moth C. punctiferalis has been reported in terms of morphology, ecology, and genetics; however, detailed molecular data are lacking. Therefore, in this study, we investigated the divergence of C. pinicolalis from C. punctiferalis using transcriptomics, proteomics, metabolomics and bioinformatics. Results The expression of 74,611 mRNAs in the transcriptome, 142 proteins in the proteome and 218 metabolites in the metabolome differed significantly between the two species, and the KEGG results showed that these were mainly related to metabolism and redox. Moreover, by integrating the multi-omics data, we found that the α-amylase and CYP6AE76 genes were mutated between the two species. Mutations in the α-amylase and CYP6AE76 genes may influence enzyme preference for a certain substrate, resulting in differences in metabolic or detoxifying ability between the two species. The qPCR and enzyme activity tests also confirmed the relevant gene expression. Conclusions These findings on two related species and their integrated networks provide useful information for further exploring divergence in specific genes, metabolism, and redox mechanisms. Most importantly, they give novel insight into species adaptation to various diets, such as the shift from monophagy to polyphagy.
 
Article
Background: African swine fever (ASF) is a lethal hemorrhagic disease of domestic pigs, caused by the ASF virus (ASFV), with mortality rates of up to 100%. The locally-adapted pigs in south-western Kenya have been reported to be resilient to disease and harsh climatic conditions and to tolerate ASF; however, the mechanisms by which this tolerance is sustained remain largely unknown. We evaluated the gene expression patterns in spleen tissues of these locally-adapted pigs in response to varying infective doses of ASFV to elucidate the virus-host interaction dynamics. Methods: Locally-adapted pigs (n = 14) were experimentally infected with a high dose (1 × 10⁶ HAD₅₀), medium dose (1 × 10⁴ HAD₅₀), or low dose (1 × 10² HAD₅₀) of the highly virulent genotype IX ASFV Ken12/busia.1 (Ken-1033) isolate diluted in PBS and followed through the course of infection for 29 days. The in vivo pig host and ASFV pathogen gene expression in spleen tissues from 10 pigs (three from each infective group and one uninfected control) was analyzed in a dual-RNASeq fashion. We compared gene expression between the three doses in the host and pathogen by contrasting experimental groups against the naïve control. Results: A total of 4954 differentially expressed genes (DEGs) were detected after ASFV Ken12/1 infection, including 3055, 1771, and 128 DEGs in the high, medium, and low dose groups, respectively. Gene ontology and KEGG pathway analysis showed that the DEGs were enriched for genes involved in the innate immune response, inflammatory response, autophagy, and apoptosis in the lethal dose groups. The surviving low dose group suppressed genes in pathways of physiopathological importance. 
We found a strong association between severe ASF pathogenesis in the high and medium dose groups and upregulation of proinflammatory cytokines and immunomodulation of cytokine expression, possibly induced by overproduction of prostaglandin E synthase (4-fold; p < 0.05) or through downregulation of expression of M1-activating receptors, signal transducers, and transcription factors. The host-pathogen interaction resulted in induction of expression of immune-suppressive cytokines (IL-27) and inactivation of autophagy and apoptosis through up-regulation of NUPR1 [5.7-fold (high dose) and 5.1-fold (medium dose); p < 0.05] and IL7R expression. We detected repression of genes involved in MHC class II antigen processing and presentation, such as cathepsins, SLA-DQB1, SLA-DOB, SLA-DMB, SLA-DRA, and SLA-DQA, in the medium and high dose groups. Additionally, the host-pathogen interaction activated the CD8+ cytotoxicity and neutrophil machinery by increasing the expression of neutrophil/CD8+ T effector cell-recruiting chemokines (CCL2, CXCL2, CXCL10, CCL23, CCL4, CXCL8, and CXCL13) in the lethal high and medium dose groups. The recovered pigs infected with ASFV at a low dose significantly repressed the expression of CXCL10, averting induction of T lymphocyte apoptosis and FUNDC1 that suppressed neutrophilia. Conclusions: We provide the first in vivo gene expression profile data from locally-adapted pigs from south-western Kenya following experimental infection with a highly virulent ASFV genotype IX isolate at varying doses that mimic acute and mild disease. Our study showed that the locally-adapted pigs induced the expression of genes associated with tolerance to infection and repressed genes involved in inflammation at varying levels depending upon the ASFV dose administered.
 
Schematic representation of the command line workflow. The workflow begins with virus classification using DIAMOND and reports the output as a text file with taxonomic information and similarity metrics. Phylogenetic analysis is performed using a default phylogenetic reference dataset generated by Neighbor-Joining (NJ), Maximum likelihood (ML) and Bayesian tree. Users can specify which phylogenetic reference dataset to use. Query sequences are aligned to the reference dataset multiple sequence alignment with MAFFT, and a ML phylogenetic tree is constructed followed by lineage assignment. An output file with the lineage assignment, bootstrap values and likelihood test ratio is generated in comma-separated values (CSV) file format
Screenshot of the web interface for the RVFV typing tool. (A) The web interface offers a portal for users to perform classification and visualize the results. The typing report provides the name of the query sequence, its nucleotide length, an illustration of its position in the virus’ genomic segment, the species assignment and the genotype assignment. A detailed report (B) is provided for the phylogenetic analysis that resulted in this classification. All results can be exported to a variety of file formats (XML, CSV, Excel or FASTA format). The detailed HTML report (C) contains information on the sequence name, length, assigned virus and genotype, an illustration (D) of the position of the sequence in the virus’ genomic segment and the phylogenetic analysis section. The alignment section shows the alignment and the constructed phylogenetic tree
Phylogenetic analysis using Gn and whole genome (L, M & S) segment classifiers. A-D Maximum likelihood (ML) phylogenetic trees inferred from the representative sequences for all lineages within the (A) 51 sequences of the glycoprotein (490 bp) gene aligned with MAFFT and ML tree inferred under the GTR + I + G substitution model, (B) 47 sequences of the Small (S) segment (1690 bp), (C) 47 sequences of the Medium (M) segment (3885 bp) and (D) 47 sequences of the Large (L) segment (6404 bp). All the trees show similar topology for all the lineages
Distribution of RVFV lineages in Africa and the Middle East. A Lineages reported in Africa and the Middle East (Saudi Arabia) sampled between 1944 and 2016. B Map of Africa and Saudi Arabia indicating the number of RVFV sequences for the M-segment (partial and complete) as of 28th May 2021 for the 129 sequences used in the lineage assignment. C Maximum likelihood phylogenetic tree using glycoprotein (Gn) representative sequences (n = 51) showing the geographical distribution of lineages. The tips of the tree are colored according to their country of origin. CAR, Central African Republic
Article
Genetic evolution of Rift Valley fever virus (RVFV) in Africa has been shaped mainly by environmental changes, such as abnormal rainfall patterns and climate change, that have occurred over the last few decades. These gradual environmental changes are believed to have effected gene migration from macro (geographical) to micro (reassortment) levels. Presently, 15 lineages of RVFV have been identified as circulating within Sub-Saharan Africa. International trade in livestock and movement of mosquitoes are thought to be responsible for the outbreaks occurring outside endemic or enzootic regions. Virus spillover events contribute to outbreaks, as was demonstrated by the largest epidemic, in Egypt in 1977. Genomic surveillance of virus evolution is crucial in developing intervention strategies. Therefore, we have developed a computational tool for rapidly classifying RVFV isolates and assigning them to lineages. The computational method is presented both as a command line tool and as a web application hosted at https://www.genomedetective.com/app/typingtool/rvfv/ . The tool was validated on a large dataset using glycoprotein gene (Gn) and whole genome sequences of the Large (L), Medium (M) and Small (S) segments of RVFV retrieved from the National Center for Biotechnology Information (NCBI) GenBank database. Using the Gn nucleotide sequences, the RVFV typing tool correctly classified all 234 RVFV sequences at species level with 100% specificity, sensitivity and accuracy. All the sequences in lineages A (n = 10), B (n = 1), C (n = 88), D (n = 1), E (n = 3), F (n = 2), G (n = 2), H (n = 105), I (n = 2), J (n = 1), K (n = 4), L (n = 8), M (n = 1), N (n = 5) and O (n = 1) were also correctly classified at phylogenetic level. Lineage assignment using whole RVFV genome sequences (L, M and S segments) did not achieve 100% specificity, sensitivity and accuracy for all the sequences analyzed. 
We further tested our tool using genomic data that we generated by sequencing 5 samples collected following a recent RVF outbreak in Kenya. All 5 samples were assigned to lineage C by both the partial (Gn) and whole genome sequence classifiers. The tool is useful in tracing the origin of outbreaks and supporting surveillance efforts. Availability: https://github.com/ajodeh-juma/rvfvtyping
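The per-lineage counts reported in the abstract above can be tallied to cross-check them against the stated totals of 234 Gn sequences and 15 lineages. The short Python sketch below only restates numbers from the abstract; the variable names are illustrative and it is not part of the typing tool.

```python
# Per-lineage Gn sequence counts as listed in the abstract
lineage_counts = {
    "A": 10, "B": 1, "C": 88, "D": 1, "E": 3,
    "F": 2, "G": 2, "H": 105, "I": 2, "J": 1,
    "K": 4, "L": 8, "M": 1, "N": 5, "O": 1,
}

total_sequences = sum(lineage_counts.values())
n_lineages = len(lineage_counts)

# Share of the validation set contributed by the two largest lineages
top_two = sorted(lineage_counts.items(), key=lambda item: item[1], reverse=True)[:2]
top_two_share = sum(count for _, count in top_two) / total_sequences

print(f"{n_lineages} lineages, {total_sequences} sequences")
print(f"largest lineages: {top_two}, combined share {top_two_share:.1%}")
```

The tally shows that lineages C and H dominate the validation set, so the 100% figures for the remaining lineages rest on only a handful of sequences each.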
 