BMC Genomics

Published by Springer Nature

Online ISSN: 1471-2164

Disciplines: Life Sciences

Top read articles

2,098 reads in the past 30 days

Table 1: HAdV-A31 annotation of coding regions
Map of the genome organization and transcription units of HAdV-A31. Early and late transcription units are represented in different colors, intermediate gene products in white. The block arrows represent the predicted protein, titled either by protein name or predicted molecular size. Orientation of the arrows indicates the direction of transcription.
Phylogenetic analysis of all available complete genomic HAdV sequences representing all human adenovirus species (A to G), including the newly generated HAdV-A31 sequence. The tree was generated with MEGA 3.1 using the neighbor-joining method; bootstrap values (%) were generated with 1,000 pseudoreplicates. For nucleotide accession numbers see Methods section.
Global pairwise sequence alignment of the HAdV-A31 genome with representative types of each HAdV species. The x axis shows the genome position, the y axis shows the sequence conservation in percent. Arrows on top display the transcription units and the direction of their transcription.
Schematic view of the predicted E3 CR1 beta (A) and the E3 RID beta (B) proteins of HAdV-A31, -A12 and -F40 or -C5, respectively. (A): a V-Set domain (red box) was only predicted within the N-terminal region of the E3 CR1 beta protein of HAdV-A31. (B): N-terminal phosphorylation sites (small red boxes) were predicted both for the HAdV-A31 and -C5 RID beta proteins, but not for HAdV-A12. Domain predictions were carried out using web-based Pfam, ProSite and BLASTp.

Unique sequence features of the Human Adenovirus 31 complete genomic sequence conserved in clinical isolates

November 2009

·

504,860 Reads

Aims and scope


Exploring all aspects of genome-scale analysis, functional genomics, epigenomics, proteomics and transcriptomics, including novel methods and techniques, BMC Genomics is an open access peer-reviewed journal with a large readership and a highly experienced Editorial Board. This journal is part of the BMC series, a research community-focused collection publishing scientifically valid studies based on community-agreed standards of questioning, methods and analysis.

Recent articles


Locations of DMCpNs within the mitochondrial genome. The solar plot displays the results of our differential methylation analysis, showing the significance of differential methylation for CpN dinucleotides across the mitochondrial genome for each of our pairwise statistical comparisons; Male control vs. female control (MC vs. FC), male stress vs. female stress (MS vs. FS), male control vs. male stress (MC vs. MS) and female control vs. female stress (FC vs. FS). Each data point represents a possible differentially methylated CpN dinucleotide and is positioned based on its base pair (BP) location on the mitochondrial genome and its significance (-log(p-value)). Dashed lines indicate p-values of 0.05 and 0.005. The shaded blue peak area represents an example of an mtDMR within the COX1 gene, in which CpN methylation is clustered
The numbers and locations of regions of differential methylation (mtDMRs) in response to sex and stress. A The location of regions of differential cytosine methylation (mtDMRs) across the mitochondrial genome of pineal gland mitochondria resulting from circadian disruption (Stress). Within the 4 pairwise statistical comparisons, regions of hypo (blue) and hyper (red) methylation relate to our pairwise statistical comparisons; Male control vs. female control (MC vs. FC), male stress vs. female stress (MS vs. FS), male control vs. male stress (MC vs. MS) and female control vs. female stress (FC vs. FS). Coding regions within the heavy strand (mtDNA genes, above line) and light strand (mtDNA genes, below line) of the mitochondrial DNA are displayed for reference, with subunits of the same electron transport complexes shown in the same colour. tRNAs are shown in purple. Boxes around regions of differential methylation indicate the presence of sequence motifs associated with HNF4 (solid box), ATF4/ZNF324 (dashed box) or both (dot and dashed box). Letters within each window represent the nature of the mtDMR. a = sex-specific mtDMRs unrelated to stress, b = mtDMRs with identical responses in males and females, c = mtDMRs differing in the control and stress of one sex only, d = mtDMRs different across all statistical comparisons. B Venn diagram showing the mtDMRs which are lost, gained or remain between MC vs. FC and MS vs. FS, representing sex-specific stress-related regions (gained/lost) and sex-specific, stress-independent regions (remained). C A Venn diagram showing the numbers of mtDMRs which are either unique or present across our pairwise statistical comparisons
ATF4 presence in mtDNA across diverse taxa. A The DNA recognition motif for human ATF4 transcription factor used for a comparison across diverse taxa. B Locations of human ATF4 transcription factor recognition sites identified using FIMO (black bars above mtDNA coding regions) within the mitochondrial genomes of diverse taxa: Gallus gallus, Homo sapiens, Alligator mississippiensis, Danio rerio and Drosophila melanogaster. Coding regions within the heavy strand (mtDNA genes, above line) and light strand (mtDNA genes, below line) of the mitochondrial DNA are displayed for reference, with subunits of the same electron transport complexes shown in the same colour. tRNAs are shown in purple. C Sequence alignment of the 16S rRNA region associated with ATF4 binding (shown with star in B, outlined in dashes in C) as well as 20 bp upstream and downstream of the site
The mitoepigenome responds to stress, suggesting novel mito-nuclear interactions in vertebrates

September 2023

·

14 Reads

The mitochondria are central in the cellular response to changing environmental conditions resulting from disease states, environmental exposures or normal physiological processes. Although the influences of environmental stressors upon the nuclear epigenome are well characterized, the existence and role of the mitochondrial epigenome remain contentious. Here, by quantifying the mitochondrial epigenomic response of pineal gland cells to circadian stress, we confirm the presence of extensive cytosine methylation within the mitochondrial genome. Furthermore, we identify distinct epigenetically plastic regions (mtDMRs) which vary in cytosine methylation, primarily in a non-CpG context, in response to stress and in a sex-specific manner. Motifs enriched in mtDMRs contain recognition sites for nuclear-derived DNA-binding factors (ATF4, HNF4A) important in the cellular metabolic stress response, which we found to be conserved across diverse vertebrate taxa. Together, these findings suggest a new layer of mito-nuclear interaction in which the nuclear metabolic stress response could alter mitochondrial transcriptional dynamics through the binding of nuclear-derived transcription factors in a methylation-dependent context.

Drug-target binding affinity prediction using message passing neural network and self supervised learning

September 2023

·

6 Reads

Background Drug-target binding affinity (DTA) prediction is important for the rapid development of drug discovery. Compared to traditional methods, deep learning methods provide a new way for DTA prediction to achieve good performance without much knowledge of the biochemical background. However, there is still room for improvement in DTA prediction: (1) focusing only on atom-level information leads to an incomplete representation of the molecular graph; (2) self-supervised learning could be introduced for protein representation. Results In this paper, a DTA prediction model using the deep learning method is proposed, which uses an undirected-CMPNN for molecular embedding and combines CPCProt and MLM models for protein embedding. An attention mechanism is introduced to discover the important part of the protein sequence. The proposed method is evaluated on the datasets Ki and Davis, and the model outperformed other deep learning methods. Conclusions The proposed model improves the performance of the DTA prediction, which provides a novel strategy for deep learning-based virtual screening methods.

Effect of ammonium stress on phosphorus solubilization of a novel marine mangrove microorganism Bacillus aryabhattai NM1-A2 as revealed by integrated omics analysis

September 2023

·

10 Reads

Background Phosphorus is one of the essential nutrients for plant growth. Phosphate-solubilizing microorganisms (PSMs) can alleviate available P deficiency and enhance plant growth in an eco-friendly way. Although ammonium toxicity is widespread, there is little understanding about the effect of ammonium stress on phosphorus solubilization (PS) of PSMs. Results In this study, seven PSMs were isolated from mangrove sediments. The soluble phosphate concentration in culture supernatant of Bacillus aryabhattai NM1-A2 reached a maximum of 196.96 mg/L at 250 mM (NH4)2SO4. Whole-genome analysis showed that B. aryabhattai NM1-A2 contained various genes related to ammonium transport (amt), ammonium assimilation (i.e., gdhA, gltB, and gltD), organic acid synthesis (i.e., ackA, fdhD, and idh), and phosphate transport (i.e., pstB and pstS). Transcriptome data showed that the expression levels of amt, gltB, gltD, ackA and idh were downregulated, while gdhA and fdhD were upregulated. The inhibition of the ammonium transporter and the glutamine synthetase/glutamate synthase (GS/GOGAT) pathway contributed to reducing energy loss. For ammonium assimilation under ammonium stress, accompanied by proton efflux, the glutamate dehydrogenase pathway was the main approach. More 2-oxoglutarate (2-OG) was induced to provide abundant carbon skeletons. The downregulation of formate dehydrogenase and high glycolytic rate resulted in the accumulation of formic acid and acetic acid, which played key roles in PS under ammonium stress. Conclusions The accumulation of 2-OG and the inhibition of the GS/GOGAT pathway played a key role in ammonium detoxification. The secretion of protons, formic acid and acetic acid was related to PS. Our work provides new insights into the PS mechanism, which will provide theoretical guidance for the application of PSMs.

Genomic characterization and comparative genomic analysis of HS-associated Pasteurella multocida serotype B:2 strains from Pakistan

September 2023

·

37 Reads

Background Haemorrhagic septicaemia (HS) is a highly fatal and predominant disease in livestock, particularly cattle and buffalo in the tropical regions of the world. Pasteurella multocida (P. multocida), serotypes B:2 and E:2, are reported to be the main causes of HS wherein serotype B:2 is more common in Asian countries including Pakistan and causes heavy financial losses every year. As yet, very little molecular and genomic information related to the HS-associated serotypes of P. multocida isolated from Pakistan is available. Therefore, this study aimed to explore the characteristics of novel bovine isolates of P. multocida serotype B:2 at the genomic level and perform comparative genomic analysis of various P. multocida strains from Pakistan to better understand the genetic basis of pathogenesis and virulence. Results To understand the genomic variability and pathogenomics, we characterized three HS-associated P. multocida serotype B:2 strains isolated from the Faisalabad (PM1), Peshawar (PM2) and Okara (PM3) districts of Punjab, Pakistan. Together with the other nine publicly available Pakistani-origin P. multocida strains and a reference strain Pm70, a comparative genomic analysis was performed. The sequenced strains were characterized as serotype B and belong to ST-122. The strains contain no plasmids; however, each strain contains at least two complete prophages. The pan-genome analysis revealed a higher number of core genes indicating a close resemblance to the studied genomes and very few genes (1%) of the core genome serve as a part of virulence, disease, and defense mechanisms. We further identified that studied P. multocida B:2 strains harbor common antibiotic resistance genes, specifically PBP3 and EF-Tu. Remarkably, the distribution of virulence factors revealed that OmpH and plpE were not present in any P. multocida B:2 strains while the presence of these antigens was reported uniformly in all serotypes of P. multocida. Conclusion This study's findings indicate the absence of OmpH and PlpE in the analyzed P. multocida B:2 strains, which are known surface antigens and provide protective immunity against P. multocida infection. The availability of additional genomic data on P. multocida B:2 strains from Pakistan will facilitate the development of localized therapeutic agents and rapid diagnostic tools specifically targeting HS-associated P. multocida B:2 strains.

Characterization and in silico analysis of the domain unknown function DUF568-containing gene family in rice (Oryza sativa L.)

September 2023

·

5 Reads

Background Domains of unknown function (DUF) proteins are a number of uncharacterized and highly conserved protein families in eukaryotes. In plants, some DUFs have been predicted to play important roles in development and response to abiotic stress. Among them, the DUF568-containing protein family is plant-specific and has not been described previously. A basic analysis and expression profiling were performed, and the co-expression and interaction networks were constructed to explore the functions of the DUF568 family in rice. Results The phylogenetic tree showed that the 8, 9 and 11 DUF568 family members from rice, Arabidopsis and maize were divided into three groups. The evolutionary relationship between DUF568 members in rice and maize was close, while the genes in Arabidopsis were more distantly related. The cis-elements prediction showed that over 82% of the elements upstream of OsDUF568 genes were responsive to light and phytohormones. Gene expression profile prediction and RT-qPCR experiments revealed that OsDUF568 genes were highly expressed in leaves, stems and roots of rice seedlings. The expression of some OsDUF568 genes varied in response to plant hormones (abscisic acid, 6-benzylaminopurine) and abiotic stress (drought and chilling). Further analysis of the co-expression and protein–protein interaction networks using gene ontology showed that OsDUF568-related genes were enriched in cellular transports, metabolism and processes. Conclusions In summary, our findings suggest that the OsDUF568 family may be a vital gene family for the development of rice roots, leaves and stems. In addition, the OsDUF568 family may participate in abscisic acid and cytokinin signaling pathways, and may be related to abiotic stress resistance in these vegetative tissues of rice.

Circos plot of first 34 contigs of the giant kelp genome. Different aspects of the genome are represented in each concentric circle. A Scaffold size in MB. B Gene density heatmap. C Percentage of GC ranging from 45 to 55%. D Nucleotide diversity ranging from 0 to 0.007. E SNP density heatmap. F Tajima’s D values ranging from -2 to 2. G Fst values ranging from 0 to 0.4. H TE density heatmap. Line values are plotted on the same 200 kb sliding window with 40 Kb intervals while heatmaps are plotted over the same 1 MB windows. Heatmaps are in a 1.75 log scale for greater dynamic range at higher values
Comparison of BUSCO assessment of genome completeness based on the stramenopiles_odb10 dataset between the macroalgal genomes of Macrocystis pyrifera (assembled in this study), Ectocarpus sp. [25], Saccharina japonica [26], Undaria pinnatifida [27], Macrocystis pyrifera gene models from Molano et al. 2022 [34], and genome from Paul et al. 2022 [33]
Analysis of orthologs. A Protein comparative analysis of orthologs between giant kelp and three other relevant macroalgal species (Ectocarpus sp., Saccharina japonica and Undaria pinnatifida) using Orthofinder. Numbers represent shared orthogroups between species. B Species tree inferred by OrthoFinder
Synteny between the Macrocystis pyrifera genome (dark green) and the genomes of A Ectocarpus siliculosus and B Undaria pinnatifida (light green). Bands represent clusters of at least 10 single copy orthologs no more than 3 MB apart. Purple bands are potential chromosome splitting or fusion. Gray bands represent scaffolds that share the highest number of orthologs and, therefore, are most syntenic. Red bands represent orthologs in different scaffolds. When multiple bands overlap, bands with fewer orthologs are superimposed on bands with more orthologs for easier visualization. Histogram represents density of single copy orthologs in 1 MB windows
LD decay curve for each sampling population separately
A scaffolded and annotated reference genome of giant kelp (Macrocystis pyrifera)

September 2023

·

58 Reads

Macrocystis pyrifera (giant kelp) is a brown macroalga of great ecological importance as a primary producer and structure-forming foundational species that provides habitat for hundreds of species. It has many commercial uses (e.g. source of alginate, fertilizer, cosmetics, feedstock). One of the limitations to exploiting giant kelp’s economic potential and assisting in giant kelp conservation efforts is a lack of genomic tools like a high quality, contiguous reference genome with accurate gene annotations. Reference genomes attempt to capture the complete genomic sequence of an individual or species, and importantly provide a universal structure for comparison across a multitude of genetic experiments, both within and between species. We assembled the giant kelp genome of a haploid female gametophyte de novo using PacBio reads, then ordered contigs into chromosome level scaffolds using Hi-C. We found the giant kelp genome to be 537 MB, with a total of 35 scaffolds and 188 contigs. The assembly N50 is 13,669,674 bp with GC content of 50.37%. We assessed the genome completeness using BUSCO, and found giant kelp contained 94% of the BUSCO genes from the stramenopile clade. Annotation of the giant kelp genome revealed 25,919 genes. Additionally, we present genetic variation data based on 48 diploid giant kelp sporophytes from three different Southern California populations that confirm the population structure found in other studies of these populations. This work resulted in a high-quality giant kelp genome that greatly increases the genetic knowledge of this ecologically and economically vital species.
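For readers unfamiliar with the N50 statistic quoted in this abstract, the following is a minimal sketch of how it is conventionally computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not values or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly, not data from the giant kelp genome.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # prints 70
```

In the toy example the total is 300 bp, and the two longest sequences (80 + 70) already cover half of it, so the N50 is 70.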

Identification and analysis of RNA-5-methylcytosine-related key genes in osteoarthritis

September 2023

·

3 Reads

Background 5-methylcytosine (m5C) modification is widely associated with many biological and pathological processes. However, knowledge of m5C modification in osteoarthritis (OA) remains lacking. Thus, our study aimed to identify common m5C features in OA. Results In the present study, we identified 1395 differentially methylated genes (DMGs) and 1673 differentially expressed genes (DEGs) using methylated RNA immunoprecipitation next-generation sequencing (MeRIP-seq) and RNA-sequencing. A co-expression analysis of DMGs and DEGs showed that the expression of 133 genes was significantly affected by m5C methylation. A protein–protein interaction network of the 133 genes was constructed using the STRING database, and the cytoHubba plug-in of Cytoscape was used to screen out 11 hub genes, including MMP14, VTN, COL15A1, COL6A2, SPARC, COL5A1, COL6A3, COL6A1, COL8A2, ADAMTS2 and COL7A1. Pathway enrichment analysis by the ClueGO and CluePedia plugins in Cytoscape showed that the hub genes were significantly enriched in collagen degradation and extracellular matrix degradation. Conclusions Our study indicated that m5C modification might play an important role in OA pathogenesis, and the present study provides worthwhile insight into identifying m5C-related therapeutic targets in OA.

miR-128-3p promoted the proliferation of intramuscular adipocytes. A, B Relative mRNA levels of cell proliferation-related genes (P21, BCL2, PCNA) after miR-128-3p overexpression or interference. C Cell growth curves determined by a CCK-8 assay at 12 h, 24 h, 36 h, and 48 h following miR-128-3p overexpression or interference. D Proliferation state of preadipocytes, as assessed by an EdU incorporation assay, after miR-128-3p overexpression or interference. E Cell cycle analysis via flow cytometry after miR-128-3p overexpression or interference. F Apoptosis assay via flow cytometry after miR-128-3p overexpression or interference. The results are shown as the mean ± S.E.M. values, and the data are representative of at least three independent assays. Independent samples t tests were used to analyze the significance of differences between groups (# P > 0.05, * P < 0.05, ** P < 0.01, *** P < 0.001)
miR-128-3p inhibited the differentiation of chicken intramuscular adipocytes. A, B Relative mRNA levels of adipocyte differentiation-related genes (FABP4, FASN and CEBPA) after miR-128-3p overexpression or interference. C Representative images of Oil Red O staining in intramuscular adipocytes transfected with the miR-128-3p mimic, miR-128-3p inhibitor or corresponding NC (this image was acquired with a 20 × objective, and an enlarged image is shown in the bottom left corner of the image). D, E Semiquantitative assessment of Oil Red O absorbance at 450 nm. F, G The triglyceride content was determined by measurement of the absorbance at 500 nm after miR-128-3p overexpression or interference
Differential mRNA expression analysis. A PCA of the samples. B Validation of mRNA sequencing data. C Volcano plot of gene expression in the SI vs. M comparison. Numbers of upregulated and downregulated differentially expressed mRNAs. The left blue bars represent the numbers of upregulated genes; the orange bars represent the numbers of downregulated genes. D Based on the comparison of binding sites in the seed region of miR-128-3p, Cytoscape software was used for network interaction analysis, and the regulatory network of miR-128-3p was mapped. E KEGG pathway enrichment analysis of the DEGs in the M vs. SI comparison. F Venn diagrams of DEGs identified by RNA-seq in the NC vs. M, NC vs. SI, and SI vs. M comparisons
FDPS as a target gene of miR-128-3p. A The potential miR-128-3p target site in the FDPS mRNA 3’UTR was predicted by the RNAhybrid tool. B The miR-128-3p binding site in the FDPS mRNA 3’UTR. C A dual luciferase reporter assay was performed by cotransfecting plasmids containing the wild-type or mutated FDPS 3’UTR, the psiCHECK-2 plasmid and the miR-128-3p mimic into DF-1 cells. The results are shown as the mean ± S.E.M. values, and the data are representative of at least three independent assays. Independent samples t tests were used to analyze the significance of differences between groups (*** P < 0.001)
Interference with FDPS expression inhibits intramuscular preadipocyte differentiation. A mRNA levels of FDPS during the differentiation of chicken primary intramuscular adipocytes into mature adipocytes. B Relative expression of FDPS in intramuscular adipocytes transfected with FDPS. C Relative mRNA levels of adipocyte differentiation-related genes (FABP4, FASN, and CEBPA) after interference with FDPS expression. D The triglyceride content was determined by measurement of the absorbance at 500 nm after interference with FDPS expression. E Representative images of Oil Red O staining in intramuscular adipocytes transfected with FDPS siRNA or the corresponding NC (this image was acquired with a 20 × objective, and an enlarged image is shown in the bottom left corner of the image.). F Semiquantitative assessment of Oil Red O absorbance at 450 nm
miR-128-3p inhibits intramuscular adipocytes differentiation in chickens by downregulating FDPS

September 2023

·

13 Reads

Background Intramuscular fat (IMF) content is the major indicator for evaluating chicken meat quality due to its positive correlation with tenderness, juiciness, and flavor. An increasing number of studies are focusing on the functions of microRNAs (miRNAs) in intramuscular adipocyte differentiation. However, little is known about the association of miR-128-3p with intramuscular adipocyte differentiation. Our previous RNA-seq results indicated that miR-128-3p was differentially expressed at different periods in chicken intramuscular adipocytes, revealing a possible association with intramuscular adipogenesis. The purpose of this research was to investigate the biological functions and regulatory mechanism of miR-128-3p in chicken intramuscular adipogenesis. Results The results of a series of assays confirmed that miR-128-3p could promote the proliferation and inhibit the differentiation of intramuscular adipocytes. A total of 223 and 1,050 differentially expressed genes (DEGs) were identified in the mimic treatment group and inhibitor treatment group, respectively, compared with the control group. Functional enrichment analysis revealed that the DEGs were involved in lipid metabolism-related pathways, such as the MAPK and TGF-β signaling pathways. Furthermore, target gene prediction analysis showed that miR-128-3p can target many of the DEGs, such as FDPS, GGT5, TMEM37, and ASL2. The luciferase assay results showed that miR-128-3p targeted the 3’ UTR of FDPS. The results of subsequent functional assays demonstrated that miR-128-3p acted as an inhibitor of intramuscular adipocyte differentiation by targeting FDPS. Conclusion miR-128-3p inhibits chicken intramuscular adipocyte differentiation by downregulating FDPS. Our findings provide a theoretical basis for the study of lipid metabolism and reveal a potential target for molecular breeding to improve meat quality.

Codon usage pattern of the ancestor of green plants revealed through Rhodophyta

September 2023

·

15 Reads

Rhodophyta are among the closest known relatives of green plants. Studying the codons of their genomes can help us understand the codon usage pattern and characteristics of the ancestor of green plants. By studying the codon usage pattern of all available red algae, it was found that although there are some differences among species, high-bias genes in most red algae prefer codons ending with GC. Correlation analysis, Nc-GC3s plots, parity rule 2 plots, neutrality plot analysis, differential protein region analysis and comparison of the nucleotide content of introns and flanking sequences showed that the bias phenomenon is likely to be influenced by local mutation pressure and natural selection, the latter of which is the dominant factor in terms of translation accuracy and efficiency. It is worth noting that selection on translation accuracy could even be detected in the low-bias genes of individual species. In addition, we identified 15 common optimal codons in seven red algae except for G. sulphuraria for the first time, most of which were found to be complementary and bound to the tRNA genes with the highest copy number. Interestingly, tRNA modification was found for the highly degenerate amino acids of all multicellular red algae and individual unicellular red algae, which indicates that highly biased genes tend to use modified tRNA in translation. Our research not only lays a foundation for exploring the characteristics of codon usage of the red algae as green plant ancestors, but will also facilitate the design and performance of transgenic work in some economic red algae in the future.
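The preference for codons ending in G or C that this abstract describes is commonly summarized as the GC content at the third codon position. The sketch below shows the simplest version of that calculation; it is not the authors' pipeline. The function name `gc3` and the example sequence are hypothetical, and unlike the GC3s statistic used in Nc-GC3s plots it does not exclude Met, Trp or stop codons.

```python
def gc3(cds):
    """Fraction of codons in a coding sequence whose third base is G or C.

    Simplified illustration: every codon is counted, including Met, Trp
    and stop codons, which the stricter GC3s statistic would exclude.
    """
    seq = cds.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    # Split the sequence into consecutive, non-overlapping codons.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    gc_ending = sum(1 for codon in codons if codon[2] in "GC")
    return gc_ending / len(codons)


# Hypothetical coding sequence: ATG GCC AAG TTT TAA
print(gc3("ATGGCCAAGTTTTAA"))  # prints 0.6
```

Three of the five codons in the example (ATG, GCC, AAG) end in G or C, giving 0.6.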

The probiotic properties of the nine L. salivarius isolates. a-f Acid tolerance, bile tolerance, anti-Salmonella activity, auto-aggregation abilities, anti-adhesion abilities against S. Derby 14T, and coaggregation abilities of nine L. salivarius isolates, respectively. Note: Data shown are mean ± SD of triplicate values of independent experiments. The compact letter display indicates significant differences in pairwise comparisons, and strains with different letters are significantly different (P < 0.05, one-way ANOVA, Tukey post hoc test) (c-f)
The genomic characteristics of 24 L. salivarius strains from different hosts. a Heatmap of ANI values based on the sequences of 24 L. salivarius strains. b-f The comparison of genome size, CDSs, GC content, COG-annotated genes, and COG categories of L. salivarius strains from different hosts, respectively. Kruskal–Wallis test with Mann–Whitney post hoc test was used for comparisons (b-f). The compact letter display indicates significant differences in pairwise comparisons, and groups with different letters are significantly different (P < 0.05). The horse group which contains only one strain was excluded from the statistical analysis. *, P < 0.05; **, P < 0.01; ***, P < 0.001 as revealed by Kruskal–Wallis test (f)
a Estimation of the L. salivarius pan- and core-genome size. b The gene gain and loss events identified by GLOOME analysis. The red colored numbers labeled on the nodes denote the gain events, while the blue colored numbers denote the gene loss events. The tree was constructed based on the gene presence/absence matrix of the pan-genome, and the scale bar denotes 100 gene differences
Genes encoding CAZymes in the genome of L. salivarius strains among different hosts. a Distribution and abundance of CAZymes categories among different hosts. Kruskal–Wallis test with Mann–Whitney post hoc test was used for comparisons. The compact letter display indicates significant differences in pairwise comparisons, and groups with different letters are significantly different (P < 0.05). b Heatmap of the number of specific CAZymes categories in the genome of L. salivarius strains from different hosts. ** denotes P < 0.01 and *** denotes P < 0.001 based on Kruskal–Wallis test, and the Mann–Whitney post hoc test results were shown in Table S4. The horse group which contains only one strain was excluded from the statistical analysis
Mobile elements (plasmids and phages) and CRISPR-Cas systems present in L. salivarius strains. Upper panel, the colored cells denote the presence of each of the plasmid clusters, and a number is shown in the cell if a strain harbors more than one plasmid in the same group. The plasmid sequences for the strains were grouped into clusters using the mob_type method implemented in mob-suite, and the clusters were named with the primary_cluster_id values deposited in the mob-suite database. The number of CDSs predicted in the plasmid sequences of each strain is also shown, and different letters denote significant differences (P < 0.05, Kruskal–Wallis test with Mann–Whitney post hoc test). Middle panel, the colored cells denote presence of each of the prophage clusters. The prophage sequences were grouped into clusters using cd-hit-est with parameter -c 0.9. Lower panel, the colored cells denote presence of each of the three CRISPR-Cas systems
Assessment of beneficial effects and identification of host adaptation-associated genes of Ligilactobacillus salivarius isolated from badgers

September 2023

·

15 Reads

Background Ligilactobacillus salivarius has been frequently isolated from the gut microbiota of humans and domesticated animals and has been studied as a candidate probiotic. Badger (Meles meles) is known as a “generalist” species that consumes complex foods and exhibits tolerance and resistance to certain pathogens, which can be partly attributed to the beneficial microbes such as L. salivarius in the gut microbiota. However, our understanding of the beneficial traits and genomic features of badger-originated L. salivarius remains limited. Results In this study, nine L. salivarius strains were isolated from wild badgers' feces, one of which exhibited good probiotic properties. Complete genomes of the nine L. salivarius strains were generated, and comparative genomic analysis was performed with the publicly available complete genomes of L. salivarius obtained from humans and domesticated animals. The strains originating from badgers harbored a larger genome, a higher number of protein-coding sequences, and more functionally annotated genes than those originating from humans and chickens. The pan-genome phylogenetic tree demonstrated that the strains originating from badgers formed a separate clade, and a total of 412 gene families (12.6% of the total gene families in the pan-genome) were identified as genes gained by the last common ancestor of the badger group. The badger group harbored significantly more gene families responsible for the degradation of complex carbohydrate substrates and production of polysaccharides than strains from other hosts; many of these were acquired by gene gain events. Conclusions A candidate probiotic and nine L. salivarius complete genomes were obtained from the badgers’ gut microbiome, and several beneficial genes gained during evolution were identified as specifically present in the badger-originated strains. Our study provides novel insights into the adaptation of L. salivarius to the intestinal habitat of wild badgers and provides valuable strain and genome resources for the development of L. salivarius as a probiotic.

Integration of miRNA dynamics and drought tolerant QTLs in rice reveals the role of miR2919 in drought stress response

September 2023

·

64 Reads

To combat drought stress in rice, a major threat to global food security, three major quantitative trait loci for ‘yield under drought stress’ (qDTYs) were successfully exploited in the last decade. However, their molecular basis remains unknown. To understand the role of secondary regulation by miRNAs in the drought stress response and their relation, if any, to the three qDTYs, miRNA dynamics under drought stress were studied at the booting stage in two drought-tolerant cultivars (Sahbaghi Dhan and Vandana) and one drought-sensitive cultivar (IR 20). In total, 53 known and 40 novel differentially expressed (DE) miRNAs were identified. The primary drought-responsive miRNAs were Osa-MIR2919, Osa-MIR3979, Osa-MIR159f, Osa-MIR156k, Osa-MIR528, Osa-MIR530, Osa-MIR2091, Osa-MIR531a, Osa-MIR531b as well as three novel ones. Of the 1746 target genes identified, sixty-one that corresponded to 11 known and 4 novel DE miRNAs were found to be co-localized with the three qDTYs. We validated miRNA-mRNA expression under drought for nine known and three novel miRNAs in eight different rice genotypes showing varying degrees of tolerance. From our study, Osa-MIR2919, Osa-MIR3979, Osa-MIR528, Osa-MIR2091-5p and Chr01_11911S14Astr and their target genes LOC_Os01g72000, LOC_Os01g66890, LOC_Os01g57990, LOC_Os01g56780, LOC_Os01g72834, LOC_Os01g61880 and LOC_Os01g72780 were identified as the most promising candidates for drought tolerance at the booting stage. Of these, Osa-MIR2919, with 19 target genes in the qDTYs, is being reported for the first time. It acts as a negative regulator of drought stress tolerance by modulating the cytokinin and brassinosteroid signalling pathways.

Analysis of gut microbiota in Chinese donkey in different regions using metagenomic sequencing

September 2023

·

17 Reads

Background Gut microbiota plays a significant role in host survival, health, and diseases; however, compared to other livestock, research on the gut microbiome of donkeys is limited. Results In this study, a total of 30 donkey samples of rectal contents from six regions, including Shigatse, Changdu, Yunnan, Xinjiang, Qinghai, and Dezhou, were collected for metagenomic sequencing. The results of the species annotation revealed that the dominant phyla were Firmicutes and Bacteroidetes, and the dominant genera were Bacteroides, unclassified_o_Clostridiales (short for Clostridiales) and unclassified_f_Lachnospiraceae (short for Lachnospiraceae). The dominant phyla, genera and key discriminators were Bacteroidetes, Clostridiales and Bacteroidetes in Tibet donkeys (Shigatse); Firmicutes, Clostridiales and Clostridiales in Tibet donkeys (Changdu); Firmicutes, Fibrobacter and Tenericutes in Qinghai donkeys; Firmicutes, Clostridiales and Negativicutes in Yunnan donkeys; Firmicutes, Fibrobacter and Fibrobacteres in Xinjiang donkeys; Firmicutes, Clostridiales and Firmicutes in Dezhou donkeys. Functional annotation showed enrichment mainly in the glycolysis and gluconeogenesis pathways of carbohydrate metabolism, with the highest abundance in Dezhou donkeys. These results combined with altitude correlation analysis demonstrated that donkeys in the Dezhou region exhibited strong glucose-conversion ability, those in the Shigatse region exhibited strong glucose metabolism and utilization ability, those in the Changdu region exhibited a strong microbial metabolic function, and those in the Xinjiang region exhibited the strongest ability to decompose cellulose and hemicellulose. Conclusion According to published literature, this is the first study to construct a dataset with multi-regional donkey breeds. Our study revealed the differences in the composition and function of gut microbes in donkeys from different geographic regions and environmental settings and is valuable for donkey gut microbiome research.

The evolution and expansion of RWP-RK gene family improve the heat adaptability of elephant grass (Pennisetum purpureum Schum.)

August 2023

·

25 Reads

Background Global warming is affecting crop production and exacerbating the global food crisis. Therefore, it is urgent to study the mechanism of plant heat resistance. However, crop resistance genes were lost due to long-term artificial domestication. By analyzing the potential heat tolerance genes and molecular mechanisms in other wild materials, more genetic resources can be provided for improving the heat tolerance of crops. Elephant grass (Pennisetum purpureum Schum.) has strong adaptability to heat stress and contains abundant heat-resistant gene resources. Results Through sequence structure analysis, a total of 36 RWP-RK members were identified in elephant grass. Functional analysis revealed their close association with heat stress. Four randomly selected RKDs (RKD1.1, RKD4.3, RKD6.6, and RKD8.1) were analyzed for expression, and the results showed upregulation under high temperature conditions, suggesting their active role in response to heat stress. The number of RWP-RK gene family members in elephant grass (36 genes) was 2.4 times that of the related tropical crops rice (15 genes) and sorghum (15 genes). The 36 RWPs of elephant grass comprise 15 NLPs and 21 RKDs, and 73% of RWPs are related to WGD. Combined with the DAP-seq results, it was found that RWP-RK gene family expansion could improve the heat adaptability of elephant grass by enhancing nitrogen use efficiency and peroxidase gene expression. Conclusions RWP-RK gene family expansion in elephant grass is closely related to thermal adaptation evolution and speciation. The RKD subgroup showed a higher responsiveness than the NLP subgroup when exposed to high temperature stress. The promoter region of the RKD subgroup contains a significant number of MeJA and ABA responsive elements, which may contribute to their positive response to heat stress. These results provide a scientific basis for analyzing the heat adaptation mechanism of elephant grass and improving the heat tolerance of other crops.

BMC Genomics

August 2023

·

31 Reads

Background Structural descriptions of complete genomes have elucidated evolutionary processes in angiosperms. In Cactaceae (Caryophyllales), a high structural diversity of the chloroplast genome has been identified within and among genera. In this study, we assembled the first mitochondrial genome (mtDNA) for the short-globose cactus Mammillaria huitzilopochtli. For comparative purposes, we used the published genomes of 19 different angiosperms and the gymnosperm Cycas taitungensis as an external group for phylogenetic issues. Results The mtDNA of M. huitzilopochtli was assembled into one linear chromosome of 2,052,004 bp, in which 65 genes were annotated. These genes account for 57,606 bp including 34 protein-coding genes (PCGs), 27 tRNAs, and three rRNAs. In the non-coding sequences, repeats were abundant, with a total of 4,550 (179,215 bp). In addition, five complete genes (psaC and four tRNAs) of chloroplast origin were documented. Negative selection was estimated for most (23) of the PCGs. The phylogenetic tree showed a topology consistent with previous analyses based on the chloroplast genome. Conclusions The number and type of genes contained in the mtDNA of M. huitzilopochtli were similar to those reported in 19 other angiosperm species, regardless of their phylogenetic relationships. Although other Caryophyllids exhibit strong differences in structural arrangement and total size of mtDNA, these differences do not result in an increase in the typical number and types of genes found in M. huitzilopochtli. We concluded that the total size of mtDNA in angiosperms increases by the lengthening of the non-coding sequences rather than a significant gain of coding genes.

Chromosome-level genome assembly of Babesia caballi reveals diversity of multigene families among Babesia species

August 2023

·

57 Reads

Background Babesia caballi is an intraerythrocytic parasite from the phylum Apicomplexa, capable of infecting equids and causing equine piroplasmosis. However, since limited genome information is available on B. caballi, the molecular mechanisms involved in host specificity and pathogenicity of this species have not yet been fully elucidated. Results Genomic DNA from a B. caballi subclone was purified and sequenced using both Illumina and Nanopore technologies. The resulting assembled sequence consisted of nine contigs with a size of 12.9 Mbp, rendering a total of 5,910 protein-coding genes. The phylogenetic tree of Apicomplexan species was reconstructed using 263 orthologous genes. We identified 481 ves1-like genes, which we named “ves1c”. In contrast, the expansion of the major facilitator superfamily (mfs) observed in the closely related species B. bigemina and B. ovata was not found in B. caballi. A set of repetitive units containing an open reading frame with a size of 297 bp was also identified. Conclusions We present a chromosome-level genome assembly of B. caballi. Our genomic data may contribute to estimating gene expansion events involving multigene families and exploring the evolution of species from this genus.

Identification of immune-related lncRNA in sepsis by construction of ceRNA network and integrating bioinformatic analysis

August 2023

·

10 Reads

Background Sepsis is a high-mortality disease that seriously threatens human life and health, and its pathogenetic mechanism is still unclear. There is increasing evidence that immune and inflammatory responses are key players in the development of sepsis pathology. LncRNAs, which act as ceRNAs, have critical roles in various diseases. However, the regulatory roles of ceRNAs in the immunopathogenesis of sepsis have not yet been elucidated. Results In this study, we aimed to identify immune biomarkers associated with sepsis. We first generated a global immune-associated ceRNA (IMCE) network based on data describing interaction pairs of gene–miRNA and miRNA–lncRNA. Afterward, we extracted a dysregulated sepsis immune-associated ceRNA (SPIMC) network from the global IMCE network by means of a multi-step computational approach. Functional enrichment indicated that lncRNAs in the SPIMC network have pivotal roles in the immune mechanism underlying sepsis. Subsequently, we identified module and hub genes (CD4 and STAT4) via construction of a sepsis immune-related PPI network. Then, we identified hub genes based on the modular structure of the PPI network and generated a ceRNA subnetwork to analyze key lncRNAs associated with sepsis. Finally, 6 lncRNAs (LINC00265, LINC00893, NDUFA6-AS1, NOP14-AS1, PRKCQ-AS1 and ZNF674-AS1) were identified as immune biomarkers of sepsis. Moreover, the CIBERSORT algorithm was used to estimate the infiltration of circulating immune cell types and characterize the inflammatory state of sepsis. Correlation analyses between immune cells and sepsis immune biomarkers showed that LINC00265 was strongly positively correlated with M2 macrophages (r = 0.77). Conclusion Collectively, these results suggest that these lncRNAs (LINC00265, LINC00893, NDUFA6-AS1, NOP14-AS1, PRKCQ-AS1 and ZNF674-AS1) play important roles in the immune pathogenesis of sepsis and provide potential therapeutic targets for further research on immune therapy in patients with sepsis.

Understanding the underlying genetic mechanisms for age at first calving, inter-calving period and scrotal circumference in Bonsmara cattle

August 2023

·

45 Reads

Background Reproduction is a key feature of the sustainability of a species and thus represents an important component in livestock genetic improvement programs. Most reproductive traits have low heritability. In order to gain a better understanding of the underlying genetic basis of these traits, a genome-wide association study was conducted for age at first calving (AFC), first inter-calving period (ICP) and scrotal circumference (SC) within the South African Bonsmara breed. Phenotypes and genotypes (120,692 single nucleotide polymorphisms (SNPs) post editing) were available for 7,128 South African Bonsmara cattle; the association analyses were undertaken using linear mixed models. Results Genomic restricted maximum likelihood analysis of the 7,128 SA Bonsmara cattle yielded genomic heritabilities of 0.183 (SE = 0.021) for AFC, 0.207 (SE = 0.022) for ICP and 0.209 (SE = 0.019) for SC. A total of 16, 23 and 51 suggestive (P ≤ 4 × 10⁻⁶) SNPs were associated with AFC, ICP and SC, while 11, 11 and 44 significant (P ≤ 4 × 10⁻⁷) SNPs were associated with AFC, ICP and SC respectively. A total of 11 quantitative trait loci (QTL) and 11 candidate genes were co-located with these associated SNPs for AFC, with 10 QTL harbouring 11 candidate genes for ICP and 41 QTL containing 40 candidate genes for SC. The QTL identified were close to genes previously associated with carcass, fertility, growth and milk-related traits. The biological pathways influenced by these genes include carbohydrate catabolic processes, cellular development, iron homeostasis, lipid metabolism and storage, immune response, ovarian follicle development and the regulation of DNA transcription and RNA translation. Conclusions This was the first attempt to study the underlying polymorphisms associated with reproduction in South African beef cattle. Genes previously reported in cattle breeds for numerous traits other than AFC, ICP or SC were detected in this study. Over 20 different genes have not been previously reported in beef cattle populations and may have been associated due to the unique genetic composite background of the SA Bonsmara breed.
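The two P-value cut-offs quoted in this abstract can be turned into a small worked example. The following is only a minimal sketch, assuming the tiers are treated as mutually exclusive (the abstract does not say whether the suggestive counts include the significant SNPs); the function name and the SNP identifiers are hypothetical and are not taken from the study.

```python
# Thresholds quoted in the abstract above.
SUGGESTIVE_P = 4e-6
SIGNIFICANT_P = 4e-7


def classify_snp(p_value: float) -> str:
    """Assign a SNP to a tier from its association P-value.

    Assumption: tiers are mutually exclusive, so a SNP that passes the
    significant threshold is not also reported as suggestive.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value <= SIGNIFICANT_P:
        return "significant"
    if p_value <= SUGGESTIVE_P:
        return "suggestive"
    return "not associated"


# Hypothetical SNP identifiers and P-values, for illustration only.
example = {"snp_a": 1e-8, "snp_b": 2e-6, "snp_c": 0.03}
tiers = {name: classify_snp(p) for name, p in example.items()}
```

If the suggestive tier is meant to include the significant SNPs, the second comparison alone would define it.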

The process of data collection and database construction. Upper part shows the process of the data collection, including manually curating literature and integrating public databases. The left panel describes our manual extraction of experimental microbe-disease information. The right panel gives the list of public databases of information source. Lower part shows the types of associations, basic information of each association entry and GMMAD functions. The left panel shows the three types of associations. Disease-metabolite is the core association, and it is provided exclusively by our database. The middle panel tells the basic information of each entry, including details of disease description, microbial category, molecular mass of metabolite and so on. The right panel provides a list of GMMAD’s main functions
Distribution of association strength scores in 113 different diseases. Diseases are shown in decreasing order of the mean Sas of associated metabolites
Statistics of data in GMMAD. A The number of diseases, microbes and metabolites in GMMAD. B The top 10 diseases in disease-microbe associations. C The top 10 microbes in disease-microbe associations. D The top 10 microbes in microbe-metabolite associations. E The top 10 metabolites in microbe-metabolite associations
Schematic workflow of GMMAD. Browse part: Users can browse all three types of association data in GMMAD. Search part: Users can search for associations of interest by typing a keyword. Results part: All the results are shown in a table that includes information about each term. Details part: Users can click ‘Detail’ to obtain more details about the disease, microbe, and metabolite
GMMAD: a comprehensive database of human gut microbial metabolite associations with diseases

August 2023

·

23 Reads

Background The natural products of gut microbes, their metabolites, are crucial factors affecting diseases. Comprehensive identification and annotation of relationships among diseases, metabolites, and microbes can provide efficient and targeted solutions towards understanding the mechanisms of complex diseases and developing new markers and drugs. Results We developed Gut Microbial Metabolite Association with Disease (GMMAD), a manually curated database of associations among human diseases, gut microbes, and metabolites of gut microbes. This initial release (i) contains 3,836 disease-microbe associations and 879,263 microbe-metabolite associations, which were extracted from the literature and available resources and then manually curated; (ii) defines an association strength score and a confidence score. With these two scores, GMMAD predicted 220,690 disease-metabolite associations, where the metabolites all belong to the gut microbes. We think that the positive effective associations (with both scores higher than suggested thresholds) will help identify disease markers and understand pathogenic mechanisms from the perspective of gut microbes. The negative effective associations could be taken as biomarkers and have potential as drug candidates. Evidence from the literature supported our proposal and was consistent with experimental findings; (iii) provides a user-friendly web interface that allows users to browse, search, and download information on associations among diseases, metabolites, and microbes. The resource is freely available at http://guolab.whu.edu.cn/GMMAD . Conclusions As a unique online resource for gut microbial metabolite-disease associations, GMMAD helps researchers explore disease-metabolite-microbe mechanisms and screen drug and marker candidates for different diseases.

Early-life HFD induces fatty liver in medaka, which is reversed by subsequent NC. A Experimental design of early-life HFD feeding in medaka, where male fish were fed either HFD or NC for six weeks from hatching stage, followed by eight weeks of NC for both groups. B All fish were maintained individually to minimize variation of the speed of growth. C Body weight and liver weight at seven and 15 weeks of age. D Representative images of liver sections stained with Oil Red O (top right) and H&E (bottom), and E transmission electron microscopy (TEM) of liver sections from NC, HFD, NC-NC, and HFD-NC medaka. Scale bars: 100 µm in (D) and 5 µm in (E)
Early-life HFD-induced changes in hepatocyte gene expression are largely reversed by subsequent NC. A A schematic of tdo2:GFP transgenic medaka construction using CRISPR-Cas9-mediated gene knockin and sorting of GFP-positive hepatocytes. Scale bar: 100 µm B RNA-seq results showing differentially expressed genes in hepatocytes of HFD fish compared to NC. X-axis: log2 fold change (HFD/NC) of expression values, y-axis: –log10 p-value. C Gene ontology analyses of upregulated (upper) and downregulated (lower) genes by HFD for biological processes using PANTHER 17.0. Fold enrichment (orange) and –log10 FDR (bar) are displayed. D Heatmaps of genes belonging to the GO term “lipid metabolic process” (upper) and “translation” (lower) at seven weeks of age. Log2-transformed, and Z-transformed, DESeq2 normalized read counts are displayed. E Differentially expressed genes in hepatocytes of HFD-NC fish compared to NC-NC. F Venn diagram of differentially expressed genes at seven and 15 weeks of age. G A heatmap of differentially expressed genes at seven weeks of age. Log2-transformed, and Z-transformed, DESeq2 normalized read counts are displayed
Early-life HFD-induced changes in hepatocyte chromatin accessibility are largely reversed by subsequent NC. A A representative track view of RNA-seq, ATAC-seq, and ChIP-seq (H3K27ac, H3K27me3, and H3K9me3) signals for hepatocytes/livers for each of the dietary conditions. B Differentially accessible peaks in HFD fish hepatocytes compared to those in NC fish by ATAC-seq. X-axis: log2 fold change (HFD/NC) of normalized read counts within peaks, y-axis: –log10 p-value. C Gene ontology analysis of genes close to peaks with increased or decreased chromatin accessibility in HFD fish. Fold enrichment (orange) and –log10 FDR (bar) are displayed. D Motif analysis of peaks with increased (upper) and decreased (lower) accessibility, inferred by HOMER v4.11. E Differentially accessible peaks in HFD-NC fish hepatocytes compared to NC-NC fish. F Venn diagram of differentially accessible peaks at seven and 15 weeks of age. G Representative examples of ATAC-seq peaks at the promoters of fads2, srebf1, and cyp2r1 genes
Early-life HFD-induced changes in liver histone modifications are largely reversed by subsequent NC. A Differentially-H3K27ac-enriched peaks in livers of HFD-fed fish compared to NC by ChIP-seq. X-axis: log2 fold change (HFD/NC) of normalized read counts within peaks, y-axis: –log10 p-value. B Differentially-H3K27ac-enriched peaks in livers of HFD-NC fish compared to NC-NC. C Overlaps of differentially-H3K27ac-enriched peaks at seven and 15 weeks of age. D Overlaps of differentially expressed, differentially accessible, and differentially-H3K27ac-enriched genes, at seven and 15 weeks of age. E Differentially-H3K27me3-enriched 4-kb bins in livers of HFD fish relative to NC (left) and HFD-NC relative to NC-NC (right). F Differentially-H3K9me3-enriched 4-kb bins in livers of HFD fish relative to NC (left) and HFD-NC relative to NC-NC (right). G Representative examples of reversible peaks (i.e., genomic regions where epigenetic states were changed soon after HFD feeding but were returned to normal levels by the following NC feeding). Left, fads2; right, s100a14
Genes showing persistent changes in epigenetic state following long-term NC. A Gene ontology analysis of genes close to differentially accessible ATAC-seq peaks between HFD-NC and NC-NC fish (194 genes from 337 peaks). B A heatmap of persistent ATAC-seq peaks after reversal to NC (50 peaks in total). Log2-transformed, and Z-transformed, DESeq2 normalized read counts of ATAC-seq at each peak are displayed. DESeq2 results of RNA-seq and H3K27ac ChIP-seq of nearby genes/peaks are displayed on the right (p-value < 0.01). C, D Representative examples of ATAC-seq peaks showing persistent changes in chromatin accessibility, accompanied by long-term changes in expression of nearby genes (C) or gene expression that returned to normal level after a switch to NC (D)
High-fat diet in early life triggers both reversible and persistent epigenetic changes in the medaka fish (Oryzias latipes)

August 2023

·

30 Reads

Background The nutritional status during early life can have enduring effects on an animal’s metabolism, although the mechanisms underlying these long-term effects are still unclear. Epigenetic modifications are considered a prime candidate mechanism for encoding early-life nutritional memories during this critical developmental period. However, the extent to which these epigenetic changes occur and persist over time remains uncertain, in part due to challenges associated with directly stimulating the fetus with specific nutrients in viviparous mammalian systems. Results In this study, we used medaka as an oviparous vertebrate model to establish an early-life high-fat diet (HFD) model. Larvae were fed with HFD from the hatching stages (one week after fertilization) for six weeks, followed by normal chow (NC) for eight weeks until the adult stage. We examined the changes in the transcriptomic and epigenetic state of the liver over this period. We found that HFD induces simple liver steatosis, accompanied by drastic changes in the hepatic transcriptome, chromatin accessibility, and histone modifications, especially in metabolic genes. These changes were largely reversed after the long-term NC, demonstrating the high plasticity of the epigenetic state in hepatocytes. However, we found a certain number of genomic loci showing non-reversible epigenetic changes, especially around genes related to cell signaling, liver fibrosis, and hepatocellular carcinoma, implying persistent changes in the cellular state of the liver triggered by early-life HFD feeding. Conclusion In summary, our data show that early-life HFD feeding triggers both reversible and persistent epigenetic changes in medaka hepatocytes. Our data provide novel insights into the epigenetic mechanism of nutritional programming and a comprehensive atlas of the long-term epigenetic state in an early-life HFD model of non-mammalian vertebrates.

A systematic analysis of the phloem protein 2 (PP2) proteins in Gossypium hirsutum reveals that GhPP2-33 regulates salt tolerance

August 2023

·

22 Reads

Background Phloem protein 2 (PP2) proteins play a vital role in phloem-based defense (PBD) and participate in many abiotic and biotic stress responses. However, research on PP2 proteins in cotton is still lacking. Results A total of 25, 23, 43, and 47 PP2 genes were comprehensively identified and characterized in G. arboreum, G. raimondii, G. barbadense, and G. hirsutum, respectively. Whole genome duplication (WGD) and allopolyploidization events played essential roles in the expansion of PP2 genes. The promoter regions of GhPP2 genes contain many cis-acting elements related to abiotic stress, and weighted gene co-expression network analysis (WGCNA) indicated that GhPP2s could be related to salt stress. The qRT-PCR assays further confirmed that GhPP2-33 was dramatically upregulated during salt treatment. A virus-induced gene silencing (VIGS) experiment showed that silencing GhPP2-33 decreased salt tolerance. Conclusions The results of this study not only offer new perspectives for understanding the evolution of PP2 genes in cotton but also further explore their function under salt stress.

Complete mitochondrial genome of Agrostis stolonifera: insights into structure, codon usage, repeats, and RNA editing

August 2023

·

24 Reads

Background Plants possess mitochondrial genomes that are large and complex compared to those of animals. Despite their size, plant mitochondrial genomes do not contain significantly more genes than their animal counterparts. Studies of the sequence and structure of plant mitochondrial genomes shed light on the mechanisms driving replication of plant mtDNA and offer valuable insights into plant evolution, energy production, and environmental adaptation. Results This study presents the first comprehensive analysis of Agrostis stolonifera’s mitochondrial genome, characterized by a branched structure comprising three contiguous chromosomes, totaling 560,800 bp with a GC content of 44.07%. Annotations reveal 33 unique protein-coding genes (PCGs), 19 tRNA genes, and 3 rRNA genes. The predominant codons for alanine and glutamine are GCU and CAA, respectively, while cysteine and phenylalanine exhibit weaker codon usage biases. The mitogenome contains 73, 34, and 23 simple sequence repeats (SSRs) on chromosomes 1, 2, and 3, respectively. Chromosome 1 exhibits the most frequent A-repeat monomeric SSR, whereas chromosome 2 displays the most common U-repeat monomeric SSR. DNA transformation analysis identifies 48 homologous fragments between the mitogenome and chloroplast genome, representing 3.41% of the mitogenome’s total length. The PREP suite detects 460 C-U RNA editing events across 33 mitochondrial PCGs, with the highest count in the ccmFn gene and the lowest in the rps7 gene. Phylogenetic analysis confirms A. stolonifera’s placement within the Pooideae subfamily, showing a close relationship to Lolium perenne, consistent with the APG IV classification system. Numerous homologous co-linear blocks are observed between the mitogenome of A. stolonifera and those of related species, while certain regions lack homology. Conclusions The unique features and complexities of the A. stolonifera mitochondrial genome, along with its similarities and differences to related species, provide valuable insights into plant evolution, energy production, and environmental adaptation. The findings from this study significantly contribute to the growing body of knowledge on plant mitochondrial genomes and their role in plant biology.

Heatmap showing the percentage of ectopic activations of the 1882 tissue-specific genes encoding for testis, placenta and embryonic stem cells in the total TCGA-BRCA dataset and in breast cancer subtypes. Frequent ectopic activations above the threshold of 10% are presented in red colour map. Infrequent ectopic activations below 10% are shown in blue colour map
Kaplan–Meier survival curves showing disease-free survival probability according to the number of activated genes in the GEC tool for eight breast cancer datasets. A: Training dataset. B-D: Validation datasets. E–H: Test datasets. For each dataset, blue lines show the survival curves for the group of patients in which the corresponding tumours activated 0 or 1 gene in the GEC tool (GEC 0–1). Red lines represent the group of patients in which the tumours activated 2 or more genes (GEC 2–5). The p-values obtained from the logrank test and Cox proportional hazard model as well as the hazard ratios are displayed on the top of each plot. Significance symbols: * for p-value < 0.05, ** for p-value < 0.01, *** for p-value < 0.001
Results of the GEC tool in molecular subtypes of breast cancer. A: Distribution of breast cancer samples for five pooled datasets (TCGA-BRCA, GSE25066, GSE21653, E-MTAB-365 and Yau-2010) in luminal-A, luminal B, HER2-enriched and basal-like subtypes according to the number of activated genes in the GEC panel. The bar plots show the percentage of samples for each GEC group (from GEC 0 to GEC 5) in each molecular subtype. B: Same for the groups GEC 0–1 and GEC 2–5. C-F: Kaplan–Meier survival curves showing disease-free survival probability in luminal-A, luminal B, HER2-enriched and basal-like subtypes, respectively, according to the number of expressed genes in the GEC panel, presented in two groups: GEC 0–1 and GEC 2–5. The p-values obtained from the logrank test and Cox proportional hazard model as well as the hazard ratios are displayed on the top of each plot. Significance symbols: * for p-value < 0.05, ** for p-value < 0.01, *** for p-value < 0.001
Main results of the GSEA analysis for transcriptomic profiles of GEC + versus GEC- tumours in the dataset TCGA-BRCA. A: Kaplan–Meier disease-free survival curves between the group of tumours without GEC ectopic expressions (GEC-) and those with major GEC ectopic expressions of 4 or 5 genes (GEC +). The displayed p-value corresponds to the logrank test between GEC- and GEC + groups. B: Heatmap of the differential expression profiles of GEC + versus GEC- in TCGA-BRCA. The differentially expressed genes used for the heatmap were selected with an adjusted p-value < 0.05 of Mann–Whitney test and abs (ratio) > 1.5. The hierarchical clustering was performed using Euclidian-based distance with Ward’s linkage for samples and Pearson correlation for genes. C: GSEA plots illustrating main enrichment/depletion profiles in GEC + tumours compared to GEC- tumours in the dataset TCGA-BRCA. For all the gene sets, the enrichment or depletion was considered significant with a nominal p-value < 0.05 and FDR < 0.25. The gene sets were selected from the MSigDB database of the Broad Institute (collections C2, C5 or H of the MsigDB)
Gene Set Enrichment Analysis (GSEA) shows consistent molecular signatures of the aggressive GEC + tumours in several breast cancer datasets. The heatmap represents the normalized enrichment score (NES) obtained from the GSEA analysis in ten breast cancer datasets for different genes sets. Significantly enriched gene sets are shown in red colours; significantly depleted gene sets are displayed in blue colours. For all the gene sets, the enrichment or depletion was considered significant with a nominal p-value < 0.05 and FDR < 0.25. Grey cells correspond to non-significant results
Aberrant activation of five embryonic stem cell-specific genes robustly predicts a high risk of relapse in breast cancers

August 2023

·

39 Reads

Background In breast cancer, as in all cancers, genetic and epigenetic deregulations can result in out-of-context expression of a set of normally silent tissue-specific genes. The activation of some of these genes in various cancers endows tumour cells with new properties and drives enhanced proliferation and metastatic activity, leading to a poor survival prognosis. Results In this work, we undertook an unprecedented systematic and unbiased analysis of out-of-context activations of a specific set of tissue-specific genes from testis, placenta and embryonic stem cells, not expressed in normal breast tissue, as a source of novel prognostic biomarkers. To this end, we applied a strict machine learning framework to transcriptomic data analysis and successfully created a new robust tool, validated in several independent datasets, which is able to identify patients with a high risk of relapse. This unbiased approach allowed us to identify a panel of five biomarkers, DNMT3B, EXO1, MCM10, CENPF and CENPE, that are robustly and significantly associated with disease-free survival prognosis in breast cancer. Based on these findings, we created a new Gene Expression Classifier (GEC) that stratifies patients. Additionally, thanks to the identified GEC, we were able to paint the specific molecular portraits of the particularly aggressive tumours, which show characteristics of male germ cells, with a particular metabolic gene signature, associated with an enrichment in pro-metastatic and pro-proliferation gene expression. Conclusions The GEC classifier is able to reliably identify patients with a high risk of relapse at early stages of the disease. We especially recommend using the GEC tool for patients with the luminal-A molecular subtype of breast cancer, generally considered to have a favourable disease-free survival prognosis, to detect the fraction of patients at high risk of relapse.
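The stratification described here and in the figure captions (counting how many of the five panel genes are activated, then grouping tumours as GEC 0–1 or GEC 2–5) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the per-gene activation thresholds, the expression values and the function name are hypothetical, since the listing does not say how activation is called.

```python
# The five-gene panel named in the abstract.
GEC_GENES = ("DNMT3B", "EXO1", "MCM10", "CENPF", "CENPE")


def gec_group(expression: dict, thresholds: dict) -> tuple:
    """Return (number of activated panel genes, group label).

    Assumption: a gene counts as activated when its expression value is
    strictly above a per-gene threshold; the thresholds are hypothetical.
    """
    activated = sum(
        1 for gene in GEC_GENES
        if expression.get(gene, 0.0) > thresholds[gene]
    )
    # Grouping follows the captions: 0 or 1 activated gene vs 2 or more.
    label = "GEC 2-5" if activated >= 2 else "GEC 0-1"
    return activated, label


# Hypothetical thresholds and expression profile, for illustration only.
thresholds = {gene: 1.0 for gene in GEC_GENES}
tumour = {"DNMT3B": 2.5, "EXO1": 0.2, "MCM10": 1.7, "CENPF": 0.9, "CENPE": 0.1}
count, group = gec_group(tumour, thresholds)
```

Genes missing from an expression profile are treated as not activated in this sketch, which is a design choice rather than something the source specifies.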

Identification of CDK gene family and functional analysis of CqCDK15 under drought and salt stress in quinoa

August 2023

·

15 Reads

As one of the oldest cultivated crops in the world, quinoa has been widely valued for its rich nutritional value and health benefits. In this study, 22 CDK genes (CqCDK01-CqCDK22) were identified from the quinoa genome using bioinformatics methods. The number of amino acids was 173–811, the molecular weight was 19,554.89–91,375.70 Da, and the isoelectric point was 4.57–9.77. The phylogenetic tree divided 21 CqCDK genes into six subfamilies, and the gene structure showed that 12 (54.5%) CqCDK genes (CqCDK03, CqCDK04, CqCDK05, CqCDK06, CqCDK07, CqCDK11, CqCDK14, CqCDK16, CqCDK18, CqCDK19, CqCDK20 and CqCDK21) had UTR regions at the 5’ and 3’ ends. Each CDK protein had different motifs (3–9 motifs), but the genes with the same motifs were located in the same branch. Promoter analysis revealed 41 cis-regulatory elements related to plant hormones, abiotic stresses, tissue-specific expression and photoresponse. The results of real-time fluorescence quantitative analysis showed that the expression level of some CDK genes was higher under drought and salt stress, which indicated that CDK genes could help plants to resist adverse environmental effects. Subcellular localization showed that the CqCDK15 protein was localized to the nucleus and cytoplasm, and transgenic plants overexpressing the CqCDK15 gene showed higher drought and salt tolerance compared to the controls. Therefore, CDK genes are closely related to quinoa stress resistance. In this study, the main functions of the quinoa CDK gene family and its expression level in different tissues and organs were analyzed in detail, which provided some theoretical support for quinoa stress-resistant breeding. This study also has important implications for further understanding the function of the CDK gene family in quinoa and in vascular plants more generally.

Molecular regulatory mechanism of key LncRNAs in subclinical mastitic cows with folic acid supplementation

August 2023

·

47 Reads

Background Folic acid is a water-soluble B vitamin (B9), which is closely related to the body’s immune and other metabolic pathways. The folic acid synthesized by rumen microbes is unable to meet the needs of high-yielding dairy cows. The incidence of subclinical mastitis in dairy herds worldwide ranges between 25% and 65%; it presents no obvious symptoms but significantly decreases lactation and milk quality. Therefore, this study aims to explore the effects of folic acid supplementation on the expression profile of lncRNAs and the molecular mechanism by which lncRNAs regulate immunity in subclinical mastitic dairy cows. Results The analysis identified a total of 4384 lncRNA transcripts. Subsequently, 84 and 55 differentially expressed lncRNAs were identified in the two group comparisons (SF vs. SC and HF vs. HC, respectively). Furthermore, the weighted gene co-expression network analysis (WGCNA) and the KEGG enrichment analysis showed that folic acid supplementation affects inflammation and immune response-related pathways. The two groups have few pathways in common. One important lncRNA, MSTRG.11108.1, and its target genes (ICAM1, CCL3, CCL4, etc.) were involved in immune-related pathways. Finally, through integrated analysis of lncRNAs with GWAS data and an animal QTL database, we found that differential lncRNAs and their target genes could be significantly enriched in SNPs and QTLs related to somatic cell count (SCC) and mastitis, such as MSTRG.11108.1 and its target genes ICAM1, CXCL3 and GRO1. Conclusions For subclinical mastitic cows, folic acid supplementation can significantly affect the expression of immune-related pathway genes such as ICAM1 by regulating the lncRNA MSTRG.11108.1, thereby affecting related immune phenotypes. Our findings lay a foundation for the theoretical and practical application of folic acid supplementation in the feeding of subclinical mastitic cows.

Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus

August 2023

·

35 Reads

Background Approximately 4–8% of the world's population suffers from a rare disease. Rare diseases are often difficult to diagnose, and many do not have approved therapies. Genetic sequencing has the potential to shorten the current diagnostic process, increase mechanistic understanding, and facilitate research on therapeutic approaches but is limited by the difficulty of novel variant pathogenicity interpretation and the communication of known causative variants. It is unknown how many published rare disease variants are currently accessible in the public domain. Results This study investigated the translation of knowledge of variants reported in published manuscripts to publicly accessible variant databases. Variants, symptoms, biochemical assay results, and protein function from literature on the SLC6A8 gene associated with X-linked Creatine Transporter Deficiency (CTD) were curated and reported as a highly annotated dataset of variants with clinical context and functional details. Variants were harmonized, their availability in existing variant databases was analyzed and pathogenicity assignments were compared with impact algorithm predictions. 24% of the pathogenic variants found in PubMed articles were not captured in any database used in this analysis, while only 65% of the published variants received an accurate pathogenicity prediction from at least one impact prediction algorithm. Conclusions Despite being published in the literature, pathogenicity data on patient variants may remain inaccessible for genetic diagnosis, therapeutic target identification, mechanistic understanding, or hypothesis generation. Clinical and functional details presented in the literature are important to make pathogenicity assessments. Impact predictions remain imperfect but are improving, especially for single nucleotide exonic variants; however, such predictions are less accurate or unavailable for intronic and multi-nucleotide variants.
Developing text mining workflows that use natural language processing for identifying diseases, genes and variants, along with impact prediction algorithms and integrating with details on clinical phenotypes and functional assessments might be a promising approach to scale literature mining of variants and assigning correct pathogenicity. The curated variants list created by this effort includes context details to improve any such efforts on variant curation for rare diseases.

Evidence of positive selection across the phylogeny of mammals identified by the branch-site model and BUSTED. Long-lived species identified by LQ > 1.57 are highlighted in red. The phylogenetic tree of 48 mammals is taken from the TimeTree database. Illustrations of long-lived species are taken from phylopic.org. Positively selected genes identified by the branch-site model and BUSTED are indicated by pink circles. Convergent amino acid changes that occurred in long-lived species are shown on the right of the figure (the convergence sites in long-lived species are marked in orange and the remaining are in blue)
Genes with significant evolution signals identified in long-lived species. Positively selected genes, convergent evolution genes and longevity-associated genes are marked in pink, green and blue, respectively. Genes related to cancer and autophagy are shown by gray five-pointed star and gray triangle, respectively
Regression analyses between root-to-tip ω and morphological variables (MLS, LQ and BM)
Evolutionary analysis of the mTOR pathway provide insights into lifespan extension across mammals

August 2023

·

18 Reads

Background Lifespan extension has independently evolved several times during mammalian evolution, leading to the emergence of a group of long-lived animals. Though the mammalian/mechanistic target of rapamycin (mTOR) signaling pathway is known to be a central regulator of lifespan and aging, the underlying influence of the mTOR pathway on the evolution of lifespan in mammals is not well understood. Results Here, we performed evolutionary analyses of 72 genes involved in the mTOR network across 48 mammals to explore the underlying mechanism of lifespan extension. We identified a total of 20 genes with significant evolutionary signals unique to long-lived species, including 12 positively selected genes, four convergent evolution genes, and five longevity-associated genes whose evolutionary rate is related to the maximum lifespan (MLS). Of these genes, four positively selected genes, two convergent evolution genes and one longevity-associated gene were involved in the autophagy response and aging-related diseases, while eight genes were known cancer genes, indicating that the long-lived species might have evolved effective regulation mechanisms of autophagy and cancer to extend lifespan. Conclusion Our study revealed genes with significant evolutionary signals unique to long-lived species, which provides new insight into the lifespan extension of mammals and might suggest new strategies to extend human lifespan.


A schematic representation of the complete procedure of the challenge. Two offspring groups were produced by mass-crossing selected brooders in two separate spawnings. The two batches were grown separately in clean, sand-filtered seawater up to 30 dph (days post-hatch), when they were mixed, and ca. 15 thousand unvaccinated fingerlings were transferred to a floating fish farm located in the coastal area of Singapore. At the farm, fingerlings were acclimatized for a week in filtered, ozonized and UV-treated seawater, then they were transferred into 2,000-L tanks containing aerated raw seawater (flow-through) at ambient temperature (28–30 °C). Dead and moribund individuals were collected, counted and archived on a daily basis. The 500 individuals lost during the earliest days, from 9 to 19 ddec (days during environmental challenge), served as ‘susceptible’ or ‘sensitive’ individuals. The 750 survivors without any symptoms of pathogen infection at the end of the whole experiment (28 ddec) were randomly sampled as ‘robust’ individuals
The integrated genetic linkage map and distribution of genetic markers along each linkage group of the Asian seabass. Blue, red and yellow vertical lines in the linkage groups represent maternal heterozygous SNPs, paternal heterozygous SNPs, and SNPs heterozygous in both parents, respectively
Genome-wide scan identified a QTL significantly associated with increased robustness on ASB_LG11. A Genetic location of QTLs associated with increased robustness along the Asian seabass genome. Black horizontal line represents the genome-wide logarithm of odds (LOD) significance threshold of 10.5. X- and Y-axes correspond to the linkage groups and the LOD value, respectively. B Localization of the major QTL associated with increased robustness on ASB_LG11. Map positions and LOD scores are based on a single interval mapping QTL analysis using the software MAPQTL6. The 95% genome-wide LOD significance threshold value was ca. 10.5 (dashed constant line). The QTL had a LOD peak of 12.6 with an interval of 4.7 cM. Two SNPs (R1-38468 and R1-61252) were identified within the interval. R1-61252 was located at the peak position of 93.6 cM. C The genomic region of mapped QTL and putative causative candidate genes on ASB_LG11. The QTL was located on Superscaffold_35 (LG11) which comprised four sequences in the following order: unitig_5008|quiver, unitig_1857|quiver, scaffold_60 and unitig_2096|quiver, scaffold_60, unitig_1857|quiver and unitig_5008|quiver. The superscaffold spanned a genomic region of over 800 kb, including estimated gaps of 27 kb. Gene symbols of 36 potential causative genes and their distribution on Superscaffold_35, as well as the two SNP markers, are indicated
The association of SNP marker R1-61252 with robustness was validated in a different stock of Asian seabass. We found significant differences in genotype frequencies at this SNP between the ‘sensitive’ (red) and ‘robust’ (blue) groups examined (p < 0.01, Chi-square test). The C/C genotype was nearly four times more frequent in the ‘robust’ group (203/288 individuals) than in the ‘sensitive’ one (46/255), whereas T/T was much less common in the ‘robust’ group (2/288) than in the ‘sensitive’ one (104/255; over 59-fold difference). The C/T genotype was somewhat more common in the ‘sensitive’ group (105/255) than in the ‘robust’ one (83/288; a 1.4-fold difference)
Three examples of the mis-assembled sequences detected by the ddRAD map. A The first breakpoint (1,676–1,688 kb) within scaffold_18 was detected earlier and confirmed by ddRAD map. This has also been found earlier by synteny analysis, but not by optical mapping. B The second breakpoint (11,315–11,345 kb) was newly detected by the ddRAD map. Both breakpoints were revealed by Illumina short reads from both 500 and 750 insert size libraries as well as long PacBio reads by having low or zero physical coverage of all three libraries. C A breakpoint identified within unitig_4480 by the ddRAD map. The breakpoint was revealed by Illumina short reads from both 500 and 750 short insert size libraries by having no physical coverage as well as high density of repeats located within those few PacBio-based long reads that were mapped there
Mapping of a major QTL for increased robustness and detection of genome assembly errors in Asian seabass (Lates calcarifer)

August 2023

·

40 Reads

Background For Asian seabass (Lates calcarifer, Bloch 1790) cultured in sea cages, various aquatic pathogens and complex environmental and stress factors are considered the leading causes of disease, causing tens of millions of dollars of annual economic losses. Over the years, we conducted farm-based challenges by exposing Asian seabass juveniles to complex natural environmental conditions. In one of these challenges, we collected a total of 1,250 fish classified as either ‘sensitive’ or ‘robust’ individuals during the 28-day observation period. Results We constructed a high-resolution linkage map with 3,089 SNPs for Asian seabass using the double digest Restriction-site Associated DNA (ddRAD) technology and performed a search for Quantitative Trait Loci (QTL) associated with robustness. The search detected a major genome-wide significant QTL for increased robustness in a pathogen-infected marine environment on linkage group 11 (ASB_LG11; 88.9 cM to 93.6 cM), with 81.0% of phenotypic variation explained. The QTL was positioned within a > 800 kb genomic region located at the tip of chromosome ASB_LG11, with two Single Nucleotide Polymorphism markers, R1-38468 and R1-61252, located near the two ends of the QTL. When the R1-61252 marker was validated experimentally in a different mass cross population, it showed a statistically significant association with increased robustness. The majority of the thirty-six potential candidate genes located within the QTL have known functions related to innate immunity, stress response or disease. By utilizing this ddRAD-based map, we detected five mis-assemblies corresponding to four chromosomes, namely ASB_LG8, ASB_LG9, ASB_LG15 and ASB_LG20, in the current Asian seabass reference genome assembly. Conclusion To our knowledge, the QTL associated with increased robustness is the first such finding from a tropical fish species.
Depending on further validation in other stocks and populations, it might be potentially useful for selecting robust Asian seabass lines in selection programs.
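The marker validation mentioned above rests on a Chi-square test of genotype frequencies. As a rough illustration only, the sketch below computes a Pearson Chi-square statistic from the R1-61252 genotype counts given in the figure caption for this article (robust: 203 C/C, 83 C/T, 2 T/T; sensitive: 46 C/C, 105 C/T, 104 T/T). Treating the data as a single 2 × 3 contingency table is an assumption, as are the function and variable names; the authors' exact test setup is not described in this listing.

```python
def chi_square_statistic(table):
    """Pearson Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and genotype.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Genotype counts at R1-61252 as reported in the figure caption.
# Columns are C/C, C/T, T/T (layout chosen for this illustration).
genotype_counts = [
    [203, 83, 2],    # 'robust' group, 288 individuals
    [46, 105, 104],  # 'sensitive' group, 255 individuals
]

statistic, dof = chi_square_statistic(genotype_counts)

# Standard critical value of the Chi-square distribution for p = 0.01 with 2 degrees of freedom.
CRITICAL_VALUE_P01_DF2 = 9.210

print(f"chi-square = {statistic:.1f}, degrees of freedom = {dof}")
print("significant at p < 0.01:", statistic > CRITICAL_VALUE_P01_DF2)
```

A statistic above the critical value is consistent with the p < 0.01 reported in the caption; a statistics library would normally be used to obtain the exact p-value.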

Overall architecture of CapsNetYY1
ROC curves of CapsNetYY1 models. A ROC curves of cell types HCT116; and B ROC curves of cell types K562 on the training datasets
t-SNE visualization of CapsNetYY1. A and B respectively show the feature visualization after one-hot encoding on the cell type independent datasets of HCT116. C represents the visualization of the results of digital capsule layer classification on the HCT116 cell type independent datasets. D and E respectively show the feature visualization after one-hot encoding on the cell type independent datasets of K562. F represents the visualization of the results of digital capsule layer classification on the K562 cell type independent datasets
CapsNetYY1: identifying YY1-mediated chromatin loops based on a capsule network architecture

August 2023

·

12 Reads

Background Previous studies have shown that chromosome structure plays a very important role in gene control. The transcription factor Yin Yang 1 (YY1), a multifunctional DNA binding protein, can form a dimer to mediate chromatin loops and activate enhancer-promoter interactions. The deletion of YY1 or point mutations at the YY1 binding sites significantly inhibit the enhancer-promoter interactions and affect gene expression. To date, only a few computational methods are available for identifying YY1-mediated chromatin loops. Results We proposed a novel model named CapsNetYY1, which is based on a capsule network architecture, to identify whether a pair of YY1 motifs can form a chromatin loop. Firstly, we encode the DNA sequence using the one-hot encoding method. Secondly, a multi-scale convolution layer is used to extract local features of the sequence, and a bidirectional gated recurrent unit is used to learn the features across time steps. Finally, capsule networks (a convolution capsule layer and a digital capsule layer) are used to extract higher level features and recognize YY1-mediated chromatin loops. Compared with DeepYY1, the only existing predictor for YY1-mediated chromatin loops, our model CapsNetYY1 achieved better performance on the independent datasets (AUC > 0.99). Conclusion The results indicate that CapsNetYY1 is an excellent method for identifying YY1-mediated chromatin loops. We believe that the CapsNetYY1 method can be used for predictive classification of other DNA sequences.
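The one-hot encoding step named in this abstract can be sketched as follows. This is a minimal illustration, not the CapsNetYY1 implementation: the function name, the A/C/G/T channel order, and the choice to map any other character (such as N) to an all-zero row are assumptions.

```python
def one_hot_encode(sequence):
    """Encode a DNA string as a list of 4-element rows, one row per base.

    Channel order A, C, G, T is an assumption made for this sketch.
    Characters outside that set (for example N) become an all-zero row.
    """
    channels = "ACGT"
    return [
        [1 if base == channel else 0 for channel in channels]
        for base in sequence.upper()
    ]


# Example: each base becomes a row with a single 1 in its channel.
for base, row in zip("ACGTN", one_hot_encode("ACGTN")):
    print(base, row)
```

The resulting length × 4 matrix is the kind of input a convolution layer can scan along the sequence axis.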

Genome-wide analysis of RNA-chromatin interactions in lizards as a mean for functional lncRNA identification

August 2023

·

28 Reads

Background Long non-coding RNAs (lncRNAs) are defined as transcribed molecules longer than 200 nucleotides with little to no protein-coding potential. LncRNAs can regulate gene expression of nearby genes (cis-acting) or genes located on other chromosomes (trans-acting). Several methodologies have been developed to capture lncRNAs associated with chromatin at a genome-wide level. Analysis of RNA-DNA contacts can be combined with epigenetic and RNA-seq data to define potential lncRNAs involved in the regulation of gene expression. Results We performed Chromatin Associated RNA sequencing (ChAR-seq) in Anolis carolinensis to obtain the genome-wide map of the associations that RNA molecules have with chromatin. We analyzed the frequency of DNA contacts for different classes of RNAs and were able to define cis- and trans-acting lncRNAs. We integrated the ChAR-seq map of RNA-DNA contacts with epigenetic data for the acetylation of lysine 16 on histone H4 (H4K16ac), a mark connected to actively transcribed chromatin in lizards. We successfully identified three trans-acting lncRNAs significantly associated with the H4K16ac signal, which are likely involved in the regulation of gene expression in A. carolinensis. Conclusions We show that the ChAR-seq method is a powerful tool to explore the RNA-DNA map of interactions. Moreover, in combination with epigenetic data, ChAR-seq can be applied in non-model species to establish potential roles for predicted lncRNAs that lack functional annotations.

Temporal transcriptomic profiling elucidates sorghum defense mechanisms against sugarcane aphids

August 2023

·

52 Reads

Background The sugarcane aphid (SCA; Melanaphis sacchari) has emerged as a key pest on sorghum in the United States that feeds from the phloem tissue, drains nutrients, and inflicts physical damage on plants. Previously, it has been shown that SCA reproduction was low and high on sorghum SC265 and SC1345 plants, respectively, compared to RTx430, an elite sorghum male parental line (reference line). In this study, we focused on identifying the defense-related genes that confer resistance to SCA at early and late time points in sorghum plants with varied levels of SCA resistance. Results We used an RNA-sequencing approach to identify the global transcriptomic responses to aphid infestation on RTx430, SC265, and SC1345 plants at the early time points of 6, 24, and 48 h post infestation (hpi) and after an extended period of SCA feeding for 7 days. Aphid feeding on the SCA-resistant line upregulated the expression of 3827 and 2076 genes at early and late time points, respectively, which was higher than in RTx430 and SC1345 plants. Co-expression network analysis revealed that aphid infestation modulates sorghum defenses by regulating genes corresponding to phenylpropanoid metabolic pathways, secondary metabolic process, oxidoreductase activity, phytohormones, sugar metabolism and cell wall-related genes. There were 187 genes that were highly expressed during the early time of aphid infestation in the SCA-resistant line, including genes encoding leucine-rich repeat (LRR) proteins, ethylene response factors, cell wall-related proteins, pathogenesis-related proteins, and disease resistance-responsive dirigent-like proteins. At 7 days post infestation (dpi), 173 genes had elevated expression levels in the SCA-resistant line and were involved in sucrose metabolism, callose formation, phospholipid metabolism, and proteinase inhibitors.
Conclusions In summary, our results indicate that the SCA-resistant line is better adapted to activate early defense signaling mechanisms in response to SCA infestation because of the rapid activation of the defense mechanisms by regulating genes involved in monolignol biosynthesis pathway, oxidoreductase activity, biosynthesis of phytohormones, and cell wall composition. This study offers further insights to better understand sorghum defenses against aphid herbivory.

Key biological differences between Microctonus wasps and Hi-C scaffolding results. A Host target and reproductive mechanisms of Microctonus wasps used as biocontrol agents in New Zealand, Hi-C interaction maps for B M. aethiopoides Irish and C M. hyperodae displaying chromosome scaffolds, and D a Circos plot displaying chromosomal synteny between both Hi-C assemblies
A heatmap displaying the meiosis gene inventory of Microctonus wasps, compared to results in other Hymenoptera, Diptera and Coleoptera from Tvedte et al. [51]. Colours in the heatmap indicate the absence or presence (and gene copy number) of each meiosis gene. Core meiosis genes, which are specific to meiosis, and asexual Hymenoptera species are indicated in bold
A Circos plot of the MhFV genome assembly. Predicted genes are indicated in the yellow and green blocks, on the positive and negative strand respectively. Normalised gene expression is displayed in the heatmap, where higher expression is indicated by darker red, and with the six rings representing tissues from the outside inwards as follows; pupa, head, thorax, abdomen, ovaries and venom gland. Dashed black lines indicate SNP variants detected throughout the MhFV genome. Relative read depth for Illumina and ONT MinION sequencing is indicated in blue and teal respectively, calculated with a sliding window with a size of 500 bp and slide of 100 bp
Phylogenetic analysis of nuclear arthropod-specific large double-stranded DNA viruses (NALDVs). Relationships were derived using a maximum likelihood analysis with RAxML-NG, from 12 core NALDV genes, as defined by Burke et al., [65], with a total of 6818 characters from concatenated amino acid sequences. Bootstrap branch support values over 50% are indicated on relevant branches. Species names are abbreviated as follows; Apis mellifera filamentous virus (AmFV), White spot syndrome virus (WSSV), Chionoecetes opilio bacilliform virus (CoBV), Culex nigripalpus nucleopolyhedrovirus (CnNPV), Neodiprion sertifer nucleopolyhedrovirus (NsNPV), Cydia pomonella granulovirus (CpGV), Autographa californica multiple nucleopolyhedrovirus (AcMNPV), Gryllus bimaculatus nudivirus (GbNV), Oryctes rhinoceros nudivirus (OrNV), Drosophila innubila nudivirus (DiNV), Tipula oleracea nudivirus (ToNV), Helicoverpa zea nudivirus 2 (HzNV-2), Penaeus monodon nudivirus (PmNV), Musca domestica salivary gland hypertrophy virus (MdSGHV), Glossina pallidipes salivary gland hypertrophy virus (GpSGHV), Leptopilina boulardi filamentous virus (LbFV), Drosophila-associated filamentous virus (DaFV), and Microctonus hyperodae filamentous virus (MhFV)
Chromosome-level genome assemblies of two parasitoid biocontrol wasps reveal the parthenogenesis mechanism and an associated novel virus

August 2023

·

60 Reads

Background Biocontrol is a key technology for the control of pest species. Microctonus parasitoid wasps (Hymenoptera: Braconidae) have been released in Aotearoa New Zealand as biocontrol agents, targeting three different pest weevil species. Despite their value as biocontrol agents, no genome assemblies are currently available for these Microctonus wasps, limiting investigations into key biological differences between the different species and strains. Methods and findings Here we present high-quality genomes for Microctonus hyperodae and Microctonus aethiopoides, assembled with short read sequencing and Hi-C scaffolding. These assemblies have total lengths of 106.7 Mb for M. hyperodae and 129.2 Mb for M. aethiopoides, with scaffold N50 values of 9 Mb and 23 Mb respectively. With these assemblies we investigated differences in reproductive mechanisms, and association with viruses between Microctonus wasps. Meiosis-specific genes are conserved in asexual Microctonus, with in-situ hybridisation validating expression of one of these genes in the ovaries of asexual Microctonus aethiopoides. This implies asexual reproduction in these Microctonus wasps involves meiosis, with the potential for sexual reproduction maintained. Investigation of viral gene content revealed candidate genes that may be involved in virus-like particle production in M. aethiopoides, as well as a novel virus infecting M. hyperodae, for which a complete genome was assembled. Conclusion and significance These are the first published genomes for Microctonus wasps which have been deployed as biocontrol agents in Aotearoa New Zealand. These assemblies will be valuable resources for continued investigation and monitoring of these biocontrol systems. Understanding the biology underpinning Microctonus biocontrol is crucial if we are to maintain its efficacy, or in the case of M. hyperodae to understand what may have influenced the significant decline of biocontrol efficacy.
The potential for sexual reproduction in asexual Microctonus is significant given that empirical modelling suggests this asexual reproduction is likely to have contributed to biocontrol decline. Furthermore, the identification of a novel virus in M. hyperodae highlights a previously unknown aspect of this biocontrol system, which may contribute to premature mortality of the host pest. These findings have the potential to be exploited in the future in an attempt to increase the effectiveness of M. hyperodae biocontrol.

Optimized bisulfite sequencing analysis reveals the lack of 5-methylcytosine in mammalian mitochondrial DNA

August 2023

·

23 Reads

Background DNA methylation is one of the best characterized epigenetic modifications in the mammalian nuclear genome and is known to play a significant role in various biological processes. Nonetheless, the presence of 5-methylcytosine (5mC) in mitochondrial DNA remains controversial, as data ranging from the lack of 5mC to very extensive 5mC have been reported. Results By conducting comprehensive bioinformatic analyses of both published and our own data, we reveal that previous observations of extensive and strand-biased mtDNA-5mC are likely artifacts due to a combination of factors including inefficient bisulfite conversion, extremely low sequencing reads in the L strand, and interference from nuclear mitochondrial DNA sequences (NUMTs). To reduce false positive mtDNA-5mC signals, we establish an optimized procedure for library preparation and data analysis of bisulfite sequencing. Leveraging our modified workflow, we demonstrate an even distribution of 5mC signals across the mtDNA and an average methylation level ranging from 0.19% to 0.67% in both cell lines and primary cells, which is indistinguishable from the background noise. Conclusions We have developed a framework for analyzing mtDNA-5mC through bisulfite sequencing, which enables us to present multiple lines of evidence for the lack of extensive 5mC in mammalian mtDNA. We assert that the data available to date do not support the reported presence of mtDNA-5mC.

Genome-wide identification and expression analyses of the pectate lyase (PL) gene family in Fragaria vesca

August 2023

·

51 Reads

Background Pectate lyase (PL, EC 4.2.2.2), as an endo-acting depolymerizing enzyme, cleaves α-1,4-glycosidic linkages in esterified pectin and is involved in a broad range of cell wall modifications. However, a genome-wide analysis of the PL gene family in Fragaria vesca has not yet been thoroughly carried out. Results In this study, sixteen PL members in F. vesca were identified based on a genome-wide investigation. Substantial divergences existed among FvePLs in gene duplication, cis-acting elements, and tissue expression patterns. Four clusters were classified according to phylogenetic analysis. A comparison of orthologous PL genes from Malus domestica, Solanum lycopersicum, Arabidopsis thaliana, and Fragaria×ananassa indicated that FvePL6, 8 and 13 in cluster II contributed significantly to the expansions during evolution. The cis-acting elements implicated in the abscisic acid signaling pathway were abundant in the promoter regions of FvePLs. The RNA-seq data and in situ hybridization revealed that FvePL1, 4, and 7 exhibited maximum expression in fruits at twenty days after pollination, whereas FvePL8 and FvePL13 were preferentially and prominently expressed in mature anthers and pollen. Additionally, the co-expression networks showed that FvePLs had tight correlations with transcription factors and genes implicated in plant development, abiotic/biotic stresses, ions/Ca²⁺, and hormones, suggesting the potential roles of FvePLs during strawberry development. In addition, histological observations suggested that FvePL1, 4 and 7 enhanced cell division and expansion of the cortex, thus negatively influencing fruit firmness. Finally, FvePL1-RNAi reduced leaf size, altered petal architecture, disrupted normal pollen development, and caused partial male sterility.
Conclusion These results provide valuable information for characterizing the evolution, expansion, expression patterns and functional analysis, which help to understand the molecular mechanisms of the FvePLs in the development of strawberries.

Transcriptome profiling analysis of uterus during chicken laying periods

August 2023

·

17 Reads

The avian eggshell is formed in the uterus. Changes in uterine function may have a significant effect on eggshell quality. To identify the vital genes impacting uterine functional maintenance in the chicken, uteri in three different periods (22W, 31W, 51W) were selected for RNA sequencing and bioinformatics analysis. In our study, 520, 706 and 736 differentially expressed genes (DEGs) were respectively detected in the W31 vs W22 group, W51 vs W31 group and W51 vs W22 group. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis indicated DEGs were enriched in the extracellular matrix, extracellular region part, extracellular region, extracellular matrix structural constituent, ECM receptor interaction, collagen-containing extracellular matrix and collagen trimer in the uterus (P < 0.05). Protein–protein interaction analysis revealed that FN1, LOX, THBS2, COL1A1, COL1A2, COL5A1, COL5A2, POSTN, MMP13, VANGL2, RAD54B, SPP1, SDC1, BTC, ANGPTL3 might be key candidate genes for uterine functional maintenance in chicken. This study discovered dominant genes and pathways which enhanced our knowledge of chicken uterine functional maintenance.

Sampling and genomic landscape of the divergence of Tibetan and Jiangnan cashmere goat breeds. (A) Map of cashmere goat sampling. The orange and red areas in the figure are the sampling areas of Jiangnan and Tibetan cashmere goats, respectively. In the map, a is Aksu Prefecture; b, c, and d are Ritu County, Gaize County, and Nima County, respectively. (B) Principal components of cashmere goat samples. (C) Phylogenetic tree of 64 cashmere goats. (D) FST in 5-kb sliding windows across autosomes between Tibetan and Jiangnan cashmere goats. (E) Selective sweep analysis based on FST and Pi (π) ratio. The X-axis of the scatter diagram is the log2(Pi (π) ratio) value and the Y-axis is the FST value, which correspond to the frequency distribution diagram above the scatter diagram and the frequency distribution diagram on the right, respectively. The red scatter points mark the top 5% region selected by both Pi (π) ratio and FST.
Transcriptomic analysis of Jiangnan and Tibetan cashmere goat skin tissue. (A) Clustering tree of 15 Jiangnan or Tibetan cashmere goat skin transcriptomes. (B) Expression patterns of genes within different color modules in skin tissues of Tibetan and Jiangnan cashmere goats. The values in the rectangles represent the correlation between gene expression patterns in different modules and breeds. (C) Volcano plot of DEGs. (D) GO enrichment analysis of DEGs.
SNPs or candidate genes associated with cashmere fineness. (A) Statistics of three cashmere fineness phenotypes in Tibetan and Jiangnan cashmere goats. (B) GWAS of cashmere fineness, and selective sweeps analysis of trait-associated candidate genes
Functional genomic basis of differences in melanin synthesis between Tibetan and Jiangnan cashmere goats. (A) FST of all SNPs on chromosome 22 between Tibetan and Jiangnan cashmere goats. (B) Schematic overview of the melanogenesis pathway. The schematic was drawn based on KEGG imagery [62–64]. (C) FST and Pi (π) ratio of the MITF gene on chromosome 22
Drivers of plateau adaptability in cashmere goats revealed by genomic and transcriptomic analyses

August 2023

·

17 Reads

Background The adaptive evolution of plateau indigenous animals is a current research focus. However, phenotypic adaptation is complex and may involve interactions between multiple genes or pathways, many of which remain unclear. The cashmere goat, a livestock species of considerable economic value, adapts well to plateau conditions, which makes it a good model for studying the molecular regulatory mechanisms of plateau adaptation in animals. Results In this study, 32 Jiangnan (J) and 32 Tibetan (T) cashmere goats were sequenced at an average of 10. Phylogenetic, population structure, and linkage disequilibrium analyses showed that natural selection or domestication has resulted in obvious differences in genome structure between the two breeds. Subsequently, 553 J vs. T and 608 T vs. J potential selected genes (PSGs) were screened. These PSGs showed potential relationships with various phenotypes, including myocardial development and activity (LOC106502520, ATP2A2, LOC102181869, LOC106502520, MYL2, ISL1, and LOC102181869 genes), pigmentation (MITF and KITLG genes), hair follicles/hair growth (YAP1, POGLUT1, AAK1, HES1, WNT1, PRKAA1, TNKS, WNT5A, VAX2, RSPO4, CSNK1G1, PHLPP2, CHRM2, PDGFRB, PRKAA1, MAP2K1, IRS1, LPAR1, PTEN, PRLR, IBSP, CCNE2, CHAD, ITGB7, TEK, JAK2, and FGF21 genes), and carcinogenesis (UBE2R2, PIGU, DIABLO, NOL4L, STK3, MAP4, ADGRG1, CDC25A, DSG3, LEPR, PRKAA1, IKBKB, and ABCG2 genes). Phenotypic analysis showed that Tibetan cashmere goats have finer cashmere than Jiangnan cashmere goats, which may allow them to better adapt to the cold environment of the Tibetan plateau. Meanwhile, KRT and KAP expression in Jiangnan cashmere goat skin was significantly lower than in Tibetan cashmere goat skin. Conclusions The mutations in these PSGs may be closely related to the plateau adaptation ability of cashmere goats.
In addition, the expression differences of KRTs and KAPs may directly determine phenotypic differences in cashmere fineness between the two breeds. In conclusion, this study provides a reference for further study of plateau adaptation mechanisms in animals and for goat breeding.
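The selective sweep screen described for this study combines two statistics per genomic window, FST and the log2 of the Pi (π) ratio, and keeps windows in the top 5% of both. The sketch below illustrates that idea only. It is not the authors' pipeline: the choice of Wright's FST from allele frequencies, the function names, and the direction of the π ratio are all assumptions made for illustration.

```python
import math


def wright_fst(p1, p2):
    """Wright's F_ST at one biallelic site from two population allele frequencies.

    Assumption: equal weighting of the two populations; the study's actual
    estimator is not stated in the abstract.
    """
    # Mean expected heterozygosity within the two populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def top_fraction_cutoff(values, fraction=0.05):
    """Smallest value that still falls in the top `fraction` of `values`."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[k - 1]


def candidate_windows(fst, pi_a, pi_b, fraction=0.05):
    """Indices of windows in the top `fraction` for both F_ST and log2(pi_a / pi_b).

    Assumption: a high pi_a / pi_b ratio flags reduced diversity in population b.
    """
    log_ratio = [math.log2(a / b) for a, b in zip(pi_a, pi_b)]
    fst_cut = top_fraction_cutoff(fst, fraction)
    ratio_cut = top_fraction_cutoff(log_ratio, fraction)
    return [
        i
        for i, (f, r) in enumerate(zip(fst, log_ratio))
        if f >= fst_cut and r >= ratio_cut
    ]
```

Requiring both statistics to pass their cutoffs is stricter than either filter alone, which is why such screens usually return a small set of candidate regions.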

Influence of scat ageing on the gut microbiome: how old is too old?

July 2023

·

18 Reads

Background The study of the host-microbiome by the collection of non-invasive samples has the potential to become a powerful tool for conservation monitoring and surveillance of wildlife. However, multiple factors can bias the quality of data recovered from scats, particularly when field-collected samples are used given that the time of defecation is unknown. Previous studies using scats have shown that the impact of aerobic exposure on the microbial composition is species-specific, leading to different rates of change in microbial communities. However, the impact that this aging process has on the relationship between the bacterial and fungal composition has yet to be explored. In this study, we measured the effects of time post-defecation on bacterial and fungal compositions in a controlled experiment using scat samples from the endangered koala (Phascolarctos cinereus). Results We found that the bacterial composition remained stable through the scat aging process, while the fungal composition did not. The absence of an increase in facultative anaerobes and the stable population of obligate anaerobic bacteria were likely due to our sampling from the inner portion of the scat. We report a cluster of fungal taxa that colonises scats after defecation which can dilute the genetic material from the autochthonous mycoflora and inhibit recovery. Conclusion We emphasize the need to preserve the integrity of scat samples collected in the wild and combat the effects of time and provide strategies for doing so.

m7G methylation (transcriptome-wide) and role of lncRNAs in the two cells. (A) Venn diagram of m7G methylation sites recognized in lncRNAs from the two cells. (B) Venn diagram of m7G genes in the two cells. (C) The sequence motif of m7G sites in HL-60 cells. (D) The sequence motif of m7G sites in MX2/HL60 cells. (E) Percentages of lncRNAs containing different quantities of m7G peaks in the two cells; most lncRNAs contain one m7G peak only. (F) A schematic showing the pathways and biological functions affected and how they differ in drug-resistant versus sensitive AML [26]
A and B. Pie chart of the source of methylated lncRNA in the two cells. C. Visualized chromosome-level distribution of m7G in lncRNAs in the two cells
Differential expression of lncRNAs categorized by m7G methylation. (A) Differential expression of lncRNAs in the two cells. (B) Cumulative distribution of lncRNA expression in the two cells for genes with upregulated m7G (red) and downregulated m5C (green); blue denotes others
GO analysis of m7G genes in lncRNAs of HL60/MX2 cells. (A–C) The ten most significantly enriched GO terms for (A) BPs, (B) CCs, and (C) MFs in up-methylated m7G genes of HL60/MX2 cells. (D–F) The ten most significantly enriched GO terms for (D) BPs, (E) CCs, and (F) MFs in down-methylated m7G genes of MX2/HL60 cells
KEGG analysis of m7G genes in lncRNAs of MX2/HL60 cells. (A) Top 10 significantly enriched pathways for up-methylated m7G genes in MX2/HL60 cells. (B) Top 10 significantly enriched pathways for down-methylated m7G genes in MX2/HL60 cells
Landscape of internal N7-methylguanosine of long non-coding RNA modifications in resistant acute myeloid leukemia

July 2023

·

14 Reads

Background Growing evidence indicates that RNA methylation plays a fundamental role in epigenetic regulation and is associated with tumorigenesis and drug resistance. Acute myeloid leukemia (AML), the most common acute leukemia in adults, is a deadly disease. Although N7-methylguanosine (m7G) has been identified as an important regulatory modification, its distribution remains elusive. Methods The present study aimed to explore the long non-coding RNA (lncRNA) functional profile of m7G in AML and drug-resistant AML cells. The transcriptome-wide m7G methylation of lncRNA was analyzed in AML and drug-resistant AML cells. RNA MeRIP-seq was performed to identify m7G peaks on lncRNA and differences in m7G distribution between AML and drug-resistant AML cells. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted to predict the possible roles of m7G and its associated pathways. Results Using m7G peak sequencing, it was found that a sequence motif was necessary for m7G methylation in drug-resistant AML lncRNA. Unsupervised hierarchical cluster analysis confirmed that lncRNA m7G methylation occurred more frequently in drug-resistant AML cells than in AML cells. RNA sequencing demonstrated that more genes were upregulated by methylation in drug-resistant AML cells, while methylation downregulated more genes in AML cells. The GO and KEGG pathway enrichment analyses revealed that genes significantly correlated with m7G sites in lncRNA were involved in drug-resistant AML signaling pathways. Conclusion Significant differences in the levels and patterns of m7G methylation between drug-resistant AML cells and AML cells were revealed. Furthermore, the cellular functions potentially influenced by m7G in drug-resistant AML cells were predicted, providing evidence implicating m7G-mediated lncRNA epigenetic regulation in the progression of drug resistance in AML.
These findings highlight the involvement of m7G in the development of drug resistance in AML.

Molecular diagnostic yield and monogenic variants inheritance in 103 families with primary or secondary microcephaly. A, Number and percentage of 103 patients in which P/LP level variants in a monogenic cause (49/103, 47.6%), VUS level variants in a monogenic cause (6/103, 5.8%), a known pathogenic CNV (15.5%) or a newly reported or potential new candidate gene (7.8%) was detected by whole-exome sequencing. Blue denotes that a variant in genes with established disease phenotypes in humans was detected. Green denotes that a pathogenic or likely pathogenic CNV was detected. Red indicates that one highly potential novel gene was detected in the family. Yellow indicates that no causative or candidate variants were detected. VUS, variant of uncertain significance; LP, likely pathogenic; P, pathogenic. B, Inheritance confirmation of sequence variants (SVs) in 55 patients with a known monogenic cause. Segregation validation revealed that 36/55 (65.5%) had an autosomal dominant (AD) form due to de novo variants, 4/55 (7.3%) an X-linked dominant (XLD) de novo form, 2/55 (3.6%) an inherited AD form, 10/55 (18.2%) an autosomal recessive form, and 3/55 (5.5%) an X-linked recessive (XLR) form
The expressional trajectories in developing cerebral cortex and functional network of known and candidate genes in this cohort. A, Expressional trajectories of 8 candidate genes identified in our cohort in neocortical cell types during mouse cortical development. Data was generated from single cell RNA-Seq data provided by Paola Arlotta (Nature 595:554–559,2021). CPN, callosal projection neurons; CThPN, corticothalamic projection neurons; SCPN, subcerebral projection neurons; VLMC, vascular and leptomeningeal cells. B, Functional network from GeneMANIA showing the interaction among the mutated known genes and eight candidate genes (highlighted with red star) in our cohort. Analysis was based on co-expression (purple edges), physical interactions (red edges), co-localization (dark blue), shared protein domains (light yellow edges), pathway (light blue edges), and genetic interactions (green edges)
WES identified compound heterozygous variants in PWP2 in a male patient with primary microcephaly and global developmental delay. A, Sequencing chromatograms of the compound heterozygous variants c.1457G > A;p.W486Ter and c.1979G > A;p.R660Q of the PWP2 gene. B, Brain MRI of patient NJ1050 showing cortical atrophy. C, Protein domain content of PWP2 protein. The p.R660Q missense change detected in the family is mapped to the WD40 domain. D, Transient overexpression of N-terminally Flag-tagged cDNA constructs modeling the wild-type allele and two independent PWP2 variants (p.Arg660Gln and p.Trp486Ter) in HEK293 cells. The p.Trp486Ter variant produced a lower-molecular-weight band with decreased expression. E, Multiple sequence alignment of the PWP2 protein region flanking residue Arg660. Arg660 is well conserved from Homo sapiens to Drosophila melanogaster. F, HEK293 cells were transfected with N-terminal Flag-tagged wild-type PWP2 or mutants. Cells were imaged by confocal microscopy. Representative images of Flag-tagged protein and DAPI localization are shown, revealing that wild-type PWP2 localizes to the nucleus, while patient mutants are mislocalized diffusely to the cytoplasm
A loss-of-function CCND2 variant in a female patient with primary microcephaly and short stature. A, Growth chart tracking the height measurements of patient NJ3099 on the female growth curve. B, Head circumference measurements on the growth curve and brain MRI of NJ3099. C, Schematic representation of the CCND2 gene showing the localization of truncating variants in regions subject to (NMD +) and escaping (NMD-) nonsense-mediated RNA decay (https://nmdpredictions.shinyapps.io/shiny/). Six previously reported gain-of-function variants in patients with megalencephaly-polymicrogyria-polydactyly-hydrocephalus syndrome are shown in blue. The loss-of-function variant p.Gln169* detected in this study is shown in red. The four previously published CCND2 loss-of-function variants are shown in black. The region of CCND2 where truncating variants trigger NMD is indicated in red. The regions that escape NMD are shown in green. The nonstop decay region is indicated in yellow. D, Protein structure of CCND2, which contains two cyclin-like domains. Sequencing chromatograms of the heterozygous de novo CCND2 variant in the proband and the WT sequence detected in the parents. The index family’s heterozygous CCND2 variant c.505C > T leads to a premature stop codon resulting in p.Gln169Ter. E, Transient overexpression of N-terminally Flag-tagged cDNA constructs modeling the wild-type allele and mutant CCND2 (p.Gln169Ter) in HEK293 cells. Protein extracts from transfected cells treated with cycloheximide (CHX) at 0, 2 h and 4 h were analyzed by western blotting with an antibody to Flag. The p.Gln169Ter variant produced a lower-molecular-weight band with decreased expression and showed a drop in protein levels after inhibition of protein translation by CHX
Overall workflow of our WES pipeline. CNV, copy-number variation; Indel, insertion/deletion; PM, primary microcephaly; SM, secondary microcephaly; SNV, single-nucleotide variant; WES, whole-exome sequencing
Diagnostic yield and novel candidate genes for neurodevelopmental disorders by exome sequencing in an unselected cohort with microcephaly

July 2023

·

17 Reads

Objectives Microcephaly is caused by reduced brain volume and is most often associated with a variety of neurodevelopmental disorders (NDDs). To provide an overview of the diagnostic yield of whole exome sequencing (WES) and promote novel candidates in genetically unsolved families, we studied the clinical and genetic landscape of an unselected Chinese cohort of patients with microcephaly. Methods We performed WES in an unselected cohort of 103 NDD patients with microcephaly as one of the features. Potential novel candidate genes were fully evaluated in genetically undiagnosed families. Functional validation of selected variants was conducted in cultured cells. To augment the discovery of novel candidates, we queried our genomic sequencing data repository for additional likely disease-causing variants in the identified candidate genes. Results In 65 families (63.1%), causative sequence variants (SVs) and clinically relevant copy number variants (CNVs) with a pathogenic or likely pathogenic (P/LP) level were identified. By incorporating coverage analysis into WES, a pathogenic or likely pathogenic CNV was detected in 16 families (16/103, 15.5%). In another eight families (8/103, 7.8%), we identified variants in a newly reported gene (CCND2) and in potential novel neurodevelopmental disorder/microcephaly candidate genes, which are involved in cell cycle and division (PWP2, CCND2), CDC42/RAC signaling related actin cytoskeletal organization (DOCK9, RHOF), neurogenesis (ELAVL3, PPP1R9B, KCNH3) and transcription regulation (IRF2BP1). By looking into our data repository of 5066 families with NDDs, we identified two additional cases with variants in DOCK9 and PPP1R9B, respectively. Conclusion Our results expand the morbid genome of monogenic neurodevelopmental disorders and support the adoption of WES as a first-tier test for individuals with microcephaly.
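The yield figures in this abstract are simple proportions of the 103-family cohort, so they can be recomputed from the stated numerators. The following is a minimal arithmetic check, not part of the study; the dictionary keys and the helper name `percent` are illustrative choices.

```python
def percent(count, total):
    """Percentage of `total` represented by `count`, rounded to one decimal place."""
    return round(100 * count / total, 1)


COHORT_SIZE = 103  # families in the cohort, as stated in the abstract

# Numerators taken from the abstract: 65 families, 16/103, and 8/103.
reported_counts = {
    "P/LP sequence variants or CNVs": 65,
    "P/LP CNV": 16,
    "newly reported or candidate gene": 8,
}

yields = {label: percent(n, COHORT_SIZE) for label, n in reported_counts.items()}

for label, value in yields.items():
    print(f"{label}: {value}%")
```

Each recomputed value matches the percentage printed next to its fraction in the abstract (63.1%, 15.5%, and 7.8%).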

Gene co-expression network analysis identifies hub genes associated with different tolerance under calcium deficiency in two peanut cultivars

July 2023

·

47 Reads

Background Peanut is an economically-important oilseed crop and needs a large amount of calcium for its normal growth and development. Calcium deficiency usually leads to embryo abortion and subsequent abnormal pod development. Different tolerance to calcium deficiency has been observed between different cultivars, especially between large and small-seed cultivars. Results In order to figure out different molecular mechanisms in defensive responses between two cultivars, we treated a sensitive (large-seed) and a tolerant (small-seed) cultivar with different calcium levels. The transcriptome analysis identified a total of 58 and 61 differentially expressed genes (DEGs) within small-seed and large-seed peanut groups under different calcium treatments, and these DEGs were entirely covered by gene modules obtained via weighted gene co-expression network analysis (WGCNA). KEGG enrichment analysis showed that the blue-module genes in the large-seed cultivar were mainly enriched in plant-pathogen attack, phenolic metabolism and MAPK signaling pathway, while the green-module genes in the small-seed cultivar were mainly enriched in lipid metabolism including glycerolipid and glycerophospholipid metabolisms. By integrating DEGs with WGCNA, a total of eight hub-DEGs were finally identified, suggesting that the large-seed cultivar concentrated more on plant defensive responses and antioxidant activities under calcium deficiency, while the small-seed cultivar mainly focused on maintaining membrane features to enable normal photosynthesis and signal transduction. Conclusion The identified hub genes might give a clue for future gene validation and molecular breeding to improve peanut survivability under calcium deficiency.

The construction of multi-source heterogeneous network (MHN) of ncRNA-gene-disease
An illustration of the GDCL-NcDA framework. A The multi-source deep graph learning obtains significance within each similarity network and encodes every similarity network. B The multichannel attention mechanism obtains significance among diverse similarity networks, and the association graph (matrix) is reconstructed for the downstream predictive task. C The DMF performs the final identification task based on the reformulated association score matrix. The contrastive loss is generated on the reconstructed graph and the predicted graph
The AUC for parameter analysis of GDCL-NcDA on miRNA-disease MHN
GDCL-NcDA: identifying non-coding RNA-disease associations via contrastive learning between deep graph learning and deep matrix factorization

July 2023

·

24 Reads

Non-coding RNAs (ncRNAs) have drawn wide attention in recent years because they play vital roles in life activities. As a good complement to wet-lab experimental methods, computational prediction methods can greatly reduce experimental costs. However, a high proportion of false negatives in the data and insufficient use of multi-source information can impair the performance of computational prediction methods. Furthermore, many computational methods lack robustness and generalize poorly across datasets. In this work, we propose an effective end-to-end computing framework, called GDCL-NcDA, that combines deep graph learning and deep matrix factorization (DMF) with contrastive learning and identifies latent ncRNA-disease associations on diverse multi-source heterogeneous networks (MHNs). The diverse MHNs include different similarity networks and proven associations among ncRNAs (miRNAs, circRNAs, and lncRNAs), genes, and diseases. Firstly, GDCL-NcDA employs a deep graph convolutional network and multiple attention mechanisms to adaptively integrate the multiple sources of the MHNs and reconstruct the ncRNA-disease association graph. Then, GDCL-NcDA utilizes DMF to predict the latent disease-associated ncRNAs based on the reconstructed graphs, reducing the impact of false negatives in the original associations. Finally, GDCL-NcDA uses contrastive learning (CL) to generate a contrastive loss on the reconstructed graphs and the predicted graphs to improve the generalization and robustness of the framework. The experimental results show that GDCL-NcDA outperforms highly related computational methods. Moreover, case studies demonstrate the effectiveness of GDCL-NcDA in identifying associations between diverse ncRNAs and diseases.

Identification of DEcircRNAs, DElncRNAs, DEmiRNAs and DEmRNAs between CAVD and the normal controls. (A) Workflow of bioinformatics analysis. (B-E) Heatmaps of DEcircRNAs, DElncRNAs, DEmiRNAs and DEmRNAs. (F-I) Volcano plots of DEcircRNAs, DElncRNAs, DEmiRNAs and DEmRNAs.
The circRNA/lncRNA–miRNA–hub gene network and validation of the hub genes, miRNAs and circRNAs. (A) The circRNA/lncRNA–miRNA–hub gene network was constructed according to the ceRNA hypothesis and a subnetwork confirmed by qRT-PCR. Red and yellow represent up-regulation and blue means down-regulation. (B) GSE148219 and GSE76718 were used to validate the hub genes, and the intensity of differential expression is shown as a bar graph. (C) The relative expression of hub genes, miRNAs and circRNAs by qRT-PCR. Results are expressed as fold changes of control group
Immune Cell Patterns in CAVD. (A) Workflow of the scRNA-seq data analysis. (B) Pattern of immune cell infiltration in CAVD. (C) Correlation analysis between expression levels of SPP1, HMOX1, CD28 and immune cells. (D) The t-SNE projection cluster scatter diagram of the combined CAVD and health samples in PRJNA562645, with 6 identified cell types. (E) Volcano plot of DEGs in macrophages between CAVD and health. *p < 0.05; **p < 0.01; ****p < 0.0001
CAVD-related DEGs in VICs and VECs. (A) Volcano plot of the DEGs between VICs cultured with OSM for 20 and 3 days (from GSE101155). (B) Volcano plot of DEGs in VICs between CAVD and health (from PRJNA562645). (C) The intensity of the differential expression of the overlapped DEGs in VICs. (D-F) Volcano plot of the DEGs in fVECs and vVECs between OS and LS (from GSE26953). (G) Volcano plot of DEGs in VECs between CAVD and health (from PRJNA562645). (H) The intensity of the differential expression of the overlapped DEGs in VECs.
CAVD-related DEGs between BAVs and TAVs, and different genders. (A) Workflow of the differential analysis. (B) Volcano plot of the DEGs between calcified cBAVs and cTAVs from GSE76718 and GSE148219. (C) The intensity of the differential expression of the overlapped DEGs between cBAVs and cTAVs. (D) Volcano plot of the DEGs between women and men with CAVD (from GSE102249). (E) The intensity of the differential expression of gender-specific DEGs in CAVD.
CircRNA/lncRNA–miRNA–mRNA network and gene landscape in calcific aortic valve disease

July 2023

·

25 Reads

Background Calcific aortic valve disease (CAVD) is a common valve disease with an increasing incidence, but no effective drugs are yet available. With the development of sequencing technology, non-coding RNAs have been found to play roles in many diseases, including CAVD, but no circRNA/lncRNA–miRNA–mRNA interaction axis has been established. Moreover, valve interstitial cells (VICs) and valvular endothelial cells (VECs) play important roles in CAVD, and CAVD differs between leaflet phenotypes and genders. This work aims to explore the mechanism of the circRNA/lncRNA–miRNA–mRNA network in CAVD and to perform subgroup analysis on important characteristics of CAVD, such as key cells, leaflet phenotypes and genders. Results We identified 158 differentially expressed circRNAs (DEcircRNAs), 397 DElncRNAs, 45 DEmiRNAs and 167 DEmRNAs, and constructed a hsa-circ-0073813/hsa-circ-0027587–hsa-miR-525-5p–SPP1/HMOX1/CD28 network in CAVD after qRT-PCR verification. Additionally, 17 differentially expressed genes (DEGs) in VICs, 9 DEGs in VECs, 7 DEGs between different leaflet phenotypes and 24 DEGs between different genders were identified. Enrichment analysis suggested potentially important pathways in inflammation and fibro-calcification during the pathogenesis of CAVD. Immune cell patterns in CAVD suggested that M0 macrophages and memory B cells were significantly increased, and many genes in immune cells were also differentially expressed. Conclusions The circRNA/lncRNA–miRNA–mRNA interaction axis constructed in this work and the DEGs identified between different characteristics of CAVD provide a direction for a deeper understanding of CAVD and suggest possible diagnostic markers and treatment targets for CAVD in the future.

MicroRNA-668-3p inhibits myoblast proliferation and differentiation by targeting Appl1

July 2023

·

41 Reads

Background Skeletal muscle is the largest tissue in the body, and it affects motion, metabolism and homeostasis. Skeletal muscle development comprises myoblast proliferation, fusion and differentiation to form myotubes, which subsequently form mature muscle fibres. This process is strictly regulated by a series of molecular networks. Increasing evidence has shown that noncoding RNAs, especially microRNAs (miRNAs), play vital roles in regulating skeletal muscle growth. Here, we showed that miR-668-3p is highly expressed in skeletal muscle. Methods Proliferating and differentiated C2C12 cells were transfected with miR-668-3p mimics and/or inhibitor, and the mRNA and protein levels of its target gene were evaluated by RT‒qPCR and Western blotting analysis. The targeting of Appl1 by miR-668-3p was confirmed by dual luciferase assay. The interdependence of miR-668-3p and Appl1 was verified by cotransfection of C2C12 cells. Results Our data reveal that miR-668-3p can inhibit myoblast proliferation and myogenic differentiation. Phosphotyrosine interacting with PH domain and leucine zipper 1 (Appl1) is a target gene of miR-668-3p, and it can promote myoblast proliferation and differentiation by activating the p38 MAPK pathway. Furthermore, the inhibitory effect of miR-668-3p on myoblast cell proliferation and myogenic differentiation could be rescued by Appl1. Conclusion Our results indicate a new mechanism by which the miR-668-3p/Appl1/p38 MAPK pathway regulates skeletal muscle development.

Genetic diversity and genome-wide association study of 13 agronomic traits in 977 Beta vulgaris L. germplasms

July 2023

·

107 Reads

Background Sugar beet (Beta vulgaris L.) is an economically essential sugar crop worldwide. Its agronomic traits are highly diverse and phenotypically plastic, influencing taproot yield and quality. The National Beet Medium-term Gene Bank in China maintains more than 1700 beet germplasms with diverse countries of origin. However, it lacks detailed genetic background associated with morphological variability and diversity. Results Here, a comprehensive genome-wide association study (GWAS) of 13 agronomic traits was conducted in a panel of 977 sugar beet accessions. Almost all phenotypic traits exhibited wide genetic diversity and high coefficient of variation (CV). A total of 170,750 high-quality single-nucleotide polymorphisms (SNPs) were obtained using the genotyping-by-sequencing (GBS). Neighbour-joining phylogenetic analysis, principal component analysis, population structure and kinship showed no obvious relationships among these genotypes based on subgroups or regional sources. GWAS was carried out using a mixed linear model, and 159 significant associations were detected for these traits. Within the 25 kb linkage disequilibrium decay of the associated markers, NRT1/PTR FAMILY 6.3 (BVRB_5g097760); nudix hydrolase 15 (BVRB_8g182070) and TRANSPORT INHIBITOR RESPONSE 1 (BVRB_8g181550); transcription factor MYB77 (BVRB_2g023500); and ethylene-responsive transcription factor ERF014 (BVRB_1g000090) were predicted to be strongly associated with the taproot traits of root groove depth (RGD); root shape (RS); crown size (CS); and flesh colour (FC), respectively. 
For the aboveground traits, UDP-glycosyltransferase 79B6 (BVRB_9g223780) and NAC domain-containing protein 7 (BVRB_5g097990); F-box protein At1g10780 (BVRB_6g140760); phosphate transporter PHO1 (BVRB_3g048660); F-box protein CPR1 (BVRB_8g181140); and transcription factor MYB77 (BVRB_2g023500) and alcohol acyltransferase 9 (BVRB_2g023460) might be associated with the hypocotyl colour (HC); plant type (PT); petiole length (PL); cotyledon size (C); and fascicled leaf type (FLT) of sugar beet, respectively. AP-2 complex subunit mu (BVRB_5g106130), trihelix transcription factor ASIL2 (BVRB_2g041790) and late embryogenesis abundant protein 18 (BVRB_5g106150) might be involved in pollen quantity (PQ) variation. The candidate genes extensively participated in hormone response, nitrogen and phosphorus transportation, secondary metabolism, fertilization and embryo maturation. Conclusions The genetic basis of agronomical traits is complicated in heterozygous diploid sugar beet. The putative valuable genes found in this study will help further elucidate the molecular mechanism of each phenotypic trait for beet breeding.
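Candidate genes in this study were sought within the 25 kb linkage disequilibrium decay distance of each associated marker. The sketch below shows one way such a positional lookup could work. It is an illustration under assumptions: the function name, the tuple layout, and all coordinates in the usage example are hypothetical and are not taken from the paper or from the sugar beet annotation.

```python
def genes_near_snp(snp_pos, genes, window=25_000):
    """Names of genes overlapping the interval [snp_pos - window, snp_pos + window].

    `genes` is a list of (name, start, end) tuples on the same chromosome as
    the SNP, with start <= end. The 25 kb default mirrors the LD decay
    distance quoted in the abstract.
    """
    lo, hi = snp_pos - window, snp_pos + window
    # Two intervals overlap when each one starts before the other ends.
    return [name for name, start, end in genes if start <= hi and end >= lo]


# Hypothetical coordinates, for illustration only.
example_genes = [
    ("geneA", 100_000, 105_000),
    ("geneB", 140_000, 150_000),
    ("geneC", 10_000, 20_000),
]
nearby = genes_near_snp(120_000, example_genes)
```

With these made-up coordinates, `nearby` holds the two genes that overlap the window around position 120,000; the third lies more than 25 kb away and is excluded.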

Transcriptomic regulations of heat stress response in the liver of lactating dairy cows

July 2023

·

56 Reads

Background The global dairy industry is currently facing the challenge of heat stress (HS). Despite the implementation of various measures to mitigate the negative impact of HS on milk production, the cellular response of dairy cows to HS is still not well understood. Our study aims to analyze transcriptomic dynamics and functional changes in the liver of cows subjected to HS. To achieve this, a total of 9 Holstein dairy cows were randomly selected from three environmental conditions - heat stress (HS), pair-fed (PF), and thermoneutral (TN) groups - and liver biopsies were obtained for transcriptome analysis. Results Both the dry matter intake (DMI) and milk yield of cows in the HS group were significantly reduced compared to the TN group. Through liver transcriptomic analysis, 483 differentially expressed genes (DEGs) were identified among the three experimental groups. Notably, all mitochondrial protein-coding genes were significantly downregulated under HS and 6 heat shock proteins were significantly upregulated after HS exposure, indicating that HS may affect mitochondrial integrity and jeopardize metabolic homeostasis in the liver. Furthermore, Gene Ontology (GO) enrichment of DEGs revealed that the protein folding pathway was upregulated while oxidative phosphorylation was downregulated in the HS group, corresponding to impaired energy production caused by mitochondrial dysfunction. Conclusions The liver transcriptome analysis generated a comprehensive gene expression regulation network upon HS in lactating dairy cows. Overall, this study provides novel insights into molecular and metabolic changes of cows conditioned under HS. The key genes and pathways identified in this study provide further understanding of transcriptome regulation of the HS response and could serve as vital references for mitigating the effects of HS on dairy cow health and productivity.


Pair-wise comparison for genome-wide PBMC methylome datasets from benign, carcinoma, and normal dogs. A Synopsis of the genome-wide PBMC methylome study. B A Venn diagram shows the number of common and unique DMRs identified in each comparison (FDR-adjusted p-value < 0.1 and |log2FC| ≥ 0.585). C-E The distributions of genomic features in Total bins, Bins_used, and each DMR, shown to identify pronounced regions. ‘Bins_used’ denotes the signal peaks used for DMR analysis, excluding noise bins (both low signal bins and zero CpG bins) from ‘Total bins’. F Volcano plots and 100%-scaled stacked bar plots with the frequency and genomic profile of hypo- and hypermethylated bins. The x-axis is the ‘log2 methylation fold change’, and the y-axis shows the statistical significance. Hypermethylated in ‘N’ is expressed as blue, ‘T’ as purple, ‘B’ as orange, and ‘C’ as red. G Heatmap clustering of ‘N and T with NT_DMR (2840 DMRs)’, ‘N and B with NB_DMR (3373 DMRs)’, ‘N and C with NC_DMR (1876 DMRs)’, ‘B and C with BC_DMR (168 DMRs)’. The clustering distance between samples (columns) followed Pearson’s correlation, and the ‘complete’ method was used
Gene enrichment analysis for DMGs shows differential immune signatures between tumor and normal PBMCs. Immune-related terms significantly enriched in the Gene Ontology (blue box), the MGI Mammalian Phenotype (pink box), the KEGG pathway (yellow box), and the Human Gene Atlas (purple box) are shown. The color of the dots indicates which group is hypermethylated (‘N-hyper’ is expressed as blue, ‘T-hyper’ as purple, ‘B-hyper’ as orange, and ‘C-hyper’ as red). The size of the dots indicates the statistical significance (according to -log10 adjusted p-value). The genes included in each term are listed in Table S7
Immune cell markers involved in normal proliferation and activation of B-cells, T-cells, and NK cells are hypermethylated in tumor PBMCs. A Text clouds show the frequency of words enriched in immune-related terms. The color of the text indicates which group is hypermethylated (‘N-hyper’ is expressed as blue, ‘T-hyper’ as purple, ‘B-hyper’ as orange, and ‘C-hyper’ as red). The same four colors (blue, purple, orange, and red) are used with the same meaning in the following graphs in this figure. B The number of hypermethylated genes included in immune cell type markers is expressed as a percentage (%) of total genes in the corresponding cell type. The number of matched genes is displayed on the top of each bar. The list of marker genes for 11 types of immune cells was downloaded from Panglao DB. C Among genes enriched in significant immune-associated terms, hypermethylated DMGs that inversely correlate with expression are shown. The y-axis of the top bar graph shows the log2 fold change of methylation values, and that of the middle one shows the log2 fold change calculated using TPM values derived from RNA-seq. The y-axis of the bottom one shows the degree of inverse correlation between methylation and expression by Pearson’s correlation. Hypermethylated genes included in Panglao DB and their genomic features are listed in Table S8. D The scatter plots with linear regression (red line) for 4 representative genes among the 49 genes listed in (C). The Pearson correlation coefficient, expression (fold-change), and methylation (fold-change) for every immune-related DMG are also given in Table S9
Targeted CpG methylation and expression analysis in representative hypermethylated genes related to immune cell activation. A Methylation peaks in four gene regions of interest are shown. Pink dumbbells mark the loci for which primers were designed. The DMR in the BACH2 gene is located in the second intron of 6 introns, the DMR in the SH2D1A gene is located in the TTS, the DMR in TXK is located in the CpG shore promoter, and the DMR in UHRF1 is located in the second exon of 17 exons, overlapping a CpG shore. B The methylation validation for 12 CpGs in the BACH2 DMR, 7 CpGs in the SH2D1A and TXK DMRs, and 22 CpGs in the UHRF1 DMR by targeted bisulfite sequencing using the primers listed in Table S10. Methylated CGs are indicated by black circles, and unmethylated CGs by empty (white) circles. C Violin plots show the distribution of methylated CG (%) between groups. The total percentages of methylated CG were calculated as ‘(The number of methylated CG / The number of total CG in the amplified region) * 100 (%)’ in each CG for every sample. D In contrast to the violin plots in (C), box plots show that the expression levels are significantly down-regulated in Benign and Carcinoma PBMCs versus Normal PBMCs. The y-axis shows log2-transformed (TPM + 1) values quantified using RNA-seq
A machine learning-based diagnostic two-step classifier discriminating tumor from normal PBMCs followed by carcinoma from benign PBMCs. A The concept of a two-step classifier for precisely distinguishing three groups (Normal, Benign, and Carcinoma). B Schematic diagram of the diagnostic methylome-based classifier modeling. To generate the best predictive model, tenfold cross-validation with multiple ML algorithms was employed, and then the performance of each model was evaluated. C The ROC curves of the NT classifiers established by SVM_L, SVM_R, RF, GBM, KNN, and logistic regression. AUC values are shown in the bottom-right area under the curves. D Heatmap of the confusion matrix (left) for tumor detection by the SVM_L-based NT classifier, which has the best AUC value (AUC = 1) and accuracy (Accuracy = 1). The confusion matrix for tenfold cross-validation (right) shows the prediction results for seven to nine test samples in each fold. E Validation of the predictive performance of multiple NT classifiers. PBMC MBD-seq data from six dogs with CMT were used as the validation set. Except for the logistic classifier, which incorrectly predicted three out of six, the SVM_L, SVM_R, RF, GBM, and KNN classifiers predicted tumors. F The ROC curves (left) for the BC classifier modeled with 2911 DMRs containing ‘BC_DMR’ and DMRs identified ‘only in NB_DMR’ or ‘only in NC_DMR’. BC classifiers show lower AUC values compared to NT classifiers. The bar graph (right) shows that GBM has the highest accuracy. 127 DMRs extracted by GBM-based feature importance are used for BC classifier re-modeling. This iterative process is illustrated in the center of (B). G The ROC curves of re-modeled BC classifiers using 127 DMRs, which show enhanced performance compared to previous BC classifiers. H The improved performance was confirmed via both a heatmap of the confusion matrix (left) and the tenfold confusion matrix (right) for the final BC classifier (SVM_L) generated using 127 DMRs
The landscape of PBMC methylome in canine mammary tumors reveals the epigenetic regulation of immune marker genes and its potential application in predicting tumor malignancy

July 2023

·

27 Reads

Background: Genome-wide dysregulation of CpG methylation accompanies tumor progression and characteristic states of cancer cells, prompting a rationale for biomarker development. Understanding how the archetypic epigenetic modification determines systemic contributions of immune cell types is the key to further clinical benefits. Results: In this study, we characterized the differential DNA methylome landscapes of peripheral blood mononuclear cells (PBMCs) from 76 canines using methylated CpG-binding domain sequencing (MBD-seq). Through gene set enrichment analysis, we discovered that genes involved in the growth and differentiation of T- and B-cells are highly methylated in tumor PBMCs. We also revealed the increased methylation at single CpG resolution and reversed expression in representative marker genes regulating immune cell proliferation (BACH2, SH2D1A, TXK, UHRF1). Furthermore, we utilized the PBMC methylome to effectively differentiate between benign and malignant tumors and the presence of mammary gland tumors through a machine-learning approach. Conclusions: This research contributes to a better knowledge of the comprehensive epigenetic regulation of circulating immune cells responding to tumors and suggests a new framework for identifying benign and malignant cancers using genome-wide methylome.
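The figure captions for this article state three small calculations explicitly: the DMR cutoff (FDR-adjusted p-value < 0.1 with a log2 fold change of at least 0.585 in either direction, which corresponds to roughly a 1.5-fold change), the percentage of methylated CG in an amplified region, and the log2-transformed (TPM + 1) expression value. A minimal Python sketch of those three formulas follows. The function and parameter names are hypothetical, chosen only for illustration, and this is not the authors' analysis code.

```python
import math


def is_dmr(fdr, log2fc, fdr_cutoff=0.1, lfc_cutoff=0.585):
    """Apply the cutoff quoted in the caption: FDR < 0.1 and |log2FC| >= 0.585.

    The default cutoffs come from the caption; the function name and
    signature are assumptions made for this sketch.
    """
    return fdr < fdr_cutoff and abs(log2fc) >= lfc_cutoff


def methylated_cg_percent(calls):
    """(number of methylated CG / number of total CG) * 100.

    `calls` is a list of booleans, one per CpG in the amplified region
    (True = methylated). This input layout is an assumption.
    """
    if not calls:
        raise ValueError("at least one CpG call is required")
    return 100.0 * sum(calls) / len(calls)


def log2_tpm_plus_one(tpm):
    """log2-transformed (TPM + 1), as used on the expression y-axis."""
    return math.log2(tpm + 1.0)
```

For example, `is_dmr(0.05, -0.6)` flags a hypomethylated region that passes both cutoffs, while `is_dmr(0.05, 0.5)` does not because the fold change is too small.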

Schematic overview of the ensemble pipeline for Borrelia genome reconstruction established in this study. Lab preparation steps are indicated in grey. Data based on PacBio sequencing is shown in dark blue, data based on Illumina sequencing is shown in orange. A combination of PacBio and Illumina data is colored purple. QC and refinement steps are shown in yellow and the steps to generate the final consensus are shown in red
Dot plot examples before (left) and after (right) contig trimming. Wraparound and terminal direct repeats that need to be trimmed are indicated by a black arrow. The remaining part after trimming is indicated by a red box. Dot plot of PBaeII lp54 (contig ctg.s2.000000F of the microbial assembly) untrimmed (A) and trimmed (B). Dot plot of PBaeII lp28-8 (contig ctg.s2.000004F of the microbial assembly) untrimmed (C) and trimmed (D). The region of the vls locus is indicated by a gray filled box. Dot plot of PBaeII cp26 (contig tig00000016 of the HiCanu assembly) untrimmed (E) and trimmed (F). Dot plots were generated using the web-based NCBI-BLASTN [51]
Dot plot of contig ctg.s2.10 (cp26) of the microbial assembly of PBaeII without terminal direct repeats (left) and containing terminal direct repeats after extension (right). In the left panel (A) is the untrimmed contig that does not show terminal direct repeats, in the right panel (B) is the extended contig which contains the overlapping region (1 bp – 2,000 bp overlaps 27,108 bp – 29,108 bp). Therefore, the plasmid can be considered as circular and complete. Dot plots were generated using the web-based NCBI-BLASTN [51]
Schematic visualization of the genome elements of PBaeII, PBes and 89B13. Partitioning genes are shown as colored dots (PFam32: red, PFam49: green, PFam50: yellow, PFam57/62: blue). Intact genes are shown as filled dots, pseudogenes are shown as unfilled points with a cross. Intact genes and pseudogenes were defined using the NCBI annotator PGAP [52]
Assembly results after genome reconstruction of the three representative isolates (PBaeII, PBes and 89B13) for each assembler (microbial, IPA and HiCanu) and overview of the final combined consensus. Completely reconstructed genome elements are colored green; incomplete, missing or probably misassembled genome elements are shown in red. Genome elements used for the final consensus are shown in bold
A high fidelity approach to assembling the complex Borrelia genome

July 2023

·

83 Reads

Background Bacteria of the Borrelia burgdorferi sensu lato (s.l.) complex can cause Lyme borreliosis. Different B. burgdorferi s.l. genospecies vary in their host and vector associations and human pathogenicity, but the genetic basis for these adaptations is unresolved and requires complete and reliable genomes for comparative analyses. The de novo assembly of a complete Borrelia genome is challenging due to its high complexity, represented by a high number of circular and linear plasmids that are dynamic, showing mosaic structure and sequence homology. Previous work demonstrated that even advanced approaches, such as a combination of short-read and long-read data, might lead to incomplete plasmid reconstruction. Here, using recently developed high-fidelity (HiFi) PacBio sequencing, we explored strategies to obtain gap-free, complete and high-quality Borrelia genome assemblies. Optimizing genome assembly, quality control and refinement steps, we critically appraised existing techniques to create a workflow that leads to improved genome reconstruction. Results Despite the latest available technologies, stand-alone sequencing and assembly methods are insufficient for the generation of complete and high-quality Borrelia genome assemblies. We developed a workflow pipeline for the de novo genome assembly of Borrelia using several types of sequence data and incorporating multiple assemblers to recover the complete genome, including both circular and linear plasmid sequences. Conclusion Our study demonstrates that, with HiFi data and an ensemble reconstruction pipeline with refinement steps, chromosomal and plasmid sequences can be fully resolved, even for complex genomes such as Borrelia. The presented pipeline may be of interest for the assembly of further complex microbial genomes.
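The figure captions for this article describe trimming terminal direct repeats from contigs and treating a plasmid as circular when the start of a contig overlaps its end. The sketch below illustrates that idea in Python with exact string matching. This is a simplifying assumption: the captions state that the dot plots were generated with web-based NCBI-BLASTN, which tolerates mismatches, and the function names and the `min_len` parameter here are hypothetical.

```python
def terminal_overlap(seq, min_len=20):
    """Length of the longest prefix of `seq` that is also its suffix.

    Only overlaps between `min_len` and half the contig length are
    considered, and matching is exact (an assumption of this sketch).
    Returns 0 when no such terminal direct repeat exists.
    """
    for k in range(len(seq) // 2, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def trim_terminal_repeat(seq, min_len=20):
    """Drop the repeated copy at the 3' end so the circle is stored once."""
    k = terminal_overlap(seq, min_len)
    return seq[:len(seq) - k] if k else seq


# Toy contig: a 10 bp head, 30 bp of filler, then the head repeated.
head = "ACGGTCATTG"
contig = head + "T" * 30 + head
trimmed = trim_terminal_repeat(contig, min_len=5)
```

A non-zero overlap is only a hint of circularity in this sketch; real assemblies would also need to account for sequencing errors and internal repeats.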

Phylogenetic relationship of putative SRLK genes from Pyrus communis (Pycom), Erigeron breviscapus (Eb), and B. oleracea (Bol). The unrooted phylogenetic tree was constructed using the MEGA 7 software through the neighbor-joining (NJ) method with 1000 bootstrap replicates. The bootstrap values are shown near the nodes, and only those values greater than 50 are displayed. The seven groups are indicated with camber lines
Phylogenetic relationships, protein sequence identities and gene structure of putative SRLKs in Erigeron breviscapus. a Neighbor-joining (NJ) phylogenetic unrooted tree for SRLKs in Erigeron breviscapus. Different colors represent the original grouping relationships between SRLKs from the three species in Fig. 1. b Protein sequence identities of SRLKs in Erigeron breviscapus. Heat map represents the protein sequence identities of SRLKs analyzed by ClustalW. The colored bar indicates 20–100% protein sequence identity. c Exon–intron structures of 52 SRLKs identified in Erigeron breviscapus. Due to the large difference in length between genes, three different scales are used to show the length of genes. Exons are presented by green, yellow and dark blue boxes in the three scales, and introns are represented by black, light blue and red lines in the three scales
The distribution and inter-chromosomal correlation of the SRLK gene family in Erigeron breviscapus. The arc segments represent the nine chromosomes of Erigeron breviscapus. The red curve in each arc segment indicates the gene number per 1 Mb. The scale outside each chromosome represents the physical position and length of the chromosome (Mb). The outermost symbols indicate tandem duplication genes, with same-shaped symbols representing one tandem duplication group. Triangles, squares and solid circles represent multiple tandem duplication gene groups on the same chromosome. The inner lines represent syntenic blocks detected in the genome; the blocks that harbor segmental duplication SRLKs are highlighted by blue lines, and the relationships between segmental duplication SRLKs are shown with red lines
Expression profiles of SRLKs in different tissues and at different flowering stages of Erigeron breviscapus. a The expression patterns of 52 EbSRLKs in different tissues and at different flowering stages. R, root; S, stem; L, leaf; P, peduncle; To, tongue flower; Tu, tubular flower; Fs, flower stage; Self, self-pollinated; Cross, cross-pollinated. The expression levels of genes are presented using FPKM fold-change values transformed to Log2 format. The data of 52 EbSRLKs were extracted from our RNA-Seq data. The SRLKs with high expression levels are highlighted with red stars. The SRLKs with higher expression levels in the self-pollinated sample than in the cross-pollinated sample are highlighted with green stars. b-d The organs and flower developmental stages from which samples were collected. b Plant organs of Erigeron breviscapus. c Tongue flowers and tubular flowers. d Six flowering stages. Bar = 1 cm in each panel
Subcellular localization of EbSRLK23 and EbSRLK43. a Subcellular localization in Arabidopsis thaliana protoplasts. The recombinant vectors 35S::EbSRLK23:GFP and 35S::EbSRLK43:GFP and the vector control 35S::GFP were transfected into Arabidopsis protoplast cells individually. The fluorescence was observed under a laser scanning confocal microscope. Bar = 10 μm. b Subcellular localization of EbSRLK23 and EbSRLK43 proteins in tobacco leaf epidermal cells. Fluorescence images were obtained by confocal microscopy. Merge means the GFP image merged with its bright-field photograph of the same cell. Bar = 20 μm
Genome-wide identification and characterization of SRLK gene family reveal their roles in self-incompatibility of Erigeron breviscapus

July 2023

·

23 Reads

Self-incompatibility (SI) is a reproductive protection mechanism that plants acquired during evolution to prevent self-recession. As the female determinant of SI specificity, SRK has been shown to be the only recognized gene on the stigma and plays important roles in the SI response. Asteraceae is the largest family of dicotyledonous plants, many of which exhibit self-incompatibility. However, systematic studies on the SRK gene family in Asteraceae are still limited due to the lack of high-quality genomic data. In this study, we performed the first systematic genome-wide identification of S-locus receptor like kinases (SRLKs) in the self-incompatible Asteraceae species Erigeron breviscapus, which is also a widely used perennial medicinal plant endemic to China. A total of 52 SRLK genes were identified in the E. breviscapus genome. Structural analysis revealed that the EbSRLK proteins in E. breviscapus are conserved. SRLK proteins from E. breviscapus and other SI plants are clustered into 7 clades, and the majority of the EbSRLK proteins are distributed in Clade I. Chromosomal and duplication analyses indicate that 65% of the EbSRLK genes belong to tandem repeats and can be divided into six tandem gene clusters. Gene expression patterns obtained from E. breviscapus multiple-tissue RNA-Seq data revealed differential temporal and spatial features of EbSRLK genes. Among these, two EbSRLK genes with high expression levels in tongue flowers were cloned. Subcellular localization assays demonstrated that both of their fused proteins are localized on the plasma membrane. All these results indicate that EbSRLK genes are possibly involved in the SI response in E. breviscapus. This comprehensive genome-wide study of the SRLK gene family in E. breviscapus provides valuable information for understanding the mechanism of SSI in Asteraceae.