Article

Database resources of the National Center for Biotechnology Information

... We also assessed the impact of reference decontamination via Conterminator (34) and Recentrifuge (35) platforms and the impact of short reference sequence removal. For broad taxonomic coverage, we constructed databases from the NCBI BLAST Nucleotide (nt) database (36), the most comprehensive database for nucleotide BLAST search (37). We selected the Centrifuge classifier platform for this comparison due to its classification speed and optimized indexing scheme. ...
... The NCBI BLAST nt database encompasses nearly all traditional GenBank divisions, representing a significant portion of available GenBank sequences and spanning all domains of life (36,38). Therefore, the growth rate of nt follows that of GenBank. ...
... This example is particularly relevant as Kong et al. (47) reported P. yoelii as a distinguishing signature between Huntington's disease mice and wild-type mice, highlighting how strongly results can differ when working with a properly decontaminated database versus a standard one. A typical configuration of a pipeline for taxonomic classification in metagenomics using the NCBI BLAST nt database (36,38). Data (and metadata, as available) ...
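The excerpts above describe assessing reference decontamination (via Conterminator and Recentrifuge) and removing short reference sequences before building Centrifuge indexes. The Python sketch below illustrates only that generic filtering step. It assumes a decontamination tool has already produced a set of flagged accessions. The function names, the length threshold, and the toy records are hypothetical and do not reflect the cited pipeline's actual parameters.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (accession, sequence) pairs from FASTA-formatted lines."""
    accession, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if accession is not None:
                yield accession, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            accession, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if accession is not None:
        yield accession, "".join(chunks)


def filter_references(
    records: Iterable[Tuple[str, str]],
    min_length: int,
    flagged: Set[str],
) -> Dict[str, str]:
    """Keep records that are long enough and not flagged as contaminated."""
    return {
        acc: seq
        for acc, seq in records
        if acc not in flagged and len(seq) >= min_length
    }


# Hypothetical toy input; min_length=10 is an illustrative threshold only.
toy_fasta = [">seqA example", "ACGTACGT", "ACGT", ">seqB", "ACG", ">seqC", "ACGTACGTACGT"]
kept = filter_references(parse_fasta(toy_fasta), min_length=10, flagged={"seqC"})
print(kept)  # {'seqA': 'ACGTACGTACGT'}
```

In a real build, the flagged set would come from the decontamination tool's report. The minimum length is a tunable choice that trades index size against the risk of spurious short-reference matches.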
Article
Accurate metagenomic classification relies on comprehensive, up-to-date, and validated reference databases. The NCBI BLAST Nucleotide (nt) database, encompassing a vast collection of sequences from all domains of life, is an invaluable resource. However, its massive size (currently exceeding 10¹² nucleotides) and exponential growth pose significant challenges for researchers seeking to maintain current nt-based indices for metagenomic classification. Recognizing that no up-to-date nt-based indices exist for the widely used Centrifuge classifier, and that the last publicly available version was released in 2018, we addressed this critical gap by leveraging advanced high-performance computing resources. We present new Centrifuge-compatible nt databases, constructed using a novel pipeline that incorporates different quality control measures, including reference decontamination and filtering. These measures demonstrably reduce spurious classifications. Our reanalysis of published metagenomic data shows that Plasmodium annotations were dramatically reduced when using our decontaminated database, highlighting how database quality can significantly impact research conclusions. Through temporal comparisons, we also reveal how our approach minimizes inconsistencies in taxonomic assignments stemming from asynchronous updates between public sequence and taxonomy databases. These discrepancies are particularly evident in taxa such as Listeria monocytogenes and Naegleria fowleri, where classification accuracy varied significantly across database versions. These new databases, made available as pre-built Centrifuge indexes, respond to the need for an open, robust, nt-based pipeline for taxonomic classification in metagenomics. Applications that require comprehensive taxonomic coverage, such as environmental metagenomics, forensics, and clinical metagenomics, will benefit from this resource.
Our work highlights the importance of treating reference databases as dynamic entities, subject to ongoing quality control and validation akin to software development best practices. This approach is crucial for ensuring the accuracy and reliability of metagenomic analysis, especially as databases continue to expand in size and complexity.
IMPORTANCE
Accurately identifying the diverse microbes present in a sample, whether from the human gut, a soil sample, or a crime scene, is crucial for fields ranging from medicine to environmental science. Researchers rely on comprehensive DNA databases to match sequenced DNA fragments to known microbial species. However, the widely used NCBI nt database, while vast, poses significant challenges. Its massive size makes it difficult for many researchers to use effectively with taxonomic classifiers, and inconsistencies and contamination within the database can impact the accuracy of microbial identification. This work addresses these challenges by providing cleaned, updated, and validated nt-based databases specifically optimized for the widely used Centrifuge classification tool. This new resource demonstrably reduces errors and improves the reliability of microbial identification across diverse taxonomic groups. Moreover, by providing ready-to-use indexes, we overcome the size barrier, enabling researchers to leverage the full potential of the nt database for metagenomic analysis. Our findings underscore the need to treat reference databases as dynamic entities, emphasizing continuous quality control and versioning as essential practices for robust and reproducible metagenomics research.
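The abstract attributes some classification inconsistencies to asynchronous updates between the sequence and taxonomy databases. One simple check is to flag sequences whose mapped taxonomy ID is absent from the taxonomy snapshot used to build the index. The Python sketch below makes two assumptions:

- taxonomy lines follow the NCBI taxdump `nodes.dmp` layout, with fields separated by `\t|\t`;
- the sequence-to-taxid table is a hypothetical two-column, tab-separated file.

The toy records are illustrative only and are not the authors' implementation.

```python
from typing import Dict, Iterable, Set


def load_taxids(nodes_lines: Iterable[str]) -> Set[int]:
    """Collect tax IDs (first field) from taxdump-style nodes.dmp lines."""
    taxids = set()
    for line in nodes_lines:
        first = line.split("\t|\t", 1)[0].strip()
        if first:
            taxids.add(int(first))
    return taxids


def find_orphan_mappings(mapping_lines: Iterable[str], taxids: Set[int]) -> Dict[str, int]:
    """Return sequence IDs whose mapped tax ID is missing from the taxonomy."""
    orphans = {}
    for line in mapping_lines:
        if not line.strip():
            continue
        seq_id, taxid = line.rstrip("\n").split("\t")[:2]
        if int(taxid) not in taxids:
            orphans[seq_id] = int(taxid)
    return orphans


# Toy snapshot: the taxonomy lacks tax ID 999999, which seq2 references.
nodes = ["1\t|\t1\t|\tno rank\t|", "1639\t|\t1637\t|\tspecies\t|"]
mapping = ["seq1\t1639", "seq2\t999999"]
orphans = find_orphan_mappings(mapping, load_taxids(nodes))
print(orphans)  # {'seq2': 999999}
```

Whether flagged sequences should be dropped, remapped to a merged tax ID, or merely reported is a pipeline design decision that this sketch leaves open.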
... (accessed on 9 July 2023) [40]. Additionally, the sequences identified in the Hb-P and P-Cru hydrolysates were compared, using the Basic Local Alignment Search Tool (BLAST) [41], with sequences previously reported in the literature to have proven antibacterial activity against Listeria spp. ...
... Similarities were observed in the presence of potential antimicrobial peptides reported by Sanchez-Reinoso et al. [28], as well as in the presence of peptides with possible antilisterial activity. In the case of the hemoglobin alpha chain, the peptides α(1-28), α(1-29), α(33-46), α(34-46), α(37-46), and α(137-141) derived from Hb-P or P-Cru are analogous (>80% sequence identity) to the peptides α(1-28), α(1-29), α(33-46), α(34-46), α(37-46), and α(137-141) derived from bovine hemoglobin (Hb-B). The antimicrobial activity of the latter peptides against L. innocua was demonstrated by Nedjar-Arroume et al. [47]. ...
Article
Listeria monocytogenes is a foodborne pathogen that represents a serious concern for ready-to-eat (RTE) meat products due to its persistence in production facilities. Among the different strategies for controlling this pathogen, the use of antimicrobial peptides derived from food by-products, such as slaughterhouse blood proteins, has emerged as a promising biocontrol strategy. This study evaluated, for the first time, the use of peptic hydrolysates of porcine hemoglobin as a biocontrol strategy against L. monocytogenes in RTE pork cooked ham. Pure porcine hemoglobin (Hb-P) and porcine cruor (P-Cru) were hydrolyzed with pepsin for 3 h at different temperatures (37 °C for Hb-P and 23 °C for P-Cru). The hydrolysates were then characterized in terms of their degree of hydrolysis (DH), peptide population, color, and antimicrobial activity (in vitro and in situ) against three different serotypes of L. monocytogenes. Reducing the hydrolysis temperature of P-Cru by 14 °C resulted in a 2 percentage-unit decrease in DH and some differences in peptide composition. Nevertheless, the in situ antimicrobial activity was not significantly affected: the hydrolysates decreased the viable count of L. monocytogenes by ~1 log and retarded its growth for 21 days at 4 °C. Although the color of the product was visibly altered, with more saturated reddish and yellowish tones and reduced brightness, the discoloration of the hydrolysates can be addressed. This biopreservation approach holds promise for other meat products and contributes to the circular economy of the meat industry by valorizing slaughterhouse blood and producing new antilisterial compounds.
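The abstract reports two quantities: the in situ effect as a ~1-log decrease in viable counts, and the temperature effect as a 2 percentage-unit change in degree of hydrolysis (DH). The sketch below spells out both using their standard definitions:

- log reduction = log10(N0) − log10(N);
- DH (%) = 100 · h / h_tot, i.e., cleaved peptide bonds over total peptide bonds.

All counts and DH values are hypothetical, not data from the study.

```python
import math


def log_reduction(control_cfu: float, treated_cfu: float) -> float:
    """log10 reduction of viable counts (e.g. CFU/g) relative to a control."""
    if control_cfu <= 0 or treated_cfu <= 0:
        raise ValueError("viable counts must be positive")
    return math.log10(control_cfu) - math.log10(treated_cfu)


def degree_of_hydrolysis(h: float, h_tot: float) -> float:
    """DH in percent: cleaved peptide bonds (h) over total peptide bonds (h_tot)."""
    return 100.0 * h / h_tot


# Hypothetical counts: 10^6 vs 10^5 CFU/g is a 1-log reduction.
print(round(log_reduction(1e6, 1e5), 3))  # 1.0

# Hypothetical DH values: 12% vs 10% differ by 2 percentage units,
# which is about a 16.7% decrease in relative terms.
dh_high = degree_of_hydrolysis(1.2, 10.0)
dh_low = degree_of_hydrolysis(1.0, 10.0)
print(round(dh_high - dh_low, 3), round(100 * (dh_high - dh_low) / dh_high, 1))  # 2.0 16.7
```

The second print shows why "percentage units" is the right phrasing: the absolute DH difference and the relative change are different numbers.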
... More specifically, we collected DTI information from the DrugBank database (v4.3) [19], TTD [20], and PharmGKB [21]. For each drug, its chemical structure was extracted from DrugBank [19] as simplified molecular-input line-entry system (SMILES) strings [26], while each protein was mapped to its Entrez ID through NCBI [40]. DDIs were obtained from several public resources, including repoDB [41], DrugBank (v4.3) ...
... [19] and DrugCentral databases [42] by fusing drug indications. We standardized disease names using Medical Subject Headings (MeSH) and Unified Medical Language System (UMLS) vocabularies [43], and then mapped them to MedGen ID based on NCBI [40]. ...
Article
Novel drug discovery and repositioning remain critical challenges in biomedical research, requiring accurate prediction of drug–target interactions (DTIs). We propose the CPDP framework, which builds upon existing biomedical representation models and integrates contrastive learning with multi-dimensional representations of proteins and drugs to predict DTIs. By aligning the representation space, CPDP enables GNN-based methods to achieve zero-shot learning capabilities, allowing for accurate predictions of unseen drug data. This approach enhances DTI prediction performance, particularly for novel drugs not included in the BioHNs dataset. Experimental results demonstrate CPDP’s high accuracy and strong generalization ability in predicting novel biological entities while maintaining effectiveness for traditional drug repositioning tasks.
... The Burrows-Wheeler Alignment Tool was used to map the clean reads back to the non-redundant gene set, enabling the evaluation of sequence counts for each non-redundant gene in each sample and subsequent gene quantification [52]. Functional annotation of the non-redundant gene set was performed using databases such as NR, EggNOG, KEGG, CAZy, CARD, VFDB, and Phi [53][54][55][56][57][58][59]. Community structure analysis was conducted using Kraken based on the RefSeq database [60,61]. ...
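The excerpt above describes mapping clean reads back to a non-redundant gene set with the Burrows-Wheeler Alignment tool (BWA) and then counting reads per gene. Assuming the alignments are available as SAM text, a minimal Python sketch of the counting step could look like this. It counts only primary, mapped records, skipping FLAG bits 0x4, 0x100, and 0x800. It also treats each mate of a read pair as a separate record, which is a simplifying assumption rather than the cited study's procedure.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits (SAM specification): unmapped, secondary, supplementary.
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800
SKIP = UNMAPPED | SECONDARY | SUPPLEMENTARY


def count_reads_per_gene(sam_lines: Iterable[str]) -> Counter:
    """Count primary mapped alignments per reference sequence (gene)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & SKIP or rname == "*":
            continue
        counts[rname] += 1
    return counts


toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tgeneA\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tgeneA\t5\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t256\tgeneB\t1\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r5\t0\tgeneB\t9\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
print(dict(count_reads_per_gene(toy_sam)))  # {'geneA': 2, 'geneB': 1}
```

Raw counts like these are usually normalized, for example by gene length and sequencing depth, before comparing samples. Normalization is outside this sketch.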
Article
The Chinese giant salamander (CGS, Andrias davidianus), a flagship amphibian species, is highly vulnerable to high temperatures, posing a significant threat under future climate change. Previous research linked this susceptibility to liver energy deficiency, accompanied by shifts in gut microbiota and reduced food conversion rates, raising questions about the role of the gut-liver axis in mediating heat sensitivity. This study investigated the responses of Chinese giant salamander larvae to a temperature gradient (10–30 °C), assessing physiological changes alongside histological, gut metagenomic, and tissue transcriptomic analyses. Temperatures above 20 °C led to mortality, which resulted in delayed growth. Histological and transcriptomic data revealed metabolic exhaustion and liver fibrosis in heat-stressed salamanders, underscoring the liver’s critical role in heat sensitivity. While heat stress altered the gut microbiota’s community structure, their functional profiles, especially in nutrient absorption and transformation, remained stable. Both gut and liver showed temperature-dependent transcriptional changes, sharing some common variations in actins, heat shock proteins, and genes related to transcription and translation. However, their energy metabolism exhibited opposite trends: it was downregulated in the liver but upregulated in the gut, with the gut showing increased activity in the pentose phosphate pathway and oxidative phosphorylation, potentially countering metabolic exhaustion. Our findings reveal that the liver of the larvae exhibits greater thermal sensitivity than the gut, and the gut-liver axis plays a limited role in mediating thermal intolerance. This study enhances mechanistic understanding of CGS heat susceptibility, providing a foundation for targeted conservation strategies in the face of climate change.
... We used two datasets of human variants: (1) the functional set, consisting of 3,818, 1,584, and 1,777 variants (single-nucleotide polymorphisms, SNPs) from 2,056 human proteins, classified as knock-out, effect, and neutral, respectively, based on the effect of these variants on protein function (Bromberg et al., 2024; Kawabata et al., 1999); and (2) the pathogenic set, comprising 2,499 pathogenic and 4,804 likely-pathogenic variants from ClinVar, combined with 1,887 common (minor allele frequency, MAF ≥ 0.01) and 3,073 rare (0.001 ≤ MAF < 0.01) variants from 1,430 human proteins (Bromberg et al., 2024; Coordinators, 2013; Landrum et al., 2014). For each variant, we extracted the log-likelihood scores (Bromberg et al., 2024) from the pLMs ESM2 (650M), ESM1v, and ProtT5 (Table 1). ...
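The pathogenic set splits population variants by minor allele frequency (MAF) into common (MAF ≥ 0.01) and rare (0.001 ≤ MAF < 0.01). A small Python sketch of that binning rule follows. Treating variants with MAF below 0.001 as excluded is our reading of the stated thresholds, not something the excerpt specifies.

```python
from typing import Optional


def maf_class(maf: float) -> Optional[str]:
    """Bin a minor allele frequency using the thresholds quoted in the excerpt."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency lies in [0, 0.5]")
    if maf >= 0.01:
        return "common"
    if maf >= 0.001:
        return "rare"
    return None  # below 0.001: outside both bins (assumed excluded)


for maf in (0.2, 0.01, 0.004, 0.001, 0.0002):
    print(maf, maf_class(maf))
```

Note that both lower bounds are inclusive, so a variant at exactly MAF = 0.01 is common and one at exactly 0.001 is rare.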
Preprint
Embeddings derived from language models are widely used as numeric proxies for human-language sentences and structured data. In the realm of biomolecules, embeddings serve as efficient sequence and/or structure representations, enabling similarity searches, structure and function prediction, and estimation of biophysical and biological properties. However, relying on embeddings without assessing the model's confidence in its ability to accurately represent molecular properties is a critical flaw, akin to using a scalpel in surgery without verifying its sharpness. In this study, we propose a means to evaluate the ability of protein language models to represent proteins, assessing their capacity to encode biologically relevant information. Our findings reveal that low-quality embeddings often fail to capture meaningful biology, displaying vector properties indistinguishable from those of randomly generated sequences. A key contributor to this performance issue is the models' failure to learn the underlying biology from unevenly distributed sequence spaces in the training data. Our novel, model-agnostic scoring framework is, to the best of our knowledge, the first to quantify protein sequence embedding reliability. We believe that our robust approach to screening embeddings before making biological inferences stands to significantly enhance the reliability of downstream applications.
... PubChem is a database of chemical molecules. It is maintained by the National Center for Biotechnology Information (NCBI), which is a component of the National Library of Medicine [9]. The internal energies of all ligands were optimized using the Swiss PDB [10], ChemDraw Professional [11], and Chem3D Pro program packages. ...
Technical Report
Breast cancer susceptibility gene 1 (BRCA1) is a pivotal tumor suppressor gene involved in DNA repair, transcriptional regulation, and cell cycle control. Mutations or dysregulation of BRCA1 are strongly associated with breast and ovarian cancers. This study employs in silico drug design techniques to identify potential bioactive compounds targeting BRCA1, aiming to restore or enhance its functionality and thereby combat cancer progression. A library of bioactive compounds was screened using molecular docking and virtual screening to identify high-affinity binders to key functional domains of BRCA1, particularly its RING and BRCT domains. Computational techniques were employed to optimize the lead compounds for efficacy and safety. These included molecular dynamics simulations, ADMET predictions (absorption, distribution, metabolism, excretion, and toxicity), and pharmacophore modeling. Several candidate compounds demonstrated promising binding affinities, with favorable interaction profiles and stable conformations within BRCA1 active sites. ADMET analysis highlighted drug-like properties, supporting their potential for further development. This in silico approach identifies bioactive compounds with potential for BRCA1 modulation, providing a target-based foundation for the future development of BRCA1-related therapies. Future studies will prioritize in vitro and in vivo experiments to validate these compounds for therapeutic use. Such validation is critical in drug discovery for establishing the efficacy and toxicity of compounds before proceeding to clinical trials.
... The coding sequence (CDS), protein, and DNA sequences were retrieved from the National Center for Biotechnology Information (NCBI), https://www.ncbi.nlm.nih.gov/ [12]. The data for all species are shown in Table 1, together with their respective accession IDs, protein IDs, and the databases from which they were downloaded. ...
Article
Heat Shock Protein Beta-1 (HSPB1), a molecular chaperone crucial for cellular response and proteostasis, exhibits evolutionary conservation with potential lineage-specific adaptations in placental mammals, warranting detailed comparative genomic investigation. The study investigated the characteristics, evolutionary relationships, motifs, secondary structure, and genetic organization of the HSPB1 protein across twelve distinct mammals. Multiple sequence alignments (MSA) revealed significant sequence conservation, with over 70% identity in specific regions among the represented species. Physicochemical analysis revealed that the protein sequences of all species are acidic, while instability indices indicated inherent protein instability. The GRAVY analysis indicated hydrophilic properties, while the aliphatic index suggested thermal stability. Phylogenetic analysis revealed five distinct clades corresponding to major placental mammal groups (e.g., those containing Homo sapiens and Bos taurus). This underscores deep evolutionary divergences and conserved stress-response adaptations across lineages. Motif analysis revealed distinctive patterns in several species, and InterProScan results revealed membership in the "HSP20-like chaperone" homologous superfamily. An examination of the genetic organization indicated differences among all the represented species in the upstream, downstream, intron, and CDS regions, while identity and similarity matrices revealed the presence of conserved regions. Using a computational approach, the study provides supporting evidence that HSPB1 is a novel heat-shock-responsive protein identified in placental mammals. These findings provide a foundational framework for investigating HSPB1 evolution and lineage-specific diversification. They offer valuable insights into stress adaptation mechanisms and their implications for biomedical and evolutionary studies in mammals.
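The abstract interprets GRAVY and the aliphatic index without stating how they are computed. The Python sketch below uses the standard definitions, as also used by tools such as ExPASy ProtParam:

- GRAVY is the mean Kyte-Doolittle hydropathy over all residues; negative values indicate hydrophilicity.
- The aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue.

The instability index is omitted because it needs the full dipeptide weight table. The toy sequences are illustrative and are not HSPB1.

```python
# Kyte-Doolittle (1982) hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy; non-standard letters raise KeyError."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Ikai (1980) aliphatic index from mole percents of A, V, I and L."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


print(round(gravy("AAAA"), 2), round(aliphatic_index("AVIL"), 1))  # 1.8 292.5
```

A higher aliphatic index reflects a larger share of aliphatic side chains, which is why it is commonly read as an indicator of thermostability.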
... The CDS, protein, and DNA sequences were retrieved from the National Center for Biotechnology Information (NCBI), https://www.ncbi.nlm.nih.gov/ [12]. The data for all organisms are shown in Table 1, together with their respective accession IDs, protein IDs, and the databases from which they were downloaded. ...
Preprint
Heat Shock Protein Beta-1 (HSPB1), a molecular chaperone crucial for cellular response and proteostasis, exhibits evolutionary conservation with potential lineage-specific adaptations in placental mammals, warranting detailed comparative genomic investigation. The study investigated the characteristics, evolutionary relationships, motifs, secondary structure, and genetic organization of the Heat Shock Protein Beta-1 (HSPB1) protein across twelve distinct mammals. Multiple sequence alignments (MSA) revealed significant sequence conservation, with over 70% identity in specific regions among the chosen organisms. Physicochemical analysis revealed that the protein sequences of all species are acidic, while instability indices indicated inherent protein instability. The GRAVY analysis indicated hydrophilic properties, while the aliphatic index suggested thermal stability. Phylogenetic analysis revealed five distinct clades corresponding to major placental mammal groups (e.g., those containing Homo sapiens and Bos taurus). This underscores deep evolutionary divergences and conserved stress-response adaptations across lineages. Motif analysis revealed distinctive patterns in several species, and InterProScan results revealed membership in the "HSP20-like chaperone" homologous superfamily. An examination of the genetic organization indicated differences among organisms in the upstream, downstream, intron, and CDS regions, while identity and similarity matrices revealed the presence of conserved regions. Using a computational approach, the study provides supporting evidence that HSPB1 is a novel heat-shock-responsive protein identified in placental mammals. These findings provide a foundational framework for investigating HSPB1 evolution and lineage-specific diversification. They offer valuable insights into stress adaptation mechanisms and their implications for biomedical and evolutionary studies in mammals.
... To further screen targets specifically associated with DMED, target genes associated with type 2 diabetes and ED were screened using the HERB database and annotated using the Comparative Toxicogenomics Database (CTD; association score > 20; https://ctdbase.org/) [21]. HomoloGene annotates homologous gene relationships between humans and rats [22]. Furthermore, the Venny tool (https://bioinfogp.cnb.csic.es/tools/venny/index.html) was used to identify, in a Venn diagram, the overlapping targets of SMBJ active compounds related to both ED and type 2 diabetes. These shared targets may represent the possible targets of SMBJ in DMED. ...
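The excerpt intersects SMBJ compound targets with ED- and type 2 diabetes-associated genes via a Venn diagram (Venny). The same overlap logic can be reproduced programmatically. The Python sketch below assigns each gene to the exact combination of sets that contain it, which is what each Venn region represents. The set names and gene labels are hypothetical placeholders, not the study's targets.

```python
from typing import Dict, FrozenSet, Set


def venn_regions(sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Map each combination of set names to the items found in exactly those sets."""
    regions: Dict[FrozenSet[str], Set[str]] = {}
    for item in set().union(*sets.values()):
        members = frozenset(name for name, s in sets.items() if item in s)
        regions.setdefault(members, set()).add(item)
    return regions


# Hypothetical placeholder gene sets.
targets = {
    "SMBJ": {"G1", "G2", "G3", "G5"},
    "ED": {"G1", "G2", "G4"},
    "T2D": {"G1", "G3", "G4"},
}
regions = venn_regions(targets)
shared = regions.get(frozenset(targets), set())  # items present in all three sets
print(sorted(shared))  # ['G1']
```

Keying regions by a frozenset of set names keeps the result independent of how many sets are compared, so the same function works for two-, three-, or four-way diagrams.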
Article
Objective Simiao Biejia (SMBJ) granules, a traditional Chinese herbal remedy, have been used to treat erectile dysfunction caused by diabetes mellitus (DMED). However, the molecular mechanisms underlying SMBJ's therapeutic effects remain unclear. This study aimed to investigate the effects and mechanisms of SMBJ in a rat model of DMED using network pharmacology, proteomics, and molecular docking. Methods A rat model of DMED was established, and SMBJ granules were administered at 0, 7.1, 14.2, or 28.4 mg/kg/d for 4 weeks. Erectile function was evaluated by measuring intracavernous pressure and mean arterial pressure. The active compounds in SMBJ were analyzed by gas chromatography and identified using network pharmacology and bioinformatics. Potential targets in the penile tissue were identified via proteomics and validated by Western blotting. Molecular docking was used to assess the binding affinity between bioactive compounds and primary targets. Results SMBJ significantly improved erectile function and ameliorated DMED in rats through several effects:

- reducing corpus cavernosum fibrosis;
- decreasing eNOS and nNOS levels;
- alleviating oxidative stress in penile tissue;
- mitigating damage to smooth muscle cells (SMCs) and vascular endothelial cells (VECs).

Network pharmacology and proteomics identified 24 potential SMBJ targets in DMED. The four identified drug molecules were involved in the therapeutic effects of SMBJ, and luteolin was predicted to be the core drug component. Luteolin bound directly to AKT1, a key differentially expressed protein in the penile tissue of DMED rats. Further analysis showed that luteolin in SMBJ improves erectile function by activating the PI3K/Akt pathway and regulating nNOS and NF-κB expression in the penile tissue of DMED rats. Conclusion SMBJ alleviated oxidative stress damage and improved vascular endothelial repair and angiogenesis in the penile tissue of DMED rats.
Luteolin is one of the core drug components of SMBJ in DMED treatment that regulates PI3K/AKT-related pathways.
... The PCGs, open reading frames (ORFs), rRNAs, tRNAs, and introns in the three Ramaria mitogenomes were annotated using the MFannot (Valach et al. 2014) and MITOS (Bernt et al. 2013) tools, both run with mitochondrial genetic code 4. ORFs longer than 100 amino acids (Coordinators NR 2017) were further refined or annotated using the NCBI Open Reading Frame Finder (Bleasby and Wootton 1990). Subsequently, BLASTP searches were conducted against the NCBI non-redundant protein database. ...
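The excerpt notes that the mitogenomes were annotated with mitochondrial genetic code 4, under which TGA encodes tryptophan rather than a stop. It also notes that ORFs longer than 100 amino acids were refined with NCBI ORFfinder. The minimal Python sketch below illustrates why the code table matters for ORF calling. It rests on simplifying assumptions: it scans only the three forward frames, uses ATG as the sole start codon, and applies the > 100 aa cutoff. ORFfinder itself also scans the reverse strand and can use alternative start codons.

```python
from typing import FrozenSet, List, Tuple

# NCBI genetic code 4: TGA is read as Trp, so only TAA and TAG terminate.
STOPS_CODE4 = frozenset({"TAA", "TAG"})
STOPS_STANDARD = frozenset({"TAA", "TAG", "TGA"})


def find_orfs(seq: str, stops: FrozenSet[str] = STOPS_CODE4,
              min_aa: int = 100) -> List[Tuple[int, int, int]]:
    """Return (start, end, frame) of forward-strand ATG..stop ORFs with > min_aa codons."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in stops:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop codon before the end of the sequence
            if (j - i) // 3 > min_aa:  # codons from ATG up to, not including, the stop
                orfs.append((i, j + 3, frame))
            i = j + 3
    return orfs


toy = "ATG" + "TGA" * 100 + "TAA"  # 101 sense codons under code 4
print(find_orfs(toy))                        # [(0, 306, 0)]
print(find_orfs(toy, stops=STOPS_STANDARD))  # []
```

The same toy sequence yields a 101-codon ORF under code 4 but none under the standard code, because every TGA becomes a premature stop there.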
Article
Ramaria has been a remarkable genus throughout the history of macrofungi. However, information on this genus remains limited. This study sequenced the mitochondrial genomes (mitogenomes) of three Ramaria species and analyzed the resulting genetic information in detail. The circular mitogenomes of Ramaria brunnecliacina, R. ichnusensis, and R. flavescens were 78,960, 61,851, and 81,282 bp in size, respectively. The genomes exhibited variations in genetic content, gene length, tRNAs, and codon usage. Ramaria mitogenomes showed variable evolutionary rates across several protein-coding genes. The results revealed significant gene rearrangements in Ramaria mitogenomes, including gene displacement and tRNA duplication. We applied Bayesian inference and maximum likelihood methods to a comprehensive set of conserved mitochondrial proteins and generated a well-supported phylogenetic tree for Basidiomycota. This analysis revealed that R. brunneciacina and R. flavescens are closely related. It also confirmed the paraphyletic nature of the genus Ramaria and its genetic affinity with other species of the subclass Phallomycetidae. This study provides a basic framework for understanding the evolutionary dynamics, genetic makeup, and taxonomic classification of this significant fungal group.
... The structural deviations of different adsorbates and CO2 on the flat Cu2O (100) and hexagonal Cu2O (111) surfaces were determined, and the changes on the (100) and (111) surfaces were also discussed. The standard molecular dimensions were confirmed through different databases, including Materials Project, PubChem, and the Crystallography Open Database (COD) [56][57][58]. Structural deviations indicate the catalytic activity of the catalyst interacting with different adsorbates. The Cu-O-Cu angle in both Cu2O (100) and Cu2O (111) remains constant at 109.5° after geometry optimization; however, Cu2O (100) exhibits larger angle changes than Cu2O (111). ...
Article
Carbon dioxide (CO2) can be electrochemically, thermally, and photochemically reduced into valuable products such as carbon monoxide (CO), formic acid (HCOOH), methane (CH4), and methanol (CH3OH), contributing to carbon footprint mitigation. Extensive research has focused on catalysts, combining experimental approaches with computational quantum mechanics to elucidate reaction mechanisms. Although computational studies face challenges due to a lack of accurate approximations, they offer valuable insights and assist in selecting suitable catalysts for specific applications. This study investigates the electrocatalytic pathways of CO2 reduction on cuprous oxide (Cu2O) catalysts, utilizing the computational hydrogen electrode (CHE) model based on density functional theory (DFT). The electrocatalytic performance of flat Cu2O (100) and hexagonal Cu2O (111) surfaces was systematically analysed, using the standard hydrogen electrode (SHE) as a reference. Key parameters, including free energy changes (ΔG), adsorption energies (Eads), reaction mechanisms, and pathways for various intermediates were estimated. The results showed that CO2 was reduced to CO(g) on both Cu2O surfaces at low energies. However, methanol (CH3OH) production was observed preferentially on Cu2O (111) at ΔG = −1.61 eV, whereas formic acid (HCOOH) and formaldehyde (HCOH) formation were thermodynamically unfavourable at interfacial sites. The CO2-to-methanol conversion on Cu2O (100) exhibited a total ΔG of −3.38 eV, indicating lower feasibility compared to Cu2O (111) with ΔG = −5.51 eV. These findings, which are entirely based on a computational approach, highlight the superior catalytic efficiency of Cu2O (111) for methanol synthesis. This approach also holds the potential for assessing the catalytic performance of other transition metal oxides (e.g., nickel oxide, cobalt oxide, zinc oxide, and molybdenum oxide) and their modified forms through doping or alloying with various elements.
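The abstract relies on the computational hydrogen electrode (CHE) model. In that model, the free energy of each proton-electron transfer step shifts linearly with the applied potential: ΔG_i(U) = ΔG_i(0) + eU. The limiting potential is the potential at which every step becomes downhill: U_L = −max_i ΔG_i(0)/e. The Python sketch below applies these two relations to a hypothetical list of step energies. It assumes energies in eV, one electron per step, and U versus SHE at pH 0. The numbers are illustrative and are not the Cu2O values reported in the study.

```python
from typing import List


def step_energies_at_potential(dg0: List[float], potential: float) -> List[float]:
    """Shift each one-electron reduction step by +eU (energies in eV, U in V)."""
    return [g + potential for g in dg0]


def limiting_potential(dg0: List[float]) -> float:
    """Least negative potential at which all reduction steps are downhill."""
    return -max(dg0)


# Hypothetical per-step free energies (eV) at U = 0 V for a three-step pathway.
steps = [0.2, -0.5, 0.4]
u_l = limiting_potential(steps)
print(u_l)  # -0.4
print([round(g, 2) for g in step_energies_at_potential(steps, u_l)])  # [-0.2, -0.9, 0.0]
```

The step with the largest ΔG_i(0) is potential-determining. A pathway's overall ΔG (the sum of its steps) can therefore be favorable while a single uphill step still limits the reaction at U = 0.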
... www.nature.com/scientificdata/ ... HS-BLASTN v0.05 [29] against the human, mouse, and microbial databases (accessed on February 3, 2020) from the Nucleotide Sequence Database of the National Center for Biotechnology Information (NCBI) [30]. Additionally, we filtered reads against the UniVec database using Blastn v2.14.1 [31]. ...
Article
Megachile sculpturalis Smith, 1853 native to East Asia, is an important solitary bee species that has invaded both Europe and the United States. This study provides the first chromosome-level genome assembly of M. sculpturalis using a combination of Nanopore long reads, Illumina short reads, and Hi-C data. The genome comprises 296.99 Mb distributed across 16 chromosomes. N50, L50 and BUSCO completeness reached 19.128 Mb, 7 scaffolds, and 96.7%, respectively. The genome contains 104 Mb repetitive elements (35.02% of the assembly size) and 11,446 predicted protein-coding genes. This chromosome-level genome will serve as an essential genomic resource for future research on Megachilidae.
... Phylogenetic Analysis. Sequences of P2D-type ATPases from species representing a broad range of eukaryotic groups were identified from previous studies (15,18) and BLAST (49) searches of NCBI (50) and EupathDB (51) databases. The sequences chosen for alignment are listed in SI Appendix, Table S1. ...
Article
Among new antimalarials discovered over the past decade are multiple chemical scaffolds that target Plasmodium falciparum P-type ATPase (PfATP4). This essential protein is a Na⁺ pump responsible for maintaining Na⁺ homeostasis. PfATP4 belongs to the P2D subfamily of P-type ATPases, for which no structures have been determined. To gain better insight into the structure/function relationship of this validated drug target, we generated a homology model of PfATP4 based on sarco/endoplasmic reticulum Ca²⁺ ATPase, a P2A-type ATPase, and refined the model using molecular dynamics in its explicit membrane environment. This model predicted several residues in PfATP4 critical for its function, as well as those that impart resistance to various PfATP4 inhibitors. To validate our model, we developed a genetic system involving merodiploid states of PfATP4 in which the endogenous gene was conditionally expressed and the second allele was mutated to assess its effect on the parasite. Our model predicted residues involved in Na⁺ coordination as well as the phosphorylation cycle of PfATP4. Phenotypic characterization of these mutants involved assessment of parasite growth, localization of mutated PfATP4, response to treatment with known PfATP4 inhibitors, and evaluation of the downstream consequences of Na⁺ influx. Our results were consistent with modeled predictions of the essentiality of the critical residues. Additionally, our approach confirmed the phenotypic consequences of resistance-associated mutations as well as a potential structural basis for the fitness cost associated with some mutations. Taken together, our approach provides a means to explore the structure/function relationship of essential genes in haploid organisms.
... This study examines the influence of these plant supplements on growth performance and gut health (Semwal et al., 2014). Lawsonia inermis, the only member of its genus in the Lythraceae family, contains bioactive compounds like 2-hydroxy-1,4-naphthoquinone (Coordinators, 2018). LILP exhibits health benefits, including anti-inflammatory and antimicrobial effects. ...
... The integration resulted in the prediction of 39,934 protein-coding genes distributed across the genome, with a mean gene length of 5,087.29 bp. Gene functional annotation was performed by aligning the predicted protein sequences against public functional databases, including TrEMBL [44], NCBI-nr [45], KEGG [46], InterPro [47], KOG [48], and SwissProt [49], using BLAST v2.11.0 [43] (e-value < 10⁻⁵). This comprehensive annotation process resulted in 35,458 functionally annotated genes, representing 88.79% of the protein-coding genes (Supplementary Table S3). ...
Article
The cultivated Zizania latifolia, an aquatic vegetable prevalent in the Yangtze River Basin, represents a unique plant-fungus complex whose domestication is associated with host-parasite co-evolution. In this study, we present a high-quality, chromosome-scale genome assembly of cultivated Z. latifolia. We employed PacBio long-read sequencing and Hi-C technology to generate a ~578.42 Mb genome assembly, which contains 47.59% repeat sequences and has a contig N50 of ~33.75 Mb. The contigs were clustered into 17 chromosome-sized scaffolds with a GC content of 43.26%, showing 98.39% completeness in BUSCO analysis. In total, we predicted 39,934 protein-coding genes, 88.79% of which could be functionally annotated. This genome assembly provides a valuable resource for unraveling Z. latifolia’s domestication process and advances our understanding of the evolutionary history and agricultural potential of Z. latifolia.
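The annotation step in the Z. latifolia excerpt (BLAST alignment filtered at e-value < 10⁻⁵, then counting genes with at least one hit) can be sketched as follows. This assumes BLAST+ tabular output (`-outfmt 6`), whose 11th column is the e-value; the hit rows and subject IDs are made up for illustration, and the last line only re-derives the reported percentage from the stated counts.

```python
import csv
import io

# BLAST+ -outfmt 6 default columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
EVALUE_COL = 10

def annotated_genes(tabular: str, max_evalue: float = 1e-5) -> set:
    """Return query (gene) IDs with at least one hit below the e-value cutoff."""
    genes = set()
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if row and float(row[EVALUE_COL]) < max_evalue:
            genes.add(row[0])
    return genes

# Hypothetical hits: gene1 passes the cutoff, gene2 does not.
hits = (
    "gene1\tsubj_A\t88.5\t300\t34\t1\t1\t300\t5\t304\t2e-80\t410\n"
    "gene2\tsubj_B\t31.0\t60\t40\t2\t10\t69\t3\t62\t0.02\t35.1\n"
)
print(sorted(annotated_genes(hits)))   # ['gene1']

# Sanity check of the reported figure: 35,458 of 39,934 genes annotated.
print(f"{100 * 35458 / 39934:.2f}%")   # 88.79%
```

With several databases, a gene would typically count as annotated if it passes the cutoff in any of them, i.e. the union of the per-database sets.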
... The resulting sequence was edited and assembled using Geneious R8 software (Biomatters Ltd, New Zealand). The nucleotide sequence was subjected to a BLASTn search for preliminary species assignment [11]. Multiple sequences were aligned with the sequence of interest, using MAFFT v7.490 [12] in the Aliview v1.28 software package [13]. ...
Article
This is the first description of the complete genome sequence of a newly characterized monopartite begomovirus isolated from an asymptomatic uncultivated plant, Melochia tomentosa, collected in Burkina Faso. The sequence was obtained through rolling-circle amplification, cloning, and Sanger sequencing. The provisional species name “Begomovirus melochiae” and common virus name “melochia associated virus” (MeAV) are proposed. The MeAV genome was found to share the most nucleotide sequence similarity with three African monopartite begomoviruses: tomato curly stunt virus (74%), pepper yellow vein Mali virus (73%), and tomato leaf curl Cameroon virus (73%). Phylogenetic analysis confirmed its relationship to Old World monopartite begomoviruses. The discovery of MeAV in an uncultivated and asymptomatic plant provides a further example of the high diversity of begomoviruses in sub-Saharan African ecosystems.
... LOVD 3 was developed to connect with various resources, such as HGNC, NCBI, EBI, and Mutalyzer, as a basis for ongoing high-quality data provision. For instance, LOVD may draw on the NCBI sequence viewer to provide a visual representation of each database entry at its corresponding location and its interconnection with other NCBI resources [90]. The NCBI sequence viewer was selected over other genome browsers because of its simplicity and the ability to embed it in any page without the need for local installation and administration [91]. ...
Article
Thalassemia is one of the most prevalent monogenic disorders in low- and middle-income countries (LMICs). There are an estimated 270 million carriers of hemoglobinopathies (abnormal hemoglobins and/or thalassemia) worldwide, necessitating global methods and solutions for effective and optimal therapy. LMICs are disproportionately impacted by thalassemia, and due to disparities in genomics awareness and diagnostic resources, certain LMICs lag behind high-income countries (HICs). This spurred the establishment of the Global Globin Network (GGN) in 2015 at UNESCO, Paris, as a project-wide endeavor within the Human Variome Project (HVP). Primarily aimed at enhancing thalassemia clinical services, research, and genomic diagnostic capabilities with a focus on LMIC needs, GGN aims to foster data collection in a shared database by all affected nations, thus improving data sharing and thalassemia management. In this paper, we propose a minimum requirement for establishing a genomic database in thalassemia based on the HVP database guidelines. We suggest using an existing platform recommended by HVP, the Leiden Open Variation Database (LOVD) (https://www.lovd.nl/). Adoption of our proposed criteria will assist in improving or supplementing the existing databases, allowing for better-quality services for individuals with thalassemia. Database URL: https://www.lovd.nl/
... We found that both the FDR and FNR depend on the genomic context, and, somewhat surprisingly, the FNR for the 50Y/50C sample (Figure 5C) was highest within coding regions. Using HomoloGene paralogs [Coordinators, 2014] we found that the FDR (assuming that tracking variants are TPs) is higher in coding regions with paralogs (0.19, 136/437) than in those without (0.03, 1779/36191). Thus, accounting for homology can reduce the FDR in the coding regions to a value below the (unadjusted) FDR of the UTRs. ...
Preprint
The accurate identification of low-frequency variants in tumors remains an unsolved problem. To support characterization of the issues in a realistic setting, we have developed software tools and a reference dataset for diagnosing variant calling pipelines. The dataset contains millions of variants at frequencies ranging from 0.05 to 1.0. To generate the dataset, we performed whole-genome sequencing of a mixture of two Coriell cell lines, NA19240 and NA12878, the daughters of the YRI (Y) and CEU (C) HapMap trios, respectively. The cells were mixed in three different proportions, 10Y/90C, 50Y/50C and 90Y/10C, in an effort to simulate the heterogeneity found in tumor samples. We sequenced three biological replicates for each mixture, yielding approximately 1.4 billion reads per mixture for an average of 64X coverage. Using the published genotypes as our reference, we evaluate the performance of a general variant calling algorithm, constructed as a demonstration of our flexible toolset, and make comparisons to a standard GATK pipeline. We estimate the overall FDR to be 0.028 and the FNR (when coverage exceeds 20X) to be 0.019 in the 50Y/50C mixture. Interestingly, even with these relatively well-studied individuals, we predict over 475,000 new variants, validating in well-behaved coding regions at a rate of 0.97, that were not included in the published genotypes.
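The two error rates quoted in this abstract follow the usual definitions: FDR = FP / (TP + FP) over all calls, and FNR = FN / (TP + FN) over true variants, here restricted to sites with coverage above 20X as the abstract describes. The sketch below is a generic illustration with a hypothetical record layout (truth flag, call flag, coverage); it is not the authors' toolset.

```python
from typing import List, NamedTuple

class Site(NamedTuple):
    is_true_variant: bool   # present in the reference (published) genotypes
    is_called: bool         # reported by the pipeline under evaluation
    coverage: int           # read depth at the site

def fdr(sites: List[Site]) -> float:
    """False discovery rate over all calls: FP / (TP + FP)."""
    calls = [s for s in sites if s.is_called]
    false_pos = sum(not s.is_true_variant for s in calls)
    return false_pos / len(calls) if calls else 0.0

def fnr(sites: List[Site], min_coverage: int = 20) -> float:
    """False negative rate over true variants above a coverage floor: FN / (TP + FN)."""
    truth = [s for s in sites if s.is_true_variant and s.coverage > min_coverage]
    false_neg = sum(not s.is_called for s in truth)
    return false_neg / len(truth) if truth else 0.0

# Hypothetical sites: the last two true variants fall below 20X and are excluded from FNR.
sites = [Site(True, True, 40), Site(True, False, 35), Site(False, True, 50),
         Site(True, True, 10), Site(True, False, 12)]
print(round(fdr(sites), 3), fnr(sites))   # 0.333 0.5
```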
... Ideally, the reference database should include a wide range of microbial genomes to ensure broad coverage of potential organisms in a sample. Popular choices are the comprehensive NCBI RefSeq Complete Genomes collection and the nt database of high-quality nucleotide sequences (NCBI Resource Coordinators 2014; O'Leary et al. 2016; Méric et al. 2019). However, the size of these databases can pose significant computational challenges, as the storage requirements can exceed 100 GB (Ye et al. 2019). ...
Article
Accurate taxonomic classification is essential to understanding microbial diversity and function through metagenomic sequencing. However, this task is complicated by the vast variety of microbial genomes and the computational limitations of bioinformatics tools. The aim of this study was to evaluate the impact of reference database selection and confidence score (CS) settings on the performance of Kraken2, a widely used k-mer-based metagenomic classifier. In this study, we generated simulated metagenomic datasets to systematically evaluate how the choice of reference database, ranging from the compact Minikraken v1 to the expansive nt and GTDB r202, and the CS setting (from 0 to 1.0) affect the key performance metrics of Kraken2. These metrics include classification rate, precision, recall, F1 score, and accuracy of true versus calculated bacterial abundance estimation. Our results show that higher CS, which increases the rigor of taxonomic classification by requiring greater k-mer agreement, generally decreases the classification rate. This effect is particularly pronounced for smaller databases such as Minikraken and Standard-16, where no reads could be classified when the CS was above 0.4. In contrast, for larger databases such as Standard, nt and GTDB r202, precision and F1 scores improved significantly with increasing CS, highlighting their robustness to stringent conditions. Recovery rates were mostly stable, indicating consistent detection of species under different CS settings. Crucially, the results show that a comprehensive reference database combined with a moderate CS (0.2 or 0.4) significantly improves classification accuracy and sensitivity. This finding underscores the need for careful selection of database and CS parameters tailored to specific scientific questions and available computational resources to optimize the results of metagenomic analyses.
Supplementary Information The online version contains supplementary material available at 10.1007/s42994-024-00178-0.
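The Kraken 2 manual describes the confidence score as C/Q for a candidate label, where C is the number of a read's k-mers mapped to taxa in the clade rooted at that label and Q is the number of k-mers queried. If the score is below the threshold, the label moves up the taxonomy until the threshold is met; a read that fails even at the root stays unclassified. The sketch below applies that rule on a toy taxonomy. The taxon IDs and k-mer counts are hypothetical, and choosing the initial label by the highest root-to-leaf k-mer sum (ties broken by smallest ID) is a simplification, not Kraken 2's exact code.

```python
from typing import Dict, Iterator, Optional

def lineage(taxon: int, parent: Dict[int, Optional[int]]) -> Iterator[int]:
    """Yield the taxon and each ancestor up to the root (whose parent is None)."""
    node: Optional[int] = taxon
    while node is not None:
        yield node
        node = parent.get(node)

def classify(hits: Dict[int, int], parent: Dict[int, Optional[int]],
             queried_kmers: int, confidence: float) -> Optional[int]:
    """Return a taxon ID, or None if the read stays unclassified."""
    if not hits or queried_kmers == 0:
        return None

    def rtl_score(t: int) -> int:
        # k-mers supporting the whole root-to-t path
        return sum(hits.get(a, 0) for a in lineage(t, parent))

    label: Optional[int] = max(sorted(hits), key=rtl_score)
    while label is not None:
        # C: k-mers mapped anywhere in the clade rooted at `label`
        clade_kmers = sum(n for t, n in hits.items() if label in lineage(t, parent))
        if clade_kmers / queried_kmers >= confidence:
            return label
        label = parent.get(label)   # climb one rank and retry
    return None

# Toy taxonomy: 1 = root, 2 = genus, 3 and 4 = species under the genus.
parent = {1: None, 2: 1, 3: 2, 4: 2}
hits = {3: 5, 4: 4, 1: 1}   # k-mer counts per taxon; 10 more queried k-mers had no hit
for cs in (0.0, 0.3, 0.5, 0.6):
    print(cs, classify(hits, parent, queried_kmers=20, confidence=cs))
# 0.0 -> 3, 0.3 -> 2, 0.5 -> 1, 0.6 -> None (unclassified)
```

Because C/Q falls when fewer of a read's k-mers are found in the database, this rule is consistent with the observation above that small databases left reads unclassified at high CS.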
... BindingDB [22]), as well as scientific and patent evidence (e.g. PubMed [23] and SureChEMBL [24]). Furthermore, to account for the difference in molecule identifiers across databases, we implemented a cross-reference method to map a given identifier based on chemical similarity scores and other known identifiers, or synonyms, in the expanded search. ...
Article
Full-text available
It is well-accepted that knowledge of a small molecule’s target can accelerate optimization. Although chemogenomic databases are helpful resources for predicting or finding compound interaction partners, they tend to be limited and poorly annotated. Furthermore, unlike genes, compound identifiers are often not standardized, and many synonyms may exist, especially in the biological literature, making batch analysis of compounds difficult. Here, we constructed an open-source annotation and target hypothesis prediction tool that explores some of the largest chemical and biological databases, mining them for common names, synonyms, and structurally similar molecules. We used this Chemical Analysis and Clustering for Target Identification (CACTI) tool to analyze the Pathogen Box collection, an open-source set of 400 drug-like compounds active against a variety of microbial pathogens. Our analysis yielded 4,315 new synonyms, 35,963 pieces of new information, and target prediction hints for 58 members. Scientific contributions: With this tool, a comprehensive report with known evidence, close analogs, and drug-target predictions can be obtained for large-scale chemical libraries, facilitating their evaluation and future target validation and optimization efforts.
... T. adhaerens ionotropic glutamate receptor sequences were identified by BLAST [63] searching a whole animal mRNA transcriptome assembly [21] using a set of NMDA, AMPA and Epsilon protein sequences from human and the ctenophore Mnemiopsis leidyi as queries. Candidate T. adhaerens sequences were then analyzed with SmartBlast [64] and reciprocal BLAST of the NCBI non-redundant database to confirm homology to iGluRs, InterPro [65] to predict conserved domains, and TMHMM [66] to predict transmembrane helices. This identified 13 iGluR sequences, 11 of which contained complete protein coding sequences and a minimum of three predicted transmembrane helices. ...
Preprint
Epsilon ionotropic glutamate receptors (iGluRs) belong to a recently described sub-family of metazoan receptors that is distinct from the AMPA, Kainate, Delta, and Phi (i.e., AKDF) sub-family, the NMDA sub-family, and the Lambda subfamily. Here, we sought to better understand the evolutionary and functional properties of Epsilon receptors by focusing on homologues from the basal invertebrate Trichoplax adhaerens (phylum Placozoa). We provide an updated species-guided phylogeny of eukaryotic iGluRs, and a comprehensive phylogeny of placozoan receptors uncovering marked diversification of Epsilon receptors within three conserved subclades, and four invariable subclades of AKDF receptors. Detailed functional characterization of the T. adhaerens Epsilon receptor GluE1αA revealed robust activation by glycine, alanine, serine, and valine, but not glutamate. Through combined structural modeling and mutation experiments, we used GluE1αA to test the hypothesis that only a small set of amino acids in the ligand binding domain determine ligand selectivity. Mutation of just three amino acids converted GluE1αA selectivity to glutamate, resulted in nascent sensitivity to AMPA, and increased sensitivity to the AMPA/Kainate receptor blocker CNQX. Lastly, combined modeling and mutation experiments revealed that an atypical serine residue in the pore NQR site of GluE1αA, along with an aspartate four amino acids downstream, confers sensitivity to voltage-dependent polyamine block, while the serine alone diminishes both polyamine block and Ca²⁺ permeation compared to asparagine and glutamine residues of AMPA and Kainate receptors. Altogether, we demonstrate conserved molecular determinants for polyamine regulation between Epsilon and AKDF receptors, and evidence that natural variations in NQR residues have important implications for ion permeation and regulation by polyamines.
... It allows researchers to access and analyze gene expression profiles from various cancer studies. Cancer disease gene data are obtained from the National Center for Biotechnology Information (NCBI) (Coordinators 2016; Murphy et al. 2021). ...
Article
This systematic review aims to provide a comprehensive overview of graph-based methodologies utilized in the analysis of protein–protein interaction (PPI) networks. The primary objective is to synthesize existing literature and identify key methodologies, resources, and best practices in the field, with a focus on their application in uncovering essential cancer proteins. A systematic literature search was conducted across various databases to identify relevant studies focusing on graph-based explorations of PPI networks. The selected articles were critically reviewed, and data were extracted regarding the methodologies employed, resources utilized, and best practices identified. The review proceeds to outline a workflow that illustrates the systematic process from the compilation of gene/protein datasets to the identification of essential cancer proteins. A case study on “uncovering essential cancer proteins in breast cancer” was included to exemplify the application of graph-based methodologies in a real-world scenario. The review revealed various graph-based methodologies utilized in PPI network analysis, including centrality measures, pathway enrichment analyses, and network visualization techniques. Essential resources such as databases, software tools, and repositories were identified, along with best practices for data preprocessing, network construction, and analysis. The synthesis of findings, complemented by the case study, provides researchers with a comprehensive understanding of the current landscape of graph-based PPI network analysis and its application in cancer research. This systematic review contributes to the field by offering a holistic overview of graph-based explorations in PPI network research, with a specific focus on cancer protein identification.
By synthesizing existing knowledge and identifying essential resources and best practices, this review serves as a valuable resource for researchers, facilitating informed decision-making and enhancing research quality and reproducibility. The inclusion of the case study underscores the practical application of graph-based methodologies in uncovering essential cancer proteins.
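Of the centrality measures the review lists, degree centrality is the simplest: a protein's number of interaction partners divided by n − 1, where n is the number of proteins in the network. The sketch below computes it from an undirected edge list and ranks proteins. The protein names and interactions are placeholders, not data from the review; the normalization matches the one used by NetworkX's `degree_centrality`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Degree / (n - 1) for every protein that has at least one partner."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:            # ignore self-interactions
            adj[a].add(b)     # undirected PPI edge; duplicates collapse in the set
            adj[b].add(a)
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def top_proteins(edges: Iterable[Tuple[str, str]], k: int) -> List[str]:
    """Highest-centrality proteins first; ties broken alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]

# Placeholder network: one hub linked to three proteins, plus one extra edge.
ppi = [("P_hub", "P_a"), ("P_hub", "P_b"), ("P_hub", "P_c"), ("P_a", "P_b")]
print(degree_centrality(ppi))
print(top_proteins(ppi, 2))   # ['P_hub', 'P_a']
```

Degree alone favours highly connected hubs; the other measures mentioned in the review (and pathway enrichment) are typically layered on top of such a ranking.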
Chapter
This chapter describes diverse electronic health data categories that can be used for secondary purposes, explores their backgrounds, and highlights associated advantages and challenges. The evolving landscape of digital technologies for health has given rise to various opportunities to collect several types of data. An analysis of the most used categories is summarized, while the descriptions available throughout the chapter show their diversity and significance, mainly in advancing healthcare research and practice.
Article
Life tables allow the exploration of the biology of insects and terrestrial arthropods, and of how they respond to external factors. The data collection process has been partially standardized, but the presentation of results mainly depends on the purpose of the study. Two different data representations can be obtained from the raw dataset: the differential representation provides the distribution of the stage-development times, while the integral representation provides the number of individuals in the different life stages over time. Both representations provide relevant biological information, but they entail a loss of information with respect to the raw dataset. To date, a conceptual explanation of how the two representations can be obtained from the raw data, and of their respective properties, is still missing; moreover, providing the raw dataset as supporting information for published papers is still not customary. This paper highlights three main points: (i) how the two representations are obtained from the raw life-table dataset; (ii) that, without raw data, it is not possible to switch between the two representations, with a consequent loss of information; and (iii) why a data collection standard is needed. The conceptual explanation is complemented by an electronic file that could support data collection and sharing and that automatically transforms the data into the two representations. We believe that this study is a first step toward a more efficient diffusion of information among the scientific community, maximizing the efforts made by scholars during the experimental and data analysis process.
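Points (i) and (ii) can be made concrete with a small sketch. It assumes a hypothetical raw-data layout in which each individual is recorded as the times it entered successive stages, followed by the time it left the study (death or end of observation); the stage names and records are illustrative, not taken from the paper or its electronic file. Both representations are derived from the same raw records, but the integral counts no longer say which individual contributed which duration, which is the information loss the abstract refers to.

```python
from collections import Counter
from typing import List

# Hypothetical stage order; stage i is occupied from times[i] until times[i + 1].
STAGES = ["egg", "larva", "pupa", "adult"]

def differential(records: List[List[int]], stage: int) -> Counter:
    """Distribution of development times for one stage, counting only
    individuals that completed it (i.e. entered the next stage)."""
    durations: Counter = Counter()
    for times in records:
        stages_entered = len(times) - 1       # last entry is the exit time
        if stage + 1 < stages_entered:
            durations[times[stage + 1] - times[stage]] += 1
    return durations

def integral(records: List[List[int]], t: int) -> Counter:
    """Number of individuals in each stage at time t."""
    counts: Counter = Counter()
    for times in records:
        for i in range(len(times) - 1):
            if times[i] <= t < times[i + 1]:
                counts[STAGES[i]] += 1
                break
    return counts

# Two hypothetical individuals (days): A reaches the pupa stage and leaves the
# study on day 12; B dies as a larva on day 6.
records = [[0, 3, 8, 12], [0, 4, 6]]
print(differential(records, 0))   # Counter({3: 1, 4: 1})
print(differential(records, 1))   # Counter({5: 1})
print(integral(records, 5))       # Counter({'larva': 2})
```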
Article
Oxford-Nanopore PromethION sequencing is a PCR-free method that retains epigenetic markers and provides direct quantitative information about DNA methylation. Using this long-read sequencing technology, we successfully assembled 5 myxozoan genomes free from discernible host DNA contamination, surpassing previous studies in both quality and completeness. Genome assembly revealed DNA methylation patterns within myxozoan genomes, particularly in GC-rich regions within gene bodies. The findings not only refute the notion of myxozoans lacking DNA methylation capability but also offer a new perspective on gene regulation in these parasites. The high-quality genome assemblies lay a solid foundation for future research on myxozoans, including new strategies to control these commercially significant fish pathogens.
Article
Many viruses of the Flaviviridae family, including the Zika virus (ZIKV), are human pathogens of significant public health concern. Despite extensive research, there are currently no approved vaccines available for ZIKV, and specifically no live-attenuated Zika vaccine. In this study, we suggest a novel computational algorithm for generating live-attenuated vaccines via the introduction of silent mutations into regions that undergo selection for strong or weak local RNA folding or into regions that exhibit medium levels of sequence conservation. By applying our approach to the ZIKV genome, we demonstrated a strong correlation between the degree of conserved RNA local energy disruption and the replicative ability of the viruses in Vero cells. In vivo analysis in the AG129 mouse model demonstrated the ability of the attenuated ZIKV strains to stimulate a protective immune response against the wild-type virus. In some cases, up to 80% of the AG129 mice survived both the vaccination and the challenge with the wild-type strains, while 0% of the nonvaccinated mice survived the challenge. Our study provides a blueprint for the computational design of live-attenuated vaccine strains that still preserve immunogenic epitopes of the original RNA viruses. We believe that the approach is generic and can be used successfully for additional viruses.
Article
Aquatic microbial communities are relatively complex, with high prokaryotic diversity. Standard and novel microbiological, analytical, and statistical techniques were employed in the collection, analysis, and comprehensive study of the samples. The mean values of total heterotrophic bacteria were 1.35 ± 0.18 (×10⁷), 1.59 ± 0.64 (×10⁷), and 1.56 ± 0.52 (×10⁷) for upstream, midstream, and downstream, respectively, while the mean values for crude oil-utilizing bacteria were 1.08 ± 0.12 (×10⁶), 1.28 ± 0.58 (×10⁶), and 1.24 ± 0.44 (×10⁶) for upstream, midstream, and downstream, respectively. The mean concentrations of physicochemical parameters and heavy metals in the benthic sediment were significantly higher than those in the water samples (tidal and intertidal) (p < 0.02). Component analysis of total petroleum hydrocarbons (including polycyclic aromatic hydrocarbons and benzene, toluene, ethylbenzene, and xylene) revealed significantly higher levels in the sediment than in tidal and intertidal water. Structural metagenomics revealed seven top phyla, namely Proteobacteria, Actinobacteria, Firmicutes, Bacteroidetes, Planctomycetes, Fusobacteria, and Chloroflexi. The significantly (p < 0.05) high levels of hydrocarbon-utilizing microorganisms in the Iko River estuary can be taken as a sensitive index of environmental exposure to hydrocarbon pollutants in the estuary. This research revealed the response patterns of microorganisms to natural and anthropogenic gradients toward sustainability in the Iko River estuary.
Preprint
Genomic data are becoming increasingly valuable as we develop methods to utilize the information at scale and gain a greater understanding of how genetic information relates to biological function. Advances in synthetic biology and the decreased cost of sequencing are increasing the amount of privately held genomic data. As the quantity and value of private genomic data grows, so does the incentive to acquire and protect such data, which creates a need to store and process these data securely. We present an algorithm for the Secure Interrogation of Genomic DataBases (SIG-DB). The SIG-DB algorithm enables databases of genomic sequences to be searched with an encrypted query sequence without revealing the query sequence to the Database Owner or any of the database sequences to the Querier. SIG-DB is the first application of its kind to take advantage of locality-sensitive hashing and homomorphic encryption to allow generalized sequence-to-sequence comparisons of genomic data.
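The abstract does not say which locality-sensitive hashing (LSH) scheme SIG-DB uses, so the sketch below shows a generic one, MinHash over k-mers, which estimates the Jaccard similarity of two sequences' k-mer sets from short signatures. It covers only the hashing idea; the homomorphic-encryption layer that keeps the query and database sequences private is not shown. The k-mer length, number of hash functions, and seeded BLAKE2b hash family are arbitrary choices for illustration.

```python
import hashlib
from typing import List, Set

def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k."""
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def _hash(seed: int, kmer: str) -> int:
    # One member of a family of 64-bit hash functions, indexed by seed.
    digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def signature(seq: str, k: int = 5, num_hashes: int = 64) -> List[int]:
    """MinHash signature: the minimum hash over all k-mers, per hash function."""
    grams = kmers(seq, k)
    return [min(_hash(seed, g) for g in grams) for seed in range(num_hashes)]

def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    # The fraction of agreeing minima estimates |A & B| / |A | B| for the k-mer sets.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

q = signature("ACGTACGTTAGCCGATAGGCTA")
d = signature("ACGTACGTTAGCCGATAGGCTT")   # one substitution at the last base
print(estimated_jaccard(q, q))             # 1.0
print(estimated_jaccard(q, d))             # close to the true k-mer Jaccard
```

Comparing short fixed-length signatures instead of full sequences is what makes LSH attractive for search at scale, at the cost of an approximate similarity.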
Preprint
The Chinese giant salamander (CGS, Andrias davidianus ), a flagship amphibian species, is highly vulnerable to high temperatures, posing a significant threat under future climate change. Previous research linked this susceptibility to liver energy deficiency, accompanied by shifts in gut microbiota and reduced food conversion rates, raising questions about the role of the gut-liver axis in mediating thermal intolerance. This study investigated CGS responses to a temperature gradient (10–30°C), assessing physiological changes alongside histological, gut metagenomic, and tissue transcriptomic analyses. Temperatures above 20°C led to mortality and delayed growth. Histological and transcriptomic data revealed metabolic exhaustion and liver fibrosis in heat-stressed salamanders, underscoring the liver's critical role in thermal intolerance. While heat stress altered the gut microbiota's community structure, their functional profiles, especially in nutrient absorption and transformation, remained stable. Both gut and liver showed temperature-dependent transcriptional changes, sharing some common variations in actins, heat shock proteins, and genes related to transcription and translation. However, their energy metabolism exhibited opposite trends: it was downregulated in the liver but upregulated in the gut, with the gut showing increased activity in the pentose phosphate pathway and oxidative phosphorylation, potentially countering metabolic exhaustion. These findings suggest that the gut and its microbiota are less sensitive to high temperatures than the liver, and the gut-liver axis may not be central to CGS thermal sensitivity. This study enhances mechanistic understanding of CGS heat susceptibility, providing a foundation for targeted conservation strategies in the face of climate change.
Book
"Teknik Desain Primer Real-Time PCR" (Real-Time PCR Primer Design Techniques) is a comprehensive guide that explores the intricacies of primer design for Real-Time PCR applications. The book combines fundamental theory with current practice, guiding readers from basic concepts to advanced strategies for primer optimization. Beginning with an introduction to Real-Time PCR, it then examines the characteristics of ideal primers and the critical factors in their design. Readers will learn the systematic steps of the design process, including target sequence selection, GC content considerations, and strategies for avoiding unwanted secondary structures. The book also covers the use of various primer design tools and software, as well as optimization and validation techniques. Practical case studies provide insight into real-world applications. A highlight is the chapter on recent trends, which explores the use of AI, nanotechnology, and other innovative approaches in primer design. With its emphasis on integrating traditional knowledge and cutting-edge technology, the book is a valuable resource for researchers, laboratory technicians, and students who want to master the art and science of primer design for Real-Time PCR.
Article
Main conclusion: The chromosome-level genome assembly of Citrullus colocynthis reveals its genetic potential for enhancing drought tolerance, paving the way for innovative crop improvement strategies. Abstract: This study presents the first comprehensive genome assembly and annotation of Citrullus colocynthis, a drought-tolerant wild close relative of cultivated watermelon, highlighting its potential for enhancing agricultural resilience to climate change. The study achieved a chromosome-level assembly using advanced sequencing technologies, including PacBio HiFi and Hi-C, revealing a genome size of approximately 366 Mb with low heterozygosity and substantial repetitive content. Our analysis identified 23,327 gene models, which could encode stress response mechanisms for the species’ adaptation to arid environments. Comparative genomics with closely related species illuminated the evolutionary dynamics within the Cucurbitaceae family. In addition, resequencing of 27 accessions from the United Arab Emirates (UAE) identified genetic diversity, suggesting a foundation for future breeding programs. This genomic resource opens new avenues for the de novo domestication of C. colocynthis, offering a blueprint for developing crops with enhanced drought tolerance, disease resistance, and nutritional profiles, crucial for sustaining future food security in the face of escalating climate challenges.
Article
The accurate identification and prioritization of antigenic peptides is crucial for the development of personalized cancer immunotherapies. Publicly available pipelines to predict clinical neoantigens do not allow direct integration of mass spectrometry immunopeptidomics data, which can uncover antigenic peptides derived from various canonical and noncanonical sources. To address this, we present an end-to-end clinical proteogenomic pipeline, called NeoDisc, that combines state-of-the-art publicly available and in-house software for immunopeptidomics, genomics and transcriptomics with in silico tools for the identification, prediction and prioritization of tumor-specific and immunogenic antigens from multiple sources, including neoantigens, viral antigens, high-confidence tumor-specific antigens and tumor-specific noncanonical antigens. We demonstrate the superiority of NeoDisc in accurately prioritizing immunogenic neoantigens over recent prioritization pipelines. We showcase the various features offered by NeoDisc that enable both rule-based and machine-learning approaches for personalized antigen discovery and neoantigen cancer vaccine design. Additionally, we demonstrate how NeoDisc’s multiomics integration identifies defects in the cellular antigen presentation machinery, which influence the heterogeneous tumor antigenic landscape.
Article
The medicinal mushroom Ganoderma resinaceum Boud., 1889, is of great pharmaceutical interest because of its diverse functional active ingredients. However, its mitochondrial genome has remained unexplored. Here, we present the complete mitochondrial genome of G. resinaceum, which spans 67,458 bp and has a GC content of 25.65%. The genome contains 15 core protein-coding genes, 8 independent ORFs, 15 intronic ORFs, 27 tRNA genes, and 2 rRNA genes. Using Bayesian inference (BI) for phylogenetic analysis, we elucidated the evolutionary relationships among 34 Basidiomycota fungi, revealing distinct clades and a close relationship between G. resinaceum and G. subamboinense.
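GC content, as reported for this mitochondrial genome, is the percentage of G and C among the bases of a sequence. The sketch below is a minimal illustration, not the authors' method; the choice to exclude ambiguity codes such as N from the denominator is an assumption, since some tools divide by the full sequence length instead.

```python
def gc_content(seq):
    """Return the GC content of a nucleotide sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted in the denominator,
    so ambiguity codes such as N do not dilute the result.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


print(gc_content("ATGC"))  # 50.0
```

As a rough arithmetic check, a 67,458 bp genome at 25.65% GC contains about 67,458 × 0.2565 ≈ 17,300 G or C bases.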
Article
Lentil (Lens culinaris Medik.) is an essential pulse crop that is widely grown for its high nutritional value, notably its high protein content, which makes it an important dietary component for vegetarians and vegans. Although lentil is the world's fifth most produced pulse, with large contributions from Canada and India, its production faces obstacles such as low productivity, owing to limited genetic improvement for resistance to biotic and abiotic stresses under rainfed cultivation. Recent advances in lentil genetics and genomics, such as the discovery of genes related to yield, disease resistance, and nutritional content, have boosted breeding efforts to generate improved lentil
Review Article, Kesari et al.; J. Exp.
Chapter
Immunoinformatics, a dynamic field bridging immunology and bioinformatics, plays a vital role in vaccine development. This interdisciplinary approach integrates immunological principles with computational biology and artificial intelligence (AI) techniques to accelerate vaccine discovery and design for infectious diseases, cancer, and emerging pathogens, offering promising solutions to global health challenges. This chapter explores recent advancements, existing challenges, and future prospects, highlighting the transformative impact of immunoinformatics and AI on biomedical research and public health initiatives.
Article
This work aimed to create software that presents nucleotide sequences in a form convenient for visual recognition and comparison by humans. To this end, we chose the DNA walk method, in which each nucleotide is assigned its own direction (north, south, west, or east) and drawn as a vector, using vectors of different lengths. We show that triander diagrams, which consist of three branches, each corresponding to one nucleotide position within the codon, are similar for the same gene across different biological species and may differ between genes. Comparing the diagrams reveals even minor differences between gene sequences of different species of the same genus. Triander source code and Windows binaries are available at https://icbge.org.ua/eng/Triander. The web application jsTriander is available at https://triander.icbge.org.ua and can be used both online and offline.
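The DNA walk and its three-branch (codon-position) variant can be sketched as follows. This is a hypothetical illustration, not Triander's implementation: the abstract does not say which nucleotide maps to which compass direction or what vector lengths are used, so the `STEPS` table (A north, T south, G east, C west, unit length) is an assumption that can be edited to use different lengths.

```python
# Hypothetical direction mapping; Triander's actual mapping and vector lengths may differ.
# Edit the magnitudes here to give each nucleotide a different step length.
STEPS = {"A": (0, 1), "T": (0, -1), "G": (1, 0), "C": (-1, 0)}


def dna_walk(seq):
    """Return the (x, y) points visited by a 2D DNA walk that starts at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for base in seq.upper():
        dx, dy = STEPS.get(base, (0, 0))  # ambiguous bases (e.g. N) do not move the walker
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def triander_branches(seq):
    """Walk codon positions 1, 2, and 3 separately, giving one branch per position."""
    return [dna_walk(seq[frame::3]) for frame in range(3)]


for branch in triander_branches("ATGGCC"):
    print(branch[-1])  # end point of each branch
```

Splitting by `seq[frame::3]` assumes the sequence starts at the first base of a codon; plotting each branch from a shared origin would give the three-armed diagram the text describes.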
Article
Main conclusion: Mechanical stress induces distinct anatomical, molecular, and morphological changes in Urtica dioica, affecting trichome development, gene expression, and leaf morphology under controlled conditions.
Abstract: The experiments were performed on common nettle, a widely known plant characterized by highly variable leaf morphology and responsiveness to mechanical touch. A specially constructed experimental device was used to study the impact of mechanical stress on Urtica dioica plants under strictly controlled parameters of the mechanical stimulus (touching) and of the growth-chamber environment. The general anatomical structure of touched plants was similar to that of control plants, but the cross-sectional shape of the internodes differed. Stress-treated plants showed a distinct four-ribbed structure, although in successive internodes the cross section gradually approached a rectangular form. The epidermis of control plants bore stinging, glandular, and simple setulose trichomes, whereas touched plants had no stinging trichomes, and their setulose trichomes accumulated more callose. Compared with stress-treated plants, control plants showed cell wall lignification in older internodes. Gene expression analysis revealed upregulation of UdTCH1 in touched plants relative to control plants, whereas UdERF4 and UdTCH4 were downregulated in stressed plants. These data indicate that the nettle's response to mechanical stress extends to the level of gene regulatory networks. Image analysis revealed reduced leaf area, increased asymmetry, and altered contours in touched leaves, especially at advanced growth stages, compared with control plants. Our results indicate that mechanical stress triggers various anatomical, molecular, and morphological changes in nettle; however, further interdisciplinary research is needed to better understand the underlying physiological mechanisms.
Article
The Mariana Trench (MT) is the deepest part of the ocean on Earth. Previous studies have described the microbial community structures and functional potential in the seawater and surface sediment of the MT, but the metabolic features and adaptation strategies of the microorganisms involved in nitrogen cycling remain poorly understood. In this study, we used comparative metagenomics to study microbial nitrogen cycling in three MT habitats: hadal seawater [9,600–10,500 m below sea level (mbsl)], surface sediments [0–46 cm below seafloor (cmbsf) at water depths of 7,143–8,638 mbsl], and deep sediments (200–306 cmbsf at a water depth of 8,300 mbsl). We identified five new nitrite-oxidizing bacteria (NOB) lineages that have adapted to the oligotrophic MT slope sediment through their capacity for CO2 fixation via the reductive tricarboxylic acid (rTCA) or Calvin-Benson-Bassham (CBB) cycle. An anammox bacterium might perform aerobic respiration and use sedimentary carbohydrates for energy generation, because it carries genes encoding type A cytochrome c oxidase and a complete glycolysis pathway. In seawater, abundant alkane-oxidizing Ketobacter species can fix inert N2 released by other denitrifying and/or anammox bacteria. This study further expands our understanding of microbial life in the largely unexplored deepest part of the ocean.
IMPORTANCE: The metabolic features and adaptation strategies of nitrogen-cycling microorganisms in the deepest part of the ocean are largely unknown. This study revealed that anammox bacteria might perform aerobic respiration in response to nutrient limitation or O2 fluctuations in Mariana Trench sediments. Meanwhile, an abundant alkane-oxidizing Ketobacter species could fix N2 in hadal seawater. This study provides new insights into the roles of hadal microorganisms in global nitrogen biogeochemical cycles and substantially expands our understanding of microbial life in the largely unexplored deepest part of the ocean.
Article
Determining correlations between molecules at various levels is an important topic in molecular biology. Large language models have shown a remarkable ability to capture correlations from large amounts of data in natural language processing and image generation. Because the correlations these models learn can be applied to a wide range of specific downstream tasks, they are also referred to as foundation models. The massive amount of data in molecular biology provides an excellent basis for developing foundation models, and their recent emergence in this field has substantially advanced it. We summarize foundation models built on RNA sequence, DNA sequence, protein sequence, single-cell transcriptome, and spatial transcriptome data, and discuss future research directions for foundation models in molecular biology.
Article
Camellia crapnelliana Tutch., a member of the Theaceae family, is an excellent landscape tree species with high ornamental value. It is also an important woody oil-bearing species with high ecological, economic, and medicinal value. Here, we report the first chromosome-scale reference genome of C. crapnelliana, assembled by integrating SMRT, Hi-C, and Illumina sequencing. The genome assembly has a total length of ~2.94 Gb with a contig N50 of ~67.5 Mb, and ~96.34% of contigs were assigned to 15 chromosomes. In total, we predicted 37,390 protein-coding genes, ~99.00% of which could be functionally annotated. The chromosome-scale genome of C. crapnelliana will be a valuable resource for understanding the genetic basis of fatty acid biosynthesis and will greatly facilitate the exploration and conservation of this species.
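Contig N50, the assembly statistic quoted above, is the length L such that contigs of length L or longer together contain at least half of the total assembly length. A minimal sketch of the standard computation follows; the toy lengths in the example are illustrative and are not taken from the C. crapnelliana assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are summed from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half the assembly.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In the example, the total length is 54; the running sum 10 + 9 + 8 = 27 first reaches half of it, so the N50 is 8.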
Article
Bioinformatics plays an important role in plant science today. As data volumes grow, there is increasing demand for tools and methods for managing, visualizing, implementing, evaluating, modeling, and predicting these data. However, many biologists may be unfamiliar with bioinformatics resources, which can lead to missed opportunities and misinterpretation of data. In this review article, we highlight web resources that offer analysis capabilities for plant research data, including genomics, transcriptomics, comparative genomics, bio-ontologies, sequence and structural comparison, plant disease databases, and proteomics databases. Additionally, we describe integrated modules within these resources that are specifically tailored to analyzing plant data. Overall, this review aims to help plant researchers access bioinformatics resources for their data analysis needs and to promote the use of bioinformatics tools to address experimental challenges in plant science.
Article
Ten Clostridioides difficile isolates representing the top 10 ribotypes collected in 2016 through the Emerging Infections Program underwent long-read sequencing to obtain high-quality reference genome assemblies. These isolates are publicly available through the CDC & FDA Antibiotic Resistance Isolate Bank.
Article
The computational detection of similarities between protein 3D structures has become an indispensable tool for detecting homologous relationships, classifying protein families, and inferring function. Consequently, numerous algorithms have been developed that facilitate structure comparison, including rapid searches against a steadily growing collection of protein structures. To this end, NCBI's Molecular Modeling Database (MMDB), which is based on the Protein Data Bank (PDB), maintains a comprehensive and up-to-date archive of protein structure similarities computed with the Vector Alignment Search Tool (VAST). These similarities have been recorded at the level of single proteins and protein domains and comprise more than 1.5 billion pairwise alignments. Here we present VAST+, an extension of the existing VAST service that summarizes and presents structural similarity at the level of biological assemblies or macromolecular complexes. VAST+ simplifies structure neighboring results and shows, for macromolecular complexes tracked in MMDB, lists of similar complexes ranked by the extent of similarity. VAST+ replaces the previous VAST service as the default presentation of structure neighboring data in NCBI's Entrez query and retrieval system. MMDB and VAST+ can be accessed via http://www.ncbi.nlm.nih.gov/Structure.
Article
Virus Variation (http://www.ncbi.nlm.nih.gov/genomes/VirusVariation/) is a comprehensive, web-based resource designed to support the retrieval and display of large virus sequence datasets. The resource includes a value-added database, a specialized search interface, and a suite of sequence data displays. Virus-specific sequence annotation and database loading pipelines produce consistent protein and gene annotation, capture sequence descriptors from sequence records, and then map these metadata to a controlled vocabulary. The database supports a metadata-driven, web-based search interface in which sequences can be selected using a variety of biological and clinical criteria. Retrieved sequences can then be downloaded in a variety of formats or analyzed using a suite of tools and displays. Over the past 2 years, the pre-existing influenza and Dengue virus resources have been combined into a single construct, and West Nile virus has been added to the resulting resource. A number of improvements were incorporated into the sequence annotation and database loading pipelines, and the virus-specific search interfaces were updated to support more advanced functions. Several new features have also been added to the sequence download options, and a new multiple sequence alignment viewer has been incorporated into the resource tool set. Together, these enhancements should improve usability and support the inclusion of new viruses in the future.
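Mapping free-text sequence descriptors to a controlled vocabulary, as the loading pipeline is described as doing, can be illustrated with a small normalization step. This is a hypothetical sketch: the resource's actual vocabulary, fields, and matching rules are not given in the text, so the `HOST_VOCAB` synonym table and the `normalize_host` function are invented for illustration.

```python
# Hypothetical synonym table; the resource's real controlled vocabulary is not described here.
HOST_VOCAB = {
    "human": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
    "h. sapiens": "Homo sapiens",
    "chicken": "Gallus gallus",
    "gallus gallus": "Gallus gallus",
}


def normalize_host(raw):
    """Map a free-text host descriptor to a controlled term, or None if unrecognized."""
    key = " ".join(raw.strip().lower().split())  # case-fold and collapse runs of whitespace
    return HOST_VOCAB.get(key)


records = [{"host": "  Homo   Sapiens "}, {"host": "HUMAN"}, {"host": "bat"}]
for record in records:
    print(record["host"].strip(), "->", normalize_host(record["host"]))
```

Returning None for unrecognized values, rather than guessing, keeps unmapped descriptors visible for manual curation; a production pipeline would likely need richer matching than an exact synonym lookup.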
Article
Next-generation sequencing platforms are producing data with significantly higher throughput at lower cost. A portion of this capacity is devoted to individual and community scientific projects. As these projects reach publication, raw sequencing datasets are submitted to the primary next-generation sequence data archive, the Sequence Read Archive (SRA). Archiving experimental data is key to the progress of reproducible science. The SRA was established as a public repository for next-generation sequence data as part of the International Nucleotide Sequence Database Collaboration (INSDC). INSDC is composed of the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at www.ncbi.nlm.nih.gov/sra from NCBI, at www.ebi.ac.uk/ena from EBI, and at trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA and report on updated metadata structures, submission file formats, and supported sequencing platforms. We also briefly outline our various responses to the challenge of explosive data growth.
Article
The UniProt Knowledgebase (UniProtKB) acts as a central hub of protein knowledge by providing a unified view of protein sequence and functional information. Manual and automatic annotation procedures are used to add data directly to the database, while extensive cross-referencing to more than 120 external databases provides access to additional relevant information in more specialized data collections. UniProtKB also integrates a range of data from other resources. All information is attributed to its original source, allowing users to trace the provenance of all data. The UniProt Consortium is committed to using and promoting common data exchange formats and technologies, and UniProtKB data are made freely available in a range of formats to facilitate integration with other databases. Database URL: http://www.uniprot.org/
Article
The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP.
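Programmatic access to Gene through the E-utilities mentioned above can be sketched by building ESearch and ESummary request URLs. This minimal example only constructs URLs (it makes no network call); the helper names `esearch_url` and `esummary_url` and the default `retmax` are illustrative choices, while the base URL and the `db`, `term`, `id`, `retmode`, and `retmax` parameters are standard E-utilities parameters.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, db="gene", retmax=20):
    """Build an ESearch URL that returns matching UIDs (integer Gene IDs) as JSON."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def esummary_url(gene_ids, db="gene"):
    """Build an ESummary URL for a list of integer Gene IDs."""
    params = {"db": db, "id": ",".join(str(i) for i in gene_ids), "retmode": "json"}
    return EUTILS_BASE + "esummary.fcgi?" + urlencode(params)


print(esearch_url("BRCA1[sym] AND human[orgn]"))
print(esummary_url([672, 7157]))  # Gene IDs of human BRCA1 and TP53
```

To run a query, the URL can be opened with `urllib.request.urlopen` and parsed with `json.load`; the ESearch JSON response lists matching IDs under `esearchresult` then `idlist`. NCBI asks clients to limit request rates (about three requests per second without an API key), so batch IDs into a single ESummary call where possible.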
Article
The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Collaboration members from NCBI, the Wellcome Trust Sanger Institute, and the University of California, Santa Cruz provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth of the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting of the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information, and our approach to representing genes for which supporting evidence is incomplete. We also present a summary of recent and future curation targets.
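The core idea of the CCDS comparison, keeping only coding regions that two pipelines annotate identically, can be sketched as a set intersection over CDS structures. This is a simplified, hypothetical model: the `CDS` record, the transcript IDs, and the `CCDS<n>.1` numbering in `assign_ids` are invented for illustration, and the project's quality assurance tests and real ID assignment are not modeled.

```python
from typing import NamedTuple


class CDS(NamedTuple):
    """A coding-sequence structure: chromosome, strand, and coding exon intervals."""
    chrom: str
    strand: str
    exons: tuple  # (start, end) coding intervals, sorted by start


def identical_cds(ncbi, ensembl):
    """Return CDS structures annotated identically by both pipelines.

    Each argument maps a pipeline's own transcript IDs to CDS records; matching
    uses the CDS structure itself, because the two pipelines' IDs differ.
    """
    return set(ncbi.values()) & set(ensembl.values())


def assign_ids(shared, start=1):
    """Give each shared CDS a hypothetical versioned ID in a deterministic order."""
    return {cds: f"CCDS{n}.1" for n, cds in enumerate(sorted(shared), start=start)}


ncbi = {"NM_A": CDS("chr1", "+", ((100, 200), (300, 400))),
        "NM_B": CDS("chr1", "-", ((500, 600),))}
ensembl = {"ENST1": CDS("chr1", "+", ((100, 200), (300, 400))),
           "ENST2": CDS("chr1", "-", ((500, 650),))}
print(assign_ids(identical_cds(ncbi, ensembl)))
```

Only the first structure matches exactly; the second differs by one coding end coordinate, which is enough to exclude it from the consensus set in this sketch.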