
DNA Sequence Alignment - Science topic

Explore the latest publications in DNA Sequence Alignment, and find DNA Sequence Alignment experts.
Publications related to DNA Sequence Alignment (10,000)
Presentation
Full-text available
It will help beginners in bioinformatics understand the concept of multiple sequence alignment.
Preprint
Full-text available
Advances in whole genome sequencing have led to a rapid and ongoing increase in the amount of sequence data available, but 40-50% of known genes have no functional annotation and only 25-30% have specific functional annotations. Current functional annotation approaches typically rely on computationally expensive pairwise or multiple sequence alignm...
Article
Full-text available
Purpose Our study identifies a novel HESX1 variant in two siblings, resulting in isolated growth hormone deficiency (IGHD) associated with empty sella. To the best of our knowledge, this represents the second recognized mutation within the EH1 repressor domain in HESX1. We explore and interpret the potential mechanism, with the aim of guiding pedia...
Article
Full-text available
This research introduces a multi-module framework to derive weekly representative travel patterns from single-day travel diaries. The methodology first uses hierarchical clustering to group samples with similar activity patterns, followed by progressive multiple sequence alignment to construct day-level representative patterns. These day-level patt...
Article
Full-text available
Viromics produces millions of viral genomes and fragments annually, overwhelming traditional sequence comparison methods. Here we introduce Vclust, an approach that determines average nucleotide identity by Lempel–Ziv parsing and clusters viral genomes with thresholds endorsed by authoritative viral genomics and taxonomy consortia. Vclust demonstra...
Preprint
Full-text available
With the development of sequencing technologies, chromosome-level genome assemblies have become increasingly common across various organisms, including non-model species. BLAST+ is one of the most widely used bioinformatics tools for computing sequence alignments, offering numerous optimizations for speed and scalability. Dot plots, which visualize...
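A dot plot marks every pair of positions at which two sequences share an identical word, so diagonal runs reveal collinear similarity. The minimal Python sketch below assumes exact matches of a fixed word size `k`. The functions `dot_plot` and `render` are hypothetical illustrations, not part of BLAST+ or of the tool described in the preprint.

```python
from collections import defaultdict


def dot_plot(seq_a: str, seq_b: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (i, j) pairs where seq_a[i:i+k] == seq_b[j:j+k] (exact matches only)."""
    # Index every k-mer start position in seq_b.
    index: dict[str, list[int]] = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)

    # Look up each k-mer of seq_a; .get() does not add keys to the defaultdict.
    dots = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            dots.append((i, j))
    return dots


def render(seq_a: str, seq_b: str, k: int = 3) -> str:
    """Text rendering: rows follow seq_a, columns follow seq_b, '*' marks a match."""
    dots = set(dot_plot(seq_a, seq_b, k))
    rows = []
    for i in range(len(seq_a) - k + 1):
        rows.append("".join("*" if (i, j) in dots else "."
                            for j in range(len(seq_b) - k + 1)))
    return "\n".join(rows)


if __name__ == "__main__":
    print(render("ACGTACGT", "ACGTACGT", k=4))
```

In this self-comparison, the main diagonal is the identity match, and the dots at (0, 4) and (4, 0) come from the repeated word ACGT. Exact word matching keeps the sketch short; dot plots built from alignment hits also tolerate mismatches and gaps.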
Article
Full-text available
In the field of computer vision, the task of image annotation and classification has attracted much attention due to its wide demand in applications such as medical image analysis, intelligent surveillance, and image retrieval. However, existing methods have significant limitations in dealing with unknown target domain data, which are manifested in...
Preprint
Full-text available
Deep learning approaches like AlphaFold 2 (AF2) have revolutionized structural biology by accurately predicting the ground state structures of proteins. Recently, clustering and subsampling techniques that manipulate multiple sequence alignment (MSA) inputs into AlphaFold to generate conformational ensembles of proteins have also been proposed. Alt...
Preprint
Full-text available
Conventional approaches to heterologous gene expression rely on codon optimization, which is limited to swapping synonymous codons and often fails to capture deeper adaptive changes. In contrast, naturally evolved orthologous genes between species often differ by more than just synonymous substitutions; they can include non-synonymous mutations, i...
Preprint
Full-text available
Conventional genome mapping-based approaches systematically miss genetic variation, particularly in regions that substantially differ from the reference. To explore this hidden variation, we examined unmapped and poorly mapped reads from the genomes of 640 human individuals from South Asian (SAS) populations in the 1000 Genomes Project and the Simo...
Article
Full-text available
Surveillance for food safety in the United States of America is a collaborative effort among public health agencies, with additional partners worldwide contributing sequence data. Assemblies in GenBank and sequence reads in the Sequence Read Archive for surveilled species are received and rapidly analyzed, and the results are published publicly by an automated...
Article
Full-text available
Tripartite motif (TRIM) proteins, defined by their conserved RBCC domain architecture, play key roles in various cellular processes and virus-host interactions. In this review, we focus on Class VI TRIM proteins, including TRIM24, TRIM28, and TRIM33, highlighting the distinct functional attributes of their RING, B-BOX1, B-BOX2, COILED COIL, PHD, an...
Preprint
Full-text available
The Composition Vector Tree (CVTree) method, developed under the leadership of Professor Hao Bailin, is an alignment-free algorithm for constructing phylogenetic trees. Although initially designed for studying prokaryotic evolution based on whole genomes, it has demonstrated broad applicability across diverse biological systems and gene sequences. I...
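The composition-vector idea can be sketched as follows, assuming the commonly described CVTree formulation: count overlapping K-mers, subtract the frequency predicted by a (K−2)-order Markov model built from (K−1)- and (K−2)-mer frequencies, and convert the cosine correlation C of two such vectors into a distance D = (1 − C)/2. This is an illustrative Python sketch, not the CVTree software. The function names and the DNA default alphabet are assumptions; protein input would need a 20-letter alphabet.

```python
import math
from collections import Counter
from itertools import product


def kmer_freqs(seq: str, k: int) -> dict[str, float]:
    """Relative frequencies of overlapping k-mers in seq."""
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {w: c / total for w, c in counts.items()}


def composition_vector(seq: str, k: int, alphabet: str = "ACGT") -> dict[str, float]:
    """Markov-corrected vector a(w) = (f(w) - f0(w)) / f0(w) over all k-mers.

    f0(w) = f(w[:-1]) * f(w[1:]) / f(w[1:-1]) predicts the k-mer frequency from
    (k-1)- and (k-2)-mer frequencies; components with f0(w) = 0 are set to 0.
    """
    if k < 3:
        raise ValueError("k must be at least 3")
    fk, fk1, fk2 = (kmer_freqs(seq, n) for n in (k, k - 1, k - 2))
    vec = {}
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        mid = fk2.get(w[1:-1], 0.0)
        f0 = fk1.get(w[:-1], 0.0) * fk1.get(w[1:], 0.0) / mid if mid else 0.0
        vec[w] = (fk.get(w, 0.0) - f0) / f0 if f0 else 0.0
    return vec


def cv_distance(seq_a: str, seq_b: str, k: int = 3, alphabet: str = "ACGT") -> float:
    """D = (1 - C) / 2, where C is the cosine correlation of the two vectors."""
    a = composition_vector(seq_a, k, alphabet)
    b = composition_vector(seq_b, k, alphabet)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    corr = dot / norm if norm else 0.0
    return (1.0 - corr) / 2.0
```

The resulting pairwise distances would then be passed to a tree-building method such as neighbor joining; that step is omitted here.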
Article
Full-text available
Genetic equidistance phenomenon (GEP): A simpler species is equally distantly related to two or more complex species in a sequence alignment when the genetic distance is measured by the identity matrix or percentage non-identity. Maximum genetic diversity (MGD): The upper limit in genetic distance or diversity, as measured by the identity matrix or...
Presentation
Full-text available
It will help beginners in bioinformatics understand sequence alignment.
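A classic way to see how a pairwise global alignment is computed is the Needleman–Wunsch dynamic-programming algorithm. The Python sketch below assumes a simple scoring scheme (match +1, mismatch −1, linear gap −1) chosen only for illustration; it is not taken from the presentation, and `needleman_wunsch` is a hypothetical name.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Traceback from the bottom-right cell, preferring diagonal moves.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `needleman_wunsch("AC", "A")` aligns `AC` against `A-` with score 0 (one match, one gap). Real tools use substitution matrices and affine gap penalties; the linear scheme here only keeps the recurrence easy to follow.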
Preprint
Full-text available
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems are a fundamental defense mechanism in prokaryotes, where short sequences called spacers are stored in the host genome to recognize and target exogenous genetic elements. Viromics, the study of viral communities in environmental samples, relies heavily on identifying these s...
Preprint
Full-text available
Consistently accurate 3D nucleic acid structure prediction would facilitate studies of the diverse RNA and DNA molecules underlying life. In CASP16, blind predictions for 42 targets canvassing a full array of nucleic acid functions, from dopamine binding by DNA to formation of elaborate RNA nanocages, were submitted by 65 groups from 46 different l...
Article
Full-text available
Streptococcus suis is a major bacterial pathogen in the swine industry, causing meningitis, arthritis, and other diseases in infected pigs. It also poses significant public health risks due to its zoonotic potential, particularly in individuals with skin lesions. Current detection methods, including traditional culture-based techniques and PCR assa...
Article
Full-text available
The COVID-19 pandemic demonstrated that fast and accurate analysis of continually collected infectious disease surveillance data is crucial for situational awareness and policy making. Coalescent-based phylodynamic analysis can use genetic sequences of a pathogen to estimate changes in its effective population size, a measure of genetic diversity....
Article
Full-text available
Background The advent of Single Molecule Real-Time (SMRT) sequencing has overcome many limitations of second-generation sequencing, such as limited read lengths and PCR amplification biases. However, longer reads increase data volume exponentially, and high error rates make many existing alignment tools inapplicable. Additionally, a single CPU’s perfor...
Preprint
Full-text available
Recent advances in audio-visual learning have shown promising results in learning representations across modalities. However, most approaches rely on global audio representations that fail to capture fine-grained temporal correspondences with visual frames. Additionally, existing methods often struggle with conflicting optimization objectives when...
Research
Full-text available
Zero-shot temporal grounding is an emerging challenge in video analysis, where the goal is to localize specific events or actions in a video given a natural language description, without requiring task-specific training data. This approach leverages the power of pretrained language models (PLMs), which have been shown to capture rich semantic repre...
Preprint
Full-text available
Identifying recurrent changes in biological sequences is important to multiple aspects of biological research, from understanding the molecular basis of convergent phenotypes to pinpointing the causative sequence changes that give rise to antibiotic resistance and disease. Here, we present RECUR, a method for identifying recurrent amino acid subst...
Preprint
Full-text available
Data assimilation combines information from physical observations and numerical simulation results to obtain better estimates of the state and parameters of a physical system. A wide class of physical systems of interest have solutions that exhibit the formation of structures, called features, which have to be accurately captured by the assimilatio...
Preprint
Full-text available
Accurate prediction of RNA secondary structures is essential for understanding the conformation, function, and interactions of RNA. Leveraging co-evolutionary information across species through multiple sequence alignments (MSAs) has been proven to be effective in improving molecular structure predictions. However, existing deep learning approaches...
Preprint
Full-text available
Ribosomal RNA (rRNA) is methylated in organisms ranging from bacteria to metazoans. Despite the pervasiveness of rRNA methylation in biology, its effect on ribosome function is poorly understood. In this work, we identify a biological function for the rRNA 2′-O-methylcytidine methyltransferase TlyA, conserved between Bacillus...
Article
Full-text available
Background The role of the respiratory microbiome in lung diseases is increasingly recognized, with the potential migration of respiratory pathogens being a significant clinical consideration. Despite its importance, evidence elucidating this phenomenon remains scarce. Methods This prospective study collected clinical samples from patients with su...
Preprint
Full-text available
Motivation As DNA data storage systems gain popularity, the need for an efficient trace reconstruction algorithm becomes increasingly important. These algorithms aim to reconstruct the original encoded sequence from its noisy sequenced copies (or “traces”), enabling a faster and more reliable decoding process. Previous works have often been adaptat...
Article
Full-text available
Competence type IV pili (T4P) are bacterial surface appendages that facilitate DNA uptake during horizontal gene transfer by natural transformation. These dynamic structures actively extend from the cell surface, bind to DNA in the environment, and then retract to import bound DNA into the cell. Competence T4P are found in diverse Gram-negative (di...
Article
Full-text available
Background: The rapid increase in nucleotide sequence data generated by next-generation sequencing (NGS) technologies demands efficient computational tools for sequence comparison. Alignment-free (AF) methods offer a scalable alternative to traditional alignment-based approaches such as BLAST. This study evaluates alignment-free methods as scalabl...
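One widely used family of alignment-free methods compares sets of k-mers rather than aligning residues. The Python sketch below is an assumption about this general kind of method, not the study's own protocol. It computes the exact Jaccard index J of two k-mer sets and converts it with the distance formula used by the Mash tool, D = −(1/k) ln(2J/(1 + J)). Mash itself estimates J from MinHash sketches rather than full sets.

```python
import math


def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct overlapping k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(seq_a: str, seq_b: str, k: int = 4) -> float:
    """Jaccard index |A & B| / |A | B| of the two k-mer sets."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        raise ValueError("both sequences are shorter than k")
    return len(a & b) / len(union)


def mash_distance(j: float, k: int) -> float:
    """Mash-style distance -(1/k) * ln(2j / (1 + j)); 1.0 when no k-mers are shared."""
    if j <= 0.0:
        return 1.0
    # max() turns -0.0 (for j == 1) into 0.0.
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)
```

For example, `ACGT` and `ACGA` share two of four distinct 2-mers (AC, CG), so `jaccard("ACGT", "ACGA", 2)` is 0.5. No pairwise alignment is ever built, which is what makes this kind of comparison scale to large collections.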
Preprint
Full-text available
The 2024 Nobel Prize in Chemistry was awarded in part for protein structure prediction using AlphaFold2, an artificial intelligence/machine learning (AI/ML) model trained on vast amounts of sequence and 3D structure data. AlphaFold2 and related models, including RoseTTAFold and ESMFold, employ specialized neural network architectures driven by atte...
Article
Full-text available
Puccinia triticina (Pt) is a heteroecious fungus that requires two different plants, a primary host and an alternate host, to complete its life cycle. Thalictrum spp. were first identified as alternate hosts of Pt in 1921, and over 100 species have been identified. However, within China, only T. petaloideum L., T. minus L., T. minus var. hypoleucum and T. baicale...
Article
Full-text available
Deep learning (DL) has become a powerful tool for the recognition and classification of biological sequences. However, conventional single-architecture models often struggle with suboptimal predictive performance and high computational costs. To address these challenges, we present EnsembleDL-Lipo, an innovative ensemble deep learning framework tha...
Article
Full-text available
A pangenome graph represents the genomes of multiple individuals, offering a comprehensive reference and overcoming allele bias from linear reference genomes. Sequence-to-graph alignment, crucial for pangenome tasks, aligns sequences to a graph to find the best matches. However, existing algorithms struggle with large-scale sequences. In this paper...
Article
Full-text available
Few-shot semantic segmentation aims to accurately segment objects from a limited amount of annotated data, a task complicated by intra-class variations and prototype representation challenges. To address these issues, we propose the Multi-Scale Prototype Convolutional Network (MPCN). Our approach introduces a Prior Mask Generation (PMG) module, whi...
Article
Full-text available
Corynebacterium glutamicum is a diderm bacterium extensively used in the industrial-scale production of amino acids. Corynebacteria belong to the bacterial family Mycobacteriaceae, which is characterized by a highly unusual cell envelope with an outer membrane consisting of mycolic acids, called mycomembrane. The mycomembrane is further coated by a...
Article
Full-text available
Introduction. Cancer accounts for approximately 16.8% of all deaths and 22.8% of noncommunicable disease-related deaths. The diagnostic, prognostic, and therapeutic aspects of patient management depend largely on mutations that drive the oncogenic process. However, evaluating the clinical significance of a variant is a major challenge, as m...
Thesis
Full-text available
The goal of this study was to use DNA extraction and sequencing to characterize Fusarium species isolated from diseased Striga hermonthica leaves. DNA was extracted from 13 samples, amplified by polymerase chain reaction (PCR), and checked by gel electrophoresis. Twelve samples amplified successfully and showed clear bands. The DNA samples were then sent to INCABA...
Preprint
Full-text available
Background Cinnamomum camphora var. linaloolifera is known for its richness in linalool, which is an acyclic monoterpene widely used in the fragrance and flavour industries. However, limited information is available regarding the genome-wide identification and characterization of the key genes for linalool synthesis in C. camphora var. linaloolifer...
Preprint
Full-text available
Motivation: The analysis of enzyme active sites is essential for understanding their activity in terms of catalyzed reaction and substrate specificity, providing insights for engineering to obtain targeted properties or modify the substrate scope. In 2010, a first version of the Active Site Modeling and Clustering (ASMC) workflow was published. ASM...
Article
Full-text available
Actinomadura isolates obtained from seven human mycetoma cases in Mexico were characterized using nucleotide sequence analysis of a portion of the small subunit ribosomal RNA gene. Most isolates were identified as Actinomadura madurae. However, one isolate, LIID-AQ337, showed inconclusive results. To determine its identity, genomic DNA from LIID-AQ...
Article
Full-text available
The Zika virus (ZIKV) non-structural protein 5 (NS5) is an important target for developing antiviral medications and is essential to the virus's replication mechanism. An integrated bioinformatics approach examines its evolutionary, dynamic, and structural features. Multiple sequence alignment and evolutionary trees are necessary for protein fu...
Article
Full-text available
Comprehensive collections approaching millions of sequenced genomes have become central information sources in the life sciences. However, the rapid growth of these collections has made it effectively impossible to search these data using tools such as the Basic Local Alignment Search Tool (BLAST) and its successors. Here, we present a technique ca...
Preprint
Full-text available
Phylogenetic trees are simple models of evolutionary processes. They describe conditionally independent divergent evolution of taxa from common ancestors. Phylogenetic trees commonly do not have enough flexibility to adequately model all evolutionary processes. For example, introgressive hybridization, where genes can flow from one taxon to another...
Preprint
Full-text available
Background The rapid advancement of sequencing technologies has drastically increased the availability of plant genomic and transcriptomic data, shifting the challenge from data generation to functional interpretation. Identifying genes involved in specialized metabolism remains difficult. While coexpression analysis is a widely used approach to id...
Article
Full-text available
Lilies are one of the most popular ornamental flowers in the world. However, the abundant pollen produced in their anthers causes significant inconvenience for producers and consumers. Pollen abortion induced by molecular breeding techniques is one of the effective ways to solve this problem. In this study, the LoTDF1 gene, which is involved in reg...
Article
Full-text available
Ancestral sequence reconstruction is typically performed using homogeneous evolutionary models, which assume that the same substitution propensities affect all sites and lineages. These assumptions are routinely violated: heterogeneous structural and functional constraints favor different amino acids at different sites, and these constraints often...
Preprint
Full-text available
Protein language models are significantly advancing the modeling of sequence-function relationships. However, most of them are not directly informed of homology and evolutionary relationships between protein sequences. Here, we propose a method to make them homology-aware. We introduce RAG‐ESM, a retrieval‐augmented framework that allows to conditi...
Preprint
Full-text available
Palaeoproteomic data can provide invaluable insights into hominid evolution over long timescales. Yet, the potential and limitations of ancient protein sequences to resolve evolutionary relations between species remain largely unexplored. In this study, we aim to quantify how much information about these relations can be obtained from limited anci...
Article
Full-text available
Genomic data analysis is a critical field in modern biological research, especially with the advent of high-throughput sequencing technologies. The sheer volume and complexity of genomic data necessitate the development and application of mathematical and statistical methods to extract meaningful insights. Mathematical approaches are indispensable...
Article
Full-text available
The 16S rRNA gene is frequently sequenced to classify prokaryotes and identify new taxa. If sequences from two strains share less than ~99% identity, the strains are usually classified as different species. Classification thresholds for genera and other ranks have also been proposed, but they are based on dated datasets. Here we update these thresh...
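Thresholds such as the ~99% species cut-off are applied to a percent identity computed from an alignment, and tools define that identity differently (for example, in how gap columns enter the denominator). The Python sketch below assumes identical non-gap columns divided by total alignment length. `percent_identity`, `likely_different_species`, and the fixed 99.0 default are illustrative assumptions, and the article's updated thresholds may differ.

```python
SPECIES_THRESHOLD = 99.0  # approximate cut-off quoted in the abstract, in percent


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical non-gap columns / all alignment columns, as a percentage.

    Assumes both strings come from the same pairwise or multiple alignment,
    with '-' marking gaps.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / len(aln_a)


def likely_different_species(aln_a: str, aln_b: str,
                             threshold: float = SPECIES_THRESHOLD) -> bool:
    """True when identity falls below the threshold (a heuristic, not a rule)."""
    return percent_identity(aln_a, aln_b) < threshold
```

Because the denominator choice alone can shift identity by a few tenths of a percent on full-length 16S sequences, the definition matters near a 99% boundary.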
Article
Full-text available
TRACK is a user-friendly Snakemake workflow designed to streamline the discovery and comparison of tandem repeats (TRs) across species. TRACK facilitates the cataloging and filtering of TRs based on reference genomes or T2T transcripts, and applies reciprocal LiftOver and sequence alignment methods to identify putative homologous TRs between specie...
Preprint
Full-text available
RNA molecules play a crucial role in various cellular functions, making understanding their three-dimensional (3D) structures vitally important. Cryogenic electron microscopy (cryo-EM) has significantly advanced the study of RNA structures. However, most existing modeling tools are primarily developed for proteins, making them less effective for RN...
Article
Full-text available
Research on bacteriophages, the viruses infecting bacteria, has fueled the development of modern molecular biology and inspired their therapeutic application to combat bacterial multidrug resistance. However, most work has so far focused on a few model phages which impedes direct applications of these findings in clinics and suggests that a vast po...
Conference Paper
Full-text available
Introduction: Current Newcastle Disease Virus (NDV) sequence typing relies on the amplification and sequencing of the full length of the fusion (F) gene. The amplification is performed in a two-step process. The first step amplifies the initial (5’ end) 650–750 bp of the F gene using two or three primer sets, in two or three sepa...
Article
Full-text available
Identifying DNA-binding proteins and their binding residues is critical for understanding diverse biological processes, but conventional experimental approaches are slow and costly. Existing machine learning methods, while faster, often lack accuracy and struggle with data imbalance, relying heavily on evolutionary profiles like PSSMs and HMMs derived from mult...
Article
Full-text available
Heterologous expression in well-studied model strains is a routinely applied method to investigate biosynthetic pathways. Here, we pursue a comparative approach of large-scale DNA-affinity-capturing assays (DACAs) coupled with semi-quantitative mass spectrometry (MS) to identify putative regulatory proteins from Streptomyces coelicolor M512, which...
Article
Full-text available
Accurately genotyping structural variant (SV) alleles is crucial to genomics research. We present a novel method (kanpig) for genotyping SVs that leverages variant graphs and k-mer vectors to rapidly generate accurate SV genotypes. Benchmarking against the latest SV datasets shows kanpig achieves a single-sample genotyping concordance of 82.1%, sig...
Article
Full-text available
Background Grass carp (Ctenopharyngodon idella) hemorrhagic disease (GCHD) is a devastating disease that leads to substantial economic losses in the freshwater aquaculture industry. Results In this study, we investigated an outbreak of GCHD in large-scale grass carp and identified GCRV-II infection. Notably, hematoxylin and eosin (H&E) staining sh...
Preprint
Full-text available
FUSE-PhyloTree is a phylogenomic analysis software for identifying local sequence conservation associated with the different functions of a multi-functional (e.g., paralogous or multi-domain) protein family. FUSE-PhyloTree introduces an original approach that combines advanced sequence analysis with phylogenetic methods. First, local sequence conse...
Preprint
Full-text available
We present a robust phylogenetic inference method, called the trimmed log-likelihood method, which effectively identifies fast-evolving, saturated, or erroneous sites in both simulated and empirical multiple sequence alignments. This method avoids circularity by dynamically identifying and removing sites without relying on an initial tree, allowing...
Preprint
Full-text available
Human protein kinases constitute a large superfamily of about 500 genes, historically classified into subfamilies based on phylogenetic relationship. However, many kinases remain unclassified. Phylogeny is typically based on multiple sequence alignments, and neglects the physico-chemical properties of residues at each position of the sequence. By i...
Article
Full-text available
DNA edit distance (ED) measures the minimum number of single nucleotide insertions, substitutions, or deletions required to convert a DNA sequence into another. ED has broad applications in healthcare such as sequence alignment, genome assembly, functional annotation, and drug discovery. Privacy-preserving computation is essential in this context t...
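The definition above is the classic Levenshtein distance, computable by dynamic programming in O(nm) time. The plaintext Python sketch below keeps only two rows of the table. It does not attempt the privacy-preserving computation that the article addresses, and `edit_distance` is a hypothetical name.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # prev[j] = distance between the current prefix of s and t[:j]
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i] + [0] * len(t)
        for j, b in enumerate(t, start=1):
            cur[j] = min(prev[j] + 1,             # delete a
                         cur[j - 1] + 1,          # insert b
                         prev[j - 1] + (a != b))  # substitute (free if equal)
        prev = cur
    return prev[-1]
```

For example, `edit_distance("ACGT", "AGT")` is 1 (delete C). Keeping two rows reduces memory to O(min-side length) but discards the traceback needed to recover the edit operations themselves.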
Article
Full-text available
Marek’s disease (MD), caused by Marek’s disease virus (MDV), is an important avian disease characterized by tumorigenesis, immunosuppression, and significant economic impact. The disease is controlled by mass vaccination supported by good management practices that restrict tumor formation. However, these practices have failed to control the spread of the virus, which continues to evolve an...
Article
Full-text available
Background Cinnamoyl-CoA reductase (CCR) is the first important and committed enzyme in the monolignol synthesis branch of the lignin biosynthesis (LB) pathway, catalyzing the conversion of cinnamoyl-CoAs to cinnamaldehydes and is crucial for the growth of Linum usitatissimum (flax), an important fiber crop. However, little information is available...
Article
Full-text available
Background: Pigs are vital agricultural animals, with growth traits serving as key indicators of their quality. Methods: In this study, we examined the mRNA expression of ENPP1 as a candidate gene in heart, liver, spleen, lungs, and kidneys at 3 days and 6 months of age by real-time polymerase chain reaction method and single-nucleotide polymorphis...
Article
Full-text available
Background Hereditary hemorrhagic telangiectasia (HHT) is an autosomal dominant disorder with variable manifestations, including recurrent epistaxis, telangiectasias, arteriovenous malformations, and family history. It is caused by heterozygous null alleles of ENG, ACVRL1, SMAD4, or BMP9, and clinical diagnosis is often delayed. Genetic testing is crucial...
Preprint
Full-text available
Maximum Likelihood (ML) based phylogenetic inference is time- and resource-intensive, especially when initiating multiple independent inferences from distinct comprehensive tree topologies. Performing multiple independent inferences is often required to (sufficiently) explore the vast search space of possible unrooted binary tree topologies. Yet, t...
Article
Full-text available
Melanoseris, a diverse genus in the Lactucinae subtribe, has 21 species in China, with 13 being endemic. The high diversity of this genus presents taxonomic challenges, particularly in the M. cyanea group, where overlapping distributions and transitional morphological traits complicate classification. This study aims to analyze the chloroplast geno...
Article
Full-text available
The hybrid offspring of barbel chub Squaliobarbus curriculus and grass carp Ctenopharyngodon idella exhibit stronger resistance to the grass carp reovirus (GCRV) infection than grass carp. Toll-like receptors (TLRs) play indispensable roles in the antiviral immunity of fish. In this study, the structures and antiviral immune functions of barbel chu...
Article
Full-text available
The statistical selection of best-fit models of nucleotide substitution for multiple sequence alignments (MSAs) is routine in phylogenetics. Our analysis of model selection across three widely used phylogenetic programs (jModelTest2, ModelTest-NG, and IQ-TREE) demonstrated that the choice of program did not significantly affect the ability to accur...
Article
Full-text available
Objectives/Goals: To identify the genomic mechanisms underlying cross-species regulation of longevity among mammals and birds and to characterize the impact of those conserved pathways on human aging. More broadly, this study aims to develop a novel evolutionary approach to understand the genetics of complex traits. Methods/Study Population: High-q...
Preprint
Full-text available
Background: The increasing amount of available genome sequence data enables large-scale comparative studies. A common task is the inference of phylogenies—a challenging task if close reference sequences are not available, genome sequences are incompletely assembled, or the high number of genomes precludes multiple sequence alignment in reasonable t...
Article
Full-text available
This research delineates the development of a hybrid architecture for text-to-speech (TTS) synthesis, termed the Efficient Speech Synthesizer (ESS). ESS leverages an end-to-end training paradigm to holistically optimize all the parameters, thereby facilitating the synthesis of high-fidelity speech with remarkable efficiency, irrespective of sentenc...
Article
Full-text available
Pseudorabies virus (PRV), the causative agent of Aujeszky’s disease, is an infectious pathogen that significantly impacts the global swine industry. The broad host range of PRV enables it to infect various animals, including pigs, cattle, minks, dogs, and even humans. Although PRV infections in ruminants have been reported, the occurrence of natura...
Preprint
Full-text available
Objective The increasing threat of infectious diseases necessitates robust genomic resources for rapid and accurate analysis of human pathogens. The Human Pathogen Database (HPD) addresses this need by providing a comprehensive, curated collection of pathogen data. The database was built on high-quality, manually organized data of human pathogens,...
Article
Full-text available
Eukaryotic N‐degron pathways are proteolytic systems with the ability to recognize specific N‐terminal residues of substrate proteins, which are essential parts of their degradation signals. Domains, referred to as UBR boxes, of several E3 ubiquitin ligases can recognize basic N‐terminal residues as N‐degrons. UBR6 is among the seven mammalian UBR...
Article
Full-text available
Allosteric regulation of catalytic activity is a widespread property of multi‐enzyme complexes. The tryptophan synthase is a prototypical allosteric enzyme where the constituting α (TrpA) and β (TrpB) subunits mutually activate each other in a manner that is incompletely understood. Experimental and computational studies have shown that LBCA‐TrpB f...
Article
Full-text available
Increasing insights into how sequence motifs in intrinsically disordered regions (IDRs) provide functions underscore the need for systematic motif detection. Contrary to structured regions where motifs can be readily identified from sequence alignments, the rapid evolution of IDRs limits the usage of alignment‐based tools in reliably detecting moti...
Preprint
Full-text available
The computational complexity of many key bioinformatics problems has resulted in numerous alternative heuristic solutions, where no single approach consistently outperforms all others. This creates difficulties for users trying to identify the most suitable tool for their dataset and for developers managing and evaluating alternative methods. As da...
Preprint
Full-text available
Segmental duplications play an important role in genome evolution via their contribution to copy-number variation, gene-family diversification and the emergence of novel functions. The detection of segmental duplications is challenging due to heterogeneous amelioration of sequence similarity among duplicates, which hinders the reconstruction of con...
Preprint
Full-text available
The projection of conservation onto the surface of a protein's 3D structure is a powerful way of inferring functionally important regions. At present, the workflow for doing so can be involved and tedious. For this reason, we created ProteoSync, a Python program that semi-automates the process. The program creates an annotated sequence alignment of...
Preprint
Full-text available
In this study, we investigate the performance of nine protein structure alignment tools by analyzing the influence of alignment results in accomplishing three downstream biological tasks: homology detection, phylogeny reconstruction, and function inference. These tools include (1) traditional sequential methods using both 3D and 2D structure repres...
Preprint
Full-text available
The complexity of generating RNA multiple sequence alignments (MSAs) and assessing their accuracy contributes to the challenges of using RNA MSAs in diverse integrative methods. RNAhub is a freely available, user-friendly web server for the reliable generation of RNA multiple sequence alignments and the detecti...
Preprint
Full-text available
The present study aimed to investigate the material basis of Persicaria runcinata var. sinensis (Hemsl.) Bo Li using widely targeted metabolomics and network pharmacology techniques. Efforts were also made to establish a DNA barcode for Persicaria runcinata var. sinensis (Hemsl.) Bo Li. Widely targeted metabolomics technique of Ultra performance li...
Preprint
Full-text available
Background Vaccines against SARS-CoV-2 have been essential in controlling COVID-19-related mortality and have saved millions of lives. Adenoviral (Ad)-based vaccines have played an integral part in this vaccine campaign, with licensed vaccines based on the simian Y25 isolate (Vaxzevria, AstraZeneca) and human Ad type 26 (Jcovden, Janssen) widely ad...
Article
Full-text available
Phylogenetic inference aims at reconstructing the tree describing the evolution of a set of sequences descending from a common ancestor. The high computational cost of state-of-the-art maximum likelihood and Bayesian inference methods limits their usability under realistic evolutionary models. Harnessing recent advances in likelihood-free inference...
Preprint
Full-text available
As AlphaFold achieves high-accuracy tertiary structure prediction for most single-chain proteins (monomers), the next frontier in protein structure prediction lies in accurately modeling multi-chain protein complexes (multimers). We developed MULTICOM4, the latest version of the MULTICOM system, to improve protein complex structure prediction by...
Article
Full-text available
Statistically significant multiple sequence alignment construction is an important task that has many biological applications. We applied the method for multiple alignments of highly divergent sequences (MAHDS) to construct multiple sequence alignments (MSAs) for 490 protein families with less than 20% identity between family members. The method us...
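The "less than 20% identity" criterion refers to pairwise identity between aligned family members. As a hedged sketch, and not the MAHDS authors' procedure, pairwise identity can be computed from an alignment as below. It assumes identity is measured over columns where neither sequence has a gap, which is only one of several conventions in use. The helper names are hypothetical.

```python
from itertools import combinations

def percent_identity(a, b, gap="-"):
    """Hypothetical helper: percent identity of two aligned sequences.

    Assumed convention: identical positions divided by positions where
    neither sequence has a gap. Other denominators (e.g. shorter ungapped
    length) are also common and give different numbers.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)

def pairwise_identities(alignment):
    """Percent identity for every pair of sequences, in combinations() order."""
    return [percent_identity(a, b) for a, b in combinations(alignment, 2)]
```

Under this convention, a family would meet a "below 20% identity" filter when `max(pairwise_identities(msa)) < 20.0`.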
Article
Full-text available
Sign language recognition and translation remain pivotal for facilitating communication among the deaf and hearing communities. However, end-to-end sign language translation (SLT) faces major challenges, including weak temporal correspondence between sign language (SL) video frames and gloss annotations and the complexity of sequence alignment betw...
Article
Full-text available
The prokaryote-specific ATP-binding cassette (ABC) peptide transporters are involved in various physiological processes and play an important role in transporting naturally occurring antibiotics across the membrane to their intracellular targets. The dipeptide transporter DppABCDF in Gram-negative bacteria is composed of five distinct subunits, ye...
Article
Full-text available
Avian encephalomyelitis virus (AEV), a picornavirus, primarily infects the central nervous system of 1- to 2-week-old young chickens but not pullets. When wild-type AEV undergoes serial passaging in chicken embryos, it becomes embryo-adapted and can cause avian encephalomyelitis in chickens of all ages following intracutaneous infection throug...
Article
Full-text available
Cancer data is widely available in repositories such as the National Cancer Institute (NCI) Genomic Data Commons (GDC). These datasets could serve as controls or comparisons in compendium analyses with user data, avoiding the expense and time of generating additional datasets. However, the user must be able to process their new data in the same man...
Article
Full-text available
NRL (NPH3/RPT2-Like) proteins, which are exclusive to plants, serve as critical mediators in phototropic signaling by dynamically regulating light-dependent cellular processes. We identified 24 NRL genes (VvNRL) in the Vitis vinifera L. genome, which were unevenly distributed on 11 chromosomes. Phylogenetic analysis showed that these family members...
Article
Full-text available
Avastroviruses (AAstVs) are important pathogens in avian species, with cross-species transmission being a critical factor in their evolutionary dynamics. The spike protein encoded by the ORF2 region plays a central role in viral entry and is believed to influence host range and transmission across species. This study analyzed the evolutionary conse...
Article
Full-text available
The Niemann-Pick C1-Like 1 (NPC1L1) protein, primarily expressed in the epithelial cells of the small intestine, is essential for cholesterol absorption from both dietary intake and biliary secretion. Despite this conserved function across mammals, the full-length coding sequence of NPC1L1 remains uncharacterized in key avian models including chick...
Preprint
Full-text available
Recent generative learning models applied to protein multiple sequence alignment (MSA) datasets include simple and interpretable physics-based Potts covariation models and other machine learning models such as MSA-Transformer (MSA-T). The best models accurately reproduce MSA statistics induced by the biophysical constraints within proteins, raising...
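The "MSA statistics" that Potts covariation models are fitted to reproduce are usually the one-site frequencies f_i(a) and the pairwise connected correlations C_ij(a, b) = f_ij(a, b) - f_i(a) f_j(b). The minimal sketch below computes them under stated assumptions. It is not the authors' code, it omits the sequence reweighting and pseudocounts used in practice, and it treats the gap character as just another symbol. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def msa_statistics(alignment):
    """Hypothetical helper: one-site frequencies and connected correlations.

    Returns (f1, corr), where f1[i][a] is the frequency of symbol a at column i
    and corr[(i, j, a, b)] = f_ij(a, b) - f_i(a) * f_j(b) for columns i < j.
    No reweighting or pseudocounts; only symbols observed in a column appear.
    """
    n = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    f1 = [{a: c / n for a, c in Counter(seq[i] for seq in alignment).items()}
          for i in range(length)]
    corr = {}
    for i, j in combinations(range(length), 2):
        pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
        for a in f1[i]:
            for b in f1[j]:
                fij = pair_counts.get((a, b), 0) / n
                corr[(i, j, a, b)] = fij - f1[i][a] * f1[j][b]
    return f1, corr
```

In the toy alignment `["AC", "AC", "GT", "GT"]` the two columns covary perfectly, so C_01(A, C) is positive (0.25) and C_01(A, T) is negative (-0.25). A generative model that captures such covariation should reproduce these values in its samples.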
Preprint
Full-text available
AlphaFold2 (AF2) has transformed protein structure prediction by harnessing co-evolutionary constraints embedded in multiple sequence alignments (MSAs). MSAs not only encode static structural information, but also hold critical details about protein dynamics, which underpin biological functions. However, these subtle co-evolutionary signatures, whi...
Article
Full-text available
Context The critically endangered grey nurse shark (Carcharias taurus) is a widely distributed coastal to near coastal species. Overexploitation has resulted in severe global population declines, including in Australian waters, where two genetically distinct populations are known to exist in temperate to subtropical waters off the eastern and weste...