Article

Optimal Alignments in Linear Space


Abstract

Space, not time, is often the limiting factor when computing optimal sequence alignments, and a number of recent papers in the biology literature have proposed space-saving strategies. However, a 1975 computer science paper by Hirschberg presented a method that is superior to the new proposals, both in theory and in practice. The goal of this paper is to give Hirschberg's idea the visibility it deserves by developing a linear-space version of Gotoh's algorithm, which accommodates affine gap penalties. A portable C-software package implementing this algorithm is available on the BIONET free of charge.
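The divide-and-conquer idea credited to Hirschberg in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a single linear gap penalty rather than the affine penalties handled by the linear-space version of Gotoh's algorithm, and the function names and scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions chosen for the example.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-2):
    """Last row of the global-alignment score matrix for a vs b.

    Only two rows are held in memory at any time, so space is O(len(b)).
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev


def hirschberg(a, b, match=1, mismatch=-1, gap=-2):
    """Return one optimal global alignment of a and b as two gapped strings."""
    if len(a) == 0:
        return "-" * len(b), b
    if len(b) == 0:
        return a, "-" * len(a)
    if len(a) == 1:
        # Base case: pair the single character with its best partner in b,
        # or leave it unpaired if that scores higher.
        best_j = max(range(len(b)),
                     key=lambda j: match if b[j] == a else mismatch)
        sub = match if b[best_j] == a else mismatch
        if sub + (len(b) - 1) * gap >= (len(b) + 1) * gap:
            return "-" * best_j + a + "-" * (len(b) - best_j - 1), b
        return a + "-" * len(b), "-" + b

    # Split a in half; score the top half forwards and the bottom half
    # backwards, then pick the column of b where the two halves meet best.
    mid = len(a) // 2
    left = nw_last_row(a[:mid], b, match, mismatch, gap)
    right = nw_last_row(a[mid:][::-1], b[::-1], match, mismatch, gap)
    split = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])

    top = hirschberg(a[:mid], b[:split], match, mismatch, gap)
    bottom = hirschberg(a[mid:], b[split:], match, mismatch, gap)
    return top[0] + bottom[0], top[1] + bottom[1]


def alignment_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score a gapped alignment column by column."""
    total = 0
    for p, q in zip(x, y):
        if p == "-" or q == "-":
            total += gap
        else:
            total += match if p == q else mismatch
    return total


if __name__ == "__main__":
    x, y = hirschberg("AGTACGCA", "TATGC")
    print(x)
    print(y)
    print(alignment_score(x, y))
```

The recursion recovers the alignment itself while each scoring pass keeps only two rows, which is the property that lets space, rather than time, stop being the limiting factor.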


... with the default parameters. Next, the primers were trimmed using the Myers-Miller alignment algorithm at a similarity cut-off of 0.8 [34]. Nonspecific amplicons, such as those not encoding 16S rRNA, were detected using the nhmmer algorithm in the HMMER software package ver. ...
... Unique reads were extracted, and redundant reads were clustered with unique reads using the derep_fulllength command in VSEARCH [36]. Taxonomic assignment was performed using the EzBioCloud 16S rRNA database with the usearch_global command in VSEARCH, followed by a more precise pairwise alignment [34,36,37]. Chimeric reads were filtered to obtain reads with < 97% similarity through reference-based chimeric read detection using the UCHIME algorithm and nonchimeric 16S rRNA database from EzBioCloud [38]. ...
Article
Full-text available
Background Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) vaccines are pivotal in combating coronavirus disease 2019 (COVID-19); however, the declining antibody titers postvaccination pose challenges for sustained protection and herd immunity. Although gut microbiome is reported to affect the early antibody response after vaccination, its impact on the longevity of vaccine-induced antibodies remains unexplored. Methods A prospective cohort study was conducted involving 44 healthy adults who received two doses of either the BNT162b2 or ChAdOx1 vaccine, followed by a BNT162b2 booster at six months. The gut microbiome was serially analyzed using 16S rRNA and shotgun sequencing, while humoral immune response was assessed using a SARS-CoV-2 spike protein immunoassay. Results Faecalibacterium prausnitzii was associated with robust and persistent antibody responses post-BNT162b2 vaccination. In comparison, Escherichia coli was associated with a slower antibody decay following ChAdOx1 vaccination. The booster immune response was correlated with metabolic pathways involving cellular functions and aromatic amino acid synthesis. Conclusions The findings of this study underscored the potential interaction between the gut microbiome and the longevity/boosting effect of antibodies following vaccination against SARS-CoV-2. The identification of specific microbial associations suggests the prospect of microbiome-based strategies for enhancing vaccine efficacy.
... The calculation of identities and similarities of the PSA_Lip was done with the EMBOSS Stretcher online tool (.ebi. ac.uk/Tools/psa/emboss_stretcher) (Myers and Miller 1988) and the consensus sequence was visualized with WebLogo v.3 (Crooks et al. 2004). A hybrid homology model was built based on the PSA_Lip sequence using the automatic homology modelling script of YASARA v.17.1.28 ...
... To gain more insight into its structure-to-function relationship, 3D homology models of PSA_Lip were built. The closest PSI-BLAST hit and thus the underlying template structures were a choline esterase from Pseudomonas aeruginosa with PDB code 6UQV, and a phospholipase A from Vibrio vulnificus (PDB code: 6JKZ), though sequence homologies (Myers and Miller 1988) to PSA_Lip of 36-40% were in a medium range. Sequence identities were determined to be 26.6% to 6UQV and 23.8% to 6JKZ using EMBOSS Stretcher online tool. ...
Article
Full-text available
The GDS(L)-like lipase from the Basidiomycota Pleurotus sapidus (PSA_Lip) was heterologously expressed using Trichoderma reesei with an activity of 350 U L⁻¹. The isoelectric point of 5.0 was determined by isoelectric focusing. The novel PSA_Lip showed only 23.8–25.1%, 25.5%, 26.6% and 28.4% identity to the previously characterized GDSL-like enzymes phospholipase, plant lipase, acetylcholinesterase and acetylxylan esterase, from the carbohydrate esterase family 16, respectively. Therefore, the enzyme was purified from the culture supernatant and the catalytic properties and the substrate specificity of the enzyme were investigated using different assays to reveal its potential function. While no phospholipase, acetylcholinesterase and acetylxylan esterase activities were detected, studies on the hydrolysis of ferulic acid methyl ester (~ 8.3%) and feruloylated carbohydrate 5-O-transferuloyl-arabino-furanose (~ 0.8%) showed low conversions of these substrates. By investigating the hydrolytic activity towards p-nitrophenyl-(pNP)-esters with various chain-lengths, the highest activity was determined for medium chain-length pNP-octanoate at 65 °C and a pH value of 8, while almost no activity was detected for pNP-hexanoate. The enzyme is highly stable when stored at pH 10 and 4 °C for at least 7 days. Moreover, using consensus sequence analysis and homology modeling, we could demonstrate that the PSA_Lip does not contain the usual SGNH residues in the active site, which are usually present in GDS(L)-like enzymes. Supplementary Information The online version contains supplementary material available at 10.1186/s13568-024-01752-x.
... Following quality control, paired-end sequences were merged with VSEARCH version 2.13.4 [28], employing the fastq_mergepairs command with default settings. Utilizing the alignment algorithm of Myers and Miller [29], the primers were subsequently trimmed at a similarity cutoff point of 0.8. Nhmmer [30], in combination with hmm profiles from the HMMER software package (version 3.2.1), ...
... This process was executed using the derep_fulllength command of the VSEARCH software suite [28]. The usearch_global command from VSEARCH [28] was employed for a taxonomic assignment using the EzBioCloud 16S rRNA database [31], followed by a more detailed pairwise alignment as per Myers and Miller [29]. Chimeric reads were removed using reference-based chimera detection via the UCHIME algorithm [32]. ...
Article
Full-text available
Freshwater lakes are critical to healthy ecosystems, providing vital services like drinking water and recreation for surrounding communities. Microorganisms within these ecosystems play essential roles, driving biogeochemical cycles for elements like carbon, nitrogen, and sulfur. This study utilized a metagenomic approach to examine the prokaryotic communities of three freshwater lakes in Türkiye: the Eber and Beyşehir lakes, located at close altitudes (967 m and 1,115 m, respectively), which serve as primary water sources for nearby communities, and Lake Uludag Buzlu (2,390 m) that lies at the permanent snow border within the Uludag glacial lake system. Metagenomics allowed us to identify species, genetic structures, and the functional roles of microorganisms. Employing high-throughput next-generation sequencing (NGS) technology, we analyzed 16S ribosomal DNA (rDNA) sequences (V3-V4 regions) from the lake samples. EzBioCloud software facilitated the analysis of prokaryotic diversity obtained using Illumina NovaSeq technology. While Eber and Beyşehir lakes had similar diversity, Bacillota dominated in the higher-altitude Lake Uludag Buzlu. Genus-level analysis revealed Parabacteroides as the most prevalent in Lake Uludag Buzlu, contrasting with Limnohabitans dominance in Lake Eber; Lake Beyşehir exhibited co-dominance of Limnohabitans and Planktophila.
... After the QC pass, paired-end sequence data were merged together using the fastq_mergepairs command of VSEARCH v.2.13.4 [53] with the default parameters. The primers were then trimmed with the alignment algorithm of Myers and Miller [54] at a similarity cut off of 0.8. Non-specific amplicons that did not encode 16S rRNA were detected by nhmmer [55] in the HMMER software package v.3.2.1 with hmm profiles. ...
... Unique reads were extracted and redundant reads were clustered with the unique reads by the derep_fulllength command of VSEARCH [53]. The EzBioCloud 16S rRNA database [56] was used for taxonomic assignment using the usearch_global command of VSEARCH [53] followed by more precise pairwise alignment [54]. Chimeric reads were filtered on reads with <97% similarity by reference-based chimeric detection using the UCHIME algorithm [57] and the non-chimeric 16S rRNA database from EzBioCloud. ...
Article
Full-text available
The symbiotic community of microorganisms in the gut plays an important role in the health of the host. While many previous studies have been performed on the interactions between the gut microbiome and the host in mammals, studies in fish are still lacking. In this study, we investigated changes in the intestinal microbiome and pathogen susceptibility of zebrafish (Danio rerio) following chronic antibiotics exposure. The chronic antibiotics exposure assay was performed on zebrafish for 30 days using oxytetracycline (Otc), sulfamethoxazole/trimethoprim (Smx/Tmp), or erythromycin (Ery), which are antibiotics widely used in the aquaculture industry. The microbiome analysis indicated that Fusobacteria, Proteobacteria, Firmicutes, and Bacteroidetes were the dominant phyla in the gut microbiome of the zebrafish used in this study. However, in Smx/Tmp-treated zebrafish, the compositions of Fusobacteria and Proteobacteria were changed significantly, and in Ery-treated zebrafish, the compositions of Proteobacteria and Firmicutes were altered significantly. Although alpha diversity analysis showed that there was no significant difference in the richness, beta diversity analysis revealed a community imbalance in the gut microbiome of all chronically antibiotics-exposed zebrafish. Intriguingly, in zebrafish with dysbiosis in the gut microbiome, the pathogen susceptibility to Edwardsiella piscicida, a representative Gram-negative fish pathogen, was reduced. Gut microbiome imbalance resulted in a higher count of goblet cells in intestinal tissue and an upregulation of genes related to the intestinal mucosal barrier. In addition, as innate immunity was enhanced by the increased mucosal barrier, immune and stress-related gene expression in the intestinal tissue was downregulated. In this study, we provide new insight into the effect of gut microbiome dysbiosis on pathogen susceptibility.
... Following the quality control step, the sequence data were merged using the fastq_mergepairs command of VSEARCH 2.13.42 (Rognes et al. 2016). Primers were then trimmed using the alignment algorithm of Myers and Miller (1988) at a similarity cut-off of 0.8. Non-specific amplicons that did not encode 16S rRNA were detected by nhmmer (Eddy 2011) in HMMER 3.2.1 with hmm profiles. ...
... Non-specific amplicons that did not encode 16S rRNA were detected by nhmmer (Eddy 2011) in HMMER 3.2.1 with hmm profiles. Unique reads were extracted, and redundant reads were clustered with unique reads using the derep_fulllength command of VSEARCH 2. The EzBioCloud 16S rRNA database (Yoon et al. 2017) was used for taxonomic assignment using the usearch_global command of VSEARCH 2, followed by a more precise pairwise alignment (Myers and Miller 1988). Chimeric reads were filtered on reads with <97% sequence similarity by reference-based chimeric detection using the UCHIME algorithm (Edgar et al. 2011) and the non-chimeric 16S rRNA database from EzBioCloud. ...
Article
Full-text available
The Korea Combat Training Center (KCTC), located in Gangwon Province, is a restricted military training facility where research on the environmental conditions and health risks to military personnel has been limited. In this study, using iSeq 100, we investigated the bacterial abundance and microbiome of Haemaphysalis longicornis specimens collected at the KCTC from June to August 2022, to assess current and potential public health risks to military personnel. Our results show that adult ticks had significantly greater species richness compared with larvae and nymphs, with no notable differences in diversity across developmental stages. Principal coordinate analysis of the microbial communities did not show differences attributable to any single factor, such as collection location or date. Coxiella-like endosymbionts (AB001519) were identified in all 13 samples, and Jatrophihabitans, Sphingomonas, and Spirosoma were consistently found across all samples. In addition, iSeq 100 also identified Rickettsia rickettsii and Borrelia spp., which were not detected with conventional polymerase chain reaction (PCR).
... Here, we used a sequence-independent algorithm, TM-align, to measure the similarity between the structures. We used EMBOSS Stretcher (Myers and Miller 1988) to calculate the similarity between the amino acid sequences. As is shown, the structures of H. sapiens and E. coli glutathione S-transferase are nearly superimposable, whereas the amino acid sequences of the two proteins only share 35% similarity and 19% identity. ...
... Pairwise sequence comparison of AlphaFold2 modeled structures for characterization of twilight zones was performed with EMBOSS needle (Figs. 3, 4). EMBOSS stretcher implements the Myers-Miller algorithm (Myers and Miller 1988), and EMBOSS needle implements the Needleman-Wunsch algorithm (Needleman and Wunsch 1970). The former requires more time, whereas the latter needs more computer memory. ...
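The identity and similarity percentages quoted in these excerpts are derived from a finished pairwise alignment. The sketch below shows one way such figures could be computed from two gapped strings. It rests on two assumptions that are not taken from the cited papers: the alignment length (gaps included) is used as the denominator, and "similar" residues are defined by the toy groups in `SIMILAR_GROUPS`, a hypothetical stand-in for the positive entries of a real substitution matrix.

```python
# Hypothetical groups of residues treated as "similar" for illustration only;
# real tools derive this from a substitution matrix instead.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "ST", "NQ"]


def identity_and_similarity(x, y):
    """Return (percent identity, percent similarity) for a gapped alignment.

    x and y are equal-length aligned strings using '-' for gaps.
    Both percentages are taken over the full alignment length.
    """
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    if not x:
        return 0.0, 0.0

    identical = 0
    similar = 0
    for p, q in zip(x, y):
        if p == "-" or q == "-":
            continue  # gap columns count toward length only
        if p == q:
            identical += 1
            similar += 1  # identical residues are also similar
        elif any(p in group and q in group for group in SIMILAR_GROUPS):
            similar += 1

    n = len(x)
    return 100 * identical / n, 100 * similar / n


if __name__ == "__main__":
    ident, simil = identity_and_similarity("ILKD-", "IVRDE")
    print(f"identity {ident:.1f}%, similarity {simil:.1f}%")
```

Because similarity counts every identical column plus the conservative substitutions, it is never lower than identity, which matches the pattern in the figures reported above (35% similarity against 19% identity).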
Article
Traditional evolutionary biology research mainly relies on sequence information to infer evolutionary relationships between genes or proteins. In contrast, protein structural information has long been overlooked, although structures are more conserved and closely linked to the functions than the sequences. To address this gap, we conducted a proteome-wide structural analysis using experimental and computed protein structures for organisms from the three distinct domains, including Homo sapiens (eukarya), Escherichia coli (bacteria), and Methanocaldococcus jannaschii (archaea). We reveal the distribution of structural similarity and sequence identity at the genomic level and characterize the twilight zone, where signals obtained from sequence alignment are blurred and evolutionary relationships cannot be inferred unambiguously. We find that structurally similar homologous protein pairs in the twilight zone account for ∼0.004%–0.021% of all possible protein pair combinations, which translates to ∼8%–32% of the protein-coding genes, depending on the species under comparison. In addition, by comparing the structural homologs, we show that human proteins involved in the energy supply are more similar to their E. coli homologs, whereas proteins relating to the central dogma are more similar to their M. jannaschii homologs. We also identify a bacterial GPCR homolog in the E. coli proteome that displays distinctive domain architecture. Our results shed light on the characteristics of the twilight zone and the origin of different pathways from a protein structure perspective, highlighting an exciting new frontier in evolutionary biology.
... After QC pass, paired-end sequence data were merged together using fastq_mergepairs command of VSEARCH v.2.13.4 [23] with default parameters. Primers were then trimmed with the alignment algorithm of Myers and Miller [24] at a similarity cut off of 0.8. Non-specific amplicons that do not encode 16S rRNA were detected by nhmmer [25] in HMMER software package v.3.2.1 with hmm profiles. ...
... Unique reads were extracted and redundant reads were clustered with the unique reads by derep_fulllength command of VSEARCH [23]. The EzBioCloud 16S rRNA database [26] was used for taxonomic assignment using usearch_global command of VSEARCH [23] followed by more precise pairwise alignment [24]. Chimeric reads were filtered on reads with < 97% similarity by reference based chimeric detection using UCHIME algorithm [27] and the non-chimeric 16S rRNA database from EzBioCloud. ...
Preprint
Full-text available
The symbiotic community of microorganisms in the gut plays an important role in the health of the host. While many previous studies have been performed on the interaction between the gut microbiome and the host in mammals, studies in fish are still lacking. In this study, we investigated changes in the intestinal microbiome and pathogen susceptibility of zebrafish (Danio rerio) following chronic antibiotics exposure. The chronic antibiotics exposure assay was performed on zebrafish for 30 days using oxytetracycline (Otc), sulfamethoxazole/trimethoprim (Smx/Tmp), and erythromycin (Ery), which are antibiotics widely used in the aquaculture industry. The microbiome analysis indicated that Fusobacteria, Proteobacteria, Firmicutes, and Bacteroidetes were the dominant phyla in the gut microbiome of zebrafish used in this study. However, in Smx/Tmp-treated zebrafish, the compositions of Fusobacteria and Proteobacteria were changed significantly, and in Ery-treated zebrafish, the compositions of Proteobacteria and Firmicutes were altered significantly. Although alpha diversity analysis showed that there was no significant difference in the richness, beta diversity analysis revealed a community imbalance in the gut microbiome of all chronically antibiotics-exposed zebrafish. Intriguingly, in zebrafish with dysbiosis in the gut microbiome, the pathogen susceptibility to Edwardsiella piscicida, a representative Gram-negative fish pathogen, was reduced. As a further effect of gut microbiome dysbiosis, the number of goblet cells in the intestinal tissue was increased, and the expression of intestinal mucosal barrier-related genes was also upregulated. In addition, as the innate immunity was enhanced by the increased mucosal barrier, immune and stress-related gene expression in the intestinal tissue was downregulated. In this study, we provide new insight into the effect of gut microbiome dysbiosis on pathogen susceptibility.
... To calculate 16S rRNA gene sequence similarity values, global alignment was performed using the algorithm of Myers and Miller (1988) (https://pubmed.ncbi.nlm.nih.gov/3382986/), which is identical to Clustal W alignment with default alignment parameters: gap open penalty = 6.66 and gap extension penalty = 6.66. Alignments were created using Clustal_X (v2.1) with the above indicated parameters and we then ran PHYDIT software (Chun 1995 rRNA gene sequences were aligned using Clustal W ( Thompson et al. 1994). ...
Preprint
Full-text available
Duganella sp. strains R1ᵀ, R57ᵀ, and R64ᵀ, isolated from barley roots in Japan, are Gram-stain-negative, motile, rod-shaped bacteria. Duganella species abundantly colonized the barley roots. Strains R1ᵀ, R57ᵀ, and R64ᵀ were capable of growth at 4°C, suggesting adaptation to colonize winter barley roots. Strains R57ᵀ and R64ᵀ formed purple colonies, indicating violacein production, while strain R1ᵀ did not. Based on 16S rRNA gene sequence similarities, strains R1ᵀ, R57ᵀ, and R64ᵀ were most closely related to D. violaceipulchra HSC-15S17ᵀ (99.10%), D. vulcania FT81Wᵀ (99.45%), and D. violaceipulchra HSC-15S17ᵀ (99.86%), respectively. Their genome sizes ranged from 7.05 to 7.38 Mbp, and their genomic G+C contents were 64.2 to 64.7%. The average nucleotide identity and digital DNA–DNA hybridization values between R1ᵀ and D. violaceipulchra HSC-15S17ᵀ, R57ᵀ and D. vulcania FT81Wᵀ, R64ᵀ and D. violaceipulchra HSC-15S17ᵀ were 86.0% and 33.2%, 95.7% and 67.9%, and 92.7% and 52.6%, respectively. Their fatty acids were predominantly composed of C16:0, C17:0 cyclo, and summed feature 3 (C16:1 ω7c and/or C16:1 ω6c). Based on their distinct genetic and phenotypic characteristics, and supported by chemotaxonomic analyses, we propose that strains R1ᵀ, R57ᵀ, and R64ᵀ represent novel species within the Duganella genus, for which the names Duganella hordei (type strain R1ᵀ = NBRC 115982ᵀ = DSMZ 115069ᵀ), Duganella kodaimurasaki (type strain R57ᵀ = NBRC 115983ᵀ = DSMZ 115070ᵀ), and Duganella rhizosphaerae (type strain R64ᵀ = NBRC 115984ᵀ = DSMZ 115071ᵀ) are suggested.
... The targets and templates were subjected to alignment by using STRETCHER [16] and Fold and Function Alignment Server (FFAS) [17]. Stretcher calculates an optimal global alignment between target and template by using a modification of the classic dynamic programming algorithm which uses linear space, whereas the FFAS server utilizes information present in sequences of homologous proteins and performs profile-profile alignment. ...
Article
For the past four decades, extensive research has been carried out on the development of methods for protein structure prediction. The development of hybrid methods and novel algorithms including different parameters has contributed to a large extent to the progress of protein structure prediction. In spite of the development of several structure prediction methods with better accuracy, it is still not clear how the one-dimensional amino acid sequence of a protein codes for the three-dimensional structure. The highly complex nature of the sequence-structure relationship is due to the interplay between physics and evolution, and hence the problem must be viewed from a physico-chemical perspective. Further, one of the most important factors influencing the ability to predict accurate models is the extent of structural conservation between target and template. Considering the above facts, we have made a retrospective analysis of the earlier CASP targets and their templates by using the physico-chemical property correlation coefficients of the amino acid residues, which were utilized by Argos (1987) in his sensitive sequence comparison algorithm. For most of the targets and templates in all four structural classes, a reasonable correlation coefficient is observed for any one of the five properties. Also, the profile-based alignment between target and template was better than the substitution-matrix-based alignment. The results discussed here point to the need for the development of novel algorithms by incorporating profile alignment and physico-chemical correlation coefficients to select the template or a fold from a fold library during the comparative modeling or threading procedures.
... Taxonomic classification was performed using USEARCH with the EzBioCloud database, and sequences were aligned pairwise based on previously established methods (Myers & Miller, 1988). Chimera detection was conducted using UCHIME (Edgar et al., 2011) in conjunction with the EzBioCloud non-chimeric 16S rRNA database. ...
Article
Full-text available
This study aimed to determine the gut microbiota composition of adult Apis mellifera honeybees from bee farms in Bac Giang province, including both healthy colonies and those infected with Sacbrood virus (SBV). The gut microbiota of healthy and SBV-infected bees was assessed using next-generation sequencing (NGS) of the V3-V4 region in the 16S rRNA gene on the Illumina MiSeq system. As a result, NGS analysis identified 1,659 operational taxonomic units (OTUs) with a coverage of 99% and an average read length of 430 bp. The results revealed that SBV-infected bees harbored four microbial phyla: Proteobacteria (48.44%), Firmicutes (38.65%), Actinobacteria (1.57%), and Bacteria_uc (10.95%). In contrast, the healthy bee group consisted of three phyla: Proteobacteria (40.61%), Firmicutes (45.55%), and Bacteria_uc (13.37%). The species composition analysis showed that both healthy and SBV-infected bees shared common core bacterial species. However, Bifidobacterium_uc and Commensalibacter AY370188_s were more prevalent in SBV-infected bees and significantly reduced in healthy bees. Conversely, Fructobacillus fructosus and Lactobacillus kunkeei were found exclusively in healthy bees. These lactic acid bacteria (LAB) have been shown to inhibit the growth of pathogenic bacteria. Our findings provide a valuable scientific foundation for developing biological products to improve honeybee health and disease resistance.
... Subsequently, the sequence denoising and the non-redundant read extraction were performed using DUDE-Seq [22] and UCLUST-clustering [23], respectively. Taxonomic assignment of the obtained sequences was performed using USEARCH with the EzBioCloud database, followed by a more precise pairwise alignment [24]. UCHIME [25] and the non-chimeric 16S rRNA EzBioCloud database were utilized to detect chimeras. ...
... Miller [16] at a similarity cut off of 0.8. Nonspecific amplicons that did not encode 16S rRNA were detected by nhmmer in the HMMER software package ver. ...
Article
Full-text available
We investigated whether changes in the gut microbiome composition are associated with infections and immunologic complications during the treatment of Korean children, adolescents, and young adults (AYAs) with hematologic malignancies. We analyzed stool samples from 26 patients and 10 healthy siblings using 16S rRNA gene sequencing. At diagnosis, patients exhibited a lower abundance of Lachnospiraceae and a higher abundance of Enterococcaceae than their healthy siblings. Both the Chao1 and Shannon diversity indices declined from diagnosis to the end of induction chemotherapy. Patients with fever during induction had a lower baseline microbial diversity and higher Ruminococcus g4 abundance than those without fever. The use of either meropenem or piperacillin/tazobactam during induction was correlated with reduced richness and altered composition of the gut microbiome after induction. The Chao index and beta diversity of stool samples significantly differed before conditioning when compared with those of healthy siblings. During allogeneic hematopoietic stem cell transplantation, both the Chao1 and Shannon diversity indices significantly decreased on day 14 but recovered by day 60. Our study highlights the role of gut microbiome diversity and compositional structure in influencing treatment outcomes in children and AYA with hematologic malignancies, providing the information required to improve the gut microbiome configuration and treatment outcomes.
... For taxonomic assignment, the EzBioCloud database was utilized through USEARCH (8.1.1861_linux32) [29], followed by a more precise pair-wise alignment [30]. Detection of chimeras in reads with less than a 97% best-hit similarity rate was carried out using UCHIME [31] and the non-chimeric 16S rRNA database from EzBioCloud. ...
Article
Full-text available
Live biotherapeutic products, represented by probiotics with disease-mitigating or therapeutic effects, face significant limitations in achieving stable colonization in the gut through oral administration. However, paraprobiotics, which consist of dead or inactivated microbial cells derived from probiotics, can provide comparable health benefits while overcoming the limitations associated with live biotherapeutic products. Therefore, the purpose of this study was to quantitatively compare and analyze the effects of probiotics, which are gaining attention as treatments for inflammatory bowel diseases, and their paraprobiotic counterparts on the alleviation of ulcerative colitis. In vitro evaluations revealed that the paraprobiotics derived from Lactiplantibacillus plantarum MGEL20154, Latilactobacillus sakei MGEL23040, and Limosilactobacillus reuteri MGEL21001 exhibited equal or significantly enhanced activities in terms of antioxidant properties, anti-inflammatory effects, and barrier integrity enhancement compared to their probiotic counterparts. Furthermore, consistent with in vitro findings, both probiotics and paraprobiotics effectively improved histological scores and reduced myeloperoxidase levels in a DSS-induced ulcerative colitis mouse model. Notably, paraprobiotics derived from L. plantarum MGEL20154 and L. reuteri MGEL21001 demonstrated significantly enhanced efficacy in restoring tight junctions, promoting mucin secretion, and reducing inflammation in colonic lesion tissues compared to their probiotic forms. Our results suggest that these paraprobiotics may serve as more suitable agents for alleviating and treating ulcerative colitis, addressing limitations associated with probiotics, such as low survival rates, instability, antibiotic susceptibility, and the potential induction of excessive inflammatory responses.
... Another approach for PSA, such as A*PA [19] and A*PA2 [20], transforms the DP problem into a shortest path problem in a graph, by designing appropriate heuristic functions to measure the distance in the DP matrix, and then applying the A* algorithm for PSA, which later inspired the development of the wavefront alignment (WFA) algorithm [21]. Miller and Myers [22] introduced a linear space complexity approach for aligning two sequences based on the Hirschberg algorithm [23], which later inspired the WFA2 algorithm [24]. The Four Russians approach [25] emphasizes preprocessing all known scenarios, storing solutions in a lookup table, and utilizing pre-calculated results from the lookup table during actual problem-solving to enhance efficiency. ...
Article
Full-text available
Pairwise sequence alignment (PSA) serves as the cornerstone in computational bioinformatics, facilitating multiple sequence alignment and phylogenetic analysis. This paper introduces the FORAlign algorithm, leveraging the Four Russians algorithm with identical upper-bound time and space complexity as the Hirschberg divide-and-conquer PSA algorithm, aimed at accelerating Hirschberg PSA algorithm in parallel. Particularly notable is its capability to achieve up to 16.79 times speedup when aligning sequences with low sequence similarity, compared to the conventional Needleman-Wunsch PSA method using non-heuristic methods. Empirical evaluations underscore FORAlign’s superiority over existing wavefront alignment (WFA) series software, especially in scenarios characterized by low sequence similarity during PSA tasks. Our method is capable of directly aligning monkeypox sequences with other sequences using non-heuristic methods. The algorithm was implemented within the FORAlign library, providing functionality for PSA and foundational support for multiple sequence alignment and phylogenetic trees. The FORAlign library is freely available at https://github.com/malabz/FORAlign.
... [30] Primers were trimmed at a similarity cut-off of 0.8 using the alignment algorithm of Myers & Miller [31]. Nonspecific amplicons not encoding 16S rRNA were detected using nhmmer [32]. The HMMER software package, version 3.2.1, was used with the hmm profile. ...
Article
Full-text available
Rhizosphere bacterial community studies offer valuable insights into the environmental implications of genetically modified (GM) crops. This study compared the effects of a non-GM maize cultivar, namely Hi-IIA, with those of a herbicide-resistant maize cultivar containing the phosphinothricin N-acetyltransferase gene on the rhizosphere bacterial community across growth stages. 16S rRNA amplicon sequencing and data analysis tools revealed no significant differences in bacterial community composition or diversity between the cultivars. Principal component analysis revealed that differences in community structure were driven by plant growth stages rather than plant type. Polymerase chain reaction analysis was conducted to examine the potential horizontal transfer of the introduced gene from the GM maize to rhizosphere microorganisms; however, the introduced gene was not detected in the soil genomic DNA. Overall, the environmental impact of GM maize, particularly on soil microorganisms, is negligible, and the cultivation of GM maize does not significantly alter the rhizosphere bacterial community.
... file are directly examined to verify the target event. For insertions, local sequences around the SV locus are first extracted from each query genome and realigned to the reference sequence using Stretcher (81). The resulting alignment is then used to infer the insertion event based on SV length and sequence identity. ...
Preprint
Full-text available
Comparisons of complete genome assemblies offer a direct procedure for characterizing all genetic differences among them. However, existing tools are often limited to specific aligners or optimized for specific organisms, narrowing their applicability, particularly for large and repetitive plant genomes. Here, we introduce SVGAP, a pipeline for structural variant (SV) discovery, genotyping, and annotation from high-quality genome assemblies at the population level. Through extensive benchmarks using simulated SV datasets at individual, population, and phylogenetic contexts, we demonstrate that SVGAP performs favorably relative to existing tools in SV discovery. Additionally, SVGAP is one of the few tools to address the challenge of genotyping SVs within large assembled genome samples, and it generates fully genotyped VCF files. Applying SVGAP to 26 maize genomes revealed hidden genomic diversity in centromeres, driven by abundant insertions of centromere-specific LTR-retrotransposons. The output of SVGAP is well-suited for pan-genome construction and facilitates the interpretation of previously unexplored genomic regions.
... The basic local alignment search tool (BLAST) was used to search for similarity between ceramide synthases [78]. First, the enzymes in the same organism were analyzed, searching through the alignment tool for the percentage of identity and similarity of the amino acid sequence of the structures [79]. ...
... Denote the number of optimality regions in this model by $R^{(al)}_{m,n}$. Naturally the (expected) number of optimality regions attracted a lot of interest both theoretically [12,19,45] and in applications [10,24,30,33]. The current conjecture [11,38] is that $E(R^{(al)}_{n,n}) = O(\sqrt{n})$, but the complexity of the random variable does not allow for direct calculations. ...
Preprint
We study the sequence alignment problem and its independent version, the discrete Hammersley process with an exploration penalty. We obtain rigorous upper bounds for the number of optimality regions in both models near the soft edge. At zero penalty the independent model becomes an exactly solvable model and we identify cases for which the law of the last passage time converges to a Tracy-Widom law.
... The amino acid sequence of RtaA (Afu3g12830) was analysed for protein family classification and domain prediction using InterPro (Paysan-Lafosse et al. 2023). Pairwise sequence alignment with Saccharomyces cerevisiae RTA1 was performed using EMBOSS Stretcher (Myers and Miller 1988) and transmembrane topology prediction using DeepTMHMM (Hallgren et al. 2022). For phylogenetic analysis, orthologues of Afu3g12830 and all fungal proteins containing RTA1 domain were identified using OrthoMCL release 6.21 (Chen et al. 2006). ...
Article
Full-text available
The polyene antimycotica amphotericin B (AmB) and its liposomal formulation AmBisome belong to the treatment options of invasive aspergillosis caused by Aspergillus fumigatus. However, increasing resistance to AmB in clinical isolates of Aspergillus species is a growing concern, but mechanisms of AmB resistance remain unclear. In this study, we conducted a proteomic analysis of A. fumigatus exposed to sublethal concentrations of AmB and AmBisome. Both antifungals induced significantly increased levels of proteins involved in aromatic acid metabolism, transmembrane transport and secondary metabolite biosynthesis. One of the most upregulated proteins was RtaA, a member of the RTA-like protein family, which includes conserved fungal membrane proteins with putative functions as transporters or translocases. Accordingly, we found that RtaA is mainly located in the cytoplasmic membrane and to a minor extent in vacuolar-like structures. Deletion of rtaA led to increased polyene sensitivity and its overexpression resulted in modest resistance. Interestingly, rtaA expression was only induced by exposure to the polyenes AmB and nystatin, but not by itraconazole and caspofungin. Orthologues of rtaA were also induced by AmB exposure in A. lentulus and A. terreus. Deletion of rtaA did not significantly change the ergosterol content of A. fumigatus, but decreased fluorescence intensity of the sterol-binding stain filipin. This suggests that RtaA is involved in sterol and lipid trafficking, possibly by transporting the target ergosterol to or from lipid droplets. These findings reveal the contribution of RtaA to polyene resistance in A. fumigatus and thus provide a new putative target for antifungal drug development.
... Therefore, each isolate is reported with the first five to ten hits observed in the said database. Further multiple sequence alignment and phylogenetic analysis is therefore recommended for accurate species prediction and evolutionary relationship (Karlin et al., 1990; Myers et al., 1988). Evolutionary history was inferred using the neighbor-joining method (Saitou et al. 1987). ...
Article
The properties of plastics, such as being light, economical, flexible, strong, durable, and waterproof, have made plastic an essential part of daily life. However, disposing of this non-degradable material has become a significant global challenge. Therefore, among various plastic treatment methods, the safest way to dispose of plastics is through biodegradation. Five strains were isolated for degradation studies: PDB-2, PDB-3, PDB-5, PDB-7, and NB-3. Strain PDB-2 demonstrated the best degradation and was selected for further studies, henceforth referred to as VSH PD-02. Further, the fungal isolate VSH PD-02 was subjected to molecular identification based on 18S rRNA sequences and homology analysis, which showed the closest homology toward Talaromyces aerugineus. This study aimed to develop a conceptual model for the biological degradation of plastic by microbes. The results of FTIR showed a significant difference in molecular bonding after a degradation period of 30 days. This research suggests that the polyethylene plastic material has undergone degradation due to the activity of the enzymes produced by these microorganisms. In conclusion, the study on plastic biodegradation by microorganisms, exemplified by Talaromyces aerugineus VSH PD-02 fungi, offers valuable insights with relevance to bioremediation. The ability of these microorganisms to break down plastics parallels the development of biodegradable drug delivery systems, tissue engineering scaffolds, and sustainable medical devices and packaging.
... The statistics on the sequences were retrieved from the output of MUSCLE [28]. The pairwise similarity, between the sequences of each dataset, was computed using MatGAT [45] which calculates the similarity after using the Myers and Miller global alignment algorithm [46]. Since a clustering ground truth is not available for these three datasets, a phylogenetic tree, showing the evolutionary relationship among the sequences of each set, is used for producing individual reference clusterings later, based on the proposed method in Section 3. Indeed, there are many tools that, given an aligned set of sequences, can build the phylogenetic tree of these sequences. ...
Article
Full-text available
Various recent researches in bioinformatics demonstrated that clustering is a very efficient technique for sequence analysis. Spectral clustering is particularly efficient for highly divergent sequences and GMMs (Gaussian Mixture Models) are often able to cluster overlapping groups if given an adequately designed embedding. The current study used spectral embedding and Mixture Models for clustering potentially divergent biological sequences. The research approach resulted in a pipeline consisting of the following four steps. The first step consists of aligning the biological sequences. The pairwise affinity of the sequences is computed in the second step. Then the Laplacian Eigenmap embedding of the data is performed in the third step. Finally, the last step consists of a GMM-based clustering. Improving the quality of the generated clustering and the performance of this approach is directly related to the enhancement of each one of these four steps. The main contribution is proposing four GMM-based algorithms for automatically selecting the optimal number of clusters and optimizing the clustering quality. A clustering quality assessment method, based on phylogenetic trees, is also proposed. Moreover, a performance study and analysis have been conducted while testing different clustering methods and GMM implementations. Experimental results demonstrated the superiority of using the BIC (Bayesian Information Criterion) for selecting the optimal GMM configuration. Significant processing speed improvements were also recorded for the implementation of the proposed algorithms.
... The 16S rRNA gene (1500 bp) of bacteria was amplified in a thermocycler using the polymerase chain reaction method (Clarridge 2004). In order to anticipate an evolutionary relationship with the isolated species BDR22, homologous sequences with local similarity have been recommended by the National Centre for Biotechnology Information (NCBI) for multiple sequence alignment and phylogenetic tree analysis (Karlin and Altschul 1990; Myers and Miller 1988). Phylogenetic tree construction has been accomplished using the Molecular Evolutionary Genetics Analysis (MEGA 11) program for analysing statistical significance (Nag et al. 2021; Tamura et al. 2021). ...
Article
Full-text available
A large number of recalcitrant bacterial pathogens cannot be easily treated by antibiotics due to the existence of biofilm. Hence, an alternative strategy needs to be adopted to remove the biofilm without the development of antibiotic resistance. Bacteriocins, ribosome-mediated proteinaceous toxins, have the potential to inhibit the growth of closely or distantly related bacteria. In the present study, after screening a number of sources, a bacteriocin-producing strain, Enterococcus faecalis BDR22, was isolated that showed a significant reduction in the growth of planktonic cells of Gram-positive Staphylococcus aureus, Bacillus subtilis, and Gram-negative Pseudomonas aeruginosa, Escherichia coli, Serratia marcescens, Enterobacter cloacae, and Klebsiella pneumoniae compared to the conventional antibiotic tetracycline. The considerable reduction of the biofilm-forming sessile cells of the test organisms S. aureus (ATCC 23235) and P. aeruginosa (ATCC 10145), with no significant cell revival even after withdrawal of the treatment, was also observed. The extracellular polymeric substance (EPS) content of the biofilm was also reduced, with around 84% total carbohydrate reduction found for both microorganisms. The antibiofilm activities of the strain against test organisms were clearly visible from scanning electron micrographs and confirmed by the changes in functional groups (C-H, -OH, C=C, C-N etc.) of biofilm matrices by Fourier transform infrared spectroscopy (FTIR) analysis. The molecular docking interactions with docking energies ∆G of −54.40 kcal/mol and −66.2373 kcal/mol validate the affinity of the bacteriocin towards the biofilm-forming protein, which confirms the competence of the bacteriocin-producing strain to act as an effective antimicrobial and antibiofilm agent, replacing antibiotics.
... Sequences were denoised using the Mothur preclustering program, which merges and extracts unique sequences, allowing up to 2 differences between them [21]. The EzBioCloud database [22] was used for taxonomic assignment using BLAST 2.2.22 [23], and pairwise alignments were generated to calculate the similarity [24]. The UCHIME algorithm and nonchimeric 16S rRNA database from EzBioCloud were used to detect chimeric sequences for reads with the best hit similarity rate of < 97% [25]. ...
Article
The gut microbiome plays an essential role in host immune responses, including allergic reactions. However, commensal gut microbiota is extremely sensitive to antibiotics and excessive usage can cause microbial dysbiosis. Herein, we investigated how changes in the gut microbiome induced by ampicillin affected the production of IgG1 and IgG2a antibodies in mice subsequently exposed to Anisakis pegreffii antigens. Ampicillin treatment caused a notable change in the gut microbiome as shown by changes in both alpha and beta diversity indexes. In a 1-dimensional immunoblot using Anisakis-specific anti-mouse IgG1, a 56-kDa band corresponding to an unnamed Anisakis protein was detected using mass spectrometry analysis only in ampicillin-treated mice. In the Anisakis-specific anti-mouse IgG2a-probed immunoblot, a 70-kDa band corresponding to heat shock protein 70 (HSP70) was only detected in ampicillin-treated and Anisakis-immunized mice. A 2-dimensional immunoblot against Anisakis extract with immunized mouse sera demonstrated altered spot patterns in both groups. Our results showed that ampicillin treatment altered the gut microbiome composition in mice, changing the immunization response to antigens from A. pegreffii. This research could serve as a basis for developing vaccines or allergy immunotherapies against parasitic infections.
... The idea that only the forward calculation is needed for sequence alignment in linear space came from Eppstein (unpublished), as documented by Hirschberg (Hirschberg 1997). Unfortunately, this unidirectional approach (Powell et al. 1999), referred to herein as the unidirectional Hirschberg (UDH) method, failed to attract as much attention as the popular BDH approach (Myers and Miller 1988), partly because it required twice as large memory as the BDH method. However, the UDH method could circumvent the second and third difficulties mentioned in the previous paragraph. ...
Article
Full-text available
Motivation Spaln is the earliest practical tool for self-sufficient genome mapping and spliced alignment of protein query sequences onto a mammalian-sized eukaryotic genomic sequence. However, its computational speed has become inadequate for the analysis of rapidly growing genomic and transcript sequence data. Results The dynamic programming calculation of Spaln has been sped up in two ways: (i) the introduction of the multi-intermediate unidirectional Hirschberg method and (ii) SIMD-based vectorization. The new version, Spaln3, is ∼7 times faster than the latest Spaln version 2, and its gene prediction accuracy is consistently higher than that of Miniprot. Availability and implementation https://github.com/ogotoh/spaln.
... The global alignments of the nucleotide sequences were calculated using the align0 algorithm from the fasta2.0 package (Myers and Miller, 1988). ...
Preprint
Full-text available
Motivation The accurate characterization of the translational mechanism is crucial for enhancing our understanding of the relationship between genotype and phenotype. In particular, predicting the impact of the genetic variants on gene expression will make it possible to optimize specific pathways and functions for engineering new biological systems. In this context, the development of accurate methods for predicting translation efficiency from the nucleotide sequence is a key challenge in computational biology. Methods In this work we present PGExpress, a binary classifier to discriminate between mRNA sequences with low and high translation efficiency in E. coli. The PGExpress algorithm takes as input 12 features corresponding to RNA folding and anti-Shine-Dalgarno hybridization free energies. The method was trained on a set of 1,772 sequence variants (WT-High) of 137 essential E. coli genes. For each gene, we considered 13 sequence variants of the first 33 nucleotides encoding for the same amino acids followed by the superfolder GFP. Each gene variant is represented by sequence blocks that include the Ribosome Binding Site (RBS), the first 33 nucleotides of the coding region (C33), the remaining part of the coding region (CC), and their combinations. Results Our logistic regression-based tool (PGExpress) was trained using a 20-fold gene-based cross-validation procedure on the WT-High dataset. In this test PGExpress achieved an overall accuracy of 74%, a Matthews correlation coefficient of 0.49 and an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.81. Tested on 3 sets of sequences with different Ribosome Binding Sites, PGExpress reaches similar AUC. Finally, we validated our method by performing in-house experiments on five newly generated mRNA sequence variants. The predictions of the expression level of the new variants are in agreement with our experimental results in E. coli.
Availability http://folding.biofold.org/pgexpress Contact markus.kollmann@hhu.de, emidio.capriotti@unibo.it
... Quality-assured reads are subsequently organized through UCLUST clustering (Edgar, 2010). The sequences are then taxonomically assigned using USEARCH with the EzBioCloud database (Myers & Miller, 1988). Chimeras are checked using UCHIME (Edgar et al., 2011) against the nonchimeric 16S rRNA database from EzBioCloud. ...
Article
Full-text available
The gut microbiota plays a crucial role in food digestion, enhances the host's immune system, and protects against pathogens. Numerous studies have been conducted on the microbiota of insects in general and honeybees in particular. However, studies have primarily focused on adult honeybees, with fewer studies dedicated to larvae. Despite being within the hive, honeybee larvae still possess their distinct microbiota. To gain a deeper understanding of the microbiota in the larvae of Apis mellifera honeybees, larvae from honeybee colonies collected in Ha Noi, Vietnam were investigated. Next-generation sequencing (NGS) targeting the 16S rRNA gene was employed for microbiome analysis. Results revealed the presence of 5 phyla including Proteobacteria (70.43%), Actinobacteria (1.16%), Firmicutes (20.87%), Bacteroidetes (2.72%), and Chloroflexi (2%). Representative genera included Bombella (29.97%), Lactobacillus (14.91%), Gilliamella (9.59%), Frischella (4.69%), Snodgrassella (3.85%), and Marinobacter (1.21%). Further characterization of the species composition in the sample identified the prevalence of Bifidobacterium intestini (29.96%), Gilliamella apicola (8.08%), Frischella perrara (4.55%), Lactobacillus kimbladii (2.85%), Lactobacillus plantarum (2.80%), Snodgrassella alvi (2.77%), Lactobacillus mellis (2.59%), Lactobacillus_uc (unclassified or not yet classified to species, 2.19%), Lactobacillus kunkeei (1.43%), and Lactobacillus melliventris (1.31%). Understanding these microbial dynamics is crucial for developing strategies to support honeybee health and mitigate the challenges posed by factors such as pesticides, environmental pollution, and honeybee diseases.
... Sequencing files (.ab1) were edited using HROMASLITE (version 1.5) and further analyzed by the Basic Local Alignment Search Tool (BLAST) with the closest 16S rRNA gene sequence retrieved from the National Centre for Biotechnology Information (NCBI) database (Altschul et al. 1990). Further, multiple sequence alignment was performed using the ClustalW (Myers and Miller 1988). Molecular Evolutionary Genetics Analysis version 11 (MEGA 11) was used to construct optimal phylogenetic analysis for accurate species prediction and drawing evolutionary relationships (Tamura et al. 2021). ...
Article
Full-text available
Microbes play an essential role in soil fertility by replenishing the nutrients; they encounter various biotic and abiotic stresses disrupting their cellular homeostasis, which expedites activating a conserved signaling pathway for transient over-expression of heat shock proteins (HSPs). In the present study, a versatile soil bacterium Bacillus subtilis strain PSK.A2 was isolated and characterized. Further, the isolated bacterium was exposed with several stresses, viz., heat, salt, acid, alkaline, and antibiotics. Stress-attributed cellular morphological modifications such as swelling, shrinkage, and clump formation were observed under the scanning electron microscope. The comparative protein expression pattern was studied by SDS-PAGE, relative protein stabilization was assessed by protein aggregation assay, and relative survival was mapped by single spot dilution and colony-counting method under control, stressed, lethal, and stressed lethal conditions of the isolate. The findings demonstrated that bacterial stress tolerance was maintained via the activation of various HSPs of molecular weight ranging from 17 to 115 kD to respective stimuli. The treatment of subinhibitory dose of antibiotics not interfering protein synthesis (amoxicillin and ciprofloxacin) resulted in the expression of eight HSPs of molecular weight ranging from 18 to 71 kD. The pre-treatment of short stress dosage showed endured overall tolerance of bacterium to lethal conditions, as evidenced by moderately enhanced total soluble intracellular protein content, better protein stabilization, comparatively over-expressed HSPs, and relatively enhanced cell survival. These findings hold an opportunity for developing novel approaches towards enhancing microbial resilience in a variety of conditions, including industrial bioprocessing, environmental remediation, and infectious disease management.
... Unique reads were extracted, and redundant reads were clustered with unique reads using the derep_full-length command in VSEARCH2. The EzBioCloud 16S rRNA database (31) was used for a taxonomic assignment using the usearch_global command of VSEARCH2, followed by more precise pairwise alignment (29). Chimeric reads were filtered for reads with <97% similarity by reference-based chimeric detection using the UCHIME algorithm (32) and a non-chimeric 16S rRNA database from EzBioCloud. ...
Article
Full-text available
Lovebugs appeared in large numbers across a wide area in Seoul, South Korea, in June 2023. The sudden appearance of exotic insects not only discomforts people but also fosters anxiety, as their potential for pathogen transmission would be unknown. In this study, targeted next-generation sequencing (NGS) of the 16S rRNA gene V4 region was performed using iSeq 100 to screen for bacteria in lovebugs. Forty-one lovebugs (20 females and 21 males) collected in Seoul, Korea, were identified as Plecia longiforceps based on mitochondrial cytochrome oxidase subunit 1 sequencing data using PCR. We analyzed the microbiome of the lovebugs and detected 453 species of bacteria. Among all bacteria screened based on NGS, Rickettsia was detected in all samples with an average relative abundance of 80.40%, followed by Pandoraea and Ewingella. Diversity (alpha and beta) between females and males did not differ; however, only Tumebacillus showed a higher relative abundance in females. Sequencing analysis of Rickettsia using a gltA gene-specific primer by PCR showed that it had higher sequence similarity to the Rickettsia symbiont of arthropods than to the spotted fever group rickettsiae. Eleven samples in which Pandoraea was detected by iSeq 100 were confirmed by PCR and exhibited 100% sequence identity to Pandoraea oxalativorans strain DSM 23570. Consequently, the likelihood of pathogen transmission to humans is low. The applied method may play a crucial role in swiftly identifying bacterial species in the event of future outbreaks of exotic insects that may be harmful to humans. IMPORTANCE Lovebugs have recently emerged in large numbers in Seoul, causing major concern regarding potential health risks. By performing the next-generation sequencing of the 16S rRNA gene V4 region, we comprehensively examined the microbiome of these insects. We identified the presence of numerous bacteria, including Rickettsia and Pandoraea. 
Reassuringly, subsequent tests confirmed that these detected bacteria were not pathogenic. The present study addresses health concerns related to lovebugs and shows the accuracy and efficiency of our detection technique. Such methods prove invaluable for rapidly identifying bacterial species during potential outbreaks of unfamiliar insects, thereby ensuring public safety.
... However, this requires O(nm) space. Linear space approaches are documented in the literature by performing a global alignment from the maximum score position or by applying the Myers-Miller algorithm [23], which recursively computes an optimal alignment. ...
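The recursive linear-space idea mentioned in this excerpt can be illustrated with a minimal Python sketch of Hirschberg's divide-and-conquer scheme. It is not the Myers-Miller implementation: it assumes a simple linear gap penalty rather than affine gaps, and the function names and scoring values (`nw_last_row`, `hirschberg`, `match=1`, `mismatch=-1`, `gap=-2`) are hypothetical choices made for illustration only.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-2):
    """Last row of the global alignment score table, keeping only two rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev


def hirschberg(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment of a and b under a linear gap penalty."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        # Base case: pair the single character with its best partner in b,
        # unless leaving it unpaired scores better.
        j = max(range(len(b)), key=lambda k: match if b[k] == a else mismatch)
        s = match if b[j] == a else mismatch
        if s >= 2 * gap:
            return "-" * j + a + "-" * (len(b) - j - 1), b
        return a + "-" * len(b), "-" + b
    mid = len(a) // 2
    # Forward scores for the top half, backward scores for the bottom half.
    left = nw_last_row(a[:mid], b, match, mismatch, gap)
    right = nw_last_row(a[mid:][::-1], b[::-1], match, mismatch, gap)
    # Column of b where an optimal path crosses the middle row of the table.
    split = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    x1, y1 = hirschberg(a[:mid], b[:split], match, mismatch, gap)
    x2, y2 = hirschberg(a[mid:], b[split:], match, mismatch, gap)
    return x1 + x2, y1 + y2


def alignment_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score of an already aligned pair of equal-length gapped strings."""
    total = 0
    for p, q in zip(x, y):
        if p == "-" or q == "-":
            total += gap
        else:
            total += match if p == q else mismatch
    return total


top, bottom = hirschberg("AGTACGCA", "TATGC")
print(top)
print(bottom)
```

Only two score rows are held in memory at any time, so working space grows with the length of the shorter dimension of the table, at the cost of roughly doubling the arithmetic compared with a full-table traceback.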
... The UCHIME algorithm and non-chimeric 16S rRNA database from EzTaxon were used to detect chimeric sequences for reads with a best-hit similarity rate of <97% (Edgar et al. 2011) (a 97% similarity is generally used as the cut-off for species-level identification). Sequence data were then clustered using CD-Hit and UCLUST (Myers & Miller 1988). All subsequent analyses were performed using EzBioCloud. ...
Article
Full-text available
Cockroaches are insects found in almost all habitats, including unsanitary environments. Understanding their microbial communities is crucial for assessing the potential risks they pose as vectors of pathogens. In this study, we assessed the microbial communities of omnivorous cockroaches collected from external environments and those reared in a clean laboratory for extended periods (5–20 years). Using the iSeq 100 system, we examined the relative abundance of microbial communities at the phylum, family and genus levels. Our results revealed that the predominant taxa in these cockroaches were Proteobacteria, Bacteroidetes and Firmicutes. Interestingly, the bacterial communities of samples from the same cockroach species, regardless of their living conditions, clustered together, indicating species‐specific similarities in microbiomes. The symbiont genus Blattabacterium was consistently present in all samples, delivering nutrients to the host. Pathogen detection at the genus level indicated a higher prevalence of potential pathogens in cockroaches collected from field environments, compared with those from laboratory‐reared cockroaches. These findings underscore the importance of cockroaches as pathogen reservoirs and vectors of opportunistic infections, emphasizing the need for further studies to identify specific microorganisms and confirm their pathogenicity. As cockroaches inhabit human environments, their potential to spread harmful bacteria through defecation warrants attention and underscores the significance of understanding their microbial ecology for public health implications.
... The extracted unique reads were collected after processing redundant reads using the derep_fulllength command of VSEARCH. 51 Based on the EzBioCloud database, 54 taxonomic assignment was performed using the usearch_global command of VSEARCH, 51 followed by pairwise alignment, 52 and chimeric reads with <97% similarity were filtered using the UCHIME algorithm. 55 After filtering the chimeras and compiling unidentified reads using the cluster_fast command, 51 additional operational taxonomic units (OTUs) were generated. ...
Article
Full-text available
Objectives Based on a report that Prunus mume fruits affect gut function and microflora, we hypothesized that mumefural (MF), one of the active compounds of processed Prunus mume fruits, could affect intestinal function or gut microbiome. Methods We investigated the effects of orally administered MF on intestinal function and the intestinal environment by analyzing the changes in the fecal, intestinal, and gut microbiome of a murine model. Results No changes in the body weight and fecal parameters of C57BL/6 mice were observed following MF administration. However, the quantity of residual feces in the large and small intestines and cecum was considerably reduced. The abundance of specific bacteria, i.e., members of the Lactobacillus genus, was markedly increased without any change in the abundance or diversity of the gut microbiota, as determined by 16S rRNA gene-based microbiome taxonomic profiling. Conclusions Overall, our results suggest that oral MF administration could improve the gut microbiota in diseases related to specific bacterial changes because it increases the abundance of certain bacteria known to exhibit probiotic functions without any other obvious effects.
... Processing raw reads started with quality check and filtering of low quality (< Q25) reads by Trimmomatic ver. 0.32 43 47 was used for taxonomic assignment using usearch_global command of VSEARCH 44 followed by more precise pairwise alignment 45 . Chimeric reads were filtered on reads with < 97% similarity by reference based chimeric detection using UCHIME algorithm 48 and the non-chimeric 16S rRNA database from EzBioCloud. ...
Article
Full-text available
In this study, high-throughput sequencing of 16S rRNA amplicons and predictive PICRUSt functional profiles were used to perform a comprehensive analysis of the temporal bacterial distribution and metabolic functions of 19 bimonthly samples collected from July 2019 to January 2020 in the surface water of Billings Reservoir, São Paulo. The results revealed that most of the bacterial 16S rRNA gene sequences belonged to Cyanobacteria and Proteobacteria, which accounted for more than 58% of the total bacterial abundance. Species richness and evenness indices were highest in surface water from summer samples (January 2020), followed by winter (July 2019) and spring samples (September and November 2019). Results also showed that the highest concentrations of sulfate (SO₄²⁻), phosphate (P), ammonia (NH₃), and nitrate (NO₃⁻) were detected in November 2019 and January 2020 compared with samples collected in July and September 2019 (P < 0.05). Principal component analysis suggests that physicochemical factors such as pH, DO, temperature, and NH₃ are the most important environmental factors influencing spatial and temporal variations in the community structure of bacterioplankton. At the genus level, 18.3% and 9.9% of OTUs in the July and September 2019 samples, respectively, were assigned to Planktothrix, while 14.4% and 20% of OTUs in the November 2019 and January 2020 samples, respectively, were assigned to Microcystis. In addition, PICRUSt metabolic analysis revealed increasing enrichment of genes in surface water associated with multiple metabolic processes rather than a single regulatory mechanism. This is the first study to examine the temporal dynamics of bacterioplankton and its function in Billings Reservoir during the winter, spring, and summer seasons.
The study provides comprehensive reference information on the effects of an artificial habitat on the bacterioplankton community that can be used to interpret the results of studies to evaluate and set appropriate treatment targets.
Article
PIWI-interacting RNAs (piRNAs) are small noncoding RNAs that silence transposons in the animal germline. PiRNAs are produced from long single-stranded noncoding transcripts, from protein-coding transcripts, as well as from transposons. While some sites that produce piRNAs are in deeply conserved syntenic regions, in general, piRNAs and piRNA-producing loci turn over faster than other functional parts of the genome. To learn about the sequence changes that contribute to the fast evolution of piRNAs, we set out to analyze piRNA expression between genetically different mice. Here we report the sequencing and analysis of small RNAs from the mouse male germline of four classical inbred strains, one inbred wild-derived strain and one outbred strain. We find that genetic differences between individuals underlie variation in piRNA expression. We report significant differences in piRNA production at loci with endogenous retrovirus insertions. Strain-specific piRNA-producing loci include protein-coding genes. Our findings provide evidence that transposable elements contribute to inter-individual differences in expression, and potentially to the fast evolution of piRNA-producing loci in mammals.
Chapter
Multiple Sequence Alignment (MSA) is a generalization of Pairwise Sequence Alignment to multiple sequences. Thus, instead of aligning two sequences, the objective in MSA is to align k sequences simultaneously such that an overall objective function is optimized. The motivation behind doing an MSA is that it allows us to extract the consensus evident in a widely diverse set of sequences. The similarities we observe across a wider range of sequences can help us better understand the evolutionary history of sequences and help infer a functional relationship amongst a group of biological sequences. Particularly for protein sequences, MSA can provide insight into the secondary/tertiary structure of proteins and discover critical consensus motifs and common blocks representative of protein domains or functional units. Typically, however, before performing the MSA step we already know that the sequences being aligned are related, and our objective is to discover the related regions and the strength of their relatedness.
Article
The goal of the present work was to investigate the interaction between endophytic bacterial isolates and Aspergillus flavus, with a specific focus on determining the occurrence of mutual antagonism upon contact. Aflatoxin B1 (AFB1) is a highly potent hepatic carcinogen produced by filamentous fungi. To decrease the amount of AFB1 in food products, nine strains of endophytic bacteria were evaluated for their efficacy in reducing AFB1 production; all of these strains were obtained from the roots of Stipa tenacissima L., an indigenous plant to the Algerian steppe. These endophytic bacterial isolates were selected as potential candidates due to their rapid growth. After a co-incubation period of 10 days in vitro, AFB1 analysis indicated that those nine isolates caused a decrease in AFB1 residual concentration by 10.04 to 98.44%, depending on the isolate. The two most efficient strains were ST01 and ST07, which showed 89.96 and 73.46% reduction of AFB1, respectively. The results of 16S rRNA gene sequencing revealed a strong genetic relationship between the two isolates and the species Acinetobacter calcoaceticus and Bacillus velezensis, respectively. These two isolates were subjected to characterization for their in vitro plant growth promotion (PGP) activity, including ACC-deaminase activity, IAA production, inorganic phosphate solubilization, and the production of siderophores and hydrogen cyanide. Based on the obtained findings, it is postulated that the aforementioned isolates hold potential as biofertilizers and biocontrol agents in combating AFB1 while posing a minimal danger to the ecological balance of microbial communities.
Article
The aim of this study is to investigate the protective potential of Limosilactobacillus fermentum IM57, IR51, and IR62 strains, isolated from infant feces, and their mixture against inflammatory bowel disease (IBD). The strains exhibited robust antioxidant activities and anti-inflammatory properties in RAW 264.7 cells. Subsequently, the potential protective effects of each of these three strains, along with their mixture, were evaluated in a murine colitis model induced by dextran sodium sulfate (DSS). Noteworthy improvements in physiological parameters such as body weight, disease activity index, and colon length were observed in mice treated with the mixture, followed by IR62. Additionally, administration of each strain and the mixture mitigated DSS-induced changes in gut microbiota composition, with increased abundance of Lactobacillus, Bifidobacterium, Ruminococcus, and Muribaculum compared to DSS-treated mice. Interestingly, the abundance of Muribaculum increased approximately 2.4-fold after administration of the mixture compared to before administration. Additionally, the concentration of short-chain fatty acids (SCFAs) was significantly reduced in the DSS-treated group compared to the control group, while the mixture treatment group had the highest concentration of SCFAs. Furthermore, due to these changes in microbiota and the leading metabolites induced by treatment with the mixture, DSS-induced dysregulation of inflammation- and barrier function-related mRNA expression was significantly inhibited in the group fed with the mixture. Consequently, this study indicates that the multi-strain mixture of L. fermentum strains may play a crucial role in modulating gut microbiota, thereby alleviating IBD through the synergistic effect of the individual effects of the three strains.
Preprint
Background: Baum-Welch training is an expectation-maximisation algorithm for training the emission and transition probabilities of hidden Markov models in a fully automated way. Methods and results: We introduce a linear space algorithm for Baum-Welch training. For a hidden Markov model with M states, T free transition and E free emission parameters, and an input sequence of length L, our new algorithm requires O(M) memory and O(L M T_max (T + E)) time for one Baum-Welch iteration, where T_max is the maximum number of states that any state is connected to. The most memory efficient algorithm until now was the checkpointing algorithm with O(log(L) M) memory and O(log(L) L M T_max) time requirement. Our novel algorithm thus renders the memory requirement completely independent of the length of the training sequences. More generally, for an n-hidden Markov model and n input sequences of length L, the memory requirement of O(log(L) L^(n-1) M) is reduced to O(L^(n-1) M) memory while the running time is changed from O(log(L) L^n M T_max + L^n (T + E)) to O(L^n M T_max (T + E)). Conclusions: For the large class of hidden Markov models used for example in gene prediction, whose number of states does not scale with the length of the input sequence, our novel algorithm can thus be both faster and more memory-efficient than any of the existing algorithms.
Article
Prosthetic joint infections are devastating complications of joint arthroplasties. Without effective management, they can lead to limb amputation and even death. A significant proportion of these infections is caused by the primarily commensal Coagulase-negative Staphylococci pathogens, which form thick, antibiotic-resistant biofilms at the site of infection. Combinatorial therapy involving antibiotics and bacteriophages may represent a strategy to overcome resistance. Previous research indicates that as bacteria develop resistance to antibiotics, they often become more susceptible to bacteriophages. In this study, we produced a cocktail of novel bacteriophages and assessed their viability to eradicate nosocomial staphylococcal biofilms. Here, we used clinical isolates from prosthetic joint infections to isolate and identify four new bacteriophages from sewage effluent. These novel phages were characterized through electron microscopy and full genome sequencing. Subsequently, we combined them into a phage cocktail, which effectively re-sensitized biofilms to vancomycin and flucloxacillin. Notably, this phage cocktail demonstrated low cytotoxicity in vitro to human epithelial cells, even when used alongside antibiotic treatments. These findings highlight the potential of the phage cocktail as a tool to increase antibiotic treatment success in prosthetic joint infections.
Article
Antibiotic resistance in shrimp farms has emerged as an extremely serious situation worldwide. The main aim of this study was to optimize the cultural conditions for producing new antibiotic agents from marine Streptomyces species. Streptomyces SK3 was isolated from marine sediment and was identified by its 16S rDNA as well as biochemical characteristics. This microbe produced the highest concentration of bioactive secondary metabolites (BSMs) when cultured in YM medium (YM/2). It produced the maximum total protein (41.8 ± 6.36 mg/ml) during the late lag phase period. The optimum incubation temperature was recorded at 30 °C; BSMs were not produced at ≤10 °C within an incubation period of 3–4 days. The suitable agitation speed was found to be 200 rpm with pH 7.00. The proper carbon, nitrogen, and trace elements supplementation consisted of starch, malt extract, calcium carbonate (CaCO₃), and magnesium sulfate (MgSO₄). The ethyl acetate extract was found to act strongly against three vibriosis pathogens, Vibrio harveyi, Vibrio parahaemolyticus, and Vibrio vulnificus, as indicated by the inhibition zones at 34.5, 35.4, and 34.3 mm, respectively. The extract showed the strongest anti-V. harveyi activity, as indicated by minimum inhibitory concentration (MIC) and minimum bactericidal concentration (MBC) values of 0.101 ± 0.02 and 0.610 ± 0.04 mg/ml, respectively. Basic chemical investigation of the crude extract using thin layer chromatography (TLC), bioautography, liquid chromatography tandem mass spectrometry (LC-MS/MS), Fourier transform infrared spectroscopy (FTIR), and proton nuclear magnetic resonance (¹H-NMR) revealed that the active components were the terpenoid and steroid groups of compounds. They showed carboxylic acid and ester functions in their molecules.
Preprint
Alignment against a database of genomes is a fundamental operation in bioinformatics, popularized by BLAST. However, the rate at which microbial genomes are sequenced has continued to increase, and there are now datasets in the millions, far beyond the abilities of existing alignment tools. We introduce LexicMap, a nucleotide sequence alignment tool for efficiently querying moderate length sequences (> 500 bp) such as a gene, plasmid or long read against up to millions of prokaryotic genomes. A key innovation is to construct a small set of probe k-mers (e.g. n = 40,000) which window-cover the entire database to be indexed, in the sense that every 500 bp window of every database genome contains multiple seed k-mers each with a shared prefix with one of the probes. Storing these seeds, indexed by the probes with which they agree, in a hierarchical index enables fast and low-memory variable-length seed matching, pseudoalignment, and then full alignment. We show that LexicMap is able to align with higher sensitivity than Blastn as the query divergence drops from 90% to 80% for queries ≥ 1 kb, and then benchmark on small (GTDB) and large (AllTheBacteria and Genbank+RefSeq) databases. We show that LexicMap achieves higher sensitivity and speed and lower memory compared to the state-of-the-art approaches. Alignment of a single gene against 2.34 million prokaryotic genomes from GenBank and RefSeq takes 36 seconds (rare gene) to 15 minutes (16S rRNA gene). LexicMap produces output in standard formats including that of BLAST and is available under MIT license at https://github.com/shenwei356/LexicMap.
Article
Background Honey is a nutritious food made by bees from nectar and sweet deposits of flowering plants and has been used for centuries as a natural remedy for wound healing and other bacterial infections due to its antibacterial properties. Honey contains a diverse community of bacteria, especially probiotic bacteria, that greatly affect the health of bees and their consumers. Therefore, understanding the microorganisms in honey can help to ensure the quality of honey and lead to the identification of potential probiotic bacteria. Methods Herein, the bacterial community in honey produced by Apis cerana was investigated by applying the next-generation sequencing (NGS) method for the V3–V4 hypervariable regions of the bacterial 16S rRNA gene. In addition, lactic acid bacteria (LAB) in the honey sample were also isolated and screened for in vitro antimicrobial activity. Results The results showed that the microbiota of A. cerana honey consisted of two major bacterial phyla, Firmicutes (50%; Clostridia, 48.2%) and Proteobacteria (49%; Gammaproteobacteria, 47.7%). Among the 67 identified bacterial genera, the three most predominant genera were beneficial obligate anaerobic bacteria, Lachnospiraceae (48.14%), followed by Gilliamella (26.80%), and Enterobacter (10.16%). Remarkably, among the identified LAB, Lactobacillus kunkeei was found to be the most abundant species. Interestingly, the isolated L. kunkeei strains exhibited antimicrobial activity against some pathogenic bacteria in honeybees, including Klebsiella spp., Escherichia coli, Enterococcus faecalis, Pseudomonas aeruginosa and Staphylococcus aureus. This underscores the potential candidacy of L. kunkeei for developing probiotics for medical use. Taken together, our results provided new insights into the microbiota community in the A. cerana honey in Hanoi, Vietnam, highlighting evidence that honey can be an unexplored source for isolating bacterial strains with potential probiotic applications in honeybees and humans.
Conference Paper
Multiple Sequence Alignment of genetic sequences is essential to the field of bioinformatics. Because of its exponential complexity, heuristics are used. The most popular is Progressive Alignment, for which numerous tools have been developed over the years. However, none of them always produces the best alignment, and none consistently stands out. Scientists are therefore forced to choose and use more than one tool. Instead of developing a new heuristic, this work presents a meta-tool that evaluates new combinations of techniques extracted from other tools and coordinates their execution efficiently. The approach is able to achieve superlinear speedups while maintaining, and sometimes improving, the quality of the alignments.
Article
Two Gram-stain-negative, non-spore-forming, rod-shaped, and obligately aerobic bacteria, designated strains CX-624ᵀ and cx-311, were isolated from soil samples in Qinghai Province, China. The two strains grew best at 28 °C on plates of Tryptone soya agar (TSA). Cells formed circular, convex, translucent, smooth, and orange colonies with approximately 1.0 mm diameter after 2 days of incubation on TSA at 28 °C. The strains were oxidase-negative and catalase-positive. The predominant cellular fatty acids were iso-C15:0 and anteiso-C15:0, and major polar lipids included phosphatidylethanolamine, an unidentified aminophospholipid, four unidentified lipids and an aminolipid. MK-6 was the sole menaquinone in strain CX-624ᵀ. Comparative analysis of the nearly full-length 16S rRNA gene sequences showed strains CX-624ᵀ and cx-311 were members of the family Weeksellaceae, with the highest similarity to Kaistella haifensis H38ᵀ (96.66 %), Epilithonimonas pallida DSM 18015ᵀ (96.59 %), and Chryseobacterium gambrini DSM 18014ᵀ (96.53 %). Phylogenetic analyses of both the 16S rRNA gene and 177 core genes revealed that strains CX-624ᵀ and cx-311 formed an independent clade. Average nucleotide identity values (< 72.64 %), average amino-acid identity values (< 72.61 %) and digital DNA–DNA hybridization (< 21.10 %) indicated that the strains CX-624ᵀ and cx-311 should constitute a novel genus. The DNA G+C contents of strains CX-624ᵀ and cx-311 were 43.0 mol% and 42.7 mol%. According to the data obtained in this study, strain CX-624ᵀ represents a novel species belonging to a novel genus of the Weeksellaceae, for which the name Marnyiella aurantia gen. nov., sp. nov. is proposed. The type strain is CX-624ᵀ (=GDMCC 1.1714ᵀ = JCM 33925ᵀ).
Article
When comparing two biological sequences, it is often desirable for a gap to be assigned a cost not directly proportional to its length. If affine gap costs are employed, in other words if opening a gap costs v and each null in the gap costs u, the algorithm of Gotoh (1982, J. molec. Biol. 162, 705) finds the minimum cost of aligning two sequences in order MN steps. Gotoh's algorithm attempts to find only one from among possibly many optimal (minimum-cost) alignments, but does not always succeed. This paper provides an example for which this part of Gotoh's algorithm fails and describes an algorithm that finds all and only the optimal alignments. This modification of Gotoh's algorithm still requires order MN steps. A more precise form of path graph than previously used is needed to represent accurately all optimal alignments for affine gap costs.
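The affine cost model in this abstract can be illustrated with a minimal sketch. It computes only the minimum alignment cost, using the standard three-state recurrence in order MN steps; it does not reproduce the paper's traceback or its enumeration of all optimal alignments. The function name `affine_gap_cost`, the `mismatch` parameter, and the default values of `v` and `u` are assumptions made for illustration.

```python
import math


def affine_gap_cost(a, b, v=2.0, u=1.0, mismatch=1.0):
    """Minimum cost of aligning a and b when a gap of length k costs v + k*u.

    Illustrative sketch only: v (gap opening), u (per-null cost) and
    mismatch are hypothetical defaults, not values from the paper.
    """
    m, n = len(a), len(b)
    inf = math.inf
    # S[i][j]: best cost when a[i-1] is aligned with b[j-1]
    # D[i][j]: best cost when a[i-1] is aligned with a null (gap in b)
    # I[i][j]: best cost when b[j-1] is aligned with a null (gap in a)
    S = [[inf] * (n + 1) for _ in range(m + 1)]
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    I = [[inf] * (n + 1) for _ in range(m + 1)]
    S[0][0] = 0.0
    for i in range(1, m + 1):
        D[i][0] = v + i * u
    for j in range(1, n + 1):
        I[0][j] = v + j * u
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = min(S[i - 1][j - 1], D[i - 1][j - 1], I[i - 1][j - 1]) + sub
            # Either extend an open gap (cost u) or open a new one (cost v + u).
            D[i][j] = min(D[i - 1][j] + u, min(S[i - 1][j], I[i - 1][j]) + v + u)
            I[i][j] = min(I[i][j - 1] + u, min(S[i][j - 1], D[i][j - 1]) + v + u)
    return min(S[m][n], D[m][n], I[m][n])
```

With the default costs, aligning "ACGT" with "AT" is cheapest with one gap of length 2 (cost 2 + 2·1) rather than two gaps of length 1 (cost 2·(2 + 1)), which is the behaviour affine costs are meant to produce.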
Article
The sequence alignment algorithms of Needleman and Wunsch (1970) and Sellers (1974) are compared. Although the former maximizes similarity and the latter minimizes differences, the two procedures are proven to be equivalent. The equivalence relations necessary for each procedure to give the same result are: 1, the weight assigned to gaps in the Sellers algorithm exceeds that in the Needleman-Wunsch algorithm by exactly half the length of the gap times the maximum match value; and 2, for any pair of aligned elements, the degree of similarity assigned by the Needleman-Wunsch algorithm plus the degree of dissimilarity assigned by the Sellers algorithm equals a constant. The utility of the algorithms is independent of the nature of the elements in the sequence and could include anything from geological sequence to the amino acid sequences of proteins. Examples are provided using known nucleotide sequences, one of which shows two sequences to be analogous rather than homologous.
Article
The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed, where N is the sum of the lengths of A and B and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N + D²) expected-time performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(N log N + D²) time variation.
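The greedy edit-graph idea in this abstract can be sketched as follows. This is an illustrative reading of the furthest-reaching-path approach, not the authors' code: it returns only D, the size of a shortest script of insertions and deletions, and keeps one furthest x value per diagonal. The function name `shortest_edit_script_length` is an assumption.

```python
def shortest_edit_script_length(a, b):
    """Size D of a shortest insert/delete script turning a into b.

    Sketch of a greedy furthest-reaching search on the edit graph.
    Diagonal k holds the points (x, y) with k = x - y.
    """
    n, m = len(a), len(b)
    # furthest[k] = largest x reached so far on diagonal k
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]        # move down: insert b[y]
            else:
                x = furthest[k - 1] + 1    # move right: delete a[x]
            y = x - k
            # Follow the diagonal "snake" of matching symbols for free.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m  # not reached: the loop always returns by d = n + m
```

Because each round d only extends paths found in round d - 1, the work grows with the number of differences, which is why the approach is fast when the two sequences are similar.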
Article
Some new metrics are introduced to measure the distance between biological sequences, such as amino acid sequences or nucleotide sequences. These metrics generalize a metric of Sellers, who considered only single deletions, mutations, and insertions. The present metrics allow, for example, multiple deletions and insertions and single mutations. They also allow computation of the distance among more than two sequences. Algorithms for computing the values of the metrics are given which also compute best alignments. The connection with the information theory approach of Reichert, Cohen, and Wong is discussed.
Article
The edit distance between strings a1 … am and b1 … bn is the minimum cost s of a sequence of editing steps (insertions, deletions, changes) that convert one string into the other. A well-known tabulating method computes s as well as the corresponding editing sequence in time and in space O(mn) (in space O(min(m, n)) if the editing sequence is not required). Starting from this method, we develop an improved algorithm that works in time and in space O(s · min(m, n)). Another improvement with time O(s · min(m, n)) and space O(s · min(s, m, n)) is given for the special case where all editing steps have the same cost independently of the characters involved. If the editing sequence that gives cost s is not required, our algorithms can be implemented in space O(min(s, m, n)). Since s = O(max(m, n)), the new methods are always asymptotically as good as the original tabulating method. As a by-product, algorithms are obtained that, given a threshold value t, test in time O(t · min(m, n)) and in space O(min(t, m, n)) whether s ⩽ t. Finally, different generalized edit distances are analyzed and conditions are given under which our algorithms can be used in conjunction with extended edit operation sets, including, for example, transposition of adjacent characters.
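The remark that the tabulating method needs only O(min(m, n)) space when the editing sequence is not required can be sketched as below. This shows the baseline two-row method with unit costs, not the improved O(s · min(m, n)) algorithm of the abstract; the function name `edit_distance_linear_space` is an assumption.

```python
def edit_distance_linear_space(a, b):
    """Unit-cost edit distance using O(min(len(a), len(b))) extra space.

    Only the previous and current rows of the table are kept, so the
    editing sequence itself cannot be recovered from this sketch.
    """
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so rows are short
    prev = list(range(len(b) + 1))  # row 0: distance from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]  # column 0: distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # change ca into cb (free if equal)
            ))
        prev = curr
    return prev[-1]
```

Swapping the arguments is safe here because unit-cost edit distance is symmetric; with asymmetric insertion and deletion costs the swap would need to exchange those costs as well.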
Article
The edit distance between two character strings can be defined as the minimum cost of a sequence of editing operations which transforms one string into the other. The operations we admit are deleting, inserting and replacing one symbol at a time, with possibly different costs for each of these operations. The problem of finding the longest common subsequence of two strings is a special case of the problem of computing edit distances. We describe an algorithm for computing the edit distance between two strings of length n and m, n ≥ m, which requires steps whenever the costs of edit operations are integral multiples of a single positive real number and the alphabet for the strings is finite. These conditions are necessary for the algorithm to achieve the time bound.
Article
Just after he introduced dynamic programming, Richard Bellman with R. Kalaba in 1960 gave a method for finding Kth best policies. Their method has been modified since then, but it is still not practical for many problems. This paper describes a new technique which modifies the usual backtracking procedure and lists all near-optimal policies. This practical algorithm is very much in the spirit of the original formulation of dynamic programming. An application to matching biological sequences is given.
Article
The string-to-string correction problem is to determine the distance between two strings as measured by the minimum cost sequence of “edit operations” needed to change the one string into the other. The edit operations investigated allow changing one symbol of a string into another single symbol, deleting one symbol from a string, or inserting a single symbol into a string. An algorithm is presented which solves this problem in time proportional to the product of the lengths of the two strings. Possible applications are to the problems of automatic spelling correction and determining the longest subsequence of characters common to two strings.
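The tabulating approach described in this abstract, with time proportional to the product of the string lengths, can be sketched as follows. The sketch assumes unit costs for all three edit operations and recovers one minimum-cost edit sequence by tracing back through the full table; the function name `edit_script` and the tuple format of the operations are illustrative assumptions.

```python
def edit_script(a, b):
    """Return (cost, ops) for one minimum-cost edit sequence turning a into b.

    Sketch with unit costs; ops is a list of ("change", x, y),
    ("delete", x) or ("insert", y) tuples in left-to-right order.
    """
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(1, n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,
                d[i][j - 1] + 1,
                d[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
            )
    # Trace back from the bottom-right corner to recover the operations.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                ops.append(("change", a[i - 1], b[j - 1]))
            i -= 1
            j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", a[i - 1]))
            i -= 1
        else:
            ops.append(("insert", b[j - 1]))
            j -= 1
    ops.reverse()
    return d[m][n], ops
```

Keeping the whole table is what makes the traceback possible, and it is also why this form needs space proportional to the product of the lengths.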
Article
This paper presents a simple method for computing a shortest sequence of insertion and deletion commands that converts one given file to another. The method is particularly efficient when the difference between the two files is small compared to the files' lengths. In experiments performed on typical files, the program often ran four times faster than the UNIX diff command.
Article
We consider efficient methods for computing a difference metric between two sequences of symbols, where the cost of an operation to insert or delete a block of symbols is a concave function of the block's length. Alternatively, sequences can be optimally aligned when gap penalties are a concave function of the gap length. Two algorithms based on the ‘candidate list paradigm’ first used by Waterman (1984) are presented. The first computes significantly more parsimonious candidate lists than Waterman's method. The second method refines the first to the point of guaranteeing O(N² lg N) worst-case time complexity, and under certain conditions O(N²). Experimental data show how various properties of the comparison problem affect the methods' relative performance. A number of extensions are discussed, among them a technique for constructing optimal alignments in O(N) space in expectation. This variation gives a practical method for comparing long amino sequences on a small computer.
Article
Existing methods for getting the locally best matched alignments between a pair of biological sequences require O(N²) computational steps and O(N²) storage, where N is the average sequence length. An improved method is presented with which the storage requirement is greatly reduced, while the computational steps remain O(N²). Only a small number of additional steps are required to display any common sub-sequences with similarity scores greater than a given threshold. The alignments found by the algorithm are optimal in the sense that their scores are locally maximal, where each score is a sum of weights given to individual matches/replacements, insertions and deletions involved in the alignment. The algorithm was implemented in the C programming language on a personal computer. A data area of 64 kbytes in random access memory and a few hundred kbytes on a disk is sufficient for comparing two protein or nucleic acid sequences of 2500 residues. The programs are particularly valuable when used in combination with fast sequence search programs.
Article
The major algorithms currently used for aligning biological sequences are those based on the dynamic programming method. A dynamic programming algorithm consists of two major procedures, forward and traceback routines. This paper describes a dynamic programming algorithm for aligning three sequences at a time. Deletions and insertions are penalized according to their numbers and lengths. A forward process is accomplished in O(L³) computational steps, where L is the average sequence length. On the other hand, a traceback process is done in T steps, where T is the number of elementary configurations involved in the optimal alignment (usually T much less than L). The traceback procedure uses an effective technique for memory management, which is applicable to a wide range of sequence-matching methods.
Article
An algorithm and a program have been developed which enable optimal alignments of biological sequences on an 8-bit microcomputer. The compiled program can process sequences up to 1000 residues on a Commodore 64. Since this program was written originally in the BASIC language, it may readily be adapted to other microcomputers with small changes.
Article
The algorithm of Gotoh [1] computes in two passes of MN steps the alignment of a pair of sequences of lengths M and N, subject to a constraint on the form of the gap weighting function. This compares with the previous algorithm of Waterman et al. [2], which runs in M²N steps. Gotoh [1] also gave a method using two passes of (L+2)MN steps in the case where gap weights remain constant for gaps of length greater than L. Here we describe a procedure for computing the alignment (evolutionary distance and optimal path) in a single pass of MN steps for both cases.
Article
We show how to speed up sequence alignment algorithms of the type introduced by Needleman and Wunsch (and generalized by Sellers and others). Faster alignment algorithms have been introduced, but always at the cost of possibly getting sub-optimal alignments. Our modification results in the optimal alignment still being found, often in 1/10 the usual time. What we do is reorder the computation of the usual alignment matrix so that the optimal alignment is ordinarily found when only a small fraction of the matrix is filled. The number of matrix elements which have to be computed is related to the distance between the sequences being aligned; the better the optimal alignment, the faster the algorithm runs.
Article
The algorithm of Waterman et al. (1976) for matching biological sequences was modified under some limitations to be accomplished in essentially MN steps, instead of the M²N steps necessary in the original algorithm. The limitations do not seriously reduce the generality of the original method, and the present method is available for most practical uses. The algorithm can be executed on a small computer with a limited capacity of core memory.