Article

Next-generation DNA sequencing

Authors:

... Biomarkers are biological molecules associated with a particular disease or condition, and measuring them can provide valuable diagnostic and prognostic information. [5] Artificial intelligence (AI) is also playing an increasingly important role in diagnostic testing. Machine learning algorithms can analyze large amounts of data to identify patterns and make predictions, which can lead to more accurate and efficient diagnoses. ...
... Additionally, ethical and legal issues surrounding genetic testing and other sensitive diagnostic tests must also be considered. [5], [6] International Journal of Science Early detection and diagnosis of diseases have been linked to improved patient outcomes. When a disease is detected early, it is easier to treat, and the patient has a higher chance of recovery. ...
... They can be found in various body fluids, such as blood, urine, and cerebrospinal fluid, as well as in tissues and cells. [5], [6] Biomarkers are valuable diagnostic tools because they can provide information about the presence, severity, and progression of a disease. Blood biomarkers are commonly used to diagnose a wide range of diseases, including cancer, cardiovascular disease, and infectious diseases. For example, elevated levels of prostate-specific antigen (PSA) in the blood can be a biomarker for prostate cancer, while increased levels of troponin in the blood can indicate a heart attack. ...
... (25,26) Sanger sequencing (also known as the chain-termination method) uses modified nucleotides called dideoxynucleotide terminators (ddNTPs), which lack the hydroxyl (OH) group at the 3' carbon of the pentose sugar. (5,25,27) When a ddNTP is incorporated into a growing DNA strand, no further nucleotide can be attached, so synthesis of that strand stops. This produces DNA fragments of different lengths which, after amplification, are separated by size on a polyacrylamide gel. (5,27) By detecting the radioactive label on each fragment and ordering the fragments by size, the sequence of the target region can be read nucleotide by nucleotide. (5,25) Nowadays, radioactive ddNTPs and slab gels have been replaced by automated methods that use fluorescently labeled ddNTPs and separate the fragments in multichannel capillary electrophoresis instruments (Figure 5). (25,27) The method is still time-consuming and very expensive when a large number of genes must be sequenced (which explains the years and billions of dollars the Human Genome Project (HGP) spent to sequence a single genome). (12,26) Nonetheless, it remains valuable: it is currently considered not only the most reliable method but also the gold standard, and it is often used as a confirmatory test for NGS. ...
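The chain-termination logic can be illustrated with a toy simulation. This is a minimal sketch, not a model of real sequencing chemistry: it assumes that a terminated fragment is produced for every position of the synthesized strand, that fragments are perfectly separated by length, and that each ddNTP carries a distinguishable label (as in four-colour fluorescent sequencing). Reading the terminal base of each fragment from shortest to longest recovers the synthesized strand, which is the complement of the template.

```python
import random

# Toy model of Sanger chain termination (illustrative only, not real chemistry).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize(template: str) -> str:
    """Return the strand built on the template (base-paired; strand
    orientation is ignored in this toy model)."""
    return "".join(COMPLEMENT[base] for base in template)


def terminated_fragments(template: str) -> list[tuple[int, str]]:
    """Assume a ddNTP can stop synthesis at every position, producing one
    fragment per position, reported as (length, labelled terminal base)."""
    strand = synthesize(template)
    return [(i + 1, strand[i]) for i in range(len(strand))]


def read_sequence(fragments: list[tuple[int, str]]) -> str:
    """Order fragments from shortest to longest, as electrophoresis does,
    and read off each fragment's labelled terminal base."""
    return "".join(base for _, base in sorted(fragments))


if __name__ == "__main__":
    template = "TACGGATC"
    fragments = terminated_fragments(template)
    random.Random(0).shuffle(fragments)  # fragments leave the reaction unordered
    print(read_sequence(fragments))      # ATGCCTAG (complement of the template)
```

The shuffle stands in for the unordered mixture of fragments; the size-based sort plays the role of the gel or capillary.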
Article
Full-text available
The neonatologist is often the first clinician to identify genetic disorders that were not diagnosed prenatally. Technological advances in genetics over the past few decades have opened up possibilities never before imagined. Gone are the days when we could offer our patients little more than a peripheral blood karyotype. Newer methods, such as comparative genomic hybridization, Sanger sequencing and next-generation sequencing, allow a more detailed analysis of the human genome, both at the level of large rearrangements (deletions, duplications) and at the level of potentially pathogenic point variants. These technologies have helped uncover genes involved in diseases long known to have a genetic origin but whose etiology had remained elusive. Despite their promise, no method is self-sufficient, and all have limitations. The aim of this review is to update clinicians on the genetic tests that are currently available and in use. Given that the first human genome was sequenced just over twenty years ago, what will the next twenty years bring?
... With this strategy, output is increased while reagent costs are kept to a minimum (Metzker, 2010). The procedure, known as bridge amplification or cluster generation, produces compact clonal colonies called polonies, each enriched in copies of a single template DNA molecule (Shendure and Ji, 2008). Combining shorter read lengths (around 100 base pairs) with a larger output of sequencing data (600 gigabases) in a single run makes the sequencer more cost-effective. ...
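As a rough worked example using the figures quoted above (600 gigabases per run, reads of about 100 bases), the number of reads per run and the average depth of coverage follow from simple division. The human genome size of about 3.1 gigabases is an assumption added here, and real runs lose part of their output to quality filtering and duplicate reads, so these are upper-bound estimates.

```python
def reads_per_run(output_bases: float, read_length: int) -> float:
    """Number of reads needed to produce output_bases at a given read length."""
    return output_bases / read_length


def mean_coverage(output_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return output_bases / genome_size


if __name__ == "__main__":
    output = 600e9        # 600 gigabases per run (figure from the text)
    read_len = 100        # ~100 bp reads (figure from the text)
    human_genome = 3.1e9  # assumed haploid human genome size, in bases

    print(f"{reads_per_run(output, read_len):.1e} reads")           # 6.0e+09 reads
    print(f"{mean_coverage(output, human_genome):.0f}x coverage")   # 194x coverage
```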
... This method is based on the sequencing-by-ligation (SBL) principle with two-base (dinucleotide) probe encoding: the probe must first be annealed to the template and then ligated to it (Shendure and Ji, 2008). The SOLiD 5500 W series has been used in a variety of applications, including whole-genome, transcriptome, and exome studies (Shendure and Ji, 2008). Using fluorescently labeled octamer probes, the sequencer performs repeated cycles of annealing and ligation. ...
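Two-base ("colour-space") encoding, as used in SOLiD-type ligation chemistry, can be sketched in a few lines. Each colour reports a pair of adjacent bases, and with the 2-bit codes A=0, C=1, G=2, T=3 the colour equals the bitwise XOR of the two codes. The sketch assumes the first base is known (on the instrument it comes from the adapter) and ignores sequencing errors; it illustrates the encoding, not any instrument's base-calling software.

```python
# Illustrative two-base ("colour-space") encoding and decoding.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"  # index -> base


def encode(seq: str) -> list[int]:
    """Convert a base sequence to colours: one colour per adjacent base pair."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colours: list[int]) -> str:
    """Rebuild bases from a known first base: next = current XOR colour."""
    bases = [first_base]
    for colour in colours:
        bases.append(BASE[CODE[bases[-1]] ^ colour])
    return "".join(bases)


if __name__ == "__main__":
    seq = "TACGGATC"
    colours = encode(seq)
    print(colours)                  # [3, 1, 3, 0, 2, 3, 2]
    print(decode(seq[0], colours))  # TACGGATC
```

Because decoding is sequential, a single wrong colour shifts every downstream base when decoded naively, whereas a true single-base variant changes two adjacent colours; this is often cited as the error-checking property of the scheme.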
Chapter
Biotechnology is an emerging field that can contribute new and better applications in a wide range of sectors, such as health care, services, agriculture, and the processing industry. This book focuses on recent developments in the frontier areas of biotechnology and aims to foster new collaborations in these areas. It highlights multidisciplinary perspectives for biotechnologists, microbiologists, pharmaceutical experts, bioprocess engineers, agronomists, medical professionals, sustainability researchers and academicians, and provides a platform for sharing knowledge on recent trends, theories and practices in biotechnology. Research articles are invited in the following areas of interest.
... The ability to examine the human genome at various levels, from chromosomal to single-base changes, greatly enhances its current potential. Next-generation sequencing (NGS) or massively parallel sequencing (MPS) are commonly used terms to describe this technology, encompassing a broad range of methodologies (Shendure and Ji 2008). NGS allows for the rapid and cost-effective generation of vast amounts of data in each instrument run, enabling parallel analysis of multiple samples. ...
... Over the years, various NGS platforms with distinct sequencing chemistries and approaches were developed. Among them, Illumina's sequencing-by-synthesis technology, introduced in 2006, gained widespread adoption due to its accuracy, scalability, and cost-effectiveness (Shendure and Ji 2008). Other notable NGS platforms include Roche's 454 pyrosequencing, Ion Torrent's semiconductor sequencing, and Pacific Biosciences' single-molecule real-time (SMRT) sequencing. ...
Chapter
Full-text available
Next-generation sequencing (NGS) technologies have revolutionized the field of genomics by enabling high-throughput, cost-effective, and rapid DNA sequencing on an unprecedented scale. This introduction offers a synopsis of NGS and its profound implications across diverse areas of biological research and medical diagnostics. The fundamental principles underlying NGS, including library preparation, sequencing-by-synthesis, and data generation, are outlined. The different NGS platforms, such as Illumina, Ion Torrent, and Oxford Nanopore, as well as their respective strengths and limitations, are discussed. Recent advancements in sequencing technologies, such as single-cell sequencing, long-read sequencing, and spatial transcriptomics, are explored, expanding the capabilities of NGS and facilitating comprehensive genomic investigations. Subsequently, the applications of NGS in genomics, transcriptomics, epigenomics, metagenomics, and personalized medicine are examined. The accelerated discovery of genetic variants, gene expression patterns, DNA methylation profiles, and microbial communities through NGS is emphasized. Moreover, the role of NGS in uncovering disease mechanisms, identifying therapeutic targets, and enabling precision medicine approaches is discussed. Furthermore, the computational challenges associated with NGS data analysis, including read alignment, variant calling, and data interpretation, are addressed. The pivotal role of bioinformatics and data analysis pipelines in transforming raw sequencing data into biologically meaningful insights is highlighted. Additionally, the integration of NGS data with other omics datasets and the emerging field of multi-omics integration, providing a holistic view of biological systems, are briefly touched upon. The impact of NGS on clinical diagnostics, encompassing the detection of genetic disorders, cancer genomics, infectious disease surveillance, and pharmacogenomics, is also elucidated. 
The potential of NGS-based liquid biopsies and non-invasive prenatal testing in revolutionizing clinical practice is underscored. Lastly, the challenges and considerations associated with NGS, such as data storage, privacy concerns, ethical considerations, and the importance of standardization and quality control measures, are addressed. The significance of interdisciplinary collaborations among scientists, clinicians, and bioinformaticians in harnessing the full potential of NGS and driving innovation in genomic research and healthcare is emphasized. In conclusion, this comprehensive introduction to next-generation sequencing provides an overview of the technology, its applications, and its impact across various fields. By empowering researchers and clinicians with unprecedented genomic information, NGS has the potential to revolutionize our understanding of biological systems, unravel disease complexities, and facilitate personalized approaches to healthcare.
... The rapid development of sequencing technologies has promoted substantial advancement in GWAS, particularly in obtaining comprehensive genetic information from limited samples [7,8]. The integration of sequenced samples provides a great opportunity for identifying novel genetic associations and increasing the statistical power of single-variant association tests [9]. ...
... A recent transcriptome study identified WNT7B as among the most enriched transcripts in anterior capsule tissue from patients undergoing arthroscopic capsulotomy for frozen shoulder (a tissue disorder), suggesting WNT7B as a potential causal gene at the locus [30]. SNP rs2290221 on chromosome 7 was identified as associated with fibroblastic disorders and showed the strongest association signal, with a p-value of 1.26 × 10⁻⁸ by iECAT-RC. This SNP lies in an intronic region of the genes secreted frizzled-related protein 4 (SFRP4) and ependymal related protein 1 (zebrafish) (EPDR1). ...
Preprint
Full-text available
Genome-wide association studies (GWAS) have successfully revealed many disease-associated genetic variants. In a case-control study, adequate power for an association test can be achieved with a large sample size, but genotyping large samples is expensive. A cost-effective strategy to boost power is to integrate external control samples with publicly available genotype data. However, naïve integration of external controls may inflate type I error rates if systematic differences between studies (batch effects), such as differences in sequencing platforms, genotype-calling procedures and population stratification, are ignored. To account for batch effects, we propose integrating External Controls into the Association Test by Regression Calibration (iECAT-RC) in case-control association studies. Extensive simulation studies show that iECAT-RC not only controls type I error rates but also boosts statistical power in all models. We also apply iECAT-RC to the UK Biobank data for M72 Fibroblastic disorders, treating genotype calling as the batch effect. Four SNPs associated with Fibroblastic disorders were detected by iECAT-RC and the two other comparison methods. However, our method has a higher probability of identifying these significant SNPs in an unbalanced case-control study.
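The single-variant tests that such studies build on can be illustrated with the classic allelic chi-square test for a case-control design. This is a minimal sketch with made-up allele counts; it is not iECAT-RC and applies no batch-effect correction. For one degree of freedom the upper-tail probability of a chi-square statistic x equals erfc(sqrt(x/2)), which avoids an external statistics library; the 5 × 10⁻⁸ genome-wide threshold is the common convention, not a value taken from the study.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_chi_square(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Rows: cases, controls. Columns: alternative allele, reference allele.
    Returns (statistic, p_value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected

    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical allele counts for 500 cases and 500 controls (2 alleles each).
    stat, p = allelic_chi_square(300, 700, 200, 800)
    print(f"chi2 = {stat:.1f}")  # chi2 = 26.7
    print(f"p = {p:.1e}")
    print("genome-wide significant:", p < GENOME_WIDE_ALPHA)  # False
```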
... To assess amplification fidelity, gene sequencing technologies can be used to quantify amplification error rates. Traditional Sanger sequencing can be used [15]; however, massively parallel next-generation sequencing (NGS), also known as second-generation sequencing (e.g., Illumina), provides a larger data set for error analysis. One limitation of this approach is its short reads (typically 75-300 bp), which require complex genome assembly; assembly works well for reconstructing common sequences (ideally a single sequence) but lacks the depth to capture mutations in pieces that fail to align. ...
Preprint
Full-text available
In this work, Oxford Nanopore sequencing is tested as an accessible method for quantifying the heterogeneity of amplified DNA. The method enables rapid quantification of deletions, insertions, and substitutions, the probability of each type of error, and their locations in the replicated sequences. The amplification techniques tested were conventional polymerase chain reaction (PCR) with polymerases of varying fidelity (OneTaq, Phusion, and Q5) and rolling circle amplification (RCA) with Phi29 polymerase. Plasmid amplification in bacteria was also assessed. By analyzing the distribution of errors in a large set of sequences from each sample, we examined the heterogeneity and mode of errors in each sample. This analysis revealed that the Q5 and Phusion polymerases gave the lowest error rates in the amplified DNA. As a secondary validation, we analyzed the emission spectra of sfGFP fluorescent proteins synthesized from amplified DNA by cell-free expression. Error-prone polymerase chain reactions confirmed that the breadth of the reporter protein's emission peak depends on the DNA error rate. The presented nanopore sequencing methods serve as a roadmap for quantifying the accuracy of other gene amplification techniques as they are discovered, enabling more homogeneous cell-free expression of desired proteins.
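To make the idea of counting substitutions, insertions and deletions concrete, the sketch below aligns a read to a reference with a unit-cost edit-distance (Levenshtein) alignment and tallies each error type along one optimal path. This is a simplification assumed for illustration: real nanopore pipelines use dedicated aligners and error models, and when several optimal alignments exist, the split between error types depends on the tie-breaking order chosen here (diagonal moves first).

```python
def count_errors(reference: str, read: str) -> dict[str, int]:
    """Count substitutions, insertions and deletions in `read` relative to
    `reference` along one minimum-edit-distance alignment (unit costs)."""
    n, m = len(reference), len(read)
    # dp[i][j] = edit distance between reference[:i] and read[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = reference[i - 1] != read[j - 1]
            dp[i][j] = min(
                dp[i - 1][j - 1] + mismatch,  # match or substitution
                dp[i - 1][j] + 1,             # deletion (base missing from read)
                dp[i][j - 1] + 1,             # insertion (extra base in read)
            )

    # Trace back one optimal path, preferring diagonal moves on ties.
    counts = {"substitutions": 0, "insertions": 0, "deletions": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = reference[i - 1] != read[j - 1]
            if dp[i][j] == dp[i - 1][j - 1] + mismatch:
                counts["substitutions"] += mismatch
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            counts["deletions"] += 1
            i -= 1
        else:
            counts["insertions"] += 1
            j -= 1
    return counts


if __name__ == "__main__":
    errors = count_errors("ACGTACGT", "ACCTACGTT")
    print(errors)  # {'substitutions': 1, 'insertions': 1, 'deletions': 0}
    print(sum(errors.values()) / len("ACGTACGT"))  # per-base error rate: 0.25
```

Aggregating these counts over many reads per sample gives the kind of per-type error distribution the abstract describes.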
... It includes a comparative analysis of three prominent next-generation sequencing platforms, highlighting their strengths and limitations. Additionally, it provides a comprehensive review of next-generation DNA sequencing technologies, detailing their methodologies and applications in genomics research [11,15,24]. ...
Article
Full-text available
Using Sri Lanka as a case study, the paper explores how DNA analysis has transformed genetic research, particularly in low- and middle-income countries (LMICs). It discusses various DNA analysis techniques, from traditional methods like Sanger sequencing to advanced techniques such as next-generation sequencing (NGS) and CRISPR-Cas9, highlighting their applications in disease research, population genetics, and forensic science. Sri Lanka's advancements in genetic research, including DNA sequencing, typing, and recent developments in X-chromosome-based DNA typing, are emphasized. The paper also examines challenges and opportunities in LMICs regarding genetic research and underscores the importance of DNA analysis in advancing personalized medicine and understanding genetic diversity. Additionally, it discusses Sri Lanka's efforts in education and training in molecular biology. It explores the country's rich genetic diversity and demographic history, focusing on ethnic studies and historical interactions among different population groups. Overall, the paper highlights the significance of DNA analysis in genetic research and its potential implications for LMICs, with Sri Lanka as a notable example of progress in the field.
... The central dogma of molecular biology elucidates the flow of information from DNA to RNA through transcription, and from RNA to protein through translation. Over the past few years, next-generation sequencing (NGS) technologies have revolutionized transcriptomic analysis, providing a rapid and cost-effective means to explore large-scale data (1)(2)(3). NGS has become an indispensable tool in biomedicine and cancer research, facilitating a comprehensive understanding of gene expression responses to various cellular states (4,5). The high-throughput nature of NGS allows simultaneous analysis of millions of RNA sequences, yielding extensive information previously unattainable with traditional sequencing or PCR-based methods. ...
Article
Full-text available
In this review, we explore the transformative impact of next-generation sequencing technologies on translatomics (the study of how the translational machinery acts on a genome-wide scale). Although a direct correlation between mRNA and protein content might be expected, the complex regulatory mechanisms that affect this relationship highlight the limitations of standard RNA-seq approaches. The review then characterizes key techniques such as polysome profiling, Ribo-seq, TRAP-seq, proximity-specific ribosome profiling, RNC-seq, TCP-seq, QTI-seq and scRibo-seq. All of these methods are summarized in the context of cancer research, shedding light on their applications in deciphering aberrant translation in cancer cells. In addition, we cover databases and bioinformatic tools essential for researchers who want to address translatome analysis in the context of cancer biology.
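A quantity that ribosome profiling (Ribo-seq) is commonly used to estimate is translational efficiency (TE): the abundance of ribosome footprints on a gene relative to the abundance of its mRNA. The sketch below assumes per-gene read counts from a Ribo-seq library and a matched RNA-seq library, normalizes each library to counts per million (CPM), and reports the ratio. Gene names and counts are hypothetical; it also assumes both libraries are counted over the same gene regions, so length normalization cancels, and real analyses add replicates and dedicated statistical models.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw counts so the library sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def translational_efficiency(ribo_counts: dict[str, int],
                             rna_counts: dict[str, int]) -> dict[str, float]:
    """TE = footprint CPM / mRNA CPM, for genes with non-zero mRNA signal."""
    ribo = counts_per_million(ribo_counts)
    rna = counts_per_million(rna_counts)
    return {gene: ribo[gene] / rna[gene]
            for gene in ribo if rna.get(gene, 0) > 0}


if __name__ == "__main__":
    ribo = {"geneA": 500, "geneB": 100, "geneC": 400}  # hypothetical footprints
    rna = {"geneA": 250, "geneB": 500, "geneC": 250}   # hypothetical mRNA reads
    for gene, te in translational_efficiency(ribo, rna).items():
        print(gene, round(te, 2))  # geneA 2.0, geneB 0.2, geneC 1.6
```

Here geneB has abundant mRNA but few footprints, which is the kind of mRNA/protein discordance that motivates translatome methods.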
... The advent of Next-Generation Sequencing (NGS) has significantly advanced DNA sequencing technology, revolutionizing genetic and medical research. Compared to traditional Sanger sequencing methods, NGS can swiftly sequence complete genomes, including both coding and non-coding regions such as telomeres [59,60]. Computational tools like Telocat, Computel, and Tel-seq estimate telomere length from genomic data, eliminating the need for laboratory experiments [17,61,62]. ...
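Tools of this kind start from sequencing reads that are rich in the telomeric repeat (TTAGGG in vertebrates, read as CCCTAA on the opposite strand). The sketch below shows only that first step: it counts reads containing at least k occurrences of either repeat and reports their fraction of all reads. The threshold k = 7 and the example reads are assumptions for illustration; converting this fraction into a telomere length requires tool-specific normalizations that are not reproduced here.

```python
TELOMERE_REPEATS = ("TTAGGG", "CCCTAA")  # vertebrate repeat and its reverse complement


def is_telomeric(read: str, k: int = 7) -> bool:
    """True if the read contains at least k non-overlapping occurrences of
    the telomeric repeat on either strand."""
    read = read.upper()
    return any(read.count(motif) >= k for motif in TELOMERE_REPEATS)


def telomeric_read_fraction(reads, k: int = 7) -> float:
    """Fraction of reads classified as telomeric (0.0 for an empty read set)."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(is_telomeric(r, k) for r in reads) / len(reads)


if __name__ == "__main__":
    reads = [
        "TTAGGG" * 10,              # telomeric, forward strand
        "CCCTAA" * 8 + "ACGT",      # telomeric, reverse strand
        "ACGT" * 15,                # non-telomeric
        "TTAGGG" * 3 + "ACGTACGT",  # too few repeats for k = 7
    ]
    print(telomeric_read_fraction(reads))  # 0.5
```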
Article
Full-text available
Telomeres are located at the ends of chromosomes and have specific sequences with a distinctive structure that safeguards genes. They possess capping structures that protect chromosome ends from fusion events and ensure chromosome stability. Telomeres shorten in length during each cycle of cell division. When this length reaches a certain threshold, it can lead to genomic instability, thus being implicated in various diseases, including cancer and neurodegenerative disorders. The possibility of telomeres serving as a biomarker for aging and age-related disease is being explored, and their significance is still under study. This is because post-mitotic cells, which are mature cells that do not undergo mitosis, do not experience telomere shortening due to age. Instead, other causes, for example, exposure to oxidative stress, can directly damage the telomeres, causing genomic instability. Nonetheless, a general agreement has been established that measuring telomere length offers valuable insights and forms a crucial foundation for analyzing gene expression and epigenetic data. Numerous approaches have been developed to accurately measure telomere lengths. In this review, we summarize various methods and their advantages and limitations for assessing telomere length.
... DNA nanoball sequencing (DNBS) allows large numbers of DNA nanoballs to be sequenced simultaneously. Illumina sequencing is a "reversible terminator sequencing" method (Mardis et al., 2008; Shendure et al., 2008). High-throughput sequencing, also known as next-generation sequencing (NGS), has the advantages of high speed, low sequencing cost and high accuracy. ...
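In reversible-terminator chemistry, each cycle adds one fluorescently labelled nucleotide whose 3' end is blocked; the cluster is imaged, the base is called from its colour, and the dye and block are then removed before the next cycle. The sketch below mimics only the calling step under simplifying assumptions: four idealized channel intensities per cycle, the brightest channel taken as the base, and a crude error estimate (1 minus the brightest channel's share of the signal) converted to a Phred-scaled quality, Q = -10 log10(P_error). The intensities and the error model are hypothetical; real base callers also correct for phasing and channel cross-talk.

```python
import math

CHANNELS = "ACGT"


def call_cycle(intensities: tuple[float, float, float, float]) -> tuple[str, int]:
    """Call one base from four channel intensities (A, C, G, T order).

    Returns (base, phred_quality). The error estimate 1 - purity is an
    illustrative assumption, not a calibrated instrument model.
    """
    total = sum(intensities)
    if total <= 0:
        return "N", 0  # no signal: no call
    best = max(range(4), key=lambda idx: intensities[idx])
    purity = intensities[best] / total
    p_error = max(1.0 - purity, 1e-4)  # floor caps the quality at Q40
    return CHANNELS[best], round(-10 * math.log10(p_error))


def call_read(cycles):
    """Call a whole read, one cycle at a time."""
    calls = [call_cycle(c) for c in cycles]
    return "".join(b for b, _ in calls), [q for _, q in calls]


if __name__ == "__main__":
    cycles = [(900, 50, 30, 20), (10, 20, 960, 10),
              (0, 0, 0, 1000), (300, 280, 220, 200)]
    print(call_read(cycles))  # ('AGTA', [10, 14, 40, 2])
```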
Chapter
Full-text available
In recent times, deep sequencing technologies have been revolutionizing biology and medicine, providing single-base precision in our understanding of nucleic acid sequences in a high-throughput manner. One such technology used for profiling the transcriptome is RNA-Seq (RNA sequencing), now a popular method for analyzing gene expression and identifying novel RNA species. Compared with earlier Sanger sequencing- and microarray-based methods, RNA-Seq provides significantly higher coverage and better resolution of the dynamic nature of the transcriptome. Beyond quantifying gene expression, RNA-Seq data enable the identification of alternatively spliced genes, the discovery of novel transcripts, and the detection of allele-specific expression. Recent advances in the RNA-Seq workflow, from sample preparation to library construction to bioinformatic analysis, have made it possible for researchers to better understand the functional complexity of transcription. In addition to polyadenylated messenger RNA (mRNA) transcripts, RNA-Seq can also be used to examine other RNA populations, such as total RNA, pre-mRNA, and noncoding RNA, including microRNA and long ncRNA.
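Quantifying gene expression from RNA-Seq usually means normalizing read counts for both transcript length and sequencing depth. One widely used measure is transcripts per million (TPM): divide each count by its transcript length in kilobases, then rescale so the values sum to one million. The sketch assumes reads have already been assigned to transcripts; transcript names, counts and lengths are hypothetical.

```python
def tpm(counts: dict[str, float], lengths_bp: dict[str, float]) -> dict[str, float]:
    """Transcripts per million from read counts and transcript lengths (bp)."""
    # Reads per kilobase for each transcript (corrects for transcript length).
    rate = {t: counts[t] / (lengths_bp[t] / 1000) for t in counts}
    # Rescale so the values sum to one million (corrects for sequencing depth).
    scale = sum(rate.values())
    return {t: r / scale * 1e6 for t, r in rate.items()}


if __name__ == "__main__":
    counts = {"tx1": 100, "tx2": 200, "tx3": 300}      # hypothetical read counts
    lengths = {"tx1": 1000, "tx2": 2000, "tx3": 500}   # hypothetical lengths (bp)
    print(tpm(counts, lengths))  # {'tx1': 125000.0, 'tx2': 125000.0, 'tx3': 750000.0}
```

tx2 receives twice as many reads as tx1 but has the same TPM because it is twice as long, which is why raw counts alone can mislead.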
... In 1999, the Food and Drug Administration (FDA) of the USA authorized a health claim that soy protein may reduce the risk of coronary heart disease (Shendure & Ji, 2008). The various 'omics' strategies, such as genomics, transcriptomics, proteomics, and metabolomics, are interrelated (Fig. 4), since omics provide information about the epi-level of single cells, molecular interactions, and disease-related features such as the metabolome and immunome (X. ...
Article
Background: Okara is a significant agricultural waste produced by tofu and soymilk processing. Its high protein content (>35%) and extractability (>80%) make okara a promising candidate for use as a functional ingredient in food fortification. However, raw okara decomposes rapidly and requires prompt handling and preservation, since its high moisture content (70–80%) makes it prone to deterioration. Scope and approach: Okara has been used as an inexpensive substrate in microbial fermentation (MF) processes that convert it into several value-added products. Okara MF opens new possibilities for designing bioprocesses with improved nutritional, sensory, and genetic-development goals, expanding the range of valuable secondary metabolites (SMs), enzymes, and functional ingredients. However, there is a paucity of data summarizing the functional features of okara protein in relation to recent advances in analytical strategies. Key findings and conclusions: Multi-omics can investigate protein properties, functional expression, and the interaction of significant functional annotations, including amino acid and carbohydrate metabolism. Using bioinformatic analytical tools, the protein structure, potential bioactive peptides, and potential ligand conformations can be predicted, and okara bioactive peptides may be investigated as prospective drug candidates. This study explored prospective technologies for microbial protein production, their benefits, the constraints associated with their advanced analysis, and the prospects for their implementation on a larger scale. The content documented in this manuscript could help develop okara MF into a cost-effective production strategy for products with improved nutritional properties.
... Although studies have indicated a shared genetic basis between common variants associated with lifetime MDD and depressive symptoms in the general population, it remains unclear whether this association applies to rare variants [17]. Advancement in NGS technology can help identify rare variants [18]. Previous studies have conducted gene-based analyses of rare damaging variants to identify genes related to MDD using the UK Biobank exome dataset [19,20]. ...
Article
Full-text available
Major depressive disorder (MDD) is a common mental illness worldwide and is triggered by an intricate interplay between environmental and genetic factors. Although there are several studies on common variants in MDD, studies on rare variants are relatively limited. In addition, few studies have examined the genetic contributions to neurostructural alterations in MDD using whole-exome sequencing (WES). We performed WES in 367 patients with MDD and 161 healthy controls (HCs) to detect germline and copy number variations in the Korean population. Gene-based rare variants were analyzed to investigate the association between the genes and individuals, followed by neuroimaging-genetic analysis to explore the neural mechanisms underlying the genetic impact in 234 patients with MDD and 135 HCs using diffusion tensor imaging data. We identified 40 MDD-related genes and observed 95 recurrent regions of copy number variations. We also discovered a novel gene, FRMPD3, carrying rare variants that influence MDD. In addition, the single nucleotide polymorphism rs771995197 in the MUC6 gene was significantly associated with the integrity of widespread white matter tracts. Moreover, we identified 918 rare exonic missense variants in genes associated with MDD susceptibility. We postulate that rare variants of FRMPD3 may contribute significantly to MDD, with a mild penetrance effect.
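Gene-based rare-variant analyses often collapse all qualifying rare variants in a gene into a single carrier indicator per person and then compare carrier frequencies between patients and controls. The sketch below shows that collapsing (burden) idea with a two-sided Fisher's exact test written out with `math.comb`. The 1% minor-allele-frequency cutoff, the 0/1/2 genotype encoding and the example data are assumptions; the study above may have used different tests and filters.

```python
import math


def is_carrier(genotypes, mafs, maf_cutoff=0.01):
    """genotypes: alt-allele counts (0/1/2) for one person across a gene's
    variants; mafs: minor-allele frequency of each variant."""
    return any(g > 0 and maf < maf_cutoff for g, maf in zip(genotypes, mafs))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as extreme (no more probable) than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(p, 1.0)


def burden_test(case_genotypes, control_genotypes, mafs, maf_cutoff=0.01):
    """Return the carrier table (case carriers, case non-carriers,
    control carriers, control non-carriers) and its Fisher p-value."""
    case_carriers = sum(is_carrier(g, mafs, maf_cutoff) for g in case_genotypes)
    ctrl_carriers = sum(is_carrier(g, mafs, maf_cutoff) for g in control_genotypes)
    table = (case_carriers, len(case_genotypes) - case_carriers,
             ctrl_carriers, len(control_genotypes) - ctrl_carriers)
    return table, fisher_exact_two_sided(*table)


if __name__ == "__main__":
    mafs = [0.002, 0.3]                  # one rare and one common variant
    cases = [[1, 0], [0, 2], [0, 0]]     # hypothetical genotypes
    controls = [[0, 1], [0, 0]]
    print(burden_test(cases, controls, mafs))
```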
... Next-generation sequencing (NGS) [1,2] technology has enabled ambitious omics projects that build on the Human Genome Project, such as the Earth BioGenome Project, to better support humanity's development and Earth's environmental harmony. Moreover, advances in sequencing technology and decreasing costs [3] have enabled researchers to use multi-omics techniques for in-depth analysis of major scientific issues covering human health [4,5], animal science [6], plant science [7] and earth science [8]. ...
Article
Full-text available
Introduction: The rapid growth of omics technologies has made bioinformatics a powerful tool for unravelling scientific puzzles. However, the obstacles of bioinformatics are compounded by the complexity of data processing and the distinct nature of omics data types, particularly in terms of visualization and statistics. Objectives: We developed a comprehensive and free platform, CFViSA, to facilitate effortless visualization and statistical analysis of omics data by the scientific community. Methods: CFViSA was constructed using the Scala programming language and uses the AKKA toolkit for the web server and MySQL for the database server. Visualization and statistical analysis were performed with R. Results: CFViSA integrates two omics data analysis pipelines (microbiome and transcriptome analysis) and an extensive array of 79 analysis tools spanning simple sequence processing, visualization, and statistics for various omics data, including microbiome and transcriptome data. CFViSA starts from an analysis interface that parallels a full demonstration course, helping users understand operating principles and set analysis parameters appropriately. Once an analysis has run, users can enter the task history interface to adjust figures and obtain a complete series of results, including statistics, feature tables and figures. All graphic layouts are printed with the necessary statistics and a traceback function that records the options used for analysis and visualization; these features are absent from the five competing methods. Conclusion: CFViSA is a user-friendly bioinformatics cloud platform with detailed guidelines, integrating multi-omics analysis functions with real-time visualization adjustment and provision of a complete series of results. CFViSA is available at http://www.cloud.biomicroclass.com/en/CFViSA/.
... There is a pressing need for tools that can expedite and enhance the accuracy of species identification, allowing us to fully harness the potential of arthropod communities as indicators of environmental change and key contributors to ecological processes. Recent developments in metabarcoding studies using high-throughput sequencing (HTS) have allowed comprehensive biodiversity assessments on whole communities encompassing thousands of species across diverse environments and taxonomic groups, including various life stages (e.g., Fonseca et al., 2010, 2014; Mardis, 2008; Shendure & Ji, 2008). DNA-based methodologies offer a significantly enhanced capacity for precise taxonomic identification from environmental samples (e.g., Fonseca et al., 2017; Lallias et al., 2015; Zimmermann et al., 2014) and detect orders of magnitude more sequence information compared to traditional Sanger sequencing methods (Haas et al., 2011). ...
Article
Full-text available
All ecosystems face ecological challenges in this century. Therefore, it is becoming increasingly important to understand the ecology and degree of local adaptation of functionally important Arctic-alpine biomes by looking at the most diverse taxon of metazoans: the Arthropoda. This is the first study to utilize metabarcoding in the Alpine tundra, providing insights into the effects of micro-environmental parameters on alpha- and beta-diversity of arthropods in such unique environments. To characterize arthropod diversity, pitfall traps were set at three middle-alpine sampling sites in the Scandinavian mountain range in Norway during the snow-free season in 2015. A metabarcoding approach was then used to determine the small-scale biodiversity patterns of arthropods in the Alpine tundra. All DNA was extracted directly from the preservative EtOH from 27 pitfall traps. In order to identify the controlling environmental conditions, all sampling locations were equipped with automatic data loggers for permanent measurement of the microenvironmental conditions. The variables measured were: air temperature [°C] at 15 cm height, soil temperature [°C] at 15 cm depth, and soil moisture [vol.%] at 15 cm depth. A total of 233 Arthropoda OTUs were identified. The number of unique OTUs found per sampling location (ridge, south-facing slope, and depression) was generally higher than the OTUs shared between the sampling locations, demonstrating that niche features greatly impact arthropod community structure. Our findings emphasize the fine-scale heterogeneity of arctic-alpine ecosystems and provide evidence for trait-based and niche-driven adaptation. The spatial and temporal differences in arthropod diversity were best explained by soil moisture and soil temperature at the respective locations. 
Furthermore, our results show that arthropod diversity is underestimated in alpine-tundra ecosystems using classical approaches and highlight the importance of integrating long-term functional environmental data and modern taxonomic techniques into biodiversity research to expand our ecological understanding of fine- and meso-scale biogeographical patterns.
Keywords: Arctic-alpine ecosystems, arthropods, biodiversity, ethanol-based DNA, microclimate, Scandinavia
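Alpha diversity (within a site) and beta diversity (between sites) can both be computed from an OTU table. The sketch below uses two common choices, the Shannon index H' = -Σ pᵢ ln pᵢ for alpha diversity and the Jaccard dissimilarity on OTU presence/absence for beta diversity, plus a count of unique and shared OTUs. The study above may have used other indices; site names and read counts are hypothetical.

```python
import math


def shannon(counts) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def present(site: dict[str, int]) -> set[str]:
    """OTUs detected (count > 0) at a site."""
    return {otu for otu, c in site.items() if c > 0}


def jaccard_distance(site_a: dict[str, int], site_b: dict[str, int]) -> float:
    """1 - |shared| / |union| on presence/absence; 0.0 if both sites are empty."""
    a, b = present(site_a), present(site_b)
    union = a | b
    return 0.0 if not union else 1 - len(a & b) / len(union)


def unique_and_shared(site_a, site_b) -> tuple[int, int, int]:
    """(OTUs only in A, OTUs only in B, OTUs shared)."""
    a, b = present(site_a), present(site_b)
    return len(a - b), len(b - a), len(a & b)


if __name__ == "__main__":
    ridge = {"OTU1": 10, "OTU2": 10, "OTU3": 0}      # hypothetical read counts
    depression = {"OTU2": 5, "OTU3": 15}
    print(round(shannon(ridge.values()), 3))               # 0.693
    print(round(jaccard_distance(ridge, depression), 3))   # 0.667
    print(unique_and_shared(ridge, depression))            # (1, 1, 1)
```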
... To address these challenges, target enrichment strategies have been developed, allowing researchers to focus on specific regions of interest in the genome. This approach makes deep sequencing more affordable and yields a wealth of valuable genomic data [1,2]. These strategies rely on hybridization to capture the regions of interest, but the kinetics and thermodynamics of hybridization can lead to uneven sequencing depth across locus-specific probes [3][4][5][6], potentially increasing costs because of non-uniform coverage and the necessity to enhance ... focuses on optimizing probe selection for a specific NGS library, making its prediction results susceptible to variations resulting from different library constructions. ...
Article
Full-text available
Target enrichment sequencing techniques are gaining widespread use in genomics, prized for their economic efficiency and swift processing times. However, their success depends on the performance of the probes and the evenness of sequencing depth among probes. To accurately predict probe coverage depth, a model called Deqformer is proposed in this study. Deqformer uses the oligonucleotide sequence of each probe, drawing inspiration from Watson–Crick base pairing and incorporating two BERT encoders to capture the underlying information from the forward and reverse probe strands, respectively. The encoded data are combined with a feed-forward network to make precise predictions of sequencing depth. The performance of Deqformer is evaluated on four datasets: an SNP panel with 38 200 probes, an lncRNA panel with 2000 probes, a synthetic panel with 5899 probes and an HD-Marker panel for Yesso scallop with 11 000 probes. The SNP and synthetic panels achieve a factor-3 accuracy (F3acc) of 96.24% and 99.66%, respectively, in 5-fold cross-validation. F3acc rates of over 87.33% and 72.56% are obtained when training on the SNP panel and evaluating performance on the lncRNA and HD-Marker datasets, respectively. Our analysis reveals that Deqformer effectively captures hybridization patterns, making it robust for accurate predictions in various scenarios. Deqformer offers a new perspective on probe-design pipelines, aiming to enhance efficiency and effectiveness in probe design tasks.
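The F3acc figures quoted above can be read as the share of probes whose predicted sequencing depth lies within a factor of 3 of the observed depth. That reading is an assumption made here for illustration, as is the choice to skip probes with a zero predicted or observed depth; the sketch below computes such a within-factor-k accuracy for any k.

```python
def within_factor_accuracy(predicted, observed, k: float = 3.0) -> float:
    """Fraction of probes with 1/k <= predicted/observed <= k.

    Probes with non-positive predicted or observed depth are skipped
    (an assumption of this sketch).
    """
    pairs = [(p, o) for p, o in zip(predicted, observed) if p > 0 and o > 0]
    if not pairs:
        return 0.0
    hits = sum(1 / k <= p / o <= k for p, o in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    predicted = [100, 50, 400, 10]  # hypothetical predicted depths
    observed = [120, 200, 100, 12]  # hypothetical observed depths
    print(within_factor_accuracy(predicted, observed))  # 0.5
```

A fold-based criterion like this tolerates the multiplicative spread typical of coverage data better than an absolute error threshold would.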
... Microarray analysis is gaining momentum as a way to understand and counter this intricate condition. At least at the genetic or epigenetic level, artificial intelligence enables interpretation of high-throughput DNA sequencing data (Shendure and Ji 2008). RNA sequencing (RNA-seq) has also adopted these sequencing technologies, allowing the identification and quantification of various RNA populations related to gene expression, such as mRNA and total RNA. ...
Chapter
As a result, although the literature review found only a limited number of studies from Turkey, it indicates that protection and control measures should be strengthened and that animal owners should be made aware of the disease, because Tritrichomonas foetus causes serious clinical signs, including abortion and pyometra in cows, and fecal clumping and anal inflammation in cats.
... In the broader landscape of genomics research, the pioneering spirit of this study aligns with the transformative potential of high-throughput sequencing methods, as highlighted by Shendure and Ji (2008) … a pivotal role in the success of this project. This contribution is a significant segment of the first author's PhD thesis. ...
Preprint
Phyllanthus is a diverse genus of ecologically and medicinally valuable plants. This diversity highlights the need for accurate identification to support both conservation efforts and medical research. The escalating demand for Phyllanthus-derived herbal products raises concerns about market adulteration and misidentification. In response, our study uses DNA barcoding, specifically targeting the internal transcribed spacer 2 (ITS2) region, to authenticate Indian Phyllanthus species. The study underscores the efficacy of the ITS2 region in identifying Indian Phyllanthus species and substantially improves the resolution of relationships within the genus compared with prior analyses. To check whether our plant DNA matched known sequences, we used two tools: NCBI BLASTn and the ITS2 database. The results showed very high similarity, ranging from 98–100%, indicating how closely our specimens are related to other members of the genus Phyllanthus. We deposited the DNA sequences of the Phyllanthus plants in the NCBI GenBank repository. A phylogenetic tree built from a multiple sequence alignment of ITS2 sequences confirms clustering among Phyllanthus species, illuminating genetic relationships and diversity crucial for conservation. The nuclear ribosomal ITS2 region shows notable differences within and between species, validated by DNA barcodes and by secondary-structure analyses based on minimum free energy calculations. This study underscores the effectiveness of ITS2-based DNA barcoding for accurately identifying Phyllanthus species, mitigating adulteration concerns, ensuring product quality, preserving biodiversity and promoting the sustainable use of these invaluable plant resources.
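The 98–100% similarity figures come from pairwise alignments reported by BLASTn. A minimal sketch of how a percent-identity value is derived from an alignment that already exists; it does not perform the alignment itself, and the function name and example alignments are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent identity of an existing pairwise alignment.

    Identical, non-gap columns are divided by the total alignment length
    (gap columns included), the convention used for BLAST's "Identities" line.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Hypothetical 10-column alignments: one mismatch, then one gap.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
print(percent_identity("ACGT-ACGTA", "ACGTTACGTA"))  # 90.0
```

Counting gap columns in the denominator means an alignment cannot reach 100% identity while it contains indels, which keeps the figure conservative.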
... DNA nanoball sequencing (DNBS) enables large numbers of DNA nanoballs to be sequenced simultaneously. Illumina sequencing technology is a "reversible terminator sequencing" method (Mardis et al., 2008; Shendure et al., 2008). High-throughput sequencing, otherwise known as next-generation sequencing (NGS), offers fast speed, low sequencing cost and high accuracy. ...
Chapter
As the world faces climate change and a growing population, the revolutionary CRISPR-Cas9 technology has emerged as a ray of hope for agriculture. Although traditional breeding methods have produced many varieties, the current pace of crop improvement is slow. Abiotic stresses such as drought, high temperature, cold and salinity severely threaten crop yields worldwide. CRISPR-based genome editing enables precise, targeted modification of plant genomes, allowing crops to be made more tolerant of these abiotic stresses. By elucidating key genetic pathways and engineering stress-responsive genes, researchers have developed crops with improved tolerance to abiotic stresses, increasing agricultural productivity and reducing reliance on chemical inputs. Furthermore, this approach offers environmentally friendly and sustainable solutions to the looming food crisis. Several challenges facing this technology, such as off-target effects, cost and ethical regulation, need to be addressed in the future. This chapter highlights the transformative potential of CRISPR technology in agriculture, especially in relation to abiotic stress, and its critical role in shaping a brighter, more resilient tomorrow.
... DNA quality and quantity were assessed by 1% agarose gel electrophoresis, and high-quality DNA was sequenced on an Illumina HiSeq 2500 platform by Novogene (Beijing, China) following standard Illumina sequencing protocols [90]. Paired-end 150 bp reads were generated from libraries with an insert size of 300 bp. ...
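With 150 bp reads from each end of a 300 bp insert, simple arithmetic (overlap = 2 × read length − insert size) shows that the two mates just meet, leaving neither overlap nor unsequenced gap. A hypothetical helper, assuming "insert size" means the whole fragment between the adapters (providers do not all define it this way):

```python
def paired_end_overlap(read_length: int, insert_size: int) -> int:
    """Overlap (positive) or unsequenced gap (negative) between two mates, in bp.

    Assumes "insert size" means the full fragment length between the adapters.
    """
    if read_length <= 0 or insert_size <= 0:
        raise ValueError("lengths must be positive")
    if read_length >= insert_size:
        # Each mate reads through the whole fragment into the adapter.
        return insert_size
    return 2 * read_length - insert_size


print(paired_end_overlap(150, 300))  # 0: the mates just meet in the middle
print(paired_end_overlap(150, 250))  # 50
print(paired_end_overlap(150, 400))  # -100: a 100 bp stretch stays unread
```

Whether mates overlap matters downstream: overlapping pairs can be merged into single longer reads, while a zero or negative overlap leaves them as separate reads.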
Article
Background The genus Sanicula L. comprises distinctive perennial herbs of important medicinal value. Although previous studies on Sanicula provide a good research basis, its taxonomic system and interspecific relationships have not been satisfactorily resolved, especially for the species endemic to China. Moreover, the evolutionary history of the genus remains inadequately understood. Plastid genomes, with their highly conserved structure and limited evolutionary rate, have proved to be an effective tool for studying plant phylogeny and evolution. Results In the current study, we newly sequenced and assembled fifteen complete Sanicula plastomes. Combining these with two previously reported plastomes, we performed comprehensive plastid phylogenomic analyses to gain new insights into the evolutionary history of the genus. The comparative results indicated that the seventeen plastomes were highly conserved and similar in structure, size, GC content, gene order, IR borders, codon usage bias and SSR profiles. For example, all of them displayed a typical quadripartite structure, comprising a large single-copy region (LSC: 85,074–86,197 bp) and a small single-copy region (SSC: 17,047–17,132 bp) separated by a pair of inverted repeat regions (IRs: 26,176–26,334 bp). The seventeen plastomes also had similar IR boundaries with identical adjacent genes: the rps19 gene was located at the LSC/IRa junction, the IRa/SSC junction lay between the trnN and ndhF genes, the ycf1 gene spanned the SSC/IRb junction, and the IRb/LSC boundary lay between the rpl12 and trnH genes. Twelve specific mutation hotspots (atpF, cemA, accD, rpl22, rbcL, matK, ycf1, trnH-psbA, ycf4-cemA, rbcL-accD, trnE-trnT and trnG-trnR) were identified that could serve as potential DNA barcodes for species identification within the genus Sanicula.
Furthermore, plastome data and Internal Transcribed Spacer (ITS) sequences were used to reconstruct the phylogeny of Sanicula. Although the resulting tree topologies were incongruent, both provided strong evidence for the monophyly of both Saniculoideae and Apioideae and strongly supported a sister relationship between the two subfamilies. The Sanicula species sampled in this study clustered into a single clade, as did the Eryngium species. However, the sections of Sanicula sampled here were clearly not each recovered as monophyletic groups. Molecular dating indicated that the genus originated in the late Eocene, approximately 37.84 Ma (95% HPD: 20.33–52.21 Ma), and diversified in the early Miocene, approximately 18.38 Ma (95% HPD: 10.68–25.28 Ma). Conclusion The incongruence between the plastome-based and ITS-based trees may be attributable to hybridization/introgression, incomplete lineage sorting (ILS) and chloroplast capture. Our study highlights the power of plastome data to substantially improve phylogenetic support and resolution and to explore the evolutionary history of this genus efficiently. Molecular dating indicated that the diversification of the genus occurred in the early Miocene and was largely influenced by the prevalence of the East Asian monsoon and the uplift of the Hengduan Mountains (HDM). In summary, our study provides new insights into the plastome evolution, phylogenetic relationships, taxonomic framework and evolution of the genus Sanicula.
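The quadripartite layout implies that total plastome length is LSC + SSC + 2 × IR, and that the second inverted repeat, read on the same strand, is the reverse complement of the first. The sketch applies this to the reported region ranges; combining all minima (or all maxima) gives only bounds, not the sizes of particular plastomes. The toy regions and function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def assemble_quadripartite(lsc: str, ira: str, ssc: str) -> str:
    """Join regions in the LSC-IRa-SSC-IRb order used in the abstract.

    IRb is written as the reverse complement of IRa: the two inverted repeats
    are identical copies lying in opposite orientations.
    """
    return lsc + ira + ssc + reverse_complement(ira)


def plastome_length(lsc_bp: int, ssc_bp: int, ir_bp: int) -> int:
    return lsc_bp + ssc_bp + 2 * ir_bp


def gc_content(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Bounds implied by the reported region ranges (not sizes of specific plastomes).
print(plastome_length(85_074, 17_047, 26_176))  # 154473
print(plastome_length(86_197, 17_132, 26_334))  # 155997

# Hypothetical toy regions, far shorter than real ones.
toy = assemble_quadripartite("AAAGGG", "ACGTT", "CCCAAT")
print(toy, len(toy), round(gc_content(toy), 3))
```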
... The third challenge has been the amplification step that precedes sequencing. This last challenge includes different sources of PCR bias, the formation of chimeric sequences, and secondary, structure-related problems [7][8]. NGS technologies, especially when combined with the computational algorithms needed to analyze large sequencing outputs, promise to fundamentally change the nature of genomics-based studies [9]. ...
Conference Paper
Bioinformatics is a branch of science that synthesizes mathematics, statistics, computer science, molecular biology and genetics in order to make sense of biological data, store and visualize it, and make the most of this enormous body of knowledge. Bioinformatics studies focus on genetic disease research, disease detection, and DNA sequencing methods that can help detect disease. Accordingly, the main purpose of bioinformatics is to understand the mechanisms that cause disease, starting from the nucleotide sequence of the DNA in which our genetic code is written, and thereby to contribute to the development of treatments. Today, especially since the Human Genome Project, analyzing genetic data with faster and more reliable methods has become vital. The analysis of genetic data includes sub-tasks such as reducing its dimensionality, selecting a subset of its features, classifying and clustering the data, and estimating new states. One purpose of analyzing biological data by computer is to carry out a preliminary computational analysis and then examine the predicted variables (biomarkers) in the laboratory, before analyzing these very high-dimensional data with classical laboratory research. The main problem in analyzing genetic data is the complexity that arises from the size of the sequences. This complexity gives rise to read errors during data processing. Because large sequences cannot be read in a single pass, the data must be processed in pieces (sequence reads). Next-Generation Sequencing (NGS) devices make it possible to read large amounts of genetic data, but at high cost. In addition, NGS devices have read error rates of between 1% and 3% on genetic sequences.
In this study, we propose a method that detects and corrects common read errors in order to detect mutations in the v-raf murine sarcoma viral oncogene homolog B1 (BRAF) gene, among the most common mutations in cancer. The method uses the healthy (wild-type) BRAF gene sequence available from the National Center for Biotechnology Information (NCBI). Read errors were introduced into this sequence at rates matching those of simulated NGS devices. The faulty gene was read at the specified depth and recorded in a filter environment. After piecewise reading, the healthy and faulty sequences were compared, and the algorithm developed for this purpose was applied to correct the detected errors. The proposed read-and-correction method aims to help detect and resolve DNA mutations that cause genetic disorders, contributing to research in bioinformatics.
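The abstract does not describe the correction algorithm, so the following is only a generic stand-in, not the authors' method: reads with substitution errors at 2% (inside the stated 1–3% range) are simulated over a random toy reference, and errors are corrected by a per-position majority vote across the pileup. The reference is hypothetical, not the NCBI BRAF sequence, and the read tiling is an idealization.

```python
import random
from collections import Counter

BASES = "ACGT"


def simulate_reads(reference, depth, read_len, error_rate, rng):
    """Return (start, read) pairs carrying random substitution errors.

    Idealized on purpose: reads tile the reference so every base is covered
    exactly `depth` times, and only substitutions (no indels) are modeled.
    """
    reads = []
    for _ in range(depth):
        for start in range(0, len(reference), read_len):
            noisy = [
                rng.choice([b for b in BASES if b != base])
                if rng.random() < error_rate
                else base
                for base in reference[start:start + read_len]
            ]
            reads.append((start, "".join(noisy)))
    return reads


def count_errors(reads, reference):
    return sum(
        base != reference[start + i]
        for start, read in reads
        for i, base in enumerate(read)
    )


def majority_consensus(reads, length):
    """Correct errors by taking the most frequent base at each position."""
    pileup = [Counter() for _ in range(length)]
    for start, read in reads:
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in pileup)


rng = random.Random(42)
# Hypothetical 60 bp random reference; it is NOT the BRAF sequence.
reference = "".join(rng.choice(BASES) for _ in range(60))
reads = simulate_reads(reference, depth=30, read_len=20, error_rate=0.02, rng=rng)
consensus = majority_consensus(reads, len(reference))
print(count_errors(reads, reference), consensus == reference)
```

Majority voting works here because errors are rare and independent; it would not separate a true heterozygous mutation from noise, which a real BRAF mutation caller must do.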
... These combined approaches empower researchers to craft custom biological components, paving the way for innovative applications in synthetic biology, biocomputing, and beyond. [37][38][39] Gene synthesis techniques provide a cost-effective and rapid means to create novel genetic constructs for implementing biocomputational designs. Researchers can easily order synthetic genes from commercial vendors, receiving DNA fragments tailored to their specifications. ...
Article
Biocomputing is an emerging interdisciplinary field that harnesses the information-processing capabilities of biological substrates such as DNA, proteins and cells to perform computational tasks. Rather than relying solely on conventional silicon-based computers, biocomputing leverages the innate computational properties of biomolecules to encode, store, process and transmit information in unconventional ways. Core approaches include DNA computing, which uses DNA biochemistry to solve problems in a massively parallel fashion. Protein computing uses protein conformational dynamics to implement logic gates and communication modules for molecular information processing. Cellular computing focuses on engineering gene circuits and synthetic biology tools to program computational behaviours in living cells. Neural computing builds artificial neural networks inspired by biological brains. Key application areas include biomedicine, smart drug delivery systems, biosensing, hybrid organic-inorganic electronics, and biomolecular manufacturing. Although it still faces challenges around biocompatibility, programming complexity and ethical concerns, biocomputing has achieved major technical milestones demonstrating its promise. Continued progress at the interface of biology and computing could enable future technologies such as bioprocessors, in-vivo biocomputers, living materials and bio-intelligent systems. With responsible development, bio-inspired computation may catalyse the next revolution in human technological capabilities. This emerging field thus warrants enthusiastic attention as computation further converges with the living world.
... genetic research because of their greater genetic stability and suitability for automated detection (Shendure et al. 2008). In our study, SNP markers of E. japonicus were developed and validated by double digest restriction-site associated DNA sequencing (ddRAD-Seq). ...
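ddRAD-Seq reduces the genome to fragments cut by two restriction enzymes and then keeps a size-selected subset. The enzymes and size window used for E. japonicus are not given here, so this sketch assumes a frequently used pair, EcoRI (G^AATTC) and MspI (C^CGG), and a hypothetical size window. It measures fragment length between top-strand cut positions and ignores overhangs.

```python
import re

# Assumed enzyme pair: recognition site and cut offset on the top strand.
ENZYMES = {"EcoRI": ("GAATTC", 1), "MspI": ("CCGG", 1)}


def cut_sites(seq):
    """Sorted (cut position, enzyme) pairs; both sites are palindromic, so one strand suffices."""
    sites = []
    for name, (motif, offset) in ENZYMES.items():
        # Zero-width lookahead so overlapping occurrences are all found.
        for match in re.finditer(f"(?={motif})", seq):
            sites.append((match.start() + offset, name))
    return sorted(sites)


def ddrad_fragments(seq, min_len, max_len):
    """Fragments flanked by two different enzymes whose length falls in the size window."""
    sites = cut_sites(seq.upper())
    kept = []
    for (left, enz_left), (right, enz_right) in zip(sites, sites[1:]):
        if enz_left != enz_right and min_len <= right - left <= max_len:
            kept.append((left, right, enz_left, enz_right))
    return kept


# Hypothetical toy sequence with two EcoRI and two MspI sites.
seq = "AAAA" + "GAATTC" + "T" * 40 + "CCGG" + "A" * 10 + "CCGG" + "T" * 5 + "GAATTC" + "AAAA"
print(cut_sites(seq))
print(ddrad_fragments(seq, min_len=20, max_len=100))  # [(5, 51, 'EcoRI', 'MspI')]
```

Requiring two different enzymes at the ends mirrors the adapter design of double-digest protocols, and the size window is what makes the retained fraction of the genome reproducible across samples.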
Article
The Japanese anchovy, Engraulis japonicus, is an economically important fish distributed in the northwest Pacific Ocean. Effective assessment and management of the Engraulis japonicus fishery require reliable information on the genetic structure of its populations. Recently developed double digest restriction-site associated DNA sequencing (ddRAD-Seq) methods may contribute to the discovery of SNPs and the assessment of genetic structure. In our study, 98 single nucleotide polymorphism (SNP) markers were developed using ddRAD-Seq. The observed heterozygosity (Ho) and expected heterozygosity (He) ranged from 0.3333 to 0.8000 and from 0.2778 to 0.5000, respectively. The polymorphism information content (PIC) ranged from 0.239198 to 0.375. All loci were confirmed to be in Hardy–Weinberg equilibrium. These novel polymorphic SNP markers will play an important role in the genetic research of E. japonicus, which will benefit the development and utilization of E. japonicus resources. This study aims to provide technical support for research on the genetic diversity of E. japonicus populations through the development of SNP markers.
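For a biallelic SNP, the statistics in this abstract follow from allele and genotype frequencies: He = 1 − Σ p_i², PIC = 1 − Σ p_i² − Σ_(i<j) 2 p_i² p_j² (the standard formula of Botstein et al., 1980), and Ho is the observed share of heterozygotes. With two alleles at equal frequency, He reaches 0.5 and PIC 0.375, matching the upper ends of the reported ranges. The Hardy–Weinberg check uses a chi-square goodness-of-fit test, one common option; the authors may have used an exact test. The genotype counts are hypothetical.

```python
from itertools import combinations


def allele_frequencies(n_aa, n_ab, n_bb):
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return [p, 1.0 - p]


def observed_heterozygosity(n_aa, n_ab, n_bb):
    return n_ab / (n_aa + n_ab + n_bb)


def expected_heterozygosity(freqs):
    return 1.0 - sum(f * f for f in freqs)


def pic(freqs):
    """Polymorphism information content (Botstein et al., 1980)."""
    homozygosity = sum(f * f for f in freqs)
    pair_term = sum(2 * fi**2 * fj**2 for fi, fj in combinations(freqs, 2))
    return 1.0 - homozygosity - pair_term


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness of fit to Hardy-Weinberg proportions (1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    p, q = allele_frequencies(n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical genotype counts AA / AB / BB at one SNP.
counts = (30, 40, 30)
freqs = allele_frequencies(*counts)
print(observed_heterozygosity(*counts))  # 0.4
print(expected_heterozygosity(freqs))    # 0.5
print(pic(freqs))                        # 0.375
print(hwe_chi_square(*counts))           # 4.0
```

At 4.0 the statistic exceeds 3.84, the 5% critical value for one degree of freedom, so this hypothetical locus would fail the test, unlike the loci reported in the abstract.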
... Molecular genetic techniques, such as polymerase chain reaction (PCR), DNA barcoding, and genomic sequencing, have significantly impacted the processes of species identification and classification. These techniques allow researchers to directly examine the genetic material of organisms, thereby providing a precise method to differentiate between closely related species, understand their evolutionary relationships, and perform accurate classifications [1][2][3][4][5]. Furthermore, they have transformed our understanding of mitochondrial genomes, allowing us to investigate their intricate details with unparalleled precision [6][7][8][9][10]. ...
Article
Background: The mitochondrial genome is a powerful tool for exploring and confirming species identity and for understanding evolutionary trajectories. The genus Cambaroides, which consists of freshwater crayfish, is recognized for its evolutionary and morphological complexity. However, comprehensive genetic and mitogenomic data on species within this genus, such as C. wladiwostokiensis, remain scarce, necessitating an in-depth mitogenomic exploration to determine its evolutionary position and validate its species identity. Methods: The mitochondrial genome of C. wladiwostokiensis was obtained through shallow Illumina paired-end sequencing of total DNA, followed by hybrid assembly using both de novo and reference-based techniques. Comparative analysis was performed using the Cambaroides mitochondrial genomes available from the National Center for Biotechnology Information (NCBI). Additionally, phylogenetic relationships among 23 representatives of three families within the infraorder Astacidea were inferred by Bayesian Inference (BI) from concatenated mitochondrial fragments, using the PhyloSuite platform for sequence management and phylogenetic preparation. Results: The resulting genome spans 16,391 base pairs and contains 13 protein-coding genes, two rRNAs (12S and 16S), 19 tRNAs and a putative control region. Comparison with five other Cambaroides mitogenomes retrieved from GenBank revealed regions that remained unread owing to limitations of the genome-skimming technique. Protein-coding genes varied in size and typically used the common start (ATG) and stop (TAA) codons, with exceptions in ND5 (start codon: GTG) and ND1 (stop codon: TAG). Landscape analysis was used to explore sequence variation across the five available mitochondrial genomes of Cambaroides.
Conclusions: Collectively, these findings reveal variable sites and contribute to a deeper understanding of genetic diversity in this genus, supporting the further development of species-specific primers for noninvasive monitoring techniques. The partitioned phylogenetic analysis of Astacidea revealed a paraphyletic origin of Asian cambarids, consistent with recent studies based on both multilocus analyses and integrative approaches.
... 11 Several NGS technologies are currently available, mainly PCR capture-based sequencing of predefined oncogene regions where actionable alterations are usually found ('hotspots'), and hybrid capture-based NGS assays, which analyse the entire coding sequence of oncogenes and tumour suppressor genes and achieve higher sensitivity for small insertions and deletions (indels), gene fusions and copy number alterations (CNAs) in formalin-fixed paraffin-embedded (FFPE) specimens. 12,13 TruSight™ Oncology 500 (TSO500) is a hybrid capture-based NGS assay that covers the full coding DNA regions of 523 genes and the RNA transcripts of 55 genes and can detect base substitutions, small indels, CNAs, splice variants and gene fusions. In addition, it can accurately measure microsatellite instability (MSI) and tumour mutational burden (TMB). ...
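TMB is usually reported as eligible somatic mutations per megabase of the region evaluated. A minimal sketch of that final arithmetic; the filtering criteria, field names and numbers are hypothetical and far simpler than the rules applied by the TSO500 pipeline itself.

```python
def count_eligible(variants, min_vaf=0.05, min_depth=100):
    """Count somatic, well-supported variants (hypothetical criteria)."""
    return sum(
        1
        for v in variants
        if not v["germline"] and v["vaf"] >= min_vaf and v["depth"] >= min_depth
    )


def tumour_mutational_burden(n_eligible: int, region_bp: int) -> float:
    """Eligible variants per megabase of sequence actually evaluated."""
    if region_bp <= 0:
        raise ValueError("region size must be positive")
    return n_eligible / (region_bp / 1_000_000)


# Hypothetical variant records: only the first passes every filter.
variants = [
    {"vaf": 0.30, "depth": 400, "germline": False},
    {"vaf": 0.48, "depth": 350, "germline": True},   # likely germline, excluded
    {"vaf": 0.02, "depth": 500, "germline": False},  # below the VAF cut-off
]
print(count_eligible(variants))                   # 1
print(tumour_mutational_burden(12, 1_500_000))    # 8.0
```

Dividing by the evaluated region rather than the nominal panel size matters, because low-coverage positions that cannot be called should not count toward the denominator.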
Article
Background Targeted next-generation sequencing (NGS) is recommended to screen for actionable genomic alterations (GAs) in patients with non-small-cell lung cancer (NSCLC). We determined the feasibility of detecting actionable GAs using TruSight™ Oncology 500 (TSO500) in 200 consecutive patients with NSCLC. Materials and methods DNA and RNA were sequenced on an Illumina® NextSeq 550 instrument and processed using the TSO500 Docker pipeline. Clinical actionability was defined by the molecular tumour board following European Society for Medical Oncology (ESMO) guidelines for oncogene-addicted NSCLC. Overall survival (OS) was estimated according to the presence of druggable GAs and treatment with targeted therapy. Results Most patients were male (69.5%) and former or current smokers (86.5%). Median age was 64 years. The most common histological type and tumour stage were lung adenocarcinoma (81%) and stage IV (64%), respectively. Sequencing was feasible in most patients (93.5%), and actionable GAs were found in 26.5% of patients. High concordance was observed between single-gene testing and the TSO500 NGS panel. Patients harbouring druggable GAs and receiving targeted therapy achieved longer OS than patients without druggable GAs. Conversely, patients with druggable GAs not receiving targeted therapy showed a trend toward shorter OS than driver-negative patients. Conclusions Hybrid capture sequencing using the TSO500 panel is feasible for analysing clinical samples from patients with NSCLC and is an efficient tool for screening actionable GAs.
... The problem of sequencing or genotyping errors may be reduced because PCA aggregates the k-mer frequency information; extra counts that could accrue from sequencing errors should therefore not substantially influence the PCA projection. Additionally, sequencer errors can be identified and removed by filtering out k-mers of frequency one (singletons), which are generally considered to result from sequencer errors 45,46, and excluding them from the final PCA computation. ...
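Counting k-mers and dropping singletons, as described here, takes only a few lines. A sketch using Python's `collections.Counter`; the reads and k are hypothetical, and for simplicity k-mers are not merged with their reverse complements, although many tools do collapse them into canonical k-mers.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads, skipping ambiguous bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def drop_singletons(counts):
    """Remove k-mers seen only once, which are treated as likely sequencing errors."""
    return Counter({kmer: n for kmer, n in counts.items() if n > 1})


def frequency_profile(counts):
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


# Hypothetical reads: the third carries one error (A->T at its fifth base).
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
raw = count_kmers(reads, 3)
filtered = drop_singletons(raw)
print(sorted(raw.items()))
print(sorted(filtered))  # ['ACG', 'CGT', 'GTA', 'TAC']
```

The single erroneous base creates two k-mers (GTT and TTC) that appear only once, and the singleton filter removes exactly those.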
Preprint
Background: Understanding population structure within species provides information on connections among different populations and on how they evolve over time. This knowledge is important for studies ranging from evolutionary biology to large-scale variant-trait association studies. Current approaches to determining population structure include model-based approaches, statistical approaches, and distance-based ancestry inference approaches. In this work, we identify population structure from DNA sequence data using an alignment-free approach. We use the frequencies of short DNA substrings from across the genome (k-mers) with principal component analysis (PCA). K-mer frequencies can be viewed as a summary statistic of a genome and have the advantage of being easily derived from a genome by counting the number of times each k-mer occurs in a sequence. In contrast, most population structure work employing PCA uses multilocus genotype data (SNPs, microsatellites, or haplotypes). Generating k-mers requires no genetic assumptions, whereas current population structure approaches often depend on several genetic assumptions and can require careful selection of ancestry-informative markers to identify populations. Results: In this work, we show that PCA can determine population structure from k-mer frequencies in the genome alone. Applying PCA and a clustering algorithm to the k-mer profiles of genomes provides an easy way to detect the number and composition of populations (clusters) present in the dataset. The results are comparable to those found by a model-based approach using genetic markers. We validate our method using 48 human genomes from populations identified by the 1000 Genomes Project, as well as simulations. Conclusions: This study shows that PCA, together with a clustering algorithm, can detect population structure from k-mer frequencies and can separate samples of admixed and non-admixed origin.
Using k-mer frequencies to determine population structure has the potential to avoid some challenges of existing methods.
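The core of this approach, PCA on per-genome k-mer frequency vectors, can be sketched with NumPy through an SVD of the centred matrix. The four toy "genomes" are hypothetical, k = 2 is chosen only to keep vectors small, and the clustering step that follows PCA in the study is omitted.

```python
from itertools import product

import numpy as np


def kmer_frequency_vector(seq, k):
    """Relative frequencies of all 4**k k-mers, in a fixed lexicographic order."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    vec = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            vec[j] += 1
    total = vec.sum()
    return vec / total if total > 0 else vec


def pca_scores(matrix, n_components=2):
    """Project rows onto the leading principal components (SVD of the centred matrix)."""
    centred = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Hypothetical toy genomes: two AT-rich, two GC-rich.
genomes = ["AT" * 20, "AT" * 19 + "AA", "GC" * 20, "GC" * 19 + "GG"]
profiles = np.vstack([kmer_frequency_vector(g, 2) for g in genomes])
scores = pca_scores(profiles)
print(np.round(scores[:, 0], 3))
```

The sign of a principal component is arbitrary, so what matters is that the two AT-rich genomes land on one side of PC1 and the two GC-rich genomes on the other.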
... Despite the development of multiple next-generation sequencing approaches, Sanger biochemistry continues to serve as the foundation for sequencing production in numerous post-PCR genotyping protocols [19]. Conventional Sanger sequencing has been refined to deliver long reads with per-base accuracies as high as 99.999% [20]. Given its pivotal role in most downstream molecular biology applications, any technical lapse in chromatogram interpretation can undermine the validity of the results [21]. ...
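Per-base accuracy maps onto the Phred scale as Q = −10 log10(P_error), so an accuracy of 99.999% is an error probability of 10⁻⁵, that is, Q50. A short check with hypothetical helper names:

```python
import math


def phred_quality(accuracy: float) -> float:
    """Phred score Q = -10 * log10(P_error) for a per-base accuracy."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must lie strictly between 0 and 1")
    return -10.0 * math.log10(1.0 - accuracy)


def error_probability(q: float) -> float:
    """Inverse mapping: per-base error probability for a Phred score."""
    return 10.0 ** (-q / 10.0)


print(round(phred_quality(0.99999), 1))  # 50.0
print(round(phred_quality(0.999), 1))    # 30.0
print(error_probability(50))
```

Each 10-point step on the Phred scale is a tenfold change in error probability, which is why Q50 reads are described as roughly one error in 100,000 bases.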
Article
Background Sanger dideoxy sequencing is vital in clinical analysis because of its accuracy, its ability to analyze genetic markers such as SNPs and STRs, its capability to generate reliable DNA profiles, and its role in resolving complex clinical cases. The precision and robustness of Sanger sequencing contribute significantly to the scientific basis of clinical investigations. Main body of the abstract Although reading chromatograms may seem a routine step, errors introduced during PCR can limit the reliability of the A, G, C and T peak readings. These errors may be associated with improper DNA amplification and with the subsequent interpretation of DNA sequencing files, for example noisy peaks, artifacts, and confusion among double peaks arising from technical errors, heterozygosity or possible double infection. Nucleic acid sequences therefore cannot be read reliably without serious attention to these technical problems. To ensure accurate DNA sequencing outcomes, it is also imperative to detect and rectify technical problems that may lead to misinterpretation of the DNA sequence and thus to errors and inconsistencies in subsequent analyses. Short conclusion This overview sheds light on prominent technical concerns that can emerge before and during the interpretation of DNA chromatograms in Sanger sequencing, and offers strategies to address them effectively. The review underscores the importance of identifying and tackling these technical limitations during chromatogram analysis. Recognizing these concerns can improve the quality of downstream analyses of Sanger sequencing results, improving their accuracy, reliability and ability to provide crucial genetic information in clinical analysis.
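One way software can surface the double-peak problem described above is to compare the two tallest peaks at each position and emit an IUPAC ambiguity code when the secondary peak is large. The thresholds and function name are hypothetical, and the flag only marks a position for manual review: a double peak alone cannot distinguish heterozygosity from mixed template or artifact.

```python
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_position(heights, ratio_threshold=0.3, min_height=50.0):
    """Call one chromatogram position from its four peak heights.

    Returns (call, needs_review). The thresholds are hypothetical defaults.
    """
    ranked = sorted(heights.items(), key=lambda item: item[1], reverse=True)
    (top_base, top_h), (second_base, second_h) = ranked[0], ranked[1]
    if top_h < min_height:
        return "N", True  # signal too weak to call
    if second_h >= ratio_threshold * top_h:
        return IUPAC_PAIRS[frozenset((top_base, second_base))], True
    return top_base, False


print(call_position({"A": 1000, "C": 20, "G": 15, "T": 10}))  # ('A', False)
print(call_position({"A": 800, "C": 10, "G": 500, "T": 12}))  # ('R', True)
print(call_position({"A": 30, "C": 20, "G": 10, "T": 5}))     # ('N', True)
```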
... 9 It has been difficult to detect unknown disease-causing genotypes with traditional targeted genetic testing such as Sanger sequencing, owing to time and cost constraints. Next-generation sequencing (NGS) performs cyclic sequencing of clonally amplified DNA fragments arrayed in massively parallel fashion, 10 enabling screening for potentially disease-causing genetic variants and making it suitable for the genetic evaluation of patients with IVF. Hence, we comprehensively evaluated IVF probands using NGS to uncover concealed diseases and to identify the clinical yield and implications of genetic testing in IVF. ...
Article
Aims Idiopathic ventricular fibrillation (IVF) is a disease in which the cause of ventricular fibrillation cannot be identified despite comprehensive clinical evaluation. This study aimed to investigate the clinical yield and implications of genetic testing for IVF. Methods and results This study was based on the multi-centre inherited arrhythmia syndrome registry in South Korea from 2014 to 2017. Next-generation sequencing–based genetic testing covering 174 genes previously linked to cardiovascular disease was performed. A total of 96 patients were clinically diagnosed with IVF. The mean age at onset was 41.2 ± 12.7 years, and 79 patients were male (82.3%). Of these, 74 underwent genetic testing, and four (5.4%) of the IVF probands had pathogenic or likely pathogenic variants (one each in MYBPC3, MYH7, DSP and TNNI3). All pathogenic or likely pathogenic variants were located in genes with definite evidence of a cardiomyopathy phenotype, either hypertrophic cardiomyopathy or arrhythmogenic right ventricular cardiomyopathy. Conclusion Next-generation sequencing–based genetic testing identified pathogenic or likely pathogenic variants in 5.4% of patients initially diagnosed with IVF, suggesting that testing genes with definite evidence for cardiomyopathy may enable a molecular diagnosis in a minority of patients with IVF. Further clinical evaluation and follow-up of genotype-positive patients with IVF are needed to unveil concealed phenotypes, such as the pre-clinical phase of cardiomyopathy.
... With the discovery of single nucleotide polymorphism (SNP) markers 26 and advances in DNA sequencing technologies over the last two decades 27,28, genome-wide association studies (GWAS) have emerged as a powerful tool for detecting associations between markers and traits and for identifying genes involved in the control of these traits 29. Several GWAS have identified molecular markers associated with resistance to phytonematodes 30,31,32, including species of the genus Meloidogyne 33,34,35. ...
Preprint
The phytonematode Meloidogyne paranaensis is one of the main threats to coffee production. The development of Coffea arabica cultivars resistant to this pathogen is an urgent demand of coffee growers. Progenies derived from the wild germplasm Amphillo are considered potential sources of resistance to M. paranaensis; however, the mechanisms involved in this resistance have not yet been elucidated. In the present work, the resistance of different progenies derived from Amphillo was studied and molecular markers associated with resistance were identified. Using a genome-wide association study (GWAS), SNP markers associated with genes potentially involved in resistance control were identified. A total of 158 genotypes belonging to four progenies derived from crosses between Amphillo and Catuaí Vermelho were analyzed. These coffee plants were phenotyped for five resistance-related traits. A total of 7116 SNP markers were genotyped and, after quality filtering, 931 SNPs were retained for the genome-wide association study. The mixed linear model identified 12 SNPs significantly associated with at least one of the evaluated variables, and eighteen genes were mapped. The results support the development of markers for marker-assisted selection, studies of genetic inheritance, and the elucidation of molecular mechanisms involved in the resistance of C. arabica to M. paranaensis.
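The abstract does not state which quality filters reduced 7116 SNPs to 931. A common pair of filters is minor allele frequency (MAF) and call rate; this sketch applies them with hypothetical thresholds, with genotypes coded as alternative-allele counts (0, 1, 2, or None for a missing call).

```python
def snp_qc(genotypes, min_maf=0.05, min_call_rate=0.90):
    """Return (passes, call_rate, maf) for one SNP.

    genotypes: per-individual alternative-allele counts (0, 1, 2) or None if missing.
    """
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes) if genotypes else 0.0
    if not called:
        return False, call_rate, 0.0
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    return maf >= min_maf and call_rate >= min_call_rate, call_rate, maf


def filter_snps(snp_table, **thresholds):
    """Keep the IDs of SNPs that pass both filters."""
    return [snp_id for snp_id, calls in snp_table.items() if snp_qc(calls, **thresholds)[0]]


# Hypothetical genotypes for ten plants.
snps = {
    "snp1": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],           # MAF 0.45, no missing calls
    "snp2": [0] * 10,                                 # monomorphic
    "snp3": [0, 1, None, None, 2, 1, None, 0, 1, 2],  # call rate 0.7
}
print(filter_snps(snps))  # ['snp1']
```

Monomorphic and poorly genotyped markers carry little power in an association test and inflate the multiple-testing burden, which is why such filters usually run before the mixed linear model.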
... Notably, methods for routine sequencing of xenonucleic acids (XNAs) are decades behind those for DNA and RNA and rely on low-throughput, non-multiplexed measurements, such as gel-shift assays 19,20, mass spectrometry 21, and selective conversion of XNAs to standard bases followed by Sanger sequencing 22. This stands in stark contrast to sequencing of the standard nucleobases (A, T, G, C), for which a multitude of high-throughput, multiplexable and low-cost options exist 23,24. To put the disparity in perspective, XNA sequencing technology is lower throughput, less sensitive and less generalizable than the methods Sanger and Coulson developed in the 1970s, and no service-oriented solution exists. ...
Article
The 4-letter DNA alphabet (A, T, G, C) found in Nature is an elegant, yet non-exhaustive, solution to the problem of storing, transferring and evolving biological information. Here, we report strategies for both writing and reading DNA with expanded alphabets of up to 12 letters (A, T, G, C, B, S, P, Z, X, K, J, V). For writing, we devise an enzymatic strategy for inserting a single, orthogonal xenonucleic acid (XNA) base pair into standard DNA sequences using 2′-deoxy-xenonucleoside triphosphates as substrates. Integrating this strategy with combinatorial oligos generated on a chip, we construct libraries containing single XNA bases for parameterizing k-mer basecalling models for commercially available nanopore sequencing. These elementary steps are combined to synthesize and sequence DNA containing 12 letters, the upper limit of what is accessible within the electroneutral, canonical base-pairing framework. By introducing low-barrier synthesis and sequencing strategies, this work overcomes previous obstacles, paving the way toward making expanded alphabets widely accessible.
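Moving from 4 to 12 letters enlarges the space a k-mer basecalling model must parameterize from 4^k to 12^k, while k-mers containing exactly one XNA letter, the contexts that single-XNA libraries probe, number only k · 8 · 4^(k−1). The model order k is not given in the abstract; k = 6 is a hypothetical choice for illustration.

```python
from itertools import product
from math import comb

STANDARD = "ATGC"
XNA = "BSPZXKJV"  # the eight additional letters named in the abstract


def kmer_space(alphabet_size: int, k: int) -> int:
    """Number of distinct k-mers over an alphabet of the given size."""
    return alphabet_size ** k


def kmers_with_m_xna(k: int, m: int, n_standard: int = 4, n_xna: int = 8) -> int:
    """k-mers with exactly m positions drawn from the XNA letters."""
    return comb(k, m) * n_xna ** m * n_standard ** (k - m)


k = 6  # hypothetical model order, chosen only for illustration
print(kmer_space(4, k), kmer_space(12, k))  # 4096 2985984
print(kmers_with_m_xna(k, 1))               # 49152

# Brute-force cross-check on a small case.
brute = sum(
    1
    for kmer in product(STANDARD + XNA, repeat=2)
    if sum(base in XNA for base in kmer) == 1
)
print(brute == kmers_with_m_xna(2, 1))  # True
```

Summing `kmers_with_m_xna(k, m)` over m = 0..k recovers 12^k by the binomial theorem, a useful consistency check on the counting.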
... Advances in next-generation sequencing (NGS) technology have reduced the cost of DNA sequencing to the point that NGS is now affordable, and researchers are now turning to functional genomics, including transcriptomic and proteomic approaches, to uncover the molecular mechanisms of YMD stress tolerance. Transcriptomics is the quantitative study of the complete set of transcripts in a cell, and RNA-seq is transcriptome analysis by NGS (Shendure and Ji 2008). Biotic stresses activate numerous genes in the host genome, which initiate a cascade of reactions to fight the infection (Fig. 7.1), and RNA-seq technology has provided a better understanding of these responses. ...
Chapter
Mungbean and blackgram are two important legumes widely cultivated in Asia for grain, vegetables, green manure, and fodder. Both legumes have high nutritional value, providing essential amino acids, minerals, and vitamins. Despite their good nutritive value, the crops are neglected and face a major challenge of low productivity due to multiple factors, including biotic and abiotic stresses, lack of interest among farmers and consumers, and unfavorable price policies governing the crops. Biotic stresses caused by various organisms, including viruses, fungi, bacteria, and insect pests, limit the crops' productivity potential. Understanding the complex pathogen-host interactions that lead to compatible or incompatible responses, and the underlying resistance mechanisms, is key to combating these stresses. Identifying novel and diverse sources of resistance is always a prerequisite for improving the durability of resistance against diseases, and considerable effort has been made in these legumes as well. However, work on molecular breeding for biotic stress resistance using advanced omics tools is not on par with that in mainstream crops. Traditional molecular markers have been used to understand resistance, but more effort with cutting-edge technologies is required to accelerate legume breeding so that these nutritious legumes can attain their appropriate standing among major food crops. This chapter highlights the major biotic concerns faced by mungbean and blackgram and the efforts made towards incorporating biotic resistance.
Chapter
Biotechnology is an emerging field that can provide new and better applications in a wide range of sectors, such as health care, services, agriculture, and the processing industry. This book will provide an excellent opportunity to focus on recent developments in the frontier areas of biotechnology and to establish new collaborations in these areas. The book will highlight multidisciplinary perspectives for interested biotechnologists, microbiologists, pharmaceutical experts, bioprocess engineers, agronomists, medical professionals, sustainability researchers, and academicians. This technical publication will provide a platform for sharing knowledge of recent trends, theories, and practices in the field of biotechnology.
Chapter
Next-generation sequencing (NGS) technologies have revolutionized the field of genomics and have significantly impacted various disciplines, including fisheries and aquaculture. The technology is used to determine the order of nucleotides in entire genomes or targeted regions of DNA or RNA. NGS platforms provide researchers with unprecedented capabilities to unravel the genetic makeup of species, investigate genetic diversity, identify molecular markers, and understand the genetic basis of important traits. In this paper, we explore the diverse applications of NGS in fisheries and aquaculture, including genome sequencing, transcriptomics, metagenomics, and population genetics. We also discuss the potential of NGS in addressing challenges faced by the industry, such as disease management, selective breeding, and conservation efforts. Finally, we provide a comprehensive conclusion summarizing the key findings and future prospects of NGS in fisheries and aquaculture.
Preprint
The increasingly widespread application of next-generation sequencing (NGS) in clinical diagnostics and epidemiological research has created a demand for robust, fast, automated, and user-friendly bioinformatic workflows. To guide the choice of tools for assembling full-length viral genomes from NGS datasets, we assessed the performance and applicability of four widely adopted bioinformatic pipelines (shiver, for which we created a user-friendly Dockerized version referred to as dshiver; SmaltAlign; viral-ngs; and V-pipe) using both simulated datasets and real-world HIV-1 paired-end short-read sequences with default settings. All four pipelines produced high-quality consensus genome assemblies and minority variant calls when the reference sequence used for assembly was highly similar to the analyzed sample. However, while shiver and SmaltAlign also performed robustly with more divergent samples (non-matching subtypes), viral-ngs and V-pipe proved sensitive to genetic distance from the reference sequence. With empirical datasets, SmaltAlign and viral-ngs had substantially shorter runtimes than V-pipe and shiver. In terms of applicability, V-pipe provides the broadest functionality; SmaltAlign and dshiver combine user-friendliness with robustness; and viral-ngs requires fewer computational resources than the other tools. In conclusion, all four pipelines can perform well in terms of quality metrics; however, for viral-ngs and V-pipe, the reference sequence needs to be adjusted to closely match the sample data. Differences in user-friendliness and runtime may guide the choice of pipeline in a particular setting. The new Dockerized version of shiver offers ease of use in addition to the accuracy and robustness of the original pipeline.
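Minority variant calling ultimately rests on per-position base frequencies. The sketch below shows that core idea on a single pileup column; the 5% cutoff and the function name are hypothetical, and the four pipelines compared here additionally handle alignment, base quality, and other error sources in their own ways.

```python
from collections import Counter

def minority_variants(pileup_bases: str, min_freq: float = 0.05) -> dict:
    """Return non-consensus bases at one position whose frequency is >= min_freq.

    pileup_bases: the bases observed at one reference position across all reads.
    min_freq: hypothetical reporting threshold (5% by default).
    """
    counts = Counter(pileup_bases.upper())
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no reads cover this position")
    consensus, _ = counts.most_common(1)[0]
    return {
        base: n / depth
        for base, n in counts.items()
        if base != consensus and n / depth >= min_freq
    }

# 100 reads: 90 A, 8 G, 2 T -> consensus A; only G passes the 5% cutoff.
column = "A" * 90 + "G" * 8 + "T" * 2
print(minority_variants(column))
```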
Article
Next-generation sequencing (NGS), represented by Illumina platforms, has been an essential cornerstone of basic and applied research. However, its sequencing error rate of 1 per 1000 base pairs (10⁻³) is a serious hurdle for research areas focusing on rare mutations, such as somatic mosaicism or microbial heterogeneity. By examining the high-fidelity sequencing methods developed in the past decade, we summarized three major factors underlying errors and 12 corresponding strategies for mitigating them. We then proposed a novel framework to classify 11 preexisting representative methods according to their combinations of strategies and identified 3 trends that emerged during methodological development. We further extended this analysis to 8 long-read sequencing methods, emphasizing error reduction strategies. Finally, we suggest 2 promising future directions that could achieve comparable or even higher accuracy at lower cost in both NGS and long-read sequencing.
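Per-base error rates are conventionally expressed as Phred quality scores, Q = -10 log10(p), so the 10⁻³ error rate cited here corresponds to Q30. The sketch below converts between the two and estimates expected errors per read; the 150 bp read length is an assumed example, and errors are assumed to be independent.

```python
import math

def phred_from_error(p: float) -> float:
    """Phred quality score for a per-base error probability p."""
    return -10 * math.log10(p)

def error_from_phred(q: float) -> float:
    """Per-base error probability for a Phred quality score q."""
    return 10 ** (-q / 10)

def expected_errors(read_length: int, error_rate: float) -> float:
    """Expected number of erroneous bases in a read, assuming independent errors."""
    return read_length * error_rate

print(phred_from_error(1e-3))       # Q for a 1-in-1000 error rate
print(error_from_phred(40))         # error probability at Q40
print(expected_errors(150, 1e-3))   # 150 bp read is an assumed example length
```

A variant carried by, say, 1 in 10,000 molecules is rarer than the 1-in-1,000 per-base error rate, so at any given position sequencing errors can outnumber true variant reads; this is the gap the high-fidelity methods reviewed here aim to close.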
Article
The demand for accurate, fast, and inexpensive sequencing of deoxyribonucleic acid (DNA) is increasing and is driving the emergence of next-generation sequencing (NGS) technologies. NGS can provide useful insights that help researchers and clinicians develop the right treatment options. NGS has wide applications in novel fields of biology and medicine. These technologies greatly aid in decoding the mysteries of life, improving crop quality, and detecting pathogens, and they are also useful for improving quality of life. Various NGS methods can sequence thousands to millions of molecules simultaneously in parallel. NGS can identify and characterize microbial species more comprehensively than culture-based methods. Recently, the NGS approach has been used for oral microbial analysis.
Article
Metagenomics makes it possible to study the entire genetic material contained in the microbial populations of a given environment. This technique enables a thorough understanding of the diversity, function, and interactions of microbial communities that are notoriously difficult to study. Because of the limitations of conventional techniques such as culturing and PCR-based methodologies, soil microbiology is a particularly challenging field. Metagenomics has emerged as an effective technique for overcoming these obstacles and shedding light on the dynamic nature of soil microbial communities. This review focuses on the principles of metagenomics techniques and their potential applications and limitations in soil microbial diversity analysis. The effectiveness of target-based metagenomics in determining the function of individual genes and microorganisms in soil ecosystems is also highlighted. Targeted metagenomics, including high-throughput sequencing and stable-isotope probing, is essential for studying microbial taxa and genes in complex ecosystems. Shotgun metagenomics may reveal how land use and soil management affect the diversity, composition, and function of soil bacteria. Sanger sequencing and next-generation sequencing platforms such as Illumina and Ion Torrent have transformed soil microbiome research, and the third- and fourth-generation sequencing systems from Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) have revolutionised long-read sequencing. GeoChip, clone libraries, metagenomics, and metabarcoding help in understanding soil microbial communities. The article indicates that metagenomics may improve environmental management and agriculture despite existing limitations. Metagenomics has revolutionised soil microbiology research by revealing the complete diversity, function, and interactions of microorganisms in soil.
Metagenomics is anticipated to continue defining the future of soil microbiology research despite some limitations, such as the difficulty of selecting the appropriate sequencing method for specific genes.
Article
Breast cancer is a prevalent form of cancer worldwide, and the current standard screening method, mammography, often requires invasive biopsy procedures for further assessment. Recent research has explored microRNAs (miRNAs) in circulating blood as potential biomarkers for early breast cancer diagnosis. In this study, we employed a multi-modal spectroscopy approach, combining attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy and surface-enhanced Raman scattering (SERS), to comprehensively characterize the full-spectrum fingerprints of RNA biomarkers in the blood serum of breast cancer patients. ATR-FTIR and SERS enhance the sensitivity of conventional FTIR and Raman spectroscopy by using a diamond ATR crystal and silver-coated silicon nanopillars, respectively. Moreover, the multi-modal approach achieves a wider measurement wavelength range than either spectroscopic method alone. We report results on 91 clinical samples, comprising 44 malignant and 47 benign cases. Principal component analysis (PCA) was performed on the ATR-FTIR, SERS, and multi-modal data. Peak analysis provided insights into biomolecular absorption and scattering features that aid in differentiating malignant from benign samples. Applying 32 machine learning algorithms to the PCA results, we identified key molecular fingerprints and demonstrated that the multi-modal approach outperforms the individual techniques, achieving higher average validation accuracy (95.1%), blind test accuracy (91.6%), specificity (94.7%), sensitivity (95.5%), and F-score (94.8%). The support vector machine (SVM) model achieved the best area under the curve (AUC) value, 0.9979, indicating excellent performance. These findings highlight the potential of the multi-modal spectroscopy approach as an accurate, reliable, and rapid method for distinguishing between malignant and benign breast tumors in women.
Such a label-free approach holds promise for improving early breast cancer diagnosis and patient outcomes.
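The accuracy, sensitivity, specificity, and F-score reported above are all derived from a confusion matrix of predicted versus true labels. The sketch below shows the standard definitions, assuming "F-score" means F1; the counts in the example are hypothetical and are not the study's data.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    tp: int  # malignant samples predicted malignant
    fn: int  # malignant samples predicted benign
    tn: int  # benign samples predicted benign
    fp: int  # benign samples predicted malignant

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def f_score(self) -> float:
        # F1: harmonic mean of precision and sensitivity (recall)
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r)

m = Confusion(tp=8, fn=2, tn=85, fp=5)  # hypothetical counts
print(m.sensitivity(), m.specificity(), m.accuracy(), m.f_score())
```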
Article
In the last 5 decades, paleontological research has expanded dramatically: fossils have enabled robust dating of rocks and improved understanding of origination and extinction rates, mass extinction events, biogeography, adaptive strategies, and much more. New molecular technologies have enabled intensive analyses of vertebrates and invertebrates, plant fossils, fossilized microbes, trace fossils, and fossil molecules alike. Paleontological research has become interdisciplinary, with inputs from geology, chemistry, biology, astronomy, and archaeology. Herein, we review the principles of promising molecular technologies and explore their applications and limitations in paleontological research. This review attempts to provide a roadmap for future research directions. Advanced chemical imaging makes it possible to identify and quantify chemical characteristics in order to evaluate taphonomic damage, original biological structures, or fossil microbes. Molecular methods (e.g., molecular clocks, DNA barcodes, racemization dating, and biomarkers) offer a unique source of information and provide robust clues to the co-evolution of life in modern and past environments. Two main limitations are noted: the need for exceptional preservation of organic material, which is not always available, and the complexity and cost of the instruments involved in the analyses. These difficulties limit the practical application of these methods in paleontological analysis. Although very little research has been carried out with these methods, they nonetheless provide improved answers to highly debated and unsolved biological and climatic questions, as well as a window toward a better understanding of the origin of life. Biomarker proxies will be further developed and refined to answer emerging questions about the Quaternary Period.
Chapter
Markers may be morphological, biochemical, cytological, or DNA-based. Each type of marker has its own advantages and disadvantages, as described in the text. DNA-based markers are more reliable and have been widely used. They are further classified into first-, second-, and third-generation markers. Nowadays, SSR and ISSR markers are widely used for diversity studies. In the recent era, third-generation or sequence-based markers have been used extensively in molecular studies, but their use is highly specific and requires automation, which limits their adoption. This text briefly describes the types of markers and their advantages, disadvantages, and applications.
Article
Measurement of gene expression in the brain requires invasive analysis of brain tissue or non-invasive methods that are limited by low sensitivity. Here we introduce a method for non-invasive, multiplexed, site-specific monitoring of endogenous gene or transgene expression in the brain through engineered reporters called released markers of activity (RMAs). RMAs consist of an easily detectable reporter and a receptor-binding domain that enables transcytosis across the brain endothelium. RMAs are expressed in the brain but exit into the blood, where they can be easily measured. We show that expressing RMAs at a single mouse brain site representing approximately 1% of the brain volume provides up to a 100,000-fold signal increase over the baseline. Expression of RMAs in tens to hundreds of neurons is sufficient for their reliable detection. We demonstrate that chemogenetic activation of cells expressing Fos-responsive RMA increases serum RMA levels >6-fold compared to non-activated controls. RMAs provide a non-invasive method for repeatable, multiplexed monitoring of gene expression in the intact animal brain.
Article
Genome-wide association studies (GWAS) have successfully revealed many disease-associated genetic variants. In a case-control study, adequate power for an association test can be achieved with a large sample size, although genotyping large samples is expensive. A cost-effective strategy to boost power is to integrate external control samples from publicly available genotyped data. However, naive integration of external controls may inflate type I error rates if systematic differences between studies (batch effects), such as differences in sequencing platforms, genotype-calling procedures, and population stratification, are ignored. To account for batch effects, we propose an approach that integrates External Controls into the Association Test by Regression Calibration (iECAT-RC) in case-control association studies. Extensive simulation studies show that iECAT-RC not only controls type I error rates but also boosts statistical power in all models. We also apply iECAT-RC to UK Biobank data for M72 Fibroblastic disorders, treating genotype calling as the batch effect. Four SNPs associated with fibroblastic disorders were detected by iECAT-RC and by the two comparison methods, iECAT-Score and Internal. However, iECAT-RC has a higher probability of identifying these significant SNPs in unbalanced case-control association studies.
Article
Many factors, such as resistance to pesticides and limited knowledge of the morphology and molecular structure of malaria vectors, have made it more challenging to eradicate malaria in numerous malaria-endemic areas of the globe. The primary goal of this review is to discuss malaria vector control methods and the significance of identifying species in vector control initiatives. This was accomplished by reviewing methods of molecular identification of malaria vectors and genetic marker classification in relation to their use for species identification. Due to its specificity and consistency, molecular identification is preferred over morphological identification of malaria vectors. Enhanced molecular capacity for species identification will improve mosquito characterization, leading to accurate control strategies and treatments that target specific mosquito species, and thus will contribute to malaria eradication. It is crucial for disease epidemiology and surveillance to accurately identify the Plasmodium species causing malaria in patients. The capacity for disease surveillance will be significantly increased by the development of more accurate, precise, automated, and high-throughput diagnostic techniques. In conclusion, although morphological identification is quick and achievable at a reduced cost, molecular identification is preferred for its specificity and sensitivity. To achieve the targeted malaria elimination goal, proper identification of vectors using accurate techniques for effective control measures should be prioritized.
Article
Phylogenetic inference has become a standard technique in integrative taxonomy and systematics, as well as in biogeography and ecology. DNA barcodes are often used for phylogenetic inference, despite being strongly limited by their low number of informative sites. Also, because current DNA barcodes are based on a fraction of a single, fast-evolving gene, they are highly unsuitable for resolving deeper phylogenetic relationships due to saturation. In recent years, methods that analyse hundreds to thousands of loci at once have improved the resolution of the Tree of Life, but these methods require resources, experience, and molecular laboratories that most taxonomists do not have. This paper introduces a PCR-based protocol that produces long amplicons of both slow- and fast-evolving unlinked mitochondrial and nuclear gene regions, which can be sequenced on the affordable and portable ONT MinION platform with low infrastructure or funding requirements. As a proof of concept, we inferred a phylogeny of a sample of 63 spider species from 20 families using our proposed protocol. The results were overall consistent with those of approaches based on hundreds to thousands of loci, while requiring just a fraction of the cost and labour of such approaches, making our protocol accessible to taxonomists worldwide.
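The saturation problem mentioned above can be made concrete with the simplest distance measures. The sketch below computes the uncorrected p-distance between two aligned sequences and the Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3), which grows faster than p as multiple substitutions at the same site accumulate. This is a textbook illustration, not the phylogenetic method used in the study, and gap handling is simplified by skipping gapped columns.

```python
import math

def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gapped columns skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(p: float) -> float:
    """JC69 evolutionary distance; undefined once p reaches 0.75 (full saturation)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)

p = p_distance("ACGTACGT", "ACGTTCGA")
print(p, jukes_cantor(p))
```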
Article
Bioinformatics is a field that combines computational methods with biology, with a particular focus on macromolecules. It uses informatics techniques to analyze and structure vast amounts of information about these molecules. These data have been generated by extensive molecular biology initiatives, including but not limited to large-scale projects involving genome sequencing, gene expression analysis, and investigations in genomics, proteomics, and protein-protein interactions. Precision healthcare programs are being widely implemented globally to enhance individualized patient care. The detection and management of genetic illnesses have advanced notably, primarily due to the growing availability of affordable and accurate sequencing data. In recent years, the field of bioinformatics has progressed notably. Advances in sequencing technology have substantially increased the quantity of genetic data produced. As a result, developing approaches that can evaluate these data efficiently and promptly is a major challenge in bioinformatics research. This study aims to present a succinct summary of the most recent advances in bioinformatics approaches applicable to personalized medicine. To leverage the potential of personalized medicine effectively, it is imperative to embrace four innovative techniques. This paper explores the obstacles and potential solutions related to identifying predictive genetic biomarkers and genetic variations in patients. Moreover, it addresses anticipated future bioinformatics issues within this domain.
Article
Species identification in Quercus L. has long been considered difficult. The fundamental phylogenies of Quercus have already been investigated by morphological and molecular means. However, the morphological characteristics of some Quercus groups (such as the group Helferiana) may not be consistent with molecular results, which may blur species relationships and hinder further evolutionary research. To understand interspecific relationships and phylogenetic positions, we sequenced and assembled the complete chloroplast genomes (CPGs; 160,715 to 160,842 bp) of four species of Quercus section Cyclobalanopsis using Illumina paired-end sequencing. Genomic structure, GC content, and IR/SC boundaries were highly conserved. Comparative analysis detected six highly variable hotspots, among which rpoC1, clpP, and ycf1 could be used as molecular markers. In addition, two genes (petA and ycf2) were found to be under positive selection. The phylogenetic analysis showed that the genera Trigonobalanus and Fagus lie at the base of the phylogenetic tree and that species of the genus Quercus separate into two clades comprising five sections. All Compound Trichome Base species clustered into a single branch, consistent with the results of morphological studies. However, neither group Gilva nor group Helferiana formed a monophyletic group. Six Compound Trichome Base species clustered in pairs to form three branches (Quercus kerrii and Quercus chungii; Quercus austrocochinchinensis and Quercus gilva; Quercus helferiana and Quercus rex). Because one node in the phylogenetic tree had low support (0.338), the relationship between the two branches it separates remains unclear. We believe that Q. helferiana and Q. kerrii can be treated as independent species given their distance in the phylogenetic tree. Our study provides genetic information on the genus Quercus that could be applied in further taxonomic and phylogenetic studies.
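GC content, one of the conserved features compared across the four chloroplast genomes, is the fraction of G and C among unambiguous bases. A minimal sketch, assuming ambiguous symbols such as N are excluded from the denominator:

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous A/C/G/T bases; other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / acgt

print(round(gc_content("ATGCGCNNAT"), 3))
```

Applied to a whole assembled chloroplast genome, or to each of its regions separately, the same function gives the per-genome or per-region values that such comparisons report.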
Article
The completion of the human genome draft has taken several years and is only the beginning of a period in which large amounts of DNA and RNA sequence information will be required from many individuals and species. Conventional sequencing technology has limitations in cost, speed, and sensitivity, with the result that the demand for sequence information far outstrips current capacity. There have been several proposals to address these issues by developing the ability to sequence single DNA molecules, but none have been experimentally demonstrated. Here we report the use of DNA polymerase to obtain sequence information from single DNA molecules by using fluorescence microscopy. We monitored repeated incorporation of fluorescently labeled nucleotides into individual DNA strands with single base resolution, allowing the determination of sequence fingerprints up to 5 bp in length. These experiments show that one can study the activity of DNA polymerase at the single molecule level with single base resolution and a high degree of parallelization, thus providing the foundation for a practical single molecule sequencing technology.
Article
Searching for genetic variants and mutations that underlie human diseases, both simple and complex, presents particular challenges. In the case of complex diseases, these searches generally result in a single nucleotide polymorphism (SNP), or set of SNPs, associated with disease risk. Frequently, these SNPs lie outside the gene coding regions1,2. One is thus left in a quandary: do the detected SNPs represent the only genetic variation in the region, or are there additional variants that might show even higher associations with disease risk? In the case of cancer, identification of mutations in tumor suppressor genes has also proved to be an arduous and frequently fruitless task3. The problem also arises in mouse genetics, where mutational screens, for example those using ethylnitrosourea (ENU), frequently require resequencing of large genomic regions to find a single base change4. The problem thus reduces to resequencing a large region of genomic DNA, usually >100 kilobases (kb), from affected individuals or tissue samples to identify all sequence variants. Here, we describe modifications to direct selection5,6 that allow rapid and efficient discovery of new polymorphisms and mutations in large genomic regions. Biotinylated bacterial artificial chromosome (BAC) DNAs are used in two rounds of hybridization selection with a target of total genomic DNA, and the selected sequences are amplified by the polymerase chain reaction (PCR) (Fig. 1). The procedure results in enrichments of 10,000-fold, in which ∼50% of the resulting sequence-ready clones are from the targeted region (Box 1).
Article
While recently developed short-read sequencing technologies may dramatically reduce sequencing costs and eventually achieve the $1000 goal for re-sequencing, their limitations prevent de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path, and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp reads with a 1% error rate and inexpensive random fragment cloning of whole mammalian genomes are feasible. Our assembly methodology first orders the clones and then performs read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology.
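The trade-off between clone coverage and read coverage can be reasoned about with the Lander-Waterman model, in which coverage c = N L / G (N reads of length L over a target of length G) and, if reads land uniformly at random, the expected fraction of bases left uncovered is about e^(-c). The sketch below applies this to a single clone; the 150 kb clone size and the read counts are hypothetical, and the model ignores repeats and sequencing errors. It is offered as background, not as part of SHRAP itself.

```python
import math

def coverage(n_reads: int, read_length: int, target_length: int) -> float:
    """Mean coverage c = N * L / G."""
    return n_reads * read_length / target_length

def fraction_uncovered(c: float) -> float:
    """Expected fraction of bases with zero coverage under a Poisson (Lander-Waterman) model."""
    return math.exp(-c)

clone_length = 150_000  # hypothetical clone size in bp
read_length = 200       # read length assumed in the SHRAP abstract
for n_reads in (750, 2250, 7500):
    c = coverage(n_reads, read_length, clone_length)
    print(f"{n_reads} reads -> {c:.1f}x coverage, {fraction_uncovered(c):.4f} uncovered")
```

Under this model a single low-coverage clone leaves many gaps, which is one way to see why sampling many overlapping clones at high coverage can compensate for sequencing each clone lightly.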
Article
In vivo protein-DNA interactions connect each transcription factor with its direct targets to form a gene network scaffold. To map these protein-DNA interactions comprehensively across entire mammalian genomes, we developed a large-scale chromatin immunoprecipitation assay (ChIPSeq) based on direct ultrahigh-throughput DNA sequencing. This sequence census method was then used to map in vivo binding of the neuron-restrictive silencer factor (NRSF; also known as REST, for repressor element–1 silencing transcription factor) to 1946 locations in the human genome. The data display sharp resolution of binding position [±50 base pairs (bp)], which facilitated motif discovery and allowed us to identify noncanonical NRSF-binding motifs. These ChIPSeq data also have high sensitivity and specificity [ROC (receiver operating characteristic) area ≥ 0.96] and statistical confidence (P < 10⁻⁴), properties that were important for inferring new candidate interactions. These include key transcription factors in the gene network that regulates pancreatic islet cell development.
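The ROC area quoted above can be computed without drawing the curve: it equals the probability that a randomly chosen positive (for example, a true binding site) scores higher than a randomly chosen negative, with ties counted as one half (the Mann-Whitney formulation). A minimal sketch with hypothetical scores:

```python
from itertools import product

def roc_auc(pos_scores, neg_scores) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores (O(n*m))."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# hypothetical scores for known binding sites vs. non-sites
print(roc_auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```

The pairwise loop is quadratic; for large score sets, a rank-based computation gives the same value faster.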
Article
In colony collapse disorder (CCD), honey bee colonies inexplicably lose their workers. CCD has resulted in a loss of 50 to 90% of colonies in beekeeping operations across the United States. The observation that irradiated combs from affected colonies can be repopulated with naive bees suggests that infection may contribute to CCD. We used an unbiased metagenomic approach to survey microflora in CCD hives, normal hives, and imported royal jelly. Candidate pathogens were screened for significance of association with CCD by the examination of samples collected from several sites over a period of 3 years. One organism, Israeli acute paralysis virus of bees, was strongly correlated with CCD.
Article
Recent data have revealed that epigenetic alterations, including DNA methylation and chromatin structure changes, are among the earliest molecular abnormalities to occur during tumorigenesis. The inherent thermodynamic stability of cytosine methylation and the apparent high specificity of the alterations for disease may accelerate the development of powerful molecular diagnostics for cancer. We report a genome-wide analysis of DNA methylation alterations in breast cancer. The approach efficiently identified a large collection of novel differentially DNA methylated loci (approximately 200), a subset of which was independently validated across a panel of over 230 clinical samples. The differential cytosine methylation events were independent of patient age, tumor stage, estrogen receptor status, or family history of breast cancer. The power of the global approach for discovery is underscored by the identification of a single differentially methylated locus, associated with the GHSR gene, capable of distinguishing infiltrating ductal breast carcinoma from normal and benign breast tissues with a sensitivity and specificity of 90% and 96%, respectively. Notably, the frequency of these molecular abnormalities in breast tumors substantially exceeds the frequency of any other single genetic or epigenetic change reported to date. The discovery of over 50 novel DNA methylation-based biomarkers of breast cancer may provide new routes for the development of DNA methylation-based diagnostics and prognostics, as well as reveal epigenetically regulated mechanisms involved in breast tumorigenesis.
Article
Promising new sequencing technologies, based on sequencing-by-synthesis (SBS), are starting to deliver large amounts of DNA sequence at very low cost. Polymorphism detection is a key application. We describe general methods for improved quality scores and accurate automated polymorphism detection, and apply them to data from the Roche (454) Genome Sequencer 20. We assess our methods using known-truth data sets, which is critical to the validity of the assessments. We developed informative, base-by-base error predictors for this sequencer and used a variant of the phred binning algorithm to combine them into a single empirically derived quality score. These quality scores are more useful than those produced by the system software: They both better predict actual error rates and identify many more high-quality bases. We developed a SNP detection method, with variants for low coverage, high coverage, and PCR amplicon applications, and evaluated it on known-truth data sets. We demonstrate good specificity in single reads, and excellent specificity (no false positives in 215 kb of genome) in high-coverage data.
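A Phred quality score encodes an error probability p as Q = -10 log10 p, so Q = 30 corresponds to an expected error rate of 1 in 1000. The sketch below shows the general idea of empirical calibration against known-truth data. Base calls are grouped into bins by their predictor values, and each bin's observed error rate is converted into Q. This is a simplified stand-in, not the phred-binning variant the authors describe. The bin keys and the add-one smoothing are assumptions.

```python
import math
from collections import defaultdict


def phred(p_error):
    """Phred-scaled quality: Q = -10 * log10(P(error))."""
    return -10.0 * math.log10(p_error)


def empirical_quality_table(observations):
    """Map each bin of predictor values to an empirically derived quality score.

    observations: iterable of (bin_key, is_error) pairs. bin_key is any
    hashable summary of the error predictors for one base call (hypothetical);
    is_error says whether that call disagreed with the known-truth sequence.
    """
    counts = defaultdict(lambda: [0, 0])  # bin_key -> [errors, total calls]
    for bin_key, is_error in observations:
        counts[bin_key][0] += int(is_error)
        counts[bin_key][1] += 1
    # Add-one (Laplace) smoothing keeps error-free bins at a finite quality.
    return {key: phred((errors + 1) / (total + 2))
            for key, (errors, total) in counts.items()}
```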
Article
Optical nanostructures have enabled the creation of subdiffraction detection volumes for single-molecule fluorescence microscopy. Their applicability is extended by the ability to place molecules in the confined observation volume without interfering with their biological function. Here, we demonstrate that processive DNA synthesis thousands of bases in length was carried out by individual DNA polymerase molecules immobilized in the observation volumes of zero-mode waveguides (ZMWs) in high-density arrays. Selective immobilization of polymerase to the fused silica floor of the ZMW was achieved by passivation of the metal cladding surface using polyphosphonate chemistry, producing enzyme density contrasts of glass over aluminum in excess of 400:1. Yields of single-molecule occupancies of ≈30% were obtained for a range of ZMW diameters (70–100 nm). Results presented here support the application of immobilized single DNA polymerases in ZMW arrays for long-read-length DNA sequencing.
Keywords: fluorescence; metal passivation; microscopy; polyvinyl phosphonic acid; single molecule
Article
We have developed a program, SOAP, for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new-generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or paired-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program that supports multi-threaded parallel computing and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn.
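SOAP's own indexing scheme is not described here. As a hedged illustration of ungapped short-read alignment with a mismatch budget, the sketch below uses pigeonhole seeding, a standard technique for this task. The function names and parameters are hypothetical, and this is not SOAP's code.

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Index every k-mer of the reference by its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_ungapped(read, reference, index, k, max_mismatches):
    """Report sorted (start, mismatches) for every placement with <= max_mismatches.

    Pigeonhole principle: split the read into max_mismatches + 1 disjoint seeds
    of length k. Any placement with at most max_mismatches mismatches leaves at
    least one seed exact, so exact seed lookups find every candidate.
    """
    if k * (max_mismatches + 1) > len(read):
        raise ValueError("read too short for this seed length and mismatch budget")
    hits = set()
    for s in range(max_mismatches + 1):
        offset = s * k
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset  # implied start of the whole read
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)
```

The index is built once per reference and reused for every read, which is what makes seed-based aligners practical for millions of reads.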
Article
Novel high-throughput DNA sequencing technologies allow researchers to characterize a bacterial genome in a single experiment at moderate cost. However, the increase in sequencing throughput these platforms provide comes at the expense of read length: the individual reads are short and must be assembled into longer contigs to be exploitable. This study focuses on the Illumina sequencing platform, which produces millions of very short sequences 35 bases in length. We propose de novo assembler software dedicated to processing such data. Based on a classical overlap graph representation and on the detection of potentially spurious reads, our software generates a set of accurate contigs of several kilobases that cover most of the bacterial genome. The assembly results were validated by comparing data sets obtained experimentally for Staphylococcus aureus strain MW2 and Helicobacter acinonychis strain Sheeba with their published genomes, which were acquired by conventional sequencing of 1.5- to 3.0-kb fragments. We also provide indications that the broad coverage achieved by high-throughput sequencing might allow for the detection of clonal polymorphisms in the set of DNA molecules being sequenced.
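To illustrate how short reads become contigs through suffix-prefix overlaps, here is a minimal greedy overlap-merge sketch. It is a simplification and not the cited assembler. It ignores sequencing errors, reverse complements, reads contained inside other reads, and the spurious-read detection the abstract mentions. The minimum overlap length is an assumed parameter.

```python
def overlap_length(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    provided it is at least min_len long; otherwise 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap_length(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs
```

Four overlapping 10-base reads from `ATTAGACCTGCCGGAATAC` are reassembled into that single contig with `min_len=3`.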
Article
Human cancers often carry many somatically acquired genomic rearrangements, some of which may be implicated in cancer development. However, conventional strategies for characterizing rearrangements are laborious and low-throughput and have low sensitivity or poor resolution. We used massively parallel sequencing to generate sequence reads from both ends of short DNA fragments derived from the genomes of two individuals with lung cancer. By investigating read pairs that did not align correctly with respect to each other on the reference human genome, we characterized 306 germline structural variants and 103 somatic rearrangements to the base-pair level of resolution. The patterns of germline and somatic rearrangement were markedly different. Many somatic rearrangements were from amplicons, although rearrangements outside these regions, notably including tandem duplications, were also observed. Some somatic rearrangements led to abnormal transcripts, including two from internal tandem duplications and two fusion transcripts created by interchromosomal rearrangements. Germline variants were predominantly mediated by retrotransposition, often involving AluY and LINE elements. The results demonstrate the feasibility of systematic, genome-wide characterization of rearrangements in complex human cancer genomes, raising the prospect of a new harvest of genes associated with cancer using this strategy.
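Read pairs flag rearrangements when the two mates map to different chromosomes, in an unexpected orientation, or at a distance outside the library's insert-size range. Below is a minimal classifier. It assumes a forward-reverse (inward-facing) paired-end library and hypothetical insert-size bounds, and it approximates the distance from the leftmost mapped coordinates. It is not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class MateHit:
    chrom: str
    pos: int     # leftmost mapped coordinate (0-based)
    strand: str  # '+' or '-'


def classify_pair(m1, m2, min_insert, max_insert):
    """Label a mapped read pair, assuming a forward-reverse library."""
    if m1.chrom != m2.chrom:
        return "interchromosomal"
    left, right = (m1, m2) if m1.pos <= m2.pos else (m2, m1)
    if left.strand == right.strand:
        return "inversion-like"            # both mates on the same strand
    if left.strand == "-" and right.strand == "+":
        return "tandem-duplication-like"   # outward-facing (everted) mates
    distance = right.pos - left.pos        # rough proxy for insert size
    if distance > max_insert:
        return "deletion-like"
    if distance < min_insert:
        return "insertion-like"
    return "concordant"
```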
Article
The majority (>99%) of microorganisms from the environment resist cultivation in the laboratory. Ribosomal RNA analysis suggests that uncultivated organisms are found in nearly every prokaryotic group, and several divisions have no known cultivable representatives. We designed a diffusion chamber that allowed the growth of previously uncultivated microorganisms in a simulated natural environment. Colonies of representative marine organisms were isolated in pure culture. These isolates did not grow on artificial media alone but formed colonies in the presence of other microorganisms. This observation may help explain the nature of microbial uncultivability.
Article
Many areas of biomedical research depend on the analysis of uncommon variations in individual genes or transcripts. Here we describe a method that can quantify such variation at a scale and ease heretofore unattainable. Each DNA molecule in a collection of such molecules is converted into a single magnetic particle to which thousands of copies of DNA identical in sequence to the original are bound. This population of beads then corresponds to a one-to-one representation of the starting DNA molecules. Variation within the original population of DNA molecules can then be simply assessed by counting fluorescently labeled particles via flow cytometry. This approach is called BEAMing on the basis of four of its principal components (beads, emulsion, amplification, and magnetics). Millions of individual DNA molecules can be assessed in this fashion with standard laboratory equipment. Moreover, specific variants can be isolated by flow sorting and used for further experimentation. BEAMing can be used for the identification and quantification of rare mutations as well as to study variations in gene sequences or transcripts in specific populations or tissues.
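Because each bead represents one starting molecule, the mutant fraction is simply the share of beads that carry the mutant-specific label. The sketch below adds a Wilson score interval, which behaves better than the naive normal approximation when mutants are rare. The choice of interval is an assumption and not part of the BEAMing method as described.

```python
import math


def mutant_fraction(mutant_beads, wildtype_beads, z=1.96):
    """Return (fraction, lower, upper) using a Wilson score interval (~95% for z=1.96)."""
    n = mutant_beads + wildtype_beads
    if n == 0:
        raise ValueError("no beads counted")
    p = mutant_beads / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, center - half_width), min(1.0, center + half_width)
```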
Article
An efficient, nanoliter-scale microfabricated bioprocessor integrating all three Sanger sequencing steps (thermal cycling, sample purification, and capillary electrophoresis) has been developed and evaluated. Hybrid glass–polydimethylsiloxane (PDMS) wafer-scale construction is used to combine 250-nl reactors, affinity-capture purification chambers, high-performance capillary electrophoresis channels, and pneumatic valves and pumps onto a single microfabricated device. Lab-on-a-chip-level integration enables complete Sanger sequencing from only 1 fmol of DNA template. Up to 556 continuous bases were sequenced with 99% accuracy, demonstrating read lengths required for de novo sequencing of human and other complex genomes. The performance of this miniaturized DNA sequencer provides a benchmark for predicting the ultimate cost and efficiency limits of Sanger sequencing.
Keywords: capillary electrophoresis; genetic analysis; microfluidic
Article
Measurements of the ionic current flowing through nanometer-scale pores (nanopores) have been used to analyze single DNA and RNA molecules, with the ultimate goal of achieving ultrafast DNA sequencing. However, purely electronic measurements have not yet achieved the signal contrast required to distinguish single nucleotides. In this report we propose a novel method for optical detection of a DNA sequence translocating through a nanopore. Each base of the target DNA sequence is first mapped onto a 2-unit code, that is, two 10-bp nucleotide sequences, by biochemical conversion into Designed DNA Polymers. These 2-unit codes are then hybridized to complementary, fluorescently labeled, self-quenching molecular beacons. As the molecular beacons are sequentially unzipped during translocation through a <2-nm-wide nanopore, their fluorescent tags are unquenched and detected by a custom-built dual-color total internal reflection fluorescence (TIRF) microscope. The two-color optical signal is then correlated with the target DNA sequence. A dual-color TIRF microscope with single-molecule resolution was constructed, and 1-dimensional and 2-dimensional arrays of solid-state nanopores were fabricated in a controlled manner. A nanofluidic cell assembly was constructed for TIRF-based optical detection of voltage-driven DNA translocation through a nanopore. We present a novel nanopore-based DNA sequencing technique that uses an optical readout of DNA unzipping as it translocates through a nanopore. Our technique offers better single-nucleotide differentiation in sequence readout, as well as the possibility of large-scale parallelism using nanopore arrays.
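The key idea is that a four-letter alphabet needs two binary symbols per base, so each base becomes an ordered pair of units that the two beacon colors can report. Below is a round-trip sketch. The base-to-code assignment is hypothetical, because the abstract does not give the actual mapping, and the color names are arbitrary.

```python
# Hypothetical assignment of each base to an ordered pair of beacon colors.
BASE_TO_CODE = {
    "A": ("red", "red"),
    "C": ("red", "green"),
    "G": ("green", "red"),
    "T": ("green", "green"),
}
CODE_TO_BASE = {code: base for base, code in BASE_TO_CODE.items()}


def encode(seq):
    """Expand a DNA sequence into the color events its 2-unit code would emit."""
    return [color for base in seq for color in BASE_TO_CODE[base]]


def decode(colors):
    """Read consecutive pairs of color events back into bases."""
    if len(colors) % 2:
        raise ValueError("color signal must contain an even number of events")
    return "".join(CODE_TO_BASE[(colors[i], colors[i + 1])]
                   for i in range(0, len(colors), 2))
```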
Article
A novel injection method is developed that utilizes a thermally switchable oligonucleotide affinity capture gel to mediate the concentration, purification, and injection of dsDNA for quantitative microchip capillary electrophoresis analysis. The affinity capture matrix consists of a 20-base acrydite-modified oligonucleotide copolymerized into a 6% linear polyacrylamide gel that captures ssDNA or dsDNA analytes, including PCR amplicons and synthetic oligonucleotides. Double-stranded PCR amplicons with complementarity to the capture probe up to 81 bases from their 5' terminus are reproducibly captured via helix invasion. By integrating the oligo capture matrix directly with the CE separation channel, the electrophoretically mobilized target fragments are quantitatively captured and injected after thermal release for unbiased, efficient, and quantitative analysis. The capture process exhibits optimal efficiency at 44 °C and 100 V/cm with a 20 µM affinity capture probe (Tm = 57.7 °C). A dsDNA titration assay with 20 bp fragments validated that dsDNA is captured at the same efficiency as ssDNA. Dilution studies with a duplex 20mer show that targets can be successfully captured and analyzed with a limit of detection of 1 pM from 250 nL of solution (approximately 150,000 fluorescent molecules). Simultaneous capture and injection of amplicons from E. coli K12 and M13mp18 using a mixture of two different capture probes demonstrates the feasibility of multiplex target capture. Unlike the traditional cross-injector, this method enables efficient capture and injection of dsDNA amplicons, which will facilitate the quantitative analysis of products from integrated nanoliter-scale PCR reactors.
Article
In the last year, high-throughput sequencing technologies have progressed from proof-of-concept to production quality. While these methods produce high-quality reads, they have yet to produce reads comparable in length to Sanger-based sequencing. Current fragment assembly algorithms have been implemented and optimized for mate-paired Sanger-based reads, and thus do not perform well on short reads produced by short read technologies. We present a new Eulerian assembler that generates nearly optimal short read assemblies of bacterial genomes and describe an approach to assemble reads in the case of the popular hybrid protocol when short and long Sanger-based reads are combined.
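Eulerian assemblers break reads into k-mers, link overlapping (k-1)-mers in a de Bruijn graph, and read the genome off a path that uses every edge exactly once. Below is a textbook sketch. It assumes error-free reads, a single chromosome, and no repeats longer than k-1, and k is a hypothetical choice. It omits the error correction and the hybrid short/long-read handling the abstract describes.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(list)
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    for kmer in sorted(kmers):  # sorted only for reproducible edge order
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; starts at the node with out-degree - in-degree == 1 if any."""
    in_deg = defaultdict(int)
    for succs in graph.values():
        for v in succs:
            in_deg[v] += 1
    nodes = set(graph) | set(in_deg)
    start = next((n for n in nodes if len(graph.get(n, [])) - in_deg[n] == 1),
                 next(iter(graph)))
    adj = {n: list(s) for n, s in graph.items()}  # copy: edges are consumed below
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if adj.get(v):
            stack.append(adj[v].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path."""
    path = eulerian_path(de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])
```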
Article
The increasing availability of high-quality reference genomic sequences has created a demand for ways to survey the sequence differences present in individual genomes. Here we describe a DNA sequencing method based on hybridization of a universal panel of tiling probes. Millions of shotgun fragments are amplified in situ and subjected to sequential hybridization with short fluorescent probes. Long fragments of 200 bp facilitate unique placement even in large genomes. The sequencing chemistry is simple, enzyme-free and consumes only dilute solutions of the probes, resulting in reduced sequencing cost and substantially increased speed. A prototype instrument based on commonly available equipment was used to resequence the Bacteriophage lambda and Escherichia coli genomes to better than 99.93% accuracy with a raw throughput of 320 Mbp/day, albeit with a significant number of small gaps attributed to losses in sample preparation.