Article

mit-o-matic: A Comprehensive Computational Pipeline for Clinical Evaluation of Mitochondrial Variations from Next-Generation Sequencing Datasets

Authors:

Abstract

The human mitochondrial genome has been reported to have a much higher mutation rate than the nuclear genome. A large number of mitochondrial mutations show significant phenotypic associations and are involved in a broad spectrum of diseases. In recent years, there has been remarkable progress in the understanding of mitochondrial genetics. Next-Generation Sequencing technologies have not only reduced sequencing costs by orders of magnitude but have also provided good-quality, high-coverage mitochondrial genome sequences, thereby enabling the decoding of a number of human mitochondrial diseases. In this study, we report a computational and experimental pipeline to decipher human mitochondrial DNA (mtDNA) variations and examine them for clinical correlation. As a proof of principle, we also present a clinical study of a patient with Leigh disease and confirm maternal inheritance of the causative allele. The pipeline is available as a user-friendly online tool that annotates variants and identifies haplogroups, disease associations and heteroplasmic sites. To the best of our knowledge, this is the first and most comprehensive tool for the clinical evaluation of mitochondrial genomic variations from Next-Generation Sequencing datasets. The tool is freely available at http://genome.igib.res.in/mitomatic/. This article is protected by copyright. All rights reserved.
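The abstract mentions finding heteroplasmic sites from NGS reads. As a rough illustration of what that step involves, the Python sketch below computes the alternate-allele fraction at a single mtDNA position from read counts and labels the site. It assumes a simple two-allele model, and the coverage and heteroplasmy cutoffs (`MIN_DEPTH`, `MIN_HETEROPLASMY`, `MAX_HETEROPLASMY`) as well as the `SiteCounts` and `classify_site` names are hypothetical; they are not taken from mit-o-matic, whose actual calling rules are not described here.

```python
from dataclasses import dataclass

# Hypothetical cutoffs chosen for illustration only; NOT mit-o-matic's parameters.
MIN_DEPTH = 100          # minimum reads needed before a site is classified
MIN_HETEROPLASMY = 0.05  # below this alternate fraction, treat the site as reference
MAX_HETEROPLASMY = 0.95  # above this alternate fraction, treat the site as homoplasmic


@dataclass(frozen=True)
class SiteCounts:
    """Read counts supporting the reference and one alternate allele at an mtDNA site."""
    position: int   # 1-based position on the rCRS (16,569 bp)
    ref: str
    alt: str
    ref_reads: int
    alt_reads: int

    @property
    def depth(self) -> int:
        return self.ref_reads + self.alt_reads


def heteroplasmy_fraction(site: SiteCounts) -> float:
    """Fraction of reads carrying the alternate allele: alt / (ref + alt)."""
    if site.depth == 0:
        raise ValueError(f"no reads at position {site.position}")
    return site.alt_reads / site.depth


def classify_site(site: SiteCounts) -> str:
    """Label a site as low_coverage, reference, heteroplasmic or homoplasmic."""
    if site.depth < MIN_DEPTH:
        return "low_coverage"
    fraction = heteroplasmy_fraction(site)
    if fraction < MIN_HETEROPLASMY:
        return "reference"
    if fraction > MAX_HETEROPLASMY:
        return "homoplasmic"
    return "heteroplasmic"
```

For example, `classify_site(SiteCounts(8993, "T", "G", ref_reads=100, alt_reads=900))` returns `"heteroplasmic"` (alternate fraction 0.9). Real callers additionally weigh base and mapping quality, strand bias and sequencing error, which this sketch ignores.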

... The command-line tools ANNOVAR (version date 2015-03-22) [35], dbNSFP (version 3.0b1a) [14] and SnpEff (version 4.1b) [36], although not specific to mtDNA analysis, were used to annotate three mitochondrial mutations in genes coding for an rRNA, a tRNA and a protein, respectively. Web-based versions of the mit-o-matic [37], MitoBamAnnotator [38] and MitImpact 2.0 [15] tools were also applied to the same mutations to compare their performance in variant annotation. ...
... Among the most popular tools for variant prioritization, ANNOVAR [35], SnpEff [36] and dbNSFP [14] are commonly used for both nuclear DNA and mtDNA variations. Moreover, mitochondria-oriented tools such as mit-o-matic [37], MitImpact [15] and MitoBamAnnotator [38] have recently been developed to provide annotations that account for the peculiarities of mitochondrial genetics, such as heteroplasmy. A comparison among the aforementioned tools was performed, showing the pros and cons of each (Additional file 1). ...
Article
Background: The abundance of biological data characterizing the genomics era is contributing to a comprehensive understanding of human mitochondrial genetics. Nevertheless, many aspects are still unclear, specifically the variability of the 22 human mitochondrial transfer RNA (tRNA) genes and their involvement in diseases. Because enriching and isolating tRNAs in vitro is complex, knowledge of their post-transcriptional modifications and three-dimensional folding, which are essential for correct tRNA function, remains incomplete. An accurate annotation of mitochondrial tRNA variants would be valuable to mitochondrial researchers and clinicians, since most bioinformatics tools for variant annotation and prioritization available so far cannot shed light on the functional role of tRNA variations.
Results: To this aim, we updated our MToolBox pipeline for mitochondrial DNA analysis of high-throughput and Sanger sequencing data by integrating tRNA variant annotations, in order to identify and characterize relevant variants not only in protein-coding regions but also in tRNA genes. The annotation step in the pipeline now provides detailed information for variants mapping onto the 22 mitochondrial tRNAs. For each mt-tRNA position along the entire genome, the relative tRNA numbering, tRNA type, cloverleaf secondary domains (loops and stems), mature nucleotide and interactions in the three-dimensional folding were reported. Moreover, pathogenicity predictions for tRNA and rRNA variants were retrieved from the literature and integrated into the annotations provided by MToolBox, both in the stand-alone version and in the web-based tool at the Mitochondrial Disease Sequence Data Resource (MSeqDR) website. All the information available in the annotation step of MToolBox was used to generate custom tracks that can be displayed in the GBrowse instance at the MSeqDR website.
Conclusions: To the best of our knowledge, specific data regarding mitochondrial variants in tRNA genes have been introduced for the first time into a tool for mitochondrial genome analysis, supporting the interpretation of genetic variants in specific genomic contexts.
... Whole-mtDNA sequencing is usually performed by polymerase chain reaction (PCR) amplification of mtDNA in 46 fragments, followed by Sanger sequencing using a mitoSEQr resequencing system (Life Technologies, Grand Island, NY, USA). Next-generation sequencing (NGS), which has revolutionized the field of molecular biology, has recently been applied to the diagnosis of mitochondrial diseases [28,29]. NGS is now recognized to reduce sequencing cost and analysis time while providing high coverage and fidelity, thereby enabling the decoding of a number of human mitochondrial diseases. ...
... In this study, we applied mtDNA-targeted NGS to identify the genetic causes of the 2 studied diseases. NGS technology has recently been introduced in mtDNA testing for mitochondrial diseases [28–30]. The application of NGS may reduce testing cost and time while maintaining high fidelity. ...
Article
... in the mitochondrial DNA (mtDNA) that lead to oxidative phosphorylation impairment, ATP depletion, cellular dysfunction, and ultimately cell death [1]. Most mitochondrial diseases involve impairment of multiple organs, particularly tissues with a high energy demand such as nerve and muscle.
... Because of the peculiarities of mitochondrial polyplasmic genetics, assignment of pathogenicity should take into account the degree of heteroplasmy, the haplogroup background, and even environmental factors (Wallace et al. 2003). Several methods have recently been published to meet these goals, such as mit-o-matic (Vellarikkal et al. 2015), MitImpact (Castellana et al. 2015) and MtSNPscore (Bhardwaj et al. 2009). Here, we contribute a robust statistical approach based on the introduction of two thresholds, NVC and DST. ...
... In this paper, we demonstrate that prioritizing human mtDNA variants with the proposed workflow recognizes potentially function-affecting variants. Existing pipelines either annotate mitochondrial variations exclusively from next-generation datasets (mit-o-matic, Vellarikkal et al. 2015) or extract information exclusively from lists of variants or FASTA sequences (MtSNPscore, Bhardwaj et al. 2009). Compared with these, our prioritization workflow, currently being implemented in MToolBox, appears more flexible, allowing functional annotation of variants in large datasets from both next-generation (BAM, SAM or FASTQ format) and Sanger sequencing (FASTA format) data. ...
Article
Assigning a pathogenic role to mitochondrial DNA (mtDNA) variants and unveiling the potential involvement of the mitochondrial genome in diseases are challenging tasks in human medicine. Assuming that rare variants are more likely to be damaging, we designed a phylogeny-based prioritization workflow to obtain a reliable pool of candidate variants for further investigation. The prioritization workflow relies on an exhaustive functional annotation through the mtDNA extraction pipeline MToolBox and includes three components: Macro Haplogroup Consensus Sequences, used to filter out fixed evolutionary variants and report rare or private variants; the nucleotide variability as reported in HmtDB; and a disease score based on several predictors of pathogenicity for non-synonymous variants. Cutoffs for both the disease score and the nucleotide variability index were established with the aim of discriminating sequence variants contributing to defective phenotypes. The workflow was validated on mitochondrial sequences from individuals affected by Leber's Hereditary Optic Neuropathy, successfully identifying 23 variants, including the majority of the known causative ones. Applying the prioritization workflow to cancer datasets made it possible to trim down the number of candidates for subsequent functional analyses, revealing among them a high percentage of somatic variants. Prioritization criteria were implemented in both the standalone (http://sourceforge.net/projects/mtoolbox/) and web (https://mseqdr.org/mtoolbox.php) versions of MToolBox. Electronic supplementary material: The online version of this article (doi:10.1007/s00439-015-1615-9) contains supplementary material, which is available to authorized users.
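The workflow above combines a haplogroup filter with two cutoffs. A minimal Python sketch of that filtering logic follows, assuming each variant already carries a nucleotide variability value and, for non-synonymous changes, a disease score. The cutoff values, the `Variant` fields and the rule of keeping unscored variants on the variability criterion alone are illustrative assumptions, not the published MToolBox thresholds or implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Placeholder cutoffs for illustration only; NOT the published NVC/DST values.
NV_CUTOFF = 0.01  # keep sites whose nucleotide variability is at or below this
DS_CUTOFF = 0.5   # keep scored variants whose disease score is at or above this


@dataclass(frozen=True)
class Variant:
    label: str                      # hypothetical identifier, e.g. "var_a"
    nucleotide_variability: float   # site variability index (lower = more conserved)
    disease_score: Optional[float]  # None when no score applies (e.g. non-coding sites)


def prioritize(
    variants: Iterable[Variant],
    haplogroup_variants: Set[str],
    nv_cutoff: float = NV_CUTOFF,
    ds_cutoff: float = DS_CUTOFF,
) -> List[Variant]:
    """Return candidate variants that survive all three filters, in input order."""
    kept: List[Variant] = []
    for v in variants:
        if v.label in haplogroup_variants:
            continue  # fixed evolutionary variant expected for the sample's haplogroup
        if v.nucleotide_variability > nv_cutoff:
            continue  # site too variable in the population to be a strong candidate
        if v.disease_score is not None and v.disease_score < ds_cutoff:
            continue  # scored variant predicted to be benign
        kept.append(v)
    return kept
```

Ordering the cheap haplogroup lookup first mirrors the workflow's logic of discarding fixed evolutionary variants before judging rarity and predicted pathogenicity.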
... Exome capture kits used in diagnostic settings generally do not include mtDNA. However, it has been shown that the mtDNA sequence can be extracted and reassembled from ES data using tools such as MitoSeek [44], mit-o-matic [45] or MToolBox [46]; this approach, referred to as indirect or untargeted mtDNA sequencing, yielded a diagnosis in 0.2% of a large undiagnosed disease cohort [47]. We did not look for mtDNA variants in our patient cohort. ...
Article
Background and Objectives: Owing to their extensive clinical and molecular heterogeneity, hereditary neurologic diseases in adults are difficult to diagnose. Current knowledge about the diagnostic yield and clinical utility of exome sequencing (ES) for neurologic diseases in adults is limited. This observational study assesses the diagnostic value of ES and multigene panel analysis in adult-onset neurologic disorders.
Methods: From January 2019 through April 2022, ES-based multigene panel testing was conducted in 1,411 patients with molecularly unexplained neurologic phenotypes at Ghent University Hospital. Gene panels were developed for ataxia and spasticity, leukoencephalopathy, movement disorders, paroxysmal episodic disorders, neurodegeneration with brain iron accumulation, progressive myoclonic epilepsy, and amyotrophic lateral sclerosis. Single nucleotide variants, small indels, and copy number variants were analyzed. Across all panels, our analysis covered a total of 725 genes associated with Mendelian inheritance.
Results: A molecular diagnosis was established in 10% of the cases (144 of 1,411), representing 71 different monogenic disorders. The diagnostic yield depended significantly on the presenting phenotype, with the highest yield seen in patients with ataxia or spastic paraparesis (19%). Most of the established diagnoses were disorders with autosomal dominant inheritance (62%), and the most frequently mutated genes were NOTCH3 (13 patients), SPG7 (11 patients), and RFC1 (8 patients). Of the disease-causing variants, 34% were novel, including a unique likely pathogenic variant in APP (Ghent mutation, p.[Asn698Asp]) in a family presenting with stroke and severe cerebral white matter disease. Copy number variants detected in the ES data and confirmed by an independent technique accounted for 7% of the pathogenic variants.
Discussion: ES-based multigene panel testing is a powerful and efficient tool for diagnosing patients with unexplained, adult-onset neurologic disorders.
... The consortium has also undertaken international initiatives to characterize the pharmacogenomic landscape of Malay [156] and Qatari populations [157,158] and to identify genetic variants in Arab, Middle Eastern and North African populations [159,160]. GUaRDIAN has also set up a systematic pipeline for next-generation sequencing of the mitochondrial genome for clinical applications, called mit-o-matic [80]. ...
Article
Home to a culturally heterogeneous population, India is also a melting pot of genetic diversity. The population architecture, characterized by multiple endogamous groups with specific marriage patterns, including the widely prevalent practice of consanguinity, not only makes the Indian population distinct from the rest of the world but also provides a unique advantage and niche for understanding genetic diseases. Centuries of genetic isolation of population groups have amplified founder effects, contributing to a high prevalence of recessive alleles, which translates into genetic diseases, including rare genetic diseases, in India. Rare genetic diseases are becoming a public health concern in India because a population of more than a billion people translates into a huge disease burden for even the rarest of rare diseases. Genomics-based approaches have been shown to accelerate the diagnosis of rare genetic diseases and reduce their socioeconomic burden. The Genomics for Understanding Rare Diseases: India Alliance Network (GUaRDIAN) is dedicated to providing genomic solutions for rare diseases in India. The consortium aims to establish a unique collaborative framework for health care planning, implementation and delivery in the specific area of rare genetic diseases. It is a nationwide collaborative research initiative covering rare diseases across multiple cohorts, with over 240 clinician/scientist collaborators across 70 major medical/research centers. Within the GUaRDIAN framework, clinicians refer patients with rare diseases, whole-genome or exome datasets are generated, and the data are analyzed computationally to identify causal pathogenic variations. The outcomes of GUaRDIAN are being translated into community services through a platform providing low-cost diagnostic assays in India.
In addition to GUaRDIAN, several genomic investigations of diseased and healthy populations are being undertaken in the country to address the rare disease dilemma. In summary, rare diseases contribute a significant disease burden in India. Genomics-based solutions can enable accelerated diagnosis and management of rare diseases. We discuss how a collaborative research initiative such as GUaRDIAN can provide a nationwide framework to serve the rare disease community of India.
... However, our analysis focused on callers designed to be mtDNA-specific, especially since such methods can call both homoplasmic and heteroplasmic variants. Other mtDNA-specific methods were also considered, such as mit-o-matic (Vellarikkal et al., 2015), MitoRS (Marquis et al., 2017) and mitoSuite (Ishiya and Ueda, 2017). We selected for evaluation the methods that were publicly available, had a command-line interface and generated output easily accessible to an NGS pipeline. ...
Article
Mitochondrial DNA (mtDNA) mutations contribute to human disease across a range of severity, from rare, highly penetrant mutations causal for monogenic disorders to mutations with milder contributions to phenotypes. mtDNA variation can be present in all copies of mtDNA or only in a fraction of them, and can be detected at levels as low as 1%. The large number of mtDNA copies and the possibility of multiple alternative alleles at the same nucleotide position make identifying allelic variation in mtDNA very challenging. In recent years, specialized variant calling algorithms tailored to identify mtDNA variation from whole-genome sequencing (WGS) data have been developed. However, very few studies have systematically evaluated and compared these methods for the detection of both homoplasmy and heteroplasmy. A publicly available synthetic gold standard dataset was used to assess four mtDNA variant callers (Mutserve, mitoCaller, MitoSeek and MToolBox) and the commonly used Genome Analysis Toolkit “best practices” pipeline, which is included in most current WGS pipelines. We also used WGS data from 126 trios and calculated the percentage of maternally inherited variants as a metric of calling accuracy, especially for homoplasmic variants. We additionally compared multiple pathogenicity prediction resources for mtDNA variants. Although the accuracy of homoplasmic variant detection was high for most callers, with high concordance across callers, we found a very low concordance rate between mtDNA variant callers for heteroplasmic variants, ranging from 2.8% to 3.6% for heteroplasmy thresholds of 5% and 1%. Overall, Mutserve showed the best performance on the synthetic benchmark dataset. The analysis of mtDNA pathogenicity resources also showed low concordance in prediction results.
We have shown that while homoplasmic variant calling is consistent between callers, there remains a significant discrepancy in heteroplasmic variant calling. We found that resources such as population frequency databases and pathogenicity predictors are now available for variant annotation but still need refinement and improvement. Given their peculiarities, mitochondria require special consideration, and we advocate caution when analyzing mtDNA data from WGS.
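The two accuracy measures in this abstract, cross-caller concordance for heteroplasmic variants and the share of maternally inherited variants in trios, can be sketched as set operations over call sets. In the Python sketch below, concordance is assumed to mean calls shared by all callers divided by calls made by any caller, and the 0.95 homoplasmy boundary is a hypothetical choice; the study's exact definitions may differ.

```python
from typing import Dict, Set, Tuple

Call = Tuple[int, str]  # (rCRS position, alternate allele)

# Hypothetical upper bound separating heteroplasmic from homoplasmic calls.
HOMOPLASMY_LEVEL = 0.95


def heteroplasmic_calls(calls: Dict[Call, float], threshold: float) -> Set[Call]:
    """Calls whose allele fraction lies in [threshold, HOMOPLASMY_LEVEL)."""
    return {c for c, af in calls.items() if threshold <= af < HOMOPLASMY_LEVEL}


def caller_concordance(per_caller: Dict[str, Dict[Call, float]], threshold: float) -> float:
    """Heteroplasmic calls made by every caller divided by those made by any caller."""
    sets = [heteroplasmic_calls(calls, threshold) for calls in per_caller.values()]
    union: Set[Call] = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def maternal_inheritance_rate(child: Set[Call], mother: Set[Call]) -> float:
    """Share of a child's calls that are also present in the mother."""
    if not child:
        return 0.0
    return len(child & mother) / len(child)
```

Lowering the threshold admits more low-level calls into every caller's set, so concordance can move in either direction; computing it at several thresholds, as the study does at 5% and 1%, shows how sensitive agreement is to that choice.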
... Overall, we have developed the first open-source alternative to the enterprise software Ion Torrent Suite™ Software (TSS) for the Precision ID mtDNA Whole Genome Panel, with improved performance metrics and with output in the correct rCRS reference. Since the majority of existing mtDNA bioinformatic tools [6–21] are only compatible with the correct rCRS format, and the few tools that perform variant calling in samples sequenced with the Precision ID library kit [31,34–36] keep TSS's modified reference sequence, PCP is currently the sole option that bridges this bioinformatic gap, allowing universal variant calling of samples sequenced with the aforementioned library. ...
Article
Despite a multitude of methods for the sample preparation, sequencing and data analysis of mitochondrial DNA (mtDNA), the demand for innovation remains, particularly in comparison with nuclear DNA (nDNA) research. The Applied Biosystems™ Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific, USA) is an innovative library preparation kit suitable for degraded samples and low DNA input. However, its bioinformatic processing occurs in the enterprise Ion Torrent Suite™ Software (TSS), yielding BAM files aligned to an unorthodox version of the revised Cambridge Reference Sequence (rCRS), with a heteroplasmy threshold of 10%. Here, we present an alternative customizable pipeline, the PrecisionCallerPipeline (PCP), for processing samples sequenced on Ion Torrent with the Precision ID library kit and producing output in the correct rCRS. Using 18 samples (3 original samples and 15 mixtures) derived from the 1000 Genomes Project, we achieved overall improved performance metrics compared with the proprietary TSS, with optimal performance at a 2.5% heteroplasmy threshold. We further validated our findings with 50 samples from an ongoing independent cohort of stroke patients: PCP found 98.31% of TSS's variants (TSS found 57.92% of PCP's variants), and the levels of variants found by both pipelines were significantly correlated.
... Resolving and reporting heteroplasmy are important considerations because doing so improves consensus quality and downstream analyses (Rathbun et al., 2017). With this in mind, specific pipelines for mtDNA analysis in modern DNA samples have been developed (Guo et al., 2013; Calabrese et al., 2014; Vellarikkal et al., 2015; Weissensteiner et al., 2016a; Ishiya and Ueda, 2017; Rueda and Torkamani, 2017). In addition, Mutect2 (Benjamin et al., 2019), a variant caller implemented in the GATK package (RRID:SCR_001876) (McKenna et al., 2010) and initially developed for somatic mutation detection in tumor samples, has recently been adapted to call mtDNA variants. Mutect2 exposes a number of configurable parameters, including clipping of artifacts associated with end-repair insertions near inverted tandem repeats, which are prevalent when DNA is damaged. ...
Article
Ancient DNA (aDNA) studies frequently focus on the analysis of mitochondrial DNA (mtDNA), which is much more abundant than the nuclear genome and can therefore be retrieved more readily from ancient remains. However, postmortem DNA damage and contamination make data analysis difficult because of DNA fragmentation and nucleotide alterations. In this regard, assessing the heteroplasmic fraction in ancient mtDNA has always been considered an unachievable goal because of the difficulty of distinguishing true endogenous variants from artifacts. We implemented a computational pipeline for mtDNA analysis and applied it to a dataset of 30 ancient human samples from an Iron Age necropolis in Polizzello (Sicily, Italy). The pipeline includes several modules from well-established tools for aDNA analysis and a recently released variant caller specifically conceived for mtDNA, applied here for the first time to aDNA data. Through fine-tuned filtering on variant allele sequencing features, we were able to accurately reconstruct nearly complete (>88%) mtDNA genomes for almost all the analyzed samples (27 out of 30), depending on the degree of preservation and the sequencing throughput, and to obtain a reliable set of variants allowing haplogroup prediction. Additionally, we provide guidelines for dealing with possible artifact sources, including nuclear mitochondrial sequence (NumtS) contamination, an often-neglected issue in ancient mtDNA surveys. Potential heteroplasmy levels were also estimated and validated by data simulations, although most variants were likely homoplasmic, demonstrating that new sequencing technologies and software are sensitive enough to detect partially mutated sites in ancient genomes and to discriminate true variants from artifacts. A thorough functional annotation of the detected and filtered mtDNA variants was also performed for a comprehensive evaluation of these ancient samples.
... Rapid advances in high-throughput sequencing technologies in recent years have allowed more thorough characterization of organellar genomes, including new software aimed at identifying and describing heteroplasmy (e.g., Vellarikkal et al., 2015; Phan et al., 2019). This work has subsequently led to the identification of new cases of heteroplasmy and to interest in modeling heteroplasmy in an ontogenetic phylogeny context as it relates to human health (Wilton et al., 2018). ...
Article
The advent and advance of next-generation sequencing over the past two decades made it possible to accumulate large quantities of sequence reads that could be used to assemble complete or nearly complete organelle genomes (plastomes or mitogenomes). The result has been an explosive increase in the availability of organelle genome sequences, with over 4000 different species of green plants currently available on GenBank. During the same period, plant molecular biologists greatly enhanced the understanding of the structure, repair, replication, recombination, transcription and translation, and inheritance of organelle DNA. Unfortunately, many plant evolutionary biologists are unaware of or have overlooked this knowledge, resulting in the misrepresentation of several phenomena that are critical for phylogenetic and evolutionary studies using organelle genomes. We believe that confronting these misconceptions about organelle genome organization, composition, and inheritance will improve our understanding of the evolutionary processes that underlie organelle evolution. Here we discuss four misconceptions that can limit evolutionary biology studies and lead to inaccurate phylogenies and incorrect structures of the organellar DNA used to infer organelle evolution.
... The consortium has also undertaken international initiatives to derive the pharmacogenomic landscape in Malays [156] and Qatari populations [157,158], and to identify genetic variants of Arab, Middle East, and North African populations [159,160]. GUaRDIAN has also set up a systematic pipeline for next generation sequencing of the mitochondrial genome for clinical applications, called the mit-o-matic [80]. ...
Article
Full-text available
Home to a culturally heterogeneous population, India is also a melting pot of genetic diversity. The population architecture, characterized by multiple endogamous groups with specific marriage patterns, including the widely prevalent practice of consanguinity, not only makes the Indian population distinct from the rest of the world but also provides a unique niche for understanding genetic diseases. Centuries of genetic isolation of population groups have amplified founder effects, contributing to a high prevalence of recessive alleles, which translates into genetic diseases, including rare genetic diseases, in India. Rare genetic diseases are becoming a public health concern in India because a population of close to a billion people translates into a huge disease burden for even the rarest of rare diseases. Genomics-based approaches have been shown to accelerate the diagnosis of rare genetic diseases and reduce the socio-economic burden. The Genomics for Understanding Rare Diseases: India Alliance Network (GUaRDIAN) is dedicated to providing genomic solutions for rare diseases in India. The consortium aims to establish a unique collaborative framework for health care planning, implementation, and delivery in the specific area of rare genetic diseases. It is a nationwide collaborative research initiative addressing rare diseases across multiple cohorts, with over 240 clinician/scientist collaborators across 70 major medical/research centers. Within the GUaRDIAN framework, clinicians refer patients with rare diseases, whole-genome or exome datasets are generated, and the data are analyzed computationally to identify the causal pathogenic variants. The outcomes of GUaRDIAN are being translated into community services through a platform providing low-cost diagnostic assays in India.
In addition to GUaRDIAN, several genomic investigations of diseased and healthy populations are being undertaken in the country to address the rare disease dilemma. In summary, rare diseases contribute significantly to the disease burden in India. Genomics-based solutions can enable accelerated diagnosis and management of rare diseases. We discuss how a collaborative research initiative such as GUaRDIAN can provide a nationwide framework to serve the rare disease community of India.
... (McKenna et al., 2010), are unsuitable for detecting heteroplasmic variants in organellar genomes with highly variable copy numbers. MToolBox (Calabrese et al., 2014) and Mit-o-matic (Vellarikkal et al., 2015) are designed to study human mitochondrial variants. ...
Article
Although heteroplasmy has been studied extensively in animal systems, there is a lack of tools for analyzing, exploring and visualizing heteroplasmy at the genome-wide level in other taxonomic systems. We introduce icHET, which is a computational workflow that produces an interactive visualization that facilitates the exploration, analysis and discovery of heteroplasmy across multiple genomic samples. icHET works on short reads from multiple samples from any organism with an organellar reference genome (mitochondrial or plastid) and a nuclear reference genome. Availability and implementation: The software is available at https://github.com/vtphan/HeteroplasmyWorkflow. Supplementary information: Supplementary data are available at Bioinformatics online.
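The snippet above notes that nuclear-genome callers are unsuitable for heteroplasmic variants, and icHET is described as a workflow for exploring heteroplasmy. As a rough illustration of the underlying idea (not icHET's or mit-o-matic's actual method), the sketch below estimates a heteroplasmic fraction from per-base read counts at each mtDNA position and flags sites whose top non-reference allele falls between two thresholds. The `SiteCounts` type, the function name, the depth cutoff and the 3%/97% fraction thresholds are all hypothetical choices made for this example, and the read counts in the usage example are invented.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    position: int   # 1-based position on the mtDNA reference (e.g. rCRS)
    ref_base: str   # reference base at this position
    counts: dict    # base -> number of reads supporting that base


def heteroplasmy_calls(sites, min_depth=100, min_fraction=0.03, max_fraction=0.97):
    """Return (position, alt_base, fraction) for sites whose most frequent
    non-reference allele fraction lies within [min_fraction, max_fraction].
    All thresholds are illustrative assumptions, not published defaults."""
    calls = []
    for site in sites:
        depth = sum(site.counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate an allele fraction reliably
        alts = {b: n for b, n in site.counts.items() if b != site.ref_base}
        if not alts:
            continue  # only the reference base was observed
        alt_base, alt_reads = max(alts.items(), key=lambda kv: kv[1])
        fraction = alt_reads / depth
        # Fractions near 0 are likely noise; near 1 the site is homoplasmic.
        if min_fraction <= fraction <= max_fraction:
            calls.append((site.position, alt_base, round(fraction, 3)))
    return calls
```

With invented counts, `heteroplasmy_calls([SiteCounts(3243, "A", {"A": 700, "G": 300}), SiteCounts(11778, "G", {"A": 995, "G": 5})])` reports only position 3243 (fraction 0.3); the 11778 site exceeds the upper threshold and is treated as homoplasmic. Real tools additionally account for base and mapping quality, strand bias and nuclear mitochondrial segments (NUMTs).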
... Different tools such as MitoSeek, mit-o-matic (Vellarikkal et al., 2015) or MToolBox (C. Calabrese et al., 2014) ...
Article
The expanding use of exome sequencing (ES) in diagnosis generates a huge amount of data, including untargeted mitochondrial DNA (mtDNA) sequences. We developed a strategy to study ES data in depth, focusing on the mtDNA genome in a large nonspecific cohort, in order to increase diagnostic yield. A targeted bioinformatics pipeline assembled the mitochondrial genome from ES data to detect pathogenic mtDNA variants in parallel with the in-house nuclear exome pipeline. mtDNA data from off-target sequences (indirect sequencing) were extracted from the BAM files of 928 individuals with developmental and/or neurological anomalies. The mtDNA variants were filtered based on database information, cohort frequencies, haplogroups and protein consequences. Two homoplasmic pathogenic variants (m.9035T>C and m.11778G>A) were identified in 2/928 unrelated individuals (0.2%): the m.9035T>C (MT-ATP6) variant in a female with ataxia and the m.11778G>A (MT-ND4) variant in a male with a complex mosaic disorder and a severe ophthalmological phenotype, uncovering undiagnosed Leber's hereditary optic neuropathy (LHON). Seven secondary findings predisposing to deafness or LHON were also found in 7/928 individuals (0.75%). This study demonstrates the usefulness of including a targeted strategy in the ES pipeline to detect mtDNA variants, improving diagnostic and research results without resampling patients or performing targeted mtDNA strategies.
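The abstract above filters mtDNA variants by database information, cohort frequency, haplogroup and protein consequence. The sketch below shows one plausible way such a filter cascade could be ordered. It is not the authors' pipeline: the `MtVariant` fields, the status string `"confirmed_pathogenic"`, the 1% cohort-frequency cutoff and the decision to always retain confirmed pathogenic variants are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class MtVariant:
    name: str                # variant label, e.g. "m.11778G>A"
    db_status: str           # hypothetical clinical-database status string
    cohort_frequency: float  # fraction of the in-house cohort carrying it
    haplogroup_marker: bool  # True if it defines the sample's haplogroup
    consequence: str         # e.g. "missense", "synonymous"


def prioritize(variants, max_cohort_frequency=0.01):
    """Return names of variants surviving a simple, illustrative filter cascade."""
    kept = []
    for v in variants:
        if v.db_status == "confirmed_pathogenic":
            kept.append(v.name)  # assumption: known pathogenic variants bypass the filters
            continue
        if v.haplogroup_marker:
            continue  # haplogroup-defining variants are usually benign background
        if v.cohort_frequency > max_cohort_frequency:
            continue  # too common in the cohort to explain a rare phenotype
        if v.consequence == "synonymous":
            continue  # no predicted protein change
        kept.append(v.name)
    return kept
```

For example, given one confirmed pathogenic variant plus hypothetical variants that are common, haplogroup-defining, synonymous, or rare and missense, only the pathogenic variant and the rare missense candidate are returned. A real pipeline would also weigh heteroplasmy level and phenotype match.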
... For example, the higher prevalence of specific subclades of haplogroup J has been shown to modify the pathogenicity and penetrance of LHON (Brown et al., 2002; Ghelli et al., 2009; Caporali et al., 2017). Computing mitochondrial haplogroups from NGS data is relatively easy, as many bioinformatics tools (Tables 1C,D) have been developed based on the PhyloTree data (van Oven and Kayser, 2009), such as HaploGrep2 (Weissensteiner et al., 2016b), Mitomaster, or HmtDB (Clima et al., 2017), available on web servers, or integrated into all-in-one pipelines such as the MToolBox bioinformatics suite (Calabrese et al., 2014), MseqDR mvTool (Shen et al., 2018) and mit-o-matic (Vellarikkal et al., 2015). Conversely, the Phy-Mer software classifies haplogroups directly from FASTQ files, i.e., without prior alignment, avoiding mistakes caused by artifactual sequencing variants (Navarro-Gomez et al., 2015). ...
Article
The development of next-generation sequencing (NGS) has greatly enhanced the diagnosis of mitochondrial disorders, enabling systematic analysis of the whole mitochondrial DNA (mtDNA) sequence with better detection sensitivity. However, the exponential growth of sequencing data complicates the interpretation of the identified variants, posing new challenges for the molecular diagnosis of mitochondrial diseases. Indeed, mtDNA sequencing by NGS requires specific bioinformatics tools, and the adaptation of those developed for nuclear DNA, for the detection and quantification of mtDNA variants from sequence alignment through variant calling, in order to handle the specific features of the mitochondrial genome, including heteroplasmy, i.e., the coexistence of mutant and wild-type mtDNA copies. The prioritization of mtDNA variants remains difficult, relying on a limited number of specific resources: population and clinical databases, and in silico tools that predict variant pathogenicity. An evaluation of the most prominent bioinformatics tools showed that their ability to predict pathogenicity was highly variable, indicating that special efforts should be directed at developing new bioinformatics tools dedicated to the mitochondrial genome. In addition, massively parallel sequencing has raised several issues related to the interpretation of very low mtDNA mutational loads, the discovery of variants of unknown significance, mutations unrelated to the patient's phenotype, and the co-occurrence of mtDNA variants. This review provides an overview of current strategies and bioinformatics tools for accurate annotation, prioritization and reporting of mtDNA variants from NGS data, in order to support accurate genetic counseling for individuals with primary mitochondrial diseases.
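The snippet above notes that haplogroup tools such as HaploGrep2 classify samples against PhyloTree. A minimal sketch of the idea follows: compare a sample's observed variants with the expected marker set of each candidate haplogroup and pick the best-scoring one. It uses a simplified, unweighted Kulczynski-style score; HaploGrep's published scoring additionally weights markers, so this is not its algorithm. The haplogroup names and marker strings in the example are hypothetical placeholders, not real PhyloTree definitions.

```python
def kulczynski_score(observed, expected):
    """Unweighted Kulczynski-style similarity: the mean of the fraction of
    expected markers observed and the fraction of observed variants expected."""
    observed, expected = set(observed), set(expected)
    if not observed or not expected:
        return 0.0
    shared = len(observed & expected)
    return 0.5 * (shared / len(expected) + shared / len(observed))


def best_haplogroup(observed, tree):
    """tree maps a haplogroup name to the full set of markers expected on the
    path from the root to that haplogroup. Returns (name, rounded score)."""
    scored = {name: kulczynski_score(observed, markers) for name, markers in tree.items()}
    best = max(scored, key=scored.get)
    return best, round(scored[best], 3)


# Hypothetical toy tree: HG1a is a subclade of HG1 with one extra marker.
toy_tree = {
    "HG1": {"100A", "200G"},
    "HG1a": {"100A", "200G", "300T"},
    "HG2": {"400C", "500T"},
}
```

With `observed = {"100A", "200G", "300T", "999C"}`, HG1 scores 0.75 and HG1a scores 0.875, so the more specific subclade wins; the unexplained private variant `999C` lowers both scores equally.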
... However, with the exception of the newly published server by Weissensteiner et al. [15], there are few (or no) services amenable to non-bioinformaticians that handle mitochondrial data appropriately. Thus, when a researcher performs a WES/WGS analysis that yields a negative result and would like to extend the analysis to the mitochondrial genome, he or she must survey the available Linux command-line tools (e.g., MToolBox [16], MitoSeek [17], mit-o-matic [18]; note that MitoBamAnnotator [19] is no longer available) and choose among them. In our experience, comparing these tools is far from trivial, and we believe this creates a barrier to the systematic analysis of mtDNA variants, especially for labs that lack the willingness or the expertise. ...
Article
Background: Whole-genome and exome sequencing data usually include reads containing mitochondrial DNA (mtDNA). Yet state-of-the-art pipelines and services for human nuclear genome variant calling and annotation do not handle mitochondrial genome data appropriately. As a consequence, any researcher wishing to add mtDNA variant analysis to their investigations is forced to search the literature for mtDNA pipelines, evaluate them, and implement their own instance of the desired tool. This task is far from trivial and can be prohibitive for non-bioinformaticians.
Results: We have developed SG-ADVISER mtDNA, a web server to facilitate the analysis and interpretation of mtDNA genomic data from next-generation sequencing (NGS) experiments. The server was built in the context of our SG-ADVISER framework and on top of the MToolBox platform (Calabrese et al., Bioinformatics 30(21):3115–3117, 2014), and includes most of its functionalities (i.e., assembly of mitochondrial genomes, heteroplasmic fractions, haplogroup assignment, and functional and prioritization analysis of mitochondrial variants) as well as back-end and front-end interfaces. The server has been tested with unpublished data from 200 individuals of a healthy aging cohort (Erikson et al., Cell 165(4):1002–1011, 2016), and their data are made publicly available here along with a preliminary analysis of the variants. We observed that individuals over ~90 years old carried low levels of heteroplasmic variants in their genomes.
Conclusions: SG-ADVISER mtDNA is a fast and functional tool that allows variant calling and annotation of human mtDNA data from NGS experiments. The server was built with simplicity in mind and draws on our own experience in interpreting mtDNA variants in the context of sudden death and rare diseases. Our objective is to provide an interface for non-bioinformaticians aiming to acquire (or compare) mtDNA annotations via MToolBox. The SG-ADVISER web server is freely available to all users at https://genomics.scripps.edu/mtdna.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1778-6) contains supplementary material, which is available to authorized users.
... Reasons for that were ascribed to intrinsic differences in computational/statistical methods and reference databases, or in training datasets and alignment algorithms [2][3][4][5][6][7]. These facts drove the development of so-called aggregators or meta-predictors, namely software packages that evaluate pathogenicity based on the outcomes of other reference predictors, as well as databases of nuclear and mitochondrial precomputed predictions [9][10][11][12]. Even these gave contrasting results [2]. ...
Article
Author summary: The mitochondrion is an organelle floating in the cytoplasm of almost all eukaryotic cells. Its primary function is to generate energy. It contains its own DNA (mtDNA), which is inherited maternally in many organisms. This DNA is highly susceptible to mutations since it lacks the robust DNA repair mechanisms of nuclear DNA. Mutations in mtDNA have been associated with several inherited and acquired mitochondrial diseases, including Alzheimer's and Parkinson's diseases, and cancer. Establishing a causal link between a mutation and a disease is an onerous task. It requires substantial laboratory skills and equipment and, often, an animal facility, which not every laboratory has. Increasingly, researchers fall back on software solutions that rely on structural and functional characteristics of proteins to predict the putative harmfulness of a mutation. Many have been implemented and tested on nuclear proteins, but only a few have been fine-tuned to the “neglected genome”. Our work not only presents APOGEE, a machine-learning-based predictor that outperforms all existing predictors in reliability and sensitivity, but also makes APOGEE’s predictions for all mitochondrial missense mutations freely available in MitImpact.
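The snippet above describes aggregators or meta-predictors that combine the outputs of several pathogenicity predictors, and notes that these can still disagree. The simplest possible aggregator is a majority vote, sketched below to show why conflicting inputs often end up as "uncertain". This is not how APOGEE works (APOGEE is machine-learning-based); the predictor names, label strings and the strict-majority rule are hypothetical assumptions for illustration.

```python
from collections import Counter


def consensus_call(predictions, min_agreement=0.5):
    """predictions maps a predictor name to 'pathogenic', 'benign', or None
    (no prediction). Returns the majority label if it is supported by more
    than min_agreement of the available votes, otherwise 'uncertain'."""
    votes = [label for label in predictions.values() if label is not None]
    if not votes:
        return "uncertain"  # no predictor produced a call
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) > min_agreement:
        return label
    return "uncertain"  # ties or weak majorities are not resolved
```

For instance, two "pathogenic" votes against one "benign" vote yield "pathogenic", while an even split yields "uncertain". Learned meta-predictors instead weight each input by its estimated reliability, which is one motivation for tools such as APOGEE.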
... For instance, MitoBamAnnotator (Zhidkov et al., 2011) assesses the functional potential of heteroplasmy. mit-o-matic (Vellarikkal et al., 2015) is another web-based pipeline for clinical annotation of mtDNA variants. However, these tools are limited by the requirement to upload files to their servers. ...
Article
Recent rapid advances in high-throughput, next-generation sequencing (NGS) technologies have promoted mitochondrial genome studies in the fields of human evolution, medical genetics, and forensic casework. However, scientists unfamiliar with computer programming often find it difficult to handle the massive volumes of data that are generated by NGS. To address this limitation, we developed MitoSuite, a user-friendly graphical tool for analysis of data from high-throughput sequencing of the human mitochondrial genome. MitoSuite generates a visual report on NGS data with simple mouse operations. Moreover, it analyzes high-coverage sequencing data but runs on a stand-alone computer, without the need for file upload. Therefore, MitoSuite offers outstanding usability for handling massive NGS data, and is ideal for evolutionary, clinical, and forensic studies on the human mitochondrial genome variations. It is freely available for download from the website https://mitosuite.com .
... This has led to the discovery of a number of nuclear genes which could potentially modulate mitochondrial dysfunction leading to disease processes. The advent of genomic technologies has also left its mark in characterizing human mitochondrial variations (Maitra et al., 2004;Vellarikkal et al., 2015). Apart from genomic variations, epigenetic variations have been one of the emerging fields in biology. ...
Article
Epigenetic modifications of the nuclear genome have been well studied, and it is established that these modifications play a key role in nuclear gene expression. However, mitochondrial epigenetic modifications have not been examined in detail. Recent advances in genome analysis tools and techniques have made it possible to investigate mitochondrial epigenetic modifications at higher resolution, and studies have indicated a regulatory role for the mitochondrial epigenome. The association of mitochondrial DNA methylation with various disease conditions, drug treatment, aging, exposure to environmental pollutants, etc. has lent credence to this view. Herein, we review studies on mitochondrial epigenetic modifications with a focus on understanding their regulatory role in gene expression and disease association.
Chapter
The inference of mitochondrial haplogroups is an important step in forensic analysis of DNA samples collected at a crime scene. In this paper we introduce efficient inference algorithms based on the Jaccard similarity between variants called from high-throughput sequencing data of such DNA samples and mutations collected in public databases such as PhyloTree. Experimental results on real and simulated datasets show that our mutation analysis methods achieve accuracy comparable to that of state-of-the-art methods based on haplogroup frequency estimation, for both single-individual samples and two-individual mixtures, with a much lower running time.
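To make the Jaccard-based idea concrete, here is a minimal Python sketch, assuming each haplogroup is represented by a set of mutation labels and the sample by its set of called variants; it returns the haplogroup with the highest similarity |A ∩ B| / |A ∪ B|. The names `jaccard` and `best_haplogroup` and the toy mutation sets are hypothetical illustrations, not the chapter's algorithm or real PhyloTree data.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def best_haplogroup(sample: set[str], haplogroups: dict[str, set[str]]) -> tuple[str, float]:
    """Return the haplogroup whose mutation set is most Jaccard-similar to the sample."""
    best = max(haplogroups, key=lambda name: jaccard(sample, haplogroups[name]))
    return best, jaccard(sample, haplogroups[best])


# Hypothetical toy mutation sets (NOT real PhyloTree definitions).
toy_tree = {
    "hgA": {"A100G", "C200T", "T300C"},
    "hgB": {"A100G", "G400A"},
}
called = {"A100G", "C200T", "T300C", "G500A"}
print(best_haplogroup(called, toy_tree))  # ('hgA', 0.75)
```

In a real tree, a haplogroup's mutation set would usually be the cumulative list of mutations along its path from the root, so nested haplogroups score differently against the same sample.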
Article
Mutations in the human mitochondrial genome have been observed in all types of human cancer, indicating that the mutations might contribute to tumorigenesis, metastasis, recurrence, or drug response. This possibility is appealing because of the known shift from oxidative metabolism to glycolysis, known as Warburg effect, that occurs in malignancy. Mitochondrial DNA (mtDNA) mutations could either be maternally inherited and predispose to cancer (germline mutations), or occur sporadically in the mtDNA of specific tissues (tissue- or tumor-specific somatic mutations) and contribute to the tumor initiation and progression process. High-throughput sequencing technologies now enable comprehensive detection of mtDNA variation in tissues and bodily fluids, with the potential to be used as an early-detection tool that may impact the treatment of cancer. Here, we discuss insights into the roles of mtDNA mutations in carcinogenesis, highlighting the complexities involved in the analysis and interpretation of mitochondrial genomic content, technical challenges in studying its contribution to pathogenesis, and the value of mtDNA mutations in developing early detection, diagnosis, prognosis, and therapeutic strategies for cancer.
Thesis
The advent of high-throughput exome sequencing (ES) in diagnostics and research in recent years has led to the identification of the genetic basis of many Mendelian disorders, ending the diagnostic odyssey in many cases. Nevertheless, analysis of ES data identifies pathogenic or likely pathogenic variants in only 30 to 45% of undiagnosed cases. Limitations exist at the clinical, molecular and bioinformatic levels. The constant evolution of clinical knowledge, of the number of new genes implicated in human disease, and of clinico-biological correlations has a major impact on data analysis, leading to a progressive improvement in diagnostic yield. Technical limitations inherent to the technology, in particular regions that are not covered, exist but have also been significantly reduced in recent years. Finally, beyond the analysis of SNVs and CNVs, other genetic anomalies can cause rare diseases, requiring bioinformatic development to optimize results. Although genome sequencing can solve some cases, in particular when variants lie in non-coding regions or are structural variants, much information remains to be extracted and exploited from ES data. The aim of this thesis was therefore to help improve bioinformatic approaches to ES data analysis in order to identify new genes or molecular mechanisms involved in rare genetic diseases and thereby shorten patients' diagnostic odyssey. Several strategies were implemented. The first was a research reanalysis of data from 80 patients who had undergone ES at the CERBA laboratory (CIFRE thesis) and whose diagnostic interpretation was negative.
It identified two new candidate genes in syndromic intellectual disability, including OTUD7A (article 1). The second strategy was the development of a bioinformatic pipeline to extract mitochondrial genome data from ES data. Mitochondrial DNA is not targeted by exome capture kits, but it can be recovered from indirectly captured reads, making its analysis possible from pre-existing ES data. In the GAD collection of exomes from undiagnosed patients, two causal variants were identified in two individuals with neurodevelopmental disorders among the 928 individuals studied, ending the diagnostic odyssey for 0.2% of these undiagnosed patients (article 2). The third strategy was the implementation of a bioinformatic pipeline to identify mobile elements in exome data, given that approximately 0.3% of pathogenic variants in the human genome are expected to arise from de novo insertion of a mobile element. In the GAD collection of 3322 exomes from undiagnosed patients, this step identified two cases linked to the insertion of an Alu element into an exon of FERMT1 and of GRIN2B (article 3, in preparation). This thesis has pushed back some of the limits of exome technology. Other avenues exist and are being explored by the team in connection with the European Solve-RD project.
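The mtDNA extraction step can be illustrated with a small sketch that filters SAM-format alignment lines down to the mitochondrial contig. It assumes plain-text SAM input, a contig named `chrM` or `MT` (the name depends on the reference build), and a hypothetical MAPQ cutoff intended to reduce reads mismapped from nuclear mitochondrial segments; it is not the thesis pipeline. In practice, with a coordinate-sorted, indexed BAM, a similar selection is usually made with `samtools view -b -F 4 -q 20 in.bam chrM`.

```python
MITO_CONTIGS = {"chrM", "MT"}  # contig name depends on the reference build


def mito_records(sam_lines, min_mapq=20):
    """Yield split SAM records mapped to the mitochondrial contig with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4:  # FLAG bit 0x4: segment unmapped
            continue
        if rname in MITO_CONTIGS and mapq >= min_mapq:
            yield fields


sam = [
    "@SQ\tSN:chrM\tLN:16569",
    "r1\t0\tchrM\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",   # kept
    "r2\t0\tchr1\t500\t60\t4M\t*\t0\t0\tACGT\tIIII",   # nuclear contig
    "r3\t4\tchrM\t100\t0\t*\t*\t0\t0\tACGT\tIIII",     # unmapped
    "r4\t16\tchrM\t200\t10\t4M\t*\t0\t0\tACGT\tIIII",  # MAPQ below cutoff
]
print([f[0] for f in mito_records(sam)])  # ['r1']
```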
Chapter
Organelles play an important role in eukaryotic cells. Among them, chloroplasts and mitochondria are responsible for the critical functions of photosynthesis and aerobic respiration, respectively. Organellar genomes are also very important for plant systematic studies. Here we describe methods for isolating mitochondrial and plastid DNA and subsequently sequencing it with NGS technology. We also discuss in detail the various tools available for assembly, annotation, and visualization of organelle genome sequences.
Article
Background: Mitochondrial heteroplasmy, the coexistence of different mitochondrial polymorphisms within an individual, has various forensic and clinical implications. However, there is still no guideline on applying massively parallel sequencing (MPS) to heteroplasmy detection. We present here some critical issues that should be considered in heteroplasmy studies using MPS. Methods: From five samples with known innate heteroplasmies, two pairs of mixtures were generated to create artificial heteroplasmies with target minor allele frequencies (MAFs) ranging from 50% to 1%. Each sample was amplified by a two-amplicon method and sequenced on the Ion Torrent system. The outputs of two analysis tools, Torrent Suite Variant Caller (TVC) and mtDNA-Server (mDS), were compared. Results: All the innate heteroplasmies were detected correctly by both tools. Average MAFs of the artificial heteroplasmies correlated well with the target values. Detection rates were almost 90% for high-level heteroplasmies but decreased for low-level heteroplasmies. TVC generally showed lower detection rates than mDS, apparently because of TVC's own computational algorithm, which drops some reference-dominant heteroplasmies. Meanwhile, mDS reported several unintended low-level heteroplasmies that were suggested to derive from nuclear mitochondrial DNA sequences. The average coverage depth varied considerably among samples placed on the same chip, and increasing coverage depth had no effect on detection rates. Conclusion: Beyond confirming the general accuracy of MPS for detecting heteroplasmy, our study indicates that understanding the nature of mitochondrial DNA and of the analysis algorithms is crucial for appropriate interpretation of MPS results.
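The minor allele frequency (MAF) that these comparisons rely on can be computed per site from base counts. The sketch below, assuming pre-tallied base counts at a single mtDNA position, reports the second most frequent base and its fraction of total depth, and flags the site as heteroplasmic above hypothetical `min_maf` and `min_depth` thresholds; it does not reproduce the TVC or mtDNA-Server models.

```python
from collections import Counter


def minor_allele_frequency(base_counts: dict[str, int]) -> tuple[str, float]:
    """Return the second most frequent base and its fraction of total depth."""
    depth = sum(base_counts.values())
    ranked = Counter(base_counts).most_common()
    if depth == 0 or len(ranked) < 2:
        return "", 0.0
    minor_base, minor_count = ranked[1]
    return minor_base, minor_count / depth


def is_heteroplasmic(base_counts: dict[str, int], min_maf: float = 0.01, min_depth: int = 100) -> bool:
    """Flag a site whose minor allele reaches min_maf at adequate depth (hypothetical thresholds)."""
    depth = sum(base_counts.values())
    _, maf = minor_allele_frequency(base_counts)
    return depth >= min_depth and maf >= min_maf


print(minor_allele_frequency({"A": 950, "G": 50}))  # ('G', 0.05)
print(is_heteroplasmic({"A": 995, "G": 5}))  # False: MAF 0.005 is below 0.01
```

Requiring a minimum depth as well as a MAF threshold reflects the point above that low-level heteroplasmies are the hardest to call reliably.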
Article
Mitochondrial myopathy, encephalopathy, lactic acidosis, and stroke-like episodes (MELAS) is a condition that affects many parts of the body, particularly the brain and muscles. This study examined a Korean patient with a MELAS-like syndrome presenting with seizures, stroke-like episodes, and optic atrophy. Targeted sequencing of the whole mtDNA and 73 nuclear genes identified compound heterozygous mutations p.R205X and p.L255P in FASTKD2. Each of his unaffected parents carries one of the two mutations, and neither mutation was found in 302 controls. FASTKD2 encodes FAS-activated serine-threonine (FAST) kinase domain 2, which localizes to the mitochondrial inner compartment. A FASTKD2 nonsense mutation was previously reported as the cause of a recessive infantile mitochondrial encephalomyopathy. The present case showed relatively mild symptoms and a later age of onset than the previous patient with a FASTKD2 mutation, suggesting inter-allelic clinical heterogeneity. Because this study is the second report of an autosomal recessive mitochondrial encephalomyopathy patient with a FASTKD2 mutation, it extends the phenotypic spectrum of FASTKD2 mutations.
Article
Over 300 million people are affected by about 7000 rare diseases globally. There are tremendous resource limitations and challenges in driving research and drug development for rare diseases. Hence, innovative approaches are needed to identify potential solutions. This review focuses on the resources developed over the past years for analysis of genome data towards understanding disease biology, especially in the context of mitochondrial diseases, given that mitochondria are central to major cellular pathways and their dysfunction leads to a broad spectrum of diseases. Platforms for collaboration among research groups, clinicians and patients, and the advantages of community collaborative efforts in addressing rare diseases, are also discussed. The review also describes crowdsourcing and crowdfunding efforts in rare disease research and how upcoming initiatives for understanding disease biology, including analyses of large numbers of genomes, are also applicable to rare diseases.
Chapter
Mitochondria generate almost all of the energy that cells need to grow and sustain life; hence mitochondrial dysfunction may lead to severe disease phenotypes. The dysfunction may arise from mutations in mitochondrial DNA (mtDNA), nuclear DNA (nDNA), or both, and manifests as a broad spectrum of disease phenotypes depending on the mutation load. The prevalence of mitochondrial diseases is 1 in 5000, and evidence-based clinical practice protocols have been proposed for their diagnosis and treatment. Next-generation sequencing (NGS) is being recommended as the first-line test to provide a more accurate diagnosis of mitochondrial diseases. This chapter focuses on advancements, advantages, limitations, and future directions of next-generation molecular methods for mitochondrial disease diagnosis and also briefly describes the ongoing translational efforts to systematically address the problem of mitochondrial diseases through focused programs and consortia-based approaches.
Article
Next generation sequencing (NGS) allows investigating mitochondrial DNA (mtDNA) characteristics such as heteroplasmy (i.e. intra-individual sequence variation) to a higher level of detail. While several pipelines for analyzing heteroplasmies exist, issues in usability, accuracy of results and interpreting final data limit their usage. Here we present mtDNA-Server, a scalable web server for the analysis of mtDNA studies of any size with a special focus on usability as well as reliable identification and quantification of heteroplasmic variants. The mtDNA-Server workflow includes parallel read alignment, heteroplasmy detection, artefact or contamination identification, variant annotation as well as several quality control metrics, often neglected in current mtDNA NGS studies. All computational steps are parallelized with Hadoop MapReduce and executed graphically with Cloudgene. We validated the underlying heteroplasmy and contamination detection model by generating four artificial sample mix-ups on two different NGS devices. Our evaluation data shows that mtDNA-Server detects heteroplasmies and artificial recombinations down to the 1% level with perfect specificity and outperforms existing approaches regarding sensitivity. mtDNA-Server is currently able to analyze the 1000G Phase 3 data (n = 2,504) in less than 5 h and is freely accessible at https://mtdna-server.uibk.ac.at.
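The map/reduce decomposition mentioned above can be shown in plain Python without Hadoop: each read is mapped to per-position base counts, and partial tables are merged by a reduce step. This is a simplified sketch, assuming ungapped reads given as (1-based start, sequence) pairs, with no CIGAR handling, quality filtering, or strand information; it is not mtDNA-Server's code.

```python
from collections import Counter
from functools import reduce


def map_read(read: tuple[int, str]) -> Counter:
    """Map step: count (position, base) pairs for one ungapped aligned read."""
    start, seq = read  # start is the 1-based reference position of the first base
    return Counter((start + i, base) for i, base in enumerate(seq))


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge two partial count tables."""
    return a + b


reads = [(1, "ACG"), (2, "CGT"), (2, "CAT")]
pileup = reduce(reduce_counts, map(map_read, reads), Counter())
print(pileup[(3, "G")], pileup[(3, "A")])  # 2 1
```

Because the merge is associative, partial counts from different workers can be combined in any order, which is what makes the pileup step straightforward to parallelize.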
Article
Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole exome and whole genome sequencing (WXS and WGS) studies has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this end we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, and also includes a prioritization analysis of detected variants. MToolBox provides a Variant Call Format (VCF) file featuring, for the first time, allele-specific heteroplasmy, as well as annotation files with prioritized variants. MToolBox was tested on simulated samples and applied to 1000 Genomes WXS datasets. Availability: MToolBox package is available at https://sourceforge.net/projects/mtoolbox/. Supplementary information: Further supplementary data are available at Bioinformatics online. Supplementary VCF, simulation and annotation files are available at https://sourceforge.net/projects/mtoolbox/files/.
Article
It has been shown that mitochondrial DNA (mtDNA) mutations may play an important role in the development of cardiomyopathy, and various types of cardiomyopathy can be attributed to disturbed mitochondrial oxidative energy metabolism. Several studies have described many mutations in mitochondrial genes encoding subunits of respiratory chain complexes. Thus, recent studies confirm that pathologic mtDNA mutations are a major cause of disease, and identifying them by next-generation sequencing will improve our understanding of dysregulation of heart development. To analyse mtDNA mutations, the entire mtDNA was amplified in two overlapping polymerase chain reaction fragments from the cardiac tissue of 22 patients with congenital heart disease undergoing cardiac surgery, and then deep sequenced by next-generation sequencing. A total of 13 novel mtDNA mutations were identified in nine patients; three of these patients carried novel mutations together with reported cardiomyopathy mutations. In all, 65 mutations were found, 13 of which were unreported. This study represents the most comprehensive mtDNA mutational analysis in patients with congenital heart disease.
Article
The clinical features of mitochondrial disease are complex and highly variable, leading to challenges in establishing a specific diagnosis. Despite being one of the most commonly occurring inherited genetic diseases with an incidence of 1/5000, ~90% of these complex patients remain without a DNA-based diagnosis. We report our efforts to identify the pathogenetic cause for a patient with typical features of mitochondrial disease including infantile cataracts, CPEO, ptosis, progressive distal muscle weakness, and ataxia who carried a diagnosis of mitochondrial disease for over a decade. Whole exome sequencing and bioinformatic analysis of these data were conducted on the proband. Exome sequencing studies showed a homozygous splice site mutation in SETX, which is known to cause Spinocerebellar Ataxia, Autosomal Recessive 1 (SCAR1). Additionally a missense mutation was identified in a highly conserved position of the OCRL gene, which causes Lowe Syndrome and Dent Disease 2. This patient's complex phenotype reflects a complex genetic etiology in which no single gene explained the complete clinical presentation. These genetic studies reveal that this patient does not have mitochondrial disease but rather a genocopy caused by more than one mutant locus. This study demonstrates the benefit of exome sequencing in providing molecular diagnosis to individuals with complex clinical presentations.
Article
Bowtie 1 is a fast and memory-efficient program for aligning short reads to mammalian genomes. Burrows-Wheeler indexing allows Bowtie to align more than 25 million 35-bp reads per CPU hour to the human genome with a memory footprint as small as 1.1 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a quality-aware search algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve greater alignment speed. Bowtie is free, open source software available for download from http://bowtie.cbcb.umd.edu . The Burrows-Wheeler Transformation of a text T, BWT(T), is constructed as follows. The Burrows-Wheeler Matrix of T is the matrix whose rows are all distinct cyclic rotations of T$ sorted lexicographically ($ is "less than" all other characters). BWT(T) is the sequence of characters in the last column of this matrix.
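The construction described above translates almost directly into code. This sketch builds the Burrows-Wheeler Matrix by sorting all rotations of T$ and reads off the last column; it assumes `$` does not occur in the text and sorts below every other character (true for letters and digits in ASCII). It is quadratic in memory and time, so it is for illustration only; production indexers use far more efficient suffix-sorting algorithms.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform via the sorted-rotation matrix (illustration only)."""
    t = text + sentinel
    # Each row of the Burrows-Wheeler Matrix is one cyclic rotation of T$.
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    # BWT(T) is the last column of the sorted matrix.
    return "".join(row[-1] for row in rotations)


print(bwt("abaaba"))  # abba$aa
```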
Article
Human mitochondrial DNA (mtDNA) encodes a set of 37 genes which are essential structural and functional components of the electron transport chain. Variations in these genes have been implicated in a broad spectrum of diseases and are extensively reported in the literature and in various databases. In this study, we describe MitoLSDB, an integrated platform to catalogue disease association studies on mtDNA (http://mitolsdb.igib.res.in). The main goal of MitoLSDB is to provide a central platform for direct submission of novel variants that can be curated by the mitochondrial research community. MitoLSDB provides access to standardized and annotated data from the literature and databases, encompassing information from 5231 individuals, 675 populations and 27 phenotypes. This platform is developed using the Leiden Open (source) Variation Database (LOVD) software. MitoLSDB houses information on all 37 genes in each population, amounting to 132397 variants, of which 5147 are unique. For each variant, its genomic location according to the Revised Cambridge Reference Sequence, the codon and amino acid change for variations in protein-coding regions, frequency, disease/phenotype, population, reference and remarks are also listed. MitoLSDB curators have also reported errors documented in the literature, including 94 phantom mutations, 10 NUMTs, six documentation errors and one artefactual recombination. MitoLSDB is the largest repository of mtDNA variants systematically standardized and presented using the LOVD platform. We believe that this is a good starting resource to curate mtDNA variants and will facilitate direct submissions, enhancing data coverage, annotation in the context of pathogenesis, and quality control by ensuring non-redundancy in reporting novel disease-associated variants.
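Reporting the codon and amino acid change for protein-coding variants requires the vertebrate mitochondrial genetic code rather than the standard one (for example, TGA encodes Trp, ATA encodes Met, and AGA/AGG are stops). The sketch below, a hypothetical helper not taken from MitoLSDB or LOVD, encodes NCBI translation table 2 and compares reference and alternate codons. It assumes the codon is already given in the gene's coding orientation, so variants in genes encoded on the opposite strand (such as MT-ND6) would need reverse-complementing first.

```python
BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial code); codons ordered TTT, TTC, TTA, TTG, TCT, ...
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def translate_codon(codon: str) -> str:
    """Translate one DNA codon with the vertebrate mitochondrial code ('*' = stop)."""
    i, j, k = (BASES.index(b) for b in codon.upper())
    return VERT_MITO_AA[16 * i + 4 * j + k]


def aa_change(ref_codon: str, codon_pos: int, alt_base: str) -> tuple[str, str]:
    """Return (reference_aa, alternate_aa) for a substitution at 0-based codon_pos."""
    alt_codon = ref_codon[:codon_pos] + alt_base + ref_codon[codon_pos + 1:]
    return translate_codon(ref_codon), translate_codon(alt_codon)


print(aa_change("CTG", 1, "G"))  # ('L', 'R')
print(translate_codon("TGA"), translate_codon("AGA"))  # W *
```

Here a T>G change at the middle position of a CTG codon is reported as Leu to Arg; under the standard code TGA would instead have been read as a stop.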
Article
Motivation: Exome capture kits have capture efficiencies that range from 40 to 60%. A significant amount of off-target reads are from the mitochondrial genome. These unintentionally sequenced mitochondrial reads provide unique opportunities to study the mitochondrial genome. Results: MitoSeek is an open-source software tool that can reliably and easily extract mitochondrial genome information from exome and whole genome sequencing data. MitoSeek evaluates mitochondrial genome alignment quality, estimates relative mitochondrial copy numbers and detects heteroplasmy, somatic mutations and structural variants of the mitochondrial genome. MitoSeek can be set up to run in parallel or serially on large exome sequencing datasets. Availability: https://github.com/riverlee/MitoSeek
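Relative mitochondrial copy number is commonly approximated from coverage as ploidy times the ratio of mean mtDNA depth to mean nuclear depth. The sketch below implements that common approximation under the stated assumption of a diploid nuclear genome; MitoSeek's own estimator may differ, and with exome data, where mtDNA reads are off-target, the values are best treated as relative within a batch rather than as absolute copies per cell. The function name is hypothetical.

```python
def relative_mtdna_copy_number(mt_mean_depth: float, nuclear_mean_depth: float, ploidy: int = 2) -> float:
    """Approximate mtDNA copies per cell as ploidy * mean mtDNA depth / mean nuclear depth."""
    if nuclear_mean_depth <= 0:
        raise ValueError("nuclear_mean_depth must be positive")
    return ploidy * mt_mean_depth / nuclear_mean_depth


print(relative_mtdna_copy_number(2000.0, 40.0))  # 100.0
```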
Article
Purpose: The application of massively parallel sequencing technology to the analysis of the mitochondrial genome has demonstrated great improvement in the molecular diagnosis of mitochondrial DNA-related disorders. The objective of this study was to investigate the performance characteristics and to gain new insights into the analysis of the mitochondrial genome. Methods: The entire mitochondrial genome was analyzed as a single amplicon using a long-range PCR-based enrichment approach coupled with massively parallel sequencing. Interference from nuclear mitochondrial DNA homologs was distinguished from the actual mitochondrial DNA sequences by comparison with the results obtained from conventional PCR-based Sanger sequencing using multiple pairs of primers. Results: Our results demonstrated uniform coverage of the entire mitochondrial genome. Massively parallel sequencing of the single amplicon revealed the presence of single-nucleotide polymorphisms and nuclear homologs of mtDNA sequences that caused erroneous variant calls when the PCR/Sanger sequencing approach was used. This single-amplicon massively parallel sequencing strategy provides accurate quantification of mutation heteroplasmy as well as detection and mapping of mitochondrial DNA deletions. Conclusion: The ability to quantitatively and qualitatively evaluate every single base of the entire mitochondrial genome is indispensable to the accurate molecular diagnosis and genetic counseling of mitochondrial DNA-related disorders. This new approach may be considered as first-line testing for comprehensive analysis of the mitochondrial genome. Genet Med 2013;15(5):388-394.
Article
Author Summary: This manuscript details a novel algorithm to evaluate high-throughput DNA sequence data from whole mitochondrial genomes purified from genomic DNA, which also contains multiple fragmented nuclear copies of mtgenomes (numts). Forty samples were selected from two distinct reference (HapMap) populations of African (YRI) and European (CEU) origin. While previous technologies did not allow the assessment of individual mitochondrial molecules, next-generation sequencing technology is an excellent tool for obtaining the mtgenome sequence and its heteroplasmic sites rapidly and accurately through deep coverage of the genome. The computational techniques presented optimize reference-based alignments and introduce a new de novo assembly method. An important contribution of our study was the high accuracy of the resulting base calls, which we achieved by quantitatively filtering error-prone reads. In addition, several sites were experimentally validated, and our method shows a strong correlation (R2 = 0.96) with the NIST standard reference sample for heteroplasmy. Overall, our findings indicate that one can now confidently genotype mtDNA variants using next-generation sequencing data and reveal low levels of heteroplasmy (>10%). Beyond enriching our understanding of the pathology of certain diseases, this development could be considered a prelude to sequence-based individualized medicine for the mtgenome.
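The summary does not spell out how error-prone reads were filtered. As a swapped-in illustration only, the sketch below uses a common expected-errors criterion: each Phred score Q is converted to an error probability 10^(-Q/10), the probabilities are summed over the read, and reads above a hypothetical threshold are discarded. It assumes Phred+33 quality strings, and the function names are hypothetical.

```python
def expected_errors(qual: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities 10^(-Q/10) for a Phred+33 quality string."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def keep_read(qual: str, max_expected_errors: float = 1.0) -> bool:
    """Keep a read only if its expected error count is at most the (hypothetical) threshold."""
    return expected_errors(qual) <= max_expected_errors


print(round(expected_errors("I" * 10), 4))  # 0.001: ten Q40 bases at 1e-4 each
print(keep_read("+" * 20))  # False: twenty Q10 bases give about 2.0 expected errors
```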
Article
Next-generation DNA sequencing promises to revolutionize clinical medicine and basic research. However, while this technology has the capacity to generate hundreds of billions of nucleotides of DNA sequence in a single experiment, the error rate of ∼1% results in hundreds of millions of sequencing mistakes. These scattered errors can be tolerated in some applications but become extremely problematic when "deep sequencing" genetically heterogeneous mixtures, such as tumors or mixed microbial populations. To overcome limitations in sequencing accuracy, we have developed a method termed Duplex Sequencing. This approach greatly reduces errors by independently tagging and sequencing each of the two strands of a DNA duplex. As the two strands are complementary, true mutations are found at the same position in both strands. In contrast, PCR or sequencing errors result in mutations in only one strand and can thus be discounted as technical error. We determine that Duplex Sequencing has a theoretical background error rate of less than one artifactual mutation per billion nucleotides sequenced. In addition, we establish that detection of mutations present in only one of the two strands of duplex DNA can be used to identify sites of DNA damage. We apply the method to directly assess the frequency and pattern of random mutations in mitochondrial DNA from human cells.
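The key idea of Duplex Sequencing, keeping a variant only when both strands agree, can be sketched as a position-by-position comparison of the two single-strand consensus sequences. The sketch assumes reads have already been grouped by their tags into per-strand consensus sequences and that the second strand has been reverse-complemented into the same orientation as the first; disagreements become `N`. The function name is hypothetical.

```python
def duplex_consensus(strand_a: str, strand_b: str, no_call: str = "N") -> str:
    """Keep a base only where both single-strand consensus sequences agree."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strand consensus sequences must have equal length")
    return "".join(a if a == b else no_call for a, b in zip(strand_a, strand_b))


print(duplex_consensus("ACGTAC", "ACGAAC"))  # ACGNAC
```

A single-strand change such as the T/A mismatch above is treated as PCR or sequencing error rather than a true mutation, which is the source of the method's low background error rate.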
Article
ART is a set of simulation tools that generate synthetic next-generation sequencing reads. This functionality is essential for testing and benchmarking tools for next-generation sequencing data analysis, including read alignment, de novo assembly and genetic variation discovery. ART generates simulated sequencing reads by emulating the sequencing process with built-in, technology-specific read error models and base quality value profiles parameterized empirically in large sequencing datasets. We currently support all three major commercial next-generation sequencing platforms: Roche's 454, Illumina's Solexa and Applied Biosystems' SOLiD. ART also allows the flexibility to use customized read error model parameters and quality profiles. Availability: Both source and binary software packages are available at http://www.niehs.nih.gov/research/resources/software/art Contact: weichun.huang@nih.gov; gabor.marth@bc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
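To illustrate what a read simulator does, the sketch below draws reads uniformly from a reference and introduces substitutions at the error rate implied by a single Phred score. This is a deliberately simplified stand-in, not ART's empirically parameterized, technology-specific error and quality models; insertions, deletions, paired ends, and platform profiles are omitted, and the function and its parameters are hypothetical.

```python
import random


def simulate_reads(reference: str, n_reads: int, read_len: int, phred_q: int = 30, seed: int = 0):
    """Draw reads uniformly from the reference and add substitutions at rate 10^(-Q/10)."""
    rng = random.Random(seed)
    error_rate = 10 ** (-phred_q / 10)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Substitute with one of the other three nucleotides.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        qual = chr(phred_q + 33) * read_len  # constant Phred+33 quality string
        reads.append((start, "".join(bases), qual))
    return reads


for start, seq, qual in simulate_reads("ACGT" * 25, n_reads=3, read_len=10):
    print(start, seq, qual)
```

Knowing the true start position of every simulated read is what makes such data useful for benchmarking aligners and variant callers.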
Article
Originally believed to be a rare phenomenon, heteroplasmy - the presence of more than one mitochondrial DNA (mtDNA) variant within a cell, tissue, or individual - is emerging as an important component of eukaryotic genetic diversity. Heteroplasmies can be used as genetic markers in applications ranging from forensics to cancer diagnostics. Yet the frequency of heteroplasmic alleles may vary from generation to generation due to the bottleneck occurring during oogenesis. Therefore, to understand the alterations in allele frequencies at heteroplasmic sites, it is of critical importance to investigate the dynamics of maternal mtDNA transmission. Here we sequenced, at high coverage, mtDNA from blood and buccal tissues of nine individuals from three families with a total of six maternal transmission events. Using simulations and re-sequencing of clonal DNA, we devised a set of criteria for detecting polymorphic sites in heterogeneous genetic samples that is resistant to the noise originating from massively parallel sequencing technologies. Application of these criteria to nine human mtDNA samples revealed four heteroplasmic sites. Our results suggest that the incidence of heteroplasmy may be lower than estimated in some other recent re-sequencing studies, and that mtDNA allelic frequencies differ significantly both between tissues of the same individual and between a mother and her offspring. We designed our study in such a way that the complete analysis described here can be repeated by anyone either at our site or directly on the Amazon Cloud. Our computational pipeline can be easily modified to accommodate other applications, such as viral re-sequencing.
Article
The development and maintenance of mitochondrial heteroplasmy has important consequences for both health and heredity. Previous studies using pathogenic mutations have shown considerable variability between maternally related individuals, and studies of several D-loop polymorphisms have suggested a relationship between heteroplasmy and somatic aging. To broadly explore the variation of human heteroplasmy and to clarify the dynamics of somatic heteroplasmy over the course of the lifespan, we analyzed mitochondrial sequence variation across a range of ages. We utilized array-generated single-nucleotide polymorphism data that were well correlated with independent measures of heteroplasmy. Significant levels of heteroplasmy were identified at 0.24% of sites evaluated. By examining mother-child pairs, we found that heteroplasmy was inherited (30%) but could occur de novo in offspring or, conversely, be present in mothers but eliminated in their children (70%). Cumulatively, mitochondrial heteroplasmy across the genome increased significantly with advanced age (r = 0.224, P = 8 × 10^-30). Surprisingly, changes in heteroplasmy were not uniform, with some sites demonstrating a loss of variation (increased homoplasmy) with aging. These data suggest that both mutation and selective pressure affect blood mitochondrial DNA sequence over the course of the human lifespan and reveal the unexpectedly dynamic nature of human heteroplasmy.
Article
The effect of genetic mutation on phenotype is of significant interest in genetics. The type of genetic mutation that causes a single amino acid substitution (AAS) in a protein sequence is called a non-synonymous single nucleotide polymorphism (nsSNP). An nsSNP could potentially affect the function of the protein, subsequently altering the carrier's phenotype. This protocol describes the use of the 'Sorting Tolerant From Intolerant' (SIFT) algorithm in predicting whether an AAS affects protein function. To assess the effect of a substitution, SIFT assumes that important positions in a protein sequence have been conserved throughout evolution and therefore substitutions at these positions may affect protein function. Thus, by using sequence homology, SIFT predicts the effects of all possible substitutions at each position in the protein sequence. The protocol typically takes 5-20 min, depending on the input. SIFT is available as an online tool (http://sift.jcvi.org).
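The conservation idea behind SIFT can be illustrated on a single alignment column. The following is a deliberately simplified sketch, not SIFT's actual scoring model, which is more elaborate. It makes three choices:

- It uses raw residue counts with a flat pseudocount, which is our assumption.
- It normalizes each score by the score of the most frequent residue.
- It applies the commonly used 0.05 cutoff.

The column and the function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def normalized_scores(column: str, pseudocount: float = 0.1) -> dict[str, float]:
    """Score every amino acid at one alignment position.

    Counts get a flat pseudocount so unseen residues are not exactly 0,
    then are divided by the top value, so the most common residue scores 1.0.
    """
    counts = Counter(aa for aa in column.upper() if aa in AMINO_ACIDS)  # gaps ignored
    weights = {aa: counts[aa] + pseudocount for aa in AMINO_ACIDS}
    top = max(weights.values())
    return {aa: w / top for aa, w in weights.items()}


def classify(column: str, substitute: str, threshold: float = 0.05) -> str:
    """Label a substitution by comparing its normalized score with the cutoff."""
    score = normalized_scores(column)[substitute.upper()]
    return "tolerated" if score >= threshold else "predicted to affect function"


# Hypothetical column from 20 homologues: 19 leucines and 1 isoleucine.
column = "L" * 19 + "I"
print(classify(column, "I"))  # tolerated (score about 0.058)
print(classify(column, "R"))  # predicted to affect function (score about 0.005)
```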
Article
The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature-rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows-Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is approximately 10-20x faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. http://maq.sourceforge.net.
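BWA builds on backward search over the Burrows-Wheeler transform, which can be shown at toy scale. This sketch counts exact occurrences only. It builds the BWT by naively sorting rotations and stores full occurrence tables. It therefore omits everything that makes BWA practical, such as mismatch and gap handling, suffix-array sampling and compressed indexes. The function names are illustrative.

```python
def build_fm_index(text: str):
    """Return (C, occ, n) for the BWT of text + '$' (toy inputs only)."""
    s = text + "$"  # '$' sorts before A/C/G/T in ASCII and marks the end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    bwt = "".join(rot[-1] for rot in rotations)
    alphabet = sorted(set(bwt))
    # C[ch]: number of characters in the text that sort before ch
    C, total = {}, 0
    for ch in alphabet:
        C[ch] = total
        total += bwt.count(ch)
    # occ[ch][i]: occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return C, occ, len(bwt)


def count_occurrences(pattern: str, C, occ, n: int) -> int:
    """Backward search: narrow the suffix-array interval [lo, hi) one character at a time."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


C, occ, n = build_fm_index("GATTACAGATTACA")
for query in ("ATTA", "CAG", "TTT"):
    print(query, count_occurrences(query, C, occ, n))  # 2, 1 and 0 matches
```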
Article
Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source (http://bowtie.cbcb.umd.edu).
Article
Somatic mitochondrial mutations are common in human cancers, and can be used as a tool for early detection of cancer. We have developed a mitochondrial Custom Reseq microarray as an array-based sequencing platform for rapid and high-throughput analysis of mitochondrial DNA. The MitoChip contains oligonucleotide probes synthesized using standard photolithography and solid-phase synthesis, and is able to sequence >29 kb of double-stranded DNA in a single assay. Both strands of the entire human mitochondrial coding sequence (15,451 bp) are arrayed on the MitoChip; both strands of an additional 12,935 bp (84% of coding DNA) are arrayed in duplicate. We used 300 ng of genomic DNA to amplify the mitochondrial coding sequence in three overlapping long PCR fragments. We then sequenced >2 million base pairs of mitochondrial DNA, and successfully assigned base calls at 96.0% of nucleotide positions. Replicate experiments demonstrated >99.99% reproducibility. In matched fluid samples (urine and pancreatic juice, respectively) obtained from five patients with bladder cancer and four with pancreatic cancer, the MitoChip detected at least one cancer-associated mitochondrial mutation in six (66%) of nine samples. The MitoChip is a high-throughput sequencing tool for the reliable identification of mitochondrial DNA mutations from primary tumors in clinical samples.
Article
Mitochondrial DNA (mtDNA) variation was investigated in a sample of 299 Latvians, a Baltic-speaking population from Eastern Europe. Sequencing of the first hypervariable segment (HVS-I) in combination with analysis of informative coding region markers revealed that the vast majority of observed mtDNAs belong to haplogroups (hgs) common to most European populations. Analysis of the spatial distribution of mtDNA haplotypes found in Latvians, as well as in Baltic-speaking populations in general, revealed that they share haplotypes with all neighbouring populations irrespective of their linguistic affiliation. Hence, the results of our mtDNA analysis show that the previously described sharp difference between the Y-chromosomal hg N3 distribution in the paternally inherited gene pool of Baltic-speaking populations and of other European Indo-European speakers does not have a corresponding maternal counterpart.
Article
Mitochondria are the organelles that generate biochemical energy, in the form of ATP, for the cell. They play a variety of roles, including integrating extracellular signals and executing critical intracellular events such as neuronal cell survival and death. Increasing evidence suggests that cross-talk between mitochondria and the nucleus is closely related to neuronal function and activity. Nuclear receptors (estrogen receptors, thyroid (T3) hormone receptor, peroxisome proliferator-activated receptor gamma2) and transcription factors (cAMP response element-binding protein, p53) have been found to target mitochondria and to engage both prosurvival and prodeath pathways. In this context, the regulation of mitochondrial function through the translocation of nuclear receptors and transcription factors may underlie some of the mechanisms involved in neuronal survival and death. Understanding the function of nuclear receptors and transcription factors in mitochondria may therefore have important pharmacological utility in the treatment of neurodegenerative conditions. Modulating signaling pathways through mitochondria-targeting nuclear receptors and transcription factors is rapidly emerging as a novel therapeutic strategy.
Article
Well-resolved molecular gene trees illustrate the concept of descent with modification and exhibit the opposing processes of drift and migration, both of which influence population structure. Phylogenies of the maternally inherited mtDNA genome and the paternally inherited portion of the nonrecombining Y chromosome retain sequential records of the accumulation of genetic diversity. Although knowledge regarding the diversity of the entire human genome will be needed to completely characterize human genetic evolution, these uniparentally inherited loci are unique indicators of gender in modulating the extant population structure. We compare and contrast these loci for patterns of continuity and discreteness and discuss how their phylogenetic diversity and progression provide means to disentangle ancient colonization events by pioneering migrants from subsequent overlying migrations. We introduce new results concerning Y chromosome founder haplogroups C, DE, and F that resolve their previous trifurcation and improve the harmony with the mtDNA recapitulation of the out-of-Africa migration.
Article
Alzheimer disease (AD) and Parkinson disease (PD) are the two most common age-related neurodegenerative diseases characterized by prominent neurodegeneration in selective neural systems. Although a small fraction of AD and PD cases exhibit evidence of heritability, among which many genes have been identified, the majority are sporadic without known causes. Molecular mechanisms underlying neurodegeneration and pathogenesis of these diseases remain elusive. Convincing evidence demonstrates oxidative stress as a prominent feature in AD and PD and links oxidative stress to the development of neuronal death and neural dysfunction, which suggests a key pathogenic role for oxidative stress in both AD and PD. Notably, mitochondrial dysfunction is also a prominent feature in these diseases, which is likely to be of critical importance in the genesis and amplification of reactive oxygen species and the pathophysiology of these diseases. In this review, we focus on changes in mitochondrial DNA and mitochondrial dynamics, two aspects critical to the maintenance of mitochondrial homeostasis and function, in relationship with oxidative stress in the pathogenesis of AD and PD.
Article
Sanger sequencing of multigenic disorders can be technically challenging, time-consuming, and prohibitively expensive. High-throughput next-generation sequencing (NGS) can provide a cost-effective method for sequencing targeted genes associated with multigenic disorders. We have developed an NGS clinical targeted gene assay for the mitochondrial genome and for 108 selected nuclear genes associated with mitochondrial disorders. Mitochondrial disorders have a reported incidence of 1 in 5000 live births, encompass a broad range of phenotypes, and are attributed to mutations in the mitochondrial and nuclear genomes. Approximately 20% of mitochondrial disorders result from mutations in mtDNA, with the remaining 80% found in nuclear genes that affect mtDNA levels or mitochondrial protein assembly. In our NGS approach, the 16,569-bp mtDNA is enriched by long-range PCR and the 108 nuclear genes (which represent 1301 amplicons and 680 kb) are enriched by RainDance emulsion PCR. Sequencing is performed on Illumina HiSeq 2000 or MiSeq platforms, and bioinformatics analysis is performed using commercial and in-house developed bioinformatics pipelines. A total of 16 validation and 13 clinical samples were examined. All previously reported variants associated with mitochondrial disorders were found in validation samples, and 5 of the 13 clinical samples were found to have mutations associated with mitochondrial disorders in either the mitochondrial genome or the 108 nuclear genes. All variants were confirmed by Sanger sequencing.
Article
The determination of human mitochondrial DNA (mtDNA) haplogroups is not only crucial in anthropological and forensic studies but is also useful in the medical field, where it helps avoid spurious disease associations. In recent years, high-throughput technologies and the huge amounts of data they create, together with regular updates to the mtDNA phylogenetic tree, have created a need for an automated approach that can determine haplogroups faster than the traditional manual method. Here, we update MitoTool (www.mitotool.org) by incorporating a novel scoring system for classifying mtDNA into haplogroups, which has advantages in speed, accuracy and ease of implementation. To make MitoTool easier to access, we also provide a stand-alone version of the program that runs on a local computer; this version is freely available at the MitoTool website.
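The sketch below illustrates what an automated haplogroup call involves. It ranks candidate haplogroups by the fraction of their defining markers found in a sample, breaking ties in favour of the more specific (deeper) haplogroup. This is a hypothetical scoring rule, not the MitoTool scoring system. The tree is a toy with placeholder haplogroup and marker names rather than real phylogeny data.

```python
def rank_haplogroups(sample: set[str],
                     tree: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank haplogroups by marker coverage (defining markers found / expected)."""
    ranked = []
    for haplogroup, markers in tree.items():
        coverage = len(markers & sample) / len(markers)
        # Sort key: coverage first, then marker count as a proxy for depth.
        ranked.append((coverage, len(markers), haplogroup))
    ranked.sort(reverse=True)
    return [(haplogroup, coverage) for coverage, _, haplogroup in ranked]


# Toy tree: placeholder names, NOT real PhyloTree haplogroups or markers.
TOY_TREE = {
    "HG_P": {"s1", "s2"},
    "HG_Q": {"s1", "s2", "s3"},  # HG_Q nests under HG_P
    "HG_R": {"s4", "s5"},
}

sample_variants = {"s1", "s2", "s3", "s9"}  # s9: a private variant
print(rank_haplogroups(sample_variants, TOY_TREE))
# [('HG_Q', 1.0), ('HG_P', 1.0), ('HG_R', 0.0)]
```

A practical tool must also weigh missing markers, back mutations and private variants, which this rule ignores.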
Article
This protocol describes a method to characterize mitochondrial DNA (mtDNA) heteroplasmy by parallel sequencing. Mitochondria play an important role in essential cellular functions. Each eukaryotic cell contains hundreds of mitochondria and hundreds of copies of the mitochondrial genome. Mutant and wild-type mtDNA may co-exist as heteroplasmy and cause human disease. The purpose of this protocol is to determine the mtDNA sequence and quantify the heteroplasmic level simultaneously. The protocol amplifies the mitochondrial genome by PCR as two fragments, and the two PCR products are then mixed at an equimolar ratio. The samples are then barcoded and sequenced with high-throughput, next-generation sequencing technology. This technology is highly sensitive, specific, and accurate in determining mtDNA mutations and the level of heteroplasmy.
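Quantifying the heteroplasmic level from read counts amounts to dividing the number of variant-supporting reads by the total depth. The sketch below adds a Wilson score interval, which is our assumption rather than part of the protocol. The interval shows why sequencing depth matters: the same 30% level is far less certain at 100x than at 10,000x. The function names are illustrative.

```python
import math


def heteroplasmy_level(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a position that carry the variant allele."""
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need depth > 0 and 0 <= alt_reads <= depth")
    return alt_reads / depth


def wilson_interval(alt_reads: int, depth: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the heteroplasmic level."""
    p = heteroplasmy_level(alt_reads, depth)
    denom = 1 + z * z / depth
    centre = (p + z * z / (2 * depth)) / denom
    half = z * math.sqrt(p * (1 - p) / depth + z * z / (4 * depth * depth)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


for alt, depth in [(30, 100), (3000, 10000)]:
    low, high = wilson_interval(alt, depth)
    print(f"{heteroplasmy_level(alt, depth):.2f} at {depth}x: {low:.3f}-{high:.3f}")
```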
Article
Mitochondria contain approximately 1,000 different proteins, most of which are imported from the cytosol. Two import pathways that direct proteins into the mitochondrial inner membrane and matrix have been known for many years. The identification of numerous new transport components in recent proteomic studies has led to novel mechanistic insight into these pathways and the discovery of new import pathways into the outer membrane and intermembrane space. Protein translocases do not function as independent units but are integrated into dynamic networks and are connected to machineries that function in bioenergetics, mitochondrial morphology and coupling to the endoplasmic reticulum.
Article
Heteroplasmy, the existence of multiple mtDNA types within an individual, has been previously detected by using mostly indirect methods and focusing largely on just the hypervariable segments of the control region. Next-generation sequencing technologies should enable studies of heteroplasmy across the entire mtDNA genome at much higher resolution, because many independent reads are generated for each position. However, the higher error rate associated with these technologies must be taken into consideration to avoid false detection of heteroplasmy. We used simulations and phiX174 sequence data to design criteria for accurate detection of heteroplasmy with the Illumina Genome Analyzer platform, and we used artificial mixtures and replicate data to test and refine the criteria. We then applied these criteria to mtDNA sequence reads for 131 individuals from five Eurasian populations that had been generated via a parallel tagged approach. We identified 37 heteroplasmies at 10% frequency or higher at 34 sites in 32 individuals. The mutational spectrum does not differ between heteroplasmic mutations and polymorphisms in the same individuals, but the relative mutation rate at heteroplasmic mutations is significantly higher than that estimated for all mutable sites in the human mtDNA genome. Moreover, there is also a significant excess of nonsynonymous mutations observed among heteroplasmies, compared to polymorphism data from the same individuals. Both mutation-drift and negative selection influence the fate of heteroplasmies to determine the polymorphism spectrum in humans. With appropriate criteria for avoiding false positives due to sequencing errors, next-generation technologies can provide novel insights into genome-wide aspects of mtDNA heteroplasmy.
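Criteria of this kind can be sketched as a filter over per-strand allele counts. Only the 10% frequency threshold comes from the abstract. The following are hypothetical stand-ins for the study's actual criteria, which were tuned with simulations and phiX174 data:

- a per-strand binomial test against an assumed 1% sequencing error rate,
- the significance cutoff,
- the minimum strand depth.

```python
import math


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    log_norm = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (log_norm - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def is_heteroplasmy(fwd_minor: int, fwd_depth: int, rev_minor: int, rev_depth: int,
                    min_freq: float = 0.10, error_rate: float = 0.01,
                    alpha: float = 1e-6, min_strand_depth: int = 10) -> bool:
    """Call a site heteroplasmic only if the minor allele is frequent enough overall
    and, on each strand separately, too common to be explained by sequencing error."""
    if min(fwd_depth, rev_depth) < min_strand_depth:
        return False
    if (fwd_minor + rev_minor) / (fwd_depth + rev_depth) < min_freq:
        return False
    return all(binom_sf(k, n, error_rate) < alpha
               for k, n in ((fwd_minor, fwd_depth), (rev_minor, rev_depth)))


print(is_heteroplasmy(60, 500, 55, 480))  # True: ~12%, supported on both strands
print(is_heteroplasmy(115, 500, 0, 480))  # False: all minor reads on one strand
print(is_heteroplasmy(20, 500, 20, 480))  # False: ~4%, below the 10% threshold
```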
Article
The presence of hundreds of copies of mitochondrial DNA (mtDNA) in each human cell poses a challenge for the complete characterization of mtDNA genomes by conventional sequencing technologies. Here we describe digital sequencing of mtDNA genomes with the use of massively parallel sequencing-by-synthesis approaches. Although the mtDNA of human cells is considered to be homogeneous, we found widespread heterogeneity (heteroplasmy) in the mtDNA of normal human cells. Moreover, the frequency of heteroplasmic variants varied considerably between different tissues in the same individual. In addition to the variants identified in normal tissues, cancer cells harboured further homoplasmic and heteroplasmic mutations that could also be detected in patient plasma. These studies provide insights into the nature and variability of mtDNA sequences and have implications for mitochondrial processes during embryogenesis, cancer biomarker development and forensic analysis. In particular, they demonstrate that individual humans are characterized by a complex mixture of related mitochondrial genotypes rather than a single genotype.
Article
New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
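Mapping quality, like base quality, is expressed on the Phred scale, Q = -10 log10 P, where P is the estimated probability that the alignment is wrong. The conversion is a one-liner. A read placed at random between two equally good locations (P = 0.5) gets a mapping quality of only about 3, while P = 0.001 gives 30. How P itself is estimated is MAQ's model and is not reproduced in this sketch. The function names are illustrative.

```python
import math


def phred_from_error(p_wrong: float) -> float:
    """Phred-scaled quality for an error probability: Q = -10 * log10(p)."""
    if not 0.0 < p_wrong <= 1.0:
        raise ValueError("p_wrong must be in (0, 1]")
    return -10.0 * math.log10(p_wrong)


def error_from_phred(q: float) -> float:
    """Inverse conversion: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


for p in (0.5, 0.01, 0.001):
    print(f"P(wrong) = {p}: Q = {phred_from_error(p):.1f}")
```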
Article
A female infant presenting with lactic acidemia, hypotonia, and neurodegenerative disease died at 7 mo of age. Autopsy revealed lesions typical of Leigh disease in both the basal ganglia and the brain stem. A maternal aunt and uncle died at 1 year and 5 mo of age, respectively, after a similar clinical course. Another uncle, presently 33 years of age, has retinitis pigmentosa, ataxia, and intellectual disability. PCR restriction-digest analysis of mtDNA isolated from the proband revealed a T-to-G change at position 8993, creating a new AvaI restriction site. The mutation, in the ATP 6 gene, replaces a leucine residue with an arginine. The index patient had greater than 95% abnormal mtDNA in her skin fibroblasts, brain, kidney, and liver tissues, as measured by laser densitometry. The maternal aunt who died at age 1 year had greater than 95% abnormal mtDNA in her lymphoblasts. The uncle with retinitis pigmentosa had 78% and 79% abnormal mtDNA in his skin fibroblasts and lymphoblasts, respectively, while an asymptomatic maternal aunt and her son had no trace of this mutation. The mother of the index case had 71% and 39% abnormal mtDNA in her skin fibroblasts and lymphoblasts, respectively. This shows that the level of heteroplasmy can vary between tissues within one individual. It also shows that mtDNA mutations at 8993 can produce the clinical phenotype of Leigh disease in addition to the phenotype of ataxia and retinitis pigmentosa described by Holt et al. (ABSTRACT TRUNCATED AT 250 WORDS)
Article
We have developed a comprehensive database (MITOMAP) for the human mitochondrial DNA (mtDNA), the first component of the human genome to be completely sequenced [Anderson et al. (1981) Nature 290, 457–465]. MITOMAP uses the mtDNA sequence as the unifying element for bringing together information on mitochondrial genome structure and function, pathogenic mutations and their clinical characteristics, population associated variation, and gene-gene interactions. As increasingly larger regions of the human genome are sequenced and characterized, the need for integrating such information will grow. Consequently, MITOMAP not only provides a valuable reference for the mitochondrial biologist, it may also provide a model for the development of information storage and retrieval systems for other components of the human genome.
Article
The preprotein translocase of the outer membrane of mitochondria (TOM complex) facilitates the recognition, insertion, and translocation of nuclear-encoded mitochondrial preproteins. We have purified the TOM complex from Neurospora crassa and analyzed its composition and functional properties. The TOM complex contains a cation-selective high-conductance channel. Upon reconstitution into liposomes, it mediates integration of proteins into and translocation across the lipid bilayer. TOM complex particles have a diameter of about 138 Å, as revealed by electron microscopy and image analysis; they contain two or three centers of stain-filled openings, which we interpret as pores with an apparent diameter of about 20 Å. We conclude that the structure reported here represents the protein-conducting channel of the mitochondrial outer membrane.
Article
The Human Genome Project, from one perspective, began in 1981 with the publication (ref. 1) of the complete sequence of human mitochondrial DNA (mtDNA). The Cambridge reference sequence (CRS), as it is now designated, continues to be indispensable for studies of human evolution, population genetics and mitochondrial diseases. It has been recognized for some time, however, that the CRS differs at several sites from the mtDNA sequences obtained by other investigators (refs 2, 3). These discrepancies may reflect either true errors in the original sequencing analysis or rare polymorphisms in the CRS mtDNA. A further complication is that the original mtDNA sequence was principally derived from a single individual of European descent, although it also contained some sequences from both HeLa and bovine mtDNA (ref. 1). To resolve these uncertainties, we have completely resequenced the original placental mtDNA sample.
Article
After reconstitution into liposomes, Tim23p, a mitochondrial inner membrane protein required for protein import, forms an aqueous pore that is activated by a transmembrane potential and mitochondrial targeting peptides. A report in this issue suggests that proteins are translocated into the mitochondrial matrix through a channel formed by Tim23p. These data also suggest a mechanism by which protein import can occur without disrupting the permeability barrier of the inner membrane.
Article
Mitochondrial respiratory chain disorders comprise a group of perhaps several hundred different genetic diseases. Each individual disorder is rare, but collectively they account for substantial use of health care resources. However, few accurate data on prevalence are available due to problems such as variation in clinical presentation, age of onset, referral practices and limitations of diagnostic methodologies. With this retrospective study, we aimed to determine the minimum birth prevalence of respiratory chain disorders that have onset in childhood, that is the proportion of births that will have onset of symptoms caused by a respiratory chain defect by 16 years of age. Of the 1 706 694 children born in the three south-eastern states of Australia (New South Wales, Victoria and South Australia) between January 1st 1987 and December 31st 1996, samples from 430 were referred for investigation of a respiratory chain disorder. Definite diagnosis of a respiratory chain disorder was made in 86 cases based on defined clinical, pathological, enzyme and molecular criteria. Age at presentation ranged from 0 to 129 months (median 4 months). The total data set predicts a minimum birth prevalence for respiratory chain disorders in children of 5.0/100 000 [95% confidence interval (CI) 4.0-6.2]. A significantly higher figure of 58.6/100 000 (95% CI 34.7-92.6) was noted for Australian families of Lebanese origin. Clinical awareness of respiratory chain disorders and investigation methods have improved since 1987, but not all affected children would have been recognized as such from the more recent years. The minimum birth prevalence of 6.2/100 000 (95% CI 4.5-8.4) for the 43 patients born between 1991 and 1994 is thought to be a more accurate estimate for respiratory chain disorders presenting in childhood. 
Combining our data with a previous study on prevalence of adult-onset respiratory chain disorders predicts a minimum birth prevalence of 13.1/100 000 or 1/7634 for respiratory chain disorders with onset at any age.
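The headline estimate is a simple rate: 86 diagnosed children among 1,706,694 births. The sketch below reproduces it and attaches an exact (Garwood) Poisson 95% interval, computed by bisection. The abstract does not state which interval method was used, so that choice is our assumption, although it yields the reported 4.0-6.2. The sketch also checks that 13.1 per 100,000 corresponds to about 1 in 7,634. The function names are illustrative.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), with terms computed in log space."""
    return sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
               for i in range(k + 1))


def _bisect(func, target: float, lo: float, hi: float, increasing: bool) -> float:
    """Solve func(mu) = target for a monotone func on [lo, hi]."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ci(k: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Garwood) confidence interval for a Poisson count k."""
    lower = 0.0 if k == 0 else _bisect(
        lambda mu: 1 - poisson_cdf(k - 1, mu), alpha / 2, 0.0, float(k), True)
    upper = _bisect(lambda mu: poisson_cdf(k, mu), alpha / 2,
                    float(k), k + 10 * math.sqrt(k + 1) + 10, False)
    return lower, upper


cases, births = 86, 1_706_694
per_100k = 100_000 / births
low, high = exact_poisson_ci(cases)
print(f"{cases * per_100k:.1f} per 100,000 "
      f"(95% CI {low * per_100k:.1f}-{high * per_100k:.1f})")
# 5.0 per 100,000 (95% CI 4.0-6.2)
print(round(100_000 / 13.1))  # 7634
```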
Article
Mitochondria contain translocases for the transport of precursor proteins across their outer and inner membranes. It has been assumed that the translocases also mediate the sorting of proteins to their submitochondrial destination. Here we show that the mitochondrial outer membrane contains a separate sorting and assembly machinery (SAM) that operates after the translocase of the outer membrane (TOM). Mas37 forms a constituent of the SAM complex. The central role of the SAM complex in the sorting and assembly pathway of outer membrane proteins explains the various pleiotropic functions that have been ascribed to Mas37 (refs 4, 11-15). These results suggest that the TOM complex, which can transport all kinds of mitochondrial precursor proteins, is not sufficient for the correct integration of outer membrane proteins with a complicated topology, and instead transfers precursor proteins to the SAM complex.
Article
The human mitochondrial genome is extremely small compared with the nuclear genome, and mitochondrial genetics presents unique clinical and experimental challenges. Despite the diminutive size of the mitochondrial genome, mitochondrial DNA (mtDNA) mutations are an important cause of inherited disease. Recent years have witnessed considerable progress in understanding basic mitochondrial genetics and the relationship between inherited mutations and disease phenotypes, and in identifying acquired mtDNA mutations in both ageing and cancer. However, many challenges remain, including the prevention and treatment of these diseases. This review explores the advances that have been made and the areas in which future progress is likely.
Article
We describe here a pathway for the import of proteins into the intermembrane space (IMS) of mitochondria. Substrates of this pathway are proteins with conserved cysteine motifs, which are critical for import. After passage through the TOM channel, these proteins are covalently trapped by Mia40 via disulfide bridges. Mia40 contains cysteine residues, which are oxidized by the sulfhydryl oxidase Erv1. Depletion of Erv1 or conditions reducing Mia40 prevent protein import. We propose that Erv1 and Mia40 function as a disulfide relay system that catalyzes the import of proteins into the IMS by an oxidative folding mechanism. The existence of a disulfide exchange system in the IMS is unexpected in view of the free exchange of metabolites between IMS and cytosol via porin channels. We suggest that this process reflects the evolutionary origin of the IMS from the periplasmic space of the prokaryotic ancestors of mitochondria.
Article
This study aimed to explore the prevalence of mitochondrial DNA (mtDNA) mutations in patients with type 2 diabetes mellitus in Hubei. A total of 184 patients with type 2 diabetes mellitus and 210 matched healthy controls with normal glucose tolerance were recruited. The mtDNA variants MTND1 3316 (G→A), MTND1 3394 (T→C), MTTE 14693 (A→G), MTTL1 3243 (A→G), MTRNR1 1310 (C→T) and 16189 (T→C) were screened by PCR-restriction fragment length polymorphism (PCR-RFLP) analysis and DNA sequencing. The mutations were analyzed with the mfold and tRNAscan-SE software. In the type 2 diabetes group, the mutation rates of 3316 (G→A), 3394 (T→C) and 14693 (A→G) were 3.26%, 2.72% and 2.17%, respectively. In the control group, the 3394 (T→C) and 14693 (A→G) point mutations were not detected, but two subjects carried 3316 (G→A) (0.99%). The mutation rates of 3394 (T→C) and 14693 (A→G) differed significantly between the two groups (P<0.05). In 4 of the 184 cases, a T to C transition at nucleotide position 14693 was uncovered for the first time. The prevalence of the 16189 variant was significantly higher among patients with type 2 diabetes than among controls (36.9% vs 26.6%, P=0.03). Moreover, patients with type 2 diabetes carrying the 16189 variant showed higher fasting serum insulin and HOMA-IR levels than those without it. Stepwise multiple regression analysis showed that the 16189 variant was an independent factor contributing to HOMA-IR (R² = 0.043, P = 0.037). Secondary structure prediction revealed two differences: between the 3394 T→C variant and the wild-type ND1 protein, and between the 14693 A→G variant and wild-type tRNA(Glu). The 3394 (T→C) and 14693 (A→G) mutations may contribute to a genetic predisposition to type 2 diabetes. The 16189 (T→C) variant is associated with insulin resistance and is a risk factor for diabetes.
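The case-control comparisons reduce to 2x2 tables of carriers and non-carriers. The abstract does not name its test, so Fisher's exact test is our assumption here. The carrier counts are back-calculated from the reported percentages: 5 and 4 of 184 cases for 3394 T→C and 14693 A→G, against 0 of 210 controls. Both comparisons come out below 0.05, consistent with the reported P<0.05.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_observed * (1 + 1e-7):  # small tolerance for floating-point ties
            p_value += p
    return min(p_value, 1.0)


# Rows: cases, controls; columns: carriers, non-carriers.
print(f"{fisher_exact_two_sided(5, 179, 0, 210):.3f}")  # 3394 T>C: 0.022
print(f"{fisher_exact_two_sided(4, 180, 0, 210):.3f}")  # 14693 A>G: 0.047
```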
Article
The mitochondrial genome, contained in the subcellular mitochondrial network, encodes a small number of peptides pivotal for cellular energy production. Mitochondrial genes are highly polymorphic and cataloguing existing variation is of interest for medical scientists involved in the identification of mutations causing mitochondrial dysfunction, as well as for population genetics studies. Human Mitochondrial Genome Database (mtDB) (http://www.genpat.uu.se/mtDB) has provided a comprehensive database of complete human mitochondrial genomes since early 2000. At this time, owing to an increase in the number of published complete human mitochondrial genome sequences, it became necessary to provide a web-based database of human whole genome and complete coding region sequences. As of August 2005 this database contains 2104 sequences (1544 complete genome and 560 coding region) available to download or search for specific polymorphisms. Of special interest to medical researchers and population geneticists evaluating specific positions is a complete list of (currently 3311) mitochondrial polymorphisms among these sequences. Recent expansions in the capabilities of mtDB include a haplotype search function and the ability to identify and download sequences carrying particular variants.
Article
Mitochondrial gene content is highly variable across extant eukaryotes. The number of mitochondrial protein genes varies from 3 to 67, while tRNA gene content varies from 0 to 27. Moreover, these numbers exclude the many diverse lineages of non-respiring eukaryotes that lack a mitochondrial genome yet still contain a mitochondrion, albeit one often highly derived in ultrastructure and metabolic function, such as the hydrogenosome. Diversity in tRNA gene content primarily reflects differential usage of imported tRNAs of nuclear origin. In the case of protein genes, most of this diversity reflects differential degrees of functional gene transfer to the nucleus, with more minor contributions resulting from gene loss from the cell as a consequence of either substitution via a functional nuclear homolog or the cell's dispensation of the function of the gene product. The tempo and pattern of mitochondrial gene loss is highly episodic, both across the broad sweep of eukaryotes and within such well-studied groups as angiosperms. All animals, some plants, and certain other groups of eukaryotes are mired in profound stases in mitochondrial gene content, whereas other lineages have experienced relatively frequent gene loss. Loss and transfer to the nucleus of ribosomal protein and succinate dehydrogenase genes has been especially frequent, sporadic, and episodic during angiosperm evolution. Potential mechanisms for activation of transferred genes have been inferred, and intermediate stages in the process have been identified by comparative studies. Several hypotheses have been proposed for why mitochondrial genes are transferred to the nucleus, why mitochondria retain genomes, and why functional gene transfer is almost exclusively unidirectional.
Somatic mutations of mitochondrial DNA (mtDNA) are increasingly being recognized in many human cancers, but automated sequencing of 16.5 kb of DNA is an onerous task. We have recently described an oligonucleotide microarray (MitoChip) for rapid and accurate sequencing of the entire mitochondrial genome (Zhou et al., J Mol Diagnostics, 8: 9-14, 2006), greatly facilitating the analysis of mtDNA mutations in cancer. In this report, we perform a comprehensive cataloging of somatic mutations in the mitochondrial genome of human pancreatic cancers using our novel array-based approach. MitoChip analysis was performed on DNA isolated from 15 histologically confirmed resection specimens of pancreatic ductal adenocarcinomas. In all cases, matched nonneoplastic pancreatic tissue was obtained as germline control for mtDNA sequence. DNA was extracted from snap-frozen cryostat-embedded specimens and hybridized to the sequencing microarray after appropriate polymerase chain reaction amplification and labeling steps. The vast majority of somatic mutational analyses of mtDNA in human cancers utilize lymphocyte DNA as germline control, without excluding the potential for organ-specific polymorphisms. Therefore, we also examined a series of 15 paired samples of DNA obtained from nonneoplastic pancreata and corresponding EBV-immortalized lymphoblastoid cell lines. The aim was to determine whether lymphocyte DNA provides an accurate surrogate for the mtDNA sequence of pancreatic tissue. We sequenced 497,070 base pairs of mtDNA in the 15 matched samples of pancreatic cancer and nonneoplastic pancreatic tissue, and 467,269 base pairs (94.0%) were assigned by the automated genotyping software. All 15 pancreatic cancers demonstrated at least one somatic mtDNA mutation compared to the control germline DNA, with a range of 1-14 alterations.
Of the 71 somatic mutations observed in our series, 18 were nonsynonymous coding region alterations (i.e., resulting in an amino acid change), 22 were synonymous coding region alterations, and 31 involved noncoding mtDNA segments (including ribosomal and transfer RNAs). Overall, somatic mutations in the coding region most commonly involved the ND4, COI, and CYTB genes; of note, an A-to-G transition at nucleotide position 841 in the 12S rRNA was observed in three independent samples. In the paired analysis of nonneoplastic pancreata and lymphoblastoid cell line DNA, 14 nucleotide discrepancies were observed out of 226,876 nucleotides compared (a concordance rate of 99.99%), with 9 samples demonstrating a perfect match across all bases assigned. Our findings confirm that somatic mtDNA mutations are common in pancreatic cancers and therefore have the potential to be a clinically useful biomarker for early detection. Further, our studies confirm that lymphocyte DNA is an excellent, albeit not perfect, surrogate for nonneoplastic pancreatic tissues when used as a germline control. Finally, our report confirms the utility of a high-throughput array-based platform for mtDNA mutational analyses of human cancers.