ABSTRACT: Many studies of human populations have used the male-specific region of the Y chromosome (MSY) as a marker, but MSY sequence variants have traditionally been subject to ascertainment bias. Also, dating of haplogroups has relied on Y-specific short tandem repeats (STRs), involving problems of mutation rate choice, and possible long-term mutation saturation. Next-generation sequencing can ascertain single nucleotide polymorphisms (SNPs) in an unbiased way, leading to phylogenies in which branch-lengths are proportional to time, and allowing the times-to-most-recent-common-ancestor (TMRCAs) of nodes to be estimated directly. Here we describe the sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51 ×, yielding 13,261 high-confidence SNPs, 65.9% of which are previously unreported. The resulting phylogeny covers the majority of the known clades, provides date estimates of nodes, and constitutes a robust evolutionary framework for analysing the history of other classes of mutation. Different clades within the tree show subtle but significant differences in branch lengths to the root. We also apply a set of 23 Y-STRs to the same samples, allowing SNP- and STR-based diversity and TMRCA estimates to be systematically compared. Ongoing purifying selection is suggested by our analysis of the phylogenetic distribution of non-synonymous variants in 15 MSY single-copy genes.
Molecular Biology and Evolution 12/2014; · 14.31 Impact Factor
ABSTRACT: Tibetans do not exhibit increased hemoglobin concentration at high altitude. We describe a high-frequency missense mutation in the EGLN1 gene, which encodes prolyl hydroxylase 2 (PHD2), that contributes to this adaptive response. We show that a variant in EGLN1, c.[12C>G; 380G>C], contributes functionally to the Tibetan high-altitude phenotype. PHD2 triggers the degradation of hypoxia-inducible factors (HIFs), which mediate many physiological responses to hypoxia, including erythropoiesis. The PHD2 p.[Asp4Glu; Cys127Ser] variant exhibits a lower Km value for oxygen, suggesting that it promotes increased HIF degradation under hypoxic conditions. Whereas hypoxia stimulates the proliferation of wild-type erythroid progenitors, the proliferation of progenitors with the c.[12C>G; 380G>C] mutation in EGLN1 is significantly impaired under hypoxic culture conditions. We show that the c.[12C>G; 380G>C] mutation originated ~8,000 years ago on the same haplotype previously associated with adaptation to high altitude. The c.[12C>G; 380G>C] mutation abrogates hypoxia-induced and HIF-mediated augmentation of erythropoiesis, which provides a molecular mechanism for the observed protection of Tibetans from polycythemia at high altitude.
ABSTRACT: genome sequence of the common marmoset (Callithrix jacchus). The 2.26-Gb genome of a female marmoset was assembled using Sanger read data (6×) and a whole-genome shotgun strategy. A first analysis has permitted comparison with the genomes of apes and Old World monkeys and the identification of specific features that might contribute to the unique biology of this diminutive primate, including genetic changes that may influence body size, frequent twinning and chimerism. We observed positive selection in growth hormone/insulin-like growth factor genes (growth pathways), respiratory complex I genes (metabolic pathways), and genes encoding immunobiological factors and proteases (reproductive and immunity pathways). In addition, both protein-coding and microRNA genes related to reproduction exhibited evidence of rapid sequence evolution. This genome sequence for a New World monkey enables increased power for comparative analyses among available primate genomes and facilitates biomedical research application. Apparently unique among mammals, marmosets routinely produce dizygotic twins that exchange hematopoietic stem cells in utero, a process that leads to lifelong chimerism [1,2]. As a result of this placental exchange, the blood of adult marmosets normally contains a substantial proportion of leukocytes that are not derived from the inherited germ line of the sampled individual but rather were acquired in utero from its co-twin. In addition, marmosets (subfamily Callitrichinae) and other callitrichines are small in body size as a result of natural selection for miniaturization. This reduced body size might be related to gestation of multiples and to the marmoset social system, also unique among primates [3–5]. These animals use a cooperative breeding system in which generally only one pair of adults in any social group constitutes active breeders. 
Other adult group members participate in the care and feeding of infants but do not reproduce. This alloparental care is rare among anthropoid primates, with the clear exception of humans. The evolutionary appearance of major new groups (for example, superfamilies) of primates has generally been characterized by progressive increases in body size and lifespan, reductions in overall reproductive rate and increases in maternal investment in the rearing of individual offspring. In contrast, marmosets and their callitrichine relatives have undergone a secondary reduction in body size from a larger platyrrhine ancestor [6] and have evolved a reproductive and social system in which the dominant male and female monopolize breeding but benefit from alloparental care provided to their offspring by multiple group members. Here we report the whole-genome sequencing and assembly of the genome of the marmoset, the first New World monkey to be sequenced (Supplementary Note). Our results include comparisons of this platyrrhine genome with the available catarrhine (human, other hominoid and Old World monkey) genomes, identifying previously undetected aspects of catarrhine genome evolution, including positive selection in specific genes and significant conservation of previously unidentified segments of noncoding DNA. The marmoset genome displays a number of unique features, such as rapid changes in microRNAs (miRNAs) expressed in placenta and nonsynonymous changes in protein-coding genes involved in reproductive physiology, which might be related to the frequent twinning and/or chimerism observed. WFIKKN1, which encodes a multidomain protease inhibitor that binds growth factors and bone morphogenetic proteins (BMPs) [7], has nonsynonymous changes found exclusively in common marmosets and all other tested callitrichine species that twin. 
In the one callitrichine species that does not produce twins (Callimico goeldii), one change has reverted to the ancestral sequence found in non-twinning primates. GDF9 and BMP15, genes associated with twinning in sheep and humans, also exhibit nonsynonymous changes in callitrichines. We detected positive selection in five growth hormone/insulin-like growth factor (GH-IGF) axis genes with potential roles in diminutive body size and in eight genes in the nuclear-encoded subunits of respiratory complex I that affect metabolic rates and body temperature, adaptations associated with the challenges of a small body size. Marmosets exhibit a number of unanticipated differences in miRNAs and their targets, including 321 newly identified miRNA loci. Two large clusters of miRNAs expressed in placenta show substantial sequence divergence in comparison to other primates and are potentially involved in marmoset reproductive traits. We identified considerable evolutionary change in the protein-coding genes targeted by the highly conserved let-7 family and notable coevolution of the rapidly evolving chromosome 22 miRNA cluster and the targets of its encoded miRNAs. The marmoset genome provides unprecedented statistical power to identify sequence constraint among primates, facilitating the
ABSTRACT: High-throughput sequencing of related individuals has become an important tool for studying human disease. However, owing to technical complexity and lack of available tools, most pedigree-based sequencing studies rely on an ad hoc combination of suboptimal analyses. Here we present pedigree-VAAST (pVAAST), a disease-gene identification tool designed for high-throughput sequence data in pedigrees. pVAAST uses a sequence-based model to perform variant and gene-based linkage analysis. Linkage information is then combined with functional prediction and rare variant case-control association information in a unified statistical framework. pVAAST outperformed linkage and rare-variant association tests in simulations and identified disease-causing genes from whole-genome sequence data in three human pedigrees with dominant, recessive and de novo inheritance patterns. The approach is robust to incomplete penetrance and locus heterogeneity and is applicable to a wide variety of genetic traits. pVAAST maintains high power across studies of monogenic, high-penetrance phenotypes in a single pedigree to highly polygenic, common phenotypes involving hundreds of pedigrees.
ABSTRACT: Phevor integrates phenotype, gene function, and disease information with personal genomic data for improved power to identify disease-causing alleles. Phevor works by combining knowledge resident in multiple biomedical ontologies with the outputs of variant-prioritization tools. It does so by using an algorithm that propagates information across and between ontologies. This process enables Phevor to accurately reprioritize potentially damaging alleles identified by variant-prioritization tools in light of gene function, disease, and phenotype knowledge. Phevor is especially useful for single-exome and family-trio-based diagnostic analyses, the most commonly occurring clinical scenarios and ones for which existing personal genome diagnostic tools are most inaccurate and underpowered. Here, we present a series of benchmark analyses illustrating Phevor's performance characteristics. Also presented are three recent Utah Genome Project case studies in which Phevor was used to identify disease-causing alleles. Collectively, these results show that Phevor improves diagnostic accuracy not only for individuals presenting with established disease phenotypes but also for those with previously undescribed and atypical disease presentations. Importantly, Phevor is not limited to known diseases or known disease-causing alleles. As we demonstrate, Phevor can also use latent information in ontologies to discover genes and disease-causing alleles not previously associated with disease.
The American Journal of Human Genetics 04/2014; 94(4):599-610. · 11.20 Impact Factor
ABSTRACT: Objective
We hypothesized that genetic variation affects responsiveness to 17-alpha hydroxyprogesterone caproate (17P) for recurrent preterm birth prevention.
Study Design
Women of European ancestry with ≥1 spontaneous singleton preterm birth at <34 weeks' gestation who received 17P were recruited prospectively and classified as a 17P responder or nonresponder by the difference in delivery gestational age between 17P-treated and -untreated pregnancies. Samples underwent whole exome sequencing. Coding variants were compared between responders and nonresponders with the use of the Variant Annotation, Analysis, and Search Tool (VAAST), which is a probabilistic search tool for the identification of disease-causing variants, and were compared with a Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway candidate gene list. Genes with the highest VAAST scores were then classified by the online Protein ANalysis THrough Evolutionary Relationships (PANTHER) system into known gene ontology molecular functions and biologic processes. Gene distributions within these classifications were compared with an online reference population to identify over- and under-represented gene sets.
Results
Fifty women (9 nonresponders) were included. Responders' pregnancies lasted 9.2 weeks longer with 17P, compared with 1.3 weeks longer for nonresponders (P < .001). A genome-wide search for genetic differences implicated NOS1 as the most likely associated gene among those on the KEGG candidate gene list (P < .00095). PANTHER analysis revealed several over-represented gene ontology categories that included cell adhesion, cell communication, signal transduction, nitric oxide signal transduction, and receptor activity (all with significant Bonferroni-corrected probability values).
Conclusion
We identified sets of over-represented genes in key processes among responders to 17P, which is the first step in the application of pharmacogenomics to preterm birth prevention.
American journal of obstetrics and gynecology 01/2014; · 3.28 Impact Factor
ABSTRACT: The genetics involved in Ewing sarcoma susceptibility and prognosis are poorly understood. EWS/FLI and related EWS/ETS chimeras upregulate numerous gene targets via promoter-based GGAA-microsatellite response elements. These microsatellites are highly polymorphic in humans, and preliminary evidence suggests EWS/FLI-mediated gene expression is highly dependent on the number of GGAA motifs within the microsatellite.
PLoS ONE 01/2014; 9(8):e104378. · 3.53 Impact Factor
ABSTRACT: The determination of the relationship between a pair of individuals is a fundamental application of genetics. Previously, we and others have demonstrated that identity-by-descent (IBD) information generated from high-density single-nucleotide polymorphism (SNP) data can greatly improve the power and accuracy of genetic relationship detection. Whole-genome sequencing (WGS) marks the final step in increasing genetic marker density by assaying all single-nucleotide variants (SNVs), and thus has the potential to further improve relationship detection by enabling more accurate detection of IBD segments and more precise resolution of IBD segment boundaries. However, WGS introduces new complexities that must be addressed in order to achieve these improvements in relationship detection. To evaluate these complexities, we estimated genetic relationships from WGS data for 1490 known pairwise relationships among 258 individuals in 30 families along with 46 population samples as controls. We identified several genomic regions with excess pairwise IBD in both the pedigree and control datasets using three established IBD methods: GERMLINE, fastIBD, and ISCA. These spurious IBD segments produced a 10-fold increase in the rate of detected false-positive relationships among controls compared to high-density microarray datasets. To address this issue, we developed a new method to identify and mask genomic regions with excess IBD. This method, implemented in ERSA 2.0, fully resolved the inflated cryptic relationship detection rates while improving relationship estimation accuracy. ERSA 2.0 detected all 1st through 6th degree relationships, and 55% of 9th through 11th degree relationships in the 30 families. We estimate that WGS data provides a 5% to 15% increase in relationship detection power relative to high-density microarray data for distant relationships. 
Our results identify regions of the genome that are highly problematic for IBD mapping and introduce new software to accurately detect 1st through 9th degree relationships from whole-genome sequence data.
ABSTRACT: Recent studies have used a variety of analytical methods to identify genes targeted by selection in high-altitude populations located throughout the Tibetan Plateau. Despite differences in analytic strategies and sample location, hypoxia-related genes, including EPAS1 and EGLN1, were identified in multiple studies. By applying the same analytic methods to genome-wide SNP information used in our previous study of a Tibetan population (n = 31) from the township of Maduo, located in the northeastern corner of the Qinghai-Tibetan Plateau (4200 m), we have identified common targets of natural selection in a second geographically and linguistically distinct Tibetan population (n = 46) in the Tuo Tuo River township (4500 m). Our analyses provide evidence for natural selection based on iHS and XP-EHH signals in both populations at the p<0.02 significance level for EPAS1, EGLN1, HMOX2, and CYP17A1 and for PKLR, HFE, and HBB and HBG2, which have also been reported in other studies. We highlight differences (i.e., stratification and admixture) in the two distinct Tibetan groups examined here and report selection candidate genes common to both groups. These findings should be considered in the prioritization of selection candidate genes in future genetic studies in Tibet.
PLoS ONE 01/2014; 9(3):e88252. · 3.53 Impact Factor
ABSTRACT: Although more than 100 non-HLA variants have been tested for associations with juvenile idiopathic arthritis (JIA) in candidate gene studies, only a few have been replicated. We sought to replicate reported associations of single nucleotide polymorphisms (SNPs) in the PTPN22, TNFA and MIF genes in a well-characterized cohort of children with JIA.
We genotyped and analyzed 4 SNPs in 3 genes: PTPN22 C1858T (rs2476601), TNFA G-308A, G-238A (rs1800629, rs361525) and MIF G-173C (rs755622) in 647 JIA cases and 751 healthy controls. We tested for association between each variant and JIA as well as JIA subtypes. We adjusted for multiple testing using permutation procedures. We also performed a meta-analysis that combined our results with published results from JIA association studies.
While the PTPN22 variant showed only modest association with JIA (OR = 1.29, p = 0.0309), it demonstrated a stronger association with the RF-positive polyarticular JIA subtype (OR = 2.12, p = 0.0041). The MIF variant was not associated with JIA as a whole or with any subtype. The TNFA-238A variant was associated with JIA as a whole (OR = 0.66, p = 0.0265), and demonstrated a stronger association with oligoarticular JIA (OR = 0.33, p = 0.0006) that was significant after correction for multiple testing. TNFA-308A was not associated with JIA, but was nominally associated with systemic JIA (OR = 0.33, p = 0.0089) and enthesitis-related JIA (OR = 0.40, p = 0.0144). Meta-analyses confirmed significant associations between JIA and the PTPN22 (OR = 1.44, p < 0.0001) and TNFA-238A (OR = 0.69, p < 0.0086) variants. Subtype meta-analyses of the PTPN22 variant revealed associations with RF-positive, RF-negative, and oligoarticular JIA that remained significant after multiple hypothesis correction (p < 0.0005, p = 0.0007, and p < 0.0005, respectively).
We have confirmed associations between JIA and PTPN22 and TNFA G-308A. By performing subtype analyses, we discovered a statistically significant association between the TNFA-238A variant and oligoarticular JIA. Our meta-analyses confirm the association between TNFA-238A and JIA, and show that PTPN22 C1858T is associated with JIA as a whole as well as with RF-positive, RF-negative and oligoarticular JIA.
ABSTRACT: Common variable immunodeficiency (CVID) is a heterogeneous disorder characterized by antibody deficiency, poor humoral response to antigens, and recurrent infections. To investigate the molecular cause of CVID, we carried out exome sequence analysis of a family diagnosed with CVID and identified a heterozygous frameshift mutation, c.2564delA (p.Lys855Serfs*7), in NFKB2 affecting the C terminus of NF-κB2 (also known as p100/p52 or p100/p49). Subsequent screening of NFKB2 in 33 unrelated CVID-affected individuals uncovered a second heterozygous nonsense mutation, c.2557C>T (p.Arg853*), in one simplex case. Affected individuals in both families presented with an unusual combination of childhood-onset hypogammaglobulinemia with recurrent infections, autoimmune features, and adrenal insufficiency. NF-κB2 is the principal protein involved in the noncanonical NF-κB pathway, is evolutionarily conserved, and functions in peripheral lymphoid organ development, B cell development, and antibody production. In addition, Nfkb2 mouse models demonstrate a CVID-like phenotype with hypogammaglobulinemia and poor humoral response to antigens. Immunoblot analysis and immunofluorescence microscopy of transformed B cells from affected individuals show that the NFKB2 mutations affect phosphorylation and proteasomal processing of p100 and, ultimately, p52 nuclear translocation. These findings describe germline mutations in NFKB2 and establish the noncanonical NF-κB signaling pathway as a genetic etiology for this primary immunodeficiency syndrome.
The American Journal of Human Genetics 10/2013; · 11.20 Impact Factor
ABSTRACT: Deedu (DU) Mongolians, who migrated from the Mongolian steppes to the Qinghai-Tibetan Plateau approximately 500 years ago, are challenged by environmental conditions similar to those faced by native Tibetan highlanders. Identification of adaptive genetic factors in this population could provide insight into coordinated physiological responses to this environment. Here we examine genomic and phenotypic variation in this unique population and present the first complete analysis of a Mongolian whole-genome sequence. High-density SNP array data demonstrate that DU Mongolians share genetic ancestry with other Mongolian as well as Tibetan populations, specifically in genomic regions related to adaptation to high altitude. Several selection candidate genes identified in DU Mongolians are shared with other Asian groups (e.g., EDAR), neighboring Tibetan populations (including high-altitude candidates EPAS1, PKLR, and CYP2E1), as well as genes previously hypothesized to be associated with metabolic adaptation (e.g., PPARG). Hemoglobin concentration, a trait associated with high-altitude adaptation in Tibetans, is at an intermediate level in DU Mongolians compared to Tibetans and Han Chinese at comparable altitude. Whole-genome sequence from a DU Mongolian (Tianjiao1) shows that about 2% of the genomic variants, including more than 300 protein-coding changes, are specific to this individual. Our analyses of DU Mongolians and the first Mongolian genome provide valuable insight into genetic adaptation to extreme environments.
ABSTRACT: BACKGROUND: Because of the role of inflammation in preterm birth (PTB), polymorphisms in and near the interleukin-6 gene (IL6) have been association study targets. Several previous studies have assessed the association between PTB and a single nucleotide polymorphism (SNP), rs1800795, located in the IL6 gene promoter region. Their results have been inconsistent and SNP frequencies have varied strikingly among different populations. We therefore conducted a meta-analysis with subgroup analysis by population strata to: (1) reduce the confounding effect of population structure, (2) increase sample size and statistical power, and (3) elucidate the association between rs1800795 and PTB. RESULTS: We reviewed all published papers for PTB phenotype and SNP rs1800795 genotype. Maternal genotype and fetal genotype were analyzed separately and the analyses were stratified by population. The PTB phenotype was defined as gestational age (GA) < 37 weeks, but results from earlier GA were selected when available. All studies were compared by genotype (CC versus CG+GG), based on functional studies. For the maternal genotype analysis, 1,165 PTBs and 3,830 term controls were evaluated. Populations were stratified into women of European descent (for whom the most data were available) and women of heterogeneous origin or admixed populations. All ancestry was self-reported. Women of European descent had a summary odds ratio (OR) of 0.68 (95% confidence interval (CI) 0.51–0.91), indicating that the CC genotype is protective against PTB. The result for non-European women was not statistically significant (OR 1.01, 95% CI 0.59–1.75). For the fetal genotype analysis, four studies were included; there was no significant association with PTB (OR 0.98, 95% CI 0.72–1.33). Sensitivity analysis showed that preterm premature rupture of membranes (PPROM) may be a confounding factor contributing to phenotype heterogeneity. 
CONCLUSIONS: IL6 SNP rs1800795 genotype CC is protective against PTB in women of European descent. The association is not significant in other heterogeneous or admixed populations, or in fetal genotype analysis. Population structure is an important confounding factor that should be controlled for in studies of PTB.
ABSTRACT: Alu retrotransposons are the most numerous and active mobile elements in humans, causing genetic disease and creating genomic diversity. Mobile element scanning (ME-Scan) enables comprehensive and affordable identification of mobile element insertions (MEI) using targeted high-throughput sequencing of multiplexed MEI junction libraries. In a single experiment, ME-Scan identifies nearly all AluYb8 and AluYb9 elements, with high sensitivity for both rare and common insertions, in 169 individuals of diverse ancestry. ME-Scan detects heterozygous insertions in single individuals with 91% sensitivity. Insertion presence or absence states determined by ME-Scan are 95% concordant with those determined by locus-specific PCR assays. By sampling diverse populations from Africa, South Asia, and Europe, we are able to identify 5,799 Alu insertions, including 2,524 novel ones, some of which occur in exons. Sub-Saharan populations and a Pygmy group in particular carry numerous intermediate-frequency Alu insertions that are absent in non-African groups. There is a significant dearth of exon-interrupting insertions among common Alu polymorphisms, but the density of singleton Alu insertions is constant across exonic and non-exonic regions. In one case, a validated novel singleton Alu interrupts a protein-coding exon of FAM187B. This implies that exonic Alu insertions are generally deleterious and thus eliminated by natural selection, but not so quickly that they cannot be observed as extremely rare variants.
ABSTRACT: Mobile elements comprise more than half of the human genome, but until recently their large-scale detection was time consuming and challenging. With the development of new high-throughput sequencing (HTS) technologies, the complete spectrum of mobile element variation in humans can now be identified and analyzed. Thousands of new mobile element insertions (MEIs) have been discovered, yielding new insights into mobile element biology, evolution, and genomic variation. Here, we review several high-throughput methods, with an emphasis on techniques that specifically target MEIs in humans. We highlight recent applications of these methods in evolutionary studies and in the analysis of somatic alterations in human normal and tumor tissues.
ABSTRACT: Alternating hemiplegia of childhood (AHC) is a rare, severe neurodevelopmental syndrome characterized by recurrent hemiplegic episodes and distinct neurological manifestations. AHC is usually a sporadic disorder and has unknown etiology. We used exome sequencing of seven patients with AHC and their unaffected parents to identify de novo nonsynonymous mutations in ATP1A3 in all seven individuals. In a subsequent sequence analysis of ATP1A3 in 98 other patients with AHC, we found that ATP1A3 mutations were likely to be responsible for at least 74% of the cases; we also identified one inherited mutation in a case of familial AHC. Notably, most AHC cases are caused by one of seven recurrent ATP1A3 mutations, one of which was observed in 36 patients. Unlike ATP1A3 mutations that cause rapid-onset dystonia-parkinsonism, AHC-causing mutations in this gene caused consistent reductions in ATPase activity without affecting the level of protein expression. This work identifies de novo ATP1A3 mutations as the primary cause of AHC and offers insight into disease pathophysiology by expanding the spectrum of phenotypes associated with mutations in ATP1A3.