ABSTRACT: Indian demographic history includes special features such as founder effects, interpopulation segregation, a complex social structure with a caste system, and an elevated frequency of consanguineous marriages. India also shows a higher frequency of some rare Mendelian disorders and, over the last two decades, an increased prevalence of some complex disorders. Although India represents about one-sixth of the human population, deep genetic studies of its populations have been scarce. In this study, we analyzed high-density genotyping and whole-exome sequencing data from a North and a South Indian population. Indian populations show higher levels of differentiation than those reported between populations of other continents. Here we analyze the consequences of this differentiation by specifically assessing the transferability of genetic markers from or to Indian populations. We show that there is limited portability of genetic markers from available resources such as HapMap or the 1000 Genomes Project to Indian populations, which also present an excess of private rare variants. Conversely, tagSNPs show a high level of portability between the two Indian populations, contrary to the common belief that North and South Indian populations are genetically very different. By estimating kinship between mates and consanguinity from our trio data, we also describe different patterns of assortative mating and inbreeding in the two populations, consistent with distinct mating preferences and social structures. In addition, this analysis has allowed us to describe genomic regions under recent adaptive selection, indicating differential adaptive histories for North and South Indian populations.
Our findings highlight the importance of considering demography in the design and analysis of genetic studies, as well as the need to extend human genetic variation catalogs to new populations, particularly those with distinctive demographic histories.
ABSTRACT: Whole-exome or gene-targeted resequencing in hundreds to thousands of individuals has shown that the majority of genetic variants are at low frequency in human populations. Rare variants are enriched for functional mutations and are expected to explain an important fraction of the genetic etiology of human disease, which gives them potential medical interest. In this work, we analyze the whole-exome sequences of French-Canadian individuals, a founder population with a unique demographic history that includes an initial population bottleneck less than 20 generations ago followed by a demographic explosion, together with the whole exomes of individuals sampled from France. We show that after less than 20 generations of genetic isolation from the French population, the French-Canadian gene pool shows reduced levels of diversity, higher homozygosity, and an excess of rare variants with low variant sharing with Europeans. Furthermore, the French-Canadian population carries a larger proportion of putatively damaging functional variants, which could partially explain the increased incidence of genetic disease in the province. Our results highlight the impact of population demography on genetic fitness and the contribution of rare variants to the human genetic variation landscape, emphasizing the need for deep cataloguing of genetic variants by resequencing worldwide human populations in order to truly assess disease risk.
ABSTRACT: Regions of the genome that are under evolutionary constraint across multiple species have previously been used to identify functional sequences in the human genome. Furthermore, it is known that there is an inverse relationship between evolutionary constraint and the allele frequency of a mutation segregating in human populations, implying a direct relationship between interspecies divergence and fitness in humans. Here we utilise this relationship to test for differences in the accumulation of putatively deleterious mutations both between populations and at the individual level.
Using whole-genome and exome sequencing data from Phase 1 of the 1000 Genomes Project for 1,092 individuals from 14 worldwide populations, we show that minor allele frequency (MAF) varies as a function of constraint around both coding regions and non-coding sites genome-wide, implying that negative, rather than positive, selection primarily drives the distribution of alleles among individuals via background selection. We find a strong relationship between effective population size and the depth of the depression in MAF around the most conserved genes, suggesting that populations with smaller effective size carry more deleterious mutations, which also translates into a higher genetic load when considering the number of putatively deleterious alleles segregating within each population. Finally, given the richness of the data, we are now able to classify individual genomes by the accumulation of mutations at functional sites using high-coverage 1000 Genomes data. Using this approach, we detect differences among 'healthy' individuals within populations in the distributions of putatively deleterious rare alleles they carry.
These findings demonstrate the extent of background selection in the human genome and highlight the role of population history in shaping patterns of diversity between human individuals. Furthermore, we provide a framework for the utility of personal genomic data for the study of genetic fitness and diseases.
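The abstract above tracks how minor allele frequency changes with evolutionary constraint. For a biallelic site, MAF is min(p, 1 − p), where p is the alternate-allele frequency. The following minimal sketch computes MAF from diploid genotype dosages and averages it within bins of a constraint score. The score, the equal-width binning and the function names are illustrative assumptions, not the authors' pipeline, which measured MAF as a function of distance from conserved elements.

```python
from collections import defaultdict
from statistics import mean


def minor_allele_frequency(genotypes):
    """MAF at one biallelic site from diploid dosages (0, 1 or 2 alternate copies)."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    alt_freq = sum(genotypes) / (2 * len(genotypes))  # 2 chromosomes per individual
    return min(alt_freq, 1 - alt_freq)


def mean_maf_by_constraint(sites, n_bins=4):
    """Average MAF in equal-width bins of a constraint score in [0, 1].

    `sites` is a list of (constraint_score, genotypes) pairs; the score is a
    hypothetical stand-in for any conservation metric.
    """
    bins = defaultdict(list)
    for score, genotypes in sites:
        b = min(int(score * n_bins), n_bins - 1)  # score 1.0 falls in the top bin
        bins[b].append(minor_allele_frequency(genotypes))
    return {b: mean(values) for b, values in sorted(bins.items())}
```

Under purifying selection one would expect the mean MAF to fall as the constraint bin increases. In a real analysis, that comparison would also control for mutation rate and local recombination.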
ABSTRACT: Most great ape genetic variation remains uncharacterized; however, its study is critical for understanding population history, recombination, selection and susceptibility to disease. Here we sequence to high coverage a total of 79 wild- and captive-born individuals representing all six great ape species and seven subspecies and report 88.8 million single nucleotide polymorphisms. Our analysis provides support for genetically distinct populations within each species, signals of gene flow, and the split of common chimpanzees into two distinct groups: Nigeria-Cameroon/western and central/eastern populations. We find extensive inbreeding in almost all wild populations, with eastern gorillas being the most extreme. Inferred effective population sizes have varied radically over time in different lineages and this appears to have a profound effect on the genetic diversity at, or close to, genes in almost all species. We discover and assign 1,982 loss-of-function variants throughout the human and great ape lineages, determining that the rate of gene loss has not been different in the human branch compared to other internal branches in the great ape phylogeny. This comprehensive catalogue of great ape genome diversity provides a framework for understanding evolution and a resource for more effective management of wild and captive great ape populations.
ABSTRACT: BACKGROUND: Congenital multiple intestinal atresia (MIA) is a severe, fatal neonatal disorder involving obstructions in the small and large intestines that ultimately lead to organ failure. Surgical interventions are palliative but do not provide long-term survival. Severe immunodeficiency may be associated with the phenotype. A genetic basis for MIA is likely. We had previously ascertained a cohort of patients of French-Canadian origin, most of whom had died in infancy or in utero. The goal of the study was to identify the molecular basis for the disease in the patients of this cohort. METHODS: We performed whole-exome sequencing on samples from five patients from four families. Validation of mutations and familial segregation was performed using standard Sanger sequencing in these and three additional families with deceased cases. Exon skipping was assessed by reverse transcription-PCR and Sanger sequencing. RESULTS: Five patients from four different families were each homozygous for a four-base intronic deletion in the gene TTC7A, immediately adjacent to a consensus GT splice donor site. The deletion was shown to disrupt splicing, causing skipping of the attendant upstream coding exon and thereby leading to a predicted severe protein truncation. Parents were heterozygous carriers of the deletion in these families and in two additional families segregating affected cases. In a seventh family, an affected case was compound heterozygous for the same 4 bp deletion and a second, missense mutation, p.L823P, also predicted to be pathogenic. No other sequenced gene carried deleterious variants that could explain the disease in all patients in the cohort. Neither mutation was seen in a large set of control chromosomes. CONCLUSIONS: Based on our genetic results, TTC7A is the likely causal gene for MIA.
Journal of Medical Genetics 02/2013; · 5.70 Impact Factor
ABSTRACT: One of the most rapidly evolving genes in humans, PRDM9, is a key determinant of the distribution of meiotic recombination events. Mutations in this meiosis-specific gene have previously been associated with male infertility in humans, and recent studies suggest that PRDM9 may be involved in pathological genomic rearrangements. By studying genomes from families with children affected by B-cell precursor acute lymphoblastic leukemia (B-ALL), we characterized meiotic recombination patterns within a family with two siblings with hyperdiploid childhood ALL and observed an unusual localization of maternal recombination events. The mother in this family carries a rare PRDM9 allele, potentially explaining the unusual patterns found. From exomes sequenced in 44 additional parents of children affected with B-ALL, we discovered a substantial and significant excess of rare allelic forms of PRDM9. The rare PRDM9 alleles are transmitted to the affected children in half the cases; nonetheless, there remains a significant excess of rare alleles among patients relative to controls. We successfully replicated this latter observation in an independent cohort of 50 children with B-ALL, in which we found an excess of rare PRDM9 alleles in aneuploid and infant B-ALL patients. PRDM9 variability in humans is thought to influence genomic instability, and these data support a potential role for PRDM9 variation in the risk of acquiring aneuploidies or genomic rearrangements associated with childhood leukemogenesis.
ABSTRACT: The advent of next-generation sequencing technologies has opened new possibilities in the analysis of human disease. In this review we present the main next-generation sequencing technologies, with their major contributions and possible applications to the study of the genetic etiology of complex diseases.
Journal of Neuroimmunology 01/2012; 248(1-2):10-22. · 2.84 Impact Factor
ABSTRACT: Deep resequencing of functional regions in human genomes is key to identifying potentially causal rare variants for complex disorders. Here, we present the results of a resequencing study of candidate genes in a large sample (n = 285 patients), coupled with population genetics and statistical methods, to identify rare variants associated with autism spectrum disorder and schizophrenia. Three genes, MAP1A, GRIN2B, and CACNA1F, were consistently identified by different methods as having a significant excess of rare missense mutations in one or both disease cohorts. In a broader context, we also found that the overall site frequency spectrum of variation in these cases is best explained by population models that include both selection and complex demography, rather than by neutral models or models accounting for complex demography alone. Mutations in the three disease-associated genes explained much of the difference in the overall site frequency spectrum between cases and controls. This study demonstrates that genes associated with complex disorders can be mapped using resequencing and analytical methods with sample sizes far smaller than those required by genome-wide association studies. Additionally, our findings support the hypothesis that rare mutations account for a proportion of the phenotypic variance of these complex disorders.
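The site frequency spectrum (SFS) mentioned above counts segregating sites by how many of the n sampled chromosomes carry the derived allele. Under the standard neutral model with constant population size, the expected share of sites with i derived copies is (1/i)/a_n, where a_n = Σ_{j=1}^{n−1} 1/j. The following minimal sketch builds an unfolded SFS and that neutral baseline. It assumes that ancestral alleles are known. It does not reproduce the selection-plus-demography models the study fitted.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n_chromosomes):
    """Number of segregating sites with i derived copies, for i = 1 .. n-1.

    Monomorphic sites (0 or n derived copies) are not segregating and are skipped.
    """
    counts = Counter(c for c in derived_counts if 0 < c < n_chromosomes)
    return [counts.get(i, 0) for i in range(1, n_chromosomes)]


def neutral_sfs_proportions(n_chromosomes):
    """Expected proportion of segregating sites in each class under the
    standard neutral model: (1/i) / a_n, with a_n = sum_{j=1}^{n-1} 1/j."""
    a_n = sum(1 / j for j in range(1, n_chromosomes))
    return [(1 / i) / a_n for i in range(1, n_chromosomes)]
```

If the observed fraction of singletons (the first entry of the SFS divided by its total) exceeds the first neutral proportion, the sample has an excess of rare variants. Such an excess is consistent with purifying selection, population growth, or both, so neutral expectations alone cannot tell these explanations apart.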
ABSTRACT: Recombination varies greatly among species, as illustrated by the poor conservation of the recombination landscape between humans and chimpanzees. Shorter evolutionary time frames are therefore needed to understand the evolution of recombination. Here, we analyze its recent evolution in humans. We calculated recombination rates between adjacent pairs of 636,933 common single-nucleotide polymorphism loci in 28 worldwide human populations and analyzed them in relation to genetic distances between populations. We found a strong and highly significant correlation between similarity in recombination rates, corrected for effective population size, and genetic differentiation between populations. This correlation is observed at the genome-wide level, but also for each chromosome and when genetic distances and recombination similarities are calculated independently from different parts of the genome. Moreover, and more importantly, this relationship is robustly maintained when considering the presence or absence of recombination hotspots. Simulations show that this correlation cannot be explained by biases in the inference of recombination rates caused by haplotype sharing among similar populations. This result indicates a rapid pace of evolution of recombination, within the time span of differentiation of modern humans.
PLoS ONE 01/2011; 6(3):e17913. · 3.53 Impact Factor
ABSTRACT: Searching for associations between genetic variants and complex diseases has been a very active area of research for over two decades. More than 51,000 potential associations have been studied and published, a figure that keeps increasing, especially with the recent explosion of array-based genome-wide association studies. Although many true associations have been described, many of the putative risk variants detected so far have failed to replicate consistently and are widely considered false positives. Here, we focus on the worldwide patterns of replicability of published association studies.
We report three main findings. First, contrary to previous results, genes associated with complex diseases show lower degrees of genetic differentiation among human populations than the genome-wide average. Second, also contrary to previous results, the differences in replicability of disease-associated loci between Europeans and East Asians are highly correlated with genetic differentiation between these populations. Finally, highly replicated genes show increased levels of high-frequency derived alleles in European and Asian populations compared with African populations.
Our findings highlight the heterogeneous nature of the genetic etiology of complex disease, confirm the importance of the recent evolutionary history of our species in current patterns of disease susceptibility, and could cast doubt on the classification as false positives of some associations that have failed to replicate across populations.
ABSTRACT: A large proportion of the death toll associated with malaria is a consequence of malaria infection during pregnancy, which causes up to 200,000 infant deaths annually. We previously published the first extensive genetic association study of placental malaria infection. Here we extend this analysis considerably by investigating over 9,000 SNPs in more than 1,000 genes involved in immunity and inflammation for their involvement in susceptibility to placental malaria infection. We applied a new approach that incorporates results from both single-gene analysis and gene-gene interactions on a protein-protein interaction network. We found suggestive associations of variants in the gene KLRK1 in the single-gene analysis, as well as evidence for associations of multiple members of the IL-7/IL-7R signalling cascade in the combined analysis. To our knowledge, this is the first large-scale genetic study of placental malaria infection, opening the door for follow-up studies aimed at elucidating the genetic basis of this neglected form of malaria.
PLoS ONE 01/2011; 6(9):e24996. · 3.53 Impact Factor
ABSTRACT: J.B.S. Haldane proposed in 1947 that the male germline may be more mutagenic than the female germline. Diverse studies have supported Haldane's contention of a higher average mutation rate in the male germline in a variety of mammals, including humans. Here we present, to our knowledge, the first direct comparative analysis of male and female germline mutation rates from the complete genome sequences of two parent-offspring trios. Through extensive validation, we identified 49 and 35 germline de novo mutations (DNMs) in the two trio offspring, as well as 1,586 non-germline DNMs arising either somatically or in the cell lines from which the DNA was derived. Most strikingly, in one family we observed that 92% of germline DNMs were from the paternal germline, whereas in the other family 64% of DNMs were from the maternal germline. These observations suggest considerable variation in mutation rates within and between families.
ABSTRACT: Pathogens have been an important selective force during the adaptation of modern human populations to changing social and other environmental conditions, and the evolution of the immune system has therefore been shaped by these pressures. Genomic scans have revealed that the immune system is one of the functions enriched in genes under adaptive selection.
Here, we describe how the innate immune system has responded to these challenges through the analysis of resequencing data for 132 innate immunity genes in two human populations. Results are interpreted in the context of the functional and interaction networks defined by these genes. Nucleotide diversity is lower in the adaptor and modulator functional classes and is negatively correlated with the centrality of the proteins within the interaction network. We also produced a list of candidate genes under positive or balancing selection in each population, detected by neutrality tests, and showed that some functional classes are preferential targets for selection.
We found evidence that the role of each gene in the network conditions its capacity to evolve, or evolvability: genes at the core of the network are more constrained, while adaptation mostly occurred at particular positions at the network edges. Interestingly, the functional classes containing most of the genes with signatures of balancing selection are involved in autoinflammatory and autoimmune diseases, suggesting a trade-off between the beneficial and deleterious effects of the immune response.
ABSTRACT: The role of de novo mutations (DNMs) in common diseases remains largely unknown. Nonetheless, the rate of deleterious de novo mutations and the strength of selection against them are critical to understanding the genetic architecture of a disease. Discovering high-impact DNMs requires extensive high-resolution interrogation of partial or complete genomes of families via resequencing. We hypothesized that deleterious DNMs may play a role in cases of autism spectrum disorders (ASD) and schizophrenia (SCZ), two etiologically heterogeneous disorders with significantly reduced reproductive fitness. We present a direct measure of the de novo mutation rate (μ) and of the selective constraint on DNMs, estimated from a deep resequencing data set generated from a large cohort of ASD and SCZ cases (n = 285) and population control individuals (n = 285) with available parental DNA. A survey of ∼430 Mb of DNA from 401 synapse-expressed genes across all cases and 25 Mb of DNA in controls found 28 candidate DNMs, 13 of which were cell line artifacts. Our calculated direct neutral mutation rate (1.36 × 10⁻⁸) is similar to previous indirect estimates, but we observed a significant excess of potentially deleterious DNMs in ASD and SCZ individuals. Our results emphasize the importance of DNMs as genetic mechanisms in ASD and SCZ and the limitations of using DNA from archived cell lines to identify functional variants.
The American Journal of Human Genetics 09/2010; 87(3):316-24. · 11.20 Impact Factor
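A direct mutation-rate estimate such as the one above divides the number of validated germline DNMs by the number of sites at which a new mutation could have been detected. Each diploid offspring carries two copies of every screened base, so a common per-base, per-generation form is μ = n_DNM / (2 × L), where L is the callable sequence summed over offspring. The following minimal sketch assumes that form and removes artifacts (such as cell-line mutations) before counting. The authors' exact denominator, site filtering and neutral-site selection are not given in the abstract, so the sketch does not attempt to reproduce their 1.36 × 10⁻⁸ figure.

```python
def direct_mutation_rate(n_candidates, n_artifacts, callable_bp_per_offspring):
    """Per-base, per-generation (haploid) de novo mutation rate.

    n_candidates: candidate DNMs called in the offspring.
    n_artifacts: candidates rejected on validation (e.g. cell-line or somatic events).
    callable_bp_per_offspring: bases in each offspring where a DNM could have been
        detected; the factor 2 below accounts for the two copies of each base.
    """
    if n_artifacts > n_candidates:
        raise ValueError("more artifacts than candidates")
    total_bp = sum(callable_bp_per_offspring)
    if total_bp <= 0:
        raise ValueError("no callable sequence")
    n_germline = n_candidates - n_artifacts
    return n_germline / (2 * total_bp)
```

For example, 4 candidates with 1 artifact across three offspring with 50 Mb callable each gives 3 / (2 × 150,000,000) = 1 × 10⁻⁸.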
ABSTRACT: Previous studies of the association between deficiencies of the mannose-binding lectin (MBL) pathway and invasive pneumococcal disease are inconclusive. The invasiveness of Streptococcus pneumoniae depends on serotype. We aimed to determine the association between invasive pneumococcal disease and MBL2 and MASP2 genetic variants, taking serotype distribution into account. A hospital-based case-control study was conducted in children admitted to hospital in rural Mozambique between June 2002 and November 2003. The study included children admitted with invasive pneumococcal disease in whom S. pneumoniae was isolated from blood and subsequently serotyped. Sequence-based typing of amplicons covering the polymorphic regions of MASP2 (exon 3) and MBL2 (promoter and exon 1) was performed. We found an overall high frequency (43%) of MBL2 genotypes associated with low serum MBL levels. Carriage of MBL-deficient genotypes was associated with invasive pneumococcal disease caused by low-invasiveness serotypes (OR 5.55, 95% CI 1.4-21.9; p = 0.01). Our data suggest that susceptibility to pneumococcal disease among MBL-deficient patients may be influenced by serotype invasiveness. The type-specific capsular serotype of S. pneumoniae should be taken into account in further genetic association studies of invasive pneumococcal disease.
European Respiratory Journal 02/2010; 36(4):856-63. · 6.36 Impact Factor
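The result above is reported as an odds ratio with a 95% confidence interval. For a 2×2 table with cells a, b (exposed and unexposed cases) and c, d (exposed and unexposed controls), OR = ad / bc. A widely used interval is Woolf's logit method: exp(ln OR ± 1.96 · √(1/a + 1/b + 1/c + 1/d)). The following minimal sketch implements that interval. The abstract does not say which interval method or comparison groups the authors used. The sketch uses hypothetical counts and simply refuses zero cells rather than applying a continuity correction.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (logit) confidence interval for a 2x2 table.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    z: normal quantile (1.96 for a two-sided 95% interval).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive; apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

The interval is symmetric on the log scale, so lower × upper equals OR². With small cell counts it becomes wide, as in the 1.4-21.9 interval reported above. In that setting, exact (Fisher) intervals are often preferred.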
ABSTRACT: The mannose-binding lectin (MBL) pathway of the complement system is activated when carbohydrate-bound MBL forms complexes with different serine proteases (MASP-1, MASP-2 and MASP-3), among which MASP-2 has the predominant functional role. Polymorphisms impairing the quantity and/or functional activity of the proteins encoded by the MBL2 and MASP2 genes have been reported in all human populations, with differing allele frequencies and distributions. This likely reflects environmental influences on the genetic evolution of MBL2 and MASP2. Here, we conducted a study in a population of children from Mozambique to analyse the genetic diversity of sequences corresponding to the promoter and collagen-like region (exon 1) of MBL2 and to the CUB-1 and epidermal growth factor domains (exon 3) of MASP2, which are critical regions for the formation of functional MBL/MASP-2 complexes. Our results show a high prevalence of MBL-intermediate/low genotypes (43.5%), new alleles, and a high level of sequence polymorphism at both MBL2 and MASP2, with no statistical evidence of positive or balancing selection. Furthermore, we used Biacore analyses to compare the functional relevance of the MASP2 variants found [T73M (2.9%), R84Q (12.7%) and P111L (25.4%)] with that of two previously reported variants (R103C and D105G). None of the analysed MASP2 variants, with the exception of D105G, interfered with interactions with either MBL or ficolins (H and L).