ABSTRACT: Background
Archaeology reports millennia of cultural contact between the Peruvian Coast-Andes and the Amazon Yunga, a transitional rainforest region between the Andes and Lower Amazonia. To clarify the relationships between cultural and biological evolution of these populations, in particular between Amazon Yungas and Andeans, we used DNA-sequence data, a model-based Bayesian approach and several statistical validations to infer a set of demographic parameters.
Results
We found that the genetic diversity of the Shimaa (an Amazon Yunga population) is a subset of that of Quechuas from Central-Andes. Using the Isolation-with-Migration population genetics model, we inferred that the Shimaa ancestors were a small subgroup that split less than 5300 years ago (after the development of complex societies) from an ancestral Andean population. After the split, the most plausible scenario compatible with our results is that the ancestors of Shimaas moved toward the Peruvian Amazon Yunga and incorporated the culture and language of some of their neighbors, but not a substantial amount of their genes. We validated our results using Approximate Bayesian Computations, posterior predictive tests and the analysis of pseudo-observed datasets.
Conclusions
We presented a case study in which model-based Bayesian approaches, combined with necessary statistical validations, shed light on the prehistoric demographic relationship between Andeans and a population from the Amazon Yunga. Our results offer a testable model for the peopling of this large transitional environmental region between the Andes and the Lower Amazonia. However, studies on larger samples and involving more populations of these regions are necessary to confirm whether the predominant Andean biological origin of the Shimaas is the rule, and not the exception.
ABSTRACT: We evaluated the efficacy of noninvasive fetal Rhesus D (RHD) genotyping from maternal plasma in a highly admixed population. Fifty-five blood samples from RhD-negative pregnant women from Brazil were processed for extraction of cell-free plasma DNA. Real-time PCR was performed to amplify segments of exons 5 and 7 from the RHD gene, as well as for detection of the SRY gene to confirm the presence of fetal DNA. Fetal genotyping results were compared with the RhD phenotype determined from newborn cord blood samples obtained at birth. Thirty-two samples were RHD-positive, 18 were RHD-negative and 5 were inconclusive due to amplification of only one RHD exon. In 43 samples, the fetal RHD genotype was compared to the neonatal RhD phenotype, and only one result was discordant, due to false-negative serology. There was one false-negative SRY genotyping result. We conclude that noninvasive fetal RHD genotyping from maternal blood provides accurate results, which suggests that it is viable as a clinical tool for the management of RhD-negative pregnant women in an admixed population.
ABSTRACT: The phagocyte NADPH oxidase catalyzes the reduction of O2 to reactive oxygen species with microbicidal activity. It is composed of two membrane-spanning subunits, gp91-phox and p22-phox (encoded by CYBB and CYBA, respectively), and three cytoplasmic subunits, p40-phox, p47-phox and p67-phox (encoded by NCF4, NCF1 and NCF2, respectively). Mutations in any of these genes can result in chronic granulomatous disease, a primary immunodeficiency characterized by recurrent infections. Using evolutionary mapping, we determined that episodes of adaptive natural selection have shaped the extracellular portion of gp91-phox during the evolution of mammals, which suggests that this region may have a function in host-pathogen interactions. Based on a resequencing analysis of ∼35 kb of CYBB, CYBA, NCF2 and NCF4 in 102 ethnically diverse individuals (24 of African ancestry, 31 of European ancestry, 24 of Asian/Oceanians, and 23 US Hispanics), we show that the pattern of CYBA diversity is compatible with balancing natural selection, perhaps mediated by catalase-positive pathogens. NCF2 in Asian populations shows a pattern of diversity characterized by a differentiated haplotype structure. Our study provides insight into the role of pathogen-driven natural selection in an innate immune pathway and sheds light on the role of CYBA in endothelial, non-phagocytic NADPH oxidases, which are relevant in the pathogenesis of cardiovascular and other complex diseases.
Molecular Biology and Evolution 07/2013; · 10.35 Impact Factor
ABSTRACT: The relationship between "individualism-collectivism" and the serotonin transporter functional polymorphism (5-HTTLPR), suggested in previous reports, was tested in Native South Amerindian populations. A total of 170 individuals from 21 populations were genotyped for the 5-HTTLPR alleles. For comparative purposes, these populations were classified as individualistic (recent history of hunter–gathering) or collectivistic (agriculturalists). These two groups showed an almost identical S allele frequency (75 and 76%, respectively). The analysis of molecular variance showed no structural differences between them. Behavioral typologies like those suggested by JY Chiao and KD Blizinsky (Proc R Soc B 277 (2010) 529–537) are always a simplification of complex phenomena and should be regarded with caution. In addition, classification of a whole nation in the individualist/collectivist dichotomy is controversial. The focus on modes of subsistence in preindustrial societies, as was tested here, may be a good alternative, although the postulated association between the 5-HTTLPR S allele and the collectivist societies was not confirmed.
American Journal of Physical Anthropology 05/2013; · 2.48 Impact Factor
ABSTRACT: BACKGROUND: In bioinformatics, it is important to build extensible and low-maintenance systems that are able to deal with the new tools and data formats that are constantly being developed. The traditional and simplest implementation of pipelines involves hardcoding the execution steps into programs or scripts. This approach can lead to problems when a pipeline is expanding because the incorporation of new tools is often error prone and time consuming. Current approaches to pipeline development such as workflow management systems focus on analysis tasks that are systematically repeated without significant changes in their course of execution, such as genome annotation. However, more dynamism in the pipeline composition is necessary when each execution requires a different combination of steps. RESULTS: We propose a graph-based approach to implement extensible and low-maintenance pipelines that is suitable for pipeline applications with multiple functionalities that require different combinations of steps in each execution. Here pipelines are composed automatically by compiling a specialised set of tools on demand, depending on the functionality required, instead of specifying every sequence of tools in advance. We represent the connectivity of pipeline components with a directed graph in which components are the graph edges, their inputs and outputs are the graph nodes, and the paths through the graph are pipelines. To that end, we developed special data structures and a pipeline system algorithm. We demonstrate the applicability of our approach by implementing a format conversion pipeline for the fields of population genetics and genetic epidemiology, but our approach is also helpful in other fields where the use of multiple software tools is necessary to perform comprehensive analyses, such as gene expression and proteomics analyses.
The project code, documentation and the Java executables are available under an open source license at http://code.google.com/p/dynamic-pipeline. The system has been tested on Linux and Windows platforms. CONCLUSIONS: Our graph-based approach enables the automatic creation of pipelines by compiling a specialised set of tools on demand, depending on the functionality required. It also allows the implementation of extensible and low-maintenance pipelines and contributes towards consolidating openness and collaboration in bioinformatics systems. It is targeted at pipeline developers and is suited for implementing applications with sequential execution steps and combined functionalities. In the format conversion application, the automatic combination of conversion tools increased both the number of possible conversions available to the user and the extensibility of the system to allow for future updates with new file formats.
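The graph representation described in this abstract (tools as edges, data formats as nodes, pipelines as paths) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Java implementation: the tool names, format names, and the choice of breadth-first search to pick the shortest chain are all assumptions made for the example.

```python
from collections import deque

# Each tool is a directed edge: (tool name, input format, output format).
# All tool and format names here are hypothetical placeholders.
TOOLS = [
    ("ped_to_vcf", "ped", "vcf"),
    ("vcf_to_phase", "vcf", "phase"),
    ("phase_to_fasta", "phase", "fasta"),
    ("ped_to_structure", "ped", "structure"),
]


def compose_pipeline(tools, source, target):
    """Return the shortest list of tool names converting source to target.

    Returns an empty list when source equals target, and None when no
    chain of tools connects the two formats.
    """
    # Adjacency list: format -> list of (tool name, resulting format).
    graph = {}
    for name, src, dst in tools:
        graph.setdefault(src, []).append((name, dst))

    # Breadth-first search over formats; each queue entry carries the
    # sequence of tools used to reach that format.
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        fmt, path = queue.popleft()
        if fmt == target:
            return path
        for name, nxt in graph.get(fmt, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [name]))
    return None
```

In this sketch, registering a new converter only means appending one tuple to `TOOLS`; every conversion that the new edge makes reachable becomes available without writing any new pipeline by hand, which is the extensibility property the abstract emphasises.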
ABSTRACT: Large-scale genomics initiatives such as the HapMap project and the 1000 Genomes Project rely on powerful bioinformatics support to assist data production and analysis. In contrast, few bioinformatics platforms oriented to smaller research groups exist to store, handle, share, and integrate data from different sources, or to help these scientists perform their analyses efficiently. We developed such a bioinformatics platform, DIVERGENOME, to assist population genetics and genetic epidemiology studies performed by small- to medium-sized research groups. The platform is composed of two integrated components, a relational database (DIVERGENOMEdb), and a set of tools to convert data formats as required by popular software in population genetics and genetic epidemiology (DIVERGENOMEtools). In DIVERGENOMEdb, information on genotypes, polymorphism, laboratory protocols, individuals, populations, and phenotypes is organized in projects. These can be queried according to permissions. Here, we validated DIVERGENOME through a use case regarding the analysis of SLC2A4 genetic diversity in human populations. DIVERGENOME, with its intuitive Web interface and automatic data loading capability, can be used by individuals without a bioinformatics background, allowing complex queries to be run easily and data formats to be converted in a straightforward way (a feature not available in similar platforms). DIVERGENOME is open source, freely available, and can be accessed online (pggenetica.icb.ufmg.br/divergenome) or hosted locally.
ABSTRACT: Elucidating the pattern of genetic diversity for non-European populations is necessary to make the benefits of human genetics research available to individuals from these groups. In the era of large human genomic initiatives, Native American populations have been neglected, in particular, the Quechua, the largest South Amerindian group settled along the Andes. We characterized the genetic diversity of a Quechua population in a global setting, using autosomal noncoding sequences (nine unlinked loci for a total of 16 kb), 351 unlinked SNPs and 678 microsatellites and tested predictions of the model of the evolution of Native Americans proposed by (Tarazona-Santos et al.: Am J Hum Genet 68 (2001) 1485-1496). European admixture is <5% and African ancestry is barely detectable in the studied population. The largest genetic distances were between African versus Quechua or Melanesian populations, which is concordant with the African origin of modern humans and the fact that South America was the last part of the world to be peopled. The diversity in the Quechua population is comparable with that of Eurasian populations, and the allele frequency spectrum based on resequencing data does not reflect a reduction in the proportion of rare alleles. Thus, the Quechua population is a large reservoir of common and rare genetic variants of South Amerindians. These results are consistent with and complement our evolutionary model of South Amerindians (Tarazona-Santos et al.: Am J Hum Genet 68 (2001) 1485-1496), proposed based on Y-chromosome data, which predicts high genomic diversity due to the high level of gene flow between Andean populations and their long-term effective population size.
American Journal of Physical Anthropology 03/2012; 147(3):443-51. · 2.48 Impact Factor
ABSTRACT: Gastric cancer is one of the most lethal types of cancer and its incidence varies worldwide, with the Andean region of South America showing high incidence rates. We evaluated the genetic structure of the population from Lima (Peru) and performed a case-control genetic association study to test the contribution of African, European, or Native American ancestry to risk for gastric cancer, controlling for the effect of non-genetic factors. A wide set of socioeconomic, dietary, and clinical information was collected for each participant in the study and ancestry was estimated based on 103 ancestry informative markers. Although the urban population from Lima is usually considered as mestizo (i.e., admixed from Africans, Europeans, and Native Americans), we observed a high fraction of Native American ancestry (78.4% for the cases and 74.6% for the controls) and a very low African ancestry (<5%). We determined that higher Native American individual ancestry is associated with gastric cancer, but socioeconomic factors associated both with gastric cancer and Native American ethnicity account for this association. Therefore, the high incidence of gastric cancer in Peru does not seem to be related to susceptibility alleles common in this population. Instead, our result suggests a predominant role for ethnic-associated socioeconomic factors and disparities in access to health services. Since Native Americans are a neglected group in genomic studies, we suggest that the population from Lima and other large cities from Western South America with high Native American ancestry background may be convenient targets for epidemiological studies focused on this ethnic group.
PLoS ONE 01/2012; 7(8):e41200. · 3.53 Impact Factor
ABSTRACT: Information on one Ecuadorian and three Peruvian Amerindian populations for 11 autosomal short tandem repeat (STR) loci is presented and incorporated in analyses that include 26 other Native groups spread all over South America. Although in comparison with other studies we used a reduced number of markers, the number of populations included in our analyses is currently unmatched by any genome-wide dataset. The genetic polymorphisms indicate a clear division of the populations into three broad geographical areas: Andes, Amazonia, and the Southeast, which includes the Chaco and southern Brazil. The data also show good agreement with proposed hypotheses of splitting and dispersion of major language groups over the last 3,000 years. Therefore, relevant aspects of Native American history can be traced using as few as 11 STR autosomal markers coupled with a broad geographic distribution of sampled populations.
American Journal of Physical Anthropology 07/2011; 145(3):371-81. · 2.48 Impact Factor
ABSTRACT: Beta-defensins are a family of multifunctional genes with roles in defense against pathogens, reproduction, and pigmentation. In humans, six beta-defensin genes are clustered in a repeated region which is copy-number variable (CNV) as a block, with a diploid copy number between 1 and 12. The role in host defense makes the evolutionary history of this CNV particularly interesting, because morbidity due to infectious disease is likely to have been an important selective force in human evolution, and to have varied between geographical locations. Here, we show CNV of the beta-defensin region in chimpanzees, and identify a beta-defensin block in the human lineage that contains rapidly evolving noncoding regulatory sequences. We also show that variation at one of these rapidly evolving sequences affects expression levels and cytokine responsiveness of DEFB103, a key inhibitor of influenza virus fusion at the cell surface. A worldwide analysis of beta-defensin CNV in 67 populations shows an unusually high frequency of high-DEFB103-expressing copies in East Asia, the geographical origin of historical and modern influenza epidemics, possibly as a result of selection for increased resistance to influenza in this region.
Human Mutation 03/2011; 32(7):743-50. · 5.21 Impact Factor
ABSTRACT: Merozoites of Plasmodium falciparum invade through several pathways using different RBC receptors. Field isolates appear to use a greater variability of these receptors than laboratory isolates. Brazilian field isolates were shown to mostly utilize glycophorin A-independent invasion pathways via glycophorin B (GPB) and/or other receptors. The Brazilian population exhibits extensive polymorphism in blood group antigens; however, no studies have been done to relate the prevalence of the antigens that function as receptors for P. falciparum and the ability of the parasite to invade. Our study aimed to establish whether variation in the GYPB*S/s alleles influences susceptibility to infection with P. falciparum in the admixed population of Brazil.
Two groups of Brazilian Amazonians from Porto Velho were studied: P. falciparum infected individuals (cases); and uninfected individuals who were born and/or have lived in the same endemic region for over ten years, were exposed to infection but have not had malaria over the study period (controls). The GPB Ss phenotype and GYPB*S/s alleles were determined by standard methods. Sixty-two Ancestry Informative Markers were genotyped on each individual to estimate admixture and control its potential effect on the association between frequency of GYPB*S and malaria infection.
GYPB*S is associated with host susceptibility to infection with P. falciparum; GYPB*S/GYPB*S and GYPB*S/GYPB*s were significantly more prevalent in the P. falciparum infected individuals than in the controls (69.87% vs. 49.75%; P<0.02). Moreover, population genetics tests applied on the GYPB exon sequencing data suggest that natural selection shaped the observed pattern of nucleotide diversity.
Epidemiological and evolutionary approaches suggest an important role for the GPB receptor in RBC invasion by P. falciparum in Brazilian Amazonians. Moreover, an increased susceptibility to infection by this parasite is associated with the GPB S+ variant in this population.
PLoS ONE 01/2011; 6(1):e16123. · 3.53 Impact Factor
ABSTRACT: Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows an unbiased screening for variation that is suitable for a wide variety of organisms. Examples of studies that require re-sequencing data are evolutionary inferences, epidemiological studies designed to capture rare polymorphisms responsible for complex traits and screenings for mutations in families and small populations with high incidences of specific genetic diseases. Despite the advent of next-generation sequencing technologies, Sanger sequencing is still the most popular approach in population genetics studies because of the widespread availability of automatic sequencers based on capillary electrophoresis and because it is still less prone to sequencing errors, which is critical in population genetics studies. Two popular software applications for re-sequencing studies are Phred-Phrap-Consed-Polyphred, which performs base calling, alignment, graphical editing and genotype calling, and DNAsp, which performs a set of population genetics analyses. These independent tools are the start and end points of basic analyses. Between the use of these tools, there is a set of basic but error-prone tasks to be performed with re-sequencing data.
In order to assist with these intermediate tasks, we developed a pipeline that facilitates data handling typical of re-sequencing studies. Our pipeline: (1) consolidates different outputs produced by distinct Phred-Phrap-Consed contigs sharing a reference sequence; (2) checks for genotyping inconsistencies; (3) reformats genotyping data produced by Polyphred into a matrix of genotypes with individuals as rows and segregating sites as columns; (4) prepares input files for haplotype inferences using the popular software PHASE; and (5) handles PHASE output files that contain only polymorphic sites to reconstruct the inferred haplotypes including polymorphic and monomorphic sites as required by population genetics software for re-sequencing data such as DNAsp.
We tested the pipeline in re-sequencing studies of haploid and diploid data in humans, plants, animals and microorganisms and observed that it allowed a substantial decrease in the time required for sequencing analyses, as well as being a more controlled process that eliminates several classes of error that may occur when handling datasets. The pipeline is also useful for investigators using other tools for sequencing and population genetics analyses.
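Step (5) of the pipeline described above, rebuilding full-length haplotypes from phased output that lists only the polymorphic sites, can be illustrated with a short Python sketch. The function name, the 0-based position convention, and the toy sequences are assumptions made for this example; they do not reproduce the authors' code or the actual PHASE file format.

```python
def rebuild_haplotype(reference, positions, alleles):
    """Place phased alleles into a copy of the reference sequence.

    reference: full reference sequence covering all sites (string)
    positions: 0-based indices of the segregating sites (assumed convention)
    alleles:   one phased allele per segregating site, in the same order
    """
    if len(positions) != len(alleles):
        raise ValueError("positions and alleles must have the same length")
    seq = list(reference)
    for pos, allele in zip(positions, alleles):
        # Monomorphic sites keep the reference base; only the
        # segregating sites are overwritten with the phased allele.
        seq[pos] = allele
    return "".join(seq)


# Toy data (hypothetical): a 10 bp reference with three segregating sites.
reference = "ACGTACGTAC"
positions = [1, 4, 8]
phased = {"ind1_a": "CAA", "ind1_b": "TGA"}

full_haplotypes = {
    name: rebuild_haplotype(reference, positions, hap)
    for name, hap in phased.items()
}
```

The resulting full-length sequences contain both polymorphic and monomorphic sites, which is the form the abstract says downstream population genetics software expects.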
ABSTRACT: To describe the genetic diversity of Plasmodium vivax isolates from different areas in the Brazilian Amazon using 11 polymorphic microsatellites and to evaluate the correlation between microsatellite variation and repeat array length.
Microsatellites with variable repeat units and array lengths were selected using in silico search of the P. vivax genome. We designed primers and amplified the selected loci in DNA obtained from patients with P. vivax acute infections.
Positive correlation between repeat array length and microsatellite variation was detected independently of the size of repeat unit (di, tri, or tetranucleotide). We used these markers to describe the genetic variability of P. vivax isolates from four geographic regions of the Brazilian Amazon. Substantial variability was observed among P. vivax isolates within populations, concurrent with high levels of multiple-clone infections and high linkage disequilibrium. Overall, structured populations were observed with moderate to high genetic differentiation.
The markers studied are useful tools for assessing population structure of P. vivax, as demonstrated for Brazilian populations and for searching for evidence of recent selection events associated with different phenotypes, such as drug resistance.
Tropical Medicine & International Health 06/2010; 15(6):718-26. · 2.94 Impact Factor
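The P. vivax abstract above relates repeat array length to microsatellite variation without stating which diversity measure was used. One common choice for per-locus variation is the unbiased expected heterozygosity, He = n/(n-1) * (1 - sum of squared allele frequencies). The Python sketch below assumes that measure and uses invented allele calls; it is an illustration, not the authors' analysis.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Unbiased expected heterozygosity for one locus.

    alleles: list of observed allele labels (for example repeat counts),
             one entry per sampled gene copy.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two allele observations are required")
    counts = Counter(alleles)
    # Sum of squared allele frequencies (expected homozygosity).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # The n / (n - 1) factor corrects for small sample size.
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical allele calls (repeat counts) at two loci.
locus_short_array = [10, 10, 10, 10]
locus_long_array = [21, 21, 24, 24]

he_short = expected_heterozygosity(locus_short_array)
he_long = expected_heterozygosity(locus_long_array)
```

Computing this value for each locus and plotting it against repeat array length would be one way to examine the kind of correlation the abstract reports.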
ABSTRACT: Glucose is an important source of energy for living organisms. In vertebrates it is ingested with the diet and transported into the cells by conserved mechanisms and molecules, such as the trans-membrane Glucose Transporters (GLUTs). Members of this family have tissue specific expression, biochemical properties and physiologic functions that together regulate glucose levels and distribution. GLUT4, encoded by SLC2A4 (17p13), is an insulin-sensitive transporter with a critical role in glucose homeostasis and diabetes pathogenesis, preferentially expressed in the adipose tissue, heart muscle and skeletal muscle. We tested the hypothesis that natural selection acted on SLC2A4.
We re-sequenced SLC2A4 and genotyped 104 SNPs along an approximately 1 Mb region flanking this gene in 102 ethnically diverse individuals. Across the studied populations (African, European, Asian and Latin-American), all the eight common SNPs are concentrated in the N-terminal region upstream of exon 7 (approximately 3700 bp), while the C-terminal region downstream of intron 6 (approximately 2600 bp) harbors only 6 singletons, a pattern that is not compatible with neutrality for this part of the gene. Tests of neutrality based on comparative genomics suggest that: (1) episodes of natural selection (likely a selective sweep) predating the coalescent of human lineages, within the last 25 million years, account for the observed reduced diversity downstream of intron 6 and, (2) the target of natural selection may not be in the SLC2A4 coding sequence.
We propose that the contrast in the pattern of genetic variation between the N-terminal and C-terminal regions is a signature of the action of natural selection and thus follow-up studies should investigate the functional importance of different regions of the SLC2A4 gene.
PLoS ONE 01/2010; 5(3):e9827. · 3.53 Impact Factor
ABSTRACT: Admixture occurs when individuals from parental populations that have been isolated for hundreds of generations form a new hybrid population. Currently, interest in measuring biogeographic ancestry has spread from anthropology to forensic sciences, direct-to-consumer personal genomics, and civil rights issues of minorities, and it is critical for genetic epidemiology studies of admixed populations. Markers with highly differentiated frequencies among human populations are informative of ancestry and are called ancestry informative markers (AIMs). For tri-hybrid Latin American populations, ancestry information is required for Africans, Europeans and Native Americans. We developed two multiplex panels of AIMs (for 14 SNPs) to be genotyped by two mini-sequencing reactions, suitable for investigators in small- to medium-sized laboratories to estimate admixture of Latin American populations. We tested the performance of these AIMs by comparing results obtained with our 14 AIMs with those obtained using 108 AIMs genotyped in the same individuals, for which DNA samples are available to other investigators. We emphasize that this type of comparison should be made when new admixture/population structure panels are developed. At the population level, our 14 AIMs were useful to estimate European admixture, though they overestimated African admixture and underestimated Native American admixture. Combined with more AIMs, our panel could be used to infer individual admixture. We used our panel to infer the pattern of admixture in two urban populations (Montes Claros and Manhuaçu) of the State of Minas Gerais (southeastern Brazil), obtaining a snapshot of their genetic structure in the context of their demographic history.
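The panel comparison recommended in this abstract, checking a small AIM panel against a larger one genotyped in the same individuals, can be summarised by the mean signed difference in estimated ancestry per component. The Python sketch below is a hypothetical illustration: the function name, the component labels, and the ancestry proportions are invented for the example and are not results from the study.

```python
def mean_signed_difference(small_panel, large_panel):
    """Mean of (small - large) ancestry estimates per component.

    small_panel, large_panel: lists of dicts, one per individual and in the
    same order, mapping an ancestry component label to its proportion.
    A positive value means the small panel overestimates that component
    relative to the large panel; a negative value means it underestimates.
    """
    if not small_panel or len(small_panel) != len(large_panel):
        raise ValueError("both panels must cover the same non-empty set of individuals")
    components = small_panel[0].keys()
    n = len(small_panel)
    return {
        comp: sum(s[comp] - l[comp] for s, l in zip(small_panel, large_panel)) / n
        for comp in components
    }


# Invented toy estimates for two individuals (not data from the study).
small = [
    {"AFR": 0.20, "EUR": 0.60, "NAT": 0.20},
    {"AFR": 0.30, "EUR": 0.50, "NAT": 0.20},
]
large = [
    {"AFR": 0.10, "EUR": 0.60, "NAT": 0.30},
    {"AFR": 0.20, "EUR": 0.50, "NAT": 0.30},
]

bias = mean_signed_difference(small, large)
```

A summary of this kind makes the direction of any discrepancy explicit for each ancestry component, which is the type of over- or underestimation the abstract describes at the population level.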
ABSTRACT: Plasmodium vivax malaria is a major public health challenge in Latin America, Asia and Oceania, with 130-435 million clinical cases per year worldwide. Invasion of host blood cells by P. vivax mainly depends on a type I membrane protein called Duffy binding protein (PvDBP). The erythrocyte-binding motif of PvDBP is a 170 amino-acid stretch located in its cysteine-rich region II (PvDBPII), which is the most variable segment of the protein.
To test whether diversifying natural selection has shaped the nucleotide diversity of PvDBPII in Brazilian populations, this region was sequenced in 122 isolates from six different geographic areas. A Bayesian method was applied to test for the action of natural selection under a population genetic model that incorporates recombination. The analysis was integrated with a structural model of PvDBPII, and T- and B-cell epitopes were localized on the 3-D structure.
The results suggest that: (i) recombination plays an important role in determining the haplotype structure of PvDBPII, and (ii) PvDBPII appears to contain neutrally evolving codons as well as codons evolving under natural selection. Diversifying selection preferentially acts on sites identified as epitopes, particularly on amino acid residues 417, 419, and 424, which show strong linkage disequilibrium.
This study shows that some polymorphisms of PvDBPII are present near the erythrocyte-binding domain and might serve to elude antibodies that inhibit cell invasion. Therefore, these polymorphisms should be taken into account when designing vaccines aimed at eliciting antibodies to inhibit erythrocyte invasion.
ABSTRACT: Alu insertions provide useful markers for the study of inter-population affinities and historical processes, but data on these systems are not numerous in Native Americans and related populations.
The study aimed to answer the following questions: (a) do the population relationships found agree with ethnic, historical and geographical data? and (b) what can heterozygosity levels and associated results tell us about the events that led to the colonization of the New World?
Twelve Alu insertion polymorphisms were studied in 330 individuals belonging to South American Native, Siberian and Mongolian populations. These data were integrated with those from 526 persons, to ascertain the relationships between Asian, Northern Arctic and Amerindian populations.
A decreasing trend concerning heterozygosities and amount of gene flow was observed in the three sets, in the order indicated above. Most results indicated the validity of these subdivisions. However, no clear structure could be observed within South American Natives, indicating the importance of dispersive (genetic drift, founder effects) factors in their differentiation.
The answers to the questions are: (a) yes; and (b) an initial moderate bottleneck, intensified by more recent historical events (isolation and inbreeding), can explain the current Amerindian pattern of diversity.
Annals of Human Biology 08/2009; 33(2):142-60. · 1.48 Impact Factor