About
336
Publications
73,257
Reads
71,043
Citations
Publications (336)
The Jacquard genetic identity coefficients are of fundamental importance in relatedness research. We address the estimation of these coefficients as well as other relationship parameters that derive from them such as kinship and inbreeding coefficients using a concise matrix framework. Estimation of the Jacquard coefficients via likelihood methods...
Measuring inbreeding and its consequences on fitness is central for many areas in biology including human genetics and the conservation of endangered species. However, there is no consensus on the best method, neither for quantification of inbreeding itself nor for the model to estimate its effect on specific traits. We simulated traits based on si...
[This corrects the article DOI: 10.1371/journal.pgen.1010871.].
Being able to properly quantify genetic differentiation is key to understanding the evolutionary potential of a species. One central parameter in this context is FST, the mean coancestry within populations relative to the mean coancestry between populations. Researchers have been estimating FST globally or between pairs of populations for a long ti...
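The definition of FST quoted above (within-population coancestry relative to between-population coancestry) can be illustrated with a minimal sketch. The function name and the symbols `theta_within` and `theta_between` are hypothetical and used only for illustration; the ratio shown is one common way of writing this definition, not necessarily the estimator developed in the paper.

```python
def fst_from_coancestry(theta_within: float, theta_between: float) -> float:
    """Express within-population coancestry relative to between-population coancestry.

    theta_within  : mean coancestry of allele pairs drawn within populations (assumed known)
    theta_between : mean coancestry of allele pairs drawn from different populations
    """
    if not (0.0 <= theta_between < 1.0):
        raise ValueError("theta_between must lie in [0, 1)")
    # Excess within-population coancestry, rescaled by the maximum possible excess.
    return (theta_within - theta_between) / (1.0 - theta_between)


if __name__ == "__main__":
    # Illustrative values only, not taken from any data set.
    print(fst_from_coancestry(0.2, 0.1))
```

When the two coancestries are equal the value is zero, and it becomes negative if alleles within populations are less alike than alleles between populations.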
A new calculation module within the PopStats module of the CODIS software package, based on the underlying mathematics presented in the MixKin software package, has been developed for assigning the Likelihood Ratio (LR) of DNA mixture profiles. This module uses a semi-continuous model that allows for population structure and allelic drop-out and dr...
In his 1972 paper ‘The apportionment of human diversity’, Lewontin showed that, when averaged over loci, genetic diversity is predominantly attributable to differences among individuals within populations. However, selection can alter the apportionment of diversity of specific genes or genomic regions. We examine genetic diversity at the human leuc...
Transnational ivory traffickers continue to smuggle large shipments of elephant ivory out of Africa, yet prosecutions and convictions remain few. We identify trafficking networks on the basis of genetic matching of tusks from the same individual or close relatives in separate shipments. Analyses are drawn from 4,320 savannah (Loxodonta africana) an...
The two alleles an individual carries at a locus are identical by descent (ibd) if they have descended from a single ancestral allele in a reference population, and the probability of such identity is the inbreeding coefficient of the individual. Inbreeding coefficients can be predicted from pedigrees with founders constituting the reference popula...
In his 1972 "The apportionment of human diversity", Richard Lewontin showed that, when averaged over loci, genetic diversity is predominantly attributable to differences among individuals within populations. However, selection on specific genes and genomic regions can alter the apportionment of diversity. We examine genetic diversity at the HLA loc...
A match of HLA loci between patients and donors is critical for successful hematopoietic stem cell transplantation. However, the extreme polymorphism of HLA loci – an outcome of millions of years of natural selection – reduces the chances that two individuals will carry identical combinations of multilocus HLA genotypes. Further, HLA variability is...
The linkage disequilibrium coefficient r² is a measure of statistical dependence of the alleles possessed by an individual at different genetic loci. It is widely used in association studies to search for the locations of disease-causing genes on chromosomes. Most studies to date treat r² as a fixed property of two loci in a finite population, and...
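For readers unfamiliar with the coefficient, the sketch below computes the squared correlation between alleles at two biallelic loci from haplotype and allele frequencies. It assumes phased data with known frequencies; the function and argument names are hypothetical, and the sketch does not address the sampling properties that the paper studies.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation between two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("both loci must be polymorphic (0 < frequency < 1)")
    # Linkage disequilibrium coefficient D: departure from independence of alleles.
    d = p_ab - p_a * p_b
    # Scale D squared by the product of the allelic variances at the two loci.
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


if __name__ == "__main__":
    print(ld_r_squared(0.5, 0.5, 0.5))   # complete association
    print(ld_r_squared(0.25, 0.5, 0.5))  # independent loci
```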
Heritability, the proportion of phenotypic variance explained by genetic factors, can be estimated from pedigree data, but such estimates are uninformative with respect to the underlying genetic architecture. Analyses of data from genome-wide association studies (GWAS) on unrelated individuals have shown that for human traits and disease, approx...
The Trans-Omics for Precision Medicine (TOPMed) program seeks to elucidate the genetic architecture and disease biology of heart, lung, blood, and sleep disorders, with the ultimate goal of improving diagnosis, treatment, and prevention. The initial phases of the program focus on whole genome sequencing of individuals with rich phenotypic...
Rapid growth in world trade has enabled transnational criminal networks to conceal their contraband among the 1 billion containers shipped worldwide annually. Forensic methods are needed to identify the major cartels moving the contraband into transit. We combine DNA-based sample matching and geographic assignment of tusks to show that the two tusk...
The concept of kinship permeates many domains of fundamental and applied biology ranging from social evolution to conservation science to quantitative and human genetics. Until recently, pedigrees were the gold standard to infer kinship, but the advent of next generation sequencing and the availability of dense genetic markers in many species make...
Recently, Lund and Iyer (L&I) raised an argument regarding the use of likelihood ratios in court. In our view, their argument is based on a lack of understanding of the paradigm. L&I argue that the decision maker should not accept the expert's likelihood ratio without further consideration. This is agreed by all parties. In normal practice, there i...
Statistical tests for Hardy-Weinberg equilibrium are important elementary tools in genetic data analysis. X-chromosomal variants have long been tested by applying autosomal test procedures to females only, and gender is usually not considered when testing autosomal variants for equilibrium. Recently, we proposed specific X-chromosomal exact test pr...
Standard statistical tests for equality of allele frequencies in males and females and tests for Hardy-Weinberg equilibrium are tightly linked by their assumptions. Tests for equality of allele frequencies assume Hardy-Weinberg equilibrium, whereas the usual chi-square or exact test for Hardy-Weinberg equilibrium assumes equality of allele frequenci...
Significance
Inbreeding depression (ID) is the reduction of fitness in offspring of related parents. This phenomenon can be quantified from SNP data through a number of measures of inbreeding. Our study addresses two key questions. How accurate are the different methods to estimate ID? And how and why should investigators choose among the multiple...
Statistical tests for Hardy–Weinberg equilibrium have been an important tool for detecting genotyping errors in the past, and remain important in the quality control of next generation sequence data. In this paper, we analyze complete chromosomes of the 1000 genomes project by using exact test procedures for autosomal and X-chromosomal variants. We...
Many population genetic activities, ranging from evolutionary studies to association mapping to forensic identification, rely on appropriate estimates of population structure or relatedness. All applications require recognition that quantities with an underlying meaning of allelic dependence are not defined in an absolute sense, but instead are mad...
Motivation:
Whole-genome sequencing (WGS) data is being generated at an unprecedented rate. Analysis of WGS data requires a flexible data format to store the different types of DNA variation. Variant call format (VCF) is a general text-based format developed to store variant genotypes and their annotations. However, VCF files are large and data re...
An update was performed of the classic experiments that led to the view that profile probability assignments are usually within a factor of 10 of each other. The data used in this study consist of 15 Identifiler loci collected from a wide range of forensic populations. Following Budowle et al. [1], the terms cognate and non-cognate are used. The co...
Many population genetic activities, ranging from evolutionary studies to association mapping to forensic identification, rely on appropriate estimates of population structure or relatedness. All applications require recognition that quantities with an underlying meaning of allelic identity by descent are not defined in an absolute sense, but instea...
Y-STR markers are particularly useful forensically for identifying the male contributor to a male-female DNA mixture. A population genetic approach to estimation of the match probability is dependent on an appropriate estimate of θ, which is calculated empirically as FST. This estimate depends on the choice of FST estimator, the markers included an...
Testing genetic markers for Hardy-Weinberg equilibrium (HWE) is an important tool for detecting genotyping errors in large-scale genotyping studies. For markers at the X chromosome, typically the χ² or exact test is applied to the females only, and the hemizygous males are considered to be uninformative. In this paper we show that the males are r...
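As background to the conventional approach described above, the following is a minimal sketch of the standard chi-square statistic for HWE computed from diploid genotype counts at a biallelic marker (for an X-chromosomal marker, the female counts). It is not the test proposed in the paper, which also uses the males; the function and argument names are hypothetical.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square statistic for Hardy-Weinberg proportions (1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    # Allele frequency of A estimated by gene counting.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or p == 1.0:
        raise ValueError("marker is monomorphic; the test is undefined")
    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Counts are illustrative only.
    print(hwe_chi_square(25, 50, 25))
    print(hwe_chi_square(50, 0, 50))
```

The statistic is compared with a chi-square distribution on one degree of freedom, for which the 5% critical value is about 3.84. Exact tests are generally preferred when expected counts are small.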
Platelets play an essential role in hemostasis and thrombosis. We performed a genome-wide association study of platelet count in 12,491 participants of the Hispanic Community Health Study/Study of Latinos by using a mixed-model method that accounts for admixture and family relationships. We discovered and replicated associations with five genes (AC...
Genealogical inference from genetic data is essential for a variety of applications in human genetics. In genome-wide and sequencing association studies, for example, accurate inference on both recent genetic relatedness, such as family structure, and more distant genetic relatedness, such as population structure, is necessary for protection agains...
US Hispanic/Latino individuals are diverse in genetic ancestry, culture, and environmental exposures. Here, we characterized and controlled for this diversity in genome-wide association studies (GWASs) for the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). We simultaneously estimated population-structure principal components (PCs) rob...
Principal component analysis (PCA) is widely used in genome-wide association studies (GWAS), and the principal component axes often represent perpendicular gradients in geographic space. The explanation of PCA results is of major interest for geneticists to understand fundamental demographic parameters. Here, we provide an interpretation of PCA bas...
The utility of short tandem repeat genetic (STR) markers for forensic science is beyond question and there are over 50 million STR profiles in current national databases. The magnitude and value of those data, however, are likely to be dwarfed by what is emerging from large-scale SNP and DNA sequence assays. Phenotypic characterization may well acc...
This paper addresses the issue of exact-test based statistical inference for Hardy-Weinberg equilibrium in the presence of missing genotype data. Missing genotypes are often discarded when markers are tested for Hardy-Weinberg equilibrium and this can lead to bias in the statistical inference about equilibrium. Single and multiple imputation can im...
Poaching of elephants is now occurring at rates that threaten African populations with extinction. Identifying the number and location of Africa's major poaching hotspots may assist efforts to end poaching and facilitate recovery of elephant populations. We genetically assign origin to 28 large ivory seizures (≥0.5 tons) made between 1996-2014, als...
For human complex traits, non-additive genetic variation has been invoked to explain "missing heritability," but its discovery is often neglected in genome-wide association studies. Here we propose a method of using SNP data to partition and estimate the proportion of phenotypic variance attributed to additive and dominance genetic variation at all...
Genetic risk factors are believed to combine with environmental exposures and contribute to the risk of developing temporomandibular disorder (TMD). In this prospective cohort study, 2,737 people without TMD were assessed for common genetic variation in 358 genes known to contribute to nociceptive pathways, inflammation, and affective distress. Dur...
The challenges of whole-genome data, when genotypes are available from hundreds of thousands of genetic markers, are explored for four topics in statistical genetics: Hardy-Weinberg testing, estimating linkage disequilibrium from unphased genotypic data, association mapping and characterizing population structure.
The current availability of dense sets of marker SNPs for the human genome is having a large impact on genetic studies and offers new possibilities for clinical trials. This chapter offers a unified basis for the analysis of marker and response data, emphasizing the central importance of the correlation, or linkage disequilibrium, between SNP marke...
Microarray SNP genotyping, combined with imputation of untyped variants, has been widely adopted as an efficient means to interrogate variation across the human genome. "Genomic coverage" is the total proportion of genomic variation captured by an array, either by direct observation or through an indirect means such as linkage disequilibrium or imp...
Genotyping of classical human leukocyte antigen (HLA) alleles is an essential tool in the analysis of diseases and adverse drug reactions with associations mapping to the major histocompatibility complex (MHC). However, deriving high-resolution HLA types subsequent to whole-genome single-nucleotide polymorphism (SNP) typing or sequencing is often c...
This volume contains a selection of chapters based on papers presented at the Fourth Seattle Symposium in Biostatistics: Clinical Trials. The symposium was held in 2010 to celebrate the 40th anniversary of the University of Washington School of Public Health and Community Medicine. It featured keynote lectures by David DeMets and Susan Ellenberg and...
Characterizing the genetic basis of responses in clinical trials has been made substantially easier and more powerful through the use of single nucleotide polymorphism data. A million or more of these markers can now be scored cheaply with commercial SNP-chips and ten million or more additional SNPs can be inferred by imputation. These rich dataset...
To identify new genetic factors for colorectal cancer (CRC), we conducted a genome-wide association study in east Asians. By analyzing genome-wide data in 2,098 cases and 5,749 controls, we selected 64 promising SNPs for replication in an independent set of samples, including up to 5,358 cases and 5,922 controls. We identified four SNPs with associ...
Characterizing the genetic structure of populations is of importance to evolutionary biology, to human disease gene mapping, and to forensic science. Sewall Wright introduced a set of F-statistics to describe population structure in 1951, and he emphasized that these quantities were ratios of variances. Responding to uncertainty over the best way t...
Genome-wide association studies are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed gdsfmt and SNPRelate (R packages for multi-core symmetric multiprocessing computer architectures) to accelerate two key computations on SNP data: principal component analysis (PCA) and re...
GWASTools is an R/Bioconductor package for quality control and analysis of genome-wide association studies (GWAS). GWASTools brings the interactive capability and extensive statistical libraries of R to GWAS. Data are stored in NetCDF format to accommodate extremely large datasets that cannot fit within R’s memory limits. The documentation includes...
Summary: In previous analyses, the variation in actual, or realized, relationship has been derived as a function of map length of chromosomes and type of relationship, the variation being greater the shorter the total chromosome length and the coefficient of variation being greater the more distant the relationship. Here, the results are extended to...
Background / Purpose:
Our aim was to measure the effects of non-additive variance components of quantitative traits in human genetics. We present the theory that underlies these calculations.
Main conclusion:
We have found that quantifying non-additive genetic variance components for human quantitative traits should be possible with enough dat...
DNA profiling of biological material from scenes of crimes is often complicated because the amount of DNA is limited and the quality of the DNA may be compromised. Furthermore, the sensitivity of STR typing kits has been continuously improved to detect low level DNA traces. This may lead to (1) partial DNA profiles and (2) detection of additional a...
Because common complex diseases are affected by multiple genes and environmental factors, it is essential to investigate gene-gene and/or gene-environment interactions to understand genetic architecture of complex diseases. After the great success of large scale genome-wide association (GWA) studies using the high density single nucleotide polymorp...
We detected clonal mosaicism for large chromosomal anomalies (duplications, deletions and uniparental disomy) using SNP microarray data from over 50,000 subjects recruited for genome-wide association studies. This detection method requires a relatively high frequency of cells with the same abnormal karyotype (>5-10%; presumably of clonal origin) in...
Introduction; The Central Problem; Genetic Sampling; Lineage Markers; Relatedness; Inbreeding; Testing for Allele Independence; Assignment Testing; Conclusion; References
With the expansion of offender/arrestee DNA profile databases, genetic forensic identification has become commonplace in the United States criminal justice system. Implementation of familial searching has been proposed to extend forensic identification to family members of individuals with profiles in offender/arrestee DNA databases. In familial se...
A simple tandem repeat (STR) PCR-based typing system developed for the genetic individualization of domestic cat samples has been used to generate a population genetic database of domestic cat breeds. A panel of 10 tetranucleotide STR loci and a gender-identifying sequence tagged site (STS) were co-amplified in genomic DNA of 1043 individuals repre...
Current genome-wide association (GWA) analysis mainly focuses on single genetic variants, which may not reveal some of the genetic variants that have small individual effects but large joint effects. Considering multiple SNPs jointly in genome-wide association (GWA) analysis can increase power. When multiple SNPs are jointly considered, the...
This Journal of Pain Compendium presents the initial outcomes from the first large population-based study designed to identify the biopsychosocial and genetic risk factors that contribute to the onset and persistence of painful temporomandibular joint disorders (TMD) – The OPPERA Study. This study is supported by NIDCR Cooperative Agreement U01 DE0...
Genetic factors play a role in the etiology of persistent pain conditions, putatively by modulating underlying processes such as nociceptive sensitivity, psychological well-being, inflammation, and autonomic response. However, to date, only a few genes have been associated with temporomandibular disorders (TMD). This study evaluated 35...
We estimate and partition genetic variation for height, body mass index (BMI), von Willebrand factor and QT interval (QTi) using 586,898 SNPs genotyped on 11,586 unrelated individuals. We estimate that ∼45%, ∼17%, ∼25% and ∼21% of the variance in height, BMI, von Willebrand factor and QTi, respectively, can be explained by all autosomal SNPs and a...
Mitochondrial DNA (mtDNA) and the non-recombining portion of the Y-chromosome are inherited matrilineally and patrilineally, respectively, and without recombination. Collectively they are termed 'lineage markers'. Lineage markers may be used in forensic testing of an item, such as a hair from a crime scene, against a hypothesised source, or in relati...
Although the expected relationship or proportion of genome shared by pairs of relatives can be obtained from their pedigrees, the actual quantities deviate as a consequence of Mendelian sampling and depend on the number of chromosomes and map length. Formulae have been published previously for the variance of actual relationship for a number of spe...
Likelihood ratios are necessary to properly interpret mixed stain DNA evidence. They can flexibly consider alternate hypotheses and can account for population substructure. The likelihood ratio should be seen as an estimate and not a fixed value, because the calculations are functions of allelic frequency estimates that were estimated from a small...
Whole genome data are allowing the estimation of population genetic parameters with an accuracy not imagined 50 years ago. Variation in these parameters along the genome is being found empirically where once only approximate theoretical values were available. Along with increased information, however, has come the issue of multiple testing and the...
Genotyping technology now allows the rapid and affordable generation of million-SNP profiles for humans, leading to considerable activity in association mapping. Similar activity is anticipated for many plant species, including Brassica. These plant association mapping activities will require the same care in quality control and quality assurance a...
Genome-wide scans of nucleotide variation in human subjects are providing an increasing number of replicated associations with complex disease traits. Most of the variants detected have small effects and, collectively, they account for a small fraction of the total genetic variance. Very large sample sizes are required to identify and validate find...
We propose a multilocus version of FST and a measure of haplotype diversity using localized haplotype clusters. Specifically, we use haplotype clusters identified with BEAGLE, which is a program implementing a hidden Markov model for localized haplotype clustering and performing several functions including inference of haplotype phase. We apply t...
Coevolving interacting genes undergo complementary mutations to maintain their interaction. Distinct combinations of alleles in coevolving genes interact differently, conferring varying degrees of fitness. If this fitness differential is adequately large, the resulting selection for allele matching could maintain allelic association, even between p...