David J. Balding's research while affiliated with Victoria University Melbourne and other places

Publications (302)

Preprint
We present LDAK-GBAT, a novel tool for gene-based association testing using summary statistics from genome-wide association studies. We first evaluate LDAK-GBAT using ten phenotypes from the UK Biobank. We show that LDAK-GBAT is computationally efficient, taking approximately 30 minutes to analyze imputed data (2.9M common, genic SNPs), and requiri...
Article
The inclusion of ancestrally diverse participants in genetic studies can lead to new discoveries and is important to ensure equitable health care benefit from research advances. Here, members of the Ethical, Legal, Social, Implications (ELSI) committee of the International Genetic Epidemiology Society (IGES) offer perspectives on methods and analys...
Article
Full-text available
Throughout human evolutionary history, large-scale migrations have led to intermixing (i.e., admixture) between previously separated human groups. Although classical and recent work has shown that studying admixture can yield novel historical insights, the extent to which this process contributed to adaptation remains underexplored. Here, we intro...
Article
Full-text available
We present a novel algorithm, implemented in the software ARGinfer, for probabilistic inference of the Ancestral Recombination Graph under the Coalescent with Recombination. Our Markov Chain Monte Carlo algorithm takes advantage of the Succinct Tree Sequence data structure that has allowed great advances in simulation and point estimation, but not...
Article
Complex‐trait genetics has advanced dramatically through methods to estimate the heritability tagged by SNPs, both genome‐wide and in genomic regions of interest such as those defined by functional annotations. The models underlying many of these analyses are inadequate, and consequently many SNP‐heritability results published to date are inaccurat...
Article
Full-text available
Whole-genome sequencing has facilitated genome-wide analyses of association, prediction and heritability in many organisms. However, such analyses in bacteria are still in their infancy, being limited by difficulties including genome plasticity and strong population structure. Here we propose a suite of methods including linear mixed models, elasti...
Article
Full-text available
Association mapping using crop varieties allows identification of genetic loci of direct relevance to breeding. Here, 150 UK wheat varieties genotyped with 23,288 single nucleotide polymorphisms (SNPs) were used for genome-wide association studies (GWAS) using historical phenotypic data for grain protein content, Hagberg falling number, test weight...
Article
Full-text available
Better drugs are needed for common epilepsies. Drug repurposing offers the potential of significant savings in the time and cost of developing new treatments. In order to select the best candidate drug(s) to repurpose for a disease, it is desirable to predict the relative clinical efficacy that drugs will have against the disease. Common epilepsy c...
Preprint
Full-text available
Throughout human evolutionary history, large-scale migrations have led to intermixing (i.e., admixture) between previously separated human groups. While classical and recent work has shown that studying admixture can yield novel historical insights, the extent to which this process contributed to adaptation remains underexplored. Here, we introduc...
Preprint
Full-text available
Advances in whole-genome genotyping and sequencing have allowed genome-wide analyses of association, prediction and heritability in many organisms. However, the application of such analyses to bacteria is still in its infancy, being limited by difficulties including the plasticity of bacterial genomes and their strong population structure. Here we...
Article
Full-text available
Y chromosome and mitochondrial DNA profiles have been used as evidence in courts for decades, yet the problem of evaluating the weight of evidence has not been adequately resolved. Both are lineage markers (inherited from just one parent), which presents different interpretation challenges compared with standard autosomal DNA profiles (inherited fr...
Preprint
Full-text available
Y-chromosomal and mitochondrial DNA profiles have been used as evidence in courts for decades, yet the problem of evaluating the weight of evidence has not been adequately resolved. Both are lineage markers (inherited from just one parent), which presents different interpretation challenges compared with standard autosomal DNA profiles (inherited f...
Article
Full-text available
Mapping the genes underlying ecologically relevant traits in natural populations is fundamental to develop a molecular understanding of species adaptation. Current sequencing technologies enable the characterisation of a species' genetic diversity across the landscape or even over its whole range. The relevant capture of the genetic diversity acros...
Article
Background: Genome-wide association studies (GWAS) have identified genes influencing skin ageing and mole count in Europeans but little is known about the relevance of these (or other genes) in non-Europeans. Objective: To conduct a GWAS for facial skin ageing and mole count in adults < 40 years old, of mixed European, Native American and Africa...
Article
Full-text available
Here we evaluate the accuracy of prediction for eye, hair and skin pigmentation in a dataset of > 6,500 individuals from Mexico, Colombia, Peru, Chile and Brazil (including genome-wide SNP data and quantitative/categorical pigmentation phenotypes - the CANDELA dataset CAN). We evaluated accuracy in relation to different analytical methods and vario...
Preprint
Full-text available
Mapping the genes underlying ecologically-relevant traits in natural populations is fundamental to develop a molecular understanding of species adaptation. Current sequencing technologies enable the characterisation of a species’ genetic diversity across the landscape or even over its whole range. The relevant capture of the genetic diversity acros...
Article
Full-text available
To characterize the genetic basis of facial features in Latin Americans, we performed a genome-wide association study (GWAS) of more than 6000 individuals using 59 landmark-based measurements from two-dimensional profile photographs and ~9,000,000 genotyped or imputed single-nucleotide polymorphisms. We detected significant association of 32 traits...
Preprint
Full-text available
We report an evaluation of prediction accuracy for eye, hair and skin pigmentation based on genomic and phenotypic data for over 6,500 admixed Latin Americans (the CANDELA dataset). We examined the impact on prediction accuracy of three main factors: (i) The methods of prediction, including classical statistical methods and machine learning approac...
Article
As legal practitioners and courts become more aware of scientific methods and evidence evaluation, they are demanding measures of the reliability of expert opinion. In particular, there are calls for error rates to accompany opinion evidence in comparative forensic sciences. While error rates or confidence intervals can be useful for those discipli...
Article
Full-text available
There is currently much debate regarding the best model for how heritability varies across the genome. The authors of GCTA recommend the GCTA-LDMS-I model, the authors of LD Score Regression recommend the Baseline LD model, and we have recommended the LDAK model. Here we provide a statistical framework for assessing heritability models using summar...
Article
Linkage disequilibrium SCore regression (LDSC) has become a popular approach to estimate confounding bias, heritability, and genetic correlation using only genome‐wide association study (GWAS) test statistics. SumHer is a newly introduced alternative with similar aims. We show using theory and simulations that both approaches fail to adequately acc...
Preprint
Full-text available
There is currently much debate regarding the best way to model how heritability varies across the genome. The authors of GCTA recommend the GCTA-LDMS-I Model, the authors of LD Score Regression recommend the Baseline LD Model, while we have instead recommended the LDAK Model. Here we provide a statistical framework for assessing heritability models...
Article
Full-text available
In population genetics, the Dirichlet (also called the Balding–Nichols) model has for 20 years been considered the key model to approximate the distribution of allele fractions within populations in a multi-allelic setting. It has often been noted that the Dirichlet assumption is approximate because positive correlations among alleles cannot be acc...
Article
Full-text available
We present SumHer, software for estimating confounding bias, SNP heritability, enrichments of heritability and genetic correlations using summary statistics from genome-wide association studies. The key difference between SumHer and the existing software LD Score Regression (LDSC) is that SumHer allows the user to specify the heritability model. We...
Article
We compare two open-source programs for the evaluation of evidential weight arising from complex DNA profiles recovered in a crime investigation. Here, “complex” means one or more of: low-template, degraded and mixed-source. Although software for complex DNA profile analysis has made great strides in recent years, the ability of courts to effective...
Preprint
LD SCore regression (LDSC) has become a popular approach to estimate confounding bias, heritability and genetic correlation using only genome wide association study (GWAS) test statistics. SumHer is a newly-introduced alternative with similar aims. We show using theory and simulations that both approaches fail to adequately account for confounding...
Article
Full-text available
We report a genome-wide association scan in >6,000 Latin Americans for pigmentation of skin and eyes. We found eighteen signals of association at twelve genomic regions. These include one novel locus for skin pigmentation (in 10q26) and three novel loci for eye pigmentation (in 1q32, 20q13 and 22q12). We demonstrate the presence of multiple indepen...
Article
Full-text available
Historical records and genetic analyses indicate that Latin Americans trace their ancestry mainly to the intermixing (admixture) of Native Americans, Europeans and Sub-Saharan Africans. Using novel haplotype-based methods, here we infer sub-continental ancestry in over 6,500 Latin Americans and evaluate the impact of regional ancestry variation on...
Article
We present a new Markov chain Monte Carlo algorithm, implemented in the software Arbores, for inferring the history of a sample of DNA sequences. Our principal innovation is a bridging procedure, previously applied only for simple stochastic processes, in which the local computations within a bridge can proceed independently of the rest of the DNA...
Article
The epilepsies affect around 65 million people worldwide and have a substantial missing heritability component. We report a genome-wide mega-analysis involving 15,212 individuals with epilepsy and 29,677 controls, which reveals 16 genome-wide significant loci, of which 11 are novel. Using various prioritization criteria, we pinpoint the 21 most lik...
Preprint
We present a new Markov chain Monte Carlo algorithm, implemented in the software Arbores, for inferring the history of a sample of DNA sequences. Our principal innovation is a bridging procedure, previously applied only for simple stochastic processes, in which the local computations within a bridge can proceed independently of the rest of the DNA sequ...
Article
Full-text available
Direct-to-consumer genetic ancestry testing is a new and growing industry that has gained widespread media coverage and public interest. Its scientific base is in the fields of population and evolutionary genetics and it has benefitted considerably from recent advances in rapid and cost-effective DNA typing technologies. There is a considerable bod...
Article
Full-text available
Mitochondrial DNA (mtDNA) is useful to assist with identification of the source of a biological sample, or to confirm matrilineal relatedness. Although the autosomal genome is much larger, mtDNA has an advantage for forensic applications of multiple copy number per cell, allowing better recovery of sequence information from degraded samples. In add...
Data
Approximate quantiles of the number of matching individuals. Key quantiles of the distributions shown in Fig 2 for the mutation scheme of Översti [13], and for the 1.2M growth demographic scenario. (PDF)
Data
Approximate quantiles of the number of matching individuals. Key quantiles of the distributions shown in Fig 2 for the mutation scheme of Översti [13], and for the 300K constant demographic scenario. (PDF)
Data
Comparison of simulated with US and Iranian databases. The distribution of the numbers of singletons, doubletons and distinct haplotypes in 2,500 random databases of sizes 263 and 351 obtained under our three demographic and two mutation models. The horizontal reference lines are from [15, 16]. [16] does not provide number of singletons and doublet...
Data
Approximate quantiles of the number of matching individuals. Key quantiles of the distributions shown in Fig 2 for the mutation scheme of Rieux [14], and for the 300K constant demographic scenario. (PDF)
Data
Approximate quantiles of the number of matching individuals. Key quantiles of the distributions shown in Fig 2 for the mutation scheme of Rieux [14], and for the 1.2M constant demographic scenario. (PDF)
Data
Approximate quantiles of the number of matching individuals. Key quantiles of the distributions shown in Fig 2 for the mutation scheme of Rieux [14], and for the 1.2M growth demographic scenario. (PDF)
Article
We recently introduced a new approach to the evaluation of weight of evidence (WoE) for Y-chromosome profiles. Rather than attempting to calculate match probabilities, which is particularly problematic for modern Y-profiles with high mutation rates, we proposed using simulation to describe the distribution of the number of males in the population w...
Preprint
Mitochondrial DNA (mtDNA) is useful to assist with identification of the source of a biological sample, or to confirm matrilineal relatedness. Although the autosomal genome is much larger, mtDNA has an advantage for forensic applications of multiple copy number per cell, allowing better recovery of sequence information from degraded samples. In add...
Preprint
We recently introduced a new approach to the evaluation of weight of evidence (WoE) for Y-chromosome profiles. Rather than attempting to calculate match probabilities, which is particularly problematic for modern Y-profiles with high mutation rates, we proposed using simulation to describe the distribution of the number of males in the population w...
Article
In forensic genetics, the likelihood ratio (LR), measuring the value of DNA profile evidence, is computed from a database of allele frequencies. Here, we address the choice of database and adjustments for population structure and sample size in the context of Brazil. The Brazilian population underwent a complex process of colonization, migration an...
Preprint
Full-text available
In our recent publication,¹ we examined the two heritability models most widely used when estimating SNP heritability: the GCTA Model, which is used by the software GCTA² and upon which LD Score regression (LDSC) is based,³ and the LDAK Model, which is used by our software LDAK.⁴ First we demonstrated the importance of choosing an appropriate h...
Preprint
Full-text available
LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic var...
Preprint
Historical records and genetic analyses indicate that Latin Americans trace their ancestry mainly to the admixture of Native Americans, Europeans and Sub-Saharan Africans.¹ Using novel haplotype-based methods, here we infer the sub-populations involved in admixture for over 6,500 Latin Americans and evaluate the impact of sub-continental ancestry...
Data
Count distribution for a two-person mixture. Distribution of the number of Y profiles included in a mixed Yfiler Plus profile arising from two male contributors. The red curve corresponds to profiles that exactly match the profile of one of the contributors. It is the same as the red (“unconditional”) curve in Fig 4 and is included again here for c...
Article
Full-text available
The introduction of forensic autosomal DNA profiles was controversial, but the problems were successfully addressed, and DNA profiling has gone on to revolutionise forensic science. Y-chromosome profiles are valuable when there is a mixture of male-source and female-source DNA, and interest centres on the identity of the male source(s) of the DNA....
Data
Similar to S3 Table but with VRS = 0.2 and population size growth. The population growth rate is 2% per generation. Please see S3 Table caption for details. (TEX)
Data
Y-STR mutation rates. Histogram bars show empirical mutation rates for the 29 loci included in the three Y-STR profiling kits (rates obtained as a ratio of the counts in S1 Table). The two duplicated loci (DYS385 and DYF387S1) are each represented as two loci with the same mutation rate. The curve shows the probability density for the Beta(1.5, 200...
Data
Mixtures. An approach similar to the one presented for DNA samples with a single male contributor can be applied for DNA samples with multiple male contributors. (PDF)
Data
Importance sampling ESS. The effective sample size for simulations with constant population size, as a fraction of the 10⁶ simulated |Ω| values, for the importance sampling to approximate distributions for |Ω| conditional on database profile count m, for m from 0 to 6. The database sizes are n = 100 (dashed line) and n = 1,000 (solid line). (TIF)
Data
The number of matching live individuals for VRS = 0.2 and constant population size. Highlighted in grey are key properties of the unconditional distributions of |Ω|, the number of live individuals with profile matching that of Q, shown in Fig 4. Other rows show corresponding properties of conditional distributions given an observation of m copies o...
Data
Properties of the distribution of Δ, the number of father-son steps (or meioses) between Q and another male in Ω. These distributions are shown in Fig 5. (TEX)
Data
Mutation count data. Loci are those from the three kits, Yfiler (17 loci), PowerPlex Y23 (23 loci), and Yfiler Plus (27 loci). From: YHRD, [24] available at yhrd.org/pages/resources/mutation_rates, updated 17 Jan 2017. DYS385 and DYF387S1 are duplicated loci and each is treated as two independent loci (postfix ‘a’ and ‘b’). The same mutation rate w...
Data
Similar to S3 Table but with VRS = 1. Please see S3 Table caption for details. (TEX)
Data
Similar to S3 Table but with VRS = 0. The population size is constant so this case corresponds to a standard Wright-Fisher model. Please see S3 Table caption for details. (TEX)
Data
Similar to S3 Table but with VRS = 1 and population size growth. The population growth rate is 2% per generation. Please see S3 Table caption for details. (TEX)
Data
Similar to S3 Table but with VRS = 0 and population size growth. The population growth rate is 2% per generation. Please see S3 Table caption for details. (TEX)
Data
Mixed profiles. Key properties of the distribution of the number of Y profiles included in a mixed Yfiler Plus profile arising from two male contributors in a constant-size population. See S3 Fig legend for explanations of “other included” and “unconditional”. (TEX)
Article
Full-text available
Gene panel and exome sequencing have revealed a high rate of molecular diagnoses among diseases where the genetic architecture has proven suitable for sequencing approaches, with a large number of distinct and highly penetrant causal variants identified among a growing list of disease genes. The challenge is, given the DNA sequence of a new patient...
Preprint
The introduction of forensic autosomal DNA profiles was controversial, but the problems were successfully addressed, and DNA profiling has gone on to revolutionise forensic science. Y-chromosome profiles are valuable when there is a mixture of male-source and female-source DNA, and interest centres on the identity of the male source(s) of the DNA....
Article
SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but current assumptions have not been thoroughly tested. By analyzing imputed data for a large number of human traits, we...
Article
Full-text available
Objective: Mutations in the aryl hydrocarbon receptor-interacting protein (AIP) gene are associated with pituitary adenoma, acromegaly and gigantism. Identical alleles in unrelated pedigrees could be inherited from a common ancestor or result from recurrent mutation events. Design & methods: Observational, inferential and experimental study, inc...
Preprint
Full-text available
SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but the assumptions in current use have not been thoroughly tested. By analyzing imputed data for a large number of human...
Article
Full-text available
Motivation: Sequencing pools of individuals (Pool-Seq) is a cost-effective way to gain insight into the genetics of complex traits, but as yet no parametric method has been developed to both test for genetic effects and estimate their magnitude. Here, we propose GWAlpha, a flexible method to obtain parametric estimates of genetic effects genome-wi...
Article
This letter comments on the report “Forensic science in criminal courts: Ensuring scientific validity of feature-comparison methods” recently released by the President’s Council of Advisors on Science and Technology (PCAST). The report advocates a procedure for evaluation of forensic evidence that is a two-stage procedure in which the first stage i...