Daniel J. Lawson’s research while affiliated with University of Bristol and other places


Publications (108)


Fig. 2: Comparison of the predictive performance of HCs and PCs on phenotypes. We visualize the average out-of-sample R² explained by HCs (red) and PCs (blue) under linear regression models (plot a, for continuous phenotypes) or logistic regression models (plot b, for binary phenotypes) through a 5-fold CV performed on n=462,694 individuals in the UK Biobank, constraining to PCs not associated with genomic structure.
Fig. 3: Comparison of the average estimated effect size of ancestries from the Tractor model with different methods to represent local ancestries under $E(A^{*}_{i1d}) = 0.1$. $E(A^{*}_{i1d})$ denotes the average probability of Ancestry 1. The x-axis represents the average certainty of LAI (see Methods) and the y-axis represents the average estimated effect sizes for Ancestry 1 (plot a) and Ancestry 2 (plot b). Different MAF thresholds f are compared: 1% ≤ f ≤ 5%, 5% ≤ f ≤ 10%, 10% ≤ f ≤ 20%, and f ≥ 20%. The simulation was repeated 1,000 times with n=20,000 diploid individuals.
Fig. 4: ARS for 5 significant phenotypes ordered by the overall significance level. ARS are computed from n=462,694 individuals in the UK Biobank. The error bars represent the 95% confidence interval. The distribution of simulated ARS for each population is shown as a raincloud plot, under the null hypothesis that SNPs associated with the trait are not associated with ancestries (Methods). The dashed black lines represent the average ARS of all populations weighted by the population sizes. The ARS of populations with Benjamini-Hochberg corrected p-values significant at 5% computed from the two-sided empirical test are annotated.
From individuals to ancestries: towards attributing trait variation to haplotypes
  • Preprint
  • File available

March 2025

·

21 Reads

·

Daniel John Lawson

Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic basis of complex traits and diseases, but limitations in SNP-centric approaches to population stratification limit the resolution of fine-scale population structures. Here we consider the use of haplotypes to represent population structure, leveraging haplotype components (HCs) for an improved understanding of trait associations and adjustment for population stratification. Using data from the UK Biobank, we showed that HCs have stronger associations with a range of phenotypes than principal components (PCs) while containing more predictive power for birthplaces globally. In GWAS, HCs-correction identifies more genome-wide significant association signals for birthplace- and lifestyle-related phenotypes, which are missed by PCs-corrected GWAS. Through thorough testing and simulation, we highlight challenges in performing ancestry-specific GWAS, underscoring the critical role of accurate local ancestry inference in studying admixed populations. We analyzed the haplotype structure of the UK Biobank in terms of 93 genetically-distinct populations, which enabled the computation of Ancestral Risk Scores (ARS) across 8 continental populations, providing insights into population-specific genetic risks for traits and diseases. By integrating haplotype information, this framework provides the potential to address challenges in population stratification, enhances GWAS resolution, and supports equitable health research by facilitating genetic studies in diverse populations.


Assessing geographic polarisation in Britain’s digital landscape through stable dynamic embedding of spatial web data

March 2025

·

2 Reads

EPJ Data Science

This paper employs Unfolded Adjacency Spectral Embedding (UASE) to investigate the temporal evolution of economic relationships between locations in Great Britain. We utilise timestamped, geolocated website hyperlinks data between archived, commercial websites in Britain, which are aggregated to create a set of directed, weighted networks of hyperlink connections between Local Authority Districts (LADs) for each year in the period 2005-2010. Thus, we are able to assess the digital evolution of longstanding economic disparities such as the North-South, Urban-Rural, and London versus the rest of the country divides in Britain. Our method is a robust and scalable statistical testing procedure for detecting changes between communities in dynamic networks where changes are expressed in terms of known covariates. We can describe network trends over time with respect to longitude and latitude covariates, relying on spatio-temporal stability properties of UASE to make comparisons of nodes in graphs over time. These trends can be formally tested with p-values as well as interpreted in terms of covariates and features of the network. We show how the methodology can be made robust to the problems of large-scale real-world data, used to detect changes over time, and identify their characteristics. This work provides the first robust evidence that commercial website hyperlink connectivity patterns between the North and South are diverging over time, highlighting an increasing digital divide.


Figure 4: AUASE is not sensitive to the attribute weight hyperparameter α as shown by accuracy on DBLP dataset.
Figure 5: One-dimensional UMAP visualisation of the AUASE node embeddings for varying α ∈ [0.1, 0.9]. The coloured lines show the mean embedding for each community with a 90% confidence interval.
Unsupervised Attributed Dynamic Network Embedding with Stability Guarantees

March 2025

·

2 Reads

Stability for dynamic network embeddings ensures that nodes behaving the same at different times receive the same embedding, allowing comparison of nodes in the network across time. We present attributed unfolded adjacency spectral embedding (AUASE), a stable unsupervised representation learning framework for dynamic networks in which nodes are attributed with time-varying covariate information. To establish stability, we prove uniform convergence to an associated latent position model. We quantify the benefits of our dynamic embedding by comparing with state-of-the-art network representation learning methods on three real attributed networks. To the best of our knowledge, AUASE is the only attributed dynamic embedding that satisfies stability guarantees without the need for ground truth labels, which we demonstrate provides significant improvements for link prediction and node classification.
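
The stability property described in this abstract can be illustrated with a minimal sketch of plain unfolded adjacency spectral embedding (without the attribute information that AUASE adds). The sketch assumes the usual construction, in which the snapshots are placed side by side and the scaled right singular vectors give one embedding per node per time point; the function name `uase` and the choice of dimension `d` are illustrative and not taken from the paper.

```python
import numpy as np

def uase(adjacency_matrices, d):
    """Sketch of unfolded adjacency spectral embedding.

    adjacency_matrices: list of T arrays, each n x n, on the same node set.
    Returns an array of shape (T, n, d): one d-dimensional embedding
    per node per time point.
    """
    n = adjacency_matrices[0].shape[0]
    T = len(adjacency_matrices)
    # Unfold: place the T snapshots side by side into one n x (n*T) matrix.
    unfolded = np.hstack(adjacency_matrices)
    # SVD of the unfolded matrix; singular values come back in descending order.
    _, s, vh = np.linalg.svd(unfolded, full_matrices=False)
    # Scale the top-d right singular vectors by the square-root singular values.
    right = vh[:d].T * np.sqrt(s[:d])
    # Row block t of `right` holds the embedding of every node at time t.
    return right.reshape(T, n, d)
```

Because all snapshots share a single decomposition, two time points with identical adjacency matrices receive identical embeddings, which is what makes comparison of nodes across time meaningful.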


Schematic diagram of the ancestral inference pipeline and the performance in the UKB BI individuals
a, Main steps in the ancestral inference pipeline. The pipeline accepts individual genotype data (microarray or sequencing data) as input. The genotype data are phased and imputed against a phasing and imputation reference panel in first ‘Phasing’ and ‘Imputation’ stages, and painted against the painting reference panel including preclustered groupings (5 groups in this example; 127 in our actual analysis). In a final mixture fitting stage, non-negative coefficients summing to one and representing the proportions of ancestries from the labeled groups in the reference panel are inferred. b, Geographical average proportion of DNA in BI individuals, positioned according to their birthplaces (Methods), inferred to come from three regional groupings: North Yorkshire, South Yorkshire and South East England (which correspond to the excess ancestry locations colored red). Pop., population; A, proportion of ancestries from each population.
Ancestry inference for UKB individuals born in the United Kingdom or Ireland and worldwide
a, Ancestry inference stratified by birthplace region for UKB WBI individuals; for each regionally labeled bar plot, each column shows ancestry decomposition for a single individual, with colors representing regions shown on the map and numbers representing counts of individuals from each area. b, As a, but showing decomposition for Asian, Oceanian and selected East African countries. Colors are as shown on the map, with colors for ancestry from additional regions given in the legend. White lines on the map delineate the borders of different countries. Self-reported ethnicity labels are shown below each bar plot. Color legends differ in a and b. Self-reported ethnicity: Afr, African; Asi, Asian; Bri, British; Chi, Chinese; Ind, Indian; Ire, Irish; Mix, mixed; Other, other ethnic group; Pa, Pakistani (Asia); Whi, white (Europe).
Comparison between AC-corrected and PC-corrected GWAS
a,b, Predictions are based on ‘linear’ combinations of ACs or PCs. a, Prediction of first 16 UKB PCs (x axes) using a linear model-based prediction from the 127 ACs (y axes) shows strong correlations (R² values). b, As a, but now predicting 16 UK and worldwide ACs from 140 PCs, often showing poor prediction. c–e, Comparison of AC-corrected (x axis) and PC-corrected (y axis) −log10(P values) for SNPs in three exemplar GWAS for labeled traits: birthplace (c), employment score (d) and waist circumference (e). All plots are colored according to the legend shown at the bottom, indicating earlier evidence from GWAS for each SNP in particular phenotypic categories (gray: SNPs show no prior GWAS evidence, perhaps consistent with likely false-positive associations). The horizontal and vertical dark blue lines indicate the genome-wide P-value threshold (P = 5 × 10⁻⁸) in a −log10 scale, while the light blue line represents y = x. In each plot, the points show only independent SNPs with P < 5 × 10⁻⁸ for one or both approaches.
Separation of local and nonlocal factors influencing portability
a, Test principles: in UKB samples with European (blue) and African (red) ancestries, a causal variant contributing to a trait is captured by a tag SNP whose predictive power (pink arrow thickness) varies by ‘local’ ancestry (upper versus lower chromosomes), or nonlocal factors captured by genome-wide ‘global’ ancestry (left versus right individuals); ANCHOR separates these contributions to PGS portability. b–d, $\beta_{j}^{i}$ values refer to the mean increase in phenotype per PGS unit increase for local ancestry j and global ancestry i (see Methods for further details). b, ANCHOR performance for 24 simulated traits and 53 UKB quantitative traits with PGSs constructed using different P-value thresholds (P = 0.05 and P = 0.0001; right). True effect size correlations ρ (x axis) between African and European ancestries are compared with the ANCHOR estimator $\beta_{\mathrm{Eu}}^{\mathrm{All}}/\beta_{\mathrm{Obs}.\mathrm{Eu}}$ (y axis). Colors denote African ancestry bins, as defined in c. c, Application of ANCHOR for 53 UKB traits across varying African ancestry binned as shown (x axis; colored regions). For each bin, mean estimates across traits of ratios $\beta_{\mathrm{Eu}}/\beta_{\mathrm{Obs}.\mathrm{Eu}}$ (blue) and $\beta_{\mathrm{Af}}/\beta_{\mathrm{Obs}.\mathrm{Eu}}$ (red) are shown. Also shown are ratio estimates for individuals of ~100% European (leftmost point at y = 1) or ~100% African (red horizontal bar) ancestry. CIs crossing y = 1 are consistent with identical effects to ~100% European-ancestry individuals, and similarly for red points or bar. d, Mean increase in standing height per PGS unit increase across populations (seven left-hand columns); alongside corresponding ANCHOR estimates for height (final six columns) labeled by global or local ancestry combinations. Data are presented as (weighted) means (b,c) or as estimated values (d) with 95% central bootstrapped CIs. Error bars indicate 95% bootstrapped CIs. Af, African; Eu, European.
ANCHOR results for 53 UKB traits
Data are presented as estimated values of ratio of true effect sizes with 95% central bootstrapped CIs. $\beta_{j}^{i}$ values refer to mean increase in phenotype per PGS unit increase for local ancestry j and global ancestry i (see Methods for further details). Colors of $\beta_{j}^{i}$: blue, European; purple, projected to 100% African ancestry; red, African ancestry. Black rows represent individual UKB traits; the first standing height row uses an existing PGS¹⁶; the dark green rows show combined estimates. Columns (left to right) estimate ρ for ‘all’ 8,003 African-ancestry individuals, ρ for individuals of 100% projected African ancestry and (as expected, reduced) predictive power for African-ancestry segments. From top to bottom, the rows above and below the first horizontal dashed line represent non-molecular and molecular traits and rows above and below the second dashed line represent individual traits and their weighted average estimation. Vertical dotted lines: grey lines indicate ρ = 0.5 (left of the red dotted lines) and ρ = 1.5 (right of the red dotted lines); red lines indicate ρ = 1. ALP, alkaline phosphatase; FVC, forced vital capacity; HDL, high-density lipoprotein; IGF1, insulin-like growth factor-1; LDL, low-density lipoprotein.
Fine-scale population structure and widespread conservation of genetic effect sizes between human groups across traits

February 2025

·

33 Reads

·

3 Citations

Nature Genetics

Sile Hu

·

Lino A. F. Ferreira

·

[...]

·

Simon R. Myers

Understanding genetic differences between populations is essential for avoiding confounding in genome-wide association studies and improving polygenic score (PGS) portability. We developed a statistical pipeline to infer fine-scale Ancestry Components and applied it to UK Biobank data. Ancestry Components identify population structure not captured by widely used principal components, improving stratification correction for geographically correlated traits. To estimate the similarity of genetic effect sizes between groups, we developed ANCHOR, which estimates changes in the predictive power of an existing PGS in distinct local ancestry segments. ANCHOR infers highly similar (estimated correlation 0.98 ± 0.07) effect sizes between UK Biobank participants of African and European ancestry for 47 of 53 quantitative phenotypes, suggesting that gene–environment and gene–gene interactions do not play major roles in poor cross-ancestry PGS transferability for these traits in the United Kingdom, and providing optimism that shared causal mutations operate similarly in different populations.



The WBK pedigree
a, The best-fitting pedigree (for uncertainties, see Supplementary Note 4). Sampled individuals are outlined in black with WBK ID number and are coloured by mtDNA haplotype. The founding U5b1 + 16189 + @16192 female is shown at the top, with her four descendants with de novo mutations underneath. Further descendants are connected with dashed lines. Matings between descendants of the founding female are shown in bold, labelled i–v. Deduced relationships not fitted on the pedigree are shown with light-grey lines, with the estimated degree of relatedness. b, Weighted relatedness of each genome plotted versus the point carbon-14 date estimate (average 95% confidence range: 202 years). For each, the sum of their total number of biological kinship links (seventh degree or less) is shown, inversely weighted by the degree of the relationship. Individuals are coloured by mtDNA haplotype; grey indicates singleton haplogroups. The Durotrigian period (solid line) and the range of dates of family members (dashed line) are indicated. The summed relatedness is also shown in box plots (Tukey) by sex for individuals in the latter range; a significant difference between males (M) and females (F) is observed (Welch’s t-test, two-tailed, P = 0.029). The frequency of the dominant mtDNA lineage for each group is the proportion of each boxplot body in colour, which was also significantly different (two-tailed Fisher’s exact test, P = 0.02). c, A flexed inhumation excavated at WBK, typical of the Durotrigian cultural zone (photo credit: Bournemouth University). d, mtDNA and Y chromosome haplogroup frequencies for individuals with at least one genetic relative and sufficient Y chromosome coverage (Supplementary Table 9).
Reduced mitochondrial diversity in British Iron Age communities
Trends in mtDNA haplotype diversity (h) for archaeological sites with two or more individuals after pruning of first-degree pairs. Haplotype diversity is calculated as the probability that two randomly selected haplotypes are different (Methods). In the bottom panels, the h value is plotted against the normalized number of relative pairs seen for each site (1, all pairs are genetic relatives; 0, no pairs are genetic relatives; Supplementary Note 5.3). The shaded area represents the 95% confidence interval around the fitted line. There is a strong negative correlation between mtDNA diversity and the number of relatives present for Iron Age sites (Pearson correlation coefficient, P = 0.001, r = −0.449), which is not observed in previous periods of prehistory. When each period is further split into continental and insular (UK and Ireland) individuals (diamonds and circles), we find that the only significant correlation observed is for the British Iron Age (Pearson correlation coefficient, P = 5.853 × 10⁻⁷, r = −0.717). The top panels show the geographical distribution of these h values for sites with evidence of burial guided by kinship (at least one pair of genetic relatives present). Of the total 156 sites considered, 13 sites are less diverse than WBK: 12 from Britain and 1 from a Celtic La Tène period cemetery (320–180 bc) in Hungary¹⁷. The sample sizes for the h value and normalized relative pair estimation for all sites are presented in Supplementary Table 13.
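
The haplotype diversity statistic defined in this caption can be sketched in a few lines. The sketch assumes that the two haplotypes are drawn without replacement from the individuals at a site; the caption does not state the sampling scheme, so that choice, along with the function name `haplotype_diversity`, is an assumption made for illustration.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Probability that two haplotypes drawn without replacement differ.

    haplotypes: list of hashable haplotype labels, one per individual.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    # Ordered pairs of distinct individuals that carry the same haplotype.
    same_pairs = sum(c * (c - 1) for c in counts.values())
    # Complement of the probability that a random ordered pair matches.
    return 1.0 - same_pairs / (n * (n - 1))
```

Under this definition, a site where every individual carries the same haplotype scores 0 and a site where every haplotype is unique scores 1, so a dominant matriline pulls h towards 0.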
IBD communities in Iron Age Britain show fine-grained geographical structure and include connections across the English Channel
a, The clusters are based on the consensus of 100 runs of the Leiden algorithm on a weighted graph of IBD shared between archaeological sites and show geographical integrity. Twelve major clusters (defining nodes marked with symbols) are labelled on the basis of geographical affiliations, with further substructure within clusters emphasized using different colour shades. The cross-channel clusters are highlighted with dashed lines joining nearest geographical neighbours across the channel. b, An interpolated map showing the distribution of British Bronze Age ancestry across Iron Age Britain, based on average values generated using ChromoPainter NNLS³⁸ and SOURCEFIND³⁷ approaches. The lowest values are seen along the south-central coast. Sites with less than 75% contribution are marked in black. c, A close-up showing most of the sites from the Dorset cluster (red circles) placed within the regional distribution of Durotriges coin finds. WBK is denoted by ‘W’. The distributions are plotted according to refs. 34,35. d, The EEF ancestry proportion through time for the channel core region of continental influence (blue; outlined with dashed line in b) shows a Late Iron Age increase not observed in the sample from the rest of England and Wales (black). The channel core zone is east of longitude −2.8° (western edge of the Durotrigian zone) and south of latitude 51.5° (River Thames). The period between 1000 and 875 bc (grey rectangle) has been previously associated with an increase in EEF ancestry in southern Britain¹⁷. This window is populated mostly by high-EEF samples from the channel core, whereas data points directly preceding this window are mostly from the peripheral regions that retained a lower level of EEF ancestry throughout the Middle Bronze Age (Extended Data Fig. 7 and Supplementary Note 6.2).
Continental influx and pervasive matrilocality in Iron Age Britain

January 2025

·

379 Reads

·

2 Citations

Nature

Roman writers found the relative empowerment of Celtic women remarkable¹. In southern Britain, the Late Iron Age Durotriges tribe often buried women with substantial grave goods². Here we analyse 57 ancient genomes from Durotrigian burial sites and find an extended kin group centred around a single maternal lineage, with unrelated (presumably inward migrating) burials being predominantly male. Such a matrilocal pattern is undescribed in European prehistory, but when we compare mitochondrial haplotype variation among European archaeological sites spanning six millennia, British Iron Age cemeteries stand out as having marked reductions in diversity driven by the presence of dominant matrilines. Patterns of haplotype sharing reveal that British Iron Age populations form fine-grained geographical clusters with southern links extending across the channel to the continent. Indeed, whereas most of Britain shows majority genomic continuity from the Early Bronze Age to the Iron Age, this is markedly reduced in a southern coastal core region with persistent cross-channel cultural exchange³. This southern core has evidence of population influx in the Middle Bronze Age but also during the Iron Age. This is asynchronous with the rest of the island and points towards a staged, geographically granular absorption of continental influence, possibly including the acquisition of Celtic languages.


Valid Bootstraps for Networks with Applications to Network Visualisation

October 2024

·

18 Reads

Quantifying uncertainty in networks is an important step in modelling relationships and interactions between entities. We consider the challenge of bootstrapping an inhomogeneous random graph when only a single observation of the network is made and the underlying data generating function is unknown. We utilise an exchangeable network test that can empirically validate bootstrap samples generated by any method, by testing if the observed and bootstrapped networks are statistically distinguishable. We find that existing methods fail this test. To address this, we propose a principled, novel, distribution-free network bootstrap using k-nearest neighbour smoothing, that can regularly pass this exchangeable network test in both synthetic and real-data scenarios. We demonstrate the utility of this work in combination with the popular data visualisation method t-SNE, where uncertainty estimates from bootstrapping are used to explain whether visible structures represent real statistically sound structures.


Figure 1: Earthquakes contained in the observational datasets found in EarthquakeNPP. Colours indicate the respective datasets, including the target region, magnitude of completeness $M_c$, number of events and the time period that the dataset spans. In red is a fault map from the GEM Global Active Faults Database (Styron & Pagani, 2020).
Figure 3: Test spatial log-likelihood scores for all the spatio-temporal point process models on each of the EarthquakeNPP datasets. Error bars of the mean and standard deviation are constructed for the NPPs using three repeat runs.
EarthquakeNPP: Benchmark Datasets for Earthquake Forecasting with Neural Point Processes

September 2024

·

72 Reads

Classical point process models, such as the epidemic-type aftershock sequence (ETAS) model, have been widely used for forecasting the event times and locations of earthquakes for decades. Recent advances have led to Neural Point Processes (NPPs), which promise greater flexibility and improvements over classical models. However, the currently-used benchmark dataset for NPPs does not represent an up-to-date challenge in the seismological community since it lacks a key earthquake sequence from the region and improperly splits training and testing data. Furthermore, initial earthquake forecast benchmarking lacks a comparison to state-of-the-art earthquake forecasting models typically used by the seismological community. To address these gaps, we introduce EarthquakeNPP: a collection of benchmark datasets to facilitate testing of NPPs on earthquake data, accompanied by a credible implementation of the ETAS model. The datasets cover a range of small to large target regions within California, dating from 1971 to 2021, and include different methodologies for dataset generation. In a benchmarking experiment, we compare three spatio-temporal NPPs against ETAS and find that none outperform ETAS in either spatial or temporal log-likelihood. These results indicate that current NPP implementations are not yet suitable for practical earthquake forecasting. However, EarthquakeNPP will serve as a platform for collaboration between the seismology and machine learning communities with the goal of improving earthquake predictability.


SB-ETAS: using simulation based inference for scalable, likelihood-free inference for the ETAS model of earthquake occurrences

August 2024

·

64 Reads

Statistics and Computing

The rapid growth of earthquake catalogs, driven by machine learning-based phase picking and denser seismic networks, calls for the application of a broader range of models to determine whether the new data enhances earthquake forecasting capabilities. Additionally, this growth demands that existing forecasting models efficiently scale to handle the increased data volume. Approximate inference methods such as inlabru, which is based on the Integrated nested Laplace approximation, offer improved computational efficiencies and the ability to perform inference on more complex point-process models compared to traditional MCMC approaches. We present SB-ETAS: a simulation based inference procedure for the epidemic-type aftershock sequence (ETAS) model. This approximate Bayesian method uses sequential neural posterior estimation (SNPE) to learn posterior distributions from simulations, rather than typical MCMC sampling using the likelihood. On synthetic earthquake catalogs, SB-ETAS provides better coverage of ETAS posterior distributions compared with inlabru. Furthermore, we demonstrate that using a simulation based procedure for inference improves the scalability from $\mathcal{O}(n^2)$ to $\mathcal{O}(n\log n)$. This makes it feasible to fit to very large earthquake catalogs, such as one for Southern California dating back to 1981. SB-ETAS can find Bayesian estimates of ETAS parameters for this catalog in less than 10 h on a standard laptop, a task that would have taken over 2 weeks using MCMC. Beyond the standard ETAS model, this simulation based framework allows earthquake modellers to define and infer parameters for much more complex models by removing the need to define a likelihood function.


Accuracy (higher is better) for 2 GNNs (GCN or GAT) under 2 representations (block diagonal adjacency or unfolding) for 4 datasets. Bold values indicate the highest accuracy for a given GNN/representation pair.
Valid Conformal Prediction for Dynamic GNNs

May 2024

·

35 Reads

Graph neural networks (GNNs) are powerful black-box models which have shown impressive empirical performance. However, without any form of uncertainty quantification, it can be difficult to trust such models in high-risk scenarios. Conformal prediction aims to address this problem; however, its validity requires an assumption of exchangeability, which has limited its applicability to static graphs and transductive regimes. We propose to use unfolding, which allows any existing static GNN to output a dynamic graph embedding with exchangeability properties. Using this, we extend the validity of conformal prediction to dynamic GNNs in both transductive and semi-inductive regimes. We provide a theoretical guarantee of valid conformal prediction in these cases and demonstrate the empirical validity, as well as the performance gains, of unfolded GNNs against standard GNN architectures on both simulated and real datasets.
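
For readers unfamiliar with conformal prediction, the following is a minimal sketch of generic split conformal classification, not the unfolded-GNN procedure of the paper. It assumes that calibration and test points are exchangeable (the condition the abstract refers to) and uses the non-conformity score 1 minus the predicted probability; the function names `conformal_threshold` and `prediction_sets` are hypothetical.

```python
import math
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Score threshold giving at least 1 - alpha coverage under exchangeability.

    cal_scores: non-conformity scores of the true labels on a calibration set.
    """
    n = len(cal_scores)
    # Rank of the finite-sample-corrected quantile among the sorted scores.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points: every label must be included.
        return math.inf
    return float(np.sort(np.asarray(cal_scores))[k - 1])

def prediction_sets(test_probs, threshold):
    """Labels whose score 1 - p is within the threshold, for each test point."""
    return [np.flatnonzero(1.0 - np.asarray(p) <= threshold).tolist()
            for p in test_probs]
```

Any classifier that outputs class probabilities can be plugged in; the guarantee comes from the exchangeability assumption rather than from the model itself.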


Citations (41)


... The second approach is supervised learning, in which target individuals are compared to carefully curated reference populations, and recently admixed individuals (which are the majority of individuals) are not directly used. The goal of supervised learning divides into ancestry estimation which can be used analogously to unsupervised genome-wide ancestry profiles³³, or local ancestry estimation in which the ancestry of particular sections of DNA is inferred. ...

Reference:

Sparse haplotype-based fine-scale local ancestry inference at scale reveals recent selection on immune responses
Fine-scale population structure and widespread conservation of genetic effect sizes between human groups across traits

Nature Genetics

... First, there are almost no preagricultural sites in the Europe sample. Europe's neolithization was mostly by means of migration from SW Asia (28,29), so its beginnings are truncated relative to Asia and the Americas, where domestication was autochthonous in some regions. Second, and connected to the first point, Apex sites are rare until ~∆2500 and do not begin to increase in inequality until then. ...

Population genomics of post-glacial western Eurasia

Nature

... Ancient DNA (aDNA) studies (Reich 2018; and other references below), supported by earlier isotopic investigations (for example Price et al. 2001; Schulting & Richards 2002), appear at last to have sorted the major outlines of processes of Neolithisation. Virtually everywhere across Europe, including in regions where there were very respectable archaeological arguments in favour of a significant involvement for indigenous people if not indeed a leading role (eg Allentoft et al. 2024), it now seems that incomers ultimately of Near Eastern genetic ancestry were principally responsible for the introduction of the new way of life. ...

100 ancient genomes show repeated population turnovers in Neolithic Denmark

Nature

... ¹³⁴ PRDM16 is crucial for converting white adipose tissue to brown adipose tissue as part of the thermogenic program and may also protect liver functions under extremely cold conditions.¹³⁵ Adaptation to dietary changes. China, with its vast area and two original agricultural centers, experienced a significant Neolithic transition from hunter-gathering to farming and animal domestication. ...

The selection landscape and genetic legacy of ancient Eurasians

Nature

... 3 According to the global burden of disease study, in sub-Saharan Africa, the latest estimate of MS cases is approximately 49,000, with approximately 2800 new cases yearly. 2 The rising number of cases highlights a critical need for effective diagnostic and therapeutic strategies tailored to sub-Saharan Africa. 4,5 However, managing MS in this region presents distinctive challenges, including limited health care resources, inadequate infrastructure, and insufficient disease awareness among health care providers (HCPs) and the general population. 6,7 This editorial explores current practices in MS management across sub-Saharan Africa and highlights the significant challenges that HCPs and patients face. ...

Elevated genetic risk for multiple sclerosis emerged in steppe pastoralist populations

Nature

... Genetic analysis of one EP2 cat (Jamieson 2021; Jamieson et al. 2023) attributes this specimen to the IV-A clade, which is the most widespread mitochondrial haplogroup among archaeological and modern cats in Europe. Cat remains from the EP2 were recovered from near the ship's mast step and pump well and were intermixed with ballast, which may raise the possibility that the cat remains date to an earlier voyage in the ship's history (e.g., from Spain to Mexico); however, even this possibility supports the conclusion that this cat's ancestors lived in Europe. ...

Limited historical admixture between European wildcats and domestic cats
  • Citing Article
  • November 2023

Current Biology

... Recent genome studies show that molecular adaptations underpinning virus immunity can also be acquired via adaptive introgression among related taxa [87][88][89][90]. To test for genetic introgression among the ten Rhinolophus species, we ran ABBA-BABA tests 91 Table 39). ...

Genetic swamping of the critically endangered Scottish wildcat was recent and accelerated by disease
  • Citing Article
  • November 2023

Current Biology

... In parallel, using convolutional neural networks (CNNs) to analyze earthquake distribution maps highlights the critical role of spatial relationships in earthquake forecasting [23][24][25]. While these approaches provide enhanced computational efficiency and greater resilience to nonstationarity in earthquake catalogs compared to classical models like the Epidemic-Type Aftershock Sequence (ETAS) 14,[26][27][28], they still struggle with capturing complex spatio-temporal dynamics and integrating heterogeneous data. Existing ML models for earthquake forecasting are often trained and tested within the same geographic region, limiting the ability to assess their generalizability across diverse seismic environments [23][24][25]. ...

Forecasting the 2016–2017 Central Apennines Earthquake Sequence With a Neural Point Process

... The huge sample sizes that can be achieved with case-control or biobank designs, which afford statistical power for the detection of new signals, are the main reason that they have become more prevalent than family-based designs, but at the cost of having to deal with population stratification. This is usually achieved by adjusting for principal components (PCs) calculated from a genotype correlation matrix or using statistics relating to estimates of population fine-structure across the sample (Hu et al. 2023). Such adjustments are "global" in the sense that the PCs added to the association model use genome-wide calculations. ...

Leveraging fine-scale population structure reveals conservation in genetic effect sizes between human populations across a range of human phenotypes

... During that period, Early European Farmers (EEF) and Steppe Bronze Age (SBA) genetic ancestries gradually spread into and across Europe blending with the local Western Hunter-Gatherers (WHG) substratum [22][23][24], bringing together genetic components that had evolved separately for up to 20,000 years [25]. Divergent phenotypes in these source populations have been previously described for a few traits using polygenic scoring of ancient samples [26][27][28], and very recently of ancestral segments from modern samples [29], or looking at specific trait-informative Single Nucleotide Polymorphisms (SNPs) [30][31][32]. ...

The Selection Landscape and Genetic Legacy of Ancient Eurasians