Molly Przeworski’s research while affiliated with Columbia University and other places


Publications (391)


Causal interpretations of family GWAS in the presence of heterogeneous effects
  • Article

September 2024

·

12 Reads

·

2 Citations

Proceedings of the National Academy of Sciences

Carl Veller

·

Molly Przeworski

·

Family-based genome-wide association studies (GWASs) are often claimed to provide an unbiased estimate of the average causal effects (or average treatment effects; ATEs) of alleles, on the basis of an analogy between the random transmission of alleles from parents to children and a randomized controlled trial. We show that this claim does not hold in general. Because Mendelian segregation only randomizes alleles among children of heterozygotes, the effects of alleles in the children of homozygotes are not observable. This feature will matter if an allele has different average effects in the children of homozygotes and heterozygotes, as can arise in the presence of gene-by-environment interactions, gene-by-gene interactions, or differences in linkage disequilibrium patterns. At a single locus, family-based GWAS can be thought of as providing an unbiased estimate of the average effect in the children of heterozygotes (i.e., a local average treatment effect; LATE). This interpretation does not extend to polygenic scores (PGSs), however, because different sets of SNPs are heterozygous in each family. Therefore, other than under specific conditions, the within-family regression slope of a PGS cannot be assumed to provide an unbiased estimate of the LATE for any subset or weighted average of families. In practice, the potential biases of a family-based GWAS are likely smaller than those that can arise from confounding in a standard, population-based GWAS, and so family studies remain important for the dissection of genetic contributions to phenotypic variation. Nonetheless, their causal interpretation is less straightforward than has been widely appreciated.
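The distinction between the ATE and the LATE drawn in this abstract can be illustrated with a small simulation. What follows is a minimal sketch and not the authors' analysis: it assumes a single biallelic locus, random mating, and a hypothetical scenario in which the allele's effect is `beta_het` in families with at least one heterozygous parent and `beta_hom` otherwise (a crude stand-in for a gene-by-environment interaction). The function name, parameters, and default values are all illustrative assumptions.

```python
import random


def simulate_family_gwas(n_families=100_000, p=0.2,
                         beta_het=1.0, beta_hom=0.0, seed=1):
    """Toy single-locus family GWAS with family-dependent allelic effects.

    Returns (within_family_slope, sample_average_effect).
    All parameters are hypothetical; this is not the paper's model.
    """
    rng = random.Random(seed)
    deviations, phenotypes, effects = [], [], []
    for _ in range(n_families):
        # Parental genotypes: number of copies of the focal allele (0, 1, 2).
        g_mother = (rng.random() < p) + (rng.random() < p)
        g_father = (rng.random() < p) + (rng.random() < p)
        # Mendelian segregation: each parent transmits the focal allele
        # with probability genotype / 2 (random only for heterozygotes).
        child = (rng.random() < g_mother / 2) + (rng.random() < g_father / 2)
        # Assumed heterogeneity: the effect differs between families with
        # and without a heterozygous parent.
        beta = beta_het if (g_mother == 1 or g_father == 1) else beta_hom
        phenotypes.append(beta * child + rng.gauss(0.0, 1.0))
        # Child genotype minus its expectation given the parents.
        deviations.append(child - (g_mother + g_father) / 2)
        effects.append(beta)

    n = n_families
    mean_d = sum(deviations) / n
    mean_y = sum(phenotypes) / n
    cov = sum((d - mean_d) * (y - mean_y)
              for d, y in zip(deviations, phenotypes)) / n
    var = sum((d - mean_d) ** 2 for d in deviations) / n
    return cov / var, sum(effects) / n


if __name__ == "__main__":
    slope, ate = simulate_family_gwas()
    print(f"within-family slope: {slope:.3f}")
    print(f"average effect in the sample: {ate:.3f}")
```

Under these assumptions, the within-family slope should land close to `beta_het`, because only families with a heterozygous parent contribute segregation variance. The sample-wide average effect is instead a mixture of `beta_het` and `beta_hom`; with the default allele frequency, roughly 54% of families have at least one heterozygous parent, so the average effect is near 0.54 rather than 1.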


Fig 1. Pedigree structure and mutation rate estimates in zebra finch.
Fig 2. Cumulative distribution of recombination events along both types of chromosomes.
Fig 4. GC-biased gene conversion and conversion tract length distribution estimates.
Fig 5. Density of crossovers and non-crossovers as a function of chromosome size.
Estimates of number of non-crossovers per chromatid
Conservation of mutation and recombination parameters between mammals and zebra finch
  • Preprint
  • File available

September 2024

·

30 Reads

·

Daria Bykova

·

Carla R Hoge

·

[...]

·

Molly Przeworski

Most of our understanding of the fundamental processes of mutation and recombination stems from a handful of disparate model organisms and pedigree studies of mammals, with little known about other vertebrates. To gain a broader comparative perspective, we focused on the zebra finch (Taeniopygia castanotis), which, like other birds, differs from mammals in its karyotype (which includes many micro-chromosomes), in the mechanism by which recombination is directed to the genome, and in aspects of ontogenesis. We collected genome sequences from three-generation pedigrees that provide information about 80 meioses, inferring 202 single-point de novo mutations, 1,174 crossovers, and 275 non-crossovers. On that basis, we estimated a sex-averaged mutation rate of 5.0 × 10⁻⁹ per base pair per generation, on par with mammals that have a similar generation time. Also as in mammals, we found a paternal germline mutation bias at later stages of gametogenesis (of 1.7 to 1) but no discernible difference between sexes in early development. We also examined recombination patterns, and found that the sex-averaged crossover rate on macro-chromosomes (1.05 cM/Mb) is again similar to values observed in mammals, as is the spatial distribution of crossovers, with a pronounced enrichment near telomeres. In contrast, non-crossover rates are more uniformly distributed. On micro-chromosomes, sex-averaged crossover rates are substantially higher (4.21 cM/Mb), as expected from crossover homeostasis, and both crossover and non-crossover events are more uniformly distributed. At a finer scale, recombination events overlap CpG islands more often than expected by chance, as expected in the absence of PRDM9.
Despite differences in the mechanism by which recombination events are specified and the presence of many micro-chromosomes, estimates of the degree of GC-biased gene conversion (59%), the mean non-crossover conversion tract length (~23 bp), and the non-crossover to crossover ratio (6.7:1) are all comparable to those reported in primates and mice. The conservation of mutation and recombination properties from zebra finch to mammals suggests that these processes have evolved under stabilizing selection.


The clock-like accumulation of germline and somatic mutations can arise from the interplay of DNA damage and repair

June 2024

·

17 Reads

·

2 Citations

The rates at which mutations accumulate across human cell types vary. To identify causes of this variation, mutations are often decomposed into a combination of the single-base substitution (SBS) “signatures” observed in germline, soma, and tumors, with the idea that each signature corresponds to one or a small number of underlying mutagenic processes. Two such signatures turn out to be ubiquitous across cell types: SBS signature 1, which consists primarily of transitions at methylated CpG sites thought to be caused by spontaneous deamination, and the more diffuse SBS signature 5, which is of unknown etiology. In cancers, the number of mutations attributed to these 2 signatures accumulates linearly with age of diagnosis, and thus the signatures have been termed “clock-like.” To better understand this clock-like behavior, we develop a mathematical model that includes DNA replication errors, unrepaired damage, and damage repaired incorrectly. We show that mutational signatures can exhibit clock-like behavior because cell divisions occur at a constant rate and/or because damage rates remain constant over time, and that these distinct sources can be teased apart by comparing cell lineages that divide at different rates. With this goal in mind, we analyze the rate of accumulation of mutations in multiple cell types, including soma as well as male and female germline. We find no detectable increase in SBS signature 1 mutations in neurons and only a very weak increase in mutations assigned to the female germline, but a significant increase with time in rapidly dividing cells, suggesting that SBS signature 1 is driven by rounds of DNA replication occurring at a relatively fixed rate. In contrast, SBS signature 5 increases with time in all cell types, including postmitotic ones, indicating that it accumulates independently of cell divisions; this observation points to errors in DNA repair as the key underlying mechanism. 
Thus, the two “clock-like” signatures observed across cell types likely have distinct origins, one set by rates of cell division, the other by damage rates.
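The idea that two clock-like sources can be teased apart by comparing lineages that divide at different rates can be sketched with a deliberately simplified calculation. The code below is an illustrative toy and not the mathematical model developed in the paper: it assumes that mutations accrue as the sum of a replication-driven term (a hypothetical `mu_rep` mutations per cell division) and a division-independent, damage-driven term (a hypothetical `mu_dam` mutations per year), and it lumps unrepaired and incorrectly repaired damage into that single per-year term. All names and numbers are assumptions.

```python
from typing import Tuple


def expected_mutations(age_years: float, divisions_per_year: float,
                       mu_rep: float, mu_dam: float) -> float:
    """Expected mutation count under the toy model.

    Both terms grow linearly with age, so either one alone would look
    clock-like within a single cell lineage.
    """
    return (mu_rep * divisions_per_year + mu_dam) * age_years


def separate_sources(slope_a: float, divisions_a: float,
                     slope_b: float, divisions_b: float) -> Tuple[float, float]:
    """Solve for (mu_rep, mu_dam) from two lineages' yearly accumulation rates.

    Each lineage i satisfies slope_i = mu_rep * divisions_i + mu_dam, so two
    lineages with different division rates give two equations in two unknowns.
    """
    if divisions_a == divisions_b:
        raise ValueError("lineages must differ in division rate")
    mu_rep = (slope_a - slope_b) / (divisions_a - divisions_b)
    mu_dam = slope_a - mu_rep * divisions_a
    return mu_rep, mu_dam


if __name__ == "__main__":
    # Hypothetical yearly slopes for a rapidly dividing lineage and a
    # post-mitotic lineage (zero divisions per year).
    fast = expected_mutations(1.0, 20.0, mu_rep=0.5, mu_dam=10.0)
    slow = expected_mutations(1.0, 0.0, mu_rep=0.5, mu_dam=10.0)
    print(separate_sources(fast, 20.0, slow, 0.0))
```

In this toy, a signature with `mu_dam` near zero would show no increase with age in post-mitotic cells, which is the qualitative pattern the abstract reports for SBS signature 1 in neurons, whereas a signature dominated by `mu_dam` would increase with age in all cell types, as reported for SBS signature 5.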



Patterns of recombination in snakes reveal a tug-of-war between PRDM9 and promoter-like features

February 2024

·

59 Reads

·

17 Citations

Science

In some mammals, notably humans, recombination occurs almost exclusively where the protein PRDM9 binds, whereas in vertebrates lacking an intact PRDM9, such as birds and canids, recombination rates are elevated near promoter-like features. To determine whether PRDM9 directs recombination in nonmammalian vertebrates, we focused on an exemplar species with a single, intact PRDM9 ortholog, the corn snake (Pantherophis guttatus). Analyzing historical recombination rates along the genome and crossovers in pedigrees, we found evidence that PRDM9 specifies the location of recombination events, but we also detected a separable effect of promoter-like features. These findings reveal that the uses of PRDM9 and promoter-like features need not be mutually exclusive and instead reflect a tug-of-war that is more even in some species than others.


Causal interpretations of family GWAS in the presence of heterogeneous effects

November 2023

·

20 Reads

·

1 Citation

Family-based genome-wide association studies (GWAS) have emerged as a gold standard for assessing causal effects of alleles and polygenic scores. Notably, family studies are often claimed to provide an unbiased estimate of the average causal effect (or average treatment effect; ATE) of an allele, on the basis of an analogy between the random transmission of alleles from parents to children and a randomized controlled trial. Here, we show that this interpretation does not hold in general. Because Mendelian segregation only randomizes alleles among children of heterozygotes, the effects of alleles in the children of homozygotes are not observable. Consequently, if an allele has different average effects in the children of homozygotes and heterozygotes, as can arise in the presence of gene-by-environment interactions, gene-by-gene interactions, or differences in LD patterns, family studies provide a biased estimate of the average effect in the sample. At a single locus, family-based association studies can be thought of as providing an unbiased estimate of the average effect in the children of heterozygotes (i.e., a local average treatment effect; LATE). This interpretation does not extend to polygenic scores, however, because different sets of SNPs are heterozygous in each family. Therefore, other than under specific conditions, the within-family regression slope of a PGS cannot be assumed to provide an unbiased estimate for any subset or weighted average of families. Instead, family-based studies can be reinterpreted as enabling an unbiased estimate of the extent to which Mendelian segregation at loci in the PGS contributes to the population-level variance in the trait. Because this estimate does not include the between-family variance, however, this interpretation applies to only (roughly) half of the sample PGS variance. 
In practice, the potential biases of a family-based GWAS are likely smaller than those arising from confounding in a standard, population-based GWAS, and so family studies remain important for the dissection of genetic contributions to phenotypic variation. Nonetheless, the causal interpretation of family-based GWAS estimates is less straightforward than has been widely appreciated.


Down the Penrose stairs, or how selection for fewer recombination hotspots maintains their existence

October 2023

·

27 Reads

·

12 Citations

eLife

In many species, meiotic recombination events tend to occur in narrow intervals of the genome, known as hotspots. In humans and mice, double strand break (DSB) hotspot locations are determined by the DNA-binding specificity of the zinc finger array of the PRDM9 protein, which is rapidly evolving at residues in contact with DNA. Previous models explained this rapid evolution in terms of the need to restore PRDM9 binding sites lost to gene conversion over time, under the assumption that more PRDM9 binding always leads to more DSBs. This assumption, however, does not align with current evidence. Recent experimental work indicates that PRDM9 binding on both homologs facilitates DSB repair, and that the absence of sufficient symmetric binding disrupts meiosis. We therefore consider an alternative hypothesis: that rapid PRDM9 evolution is driven by the need to restore symmetric binding because of its role in coupling DSB formation and efficient repair. To this end, we model the evolution of PRDM9 from first principles: from its binding dynamics to the population genetic processes that govern the evolution of the zinc finger array and its binding sites. We show that the loss of a small number of strong binding sites leads to the use of a greater number of weaker ones, resulting in a sharp reduction in symmetric binding and favoring new PRDM9 alleles that restore the use of a smaller set of strong binding sites. This decrease, in turn, drives rapid PRDM9 evolutionary turnover. Our results therefore suggest that the advantage of new PRDM9 alleles is in limiting the number of binding sites used effectively, rather than in increasing net PRDM9 binding. By extension, our model suggests that the evolutionary advantage of hotspots may have been to increase the efficiency of DSB repair and/or homolog pairing.


Disentangling sources of clock-like mutations in germline and soma

September 2023

·

61 Reads

·

3 Citations

The rates of mutations vary across cell types. To identify causes of this variation, mutations are often decomposed into a combination of the single base substitution (SBS) "signatures" observed in germline, soma and tumors, with the idea that each signature corresponds to one or a small number of underlying mutagenic processes. Two such signatures turn out to be ubiquitous across cell types: SBS signature 1, which consists primarily of transitions at methylated CpG sites caused by spontaneous deamination, and the more diffuse SBS signature 5, which is of unknown etiology. In cancers, the number of mutations attributed to these two signatures accumulates linearly with age of diagnosis, and thus the signatures have been termed "clock-like." To better understand this clock-like behavior, we develop a mathematical model that includes DNA replication errors, unrepaired damage, and damage repaired incorrectly. We show that mutational signatures can exhibit clock-like behavior because cell divisions occur at a constant rate and/or because damage rates remain constant over time, and that these distinct sources can be teased apart by comparing cell lineages that divide at different rates. With this goal in mind, we analyze the rate of accumulation of mutations in multiple cell types, including soma as well as male and female germline. We find no detectable increase in SBS signature 1 mutations in neurons and only a very weak increase in mutations assigned to the female germline, but a significant increase with time in rapidly-dividing cells, suggesting that SBS signature 1 is driven by rounds of DNA replication occurring at a relatively fixed rate. In contrast, SBS signature 5 increases with time in all cell types, including post-mitotic ones, indicating that it accumulates independently of cell divisions; this observation points to errors in DNA repair as the key underlying mechanism. 
Thus, the two "clock-like" signatures observed across cell types likely have distinct origins, one set by rates of cell division, the other by damage rates.


Figure 1: Genome sequences of corn snakes and PRDM9 zinc finger alleles in our samples. A) Sample collection locations for wild-caught individuals are shown for the 19 individuals depicted by a diamond. The number in each diamond indicates the mean fold-coverage of whole genome sequencing. B) The pedigree structures for samples from the colony, also including "unrelated" individuals, indicated with an asterisk. The number in each diamond indicates the mean fold-coverage of genome sequencing. C) PRDM9 zinc finger domain structure for 22 PRDM9 alleles, grouped and aligned by the similarity of their computationally-predicted binding affinity. Zinc fingers with distinct predictions for their binding affinities are shown in different colors; loosely, more similar colors represent zinc fingers with more similar computationally-predicted binding affinities (Figure S3). Each observation of a given allele is shown in the table; gold diamonds indicate wild samples and red diamonds colony samples. If the same allele was identified multiple times in closely related individuals, it is only shown once. The purple box highlights a succession of 11 zinc fingers (Shared 11-ZF) that are shared among five different alleles, including the only allele seen more than twice in the sample, PRDM9-A.
Figure 3: Footprints of recombination in divergence data.
Figure 5: Possible role for ZCWPW2 in the tug of war between promoter-like features and PRDM9.
Patterns of recombination in snakes reveal a tug of war between PRDM9 and promoter-like features

July 2023

·

126 Reads

·

3 Citations

In vertebrates, there are two known mechanisms by which meiotic recombination is directed to the genome: in humans, mice, and other mammals, recombination occurs almost exclusively where the protein PRDM9 binds, while in species lacking an intact PRDM9, such as birds and canids, recombination rates are elevated near promoter-like features. To test if PRDM9 also directs recombination in non-mammalian vertebrates, we focused on an exemplar species, the corn snake (Pantherophis guttatus). Unlike birds, this species possesses a single, intact PRDM9 ortholog. By inferring historical recombination rates along the genome from patterns of linkage disequilibrium and identifying crossovers in pedigrees, we found that PRDM9 specifies the location of recombination events outside of mammals. However, we also detected an independent effect of promoter-like features on recombination, which is more pronounced on macro- than microchromosomes. Thus, our findings reveal that the uses of PRDM9 and promoter-like features are not mutually exclusive, and instead reflect a tug of war, which varies in strength along the genome and is more lopsided in some species than others.
One-sentence summary: While the localization of meiotic recombination in vertebrates was previously thought to occur using one of two distinct mechanisms, our analysis of recombination in corn snakes reveals that they and likely other vertebrates use both of these mechanisms.


Limited role of generation time changes in driving the evolution of the mutation spectrum in humans

February 2023

·

54 Reads

·

11 Citations

eLife

Recent studies have suggested that the human germline mutation rate and spectrum evolve rapidly. Variation in generation time has been linked to these changes, though its contribution remains unclear. We develop a framework to characterize temporal changes in polymorphisms within and between populations, while controlling for the effects of natural selection and biased gene conversion. Application to the 1000 Genomes Project dataset reveals multiple independent changes that arose after the split of continental groups, including a previously reported, transient elevation in TCC>TTC mutations in Europeans and novel signals of divergence in C>G and T>A mutation rates among population samples. We also find a significant difference between groups sampled in and outside of Africa, in old T>C polymorphisms that predate the out-of-Africa migration. This surprising signal is driven by TpG>CpG mutations, and stems in part from mis-polarized CpG transitions, which are more likely to undergo recurrent mutations. Finally, by relating the mutation spectrum of polymorphisms to parental age effects on de novo mutations, we show that plausible changes in the generation time cannot explain the patterns observed for different mutation types jointly. Thus, other factors--genetic modifiers or environmental exposures--must have had a non-negligible impact on the human mutation landscape.


Citations (41)


... Examples of such assumptions have included purely additive genetic architectures, random mating, absence of indirect genetic effects, representative sampling, Mendelian randomization's exclusion restriction assumption, and independence of genotypes and effects, among others. Understanding that the progress made possible by such simplifications must be balanced by sensitivity analysis and evaluation of alternative models, recent work has interrogated the consequences of perturbing such assumptions, including non-random mating (1)(2)(3)(4)(5), fine-scale population structure (6,7), participation bias (8,9), indirect genetic effects (10)(11)(12), and non-additivity (13)(14)(15)(16)(17)(18). ...

Reference:

Simple models of non-random mating and environmental transmission bias standard human genetics statistical methods
Causal interpretations of family GWAS in the presence of heterogeneous effects
  • Citing Article
  • September 2024

Proceedings of the National Academy of Sciences

... For instance, exogenous processes like smoking or replicative processes like homologous recombination-based repair errors can leave characteristic signatures in the spectrum of tumor mutations. A recent study highlighted that signatures of exogenous and replicative mutations can be distinguished from sequencing data of tissues known to divide at different rates, even if the mutations are of unknown etiology [36]. By measuring the relative contribution of these signatures, one can identify exogenous or replicative mutation dominant tumors. ...

The clock-like accumulation of germline and somatic mutations can arise from the interplay of DNA damage and repair

... The distance between these loci on a single chromosome ranges from 467 Kbp to 23.9 Mbp and would likely span multiple areas of recombination (linkage disequilibrium decay). However, the rates and location of recombination hotspots evolve rapidly in snakes (Hoge et al. 2024; Schield et al. 2020). It is unknown if these clusters of genes represent a single focus of selection or an inversion, which of course identifies limitations of this target-capture dataset when compared to other approaches, for example genome assemblies from long-read sequencing (e.g., Mérot et al. 2023). ...

Patterns of recombination in snakes reveal a tug-of-war between PRDM9 and promoter-like features
  • Citing Article
  • February 2024

Science

... To focus on confounding, we assume no G×E and G×G interactions (the effects of G×E and G×G on estimates produced by population- and family-based designs are studied in ref. [44]). We derive expressions for estimators of direct effects in both population and within-family GWASs, as functions of the true direct and indirect effects at a locus and the genetic confounds induced by other loci. ...

Causal interpretations of family GWAS in the presence of heterogeneous effects

... Second, in some vertebrate species, principally in mammals, recombination is directed towards binding sites of the zinc-finger protein PRDM9 (Baudat et al. 2010; Myers et al. 2010; [...] 2017; Cavassim et al. 2022). In these species, rapid evolution of recombination landscapes is mediated by intra-genomic conflict (Úbeda and Wilkins 2010; Latrille et al. 2017; Baker et al. 2023) and genetic variants altering recombination landscapes in this way are therefore usually studied through the prism of internal genome dynamics and not within the framework of traditional recombination modifier theory (Genestier et al. 2023). ...

Down the Penrose stairs, or how selection for fewer recombination hotspots maintains their existence

eLife

... Tobacco smoke is known to affect the accumulation of de novo micro/minisatellites [6], but its effects on de novo point mutations, the best-characterised form of genetic variation, have not yet been studied. Genetic factors may also influence germline mutation rate [7]. Rare variants in DNA repair genes are well-known modifiers of somatic mutation rates and spectra [8][9][10], and have been shown to contribute to elevated rates of germline mutation [4,10]. ...

Disentangling sources of clock-like mutations in germline and soma

... Interestingly, snakes, which have a functional PRDM9, perform recombination at both TSSs and PRDM9-binding sites (Schield et al., 2020). It is believed that there is a tug-of-war competition between TSS-associated factors and PRDM9 for recruiting downstream factors in the recombination pathway, and that in snakes the strengths of these two complexes are somewhat balanced (Hoge et al., 2023). The rapid evolution of PRDM9 may allow different variants, each varying in their relative strength in this tug-of-war competition, to appear in succession within a given lineage. ...

Patterns of recombination in snakes reveal a tug of war between PRDM9 and promoter-like features

... the mutation or recombination landscape between lineages may lead to a reduction in the likelihood of repeated mutations within the same gene, while changes in gene order may alter the regulatory context of the gene and subsequently lower the probability of gene reuse. Empirical studies have shown that mutation landscapes, encompassing point and structural mutations, as well as the transposition of selfish elements, diversify throughout evolution [24,25]. Furthermore, this diversification was reported to increase with divergence time for point mutations in apes [26]. ...

Limited role of generation time changes in driving the evolution of the mutation spectrum in humans

eLife

... However, we note that a major limitation of our study is that we are unable to obtain fine-scale estimates of selection and dominance parameters for strongly deleterious mutations, which as defined here encompass a wide range of |s| from 0.01 to 1. This limitation is due to SFS-based methods being underpowered for estimating the strongly deleterious tail of the DFE [52], due to the fact that such mutations tend not to be segregating in genetic variation datasets [56][57][58]. Moreover, our diffusion-based approach may also be limited in inferring dominance parameters for strongly deleterious mutations given that the diffusion approximation breaks down under strong selection [59]. ...

Relating pathogenic loss-of-function mutations in humans to their evolutionary fitness costs

eLife

... Second, there is compelling evidence that, on average, risk alleles associated with these disorders are under weak to strong purifying selection (Rees et al., 2011;Mullins et al., 2017;Keller, 2018;Pardiñas et al., 2018;Huang & Siepel, 2019;Esteller-Cucala et al., 2020;Rapaport et al., 2021;Wendt et al., 2021). These lines of evidence converge in recent population genetic analyses which apply a simple mutation-selection balance model (either as a deterministic approximation or explicitly incorporating drift) to provide direct estimates of the strength of selection against heterozygous LOF in humans (Cassa et al., 2017;Weghorn et al., 2019;Agarwal et al., 2022). Strikingly, negative selection was strongest for LOF-intolerant TR genes (Cassa et al., 2017). ...

Relating pathogenic loss-of-function mutations in humans to their evolutionary fitness costs