Predicting educational achievement from DNA

S Selzam¹, E Krapohl¹, S von Stumm², PF O'Reilly¹, K Rimfeld¹, Y Kovas¹,²,³, PS Dale⁴, JJ Lee⁵ and R Plomin¹
A genome-wide polygenic score (GPS), derived from a 2013 genome-wide association study (N = 127,000), explained 2% of the variance in total years of education (EduYears). In a follow-up study (N = 329,000), a new EduYears GPS explains up to 4%. Here, we tested the association between this latest EduYears GPS and educational achievement scores at ages 7, 12 and 16 in an independent sample of 5825 UK individuals. We found that EduYears GPS explained greater amounts of variance in educational achievement over time, up to 9% at age 16, accounting for 15% of the heritable variance. This is the strongest GPS prediction to date for quantitative behavioral traits. Individuals in the highest and lowest GPS septiles differed by a whole school grade at age 16. Furthermore, EduYears GPS was associated with general cognitive ability (~3.5%) and family socioeconomic status (~7%). There was no evidence of an interaction between EduYears GPS and family socioeconomic status on educational achievement or on general cognitive ability. These results are a harbinger of future widespread use of GPS to predict genetic risk and resilience in the social and behavioral sciences.
Molecular Psychiatry advance online publication, 19 July 2016; doi:10.1038/mp.2016.107
INTRODUCTION
Identifying the genetic variants responsible for the ubiquitous heritability of behavioral dimensions and disorders is transforming genetic research in the social and behavioral sciences by making it possible to predict genetic strengths and weaknesses of individuals from DNA alone.¹ Over the past decade, genome-wide association (GWA) research across the life sciences has revealed that there are almost no genetic variants with large effects on complex traits and common disorders.² This consistent finding implies that the heritability of behavioral traits is due to many genetic variants of small effect. GWA studies of behavioral traits began to be successful as their sample sizes increased sufficiently to detect associations of very small effect size between single-nucleotide polymorphisms (SNPs) and outcome.³ Although the largest effect sizes of the associations between SNPs and behavioral traits are very small, it is possible to aggregate the effects of thousands of SNP associations, ranked by effect size, into a SNP genotypic score for a particular trait.⁴⁻⁶ Here, we refer to this SNP genotypic score as a genome-wide polygenic score (GPS).⁷ Although many different labels have been ascribed to polygenic scores that usually include the word 'risk', we prefer GPS. It highlights the genome-wide nature of these polygenic scores and encompasses positive as well as negative effects implied by the normal distribution of polygenic scores.⁴
The largest GWA analysis of a behaviorally relevant trait so far was performed on years of education, which is a proxy for educational achievement and to a lesser extent for learning ability.⁸ Information about the years spent in education is available in many GWA samples because it is a demographic descriptor. In 2013, a GWA analysis of EduYears based on 126,559 individuals was published.⁹ The corresponding GPS accounted for 2–3% of the variance in years of education in independent samples.⁹,¹⁰ The latest GWA on years of education published in 2016 included 329,000 individuals.⁸ A revised GPS based on this new GWA almost doubled the effect size, with EduYears GPS explaining 3.9% of the variance in years of education in an independent sample.⁸
EduYears GPS has also been associated with other phenotypes, most notably, measured educational achievement. In a Dutch study, the 2013 EduYears GPS accounted for around 2% of the variance in educational achievement in a sample of about 1000 children tested at age 12.¹¹ A UK-based longitudinal study of 4500 participants reported significant associations between the 2013 EduYears GPS and educational achievement at ages 7, 11 and 16;¹² however, the authors did not report the phenotypic variance explained by EduYears GPS. In a subsample of the present study of ~3000 individuals, we previously found that the 2013 EduYears GPS accounted for about 2% of the variance in educational achievement at age 16.¹³
The present study evaluates the extent to which a GPS constructed on the basis of the published summary statistics of the 2016 GWA analysis of years of education in adulthood predicts educational achievement assessed during the school years, which we have shown to be about 60% heritable as estimated by the twin design.¹⁴,¹⁵ Using effect size estimates from the 2016 EduYears GWA analysis, we calculated a GPS for each individual in a sample of 5825 unrelated UK students for whom we had educational achievement scores at ages 7, 12 and 16 based on UK-wide assessments of the national curriculum.
¹King's College London, MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, London, UK; ²Department of Psychology, Goldsmiths University of London, London, UK; ³Laboratory for Cognitive Investigations and Behavioural Genetics, Tomsk State University, Tomsk, Russia; ⁴Department of Speech and Hearing Sciences, University of New Mexico, Albuquerque, NM, USA and ⁵Department of Psychology, University of Minnesota Twin Cities, Minneapolis, MN, USA.
Correspondence: Professor R Plomin, King's College London, MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, 16 DeCrespigny Park, London SE5 8AF, UK.
E-mail: robert.plomin@kcl.ac.uk
Received 1 April 2016; revised 10 May 2016; accepted 23 May 2016
Molecular Psychiatry (2016) 00, 1–6
www.nature.com/mp
As mentioned, the 2016 EduYears GPS is based on a GWA sample almost three times as large as the 2013 GWA (329,000 versus 127,000), and as a result, the amount of variance that EduYears GPS accounted for in the discovery sample doubled (~4 versus 2%). Accordingly, here we tested the extent to which the 2016 EduYears GPS accounts for more variance in educational achievement than the 2013 EduYears GPS. In addition, we addressed two specific questions about the role of EduYears GPS for educational achievement.
First, we tested the extent to which the 2016 EduYears GPS is associated with general cognitive ability (g, aka intelligence) and with family socioeconomic status (SES), both of which phenotypically correlate with educational achievement ~0.40–0.50.¹⁶ Using summary statistics derived from GWA analyses, a study applying the LD score regression method¹⁷ identified a very high genetic correlation between years of education and childhood IQ (rg = 0.73).¹⁸ In a subsample of ~3000 individuals from the current study, the 2013 EduYears GPS accounted for ~2% of the variability in g at age 16.¹⁹ We also reported that this GPS explained ~2.5% of the variance in family SES, which refers to the SES of the children's parents.¹³ In the present study, we predicted that the 2016 EduYears GPS would yield stronger associations with g and family SES than previously found for the 2013 EduYears GPS. In addition, we tested whether the 2016 EduYears GPS is significantly associated with educational achievement independent of g and family SES.
Second, we tested the hypothesis that SES moderates genetic influences on educational achievement and g, as predicted by previous studies that observed decreased heritability estimates in low-SES compared with high-SES families.²⁰ This genotype–environment interaction hypothesis leads to the prediction that EduYears GPS is more strongly associated with educational achievement and g in high-SES compared with low-SES families. In addition, we tested whether this genotype–environment interaction increased from childhood through adolescence, as family SES should have a progressively stronger effect on these aspects of children's lives if the genotype–environment interaction hypothesis is correct.
MATERIALS AND METHODS
Participants
This study included unrelated individuals from the multivariate longitudinal Twins Early Development Study that recruited almost 17,000 twin pairs born in England and Wales between 1994 and 1996.²¹ The sample is representative of British families in ethnicity, family SES and parental occupation.²¹ The genotyped subsample is representative of UK census data at first contact (Supplementary Table S1). The Institute of Psychiatry, Psychology and Neuroscience ethics committee (05.Q0706/228) granted project approval and parental consent was obtained prior to data collection.
DNA for 3497 individuals was extracted from saliva samples and hybridized to HumanOmniExpressExome-8v1.2 genotyping arrays at the MRC SGDP Centre Molecular Genetics Laboratories. The raw image data from the array were normalized, pre-processed, and filtered in GenomeStudio according to Illumina Exome Chip SOP v1.4 (http://confluence.brc.iop.kcl.ac.uk:8090/display/PUB/Production+Version%3A+Illumina+Exome+Chip+SOP+v1.4). In addition, prior to genotype calling, 869 multi-mapping SNPs and 353 samples with call rate <0.95 were removed. The ZCALL program²² was used to augment the genotype calling for samples and SNPs that passed the initial QC.
DNA from an additional 3665 samples genotyped earlier in the project was extracted from buccal cheek swabs and genotyped at Affymetrix (Santa Clara, CA, USA). Samples were successfully hybridized to Affymetrix GeneChip 6.0 SNP genotyping arrays (http://www.affymetrix.com/support/technical/datasheets/genomewide_snp6_datasheet.pdf) using experimental protocols recommended by the manufacturer (Affymetrix). The raw image data from the arrays were normalized and pre-processed at the Wellcome Trust Sanger Institute (Hinxton, UK) for genotyping as part of the Wellcome Trust Case Control Consortium 2 (https://www.wtccc.org.uk/ccc2/) according to the manufacturer's guidelines (http://www.affymetrix.com/support/downloads/manuals/genomewidesnp6_manual.pdf). Genotypes for the Affymetrix arrays were called using CHIAMO (https://mathgen.stats.ox.ac.uk/genetics_software/chiamo/chiamo.html).
After initial quality control and genotype calling, the same quality control was performed on the samples genotyped on the Illumina and Affymetrix arrays separately using PLINK,²³ R²⁴ and VCFtools.²⁵ Samples were removed from subsequent analyses on the basis of call rate (<0.99), suspected non-European ancestry, heterozygosity, array signal intensity (>4 s.d. from the mean) and relatedness. SNPs were excluded if the minor allele frequency was <0.05%, if more than 1% of genotype data were missing or if the Hardy–Weinberg P-value was lower than 10⁻⁵. Non-autosomal markers and indels were removed. Association between the SNP and the array, batch or plate on which samples were genotyped was calculated; SNPs with an effect P-value less than 10⁻³ were excluded. A total of 5825 samples, with 2698 individuals genotyped on Illumina and 3127 individuals genotyped on Affymetrix, remained after quality control.
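The SNP-level exclusion rules above amount to a simple per-marker filter. The sketch below is illustrative only: the study applied these filters with PLINK, R and VCFtools, whereas the `SnpStats` record and the `passes_qc` function are hypothetical names, the toy values are made up, and the per-SNP summary statistics are assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary record; field names are illustrative."""
    snp_id: str
    maf: float           # minor allele frequency, as a proportion
    missing_rate: float  # proportion of samples with a missing genotype
    hwe_p: float         # Hardy-Weinberg equilibrium test P-value
    batch_p: float       # P-value for association with array, batch or plate


def passes_qc(snp, min_maf=0.0005, max_missing=0.01,
              min_hwe_p=1e-5, min_batch_p=1e-3):
    """Apply the thresholds quoted in the text (MAF of 0.05% = 0.0005)."""
    return (snp.maf >= min_maf
            and snp.missing_rate <= max_missing
            and snp.hwe_p >= min_hwe_p
            and snp.batch_p >= min_batch_p)


# Toy records: one clean SNP and one failing each kind of check.
snps = [
    SnpStats("snp_ok", 0.21, 0.002, 0.40, 0.50),
    SnpStats("snp_rare", 0.0001, 0.002, 0.40, 0.50),
    SnpStats("snp_missing", 0.21, 0.03, 0.40, 0.50),
    SnpStats("snp_hwe", 0.21, 0.002, 1e-7, 0.50),
    SnpStats("snp_batch", 0.21, 0.002, 0.40, 1e-4),
]
kept = [s.snp_id for s in snps if passes_qc(s)]
```

Keeping the thresholds as keyword arguments makes it easy to see how many markers each criterion removes when one of them is relaxed.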
Genotypes from the two arrays were separately imputed using Minimac3 1.0.13,²⁷,²⁸ available on the Michigan Imputation Server (https://imputationserver.sph.umich.edu), with the Haplotype Reference Consortium²⁶ as reference data. A series of quality checks were performed before merging the imputed data from the two arrays (e.g. array effects, allele frequencies by imputation quality). For the present analyses, we limited our analyses to variants genotyped or imputed at info >0.95 on both arrays, and with Hardy–Weinberg equilibrium test P-value >10⁻⁵. After stringent pruning to remove markers in high linkage disequilibrium (R² > 0.1) and excluding high linkage disequilibrium genomic regions so as to ensure that only genome-wide effects were detected, we performed principal component analysis on a subset of 40,745 autosomal SNPs that remained after applying our quality control criteria, and that overlapped between the two genotyping arrays. To control for population stratification, we regressed the GPS on the first 10 principal components and used the residuals in all subsequent analyses.
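The residualization step can be sketched as an ordinary least-squares regression of the score on the principal components, keeping what is left over. This is a minimal illustration rather than the authors' pipeline: `ols_residuals` is a hypothetical helper, the principal components and raw scores are simulated, and the normal equations are solved directly only to keep the example free of third-party libraries.

```python
import random


def ols_residuals(y, covariates):
    """Residuals of y after least-squares regression on covariates plus an intercept."""
    n = len(y)
    X = [[1.0] + list(row) for row in covariates]
    k = len(X[0])
    # Augmented normal equations: [X'X | X'y]
    aug = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           + [sum(X[i][a] * y[i] for i in range(n))]
           for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    beta = [aug[r][k] / aug[r][r] for r in range(k)]
    return [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]


# Simulated data: 300 individuals, 10 principal components, and a raw GPS
# that is partly driven by the first two components (made-up effect sizes).
rng = random.Random(0)
pcs = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(300)]
raw_gps = [0.5 * pc[0] - 0.3 * pc[1] + rng.gauss(0, 1) for pc in pcs]
adjusted_gps = ols_residuals(raw_gps, pcs)
```

By construction the adjusted scores have mean zero and are uncorrelated with every principal component, which is what "using the residuals" buys in the downstream regressions.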
Measures
National Curriculum levels age 7 and 12. English and mathematics National Curriculum levels were collected from teachers when the twins were aged 7 (M = 7.2, s.d. = 0.27) and 12 (M = 11.4, s.d. = 0.66). National Curriculum data and genotypes were available for 4047 children at age 7 and 2950 at age 12. The assessments are based on a rubric aligned with the UK National Curriculum, which is the standardized core academic curriculum formulated by the National Foundation for Educational Research (NFER) and the Qualifications and Curriculum Authority (QCA) (NFER: http://www.nfer.ac.uk/index.cfm; QCA: http://www.qca.org.uk). After receiving parental consent, teachers were contacted directly via mail. Teacher ratings assessed two main abilities: English (including 'speaking and listening', 'reading' and 'writing') and mathematics (including 'using and applying mathematics', 'numbers' and 'shapes, space and measures'). At age 7 and 12, teachers rated National Curriculum levels on a 5-point and 9-point scale, respectively, with higher scores representing greater ability. Mathematics and English abilities correlated 0.74 and 0.81 at age 7 and 12, respectively. Therefore, we created overall academic achievement mean scores by calculating the standardized mean for the English and mathematics scores for both ages.
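One plausible reading of the "standardized mean" is sketched below: each subject is z-scored, the two z-scores are averaged, and the average is z-scored again. The exact procedure is an assumption, as are the helper names `zscore` and `composite` and the toy teacher ratings.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite(english, maths):
    """Average the two standardized subject scores, then re-standardize."""
    averaged = [(e + m) / 2 for e, m in zip(zscore(english), zscore(maths))]
    return zscore(averaged)


# Made-up National Curriculum levels for five children.
english_levels = [2, 3, 3, 4, 1]
maths_levels = [2, 2, 3, 4, 1]
achievement = composite(english_levels, maths_levels)
```

Standardizing before averaging gives both subjects equal weight even when they are rated on scales with different spreads.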
General Certificate of Secondary Education measures age 16. The General Certificate of Secondary Education (GCSE) is a standardized UK-based examination taken at the end of compulsory education at age 16. In addition to the compulsory core subjects of English, mathematics and science, students can choose from a variety of subjects such as physical education, music, geography, modern foreign languages, and information and communication technology.
GCSE results were obtained by questionnaires sent via mail and by telephone interviews of parents and twins themselves. The grades were coded to range from 4 (G; the minimum pass grade) to 11 (A*; the best possible grade). The GCSE score used in this study represents the mean of the compulsory core subjects mathematics and English (if both English language and English literature were taken, a mean grade for English was derived). The two subjects correlated 0.70. We included only mathematics and English grades in the composite score to improve comparability between the educational achievement measures at the different ages. Self-reported GCSE grades of Twins Early Development Study participants show high accuracy, correlating 0.98 for English and 0.99 for mathematics grades with data obtained for a subsample from the National Pupil Database (NPD: https://www.gov.uk/government/collections/national-pupil-database).¹⁴ Data for subject grades and genotypes were available for 4301 twins (mean age = 16.62, s.d. = 0.32).
General cognitive ability (g). To measure general cognitive ability, the twins were assessed on various tests including verbal and non-verbal abilities at age 7, 12 and 16. A mean score composite was derived from four tests (Conceptual Grouping,²⁹ Similarities,³⁰ Vocabulary,³⁰ Picture Completion³⁰) at age 7; three tests (Raven's Progressive Matrices,³¹ General Knowledge,³² Picture Completion³⁰) at age 12; and two tests (Raven's Progressive Matrices and Mill Hill Vocabulary test) at age 16. Behavioral and genotypic data were available for 3559 individuals at age 7 (M = 7.17, s.d. = 0.29); 3349 individuals at age 12 (M = 11.46, s.d. = 0.64) and 1743 individuals at age 16 (M = 16.52, s.d. = 0.30). General cognitive ability measures at the different ages correlated on average 0.48. For simplicity we created a general cognitive ability mean composite based on data available at ages 7, 12 and 16. Only participants with data from at least two ages were included (N = 2228), and mean imputation was performed on those with a missing third measure. We also report results related to general cognitive ability measured at each age individually in Supplementary Table S6.
Family SES. A composite of several factors such as parental education and occupation is considered to reflect SES better than any single factor.³³ Data from 4958 genotyped individuals were available for family SES. This measure represents maternal age at birth of eldest child, the mean score of maternal and paternal highest education level, as well as the respondent's (mother or father) occupation, classified according to the Standard Occupational Classification 2000 (Office for National Statistics, 2000) at child age 2, which was the first age of contact.
Small but significant mean differences between girls and boys were found for educational achievement at all ages (Supplementary Table S2). Small age effects were found for educational achievement within each of the three ages (Supplementary Table S2). Therefore, all measures with the exception of SES and EduYears GPS were recalculated as standardized residuals corrected for gender and age. To account for a slight negative skew in educational achievement tests at age 7 and 16 and a slight positive skew at age 12, measures were quantile normalized.³⁴
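Quantile normalization of a skewed measure is often done with a rank-based inverse normal transformation, sketched below. Whether the cited procedure matches this exactly is an assumption; the function name `rank_normalize`, the `(rank - 0.5) / n` offset and the toy scores are illustrative choices.

```python
from statistics import NormalDist


def rank_normalize(values):
    """Map values to standard normal quantiles by rank; ties share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the block of tied values starting at i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the block
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    inverse_cdf = NormalDist().inv_cdf
    return [inverse_cdf((rank - 0.5) / n) for rank in ranks]


# A positively skewed toy variable with one tie and one extreme value.
skewed_scores = [1, 2, 2, 3, 10]
normalized_scores = rank_normalize(skewed_scores)
```

The transformation preserves rank order while pulling in the extreme value, so the long tail no longer dominates regression estimates.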
Statistical analyses
Genome-wide polygenic scores. We computed GPS for 5825 unrelated individuals using β-weights and P-values from summary statistics obtained by GWA analysis. Summary statistics were derived from the 2016 GWA study on years of education⁸ with a sample size of 328,918 individuals. It should be noted that the summary statistics we used are slightly different to those of the 2016 EduYears study:⁸ here 23andMe data are excluded due to legal restrictions, and an initial release of the UK Biobank data is included (see Supplementary Table S3 for cohort details). GPS based on these modified summary statistics correlated highly (r = 0.86) with the published GPS⁸ when both GPS were constructed using the Health and Retirement Study as target sample. Quality-controlled SNPs were clumped for linkage disequilibrium in PRSice,³⁵ using an R² = 0.1 cutoff within a 250-kb window. In total, 108,737 SNPs remained after linkage disequilibrium clumping. We used PRSice³⁵ to calculate polygenic scores. Firstly, PRSice calculated GPS for each individual in our sample by summing the trait-associated SNPs, each weighted by its effect size derived from GWA analysis. PRSice then performed a regression analysis to test for association between GPS and each of our outcomes (educational achievement at 7, 12, 16, SES and g). This is repeated for GPS calculated at a large number of P-value thresholds, ranging from 0.001 to 1 (increments of 0.001) in the GWA results, under the high-resolution scoring option in PRSice. Through this high-resolution scoring we identified the best-fit GPS for all measures (Supplementary Table S4), which were used throughout our analyses for each respective trait. The best-fit GPS is identified as that which gives the smallest P-value for association with outcome among all the regression tests performed on the GPS (see Supplementary Figures S4). Given the multiple testing involved in high-resolution scoring we use an association significance threshold of P = 0.001, as recommended in Euesden et al.³⁵
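The scoring and threshold search described above can be illustrated with a small sketch. It is not PRSice: the function names, SNP identifiers, effect sizes and outcomes are all made up, genotypes are assumed to be coded as 0, 1 or 2 copies of the effect allele, and clumping is assumed to have been done already. The threshold is chosen by largest R², which for a single-predictor regression at a fixed sample size picks the same threshold as the smallest P-value.

```python
def polygenic_score(dosages, gwas, p_threshold):
    """Weighted sum of effect-allele counts over SNPs with GWA P <= p_threshold."""
    return sum(beta * dosages[snp]
               for snp, (beta, p_value) in gwas.items()
               if p_value <= p_threshold)


def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def best_fit_threshold(genotypes, gwas, outcome, thresholds):
    """Return (threshold, R^2) for the threshold whose score best predicts the outcome."""
    fits = []
    for threshold in thresholds:
        scores = [polygenic_score(g, gwas, threshold) for g in genotypes]
        fits.append((r_squared(scores, outcome), threshold))
    best_r2, best_threshold = max(fits)
    return best_threshold, best_r2


# Hypothetical summary statistics: SNP -> (effect size, GWA P-value).
gwas = {"snp1": (0.020, 0.0004), "snp2": (-0.010, 0.03), "snp3": (0.015, 0.40)}
genotypes = [
    {"snp1": 0, "snp2": 2, "snp3": 1},
    {"snp1": 1, "snp2": 1, "snp3": 0},
    {"snp1": 2, "snp2": 0, "snp3": 2},
    {"snp1": 1, "snp2": 2, "snp3": 1},
    {"snp1": 2, "snp2": 1, "snp3": 0},
]
outcome = [-1.2, 0.1, 1.0, -0.3, 0.8]
thresholds = [0.001, 0.05, 1.0]
chosen_threshold, chosen_r2 = best_fit_threshold(genotypes, gwas, outcome, thresholds)
```

Because the threshold is tuned on the same outcome it is then tested against, the search inflates the apparent fit, which is the reason for the stricter significance threshold noted in the text.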
For our GPS analyses, we have more than 80% power to detect a GPS that explains 0.2% of the phenotypic variance (see Supplementary Methods S1 for details). To test interactions between different levels of EduYears GPS and family SES, we have more than 80% power to detect a small interaction effect of η² = 0.02 (given α = 0.05; N = 600; number of groups = 4).
We performed regression analyses with EduYears GPS as a predictor of educational achievement at ages 7, 12 and 16, as well as of g and family SES. To test for potential differences among the correlations between EduYears GPS and educational achievement at the different ages, we performed Fisher's r-to-z transformations. We also used multiple regression to test whether associations between EduYears GPS and educational achievement remain after controlling for family SES and g. We also tested for mean differences in educational achievement between the extreme septiles of EduYears GPS at each age using analyses of variance. Finally, interaction effects between EduYears GPS and SES on educational achievement and on g were analyzed using multiple regression models that included each main effect and the interaction effect term.
RESULTS
Polygenic score analyses
As illustrated in Figure 1, EduYears GPS accounted for a significant proportion of variance in educational achievement at all ages, increasing from age 7 (R² = 0.028, P<0.001) to age 12 (R² = 0.046, P<0.001) to age 16 (R² = 0.091, P<0.001). Betas indicated that an increase of one standard deviation in EduYears GPS resulted in a z-standardized mean educational achievement score increase of 0.17, 0.21 and 0.30 at age 7, 12 and 16, respectively. The increase in association between EduYears GPS and educational achievement between age 7 and age 16 was significant, as was the increase between age 12 and age 16, but not between age 7 and 12 (Supplementary Table S5).
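A Fisher r-to-z comparison of two correlations can be sketched as follows, using the standardized betas at ages 7 and 16 (0.17 and 0.30) and the sample sizes reported in the Measures section (4047 and 4301). The sketch treats the two samples as independent, which is a simplification because many of the same individuals contribute at both ages; the function name is hypothetical and the result is not meant to reproduce Supplementary Table S5.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided z-test of H0: rho1 == rho2, assuming independent samples."""
    standard_error = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (atanh(r2) - atanh(r1)) / standard_error  # atanh is Fisher's r-to-z
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Age 7 versus age 16, treating the standardized betas as correlations.
z_stat, p_value = fisher_z_difference(0.17, 4047, 0.30, 4301)
```

With a single standardized predictor the regression beta equals the correlation, which is why the betas can be plugged in directly.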
EduYears GPS was also associated with g (R² = 0.036, P<0.001) and family SES (R² = 0.073, P<0.001) (Figure 1). Additionally, EduYears GPS significantly predicted g at ages 7, 12 and 16 (Supplementary Table S6); these associations did not differ significantly from one another. Because educational achievement, g, and family SES are intercorrelated phenotypically (Supplementary Table S6), we tested the effect of EduYears GPS on educational achievement independent of g and SES by including g and SES into a regression model before entering EduYears GPS. After adjusting the P-value threshold for multiple testing (see the Materials and methods section), EduYears GPS remained a significant predictor of educational achievement at age 16 after accounting for g and SES, although the effect size was reduced to 1.2% of the variance explained (Supplementary Table S7).
Figure 1. Variance explained (R²) and standard error of EduYears GPS predicting: EA 7 = educational achievement age 7; EA 12 = educational achievement age 12; EA 16 = educational achievement age 16; g = general cognitive ability; SES = family socioeconomic status; in this analysis and all subsequent analyses, the unique best-fit GPS was used for each respective trait; see the Materials and methods section for details. GPS, genome-wide polygenic score.
Extreme group differences
Figure 2 shows the z-standardized mean educational achievement scores by EduYears GPS septiles. At all ages, individuals scoring in the highest EduYears GPS septile performed on average significantly and substantially better at school than those scoring in the lowest GPS septile (Supplementary Table S8). By age 16, there was almost a standard deviation difference in educational achievement between the lowest and highest GPS groups, which represents a whole school grade difference. Similar results were obtained for EduYears GPS extreme quintiles rather than septiles (Supplementary Table S9 and Supplementary Figure S1).
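The size of the age-16 gap is roughly what a correlation of about 0.30 implies for extreme groups. The sketch below computes the expected gap between the top and bottom quantile groups under an assumption of bivariate normality, which the article does not state; the function name is hypothetical and the only input taken from the text is R² = 0.091.

```python
from statistics import NormalDist


def expected_extreme_gap(r, n_groups=7):
    """Expected outcome gap (in s.d. units) between the top and bottom
    quantile groups of a predictor correlated r with the outcome,
    assuming both variables are standard bivariate normal."""
    standard = NormalDist()
    cut = standard.inv_cdf(1 - 1 / n_groups)
    # Mean of a standard normal above the cut: pdf(cut) / P(X > cut)
    mean_top_group = standard.pdf(cut) * n_groups
    # The bottom group mirrors the top group, hence the factor of 2.
    return 2 * r * mean_top_group


r_age16 = 0.091 ** 0.5
septile_gap = expected_extreme_gap(r_age16, n_groups=7)
quintile_gap = expected_extreme_gap(r_age16, n_groups=5)
```

Under these assumptions the septile gap falls a little below one standard deviation, in line with the reported difference, and the quintile gap is somewhat smaller because quintiles are less extreme.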
Using Monte Carlo integration,³⁶ we calculated a substantial non-overlap of 38% between educational achievement distributions at age 16 for the lowest and highest GPS septiles (Supplementary Figure S2).
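A Monte Carlo estimate of distribution overlap can be sketched as follows. The two septile distributions are assumed here to be normal with equal variance and separated by 0.95 s.d., a stand-in for "almost a standard deviation"; these parameters are assumptions, so the result should land near, but need not equal, the reported 38%.

```python
from statistics import NormalDist

gap_sd = 0.95  # assumed separation between group means, in s.d. units
lowest_septile = NormalDist(0.0, 1.0)
highest_septile = NormalDist(gap_sd, 1.0)

# Overlap = integral of min(f, g) = E_f[min(1, g(x) / f(x))], estimated
# by averaging over random draws from f (the lowest-septile distribution).
draws = lowest_septile.samples(20000, seed=1)
overlap_mc = sum(min(1.0, highest_septile.pdf(x) / lowest_septile.pdf(x))
                 for x in draws) / len(draws)
non_overlap = 1.0 - overlap_mc

# Closed-form overlapping coefficient for two normals, as a cross-check.
overlap_exact = lowest_septile.overlap(highest_septile)
```

The closed-form value is available only because both distributions are assumed normal; the sampling estimate would work equally well with empirical densities.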
Genotype–environment interaction effects
The genetic influence of EduYears GPS on educational achievement at age 16 and on g was not greater in high-SES than in low-SES families, contrary to the prediction of the genotype–environment interaction hypothesis described earlier. As illustrated in Figure 3a, at age 16 the difference between low and high GPS groups was similar for low-SES and high-SES groups, despite the higher mean educational achievement of the high-SES group. We also did not find G × E interaction for general cognitive ability (Figure 3b) or for educational achievement at ages 7 and 12 (Supplementary Figure S3). Hierarchical multiple regression analyses that tested for G × E interaction using continuous data yielded no significant interactions between EduYears GPS and SES as they relate to educational achievement at ages 7, 12 and 16 (Supplementary Table S10) or as they relate to g (Supplementary Table S11).
Figure 2. Standardized means and standard errors for educational achievement at age 7, 12 and 16 by genome-wide polygenic score (GPS) septile. EduYears GPS was rescored as septiles (1 = lowest, 7 = highest).
Figure 3. (a) Standardized educational achievement mean scores at age 16 by EduYears GPS and family SES for individuals scoring in the highest and lowest 20% of the distribution of EduYears GPS. There was no evidence for an interaction effect (F(1,605) = 1.29, P = 0.18); (b) general cognitive ability mean scores by EduYears GPS and family SES for individuals scoring in the highest and lowest 20% of the distribution of EduYears GPS. No interaction effect was found (F(1,327) = 1.06, P = 0.30). GPS, genome-wide polygenic score; SES, socioeconomic status.
DISCUSSION
Our results show that DNA can be used to predict educational achievement, especially at the end of the compulsory school years. Although the 2016 EduYears GPS accounted for ~4% of the variance in the GWA target trait of years of education in independent samples, we found that the 2016 EduYears GPS accounted for 9% of the variance in educational achievement at age 16, tripling the effect size from previous reports¹³ based on the 2013 EduYears GPS.⁹ The predictive power of EduYears GPS can be seen especially at the extremes of the distribution of GPS scores, suggesting that it is possible to identify individuals early in life at genetic risk and resilience, moving us closer to the possibility of early intervention and personalized learning.³⁷
We have previously reported a heritability estimate of 60% for educational achievement at age 16 using a sample from which the present sample was drawn.¹⁴ The present study demonstrated that EduYears GPS predicts 9% of the total variance in educational achievement, thus accounting for only 15% of the heritability estimated by the twin design. However, unlike twin study estimates of heritability, GPS is derived from GWA studies, which are limited to additive effects of the common variants employed on SNP arrays. For this reason, SNP-based estimates of heritability, which have these same limitations, represent the current upper limit for GPS prediction. For educational achievement, SNP-based estimates of heritability are about 30%,¹³ and EduYears GPS explains almost one-third of the heritable variance from SNP-based studies at age 16.
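The two proportions quoted above follow from dividing the variance explained by each heritability estimate. The short calculation below uses only figures given in the text; the variable names are illustrative.

```python
r2_gps_age16 = 0.091    # variance in age-16 achievement explained by EduYears GPS
twin_heritability = 0.60
snp_heritability = 0.30

# Share of each heritability estimate captured by the polygenic score.
share_of_twin_h2 = r2_gps_age16 / twin_heritability  # about 15%
share_of_snp_h2 = r2_gps_age16 / snp_heritability    # about 30%, i.e. almost one-third
```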
We believe that the substantial increase in heritability explained by the 2016 EduYears GPS represents a turning point in the social and behavioral sciences because it makes it possible to predict educational achievement for individuals directly from their DNA. Although other variables account for more of the variance of educational achievement, DNA has a unique predictive status in that inherited DNA sequence variation does not change from the single cell with which life begins. For this reason, unlike the case with many other predictors, the correlation between EduYears GPS and educational achievement cannot feasibly be interpreted in terms of reverse causation. That is, the correlation between EduYears GPS and educational achievement cannot be caused by the effect of educational achievement on inherited DNA sequence variation. In contrast, although g predicts much more of the variance of educational achievement at age 16 (29% in our study), this correlation could be confounded by factors related to both educational achievement and g, such as social and family risk factors. Similarly, educational achievement at age 7 predicts 35% of the variance of educational achievement at age 16 but this correlation could also be due to other factors, including genetics,¹⁴ that affect educational achievement at both ages. Moreover, educational achievement and g cannot be assessed at earlier stages of development. Family SES, which also predicts substantial variance of educational achievement at age 16 (21% in our study), can be assessed early but this correlation is also likely to be partly caused by other factors, including genetics,¹³ that affect both family SES and educational achievement. Although family SES can be assessed early, it can change over time, whereas DNA variations within individuals are stable across the lifespan. Moreover, family SES is a family-wide index not specific to individual children in a family.
EduYears GPS predicts educational achievement independently of g and family SES only at age 16, which may be due to the associations between g, educational achievement, family SES and EduYears GPS. It is possible that family SES and g are earlier in the chain of the causal pathway from genetic variants to educational achievement, which may explain the attenuated relationship between EduYears GPS and educational achievement at age 7 and 12 after controlling for these variables. Our findings suggest pleiotropic effects of EduYears GPS on educational achievement, g, and family SES, which are in line with previous reports that describe the genetic overlap between educational achievement, g, and family SES.¹²,¹³,³⁸ However, the threefold increase in prediction of educational achievement at age 16 from the 2016 EduYears GPS as compared with the 2013 EduYears GPS (~3% vs 9%) was not mirrored in the prediction of g (~2% vs ~3.5%). The finding that EduYears GPS accounts for more variance in educational achievement than in g is likely due to the fact that educational achievement is influenced by g as well as many other factors that are under genetic influence.¹⁴
Variance explained by the 2016 EduYears GPS in family SES also
increased almost threefold compared with previous results with
the 2013 EduYears GPS in a subsample of the current study
(~2.5% vs ~7%).13 Explaining ~7% of the variance in family SES by EduYears GPS
is impressive for two reasons. First, the children's genotypes are
only an approximation of their parents' genotypes; the effect of
EduYears GPS on SES should be even stronger for the parents' own
GPS. Second, our findings account for a third of the SNP-based
heritability estimate for family SES (~20%),39 which, as noted
earlier, represents the upper limit for GWA and GPS studies. With
that, our results demonstrate that family SES is genetically
influenced and that its genetic effects are also partly shared with
educational achievement.
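The proportions quoted for family SES follow from simple ratios. As an illustrative check, the arithmetic can be sketched as below; the function name is hypothetical, and the inputs are the approximate percentages reported in the text.

```python
def share_of_ceiling(r2_gps, snp_heritability):
    """Fraction of the SNP-heritability ceiling captured by the polygenic score."""
    return r2_gps / snp_heritability

# Approximate values quoted in the text, expressed as proportions of variance.
ses_r2_2016 = 0.07    # family SES variance explained by the 2016 EduYears GPS
ses_r2_2013 = 0.025   # family SES variance explained by the 2013 EduYears GPS
ses_snp_h2 = 0.20     # SNP-based heritability estimate for family SES

ses_share = share_of_ceiling(ses_r2_2016, ses_snp_h2)
ses_fold = ses_r2_2016 / ses_r2_2013

print(f"Share of SNP heritability for SES: {ses_share:.2f}")  # about a third
print(f"Fold increase, 2013 to 2016 GPS: {ses_fold:.1f}")     # almost threefold
```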
When interpreting the current results, three caveats should be
considered. First, the finding that the predictive validity of
EduYears GPS increases across the school years may be due to
increasing approximation of our measures to the EduYears GWA
target trait of years of education. That is, our measure of
educational achievement at age 16 is a standardized examination
taken at the end of compulsory education that strongly influences
whether pupils go on to higher education. Alternatively, it is also
possible that GCSE results are more reliable measures than
national curriculum teacher ratings, which might contribute to the
difference in variance explained in these variables by EduYears
GPS. Second, as we measured family SES in a traditional way by
including parental education, this could have increased the
association of the SES composite with EduYears GPS. Although
parental education and occupation are related, future studies
should investigate if the relationship between EduYears GPS
and SES varies as a function of different SES indicators. Third, our
finding that EduYears GPS accounts for 9% of the variance of
educational achievement at age 16 needs to be tested for
generalization in other samples and beyond the UK.
The finding that individuals' polygenic scores for years of
education predict educational achievement entails no necessary
policy implications. However, our findings corroborate that
individual differences in educational achievement are partly due
to DNA differences between children and are not solely created
by environmental forces. Through dialogue between
scientists and policymakers, polygenic scores
may soon become a useful tool for early prediction and
prevention of educational problems and for personalized learning.
CONFLICT OF INTEREST
The authors declare no conflict of interest.
ACKNOWLEDGMENTS
We gratefully acknowledge the ongoing contribution of the participants in the
Twins Early Development Study (TEDS) and their families. TEDS is supported by a
program grant to RP from the UK Medical Research Council (MR/M021475/1
and previously G0901245), with additional support from the US National Institutes of
Health (HD044454; HD059215). SS is supported by the MRC/IoPPN Excellence
Award and by the EU Framework Programme 7 (602768). EK and KR are supported by
a Medical Research Council studentship. RP is supported by a Medical Research
Council Advanced Investigator award (295366). The funders had no role in study
design, data collection and analysis, decision to publish or preparation of the
manuscript.
AUTHOR CONTRIBUTIONS
RP directs and received funding for the Twins Early Development Study
(TEDS). RP and SS conceived the present study. SS analyzed and interpreted the
data. RP supervised the project and interpreted the data. RP and SS wrote
the manuscript with help from EK, SvS, PFO, KR, YK, PSD and JJL.
REFERENCES
1 Plomin R, Simpson MA. The future of genomics for developmentalists. Dev
Psychopathol 2013; 25: 1263–1278.
2 Chabris CF, Lee JJ, Cesarini D, Benjamin DJ, Laibson DI. The fourth law of behavior
genetics. Curr Dir Psychol Sci 2015; 24: 304–312.
3 Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery.
Am J Hum Genet 2012; 90: 7–24.
4 Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet
2013; 9: e1003348.
5 Harlaar N, Butcher LM, Meaburn E, Sham P, Craig IW, Plomin R. A behavioural
genomic analysis of DNA markers associated with general cognitive ability in
7-year-olds. J Child Psychol Psychiatry 2005; 46: 1097–1107.
6 Wray NR, Lee SH, Mehta D, Vinkhuyzen AAE, Dudbridge F, Middeldorp CM.
Research review: polygenic methods and their application to psychiatric traits.
J Child Psychol Psychiatry 2014; 55: 1068–1087.
7 Plomin R, Deary IJ. Genetics and intelligence differences: five special findings.
Mol Psychiatry 2015; 20: 98–108.
8 Okbay A, Beauchamp JP, Fontana MA, Lee JJ, Pers TH, Rietveld CA et al. Genome-wide
association study identifies 74 loci associated with educational attainment.
Nature 2016; 533: 539–542.
9 Rietveld CA, Medland SE, Derringer J, Yang J, Esko T, Martin NW et al. GWAS of
126,559 individuals identifies genetic variants associated with educational
attainment. Science 2013; 340: 1467–1471.
10 Domingue BW, Belsky DW, Conley D, Harris KM, Boardman JD. Polygenic influence
on educational attainment. AERA Open 2015; 1: 2332858415599972.
11 de Zeeuw EL, van Beijsterveldt CEM, Glasner TJ, Bartels M, Ehli EA, Davies GE et al.
Polygenic scores associated with educational attainment in adults predict
educational achievement and ADHD symptoms in children. Am J Med Gen Part B
2014; 165: 510–520.
12 Davies NM, Hemani G, Timpson NJ, Windmeijer F, Davey Smith G. The role of
common genetic variation in educational attainment and income: evidence from
the National Child Development Study. Sci Rep 2015; 5: 16509.
13 Krapohl E, Plomin R. Genetic link between family socioeconomic status and
children's educational achievement estimated from genome-wide SNPs.
Mol Psychiatry 2015; 21: 437–443.
14 Krapohl E, Rimfeld K, Shakeshaft NG, Trzaskowski M, McMillan A, Pingault J-B et al.
The high heritability of educational achievement reflects many genetically
influenced traits, not just intelligence. Proc Natl Acad Sci USA 2014; 111: 15273–15278.
15 Kovas Y, Haworth CMA, Dale PS, Plomin R. The genetic and environmental origins
of learning abilities and disabilities in the early school years. Monogr Soc Res Child
Dev 2007; 72: vii, 1–144.
16 Strenze T. Intelligence and socioeconomic success: a meta-analytic review of
longitudinal research. Intelligence 2007; 35: 401–426.
17 Bulik-Sullivan BK, Loh P-R, Finucane HK, Ripke S, Yang J. Schizophrenia Working
Group of the Psychiatric Genomics et al. LD score regression distinguishes
confounding from polygenicity in genome-wide association studies. Nat Genet
2015; 47: 291–295.
18 Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh P-R et al. An atlas of
genetic correlations across human diseases and traits. Nat Genet 2015; 47: 1236–1241.
19 Krapohl E, Euesden J, Zabaneh D, Pingault JB, Rimfeld K, von Stumm S et al.
Phenome-wide analysis of genome-wide polygenic scores. Mol Psychiatry 2015;
e-pub ahead of print 25 August 2015.
20 Tucker-Drob EM, Bates TC. Large cross-national differences in gene × socioeconomic
status interaction on intelligence. Psychol Sci 2015; 27: 138–149.
21 Haworth C, Davis OS, Plomin R. Twins Early Development Study (TEDS): a
genetically sensitive investigation of cognitive and behavioral development from
childhood to young adulthood. Twin Res Hum Genet 2013; 16: 117–125.
22 Goldstein JI, Crenshaw A, Carey J, Grant GB, Maguire J, Fromer M et al. zCall: a rare
variant caller for array-based genotyping. Bioinformatics 2012; 28: 2543–2545.
23 Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D et al. PLINK: a
tool set for whole-genome association and population-based linkage analyses.
Am J Hum Genet 2007; 81: 559–575.
24 R Core Team. R: A Language and Environment for Statistical Computing. R Foundation
for Statistical Computing [Internet], 2013. Available at
https://www.r-project.org.
25 Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA et al. The variant
call format and VCFtools. Bioinformatics 2011; 27: 2156–2158.
26 McCarthy S, Das S, Kretzschmar W, Durbin R, Abecasis G, Marchini J.
A reference panel of 64,976 haplotypes for genotype imputation. bioRxiv 2015;
035170.
27 Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. Fast and accurate
genotype imputation in genome-wide association studies through pre-phasing.
Nat Genet 2012; 44: 955–959.
28 Fuchsberger C, Abecasis GR, Hinds DA. minimac2: faster genotype imputation.
Bioinformatics 2015; 31: 782–784.
29 McCarthy D. McCarthy Scales of Children's Abilities. The Psychological Corporation:
New York, 1972.
30 Wechsler D. Wechsler Intelligence Scale for Children, 3rd edn. The Psychological
Corporation: UK, 1992.
31 Raven JC, Court JH, Raven J. Manual for Raven's Progressive Matrices and
Vocabulary Scales. Oxford University: Oxford, 1996.
32 Kaplan E, Fein D, Kramer J, Delis D, Morris R. WISC-III as a Process Instrument
(WISC-III-PI). The Psychological Corporation: New York, 1999.
33 White KR. The relation between socioeconomic status and academic achievement.
Psychol Bull 1982; 91: 461–481.
34 Van der Waerden BL. On the sources of my book Moderne Algebra. Hist Math
1975; 2: 31–40.
35 Euesden J, Lewis CM, O'Reilly PF. PRSice: Polygenic Risk Score software.
Bioinformatics 2015; 31: 1466–1468.
36 Hammersley JM, Handscomb DC. Monte Carlo Methods. Wiley: Hoboken, 1964.
37 Asbury K, Plomin R. G is for Genes: The Impact of Genetics on Education and
Achievement, 1st edn. John Wiley & Sons: Chichester, West Sussex, 2013.
38 Marioni RE, Davies G, Hayward C, Liewald D, Kerr SM, Campbell A et al. Molecular
genetic contributions to socioeconomic status and intelligence. Intelligence 2014;
44: 26–32.
39 Hanscombe KB, Trzaskowski M, Haworth CMA, Davis OSP, Dale PS, Plomin R.
Socioeconomic status (SES) and children's intelligence (IQ): in a UK-representative
sample SES moderates the environmental, not genetic, effect on IQ. PLoS One
2012; 7: e30320.
This work is licensed under a Creative Commons Attribution 4.0
International License. The images or other third party material in this
article are included in the article's Creative Commons license, unless indicated
otherwise in the credit line; if the material is not included under the Creative Commons
license, users will need to obtain permission from the license holder to reproduce the
material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
© The Author(s) 2016
Supplementary Information accompanies the paper on the Molecular Psychiatry website (http://www.nature.com/mp)

Supplementary resource (1)

... A total sample size of 10,346 (including 7026 unrelated individuals and 3320 additional dizygotic co-twins) remained after QC. For detailed description of subsequent phasing, imputation, and genomic principal component generation, see ref. 37 . ...
... We employed LDPred2, a Bayesian method that corrects for local linkage disequilibrium, to compute GPS for all genotyped participants. For an in-depth explanation of GPS construction in TEDS, see 37 . All GPS were residualized against chip type, batch, and their first ten principal components. ...
Article
Full-text available
Academic underachievement refers to school performance which falls below expectations. Focusing on the pivotal first stage of education, we explored a quantitative measure of underachievement using genomically predicted achievement delta (GPAΔ), which reflects the difference between observed and expected achievement predicted by genome-wide polygenic scores. We analyzed the relationship between GPAΔ at age 7 and achievement trajectories from ages 7 to 16, using longitudinal data from 4175 participants in the Twins Early Development Study to assess empirically the extent to which students regress to their genomically predicted levels by age 16. We found that the achievement of underachievers and overachievers who deviated from their genomic predictions at age 7 regressed on average by one-third towards their genomically predicted levels. We also found that GPAΔ at age 7 was as predictive of achievement trajectories as a traditional ability-based index of underachievement. Targeting GPAΔ underachievers might prove cost-effective because such interventions seem more likely to succeed by going with the genetic flow rather than swimming upstream, helping GPAΔ underachievers reach their genetic potential as predicted by their GPS. However, this is a hypothesis that needs to be tested in intervention research investigating whether GPAΔ underachievers respond better to the intervention than other underachievers. We discuss the practicality of genomic indices in assessing underachievement.
... Our results confirm this with a large representative sample of longitudinal panel data. This trend is also consistent with common findings that the genetic influence on achievement scores increases across school years (Selzam et al., 2017;von Stumm et al., 2020). In fact, increased genetic effects would be expected to lead to increased stability. ...
Article
Full-text available
Researchers have focused extensively on understanding the factors influencing students’ academic achievement over time. However, existing longitudinal studies have often examined only a limited number of predictors at one time, leaving gaps in our knowledge about how these predictors collectively contribute to achievement beyond prior performance and how their impact evolves during students’ development. To address this, we employed machine learning to analyze longitudinal survey data from 3,425 German secondary school students spanning 5 to 9 years. Our objectives were twofold: to model and compare the predictive capabilities of 105 predictors on math achievement and to track changes in their importance over time. We first predicted standardized math achievement scores in Years 6–9 using the variables assessed in the previous year (“next year prediction”). Second, we examined the utility of the variables assessed in Year 5 at predicting future math achievement at varying time lags (1–4 years ahead)—“varying lag prediction.” In the next year prediction analysis, prior math achievement was the strongest predictor, gaining importance over time. In the varying lag prediction analysis, the predictive power of Year 5 math achievement waned with longer time lags. In both analyses, additional predictors, including intelligence quotient, grades, motivation and emotion, cognitive strategies, classroom/home environments, and demographics (including socioeconomic status), exhibited relatively smaller yet consistent contributions, underscoring their distinct roles in predicting math achievement over time. The findings have implications for both future research and educational practices, which are discussed in detail.
... Technoscientific developments in genomic methodologies have been promoted as a 'genomic revolution' for educational research and policy since around 2010 (Morris et al. 2022). Using data-intensive instruments and methods, behavior geneticists have begun studying what are taken to be traits and outcomes relevant to education (Malanchini et al. 2020), including cognitive ability, intelligence, educational attainment, achievement, and noncognitive skills (Selzam et al. 2017;Rimfeld et al. 2018;Demange et al. 2021). Sociogenomics research, which combines genomics and quantitative social sciences, and genoeconomics, the application of genomics in economics (Benjamin et al. 2012;Freese 2018;Braudt 2018;Mills and Tropf 2020), have extended genomic data analysis to a growing range of socio-economic outcomes and public policy areas like education (Domingue et al. 2015;Belsky et al. 2016;Cesarini and Visscher 2017). ...
Article
Full-text available
Technoscientific transformations in molecular genomics have begun to influence knowledge production in education. Interdisciplinary scientific consortia are seeking to identify ‘genetic influences’ on ‘educationally relevant’ traits, behaviors, and outcomes. This article examines the emerging ‘knowledge infrastructure’ of educational genomics, attending to the assembly and choreography of organizational associations, epistemic architecture, and technoscientific apparatuses implicated in the generation of genomic understandings from masses of bioinformation. As an infrastructure of datafied knowledge production, educational genomics is embedded in data-centered epistemologies and practices which recast educational problems in terms of molecular genetic associations—insights about which are deemed discoverable from digital bioinformation and potentially open to genetically informed interventions in policy and practice. While scientists claim to be ‘opening the black box of the genome’ and its association with educational outcomes, we open the black box of educational genomics itself as a source of emerging scientific authority. Data-intensive educational genomics does not straightforwardly ‘discover’ the biological bases of educationally relevant behaviors and outcomes. Rather, this knowledge infrastructure is also an experimental ‘ontological infrastructure’ supporting particular ways of knowing, understanding, explaining, and intervening in education, and recasting the human subjects of education as being surveyable and predictable through the algorithmic processing of bioinformation.
... Differences at within-sibling level thus are likely to reflect how each sibling perceives, evokes, and shapes the family environments (35). These additional analyses allowed us to dig deeper into the mechanisms through which family environments might contribute to the strengthening of the association between genetic propensity and academic outcomes over development (36,37). Evidence of mediation effects observed at the between-sibling level, but not at the within-siblings level, would be more consistent with passive gene-environment correlation processes, while environmental mediation of the within-sibling prediction would suggest evocative/active gene-environment correlation processes. ...
Preprint
Full-text available
Academic achievement is partly heritable and highly polygenic. However, genetic effects on academic achievement are not independent of environmental processes. We investigated whether aspects of the family environment mediated genetic effects on academic achievement across development. Our sample included 5,151 children who participated in the Twins Early Development Study, as well as their parents and teachers. Data on academic achievement and family environments were available at ages 7, 9, 12 and 16. We computed educational attainment polygenic scores (PGS) and further separated genetic effects into cognitive and noncognitive PGS. Three core findings emerged. First, aspects of the family environment, but not the wider neighbourhood context, consistently mediated the PGS effects on achievement across development, accounting for up to 34.3% of the total effect. Family characteristics mattered beyond socio-economic status. Second, family environments were more robustly linked to noncognitive PGS effects on academic achievement than cognitive PGS effects. Third, when we investigated whether environmental mediation effects could also be observed when considering differences between siblings, adjusting for family fixed effects, we found that environmental mediation was nearly exclusively observed between families. This is consistent with the proposition that family environmental contexts contribute to academic development via passive gene-environment correlation processes. Our results show how parents shape environments that foster their children's academic development partly based on their own genetic disposition, particularly towards noncognitive skills.
... Given that genetic testing can provide probabilistic prediction of future life outcomes (e.g. educational achievement; [79]), it seems natural for people who believe that genes are important for different traits and behaviours to want to know what is 'written there' and potentially to alter 'destiny'. People can introduce various changes to their lifestyles to ameliorate the potential negative effects of genes on their behaviour and health. ...
Article
Full-text available
Understanding reasons for why people choose to have or not to have a genetic test is essential given the ever-increasing use of genetic technologies in everyday life. The present study explored the multiple drivers of people’s attitudes towards genetic testing. Using the International Genetic Literacy and Attitudes Survey (iGLAS), we collected data on: (1) willingness to undergo testing; (2) genetic literacy; (3) motivated cognition; and (4) demographic and cultural characteristics. The 37 variables were explored in the largest to-date sample of 4311 participants from diverse demographic and cultural backgrounds. The results showed that 82% of participants were willing to undergo genetic testing for improved treatment; and over 73%—for research. The 35 predictor variables together explained only a small proportion of variance: 7%—in the willingness to test for Treatment; and 6%—for Research. The strongest predictors of willingness to undergo genetic testing were genetic knowledge and deterministic beliefs. Concerns about data misuse and about finding out unwanted health-related information were weakly negatively associated with willingness to undergo genetic testing. We also found some differences in factors linked to attitudes towards genetic testing across the countries included in this study. Our study demonstrates that decision-making regarding genetic testing is influenced by a large number of potentially interacting factors. Further research into these factors may help consumers to make decisions regarding genetic testing that are right for their specific circumstances.
Article
Full-text available
The day has arrived that genetic tests for educational outcomes are available to the public. Today parents and students alike can send off a sample of blood or saliva and receive a ‘genetic report’ for a range of characteristics relevant to education, including intelligence, math ability, reading ability, and educational attainment. DTC availability is compounded by a growing “precision education” initiative, which proposes the application of DNA tests in schools to tailor educational curricula to children’s genomic profiles. Here I argue that these happenings are a strong signal of the geneticization of education; the process by which educational abilities and outcomes come to be examined, understood, explained, and treated as primarily genetic characteristics. I clarify what it means to geneticize education, highlight the nature and limitations of the underlying science, explore both real and potential downstream bioethical implications, and make proposals for mitigating negative impacts.
Article
Full-text available
Recent advances in genomics make it possible to predict individual differences in education from polygenic scores that are person-specific aggregates of inherited DNA differences. Here, we systematically reviewed and meta-analyzed the strength of these DNA-based predictions for educational attainment (e.g., years spent in full-time education) and educational achievement (e.g., school grades). For educational attainment (k = 20, n = 16, Ntotal = 314,757), a multilevel meta-analysis showed an association with polygenic scores of ρ = .27 (95% CI from .22 to .32). For educational achievement (k = 19, n = 10, Ntotal = 83,788), the association was ρ = .24 (95% CI from .18 to .30). Eurocentric biases were evident with only 15% of estimates being reported in samples of non-European ancestry. After accounting for sample ancestry, age at assessment, and education measure, the meta-analytic estimates increased to ρ = .29 (95% CI from .24 to .33) for educational attainment and ρ = .50 (95% CI from .39 to .61) for educational achievement, indicative of large effect sizes. All meta-analytic estimates were associated with significant heterogeneity. Our findings suggest that DNA-based predictions of education are sizeable but vary across samples and studies. We outline three steps to safeguard potential applications of polygenic score predictions in education to maximize their benefits for personalizing learning, while minimizing the bioethical risks of perpetuating social, cultural, and economic inequalities.
Article
Most previous research found that within-family resemblance on social outcomes and intelligence is mostly due to genetic factors with a limited role of the shared environment, with the exception of educational attainment. Hypotheses about a gene-environment interaction with SES, with a presumably smaller role of genetic factors in families with low social status, have been only partially confirmed. However, these results do not necessarily generalize to all societies, and data from Central or Eastern European countries is currently deficient. In the current work we replicate using data from the Hungarian Twin Registry that intelligence, income, and educational attainment are substantially heritable, with limited role of the shared environment. In contrast to studies in Anglo-Saxon or Western European countries, we found an influence of the shared environment on standardized high school test scores, especially history. Both genetic and shared environmental (but not nonshared environmental) correlations were substantial, in line with generalist genes and shared environments but specific nonshared environmental effects. The results show that the heritability of social traits is observable in Central/Eastern Europe, but they highlight a potentially problematic aspect of Hungarian high school final tests, as students' family of origin appears to be a potent determinant of grades.
Article
STUDY QUESTION Do the genetic determinants of idiopathic severe spermatogenic failure (SPGF) differ between generations? SUMMARY ANSWER Our data support that the genetic component of idiopathic SPGF is impacted by dynamic changes in environmental exposures over decades. WHAT IS KNOWN ALREADY The idiopathic form of SPGF has a multifactorial etiology wherein an interaction between genetic, epigenetic, and environmental factors leads to the disease onset and progression. At the genetic level, genome-wide association studies (GWASs) allow the analysis of millions of genetic variants across the genome in a hypothesis-free manner, as a valuable tool for identifying susceptibility risk loci. However, little is known about the specific role of non-genetic factors and their influence on the genetic determinants in this type of conditions. STUDY DESIGN, SIZE, DURATION Case-control genetic association analyses were performed including a total of 912 SPGF cases and 1360 unaffected controls. PARTICIPANTS/MATERIALS, SETTING, METHODS All participants had European ancestry (Iberian and German). SPGF cases were diagnosed during the last decade either with idiopathic non-obstructive azoospermia (n = 547) or with idiopathic non-obstructive oligozoospermia (n = 365). Case-control genetic association analyses were performed by logistic regression models considering the generation as a covariate and by in silico functional characterization of the susceptibility genomic regions. MAIN RESULTS AND THE ROLE OF CHANCE This analysis revealed 13 novel genetic association signals with SPGF, with eight of them being independent. The observed associations were mostly explained by the interaction between each lead variant and the age-group. Additionally, we established links between these loci and diverse non-genetic factors, such as toxic or dietary habits, respiratory disorders, and autoimmune diseases, which might potentially influence the genetic architecture of idiopathic SPGF. 
LARGE SCALE DATA GWAS data are available from the authors upon reasonable request. LIMITATIONS, REASONS FOR CAUTION Additional independent studies involving large cohorts in ethnically diverse populations are warranted to confirm our findings. WIDER IMPLICATIONS OF THE FINDINGS Overall, this study proposes an innovative strategy to achieve a more precise understanding of conditions such as SPGF by considering the interactions between a variable exposome through different generations and genetic predisposition to complex diseases. STUDY FUNDING/COMPETING INTEREST(S) This work was supported by the “Plan Andaluz de Investigación, Desarrollo e Innovación (PAIDI 2020)” (ref. PY20_00212, P20_00583), the Spanish Ministry of Economy and Competitiveness through the Spanish National Plan for Scientific and Technical Research and Innovation (ref. PID2020-120157RB-I00 funded by MCIN/ AEI/10.13039/501100011033), and the ‘Proyectos I+D+i del Programa Operativo FEDER 2020’ (ref. B-CTS-584-UGR20). ToxOmics-Centre for Toxicogenomics and Human Health, Genetics, Oncology and Human Toxicology, is also partially supported by the Portuguese Foundation for Science and Technology (Projects: UIDB/00009/2020; UIDP/00009/2020). The authors declare no competing interests. TRIAL REGISTRATION NUMBER N/A.
Preprint
Full-text available
Academic achievement is partly heritable and highly polygenic. However, genetic effects on academic achievement are not independent of environmental processes. We investigated whether aspects of the family environment mediated genetic effects on academic achievement across development. Our sample included 5,151 children who participated in the Twins Early Development Study, as well as their parents and teachers. Data on academic achievement and family environments (parenting, home environments, and geocoded indices of neighbourhood characteristics) were available at ages 7, 9, 12 and 16. We computed educational attainment polygenic scores (PGS), and further separated genetic effects into cognitive and noncognitive PGS. Three core findings emerged. First, aspects of the family environment, but not the wider neighbourhood context, consistently mediated the PGS effects on achievement across development –accounting for up to 34.3% of the total effect. Family characteristics mattered beyond socio-economic status. Second, family environments were more robustly linked to noncognitive PGS effects on academic achievement than cognitive PGS effects. Third, when we investigated whether environmental mediation effects could also be observed when considering differences between siblings, adjusting for family fixed effects, we found that environmental mediation was nearly exclusively observed between families. This is consistent with the proposition that family environmental contexts contribute to academic development via passive gene-environment correlation processes. Our results show how parents shape environments that foster their children’s academic development partly based on their own genetic disposition, particularly towards noncognitive skills.
Article
Full-text available
A core hypothesis in developmental theory predicts that genetic influences on intelligence and academic achievement are suppressed under conditions of socioeconomic privation and more fully realized under conditions of socioeconomic advantage: a Gene × Childhood Socioeconomic Status (SES) interaction. Tests of this hypothesis have produced apparently inconsistent results. We performed a meta-analysis of tests of Gene × SES interaction on intelligence and academic-achievement test scores, allowing for stratification by nation (United States vs. non-United States), and we conducted rigorous tests for publication bias and between-studies heterogeneity. In U.S. studies, we found clear support for moderately sized Gene × SES effects. In studies from Western Europe and Australia, where social policies ensure more uniform access to high-quality education and health care, Gene × SES effects were zero or reversed.
Article
Full-text available
We investigated the role of common genetic variation in educational attainment and household income. We used data from 5,458 participants of the National Child Development Study to estimate: 1) the associations of rs9320913, rs11584700 and rs4851266 and socioeconomic position and educational phenotypes; and 2) the univariate chip-heritability of each phenotype, and the genetic correlation between each phenotype and educational attainment at age 16. The three SNPs were associated with most measures of educational attainment. Common genetic variation contributed to 6 of 14 socioeconomic background phenotypes, and 17 of 29 educational phenotypes. We found evidence of genetic correlations between educational attainment at age 16 and 4 of 14 social background and 8 of 28 educational phenotypes. This suggests common genetic variation contributes both to differences in educational attainment and its relationship with other phenotypes. However, we remain cautious that cryptic population structure, assortative mating, and dynastic effects may influence these associations.
Book
G is for Genes shows how a dialogue between geneticists and educationalists can have beneficial results for the education of all children, and can also benefit schools, teachers, and society at large. Draws on behavioral genetic research from around the world, including the UK-based Twins' Early Development Study (TEDS), one of the largest twin studies in the world. Offers a unique viewpoint by bringing together genetics and education, disciplines with a historically difficult relationship. Shows that genetic influence is not the same as genetic determinism and that the environment matters at least as much as genes. Designed to spark a public debate about what naturally occurring individual differences mean for education and equality.
Article
We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.
Article
Educational attainment is strongly influenced by social and other environmental factors, but genetic factors are estimated to account for at least 20% of the variation across individuals [1]. Here we report the results of a genome-wide association study (GWAS) for educational attainment that extends our earlier discovery sample [1, 2] of 101,069 individuals to 293,723 individuals, and a replication study in an independent sample of 111,349 individuals from the UK Biobank. We identify 74 genome-wide significant loci associated with the number of years of schooling completed. Single-nucleotide polymorphisms associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioural phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because educational attainment is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric diseases.
Chapter
Underpinning this chapter is the fact, and it is a fact, that cognitive ability is subject to significant genetic influence, particularly as children grow into teenagers and adults. And herein lies one of the principal fault lines between geneticists and educationalists. IQ is just one predictor of achievement, albeit a strong one; there are others. Before genetic researchers became involved, a body of evidence had already been amassed showing that how good you believe you are at something – your self-perceived ability – can predict how good you actually are at it. Confidence genes seem to influence school performance both in conjunction with – and independent of – IQ genes, leading some to believe that in a roomful of equally bright and high-achieving people it is those who are self-confident who will go the extra mile.