Page 1

Mathematical Ability of 10-Year-Old Boys and Girls: Genetic and

Environmental Etiology of Typical and Low Performance

Yulia Kovas, PhD, Claire M. A. Haworth, BA, Stephen A. Petrill, PhD, and Robert Plomin, PhD

Abstract

The genetic and environmental etiologies of 3 aspects of low mathematical performance (math

disability) and the full range of variability (math ability) were compared for boys and girls in a sample

of 5,348 children age 10 years (members of 2,674 pairs of same-sex and opposite-sex twins) from

the United Kingdom (UK). The measures, which we developed for Web-based testing, included

problems from 3 domains of mathematics taught as part of the UK National Curriculum. Using

quantitative genetic model-fitting analyses, similar results were found for math disabilities and

abilities for all 3 measures: Moderate genetic influence and environmental influence were mainly

due to nonshared environmental factors that were unique to the individual, with little influence from

shared environment. No sex differences were found in the etiologies of math abilities and disabilities.

We conclude that low mathematical performance is the quantitative extreme of the same genetic and

environmental factors responsible for variation throughout the distribution.

Compared to reading disabilities (Pennington & Olson, 2005), relatively little is known about

the genetic and environmental etiologies of mathematics abilities and disabilities, the extent

to which these etiologies differ for low performance and typical performance, and the extent

to which these etiologies differ for male and female students. The present study provides the

first analysis of these issues using a sample of twins large enough to investigate the origins of

low mathematics performance (which we will refer to as math disability) in the context of

normal variation (or ability) and to consider sex differences in both abilities and disabilities.

Four previous twin studies of general mathematics performance have yielded a wide range of

results (see Plomin & Kovas, 2005). Our previous research on mathematics disabilities and

abilities, based on teacher ratings using the United Kingdom (UK) National Curriculum

criteria, was the first study to compare mathematics disabilities and abilities and found similar

genetic and environmental results for math disabilities and abilities (Oliver et al., 2004). This

work focused on three aspects of mathematics identified by the UK National Curriculum for

7-year-old children: Using and Applying Mathematics; Number and Algebra; and Shapes,

Space and Measures. Substantial genetic influence (65% heritabilities) was found both for

mathematics disabilities (lowest 15%) and for mathematics abilities (the rest of the

distribution). Moreover, the results indicated not only that the magnitude of genetic and

environmental influences was similar but also that the same genetic and environmental factors

drive individual differences in the whole of the distribution of ability—at the low end as well

as the rest of the distribution (see Method section for an explanation of the relevant analytic

methods). At the level of the environment, the position of an individual on the mathematics

continuum depends on a cumulative effect of many risk and protective environments. Similarly,

at the genetic level, the position of an individual on an ability continuum depends on a

cumulative effect of risk or protective variants of many genes. With many environmental and

Address: Yulia Kovas, SGDP, Institute of Psychiatry, De Crespigny Park, London SE5 8AF, UK; y.kovas@iop.kcl.ac.uk.

The TEDS web page can be accessed at http://www.teds.ac.uk

NIH Public Access

Author Manuscript

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

Published in final edited form as:

J Learn Disabil. 2007 ; 40(6): 554–567.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript

Page 2

genetic effects and possible interactions and correlations between them contributing to the

position of an individual on an ability distribution, the influence of each effect is likely to be

small and probabilistic, leading to a normal distribution.

Moreover, for both math disabilities and abilities, similarity between twins in the same family

appeared to stem almost entirely from similarity in their genetic makeup rather than similarity

in their environments. That is, relevant environmental influences were of the nonshared type

that make children in the same family different from each other. This previous study included

only same-sex twins and did not examine possible sex differences in genetic and environmental

etiologies for mathematics disabilities and abilities. The present study includes opposite-sex

twins, which provide unique power for investigating etiological issues concerning sex

differences.

Although there is considerable evidence for the validity of teacher ratings based on UK National

Curriculum criteria, it is important to confirm these results using test scores. However, a

practical problem in conducting genetic research with individually administered tests is that

large samples of twins are needed, but it is expensive to test large samples in person, especially

when the twins are distributed over a wide area. We have addressed this problem by developing

Web-based tests that make it possible to assess large samples efficiently and economically.

We assessed three aspects of mathematics based on the UK National Curriculum for 10-year-

old children:

1.

understanding of numerical and algebraic processes to be applied when solving

problems,

2.

understanding of non-numerical mathematical processes and concepts, and

3.

fact retrieval and computation.

A preliminary study when the data from only one cohort (i.e., half of the sample) were available

also assessed two other aspects of mathematics (interpreting information and using

mathematics in problem-solving situations); an article focused on the multivariate genetic

relationship among these five aspects of mathematics concluded that there was substantial

overlap, thus leading to the decision to drop these two tests when assessing the rest of the

sample (Kovas, Petrill & Plomin, 2007). This preliminary study only addressed individual

differences across the entire sample for boys and girls combined—the sample was not large

enough to investigate mathematics disability, to compare the etiologies of disabilities and

abilities, or to examine sex differences. The present article reports results at 10 years, using a

sample more than twice as large, which is sufficiently large to compare results for disabilities

and abilities and for boys and girls.

Concerning the comparison between math disabilities and abilities, our previous research based

on teacher ratings (Oliver et al., 2004) and similar findings for reading disabilities and abilities

(Plomin & Kovas, 2005) led us to predict similar results for the present study based on

individually administered tests of mathematics performance. In respect to sex differences, we

examined for the first time sex differences in the etiologies of diverse aspects of mathematics

for both disabilities and abilities. Although mean sex differences in mathematical ability have

been reported in the past (Maccoby & Jacklin, 1974), recently a “gender similarities”

hypothesis has emerged, suggesting that male and female students are similar on most

psychological variables, including mathematics (Hyde, 2005; Spelke, 2005). Although we

consider sex differences in means and variances, our focus is on sex differences in genetic and

environmental etiologies of mathematical abilities and disabilities, which are independent of

mean sex differences. Research is not available to suggest hypotheses in relation to sex

differences in genetic and environmental etiologies in mathematics, or whether a different

Kovas et al.Page 2

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript

Page 3

picture of sex differences emerges for math disabilities and abilities and for different aspects

of mathematics.

To summarize, the present study uses a large sample of twins that allows us to provide the first

analysis of three important issues:

1.

genetic and environmental etiologies of mathematics abilities and disabilities in 10

year-old children,

2.

the extent to which these etiologies differ for low performance and typical

performance, and

3.

the extent to which these etiologies differ for boys and girls.

Method

Participants and Procedure

Participants were part of the Twins’ Early Development Study (TEDS), a longitudinal study

involving a representative sample of all twins born in England and Wales in 1994, 1995, and

1996 (Trouton, Spinath, & Plomin, 2002). When the twins were 10 years old, data on a wide

variety of cognitive measures were collected from 7,442 children born between January 1994

and August 1996. This number refers to all children who took part in the battery after the

following exclusion criteria were applied: specific medical syndromes, such as Down

syndrome and other chromosomal anomalies, cystic fibrosis, cerebral palsy, hearing loss,

autism spectrum disorder, organic brain damage, extreme outliers for birth weight, gestational

age, maternal alcohol consumption during pregnancy, and special care after birth. We excluded

from the sample 274 children (137 pairs) for whom English was not the only language spoken

at home. After these exclusions, the sample consisted of 7,168 children (3,584 pairs). From

this sample, data for mathematics were available for 5,348 individuals (2,674 pairs of same-

sex and opposite-sex twins). Due to the nature of the test administration, each of these children

had complete data on all three sub-tests. To examine whether this attrition would affect further

results, we compared National Curriculum math scores for children who completed the math

battery with those who did not. Using standard scores based on the entire TEDS sample (after

medical exclusions), we showed that children who completed the Web-based math battery

performed slightly better than average on math (M = 0.08, SD = 0.92). Those who did not

complete the math battery had slightly lower than average math scores (M = −0.18, SD = 1.03).

These mean differences account for less than 1% of the variance. Zygosity was ascertained by

parental ratings, with an error rate not exceeding 5%, as validated by DNA genotyping (for

details, see Freeman et al., 2003).

Measures

Informed consent was obtained in writing from all of the families who agreed to take part in

the study. To assess mathematics skills, we developed a Web-based battery that included

questions from three different components of mathematics. The items were based on the

National Foundation for Educational Research 5–14 Mathematics Series, which is linked

closely to curriculum requirements in the UK and the English Numeracy Strategy (NFER-

Nelson, 1994NFER-Nelson, 1999NFER-Nelson, 2001). Such curriculum-based assessment

alleviates some of the potential biases associated with other achievement tests (Good & Salvia,

1988). From Booklets 6, 7, 8, 9, 10, and 11 (referring to age of students), a total of 77 target

items were chosen. The items were organized by mathematical category (see hereafter) and

level of difficulty. The level of difficulty was based on the National Curriculum level and the

standardization sample percentage correct for each item (reported in the Group Record Sheets;

NFER-Nelson). A set of adaptive branching rules was developed separately for each of the

five categories, so that all the children started with the same items, but then were branched to

Kovas et al.Page 3

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript

Page 4

easier or harder items depending on their performance. The presentation of items was streamed,

so that items from different categories were mixed, but the data recording and branching were

done within each category. Participants could attempt each item only once.

As with many psychological tests that use branching (e.g., third edition of the Wechsler

Intelligence Scale for Children; WISC-III-UK; Wechsler, 1992), the generic scoring rules were

as follows: 1 point was recorded for each correct response, for each unadministered item

preceding the child’s starting point, and for each item skipped through branching to harder

items. After a certain number of failures, a discontinuation rule was applied within each

category, and no points were recorded for all items after discontinuation. Thus, for each of the

77 items, a score of 1 or 0 was recorded for each child. For example, in the Computation and

Knowledge category (total number of items = 31), all children started at Item 10. The following

rules were then applied:

1.

If Items 10, 11, and 12 were all answered incorrectly, the child was branched to Item

1 and had to continue with the test attempting all remaining items, or until the

discontinuation criterion was met.

2.

If Items 10, 11, and 12 were all answered correctly, the child received credit for all

preceding items (1–9), and was branched to Item 24. If Items 24, 25, and 26 were all

answered incorrectly, the child was branched back to Item 13 and had to continue

with the test (skipping all items previously administered), attempting all remaining

items, or until the discontinuation criterion was met. If Items 24, 25, and 26 were not

all answered incorrectly, the child received credit for all preceding items (13–23) and

then continued with the test, attempting Items 27, 28, 29, 30, and 31, or until the

discontinuation criterion was met.

3.

If Items 10, 11, and 12 were not all answered incorrectly or correctly, the child

received credit for all preceding items (1–9) and then had to continue with the test,

attempting at all remaining items or until the discontinuation criterion was met.

4.

Discontinuation criterion was 3 incorrect answers in a row (did not apply across

branching points).

As with other psychological tests with items of increasing difficulty and using similar rules,

this scoring system is equivalent to that in which all children attempt all items, allowing us to

calculate the total number and proportion of correct responses for each child for each category

and to test the internal consistency of each category (see hereafter). Specific branching and

discontinuation rules and the number of skipped (credited) items for each category are available

from the authors.

The items were drawn from the following three categories:

• Understanding Number (27 items) requires an understanding of the numerical and

algebraic process to be applied when solving problems (e.g., understanding that

multiplication and division are inverse operations). For example, “Look at the number

6,085. Change the order of the figures around to make the biggest number possible.”

Another example is, “Type the missing number in the box: 27 + 27 + 27 + 27 + 27 =

27 × __.”

• Non-Numerical Processes (19 items) requires understanding of non-numerical

mathematical processes and concepts, such as rotational or reflective symmetry and

other spatial operations. The questions do not have any significant numerical content

that needs to be considered by the pupils. Three examples follow: “Which is the

longest drinking straw? Click on it.” “One of these shapes has corners that are the

Kovas et al.Page 4

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript

Page 5

same. Click on this shape.” “Which card appears the same when turned upside down?

Click on it.”

• Computation and Knowledge (31 items) assesses the ability to perform

straightforward computations using well-rehearsed pencil-and-paper techniques and

the ability to recall mathematical facts and terminology. These questions are either

mechanistic or rely on memorizing mathematical facts and terminology. The

operation is stated or is relatively unambiguous. Three examples follow. “Type in the

answer: 76–39 = __.” “All 4-sided shapes are called? Click on the answer (squares

rectangles parallelograms kites quadrilaterals).” “Type in the answer: 149 + 785 =

__.”

In creating this battery, we aimed to choose tests that would between them cover a wide range

of mathematical abilities of 10-year-old children, without subscribing to a particular theoretical

position on the structure of the mathematical domain. Some overlap in what these tests measure

is inevitable, as they each make demands on overlapping cognitive and performance factors

(e.g., attention, motivation, memory), and it is never possible to achieve a completely “pure”

measure of any of these components. In fact, the three tests correlated .62, and a principal

components analysis of the three tests yielded a general factor (un-rotated first principal

component) that explained 75% of the total variance. Nonetheless, we present the results for

the three tests separately for the following reasons. Although one can never completely control

for the multiple influences on performance of a mathematical test, it is nevertheless possible

to choose tests that emphasize one component of mathematics more than another. Most

important, the three aspects of mathematics that we assessed were not arbitrarily chosen: They

are used in the UK National Curriculum to assess aspects of mathematics.

As noted previously, the use of Web-based assessment facilitates data collection because it

allows data from large, widely dispersed samples to be collected quickly, cheaply, and reliably.

Web-based data collection is less error prone because it does not require human transcription

and data entry (Kraut et al., 2004; Naglieri et al., 2004). Another positive aspect of Web testing

is that the social pressure or embarrassment that might be present in face-to-face testing is

reduced (Birnbaum, 2004; Kraut et al., 2004). Moreover, several recent empirical studies have

found that Web-based findings generalize across presentation formats and are consistent with

findings from traditional methods (e.g., Gosling, Vazire, Srivastava, & John, 2004).

In TEDS, 80% of the families have daily access to the Internet (based on a pilot study with 100

randomly selected TEDS families), which is similar to the results of market surveys of UK

families with adolescents. Most children without access to the Internet at home have access in

their schools and local libraries.

In designing our Web-based battery, we guarded against potential problems associated with

research on the Internet. The Web page and testing were administered by a secure server in the

TEDS office (see Note). We used a secure site for data storage, and identifying information

was kept separately from the data. Appropriate safeguards were in place that prevented children

from answering the same item more than once. We provided technical support and other advice

to parents and children who were advised to call our toll-free telephone number in case of any

problems or questions.

Parents supervised the testing by coming online first with a user name and password for the

family, examining a demonstration test and completing a consent form. Then parents allowed

each twin to complete the test in turn. Parents were urged not to assist the twins with answers,

and we are confident that most parents complied with this requirement.

Kovas et al.Page 5

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript

Page 6

The internal consistency of the Web-administered measures was examined, yielding high

Cronbach’s alpha coefficients (Understanding Number, α = .88; Non-Numerical Processes,

α = .78; Computation and Knowledge, α = .93). Moreover, we compared children’s overall

Web-based performance in mathematics at 10 years to their overall mathematics performance

in the classroom as assessed by their teachers on the National Curriculum criteria when the

children were 10 years old. We found a correlation of .53 (p < .001, n = 1,878). Only one twin

from each pair was randomly selected for this analysis; a similar correlation of .50 was found

for the other half of the sample.

As a direct test of the reliability and validity of the Web-based measures, we conducted a test–

retest study in which 30 children (members of 15 twin pairs) who had completed the Web-

based testing completed the tests in person using the standard 12-year paper-and-pencil version

of the test (NFER-Nelson, 2001). Stratified sampling was used to ensure coverage of the full

range of ability. The interval between test and retest was 1 to 3 months, with an average of 2.2

months. The total math score from our Web-based tests correlated .92 with the total score from

the in-person testing for the total sample of 30 children; generalized estimation equations

(GEE) that took into account the nested covariance structure also yielded a correlation of .93.

For the three subtests reported in this article, the correlations between the Web-based and the

paper-and-pencil scores were .77, .64, and .81 for Understanding Number, Non-Numerical

Processes, and Computation and Knowledge, respectively. These results demonstrate that our

Web-based testing is both highly reliable and valid.

Results

Descriptive Statistics and Further Exclusions

The data were first explored using descriptive statistics analyses in SPSS. Descriptive statistics

for the three categories (Understanding Number, Non-Numerical Processes, and Computation

and Knowledge) are summarized in Table 1. The means and standard deviations (% of correct

response scores) for boys and girls, for monozygotic (MZ) and dizygotic (DZ) twins, and for

same-sex and opposite-sex DZ twins were very similar.

An analysis of variance (ANOVA) was performed to assess the mean effects of sex and zygosity

on mathematical ability in our sample. In the analysis, zygosity had 3 levels: MZ, DZ same-

sex, and DZ opposite-sex. No significant effect of zygosity was found for any of the variables.

The main effect of sex was significant for all categories, with boys on average performing

better than girls. However, this effect was negligible, accounting for less than 1% of the

variance. No significant sex by zygosity interactions were found. The three means (and

standard deviations) were very similar, suggesting that the categories were of similar difficulty.

For subsequent analyses, the scores from each test were separately standardized using the

means and standard deviations of the entire sample (after the medical and ethnic exclusions

described in the Method section) so that each test had zero mean and unit variance for the total

sample of 5,348 twins. The scores were corrected for age so that age does not contribute to

twin resemblance, which is standard in analyses of twin data (McGue & Bouchard, 1984).

Although all children were tested at 10 years of age, the results could be affected even by small

differences in age at the time of testing at this important stage of development, which would

inflate estimates of shared environment because members of a twin pair are of exactly the same

age. For the analyses of individual differences, the scores were also corrected for sex

differences. This was not done for the extremes analyses in order not to affect the

representativeness of groups at low-ability cutoffs.

In addition to the exclusions described in the Method section, we also excluded from further

analyses all pairs in which one or both twins had missing data; this was done separately for

Kovas et al.Page 6

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript

Page 7

each category. For the individual difference analyses, to avoid the possibility that our results

were affected by very extreme scores, all pairs in which one or both twins scored 3 or more

standard deviations below or above the mean were excluded from each category.

Genetic Analysis

The twin method, one of the major tools of quantitative genetic research, addresses the origins

of individual differences by estimating the proportion of variance that can be attributed to

genetic, shared environment, and non-shared environment factors (Plomin, DeFries, McClearn,

& McGuffin, 2001). In the case of complex traits that are likely to be influenced by multiple

factors, the genetic component of variance refers to the influence of alleles at all gene loci that

affect the trait. The similarity between twins for any particular trait may be due to a common

set of genes. It may also be due to the shared environment, which refers to environmental

influences that vary in the population and that contribute to the similarity between co-twins.

For example, twins experience similar conditions during gestation, have the same

socioeconomic status, live in the same family, and usually go to the same school. These factors

could reasonably be expected to increase similarity between co-twins. Non-shared

environment refers to any aspect of environmental influence that makes co-twins different from

each other, including measurement error. Such influences involve aspects of the environment

that are specific to an individual, such as traumas and diseases, idiosyncratic experiences,

different peers, differential treatment by the parents and teachers, and different perceptions of

such influences.

Genetic influence can be estimated by comparing intraclass correlations for monozygotic (MZ)

twins, who are genetically identical, and dizygotic (DZ) twins, whose genetic relatedness is on

average .50. For this reason, the phenotypic variance of a trait is attributed to genetic variance

(called heritability) to the extent that the MZ twin correlation exceeds the DZ twin correlation.

The relatedness for shared (common) environmental influences is assumed to be 1.0 for both

MZ and DZ twin pairs who grow up in the same family, because they experience similar

prenatal and postnatal environments. Shared environmental influences are indicated to the

extent that DZ twins’ correlation is more than half of the MZ correlation.

Twin intraclass correlations for the three measures are shown in Table 2. The results for all

categories are similar and are consistent with those previously reported for teacher-assessed

global measures of mathematics in the same sample when the children were 7 years old (Kovas,

Harlaar, Petrill, & Plomin, 2005;Oliver et al., 2004). The twin correlations of .51–.59 for MZ

and .24–.42 for DZ twins suggest at least moderate genetic influences, with environmental

factors being primarily of nonshared nature. Correlations for same-sex and opposite-sex DZ

twins are similar, suggesting the absence of sex differences in genetic and environmental

parameter estimates.

To assess whether being in the same class and having the same teacher increased similarity

between co-twins and affected the genetic findings, we reran our correlational analyses,

splitting the data by same versus different teachers. The two groups were equal in size. The

correlations were highly similar for the two groups; for example, for the Understanding

Numbers measure, the MZ correlations were .70 and .71 for same and different teacher,

respectively; for DZ twins, the correlations were .45 and .49, respectively.

Model Fitting

Structural equation model fitting is a comprehensive way of estimating variance components

based on the principles described earlier. In this study, structural equation modeling was used

to examine (a) the genetic and environmental etiology of individual differences across the range

of ability; (b) the genetic and environmental etiology of low mathematical performance

Kovas et al.Page 7

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript

Page 8

(extremes analyses); and (c) the question of whether the same or different factors influence

individual differences in boys and girls across the whole range of ability and in the low-ability

group.

Individual Difference Analyses

Variations of the so-called ACE model can be used for analyses of individual differences. The

ACE model apportions the phenotypic variance into genetic (A), shared environmental (C),

and non-shared environmental (E) components, assuming no effects of nonadditive genetics

or nonrandom mating. The ACE parameters and their confidence intervals can be estimated

by fitting the models to variance/covariance matrices using the model fitting program Mx

(Neale, 1997).

There are three possibilities (see Table 3) with respect to the causes of individual differences

in boys and girls, regardless of the mean differences between the sexes (Neale & Maes,

2003). The first possibility is that different genetic and environmental factors are responsible

for individual differences in mathematics for boys and girls—these are called qualitative

differences. Such sex-specific effects are not limited to genes on the X chromosome but can

also involve genes on the autosomal chromosomes that affect boys and girls differently—for

example, because the genes interact with sex hormones. The second possibility, which is not

mutually exclusive with the first, is that the same etiological influences affect individual

differences in boys and girls, but that they do so to a different extent—these are known as

quantitative differences. The third possibility is that even if there are mean differences, there

are no differences in the etiology of individual differences for boys and girls—the same genes

and environments operate to the same extent in both sexes. That is, boys as a group may exhibit

a disadvantage in mathematics, but the factors that make one boy different from another are

the same as those that make one girl different from another girl. These three possibilities

(qualitative differences, quantitative differences, and no differences) were assessed using sex-

limited structural equation modeling.

Each possibility is associated with a set of parameters in the models. Qualitative differences

are reflected in the genetic correlation (rg) between DZ opposite-sex twins. In DZ same-sex

pairs, the assumption is that on average the twins share 50% of their varying DNA, and the

coefficient of genetic relatedness (i.e., the genetic correlation between the two children) is

therefore .5. If there are qualitative differences in etiology between boys and girls (i.e., different

genetic and environmental factors), the genetic correlation in DZ opposite-sex twins will be

less than .5. If there are quantitative differences (i.e., the same factors, but exerting different

magnitudes of effect) rather than qualitative differences, the genetic correlation for DZ

opposite-sex pairs will be .5, but the parameter estimates for the A, C, and E components will

be significantly different for male–male pairs and female–female pairs. If there are no

differences between boys and girls, the DZ opposite-sex (DZ-os) pairs will have a genetic

correlation of .5 and the A, C, and E estimates for male–male and female–female pairs will be

the same, although the phenotypic variance might nonetheless differ for the two sexes because

mean differences are often associated with variance differences (i.e., higher means have higher

variances).

Using the model-fitting program Mx (Neale, 1997), we first tested the full model, which allows

all parameters to vary: rg in the DZ opposite-sex pairs; A, C, and E estimates; and variance

estimates. This model was fit to variance/covariance matrices derived from the data. A series

of nested models was then tested. The first nested model was a common-effects sex-limitation

model, which fixes rg to .5 in the DZ-os, but allows different A, C, E and variance estimates.

The second nested model was a scalar-effects sex-limitation model, which constrains the rg in

the DZ-os, as well as the A, C, and E parameters, but allows differences in phenotypic variance

between male and female twins by modeling the variance in one sex to be a scalar multiple of

Kovas et al.Page 8

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript

Page 9

the variance in the other sex. The third and final nested model tests the null hypothesis and

constrains all the parameters to be equal for male and female twins.

For each model, the ACE parameters and their confidence intervals were estimated. The overall

fit of each model was evaluated using three indices. The χ2 statistic, where degrees of freedom

equal the number of observed correlations minus the number of estimated parameters, indicates

the fit of the full model and also tests the fit of nested models, with a lower value indicating

better fit (with degrees of freedom equal to the difference in degrees of freedom between the

full and nested models). The other two indices used were Akaike’s information criterion (AIC

= χ2–2df; Akaike, 1987) and the root mean square error of approximation (RMSEA), with

lower values representing better fitting models.

The results of the model fitting are summarized in Table 4. All model comparisons favored the

null model. In other words, fixing ACE parameters and variances to be the same for boys and

girls and fixing the genetic correlation for DZ opposite-sex twin pairs to .5 resulted in

nonsignificant changes in the fit of the model. This suggests that the quantity and quality of

genetic and environmental effects are the same in boys and girls. For this reason, only the

parameter estimates from the null model are presented in Table 4.

As suggested by the twin correlations in Table 2, the overall pattern of results suggests that all

measured aspects of mathematics are moderately heritable (.32–.45). Shared environment has

a much smaller influence on these abilities (.07–.23), although the effect is significant for two

of the tests, as indicated by the 95% confidence intervals that do not include zero. Non-shared

environmental influence is moderate for all measures (.42–.48), which is partly due to the

inclusion of measurement error in this component. Although the heritability of the

NonNumerical Processes test is somewhat lower (.32) than the heritability of the other two

tests (.42 and .45), the overlapping 95% confidence intervals indicate that this difference is not

significant. We obtained similar results (available from the authors) when a composite of the

three measures was analyzed.

Extremes Analyses

Probands and Probandwise Concordances—For each of the measures, we defined

probands as either 5% and 15% of the whole sample, identifying statistically low performance

on that measure. Table 5 shows the means, ranges, and standard deviations (total number of

correct items) for probands for each of the three measures for both proband cutoffs. Table 5

also refers to National Curriculum standardization information in relation to these scores. These

data suggest that the probands in our sample (at both cutoffs) represent a group of children

performing below their grade expectation and failing items that are solved correctly by the

majority of much younger children.

Probandwise concordances (i.e., the ratio of the number of probands in concordant pairs to the

total number of probands) were calculated separately for each measure and each of the five

sex–zygosity groups. Probandwise concordances represent the risk that a co-twin of a proband

is affected (Plomin et al., 2001). Table 6 shows that concordances for MZ twins are higher than

for DZ twins, suggesting a genetic influence. The results are similar for boys and girls and for

same-sex and opposite-sex DZ twins.

Liability–Threshold Analysis—Unlike twin correlations, twin concordances cannot be

used to estimate genetic and environmental parameters, because they do not in themselves

include information about the population incidence. The liability–threshold model, which is a

natural extension of quantitative genetic models for quantitative traits, is widely used in

genetics to analyze concordance data for dichotomous traits (Sham, 1998). The model assumes

an underlying continuous liability that has a normal distribution, with a mean of 0 and a variance

Kovas et al.Page 9

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript

Page 10

of 1 in the general population. If the liability to a disorder is quantitative rather than categorical,

the disorder is assumed to be present in all individuals whose liability is above a certain

threshold value and to be absent in all other individuals. The value of the threshold can be

estimated from the population frequency of the disorder. The liability is not measured directly

but is estimated from the observed categorical data. For the purposes of this study, the data

from the entire twin sample were organized into 2 × 2 contingency tables, where the cells

represent pairs in which both twins are unaffected, both twins are probands, and two discordant

cells where one twin or the other is a proband. These data can be used to quantify genetic and

environmental sources of variation in liability in the population. In this study, a structural

equation model was fit to the contingency tables by maximum likelihood, using the Mx program

to estimate ACE parameters (Neale, 1997).

A full sex-limited liability–threshold model and a series of nested models were tested and

compared; the full model, common-effects model, and scalar-effects model used were the same

as those used in the individual difference modeling. Moreover, the null model in this case

equated thresholds for male and female twins; the threshold corresponds to the proportion of

affected individuals for the two sexes. Whether the same threshold could be fit for male and

female twins was tested to determine whether rates of disorder differed for male and female

twins.

Comparisons for all except one model favored the null-effects sex-limitation model that

constrains genetic and environmental parameters and the thresholds of prevalence to be equal

between the sexes. In other words, as was the case for the individual difference analyses, fixing

ACE parameters to be the same for boys and girls and fixing the genetic correlation for DZ

opposite-sex twin pairs to .5 resulted in nonsignificant changes in the fit, suggesting that the

quantity and quality of genetic and environmental effects are the same in boys and girls. The

prevalence of mathematical disability was also generally the same for boys and girls.

Table 7 summarizes the results of the best-fit models for the low extremes of the three measures.

As expected from the twin concordances in Table 6, and similar to the individual difference

results, the three measures show significant and substantial heritability, modest shared

environmental influence that reaches significance in only one of the six analyses, and moderate

but significant non-shared environmental influence. There is some suggestion that heritability

may be greater at the very extreme end of the distribution (bottom 5% of our sample), although

the sample sizes are very small for this extreme cutoff, resulting in very wide confidence

intervals. The overlapping confidence intervals for heritability estimates at the 5% and 15%

cutoffs indicate that the heritability estimates do not differ significantly.

DF Extremes Analysis—The liability–threshold model assumes a continuous dimension,

even though it is based on dichotomous data (i.e., presence or absence of disability). The

liability–threshold analysis converts dichotomous diagnostic data to poly-choric correlations

(Smith, 1974) to analyze the hypothetical construct of a threshold with an underlying

continuous liability. In contrast, DF extremes analysis assesses rather than assumes a

continuum. If all of the assumptions of the liability–threshold model are correct for a particular

disorder, it will yield results similar to DF extremes analysis to the extent that the quantitative

dimension assessed is linked genetically to the qualitative disorder (Plomin, 1991).

DF extremes analysis assesses genetic links between disability and ability by bringing together

dichotomous diagnoses of disability and quantitative traits of ability. Rather than assessing

twin similarity in terms of individual differences on a quantitative trait of ability or in terms

of concordance for a diagnostic cutoff, DF extremes analysis assesses twin similarity as the

extent to which the mean standardized quantitative trait score of co-twins is as low as the mean

standardized score of selected extreme or diagnosed probands (see Plomin & Kovas, 2005, for

Kovas et al.Page 10

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript

Page 11

a detailed explanation of DF extremes analysis and for discussion on contrasting the three

methods). This measure of twin similarity is called a group twin correlation (or transformed

co-twin mean) in DF extremes analysis, because it focuses on the mean quantitative trait score

of co-twins rather than on individual differences. Genetic influence is implied if group twin

correlations are greater for MZ than for DZ twins. Doubling the difference between MZ and

DZ group twin correlations estimates the genetic contribution to the average phenotypic

difference between the probands and the population. The ratio between this genetic estimate

and the phenotypic difference between the probands and the population is called group

heritability. It should be noted that group heritability does not refer to individual differences

among the probands—the question is not why one proband has a slightly greater disability than

another, but rather why the probands as a group have a much greater disability than the rest of

the population.

Although DF extremes group heritability can be estimated by doubling the difference in MZ

and DZ group twin correlations (Plomin, 1991), DF extremes analysis is more properly

conducted using a regression model (DeFries & Fulker, 1988). The DF extremes model fits

standardized scores for MZ and DZ twins to the regression equation, C = B1P + B2R + A, where

C is the predicted score for the co-twin, P is the proband score, R is the coefficient of genetic

relatedness (1.0 for MZ twins and .5 for DZ twins), and A is the regression constant. B1 is the

partial regression of the co-twin score on the proband—an index of average MZ and DZ twin

resemblance independent of B2. The focus of DF extremes analysis is on B2. B2 is the partial

regression of the co-twin score on R independent of B1. It is equivalent to twice the difference

between the means for MZ and DZ co-twins adjusted for differences between MZ and DZ

probands. In other words, B2 is the genetic contribution to the phenotypic mean difference

between the probands and the population. Group heritability is estimated by dividing B2 by the

difference between the means for probands and the population.

Finding group heritability implies that disability and ability are both heritable and that there

are genetic links between the disability and the normal variation in the ability. That is, group

heritability itself, not the comparison between group heritability and the other estimates of

heritability, indicates genetic links between disability and ability. If a measure of extremes (or

a diagnosis) were not linked genetically to a quantitative trait, group heritability would be zero.

For example, it is possible that a severe form of learning disability is due to a single-gene

disorder that contributes little to the normal variation in learning ability. However, most

researchers now believe that common disorders are caused by common genetic variants—the

common disease/common variant hypothesis (Collins, Euyer, & Chakravarti, 1997)—rather

than by a concatenation of rare single-gene disorders. To the extent that the same genes

contribute to learning disability and normal variation in learning ability, group heritability can

be observed, but the magnitude of group heritability depends on the individual heritability for

normal variation and the heritability of disability gleaned from concordances for disability.

In this study, scores were standardized and transformed to adjust for proband mean differences

between MZ and DZ groups, so that genetic and environmental parameters could be estimated

from model fitting, on the basis of the regression, C(M) = B1P(M) +B2R + A, where C(M), the

co-twin’s mathematics score, is predicted from P(M), the proband’s mathematics score, and

the coefficient of relatedness (R), which is 1.0 for MZ (genetically identical) and .5 for DZ

twins (who are on average 50% similar genetically). The regression weight B2 estimates group

heritability because it tests whether twin similarity varies as a function of the degree of genetic

relatedness (R).

The results from the DF extremes analysis are presented in Table 8. Group heritabilities for all

measures are significant and substantial, suggesting a genetic link between ability and

disability. Shared environmental estimates are modest and generally not significant. Similar

Kovas et al.Page 11

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript

Page 12

to the results from the liability–threshold analysis, there is again a hint of greater genetic

influence for disability for two of the three measures, but again these differences are not

significant.

Finally, Figure 1 summarizes the genetic results of the study, presenting point estimates and

confidence intervals for heritability from analyses of individual differences in ability

(“continuous ability”) and for disability using two cutoffs (5% and 15%) as analyzed by DF

extremes analyses and by the liability–threshold model. It can be seen that all estimates are

similar, although heritability is somewhat greater for the 5% cutoff for two of the measures.

Discussion

The first aim of our study was to investigate the pattern of genetic and environmental

contributions to the typical range of individual differences for three aspects of mathematics.

The results showed that the relative contributions of genes and environments to individual

differences in diverse aspects of mathematics are similar. Genetic influence was moderate for

all three measures. Consistent with previous research, we found that most environmental

influences on diverse aspects of mathematics were nonshared, although this estimate also

includes measurement error. This means that the environmental factors that shape individual

differences in mathematics are mostly not shared between co-twins (even MZ twins) in the

same family. In other words, having the same parents, the same socioeconomic status, and

going to the same school does not contribute to the similarity between the two children beyond

the similarity due to their shared genes. The analysis of the data by same versus different

teachers showed no differences in correlations for the two groups. These findings suggest that

being in the same class does not contribute to children’s similarity in mathematical

performance. That is, classroom environments affect mathematical ability in different children

(even twins and even MZ twins) in different ways; these effects are subsumed under the

nonshared environmental estimate.

The second aim of the study was to investigate whether the etiology of mathematical difficulties

is similar and is linked to that of typical performance in mathematics. Despite some apparent

differences in the estimates between individual differences and extremes (see Tables 4 and 7),

heritability estimates from the individual difference analyses never fell outside the 95%

confidence intervals of the estimates from the analyses of extremes. Overall, it can be concluded

that the ACE estimates for individual differences and extremes are similar, at least within the

limits of the power provided by this largest yet twin study. That is, liability to disability in

different aspects of mathematics and individual differences in mathematical abilities are

influenced by moderate genetic influences and modest shared environmental influences.

Moreover, group heritabilities from DF extremes analysis were substantial, which suggests

strong links between genetic etiologies of mathematical difficulties and variation in

mathematical ability in the typical range. There is some suggestion that genetic factors may be

more important at the very low end of performance. This might mean that beyond a certain

threshold of accumulated genetic risk factors, the influence of some otherwise relevant

environmental factors is suppressed.

The third aim of the study was to investigate any possible sex differences in the etiology of

typical and low mathematics performance. The results showed negligible mean sex differences

for all measures, explaining less than 1% of the variance. In terms of etiology, no sex differences

were found in either the whole range of ability or the low extremes. The results suggest that

the same factors affect individual differences in boys and girls to the same extent (quantitative

effects). Similarly, the same factors are responsible for low performance for both boys and

girls for all three measures of mathematics (qualitative effects).

Kovas et al. Page 12

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript

Page 13

Similar results for the three aspects of mathematics assessed in this study and their

intercorrelations (.62 on average) suggest that the same genetic and environmental factors

might affect them. Previous multivariate genetic research into learning abilities and disabilities

has suggested that the observed overlap within and between different abilities is largely due to

common genes (Plomin & Kovas, 2005). Our analyses of the first half of the present dataset

(Kovas et al., 2007) suggested that genetic effects are largely shared for the three measures.

What makes them different are nonshared environments, specific to each aspect of

mathematics.

The obvious limitation of this study is that no specific genes or environments were considered.

A next step for genetic research is to find specific genetic markers associated with mathematical

disability, which is now made more feasible using genome-wide association strategies that test

hundreds of thousands of DNA markers simultaneously (Plomin, 2005). Our results suggest

that when such genetic associations are identified, they will be associated with variation in the

whole range of ability, not just with disability. Similarly, our results suggest that the same

nonshared environmental factors will be associated with ability and disability. When specific

genetic and environmental factors are identified, they will facilitate a direct test of our

conclusion that disability is merely the quantitative extreme of a continuum of ability. We

predict that associations will be found throughout the normal distribution, as much for high

performance as for low performance.

Although the results of this study have no immediate implications for the ways in which

mathematics is taught, we believe that they provide grounds for better conceptualizing several

important issues. For example, our findings suggest that common mathematical disability is

etiologically the low end of a quantitative continuum, rather than being driven by unique

genetic and environmental factors that are not involved in individual differences. This finding

has implications for defining mathematical disability as well as for research into factors that

are responsible for low mathematical ability.

The finding that factors that make one boy different from another boy are the same as those

that make one girl different from another girl is also important in that it suggests, for example,

that the same remediation and prevention of mathematical difficulties will be successful for

both sexes. Finally, another far-reaching implication is that environmental factors important

for mathematical ability seem to operate mostly on an individual basis, interacting with a child’s

genetic makeup and other individual-specific environmental influences. It is too early to say

whether this calls for a more individualized education or for the recognition that the same

methods or teaching styles will not be optimal for all students. Moreover, though small, shared

environmental influences were significant for some measures; therefore, research is also

needed that attempts to identify the specific factors responsible for these influences. What this

study demonstrates is that more genetically sensitive research is needed in this area, as it can

make important contributions to the study of mathematical ability and disability, especially

when the specific genes and nonshared environmental factors are identified.

References

Akaike H. Factor analysis and AIC. Psychometrika 1987;52:317–332.

Birnbaum MH. Human research and data collection via the Internet. Annual Review of Psychology

2004;55:803–832.

Collins FS, Euyer MS, Chakravarti A. Variations on a theme: Cataloguing human DNA sequence

variation. Science 1997;278:1580–1581. [PubMed: 9411782]

DeFries JC, Fulker DW. Multiple regression analysis of twin data: Etiology of deviant scores versus

individual differences. Acta Geneticae Medicae et Gemellologicae 1988;37:205–216.

Kovas et al.Page 13

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript

Page 14

Freeman B, Smith N, Curtis C, Huckett L, Mill J, Craig IW. DNA from buccal swabs recruited by mail:

Evaluation of storage effects on long-term stability and suitability for multiplex polymerase chain

reaction genotyping. Behavior Genetics 2003;33:67–72. [PubMed: 12645823]

Good RH III, Salvia J. Curriculum bias in published, norm-referenced reading tests: Demonstrable effects.

School Psychology Review 1988;17(1):51–60.

Gosling SD, Vazire S, Srivastava S, John OP. Should we trust web-based studies? A comparative analysis

of six preconceptions about Internet questionnaires. The American Psychologist 2004;59:93–104.

[PubMed: 14992636]

Hyde JS. The gender similarities hypothesis. The American Psychologist 2005;60:581–592. [PubMed:

16173891]

Kovas Y, Harlaar N, Petrill SA, Plomin R. “Generalist genes” and mathematics in 7-year-old twins.

Intelligence 2005;33:473–489. [PubMed: 19319204]

Kovas Y, Petrill SA, Plomin R. The origins of diverse domains of mathematics: Generalist genes but

specialist environments. Journal of Educational Psychology 2007;99(1):128–139. [PubMed:

19756208]

Kraut R, Olson J, Banaji M, Bruckman A, Cohen J, Couper M. Psychological research online: Report of

the Board of Scientific Affairs’ Advisory Group on the conduct of research on the Internet. The

American Psychologist 2004;59:105–117. [PubMed: 14992637]

Maccoby, EE.; Jacklin, CN. The psychology of sex differences. Stanford, CA: Stanford University Press;

1974.

McGue M, Bouchard TJ. Adjustment of twin data for the effects of age and sex. Behavior Genetics

1984;14:325–343. [PubMed: 6542356]

Naglieri JA, Drasgow F, Schmit M, Handler L, Prifitera A, Margolis A, et al. Psychological testing on

the Internet: New problems, old issues. The American Psychologist 2004;59:150–162. [PubMed:

15222858]

Neale, MC. Mx: Statistical modeling. Richmond, VA: Department of Psychiatry; 1997.

Neale, MC.; Maes, HHM. Methodology for genetic studies of twins and families. Dordrecht, the

Netherlands: Kluwer; 2003.

NFER-Nelson. Maths 5–14 series. London: Author; 1994.

NFER-Nelson. Maths 5–14 series. London: Author; 1999.

NFER-Nelson. Maths 5–14 series. London: Author; 2001.

Oliver B, Harlaar N, Hayiou-Thomas ME, Kovas Y, Walker SO, Petrill SA, et al. A twin study of teacher-

reported mathematics performance and low performance in 7-year-olds. Journal of Educational

Psychology 2004;96:504–517.

Pennington, BF.; Olson, RK. Genetics of dyslexia. In: Snowling, M.; Hulme, C., editors. The science of

reading: A handbook. Oxford: Black-well; 2005. p. 453-472.

Plomin, R. Genetic risk and psychosocial disorders: Links between the normal and abnormal. In: Rutter,

M.; Casaer, P., editors. Biological risk factors for psychosocial disorders. Cambridge, UK:

Cambridge University Press; 1991. p. 101-138.

Plomin R. Finding genes in child psychology and psychiatry: When are we going to be there? Journal of

Child Psychology and Psychiatry and Allied Disciplines 2005;46:1030–1038.

Plomin, R.; DeFries, JC.; McClearn, GE.; McGuffin, P. Behavioral genetics. 4. New York: Worth; 2001.

Plomin R, Kovas Y. Generalist genes and learning disabilities. Psychological Bulletin 2005;131:592–

617. [PubMed: 16060804]

Sham, P. Statistics in human genetics. London: Arnold; 1998.

Smith C. Concordance in twins: Methods and interpretation. American Journal of Human Genetics

1974;26:454–466. [PubMed: 4858380]

Spelke ES. Sex differences in intrinsic aptitude for mathemaics and science? A critical review. The

American Psychologist 2005;60:950–958. [PubMed: 16366817]

Trouton A, Spinath FM, Plomin R. Twins Early Development Study (TEDS): A multivariate longitudinal

genetic investigation of language, cognition and behaviour problems in childhood. Twin Research

2002;5:444–448. [PubMed: 12537874]

Wechsler, D. Wechsler intelligence scale for children. 3. London: Psychological Corp; 1992.

Kovas et al.Page 14

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript

Page 15

Biographies

Yulia Kovas, PhD, is a researcher at the Social, Genetic and Developmental Psychiatry

Research Centre (SGDP), Institute of Psychiatry, London. Her current interests include the

genetic and environmental etiology of individual differences in learning abilities and

disabilities and the etiology of comorbidity between different learning disabilities.

Claire M. A. Haworth, BA, is a PhD student at the SGDP, Institute of Psychiatry. Her current

interests include the genetic and environmental etiology of individual differences in academic

abilities and disabilities and the etiology of change and continuity throughout childhood.

Stephen A. Petrill, PhD, is a professor of biobehavioral health and assistant director of the

Center for Developmental and Health Genetics at the Pennsylvania State University. His

interests involve understanding the genetic and environmental influences on the development

of and covariance between early reading, math, and language skills.

Robert Plomin, PhD, is professor of behavioral genetics at the Institute of Psychiatry, where

he is deputy director of the SGDP. His current interest is in harnessing the power of molecular

genetics to identify genes for psychological traits.

Kovas et al.Page 15

J Learn Disabil. Author manuscript; available in PMC 2010 November 1.

NIH-PA Author Manuscript

NIH-PA Author Manuscript

NIH-PA Author Manuscript