
Original Contribution

A Method for Using Incomplete Triads to Test Maternally Mediated Genetic

Effects and Parent-of-Origin Effects in Relation to a Quantitative Trait

Emily O. Kistner1, Claire Infante-Rivard2, and Clarice R. Weinberg1

1Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC.

2Department of Epidemiology, Biostatistics and Occupational Health, Faculty of Medicine, McGill University, Montréal, Québec, Canada.

Received for publication May 16, 2005; accepted for publication September 13, 2005.

The authors recently developed a semiparametric family-based test for linkage and association between markers

and quantitative traits. This quantitative polytomous logistic regression test allows for analysis of families with

incomplete information on parental genotype. In addition, it is not necessary to assume normality of the quantitative

trait. Previous simulations have shown that the new test is as powerful as the other widely used tests for linkage

disequilibrium in relation to a quantitative trait. Here the authors propose an extension to quantitative polytomous

logistic regression that allows testing for maternally mediated effects and parent-of-origin effects in the same

framework. Missing data on parental genotype are accommodated through an expectation-maximization algorithm

approach. Simulations show robustness of the new tests and good power for detecting effects in the presence or

absence of offspring effects. Methods are illustrated with birth weight and gestational length, two quantitative

outcomes for which data were collected in a Montreal, Canada, study of intrauterine growth restriction between

May 1998 and June 2000.

association; cytochrome P-450 enzyme system; epidemiologic methods; genomic imprinting; linkage (genetics);

logistic models; polymorphism, single nucleotide

Abbreviation: CYP, cytochrome P-450.

Family-based tests of linkage and association between

markers and quantitative traits are popular in part because

the tests are valid in the presence of genetic stratification,

whereas straightforward case-control designs do not allow

valid testing with admixed populations. Some tests accommodate data from incomplete nuclear families. Some methods, such as the quantitative transmission disequilibrium test

and the Family Genotype Analysis Program, assume that the

trait has, or has been transformed to have, a normal distribu-

tion (1, 2). Other methods, such as the family-based associ-

ation test, are nonparametric and rely on covariance between

a function of the trait and the genotype (3). For our methods,

we consider a design in which a diallelic gene is studied in

triads consisting of individuals and their parents and a quan-

titative trait is measured in the offspring. In a previous paper

(4), we proposed a semiparametric test, using a polytomous

logistic regression and expectation-maximization approach,

which allows for traits that are not normally distributed and accommodates families with missing data on parental genotypes.

In addition to testing for effects of the inherited allele indicating linkage and association between a trait and a marker, researchers may also be interested in testing maternal effects,

which can influence the offspring through the intrauterine

environment, or parent-of-origin effects, in which an inher-

ited allele can influence the offspring differently depending

on whether it was maternal or paternal in origin. The latter

occurs in species with placentas through an epigenetic mechanism called “imprinting” (5).

Correspondence to Dr. Clarice Weinberg, Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park,

NC 27709 (e-mail: weinberg@niehs.nih.gov).


American Journal of Epidemiology

Copyright © 2005 by the Johns Hopkins Bloomberg School of Public Health

All rights reserved; printed in U.S.A.

DOI: 10.1093/aje/kwj030

American Journal of Epidemiology Advance Access published December 7, 2005

by guest on May 30, 2013

http://aje.oxfordjournals.org/

Downloaded from


Previously, mixture models have been suggested which

allow testing of offspring effects, maternal effects, and

parent-of-origin effects with triad data (6). The mixture

model approach allows for missing data on parental geno-

types, but specialized software is required. Other methods in

the literature include a variance components approach for

assessing parent-of-origin effects, a method which requires

knowing which copies of the alleles are inherited from the

same parental chromosomes (or “identity-by-descent” genotype information) among siblings (7). In addition, a simple

linear model approach has been suggested, which does not

allow for missing data on parental genotypes (8). Neither the

family-based association test nor the quantitative transmission disequilibrium test allows for maternal effects or parent-of-origin effects.

With genotyped triads, we show in this paper how both

phenomena can be tested within the polytomous logistic re-

gression framework previously described (4). The proposed

approach allows for missing parental data and does not re-

quire sibling pairs, although information from multiple off-

spring can be incorporated, as we demonstrated in another

paper (9). When including multiple siblings, dependence

among siblings is accounted for by using weighted score

functions. These tests extend the log-linear model for qual-

itative traits (10–12). Simulations are used to assess their

robustness and power. We illustrate these methods by testing

for effects of a cytochrome P-450 (CYP) variant allele on

both birth weight and gestational length.

MATERIALS AND METHODS

Proposed complete-data approach

Suppose we have triad genotype data for a diallelic (or

dichotomized) marker and quantitative trait data for the off-

spring. Let C equal 0, 1, or 2 depending on the number of

copies of the variant allele the child carries, with M and F

being similarly defined for the mother and father. The choice of which allele to designate as the “variant” turns out not to

have any effect on the inference. Let X denote the quantita-

tive trait value for the offspring. The unordered parental

genotypes define six mating types, denoted as MT = {00, 01, 02, 11, 12, 22}.
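As a small illustration of this coding, the six unordered mating types can be enumerated from the nine ordered (M, F) pairs. This is a minimal sketch added for clarity, not code from the study; the function name `mating_type` is hypothetical.

```python
from itertools import product


def mating_type(m, f):
    """Return the unordered parental mating type as a two-character string.

    m and f are the mother's and father's variant-allele counts (0, 1, or 2).
    """
    low, high = sorted((m, f))
    return f"{low}{high}"


# Enumerate all nine ordered (M, F) pairs and collapse them to unordered types.
all_types = sorted({mating_type(m, f) for m, f in product(range(3), repeat=2)})
print(all_types)
```

Because the type is unordered, a family with M = 2, F = 0 and a family with M = 0, F = 2 both fall in mating type 02.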

Our previously proposed method modeled the offspring’s

genotype as a function of the quantitative trait, conditioning

on parental genetic mating type (4). The idea was that if the

genetic marker under study does not influence the trait and is

not in linkage with any gene that does, its transmission from

parents to children should be random and not related to the

child’s trait value. In the previously proposed quantitative

polytomous logistic regression method, a likelihood was

based on a model for Pr[C = c | MT = mt, X = x] (4). Under the null hypothesis, this model will have zero coefficients for x, providing a simple test using standard software.

When maternal effects or parent-of-origin effects are of

interest, a straightforward extension to the above model can

be written. The null hypothesis for maternal effects is that

for a given pair of parental genotypes, the question of which

one is maternal and which one is paternal is unrelated to the

quantitative trait in the offspring. The null hypothesis for

parent-of-origin effects is that the value of the quantitative

trait in a heterozygous offspring is unrelated to whether the

variant copy came from the mother or the father, conditional

on the unordered parental genotypes. We will not need to

assume mating symmetry in the population at large; for example, the proportion of parent couples with F = 1, M = 0 need not equal that with F = 0, M = 1. The extension relies on a factorization, such that

$$\Pr[M, C \mid MT, X] = \Pr[C \mid MT, X] \times \Pr[M \mid MT, X, C]. \tag{1}$$

The test of offspring effects and the test of maternal or parent-of-origin effects can be considered independently because the above likelihood factors. The first factor in equation 1 can be modeled using the original quantitative polytomous logistic regression model. The model for the second factor, which depends on maternally mediated effects and parent-of-origin effects, is developed below.

A logistic regression model is based on informative families, in which the mother’s and father’s genotypes differ, that is, on three mating types, MT = {01, 02, 12}. We model the probability that the mother has more copies than the father for each mating type. Omitting parent-of-origin effects, the conditional probabilities that (M > F) are independent of C and can be written

$$\Pr[M = 1 \mid MT = 01, X = x] = \frac{\exp(\delta_{01} x + \mu_{01})}{1 + \exp(\delta_{01} x + \mu_{01})}, \tag{2}$$

$$\Pr[M = 2 \mid MT = 12, X = x] = \frac{\exp(\delta_{12} x + \mu_{12})}{1 + \exp(\delta_{12} x + \mu_{12})}, \tag{3}$$

$$\Pr[M = 2 \mid MT = 02, X = x] = \frac{\exp((\delta_{01} + \delta_{12}) x + \mu_{02})}{1 + \exp((\delta_{01} + \delta_{12}) x + \mu_{02})}. \tag{4}$$

For this model, which assumes that there are no parent-of-origin effects, two parameters are of interest, $\delta_{01}$ and $\delta_{12}$. We restrict the parameters, such that $\delta_{02} = \delta_{01} + \delta_{12}$, imposing the simplifying assumption that the shift in the offspring’s X value for a mother with two copies of the variant allele versus zero copies would be the sum of the hypothetical shift associated with one copy versus zero copies plus the shift associated with two copies versus one copy.

The parameter $\delta_{01}$ ($\delta_{12}$) allows the quantitative trait to be systematically higher or lower if the mother has one copy (two copies) of the variant allele relative to a mother with zero copies (one copy). $\delta_{01}$ can also be interpreted as the change in the log odds of the mother’s having one copy of the variant as opposed to zero copies for every one-unit increase in the trait; a similar interpretation applies to $\delta_{12}$. The intercepts $\mu_{MF}$ are nuisance parameters that allow for mating asymmetries unrelated to the trait under study.
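The maternal effects model without parent-of-origin effects can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the parameter values are hypothetical, and the code simply evaluates the logistic form with the slope for mating type 02 constrained to the sum of the other two slopes.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def pr_mother_has_more(mt, x, delta01, delta12, mu):
    """Pr(M > F | MT = mt, X = x) with no parent-of-origin effects.

    mt is "01", "02", or "12"; mu maps each mating type to its intercept.
    The slope for "02" is constrained to delta01 + delta12.
    """
    slopes = {"01": delta01, "12": delta12, "02": delta01 + delta12}
    return expit(slopes[mt] * x + mu[mt])


# Hypothetical parameter values, chosen only for illustration.
mu = {"01": 0.0, "02": 0.0, "12": 0.0}
for mt in ("01", "02", "12"):
    print(mt, round(pr_mother_has_more(mt, 1.0, 0.3, 0.2, mu), 4))
```

With both slopes set to zero, the returned probability no longer depends on x, which is the null hypothesis of no maternal effects.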

If parent-of-origin effects are also of interest, a binary

variable is used to indicate whether the offspring inherited

only one copy of the variant allele—denoted, for example,

by I(C = 1). This indicator is multiplied by the trait and included in the model, thereby allowing the trait coefficient to be different for a child with a single maternal copy. This works because when the child has one copy and M > F, that one copy has to have come from the mother. For example, within the MT = 01 mating type, the new probability that (M > F) can be written

$$\Pr[M = 1 \mid MT = 01, X = x, C = c] = \frac{\exp(\delta_{01} x + \gamma_1 x I(c = 1) + \mu_{01})}{1 + \exp(\delta_{01} x + \gamma_1 x I(c = 1) + \mu_{01})}. \tag{5}$$

Table 1 shows the logits of the conditional probabilities for

the model of both maternal and parent-of-origin effects.

Here the parameter $\gamma_1$ can be interpreted as the change in

the log odds that a heterozygous child inherited a maternal

copy of the variant instead of a paternal copy for every one-

unit increase in the trait value.
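The linear predictors for the combined model can be written out for every feasible pairing of mating type and offspring genotype. The sketch below is illustrative: it assumes the parent-of-origin term enters as the trait multiplied by the indicator I(C = 1), as described in the text, and the function name, the constant `FEASIBLE`, and the parameter values are hypothetical.

```python
def logit_mother_has_more(mt, c, x, delta01, delta12, gamma1, mu):
    """Logit of Pr(M > F | MT = mt, X = x, C = c).

    The parent-of-origin term gamma1 * x is added only when the child
    carries exactly one copy of the variant allele (c == 1).
    """
    slopes = {"01": delta01, "12": delta12, "02": delta01 + delta12}
    slope = slopes[mt] + (gamma1 if c == 1 else 0.0)
    return slope * x + mu[mt]


# Offspring genotypes that are possible within each informative mating type.
FEASIBLE = [("01", 0), ("01", 1), ("02", 1), ("12", 1), ("12", 2)]

# Hypothetical parameter values, chosen only for illustration.
mu = {"01": 0.0, "02": 0.0, "12": 0.0}
for mt, c in FEASIBLE:
    print(mt, c, logit_mother_has_more(mt, c, 2.0, 0.3, 0.2, 0.1, mu))
```

Setting `gamma1` to zero recovers the maternal effects model, so the two models are nested and can be compared with a 1-df test.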

Following the models described above, when there are no

maternal effects between the marker and the quantitative

trait, the two parameters $\delta_{01}$ and $\delta_{12}$ are both zero. Intuitively, we exploit the fact that under the null hypothesis, the

relative likelihood that the mother has more copies than the

father should not be correlated with the quantitative trait in

the offspring (13). Under this null hypothesis, the likelihood

ratio test statistic is distributed approximately chi-squared

with 2 df. For the test of no parent-of-origin effects, a 1-df

likelihood ratio test of $\gamma_1 = 0$ is constructed.
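Once the full and reduced models have been fitted, the likelihood ratio statistic and its approximate p value follow directly. The sketch below is illustrative: it uses the closed-form upper-tail probabilities of the chi-squared distribution for 1 and 2 df only, and the log-likelihood values are hypothetical numbers, not results from the study.

```python
import math


def lrt_statistic(loglik_full, loglik_reduced):
    """Twice the gain in maximized log-likelihood from the reduced to the full model."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_upper_tail(stat, df):
    """Upper-tail chi-squared probability; closed forms for df = 1 and df = 2 only."""
    if stat < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 is supported in this sketch")


# Hypothetical maximized log-likelihoods, chosen only for illustration.
stat = lrt_statistic(-310.2, -314.9)
print(round(stat, 2), chi2_upper_tail(stat, df=2))
```

The 2-df form applies to the maternal effects test and the 1-df form to the parent-of-origin test.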

In the following section, we extend the model to allow for

missing data on parental genotype. A complete model for

Pr(MFC = mfc | X = x) makes use of the model for offspring

effects augmented by a marginal model for parental mating

type, as described in our previous paper (4). In addition,

because we include unconstrained intercepts for each mat-

ing type and marginally model the mating type as a function

of x, no Hardy-Weinberg equilibrium, random mating, or

even Mendelian assumptions are necessary to ensure the

validity of this method.

Proposed missing-data approach

For simplicity, we consider missing paternal genotype

information, but the same approach works for missing data

on maternal genotypes. The missing genotypes are assumed

to be missing “at random” in Little and Rubin’s (14) sense,

but we do not need to assume that rates of missing genotype

data are the same for mothers and fathers. Here we use an

expectation-maximization algorithm described by Dempster

et al. (15). The missing-at-random assumption says that

missingness does not depend on the unknown parental ge-

notype, conditional on the observed data for the family. A

complete model for Pr(MFC = mfc | X = x) ≡ $p_{mfc}(x)$, the probability that M = m, F = f, and C = c given X = x for the offspring, is written using straightforward conditional probability algebra:

$$p_{mfc}(x) = \Pr[m \mid mt, c, x]\,\Pr[c \mid mt, x]\,\Pr[mt \mid x]. \tag{6}$$

The first factor is modeled using our logistic regression

maternal effects model for Pr(M > F). The second and third

factors are from models proposed earlier (see our previous

paper (4) for detailed descriptions). The model of Pr[c | mt, x] is a polytomous logistic regression model that allows testing for linkage and association between the offspring’s trait of interest and the offspring’s genetic marker, while the model of Pr[mt | x] specifies a marginal model for the parental mating type as a function of the offspring’s x. This marginal model for the parents provides full flexibility: We need

not assume that Hardy-Weinberg equilibrium governs the

mating type distributions or that the distribution of the quan-

titative trait does not vary across subpopulations.

Again, if there are no missing data for parents, the logarithm of the corresponding complete-data likelihood is

$$\log(L) = \sum_{i,\,mfc} \log \Pr(mfc \mid x_i). \tag{7}$$

For families in which only one parent and one offspring have been genotyped, their contribution to the likelihood is the sum of the probabilities from the multinomial cells corresponding to the possible M, F, and C. For example, suppose M and C are 0 and 1. Then F is either 1 or 2, but not 0. This means that the contribution of this family to the observed-data log-likelihood is

$$\log\bigl(\Pr(M = 0, C = 1, F = 1 \mid X = x_i) + \Pr(M = 0, C = 1, F = 2 \mid X = x_i)\bigr). \tag{8}$$

Generalizing this, with missing data, the logarithm of the observed-data likelihood is

$$\log(L) = \sum_{\substack{\text{triads } i \text{ that} \\ \text{are complete}}} \log \Pr(mfc \mid x_i) + \sum_{\substack{\text{triads } j \text{ that} \\ \text{are incomplete}}} \log\left( \sum_{\substack{m, f, c \text{ compatible with} \\ \text{observed genotypes} \\ \text{for incomplete triad } j}} \Pr(mfc \mid x_j) \right). \tag{9}$$
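The inner sum for an incomplete triad runs over the parental genotypes that are compatible with the observed ones. For a missing father, compatibility can be enumerated as in the sketch below, which assumes that each parent transmits exactly one allele to the child. The helper names and the cell probabilities are hypothetical and serve only to illustrate the bookkeeping.

```python
import math

# Variant-allele counts that a parent with 0, 1, or 2 copies can transmit.
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def compatible_fathers(m, c):
    """Father genotypes consistent with mother genotype m and child genotype c."""
    fathers = []
    for f in (0, 1, 2):
        if any(am + af == c for am in TRANSMITTABLE[m] for af in TRANSMITTABLE[f]):
            fathers.append(f)
    return fathers


def incomplete_contribution(m, c, p_mfc):
    """Observed-data log-likelihood contribution of a triad with a missing father.

    p_mfc maps (m, f, c) to the current model probability for that cell.
    """
    return math.log(sum(p_mfc[(m, f, c)] for f in compatible_fathers(m, c)))


# Hypothetical cell probabilities for one family, chosen only for illustration.
p_mfc = {(0, 1, 1): 0.10, (0, 2, 1): 0.05}
print(compatible_fathers(0, 1), incomplete_contribution(0, 1, p_mfc))
```

For the family in the text (M = 0, C = 1), the enumeration returns the father genotypes 1 and 2, so the contribution is the logarithm of the sum of those two cells.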

The observed-data likelihood in equation 9 is maximized

over choices of model parameters using the expectation-

maximization algorithm (15). The approach involves esti-

mating the data in the expectation step and then maximizing

the complete-data likelihood (equation 7) in the maximiza-

tion step. The theory guarantees that the likelihood for the

observed data, as in equation 9, will increase with each

iteration of the expectation-maximization algorithm, and

convergence will be achieved if a unique maximum exists.

As shown in our previous paper (4), the probabilities that

triads with missing data fall into each of the possible MFC

categories are estimated in the expectation step. Consider the

family described above. The conditional probability that the

missing father has one copy of the variant allele is equal to

$$\frac{\Pr(M = 0, F = 1, C = 1 \mid x_i)}{\Pr(M = 0, F = 1, C = 1 \mid x_i) + \Pr(M = 0, F = 2, C = 1 \mid x_i)}. \tag{10}$$

Current model-based estimates of Pr[m | mt, c, x_i], Pr[c | mt, x_i], and Pr[mt | x_i] are substituted into $p_{mfc}(x_i)$ of equation 6, which are then, in turn, substituted into equations like equation 10 to obtain the conditional probabilities associated with possible missing parental genotypes, specifically Pr[F = f | M = m, C = c, X = x_i].

TABLE 1. Probabilities from the polytomous logistic regression model of maternal effects and parent-of-origin effects

Parents (MT)    Offspring (C)    Logit(Pr(M > F | MT, X, C))
01              0                $\delta_{01} x + \mu_{01}$
01              1                $\delta_{01} x + \gamma_1 x + \mu_{01}$
02              1                $\delta_{01} x + \delta_{12} x + \gamma_1 x + \mu_{02}$
12              1                $\delta_{12} x + \gamma_1 x + \mu_{12}$
12              2                $\delta_{12} x + \mu_{12}$
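The expectation step for one incomplete triad amounts to normalizing the current cell probabilities over the candidate genotypes of the missing parent. A minimal sketch follows; the function name `father_weights` and the probabilities are hypothetical.

```python
def father_weights(cell_probs):
    """Conditional probabilities of each candidate father genotype.

    cell_probs maps a candidate father genotype f to the current model
    probability of the cell (m, f, c) given the child's trait value.
    """
    total = sum(cell_probs.values())
    return {f: p / total for f, p in cell_probs.items()}


# Hypothetical current estimates of Pr(M = 0, F = f, C = 1 | x_i) for f = 1, 2.
weights = father_weights({1: 0.10, 2: 0.05})
print(weights)
```

These weights are the fractional counts with which the family would enter each compatible cell in the subsequent maximization step.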

Then, in the maximization step, the maternal model de-

scribed above for complete data is used to update the param-

eter estimates with observations from both the complete

triads and the estimated triads. In addition, the polytomous

logistic regression model for Pr[c | mt, x_i] and the marginal model for mating type probabilities Pr[mt | x_i] are maximized again. The algorithm alternates between the expectation step and the maximization step until the estimates from

the model converge. Once convergence is achieved, the ob-

served-data likelihood of equation 9 is used to compute the

log of the maximized likelihood. We repeat this approach

for the reduced model of interest—for example, the model

without $\delta_{01}$ and $\delta_{12}$. To test for parent-of-origin effects, the expectation-maximization algorithm and likelihood computation are repeated with and without inclusion of the predictor xI(C = 1). Using this approach, the desired likelihood ratio test statistics are computed, based on the change in minus twice the log of the observed-data likelihood.

Simulation methods

We generated simulations of 1,000 studies of parent and

offspring triads with offspring quantitative trait x_i. The simulated samples each consisted of either 300 or 500 triads, but not all triads were informative when testing for maternal effects or a parent-of-origin effect. Note that with diallelic markers, testing results are the same regardless of which allele is considered the variant allele. We mixed two subpopulations, each in Hardy-Weinberg equilibrium with random mating within each subpopulation, in equal proportion to form a population with genetic stratification. For one subpopulation, the allele prevalence was 50 percent, and the mean quantitative trait value $\mu_1$ was 0. For the other subpopulation, the allele prevalence was 90 percent and $\mu_1$ was

1.5. The quantitative trait, x, was normally distributed with

a variance of 1.0 in both subpopulations. Under the null

hypothesis, this produces a strong spurious (in the sense that

it is noncausal) marginal correlation between x and the num-

ber of copies of the variant allele in the admixed population

for all scenarios simulated.
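The null scenario can be mimicked with a short simulation. The sketch below is illustrative and is not the code used for the reported simulations: it draws triads from an equal mixture of the two subpopulations, with allele frequencies and trait means as stated above, and computes the marginal correlation between the offspring trait and the offspring genotype. The function names, the seed, and the sample size of 5,000 are arbitrary choices.

```python
import math
import random

# (variant allele frequency, mean trait value) for the two subpopulations.
SUBPOPS = [(0.5, 0.0), (0.9, 1.5)]


def simulate_null_triad(rng):
    """Draw (M, F, C, x) for one triad with no genetic effect on the trait."""
    freq, mean = rng.choice(SUBPOPS)
    # Under Hardy-Weinberg equilibrium, each parental allele is variant with probability freq.
    mother = [rng.random() < freq for _ in range(2)]
    father = [rng.random() < freq for _ in range(2)]
    # Each parent transmits one of its two alleles at random.
    child = int(rng.choice(mother)) + int(rng.choice(father))
    x = rng.gauss(mean, 1.0)
    return sum(mother), sum(father), child, x


def pearson(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2005)
triads = [simulate_null_triad(rng) for _ in range(5000)]
r = pearson([t[2] for t in triads], [t[3] for t in triads])
print(round(r, 2))
```

The correlation is clearly positive even though the trait is generated independently of genotype within each subpopulation, which is the spurious association that a stratification-robust test must ignore.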

A maternally mediated effect of the marker was simulated

by imposing a shift, $\lambda$, on the offspring x values across both subpopulations, where the exact shift depended on the number of alleles in the mother. For tests of maternal effects, the true effect corresponded to shifts in the mean quantitative trait equal to $\lambda/2$ for M = 1 and $\lambda$ for M = 2. Two scenarios were generated, with and without offspring effects in addition to maternal effects. When allowing additional offspring effects, the true mean was also shifted by −0.5 for C = 0 and by 0.5 for C = 2 in comparison with the referent C = 1. For tests of parent-of-origin effects, in a third scenario the only effect corresponded to a shift in the mean quantitative trait value equal to $\lambda$ for offspring inheriting a maternal copy of

the allele.
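The mean shifts in the first two scenarios can be expressed compactly. The following sketch is illustrative; the function names are hypothetical, and the parent-of-origin scenario is left out because its coding depends on which transmitted allele is maternal.

```python
def maternal_shift(m, lam):
    """Additive maternal effect: 0, lam / 2, or lam for M = 0, 1, or 2."""
    return lam * m / 2.0


def offspring_shift(c):
    """Offspring effect relative to the referent C = 1: -0.5, 0, or 0.5."""
    return 0.5 * (c - 1)


def trait_mean(subpop_mean, m, c, lam, offspring_effects=False):
    """Mean of the offspring trait under the maternal effects scenarios."""
    mean = subpop_mean + maternal_shift(m, lam)
    if offspring_effects:
        mean += offspring_shift(c)
    return mean


# Example: second subpopulation (mean 1.5), M = 2, C = 0, lambda = 0.8.
print(trait_mean(1.5, 2, 0, 0.8, offspring_effects=True))
```

The trait value itself would then be drawn from a normal distribution with this mean and variance 1.0.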

The horizontal axes in the figures described below represent the $\lambda$ shift corresponding to the maternal effect (figures 1 and 2) and the parent-of-origin effect of inheriting a maternal copy (figure 3). Here the variance explained by the maternal shift $\lambda$ ranges from 0 percent to 30 percent, nearly linearly across the horizontal axis (figures 1 and 2). For the parent-of-origin shift, the variance explained by the effect ranges nearly linearly from 0 percent to 60 percent (figure 3).

In order to quantify the power of the missing-data ap-

proach, we compared the power that would be obtained if

all data were available with the power obtained when imple-

menting the expectation-maximization algorithm approach,

with a random 25 percent of the fathers missing data for all

scenarios considered. Because of the symmetry of the binary

logistic model, the results would be exactly the same if, in-

stead, 25 percent of the mothers had randomly missing data.

RESULTS

For all of the scenarios described above, the 2-df likeli-

hood ratio maternal effects test and the 1-df parent-of-origin

effects test demonstrated a nearly nominal empirical type I

error at $\alpha$ = 0.05 for sample sizes of 300 and 500 families. (See the intercepts in figures 1–3.) The standard errors of the empirical type I error rate are all approximately 0.007 (for 0.05) when 1,000 data sets are generated. For the maternal effects test, only approximately 234 families out of the 500 (140 out of 300) were informing the model, whereas for the parent-of-origin effects test, only approximately 134 families out of the 500 (80 out of 300) were informing the model. Type I error rates for nominal levels of 0.01 and 0.10 were also consistent with the nominal type I error (data not shown) for all three scenarios, both with complete data and with incomplete data.

FIGURE 1. Simulation-based estimated power of a test of maternal effects. A smooth line was fitted to the data using a spline routine from SAS/GRAPH (SAS Institute, Inc, Cary, North Carolina). Data follow an additive genotypic effect, meaning that the mean quantitative trait value is shifted by $\lambda/2$ for offspring whose mothers have one copy of the variant allele and by $\lambda$ for offspring whose mothers have two copies of the allele. The values of $\lambda$ used were in the set {0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4}. [Plot: estimated power (vertical axis, 0.0 to 1.0) against lambda (horizontal axis, 0.0 to 1.4), with one curve for each combination of sample size and percentage of fathers missing: n = 500, 0%; n = 300, 0%; n = 500, 25%; n = 300, 25%.]
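The approximate numbers of informative families and the standard error of the empirical type I error rate quoted in the first paragraph of this section can be checked against the simulation design. The sketch below is illustrative and rests on stated assumptions: Hardy-Weinberg proportions within each subpopulation, equal mixing, Mendelian transmission, a family counted as informative for the maternal effects test when M and F differ, and as informative for the parent-of-origin test when, in addition, the child is heterozygous. All names are hypothetical.

```python
import math

SUBPOP_FREQS = (0.5, 0.9)   # variant allele frequencies in the two subpopulations
TRANSMIT = (0.0, 0.5, 1.0)  # Pr(parent with 0, 1, 2 copies transmits the variant)


def genotype_probs(p):
    """Hardy-Weinberg probabilities of carrying 0, 1, or 2 variant copies."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)


def informative_fractions(p):
    """Fractions of families with M != F, and with M != F and C = 1."""
    g = genotype_probs(p)
    maternal = 0.0
    parent_of_origin = 0.0
    for m in range(3):
        for f in range(3):
            if m == f:
                continue
            pair = g[m] * g[f]
            tm, tf = TRANSMIT[m], TRANSMIT[f]
            # The child is heterozygous when exactly one parent transmits the variant.
            p_het = tm * (1.0 - tf) + (1.0 - tm) * tf
            maternal += pair
            parent_of_origin += pair * p_het
    return maternal, parent_of_origin


parts = [informative_fractions(p) for p in SUBPOP_FREQS]
maternal_frac = sum(a for a, _ in parts) / len(parts)
poo_frac = sum(b for _, b in parts) / len(parts)
for n in (500, 300):
    print(n, round(n * maternal_frac, 1), round(n * poo_frac, 1))

# Binomial standard error of an empirical rejection rate of 0.05 over 1,000 data sets.
type1_se = math.sqrt(0.05 * 0.95 / 1000)
print(round(type1_se, 4))
```

Under these assumptions the expected counts come out close to the approximate figures reported above, and the standard error is close to 0.007.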

All tests of maternal effects demonstrated good power,

and the expectation-maximization algorithm approach with

25 percent of fathers missing data performed with almost as

much power as the approach for no missing paternal data

(figures 1 and 2). Under the two scenarios with and without

additional offspring effects, the test of maternal effects per-

formed nearly identically (figures 1 and 2). The test for

parent-of-origin effects was less powerful than the test for

maternal effects, because fewer families were informing the

model (figure 3). As was found for the maternal effects

model, the missing data approach with a parent-of-origin

effect was only slightly less powerful than the approach with

no missing data (figure 3).

APPLICATION

Polymorphisms in the gene involved with the production

of CYP proteins, CYP1A1, have been shown to influence the

activation of polycyclic aromatic hydrocarbons. A family-

based design was used to study offspring effects, maternal

effects, and gene-by-environment interactions using single

nucleotide polymorphisms in the CYP1A1 gene, together

with several other candidate genes (16, 17).

For a study of intrauterine growth restriction conducted at

Centre Hospitalier Universitaire Mère-Enfant de l’Hôpital Sainte-Justine in Montreal, Canada, between May 1998 and June 2000, case-parent triads were sampled from families with newborns weighing less than the 10th percentile according to gestational age and sex. A sample of unaffected control families was also selected in order to test for transmission distortion in the overall population, which may exist if transmissions of the gene are not Mendelian due to allele-related selective survival. Controls were matched to the cases on the basis of gestational week, sex, and race. In total, 965 case triads and control triads were genotyped for the CYP1A1 variants. The log-linear model was used to estimate

relative risks associated with specific alleles for the 493

case-parent triads (12).

For three variants considered in the CYP1A1 gene, the

only statistically significant effects reported were for an off-

spring CYP1A1*2A variant (17). A decreased risk of intra-

uterine growth restriction was associated with offspring

having one copy of the CYP1A1*2A variant. A relative risk

of 0.73 (95 percent confidence interval: 0.51, 1.05) as-

sociated with one copy and a relative risk of 2.24 (95 per-

cent confidence interval: 0.77, 6.45) associated with two

copies were reported (17). A 2-df likelihood ratio test of

FIGURE 2. Simulation-based estimated power of a test of maternal effects in the presence of offspring effects. A smooth line was fitted to the data using a spline routine from SAS/GRAPH (SAS Institute, Inc, Cary, North Carolina). Data follow an additive genotypic effect, meaning that the mean quantitative trait value is shifted by $\lambda/2$ for offspring whose mothers have one copy of the variant allele and by $\lambda$ for offspring whose mothers have two copies of the allele. The values of $\lambda$ used were in the set {0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4}. Here underlying offspring effects were added, such that the mean quantitative trait value was also shifted by −0.5 for offspring with zero copies of the allele and by 0.5 for offspring with two copies of the allele. [Plot: estimated power (vertical axis, 0.0 to 1.0) against lambda (horizontal axis, 0.0 to 1.4), with one curve for each combination of sample size and percentage of fathers missing: n = 500, 0%; n = 300, 0%; n = 500, 25%; n = 300, 25%.]

FIGURE 3. Simulation-based estimated power of a test of parent-of-origin effects. A smooth line was fitted to the data using a spline routine from SAS/GRAPH (SAS Institute, Inc, Cary, North Carolina). Data were generated by shifting the mean quantitative trait value by $\lambda$ for offspring who inherited a maternal copy of the variant allele. The values of $\lambda$ used were in the set {0.0, 0.4, 0.8, 1.2, 1.6, 2.0, 2.4}. The trait was simulated from a mixture of normal distributions, both with constant variance 1.0. [Plot: estimated power (vertical axis, 0.0 to 1.0) against lambda (horizontal axis, 0.0 to 2.4), with one curve for each combination of sample size and percentage of fathers missing: n = 500, 0%; n = 300, 0%; n = 500, 25%; n = 300, 25%.]