Detection of nonneutral substitution rates
on mammalian phylogenies
Katherine S. Pollard,1,4 Melissa J. Hubisz,2 Kate R. Rosenbloom,3 and Adam Siepel2
1 Gladstone Institutes, University of California, San Francisco, San Francisco, California 94158, USA; 2 Department of Biological
Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, USA; 3 Center for Biomolecular Science and
Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA
Methods for detecting nucleotide substitution rates that are faster or slower than expected under neutral drift are widely
used to identify candidate functional elements in genomic sequences. However, most existing methods consider either
reductions (conservation) or increases (acceleration) in rate but not both, or assume that selection acts uniformly across the
branches of a phylogeny. Here we examine the more general problem of detecting departures from the neutral rate of
substitution in either direction, possibly in a clade-specific manner. We consider four statistical, phylogenetic tests for
addressing this problem: a likelihood ratio test, a score test, a test based on exact distributions of numbers of substitutions,
and the genomic evolutionary rate profiling (GERP) test. All four tests have been implemented in a freely available program
called phyloP. Based on extensive simulation experiments, these tests are remarkably similar in statistical power. With 36
mammalian species, they all appear to be capable of fairly good sensitivity with low false-positive rates in detecting strong
selection at individual nucleotides, moderate selection in 3-bp elements, and weaker or clade-specific selection in longer
elements. By applying phyloP to mammalian multiple alignments from the ENCODE project, we shed light on patterns of
conservation/acceleration in known and predicted functional elements, approximate fractions of sites subject to constraint,
and differences in clade-specific selection in the primate and glires clades. We also describe new ‘‘Conservation’’ tracks in the
[Supplemental material is available online at http://www.genome.org.]
In recent years, the technique of scanning aligned genomic se-
quences for elements that are evolving faster, slower, or by differ-
ent patterns than would be expected under neutral drift has
emerged as a powerful approach for discovering novel functional
elements. This technique is particularly useful in mammalian genomes,
because of their size, complexity, and relative intractability
in experimental investigation. Computational scans of mammalian
genomes have been used to identify various classes of functional
elements, including protein-coding genes (Guigó et al. 2003;
Siepel et al. 2007), RNA genes (Pedersen et al. 2006), enhancers
(Nobrega et al. 2003), and micro-RNA target sites (Xie et al. 2005).
These methods have become steadily more valuable as deep
alignments of orthologous sequences—which until recently cov-
ered only a small fraction of the genome (The ENCODE Project
Consortium 2007)—have become available genome-wide (Miller
et al. 2007; Mammalian Genome Sequencing and Analysis Con-
sortium, in prep.).
A wide variety of methods have been introduced for detecting
signatures of nonneutral evolution in aligned genomic sequences,
and they can be categorized in various ways (Supplemental mate-
rial S1; Supplemental Table S1). For example, some methods de-
pend on pre-defined annotations for training (as in gene-finding),
while others can be used in a fully ‘‘unsupervised’’ manner with
unannotated genomic sequences; some methods make full use of
the phylogenetic relationships among the species in question,
while others consider only pairwise comparisons; and some
methods make use of statistical models of molecular evolution,
while others use heuristic scores or invoke parsimony assump-
tions. In this study, we focus on unsupervised, statistical, phylo-
genetic methods, which we believe have the greatest promise for
general functional element discovery and characterization, even if
they are sometimes outperformed by more specialized approaches
in particular classification tasks (e.g., Gross et al. 2007).
sequences of interest has been conservation or constraint—that is,
a reduced rate of evolution compared to what is expected under
neutral drift (Boffelli et al. 2003; Margulies et al. 2003; Cooper et al.
been introduced for detecting sequences that are experiencing ‘‘ac-
celeration,’’ or faster-than-neutral evolution, with particular em-
phasis on scanning aligned genomic sequences for fast-evolving el-
ements in the human lineage (Pollard et al. 2006b; Prabhakar et al.
2006; Bird et al. 2007) or other mammalian lineages (Haygood et al.
2007; Kim and Pritchard 2007; see also Wong and Nielsen 2004).
Most conservation-detection methods have assumed uniform selec-
tion pressures across the branches of a phylogeny, but several accel-
(Pollard et al. 2006a; Prabhakar et al. 2006; Bird et al. 2007; Kim and
Pritchard 2007). In addition, most conservation-detection methods
have been designed to scan entire genomic alignments, using a slid-
ing window (Margulies et al. 2003; Cooper et al. 2004), a hidden
have become available, by measuring conservation at individual
nucleotides and then identifying runs of sites with a combined score
above an empirically determined threshold (Cooper et al. 2005;
Asthana et al. 2007). In contrast, acceleration-detection methods
have generally been applied to predefined elements of interest.
In this study, we treat conservation and acceleration in
a unified manner and examine the general problem of detecting
departures from the neutral rate of substitution in either direction.
We consider elements of any length (including single nucleotides)
E-mail firstname.lastname@example.org; fax (415) 355-0141.
Article published online before print. Article and publication date are at
online through the Genome Research Open Access option.
Genome Research 20:110–121 © 2010 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/10; www.genome.org
and allow for clade-specific selection as well as selection that acts
uniformly across the phylogeny. In these respects, our study is
similar to that of Kim and Pritchard (2007), but we focus more on
questions of methodology, considering various alternative ap-
proaches to this problem. In particular, we conduct an extensive
methods for detecting nonneutral substitution rates on a phylog-
eny: a likelihood ratio test, a score test, a method based on the dis-
tribution of the number of substitutions per site (Siepel et al. 2006),
et al. 2005). We find that all four methods have fairly good power
with currently available data for mammals, but they do have clear
weak or lineage-specific selection. Somewhat surprisingly, the four
tests are nearly identical to one another in power, despite their dif-
ferent statistical underpinnings. We have implemented all four
methods in a program called phyloP (‘‘phylogenetic P-values’’),
which is freely available as part of the PHAST package (http://
compgen.bscb.cornell.edu/phast). We apply phyloP to multiple
alignments of 36 species in the ENCODE regions and analyze pat-
terns of conservation/acceleration for various annotation classes
and clades of interest. We also introduce new ‘‘Conservation’’ tracks
scores (alongside phastCons scores) for genome-wide alignments of
44 vertebrate species.
Statistical tests and software implementation
The general statistical problem considered in this study is to
in a given genomic element, relative to what would be expected
under neutral drift. As in most previous work, we assume a pre-
computed alignment of orthologous sequences from multiple spe-
cies, and a corresponding neutral phylogeny with branch lengths
in units of expected substitutions per site. To allow for sufficient
power for short elements (1–10 bp), we consider tests that group
multiple branches of the tree together, focusing on two particular
types of tests: ‘‘all-branch tests,’’ which examine increases or de-
creases in rate across all branches of the phylogeny; and ‘‘subtree
tests,’’ which examine increases or decreases in rate within a par-
ticular subtree (clade) of interest, relative to the rate in the re-
mainder of the phylogeny (Supplemental Fig. S1). We consider
four alternative methods for testing for nonneutral evolution,
which we denote LRT, SCORE, SPH, and GERP. These methods are
summarized in Table 1 and described in detail in the Methods and
Supplemental material (Supplemental sections S2.1–S2.4). While
these four methods share certain features (e.g., several of them
make use of a common subroutine that estimates branch-length
scaling parameters by maximum likelihood), in general, they have
distributions, and approximations. A major goal of this study is to
compare and contrast their performance empirically, using data
simulated from molecular evolutionary models and permuted real
data. The four tests were implemented in the phyloP program in
the PHAST package (Supplemental section S2.5).
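To make the all-branch likelihood ratio test concrete, the following is a minimal sketch and not the phyloP implementation. It assumes a Jukes-Cantor substitution model, a small hypothetical four-species tree with made-up branch lengths, a plain grid search for the branch-length scale factor, and a two-sided chi-square null distribution with one degree of freedom. All names (`TREE`, `partial`, `lrt_all_branch`) are illustrative.

```python
import math

BASES = "ACGT"

# Hypothetical neutral tree (an assumption for illustration): an internal node is a
# tuple of (child, branch_length) pairs and a leaf is a species name.
TREE = (
    ((("human", 0.05), ("mouse", 0.30)), 0.05),
    ((("dog", 0.15), ("cow", 0.15)), 0.05),
)

def jc_prob(t):
    """Jukes-Cantor probabilities of (same base, one specific different base) after time t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e

def partial(node, column, scale):
    """Felsenstein pruning: likelihood of the data below `node` for each base at `node`."""
    if isinstance(node, str):
        return [1.0 if b == column[node] else 0.0 for b in BASES]
    like = [1.0] * 4
    for child, length in node:
        same, diff = jc_prob(scale * length)
        child_like = partial(child, column, scale)
        total = sum(child_like)
        for i in range(4):
            like[i] *= same * child_like[i] + diff * (total - child_like[i])
    return like

def log_lik(tree, columns, scale):
    """Log likelihood of independent alignment columns with all branches scaled by `scale`."""
    return sum(math.log(sum(0.25 * x for x in partial(tree, col, scale)))
               for col in columns)

def lrt_all_branch(tree, columns, grid_step=0.01, max_scale=10.0):
    """Return (estimated scale, LRT statistic, approximate P-value) for one element."""
    null_ll = log_lik(tree, columns, 1.0)  # null model: neutral rate (scale = 1)
    steps = int(round(max_scale / grid_step))
    # The grid starts just above zero so that variable columns keep nonzero likelihood.
    best_scale, best_ll = max(
        ((i * grid_step, log_lik(tree, columns, i * grid_step))
         for i in range(1, steps + 1)),
        key=lambda pair: pair[1],
    )
    stat = max(0.0, 2.0 * (best_ll - null_ll))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square survival function, 1 df
    return best_scale, stat, p_value

# Two toy 3-bp elements: one perfectly conserved, one with a different base per species.
conserved = [dict(human="A", mouse="A", dog="A", cow="A")] * 3
divergent = [dict(human="A", mouse="C", dog="G", cow="T")] * 3
print(lrt_all_branch(TREE, conserved))
print(lrt_all_branch(TREE, divergent))
```

In this sketch a scale estimate below 1 points toward conservation and an estimate above 1 toward acceleration. A subtree test would instead scale only the branches inside the clade of interest.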
We performed two types of simulation experiments to evaluate the
false-positive rates and power of the tests implemented in phyloP.
In the first, ‘‘parametric’’ series of experiments, we generated syn-
thetic alignments from ‘‘neutral’’ and ‘‘selected’’ phylogenetic
models and then measured how well elements of various sizes
could be distinguished based on phyloP scores. In these experi-
ments, neutral sites were generated from a phylogenetic model
estimated from fourfold degenerate (4D) sites in 36-species alignments
from the ENCODE regions (Methods). Selected models were
generated from this model by scaling all branches by a factor r (all-
branch cases) or the branches in a subtree by a factor l (subtree
cases), for various choices of r and l (Methods). Thus, the simu-
lation scheme reflects the same assumptions as the tests them-
absolute performance. Nevertheless, it is useful for comparing the
relative performance of the different methods under known, pre-
cisely characterized conditions. To complement these parametric
bootstrapping experiments, drawing ‘‘neutral’’ and ‘‘selected’’ align-
ment columns from 4D and second codon position (CDS2) sites,
respectively. (CDS2 sites were chosen because all nucleotide sub-
stitutions at these positions are nonsynonymous.) These simula-
tions require fewer assumptions about the nucleotide substitution
process, but their usefulness depends on how representative CDS2
sites are of the broader class of genomic sites under selection. In
addition, they apply only to the all-branch tests, and not to the
subtree tests. In all experiments, we computed phyloP scores for
false-positive rates (FPRs), true-positive rates (TPRs), false discovery
these statistics and the area under the receiver operating charac-
teristic curve (AUC).
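The evaluation statistics named above can be computed from two lists of element scores, one from neutral and one from selected simulations. The following is a minimal sketch with made-up scores; the function names are hypothetical and higher scores are assumed to indicate stronger evidence against neutrality.

```python
def tpr_at_fpr(neutral_scores, selected_scores, fpr):
    """True-positive rate at the threshold that admits at most `fpr` of neutral elements."""
    ranked = sorted(neutral_scores, reverse=True)
    allowed = int(fpr * len(ranked) + 1e-9)  # neutral elements allowed above threshold
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    hits = sum(1 for s in selected_scores if s > threshold)
    return hits / len(selected_scores)

def auc(neutral_scores, selected_scores):
    """Area under the ROC curve: the probability that a selected element outscores
    a neutral one, counting ties as one half."""
    wins = 0.0
    for s in selected_scores:
        for n in neutral_scores:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(selected_scores) * len(neutral_scores))

# Made-up scores for illustration only.
neutral = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
selected = [0.95, 1.5, 2.0, 2.5, 0.55]

print(tpr_at_fpr(neutral, selected, fpr=0.1))  # 0.8
print(round(auc(neutral, selected), 2))        # 0.88
```

The pairwise AUC loop is quadratic in the number of elements, which is adequate for a sketch but would be replaced by a rank-based calculation for large simulations.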
The most striking result from these experiments is that the
four tests have nearly identical power across a broad range of sce-
narios (Fig. 1; Table 2; Supplemental Tables S2–S13), despite being
based on quite different statistical principles. This is true for non-
parametric and parametric simulations alike. Similar levels of
power might be expected when signals are strong, but high con-
cordance is also evident under weak conservation or acceleration.
The only major differences between methods occur for subtree-
specific conservation, for which the LRT method shows somewhat
higher power than the SCORE method, which, in turn, shows
higher power than the SPH method, especially for short elements
(Supplemental Tables S2–S4). The SPH method also displays
slightly reduced power for all-branch tests based on a reduced
species set in the case of strong conservation (Supplemental Table
S9), probably as a result of the highly discrete nature of the test
statistic used by the SPH method (see Discussion). In all other
cases, the methods are essentially indistinguishable.
Second, the absolute level of power for all four methods—
with this 36-species mammalian phylogeny—appears to be fairly
good. The parametric simulations indicate that, in the case of
strong conservation (all-branch rescaling by a factor r = 0.1) or
acceleration (r = 3.3), selection at single nucleotides can be identified
reasonably reliably: for example, with TPRs of 90% (for
r = 0.1) or 92% (for r = 3.3) at a per-element FPR of 5% (AUC of
0.97 and 0.99, respectively). At more moderate levels of conservation
(e.g., r = 0.3, as observed on average in phastCons elements)
(Siepel et al. 2005), power is weaker for single nucleotides,
but for 3-bp elements it is possible to achieve a TPR of 97% at a 5%
FPR (AUC = 0.99). Similarly, with acceleration at twice the neutral
rate in 3-bp elements, a TPR of 87% at a 5% FPR is achievable
(AUC = 0.97). For 10-bp elements (about the size of a typical
transcription factor binding site), much milder constraint or
acceleration can be detected with good power (e.g., r = 0.7 with
AUC = 0.94 or r = 1.43 with AUC = 0.96) (Table 2). At 50 bp, departures
from the neutral rate by only ~10% are reliably detectable
(data not shown). Detection power increases as r decreases or increases
relative to the neutral value of r = 1, but it does so more
rapidly for acceleration than for conservation. This behavior ap-
(with no substitutions)—despite their perfect conservation, these
sites typically have nonnegligible probability under the null
model, limiting the ability of the tests to distinguish even extreme
conservation from neutrality. For various element lengths, power
in the nonparametric experiments is comparable to that observed
in parametric (all-branch) experiments at r = 0.5.
We attempted to translate these FPRs (for each TPR) into
predicted FDRs (expected fractions of predicted elements that are
false-positives), which are of particular interest in applications in
genomics. If the fraction of sites under selection is g, then
$$\mathrm{FDR} \approx \left[ 1 + \frac{(1 - b)\,g}{a\,(1 - g)} \right]^{-1}$$
for FPR a and TPR (1 − b) (Supplemental section S2.8). When g is
small (as it is believed to be in mammalian genomes), a relatively
small FPR can still produce a large FDR for fixed TPR, essentially
because the pool of sites from which false-positives are drawn is
much larger than the pool from which true-positives are drawn.
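The effect of a small selected fraction on the FDR can be checked with a few lines of arithmetic. The sketch below evaluates the relationship between FDR, FPR a, TPR (1 − b), and the fraction g of sites under selection; the function name and the value g = 0.05 are assumptions chosen only for illustration.

```python
def predicted_fdr(fpr, tpr, frac_selected):
    """Expected fraction of predictions that are false positives.

    fpr is a, tpr is (1 - b), and frac_selected is g in the notation of the text.
    """
    false_pos = fpr * (1.0 - frac_selected)  # neutral sites called significant
    true_pos = tpr * frac_selected           # selected sites called significant
    if false_pos + true_pos == 0.0:
        raise ValueError("no predictions are expected at these rates")
    return false_pos / (false_pos + true_pos)

# Illustrative values: 5% FPR, 90% TPR, and an assumed 5% of sites under selection.
print(round(predicted_fdr(fpr=0.05, tpr=0.90, frac_selected=0.05), 3))  # 0.514
```

With these assumed inputs, roughly half of all predictions would be false positives even though the per-element FPR is only 5%.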
However, predicting the FDR is not straightforward because the
true value of g is unknown and sites under selection obey some
unknown distribution of selection scenarios, each with its own
a function of TPR by two indirect methods, focusing on the case of
the all-branch LRT. First, we used CDS2 sites as a proxy for sites
under selection, estimating TPR(s), for score threshold s, as the
fraction of CDS2 elements having score ≥ s. Second, we estimated
Table 1. Summary of statistical tests considered in this study (a)

Test: Likelihood ratio test
  Abbreviation (b): LRT
  Description: Traditional hypothesis test for parametric models, central in the [...] a null model and an alternative model, defined by different rate parameters (u0 and u1, respectively), are both fitted to an alignment X by maximum likelihood, and twice the difference in their maximized log [...]
  Null distribution (c): χ² (asymptotic)
  References: Huelsenbeck and Rannala 1997; Casella and Berger 2002; Pollard et al. 2006b

Test: Score test
  Abbreviation (b): SCORE
  Description: Another traditional hypothesis test, with similar asymptotic properties as the LRT but the advantage that only the null model needs to be fitted to the data. The test statistic in this case is derived from the values of the score function U and the Fisher information matrix I, both evaluated at the maximum likelihood estimate under the null [...]
  Null distribution (c): χ² (asymptotic)
  References: Rao 1948, 2005

Test: SPH (d)
  Abbreviation (b): SPH
  Description: Test based on the total number of substitutions n during the evolution of the element X, under a phylogenetic model c. An exact null distribution is computed by a dynamic programming algorithm that depends on uniformization of the continuous-time Markov chain. The actual number for the observed data is approximated by the posterior mean, which is computed [...]
  Test statistic: E[n | c, X]
  Null distribution (c): Exact, p(n | c)
  References: Siepel et al. 2006

Test: GERP
  Abbreviation (b): GERP
  Description: Test based on a statistic called "rejected substitutions," defined as the total branch length of the neutral phylogeny minus the total branch length after maximum likelihood estimation of a scale factor r. This test can be used in the all-branch setting but not the [...]
  Test statistic: T(1 − r̂)
  Null distribution (c): Empirical
  References: Cooper et al. 2005

(a) See Methods for complete details.
(b) Option to --method argument in phyloP that specifies each test; also used throughout this study as an abbreviation for the test.
(c) Null distribution of test statistic assumed when computing P-values. The χ² distributions for the LRT and SCORE tests hold asymptotically but are approximate for finite data sets. See Methods for discussion of issues that arise in one-sided tests.
(d) The abbreviation "SPH" stands for "Siepel-Pollard-Haussler," the authors of the conference paper in which the relevant algorithms were introduced.
a distribution of phyloP scores for sites under selection by decom-
posing the full score distribution into a neutral component and
a selected component, and calculating TPR(s) from this distribu-
tion (Supplemental section S2.8). We used maximum-likelihood
estimates of g in the CDS2 case and lower-bound estimates in the
mixture decomposition case. These calculations suggest that if
single-nucleotide elements are subject to selective effects similar to
those in CDS2 sites, more than half can be detected with FDR ≈ 5%
(Fig. 2). However, much higher FDRs must be tolerated if a large
majority of 1-bp elements are to be detected (e.g., FDR ≈ 50% to
detect two-thirds of 1-bp elements, or FDR ≈ 80% to detect 80%). If
a broader class of elements subject to weaker selective effects is
considered (as inferred by the mixture decomposition method),
power is somewhat weaker, with a TPR of ~30% at a 5% FDR and
~40% at a 50% FDR for 1-bp elements. Power for 3-bp elements
shows a similar overall pattern, but is considerably higher
one-half (mixture) and three-quarters (CDS2) of elements detect-
able at 5% FDR. While these estimates are clearly subject to a great
deal of uncertainty, they suggest that current alignments are suf-
ficiently informative to allow substantial fractions—but not large
majorities—of 1- to 3-bp elements to be detected at low FDRs.
Power will improve as more sequence data become available.
Our third main finding is that the subtree tests (LRT, SCORE,
and SPH only) have substantially lower power than the all-branch
tests, yet do have reasonable power for slightly longer elements,
provided subtrees of adequate size are considered. In our experi-
ments, we considered three different clades, with different num-
bers of species and branch lengths: the primate (14 species, short
branches), glires (five species, longer branches), and laurasiatherian
(10 species, longer branches) clades (Fig. 3). In all cases, power is
poor for individual nucleotides, except in the case of extreme
clade-specific acceleration (subtree rescaling by a factor l = 10)
(Supplemental Tables S2–S4). At 3 bp, power is improved for moderate
to strong departures from neutrality (l ≤ 0.3, l ≥ 3.33) but
generally remains poor (Fig. 3). By 10 bp, however, power has improved
considerably, with elements under moderate clade-specific
conservation (l = 0.3) showing power comparable to that seen for
the all-branch test for 3-bp elements with r = 0.5 (primates; AUC of
0.93–0.95) or r = 0.3 (laurasiatherians; AUC of 0.98–0.99). Predictably,
power is generally highest for the laurasiatherians and
lowest for the primates, with the glires clade being intermediate
To further examine the sensitivity of our results to modeling
assumptions, we applied phyloP to two additional sets of synthetic
alignments, simulated in more realistic ways. First, we generated
data under a model that allows for rate variation across sites (Yang
1994), using parameters estimated from AR and CDS2 sites for
neutral and selected sites, respectively (Supplemental section
S2.7.1). Second, we relaxed the assumption (made by all subtree
tests in phyloP) that all branches in a subtree of interest use one
substitution rate, while all other branches use another, by introducing
various amounts of "noise" to the branch-length scaling
factors during data generation (Supplemental section S2.7.2). We
exactly as above (i.e., the tests were not altered to reflect the new
assumptions). These experiments indicated that simplifications in
power somewhat, but in general, the effect is not dramatic, and
relative performance is mostly unaffected (Supplemental sections
S3.5.1 and S3.5.2).
Finally, we compared and evaluated several other aspects of
performance, including running time, two-sided versus one-sided
tests, the effect of considering subsets of species, and the accuracy
of reported P-values. The running times of the LRT and GERP
methods were comparable, while the SCORE method was consid-
erably faster, and the SPH method was considerably slower—by
more than an order of magnitude in some cases. Results of the
other experiments were largely consistent with expectations (Sup-
plemental sections S3.3–S3.7; Supplemental Tables S6–S14).
Figure 1. Receiver operating characteristic (ROC) curves showing false-positive versus true-positive rates for the all-branch tests implemented in phyloP: (red) LRT, (green) SCORE, (blue) SPH, and (purple) GERP. Individual plots show results for simulated data sets with either 3-bp (top) or 1-bp (bottom) elements generated from models with a range of deviations r from the neutral rate r = 1.0 (columns).
Table 2. Area under the ROC curve for phyloP one-sided all-branch tests (columns: GERP, LRT, SCORE, and SPH for each of 1-bp, 3-bp, and 10-bp elements)
Analysis of ENCODE regions
Having established with simulated data that phyloP performs
fairly well for a range of realistic parameter settings, we next applied
the method to real biological data. Here we again made use of
the alignments for the 44 ENCODE regions (Margulies et al. 2007)
(see Methods), which, at present, constitute the largest published
comparative genomic data set for mammals.
First, we analyzed distributions of phyloP scores for various
classes of sites, focusing on the LRT method and three tests—
the all-branch test and subtree tests for the primate and the
glires clades. These scores were produced by running phyloP in
‘‘CONACC’’ mode, which produces positive scores for predicted
conservation and negative scores for predicted acceleration (see
Methods). In the all-branch case, we computed single-nucleotide
scores, but for the subtree tests, which have less power, we com-
puted scores in a 10-bp sliding window. We considered vari-
ous annotation types, including known protein-coding genes
and noncoding RNAs (ncRNAs), putative
transcriptional fragments of unknown
function (Un.TxFrags), sequence-specific
regulatory factor binding regions (RFBR-
Seqsp), and predicted transcription factor
binding sites within ChIP/chip-identified
regions (TFBS). For the protein-coding
genes, we separately considered coding
regions (CDSs; positions CDS1, CDS2, and
CDS3), 5′ and 3′ untranslated regions
(UTRs), 5′ and 3′ flanking regions (200 bp
upstream of the 5′ UTR and downstream
For comparison, we also considered scores
in putatively conserved phastCons ele-
ments and putatively neutral ancestral re-
The distributions of all-branch scores
generally in expected ways (Fig. 4A; Sup-
CDS2 sites are strongly enriched for high
slightly more conserved than CDS1 sites,
while CDS3 sites and sites in 5′ UTRs,
5′ flanks, 3′ UTRs, and 3′ flanks (in decreasing order) show clear,
functional elements (ncRNAs and TFBSs) show levels of conser-
vation intermediate between CDS1/CDS2 sites and UTRs, and the
bulk distributions for intronic, intergenic, and AR sites are all quite
similar. Un.TxFrag sites show no significant enrichment for con-
straint, as observed previously (The ENCODE Project Consortium
of fast-evolving sites, most likely as a result of an enrichment for
hypermutable CpGs in coding regions (Eöry et al. 2009). Basewise
phyloP scores can also be summarized at individual positions
within functional elements, by averaging across elements of the
same type. Such ‘‘conservation profiles’’ for protein-coding genes
and transcription factor binding sites highlight several known
features of these elements (Fig. 4B; Supplemental Fig. S5), pro-
viding further validation that phyloP scores capture biologically
meaningful signals in the data.
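The averaging step behind such a conservation profile is simple to sketch. The example below assumes that every element of a given type has the same length and has already been oriented consistently (for example, all binding sites on the motif strand); the scores and the function name are hypothetical.

```python
def conservation_profile(elements):
    """Mean per-base score at each position across same-length elements."""
    if not elements:
        raise ValueError("at least one element is required")
    length = len(elements[0])
    if any(len(e) != length for e in elements):
        raise ValueError("all elements must have the same length")
    return [sum(e[i] for e in elements) / len(elements) for i in range(length)]

# Made-up basewise scores for three 4-bp binding sites (positive = conserved).
sites = [
    [2.0, 0.5, 3.0, -1.0],
    [1.0, 0.5, 2.0, 0.0],
    [3.0, 0.5, 1.0, 1.0],
]
print(conservation_profile(sites))  # [2.0, 0.5, 2.0, 0.0]
```

Positions with a high mean score across many elements are the ones that would stand out in a profile of this kind.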
By decomposing the score distributions for each annotation
class into ‘‘neutral’’ and ‘‘selected’’ components, it is possible to
obtain lower-bound estimates for the fractions of sites that have
experienced long-term selective constraint (Methods). By this ap-
proach, we estimate that 5.3% of all sites in the ENCODE re-
gions show evidence of conservation, in good agreement with
ENCODE Project Consortium 2007). Furthermore, we estimate
that about two-thirds of CDS1 and CDS2 sites have evolved under
constraint, as well as about one-third of ncRNA sites, one-fourth of
CDS3 sites, one-fifth of TFBS sites, and 12%–16% of sites in UTRs
and 5′ flanking regions (Fig. 4C). Not surprisingly, the estimated
fraction of constrained sites is highest for phastCons elements
(87.4%). Consistent with previous findings (Asthana et al. 2007),
phastCons elements or annotated CDSs, UTRs, and ncRNAs are
conserved, suggesting that many unannotated functional sites
may remain, even within the ENCODE regions. In general, these
estimated fractions are remarkably concordant with estimates
from a recent genome-wide pairwise analysis of hominid and
Figure 2. Estimated FDR for all-branch LRT. Estimates of false discovery rate (FDR) versus true-positive rate (TPR) based on two indirect methods, for 1-bp and 3-bp elements. (CDS2) Average TPRs are estimated from second codon position sites; (mixture) average TPRs are estimated by decomposing the genome-wide score distribution into components corresponding to neutral and selected sites. Details are given in Supplemental section S2.8.
Figure 3. Subtree ROC curves. (Left) Phylogenetic tree used in this study, with branch lengths drawn in proportion to the values estimated from 4D sites. Three subtrees are highlighted: (maroon) primates, [...] tests as applied to 3-bp and 10-bp elements under clade-specific selection in the primates (top) and laurasiatherians (bottom). (The SPH method did not perform as well, and the subtree test is not supported with the GERP method.) Results are shown for the case in which r = 1.0 and l = 0.3, meaning that the clade of interest is evolving at approximately one-third the neutral rate, while the rest of the tree is [...]
murid genomes, based on quite different methods (Eöry et al. 2009).
Unlike the distributions of all-branch scores, the distributions
of subtree scores for the primate clade are quite similar across
annotation types, suggesting that the null hypothesis of equal
holds fairly well (Fig. 5A). The glires clade, however, shows much
more pronounced differences in subtree score distributions (Fig.
5B), suggesting an increased tendency for clade-specific selection.
In particular, the CDS, phastCons, 5′ UTR, and 5′ flank classes
(in decreasing order) show clear shifts toward higher scores in
the glires. This trend holds if a series of strict filters is applied to
the alignments, indicating that it is not an artifact of missing
data or nonorthologous alignments (Supplemental section S2.10).
The observed difference between the primates and glires distribu-
tions also does not appear to result from differences in power
in these two clades (Supplemental section S3.8). The shift toward
larger glires-subtree scores in functional elements appears to
be driven by increases in negative selection rather than decreases
in positive selection, because it is strongest for sites that are
evolving at or beneath the neutral rate outside the glires subtree
(Supplemental Fig. S8). A possible explanation for this shift would
be increased strength of selection owing to larger effective population
sizes in the glires (Keightley et al. 2005; see also Kosiol et al.
Finally, we used clade-specific phyloP scores to test for acceler-
ated evolution in conserved (and hence likely functional) elements
within the ENCODE regions, again focusing on the primate and
glires clades. We used phastCons and strict alignment-quality filters
19,498 for glires analysis (see Methods). These elements were scored
to the rest of the tree using the subtree LRT. At FDR ≈ 5%, we identified
216 primate accelerated regions (PARs) and 3529 glires accelerated
regions (GARs). The two lists of accelerated regions are generally
similar in terms of genomic locations, but a slightly larger
proportion of PARs fall in coding sequences of GENCODE genes
(7.4% vs. 4.5% of GARs). Known and predicted RNA genes overlap
GARs are described in Supplemental Tables S15–S16 and Supple-
mental Figure S9. Interestingly, the glires clade shows a pronounced
excess of accelerated regions over a large range of nominal P-value
thresholds, again suggesting the possibility of increased selection
in this clade. However, differences in the starting set of elements,
in the power of the subtree tests, and asymmetries in the human-
referenced alignments may also contribute to this observation.
Conservation tracks in UCSC Genome Browser
PhyloP scores for genome-wide multiple alignments of 44 vertebrate
species (including 32 mammals) have been incorporated into a new
ucsc.edu, hg18 assembly). This track shows phyloP scores for in-
dividual sites alongside conservation scores and conserved elements
and the primates only (Fig. 6). The phyloP and phastCons scores
provide complementary measures of nonneutral substitution rates,
with phyloP capturing both conservation and acceleration and op-
erating independently at each site, and phastCons measuring con-
servation only in a way that considers ‘‘runs’’ of conserved sites
(through the use of an HMM). A separate track shows phyloP subtree
scores for the primate and glires clades (data not shown).
Figure 4. Distributions of all-branch scores. (A) Cumulative distribution functions (CDFs) for phyloP scores in sites of different annotation classes, based on the LRT method and 36-species multiple alignments for the ENCODE regions. Positive scores indicate conservation, and negative scores indicate acceleration (CONACC mode) (see Methods). Curves are [...] and 3′ UTRs, noncoding RNAs (ncRNAs), predicted transcription factor binding sites (TFBS), conserved elements identified by phastCons, intergenic sites, and ancestral repeats (AR). (See Supplemental Fig. S6 for additional annotation classes.) (B) Average conservation scores as a function of genomic position within 52 predicted NRSF binding sites in the ENCODE regions. Binding sites were predicted at ChIP/chip peaks using the motif from TRANSFAC (FDR = 20%) (Supplemental section S2.9). A sequence logo representation of the motif is shown for comparison. Notice the general correlation between information content and cross-species conservation across the positions of the motif (see Moses et al. 2003). (C) Estimated fractions of sites under selection for each annotation class. Classes include those from A, plus 5′ and 3′ flanking regions of genes, sequence-specific regulatory binding regions (RFBR-Seqsp), putative transcriptional fragments of unknown function (Un.TxFrags), intronic sites, and nonconserved nongenic (NCNG) sites. These are estimates of lower bounds computed by a simple mixture-decomposition method (see Methods) and should be considered approximate. All classes show a highly significant enrichment for conserved sites relative to the AR distribution by a one-sided Mann-Whitney U test (P ≈ 0) except the 3′ flank, intronic, Un.TxFrags, and NCNG categories (all P ≈ 1).
Methods for detecting signatures of selection from rates and pat-
terns of substitution have a long history in the field of molecular
evolution (e.g., Kimura 1977; Miyata et al. 1980). In recent years,
methods of this kind have also become important tools in applied
genomics because of their usefulness in detecting and character-
izing functional elements. As more and more genomic sequence
data become available, it should be possible to take this line of
research to its logical conclusion and characterize selection pres-
sures at very high resolution—perhaps even at the level of in-
dividual nucleotides. In this study, we examine the problem of
detecting nonneutral substitution rates from aligned genomic se-
quences, focusing on what is possible with currently available data
for the eutherian mammals. Our contributions consist of four
main components: (1) a detailed comparison of four alternative
approaches to this problem; (2) estimates of the absolute power of
these methods; (3) an analysis of patterns of conservation/accel-
eration in the ENCODE regions; and (4) the release of a software
tool, called phyloP, and an associated track in the UCSC Genome
Browser, which we expect to be useful resources for the compara-
tive genomics community.
The four methods considered here show surprisingly little
difference in statistical power, given their quite different theoret-
ical foundations. The LRT and SCORE methods use test statistics
based on the full likelihood function and might be expected to
transitions occur at much higher rates than transversions—than
the number of substitutions. In addition, the GERP method con-
siders only a point estimate of the number of substitutions, ig-
noring its variance, and the SPH method makes use of a highly
discrete test statistic (an integral number of substitutions), which
should (and does to an extent) limit its power, especially for short
elements. However, in practice, these methodological differences
seem to be relatively unimportant in distinguishing between
neutral and selected sites. Instead, it seems that information about
substitution rates can be accessed by a variety of means, provided
good use is made of the phylogeny and substitution model. This
argument may extend to methods that make only partial use
of a phylogeny and/or a continuous-
time Markov substitution model, such as
binCons, the parsimony-based P-value
method (Margulies et al. 2003), and
SCONE (Asthana et al. 2007). Indeed,
SCONE and GERP have been found to
have similar performance in experiments
similar to the ones performed here
(Asthana et al. 2007).
Regardless of which method is used,
the ability to detect constraint or accel-
eration depends in a predictable way on
the amount of ‘‘signal’’ in the data. Power
increases with the magnitude of the
departure from the neutral model (as
measured by $\rho$ or $\lambda$), the length of the element [...]
These results are qualitatively consistent
with the predictions of theoretical mod-
els (Eddy 2005; McAuliffe et al. 2005;
Stone et al. 2005) and with previous
empirical studies (Cooper et al. 2003;
Margulies et al. 2003). However, they are supported in this case by
a somewhat more extensive set of experiments, considering both
parametric and nonparametric methods, conservation and ac-
celeration, all-branch and clade-specific selection, and richer
phylogenetic models. Our results suggest that, while it is pre-
mature to claim single-nucleotide resolution in the detection of
nonneutral substitution rates, elements 1–3 bp in length can be
detected with reasonable power—for example, 30%–75% TPRs at
5% FDRs. Similarly, moderately strong clade-specific selection can
be detected at the level of 10-bp elements. Even in scenarios in
which power is weak, useful information can be obtained by
pooling together similar sites from across the genome (as in Fig. 4).
Of course, power will steadily improve as additional genomes are sequenced.
The similarity in power of the methods considered here could
be taken to suggest that little is to be gained by further methodo-
logical work on the problem of detecting selection from aligned
sequences. However, these methods are all based completely on
substitution rates and ignore other sources of information about
natural selection, such as patterns of substitution (Moses et al.
2004; Pedersen et al. 2006) or rates and patterns of insertion and
deletion (Kellis et al. 2003; Siepel and Haussler 2004a; Lunter et al.
2006). A recently introduced method, called SiPhy, attempts to
exploit the pattern of substitution by using an LRT similar to
phyloP’s all-branch test, except that it treats the equilibrium nu-
cleotide frequencies as free parameters to be estimated at each site
for the alternative model (together with a branch-length scaling
parameter) (Garber et al. 2009). In principle, this approach should
increase power for subtle selective pressures that influence base
preferences but have only a mild effect on the overall substitution
rate. However, there are risks associated with the use of a richer
alternative model. Because SiPhy assumes constant equilibrium
frequencies for its null model, it essentially performs a compound
elements (and have increased TPRs and FPRs) in regions of the
genome with unusual base composition. Compared with rate-
more directly associated with mutation and repair than with nat-
ural selection, such as transcription-coupled repair (Green et al.
2003), biased gene conversion (Marais 2003; Dreszer et al. 2007),
Distributions of subtree scores for the primate and glires clades. Cumulative distribution
functions (CDFs) of scores for selected annotation classes as computed by the subtree test for the
primate (A) and glires (B) clades. As in previous figures, CONACC scores computed by the LRT method
are shown, but in this case, scores are computed in a 10-bp sliding window. In both figures most
distributions are significantly different from the AR distribution by a two-sided Mann-Whitney U test
even when the curves appear very similar, because the data sets are generally quite large (exceptions are
phastCons and TFBS in A and 5′ flank and TFBS in B).
Pollard et al.
116 Genome Research
Conservation track in UCSC Genome Browser. A portion of the desmoglein 1 (DSG1) gene on human chromosome 18 shown with the new
Conservation track, including a 44-way vertebrate alignment and nine conservation subtracks. The subtracks display phyloP scores (in blue and red),
phastCons scores (green), and phastCons-predicted conserved elements (pink, purple, and mustard) for all species, the 32 placental mammals, and the
[...] zero for most noncoding regions but elevated in exons (thick blue bars at top) as well as in conserved noncoding elements (orange arrow). (B) At finer
resolution, however, phyloP reveals significantly more variation from base to base than does the hidden Markov model–based phastCons. In this coding
exon, codon position effects are clearly evident from phyloP but not from phastCons. (C,D) The phyloP tracks also indicate accelerated evolution (with
negative scores, shown in red), while phastCons measures conservation only. Here an exon with a striking fast-evolving segment is shown. Interestingly,
cDNA data from other mammals suggest that this exon derives from a fusion of two ancestral exons, with the fast-evolving segment corresponding to the [...]
and methylation of cytosines (Ehrlich and Wang 1981). It is likely
that factors such as these are at least partly responsible for the
nearly twofold increase in the number of evolutionarily ‘‘con-
strained’’ sites detected by this method in the ENCODE regions
(Garber et al. 2009). Nevertheless, pattern-based methods for
worthy of further investigation.
The rate-based methods considered here also have several
limitations worth noting. First, the phylogenetic models on which
they are based, while rich in some respects, are highly simplified in
others. For example, these models ignore regional variation (Wolfe
et al. 1989; Mouse Genome Sequencing Consortium 2002) and
context dependencies (Hwang and Green 2004; Siepel and
Haussler 2004b) in neutral substitution rates, variation in G+C
content (Hardison et al. 2003), transcription-associated muta-
tional asymmetry (Green et al. 2003), and differences between
clades in selection on 4D sites (Eöry et al. 2009). Second, the tests
(and our parametric experiments) assume constant levels of di-
rectional selection, producing sustained increases or decreases in
evolutionary rate over long periods of evolutionary time. While
these assumptions appear to be reasonable for some types of func-
tional elements (such as conserved protein-coding genes), they
undoubtedly do not hold in many cases. Finally, these methods all
depend on accurate alignments of mammalian genomes. Genome-wide
multiple alignment remains a challenging, unsolved problem,
and alignment error can have a substantial influence on predictions
of constrained elements (Margulies et al. 2007). New methods
offer some hope that it may be possible to address problems of
functional element identification while integrating or sampling
over alignments, thereby mitigating the effects of alignment error
from a single fixed alignment (Satija et al. 2009). However, at
present, these methods require orders of magnitude more computational
time than methods that assume fixed alignments and are
not feasible for use on a genome-wide scale. Still, it may be possible
to use heuristic methods to substantially improve the speed of such
methods (Bradley et al. 2009; Paten et al. 2009), or to quantify
alignment uncertainty and then use this information in down-
stream functional element identification (Lunter et al. 2008). In
short, many opportunities remain for improving the biological re-
alism, statistical power, and robustness of methods for identifying
functional elements from comparative sequence data.
The statistical tests considered in this study can all be placed in the
following general framework. Let $\psi_N$ be a neutral phylogenetic
model, consisting of a tree topology, a vector of branch lengths $\beta_N$,
a set of equilibrium nucleotide frequencies, and a substitution rate
matrix. $\psi_N$ can be estimated from large quantities of genomic data
and is assumed to be known. Let $\psi(\theta)$, for a vector of non-negative
branch-length scaling parameters $\theta$ (of the same dimension as $\beta_N$),
be a scaled phylogenetic model identical to $\psi_N$ except that it has
branch lengths $\beta_\theta = \theta \circ \beta_N$, a Hadamard (pointwise) product of $\theta$ and
$\beta_N$. We consider two parameterizations of $\theta$: (1) the uniform scaling
$\theta(\rho) = (\rho, \ldots, \rho)$, which scales all branches by a non-negative scalar
parameter $\rho$; and (2) the subtree scaling, which
scales all branches by $\rho$ and additionally scales all branches in the
subtree beneath a specified node $u$ by a second non-negative scalar
parameter $\lambda$, that is, $\theta(\rho, \lambda, u) = (\theta^{(i)} : i = 1, \ldots, |\beta_N|)$ such that $\theta^{(i)} = \rho\lambda$
if branch $i$ lies in the subtree beneath $u$ and $\theta^{(i)} = \rho$ otherwise. Note that
these parameterizations are nested, with $\theta(\rho, \lambda = 1, u) = \theta(\rho)$ for all $u$.
For a given alignment $X$ of length $L$, assumed to have independent
columns all distributed according to some $\psi(\theta)$, the two-sided
all-branch test compares a null hypothesis $H_0: \theta = 1$ with an alternative
hypothesis $H_1: \theta = \theta(\rho);\ \rho \ge 0;\ \rho \ne 1$. The two-sided subtree test, for
a given node $u$ (and associated subtree), compares a null hypothesis
$H_0: \theta = \theta(\rho);\ \rho \ge 0$ with an alternative hypothesis $H_1: \theta = \theta(\rho, \lambda, u);$
$\rho \ge 0;\ \lambda \ge 0;\ \lambda \ne 1$. Thus, the all-branch test can be thought of as a test
of $\rho = 1$ and the subtree test as a test of $\lambda = 1$ (with $\rho \ge 0$ as a free
parameter). One-sided tests can similarly be defined for conservation
(full tree: $\rho < 1$, subtree: $\lambda < 1$) or acceleration (full tree: $\rho > 1$, subtree:
$\lambda > 1$).
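The two branch-length parameterizations can be illustrated with a minimal Python sketch. It is not taken from phyloP: the function names, the four-branch tree, and the boolean mask marking which branches lie beneath the node of interest are all hypothetical, and the scale factors are written as `rho` and `lam`.

```python
from typing import List, Sequence


def uniform_scaling(rho: float, n_branches: int) -> List[float]:
    """Uniform scaling vector: every branch gets the same factor rho."""
    if rho < 0:
        raise ValueError("rho must be non-negative")
    return [rho] * n_branches


def subtree_scaling(rho: float, lam: float, in_subtree: Sequence[bool]) -> List[float]:
    """Subtree scaling vector: rho * lam beneath the chosen node, rho elsewhere."""
    if rho < 0 or lam < 0:
        raise ValueError("rho and lam must be non-negative")
    return [rho * lam if inside else rho for inside in in_subtree]


def scale_branches(theta: Sequence[float], beta_n: Sequence[float]) -> List[float]:
    """Hadamard (pointwise) product of a scaling vector and neutral branch lengths."""
    if len(theta) != len(beta_n):
        raise ValueError("theta and beta_n must have the same dimension")
    return [t * b for t, b in zip(theta, beta_n)]


# Hypothetical neutral branch lengths; the last two branches are assumed
# to lie in the subtree beneath the node of interest.
beta_n = [0.10, 0.20, 0.05, 0.15]
in_subtree = [False, False, True, True]

conserved = scale_branches(uniform_scaling(0.5, len(beta_n)), beta_n)
accelerated_clade = scale_branches(subtree_scaling(1.0, 2.0, in_subtree), beta_n)
```

Setting `lam = 1` in `subtree_scaling` recovers `uniform_scaling`, which is the nesting property that the subtree test relies on.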
Likelihood ratio tests
The LRTs are based on the test statistic $T = 2\,[\ell(\hat\theta_1) - \ell(\hat\theta_0)]$, where
$\ell$ denotes the log likelihood and
$\hat\theta_0$ and $\hat\theta_1$ are the maximum likelihood estimates of the parameter
vectors associated with the null and alternative hypotheses,
respectively. These estimates are obtained numerically, as described
below. For our two-sided tests, the regularity conditions required
for $T$ to have an asymptotic $\chi^2_1$ null distribution hold. For our
one-sided tests, however, the null hypothesis is at the boundary of the
parameter space under the alternative hypothesis, which causes
the asymptotic distribution to become a 50:50 mixture of a $\chi^2_1$
distribution and a point mass at zero (Self and Liang 1987). With
both one-sided and two-sided tests, the asymptotic distributions
are used to compute approximate P-values. Additional details are
in Supplemental section S2.2.
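As a rough illustration of how the two asymptotic null distributions translate into P-values, the following Python sketch uses only the standard library. It is an assumption-laden sketch rather than phyloP's code: the function names are hypothetical, and the log likelihoods are taken as given inputs. It relies on the identity that the upper tail of a chi-square distribution with one degree of freedom at t equals erfc(sqrt(t/2)).

```python
import math


def chi2_1_sf(t: float) -> float:
    """Upper-tail probability P(chi^2_1 > t), computed as erfc(sqrt(t / 2))."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return math.erfc(math.sqrt(t / 2.0))


def lrt_statistic(loglik_alt: float, loglik_null: float) -> float:
    """T = 2 * (log likelihood under H1 minus log likelihood under H0).

    Clamped at zero to absorb small negative values from numerical optimization.
    """
    return max(0.0, 2.0 * (loglik_alt - loglik_null))


def two_sided_pvalue(t: float) -> float:
    """Two-sided test: asymptotic chi^2_1 null distribution."""
    return chi2_1_sf(t)


def one_sided_pvalue(t: float) -> float:
    """One-sided test: 50:50 mixture of chi^2_1 and a point mass at zero."""
    return 1.0 if t <= 0.0 else 0.5 * chi2_1_sf(t)
```

For any positive value of the statistic, the one-sided P-value is half the two-sided one, which is the practical consequence of the boundary condition described above.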
The LRT-based scores used in the analysis of the ENCODE
regions and in the conservation tracks are computed as $-\log_{10} P$,
where $P$ is a two-sided P-value. In order to distinguish conservation
and acceleration scores, the scores are negated if the estimated
$\rho$ (or $\lambda$) suggests faster-than-neutral evolution (Supplemental section [...]
the option --mode CONACC.
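The sign convention for these scores can be sketched in a few lines of Python. The function name and its arguments are hypothetical, and the P-value and the estimated scale factor are assumed to have been computed elsewhere; this is an illustration of the convention, not phyloP's implementation.

```python
import math


def conacc_score(p_value: float, scale_estimate: float) -> float:
    """Signed score: -log10(P), negated when the estimated scale factor exceeds 1.

    Positive scores indicate conservation; negative scores indicate acceleration.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    score = -math.log10(p_value)
    return -score if scale_estimate > 1.0 else score
```

Under this convention, a site with P = 0.001 and an estimated scale factor of 0.3 scores about +3, whereas the same P-value with an estimated scale factor of 2.5 scores about -3.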
The score tests are based on test statistics of the form
$S = U^T(\hat\theta_0)\, I^{-1}(\hat\theta_0)\, U(\hat\theta_0)$, where $U$ is the score function (the vector of partial
derivatives of the log likelihood) and $I$ is the Fisher information
matrix. Both $U$ and $I$ are defined with respect to $\theta_1$ but evaluated at
the maximum-likelihood estimate under the null hypothesis,
$\theta_1 = \hat\theta_0$. $S$ is known, like $T$ above, to have an asymptotic $\chi^2_j$
distribution, where $j$ is the difference in the number of free
parameters of $\theta_0$ and $\theta_1$ ($j = 1$ here), and this asymptotic distribution
is used by phyloP in computing P-values. Notably, the score test
has the same local power as the LRT, being a most powerful test for
small deviations from the null hypothesis (in this case, weak
conservation or acceleration). However, the score test requires fit-
ting only the null model to the data, instead of both the null and
alternative models, which results in a substantial savings in com-
putation. Indeed, with our all-branch test, no estimation is re-
quired because the null model has no free parameters. The subtree
case requires estimation of a single scale factor $\rho$ to fit the null
model to the data. For both types of tests, computation of the
Fisher information matrix appears to be intractable, so we ap-
proximate it by Monte Carlo sampling. Additional details are in
Supplemental section S2.3.
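The structure of a one-parameter score test with a Monte Carlo estimate of the Fisher information can be sketched with a deliberately simplified stand-in model. The assumption here, which is not the phylogenetic likelihood used by phyloP, is that the number of substitutions n is Poisson with mean rho * t, where t is a hypothetical expected neutral count. Under this toy model the score at rho = 1 is n - t, and the information is the expected squared score under the null.

```python
import math
import random


def poisson_sample(mean: float, rng: random.Random) -> int:
    """Draw a Poisson variate by Knuth's multiplication method (small means only)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def score_at_null(n: int, t: float) -> float:
    """Derivative of n*log(rho*t) - rho*t with respect to rho, evaluated at rho = 1."""
    return n - t


def monte_carlo_information(t: float, n_samples: int, rng: random.Random) -> float:
    """Approximate the Fisher information as the mean squared score under the null."""
    total = 0.0
    for _ in range(n_samples):
        u = score_at_null(poisson_sample(t, rng), t)
        total += u * u
    return total / n_samples


def score_test(n: int, t: float, information: float):
    """Return (S, P) with S = U^2 / I and P from the chi^2_1 upper tail."""
    s = score_at_null(n, t) ** 2 / information
    return s, math.erfc(math.sqrt(s / 2.0))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
info_hat = monte_carlo_information(5.0, 20000, rng)
```

No parameter is fitted under the alternative at any point, which is the source of the computational savings noted above. In the toy model the exact information is t, so `info_hat` should land close to 5.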
The all-branch SPH test is based on a test statistic (denoted $n$) equal
to the number of substitutions that have occurred along the
branches of a phylogeny in an alignment $X$, assuming a given
neutral model $\psi$ (Siepel et al. 2006). The exact null distribution of
this statistic, $P(n \mid \psi)$, can be approximated arbitrarily closely by
an algorithm that resembles Felsenstein’s ‘‘pruning’’ algorithm. A
closely related algorithm can be used to compute an estimate of $n$
for an observed alignment $X$, and this estimate can be compared to
the null distribution to compute a P-value. In the case of subtree
tests, a similar procedure is used, but the joint distribution for the
number of substitutions in a subtree of interest and the rest of the
tree is considered (Siepel et al. 2006; Supplemental section S2.4).
For various reasons, the P-values computed by this procedure tend
to be somewhat conservative (Supplemental Fig. S4).
GERP makes use of a statistic called ‘‘rejected substitutions’’ (RS),
which is defined as the number of substitutions expected under
a neutral model minus the number ‘‘observed’’ (estimated) for
a particular alignment $X$ (Cooper et al. 2005); that is, the expected
number of mutations that would have been fixed under neutrality
but instead were ‘‘rejected’’ by purifying selection. For a given
neutral model $\psi_N$ and alignment $X$, GERP estimates a scaling
parameter $\rho$ for $\psi_N$ by maximum likelihood (as in the LRT and SCORE
tests), and estimates RS as $RS = T - \hat\rho T = T(1 - \hat\rho)$, where $T$ is the
total branch length of the tree under the neutral model and $\hat\rho$ is the
estimated scale factor. While the other test statistics considered are conservative in
the presence of missing data, RS can be quite sensitive to it, because
a branch of length $t$ for which no data are available will still
contribute $\hat\rho t$ to the overall value of the test statistic. Therefore, a
separate value of $T$ is computed for each alignment $X$, by considering
just the branches of the phylogeny for which aligned nucleotides
are available. In addition, RS is set to zero if bases are available for
edu/sidowlab/downloads/gerp) assumes the use of the HKY85
substitution model and combines the steps of estimating the
neutral substitution model and computing the desired RS values.
To facilitate comparisons with the other tests, we reimplemented
the core functionality of GERP within phyloP, treating it as anal-
ogous to the other all-branch tests in all respects. (It is not sup-
ported for subtree tests.) Like the GERP program, phyloP simply
outputs the raw RS values and allows P-values to be computed
separately in post-processing, if desired. To generate the ROC
curves, we applied varying thresholds to RS for one-sided tests of
conservation, to $-RS$ for one-sided tests of acceleration, and to $|RS|$
for two-sided tests. A comparison of phyloP’s GERP mode with the [...]
were very similar in performance (Supplemental section S3.1).
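The RS calculation and its adjustment for missing data can be sketched as follows. This is a simplified illustration under stated assumptions, not the GERP or phyloP implementation: the tree is flattened into a hypothetical list of neutral branch lengths with a boolean flag saying whether aligned bases support each branch, and the estimated scale factor `rho_hat` is taken as given.

```python
from typing import Sequence


def rejected_substitutions(
    branch_lengths: Sequence[float],
    has_data: Sequence[bool],
    rho_hat: float,
) -> float:
    """RS = T * (1 - rho_hat), with T summed only over branches that have data."""
    if len(branch_lengths) != len(has_data):
        raise ValueError("branch_lengths and has_data must have the same length")
    # T is recomputed per alignment so that branches without aligned
    # nucleotides do not contribute to the statistic.
    t = sum(b for b, ok in zip(branch_lengths, has_data) if ok)
    return t * (1.0 - rho_hat)


# Hypothetical example: the third branch has no aligned data.
rs = rejected_substitutions([0.1, 0.2, 0.3], [True, True, False], 0.5)
```

RS is positive when the estimated scale factor is below 1 (conservation), zero at neutrality, and negative when it exceeds 1 (acceleration), which is why thresholds on RS, on its negation, and on its absolute value give the three kinds of test.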
Three of the four tests above depend on numerical estimation of
the scale factors $\rho$ and/or $\lambda$ by maximum likelihood for each
alignment segment $X$. This is accomplished using the Newton-Raphson
method for one-dimensional optimization of $\rho$ and the
BFGS method for two-dimensional optimization of $(\rho, \lambda)$. In
practice, this optimization is the rate-limiting step in most analyses,
so various techniques were used to improve its efficiency
(Supplemental section S2.1). It is worth noting that these estimated
scale factors may be of interest for other reasons. For example,
under a model in which all mutations are either deleterious
or neutral and deleterious mutations are rapidly eliminated by
natural selection, $\hat\rho$ is an estimator of the fraction of mutations that
are neutral (in a given element $X$), and $(1 - \hat\rho)$ is an estimator of the
fraction that are deleterious (e.g., Kondrashov and Crow 1993).
It should be emphasized that phyloP computes all P-values independently,
disregarding correlations between tests. Adjustments
for multiple hypothesis tests are needed when jointly interpreting
the reported P-values for a collection of sites or elements.
Alignments and neutral model
Alignments of the 44 ENCODE regions were produced with the
TBA program (Blanchette et al. 2004), as described by Margulies
et al. (2007), but using an expanded set of sequences (June 2008
freeze; 33 eutherian mammals vs. 21 previously analyzed). The
neutral model was estimated from 4D sites in these alignments,
using the phyloFit program in PHAST. After estimation, the model
was adjusted to maintain the estimated nucleotide exchangeabil-
ities but ensure a stationary distribution equal to the genome-wide
average (Supplemental section S2.6). The same neutral model was
used for the simulation experiments and the analysis of real data.
Parametric simulations were based on alignment columns gener-
ated by forward sampling from phylogenetic models, using the
program phyloBoot in PHAST. Neutral alignment columns (for
evaluating false-positive rates) were generated from the estimated
neutral model, and selected alignment columns (for evaluating
true-positive rates) were generated from versions of this model in
which all branches were scaled by a parameter $\rho$, or the branches in
a subtree of interest were scaled by a parameter $\lambda$. For both $\rho$ and $\lambda$,
the following set of scale factors was considered:
$\{q/10,\ 10/q : q \in \{1, 3, 5, 7, 9\}\} = \{0.1, 0.3, 0.5, 0.7, 0.9, 1.11, 1.43, 2.00, 3.33, 10.00\}$.
The simulated data did not contain alignment gaps or missing
data. In some cases, data were generated with rate variation across
sites (Supplemental section S2.7.1) or by adding ‘‘noise’’ to the
scale factors, so they did not exactly match the assumptions of the
subtree test (Supplemental section S2.7.2). Nonparametric experiments [...]
(also using phyloBoot) from sets of columns from ancestral repeats
(neutral) and second codon positions (selected).
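The grid of scale factors used in the parametric simulations can be regenerated with a short Python sketch. The generating set q in {1, 3, 5, 7, 9} is an inference from the ten listed values (0.1 through 10.00) rather than something stated explicitly in the surviving text, and the function name is hypothetical.

```python
from typing import List, Sequence


def scale_factor_grid(qs: Sequence[int] = (1, 3, 5, 7, 9)) -> List[float]:
    """Return the sorted set {q/10, 10/q : q in qs}."""
    values = {q / 10 for q in qs} | {10 / q for q in qs}
    return sorted(values)


grid = scale_factor_grid()
rounded = [round(v, 2) for v in grid]
```

The grid is symmetric on a log scale: every factor below 1 (conservation) has a reciprocal above 1 (acceleration).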
Annotations for ENCODE regions
Protein-coding gene annotations were based on 408 non-
overlapping genes from the GENCODE set (Harrow et al. 2006).
ncRNAs consisted of eight well-characterized structural and regu-
latory RNAs from the ENCODE regions (SNORA70 [also known as
U70], SNORA36A [also known as ACA36], SNORA56 [also known as
ACA56], MIR192, MIR194-2, MIR196B, MIR483, and H19). RFBR-
Seqsp and Un.TxFrag annotations were obtained from The EN-
CODE Project Consortium (2007). Specific TFBS sites were pre-
dicted by our own methods, using three transcription factors for
which ChIP-chip data and binding motifs were available (22 MYC
[also known as c-Myc], 52 REST [also known as NRSF], and 21
STAT1 sites) (Supplemental section S2.9). For ARs, we extracted
RepeatMasker (http://www.repeatmasker.org) annotations corre-
sponding to repeat families and classes previously identified as
ancestral to the eutherian mammals (Mouse Genome Sequencing
Consortium 2002). In all cases, annotations were defined in hu-
Estimation of fractions of sites under selection
Fractions of sites under selection were estimated by a method
similar to that used by Chiaromonte et al. (2003), but based on
empirical cumulative distribution functions (CDFs) instead of es-
timated density functions. Specifically, the distribution of phyloP
scores for each annotation class $a$ was assumed to be a mixture of
neutral and selected components, $F_a(s) = (1 - p_a)\,G(s) + p_a H_a(s)$, where $G$ is
the CDF for the sites that are neutrally evolving, $H_a$ is the CDF for
the sites under selection, and $p_a$ ($0 \le p_a \le 1$) is the fraction of sites
in class $a$ that are under selection. Owing to nonnegativity of $H_a$,
$F_a(s) \ge (1 - p_a)\,G(s)$ for all $s$, so a lower bound for $p_a$ is given by
$\hat p_a = 1 - \min_s F_a(s)/G(s)$.
This lower bound was estimated by substituting the empirical
CDFs for ARs and for all sites of each annotation class $a$ for $G$ and
$F_a$, respectively, excluding the smallest scores ($< -1.5$) so that the
estimated bounds were not determined by the extreme left tails of
the empirical CDFs (which reflect sparse data). For various reasons,
these estimates should be considered crude; for example, they
may be influenced by differences between the ARs and the various
annotation classes in base composition, substitution patterns, or
amounts of missing data. Nevertheless, they agree well with
estimates obtained by quite different methods (Eöry et al. 2009).
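The lower-bound estimate can be sketched in Python using empirical CDFs built from two lists of scores. This is a minimal illustration, not the authors' code: the function names are hypothetical, the ratio of CDFs is evaluated only at observed score values at or above the cutoff, and the result is clamped to the interval [0, 1].

```python
from bisect import bisect_right
from typing import List, Sequence


def ecdf(sorted_scores: Sequence[float], s: float) -> float:
    """Empirical CDF: fraction of scores less than or equal to s."""
    return bisect_right(sorted_scores, s) / len(sorted_scores)


def selected_fraction_lower_bound(
    neutral_scores: Sequence[float],
    class_scores: Sequence[float],
    min_score: float = -1.5,
) -> float:
    """Estimate 1 - min_s F_a(s) / G(s) over observed scores s >= min_score."""
    g_sorted: List[float] = sorted(neutral_scores)  # stands in for G (ancestral repeats)
    f_sorted: List[float] = sorted(class_scores)    # stands in for F_a (all sites in class a)
    grid = [s for s in sorted(set(g_sorted + f_sorted)) if s >= min_score]
    ratios = [
        ecdf(f_sorted, s) / ecdf(g_sorted, s)
        for s in grid
        if ecdf(g_sorted, s) > 0.0
    ]
    if not ratios:
        raise ValueError("no usable scores at or above min_score")
    return min(1.0, max(0.0, 1.0 - min(ratios)))
```

As a toy check, if half of a class's scores coincide with the neutral scores and the other half are shifted to higher values, the bound comes out at 0.5; if the two samples are identical, it is 0.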
A set of conserved elements was identified for each of the clade-
specific acceleration tests (primates and glires), by running phast-
Cons on alignments in which the species in the clade of interest
had been removed, then applying several filters to eliminate po-
tential alignment and assembly errors (Supplemental section
S2.11). Each set of filtered elements was scored with the one-sided
acceleration LRT for the appropriate subtree. Nominal P-values
were adjusted for multiple comparisons using the FDR-controlling
method of Benjamini and Hochberg (1995).
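The Benjamini-Hochberg adjustment can be written compactly in Python. The sketch below is a generic textbook implementation of the step-up procedure, with a hypothetical function name, and is offered only to make the multiple-comparison step concrete; it is not the code used in the study.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Elements whose adjusted P-value falls below a chosen threshold (for example, 0.05) would then be reported at that nominal FDR.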
We thank Elliott Margulies for providing the multiple sequence
alignments and estimated neutral model for the ENCODE regions;
Jim Booth for suggesting the score test as an alternative to the
likelihood ratio test; Hiram Clawson for setting up the new Con-
Jim Kent for feedback and support in track development; and
Andre Martins for helping with the analysis of transcription factor
binding sites. This work was supported by the National Institute of
General Medical Sciences (grant GM82901) and by early career
awards from the Alfred P. Sloan Foundation, the David and Lucile
Packard Foundation, and the National Science Foundation (grant
Asthana S, Roytberg M, Stamatoyannopoulos JA, Sunyaev S. 2007. Analysis
of sequence conservation at nucleotide resolution. PLoS Comput Biol
3: e254. doi: 10.1371/journal.pcbi.0030254.
Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: A
practical and powerful approach to multiple testing. J R Stat Soc Ser B
Methodol 57: 289–300.
ME, Dermitzakis ET. 2007. Fast-evolving noncoding sequences in the
human genome. Genome Biol 8: R118. doi: 10.1186/gb-2007-8-6-r118.
Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AFA, Roskin KM, Baertsch
R, Rosenbloom K, Clawson H, Green ED, et al. 2004. Aligning multiple
genomic sequences with the threaded blockset aligner. Genome Res
Boffelli D, McAuliffe J, Ovcharenko D, Lewis KD, Ovcharenko I, Pachter L,
Rubin EM. 2003. Phylogenetic shadowing of primate sequences to find
functional regions of the human genome. Science 299: 1391–1394.
Bradley RK, Roberts A, Smoot M, Juvekar S, Do J, Dewey C, Holmes I, Pachter
L. 2009. Fast statistical alignment. PLoS Comput Biol 5: e1000392. doi:
Casella G, Berger RL. 2002. Statistical inference. Duxbury, Pacific Grove, CA.
Chiaromonte F, Weber RJ, Roskin KM, Diekhans M, Kent WJ, Haussler D.
2003. The share of human genomic DNA under selection estimated
from human–mouse genomic alignments. Cold Spring Harb Symp Quant
Biol 68: 245–254.
Cooper GM, Brudno M, NISC Comparative Sequencing Program, Green ED,
Batzoglou S, Sidow A. 2003. Quantitative estimates of sequence
divergence for comparative analyses of mammalian genomes. Genome
Res 13: 813–820.
Cooper GM, Brudno M, Stone EA, Dubchak I, Batzoglou S, Sidow A. 2004.
Characterization of evolutionary rates and constraints in three
mammalian genomes. Genome Res 14: 539–548.
Cooper GM, Stone EA, Asimenos G, Green ED, Batzoglou S, Sidow A. 2005.
Distribution and intensity of constraint in mammalian genomic
sequence. Genome Res 15: 901–913.
Dreszer TR, Wall GD, Haussler D, Pollard KS. 2007. Biased clustered
substitutions in the human genome: The footprints of male-driven
biased gene conversion. Genome Res 17: 1420–1430.
Eddy SR. 2005. A model of the statistical power of comparative genome
sequence analysis. PLoS Biol 3: e10. doi: 10.1371/journal.pbio.0030010.
Ehrlich M, Wang RY. 1981. 5-Methylcytosine in eukaryotic DNA. Science
The ENCODE Project Consortium. 2007. Identification and analysis of
functional elements in 1% of the human genome by the ENCODE pilot
project. Nature 447: 799–816.
Eöry L, Halligan DL, Keightley PD. 2009. Distributions of selectively
constrained sites and deleterious mutation rates in the hominid and
murid genomes. Mol Biol Evol (in press). doi: 10.1093/molbev/msp219.
Garber M, Guttman M, Clamp M, Zody MC, Friedman N, Xie X. 2009.
Identifying novel constrained elements by exploiting biased
substitution patterns. Bioinformatics 25: 54–62.
Green P, Ewing B, Miller W, Thomas PJ, NISC Comparative Sequencing
Program, Green ED. 2003. Transcription-associated mutational
asymmetry in mammalian evolution. Nat Genet 33: 514–517.
Gross SS, Do CB, Sirota M, Batzoglou S. 2007. CONTRAST: A discriminative,
phylogeny-free approach to multiple informant de novo gene
prediction. Genome Biol 8: R269. doi: 10.1186/gb-2007-8-12-r269.
Guigó R, Dermitzakis ET, Agarwal P, Ponting CP, Parra G, Reymond A, Abril
JF, Keibler E, Lyle R, Ucla C, et al. 2003. Comparison of mouse and
human genomes followed by experimental verification yields an
estimated 1,019 additional genes. Proc Natl Acad Sci 100: 1140–1145.
Hardison RC, Roskin KM, Yang S, Diekhans M, Kent WJ, Weber R, Elnitski L,
Li J, O’Connor M, Kolbe D, et al. 2003. Covariation in frequencies of
substitution, deletion, transposition, and recombination during
eutherian evolution. Genome Res 13: 13–26.
Harrow J, Denoeud F, Frankish A, Reymond A, Chen C-K, Chrast J, Lagarde J,
Gilbert JGR, Storey R, Swarbreck D, et al. 2006. GENCODE: Producing
a reference annotation for ENCODE. Genome Biol 7: S4. doi: 10.1186/
Haygood R, Fedrigo O, Hanson B, Yokoyama K-D, Wray GA. 2007. Promoter
regions of many neural- and nutrition-related genes have experienced
positive selection during human evolution. Nat Genet 39: 1140–1144.
Huelsenbeck J, Rannala B. 1997. Phylogenetic methods come of age: Testing
hypotheses in an evolutionary context. Science 276: 227–232.
Hwang D, Green P. 2004. Bayesian Markov chain Monte Carlo sequence
analysis reveals varying neutral substitution patterns in mammalian
evolution. Proc Natl Acad Sci 101: 13994–14001.
Keightley PD, Lercher MJ, Eyre-Walker A. 2005. Evidence for widespread
degradation of gene control regions in hominid genomes. PLoS Biol 3:
e42. doi: 10.1371/journal.pbio.0030042.
comparison of yeast species to identify genes and regulatory elements.
Nature 423: 241–254.
Kim SY, Pritchard JK. 2007. Adaptive evolution of conserved noncoding
elements in mammals. PLoS Genet 3: 1572–1586.
neutral theory of molecular evolution. Nature 267: 275–276.
Kondrashov AS, Crow JF. 1993. A molecular approach to estimating the
human deleterious mutation rate. Hum Mutat 2: 229–234.
2008. Patterns of positive selection in six mammalian genomes. PLoS
Genet 4: e1000144. doi: 10.1371/journal.pgen.1000144.
B, Raney BJ, Pohl A, Pheasant M, et al. 2009. The UCSC Genome Browser
Database: Update 2009. Nucleic Acids Res 37: D755–D761.
Lunter G, Ponting CP, Hein J. 2006. Genome-wide identification of human
functional DNA using a neutral indel model. PLoS Comput Biol 2: e5. doi:
Lunter G, Rocco A, Mimouni N, Heger A, Caldeira A, Hein J. 2008.
Uncertainty in homology inferences: Assessing and improving genomic
sequence alignment. Genome Res 18: 298–309.
Marais G. 2003. Biased gene conversion: Implications for genome and sex
evolution. Trends Genet 19: 330–338.
Margulies EH, Blanchette M, NISC Comparative Sequencing Program,
Haussler D, Green ED. 2003. Identification and characterization of
multi-species conserved sequences. Genome Res 13: 2507–2518.
Margulies EH, Cooper GM, Asimenos G, Thomas DJ, Dewey CN, Siepel A,
Birney E, Keefe D, Schwartz AS, Hou M, et al. 2007. Analyses of deep
mammalian sequence alignments and constraint predictions for 1% of
the human genome. Genome Res 17: 760–774.
McAuliffe JD, Jordan MI, Pachter L. 2005. Subtree power analysis and
species selection for comparative genomics. Proc Natl Acad Sci 102:
Miller W, Rosenbloom K, Hardison R, Hou M, Taylor J, Raney B, Burhans R,
King D, Baertsch R, Blankenberg D, et al. 2007. 28-Way vertebrate
alignment and conservation track in the UCSC Genome Browser.
Genome Res 17: 1797–1808.
Miyata T, Yasunaga T, Nishida T. 1980. Nucleotide sequence divergence and
functional constraint in mRNA evolution. Proc Natl Acad Sci 77: 7328–
variation in the rate of evolution in transcription factor binding sites.
BMC Evol Biol 3: 19. doi: 10.1186/1471-2148-3-19.
Moses AM, Chiang DY, Pollard DA, Iyer VN, Eisen MB. 2004. MONKEY:
Identifying conserved transcription-factor binding sites in multiple
alignments using a binding site-specific evolutionary model. Genome
Biol 5: R98. doi: 10.1186/gb-2004-5-12-r98.
Mouse Genome Sequencing Consortium. 2002. Initial sequencing and
comparative analysis of the mouse genome. Nature 420: 520–
deserts for long-range enhancers. Science 302: 413. doi: 10.1126/
Paten B, Herrero J, Beal K, Birney E. 2009. Sequence progressive alignment,
a framework for practical large-scale probabilistic consistency
alignment. Bioinformatics 25: 295–301.
Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander
conserved RNA secondary structures in the human genome. PLoS
Comput Biol 2: e33. doi: 10.1371/journal.pcbi.0020033.
Pollard K, Salama S, King B, Kern A, Dreszer T, Katzman S, Siepel A, Pedersen
regions in the human genome. PLoS Genet 2: e168. doi: 10.1371/
Pollard KS, Salama SR, Lambert N, Lambot M-A, Coppens S, Pedersen JS,
Katzman S, King B, Onodera C, Siepel A, et al. 2006b. An RNA gene
expressed during cortical development evolved rapidly in humans.
Nature 443: 167–172.
Prabhakar S, Noonan JP, Paabo S, Rubin EM. 2006. Accelerated evolution of
conserved noncoding sequences in humans. Science 314: 786. doi:
parameters with applications to problems of estimation. Proc Camb
Philol Soc 44: 50–57.
Rao CR. 2005. Score test: Historical review and recent developments. In
Advances in ranking and selection, multiple comparisons, and reliability (eds.
N Balakrishnan et al.), pp. 3–20. Birkhäuser, Boston, MA.
Satija R, Novak A, Miklos I, Lyngso R, Hein J. 2009. BigFoot: Bayesian
alignment and phylogenetic footprinting with MCMC. BMC Evol Biol 9:
217. doi: 10.1186/1471-2148-9-217.
Self S, Liang K. 1987. Asymptotic properties of maximum likelihood
estimators and likelihood ratio tests under nonstandard conditions.
J Am Stat Assoc 82: 605–610.
Siepel A, Haussler D. 2004a. Computational identification of evolutionarily
conserved exons. In Proc. 8th Int’l Conf. on Research in Computational
Molecular Biology, pp. 177–186. ACM Press, New York.
Siepel A, Haussler D. 2004b. Phylogenetic estimation of context-dependent
substitution rates by maximum likelihood. Mol Biol Evol 21: 468–488.
Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K,
Clawson H, Spieth J, Hillier LW, Richards S, et al. 2005. Evolutionarily
conserved elements in vertebrate, insect, worm, and yeast genomes.
Genome Res 15: 1034–1050.
Siepel A, Pollard K, Haussler D. 2006. New methods for detecting lineage-
specific selection. In Proc. 10th Int’l Conf. on Research in Computational
Molecular Biology, pp. 190–205. Springer-Verlag, Berlin, Germany.
Siepel A, Diekhans M, Brejova B, Langton L, Stevens M, Comstock C, Davis
C, Ewing B, Oommen S, Lau C, et al. 2007. Targeted discovery of novel
human exons by comparative genomics. Genome Res 17: 1763–1773.
Stone EA, Cooper GM, Sidow A. 2005. Trade-offs in detecting evolutionarily
constrained sequence by comparative genomics. Annu Rev Genomics
Hum Genet 6: 143–164.
Wolfe KH, Sharp PM, Li W-H. 1989. Mutation rates differ among regions of
the mammalian genome. Nature 337: 283–285.
Wong WSW, Nielsen R. 2004. Detecting selection in noncoding regions of
nucleotide sequences. Genetics 167: 949–958.
Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES,
Kellis M. 2005. Systematic discovery of regulatory motifs in human
promoters and 3′ UTRs by comparison of several mammals. Nature 434:
Yang Z. 1994. Maximum likelihood phylogenetic estimation from DNA
Received June 26, 2009; accepted in revised form October 5, 2009.