A “Forward Genomics” Approach Links Genotype
to Phenotype using Independent Phenotypic
Losses among Related Species
Michael Hiller,1,5,* Bruce T. Schaar,1 Vahan B. Indjeian,1,3 David M. Kingsley,1,3 Lee R. Hagey,4 and Gill Bejerano1,2,*
1Department of Developmental Biology
2Department of Computer Science
Stanford University, Stanford, CA 94305, USA
3Howard Hughes Medical Institute
4Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
5Present address: Max Planck Institute of Molecular Cell Biology and Genetics and Max Planck Institute for the Physics of Complex Systems,
01307 Dresden, Germany
*Correspondence: email@example.com (M.H.), firstname.lastname@example.org (G.B.)
Genotype-phenotype mapping is hampered by
countless genomic changes between species. We
introduce a computational “forward genomics”
strategy that—given only an independently lost
phenotype and whole genomes—matches genomic
and phenotypic loss patterns to associate specific
genomic regions with this phenotype. We conducted
genome-wide screens for two metabolic pheno-
types. First, our approach correctly matches the in-
activated Gulo gene exactly with the species that
lost the ability to synthesize vitamin C. Second, we
attribute naturally low biliary phospholipid levels in
guinea pigs and horses to the inactivated phospho-
lipid transporter Abcb4. Human ABCB4 mutations
also result in low phospholipid levels but lead
to severe liver disease, suggesting compensatory
mechanisms in guinea pig and horse. Our simulation
studies, counts of independent changes in existing
phenotype surveys, and the forthcoming availability
of many new genomes all suggest that forward genomics
can be applied to many phenotypes, including
those relevant for human evolution and disease.
Despite a wealth of information obtained by comparative geno-
mics (Green et al., 2010; McLean et al., 2011; Pollard et al.,
2006; Prabhakar et al., 2006; Zhu et al., 2007), it remains
extremely difficult to link any of the millions of genomic changes
with the many interesting and important phenotypic differ-
ences found between even closely related species (Cheng
et al., 2005; Varki and Altheide, 2005). To overcome this diffi-
culty, we devise a method that takes advantage of parallel
evolution of phenotypic traits across a larger phylogeny. Due
to purifying selection, extant species that preserve an ancestral
trait should also conserve the ancestral genes—or, more
generally, genomic regions—underlying this trait (Figure 1A).
In contrast, in species that lose this trait, nonpleiotropic
genomic regions necessary only for the lost trait should switch
to evolve neutrally following the phenotypic loss. If sufficient
evolutionary time has passed since trait loss, relaxed selection
will erode these DNA sequences in trait-loss lineages, resulting
in elevated divergence from the ancestral sequence or even
complete loss. If trait loss happens in independent lineages,
these regions should specifically diverge in exactly those line-
ages over time, regardless of whether the initial inactivating
mutations leading to trait loss are identical in the independent
lineages (Figure 1, and exemplified below). Thus, independent
trait loss should give a unique evolutionary sequence diver-
gence signature that violates the expected evolutionary pattern
Together with approaches such as genome-wide association
studies (GWAS), our strategy is conceptually similar to forward
genetics, where one starts with a given phenotype and seeks
the underlying mutation. We call this family of approaches
“forward genomics.” With two examples and simulation studies,
we show that—given a specific, independently lost phenotype—
a cross-species genomic screen for genes and genomic regions
with this specific signature can highlight functional components
of the genetic information for this trait.
RESULTS AND DISCUSSION
Measuring Sequence Divergence
To measure sequence divergence, we first reconstruct the most
likely DNA sequence of the boreoeutherian ancestor (Figure 2C)
separately for each mammalian conserved (coding or noncod-
ing) region, with an estimated 98% accuracy (Blanchette et al.,
2004). Then, we compute the percent of identical bases pre-
served by each extant species, while taking great care to distin-
guish assembly gaps (missing data) and low-quality sequences
from real mutations. For coding genes, we combine DNA se-
quence divergence of all coding exons. We then screen for
genes or individual regions where all trait-loss species have
Cell Reports 2, 817–823, October 25, 2012 ©2012 The Authors 817
diverged more from the ancestral sequence (lower percent of
identical bases) than all trait-preserving species.
Gulo Loss Perfectly Matches the Loss of Vitamin C
The ability to synthesize the essential nutrient L-ascorbic acid
(vitamin C) is an ancestral trait that arose in vertebrates. Using
a biochemical assay, it has been found that—unlike most
mammals—guinea pigs, certain primates including humans,
and multiple bat species have lost the ability to synthesize
sizing species require a dietary source of vitamin C to prevent
scurvy, a disease in which defective collagen prevents the for-
mation of strong connective tissue. Mapping the presence/
mammals (termed “phenotree” hereafter) shows that this trait
was lost at least four times independently (Figure 2C).
We asked whether forward genomics can discover the
underlying cause for loss of vitamin C synthesis, given only
this phenotree and a whole genome alignment of species for
which vitamin C-synthesizing ability was biochemically mea-
sured (Extended Experimental Procedures available online).
This genomic screen finds gulonolactone (L-) oxidase (Gulo)
as the only gene that perfectly matches our phenotree with
substantially more divergence in all nonsynthesizers (Figure 2).
The detection of Gulo is extremely robust to details of our
forward genomics screen. Gulo is repeatedly singled out
when screening individual exons instead of genes, even when
doing a genome-wide screen of 544,549 individual conserved
coding and noncoding regions, when reconstructing to dif-
ferent ancestors, and also when measuring divergence after
correcting for the different evolutionary rates of different species
Gulo encodes a key enzyme responsible for vitamin C
synthesis. A targeted Gulo knockout in mouse abolishes the
ability to synthesize vitamin C (Maeda et al., 2000). Inactivating
mutations in the Gulo gene have been identified in several
nonsynthesizing species (Cui et al., 2011b; Nishikimi et al.,
1992, 1994; Ohta and Nishikimi, 1999). We find Gulo inactiva-
tion in additional species (Figure S2) and show that Gulo is
inactivated in all and only the sequenced nonsynthesizers.
Other sequenced mammals not measured for their ability to
synthesize vitamin C (kangaroo rat, pika, alpaca, dolphin, and
hedgehog) all lack Gulo-inactivating mutations, suggesting trait
retention. Although Gulo is a clear candidate gene for this
trait, forward genomics can detect it based on the vitamin C
phenotree and genome alignments alone. Gulo also exemplifies
that forward genomics does not require that the initial inactivat-
ing mutations leading to trait loss are identical in different line-
ages (Figure 1), as no single exon exhibits an inactivating
mutation in all four nonsynthesizing lineages (Figure S2A), sug-
gesting independent gene inactivation at different genomic
Abcb4 Loss Perfectly Matches Low Biliary Phospholipid
We next tested whether our forward genomics approach can
be applied to continuous traits that vary in magnitude between
species. The composition of bile has been studied in many
animals due to its relevance in digestion, cholesterol metabo-
lism, and gall stone formation. The levels of biliary phospho-
lipids, which protect against bile-acid-induced cell damage
(Puglielli et al., 1994), vary in mammals (Figure 3A). Whereas
most measured mammals have levels well above 1 mM, in-
cluding humans at 2.8 mM, biliary phospholipid levels are
particularly low in guinea pig (0.11 mM) and horse (0.38 mM)
(Coleman et al., 1979; Engelking et al., 1989). We converted
the continuous trait into a presence/absence trait by postu-
lating that a subset of species with the lowest phospholipid
levels may share a common associated genomic loss signa-
ture. Screening only the 11 genomes of species with measured
phospholipid levels (Extended Experimental Procedures), there
are 796 genes that are most diverged in guinea pig alone
(our lowest value): a list too long to analyze. However, when
grouping guinea pig and horse (next-lowest phospholipid
level), only eight genes are more diverged in these two indepen-
dent lineages (Figures 3A and 3B). Of these eight genes, only
Abcb4 has a bile-related function (Figure 3C). Abcb4 is also
the strongest candidate gene when screening at the exon level
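The conversion of a continuous trait into candidate presence/absence groupings can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `candidate_loss_sets` is hypothetical, and only the three phospholipid levels quoted in the text are used as input.

```python
def candidate_loss_sets(levels, max_size):
    """Return nested candidate trait-loss sets: the k lowest-valued species for k = 1..max_size."""
    # Species ordered from lowest to highest trait value
    ranked = sorted(levels, key=levels.get)
    return [set(ranked[:k]) for k in range(1, max_size + 1)]

# Biliary phospholipid levels in mM, as quoted in the text
levels_mM = {"human": 2.8, "guinea pig": 0.11, "horse": 0.38}

for loss_set in candidate_loss_sets(levels_mM, max_size=2):
    print(sorted(loss_set))
```

Each candidate set can then be screened as a separate presence/absence phenotree, which is how the guinea pig-only and guinea pig plus horse groupings arise.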
Figure 1 panels: (A) ancestral trait inheritance; (B) first trait loss; (C) shortly after third trait loss; (D) some evolutionary time later. Legend: trait present; trait-inactivating mutation; other functional region. Arrow: progression in time.
Figure 1. Evolutionary Model and Assumptions
(A) An ancestral trait is passed to descendant
species, along with the genomic regions required
for this trait, which evolve under purifying selection.
(B) One lineage loses the ancestral trait due to an
inactivating mutation in a trait-required region.
(C) Following trait loss, all trait-specific (non-
pleiotropic) regions switch to evolve neutrally and
begin to accumulate random mutations in the
first trait-loss lineage. Meanwhile, two additional
independent lineages lose this trait, due to inde-
pendent mutations occurring either in the same or
in other trait-required regions.
(D) All trait-specific regions continue to erode independently in the three different trait-loss lineages, whereas their counterparts in the trait-preserving species
are conserved due to purifying selection. This characteristic evolutionary signature can be detected with forward genomics, revealing functional components of
this (monogenic or polygenic) trait.
Abcb4 encodes the multidrug resistance 3 P-glycoprotein,
a transporter that is crucial for phospholipid secretion into
bile. Interestingly, ABCB4 mutations in humans lead to biliary
phospholipid levels similar to those in guinea pigs (0.05–
0.13 mM), resulting in gall stone formation and bile canaliculi
damage that can often only be treated by liver transplantation
(Davit-Spraul et al., 2010). Targeted Abcb4 knockout mice have
undetectable biliary phospholipid levels (Smit et al., 1993) and
are used as models for the human disease state. Our screen is
the first to discover numerous Abcb4 gene-inactivating muta-
tions, as well as relaxed selection on the defunct Abcb4 protein
sequence in naturally occurring species (Figure 3; Table S1).
The discovery that two well-studied mammals have naturally in-
genomic changes in guinea pigs and horses presumably com-
pensate for deleterious consequences of Abcb4 loss. Whereas
horse bile is not well characterized, guinea pig bile differs from
the bile of other rodents by being very dilute and highly concen-
trated in the less hydrophobic ursodeoxycholic acid (Hofmann
et al., 2010). Further studies into nature’s own compensatory
mechanisms may lead to new therapeutic targets and strategies
for ameliorating the consequences of Abcb4 mutations in
Simulating Trait Loss Suggests Broad Forward
To further test whether neutral evolution following independent
trait loss results in an evolutionary signature that can be
detected by our forward genomics approach, we simulated
the independent loss of the vitamin C and biliary phospholipid
traits. We used the real mammalian phylogeny and computa-
tionally evolved a short ancestral genome containing 103,088
coding and noncoding regions. Following the assumptions in
Figure 1, we simulated trait loss by evolving the 11 (27) exons
Figure 2. A Forward Genomics Screen to Match an Ancestral Presence/Absence Trait Pinpoints Gulo Inactivation in Vitamin C-Non-
(A) For every gene (dot) in the mouse genome (x axis), we measured how well it matches the given phenotree by counting the number of species (y axis) whose
divergence level violates the expectation of divergence or conservation based on the vitamin C phenotree shown in (C). Gulo, with 0 violations, is the only gene
that perfectly matches.
(B) Elevated ratio of nonsynonymous to synonymous (Ka/Ks) substitutions shows that remaining megabat and guinea pig exons evolve under relaxed pressure to
preserve the Gulo protein sequence.
(C) Nonsynthesizing species show elevated sequence divergence in the Gulo coding sequence, with a divergence margin (gray) that perfectly separates them
from synthesizing species. Note that the microbat and megabat lineage have independently lost this trait, as intermediate bat species (without a sequenced
genome) were biochemically shown to synthesize vitamin C (Cui et al., 2011a).
(D) Graphical sequence alignment of the Gulo coding region. Rows match species in (C). Large deletions (red blocks) occurred only in nonsynthesizing species.
See also Figures S1 and S2.
of Gulo (Abcb4) neutrally in the independent trait-loss species,
while evolving other regions under purifying selection. We then
tested whether forward genomics can find some of the 11 (27)
exons of Gulo (Abcb4) among the >100,000 other regions,
based only on their specific divergence signature. We found
that nearly all simulation runs detect Gulo/Abcb4 exons, and
that these genes most often are the only/strongest hit (Figures
4A–4C). The simulated data offer support that the specific
evolutionary signature of independent trait loss can be used to
identify the genes for both traits by highlighting some of their
We further used the simulation framework to test other trait-
loss scenarios. We found that performance depends on the
evolutionary time since trait loss, the evolutionary rate of change
the strength of purifying selection in trait-preserving species
Figure 3. Forward Genomics Implicates Independent Inactivation of the Human Disease Gene ABCB4 in Two Species with Low Levels of
(A) The level of biliary phospholipids is a continuous trait that varies over 200-fold between mammals. Seven hundred and ninety-six genes show more divergence
(B) We plot (y axis) the number of violations of each gene (dot) in the mouse genome (x axis) against the biliary phospholipid level phenotree in (E). The eight genes
with 0 violations are labeled.
(C) Of the eight genes, only Abcb4 (bold) has a bile-related function.
(D) Increased Abcb4 nonsynonymous to synonymous (Ka/Ks) substitution ratios for guinea pig and horse.
(E and F) Divergence from the reconstructed common ancestor (E) and a graphical sequence alignment representation (F) of the Abcb4 coding sequence reveal
elevated divergence and deletions (red blocks) in trait-loss species only.
See also Figure S3 and Table S1.
(Figures S4A–S4D). Importantly, despite these dependencies,
forward genomics can detect some genetic trait components
for a wide range of plausible trait-loss scenarios, with the excep-
tion of very recent losses.
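A back-of-the-envelope calculation illustrates why very recent losses are hard to detect. The sketch below assumes the Jukes-Cantor (JC69) substitution model, which is a simplification chosen here for illustration and is not the model used in the simulations; the function name is hypothetical. Under JC69, a neutrally evolving sequence that has accumulated d substitutions per site is expected to retain a fraction 1/4 + 3/4 exp(-4d/3) of bases identical to the ancestor.

```python
import math

def expected_percent_identity_jc69(d):
    """Expected percent of sites identical to the ancestor after d substitutions per site (JC69)."""
    return 100.0 * (0.25 + 0.75 * math.exp(-4.0 * d / 3.0))

# Neutral divergence times in substitutions per site (0.05 and 0.1 appear in the simulations)
for d in (0.01, 0.05, 0.1):
    print(f"d = {d}: {expected_percent_identity_jc69(d):.1f}% identical")
```

Under this simplified model, 0.05 and 0.1 substitutions per site of neutral evolution cost roughly 5% and 9% identity, whereas 0.01 substitutions per site costs only about 1%, which is close to the 1% margin required by the screen.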
Many Phenotypes Are Changed in Independent
Our forward genomics approach is constrained by the existence
and accessibility of independently lost traits. Fortunately, the
scientific community has been scoring traits since the days
of Linnaeus (Linnaeus, 1758), with patterns of trait evolution
collected into large databases such as MorphoBank
(http://www.morphobank.org). We analyzed three vertebrate phenotype
sets (selected for availability of a corresponding molecular
phylogeny and for addressing distinct vertebrate clades),
measuring 207, 166, and 88 different traits in ~20 species of bats,
primates, and anglerfish, respectively. By mapping traits to the
phylogeny, we find that ~40% of scored traits show changes
in independent lineages, consistent across all three sets
(Figure 4D). In all cases, the sampled species make up only a fraction
of their clade. As expected, the number of independently changed traits
should increase as more and more species are sampled
(Figure S4E). Few of the species sampled in these phenotype sets
are currently sequenced, but many of them are targeted for
genome sequencing by the Genome10K project (Haussler
hundreds of phenotypes can be subjected to our approach.
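One way to check whether a scored trait has changed in independent lineages is to count the minimum number of state changes it requires on the phylogeny. The text does not state which mapping method was used, so the sketch below assumes Fitch parsimony on a rooted binary tree; the tree, the species names, and the trait scores are hypothetical toy data.

```python
def fitch(tree, states):
    """Return (possible state set, minimum number of changes) for a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple (left_subtree, right_subtree).
    states: dict mapping leaf name -> observed trait state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    # No shared state between the two subtrees: one more change is required
    return left_set | right_set, left_changes + right_changes + 1

# Hypothetical toy phylogeny and trait scores (1 = present, 0 = absent)
toy_tree = ((("sp1", "sp2"), "sp3"), ("sp4", "sp5"))
toy_states = {"sp1": 0, "sp2": 1, "sp3": 1, "sp4": 0, "sp5": 1}

_, min_changes = fitch(toy_tree, toy_states)
print(min_changes)
```

A trait whose minimum number of changes is two or more has changed on at least two separate branches, which is the condition the screen requires.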
By requiring a longer evolutionary timescale and focusing
on between-species differences, our method complements
approaches, such as GWAS, to map phenotypic differences
between individuals of a population. The biliary phospholipid
example suggests potential applicability to continuous traits by
testing different thresholds, to traits with changes in only two
independent lineages, and to traits where phenotypic informa-
tion is available for only a few of the sequenced species. Our
forward genomics approach relies only on ancestral DNA se-
quence information loss and can therefore detect both coding
and noncoding regions (Figure S4). Pleiotropic regions neces-
sary for both the lost and unrelated traits may only experience
a relaxation of selective pressure, making it harder to detect
them. However, although the degree of pleiotropy in our genome
is still not completely known, recent work provides evidence that
many genes and quantitative trait loci (QTLs) affect only a very
small number of traits (Wagner and Zhang, 2011), and regulatory
noncoding regions are thought to be less pleiotropic than the
genes they regulate (Carroll, 2008). Also, future screens could
improve on detecting the more subtle signatures of relaxed
selection. Although forward genomics may not be successful
for all trait losses, the availability of thousands of vertebrate
and arthropod genomes in the coming years (Haussler et al.,
Figure 4. Broad Applicability of Our Forward Genomics Approach
(A) We show the branches in the phylogeny that evolve neutrally for the trait-associated gene in the trait-loss simulation (red: vitamin C synthesis; green: biliary
phospholipids). For biliary phospholipids, we simulated a loss that happened either 0.05 or 0.1 substitutions per site ago.
(B) Simulations suggest that the evolutionary signature of independent loss of vitamin C synthesis can highlight exons of the trait-associated gene in nine of ten
iterations (iteration 5 gave no hit). We observed no false positives.
(C) Simulations of the biliary phospholipid trait show that in at least seven of ten iterations, the single top-ranked hit is an exon of the trait-associated gene (shown
true positives (green) usually rank highly.
(D) In three very different vertebrate phenotype-scoring studies, an average of 42% of phenotypes have changes in two or more independent lineages, the
conditions required for forward genomics analysis.
See also Figure S4.
2009; Robinson et al., 2011) suggests applicability to many
additional traits. Leveraging the unique signature of indepen-
dently evolved phenotypic patterns can help to tie the study
of nature’s tremendous phenotypic diversity to its underlying
genomic basis. To this end, we provide a web portal at http://
phenotree.stanford.edu/ to allow users to run any phenotree
search against our data. The portal provides a visualization
and download of the results and links to the University of Califor-
nia, Santa Cruz (UCSC) genome browser. Finally, it is not
straightforward to distinguish real mutations from genomic arti-
facts. Most current gene family collections focus only on collect-
ing intact members by mapping gene structures across species.
When mapping fails, no call is made to distinguish between true
gene loss and an unresolved state due to low quality or missing
sequence data. A systematic search for genes lost in wild-type
species will likely suggest new organisms useful for studying
the phenotypic consequences of human disease mutations.
Alignments and Input Data
We extended the mouse (mm9 assembly) 30-way genome alignment provided
by the UCSC genome browser (Dreszer et al., 2012) by 13 species, using the
UCSC pipeline of lastz, chaining, netting, and multiz (Extended Experimental
Procedures). The alignment is available through our portal. To obtain con-
served regions, we downloaded PhastCons (Siepel et al., 2005) most con-
served elements from the UCSC genome browser. Elements within 30 bp of
each other were merged. We retained only elements ≥70 bp after merging.
Exons of “known genes” were downloaded from the UCSC genome browser.
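The merging and length filter applied to the conserved elements can be sketched as follows. This is an illustrative sketch, not the pipeline that was run: it assumes half-open (start, end) coordinates on a single chromosome, reads "within 30 bp" as a gap of at most 30 bp, and uses a hypothetical function name and made-up coordinates.

```python
def merge_and_filter(elements, max_gap=30, min_len=70):
    """Merge elements separated by at most max_gap bp, then keep those of at least min_len bp.

    elements: iterable of (start, end) half-open intervals on one chromosome.
    """
    merged = []
    for start, end in sorted(elements):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous element: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged if e - s >= min_len]

# Made-up coordinates: the first two elements merge, the third is too short
toy_elements = [(100, 150), (170, 200), (400, 420), (1000, 1080)]
print(merge_and_filter(toy_elements))
```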
Ancestral Sequence Reconstruction and Percent Identity Values
platypus, chicken, zebra finch, or lizard), we used prequel (Siepel et al., 2005)
(parameters --no-probs --keep-gaps) to reconstruct the ancestral sequences.
We define the percent of identical bases in the pairwise alignment between
ancestor and extant species as id/(id + subs + ins + del) * 100, where id, subs,
ins, and del are the numbers of identical bases, substitutions, inserted bases,
and deleted bases, respectively. We ignore low-quality base positions (>1%
error rate). We consider the following large-scale events: (1) Large lineage-
specific insertions were counted as inserted bases. (2) Parts of the conserved
region that are unalignable between ancestor and extant species (defined as
being so diverged that no sequence alignment can be computed; unalignable
regions are annotated in the multiple alignment) were added to the number of
substitutions. Specifically, we add the number of ancestral bases correspond-
ing to this unalignable region if the unalignable part is at the end, or we add the
maximum of the number of ancestral bases and the number of extant bases in
this region if the unalignable part is flanked by aligning blocks on both sides.
Unalignable regions that map to an assembly gap in this species were ignored.
If the entire conserved region is unalignable or deleted, the percent identity (%id)
is set to 0. (3) If the aligning sequence is not colinear in the extant species
(different strands or different chromosomes/scaffolds), we do not compute
a %id value for this species because these cases frequently arise due to
incomplete genome assemblies or assembly artifacts. Percent identity ranges
from 0 (complete loss) to 100 (identity to the ancestor). For gene-based
screens, we computed a %id value for the entire coding region of a gene by
summing the number of matches for each coding exon and dividing that
number by the summed alignment lengths.
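The percent identity formula and the gene-level aggregation described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the alignment length of an exon is taken to be id + subs + ins + del, and a region with no counted positions is treated as missing data. The handling of unalignable regions, low-quality bases, and noncolinear alignments is not shown.

```python
def percent_identity(ident, subs, ins, dele):
    """%id = id / (id + subs + ins + del) * 100, or None when no positions were counted."""
    total = ident + subs + ins + dele
    if total == 0:
        return None  # treated here as missing data (an assumption)
    return 100.0 * ident / total

def gene_percent_identity(exons):
    """Gene-level %id: summed matches divided by summed alignment lengths of all coding exons.

    exons: list of (ident, subs, ins, dele) tuples, one per coding exon.
    """
    matches = sum(e[0] for e in exons)
    length = sum(sum(e) for e in exons)
    return 100.0 * matches / length if length else None

# Made-up counts for a two-exon gene
toy_exons = [(95, 3, 1, 1), (40, 10, 0, 0)]
print([percent_identity(*e) for e in toy_exons])
print(gene_percent_identity(toy_exons))
```

Because matches and lengths are summed before dividing, the gene-level value is weighted by exon length rather than being the mean of the per-exon values.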
Screening for Conserved Regions that Evolve Faster in Species
Lacking a Trait
We searched for regions/genes where the %id values in all trait-loss species
are at least 1% lower than in all trait-preserving species. We ignored species
with missing data (no %id value). We excluded regions that had missing
data for many species. To get a quantitative measurement of how well a
region/gene matches the phenotree, we computed the minimal number of
“violating” trait-loss or trait-preserving species whose %id values have to
is the number of trait-loss species for both vitamin C and biliary phospholipids,
as there are more trait-preserving than trait-loss species.
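The perfect-match criterion of the screen can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function name is hypothetical, the 1% margin is interpreted as one percentage point of %id, species with missing data are represented by `None`, and the %id values are made up. The violation count is not shown.

```python
def matches_phenotree(pid, loss, preserving, margin=1.0):
    """True if every trait-loss species is at least `margin` %id below every trait-preserving species.

    pid: dict mapping species -> %id, with None for missing data (such species are ignored).
    """
    loss_vals = [pid[s] for s in loss if pid.get(s) is not None]
    pres_vals = [pid[s] for s in preserving if pid.get(s) is not None]
    if not loss_vals or not pres_vals:
        return False  # nothing to compare
    # The least diverged trait-loss species must still be below the most diverged preserving one
    return min(pres_vals) - max(loss_vals) >= margin

# Made-up %id values for one region
toy_pid = {"human": 62.0, "guinea pig": 71.5, "mouse": 96.0, "dog": 95.2, "cow": None}
print(matches_phenotree(toy_pid, {"human", "guinea pig"}, {"mouse", "dog", "cow"}))
```

Comparing the maximum over trait-loss species with the minimum over trait-preserving species is equivalent to checking every pair, and it makes the divergence margin explicit.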
Bioinformatics Analysis of the Gulo Locus and Resequencing
Standard bioinformatics analysis of the Gulo gene and its genomic locus as
well as standard PCR-based resequencing are described in the Extended
Simulating Independent Trait Loss
evolver/) with standard parameters (Extended Experimental Procedures) and
the real mammalian phylogeny. To simulate independent trait loss, we
randomly picked a small number of functional regions and assumed that
they encode trait-specific information (true positives). All other regions are
false positives. We evolved these regions neutrally for a fixed terminal part
of the branch leading to a trait-loss lineage.
We restricted the ancestral genome simulation to human chromosome 1
with its 1,603 RefSeq genes (downloaded from UCSC) as this already
consumed ~50,000 CPU hours and 1.6 TB of disk space. We considered all
coding exons and functional noncoding regions ≥70 bp that have an average
accept probability of ≤0.7 (i.e., we ignore regions under very weak selection),
giving a total of 103,088 regions. The selected trait-specific regions (≤27 in all
tests) comprise only 0.026% of all regions. Evolver outputs the pairwise align-
ment of the evolved and ancestral genome, which we used to measure %id for
trait-loss species are ≥1% more diverged than all trait-preserving species.
To simulate the loss of vitamin C synthesis, we picked a gene with 11 coding
exons (like Gulo; 11 true positives). True positives evolved neutrally for the
branches in the human-tarsier subtree, the final 0.07 substitutions per site of
the guinea pig branch, the microbat (M. lucifugus) branch, and the final
0.045 substitutions per site of the megabat (P. vampyrus) branch (Extended
a gene with 27 coding exons (like Abcb4) and simulated two scenarios: trait
loss 0.05 and 0.1 substitutions per site ago.
The GenBank accession numbers for the Gulo resequencing data are
Supplemental Information includes Extended Experimental Procedures, four
figures, and one table and can be found with this article online at http://dx.
This is an open-access article distributed under the terms of the Creative
Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported
License (CC-BY-NC-ND; http://creativecommons.org/licenses/by-nc-nd/3.
We thank the Mammalian Genome Project for available mammalian genomes,
the Broad Institute for the microbat assembly, David Ray for microbat tissue,
Julie Feinstein and the Ambrose Monell Cryo Collection (AMCC) at the Amer-
UCSC genome browser team for software and genome annotations, Hiram
Clawson and Brian Raney for help with sequence quality scores and alignment
annotations, Karla Neugebauer and members of the Bejerano lab for helpful
discussions, and Ravi Parikh and Harendra Guturu for our web portal. This
work was supported by fellowships from the German Research Foundation
(Hi 1423/2-1) and Human Frontier Science Program (LT000896/2009-L) to
M.H. and NIH grants R01HD059862 and R01HG005058 and the NSF Center
for Science of Information (CSoI) under grant agreement CCF-0939370 to
G.B. D.M.K. is an investigator of the Howard Hughes Medical Institute. G.B.
is a Packard Fellow and Microsoft Faculty Fellow.
Received: April 26, 2012
Revised: July 31, 2012
Accepted: August 30, 2012
Published online: September 27, 2012
Blanchette, M., Green, E.D., Miller, W., and Haussler, D. (2004). Reconstruct-
ing large regions of an ancestral mammalian genome in silico. Genome Res.
Carroll, S.B. (2008). Evo-devo and an expanding evolutionary synthesis:
a genetic theory of morphological evolution. Cell 134, 25–36.
Cheng, Z., Ventura, M., She, X., Khaitovich, P., Graves, T., Osoegawa, K.,
Church, D., DeJong, P., Wilson, R.K., Pääbo, S., et al. (2005). A genome-wide
comparison of recent chimpanzee and human segmental duplications.
Nature 437, 88–93.
Coleman, R., Iqbal, S., Godfrey, P.P., and Billington, D. (1979). Membranes
and bile formation. Composition of several mammalian biles and their
membrane-damaging properties. Biochem. J. 178, 201–208.
Cui, J., Pan, Y.H., Zhang, Y., Jones, G., and Zhang, S. (2011a). Progressive
pseudogenization: vitamin C synthesis and its loss in bats. Mol. Biol. Evol.
Cui, J., Yuan, X., Wang, L., Jones, G., and Zhang, S. (2011b). Recent loss of
vitamin C biosynthesis ability in bats. PLoS ONE 6, e27114.
Davit-Spraul, A., Gonzales, E., Baussan, C., and Jacquemin, E. (2010). The
spectrum of liver diseases related to ABCB4 gene mutations: pathophysiology
and clinical aspects. Semin. Liver Dis. 30, 134–146.
Dreszer, T.R., Karolchik, D., Zweig, A.S., Hinrichs, A.S., Raney, B.J., Kuhn,
R.M., Meyer, L.R., Wong, M., Sloan, C.A., Rosenbloom, K.R., et al. (2012).
The UCSC Genome Browser database: extensions and updates 2011. Nucleic
Acids Res. 40(Database issue), D918–D923.
Engelking, L.R., Anwer, M.S., and Hofmann, A.F. (1989). Basal and bile salt-
stimulated bile flow and biliary lipid excretion in ponies. Am. J. Vet. Res. 50,
Green, R.E., Krause, J., Briggs, A.W., Maricic, T., Stenzel, U., Kircher, M., Pat-
terson, N., Li, H., Zhai, W., Fritz, M.H., et al. (2010). A draft sequence of the
Neandertal genome. Science 328, 710–722.
Haussler, D., O’Brien, S., Ryder, O., Barker, F., Clamp, M., Crawford, A., Han-
ner, R., Hanotte, O., Johnson, W., McGuire, J., et al; Genome 10K Community
of Scientists. (2009). Genome 10K: a proposal to obtain whole-genome
sequence for 10,000 vertebrate species. J. Hered. 100, 659–674.
Hofmann, A.F., Hagey, L.R., and Krasowski, M.D. (2010). Bile salts of verte-
brates: structural variation and possible evolutionary significance. J. Lipid
Res. 51, 226–246.
Linnaeus, C. (1758). Systema Naturae (Holmiae: Impensis Direct).
Linster, C.L., and Van Schaftingen, E. (2007). Vitamin C. Biosynthesis, recy-
cling and degradation in mammals. FEBS J. 274, 1–22.
Maeda, N., Hagihara, H., Nakata, Y., Hiller, S., Wilder, J., and Reddick, R.
(2000). Aortic wall damage in mice unable to synthesize ascorbic acid. Proc.
Natl. Acad. Sci. USA 97, 841–846.
McLean, C.Y., Reno, P.L., Pollen, A.A., Bassan, A.I., Capellini, T.D., Guenther,
C., Indjeian, V.B., Lim, X., Menke, D.B., Schaar, B.T., et al. (2011). Human-
specific loss of regulatory DNA and the evolution of human-specific traits.
Nature 471, 216–219.
Nishikimi, M., Kawai, T., and Yagi, K. (1992). Guinea pigs possess a highly
mutated gene for L-gulono-gamma-lactone oxidase, the key enzyme for L-as-
corbic acid biosynthesis missing in this species. J. Biol. Chem. 267, 21967–
Nishikimi, M., Fukuyama, R., Minoshima, S., Shimizu, N., and Yagi, K. (1994).
lono-gamma-lactone oxidase, the enzyme for L-ascorbic acid biosynthesis
missing in man. J. Biol. Chem. 269, 13685–13688.
Ohta, Y., and Nishikimi, M. (1999). Random nucleotide substitutions in primate
nonfunctional gene for L-gulono-gamma-lactone oxidase, the missing enzyme
in L-ascorbic acid biosynthesis. Biochim. Biophys. Acta 1472, 408–411.
Pollard, K.S., Salama, S.R., Lambert, N., Lambot, M.A., Coppens, S., Peder-
sen, J.S., Katzman, S., King, B., Onodera, C., Siepel, A., et al. (2006). An
RNA gene expressed during cortical development evolved rapidly in humans.
Nature 443, 167–172.
Prabhakar, S., Noonan, J.P., Pääbo, S., and Rubin, E.M. (2006). Accelerated
evolution of conserved noncoding sequences in humans. Science 314, 786.
Puglielli, L., Amigo, L., Arrese, M., Núñez, L., Rigotti, A., Garrido, J., González,
S., Mingrone, G., Greco, A.V., Accatino, L., et al. (1994). Protective role of
biliary cholesterol and phospholipid lamellae against bile acid-induced cell
damage. Gastroenterology 107, 244–254.
Robinson, G.E., Hackett, K.J., Purcell-Miramontes, M., Brown, S.J., Evans,
J.D., Goldsmith, M.R., Lawson, D., Okamuro, J., Robertson, H.M., and
Schneider, D.J. (2011). Creating a buzz about insect genomes. Science 331,
Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom,
K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., et al. (2005). Evolution-
arily conserved elements in vertebrate, insect, worm, and yeast genomes.
Genome Res. 15, 1034–1050.
Smit, J.J., Schinkel, A.H., Oude Elferink, R.P., Groen, A.K., Wagenaar, E., van
Deemter, L., Mol, C.A., Ottenhoff, R., van der Lugt, N.M., van Roon, M.A., et al.
(1993). Homozygous disruption of the murine mdr2 P-glycoprotein gene leads
to a complete absence of phospholipid from bile and to liver disease. Cell 75,
Varki, A., and Altheide, T.K. (2005). Comparing the human and chimpanzee
genomes: searching for needles in a haystack. Genome Res. 15, 1746–1758.
Wagner, G.P., and Zhang, J. (2011). The pleiotropic structure of the genotype-
phenotype map: the evolvability of complex organisms. Nat. Rev. Genet. 12,
Zhu, J., Sanborn, J.Z., Diekhans, M., Lowe, C.B., Pringle, T.H., and Haussler,
D. (2007). Comparative genomics search for losses of long-established genes
on the human lineage. PLoS Comput. Biol. 3, e247.