A “Forward Genomics” Approach Links Genotype to Phenotype using Independent Phenotypic Losses among Related Species

Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA.
Cell Reports (Impact Factor: 8.36). 09/2012; 2(4). DOI: 10.1016/j.celrep.2012.08.032
Source: PubMed


Genotype-phenotype mapping is hampered by countless genomic changes between species. We introduce a computational "forward genomics" strategy that, given only an independently lost phenotype and whole genomes, matches genomic and phenotypic loss patterns to associate specific genomic regions with this phenotype. We conducted genome-wide screens for two metabolic phenotypes. First, our approach correctly matches the inactivated Gulo gene exactly with the species that lost the ability to synthesize vitamin C. Second, we attribute naturally low biliary phospholipid levels in guinea pigs and horses to the inactivated phospholipid transporter Abcb4. Human ABCB4 mutations also result in low phospholipid levels but lead to severe liver disease, suggesting compensatory mechanisms in guinea pig and horse. Our simulation studies, counts of independent changes in existing phenotype surveys, and the forthcoming availability of many new genomes all suggest that forward genomics can be applied to many phenotypes, including those relevant for human evolution and disease.
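
The core idea of matching genomic and phenotypic loss patterns can be illustrated with a toy sketch. This is not the authors' pipeline; it assumes that a loss call per region and per species is already available, and it reduces the matching step to an exact set comparison. All names (`perfect_match_regions`, `region_lost_in`, `trait_lost_in`) and all species and region labels are hypothetical placeholders, not data from the paper.

```python
from typing import Dict, List, Set


def perfect_match_regions(
    region_lost_in: Dict[str, Set[str]],
    trait_lost_in: Set[str],
    species: Set[str],
) -> List[str]:
    """Return regions lost in exactly the species that lost the trait.

    Simplifying assumption: loss is a binary call per region and species.
    """
    target = trait_lost_in & species
    matches = []
    for region, lost in region_lost_in.items():
        # Compare only over the species included in the screen.
        if lost & species == target:
            matches.append(region)
    return sorted(matches)


# Hypothetical toy data: five placeholder species, four placeholder regions.
species = {"sp_A", "sp_B", "sp_C", "sp_D", "sp_E"}
trait_lost_in = {"sp_B", "sp_D"}  # species assumed to have lost the trait
region_lost_in = {
    "region_1": {"sp_B", "sp_D"},          # same pattern as the trait
    "region_2": {"sp_B"},                  # lost in only one trait-loss species
    "region_3": {"sp_B", "sp_D", "sp_E"},  # also lost in a trait-preserving species
    "region_4": set(),                     # intact everywhere
}

matches = perfect_match_regions(region_lost_in, trait_lost_in, species)
print(matches)  # ['region_1']
```

In this sketch only `region_1` is reported, mirroring the requirement that a candidate region be lost in all of the trait-loss species and in none of the others. A realistic screen would also have to handle uncertain loss calls caused by assembly gaps or low sequence quality, which this exact-match comparison ignores.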



Available from: Michael Hiller,
  • Source
    • "The analysis of phenotypic traits in a phylogenetic framework is key to addressing the evolutionary questions posed by an increasingly diverse set of domains. For example, understanding the evolution of pharyngeal jaw mechanics in fishes (Price et al. 2010), identifying phenotype-associated genes and regulators in forward genomics approaches (Hiller et al. 2012), exploring the key factors in land plant evolution (Rudall et al. 2013), or discovering the role of phenotypic traits in colonization ability (Van Bocxlaer et al. 2010), all rely on the mapping of phenotypic data to phylogeny. Although robust molecular phylogenies have become easier to generate, more broadly available, and increasingly comprehensive, the phenotypic data on which these studies rely have not. "
    ABSTRACT: The reality of larger and larger molecular databases and the need to integrate data scalably have presented a major challenge for the use of phenotypic data. Morphology is currently primarily described in discrete publications, entrenched in non-computer readable text, and requires enormous investments of time and resources to integrate across large numbers of taxa and studies. Here we present a new methodology, using ontology-based reasoning systems working with the Phenoscape Knowledgebase (KB; kb.phenoscape.org), to automatically integrate large amounts of evolutionary character state descriptions into a synthetic character matrix of neomorphic (presence/absence) data. Using the KB, which includes more than 55 studies of sarcopterygian taxa, we generated a synthetic supermatrix of 639 variable characters scored for 1051 taxa, resulting in over 145,000 populated cells. Of these characters, over 76% were made variable through the addition of inferred presence/absence states derived by machine reasoning over the formal semantics of the source ontologies. Inferred data reduced the missing data in the variable character-subset from 98.5% to 78.2%. Machine reasoning also enables the isolation of conflicts in the data, i.e., cells where both presence and absence are indicated; reports regarding conflicting data provenance can be generated automatically. Further, reasoning enables quantification and new visualizations of the data, here for example, allowing identification of character space that has been undersampled across the fin-to-limb transition. The approach and methods demonstrated here to compute synthetic presence/absence supermatrices are applicable to any taxonomic and phenotypic slice across the tree of life, providing the data are semantically annotated. 
    Because such data can also be linked to model organism genetics through computational scoring of phenotypic similarity, they open a rich set of future research questions into phenotype-to-genome relationships.
    Systematic Biology 05/2015; 64(6). DOI:10.1093/sysbio/syv031 · 14.39 Impact Factor
  • Source
    • "When a gene fails to map to a genome assembly, it is difficult to distinguish between true gene loss and an unresolved state due to low quality or missing sequencing product (Hiller et al., 2012). This problem is often exacerbated by the use of previously mis-assembled genomes to assemble new genome sequences. "
    ABSTRACT: The genomic and developmental complexity of vertebrates is commonly attributed to two rounds of whole genome duplications which occurred at the base of the vertebrate radiation. These duplications led to the rise of several multi-gene families of developmental proteins, such as the fibroblast growth factors (FGFs), a signaling protein family that functions at various stages of embryonic development. One of the major FGF assemblages arising from these duplications is the FGF8 subfamily, which includes FGF8, FGF17, and FGF18 in tetrapods. While FGF8 and FGF18 are found in all tetrapods and are critical for embryonic survival, genomic analyses suggest putative loss of FGF17 in various lineages ranging from frogs and fish to the chicken. This study utilizes 27 avian genomes in conjunction with molecular analyses of chicken embryos to confirm the loss of FGF17 in chicken as a true biological occurrence. FGF17 is also missing in the turkey, black grouse, Japanese quail and northern bobwhite genomes. These species, along with chicken, form a monophyletic clade in the order Galliformes. Four additional species, members of the clade Passeroidea within the order Passeriformes, are also missing FGF17. Additionally, analysis of intact FGF17 in other avian lineages reveals that it is still under strong purifying selection, despite being seemingly dispensable. Thus, FGF17 likely represents a molecular spandrel arising from a genome duplication event; because of its high connectivity with FGF8/FGF18 and its potential for interference with their function, it is retained under strong purifying selection despite not itself conferring a strong selective advantage.
    Gene 03/2015; 563(2). DOI:10.1016/j.gene.2015.03.027 · 2.14 Impact Factor
  • Source
    • "A recent proof-of-concept for this approach focused on the ability to synthesize vitamin C, an ancestral vertebrate trait that was lost in at least four independent mammalian lineages. A large phylogenetic tree aligning 27 sequenced mammalian genomes identified only a single gene that was lost in all of these and only these four lineages; this gene is indeed central to vitamin C synthesis (Hiller et al., 2012). It seems doubtful that such a straightforward strategy will be feasible for more complex multigenic traits, ones that emerged more recently in evolution, or ones for which large phylogenetic trees of gain and loss are not available. "
    ABSTRACT: Genome technologies are transforming all areas of biology, including the study of hormones, brain and behaviour. For birds, annotated reference genome assemblies are rapidly being produced for many avian species. Here we briefly review the basic concepts and tools used in genomics. We then consider how these are informing the study of avian behavioural neuroendocrinology, focusing in particular on lessons from the study of songbirds. We discuss the impact of having a complete "parts list" for an organism; the transformational potential of studying large sets of genes at once instead of one gene at a time; the growing recognition that environmental and behavioural signals trigger massive shifts in gene expression in the brain; and the prospects for using comparative genomics to uncover the genetic roots of behavioural variation. Throughout, we identify promising new directions for bolstering the application of genomic information to further advance the study of avian brain and behaviour.
    Frontiers in Neuroendocrinology 10/2013; 35(1). DOI:10.1016/j.yfrne.2013.09.004 · 7.04 Impact Factor