Final submission after peer review of: P. Nelson & R. Buggs (2016) Next-generation apomorphy: the ubiquity of
taxonomically restricted genes, pp 237-264 in Next Generation Systematics Cambridge University Press.

Next-generation apomorphy: the ubiquity of taxonomically restricted genes

P. A. Nelson¹ and R. J. A. Buggs²

¹ Biola University, 13800 Biola Ave. La Mirada, CA 90639, USA
² School of Biological and Chemical Sciences, Queen Mary University of London, Mile
End Road, London, E1 4NS, UK

1. Introduction

2. The contingent nature of TRG classification
   2.1 Contingency due to taxonomic category
   2.2 Contingency due to similarity threshold
   2.3 Contingency due to sampling

3. The ubiquity of TRGs
   3.1 Bacterial pan-genomes
   3.2 Virus reservoirs
   3.3 Eukaryotes

4. The functional significance of TRGs
   4.1 General evidence
   4.2 Five examples of TRG function

5. The origin and evolution of TRGs
   5.1 Standard models of novel gene evolution
   5.2 De novo gene evolution
   5.3 The need for data-driven research

6. Systematics of TRGs
   6.1 Phylostratigraphy
   6.2 Phylogenetic reconstruction
   6.3 Supporting characters

7. Concluding remarks

1. Introduction

The ability to sequence whole genomes at ever increasing rates has led to the
discovery of vast numbers of genes that are uniquely found in a single taxon (i.e.
apomorphic genes). Before the advent of automated DNA sequencing in the early
1990s, genetic comparison of organisms was only feasible through the targeted
amplification of homologous genes that are shared among divergent taxa, and
reliable identification of taxon-specific genes was almost impossible. Shortly
after the publication of the first whole genome in 1995, it became clear that
species possessed many more taxonomically unique, or restricted, gene
sequences than expected. When seven whole genomes had been published,
Russell F. Doolittle, a molecular biologist of many decades’ experience,
commented: “I am surprised that so many open reading frames remain as
unidentified [i.e. unique] reading frames” (1997, 516). Five years later, when 60
whole genomes had been sequenced, he called taxonomically unique sequences
“the biggest surprise in genome sequencing” (2002, 698).

Today, with whole genome sequencing further facilitated by next generation
technologies, these taxonomically restricted genes (TRGs; Khalturin et al. 2009),
also known as orphan genes (Dujon 1996), or “ORFans” (Fischer and Eisenberg
1999), continue to be discovered in every newly sequenced species genome
(Figures 1 and 2). These genes represent one of the most intriguing aspects of
systematics, lying at the intersection of genomics, genetics, comparative and
structural biology, phylogenetics and evolution. Yet, by their very nature, they
are difficult to study using conventional comparative approaches and attract
little research funding.

In this chapter we review the current status of this conundrum in the light of
rapid advances in genomics. Section 2 examines the definition of TRGs/ORFans,
noting that this is an inherently comparative concept and the status of any gene
as a TRG/ORFan is therefore highly contingent. Section 3 emphasizes their
ubiquity. Section 4 discusses the biological significance of some TRGs in terms of
putative functions. Section 5 discusses hypotheses for the origins and evolution
of TRGs. Section 6 examines the relevance of TRGs to systematics.

2. The contingent nature of TRG classification

Assigning any gene the status of “taxonomically restricted” or “orphan” is
necessarily a relative judgment; an “orphan” gene always holds its status
provisionally. Its status is contingent on three factors: (1) a taxonomic category,
(2) a similarity threshold used as a proxy for homology, and (3) genomic
database size and sampling: i.e., the total pool of known gene sequences, which
jointly yield the universe of objects for comparison. Because both factors (1) and
(2) involve judgments on which workers may differ, and (3) is constantly in flux
(i.e., growing), the status of any gene as a TRG will be necessarily conditional.
Any gene, at any time, may move from being an orphan to an ortholog (the
contrast class, by definition, of taxonomically unique sequences); an example of
such a re-evaluation is given for the Drosophila gene oskar in Section 2.3 below.
Given the importance of these issues of definition, we consider them in turn
below.

2.1 Contingency due to taxonomic category

The term “orphan” was introduced by Dujon (1996) with reference to
the yeast genome, but as taxonomic sampling of whole genome sequences was
tiny at the time, the level of taxonomic restriction implied by the term was not
clearly defined. Some authors now restrict the term to sequences from the
genomes of single species (i.e. autapomorphic genes), while others (e.g., Narra et
al. 2008) use it to refer to genes with orthologs found in multiple closely allied
genera (i.e. synapomorphic genes). Others apply additional descriptors and
refer to “singleton ORFans”, “orthologous ORFans” and “paralogous ORFans”
(Siew and Fischer 2003). This lack of consistent usage engenders confusion –
one investigator’s ORFan will be another’s ortholog – and this has given rise to
the longer, but more useful term “taxonomically restricted gene” (TRG),
promoted by Wilson et al. (2005, 2007) and Bosch and colleagues (Khalturin et
al. 2009), among others. Using “taxonomically restricted gene” to refer to
sequences with limited systematic distribution encourages (indeed, requires)
that one specify the taxon in question. With the taxonomic level thus defined, it
is much less likely that ambiguity of meaning will creep in. In any case, the
designation of any gene as “taxonomically restricted” can be no more stable than
the boundaries or definition of the source taxon itself.

2.2 Contingency due to similarity threshold

In the early 1960s, in a series of prescient publications, Zuckerkandl and Pauling
described “ways of gaining information about evolutionary history through the
comparison of homologous polypeptide chains” (1965, 360). As with classical
anatomical homology, the signal of history was to be extracted from similarity:
that is, the more closely related (causally, via material descent) two or more
biological objects are, the more similar they will be. By this same logic, the less
similar two objects are, the less closely related they are. The definition of
“homology” and the criteria used to determine its presence in molecular data
have long been subjects of controversy (see, e.g., Reeck et al. 1987; Hillis 1994;
Eisen 1998). At the heart of the use of sequence similarity to assess homology lies
a probabilistic intuition, well expressed by Patterson (1988, 615): “if two
structures are complex enough and similar in detail, probability dictates that
they must be homologous rather than convergent”.

Both genes and proteins occur as discrete strings, enabling their direct alignment
from different species, with counting of differences as a measure of distance.
With the development of heuristic tools, such as BLAST (Basic Local Alignment
Search Tool; Altschul et al. 1990, 1997), there has been widespread use of
parameter thresholds for “homologous sequence” identification, such as BLAST
“Expect” values of 0.001 to 0.00001 (Siew and Fischer 2003). While this is a
useful practical approach, it should be borne in mind that quantitative measures
of similarity are being used to make a qualitative, binary assessment of the status
of a gene: under the most widely-accepted evolutionary definition of “homology,”
entities are either homologous or they are not (Reeck et al. 1987).¹ Thus a
somewhat arbitrary probabilistic convention is applied to a relation that is
binary and qualitative. Differences in the threshold levels used greatly affect the
detection of homology with BLAST searches (Rost, 1999, Koski and Golding,
2001), and hence our assessment of the frequency and occurrence of TRGs. The
usefulness and shortcomings of using BLAST to detect TRGs are explored at
greater length by Tautz and Domazet-Lošo (2011), who recommend use of
position-specific iterated BLAST, with manual supervision, for tracing patterns of
homology rigorously.

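The dependence of TRG counts on the chosen threshold can be illustrated with a minimal Python sketch. It scores a gene as taxonomically restricted when its best hit outside the focal taxon has an Expect value above the chosen cutoff, or when there is no such hit at all. The gene names, the E-values and the function count_trgs are hypothetical and are not taken from any of the studies cited here; only the two cutoffs come from the range quoted above.

```python
# Hypothetical best out-of-taxon BLAST Expect values for five genes.
# None means that no hit at all was reported outside the focal taxon.
best_outgroup_evalue = {
    "gene_a": 1e-40,
    "gene_b": 5e-5,
    "gene_c": 2e-4,
    "gene_d": 0.5,
    "gene_e": None,
}


def count_trgs(evalues, cutoff):
    """Return the genes with no out-of-taxon hit at or below the E-value cutoff."""
    return sorted(
        gene for gene, e in evalues.items()
        if e is None or e > cutoff
    )


# The two ends of the threshold range mentioned in the text.
for cutoff in (1e-3, 1e-5):
    trgs = count_trgs(best_outgroup_evalue, cutoff)
    print(f"cutoff {cutoff:g}: {len(trgs)} TRGs -> {trgs}")
```

With these invented values, moving from the more permissive to the stricter cutoff changes the number of genes scored as TRGs from two to four, although nothing about the genes themselves has changed.
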
2.3 Contingency due to sampling

No TRG could be named as such in a world where only one genome had been
sequenced, nor could TRGs be found where only homologous genes had been
sampled from a range of genomes. Our confidence of uniqueness (for any
sequence) is directly proportional to the completeness of taxonomic sampling.
We should expect that increased genomic sampling will provide matches
(orthologs) for many TRGs. The gene oskar, first identified as a TRG in
Drosophila melanogaster, provides an instructive illustration. Although
necessary for germ-cell formation in D. melanogaster, “unlike many other genes
with indispensable roles in development, oskar is not a widely conserved gene: it
proved absent from the first non-fly insect genomes sequenced, and has no clear
homologue in any other animal” (Extavour 2011). Using “a relaxed and modified
BLAST strategy,” however (see section 2.2), Lynch and colleagues (2011) located
an oskar ortholog, Nv-osk, in the wasp Nasonia. Thus we return to the point
which opened this section: the status of any gene as “orphan” or “taxonomically
restricted,” intrinsically a relative judgment, calls for alertness on the part of
investigators to the three principal criteria (similarity threshold, taxonomic
category, completeness of sample) employed.

3. The ubiquity of TRGs

Every sequenced genome has revealed a substantial number of TRGs. When this
first occurred, it was widely assumed that TRGs were simply artifacts of
limited sampling:

…when only a handful of complete genome sequences were available, a number
of possible explanations for the abundance of ORFans were suggested. One
explanation was that the relatively high proportion of ORFans may be due to an
artifact of sparse sampling of the sequence space, and that with the availability
of more genomes, most ORFans would disappear. (Siew and Fischer 2003, 7)

¹ We note however that given the possibility of lateral gene transfer (LGT), some
authors have used the term “partial homology” for recombinant coding
sequences, where (for example) domain A comes from species P, whereas
domain B comes from species Q, and A and B are conjoined in species R to form a
new protein C (see Chan et al. 2009).
Perhaps surprisingly, this expectation was not borne out. The majority of early
genome sequences were of bacteria, and a 2005 study of 122 bacterial genomes
showed that the number of TRGs found was rising in a linear fashion with the
number of genomes sequenced, showing no signs of a plateau (Wilson et al.,
2005). More recently, Beiko (2011) surveyed over a thousand complete
bacterial and archaeal genomes (see Figure 1), noting that no plateau for new
TRGs can yet be envisaged. “Given the amount of novel genetic information in
new genomes,” he writes (2011, 5), “and the increasing rate at which genomes
are being sequenced, there is consequently no reason to suspect that the rate of
accumulation of novel genes will decrease in the near future.”

3.1 Bacterial pan-genomes

This vista of genetic novelty existing beyond the horizon of what has already
been sequenced can perhaps be seen most dramatically in the notion of the
‘open pan-genome.’ The concept of a common potential genome for all bacteria
was articulated by Sonea and Panisett (1980) and the term “pan-genome” was
introduced by Tettelin et al. (2005), as they attempted to describe the full genetic
diversity found within a single bacterial species. After comparing the complete
genomes of eight strains of the pathogen Streptococcus agalactiae (also known as
Group B Streptococcus, or GBS), Tettelin et al. (2005) found that each newly
sequenced strain contained genes not previously seen in any other strain. Fitting
their data to an exponential decay function, they predicted that “for every new
GBS genome sequenced, an average of 33 new strain-specific genes will be
identified and added to the pan-genome.” Similar studies examining Escherichia
coli (Rasko et al. 2008; Touchon et al. 2009) found “continual addition of new
genes with each newly sequenced genome,” and thus, the same ‘open’ pattern:
“no single strain can be regarded as highly representative of the species…the
pan-genome is far from being fully uncovered” (Touchon et al. 2009, p. 5). Those
sequences found in all strains constitute the ‘core genome’ – mainly encoding
housekeeping functions such as translation or core metabolic processes –
whereas the strain-specific sequences are usually described as the ‘accessory’ or
‘dispensable genome,’² needed for existence “in a specific environment…linked
to virulence, capsular serotype, adaptation, and antibiotic resistance and might
reflect the organisms’ predominant lifestyle” (Mira et al. 2010, 47).

Not all bacterial species exhibit “open” genome patterns; “closed” pan-genomes,
such as those found in Staphylococcus aureus, Bacillus anthracis (Tettelin et al. 2008),
and Campylobacter sp. (Lefébure et al. 2010), are characterized by rarefaction
curves “that converge to a small but finite number of asymptotically discovered
new genes” (Tettelin et al. 2008, 475) – meaning that additional sequencing of
strains within these species is unlikely to reveal new genetic diversity.

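The contrast between open and closed pan-genomes can be sketched numerically. The Python example below assumes a generic exponential decay of the form new_genes(n) = kappa * exp(-n / tau) + asymptote for the number of new genes contributed by the n-th genome. This functional form, the parameter names, and the values of kappa and tau are assumptions made for illustration and are not the fitted values of any cited study; only the asymptote of 33 echoes the GBS estimate quoted above. The "closed" case is simplified to an asymptote of zero.

```python
import math


def new_genes(n, kappa, tau, asymptote):
    """Expected number of new genes contributed by the n-th genome."""
    return kappa * math.exp(-n / tau) + asymptote


def cumulative_new_genes(n_genomes, kappa, tau, asymptote):
    """Total genes added to the pan-genome by genomes 2..n_genomes."""
    return sum(
        new_genes(n, kappa, tau, asymptote)
        for n in range(2, n_genomes + 1)
    )


# Asymptote of 33 follows the GBS estimate quoted in the text;
# kappa and tau are invented for illustration.
open_params = dict(kappa=300.0, tau=2.0, asymptote=33.0)
# Hypothetical closed case: the curve decays to zero new genes per genome.
closed_params = dict(kappa=300.0, tau=2.0, asymptote=0.0)

for n in (10, 100, 1000):
    print(
        n,
        round(cumulative_new_genes(n, **open_params)),
        round(cumulative_new_genes(n, **closed_params)),
    )
```

Under these assumptions the open curve keeps growing by roughly the asymptotic number of genes with every added genome, whereas the closed curve stops growing after the first few genomes.
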
² ‘Dispensable’ is something of a misnomer. Noting that the functions specified
by the ‘dispensable’ sequences often involve “characters that are a direct
response to the environment,” Mira et al. (2010, 55) stress that “a gene within
the accessory genome…should not be literally regarded as dispensable.”
While a variety of models have been proposed to explain bacterial pan-genome
patterns (see, e.g., Boissy et al. 2011, Baumdicker et al. 2010, 2012), the
take-home lesson is the enormous genetic diversity within the domain as a whole.
For example, in an analysis of 573 sequenced bacterial genomes, Lapierre and
Gogarten (2009) sampled genes randomly, and then queried the entire pool of
genomes to find BLAST hits for the sampled genes, categorizing gene families
according to the degree to which they were shared among species. Within each
individual genome, the typical gene complement was approximately: 8% core
conserved, 64% “character” genes (“essential for colonization and survival in
particular environmental niches”) and 28% “accessory” genes (TRGs mainly of
unknown function). This meant that within the pan-genome of about 150k gene
families, approximately 0.2% were core, 5% were “character” and over 94%
were “accessory”. Fitting the data to an exponential decay function, Lapierre and
Gogarten concluded, “the pan-genome of the bacterial domain is of infinite size.”³

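The gap between the per-genome and pan-genome percentages is easier to see as counts. The short Python sketch below converts the percentages reported above into approximate numbers of genes and gene families. The genome size of 3,000 genes is a hypothetical round figure chosen for illustration, and the accessory share of the pan-genome is taken at its stated lower bound of 94%.

```python
PAN_GENOME_FAMILIES = 150_000  # "about 150k gene families" in the text
GENOME_GENES = 3_000           # hypothetical genome size, for illustration only

# Percentages as reported in the text (the accessory pan-genome share is a lower bound).
per_genome_pct = {"core": 8, "character": 64, "accessory": 28}
pan_genome_pct = {"core": 0.2, "character": 5, "accessory": 94}


def share(total, pct):
    """Number of items corresponding to a percentage of a total."""
    return round(total * pct / 100)


for category in per_genome_pct:
    in_genome = share(GENOME_GENES, per_genome_pct[category])
    in_pan = share(PAN_GENOME_FAMILIES, pan_genome_pct[category])
    print(f"{category:>9}: ~{in_genome} genes per genome, "
          f"~{in_pan} families in the pan-genome")
```

On these figures a few hundred core families recur in every genome, while the accessory class, which supplies only about a quarter of any one genome, accounts for the great majority of distinct families across the domain.
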
3.2 Virus reservoirs

Viruses are especially rich in TRGs (Edwards and Rohwer, 2005, Bench et al.,
2007, Forterre and Prangishvili, 2009, Prangishvili et al., 2006). Boyer et al.
(2010) estimate that TRGs constitute between 30 and >70 percent of viral genomes,
compared to 10-15 percent of archaeal and bacterial genomes (Koonin
2011, 110). Metagenomic surveys of viral populations in seawater, drawing on
the presence “of an average of 10⁷ virus-like particles per milliliter of surface
seawater” and “an estimated 10³⁰ viruses in the global oceans” (Breitbart 2012),
have motivated theoretical speculations about a vast reservoir of viral sequences,
dwarfing in size the prokaryotic and eukaryotic genomic universes. Shapiro
(2011), Koonin (2011), Abroi and Gough (2011), and others have hypothesized
that this enormous “virosphere” provides a “research and development” realm
where “experimentation with genomic processes” (Shapiro 2011, 133) yields a
supply of novel sequences (TRGs), which may eventually be taken up by
prokaryotes via viral transfer.

3.3 Eukaryotes

All eukaryote genome sequences, including those from yeasts (Kessler et al.,
2003, Peña-Castillo and Hughes, 2007), plants (Rutter et al., 2012, Donoghue et
al., 2011, Campbell et al., 2007) and primates (Wu et al., 2011, Knowles and
McLysaght, 2009, Clamp et al., 2007) have yielded TRGs. Every newly completed
genome sequence reveals a significant percentage of new TRGs; indeed, “as
orphan genes [TRGs] represent a substantial fraction of every extant genome, the
total number of orphans across all evolutionary lineages by far exceeds the
number of known gene families” (Tautz and Domazet-Lošo 2011, p. 693).

³ “Infinite” should be understood to mean “indefinitely large,” given that the
number of bacterial cells (and hence possible bacterial genes) on Earth, while
vast, is finite.
As with bacteria, increased sampling of genomes has rapidly increased the
number of TRGs discovered. Within the Nematoda, for instance, EST datasets
and whole genome data compiled by Wasmuth et al. (2008) showed the
following:

Cross-comparison of the C. elegans and C. briggsae proteomes identified ~10%
of unique genes in each species. Throwing the draft B. malayi genome into the
mix, revealed ~40% of its proteins did not share homology to C. elegans, C.
briggsae nor Drosophila melanogaster… Adding partial proteomes from 37
additional nematode species reduced the number of private genes to ~8% in
each species. While we expect this proportion to decline as nematode EST
sequencing continues, along with the release of genomes, we expect that each
fully sequenced genome has a significant complement of novel genes that have
arisen since they last shared a common ancestor, less than 100 million years ago.
If this pattern is true of all the >1 million predicted nematode species, then
‘nematode protein space’, the portion of possible sequence structures actually
occupied by nematode proteins, is likely to be huge. Our analyses suggest that
nematode protein space is huge, and that it is likely that our survey has merely
scraped its surface. (Wasmuth et al. 2008, pp. 11-12)

See also Rödelsperger et al. (2013, p. 1): “Strikingly, approximately one-third of
the genes in every sequenced nematode genome has no recognisable
homologues outside their genus.”

In their survey of orphan percentages within the insects, Wissler et al. (2013)
found that “averaged over all included insect and arthropod outgroup species,
approximately 13% of all genes lack a homologous protein in any other species”
(2013, p. 444). Given that ~14,000 ant species alone have been described
worldwide, the potential for further TRG discovery simply by sampling in the
Formicidae (not to mention other insects) is mind-boggling.

The pan-genome concept developed for bacteria has also been applied to
eukaryotes including maize (Morgante et al., 2007), yeasts (Dunn et al., 2012)
and humans (Li et al., 2010b).

4. The functional significance of TRGs

The simplest and most common way of gaining an indication of a newly
sequenced gene’s function is to compare it to other known sequences whose
function has been elucidated in a model organism. For TRGs, this is by definition
not an option, meaning that functional characterization must occur on a
case-by-case basis. This is expensive and time-consuming. In general, recently discovered
genes tend to attract limited research attention. For example, a recent
bibliographic analysis found that 75% of protein research was still focused on
the 10% of human proteins that were known before the human genome was
sequenced (Edwards et al., 2011).

4.1 General evidence

Several lines of reasoning suggest that TRGs are functional. (1) The fact that they
have been annotated in genome sequences in the first place is often due to the
fact that EST data align to them and therefore they are at least expressed. (2)
Wilson et al. (2007) have developed a “Quality Index for Predicted Proteins” that
scores the likelihood that a protein is functional, using non-homology-based
criteria. Applying this to TRGs suggests that many of them are functional (Wilson
et al., 2007). (3) Comparison of TRGs with different levels of taxonomic
restriction has identified characteristics correlated with degree of taxonomic
restriction (Daubin and Ochman, 2004, Wolf et al., 2009), such as gradual
reduction in length and GC content. This continuum of characteristics between
widespread, functionally characterized genes and restricted, little-studied genes
has been taken as evidence that the latter are functional and not artefacts (Wolf
et al., 2009). (4) If TRGs are functional, their frequency should correlate with the
degree to which their species is ecologically or taxonomically removed from
other species whose genomes have been sequenced – this seems to be the case
(Wilson et al., 2005, Khalturin et al., 2009); if TRGs were merely annotation
artefacts, we would expect them to be approximately equally common per
megabase in any genome. Together, these four lines of reasoning give good grounds
to expect that many TRGs do have a function. Below we give five examples from
contrasting taxonomic groups where this has clearly been shown to be the case.

4.2 Five examples of TRG function

4.2.1 Viruses: Nwgl in T4 bacteriophages

Frequencies of TRGs in viral genomes are higher than in any other biological
entity, and it is likely that many of these sequences are functionally significant or
even essential. Ang and Georgopoulos (2012) note that “even closely related
bacteriophages carry their own set of unique genes that most likely favor their
growth on certain bacterial hosts” (p. 989). Investigating the interaction of
bacteriophage T4 with its host, E. coli, they focused on the role of the TRG T4
Gp39.2, which they renamed Nwgl, for “normalizes weak GroE interactions.”
Nwgl encodes a 58 amino acid protein that suppresses E. coli mutations affecting
the bacterium’s GroEL chaperone proteins. In their model, the Nwgl protein
“shifts the equilibrium of GroEL to the ‘open’ state,” allowing the T4-encoded
co-chaperone to bind – thus enabling the complex to fold T4 essential proteins, in
particular, “the most abundant protein produced by the bacteriophage…its major
capsid subunit, Gp23, whose correct folding depends entirely on the host GroEL
chaperone” (2012, 996). Ang and Georgopoulos determined (via deletion
strains) that “the seemingly nonessential” TRG Gp39.2 / Nwgl was, in fact,
“essential for bacteriophage growth on certain hosts” (2012, 995). In a search of
nucleotide sequence databases, the Gp39.2 family was only found in T4-like
bacteriophages that can propagate on Enterobacteria (Ang and Georgopoulos,
2012).

4.2.2 Archaea: Topoisomerase V in Methanopyrus kandleri

All organisms require topoisomerases as essential molecular hardware: these
DNA “disentangling” proteins change the topology of the two strands of the
double helix, e.g., during replication, to prevent the supercoiling that would
otherwise occur. While many topoisomerases are widely distributed throughout
their phylogenetic domains, Topo V has a unique fold and is present as a TRG in a
single archaeon, the hyperthermophilic Methanopyrus kandleri. The clear
functionality of Topo V can be seen in the fact that it has proven commercially
useful, as the crucial component of the ThermoFidelase sequencing kit, due to its
stability at high temperatures (Forterre, 2006 p. 245). This led one researcher to
muse: “if Topo V is such a wonderful enzyme, why was Mother Nature so mean
as to limit its presence to a single archaeal species?” (Forterre, 2006 p. 246).

4.2.3 Bacteria: LpoB in Escherichia coli

Escherichia coli, the classical model system of biochemistry, genetics, and
molecular biology, yielded a treasure trove of functional data about TRGs in the
large-scale, high-throughput “phenotypic analysis” of Nichols et al. (2011).
Setting out to find “phenotypes for mutants of genes without functional
annotation” – a class of sequences in which TRGs are predominant – Nichols et al.
discovered that “the most responsive orphans [i.e. TRGs with strong mutant
phenotypes] tended to be narrowly distributed among bacteria” (2011, p. 11).
For example, LpoB, a gene whose product regulates peptidoglycan synthesis,
critical for the formation of the cell wall, is found only in E. coli and its near
relatives (Typas et al., 2010, 1107). “An exciting explanation,” argue Typas et al.
of this apparent contradiction – namely, a gene that is distributed narrowly, yet
is also functionally important – is that “such genes have been recently acquired
to act as regulators of broadly conserved biological processes, adding an
additional layer of control that helps the cell adjust to the specific needs of its
niche” (2010, p. 1108).

4.2.4 Cnidaria: the periculin family in Hydra

Developing embryos of the freshwater polyp Hydra, unprotected in the water
column, would appear to be vulnerable to pathological bacterial colonization.
Remarkably, however, early Hydra embryos selectively incorporate a bacterial
microbiota, using potent antimicrobials to regulate the abundance and type of
foreign cells admitted; “the host seems to be able to select and shape the
bacterial community” (Fraune et al. 2010, 18071). The periculin family of TRGs
(five genes in Hydra) encode short proteins, 129-158 amino acids in length, with
high bactericidal activity against unwanted bacterial species. Fraune et al. (2010,
p. 18071) observe:

Moreover embryo-protecting peptides of the periculin family are specific for the
genus Hydra and are not present in the genomes of other animal taxa. This
specificity may reflect habitat-specific adaptations, supporting the view that
taxonomically restricted host-defense molecules represent an extremely
effective chemical warfare system that facilitates the disarming of taxon-specific
microbial attackers.

4.2.5 Mollusca: nacre-building genes in bivalves and gastropods

A defining character of the phylum Mollusca is the possession of a mechanism of
shell construction which first appears in the late pre-Cambrian. Intuitively, one
would expect the molecular basis of this feature to be homologous throughout
the group. Jackson et al. (2010) analysed the genes and proteins implicated in
shell construction in the bivalve Pinctada maxima and the gastropod Haliotis
asinina. After isolating 129 H. asinina and 125 P. maxima sequences likely to be
involved in nacre formation from both species, “the majority were found to be
unique; 95 (74%) of the H. asinina secreted products and 71 (57%) of the P.
maxima products shared no similarity with sequences in GenBank nr and EST
databases” (2010, p. 595). These TRG-based differences were so substantial that
Jackson et al. hypothesized that “the molecular mechanisms that guide the
deposition of the variants of nacre and its derivatives across the Mollusca are
fundamentally different” (2010, p. 605). They conclude (2010, p. 606): “The
degree of gene novelty and differences between the molluscs analyzed here also
highlights the importance of the evolution of coding sequences to the generation
of metazoan morphological novelty. In particular, the evolution and
diversification of novel RLCD proteins is apparently a key feature of molluscan
shell evolution”.

These five examples demonstrate that TRGs can have biological functions. We
stress the fruitfulness of assaying TRG functionality, given the excellent
prospects in so doing for fundamental discovery. Analytical challenges exist, of
course: searching for TRG function (in any taxon) requires describing the space
of relevant environmental or life-history conditions, especially for those groups
whose life histories go well beyond what can be seen in the laboratory. If the
TRGs one is assaying for possible functions “are important only under specific
conditions – particular situations that are not normally tested in the laboratory –
then we would expect mutation of these genes to have little or no phenotype in
general” (Peña-Castillo and Hughes 2007, p. 11). But given that TRGs have been
associated with such conditions as sociality in the honey bee (Johnson and
Tsutsui, 2011), courtship behaviours in Drosophila (Dai et al., 2008), and limb
regeneration in salamanders (Garza-Garcia et al., 2010), against the right
background, functions may well appear, and other fascinating findings doubtless
await.

It has sometimes been suggested that TRGs are non-functional sequences, or
annotation artefacts (Skovgaard et al., 2001, Clamp et al., 2007), but evidence is
increasing for their functionality and biological significance (see reviews
Khalturin et al., 2009, Tautz and Domazet-Lošo, 2011). That relatively few TRGs
have well-documented functions is likely due to lack of funding and research
aimed at their functional characterization, rather than lack of actual function. In
short, we suspect that if one’s model system or species of study does something
unique and interesting, TRGs will be at least partially responsible, and worth
seeking out. Kaessmann (2010) notes the advances that might be made through
characterisation of novel genes and suggests: “Although challenging, newly
identified novel genes should be subjected to in-depth characterizations of their
functional evolution, using evolutionary analysis combined with large- and
small-scale genomics/transcriptomics, molecular, cellular, and in vivo
experiments”.

5. The origins and evolution of TRGs

The origins of TRGs have been termed “enigmatic” (Domazet-Lošo and Tautz,
2003), “baffling mysteries” (Doolittle, 2002), “an evolutionary mystery”
(Merkeev et al., 2006), “unclear” (Tautz and Domazet-Lošo, 2011) and “an issue
of great complexity and almost completely uncharted territory” (Khalturin et al.,
2009). Our inability to explain the evolution of TRGs has been used as an
argument to support the proposition that they are non-functional (Clamp et al.,
2007), and may therefore have contributed to the comparative neglect of TRGs in
research (Khalturin et al., 2009) until evidence began to accumulate for their
functionality (see Section 4 above).

5.1 Standard models of novel gene evolution

The evolution of TRGs is hard to explain because most models of novel gene
evolution depend upon duplication, reshuffling, retrotransposition and/or
horizontal transfer of pre-existing coding regions (Ohno 1970, Long 2001, Long
et al. 2003, Kaessmann 2010). These mechanisms leave behind traceable
putative-progenitor sequences, detectable by similarity searches (see for
example Zhou et al. 2008, Donoghue et al. 2011). If TRGs arise by such
mechanisms, they must rapidly diverge from their progenitor sequences, beyond
the threshold of similarity searches (Tautz and Domazet-Lošo, 2011, Zhou et al.,
2008). This does not fit easily with a gradual mutation/selection mechanism of
evolution (Wright, 1931, Fisher, 1930), and several recent papers have argued
that these mechanisms do not explain many cases of TRG evolution, and that
de novo gene evolution is a better explanation (Neme and Tautz, 2013, Carvunis
et al., 2012, Ding et al., 2012).

5.2 De novo gene evolution

A mechanism increasingly invoked for the origin of TRGs is the evolution of
genes from non-coding sequence, sometimes called “de novo” gene evolution.
Some researchers cite this mechanism for TRGs without identifying an
orthologous noncoding region in a close relative (Zhou et al., 2008, Levine et al.,
2006, Begun et al., 2007, Toll-Riera et al., 2009); as such “de novo gene evolution”
is more an observation of orphan gene existence than an understood mechanism
of gene origination. Other researchers, such as Cardoso-Moreira and Long (2012,
p. 170) stipulate that “in order for a new gene to be classified as a de novo gene,
the orthologous noncoding region in the genome of a close relative should be
identified. This is required to show that indeed coding sequence evolved from a
previously noncoding sequence.” This is the sense in which we discuss de novo
evolution below. It should be noted that the term de novo evolution is sometimes
used to describe a new ORF that appears to have evolved by “overprinting” in an
alternative reading frame of a pre-existing ORF (Ohno, 1984, Sabath et al., 2012,
Li et al., 2010a); however, this mechanism cannot directly result in a TRG as
defined by homology searches, and authors of studies on these do not normally
use the term orphan or TRG to describe the overprinted ORF (e.g. Sabath et al.,
2012, Li et al., 2010a), so we will not discuss them further here.

Several studies have identified possible cases of de novo gene evolution as
defined by Cardoso-Moreira and Long, with sequences orthologous to orphan
genes in the non-coding DNA of other species (e.g. Knowles and McLysaght, 2009,
Levine et al., 2006, Wu et al., 2011, Zhou et al., 2008). One particularly detailed
study shows a gene, Poldi, in mouse with expression in the testes, three exons,
alternative splicing, and a knock-out phenotype, that has orthologous regions in
human and rat that appear not to be capable of expression (Heinen et al., 2009).

There are two difficulties with de novo gene evolution, as defined by
Cardoso-Moreira and Long above, as an explanation for the origins of TRGs.
Firstly, it is difficult to see these de novo genes as orphans sensu stricto, given
that orthologs – albeit apparently non-functional orthologs – do exist. The
presence of orthologous sequences in other taxa is prima facie difficult to
reconcile with most operational definitions of TRGs and ORFans, in particular,
the criterion of similarity threshold (see 2.2, above). A second difficulty lies in
proving the direction of evolution in cases of de novo evolution: it could be that
the non-coding orthologs of the functional orphan genes are simply pseudogenes
which were previously functional. Given that de novo gene origination is unlikely
as an evolutionary process because the probability of a functional protein
sequence emerging from a random sequence is vanishingly small (Jacob, 1977,
Ohno, 1970), pseudogenization may be a more parsimonious explanation for the
patterns seen. Cardoso-Moreira and Long (2012, p. 170) caution that “the
presence of a gene in a genome and its absence [as a coding sequence] in the
genomes of close relatives does not necessarily imply that that gene evolved
de novo…that gene could have been lost from all other genomes”.
Siepel (2009, p. 1694) argues that this could be the case even if multiple
pseudogenes are found: “the possibility that apparent gene births were actually
functional in ancestral genomes and were lost independently in multiple lineages,
although remote for these genes, cannot be completely discounted. Mutational
hotspots could lead to non-negligible probabilities of parallel (homoplastic)
disabling mutations.”

Many investigators are understandably reluctant to infer the direct origin of
functional TRGs from random sequences. Siepel (2009, p. 1694) lists some of the
features likely to be necessary to transform an (otherwise non-coding)
nucleotide string into a gene with a functional product:

    These apparent de novo gene origins raise the question of how evolution by
    natural selection can produce functional genes from noncoding DNA. While a
    single gene is not as complex as a complete organ, such as an eye or even a
    feather, it still has a series of nontrivial requirements for functionality, for
    instance, an ORF, an encoded protein that serves some useful purpose, a
    promoter capable of initiating transcription, and presence in a region of open
    chromatin structure that permits transcription to occur. How could all of these
    pieces fall into place through the random processes of mutation, recombination,
    and neutral drift – or at least enough of these pieces to produce a protogene that
    was sufficiently useful for selection to take hold?

Wilson and Masel (2011, p. 1246) share this skepticism, citing additional
hurdles:

    Conversion from noncoding to coding seems too unlikely an event to happen in a
    single evolutionary step. The sequence in question must be transcribed, escape
    degradation at the nuclear exosome, associate with ribosomes, be translated,
    and again escape degradation by the proteasome. Finally, it must avoid toxic
    conformations such as amyloid, for example, in favor of a stable protein fold.

Armengaud et al. (2011) note that while origin from random sequence “cannot
be a priori rejected,” the odds are long: “Since a protein should fold in the proper
way to give a stable three-dimensional structure for a correct function, obtaining
a new function from scratch is statistically highly improbable” (2011, p. 2).

Carvunis et al. (2012) have proposed a model for de novo gene birth from short
“proto-genes” that may overcome some of the problems mentioned above. As
mentioned in Section 4.1, gene characteristics such as length and GC content are
correlated with degree of taxonomic restriction of annotated genes (Daubin and
Ochman, 2004, Wolf et al., 2009, Lipman et al., 2002). In their detailed study of
14 yeast species, Carvunis et al. (2012) extended such observations to all ORFs
longer than 30 nucleotides in Saccharomyces cerevisiae. They found that the
majority of short, unannotated ORFs in S. cerevisiae are restricted to the species,
and hundreds of these ORFs are translated into proteins and may be functional.
Whilst this observation potentially increases the number of TRGs in S. cerevisiae,
the authors argue that it may also provide a route for TRG evolution. They found
something of a continuum from short, little expressed, unannotated ORFs with
restricted taxonomic distribution through to long, highly expressed, well
annotated ORFs with broad taxonomic distribution. Applying various metrics
relating to possible gene functions, they suggested that this distribution of
characters in ORFs represents an evolutionary continuum. They present a verbal
model in which short non-genic sequences in the genome mutate to become
short non-genic ORFs, some of which then acquire the ability to be transcribed
and become “proto-genes”, some of which lengthen to become longer, fully
functional genes.

Currently, as Pilcher (2013) points out, the plausibility of the Carvunis et al.
(2012) model is partly dependent on one’s view of the functionality of non-genic
regions of genomes. An assumption of the model seems to be that the majority of
short non-genic regions and ORFs that provide the raw material for evolution are
lacking in function, or at least have a function that can be dispensed with or
incorporated as they grow into genes. If widespread transcription of non-genic
regions, as found in this and other studies such as Djebali et al. (2012), is mainly
noise, then these regions may be evolutionarily labile, but if transcription of
non-genic regions in fact indicates functionality then their evolution may be
constrained. Kaessmann (2010) argues that pervasive transcription of non-genic
regions might make de novo gene origination common, but notes “the regulatory,
sequence, and structural requirements for the functionality of long noncoding
RNAs are so far poorly understood and hence the probability of such gene
formation events is hard to predict.”

It is to be hoped that the Carvunis et al. (2012) model will continue to be
developed from a general verbal model to one in which step-by-step evolution of
particular gene sequences is documented. One prediction that might arise from
the model is that although a particular species-specific gene may have no
orthologs in other species, if it has arisen by de novo gene evolution it may be
that re-sequencing of different populations within the species will reveal
intermediate short ORFs that have partial homology with the longer gene. In
other words, if TRGs arise de novo by the lengthening of short ORFs, there may in
some cases be traceable evolutionary pathways found within the taxonomic
range of the TRG. A related aspect of the Carvunis et al. (2012) model that
deserves further attention is the time that might be taken for a non-genic ORF to
evolve into a fully-fledged gene: to explain the occurrence of species-specific
genes, this process has to mainly occur within the time since the divergence of
the closest sister species. We have something of a Catch-22 situation in that any
plausible model for gene evolution has to be gradualistic, but species-specific
TRG occurrence patterns do not seem to allow much time for evolutionary
processes to occur in.

5.3 The need for data-driven research

The origins of TRGs continue to be a mystery, and their existence seems to be at
odds with many of our hypotheses about how evolution works. An hypothesis-
driven reaction to this might be to ignore TRGs. For example, two pioneering
studies of gene evolution in humans (Wu et al., 2011, Knowles and McLysaght,
2009) excluded over 200 genes that had no detectable orthologs in other
primates, on the assumption that their TRG status was simply due to
incompleteness of other primate genome drafts. Similarly, in an analysis of gene
family evolution across 12 Drosophila genomes, Hahn et al. (2007) “found
23,070 families that consisted of a single gene and that appeared to have evolved
on a terminal lineage (i.e., they are found in only a single species). These single-
gene families were regarded as artifacts of the annotation process, and were
removed from further analysis.” Such approaches may have led Khalturin et al.
(2009, Box 1) to note that:

    Taxonomists are fascinated when they manage to identify a new species;
    molecular biologists, on the contrary, seem to be rather bemused when
    stumbling on ‘novel’ genes.

An alternative approach is to avoid the jettisoning of data as something that runs
the danger of making our basic observations of the natural world too theory-
laden. A data-driven approach would treat TRGs that align to EST sequences as
unique functional genes until proven otherwise. As Nichols et al. (2011 p. 147)
argue, “evolutionary conservation is not a reliable indicator of the importance of
an orphan to the organism…orphans may have evolved to fulfill an important but
specialized function required by the niche of the organism.” It is notable that in
cancer research, human genomics has led to a data-first approach that has
yielded insights unanticipated by hypothesis-first approaches (Golub, 2010).
Similarly, sequencing of multiple genomes across the diversity of life is
yielding insights into evolution that were unanticipated by current paradigms
(e.g. Koonin, 2009, Boto, 2010).

6. Systematics of TRGs

6.1 Phylostratigraphy
As we noted above, a distinct advantage (in terms of conceptual clarity) is
afforded by the term “taxonomically restricted gene” (TRG) over “orphan”. This
is especially true with respect to the possible systematic utility of a coding
sequence. A gene found in all Metazoa, for example, but not elsewhere, will not
be useful (in terms of presence/absence data) for diagnosing the genus
Drosophila, or, for that matter, the phylum Arthropoda – but that same gene will
pick out a metazoan from the larger universe of organisms on Earth. Thus, the
comparative analysis of the distribution of any gene calls (necessarily) for the
specifying of the taxonomic category providing the reference class (i.e.,
specifying the TRG criterion of taxonomic category; see 2.1, above). In the case
of the TRG whose reference category is “Metazoa,” the cladistic dictum
“symplesiomorphy becomes synapomorphy at a higher level” explains how
absence of systematic utility for one question (e.g., for diagnosing Drosophila or
Arthropoda, nested within Metazoa) – because the character is distributed too
broadly – changes to usefulness when the question itself changes to a broader
scope: what genes might diagnose animals as a taxon? “Symplesiomorphic
similarities are obviously homologous,” argues de Pinna (1991) – for example,
any TRG found throughout the Metazoa, but not elsewhere – “but every
symplesiomorphy is a synapomorphy at a higher level, and it is the knowledge of
this that allows recognition of symplesiomorphies in the first place.” More
precisely,

    Every hypothesis of homology is a hypothesis of monophyletic grouping and, in
    any particular context, a symplesiomorphy is a hypothesis of a set, and a
    synapomorphy is a hypothesis of a subset of that set. Symplesiomorphy and
    synapomorphy are thus terms for homologies which stand in hierarchic relation
    to each other. (Patterson 1982, p. 33)

The project of mapping gene distributions of fully-sequenced genomes onto
taxonomic (or phylogenetic) categories – in effect, determining how gene
distributions stand in hierarchic relation to each other – has been developed
most fully by Domazet-Lošo and Tautz (2007, 2008, 2010) in a method they have
dubbed “phylostratigraphy.” Consider Drosophila melanogaster within its usual
sequence of systematic ranks:

Drosophila melanogaster
   Diptera
      Endopterygota
         Insecta
            Pancrustacea
               Arthropoda
                  Protostomia
                     Bilateria
                        Eumetazoa
                           Metazoa
                              Holozoa
                                 Opisthokonta
                                    Eukaryota

As one descends (or ascends) within this hierarchy, more (or less) inclusive sets
of genes will be present, at what Domazet-Lošo and Tautz term “phylostrata”
(singular, phylostratum) characterized by “founder genes” – “the
phylogenetically oldest genes forming the basis of a new gene lineage, new
protein domain or new gene family” (Tautz and Domazet-Lošo 2011, p. 693).
Diagrams of the same form can be plotted for any species (although obviously
taxonomic depth will be much shallower for prokaryotic taxa), making
phylostratigraphy an excellent comparative tool for analyzing TRG distribution
patterns. These methods become successively more informative as more
genomes are sequenced, and sequencing of congeneric species is particularly
consequential for our understanding of species-specific genes.
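
The assignment of genes to phylostrata described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions, not the
published implementation of Domazet-Lošo and Tautz: the function names, the gene
names and the input format (for each gene, the list of strata in which a
similarity search detected a homolog) are hypothetical, and the lineage is
simply the hierarchy listed in the text.

```python
# Lineage of the focal species, youngest (least inclusive) stratum first,
# copied from the hierarchy listed in the text.
LINEAGE = [
    "Drosophila melanogaster", "Diptera", "Endopterygota", "Insecta",
    "Pancrustacea", "Arthropoda", "Protostomia", "Bilateria", "Eumetazoa",
    "Metazoa", "Holozoa", "Opisthokonta", "Eukaryota",
]


def assign_phylostratum(hit_strata, lineage=LINEAGE):
    """Return the oldest (most inclusive) stratum in which a homolog was found.

    hit_strata: iterable of stratum names, one per detected homolog, each naming
    the least inclusive taxon in `lineage` shared by the focal species and the
    genome containing the hit. (Hypothetical input format.)
    A gene with no hits is assigned to the focal species itself.
    """
    rank = {name: i for i, name in enumerate(lineage)}
    oldest = 0  # index 0 is the focal species: no homolog detected elsewhere
    for stratum in hit_strata:
        oldest = max(oldest, rank[stratum])
    return lineage[oldest]


def count_genes_per_stratum(gene_hits, lineage=LINEAGE):
    """Tally how many genes are assigned to each phylostratum."""
    counts = {name: 0 for name in lineage}
    for hit_strata in gene_hits.values():
        counts[assign_phylostratum(hit_strata, lineage)] += 1
    return counts


# Made-up example data: gene name -> strata in which homologs were detected.
example_hits = {
    "gene_a": [],                           # species-specific TRG
    "gene_b": ["Diptera", "Insecta"],       # founder gene at Insecta
    "gene_c": ["Metazoa"],                  # founder gene at Metazoa
    "gene_d": ["Arthropoda", "Eukaryota"],  # present across eukaryotes
}
counts = count_genes_per_stratum(example_hits)
for stratum in LINEAGE:
    print(f"{stratum:25s} {counts[stratum]}")
```

In this sketch a newly detected homolog can only leave a gene in its current
stratum or move it to an older one, which is why a species-level assignment is
provisional until closely related genomes have been searched.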

Figure 3 shows the phylostratigraphy of the genome of D. melanogaster, with the
phylostrata extending from the species (on the left) to the Eukaryota. Relative
numbers of genes present at each stratum are plotted on the vertical axis. Notice
the “spike” in gene innovation at the appearance of the genus Drosophila. When
only the D. melanogaster genome had been sequenced, this spike appeared to be
at the species level (Domazet-Lošo et al., 2007), but sequencing of 11 congeneric
species pushed this back to the genus level. Another analysis examining the
emergence of novel protein domains strongly reinforces the signal of a spike of
innovation at the origin of the genus Drosophila. Once D. melanogaster’s 11
congeners were added, “the Drosophila lineages see a 3-fold increase in domain
emergence,” relative to the 8 other pancrustacean species sequenced (Moore and
Bornberg-Bauer 2011, p. 4; see their Figure 1, p. 3).

Perhaps one future application of phylostratigraphy will be the defining of
natural groups above the species level. For example, the large number of genes
unique to the genus Drosophila shown in Figure 3 might suggest that this is a
genuine higher taxonomic category. Whether such peaks will persist as
taxonomic coverage of genome sequences improves remains to be seen. The
apparent increase in domain innovation within the genus Drosophila, compared
to the other pancrustaceans, could be an artifact of limited genomic sampling of
the other 8 genera. All are currently represented by a genome sequence of a
single species, except for Anopheles (two species, A. gambiae and A. aegypti).

6.2 Phylogenetic reconstruction

Current molecular systematics relies largely upon methods of phylogenetic
reconstruction based on gene sequences that are both shared and variable
among members of a group of taxa being studied. With these methods, genes can
only provide usable data to test hypotheses about relationships within the
taxonomic level to which they are restricted. When a TRG is shared among
species (synapomorphic) it can be useful in this way, but when a TRG is
restricted to a single species, it is autoapomorphic and no statements on
relationships are possible from it (Wägele, 2005 p. 129). However, both
synapomorphic and autoapomorphic TRGs may be useful as defining
(supporting) characters for the taxon in which they occur (Monsch, 2003,
Wägele, 2005 p. 27).

Since genome sequencing became common, there has been some exploration of
the use of gene content data for phylogenetic inference. For example, Snel et al.
(1999) constructed a distance-based phylogeny for 13 unicellular species based
on gene content, defining distance in terms of number of shared versus unshared
genes; their results correlated with those from 16S rRNA. Other methods of
phylogenetic reconstruction based on gene family content exclude gene
families with single members, and hence discard autoapomorphic TRGs (e.g.
Hughes et al., 2005, Lienau et al., 2006).
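
A distance defined from shared versus unshared genes, as described above, can be
illustrated with a short Python sketch. The normalisation used here (shared
genes divided by the size of the smaller gene set) is an assumption made for
illustration and is not presented as the exact measure of any cited study; the
species labels, gene family identifiers and function names are hypothetical.

```python
from itertools import combinations


def gene_content_distance(genes_a, genes_b):
    """Distance = 1 - (shared genes / size of the smaller gene set).

    The choice of normalisation is an illustrative assumption.
    """
    a, b = set(genes_a), set(genes_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        raise ValueError("each genome must contain at least one gene")
    return 1.0 - len(a & b) / smaller


def distance_matrix(genomes):
    """Pairwise distances for a dict mapping species name -> set of gene ids."""
    names = sorted(genomes)
    dist = {(name, name): 0.0 for name in names}
    for x, y in combinations(names, 2):
        dist[(x, y)] = dist[(y, x)] = gene_content_distance(genomes[x], genomes[y])
    return dist


# Made-up gene family content for three hypothetical species.
example_genomes = {
    "sp1": {"fam1", "fam2", "fam3", "orphan1"},  # orphan1 occurs only in sp1
    "sp2": {"fam1", "fam2", "fam3"},
    "sp3": {"fam1", "fam4"},
}
distances = distance_matrix(example_genomes)
for (x, y), value in sorted(distances.items()):
    if x < y:
        print(x, y, value)
```

A gene confined to one species, such as orphan1 here, can never be counted as
shared, so it contributes nothing to grouping that species with any other. A
matrix of this kind would then be passed to a distance-based tree-building
method, which is outside the scope of this sketch.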

Different genes may have different evolutionary histories (Doolittle, 1999), and
thus these trees are best viewed simply as “a means to capture and compare the
overwhelming amount of information that is present in genomes” (Snel et al.,
2005 p. 193). It has been argued that gene gain can provide convincing
characters, as the occurrence of homoplasy is unlikely (Boore and Fuerstenberg,
2008) whereas convergent gene losses are likely and so are less reliable
characters. As with the identification of TRGs in general, gene-gains could be
falsely inferred if homologs are missed due to rapid evolution, gaps in draft
genomes or poor gene-finding models (Boore, 2006).

6.3 Supporting characters

In 2000, Carl Woese’s group (Graham et al., 2000) used newly completed
genomes from four major euryarchaeal taxa to identify defining characters for
the Euryarchaeota in terms of “signature proteins” that were taxonomically
restricted in that they had no recognizable bacterial or eukaryal homologs. They
suggested that this could herald a new approach to taxonomy:

    This strategy of identifying genes that function uniquely in a lineage can be applied to
    any phylogenetically related group of organisms. The comprehensive nature of
    genomic analysis brings an unprecedented objectivity to describing cell lineages:
    genomics raises taxonomy to a new level. Whereas earlier taxonomies identified and
    related organisms, the new taxonomy will elaborate those relationships, allowing the
    biologist to see the essential character of a group and (to some extent) the mode of
    that group’s evolution.

Such an approach may have clarified the systematic relationships of the
myxozoans. Phylogenetic analyses of widely-shared genes have grouped
myxozoans variously as a sister taxon to the Bilateria or within the Cnidaria, but
Holland et al. (2011) claim to have demonstrated that the latter placement is
correct due to the discovery of a novel minicollagen TRG (Tb-Ncol-1). This
“represents the first example of using a gene associated with a phylum-specific
morphological novelty to infer placement within the Metazoa” (Holland et al.,
2011).

However, inference of sister-group status based on single TRGs may be
inherently unreliable. The pea aphid genome contains four carotenoid
desaturase genes and three proteins consisting of fused carotenoid
cyclase–carotenoid synthase enzymes; these encode a functional biochemical
pathway, but searching the GenBank protein database revealed no detectable
homologs in any other available animal genome (Moran and Jarvik, 2010). The
genes therefore appeared to be taxonomically restricted within Animalia to the
Aphididae. However, homologs were found in several fungal genomes, where the
pattern of arrangement of these genes was similar (Moran and Jarvik, 2010).
Since this discovery, homologs have also been found in the two-spotted spider
mite, which belongs to a different class of Arthropoda from the aphids. Clearly
these TRGs would be highly misleading if used as evidence for a sister-group.

Analysis of TRGs can perhaps more reliably shed light on morphological
characters whose status has been controversial. The regeneration of limbs in
salamanders has often been considered a symplesiomorphic character, but
Garza-Garcia et al. (2010) provided evidence for its apomorphy by identifying a
salamander-specific protein (Prod 1) with a central role in limb-regeneration.

7. Concluding remarks

Understanding the taxonomic distribution of genes within the diversity of life is
an ever-growing task, lying at the intersection of current genomics and
systematics. Every gene is at some level taxonomically confined, except for a
handful of genes involved in DNA replication, transcription and translation that
appear to be universal (Harris et al., 2003). As reviewed here, a most surprising
aspect of our recently acquired knowledge of gene distribution has been the very
large number of genes that are confined to a single genus or species. Another
closely related surprise has been the frequency of genes that show apparently
homoplasious patterns of taxonomic restriction; these have not been the focus of
this chapter, and are the subject of a substantial body of literature on lateral (or
horizontal) gene transfer (for reviews see Keeling and Palmer, 2008, Boto, 2010,
Zhaxybayeva and Doolittle, 2011).

Those who learned their phylogenetics prior to the DNA sequencing revolution
may still feel a sense of frank awe at the ocean of surprising data on which they
are now able to sail. Between 1949 and 1955, Frederick Sanger painstakingly
sequenced bovine insulin – a single, relatively small hormone. Throughout the
1970s and 80s, into the early 1990s, molecular phylogenies were constructed on
the basis of a handful of ribosomal RNAs or highly conserved protein sequences.
Today, entire genomes are sequenced in a few days’ time, and with the
increasing speed and decreasing cost of improving technology, the effort
required for obtaining whole genomes can be expected to shrink further.

An illuminating parallel can be drawn from the history of astronomy. In the
early decades of the 20th century, the dimensions of the entire physical universe
were thought by astronomer Harlow Shapley to extend to ~300,000 light years,
encompassing only our galaxy, the Milky Way. Spiral “nebulae,” on Shapley’s
view, lay within the Milky Way – until a powerful new instrument, the 100 inch
Hooker reflector at Mt. Wilson, manned by Edwin Hubble, showed the presence
of Cepheid variable stars in those nebulae. Indeed, the nebulae were not nebulae
(i.e., clouds) at all, but distant galaxies – “island universes” of their own (Trimble
1995). The extent of the physical universe was vastly greater than Shapley
imagined. The instrument enabled the discovery, and the profound change of
theoretical outlook. Similarly, rapid and increasingly inexpensive DNA
sequencing is expanding the genetic (and proteomic) universe well beyond what
any biologist could have imagined, prior to the mid-1990s. The impact of these
data on systematics and our knowledge of evolution cannot be overstated.

References

Abroi A, Gough J (2011). Are viruses a source of new protein folds for organisms?
– Virosphere structure space and evolution. BioEssays 33(8): 626-35.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). Basic local alignment
search tool. Journal of Molecular Biology 215(3): 403-10.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ
(1997). Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Research 25(17): 3389-402.
Ang D, Georgopoulos C (2012). An ORFan no more: the bacteriophage T4 39.2
gene product, NwgI, modulates GroEL chaperone function. Genetics
190(3): 989-1000.
Armengaud J, Bland C, Christie-Oleza J, Miotello G (2011). Microbial
proteogenomics, gaining ground with the avalanche of genome
sequences. Journal of Bacteriology and Parasitology S3-001.
Baumdicker F, Hess WR, Pfaffelhuber P (2010). The diversity of a distributed
genome in bacterial populations. Annals of Applied Probability
20(5): 1567-1606.
Baumdicker F, Hess WR, Pfaffelhuber P (2012). The infinitely many genes model
for the distributed genome of bacteria. Genome Biology and Evolution
4(4): 443-456.
Beiko RG (2011). Telling the whole story in a 10,000-genome world. Biology
Direct 6: 34.
Begun DJ, Lindfors HA, Kern AD, Jones CD (2007). Evidence for de novo evolution
of testis-expressed genes in the Drosophila yakuba/Drosophila erecta
clade. Genetics 176(2): 1131-1137.
Bench SR, Hanson TE, Williamson KE, Ghosh D, Radosovich M, Wang K, et al.
(2007). Metagenomic characterization of Chesapeake Bay virioplankton.
Applied and Environmental Microbiology 73(23): 7629-7641.
Boissy R, Ahmed A, Janto B, Earl J, Hall BG, Hogg JS, Pusch GD, Hiller LN, Powell E,
Hayes J, Yu S, Kathju S, Stoodley P, Post JC, Ehrlich GD, Hu FZ (2011).
Comparative supragenomic analyses among the pathogens
Staphylococcus aureus, Streptococcus pneumoniae, and Haemophilus
influenzae using a modification of the finite supragenome model. BMC
Genomics 12: 187.
Boore JL (2006). The use of genome-level characters for phylogenetic reconstruction. Trends in Ecology & Evolution 21(8): 439-446.
Boore JL, Fuerstenberg SI (2008). Beyond linear sequence comparisons: the use of genome-level characters for phylogenetic reconstruction. Philosophical Transactions of the Royal Society B: Biological Sciences 363(1496): 1445-1451.
Boto L (2010). Horizontal gene transfer in evolution: facts and challenges. Proceedings of the Royal Society B: Biological Sciences 277(1683): 819-827.
Boyer M, Gimenez G, Suzan-Monti M, Raoult D (2010). Classification and determination of possible origins of ORFans through analysis of nucleocytoplasmic large DNA viruses. Intervirology 53(5): 310-20.
Breitbart M (2012). Marine Viruses: Truth or Dare. Annual Review of Marine Science 4: 425-48.
Campbell MA, Zhu W, Jiang N, Lin H, Ouyang S, Childs KL, et al. (2007). Identification and Characterization of Lineage-Specific Genes within the Poaceae. Plant Physiology 145(4): 1311-1322.
Cardoso-Moreira M, Long M (2012). The Origin and Evolution of New Genes. Methods in Molecular Biology 856: 161-86.
Carvunis A-R, Rolland T, Wapinski I, Calderwood MA, Yildirim MA, Simonis N, et al. (2012). Proto-genes and de novo gene birth. Nature 487(7407): 370-374.
Chan CX, Darling AE, Beiko RG, Ragan MA (2009). Are protein domains modules of lateral genetic transfer? PLoS One 4(2): e4524.
Clamp M, Fry B, Kamal M, Xie XH, Cuff J, Lin MF, et al. (2007). Distinguishing protein-coding and noncoding genes in the human genome. Proceedings of the National Academy of Sciences of the United States of America 104(49): 19428-19433.
Dai D, Chen Y, Chen S, Mao Q, Kennedy K, Landback P, et al. (2008). The evolution of courtship behaviors through the origination of a new gene in Drosophila. Proceedings of the National Academy of Sciences of the United States of America 105(21): 7478-7483.
Daubin V, Ochman H (2004). Bacterial genomes as new gene homes: the genealogy of ORFans in E. coli. Genome Research 14: 1036-1042.
de Pinna MGG (1991). Concepts and tests of homology in the cladistic paradigm. Cladistics 7: 367-394.
Ding Y, Zhou Q, Wang W (2012). Origins of new genes and evolution of their novel functions. Annual Review of Ecology, Evolution, and Systematics 43(1): 345-363.
Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, et al. (2012). Landscape of transcription in human cells. Nature 489(7414): 101-108.
Domazet-Lošo T, Tautz D (2003). An evolutionary analysis of orphan genes in Drosophila. Genome Research 13(10): 2213-2219.
Domazet-Lošo T, Tautz D (2007). A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages. Trends in Genetics 23(11): 533-9.
Domazet-Lošo T, Tautz D (2008). An ancient evolutionary origin of genes associated with human genetic diseases. Molecular Biology and Evolution 25(12): 2699-707.
Domazet-Lošo T, Tautz D (2010). A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature 468(7325): 815-8.
Donoghue M, Keshavaiah C, Swamidatta S, Spillane C (2011). Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana. BMC Evolutionary Biology 11(1): 47.
Doolittle RF (1997). A bug with excess gastric activity. Nature 388: 515-516.
Doolittle RF (2002). Biodiversity: Microbial genomes multiply. Nature 416(6882): 697-700.
Doolittle W (1999). Phylogenetic classification and the universal tree. Science 284: 2124-2129.
Dujon B (1996). The yeast genome project: what did we learn? Trends in Genetics 12(7): 263-270.
Dunn B, Richter C, Kvitek DJ, Pugh T, Sherlock G (2012). Analysis of the Saccharomyces cerevisiae pan-genome reveals a pool of copy number variants distributed in diverse yeast strains from differing industrial environments. Genome Research 22(5): 908-924.
Edwards AM, Isserlin R, Bader GD, Frye SV, Willson TM, Yu FH (2011). Too many roads not taken. Nature 470(7333): 163-165.
Edwards RA, Rohwer F (2005). Viral metagenomics. Nature Reviews Microbiology 3(6): 504-510.
Eisen JA (1998). Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Research 8: 163-167.
Extavour CG (2011). Long-Lost Relative Claims Orphan Gene: oskar in a Wasp. PLoS Genetics 7(4): e1002045.
Fischer D, Eisenberg D (1999). Finding families for genomic ORFans. Bioinformatics 15(9): 759-762.
Fisher RA (1930). The Genetical Theory of Natural Selection. Oxford University Press: Oxford.
Forterre P (2006). DNA topoisomerase V: a new fold of mysterious origin. Trends in Biotechnology 24(6): 245-247.
Forterre P, Prangishvili D (2009). The origin of viruses. Research in Microbiology 160(7): 466-472.
Fraune S, Augustin R, Anton-Erxleben F, Wittlieb J, Gelhaus C, Klimovich VB, Samoilovich MP, Bosch TCG (2010). In an early branching metazoan, bacterial colonization of the embryo is controlled by maternal antimicrobial peptides. Proceedings of the National Academy of Sciences of the United States of America 107(42): 18067-18072.
Garza-Garcia AA, Driscoll PC, Brockes JP (2010). Evidence for the local evolution of mechanisms underlying limb regeneration in salamanders. Integrative and Comparative Biology 50(4): 528-535.
Golub T (2010). Counterpoint: Data first. Nature 464(7289): 679.
Graham DE, Overbeek R, Olsen GJ, Woese CR (2000). An archaeal genomic signature. Proceedings of the National Academy of Sciences of the United States of America 97(7): 3304-3308.
Hahn MW, Han MV, Han SG (2007). Gene family evolution across 12 Drosophila genomes. PLoS Genetics 3(11): e197.
Harris JK, Kelley ST, Spiegelman GB, Pace NR (2003). The Genetic Core of the Universal Ancestor. Genome Research 13(3): 407-412.
Heinen TJAJ, Staubach F, Häming D, Tautz D (2009). Emergence of a new gene from an intergenic region. Current Biology 19(18): 1527-1531.
Hillis DM (1994). Homology in molecular biology. In Hall BK (ed.), Homology: the hierarchical basis of comparative biology. Academic Press, San Diego, CA, pp. 339-368.
Holland JW, Okamura B, Hartikainen H, Secombes CJ (2011). A novel minicollagen gene links cnidarians and myxozoans. Proceedings of the Royal Society B: Biological Sciences 278(1705): 546-553.
Hughes AL, Ekollu V, Friedman R, Rose JR (2005). Gene family content-based phylogeny of prokaryotes: The effect of criteria for inferring homology. Systematic Biology 54(2): 268-276.
Jackson DJ, McDougall C, Woodcroft B, Moase P, Rose RA, Kube M, et al. (2010). Parallel Evolution of Nacre Building Gene Sets in Molluscs. Molecular Biology and Evolution 27(3): 591-608.
Jacob F (1977). Evolution and tinkering. Science 196(4295): 1161-1166.
Johnson B, Tsutsui N (2011). Taxonomically restricted genes are associated with the evolution of sociality in the honey bee. BMC Genomics 12(1): 164.
Kaessmann H (2010). Origins, evolution, and phenotypic impact of new genes. Genome Research 20(10): 1313-1326.
Keeling PJ, Palmer JD (2008). Horizontal gene transfer in eukaryotic evolution. Nature Reviews Genetics 9(8): 605-618.
Kessler MM, Zeng Q, Hogan S, Cook R, Morales AJ, Cottarel G (2003). Systematic discovery of new genes in the Saccharomyces cerevisiae genome. Genome Research 13(2): 264-271.
Khalturin K, Hemmrich G, Fraune S, Augustin R, Bosch T (2009). More than just orphans: are taxonomically-restricted genes important in evolution? Trends in Genetics 25(9): 404-413.
Knowles DG, McLysaght A (2009). Recent de novo origin of human protein-coding genes. Genome Research 19(10): 1752-1759.
Koonin EV (2009). Darwinian evolution in the light of genomics. Nucleic Acids Research 37(4): 1011-1034.
Koonin EV (2011). The Logic of Chance: The Nature and Origin of Biological Evolution. FT Press Science: Upper Saddle River, NJ.
Koonin EV, Wolf YI (2008). Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Research 36(21): 6688-6719.
Koski LB, Golding GB (2001). The closest BLAST hit is often not the nearest neighbor. Journal of Molecular Evolution 52(6): 540-542.
Lapierre P, Gogarten JP (2009). Estimating the size of the bacterial pan-genome. Trends in Genetics 25(3): 107-10.
Lefébure T, Bitar PDP, Suzuki H, Stanhope MJ (2010). Evolutionary Dynamics of Complete Campylobacter Pan-Genomes and the Bacterial Species Concept. Genome Biology and Evolution 2: 646-655.
Levine MT, Jones CD, Kern AD, Lindfors HA, Begun DJ (2006). Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression. Proceedings of the National Academy of Sciences of the United States of America 103(26): 9935-9939.
Li D, Dong Y, Jiang Y, Jiang H, Cai J, Wang W (2010a). A de novo originated gene depresses budding yeast mating pathway and is repressed by the protein encoded by its antisense strand. Cell Research 20(4): 408-420.
Li R, Li Y, Zheng H, Luo R, Zhu H, Li Q, et al. (2010b). Building the sequence map of the human pan-genome. Nature Biotechnology 28(1): 57-63.
Lienau EK, DeSalle R, Rosenfeld JA, Planet PJ (2006). Reciprocal illumination in the gene content tree of life. Systematic Biology 55(3): 441-453.
Lipman D, Souvorov A, Koonin E, Panchenko A, Tatusova T (2002). The relationship of protein conservation and sequence length. BMC Evolutionary Biology 2(1): 20.
Long M (2001). Evolution of novel genes. Current Opinion in Genetics & Development 11(6): 673-680.
Long M, Betran E, Thornton K, Wang W (2003). The origin of new genes: glimpses from the young and old. Nature Reviews Genetics 4(11): 865-875.
Lynch JA, Özüak O, Khila A, Abouheif E, Desplan C, Roth S (2011). The phylogenetic origin of oskar coincided with the origin of maternally provisioned germ plasm and pole cells at the base of the Holometabola. PLoS Genetics 7(4): e1002029.
Merkeev I, Novichkov P, Mironov A (2006). PHOG: a database of supergenomes built from proteome complements. BMC Evolutionary Biology 6(1): 52.
Mira A, Martín-Cuadrado AB, D’Auria G, Rodríguez-Valera F (2010). The bacterial pan-genome: a new paradigm in microbiology. International Microbiology 13: 45-57.
Monsch KA (2003). The use of apomorphies in taxonomic defining. Taxon 52(1): 105-107.
Moore AD, Bornberg-Bauer E (2012). The dynamics and evolutionary potential of domain loss and emergence. Molecular Biology and Evolution 29(2): 787-796.
Moran NA, Jarvik T (2010). Lateral transfer of genes from fungi underlies carotenoid production in aphids. Science 328(5978): 624-627.
Morgante M, De Paoli E, Radovic S (2007). Transposable elements and the plant pan-genomes. Current Opinion in Plant Biology 10(2): 149-155.
Neme R, Tautz D (2013). Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution. BMC Genomics 14(1): 117.
Narra HP, Cordes MHJ, Ochman H (2008). Structural features and the persistence of acquired proteins. Proteomics 8: 1-10.
Nichols RJ, Sen S, Choo YJ, Beltrao P, Zietek M, Chaba R, et al. (2011). Phenotypic landscape of a bacterial cell. Cell 144(1): 143-156.
Ohno S (1970). Evolution by gene duplication. Springer-Verlag: New York.
Ohno S (1984). Birth of a unique enzyme from an alternative reading frame of the preexisted, internally repetitious coding sequence. Proceedings of the National Academy of Sciences of the United States of America 81(8): 2421-2425.
Patterson C (1982). Morphological characters and homology. In K.A. Joysey and A.E. Friday (eds.), Problems of Phylogenetic Reconstruction (Academic Press: London).
Patterson C (1988). Homology in classical and molecular biology. Molecular Biology and Evolution 5: 603-625.
Pena-Castillo L, Hughes TR (2007). Why are there still over 1000 uncharacterized yeast genes? Genetics 176(1): 7-14.
Pilcher H (2013). All alone. New Scientist 217(2900): 40-43.
Prangishvili D, Garrett RA, Koonin EV (2006). Evolutionary genomics of archaeal viruses: unique viral genomes in the third domain of life. Virus Research 117(1): 52-67.
Rasko DA, Rosovitz MJ, Myers GS, Mongodin EF, Fricke WF, Gajer P, Crabtree J, Sebaihia M, Thomson NR, Chaudhuri R, Henderson IR, Sperandio V, Ravel J (2008). The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. Journal of Bacteriology 190(20): 6881-93.
Reeck GR, de Haën C, Teller DC, Doolittle RF, Fitch WM, Dickerson RE, Chambon P, McLachlan AD, Margoliash E, Jukes TH (1987). "Homology" in proteins and nucleic acids: a terminology muddle and a way out of it. Cell 50(5): 667.
Rödelsperger C, Streit A, Sommer RJ (2013). Structure, function and evolution of the nematode genome. In: eLS. John Wiley & Sons, Ltd: Chichester.
Rost B (1999). Twilight zone of protein sequence alignments. Protein Engineering 12(2): 85-94.
Rutter MT, Cross KV, Van Woert PA (2012). Birth, death and subfunctionalization in the Arabidopsis genome. Trends in Plant Science 17(4): 204-212.
Sabath N, Wagner A, Karlin D (2012). Evolution of viral proteins originated de novo by overprinting. Molecular Biology and Evolution 29(12): 3767-3780.
Shapiro J (2011). Evolution: A View from the 21st Century. FT Press Science: Upper Saddle River, NJ.
Siepel A (2009). Darwinian alchemy: Human genes from noncoding DNA. Genome Research 19: 1693-95.
Skovgaard M, Jensen LJ, Brunak S, Ussery D, Krogh A (2001). On the total number of genes and their length distribution in complete microbial genomes. Trends in Genetics 17(8): 425-428.
Siew N, Fischer D (2003). Unraveling the ORFan puzzle. Comparative and Functional Genomics 4(4): 432-441.
Snel B, Bork P, Huynen MA (1999). Genome phylogeny based on gene content. Nature Genetics 21(1): 108-110.
Snel B, Huynen MA, Dutilh BE (2005). Genome trees and the nature of genome evolution. Annual Review of Microbiology 59(1): 191-209.
Sonea S, Panisset M (1980). Introduction à la nouvelle bactériologie. Les Presses de l'Université de Montréal: Boston, MA.
Tautz D, Domazet-Lošo T (2011). The evolutionary origin of orphan genes. Nature Reviews Genetics 12(10): 692-702.
Tettelin H, Masignani V, Cieslewicz MJ, et al. (2005). Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”. Proceedings of the National Academy of Sciences of the United States of America 102(39): 13950-13955.
Tettelin H, Riley D, Cattuto C, Medini D (2008). Comparative genomics: the bacterial pan-genome. Current Opinion in Microbiology 12: 472-477.
Toll-Riera M, Bosch N, Bellora N, Castelo R, Armengol L, Estivill X, et al. (2009). Origin of primate orphan genes: a comparative genomics approach. Molecular Biology and Evolution 26(3): 603-612.
Touchon M, Hoede C, Tenaillon O, et al. (2009). Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genetics 5(1): e1000344.
Trimble V (1995). The 1920 Shapley-Curtis discussion: Background, issues, and aftermath. Publications of the Astronomical Society of the Pacific 107: 1133-44.
Typas A, Banzhaf M, van den Berg van Saparoea B, Verheul J, Biboy J, Nichols RJ, et al. (2010). Regulation of Peptidoglycan Synthesis by Outer-Membrane Proteins. Cell 143(7): 1097-1109.
Wägele J-W (2005). Foundations of Phylogenetic Systematics. Pfeil-Verlag: Munich.
Wang X, Wang H, Wang J, et al. (2011). The genome of the mesopolyploid crop species Brassica rapa. Nature Genetics 43: 1035-1039.
Wasmuth J, Schmid R, Hedley A, Blaxter M (2008). On the extent and origins of genic novelty in the phylum Nematoda. PLoS Neglected Tropical Diseases 2(7): e258.
Wilson BA, Masel J (2011). Putatively noncoding transcripts show extensive association with ribosomes. Genome Biology and Evolution 3: 1245-1252.
Wilson GA, Bertrand N, Patel Y, Hughes JB, Feil EJ, Field D (2005). Orphans as taxonomically restricted and ecologically important genes. Microbiology 151(8): 2499-2501.
Wilson GA, Feil EJ, Lilley AK, Field D (2007). Large-scale comparative genomic ranking of taxonomically restricted genes (TRGs) in bacterial and archaeal genomes. PLoS ONE 2(3): e324.
Wissler L, Gadau J, Simola DF, Helmkampf M, Bornberg-Bauer E (2013). Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biology & Evolution 5(2): 439-455.
Wolf YI, Novichkov PS, Karev GP, Koonin EV, Lipman DJ (2009). The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proceedings of the National Academy of Sciences of the United States of America 106(18): 7273-7280.
Wright S (1931). Evolution in Mendelian populations. Genetics 16: 97-159.
Wu D-D, Irwin DM, Zhang Y-P (2011). De novo origin of human protein-coding genes. PLoS Genetics 7(11): e1002379.
Zhaxybayeva O, Doolittle W (2011). Lateral gene transfer. Current Biology 21(7): R242-246.
Zuckerkandl E, Pauling L (1965). Molecules as documents of evolutionary history. Journal of Theoretical Biology 8(2): 357-366.
Zhou Q, Zhang G, Zhang Y, Xu S, Zhao R, Zhan Z, et al. (2008). On the origin of new genes in Drosophila. Genome Research 18(9): 1446-1455.
Figures

Figure 1. Chart showing accumulation of proteins annotated in sequenced genomes; orphans are defined as proteins with no detectable homologs at a BLAST threshold of 1 × 10⁻¹⁰. Redrawn from Beiko (2011) Biology Direct 6:34, Figure 2, with permission from the author.
Figure 2. Venn diagram showing number of unique and shared gene families between and among four plant species genome sequences. Two of the species are in the same family and three are in the same order. Redrawn from Wang et al. (2011). Reprinted by permission from Macmillan Publishers Ltd: Nature Genetics 43:1035-1039, copyright 2011.
Figure 3. Phylostratigraphy for Drosophila melanogaster showing number of genes restricted to each taxonomic level. Figure adapted from Figure 4b of Tautz and Domazet-Lošo (2011). Genes shared with all cellular life are not shown. Reprinted by permission from Macmillan Publishers Ltd: Nature Reviews Genetics 12:692-702, copyright 2011.