All content in this area was uploaded by Richard Buggs on Jun 17, 2016
Final submission after peer review of: P. Nelson & R. Buggs (2016) Next-generation apomorphy: the ubiquity of
taxonomically restricted genes, pp 237-264 in Next Generation Systematics Cambridge University Press.
Next-generation apomorphy: the ubiquity of
taxonomically restricted genes

P. A. Nelson¹ and R. J. A. Buggs²

¹ Biola University, 13800 Biola Ave., La Mirada, CA 90639, USA
² School of Biological and Chemical Sciences, Queen Mary University of London, Mile
End Road, London, E1 4NS, UK

1. Introduction

2. The contingent nature of TRG classification
   2.1 Contingency due to taxonomic category
   2.2 Contingency due to similarity threshold
   2.3 Contingency due to sampling

3. The ubiquity of TRGs
   3.1 Bacterial pan-genomes
   3.2 Virus reservoirs
   3.3 Eukaryotes

4. The functional significance of TRGs
   4.1 General evidence
   4.2 Five examples of TRG function

5. The origin and evolution of TRGs
   5.1 Standard models of novel gene evolution
   5.2 De novo gene evolution
   5.3 The need for data-driven research

6. Systematics of TRGs
   6.1 Phylostratigraphy
   6.2 Phylogenetic reconstruction
   6.3 Supporting characters

7. Concluding remarks

1. Introduction

The ability to sequence whole genomes at ever increasing rates has led to the
discovery of vast numbers of genes that are uniquely found in a single taxon (i.e.
apomorphic genes). Before the advent of automated DNA sequencing in the early
1990s, genetic comparison of organisms was only feasible through the targeted
amplification of homologous genes that are shared among divergent taxa, and
reliable identification of taxon-specific genes was almost impossible. Shortly
after the publication of the first whole genome in 1995, it became clear that
species possessed many more taxonomically unique, or restricted, gene
sequences than expected. When seven whole genomes had been published,
Russell F. Doolittle, a molecular biologist of many decades’ experience,
commented: “I am surprised that so many open reading frames remain as
unidentified [i.e. unique] reading frames” (1997, 516). Five years later, when 60
whole genomes had been sequenced, he called taxonomically unique sequences
“the biggest surprise in genome sequencing” (2002, 698).

Today, with whole genome sequencing further facilitated by next generation
technologies, these taxonomically restricted genes (TRGs; Khalturin et al. 2009),
also known as orphan genes (Dujon 1996), or “ORFans” (Fischer and Eisenberg
1999), continue to be discovered in every newly sequenced species genome
(Figures 1 and 2). These genes represent one of the most intriguing aspects of
systematics, lying at the intersection of genomics, genetics, comparative and
structural biology, phylogenetics and evolution. Yet, by their very nature, they
are difficult to study using conventional comparative approaches and attract
little research funding.

In this chapter we review the current status of this conundrum in the light of
rapid advances in genomics. Section 2 examines the definition of TRGs/ORFans,
noting that this is an inherently comparative concept and the status of any gene
as a TRG/ORFan is therefore highly contingent. Section 3 emphasizes their
ubiquity. Section 4 discusses the biological significance of some TRGs in terms of
putative functions. Section 5 discusses hypotheses for the origins and evolution
of TRGs. Section 6 examines the relevance of TRGs to systematics.

2. The contingent nature of TRG classification

Assigning any gene the status of “taxonomically restricted” or “orphan” is
necessarily a relative judgment; an “orphan” gene always holds its status
provisionally. Its status is contingent on three factors: (1) a taxonomic category,
(2) a similarity threshold used as a proxy for homology, and (3) genomic
database size and sampling: i.e., the total pool of known gene sequences, which
jointly yield the universe of objects for comparison. Because both factors (1) and
(2) involve judgments on which workers may differ, and (3) is constantly in flux
(i.e., growing), the status of any gene as a TRG will necessarily be conditional.
Any gene, at any time, may move from being an orphan to an ortholog (the
contrast class, by definition, of taxonomically unique sequences); an example of
such a re-evaluation is given for the Drosophila gene oskar in Section 2.3 below.
Given the importance of these issues of definition, we consider them in turn
below.

2.1 Contingency due to taxonomic category

The term “orphan” was introduced by Dujon (1996) with reference to
the yeast genome, but as taxonomic sampling of whole genome sequences was
tiny at the time, the level of taxonomic restriction implied by the term was not
clearly defined. Some authors now restrict the term to sequences from the
genomes of single species (i.e. autapomorphic genes), while others (e.g., Narra et
al. 2008) use it to refer to genes with orthologs found in multiple closely allied
genera (i.e. synapomorphic genes). Others apply additional descriptors and
refer to “singleton ORFans”, “orthologous ORFans” and “paralogous ORFans”
(Siew and Fischer 2003). This lack of consistent usage engenders confusion –
one investigator’s ORFan will be another’s ortholog – and this has given rise to
the longer but more useful term “taxonomically restricted gene” (TRG),
promoted by Wilson et al. (2005, 2007) and Bosch and colleagues (Khalturin et
al. 2009), among others. Using “taxonomically restricted gene” to refer to
sequences with limited systematic distribution encourages (indeed, requires)
one to specify the taxon in question. With the taxonomic level thus defined, it
is much less likely that ambiguity of meaning will creep in. In any case, the
designation of any gene as “taxonomically restricted” can be no more stable than
the boundaries or definition of the source taxon itself.
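
The dependence of TRG status on the chosen taxonomic category can be made
concrete with a small sketch. The Python function below reports the narrowest
taxon that contains every species in which a homolog was detected. The function
name, the lineage representation and the simplified yeast lineages are
illustrative assumptions, not taken from any of the studies cited.

```python
def restriction_taxon(focal_lineage, hit_lineages):
    """Return the narrowest (rank, name) of the focal lineage that contains
    every species in which a homolog was detected.

    Lineages are lists of (rank, name) pairs ordered from broadest to
    narrowest. With no hits outside the focal species, the gene is
    restricted to the species itself (an autapomorphic gene).
    """
    depth = len(focal_lineage)
    for lineage in hit_lineages:
        shared = 0
        for focal_taxon, hit_taxon in zip(focal_lineage, lineage):
            if focal_taxon != hit_taxon:
                break
            shared += 1
        depth = min(depth, shared)
    # depth == 0 means some hit shares no listed rank with the focal species,
    # so no restriction can be reported from these (truncated) lineages.
    return focal_lineage[depth - 1] if depth > 0 else None


# Hypothetical, heavily simplified lineages, for illustration only.
S_CEREVISIAE = [("kingdom", "Fungi"), ("genus", "Saccharomyces"),
                ("species", "Saccharomyces cerevisiae")]
S_PARADOXUS = [("kingdom", "Fungi"), ("genus", "Saccharomyces"),
               ("species", "Saccharomyces paradoxus")]
C_ALBICANS = [("kingdom", "Fungi"), ("genus", "Candida"),
              ("species", "Candida albicans")]

print(restriction_taxon(S_CEREVISIAE, []))
print(restriction_taxon(S_CEREVISIAE, [S_PARADOXUS]))
print(restriction_taxon(S_CEREVISIAE, [S_PARADOXUS, C_ALBICANS]))
```

In this toy setting the same gene is a species-level orphan in the first call and
a genus-level TRG in the second, which is why the taxon has to be stated
alongside the label.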

2.2 Contingency due to similarity threshold

In the early 1960s, in a series of prescient publications, Zuckerkandl and Pauling
described “ways of gaining information about evolutionary history through the
comparison of homologous polypeptide chains” (1965, 360). As with classical
anatomical homology, the signal of history was to be extracted from similarity:
that is, the more closely related (causally, via material descent) two or more
biological objects are, the more similar they will be. By this same logic, the less
similar two objects are, the less closely related they are. The definition of
“homology” and the criteria used to determine its presence in molecular data
have long been subjects of controversy (see, e.g., Reeck et al. 1987; Hillis 1994;
Eisen 1998). At the heart of the use of sequence similarity to assess homology lies
a probabilistic intuition, well expressed by Patterson (1988, 615): “if two
structures are complex enough and similar in detail, probability dictates that
they must be homologous rather than convergent”.

Both genes and proteins occur as discrete strings, enabling their direct alignment
from different species, with counting of differences as a measure of distance.
With the development of heuristic tools, such as BLAST (Basic Local Alignment
Search Tool; Altschul et al. 1990, 1997), there has been widespread use of
parameter thresholds for “homologous sequence” identification, such as BLAST
“Expect” values of 0.001 to 0.00001 (Siew and Fischer 2003). While this is a
useful practical approach, it should be borne in mind that quantitative measures
of similarity are being used to make a qualitative, binary assessment of the status
of a gene: under the most widely-accepted evolutionary definition of “homology,”
entities are either homologous or they are not (Reeck et al. 1987).¹ Thus a
somewhat arbitrary probabilistic convention is applied to a relation that is
binary and qualitative. Differences in the threshold levels used greatly affect the
detection of homology with BLAST searches (Rost, 1999, Koski and Golding,
2001), and hence our assessment of the frequency and occurrence of TRGs. The
usefulness and shortcomings of using BLAST to detect TRGs are explored at
greater length by Tautz and Domazet-Lošo (2011), who recommend use of
position-specific iterated BLAST, with manual supervision, for tracing patterns of
homology rigorously.
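
The effect of the threshold on the binary call can be sketched as follows. The
gene names and best-hit E-values are invented for illustration, and the function
is a hypothetical helper rather than part of BLAST; only the two cutoffs (0.001
and 0.00001) come from the range quoted above.

```python
def call_trgs(best_hit_evalues, evalue_cutoff):
    """Return the genes with no hit at or below the E-value cutoff.

    best_hit_evalues maps gene name -> lowest E-value of any hit outside the
    focal taxon, or None if the search returned no hit at all.
    """
    return sorted(
        gene for gene, evalue in best_hit_evalues.items()
        if evalue is None or evalue > evalue_cutoff
    )


# Invented values, for illustration only.
BEST_HITS = {"geneA": 1e-40, "geneB": 3e-4, "geneC": 2e-6,
             "geneD": 0.5, "geneE": None}

lenient = call_trgs(BEST_HITS, 1e-3)  # accept hits with E <= 0.001 as homologs
strict = call_trgs(BEST_HITS, 1e-5)   # accept only hits with E <= 0.00001

print(lenient)
print(strict)
```

Under these assumed values, geneB is an ortholog-bearing gene at the lenient
cutoff and a TRG at the strict one, although nothing about the gene itself has
changed.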

2.3 Contingency due to sampling

No TRG could be named as such in a world where only one genome had been
sequenced, nor could TRGs be found where only homologous genes had been
sampled from a range of genomes. Our confidence in the uniqueness of any
sequence depends directly on the completeness of taxonomic sampling.
We should expect that increased genomic sampling will provide matches
(orthologs) for many TRGs. The gene oskar, first identified as a TRG in
Drosophila melanogaster, provides an instructive illustration. Although
necessary for germ-cell formation in D. melanogaster, “unlike many other genes
with indispensable roles in development, oskar is not a widely conserved gene: it
proved absent from the first non-fly insect genomes sequenced, and has no clear
homologue in any other animal” (Extavour 2011). Using “a relaxed and modified
BLAST strategy,” however (see section 2.2), Lynch and colleagues (2011) located
an oskar ortholog, Nv-osk, in the wasp Nasonia. Thus we return to the point
which opened this section: the status of any gene as “orphan” or “taxonomically
restricted,” intrinsically a relative judgment, calls for alertness on the part of
investigators to the three principal criteria (similarity threshold, taxonomic
category, completeness of sample) employed.
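
The sampling contingency can be illustrated with a toy presence/absence table.
The gene and genome labels below are hypothetical, and the helper function is a
sketch written for this illustration; it simply re-evaluates which genes of a
focal genome count as TRGs as more comparison genomes are added.

```python
def trgs_given_sample(presence, focal, sampled_genomes):
    """Return genes of the focal genome with no homolog in the sampled genomes.

    presence maps gene name -> set of genomes carrying a homolog of that gene.
    """
    sampled = set(sampled_genomes) - {focal}
    if not sampled:
        # With nothing to compare against, TRG status cannot be assigned.
        raise ValueError("TRG status is undefined without comparison genomes")
    return sorted(
        gene for gene, genomes in presence.items()
        if focal in genomes and genomes.isdisjoint(sampled)
    )


# Hypothetical homolog distribution, for illustration only.
PRESENCE = {
    "g1": {"focal", "sp2", "sp3"},
    "g2": {"focal", "sp3"},
    "g3": {"focal", "sp4"},
    "g4": {"focal"},
}

for sample in (["sp2"], ["sp2", "sp3"], ["sp2", "sp3", "sp4"]):
    print(sample, trgs_given_sample(PRESENCE, "focal", sample))
```

In this sketch g2 and g3 lose their TRG status as sp3 and sp4 are sequenced,
much as oskar did once Nasonia was searched, while g4 remains restricted in
every sample considered.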

3. The ubiquity of TRGs

Every sequenced genome has revealed a substantial number of TRGs. When this
first occurred, it was widely assumed that TRGs were simply artifacts of
limited sampling:

…when only a handful of complete genome sequences were available, a number
of possible explanations for the abundance of ORFans were suggested. One
explanation was that the relatively high proportion of ORFans may be due to an
artifact of sparse sampling of the sequence space, and that with the availability
of more genomes, most ORFans would disappear. (Siew and Fischer 2003, 7)

¹ We note, however, that given the possibility of lateral gene transfer (LGT), some
authors have used the term “partial homology” for recombinant coding
sequences, where (for example) domain A comes from species P, whereas
domain B comes from species Q, and A and B are conjoined in species R to form a
new protein C (see Chan et al. 2009).

Perhaps surprisingly, this expectation was not borne out. The majority of early
genome sequences were of bacteria, and a 2005 study of 122 bacterial genomes
showed that the number of TRGs found was rising in a linear fashion with the
number of genomes sequenced, showing no signs of a plateau (Wilson et al.,
2005). More recently, Beiko (2011) surveyed over a thousand complete
bacterial and archaeal genomes (see Figure 1), noting that no plateau for new
TRGs can yet be envisaged. “Given the amount of novel genetic information in
new genomes,” he writes (2011, 5), “and the increasing rate at which genomes
are being sequenced, there is consequently no reason to suspect that the rate of
accumulation of novel genes will decrease in the near future.”

3.1 Bacterial pan-genomes

This vista of genetic novelty existing beyond the horizon of what has already
been sequenced can perhaps be seen most dramatically in the notion of the
‘open pan-genome.’ The concept of a common potential genome for all bacteria
was articulated by Sonea and Panisset (1980) and the term “pan-genome” was
introduced by Tettelin et al. (2005), as they attempted to describe the full genetic
diversity found within a single bacterial species. After comparing the complete
genomes of eight strains of the pathogen Streptococcus agalactiae (also known as
Group B Streptococcus, or GBS), Tettelin et al. (2005) found that each
newly-sequenced strain contained genes not previously seen in any other strain. Fitting
their data to an exponential decay function, they predicted that “for every new
GBS genome sequenced, an average of 33 new strain-specific genes will be
identified and added to the pan-genome.” Similar studies examining Escherichia
coli (Rasko et al. 2008; Touchon et al. 2009) found “continual addition of new
genes with each newly sequenced genome,” and thus, the same ‘open’ pattern:
“no single strain can be regarded as highly representative of the species…the
pan-genome is far from being fully uncovered” (Touchon et al. 2009, p. 5). Those
sequences found in all strains constitute the ‘core genome’ – mainly encoding
housekeeping functions such as translation or core metabolic processes –
whereas the strain-specific sequences are usually described as the ‘accessory’ or
‘dispensable genome,’² needed for existence “in a specific environment…linked
to virulence, capsular serotype, adaptation, and antibiotic resistance and might
reflect the organisms’ predominant lifestyle” (Mira et al. 2010, 47).

Not all bacterial species exhibit “open” genome patterns; “closed” pan-genomes,
such as those found in Staphylococcus aureus, Bacillus anthracis (Tettelin et al. 2008),
and Campylobacter sp. (Lefébure et al. 2010), are characterized by rarefaction
curves “that converge to a small but finite number of asymptotically discovered
new genes” (Tettelin et al. 2008, 475) – meaning that additional sequencing of
strains within these species is unlikely to reveal new genetic diversity.
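
The contrast between open and closed pan-genomes can be sketched numerically.
The code below assumes one simple form of exponential decay, in which the number
of new genes contributed by the n-th genome is asymptote + amplitude × exp(−n /
decay); this functional form, the amplitude, the decay constant and the size of
the first genome are all assumptions chosen for illustration. Only the asymptote
of 33 new genes per genome is taken from the GBS prediction quoted above, and
the closed case is idealised as an asymptote of zero.

```python
import math


def new_genes(n, asymptote, amplitude, decay):
    """Expected number of previously unseen genes in the n-th genome."""
    return asymptote + amplitude * math.exp(-n / decay)


def pan_genome_size(n_genomes, first_genome_genes, asymptote, amplitude, decay):
    """Cumulative number of distinct genes after sequencing n_genomes strains."""
    size = first_genome_genes
    for n in range(2, n_genomes + 1):
        size += new_genes(n, asymptote, amplitude, decay)
    return size


# Assumed parameters; only asymptote=33 comes from the text.
OPEN = dict(first_genome_genes=2000, asymptote=33, amplitude=400, decay=2.0)
CLOSED = dict(first_genome_genes=2000, asymptote=0, amplitude=400, decay=2.0)

for n in (8, 50, 200):
    print(n, round(pan_genome_size(n, **OPEN)), round(pan_genome_size(n, **CLOSED)))
```

Under these assumptions the open pan-genome keeps growing by about 33 genes
per additional strain however many strains are sequenced, whereas the closed
one stops growing after the first few genomes.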

² ‘Dispensable’ is something of a misnomer. Noting that the functions specified
by the ‘dispensable’ sequences often involve “characters that are a direct
response to the environment,” Mira et al. (2010, 55) stress that “a gene within
the accessory genome…should not be literally regarded as dispensable.”

While a variety of models have been proposed to explain bacterial pan-genome
patterns (see, e.g., Boissy et al. 2011, Baumdicker et al. 2010, 2012), the
take-home lesson is the enormous genetic diversity within the domain as a whole.
For example, in an analysis of 573 sequenced bacterial genomes, Lapierre and
Gogarten (2009) sampled genes randomly, and then queried the entire pool of
genomes to find BLAST hits for the sampled genes, categorizing gene families
according to the degree to which they were shared among species. Within each
individual genome, the typical gene complement was approximately: 8% core
conserved, 64% “character” genes (“essential for colonization and survival in
particular environmental niches”) and 28% “accessory” genes (TRGs mainly of
unknown function). This meant that within the pan-genome of about 150,000 gene
families, approximately 0.2% were core, 5% were “character” and over 94%
were “accessory”. Fitting the data to an exponential decay function, Lapierre and
Gogarten concluded, “the pan-genome of the bacterial domain is of infinite size.”³
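
Why a category that makes up 28% of each genome can make up over 94% of the
pan-genome follows from simple counting: a family shared by many genomes is
counted once in the pan-genome, whereas restricted families accumulate with
every genome added. The sketch below is a deliberately crude toy model, not
Lapierre and Gogarten's analysis: it treats the core and “character” families
(8% + 64% = 72 per 100) as shared by all genomes and every accessory family as
unique to one genome, so it overshoots the published 94%. Only the 573 genomes
and the per-genome percentages come from the text.

```python
def accessory_fractions(n_genomes, shared_per_genome, unique_per_genome):
    """Return the accessory fraction within one genome and within the pan-genome.

    Toy assumption: shared families are identical in every genome, and unique
    families never recur in another genome.
    """
    per_genome = unique_per_genome / (shared_per_genome + unique_per_genome)
    pan_total = shared_per_genome + n_genomes * unique_per_genome
    pan_genome = n_genomes * unique_per_genome / pan_total
    return per_genome, pan_genome


# 72 shared and 28 unique families per 100, across 573 genomes.
per_genome, pan_genome = accessory_fractions(573, 72, 28)
print(f"per genome: {per_genome:.1%}, pan-genome: {pan_genome:.1%}")
```

The published figures are lower than this toy result because “character” genes
are shared among subsets of genomes rather than all of them, and accessory
families are not all confined to a single genome.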

3.2 Virus reservoirs

Viruses are especially rich in TRGs (Edwards and Rohwer, 2005, Bench et al.,
2007, Forterre and Prangishvili, 2009, Prangishvili et al., 2006). Boyer et al.
(2010) estimate that TRGs constitute between 30 and >70 percent of viral genomes,
compared to 10-15 percent of archaeal and bacterial genomes (Koonin
2011, 110). Metagenomic surveys of viral populations in seawater, drawing on
the presence “of an average of 10⁷ virus-like particles per milliliter of surface
seawater” and “an estimated 10³⁰ viruses in the global oceans” (Breitbart 2012),
have motivated theoretical speculations about a vast reservoir of viral sequences,
dwarfing in size the prokaryotic and eukaryotic genomic universes. Shapiro
(2011), Koonin (2011), Abroi and Gough (2011), and others have hypothesized
that this enormous “virosphere” provides a “research and development” realm
where “experimentation with genomic processes” (Shapiro 2011, 133) yields a
supply of novel sequences (TRGs), which may eventually be taken up by
prokaryotes via viral transfer.

3.3 Eukaryotes

All eukaryote genome sequences, including those from yeasts (Kessler et al.,
2003, Peña-Castillo and Hughes, 2007), plants (Rutter et al., 2012, Donoghue et
al., 2011, Campbell et al., 2007) and primates (Wu et al., 2011, Knowles and
McLysaght, 2009, Clamp et al., 2007), have yielded TRGs. Every newly completed
genome sequence reveals a significant percentage of new TRGs; indeed, “as
orphan genes [TRGs] represent a substantial fraction of every extant genome, the
total number of orphans across all evolutionary lineages by far exceeds the
number of known gene families” (Tautz and Domazet-Lošo 2011, p. 693).

³ “Infinite” should be understood to mean “indefinitely large,” given that the
number of bacterial cells (and hence possible bacterial genes) on Earth, while
vast, is finite.

As with bacteria, increased sampling of genomes has rapidly increased the
number of TRGs discovered. Within the Nematoda, for instance, EST datasets
and whole genome data compiled by Wasmuth et al. (2008) showed the
following:

Cross-comparison of the C. elegans and C. briggsae proteomes identified ~10%
of unique genes in each species. Throwing the draft B. malayi genome into the
mix, revealed ~40% of its proteins did not share homology to C. elegans, C.
briggsae nor Drosophila melanogaster… Adding partial proteomes from 37
additional nematode species reduced the number of private genes to ~8% in
each species. While we expect this proportion to decline as nematode EST
sequencing continues, along with the release of genomes, we expect that each
fully sequenced genome has a significant complement of novel genes that have
arisen since they last shared a common ancestor, less than 100 million years ago.
If this pattern is true of all the >1 million predicted nematode species, then
‘nematode protein space’, the portion of possible sequence structures actually
occupied by nematode proteins, is likely to be huge. Our analyses suggest that
nematode protein space is huge, and that it is likely that our survey has merely
scraped its surface. (Wasmuth et al. 2008, pp. 11-12)

See also Rödelsperger et al. (2013, p. 1): “Strikingly, approximately one-third of
the genes in every sequenced nematode genome has no recognisable
homologues outside their genus.”

In their survey of orphan percentages within the insects, Wissler et al. (2013)
found that “averaged over all included insect and arthropod outgroup species,
approximately 13% of all genes lack a homologous protein in any other species”
(2013, p. 444). Given that ~14,000 ant species alone have been described
worldwide, the potential for further TRG discovery simply by sampling in the
Formicidae (not to mention other insects) is mind-boggling.

The pan-genome concept developed for bacteria has also been applied to
eukaryotes including maize (Morgante et al., 2007), yeasts (Dunn et al., 2012)
and humans (Li et al., 2010b).

4. The functional significance of TRGs

The simplest and most common way of gaining an indication of a newly
sequenced gene’s function is to compare it to other known sequences whose
function has been elucidated in a model organism. For TRGs, this is by definition
not an option, meaning that functional characterization must occur on a
case-by-case basis. This is expensive and time-consuming. In general, recently discovered
genes tend to attract limited research attention. For example, a recent
bibliographic analysis found that 75% of protein research was still focused on
the 10% of human proteins that were known before the human genome was
sequenced (Edwards et al., 2011).

4.1 General evidence

Several lines of reasoning suggest that TRGs are functional. (1) They are often
annotated in genome sequences in the first place because EST data align to
them, and therefore they are at least expressed. (2)
Wilson et al. (2007) have developed a “Quality Index for Predicted Proteins” that
scores the likelihood that a protein is functional, using non-homology-based
criteria. Applying this to TRGs suggests that many of them are functional (Wilson
et al., 2007). (3) Comparison of TRGs with different levels of taxonomic
restriction has identified characteristics correlated with degree of taxonomic
restriction (Daubin and Ochman, 2004, Wolf et al., 2009), such as gradual
reduction in length and GC content. This continuum of characteristics between
widespread, functionally characterized genes and restricted, little-studied genes
has been taken as evidence that the latter are functional and not artefacts (Wolf
et al., 2009). (4) If TRGs are functional, their frequency should correlate with the
degree to which their species is ecologically or taxonomically removed from
other species whose genomes have been sequenced – this seems to be the case
(Wilson et al., 2005, Khalturin et al., 2009); if TRGs were merely annotation
artefacts, we would expect them to be approximately equally common per
megabase in any genome. On these four grounds, there seems to be good reason
to expect that many TRGs do have a function. Below we give five examples from
contrasting taxonomic groups where this has clearly been shown to be the case.

4.2 Five examples of TRG function

4.2.1 Viruses: NwgI in T4 bacteriophages

Frequencies of TRGs in viral genomes are higher than in any other biological
entity, and it is likely that many of these sequences are functionally significant or
even essential. Ang and Georgopoulos (2012) note that “even closely related
bacteriophages carry their own set of unique genes that most likely favor their
growth on certain bacterial hosts” (p. 989). Investigating the interaction of
bacteriophage T4 with its host, E. coli, they focused on the role of the TRG T4
Gp39.2, which they renamed NwgI, for “normalizes weak GroE interactions.”
NwgI encodes a 58 amino acid protein that suppresses E. coli mutations affecting
the bacterium’s GroEL chaperone proteins. In their model, the NwgI protein
“shifts the equilibrium of GroEL to the ‘open’ state,” allowing the T4-encoded
co-chaperone to bind – thus enabling the complex to fold T4 essential proteins, in
particular, “the most abundant protein produced by the bacteriophage…its major
capsid subunit, Gp23, whose correct folding depends entirely on the host GroEL
chaperone” (2012, 996). Ang and Georgopoulos determined (via deletion
strains) that “the seemingly nonessential” TRG Gp39.2 / NwgI was, in fact,
“essential for bacteriophage growth on certain hosts” (2012, 995). In a search of
nucleotide sequence databases, the Gp39.2 family was only found in T4-like
bacteriophages that can propagate on Enterobacteria (Ang and Georgopoulos,
2012).

4.2.2 Archaea: Topoisomerase V in Methanopyrus kandleri

All organisms require topoisomerases as essential molecular hardware: these
DNA “disentangling” proteins change the topology of the two strands of the
double helix, e.g., during replication, to prevent the supercoiling that would
otherwise occur. While many topoisomerases are widely distributed throughout
their phylogenetic domains, Topo V has a unique fold and is present as a TRG in a
single archaeon, the hyperthermophilic Methanopyrus kandleri. The clear
functionality of Topo V can be seen in the fact that it has proven commercially
useful, as the crucial component of the ThermoFidelase sequencing kit, due to its
stability at high temperatures (Forterre, 2006, p. 245). This led one researcher to
muse: “if Topo V is such a wonderful enzyme, why was Mother Nature so mean
as to limit its presence to a single archaeal species?” (Forterre, 2006, p. 246).

4.2.3 Bacteria: LpoB in Escherichia coli

Escherichia coli, the classical model system of biochemistry, genetics, and
molecular biology, yielded a treasure trove of functional data about TRGs in the
large-scale, high-throughput “phenotypic analysis” of Nichols et al. (2011).
Setting out to find “phenotypes for mutants of genes without functional
annotation” – a class of sequences in which TRGs are predominant – Nichols et al.
discovered that “the most responsive orphans [i.e. TRGs with strong mutant
phenotypes] tended to be narrowly distributed among bacteria” (2011, p. 11).
For example, LpoB, a gene whose product regulates peptidoglycan synthesis,
critical for the formation of the cell wall, is found only in E. coli and its near
relatives (Typas et al., 2010, 1107). “An exciting explanation,” argue Typas et al.,
for this apparent contradiction – namely, a gene that is distributed narrowly, yet
is also functionally important – is that “such genes have been recently acquired
to act as regulators of broadly conserved biological processes, adding an
additional layer of control that helps the cell adjust to the specific needs of its
niche” (2010, p. 1108).

4.2.4 Cnidaria: the periculin family in Hydra

Developing embryos of the freshwater polyp Hydra, unprotected in the water
column, would appear to be vulnerable to pathological bacterial colonization.
Remarkably, however, early Hydra embryos selectively incorporate a bacterial
microbiota, using potent antimicrobials to regulate the abundance and type of
foreign cells admitted; “the host seems to be able to select and shape the
bacterial community” (Fraune et al. 2010, 18071). The periculin family of TRGs
(five genes in Hydra) encodes short proteins, 129-158 amino acids in length, with
high bactericidal activity against unwanted bacterial species. Fraune et al. (2010,
p. 18071) observe:

Moreover embryo-protecting peptides of the periculin family are specific for the
genus Hydra and are not present in the genomes of other animal taxa. This
specificity may reflect habitat-specific adaptations, supporting the view that
taxonomically restricted host-defense molecules represent an extremely
effective chemical warfare system that facilitates the disarming of taxon-specific
microbial attackers.

4.2.5 Mollusca: nacre-building genes in bivalves and gastropods

A defining character of the phylum Mollusca is the possession of a mechanism of
shell construction which first appears in the late pre-Cambrian. Intuitively, one
would expect the molecular basis of this feature to be homologous throughout
the group. Jackson et al. (2010) analysed the genes and proteins implicated in
shell construction in the bivalve Pinctada maxima and the gastropod Haliotis
asinina. Of the 129 H. asinina and 125 P. maxima sequences they isolated as
likely to be involved in nacre formation, “the majority were found to be
unique; 95 (74%) of the H. asinina–secreted products and 71 (57%) of the P.
maxima products shared no similarity with sequences in GenBank nr and EST
databases” (2010, p. 595). These TRG-based differences were so substantial that
Jackson et al. hypothesized that “the molecular mechanisms that guide the
deposition of the variants of nacre and its derivatives across the Mollusca are
fundamentally different” (2010, p. 605). They conclude (2010, p. 606): “The
degree of gene novelty and differences between the molluscs analyzed here also
highlights the importance of the evolution of coding sequences to the generation
of metazoan morphological novelty. In particular, the evolution and
diversification of novel RLCD proteins is apparently a key feature of molluscan
shell evolution”.

These five examples demonstrate that TRGs can have biological functions. We
stress the fruitfulness of assaying TRG functionality, given the excellent
prospects in so doing for fundamental discovery. Analytical challenges exist, of
course: searching for TRG function (in any taxon) requires describing the space
of relevant environmental or life-history conditions, especially for those groups
whose life histories go well beyond what can be seen in the laboratory. If the
TRGs one is assaying for possible functions “are important only under specific
conditions – particular situations that are not normally tested in the laboratory –
then we would expect mutation of these genes to have little or no phenotype in
general” (Peña-Castillo and Hughes 2007, p. 11). But given that TRGs have been
associated with such conditions as sociality in the honey bee (Johnson and
Tsutsui, 2011), courtship behaviours in Drosophila (Dai et al., 2008), and limb
regeneration in salamanders (Garza-Garcia et al., 2010), functions may well
appear against the right background, and other fascinating findings doubtless
await.

It has sometimes been suggested that TRGs are non-functional sequences, or
annotation artefacts (Skovgaard et al., 2001, Clamp et al., 2007), but evidence is
increasing for their functionality and biological significance (see reviews
Khalturin et al., 2009, Tautz and Domazet-Lošo, 2011). That relatively few TRGs
have well-documented functions is likely due to lack of funding and research
aimed at their functional characterization, rather than lack of actual function. In
short, we suspect that if one’s model system or species of study does something
unique and interesting, TRGs will be at least partially responsible, and worth
seeking out. Kaessmann (2010) notes the advances that might be made through
characterisation of novel genes and suggests: “Although challenging, newly
identified novel genes should be subjected to in-depth characterizations of their
functional evolution, using evolutionary analysis combined with large- and
small-scale genomics/transcriptomics, molecular, cellular, and in vivo
experiments”.

Final submission after peer review of: P. Nelson & R. Buggs (2016) Next-generation apomorphy: the ubiquity of
taxonomically restricted genes, pp 237-264 in Next Generation Systematics Cambridge University Press.
!
11!
!
!
5. The origins and evolution of TRGs

The origins of TRGs have been termed “enigmatic” (Domazet-Lošo and Tautz,
2003), “baffling mysteries” (Doolittle, 2002), “an evolutionary mystery”
(Merkeev et al., 2006), “unclear” (Tautz and Domazet-Lošo, 2011) and “an issue
of great complexity and almost completely uncharted territory” (Khalturin et al.,
2009). Our inability to explain the evolution of TRGs has been used as an
argument to support the proposition that they are non-functional (Clamp et al.,
2007), and may therefore have contributed to the comparative neglect of TRGs in
research (Khalturin et al., 2009) until evidence began to accumulate for their
functionality (see Section 4 above).

5.1 Standard models of novel gene evolution

The evolution of TRGs is hard to explain because most models of novel gene
evolution depend upon duplication, reshuffling, retrotransposition and/or
horizontal transfer of pre-existing coding regions (Ohno 1970, Long 2001, Long
et al. 2003, Kaessmann 2010). These mechanisms leave behind traceable
putative-progenitor sequences, detectable by similarity searches (see for
example Zhou et al. 2008, Donoghue et al. 2011). If TRGs arise by such
mechanisms, they must rapidly diverge from their progenitor sequences, beyond
the threshold of similarity searches (Tautz and Domazet-Lošo, 2011, Zhou et al.,
2008). This does not fit easily with a gradual mutation/selection mechanism of
evolution (Wright, 1931, Fisher, 1930), and several recent papers have argued
that these mechanisms do not explain many cases of TRG evolution, and that de novo
gene evolution is a better explanation (Neme and Tautz, 2013, Carvunis et al.,
2012, Ding et al., 2012).

5.2 De novo gene evolution

A mechanism increasingly invoked for the origin of TRGs is the evolution of
genes from non-coding sequence, sometimes called “de novo” gene evolution.
Some researchers cite this mechanism for TRGs without identifying an
orthologous noncoding region in a close relative (Zhou et al., 2008, Levine et al.,
2006, Begun et al., 2007, Toll-Riera et al., 2009); as such “de novo gene evolution”
is more an observation of orphan gene existence than an understood mechanism
of gene origination. Other researchers, such as Cardoso-Moreira and Long (2012,
p. 170) stipulate that “in order for a new gene to be classified as a de novo gene,
the orthologous noncoding region in the genome of a close relative should be
identified. This is required to show that indeed coding sequence evolved from a
previously noncoding sequence.” This is the sense in which we discuss de novo
evolution below. It should be noted that the term de novo evolution is sometimes
used to describe a new ORF that appears to have evolved by “overprinting” in an
alternative reading frame of a pre-existing ORF (Ohno, 1984, Sabath et al., 2012,
Li et al., 2010a); however, this mechanism cannot directly result in a TRG as
defined by homology searches, and authors of studies on these do not normally
use the term orphan or TRG to describe the overprinted ORF (e.g. Sabath et al.,
2012, Li et al., 2010a), so we will not discuss them further here.

Several studies have identified possible cases of de novo gene evolution as
defined by Cardoso-Moreira and Long, with sequences orthologous to orphan
genes in the non-coding DNA of other species (e.g. Knowles and McLysaght, 2009,
Levine et al., 2006, Wu et al., 2011, Zhou et al., 2008). One particularly detailed
study shows a gene, Poldi, in mouse with expression in the testes, three exons,
alternative splicing, and a knock-out phenotype, that has orthologous regions in
human and rat that appear not to be capable of expression (Heinen et al., 2009).

There are two difficulties with de novo gene evolution, as defined by
Cardoso-Moreira and Long above, as an explanation for the origins of TRGs. Firstly, it is
difficult to see these de novo genes as orphans sensu stricto, given that orthologs
– albeit apparently non-functional orthologs – do exist. The presence of
orthologous sequences in other taxa is prima facie difficult to reconcile with most
operational definitions of TRGs and ORFans, in particular, the criterion of
similarity threshold (see 2.2, above). A second difficulty lies in proving the
direction of evolution in cases of de novo evolution: it could be that the
non-coding orthologs of the functional orphan genes are simply pseudogenes which
were previously functional. Given that de novo gene origination is unlikely as an
evolutionary process because the probability of a functional protein sequence
emerging from a random sequence is vanishingly small (Jacob, 1977, Ohno,
1970), pseudogenization may be a more parsimonious explanation for the
patterns seen. Cardoso-Moreira and Long (2012, p. 170) caution that “the
presence of a gene in a genome and its absence [as a coding sequence] in the
genomes of close relatives does not necessarily imply that that gene evolved de
novo…that gene could have been lost from all other genomes”.
Siepel (2009, p. 1694) argues that this could be the case even if multiple
pseudogenes are found: “the possibility that apparent gene births were actually
functional in ancestral genomes and were lost independently in multiple lineages,
although remote for these genes, cannot be completely discounted. Mutational
hotspots could lead to non-negligible probabilities of parallel (homoplastic)
disabling mutations.”

Many investigators are understandably reluctant to infer the direct origin of
functional TRGs from random sequences. Siepel (2009, p. 1694) lists some of the
features likely to be necessary to transform an (otherwise non-coding)
nucleotide string into a gene with a functional product:

These apparent de novo gene origins raise the question of how evolution by
natural selection can produce functional genes from noncoding DNA. While a
single gene is not as complex as a complete organ, such as an eye or even a
feather, it still has a series of nontrivial requirements for functionality, for
instance, an ORF, an encoded protein that serves some useful purpose, a
promoter capable of initiating transcription, and presence in a region of open
chromatin structure that permits transcription to occur. How could all of these
pieces fall into place through the random processes of mutation, recombination,
and neutral drift – or at least enough of these pieces to produce a protogene that
was sufficiently useful for selection to take hold?

Wilson and Masel (2011, p. 1246) share this skepticism, citing additional
hurdles:

Conversion from noncoding to coding seems too unlikely an event to happen in a
single evolutionary step. The sequence in question must be transcribed, escape
degradation at the nuclear exosome, associate with ribosomes, be translated,
and again escape degradation by the proteasome. Finally, it must avoid toxic
conformations such as amyloid, for example, in favor of a stable protein fold.

Armengaud et al. (2011, p. 2) note that while origin from random sequence “cannot
be a priori rejected,” the odds are long: “Since a protein should fold in the proper
way to give a stable three-dimensional structure for a correct function, obtaining
a new function from scratch is statistically highly improbable”.

Carvunis et al. (2012) have proposed a model for de novo gene birth from short
“proto-genes” that may overcome some of the problems mentioned above. As
mentioned in Section 4.1, gene characteristics such as length and GC content are
correlated with degree of taxonomic restriction of annotated genes (Daubin and
Ochman, 2004, Wolf et al., 2009, Lipman et al., 2002). In their detailed study of
14 yeast species, Carvunis et al. (2012) extended such observations to all ORFs
longer than 30 nucleotides in Saccharomyces cerevisiae. They found that the
majority of short, unannotated ORFs in S. cerevisiae are restricted to the species,
and hundreds of these ORFs are translated into proteins and may be functional.
Whilst this observation potentially increases the number of TRGs in S. cerevisiae,
the authors argue that it may also provide a route for TRG evolution. They found
something of a continuum from short, little expressed, unannotated ORFs with
restricted taxonomic distribution through to long, highly expressed, well
annotated ORFs with broad taxonomic distribution. Applying various metrics
relating to possible gene functions, they suggested that this distribution of
characters in ORFs represents an evolutionary continuum. They present a verbal
model in which short non-genic sequences in the genome mutate to become
short non-genic ORFs, some of which then acquire the ability to be transcribed
and become “proto-genes”, some of which lengthen to become longer, fully
functional genes.

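The first step of a survey of this kind, enumerating every ORF above a length
cut-off, can be sketched in a few lines of Python. This is a minimal
illustration and not the pipeline used by Carvunis et al. (2012): we assume
here that an ORF runs from an ATG to the first in-frame stop codon on the
forward strand only, and the function name find_orfs and its min_len parameter
are our own.

```python
# Hypothetical sketch: list forward-strand ORFs longer than a length cut-off.
# Assumption (ours): an ORF is ATG ... first in-frame stop codon, inclusive.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=30):
    """Return (start, end, frame) tuples for ORFs longer than min_len nucleotides.

    start is 0-based, end is exclusive, and frame is 0, 1 or 2.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start > min_len:
                    orfs.append((start, end, frame))
                start = None
    return orfs


# Toy sequence: a 36-nucleotide ORF embedded in two flanking bases on each side.
toy = "CC" + "ATG" + "GCT" * 10 + "TAA" + "GG"
print(find_orfs(toy))
```

A real survey would also scan the reverse strand and then ask, for each ORF,
whether it is annotated, transcribed, translated and taxonomically restricted;
none of those steps is attempted here.
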
Currently, as Pilcher (2013) points out, the plausibility of the Carvunis et al.
(2012) model is partly dependent on one’s view of the functionality of non-genic
regions of genomes. An assumption of the model seems to be that the majority of
short non-genic regions and ORFs that provide the raw material for evolution are
lacking in function, or at least have a function that can be dispensed with or
incorporated as they grow into genes. If widespread transcription of non-genic
regions, as found in this and other studies such as Djebali et al. (2012), is mainly
noise, then these regions may be evolutionarily labile, but if transcription of
non-genic regions in fact indicates functionality then their evolution may be
constrained. Kaessmann (2010) argues that pervasive transcription of non-genic
regions might make de novo gene origination common, but notes “the regulatory,
sequence, and structural requirements for the functionality of long noncoding
RNAs are so far poorly understood and hence the probability of such gene
formation events is hard to predict.”

It is to be hoped that the Carvunis et al. (2012) model will continue to be
developed from a general verbal model to one in which step-by-step evolution of
particular gene sequences is documented. One prediction that might arise from
the model is that although a particular species-specific gene may have no
orthologs in other species, if it has arisen by de novo gene evolution it may be
that re-sequencing of different populations within the species will reveal
intermediate short ORFs that have partial homology with the longer gene. In
other words, if TRGs arise de novo by the lengthening of short ORFs, there may in
some cases be traceable evolutionary pathways found within the taxonomic
range of the TRG. A related aspect of the Carvunis et al. (2012) model that
deserves further attention is the time that might be taken for a non-genic ORF to
evolve into a fully-fledged gene: to explain the occurrence of species-specific
genes, this process has to mainly occur within the time since the divergence of
the closest sister species. We have something of a Catch-22 situation in that any
plausible model for gene evolution has to be gradualistic, but species-specific
TRG occurrence patterns do not seem to allow much time for evolutionary
processes to occur in.

5.3 The need for data-driven research

The origins of TRGs continue to be a mystery, and their existence seems to be at
odds with many of our hypotheses about how evolution works. A
hypothesis-driven reaction to this might be to ignore TRGs. For example, two pioneering
studies of gene evolution in humans (Wu et al., 2011, Knowles and McLysaght,
2009) excluded over 200 genes that had no detectable orthologs in other
primates, on the assumption that their TRG status was simply due to
incompleteness of other primate genome drafts. Similarly, in an analysis of gene
family evolution across 12 Drosophila genomes, Hahn et al. (2007) “found
23,070 families that consisted of a single gene and that appeared to have evolved
on a terminal lineage (i.e., they are found in only a single species). These
single-gene families were regarded as artifacts of the annotation process, and were
removed from further analysis.” Such approaches may have led Khalturin et al.
(2009, Box 1) to note that:

Taxonomists are fascinated when they manage to identify a new species;
molecular biologists, on the contrary, seem to be rather bemused when
stumbling on ‘novel’ genes.

An alternative approach is to avoid the jettisoning of data as something that runs
the danger of making our basic observations of the natural world too
theory-laden. A data-driven approach would treat TRGs that align to EST sequences as
unique functional genes until proven otherwise. As Nichols et al. (2011, p. 147)
argue, “evolutionary conservation is not a reliable indicator of the importance of
an orphan to the organism…orphans may have evolved to fulfill an important but
specialized function required by the niche of the organism.” It is notable that in
cancer research, human genomics has led to a data-first approach that has
yielded insights unanticipated by hypothesis-first approaches (Golub, 2010).
Similarly, genome sequencing of multiple genomes across the diversity of life is
yielding insights for evolution that were unanticipated by current paradigms
(e.g. Koonin, 2009, Boto, 2010).

6. Systematics of TRGs

6.1 Phylostratigraphy

As we noted above, a distinct advantage (in terms of conceptual clarity) is
afforded by the term “taxonomically restricted gene” (TRG) over “orphan”. This
is especially true with respect to the possible systematic utility of a coding
sequence. A gene found in all Metazoa, for example, but not elsewhere, will not
be useful (in terms of presence/absence data) for diagnosing the genus
Drosophila, or, for that matter, the phylum Arthropoda – but that same gene will
pick out a metazoan from the larger universe of organisms on Earth. Thus, the
comparative analysis of the distribution of any gene calls (necessarily) for the
specifying of the taxonomic category providing the reference class (i.e.,
specifying the TRG criterion of taxonomic category; see 2.1, above). In the case
of the TRG whose reference category is “Metazoa,” the cladistic dictum
“symplesiomorphy becomes synapomorphy at a higher level” explains how
absence of systematic utility for one question (e.g., for diagnosing Drosophila or
Arthropoda, nested within Metazoa) – because the character is distributed too
broadly – changes to usefulness when the question itself changes to a broader
scope: what genes might diagnose animals as a taxon? “Symplesiomorphic
similarities are obviously homologous,” argues de Pinna (1991) – for example,
any TRG found throughout the Metazoa, but not elsewhere – “but every
symplesiomorphy is a synapomorphy at a higher level, and it is the knowledge of
this that allows recognition of symplesiomorphies in the first place.” More
precisely,

Every hypothesis of homology is a hypothesis of monophyletic grouping and, in
any particular context, a symplesiomorphy is a hypothesis of a set, and a
synapomorphy is a hypothesis of a subset of that set. Symplesiomorphy and
synapomorphy are thus terms for homologies which stand in hierarchic relation
to each other. (Patterson 1982, p. 33)

The project of mapping gene distributions of fully-sequenced genomes onto
taxonomic (or phylogenetic) categories – in effect, determining how gene
distributions stand in hierarchic relation to each other – has been developed
most fully by Domazet-Lošo and Tautz (2007, 2008, 2010) in a method they have
dubbed “phylostratigraphy.” Consider Drosophila melanogaster within its usual
sequence of systematic ranks:

Drosophila melanogaster
   Diptera
      Endopterygota
         Insecta
            Pancrustacea
               Arthropoda
                  Protostomia
                     Bilateria
                        Eumetazoa
                           Metazoa
                              Holozoa
                                 Opisthokonta
                                    Eukaryota

As one descends (or ascends) within this hierarchy, more (or less) inclusive sets
of genes will be present, at what Domazet-Lošo and Tautz term “phylostrata”
(singular, phylostratum) characterized by “founder genes” – “the
phylogenetically oldest genes forming the basis of a new gene lineage, new
protein domain or new gene family” (Tautz and Domazet-Lošo 2011, p. 693).
Diagrams of the same form can be plotted for any species (although obviously
taxonomic depth will be much shallower for prokaryotic taxa), making
phylostratigraphy an excellent comparative tool for analyzing TRG distribution
patterns. These methods become successively more informative as more
genomes are sequenced, and sequencing of congeneric species is particularly
consequential for our understanding of species-specific genes.

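The logic of assigning a gene to a phylostratum can be sketched as follows.
This is a simplified illustration under stated assumptions, not the published
pipeline of Domazet-Lošo and Tautz: the list of strata is abbreviated (and a
genus-level stratum is added for the example), the genome names and homology
hits are invented toy data, and assign_phylostratum is a hypothetical helper.
Each sampled genome is labelled with the least inclusive listed clade that it
shares with the focal species, and a gene is placed in the most inclusive such
clade in which a homolog is detected.

```python
# Hypothetical sketch of phylostratum assignment for a D. melanogaster gene.
# Abbreviated strata (our choice), ordered from oldest to youngest.
STRATA = ["Eukaryota", "Metazoa", "Arthropoda", "Insecta", "Diptera",
          "Drosophila", "Drosophila melanogaster"]

# Toy data: least inclusive listed clade each genome shares with the focal species.
GENOME_STRATUM = {
    "yeast": "Eukaryota",
    "mouse": "Metazoa",
    "mosquito": "Diptera",
    "D. simulans": "Drosophila",
}


def assign_phylostratum(hit_genomes, sampled_genomes):
    """Return the oldest stratum with a detected homolog among the sampled genomes.

    hit_genomes: genomes in which a similarity search would find a homolog.
    sampled_genomes: genomes that have actually been sequenced and searched.
    """
    rank = {name: i for i, name in enumerate(STRATA)}
    oldest = len(STRATA) - 1  # no hit elsewhere: restricted to the focal species
    for genome in hit_genomes:
        if genome in sampled_genomes:
            oldest = min(oldest, rank[GENOME_STRATUM[genome]])
    return STRATA[oldest]


# A gene whose only homolog outside the focal species lies in a congener.
hits = {"D. simulans"}
before = assign_phylostratum(hits, {"yeast", "mouse", "mosquito"})
after = assign_phylostratum(hits, {"yeast", "mouse", "mosquito", "D. simulans"})
print(before)  # Drosophila melanogaster
print(after)   # Drosophila
```

With these toy data the gene is scored as species-specific until the congener
is sampled and as genus-specific afterwards, which is why the sequencing of
congeneric species matters so much for counts of species-specific genes.
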
Figure 3 shows the phylostratigraphy of the genome of D. melanogaster, with the
phylostrata extending from the species (on the left) to the Eukaryota. Relative
numbers of genes present at each stratum are plotted on the vertical axis. Notice
the “spike” in gene innovation at the appearance of the genus Drosophila. When
only the D. melanogaster genome had been sequenced, this spike appeared to be
at the species level (Domazet-Lošo et al., 2007), but sequencing of 11 congeneric
species pushed this back to the genus level. Another analysis examining the
emergence of novel protein domains strongly reinforces the signal of a spike of
innovation at the origin of the genus Drosophila. Once D. melanogaster’s 11
congeners were added, “the Drosophila lineages see a 3-fold increase in domain
emergence,” relative to the 8 other pancrustacean species sequenced (Moore and
Bornberg-Bauer 2011, p. 4; see their Figure 1, p. 3).

Perhaps one future application of phylostratigraphy will be the defining of
natural groups above the species level. For example, the large number of genes
unique to the genus Drosophila shown in Figure 3 might suggest that this is a
genuine higher taxonomic category. Whether such peaks will persist as
taxonomic coverage of genome sequences improves remains to be seen. The
apparent increase in domain innovation within the genus Drosophila, compared
to the other pancrustaceans, could be an artifact of limited genomic sampling of
the other 8 genera. All are currently represented by a genome sequence of a
single species, except for Anopheles (two species, A. gambiae and A. aegypti).

6.2 Phylogenetic reconstruction

Current molecular systematics relies largely upon methods of phylogenetic
reconstruction based on gene sequences that are both shared and variable
among members of a group of taxa being studied. With these methods, genes can
only provide usable data to test hypotheses about relationships within the
taxonomic level to which they are restricted. When a TRG is shared among
species (synapomorphic) it can be useful in this way, but when a TRG is
restricted to a single species, it is autapomorphic and no statements on
relationships are possible from it (Wägele, 2005, p. 129). However, both
synapomorphic and autapomorphic TRGs may be useful as defining
(supporting) characters for the taxon in which they occur (Monsch, 2003,
Wägele, 2005, p. 27).

Since genome sequencing became common, there has been some exploration of
the use of gene content data for phylogenetic inference. For example, Snel et al.
(1999) constructed a distance-based phylogeny for 13 unicellular species based
on gene content, defining distance in terms of number of shared versus unshared
genes; their results correlated with those from 16S rRNA. Other studies have
used gene family content; such methods of phylogenetic reconstruction exclude gene
families with single members, and hence discard autapomorphic TRGs (e.g.
Hughes et al., 2005, Lienau et al., 2006).

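A gene-content distance of this general kind can be sketched as follows. The
normalisation used here (shared genes divided by the gene count of the smaller
genome) is an assumption made for illustration; the text above specifies only
that distance was defined in terms of shared versus unshared genes. The taxon
names and gene-family identifiers are invented, and gene_content_distance is a
hypothetical helper.

```python
# Hypothetical sketch: pairwise gene-content distances from presence/absence sets.
from itertools import combinations


def gene_content_distance(genes_a, genes_b):
    """Return 1 - (shared genes / gene count of the smaller genome).

    Both arguments are non-empty sets of gene (or gene-family) identifiers.
    """
    shared = len(genes_a & genes_b)
    return 1.0 - shared / min(len(genes_a), len(genes_b))


# Toy gene-family content for three invented taxa.
genomes = {
    "taxon_A": {"f1", "f2", "f3", "f4"},
    "taxon_B": {"f1", "f2", "f3", "f5"},
    "taxon_C": {"f1", "f6", "f7"},
}

# Distance for every unordered pair of taxa.
distances = {
    (a, b): gene_content_distance(genomes[a], genomes[b])
    for a, b in combinations(sorted(genomes), 2)
}
for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

In this toy example the families found in only one taxon (f4, f5, f6, f7) never
add to a shared count, so they cannot pull any two taxa together; this is the
sense in which autapomorphic TRGs carry no grouping information. A matrix of
such distances could then be passed to any standard distance-based tree method.
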
Different genes may have different evolutionary histories (Doolittle, 1999), and
thus these trees are best viewed simply as “a means to capture and compare the
overwhelming amount of information that is present in genomes” (Snel et al.,
2005, p. 193). It has been argued that gene gain can provide convincing
characters, as the occurrence of homoplasy is unlikely (Boore and Fuerstenberg,
2008), whereas convergent gene losses are likely and so are less reliable
characters. As with the identification of TRGs in general, gene gains could be
falsely inferred if homologs are missed due to rapid evolution, gaps in draft
genomes or poor gene-finding models (Boore, 2006).

6.3 Supporting characters

In 2000, Carl Woese’s group (Graham et al., 2000) used newly completed
genomes from four major euryarchaeal taxa to identify defining characters for
the Euryarchaeota in terms of “signature proteins” that were taxonomically
restricted in that they had no recognizable bacterial or eukaryal homologs. They
suggested that this could herald a new approach to taxonomy:

This strategy of identifying genes that function uniquely in a lineage can be applied to
any phylogenetically related group of organisms. The comprehensive nature of
genomic analysis brings an unprecedented objectivity to describing cell lineages:
genomics raises taxonomy to a new level. Whereas earlier taxonomies identified and
related organisms, the new taxonomy will elaborate those relationships, allowing the
biologist to see the essential character of a group and (to some extent) the mode of
that group’s evolution.

Such an approach may have clarified the systematic relationships of the
myxozoans. Phylogenetic analyses of widely-shared genes have grouped
myxozoans variously as a sister taxon to the Bilateria or within the Cnidaria, but
Holland et al. (2011) claim to have demonstrated that the latter placement is
correct, on the basis of the discovery of a novel minicollagen TRG (Tb-Ncol-1). This
“represents the first example of using a gene associated with a phylum-specific
morphological novelty to infer placement within the Metazoa” (Holland et al.,
2011).

However, inference of sister-group status based on single TRGs may be
inherently unreliable. The pea aphid genome contains four carotenoid
desaturase genes and three proteins consisting of fused carotenoid cyclase–
carotenoid synthase enzymes; these encode a functional biochemical pathway,
but searching the GenBank protein database revealed no detectable homologs in
any other available animal genome (Moran and Jarvik, 2010). The genes
therefore appeared to be taxonomically restricted within Animalia to the
Aphididae. However, homologs were found in several fungal genomes, where the
pattern of arrangement of these genes was similar (Moran and Jarvik, 2010).
Since this discovery, homologs have also been found in the two-spotted spider
mite, which belongs to a different class of Arthropoda from the aphids. Clearly
these TRGs would be highly misleading if used as evidence for a sister-group
relationship.

Analysis of TRGs can perhaps more reliably shed light on morphological
characters whose status has been controversial. The regeneration of limbs in
salamanders has often been considered a symplesiomorphic character, but
Garza-Garcia et al. (2010) provided evidence for its apomorphy by identifying a
salamander-specific protein (Prod 1) with a central role in limb-regeneration.

7. Concluding remarks

Understanding the taxonomic distribution of genes within the diversity of life is
an ever-growing task, lying at the intersection of current genomics and
systematics. Every gene is at some level taxonomically confined, except for a
handful of genes involved in DNA replication, transcription and translation that
appear to be universal (Harris et al., 2003). As reviewed here, a most surprising
aspect of our recently acquired knowledge of gene distribution has been the very
large number of genes that are confined to a single genus or species. Another
closely related surprise has been the frequency of genes that show apparently
homoplasious patterns of taxonomic restriction; these have not been the focus of
this chapter, and are the subject of a substantial body of literature on lateral (or
horizontal) gene transfer (for reviews see Keeling and Palmer, 2008, Boto, 2010,
Zhaxybayeva and Doolittle, 2011).

Those who learned their phylogenetics prior to the DNA sequencing revolution
may still feel a sense of frank awe at the ocean of surprising data on which they
are now able to sail. Between 1949 and 1955, Frederick Sanger painstakingly
sequenced bovine insulin – a single, relatively small hormone. Throughout the
1970s and 80s, into the early 1990s, molecular phylogenies were constructed on
the basis of a handful of ribosomal RNAs or highly conserved protein sequences.
Today, entire genomes are sequenced in a few days’ time, and with the
increasing speed and decreasing cost of improving technology, the effort
required for obtaining whole genomes can be expected to shrink further.

An illuminating parallel can be drawn from the history of astronomy. In the
early decades of the 20th century, the dimensions of the entire physical universe
were thought by astronomer Harlow Shapley to extend to ~300,000 light years,
encompassing only our galaxy, the Milky Way. Spiral “nebulae,” on Shapley’s
view, lay within the Milky Way – until a powerful new instrument, the 100-inch
Hooker reflector at Mt. Wilson, manned by Edwin Hubble, showed the presence
of Cepheid variable stars in those nebulae. Indeed, the nebulae were not nebulae
(i.e., clouds) at all, but distant galaxies – “island universes” of their own (Trimble
1995). The extent of the physical universe was vastly greater than Shapley
imagined. The instrument enabled the discovery, and the profound change of
theoretical outlook. Similarly, rapid and increasingly inexpensive DNA
sequencing is expanding the genetic (and proteomic) universe well beyond what
any biologist could have imagined prior to the mid-1990s. The impact of these
data on systematics and our knowledge of evolution cannot be overstated.

References

Abroi A, Gough J (2011). Are viruses a source of new protein folds for organisms?
– Virosphere structure space and evolution. BioEssays 33(8):626-35.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). Basic local alignment
search tool. Journal of Molecular Biology 215(3):403-10.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ
(1997). Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Research 25(17):3389-402.
Ang D, Georgopoulos C (2012). An ORFan no more: the bacteriophage T4 39.2
gene product, NwgI, modulates GroEL chaperone function. Genetics
190(3):989-1000.
Armengaud J, Bland C, Christie-Oleza J, Miotello G (2011). Microbial
proteogenomics, gaining ground with the avalanche of genome
sequences. Journal of Bacteriology and Parasitology S3-001.
Baumdicker F, Hess WR, Pfaffelhuber P (2010). The diversity of a distributed
genome in bacterial populations. Annals of Applied Probability
20(5):1567-1606.
Baumdicker F, Hess WR, Pfaffelhuber P (2012). The infinitely many genes model
for the distributed genome of bacteria. Genome Biology and Evolution
4(4):443-456.
Beiko RG (2011). Telling the whole story in a 10,000-genome world. Biology
Direct 6:34.
Begun DJ, Lindfors HA, Kern AD, Jones CD (2007). Evidence for de novo evolution
of testis-expressed genes in the Drosophila yakuba/Drosophila erecta
clade. Genetics 176(2):1131-1137.
Bench SR, Hanson TE, Williamson KE, Ghosh D, Radosovich M, Wang K, et al.
(2007). Metagenomic characterization of Chesapeake Bay virioplankton.
Applied and Environmental Microbiology 73(23):7629-7641.
Boissy R, Ahmed A, Janto B, Earl J, Hall BG, Hogg JS, Pusch GD, Hiller LN, Powell E,
Hayes J, Yu S, Kathju S, Stoodley P, Post JC, Ehrlich GD, Hu FZ (2011).
Comparative supragenomic analyses among the pathogens
Staphylococcus aureus, Streptococcus pneumoniae, and Haemophilus
influenzae using a modification of the finite supragenome model. BMC
Genomics 12:187.
Boore JL (2006). The use of genome-level characters for phylogenetic
reconstruction. Trends in Ecology & Evolution 21(8):439-446.
Boore JL, Fuerstenberg SI (2008). Beyond linear sequence comparisons: the use
of genome-level characters for phylogenetic reconstruction. Philosophical
Transactions of the Royal Society B: Biological Sciences
363(1496):1445-1451.
Boto L (2010). Horizontal gene transfer in evolution: facts and challenges.
Proceedings of the Royal Society B: Biological Sciences
277(1683):819-827.
Boyer M, Gimenez G, Suzan-Monti M, Raoult D (2010). Classification and
determination of possible origins of ORFans through analysis of
nucleocytoplasmic large DNA viruses. Intervirology 53(5):310-20.
Breitbart M (2012). Marine Viruses: Truth or Dare. Annual Review of Marine
Science 4:425-48.
Campbell MA, Zhu W, Jiang N, Lin H, Ouyang S, Childs KL, et al. (2007).
Identification and Characterization of Lineage-Specific Genes within the
Poaceae. Plant Physiology 145(4):1311-1322.
Cardoso-Moreira M, Long M (2012). The Origin and Evolution of New Genes.
Methods in Molecular Biology 856:161-86.
Carvunis A-R, Rolland T, Wapinski I, Calderwood MA, Yildirim MA, Simonis N, et
al. (2012). Proto-genes and de novo gene birth. Nature
487(7407):370-374.
Chan CX, Darling AE, Beiko RG, Ragan MA (2009). Are protein domains modules
of lateral genetic transfer? PLoS One 4(2):e4524.
Clamp M, Fry B, Kamal M, Xie XH, Cuff J, Lin MF, et al. (2007). Distinguishing
protein-coding and noncoding genes in the human genome. Proceedings of
the National Academy of Sciences of the United States of America
104(49):19428-19433.
Dai D, Chen Y, Chen S, Mao Q, Kennedy K, Landback P, et al. (2008). The evolution
of courtship behaviors through the origination of a new gene in
Drosophila. Proceedings of the National Academy of Sciences of the United
States of America 105(21):7478-7483.
Daubin V, Ochman H (2004). Bacterial genomes as new gene homes: the
genealogy of ORFans in E. coli. Genome Research 14:1036-1042.
de!Pinna!MGG!(1991).!!Concepts!and!tests!of!homology!in!the!cladistic!paradigm.!
9$%8",-"=,!7:367-394.!!
Ding!Y,!Zhou!Q,Wang!W!(2012).!Origins!of!new!genes!and!evolution!of!their!novel!
functions.!4((C%$&E*)"*N&#?&D=#$#@./&D)#$C-"#(/&%(8&<.,-*A%-"=,!43(1):!
345-363.!
Djebali!S,!Davis!CA,!Merkel!A,!Dobin!A,!Lassmann!T,!Mortazavi!A/&et!al.!(2012).!
Landscape!of!transcription!in!human!cells.!G%-C+*!489(7414):!101-108.!
Domazet-Lošo!T,Tautz!D!(2003).!An!evolutionary!analysis!of!orphan!genes!in!
Drosophila.!T*(#A*&E*,*%+=>!13(10):!2213!-!2219.!
Domazet-Lošo!T,!Tautz!D!(2007).!A!phylostratigraphy!approach!to!uncover!the!
genomic!history!of!major!adaptations!in!metazoan!lineages.!V+*(8,&"(&
T*(*-"=,&23(11):533-9.!
Domazet-Lošo!T,!Tautz!D!(2008).!An!ancient!evolutionary!origin!of!genes!
associated!with!human!genetic!diseases.!7#$*=C$%+&!"#$#@.&%(8&D)#$C-"#(!
25(12):2699-707.!
Domazet-Lošo T, Tautz D (2010). A phylogenetically based transcriptome age
index mirrors ontogenetic divergence patterns. Nature 468(7325): 815-8.
Donoghue M, Keshavaiah C, Swamidatta S, Spillane C (2011). Evolutionary
origins of Brassicaceae specific genes in Arabidopsis thaliana. BMC
Evolutionary Biology 11(1): 47.
Doolittle RF (1997). A bug with excess gastric activity. Nature 388: 515-516.
Doolittle RF (2002). Biodiversity: Microbial genomes multiply. Nature
416(6882): 697-700.
Doolittle W (1999). Phylogenetic classification and the universal tree. Science
284: 2124-2129.
Dujon B (1996). The yeast genome project: what did we learn? Trends in Genetics
12(7): 263-270.
Dunn B, Richter C, Kvitek DJ, Pugh T, Sherlock G (2012). Analysis of the
Saccharomyces cerevisiae pan-genome reveals a pool of copy number
variants distributed in diverse yeast strains from differing industrial
environments. Genome Research 22(5): 908-924.
Edwards AM, Isserlin R, Bader GD, Frye SV, Willson TM, Yu FH (2011). Too many
roads not taken. Nature 470(7333): 163-165.
Edwards RA, Rohwer F (2005). Viral metagenomics. Nature Reviews Microbiology
3(6): 504-510.
Eisen JA (1998). Phylogenomics: improving functional predictions for
uncharacterized genes by evolutionary analysis. Genome Research 8:163-167.
Extavour CG (2011). Long-Lost Relative Claims Orphan Gene: oskar in a Wasp.
PLoS Genetics 7(4): e1002045.
Fischer D, Eisenberg D (1999). Finding families for genomic ORFans.
Bioinformatics 15(9): 759-762.
Fisher RA (1930). The Genetical Theory of Natural Selection. Oxford University
Press: Oxford.
Forterre P (2006). DNA topoisomerase V: a new fold of mysterious origin. Trends
in Biotechnology 24(6): 245-247.
Forterre P, Prangishvili D (2009). The origin of viruses. Research in Microbiology
160(7): 466-472.
Fraune S, Augustin R, Anton-Erxleben F, Wittlieb J, Gelhaus C, Klimovich VB,
Samoilovich MP, Bosch TCG (2010). In an early branching metazoan,
bacterial colonization of the embryo is controlled by maternal
antimicrobial peptides. Proceedings of the National Academy of Sciences of
the United States of America 107(42): 18067-18072.
Garza-Garcia AA, Driscoll PC, Brockes JP (2010). Evidence for the local evolution
of mechanisms underlying limb regeneration in salamanders. Integrative
and Comparative Biology 50(4): 528-535.
Golub T (2010). Counterpoint: Data first. Nature 464(7289): 679.
Graham DE, Overbeek R, Olsen GJ, Woese CR (2000). An archaeal genomic
signature. Proceedings of the National Academy of Sciences of the United
States of America 97(7): 3304-3308.
Hahn MW, Han MV, Han SG (2007). Gene family evolution across 12 Drosophila
genomes. PLoS Genetics 3(11): e197.
Harris JK, Kelley ST, Spiegelman GB, Pace NR (2003). The Genetic Core of the
Universal Ancestor. Genome Research 13(3): 407-412.
Heinen TJAJ, Staubach F, Häming D, Tautz D (2009). Emergence of a new gene
from an intergenic region. Current Biology 19(18): 1527-1531.
Hillis DM (1994). Homology in molecular biology. In Hall BK (ed.) Homology:
the hierarchical basis of comparative biology. Academic Press, San Diego,
CA, pp. 339-368.
Holland JW, Okamura B, Hartikainen H, Secombes CJ (2011). A novel
minicollagen gene links cnidarians and myxozoans. Proceedings of the
Royal Society B: Biological Sciences 278(1705): 546-553.
Hughes AL, Ekollu V, Friedman R, Rose JR (2005). Gene family content-based
phylogeny of prokaryotes: The effect of criteria for inferring homology.
Systematic Biology 54(2): 268-276.
Jackson DJ, McDougall C, Woodcroft B, Moase P, Rose RA, Kube M, et al. (2010).
Parallel Evolution of Nacre Building Gene Sets in Molluscs. Molecular
Biology and Evolution 27(3): 591-608.
Jacob F (1977). Evolution and tinkering. Science 196(4295): 1161-1166.
Johnson B, Tsutsui N (2011). Taxonomically restricted genes are associated with
the evolution of sociality in the honey bee. BMC Genomics 12(1): 164.
Kaessmann H (2010). Origins, evolution, and phenotypic impact of new genes.
Genome Research 20(10): 1313-1326.
Keeling PJ, Palmer JD (2008). Horizontal gene transfer in eukaryotic evolution.
Nature Reviews Genetics 9(8): 605-618.
Kessler MM, Zeng Q, Hogan S, Cook R, Morales AJ, Cottarel G (2003). Systematic
discovery of new genes in the Saccharomyces cerevisiae genome. Genome
Research 13(2): 264-271.
Khalturin K, Hemmrich G, Fraune S, Augustin R, Bosch T (2009). More than just
orphans: are taxonomically-restricted genes important in evolution?
Trends in Genetics 25(9): 404-413.
Knowles DG, McLysaght A (2009). Recent de novo origin of human protein-
coding genes. Genome Research 19(10): 1752-1759.
Koonin EV (2009). Darwinian evolution in the light of genomics. Nucleic Acids
Research 37(4): 1011-1034.
Koonin EV (2011). The Logic of Chance: The Nature and Origin of Biological
Evolution. FT Press Science: Upper Saddle River, NJ.
Koonin EV, Wolf YI (2008). Genomics of bacteria and archaea: the emerging
dynamic view of the prokaryotic world. Nucleic Acids Research 36(21):
6688-6719.
Koski LB, Golding GB (2001). The closest BLAST hit is often not the nearest
neighbor. Journal of Molecular Evolution 52(6): 540-542.
Lapierre P, Gogarten JP (2009). Estimating the size of the bacterial pan-genome.
Trends in Genetics 25(3): 107-10.
Lefébure T, Bitar PDP, Suzuki H, Stanhope MJ (2010). Evolutionary Dynamics
of Complete Campylobacter Pan-Genomes and the Bacterial Species
Concept. Genome Biology and Evolution 2:646–655.
Levine MT, Jones CD, Kern AD, Lindfors HA, Begun DJ (2006). Novel genes
derived from noncoding DNA in Drosophila melanogaster are frequently
X-linked and exhibit testis-biased expression. Proceedings of the National
Academy of Sciences of the United States of America 103(26): 9935-9939.
Li D, Dong Y, Jiang Y, Jiang H, Cai J, Wang W (2010a). A de novo originated gene
depresses budding yeast mating pathway and is repressed by the protein
encoded by its antisense strand. Cell Research 20(4): 408-420.
Li R, Li Y, Zheng H, Luo R, Zhu H, Li Q, et al. (2010b). Building the sequence map
of the human pan-genome. Nature Biotechnology 28(1): 57-63.
Lienau EK, DeSalle R, Rosenfeld JA, Planet PJ (2006). Reciprocal illumination in
the gene content tree of life. Systematic Biology 55(3): 441-453.
Lipman D, Souvorov A, Koonin E, Panchenko A, Tatusova T (2002). The
relationship of protein conservation and sequence length. BMC
Evolutionary Biology 2(1): 20.
Long M (2001). Evolution of novel genes. Current Opinion in Genetics &
Development 11(6): 673-680.
Long M, Betran E, Thornton K, Wang W (2003). The origin of new genes:
glimpses from the young and old. Nature Reviews Genetics 4(11): 865-875.
Lynch JA, Özüak O, Khila A, Abouheif E, Desplan C, Roth S (2011). The
phylogenetic origin of oskar coincided with the origin of maternally
provisioned germ plasm and pole cells at the base of the Holometabola.
PLoS Genetics 7(4): e1002029.
Merkeev I, Novichkov P, Mironov A (2006). PHOG: a database of supergenomes
built from proteome complements. BMC Evolutionary Biology 6(1): 52.
Mira A, Martín-Cuadrado AB, D’Auria G, Rodríguez-Valera F (2010). The
bacterial pan-genome: a new paradigm in microbiology. International
Microbiology 13:45-57.
Monsch KA (2003). The use of apomorphies in taxonomic defining. Taxon 52(1):
105-107.
Moore AD, Bornberg-Bauer E (2012). The dynamics and evolutionary potential
of domain loss and emergence. Molecular Biology and Evolution 29(2):
787-796.
Moran NA, Jarvik T (2010). Lateral transfer of genes from fungi underlies
carotenoid production in aphids. Science 328(5978): 624-627.
Morgante M, De Paoli E, Radovic S (2007). Transposable elements and the plant
pan-genomes. Current Opinion in Plant Biology 10(2): 149-155.
Neme R, Tautz D (2013). Phylogenetic patterns of emergence of new genes
support a model of frequent de novo evolution. BMC Genomics 14(1): 117.
Narra HP, Cordes MHJ, Ochman H (2008). Structural features and the persistence
of acquired proteins. Proteomics 8:1-10.
Nichols RJ, Sen S, Choo YJ, Beltrao P, Zietek M, Chaba R, et al. (2011). Phenotypic
landscape of a bacterial cell. Cell 144(1): 143-156.
Ohno S (1970). Evolution by gene duplication. Springer-Verlag: New York.
Ohno S (1984). Birth of a unique enzyme from an alternative reading frame of
the preexisted, internally repetitious coding sequence. Proceedings of the
National Academy of Sciences of the United States of America 81(8):
2421-2425.
Patterson C (1982). Morphological characters and homology. In K.A. Joysey and
A.E. Friday (eds.), Problems of Phylogenetic Reconstruction (Academic
Press: London).
Patterson C (1988). Homology in classical and molecular biology. Molecular
Biology and Evolution 5:603-625.
Pena-Castillo L, Hughes TR (2007). Why are there still over 1000 uncharacterized
yeast genes? Genetics 176(1): 7-14.
Pilcher H (2013). All alone. New Scientist 217(2900): 40-43.
Prangishvili D, Garrett RA, Koonin EV (2006). Evolutionary genomics of archaeal
viruses: unique viral genomes in the third domain of life. Virus Research
117(1): 52-67.
Rasko DA, Rosovitz MJ, Myers GS, Mongodin EF, Fricke WF, Gajer P, Crabtree J,
Sebaihia M, Thomson NR, Chaudhuri R, Henderson IR, Sperandio V, Ravel
J (2008). The pangenome structure of Escherichia coli: comparative
genomic analysis of E. coli commensal and pathogenic isolates. Journal of
Bacteriology 190(20): 6881-93.
Reeck GR, de Haën C, Teller DC, Doolittle RF, Fitch WM, Dickerson RE, Chambon
P, McLachlan AD, Margoliash E, Jukes TH (1987). "Homology" in proteins
and nucleic acids: a terminology muddle and a way out of it. Cell 50(5):
667.
Rödelsperger C, Streit A, Sommer RJ (2013). Structure, function and evolution of
the nematode genome. In: eLS. John Wiley & Sons, Ltd: Chichester.
Rost B (1999). Twilight zone of protein sequence alignments. Protein
Engineering 12(2): 85-94.
Rutter MT, Cross KV, Van Woert PA (2012). Birth, death and subfunctionalization
in the Arabidopsis genome. Trends in Plant Science 17(4): 204-212.
Sabath N, Wagner A, Karlin D (2012). Evolution of viral proteins originated de
novo by overprinting. Molecular Biology and Evolution 29(12): 3767-3780.
Shapiro J (2011). Evolution: A View from the 21st Century. FT Press Science:
Upper Saddle River, NJ.
Siepel A (2009). Darwinian alchemy: Human genes from noncoding DNA.
Genome Research 19:1693-95.
Skovgaard M, Jensen LJ, Brunak S, Ussery D, Krogh A (2001). On the total
number of genes and their length distribution in complete microbial
genomes. Trends in Genetics 17(8): 425-428.
Siew N, Fischer D (2003). Unraveling the ORFan puzzle. Comparative and
Functional Genomics 4(4): 432-441.
Snel B, Bork P, Huynen MA (1999). Genome phylogeny based on gene content.
Nature Genetics 21(1): 108-110.
Snel B, Huynen MA, Dutilh BE (2005). Genome trees and the nature of genome
evolution. Annual Review of Microbiology 59(1): 191-209.
Sonea S, Panisset M (1980). Introduction à la nouvelle bactériologie. Les Presses
de l'Université de Montréal: Boston, MA.
Tautz D, Domazet-Lošo T (2011). The evolutionary origin of orphan genes.
Nature Reviews Genetics 12(10): 692-702.
Tettelin H, Masignani V, Cieslewicz MJ et al. (2005). Genome analysis of multiple
pathogenic isolates of Streptococcus agalactiae: Implications for the
microbial "pan-genome". Proceedings of the National Academy of Sciences
of the United States of America 102(39): 13950–13955.
Tettelin H, Riley D, Cattuto C, Medini D (2008). Comparative genomics: the
bacterial pan-genome. Current Opinion in Microbiology 12:472–477.
Toll-Riera M, Bosch N, Bellora N, Castelo R, Armengol L, Estivill X, et al. (2009).
Origin of primate orphan genes: a comparative genomics approach.
Molecular Biology and Evolution 26(3): 603-612.
Touchon M, Hoede C, Tenaillon O et al. (2009). Organised genome dynamics in
the Escherichia coli species results in highly diverse adaptive paths. PLoS
Genetics 5(1): e1000344.
Trimble V (1995). The 1920 Shapley-Curtis discussion: Background, issues, and
aftermath. Publications of the Astronomical Society of the Pacific 107:1133-44.
Typas A, Banzhaf M, van den Berg van Saparoea B, Verheul J, Biboy J, Nichols RJ,
et al. (2010). Regulation of Peptidoglycan Synthesis by Outer-Membrane
Proteins. Cell 143(7): 1097-1109.
Wägele J-W (2005). Foundations of Phylogenetic Systematics. Pfeil-Verlag:
Munich.
Wang X, Wang H, Wang J et al. (2011). The genome of the mesopolyploid crop
species Brassica rapa. Nature Genetics 43:1035-1039.
Wasmuth J, Schmid R, Hedley A, Blaxter M (2008). On the extent and origins of
genic novelty in the phylum Nematoda. PLoS Neglected Tropical Diseases
2(7): e258.
Wilson BA, Masel J (2011). Putatively noncoding transcripts show extensive
association with ribosomes. Genome Biology and Evolution 3:1245-1252.
Wilson GA, Bertrand N, Patel Y, Hughes JB, Feil EJ, Field D (2005). Orphans as
taxonomically restricted and ecologically important genes. Microbiology
151(8): 2499-2501.
Wilson GA, Feil EJ, Lilley AK, Field D (2007). Large-scale comparative genomic
ranking of taxonomically restricted genes (TRGs) in bacterial and
archaeal genomes. PLoS ONE 2(3): e324.
Wissler L, Gadau J, Simola DF, Helmkampf M, Bornberg-Bauer E (2013).
Mechanisms and dynamics of orphan gene emergence in insect genomes.
Genome Biology and Evolution 5(2): 439-455.
Wolf YI, Novichkov PS, Karev GP, Koonin EV, Lipman DJ (2009). The universal
distribution of evolutionary rates of genes and distinct characteristics of
eukaryotic genes of different apparent ages. Proceedings of the National
Academy of Sciences of the United States of America 106(18): 7273-7280.
Wright S (1931). Evolution in Mendelian populations. Genetics 16: 97-159.
Wu D-D, Irwin DM, Zhang Y-P (2011). De novo origin of human protein-coding
genes. PLoS Genetics 7(11): e1002379.
Zhaxybayeva O, Doolittle W (2011). Lateral gene transfer. Current Biology 21(7):
R242-246.
Zuckerkandl E, Pauling L (1965). Molecules as documents of evolutionary history.
Journal of Theoretical Biology 8(2): 357–366.
Zhou Q, Zhang G, Zhang Y, Xu S, Zhao R, Zhan Z, et al. (2008). On the origin of new
genes in Drosophila. Genome Research 18(9): 1446-1455.
Figures

Figure 1. Chart showing accumulation of proteins annotated in sequenced
genomes; orphans are defined as proteins with no detectable homologs at
a BLAST threshold of 1 x 10^-10. Redrawn from Beiko (2011) Biology Direct
6:34, Figure 2, with permission from the author.

Figure 2. Venn diagram showing number of unique and shared gene families
between and among four plant species genome sequences. Two of the
species are in the same family and three are in the same order. Redrawn
from Wang et al. (2011). Reprinted by permission from Macmillan
Publishers Ltd: Nature Genetics 43:1035-1039, copyright 2011.

Figure 3. Phylostratigraphy for Drosophila melanogaster showing number of
genes restricted to each taxonomic level. Figure adapted from Figure 4b
of Tautz and Domazet-Lošo (2011). Genes shared with all cellular life are
not shown. Reprinted by permission from Macmillan Publishers Ltd:
Nature Reviews Genetics 12:692-702, copyright 2011.