Derek Caetano-Anollés

Max Planck Institute for Evolutionary Biology · Department of Evolutionary Genetics

About

62
Publications
9,726
Reads
1,358
Citations
Additional affiliations
January 2016 - present
Max Planck Institute for Evolutionary Biology
Position
  • PostDoc Position
August 2009 - January 2016
University of Illinois, Urbana-Champaign
Position
  • PhD Student

Publications (62)
Preprint
While the origin and early evolution of proteins and their biosynthetic mechanisms remain mysterious, advances in evolutionary genomics and systems biology are facilitating the historical exploration of the structure, function and organization of proteins and proteomes. Here we describe how chronologies and networks provide a window into the past a...
Article
Full-text available
Recruitment is a pervasive activity of life that is at the center of novelty generation and persistence. Without recruitment, novelties cannot spread and biological systems cannot maintain identity through time. Here we explore the problem of identity and change unfolding in space and time. We illustrate recruitment operating at different timescale...
Article
Introduction: While the origin and evolution of proteins remain mysterious, advances in evolutionary genomics and systems biology are facilitating the historical exploration of the structure, function and organization of proteins and proteomes. Molecular chronologies are series of time events describing the history of biological systems and subsys...
Article
Full-text available
Networks describe how parts associate with each other to form integrated systems which often have modular and hierarchical structure. In biology, network growth involves two processes, one that unifies and the other that diversifies. Here, we propose a biphasic (bow-tie) theory of module emergence. In the first phase, parts are at first weakly link...
Article
Full-text available
Trees of life (ToLs) can only be rooted with direct methods that seek optimization of character state information in ingroup taxa. This involves optimizing phylogenetic tree, model and data in an exercise of reciprocal illumination. Rooted ToLs have been built from a census of protein structural domains in proteomes using two kinds of models. Fully...
Article
Full-text available
Phylogenetic methods unearth evolutionary history when supported by three starting points of reason: (1) the continuity axiom presupposes the existence of a "model" of evolutionary change, (2) the singularity axiom defines the historical ground plan (phylogeny) in which biological entities (taxa) evolve, and (3) the memory axiom demands identification of...
Article
Agricultural genetic technologies typically achieve their agronomic aims by introducing laboratory-generated modifications into target species' chromosomes. However, the speed and flexibility of this approach are limited, because modified chromosomes must be vertically inherited from one generation to the next. In an effort to remove this limitatio...
Article
The evolution of structure in biology is driven by accretion and diversification. Accretion brings together disparate parts to form bigger wholes. Diversification provides opportunities for growth and innovation. Here, we review patterns and processes that are responsible for a 'double tale' of accretion and diversification at various levels of com...
Preprint
The evolution of structure in biology is driven by accretion and change. Accretion brings together disparate parts to form bigger wholes. Change provides opportunities for growth and innovation. Here we review patterns and processes that are responsible for a 'double tale' of evolutionary accretion at various levels of complexity, from proteins and...
Article
Harish and Kurland recently compared two evolutionary models that root the tree of life. Both models make use of unordered phylogenetic characters. Here we show that they confused one of their models with models that use Wagner ordered characters. In addition, we also show that these Wagner characters should be polarized a posteriori with the gener...
Article
Full-text available
Economic decisions arise from evaluation of alternative actions in contexts of motivation and memory. In the predatory sea-slug Pleurobranchaea the economic decisions of foraging are found to occur by the workings of a simple, affectively controlled homeostat with learning abilities. Here, the neuronal circuit relations for approach-avoidance choic...
Chapter
The origin and evolution of molecular functions hold the key to the emergence of modern biochemistry and cellular organization. Here we explore the existence of a growing vocabulary in the proteins and molecular functions of Archaea. A genomic census of structural domains and its mappings to Gene Ontology terms provides the raw data for the search...
Article
Agonistic encounters are powerful effectors of future behavior, and the ability to learn from this type of social challenge is an essential adaptive trait. We recently identified a conserved transcriptional program defining the response to social challenge across animal species, highly enriched in transcription factor (TF), energy metabolism, and d...
Article
Full-text available
The origin of biomolecular machinery likely centered around an ancient and central molecule capable of interacting with emergent macromolecular complexity. tRNA is the oldest and most central nucleic acid molecule of the cell. Its co-evolutionary interactions with aminoacyl-tRNA synthetase protein enzymes define the specificities of the genetic cod...
Chapter
The natural history of translation is mysterious but central to our understanding of the origin and evolution of biochemistry and life. tRNA is at the center of this biological process. Its interactions with aminoacyl-tRNA synthetase enzymes define the specificities of the genetic code and those with the ribosome their accurate biosynthetic interpr...
Article
Full-text available
Accretion occurs pervasively in nature at widely different timeframes. The process also manifests in the evolution of macromolecules. Here we review recent computational and structural biology studies of evolutionary accretion that make use of the ideographic (historical, retrodictive) and nomothetic (universal, predictive) scientific frameworks. C...
Article
Full-text available
Historical (ideographic) and non-historical (nomothetic) studies of ribosomal accretion appear to arrive at diametrically opposite conclusions. Phylogenetic analysis of thousands of RNA molecules and protein structures in hundreds of genomes supports the structural origin of the ribosome in RNA decoding and ribosomal mechanics. Predictions from ext...
Article
Full-text available
Certain complex phenotypes appear repeatedly across diverse species due to processes of evolutionary conservation and convergence. In some contexts like developmental body patterning, there is increased appreciation that common molecular mechanisms underlie common phenotypes; these molecular mechanisms include highly conserved genes and networks th...
Article
Full-text available
Time-calibrated phylogenomic trees of protein domain structure produce powerful chronologies describing the evolution of biochemistry and life. These timetrees are built from a genomic census of millions of encoded proteins using models of nested accumulation of molecules in evolving proteomes. Here we show that a primordial stem line of descent, a...
Article
Full-text available
The study of the origin of diversified life has been plagued by technical and conceptual difficulties, controversy, and apriorism. It is now popularly accepted that the universal tree of life is rooted in the akaryotes and that Archaea and Eukarya are sister groups to each other. However, evolutionary studies have overwhelmingly focused on nucleic...
Chapter
The hypothesis of the molecular clock posits that the rate of evolution of a macromolecule is approximately constant over time and among different evolutionary lineages. Despite provoking many controversies since it was put forth in the early 1960s, the molecular clock has had remarkable influence on today’s theories of molecular evolution and has...
Article
Full-text available
The genetic code shapes the genetic repository. Its origin has puzzled molecular scientists for over half a century and remains a long-standing mystery. Here we show that the origin of the genetic code is tightly coupled to the history of aminoacyl-tRNA synthetase enzymes and their interactions with tRNA. A timeline of evolutionary appearance of pr...
Data
Distribution of age groups of domains with editing (1, 2 and 3) and anticodon-binding (A, B and C) functions, groups in exchange graphs, and active site participation in Venn diagrams of amino acids describing their physicochemical properties. Venn diagrams show that the origin of amino acid charging in Group 1 specificities was associated with a p...
Data
Dipeptide sequences enriched in ancient domains. Dipeptides are identified by participating amino acids using one-letter codes and listed together with statistical significance values of enrichment (p) and subsets specified by intervening amino acids corresponding to Group 1, 2 and 3 domain structures (one of 9 subsets possible). (PDF)
Data
Full-text available
Risks of confusion of complementary anticodons (see Table S1) under four scenarios of aaRS-tRNA recognition of Rodin and Rodin [37] and their relative ages (Fig. 3). (PDF)
Data
Analysis of the number of identity elements in cognate tRNA interacting with Groups 1, 2 and 3 aaRS domains and their aminoacylation role estimated by loss of aminoacylation efficiency upon mutation. Note that identity elements associated with Group 1 domains retain ancestral features of poor specificity. These elements include N73 discriminator ba...
Data
Evolutionary pathways of code expansion and their possible impact on protein structure. The standard genetic code maps a set of 64 base triplets (codons) to 20 standard amino acids (plus Sec and Pyl for subsets of organisms), and 3 translation stop signals. Zull and Smith [41] showed that the genetic code could be uniquely dissected into three poss...
Data
Structural alignments of amino acid-[acyl-carrier-protein]-ligases (aaACPLs) and cyclodipeptide synthases (CDPSs) to homologous aaRSs using DALI conservation mapping [97] and structural entries of the Astral compendium. A. RMSD-Z score plots of 601 structural neighbors of aaACPLs (relative to B110957; 3PZC) with Z scores above 2. The closest struct...
Data
Models of origin of mirror modes of tRNA acceptor stem recognition by aaRSs. The class I FF domain appears in the timeline concurrently with the GP-binding domain of elongation and initiation factors, the G protein domain (c.37.1.8), at ndFF = 0.020. The class II FF domain appears immediately after at ndFF = 0.024. The almost concurrent emergence o...
Chapter
Annealing is the formation of a double-stranded DNA or RNA molecule from the pairing of complementary single-stranded molecules. This occurs through complementary hydrogen bonding, with adenine pairing with thymine (or uracil) and guanine pairing with cytosine.
Article
Full-text available
The intricate molecular and cellular structure of organisms converts energy to work, which builds and maintains structure. Evolving structure implements modules, in which parts are tightly linked. Each module performs characteristic functions. In this work we propose that a module can emerge through two phases of diversification of parts. Early in...
Article
The complexity of modern biochemistry developed gradually on early Earth as new molecules and structures populated the emerging cellular systems. Here, we generate a historical account of the gradual discovery of primordial proteins, cofactors, and molecular functions using phylogenomic information in the sequence of 420 genomes. We focus on struct...
Article
The origin and evolution of modern biochemistry remain a mystery despite advances in evolutionary bioinformatics. Here, we use a structural census in nearly 1,000 genomes and a molecular clock of folds to define a timeline of appearance of protein families linked to single-domain enzymes. The timeline sorts out enzymatic recruitment, validates patt...
Data
Pairwise numbers of high-confidence and all-inclusive orthologous loci, and numbers of orthologous loci in hominids and in all four primates. (XLSX)
Data
ZNF domains with human or chimpanzee specific DNA-contacting amino acids. (XLSX)
Data
For each type of functional domain in KZNFs, the number of loci that gained or lost that domain in a lineage-specific way is given. (XLSX)
Data
Gene and Pseudogene Nomenclature (DOCX)
Data
Expression of KZNF genes with human- or chimpanzee-specific ZNF domain changes and Ka/Ks > 1. Of the eleven such genes, expression information across five human and chimpanzee tissues could be obtained for five. The darker the field, the higher the percentage of individuals (ranging from 0 to 100%) expressing the gene in a given tissue. (TIF)
Data
Ten genes that were not assigned official gene symbols by the Human Genome Organization (HUGO) before publication of this manuscript. (XLSX)
Data
Number of KZNF loci in orthologous clusters of human, chimpanzee, orangutan, and rhesus macaque. (XLSX)
Data
A snapshot of human chromosome 8:11870862-12350861 compared to the related chimpanzee region in the synteny browser. This example highlights a region with whole gene duplication in addition to smaller duplications affecting domain architecture. Human was chosen as the reference for this display. Loci are depicted by rectangles. Orthologous loci ar...
Data
Interaction network of genes with ZNF780B binding motif (A: human, B: chimpanzee). Blue links represent known protein-protein interactions; red lines represent known gene regulatory interactions. (TIF)
Data
A list of loci with lineage-specific ZNF domain composition. (XLSX)
Data
Number of orthologs based on best reciprocal BLAST hit (RBH), synteny, and/or OrthoMCL. (XLSX)
Article
Full-text available
The molecular changes underlying major phenotypic differences between humans and other primates are not well understood, but alterations in gene regulation are likely to play a major role. Here we performed a thorough evolutionary analysis of the largest family of primate transcription factors, the Krüppel-type zinc finger (KZNF) gene family. We id...
Article
Full-text available
Krüppel-type or C2H2 zinc fingers represent a dominant DNA-binding motif in eukaryotic transcription factor (TF) proteins. In Krüppel-type (KZNF) TFs, KZNF motifs are arranged in arrays of three to as many as 40 tandem units, which cooperate to define the unique DNA recognition properties of the protein. Each finger contains four amino acids locate...
Article
Full-text available
The origin of life has puzzled molecular scientists for over half a century. Yet fundamental questions remain unanswered, including which came first, the metabolic machinery or the encoding nucleic acids. In this study we take a protein-centric view and explore the ancestral origins of proteins. Protein domain structures in proteomes are highly con...
Article
Full-text available
Contemporary protein architectures can be regarded as molecular fossils, historical imprints that mark important milestones in the history of life. Whereas sequences change at a considerable pace, higher-order structures are constrained by the energetic landscape of protein folding, the exploration of sequence and structure space, and complex inter...
Article
One fundamental goal of current research is to understand how complex biomolecular networks took the form that we observe today. Cellular metabolism is probably one of the most ancient biological networks and constitutes a good model system for the study of network evolution. While many evolutionary models have been proposed, a substantial body of...
Article
Contemporary protein architectures can be regarded as molecular fossils, historical imprints that mark important milestones in the history of life. A census of protein structure in proteomes and novel bioinformatics methods uncovered patterns and processes linked to the evolution of both proteins and proteomes that are described here. Timelines of...
Article
Full-text available
The survey of components in living systems at different levels of organization enables an evolutionary exploration of patterns and processes in macromolecules, networks, and genomic repertoires. Here we discuss how phylogenetic strategies that generate intrinsically rooted phylogenies impact the evolutionary study of RNA and protein components of t...
Article
Full-text available
The repertoire of protein architectures in proteomes is evolutionarily conserved and capable of preserving an accurate record of genomic history. Here we use a census of protein architecture in 185 genomes that have been fully sequenced to generate genome-based phylogenies that describe the evolution of the protein world at fold (F) and fold superf...
Article
Full-text available
Protein evolution is imprinted in both the sequence and the structure of evolutionary building blocks known as protein domains. These domains share a common ancestry and can be unified into a comparatively small set of folding architectures, the protein folds. We have traced the distribution of protein folds between and within proteomes belonging t...
Article
Protein structural diversity encompasses a finite set of architectural designs. Embedded in these topologies are evolutionary histories that we here uncover using cladistic principles and measurements of protein-fold usage and sharing. The reconstructed phylogenies are inherently rooted and depict histories of protein and proteome diversification....
Article
Full-text available
Polydimethylsiloxane (PDMS) is a silicone elastomer used in many biological applications, including microfluidics and cell culture. A PDMS biochip has been designed for cell culture testing, but due to PDMS's natural hydrophobicity, water pressures in the chip's microchannels make it difficult to accurately distribute fluids to the cell...