About
123
Publications
36,365
Reads
18,977
Citations
Publications (123)
Biodiversity data is being digitized and made available online at a rapidly increasing rate but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an e...
Our knowledge of the avian tree of life remains uncertain, particularly at deeper levels, due to the rapid diversification early in the evolutionary history of birds. Birds are the most abundant land vertebrates on the planet and have been of great historical interest to systematists. They are also economically and ecologically important and as a result are...
BioNames is a web database of taxonomic names for animals, linked to the primary literature and, wherever possible, to phylogenetic trees. It aims to provide a taxonomic "dashboard" where at a glance we can see a summary of the taxonomic and phylogenetic information we have for a given taxon and hence provide a quick answer to the basic question "w...
Much progress has been made in the past ten years to fulfil the potential of biodiversity informatics. However, it is dwarfed by the scale of what is still required. The Global Biodiversity Informatics Outlook (GBIO) offers a framework for reaching a much deeper understanding of the world’s biodiversity, and through that understanding the means to...
Biodiversity informatics plays a central enabling role in the research community's efforts to address scientific conservation and sustainability issues. Great strides have been made in the past decade establishing a framework for sharing data, where taxonomy and systematics has been perceived as the most prominent discipline involved. To some exten...
The correct caption of Fig. 3 should read: Figure 3. The growth of unnamed sequences. (a) Growth in the numbers of new species-level taxa added to GenBank each year. Taxa are partitioned into those with scientific names (‘named’; white shading) and those that have informal names (‘unnamed’; gray shading). (b) Percentage of taxa in the named and unna...
There are numerous ways to display a phylogenetic tree, which is reflected in the diversity of software tools available to phylogeneticists. Displaying very large trees continues to be a challenge, made ever harder as increasing computing power enables researchers to construct ever-larger trees. At the same time, computing technology is enabling nove...
The accelerating growth of data and knowledge in evolutionary biology is indisputable. Despite this rapid progress, information remains scattered, poorly documented and in formats that impede discovery and integration. A grand challenge is the creation of a linked system of all evolutionary data, information and knowledge organized around Darwin's...
Scientists are amassing details about the scope and status of life's variation at an accelerating rate. This aids our understanding of species' distributions and their interactions over space and time. If we are to address the consequences of global environmental change for life's future, however, biodiversity data must be aggregated, integrated an...
The NCBI Taxonomy underpins many bioinformatics and phyloinformatics databases, but by itself provides limited information on the taxa it contains. One readily available source of information on many taxa is Wikipedia. This paper describes iPhylo Linkout, a Semantic wiki that maps taxa in NCBI's taxonomy database onto corresponding pages in Wikiped...
The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process basic metadata about the scanned items is recorded, but not article-level metadata. Given that the article is the standard unit of citation...
In his 2003 essay E O Wilson outlined his vision for an “encyclopaedia of life” comprising “an electronic page for each species of organism on Earth”, each page containing “the scientific name of the species, a pictorial or genomic presentation of the primary type specimen on which its name is based, and a summary of its diagnostic traits.” Althoug...
Although the Web has transformed science publishing, scientific papers themselves are still essentially "black boxes", with much of their content intended for human readers only. Typically, computer-readable metadata associated with an article is limited to bibliographic details. By expanding article metadata to include taxonomic names, identifiers...
This talk describes a mapping between the NCBI taxonomy database and Wikipedia. These two databases were chosen because the NCBI taxonomy contains all the taxa for which sequences are publicly available, and for many taxa Wikipedia is the first site returned in a Google search on that taxon's scientific name. The NCBI web pages for nearly 53,000 NC...
Linking together the data of interest to biodiversity researchers (including specimen records, images, taxonomic names, and DNA sequences) requires services that can mint, resolve, and discover globally unique identifiers (including, but not limited to, DOIs, HTTP URIs, and LSIDs). bioGUID implements a range of services, the core ones being an Open...
This paper describes my entry in the Elsevier Grand Challenge "Knowledge Enhancement in the Life Sciences" contest. The entry takes a collection of full-text issues of _Molecular Phylogenetics and Evolution_ as the starting point, then extracts citation links to both papers and data, such as GenBank sequences and specimens, together with geotagged l...
Recent criticisms of component analysis are based on misunderstandings of the relationship between component analysis, parsimony and consensus methods. These criticisms are rebutted, and the appropriateness of applying the Wagner parsimony criterion to the study of biogeography and co-speciation is questioned. An alternative parsimony met...
The fact that all living organisms are related by common descent is one of the central principles of modern biology. Since the early 1990s the amount of data available to evolutionary biologists has exploded, and Elsevier’s journal _Molecular Phylogenetics and Evolution_ has become the largest single publisher of evolutionary trees (phylogenies)....
A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular ta...
Life Science Identifiers (LSIDs) are persistent, globally unique identifiers for biological objects. The decentralised nature of LSIDs makes them attractive for identifying distributed resources. Data of interest to biodiversity researchers (including specimen records, images, taxonomic names, and DNA sequences) are distributed over many different...
We compared patterns of mitochondrial DNA (mtDNA) differentiation in three host-specific lice (Halipeurus abnormis, Austromenopon echinatum and Saemundssonia peusi) and one generalist flea (Xenopsylla gratiosa), parasitizing 22 colonies of Cory's and Cape Verde shearwaters (Calonectris). The shearwater hosts show distinct phylogeographic structure c...
Aligning RNA sequences can be a challenging task. Automatic sequence alignment programs typically align sequences only with respect to primary sequence, and as a result may yield spurious alignments. Incorporating information on RNA secondary structure can improve the alignment, but this must usually be done by hand. One approach to aligning RNA se...
TreeMap is a computer program for analysing host-parasite cospeciation. We respond to Dowling’s (Cladistics, 18: 416-435) recent comparison of TreeMap and Brooks Parsimony Analysis (BPA) by showing that Dowling’s comparison suffers from several mistakes and flaws. We discuss the problems with both BPA and TreeMap, and show that BPA incorrectly coun...
This note outlines some of the key intellectual obstacles that stand in the way of creating a usable phylogenetic database. These challenges include the need to accommodate multiple taxonomic names and classifications, and the need for tools to query trees in biologically meaningful ways. Until these problems are addressed, and a taxonomically inte...
Lice in the genus Pectinopygus parasitize a single order of birds (Pelecaniformes). To examine the degree of congruence between the phylogenies of 17 Pectinopygus species and their pelecaniform hosts, sequences from mitochondrial 12S rRNA, 16S rRNA, COI, and nuclear wingless and EF1-α genes (2290 nucleotides) and from mitochondrial 12S rRNA, COI, a...
The diversity of parasites attacking a host varies substantially among different host species. Understanding the factors that explain these patterns of parasite diversity is critical to identifying the ecological principles underlying biodiversity. Seabirds (Charadriiformes, Pelecaniformes and Procellariiformes) and their ectoparasitic lice (Insect...
TreeBASE is currently the only available large-scale database of published organismal phylogenies. Its utility is hampered by a lack of taxonomic consistency, both within the database, and with names of organisms in external genomic, specimen, and taxonomic databases. The extent to which the phylogenetic knowledge in TreeBASE becomes integrated wit...
We investigated phylogenetic relationships and the biogeographic history of the Calonectris species complex, using both molecular and biometric data from one population of the Cape Verde shearwater Calonectris edwardsii (Cape Verde Islands), one from the streaked shearwater C. leucomelas (western Pacific Ocean) and 26 from Cory's shearwater populat...
Life Science Identifiers (LSIDs) offer an attractive solution to the problem of globally unique identifiers for digital objects in biology. However, I suggest that in the context of taxonomic names, the most compelling benefit of adopting these identifiers comes from the metadata associated with each LSID. By using existing vocabularies wherever po...
The shape of phylogenetic trees has been used to make inferences about the evolutionary process by comparing the shapes of actual phylogenies with those expected under simple models of the speciation process. Previous studies have focused on speciation events, but gene duplication is another lineage splitting event, analogous to speciation, and gen...
The information content of a tree can be judged with respect to the data at hand (how well does the tree describe the data set), or as a topology alone (how many trees does the topology allow). Mickevich and Platnick's (1989) measure of "retrospective" information measures neither component of information, ignores some information altogether and is...
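The idea of judging a topology by "how many trees does the topology allow" can be illustrated with a small sketch. It is not the measure proposed or criticised in the paper above. It assumes rooted trees written as nested Python tuples, counts the fully resolved (binary) trees compatible with a possibly unresolved topology, and reports the result in bits. The function names are hypothetical.

```python
import math

def rooted_binary_trees(n_leaves):
    """Number of rooted binary trees on n labelled leaves: (2n - 3)!!."""
    count = 1
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count

def resolutions(tree):
    """Return (leaf count, number of binary trees the topology allows)."""
    if isinstance(tree, str):
        return 1, 1
    # A node with k children can be resolved in (2k - 3)!! ways.
    leaves, allowed = 0, rooted_binary_trees(len(tree))
    for child in tree:
        n, r = resolutions(child)
        leaves += n
        allowed *= r
    return leaves, allowed

def information_bits(tree):
    """log2(all binary trees / binary trees allowed by this topology)."""
    leaves, allowed = resolutions(tree)
    return math.log2(rooted_binary_trees(leaves) / allowed)

star = ("a", "b", "c", "d")               # allows every binary tree
partial = (("a", "b"), "c", "d")          # allows 3 of the 15 trees
resolved = (("a", "b"), ("c", "d"))       # allows exactly 1 tree
```

Under these assumptions an unresolved star tree carries no information, and each additional resolved clade reduces the number of permitted trees.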
Gene duplication has certainly played a major role in structuring vertebrate genomes but the extent and nature of the duplication events involved remains controversial. A recent study identified two major episodes of gene duplication: one episode of putative genome duplication ca. 500 Myr ago and a more recent gene-family expansion attributed to se...
The NCBI taxonomy provides one of the most powerful ways to navigate sequence databases, but currently users are forced to formulate queries according to a single taxonomic classification. Given that there is not universal agreement on the classification of organisms, providing a single classification places constraints on the questions biologists...
The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same na...
Much of the interest in the “tree of life” is motivated by the notion that we can make much more meaningful use of biological information if we query the information in a phylogenetic framework. Assembling the tree of life raises numerous computational and data management issues. Biologists are generating large numbers of evolutionary trees (phylog...
Supertree methods have often been identified as a possible approach to the reconstruction of the 'Tree of Life'. However, a limitation of such methods is that, typically, they use just leaf-labelled phylogenetic trees to infer the resulting supertree.
In this paper, we describe several new supertree algorithms that extend the allowable information...
Smith, V. S., Page, R. D. M. & Johnson, K. P. (2004). Data incongruence and the problem of avian louse phylogeny. Zoologica Scripta, 33, 239–259.
Recent studies based on different types of data (i.e. morphological and molecular) have supported conflicting phylogenies for the genera of avian feather lice (Ischnocera: Phthiraptera). We analyse new...
The Philoceanus complex is a large assemblage of lice that parasitise procellariiform seabirds (petrels, albatrosses, and their relatives). We obtained mitochondrial 12S rRNA and cytochrome oxidase I DNA sequences from 39 species from diverse hosts and localities. Resolution of deeper relationships between genera was limited; however, there is evide...
An increasing number of plant-insect studies using phylogenetic analysis suggest that cospeciation events are rare in plant-insect systems. Instead, nonrandom patterns of phylogenetic congruence are produced by phylogenetically conserved host switching (to related plants) or tracking of particular resources or traits (e.g., chemical). The dominance...
Supertree methods combine information from multiple phylogenies into a larger, composite phylogeny. When there is no disagreement between the source phylogenies, constructing the supertree is straightforward. But in the (nearly universal) presence of disagreement between source trees, supertree methods seek to either represent or resolve this confl...
In recent years, the use of molecular data to build phylogenetic trees and sophisticated computer-aided techniques to analyze them have led to a revolution in the study of cospeciation. Tangled Trees provides an up-to-date review and synthesis of current knowledge about phylogeny, cospeciation, and coevolution. The opening chapters present various...
Simmons and Freudenstein have suggested that there are important weaknesses of gene tree parsimony in reconstructing phylogeny in the face of gene duplication, weaknesses that are addressed by the method of uninode coding. Here, we discuss Simmons and Freudenstein's criticisms and suggest a number of reasons why gene tree parsimony is preferable to uninode coding. During this...
Cospeciation generally increases the similarity between host and parasite phylogenies. Incongruence between host and parasite phylogenies has previously been explained in terms of host switching, sorting, and duplication events. Here, we describe an additional process, failure of the parasite to speciate in response to host speciation, that may be...
Comparisons of whole genomes can yield important insights into the evolution of genome structure, such as the role of inversions in bacterial evolution and the identification of large-scale duplications in the human genome. This unit briefly compares two tools for aligning whole genome sequences: MUMmer and PipMaker. These tools differ in both the...
This unit provides a general introduction to phylogeny. It defines common terms and discusses the issue of rooting trees, in addition to comparing gene and species trees. Methods for inferring phylogenies, such as distance methods, parsimony methods, and maximum likelihood are also presented. The unit concludes with discussion of how to assess tree...
Few estimates of relative substitution rates, and the underlying mutation rates, exist between mitochondrial and nuclear genes in insects. Previous estimates for insects indicate a 2-9 times faster substitution rate in mitochondrial genes relative to nuclear genes. Here we use novel methods for estimating relative rates of substitution, which incor...
A polynomial time supertree algorithm could play a key role in a divide-and-conquer strategy for assembling the tree of life. To date only a single such method capable of accommodating conflicting input trees has been proposed, the MinCutSupertree algorithm of Semple and Steel. This paper describes this algorithm and its implementation, then illustra...
Gene duplications have been common throughout vertebrate evolution, introducing paralogy and so complicating phylogenetic inference from nuclear genes. Reconciled trees are one method capable of dealing with paralogy, using the relationship between a gene phylogeny and the phylogeny of the organisms containing those genes to identify gene duplicati...
Lice are ectoparasitic insects hosted by birds and mammals. Mitochondrial 12S rRNA sequences obtained from lice show considerable length variation and are very difficult to align. We show that the louse 12S rRNA domain III secondary structure displays considerable variation compared to other insects, in both the shape and number of stems and loops....
TreeView provides a simple way to view the phylogenetic trees produced by a range of programs, such as PAUP*, PHYLIP, TREE-PUZZLE, and ClustalX. While some phylogenetic programs (such as the Macintosh version of PAUP*) have excellent tree printing facilities, many programs do not have the ability to generate publication quality trees. TreeView addr...
The growing use of comparative methods to address evolutionary questions has generated an increased need for robust hypotheses of evolutionary relationships for a wide range of organisms. Where a phylogeny exists for a group, often more than one phylogeny will exist for that group, and it is uncommon that the same taxa are in each of the existing t...
Ancient gene duplication events have left many traces in vertebrate genomes. Reconciled trees represent the differences between gene family trees and the species phylogeny those genes are sampled from, allowing us to both infer gene duplication events and estimate a species phylogeny from a sample of gene families. We show that analysis of 118 gene...
In this paper we explore the use of reconciled trees to address the latter question.
As a first attempt to use molecular data to resolve the relationships between the four suborders of lice and within the suborder Ischnocera, we sequenced a 347-bp fragment of the elongation factor 1α gene of 127 lice (Insecta: Phthiraptera) as well as outgroup taxa from the order Psocoptera. A number of well-supported monophyletic groups were found...
By examining deoxyribonucleic acid (DNA) sequence data from closely related species, we may gain insights into the genetic mechanism and process leading to the formation of new species. Recently, several interesting hypotheses have been proposed independently that maintain a complex mode of speciation of humans and chimpanzees, but at present, ther...
Circles is a program for inferring RNA secondary structure using maximum weight matching. The program can read in an alignment in FASTA, ClustalW, or NEXUS format, compute a maximum weight matching, and export one or more secondary structures in various file formats.
Availability:
The program is available at no cost from http://taxon...
Comparative analysis is the preferred method of inferring RNA secondary structure, but its use requires considerable expertise and manual effort. As the importance of secondary structure for accurate sequence alignment and phylogenetic analysis becomes increasingly realised, the need for secondary structure models for diverse taxonomic groups becom...
In this paper, we use the method of independent contrasts to study body size relationships between pocket gophers and their chewing lice, a host-parasite system in which both host and parasite phylogenies are well studied. The evolution of body size of chewing lice appears to be dependent only on the body size of their hosts, which confirms the 199...
RadCon is a Macintosh program for manipulating and analysing phylogenetic trees. The program can determine the Cladistic Information Content of individual trees, the stability of leaves across a set of bootstrap trees, produce the strict basic Reduced Cladistic Consensus profile of a set of trees and convert a set of trees into its matrix represent...
Paralogy is a pervasive problem in trying to use nuclear gene sequences to infer species phylogenies. One strategy for dealing with this problem is to infer species phylogenies from gene trees using reconciled trees, rather than directly from the sequences themselves. In this approach, the optimal species tree is the tree that requires the fewest g...
Molecular biologists interested in the evolution of gene families and molecular systematists interested in the evolution of whole organisms are both concerned with the relationship between gene phylogenies and organism phylogenies. We present reconciled trees as a tool for exploring this relationship. In discussing recent developments, we focus on...
The results of Allard and Carpenter's (Cladistics 12, 183–198, 1996) paper on weighting and congruence among mammalian mitochondrial genes are an artefact of errors in their data matrix; their “blue whale” ATPASE8 sequence is human, the actual blue whale sequence is assigned to the grey seal, and the “horse” sequence is that of the harbor seal. Whe...
The association between two or more lineages over evolutionary time is a recurrent theme spanning several different fields within biology, from molecular evolution to coevolution and biogeography. In each `historical association', one lineage is associated with another, and can be thought of as tracking the other over evolutionary time with a great...
A phylogeny for the lice (Insecta: Phthiraptera: genus Dennyus) parasitic on swiftlets (Aves: Collocalliinae) was constructed based on mitochondrial cytochrome b DNA sequences. This phylogeny is congruent with previous phenetic analyses of morphometric data for the lice. Comparison with a previously obtained phylogeny for the hosts indicates some degr...
GeneTree is a program for comparing gene and species trees using reconciled trees. The program can compute the cost of embedding a gene tree within a species tree, visually display the location and number of gene duplications and losses, and search for optimal species trees.
Availability:
The program is free and is available at http://taxonomy....
We present a method for visualising and quantifying the relationship between a pair of gene and species trees that constructs a third tree termed the reconciled tree. Given a gene tree and a species tree the reconciled tree represents the history of the gene tree embedded within the species tree. The reconciled tree is constructed from one or more...
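One common way to make the embedding of a gene tree in a species tree concrete is to map each gene-tree node to the smallest species-tree clade containing all species below it, and to call a node a duplication when it maps to the same clade as one of its children. The sketch below illustrates that general idea only and is not taken from the paper or from GeneTree. It assumes binary gene trees written as nested tuples whose leaves are labelled with the species they were sampled from, and all function names are hypothetical.

```python
def clusters(tree):
    """Return the leaf set of every node; the node's own set comes last."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    out, leaves = [], frozenset()
    for child in tree:
        sub = clusters(child)
        out.extend(sub)
        leaves |= sub[-1]          # last entry is the child's own cluster
    out.append(leaves)
    return out

def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same species clade as a child."""
    species_clusters = clusters(species_tree)

    def lca(species):
        # Smallest species-tree clade that contains every listed species.
        return min((c for c in species_clusters if species <= c), key=len)

    duplications = 0

    def visit(node):
        nonlocal duplications
        if isinstance(node, str):
            return frozenset([node])
        left, right = (visit(child) for child in node)
        here = lca(left | right)
        if here == lca(left) or here == lca(right):
            duplications += 1
        return left | right

    visit(gene_tree)
    return duplications

species = (("human", "chimp"), "mouse")
congruent = (("human", "chimp"), "mouse")
duplicated = ((("human", "chimp"), "mouse"), ("human", "mouse"))
```

Here `congruent` needs no duplications to fit the species tree, while `duplicated` needs one at its root because both of its subtrees span the whole species tree.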
The processes of gene duplication, loss, and lineage sorting can result in incongruence between the phylogenies of genes and those of species. This incongruence complicates the task of inferring the latter from the former. We describe the use of reconciled trees to reconstruct the history of a gene tree with respect to a species tree. Reconciled tr...
Swiftlets are small insectivorous birds, many of which nest in caves and are known to echolocate. Due to a lack of distinguishing morphological characters, the taxonomy of swiftlets is primarily based on the presence or absence of echolocating ability, together with nest characters. To test the reliability of these behavioral characters, we constru...
Lice of the subgenus Dennyus (Collodennyus) are host specific, permanent parasites of swiftlets (Aves: Apodidae). As a prelude to a test of the hypothesis that these lice have cospeciated with their hosts, we revise the taxonomy of the subgenus, redescribing the seven previously recognized species, and adding thirteen new species and three new subs...
“The student who intends working on the Mallophaga should take warning that he will be tried almost beyond endurance by the paradoxes and complexities which beset his subject but he will also find, in the dual and inter-related aspect of insect and bird, an infinite fascination.” (Rothschild & Clay, 1952: pp. 156–157). The study of host-louse coevol...
Cladistic tree balance is the extent to which internal nodes on a cladistic tree define clades of equal size. More robust maximum-parsimony trees taken from the literature are more balanced. Simulation studies suggest that a methodological bias is responsible for this correlation because incorrect reconstructions are also likely to be less balanced...
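Tree balance as described above can be quantified in several ways. As a hedged illustration, the sketch below computes Colless' imbalance, which sums, over all internal nodes, the absolute difference in leaf counts between the two daughter clades. This is one widely used index and is assumed here for illustration; the truncated abstract does not say which measure the study used. Trees are nested tuples and the function name is hypothetical.

```python
def colless(tree):
    """Return (leaf count, Colless imbalance) for a nested-tuple binary tree."""
    if isinstance(tree, str):
        return 1, 0
    (n_left, i_left), (n_right, i_right) = colless(tree[0]), colless(tree[1])
    # Each internal node contributes the size difference of its two clades.
    return n_left + n_right, i_left + i_right + abs(n_left - n_right)

balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")

n, imbalance = colless(caterpillar)
# Dividing by the maximum (n - 1)(n - 2) / 2 scales the index to [0, 1].
normalised = imbalance / ((n - 1) * (n - 2) / 2)
```

A perfectly balanced tree scores 0, and a fully pectinate ("caterpillar") tree scores the maximum.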
Recent methodological advances permit a rigorous comparison of phylogenetic trees for hosts and their parasites to determine the extent to which these groups have cospeciated through evolutionary time. In cases where significant levels of cospeciation are indicated, comparison of amounts of evolutionary change that have accumulated along analogous...