Diego Darriba
Universidade da Coruña | UDC · Department of Computer Engineering

PhD


36 Publications
17,226 Reads
22,758 Citations
Additional affiliations
  • October 2017 - present: Universidade da Coruña, Professor (Assistant)
  • January 2014 - October 2017: Heidelberger Institut für Theoretische Studien, PostDoc Position
  • September 2009 - April 2016: University of Vigo, PhD Student

Publications (36)
Article
Full-text available
We have implemented a high-performance computing (HPC) version of ProtTest that can be executed in parallel in multicore desktops and clusters. This version, called ProtTest 3, includes new features and extended capabilities. Availability: ProtTest 3 source code and binaries are freely available under GNU license for download from http://darwin.uv...
Conference Paper
We present a substantially improved and parallelized version of DPPDiv, a software tool for estimating species divergence times and lineage-specific substitution rates on a fixed tree topology. The improvement is achieved by integrating the DPPDiv code with the Phylogenetic Likelihood Library (PLL), a fast, optimized, and parallelized collection of...
Preprint
This is a preprint; the paper is currently under review. It can be downloaded freely from: 10.2139/ssrn.4578618
Article
Wall lizards of the genus Podarcis (Sauria, Lacertidae) are the predominant reptile group in southern Europe, including 24 recognized species. Mitochondrial DNA data have shown that, with the exception of P. muralis, the Podarcis species distributed in the Balkan peninsula form a species group that is further sub-divided into two subgroups: the one...
Article
Full-text available
ModelTest-NG is a re-implementation from scratch of jModelTest and ProtTest, two popular tools for selecting the best-fit nucleotide and amino acid substitution models, respectively. ModelTest-NG is one to two orders of magnitude faster than jModelTest and ProtTest but equally accurate, and introduces several new features, such as ascertainment bia...
Article
Full-text available
Motivation: Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture, and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical...
Preprint
Full-text available
ModelTest-NG is a re-implementation from scratch of jModelTest and ProtTest, two popular tools for selecting the best-fit nucleotide and amino acid substitution models, respectively. ModelTest-NG is one to two orders of magnitude faster than jModelTest and ProtTest but equally accurate, and introduces several new features, such as ascertainment bia...
Preprint
Full-text available
Motivation: Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture, and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical...
Article
Full-text available
Next Generation Sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. Phylogenetic placement methods determine how these sequences fit into an evolutionary c...
Article
The Balkan Peninsula constitutes a biodiversity hotspot with high levels of species richness and endemism. The complex geological history of the Balkans in conjunction with the climate evolution are hypothesized as the main drivers generating this biodiversity. We investigated the phylogeography, historical demography, and population structure of c...
Preprint
Full-text available
Next Generation Sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. To achieve this, phylogenetic placement methods determine how these sequences fit into...
Article
Full-text available
With Next Generation Sequencing data being routinely used, evolutionary biology is transforming into a computational science. Thus, researchers have to rely on a growing number of increasingly complex software. All widely used core tools in the field have grown considerably, in terms of the number of features as well as lines of code and consequent...
Article
Full-text available
Background: In the context of a master level programming practical at the computer science department of the Karlsruhe Institute of Technology, we developed and make available an open-source code for testing all 203 possible nucleotide substitution models in the Maximum Likelihood (ML) setting under the common Akaike, corrected Akaike, and Bayesian...
Preprint
Full-text available
Background: In the context of a master level programming practical at the computer science department of the Karlsruhe Institute of Technology, we developed and make available an open-source code for testing all 203 possible nucleotide substitution models in the Maximum Likelihood (ML) setting under the common Akaike, corrected Akaike, and Bayesian...
Article
Full-text available
Motivation: The presence of missing data in large-scale phylogenomic datasets has negative effects on the phylogenetic inference process. One effect that is caused by alignments with missing per-gene or per-partition sequences is that the inferred phylogenies may exhibit extremely long branch lengths. We investigate if statistically predicting mis...
Article
Nowadays, computing calculations are becoming more and more demanding due to the huge pool of resources available. This demand must be satisfied in terms of computational efficiency and resilience, which is compromised in distributed and heterogeneous platforms. Not only this, data obtained are often either reused by other researchers or recalculat...
Preprint
Full-text available
With Next Generation Sequencing (NGS) data coming of age and being routinely used, evolutionary biology is transforming into a data-driven science. As a consequence, researchers have to rely on a growing number of increasingly complex software. All widely used tools in our field have grown considerably, in terms of the number of features as well a...
Preprint
Full-text available
Several strategies have been proposed to assign substitution models in phylogenomic datasets, or partitioning. The accuracy of these methods, and most importantly, their impact on phylogenetic estimation has not been thoroughly assessed using computer simulations. We simulated multiple partitioning scenarios to benchmark two a priori partitioning s...
Article
In this work, the authors present a set of tools to overcome the problem of creating and executing distributed applications on dynamic environments in a resilient way, also ensuring the reproducibility of the performed experiments. The objective is to provide a portable, unattended and fault-tolerant set of tools, encapsulating the infrastructure-d...
Article
Full-text available
We introduce the Phylogenetic Likelihood Library (PLL), a highly optimized application programming interface for developing likelihood-based phylogenetic inference and postanalysis software. The PLL implements appropriate data structures and functions that allow users to quickly implement common, error-prone, and labor-intensive tasks, such as like...
Article
This paper presents the high-performance computing (HPC) support of jModelTest2, the most popular bioinformatic tool for the statistical selection of models of DNA substitution. As this can demand vast computational resources, especially in terms of processing power, jModelTest2 implements three parallel algorithms for model selection: (1) a multit...
Article
Full-text available
The selection of models of nucleotide substitution is one of the major steps of modern phylogenetic analysis. Different tools exist to accomplish this task, among which jModelTest 2 (jMT2) is one of the most popular. Still, in order to deal with large DNA alignments with hundreds or thousands of loci, users of jMT2 need to have access to High Perfo...
Conference Paper
Statistical model selection has become an essential step for the estimation of phylogenies from DNA sequence alignments. The program jModelTest offers different strategies to identify best-fit models for the data at hand, but for large DNA alignments, this task can demand vast computational resources. This paper presents a High Performance Computin...
Conference Paper
The use of probabilistic models of amino acid replacement is essential for the study of protein evolution, and programs like ProtTest implement different strategies to identify the best-fit model for the data at hand. For large protein alignments, this task can demand vast computational resources, preventing the justification of the model used in t...
