Tiffani L. Williams's research while affiliated with Northeastern University and other places

Publications (29)

Article
Full-text available
Background: The inference of species divergence time is a key step in most phylogenetic studies. Methods have been available for the last ten years to perform the inference, but the performance of the methods does not yet scale well to studies with hundreds of taxa and thousands of DNA base pairs. For example, a study of 349 primate taxa was estima...
Article
In this paper, we discuss the benefits of assignment difficulty on student performance and perceptions in an introductory-level non-major programming course. We assessed both performance (how well the students can program) and perceptions (how the students feel about programming) weekly as the students progressed through the semester, though in mos...
Conference Paper
We present the QuickQuartet algorithm for computing the all-to-all quartet distance for large evolutionary tree collections. By leveraging the relationship between bipartitions and quartets, our approach significantly improves upon the performance of existing quartet distance algorithms. To explore QuickQuartet’s performance, sets of biological dat...
Article
Full-text available
Different phylogenetic methods often yield different inferred trees for the same set of organisms. Moreover, a single phylogenetic approach (such as a Bayesian analysis) can produce many trees. Consensus trees and topological distance matrices are often used to summarize the evolutionary relationships among the trees of interest. These sum...
Article
Full-text available
For centuries, the research paper has been the main vehicle for scientific progress. From the paper, readers in the scientific community are expected to extract all the relevant information necessary to reproduce and validate the results presented by the paper's authors. However, the increased use of computer software in science makes reproducing...
Article
Full-text available
Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend our TreeZip algorithm by handling trees with weighted branches. Furthermore, by using the compressed TreeZip fil...
Article
Full-text available
Phylogenetic analyses are often incorrectly assumed to have stabilized to a single optimum. However, a set of trees from a phylogenetic analysis may contain multiple distinct local optima with each optimum providing different levels of support for each clade. For situations with multiple local optima, we propose p-support which is a clade support me...
Article
Full-text available
Previous analyses of relations, divergence times, and diversification patterns among extant mammalian families have relied on supertree methods and local molecular clocks. We constructed a molecular supermatrix for mammalian families and analyzed these data with likelihood-based methods and relaxed molecular clocks. Phylogenetic analyses resulted i...
Article
Phylogenetics seeks to deduce the pattern of relatedness between organisms by using a phylogeny or evolutionary tree. For a given set of organisms or taxa, there may be many evolutionary trees depicting how these organisms evolved from a common ancestor. As a result, consensus trees are a popular approach for summarizing the shared evolutionary rel...
Technical Report
Full-text available
Phylogeny studies how organisms have evolved over time by combining the organisms into a phylogenetic tree. Branching in a tree represents an evolutionary event leading to two species. Visualization is an important part of phylogeny. Phylogenetic applications output so much data that it is hard for humans to interpret it manually. With visualizatio...
Article
Full-text available
Phylogenetic trees are family trees that represent the relationships between a group of organisms, or taxa. The most popular techniques for reconstructing phylogenetic trees intelligently navigate an exponentially-sized tree space by solving NP-hard optimization problems that best hypothesize the evolutionary history for a given set of taxa. I...
Conference Paper
In this paper, we explore the novel use of decision trees to study the convergence properties of phylogenetic analyses. A decision learning tree is constructed from the evolutionary relationships (or bipartitions) found in the evolutionary trees returned from a phylogenetic analysis. We treat evolutionary trees returned from multiple runs of a phyl...
Conference Paper
Full-text available
Phylogenetic trees are tree structures that depict relationships between organisms. Popular analysis techniques often produce large collections of candidate trees, which are expensive to store. We introduce TreeZip, a novel algorithm to compress phylogenetic trees based on their shared evolutionary relationships. We evaluate TreeZip's performance o...
Article
Full-text available
MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters. In this paper, we evaluate the viability of the MapReduce framework for designing phylogenetic applications. The problem of interest is generating the all-to-all Robinson-Foulds distance matrix, which has many a...
Article
Phylogenetic analysis is used in all branches of biology with applications ranging from studies on the origin of human populations to investigations of the transmission patterns of HIV. Most phylogenetic analyses rely on effective heuristics for obtaining accurate trees. However, relatively little work has been done to analyze quantitatively the be...
Conference Paper
The primary advantage of TreeZip is its use of semantic compression, which allows us to uniquely store tree relationship information. Phylogenetic trees are stored in a format known as a Newick representation, which uses nested parentheses to represent the evolutionary relationships (or subtrees) within a phylogenetic tree. TreeZip uses two univers...
Conference Paper
When large data sets are involved in phylogenies, trivial issues may become central problems. Tree comparison and visualization are two of these issues addressed here. A data set for 18S and 28S D2 through D5 rDNA covering 525 taxa representing all 19 families of Chalcidoidea (Hymenoptera) (65 subfamilies and 267 genera) and five outgroup superfami...
Conference Paper
Full-text available
Consensus trees are a popular approach for summarizing the shared evolutionary relationships in a collection of trees. Many popular techniques such as Bayesian analyses produce results that can contain tens of thousands of trees to summarize. We develop a fast consensus algorithm called HashCS to construct large-scale consensus trees. We perform an...
Article
Full-text available
Evolutionary trees are family trees that represent the relationships between a group of organisms. Phylogenetic heuristics are used to search stochastically for the best-scoring trees in tree space. Given that better tree scores are believed to be better approximations of the true phylogeny, traditional evaluation techniques have used tree scores t...
Conference Paper
Full-text available
Phylogenetics is concerned with inferring the genealogical relationships between a group of organisms (or taxa), and this relationship is usually expressed as an evolutionary tree. However, inferring the phylogenetic tree is not a trivial task since it is impossible to know the true evolutionary history for a set of organisms. As a result, most phy...
Conference Paper
Full-text available
We present new and novel insights into the behavior of two maximum parsimony heuristics for building evolutionary trees of different sizes. First, our results show that the heuristics find different classes of good-scoring trees, where the different classes of trees may have significant evolutionary implications. Secondly, we develop a new entropy-...
Conference Paper
Full-text available
In this paper, we study two fast algorithms—HashRF and PGM-Hashed—for computing the Robinson-Foulds (RF) distance matrix between a collection of evolutionary trees. The RF distance matrix represents a tremendous data-mining opportunity for helping biologists understand the evolutionary relationships depicted among their trees. The novelty of our wo...
Conference Paper
In this paper, we introduce the HashRF(p,q) algorithm for computing RF matrices of large binary, evolutionary tree collections. The novelty of our algorithm is that it can be used to compute arbitrarily-sized (p × q) RF matrices without running into physical memory limitations. In this paper, we explore the performance of our HashRF(p,q) approach on...
Conference Paper
Full-text available
Phylogenetic analyses often produce a large number of candidate evolutionary trees, each a hypothesis of the "true" tree. Post-processing techniques such as strict consensus trees are widely used to summarize the evolutionary relationships into a single tree. However, valuable information is lost during the summarization process. A more elementary...
Conference Paper
Full-text available
The most popular approaches for reconstructing phylogenetic trees attempt to solve NP-hard optimization criteria such as maximum parsimony (MP). Currently, the best-performing heuristic for reconstructing MP trees is Recursive-Iterative DCM3 (Rec-I-DCM3), which uses a single tree (or solution) to guide its way through an exponentially-sized tree sp...
Conference Paper
Full-text available
Phylospaces is a novel framework for reconstructing evolutionary trees in tuple space, a distributed shared memory that permits processes to communicate and coordinate with each other. Our choice of tuple space as a concurrency model is somewhat unusual, given the prominence and success of pure message passing models, such as MPI. We use Phylosp...
Article
Background: Many other majors require or recommend a class in "Computer Programming". Classic programming classes don't really prepare the student for what they really might need programming for.
Article
Computational phylogeny attempts to use descriptions of various taxa to generate an evolutionary tree of life. When generating phylogenic trees using Bayesian methods, it is difficult for the researcher to determine the run time necessary in order to get a good sample from the target distribution. Less than adequate run times can miss important fe...

Citations

... Bayesian methods are computationally demanding because of their need for extensive sampling from the posterior distribution using the Markov chain Monte Carlo (MCMC) approach (Bromham et al. 2018). The computational burden is usually very high for large data sets and grows with the number of sequences (Crosby and Williams 2017; Tamura et al. 2018). In addition, problems in MCMC mixing can increase the computational time further (Bhatnagar et al. 2011). ...
... Like ATE, an AQE of 0 indicates two trees are identical, and an AQE of 100 indicates the two trees share no common quartets. We used TOPD/FMTS (Puigbò et al. 2007) to compute the quartet distance for the 10-taxon trees, and QuickQuartet (Crosby and Williams 2012) for the rest. We also compared the number of gene duplications estimated by Only-dup, and duplications and losses estimated by Dup-loss with the actual number of these events in each gene tree simulation under duplication and loss model. ...
... There are 12,379 unique bipartitions out of 47,341,052 total bipartitions. 4. insects: 150,000 unweighted trees obtained from an analysis of 525 insect taxa [11]. The trees contained in this set are multifurcating. ...
... The potential to leverage programming difficulty to improve understanding and perceptions of non-majors was explored by Crosby et al. in [31]. The CS0 course for non-majors was redesigned by replacing the standard intro to CS course (similar but less deep than that for CS majors) with a course focusing on introducing Python with a variety of lab assignments that included ASCII art and games of chance. ...
... These relationships are commonly represented in the form of trees. These trees can have a variety of applications in drug and vaccine development, conservation efforts and much more [5], [3]. However, finding the correct tree for a group of organisms is an NP-hard optimization problem that requires heuristics on the tree space [3]. ...
... Searching, merging, and sorting problems are used as fundamental subroutines in many research areas such as databases [17], recognition [16], information retrieval [25], and bioinformatics [20]. For example, to use a database efficiently, it is necessary to sort the data file according to keys [17]. ...
... This hash table and the related data are encoded in the output file, which is much smaller than the original input file. The authors reported phylogenetic compression up to 2% [19], [20], [21]. ...
... Currently, all computer algorithms for solving these problems are heuristics without performance guarantees. As a result, drawing the phylogenetic tree is not a trivial task since it is not possible to know the exact evolutionary history for a set of organisms [44]. The biological importance of these problems calls for developing better algorithms with assurance of finding either optimal or approximate solutions [45]. ...
... TreeZip adapts the idea of hashing bipartitions from Amenta et al.'s [26] randomized linear-time algorithm for generating a majority rule tree. We note that the authors of TreeZip have also used this idea in HashCS [27], which generates the majority-rule tree for a collection of trees, and HashRF [28], which constructs the Robinson-Foulds distance matrix [29] for a collection of trees. ...
... It works by finding common subtrees among the set of input trees, replacing all repeated subtrees with a reference to its first occurrence. TreeZip [19] was a breakthrough in the field, greatly enhancing the compression ratio achieved by previous algorithms. It is the current state-of-the-art phylogenetic tree compression method. ...