Erin K Molloy

University of Maryland, College Park · Department of Computer Science

Doctor of Philosophy

About

44 Publications
4,214 Reads
1,711 Citations
Education
August 2013 - August 2020
University of Illinois, Urbana-Champaign
Field of study
  • Computer Science
September 2007 - June 2011
University of Chicago
Field of study
  • Physics

Publications (44)
Article
Full-text available
Inferring the structure of human populations from genetic variation data is a key task in population and medical genomic studies. Although a number of methods for population structure inference have been proposed, current methods are impractical to run on biobank-scale genomic datasets containing millions of individuals and genetic variants. We int...
Article
A potential shortcoming of concatenation methods for species tree estimation is their failure to account for incomplete lineage sorting. Coalescent methods address this problem but make various assumptions that, if violated, can result in worse performance than concatenation. Given the challenges of analyzing DNA sequences with both concatenation a...
Article
Full-text available
Motivation: Admixture, the interbreeding between previously distinct populations, is a pervasive force in evolution. The evolutionary history of populations in the presence of admixture can be modeled by augmenting phylogenetic trees with additional nodes that represent admixture events. While enabling a more faithful representation of evolutionary...
Article
Full-text available
One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization proble...
Preprint
Full-text available
Inferring the structure of human populations from genetic variation data is a key task in population and medical genomic studies. While a number of methods for population structure inference have been proposed, current methods are impractical to run on biobank-scale genomic datasets containing millions of individuals and genetic variants. We introd...
Preprint
Full-text available
Motivation: Admixture, the interbreeding between previously distinct populations, is a pervasive force in evolution. The evolutionary history of populations in the presence of admixture can be modeled by augmenting phylogenetic trees with additional nodes that represent admixture events. While enabling a more faithful representation of evolutionary...
Preprint
Full-text available
One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization proble...
Article
Full-text available
Motivation: Metagenomics has revolutionized microbiome research by enabling researchers to characterize the composition of complex microbial communities. Taxonomic profiling is one of the critical steps in metagenomic analyses. Marker genes, which are single-copy and universally found across Bacteria and Archaea, can provide accurate estimates of ta...
Article
Phylogenomics, the estimation of species trees from multilocus data sets, is a common step in many biological studies. However, this estimation is challenged by the fact that genes can evolve under processes, including incomplete lineage sorting (ILS) and gene duplication and loss (GDL), that make their trees different from the species tree. In this...
Preprint
A major shortcoming of concatenation methods for species tree estimation is their failure to account for incomplete lineage sorting (ILS). Coalescence methods explicitly address this problem, but make various assumptions that, if violated, can result in worse performance than concatenation. Given the challenges of analyzing DNA sequences with both...
Article
Full-text available
Phylogenetic inference from genome-wide data (phylogenomics) has revolutionized the study of evolution because it enables accounting for discordance among evolutionary histories across the genome. To this end, summary methods have been developed to allow accurate and scalable inference of species trees from gene trees. However, most of these method...
Article
Full-text available
Motivation: Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not...
Preprint
Full-text available
One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems...
Article
Full-text available
Incremental tree building (INC) is a new phylogeny estimation method that has been proven to be absolute fast converging under standard sequence evolution models. A variant of INC, called Constrained-INC, is designed for use in divide-and-conquer pipelines for phylogeny estimation where a set of species is divided into disjoint subsets, trees are c...
Chapter
Phylogenomics—the estimation of species trees from multi-locus datasets—is a common step in many biological studies. However, this estimation is challenged by the fact that genes can evolve under processes, including incomplete lineage sorting (ILS) and gene duplication and loss (GDL), that make their trees different from the species tree. In this...
Article
Full-text available
After publication of [1], the authors were informed by John A. Rhodes of a counterexample to Theorem 11 of [1].
Article
Full-text available
Motivation: Estimated gene trees are often inaccurate, due to insufficient phylogenetic signal in the single gene alignment, among other causes. Gene tree correction aims to improve the accuracy of an estimated gene tree by using computational techniques along with auxiliary information, such as a reference species tree or sequencing data. However...
Preprint
Full-text available
Species tree inference via summary methods that combine gene trees has become an increasingly common analysis in recent phylogenomic studies. This broad adoption has been partly due to the greater availability of genome-wide data and ample recognition that gene trees and species trees can differ due to biological processes such as gene duplication...
Article
DNA sequence alignments have provided the majority of data for inferring phylogenetic relationships with both concatenation and coalescent methods. However, DNA sequences are susceptible to extensive homoplasy, especially for deep divergences in the Tree of Life. Retroelement insertions have emerged as a powerful alternative to sequences for deciph...
Preprint
Full-text available
Motivation: Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not di...
Preprint
Full-text available
Phylogenomics, the estimation of species trees from multi-locus datasets, is a common step in many biological studies. However, this estimation is challenged by the fact that genes can evolve under processes, including incomplete lineage sorting (ILS) and gene duplication and loss (GDL), that make their trees different from the species tree. In thi...
Article
Full-text available
Background: Divide-and-conquer methods, which divide the species set into overlapping subsets, construct a tree on each subset, and then combine the subset trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typic...
Article
Full-text available
Motivation: At RECOMB-CG 2018, we presented NJMerge and showed that it could be used within a divide-and-conquer framework to scale computationally intensive methods for species tree estimation to larger datasets. However, NJMerge has two significant limitations: it can fail to return a tree and, when used within the proposed divide-and-conquer fr...
Chapter
In a recent paper (Zhang, Rao, and Warnow, Algorithms for Molecular Biology 2019), the INC (incremental tree building) algorithm was presented and proven to be absolute fast converging under standard sequence evolution models. A variant of INC which allows a set of disjoint constraint trees to be provided and then uses INC to merge the constraint t...
Preprint
Full-text available
Species tree estimation is a complex problem, due to the fact that different parts of the genome can have different evolutionary histories than the genome itself. One of the causes for this discord is incomplete lineage sorting (also called deep coalescence), which is a population-level process that produces gene trees that differ from the species...
Preprint
Background: Divide-and-conquer methods, which divide the species set into overlapping subsets, construct a tree on each subset, and then combine the subset trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typical...
Article
Full-text available
Background: Estimation of species trees from multiple genes is complicated by processes such as incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer, that result in gene trees that differ from each other and from the species phylogeny. Methods to estimate species trees in the presence of gene tree discord due to inco...
Article
Full-text available
Background: For a combination of reasons (including data generation protocols, approaches to taxon and gene sampling, and gene birth and loss), estimated gene trees are often incomplete, meaning that they do not contain all of the species of interest. As incomplete gene trees can impact downstream analyses, accurate completion of gene trees is desir...
Conference Paper
Full-text available
Here we introduce the Optimal Tree Completion Problem, a general optimization problem that involves completing an unrooted binary tree (i.e., adding missing leaves) so as to minimize its distance from a reference tree on a superset of the leaves. More formally, given a pair of unrooted binary trees (T,t) where T has leaf set S and t has leaf set R,...
Article
Full-text available
Humans make moral judgments every day, and research demonstrates that these evaluations are based on a host of related event features (e.g., harm, legality). In order to acquire systematic data on how moral judgments are made, our assessments need to be expanded to include real-life, ecologically valid stimuli that take into account the numerous ev...
Preprint
Full-text available
Species tree estimation from loci sampled from multiple genomes is now common, but is challenged by the heterogeneity across the genome due to multiple processes, such as gene duplication and loss, horizontal gene transfer, and incomplete lineage sorting. Although methods for estimating species trees have been developed that address gene tree heter...
Article
With the increasing availability of whole genome data, many species trees are being constructed from hundreds to thousands of loci. Although concatenation analysis using maximum likelihood is a standard approach for estimating species trees, it does not account for gene tree heterogeneity, which can occur due to many biological processes, such as i...
Article
Full-text available
Recent fMRI studies have outlined the critical impact of in-scanner head motion, particularly on estimates of functional connectivity. Common strategies to reduce the influence of motion include realignment, as well as the inclusion of nuisance regressors, such as the 6 realignment parameters, their first derivatives, time-shifted versions of the r...
Article
Full-text available
The utility and success of resting-state functional connectivity MRI (rs-fcMRI) depend critically on the reliability of this technique and the extent to which it accurately reflects neuronal function. One challenge is that rs-fcMRI is influenced by various sources of noise, particularly cardiac and respiratory related signal variations. The goal of...
Article
Functional MRI blood oxygen level-dependent (BOLD) signal changes can be subtle, motivating the use of imaging parameters and processing strategies that maximize the temporal signal-to-noise ratio (tSNR) and thus the detection power of neuronal activity-induced fluctuations. Previous studies have shown that acquiring data at higher spatial resoluti...
Article
Resting-state fMRI (rs-fMRI) has been demonstrated to have moderate to high reliability and produce consistent patterns of connectivity across a wide variety of subjects, sites, and scanners. However, there is no one agreed upon method to acquire rs-fMRI data. Some sites instruct their subjects, or patients, to lie still with their eyes closed, whi...
Article
Full-text available
Early life stress (ELS) and function of the hypothalamic-pituitary-adrenal axis predict later psychopathology. Animal studies and cross-sectional human studies suggest that this process might operate through amygdala-ventromedial prefrontal cortex (vmPFC) circuitry implicated in the regulation of emotion. Here we prospectively investigated the role...
