
Quinn Snell - Brigham Young University–Hawaii
About
98 Publications · 15,562 Reads
2,599 Citations
Publications (98)
Background
Bullying, encompassing physical, psychological, social, or educational harm, affects approximately 1 in 20 United States teens aged 12-18. The prevalence and impact of bullying, including online bullying, necessitate a deeper understanding of risk and protective factors to enhance prevention efforts. This study investigated the key risk...
Introduction
Addressing the problem of suicidal thoughts and behavior (STB) in adolescents requires understanding the associated risk factors. While previous research has identified individual risk and protective factors associated with many adolescent social morbidities, modern machine learning approaches can help identify risk and protective fact...
We present kleuren, a novel assembly-free method to reconstruct phylogenetic trees using the Colored de Bruijn Graph. kleuren works by constructing the Colored de Bruijn Graph and then traversing it, finding bubble structures in the graph that provide phylogenetic signal. The bubbles are then aligned and concatenated to form a supermatrix, from whi...
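The bubble structures mentioned above can be pictured with a toy sketch: in a Colored de Bruijn Graph built from two sequences ("colors"), a SNP creates two paths that diverge from a shared (k-1)-mer node and reconverge downstream. The sketch below is an illustrative simplification, not the kleuren implementation; the function names and the bounded-walk heuristic are assumptions.

```python
from collections import defaultdict

def walk(succ, start, color, limit=20):
    # follow the unique successor of `color` from `start` for up to `limit` steps
    path, node = [start], start
    for _ in range(limit):
        nxt = [v for v, c in succ[node] if c == color]
        if len(nxt) != 1:
            break
        node = nxt[0]
        path.append(node)
    return path

def find_bubbles(seqs, k, limit=20):
    # colored edge set: (k-1)-mer node -> {(successor node, color), ...}
    succ = defaultdict(set)
    for color, seq in enumerate(seqs):
        for i in range(len(seq) - k + 1):
            succ[seq[i:i + k - 1]].add((seq[i + 1:i + k], color))
    bubbles = []
    for u in list(succ):
        nxt0 = [v for v, c in succ[u] if c == 0]
        nxt1 = [v for v, c in succ[u] if c == 1]
        # a bubble starts where the two colors take different unique successors
        if len(nxt0) == 1 and len(nxt1) == 1 and nxt0 != nxt1:
            p0 = walk(succ, nxt0[0], 0, limit)
            p1 = set(walk(succ, nxt1[0], 1, limit))
            meet = next((n for n in p0 if n in p1), None)  # reconvergence node
            if meet is not None:
                bubbles.append((u, meet))
    return bubbles
```

With k=3, the two sequences "AAATCGGGG" and "AAATTGGGG" differ by one SNP, so the graph contains exactly one bubble, diverging at node "AT" and reconverging at "GG". The divergent interior paths are what carry phylogenetic signal in the approach described above.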
Detecting homologous sequences in organisms is an essential step in protein structure and function prediction, gene annotation and phylogenetic tree construction. Heuristic methods are often employed for quality control of putative homology clusters. These heuristics, however, usually only apply to pairwise sequence comparison and do not examine cl...
Motivation:
The contig orientation problem, which we formally define as the MAX-DIR problem, has at times been addressed cursorily and at times using various heuristics. In setting forth a linear-time reduction from the MAX-CUT problem to the MAX-DIR problem, we prove the latter is NP-complete. We compare the relative performance of a novel greedy...
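The MAX-DIR formulation can be sketched as follows: each contig receives an orientation in {+1, -1}, and scaffold edges carry weights that are satisfied when the endpoints agree (positive weight) or disagree (negative weight). A minimal greedy heuristic, shown only as an illustration (the paper's own greedy algorithm and the MAX-CUT reduction are not reproduced here), assigns each contig the orientation that best agrees with its already-oriented neighbors:

```python
def orient_contigs(n, edges):
    """Greedily orient contigs 0..n-1.

    edges: (u, v, w) triples with w > 0 when u and v should share an
    orientation and w < 0 when they should be flipped relative to each other.
    """
    orient = {0: 1}  # pin the first contig arbitrarily; only relative signs matter
    for v in range(1, n):
        # agreement score against neighbors oriented so far (unseen ones count 0)
        score = sum(w * orient.get(u if x == v else x, 0)
                    for u, x, w in edges if v in (u, x))
        orient[v] = 1 if score >= 0 else -1
    return orient
```

For example, with edges [(0, 1, 5), (1, 2, -3), (0, 2, 2)] the strong negative edge (1, 2, -3) outweighs the weak positive edge (0, 2, 2), so contig 2 is flipped relative to the other two. A greedy pass like this gives no optimality guarantee, which is consistent with the problem being NP-complete.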
Genome assemblers to date have predominantly targeted haploid reference reconstruction from homozygous data. When applied to diploid genome assembly, these assemblers perform poorly, owing to the violation of assumptions during both the contigging and scaffolding phases. Effective tools to overcome these problems are in growing demand. Increasing p...
Next-Generation Sequencing experiments have been used to identify genotypes that are associated with many medical conditions. An important part of Next Generation read processing is the mapping of short reads to a reference genome. Although many algorithms have been created to perform this mapping, there are many reads that cannot be mapped because...
This research employs an exhaustive search of different attribute selection algorithms in order to provide a more structured approach to learning design for prediction of Alzheimer's clinical dementia rating (CDR).
Background
Error correction is an important step in increasing the quality of next-generation sequencing data for downstream analysis and use. Polymorphic datasets are a challenge for many bioinformatic software packages that are designed for or assume homozygosity of an input dataset. This assumption ignores the true genomic composition of many or...
DNA methylation has been linked to many important biological phenomena. Researchers have recently begun to sequence bisulfite treated DNA to determine its pattern of methylation. However, sequencing reads from bisulfite-converted DNA can vary significantly from the reference genome because of incomplete bisulfite conversion, genome variation, seque...
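The mapping difficulty described here comes from bisulfite chemistry: an unmethylated cytosine reads as thymine, so a read T over a reference C is not necessarily an error, while the reverse is. A bisulfite-aware comparison must therefore be asymmetric. A minimal sketch, with hypothetical function names (real mappers also handle the reverse strand's G-to-A case and incomplete conversion):

```python
def bisulfite_match(read_base, ref_base):
    # exact match, or a C->T bisulfite conversion of an unmethylated reference C
    return read_base == ref_base or (ref_base == "C" and read_base == "T")

def score_alignment(read, ref):
    # count positions compatible with the reference under bisulfite chemistry
    return sum(bisulfite_match(r, g) for r, g in zip(read, ref))
```

Note the asymmetry: the read "TTGA" matches the reference "CTGA" at all four positions, but swapping read and reference yields only three matches, since a read C over a reference T still counts as a mismatch.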
Secondary structure prediction is an important step in understanding gene function. Several algorithms have been proposed for applying machine learning techniques to this problem. This research examines these algorithms and constructs a framework that is effective in providing accurate predictions.
Moleculo DNA sequencing technology provides extremely accurate, phased reads having an average length of over 4,000 bp. Very little is yet known about the precise characteristics of these reads. We estimate a lower bound for the single nucleotide substitution error rate of these reads, and provide probabilities for each type of substitution. We al...
Background
Since the advent of microarray technology, numerous methods have been devised to infer gene regulatory relationships from gene expression data. Many approaches infer entire regulatory networks. This produces results that are rich in information and yet so complex that they are often of limited usefulness for researchers. One alterna...
In the context of genome assembly, the contig orientation problem is described as the problem of removing sufficient edges from the scaffold graph so that the remaining subgraph assigns a consistent orientation to all sequence nodes in the graph. This problem can also be phrased as a weighted MAX-CUT problem. The performance of MAX-CUT heuristics i...
Emerging next-generation sequencing technologies have revolutionized the collection of genomic data for applications in bioforensics, biosurveillance, and for use in clinical settings. However, to make the most of these new data, new methodology needs to be developed that can accommodate large volumes of genetic data in a computationally efficient...
Recent advances in sequencing technology have created large data sets upon which phylogenetic inference can be performed. Current research is limited by the prohibitive time necessary to perform tree search on a reasonable number of individuals. This research develops new phylogenetic algorithms that can operate on tens of thousands of species in a...
Motivation: Due to the massive amounts of data generated from each instrument run, next generation sequencing technologies have presented researchers with unique analytical challenges which require innovative, computationally efficient statistical solutions. Here we present a parallel implementation of a probabilistic Pair-Hidden Markov Model fo...
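A Pair-Hidden Markov Model scores a pair of sequences by summing over all alignments with the forward algorithm, using match, insert, and delete states. The sketch below is a textbook serial forward recursion with assumed transition and emission parameters; it illustrates the model class, not this paper's parallel implementation.

```python
def pairhmm_forward(x, y, delta=0.1, eps=0.3, p_eq=0.9):
    """Forward likelihood of sequence pair (x, y) under a 3-state pair-HMM.

    delta: prob of opening a gap state, eps: prob of extending one,
    p_eq: match-state emission prob for identical bases (assumed values).
    """
    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]  # match/mismatch state
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]  # gap in y (emit x only)
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]  # gap in x (emit y only)
    fM[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                e = p_eq if x[i - 1] == y[j - 1] else (1 - p_eq) / 3
                fM[i][j] = e * ((1 - 2 * delta) * fM[i - 1][j - 1]
                                + (1 - eps) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:
                fX[i][j] = 0.25 * (delta * fM[i - 1][j] + eps * fX[i - 1][j])
            if j > 0:
                fY[i][j] = 0.25 * (delta * fM[i][j - 1] + eps * fY[i][j - 1])
    return fM[n][m] + fX[n][m] + fY[n][m]
```

The anti-diagonals of this dynamic program are independent, which is what makes parallel implementations like the one described above possible; an identical pair scores strictly higher than a pair with a substitution.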
Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl's Law. This work is the f...
Mapping short next-generation reads to reference genomes is an important element in SNP calling and expression studies. A major limitation to large-scale whole-genome mapping is the large memory requirements for the algorithm and the long run-time necessary for accurate studies. Several parallel implementations have been performed to distribute mem...
Long branch attraction (LBA) is a problem that afflicts both the parsimony and maximum likelihood phylogenetic analysis techniques. Research has shown that parsimony is particularly vulnerable to inferring the wrong tree in Felsenstein topologies. The long branch extraction method is a procedure to detect a data set suffering from this problem so t...
Modern approaches to treating genetic disorders, cancers and even epidemics rely on a detailed understanding of the underlying gene signaling network. Previous work has used time series microarray data to infer gene signaling networks given a large number of accurate time series samples. Microarray data available for many biological experiments is...
A central issue to systems biology is modeling how genes interact with each other. The non-linear relationships between genes and feedback loops in the network makes modeling gene regulatory networks (GRNs) a difficult problem. In this paper, we examine modeling GRNs using neural networks (NNs) with hidden layers to predict gene expression levels....
Phylogenetic analysis is an integral part of biological research. As the number of sequenced genomes increases, available data sets are growing in number and size. Several algorithms have been proposed to handle these larger data sets. A family of algorithms known as disc covering methods (DCMs) has been selected by the NSF funded CIPRes project...
Fundamental to multiple sequence alignment algorithms is modeling insertions and deletions (gaps). The most prevalent model is to use gap open and gap extension penalties. While gap open and gap extension penalties are well understood conceptually, their effects on multiple sequence alignment, and consequently on phylogeny scores, are not...
Phylogenetic analysis is becoming an increasingly important tool for biological research. Applications include epidemiological studies, drug development, and evolutionary analysis. Phylogenetic search is a known NP-Hard problem. The size of the data sets which can be analyzed is limited by the exponential growth in the number of trees that must be...
Many online sources of gene interaction networks supply rich visual data regarding gene pathways that can aid in the study of biological processes, disease research and drug discovery. PathGen incorporates data from several sources to create transitive connections that span multiple gene interaction databases. Results are displayed in a comprehensi...
We present a new algorithm, ChemAlign, that uses physicochemical properties and secondary structure elements to create biologically relevant multiple sequence alignments (MSAs). Additionally, we introduce the physicochemical property difference (PPD) score for the evaluation of MSAs. This score is the normalized difference of physicochemical proper...
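The PPD score is described as a normalized difference of physicochemical property values. One plausible per-column form, shown purely for illustration (the exact definition, the property table, and the normalization constant here are assumptions, not ChemAlign's), is the mean pairwise difference of a property such as Kyte–Doolittle hydropathy, scaled by that property's range:

```python
# Kyte-Doolittle hydropathy values for a few residues (full scale spans -4.5..4.5)
HYDROPATHY = {"A": 1.8, "L": 3.8, "K": -3.9, "D": -3.5}

def ppd_column(column, props=HYDROPATHY, scale=9.0):
    """Mean pairwise property difference in one alignment column,
    normalized by the property's range (assumed to be `scale`)."""
    vals = [props[a] for a in column if a in props]
    pairs = [(x, y) for i, x in enumerate(vals) for y in vals[i + 1:]]
    if not pairs:
        return 0.0
    return sum(abs(x - y) for x, y in pairs) / (len(pairs) * scale)
```

A column of identical residues scores 0, and a column mixing hydrophobic and charged residues scores closer to 1, so a lower column average indicates a more physicochemically consistent alignment.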
A packet-switched network architecture named Qnet and programming interface is presented that simplifies the integration of reconfigurable computing modules within a Field-Programmable Gate Array (FPGA). Qnet provides an abstraction layer to the designer of FPGA accelerator modules that hides the complexities of the system, while supporting a high...
The advent of next-generation sequencing technologies has increased the accuracy and quantity of sequence data, opening the door to greater opportunities in genomic research.
In this article, we present GNUMAP (Genomic Next-generation Universal MAPper), a program capable of overcoming two major obstacles in the mapping of reads from next-generation...
Phylogenetic analysis is becoming an increasingly important tool for biological research. Applications include epidemiological studies, drug development, and evolutionary analysis. Phylogenetic search is a known NP-Hard problem. The size of the data sets which can be analyzed is limited by the exponential growth in the number of trees that must be...
PSODA is a comprehensive phylogenetics package, including alignment, phylogenetic search under both parsimony and maximum likelihood, and visualisation and analysis tools. PSODA offers performance comparable to PAUP* in an open source package that aims to provide a foundation for researchers examining new phylogenetic algorithms. A key new feature...
Long branch attraction is a problem that afflicts phylogenetic methods and a procedure to detect a data set suffering from this problem is the long branch extraction method [1]. This method has been well cited and used by many authors for their analysis but no strong validation has been performed as to its accuracy. We performed such an analysis by...
Numerous online sources of gene interaction networks supply rich visual data regarding gene pathways. These pathways are useful in understanding inter-gene regulation and interaction and are integral to the study of biological processes, disease research and drug discovery. Most annotated pathways involve a network of relations between genes, manua...
Biological sequence alignment is an essential tool used in molecular biology and biomedical applications. The growing volume of genetic data and the complexity of sequence alignment present a challenge in obtaining alignment results in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison...
Biological sequence alignment is an essential tool used in molecular biology and biomedical applications. The growing volume of genetic data and the complexity of sequence alignment present a challenge in obtaining alignment results in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison...
Gene regulatory networks lie at the heart of life's biological machinery. Our understanding of these mechanisms is directly related to our understanding of many diseases. Current methods for determining the structure of gene regulatory networks are limited in scale due to their computational complexity. By assuming linearity, genomescale analysis b...
A wide variety of phylogenetic search programs are available to the modern bioinformatics researcher. All of these programs claim to be different implementations of the same basic search strategies and scoring methods. However, differences among the implementations lead to differences in the scores between methods. Great care must be taken when co...
Phylogenetic search is a key tool used in a variety of biological research endeavours. However, this search problem is known to be computationally difficult, due to the astronomically large search space, making the use of heuristic methods necessary. The performance of heuristic methods for finding Maximum Likelihood (ML) trees can be improved by u...
The relationship between changes in gene expression and physical characteristics associated with Down syndrome is not well understood. Chromosome 21 genes interact with nonchromosome 21 genes to produce Down syndrome characteristics. This indirect influence, however, is difficult to empirically define due to the number, size, and complexity of the...
Motivation:
Multiple sequence alignments (MSAs) are at the heart of bioinformatics analysis. Recently, a number of multiple protein sequence alignment benchmarks (i.e. BAliBASE, OXBench, PREFAB and SMART) have been released to evaluate new and existing MSA applications. These databases have been well received by researchers and help to quantitativ...
The performance of maximum likelihood searches can be boosted by using the most parsimonious tree as a starting point for the search. The time spent in performing the parsimony search to find this starting tree is insignificant compared to the time spent in the maximum likelihood search, leading to an overall gain in search time. These parsimony bo...
Fundamental to Multiple Sequence Alignment (MSA) algorithms is modelling insertions and deletions (gaps). The most prevalent model is to use Gap Open Penalties (GOP) and Gap Extension Penalties (GEP). While GOP and GEP are well understood conceptually, their effects on MSA and consequently on phylogeny scores are not as well understood. We use exha...
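The GOP/GEP model charges the open penalty once per gap and the extension penalty for each additional gapped position, which Gotoh's three-matrix dynamic program scores exactly for a pair of sequences. A minimal sketch with assumed scoring parameters (the convention here is that a gap of length L costs GOP + (L-1)·GEP):

```python
def gotoh(a, b, match=2, mismatch=-1, gop=-5, gep=-1):
    """Global pairwise alignment score with affine gap penalties (Gotoh)."""
    NEG = float("-inf")
    n, m = len(a), len(b)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends in (mis)match
    X = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends in gap in b
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends in gap in a
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gop + (i - 1) * gep            # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = gop + (j - 1) * gep            # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gop, X[i - 1][j] + gep)  # open vs extend
            Y[i][j] = max(M[i][j - 1] + gop, Y[i][j - 1] + gep)
    return max(M[n][m], X[n][m], Y[n][m])
```

Because opening is penalized more heavily than extending, one length-2 gap (here -6) beats two separate length-1 gaps (here -10), which is exactly the behavior the GOP/GEP ratio controls and whose downstream effect on phylogeny scores the work above examines.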
PSODA is an open-source phylogenetic search application that implements traditional parsimony and likelihood search techniques as well as advanced search algorithms. PSODA is compatible with PAUP and the search algorithms are competitive with those in PAUP. PSODA also adds a basic scripting language to the PAUP block, making it possible to...
Phylogenetic analysis is a central tool in studies of comparative genomics. When a new region of DNA is isolated and sequenced, researchers are often forced to throw away months of computation on an existing phylogeny of homologous sequences in order to incorporate this new sequence. The previously constructed trees are often discarded, and the res...
DNA sequence alignment is a critical step in identifying homology between organisms. The most widely used alignment program, ClustalW, is known to suffer from the local minima problem, where suboptimal guide trees produce incorrect gap insertions. The optimization alignment approach, has been shown to be effective in combining alignment and phyloge...
We describe a new algorithm designed to quickly and robustly solve general linear problems of the form Ax = b. We describe both serial and parallel versions of the algorithm, which can be considered a prioritized version of an Alternating Multiplicative Schwarz procedure. We also adopt a general view of alternating Multiplicative Schwarz procedures...
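A "prioritized" alternating scheme can be pictured with the smallest possible subdomains: Southwell-style relaxation, which at each step updates the single unknown whose residual is currently largest. This is only a stand-in to convey the prioritization idea under a diagonally dominant A, not the algorithm of the paper:

```python
def prioritized_solve(A, b, tol=1e-10, max_steps=10000):
    """Solve Ax = b by always relaxing the equation with the largest residual.

    Illustrative 1x1-subdomain analogue of a prioritized multiplicative
    Schwarz sweep; converges for diagonally dominant A.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_steps):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        i = max(range(n), key=lambda k: abs(r[k]))  # priority: worst equation
        if abs(r[i]) < tol:                         # all residuals below tol
            break
        x[i] += r[i] / A[i][i]                      # relax that one unknown
    return x
```

For A = [[4, 1], [1, 3]] and b = [1, 2] this converges to (1/11, 7/11). In a true Schwarz procedure the "unknowns" would be overlapping subdomain solves rather than single entries, with the same priority rule choosing which subdomain to relax next.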
The Internet is comprised of vast networks of wires and fiber. A common misconception is that there is an unlimited amount of bandwidth; in reality there exists only a finite amount. Each length of wire and fiber is owned by a company, and every company wants to maximize its profit. One means of improving profit is to overbook existing transmission...
Overbooking is frequently used to increase the revenue generated by a network infrastructure without incurring additional costs. If the overbooking factor is chosen appropriately, additional virtual circuits can be admitted without degrading quality of service for existing customers. Most implementations use a single factor to accept a linear fract...
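With a single overbooking factor, admission control reduces to a one-line capacity check: the provider admits virtual circuits until committed bandwidth reaches factor × physical capacity, betting that not all customers transmit at peak simultaneously. A hedged sketch (the names and the 1.25 factor are illustrative, and the paper's point is precisely that a single fixed factor is too coarse):

```python
def can_admit(requested_bw, committed_bw, capacity, overbook=1.25):
    """Admit a circuit while total committed bandwidth stays within the
    overbooked capacity (a single-factor policy, for illustration)."""
    return committed_bw + requested_bw <= capacity * overbook
```

So on a link of capacity 100 with 100 already committed, a 20-unit request is admitted (120 ≤ 125) but a 30-unit request is refused (130 > 125); choosing `overbook` too high is what degrades quality of service for existing customers.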
Phylogenetic analysis is becoming an increasingly important tool for customized drug treatments, epidemiological studies, and evolutionary analysis. The TCS method provides an important tool for dealing with genes at a population level. Existing software for TCS analysis takes an unreasonable amount of time for the analysis of significant numbers of...
Recent advances in DNA analysis, global climate modeling and computational fluid dynamics have increased the demand for supercomputing resources. Through increasing the efficiency and throughput of existing supercomputing centers, additional computational power can be provided for these applications. Backfill has been shown to increase the efficien...
The Simple Object Access Protocol (SOAP) is an XML-based messaging protocol used primarily as the wire format for a remote procedure call (RPC). SOAP offers advantages for RPC communications on the Internet. SOAP messages piggyback on HTTP and are able to penetrate Internet firewalls and reach distributed computing applications. SOAP also holds the...
As the Internet began its exponential growth into a global information environment, software was often unreliable, slow and had difficulty in interoperating with other systems. Supercomputing node counts also continue to follow high growth trends. Supercomputer and grid resource management software must mature into a reliable computational platform...
The Maui scheduler has received wide acceptance in the HPC community as a highly configurable and effective batch scheduler. It is currently in use on hundreds of SP, O2K, and Linux cluster systems throughout the world including a high percentage of the largest and most cutting edge research sites. While the algorithms used within Maui have proven t...
Heterogeneous parallel clusters of workstations are being used to solve many important computational problems. Scheduling parallel applications on the best collection of machines in a heterogeneous computing environment is a complex problem. Performance prediction is vital to good application performance in this environment since utilization of an...
As Internet usage proliferates, resource security becomes both more important and more complex. Contemporary users and systems are ill-equipped to deal with the complex security demands of a ubiquitous, insecure network. The YGuard Access Control Model, developed at Brigham Young University, employs set-based access control lists, XML, and a modula...
In today's Internet, demand is increasing for guarantees of speed and efficiency. Current routers are very limited in the type and quantity of observed data they can provide, making it difficult for providers to maximize utilization without the risk of degraded throughput. This research uses statistical data currently provided by router vendors to e...
Meta-scheduling, a process which allows a user to schedule a job across multiple sites, has a potential for livelock. Current systems avoid livelock by locking down resources at multiple sites and allowing a meta-scheduler to control the resources during the lock down period or by limiting job size to that which will fit on one site. The former app...
Heterogeneous distributed computing has traditionally been a problematic undertaking which increases in complexity as heterogeneity increases. This paper presents results obtained using DOGMA, a Java based system which simplifies parallel computing on heterogeneous computers. The performance of Java just-in-time compilers currently approaches C++ fo...
Recent advances in DNA sequencing technology have created large data sets upon which phylogenetic inference can be performed. However, current research is limited by the prohibitive time necessary to perform tree search on even a reasonably sized data set. Some parallel algorithms have been developed but the biological research community does not u...
Recent advances in DNA sequencing technology have created large data sets upon which phylogenetic inference can be performed. However, current research is limited by the prohibitive time necessary to perform tree search on even a reasonably sized data set. Some parallel algorithms have been developed but the biological research community does not u...
As supercomputing resources become more available, users will require resources managed by several local schedulers. For example, a user may request 100 processors, a telescope, network bandwidth and a graphics display in order to perform an experiment. In order to gain access to all of these resources (some of which may be in different geographi...
As supercomputing resources become more available, users will require resources managed by several local schedulers. To gain access to a collection of resources, current systems require metajobs to run during locked down periods when the resources are only available for metajob use. It is more convenient and efficient if the user is able to make a re...
Heterogeneous parallel clusters of workstations are being used to solve many important computational problems. Scheduling parallel applications on the best collection of machines in a heterogeneous computing environment is a complex problem. Performance prediction is vital to good application performance in this environment since utilization of an...
While there is growing interest in using Java for high-performance applications, many in the high-performance computing community do not believe that Java can match the performance of traditional native message passing environments. This paper discusses critical issues that must be addressed in the design of Java based message passing systems. Effic...
Recent advances in software and hardware for clustered computing have allowed scientists and computing specialists to take advantage of commodity processors in solving challenging computational problems. The setup, management and coding involved in parallel programming along with the challenges of heterogeneous computing machinery prevent most non-...
The face of parallel computing has changed in the last few years as high performance clusters of workstations are being used in conjunction with supercomputers to solve demanding computational problems. In order for a user to effectively run an application on both tightly coupled and network based clusters, he must often use different algorithms th...
One of the biggest differences between traditional supercomputers and workstation clusters is the latency involved in sending a message between processors. Wide Area Network (WAN) based workstation clusters can experience significant latency between machines at different geographical positions. Improvements in network technology can achieve margina...
Phylogenetic analysis is an integral part of many biological research programs. In essence, it is the study of gene genealogy: of gene mutation and generational relationships. Phylogenetic analysis is being used in many diverse areas such as human epidemiology, viral transmission, biogeography, and systematics. Researchers are n...
CORBA applications can transparently use service instances running on the client's machine, on the local-area network, or across the Internet. Standard CORBA services help the application locate service instances, but do not provide a mechanism to identify service instances that will give good performance. The PerformanceBroker executes performance...
The performance of Java just-in-time compilers currently approaches native C++, making Java a serious contender for supercomputing application development. This paper presents DOGMA – a new Java based system which enables parallel computing on heterogeneous computers. DOGMA supports parallel programming in both a traditional message-passing form an...
In this paper, we compare the Redundant Boundary Computation (RBC) algorithm for convolution with traditional parallel methods. This algorithm dramatically reduces the communication cost for the computation in certain environments. We theoretically and experimentally study the conventional parallel algorithm and the RBC algorithm. First, we dis...
The performance of Java just-in-time compilers currently approaches native C++, making Java a serious contender for supercomputing application development. This paper presents DOGMA, a new Java based system which enables parallel computing on heterogeneous computers. DOGMA supports parallel programming in both a traditional message passing form and...
Heterogeneous distributed computing has traditionally been a problematic undertaking which increases in complexity as heterogeneity increases. The recent advent of Java has made heterogeneous computing a fairly straightforward task. Nevertheless, many researchers have not considered the use of Java in a mainstream parallel programming environment....
This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatm...
This paper presents the design of NetPIPE, a new Network Protocol Independent Performance Evaluator. NetPIPE maps the performance of a network across a wide range and presents the data in a new manner. Its protocol independence allows for visualization of the overhead associated with a protocol layer. Using NetPIPE has led to the discovery of a dee...
The computing community has long faced the problem of scientifically comparing different computers and different algorithms. When architectures, methods, precision, or storage capacity are very different it is difficult or misleading to compare speeds using the ratio of execution times. We present a practical and fair approach that provides mathema...
Thesis (M.S.)--Utah State University. Dept. of Computer Science, 1993. Includes bibliographical references (leaves 35-36).
We describe a new algorithm designed to quickly and robustly solve general linear problems of the form Ax = b. We describe both serial and parallel versions of the algorithm, which can be considered a prioritized version of an Alternating Multiplicative Schwarz procedure. We also adopt a general view of alternating Multiplicative Schwarz procedures...
Due to the immensity of phylogenetic tree space for large data sets, researchers must rely on heuristic searches to infer reasonable phylogenies. By designing meta-searches which appropriately combine a variety of heuristics and parameter settings, researchers can significantly improve the performance of heuristic searches. Advanced languag...
The Internet has become the foundation for world-wide digital communication. The survivability of this critical network infrastructure is crucial to businesses, universities, and government agencies. This survivability can be compromised by the failure of a small percentage of critical routers within the network. Additional security is justified on t...
When searching the Internet, the volume of information is often overwhelming. Various search engines have been developed that deal with information overload. The most popular ones sort through the abundance of data by scoring each document based upon its content or its previous track history, then presenting only those with the top scores. This pa...
Fundamental to multiple sequence alignment algorithms is modeling insertions and deletions (gaps). The most prevalent model is to use gap open and gap extension penalties. While gap open and gap extension penalties are well understood conceptually, their effects on multiple sequence alignment, and consequently on phylogeny scores are not as well un...
For complex designs using multiple factors, traditional methods of performance analysis are inadequate for assessing the improvement of program speed by increasing the number of processors. In this paper, we suggest an approach to performance analysis based on the statistical design of experiments paradigm. This approach allows for the estimation o...