35 Publications
9,705 Reads
3,827 Citations
Publications (35)
Whole-genome assemblies of 19 placental mammals and two outgroup species were used to reconstruct the order and orientation of syntenic fragments in chromosomes of the eutherian ancestor and six other descendant ancestors leading to human. For ancestral chromosome reconstructions, we developed an algorithm (DESCHRAMBLER) that probabilistically dete...
Homologous synteny blocks (HSBs) and evolutionary breakpoint regions (EBRs) in mammalian chromosomes are enriched for distinct DNA features, contributing to distinct phenotypes. To reveal HSB and EBR roles in avian evolution, we performed a sequence-based comparison of 21 avian and five outgroup species using recently sequenced genomes across the a...
Currently, humanists interested in accessing and analyzing spoken-word audio collections have few means of using, or learning how to use, advanced technologies for analyzing sound. The HiPSTAS (High Performance Sound Technologies for Access and Scholarship) project introduces humanists to ARLO (Adaptive Recognition with Layered Optimizati...
Used here to describe the investigation of significant sound or prosodic patterns within the context of a system that can translate these patterns into comparative visualizations across texts, the term 'distant listening' is used provocatively to suggest that readers might interpret prosodic patterns as 'noise' (or seemingly unintelligible informat...
To mine large digital libraries in humanistically meaningful ways, scholars need to divide them by genre. This is a task that classification algorithms are well suited to assist, but they need adjustment to address the specific challenges of this domain. Digital libraries pose two problems of scale not usually found in the article datasets used to...
The Tibetan antelope (Pantholops hodgsonii) is endemic to the extremely inhospitable high-altitude environment of the Qinghai-Tibetan Plateau, a region that has a low partial pressure of oxygen and high ultraviolet radiation. Here we generate a draft genome of this artiodactyl and use it to detect the potential genetic bases of highland adaptation....
Supplementary Figures S1-S9, Supplementary Tables S1-S12, Supplementary Methods and Supplementary References
List of genes under positive selection in the branch leading to the pika.
List of genes under positive selection in the branch leading to the Tibetan Antelope.
One of the most difficult problems in modern genomics is the assembly of full-length chromosomes using next generation sequencing (NGS) data. To address this problem, we developed "reference-assisted chromosome assembly" (RACA), an algorithm to reliably order and orient sequence scaffolds generated by NGS and assemblers into longer chromosomal frag...
For 10,000 years pigs and humans have shared a close and complex relationship. From domestication to modern breeding practices, humans have shaped the genomes of domestic pigs. Here we present the assembly and analysis of the genome sequence of a female domestic Duroc pig (Sus scrofa) and a comparison with the genomes of wild and domestic pigs from...
Domestic yaks (Bos grunniens) provide meat and other necessities for Tibetans living at high altitude on the Qinghai-Tibetan Plateau and in adjacent regions. Comparison between yak and the closely related low-altitude cattle (Bos taurus) is informative in studying animal adaptation to high altitude. Here, we present the draft genome sequence of a f...
The volume and velocity of data are growing at unprecedented rates; the data are often physically distributed, have access constraints, and require leveraging diverse computational fabrics such as clouds and grids. The Meandre data-intensive component-based application infrastructure can leverage this diversity and enables extremely scalable server clusters...
A data deluge is the norm in research as data volumes grow every day. A further complication is the growing number of places and ways in which the data must be accessed. This leads to a common requirement to integrate a variety of data sources and providers in order to work with the informational resources. This paper describes a general approach to da...
The persistence of large blocks of homologous synteny and a high frequency of breakpoint reuse are distinctive features of mammalian chromosomes that are not well understood in evolutionary terms. To gain a better understanding of the evolutionary forces that affect genome architecture, synteny relationships among 10 amniotes (human, chimp, macaque...
With the development of petascale computing systems, a long-term effort is needed to educate and train the next generation of researchers. As part of its graduate education component, the Virtual School of Computational Science and Engineering held a summer school in August 2008 entitled "Accelerators for Science and Engineering Applications," prov...
Data-intensive flow computing allows efficient processing of large volumes of data that would otherwise be unapproachable. This paper introduces a new semantic-driven data-intensive flow infrastructure which: (1) provides a robust and transparent scalable solution from a laptop to large-scale clusters, (2) creates a unified solution for batch and interactive tas...
The investigation of the VAST Contest collection provided a valuable test for text mining techniques. Our group has focused on creating analytical tools to unveil relevant patterns and to aid navigation of the content in such text collections. Our results show how such an approach, in combination with visualization techniques, can ease the discov...
This paper addresses the problem of making text mining results more comprehensible to humanities scholars, journalists, intelligence analysts, and other researchers, in order to support the analysis of text collections. Our system, FeatureLens, visualizes a text collection at several levels of granularity and enables users to explore interesting te...
This paper describes a system to support humanities scholars in their interpretation of literary works. It presents a user interface and web architecture that integrates text mining, a graphical user interface and visualization, while attempting to remain easy for non-specialists to use. Users can interactively read and rate documents found in a digi...
Creativity protocols and methodologies tend to be time-consuming if applied manually. This paper presents how information technologies can support innovation and creativity for collaborative scenario creation and discussion. The fusion of change discovery, genetic algorithms, data mining, and computer-supported collaborative tools provides computat...
The genome organizations of eight phylogenetically distinct species from five mammalian orders were compared in order to address fundamental questions relating to mammalian chromosomal evolution. Rates of chromosome evolution within mammalian orders were found to have increased since the Cretaceous-Tertiary boundary. Nearly 20% of chromosome breakpoint r...
This paper presents a demonstration of our recent research and development of the MAIDS system, which mines alarming incidents from data streams, with the following major analysis functions: (1) multi-resolution modeling using a tilted time window framework, (2) multi-dimensional analysis using a stream "data cube" model, (3) online stream classifi...
The massive amounts of data flooding into the astronomy field hold many answers to important problems in contemporary astrophysics. The biggest problem is sifting through massive amounts of data to uncover these secrets. In this presentation, we identify an approach in which we apply data-mining techniques to the problem of photometric quasar ident...
Looping patterns rich in laminin are present in tissue samples of primary aggressive human uveal melanomas and their metastases. Because these extravascular patterns connect to blood vessels and transmit fluid in vitro and in vivo, the three-dimensional configuration of these patterns has been the subject of considerable speculation. In the current...
In the sequential world, the mapping of processes to a computer system was not a problem because of the classic von Neumann architecture. However, in the parallel world, the mapping of processes to nodes and the balancing of the load on each node become issues that need to be addressed. This problem enters software development in the implementat...
Next-generation sequencing (NGS) technologies together with de novo assembly algorithms have given us an unprecedented opportunity to unravel the genomes of different species at low cost. This trend will be further accelerated by the launch of large-scale genome projects such as the Genome 10K and i5k projects. However, due to the limitation of re...