ABSTRACT: As sequencing technologies progress, the amount of data produced grows exponentially, shifting the bottleneck of discovery
towards the data analysis phase. In particular, currently available mapping solutions for RNA-seq leave room for improvement
in terms of sensitivity and performance, hindering an efficient analysis of transcriptomes by massive sequencing. Here, we
present an innovative approach that combines re-engineering, optimization and parallelization. This solution results in a
significant increase of mapping sensitivity over a wide range of read lengths and substantially shorter runtimes when compared
with currently available RNA-seq mapping methods.
ABSTRACT: We introduce a parallel aligner with a work-flow organization for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, HPG Aligner SA, exploits a suffix array to rapidly map a large fraction of the RNA fragments (reads), and leverages the accuracy of the Smith-Waterman algorithm to deal with conflictive reads. The aligner is enhanced with a careful strategy to detect splice junctions based on an adaptive division of RNA reads into small segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing crucial information for the successful alignment of the complete reads. The experimental results on a platform with Intel multicore technology report the parallel performance of HPG Aligner SA on RNA reads of 100–400 nucleotides, which excels in execution time and sensitivity compared with state-of-the-art aligners such as TopHat 2+Bowtie 2, MapSplice, and STAR.
No preview · Article · Oct 2015 · IEEE/ACM Transactions on Computational Biology and Bioinformatics
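The suffix-array lookup mentioned in the abstract above can be illustrated with a minimal sketch. It is not the HPG Aligner SA implementation: the function names `build_suffix_array` and `find_occurrences` are hypothetical, the construction is a naive sort, and only exact matches of a seed are reported.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of exact matches of pattern in text."""
    m = len(pattern)

    def prefix(rank):
        # First m characters of the suffix with the given rank.
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first rank whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


genome = "GATTACAGATTACA"          # toy reference, an assumption for illustration
sa = build_suffix_array(genome)
print(find_occurrences(genome, sa, "TTA"))
```

All suffixes that start with the seed occupy a contiguous range of the suffix array, which is why two binary searches are enough to locate every candidate position.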
ABSTRACT: We present a concurrent algorithm for mapping short and long RNA sequences on multicore processors. Our solution processes the data, initially stored on disk, in batches of reads which are passed between the consecutive stages of a pipeline. A major operational reorganization of the original static pipeline, combined with a complete reimplementation based on POSIX threads, renders a dissociated execution between threads and stages/task types, so that threads can compute any type of pending task, resulting in a dynamic pipeline. The experiments on a multicore platform reveal that this reorganization yields significantly higher performance, especially for architectures equipped with a small to moderate number of cores. As an additional contribution, our experiments also reveal that the use of 16-nucleotide (nt) seeds during one of the stages of the pipeline, instead of the 15-nt length that was proposed originally, yields a remarkable reduction in the execution time of the global alignment process while maintaining the sensitivity of the algorithm.
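The dynamic pipeline described above, in which any thread may pick up any pending task regardless of its stage, can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's `threading` and `queue` modules rather than POSIX threads, and the name `run_dynamic_pipeline` and the toy stage functions are hypothetical.

```python
import queue
import threading


def run_dynamic_pipeline(batches, stages, num_threads=4):
    """Run every batch through all stages; threads are not bound to a stage."""
    tasks = queue.Queue()          # holds (stage_index, data) pairs of any stage
    results = []
    results_lock = threading.Lock()

    for batch in batches:
        tasks.put((0, batch))

    def worker():
        while True:
            item = tasks.get()
            if item is None:       # sentinel: no more work
                tasks.task_done()
                return
            stage_index, data = item
            output = stages[stage_index](data)
            if stage_index + 1 < len(stages):
                # Enqueue the follow-up task before marking this one done,
                # so the queue never looks empty while work remains.
                tasks.put((stage_index + 1, output))
            else:
                with results_lock:
                    results.append(output)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    tasks.join()                   # all tasks, including follow-ups, are finished
    for _ in threads:
        tasks.put(None)
    for thread in threads:
        thread.join()
    return results


# Toy stages standing in for the real mapping steps (assumption).
toy_stages = [lambda batch: [x + 1 for x in batch], lambda batch: sum(batch)]
totals = run_dynamic_pipeline([[1, 2], [3, 4], [5]], toy_stages)
print(sorted(totals))
```

Because a single queue holds tasks of every stage, no thread sits idle waiting for "its" stage to receive input, which is the property the abstract attributes to the dynamic organization.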
ABSTRACT: In this paper we introduce a novel parallel pipeline for fast and
accurate mapping of RNA sequences on servers equipped with multicore
processors. Our software, named HPG-Aligner, leverages the speed of the
Burrows-Wheeler Transform to map a large number of RNA fragments (reads)
rapidly, as well as the accuracy of the Smith-Waterman algorithm, which
is employed to deal with conflictive reads. The aligner is complemented
with a careful strategy to detect splice junctions based on the division
of RNA reads into short segments (or seeds), which are then mapped onto
a number of candidate alignment locations, providing useful information
for the successful alignment of the complete reads. Experimental results
on platforms with AMD and Intel multicore processors report the
remarkable parallel performance of HPG-Aligner, on short and long RNA
reads, which excels in both execution time and sensitivity relative to a
state-of-the-art aligner such as TopHat 2 built on top of Bowtie and
ABSTRACT: In this paper, we present our joint efforts to design and develop parallel implementations of the GNU Scientific Library for
a wide variety of parallel platforms. The multilevel software architecture proposed provides several interfaces: a sequential
interface that hides the parallel nature of the library from sequential users, a parallel interface for parallel programmers,
and a web services based interface to provide remote access to the routines of the library. The physical level of the architecture
includes platforms ranging from distributed and shared-memory multiprocessors to hybrid systems and heterogeneous clusters.
Several well-known operations arising in discrete mathematics and sparse linear algebra are used to illustrate the challenges,
benefits, and performance of different parallelization approaches.
No preview · Article · Apr 2009 · The Journal of Supercomputing
ABSTRACT: Current machine translation (MT) systems are still not perfect. In practice, the output from these systems needs to be edited to correct errors. A way of increasing the productivity of the whole translation process (MT plus human work) is to incorporate the human correction activities within the translation process itself, thereby shifting the MT paradigm to that of computer-assisted translation. This model entails an iterative process in which the human translator activity is included in the loop: in each iteration, a prefix of the translation is validated (accepted or amended) by the human and the system computes its best (or n-best) translation suffix hypothesis to complete this prefix. A successful framework for MT is the so-called statistical (or pattern recognition) framework. Interestingly, within this framework, the adaptation of MT systems to the interactive scenario affects mainly the search process, allowing a great reuse of successful techniques and models. In this article, alignment templates, phrase-based models, and stochastic finite-state transducers are used to develop computer-assisted translation systems. These systems were assessed in a European project (TransType2) in two real tasks: the translation of printer manuals and the translation of the Bulletin of the European Union. In each task, the following three pairs of languages were involved (in both translation directions): English-Spanish, English-German, and English-French.
Full-text · Article · Mar 2009 · Computational Linguistics
ABSTRACT: We present several algorithms to compute the solution of a linear system of equations on a GPU, as well as general techniques
to improve their performance, such as padding and hybrid GPU-CPU computation. We also show how iterative refinement with mixed-precision
can be used to regain full accuracy in the solution of linear systems. Experimental results on a G80 using CUBLAS 1.0, the
implementation of BLAS for NVIDIA® GPUs with unified architecture, illustrate the performance of the different algorithms
and techniques proposed.
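Mixed-precision iterative refinement, as mentioned in the abstract above, can be illustrated with a minimal CPU-only sketch. Several things here are assumptions made for illustration: the helper names are hypothetical, single precision is emulated by rounding through `struct`, and each correction re-solves the system from scratch instead of reusing a stored LU factorization as a practical implementation would.

```python
import struct


def to_single(value):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def solve_single(A, b):
    """Gaussian elimination with partial pivoting, rounded to single precision."""
    n = len(b)
    M = [[to_single(v) for v in row] + [to_single(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for i in range(k + 1, n):
            factor = to_single(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = to_single(M[i][j] - to_single(factor * M[k][j]))
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for j in range(i + 1, n):
            s = to_single(s - to_single(M[i][j] * x[j]))
        x[i] = to_single(s / M[i][i])
    return x


def refine(A, b, iterations=5):
    """Low-precision solves plus double-precision residuals and updates."""
    x = solve_single(A, b)
    for _ in range(iterations):
        # Residual r = b - A x, computed in double precision.
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        d = solve_single(A, r)               # correction from the cheap solver
        x = [xi + di for xi, di in zip(x, d)]
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_refined = refine(A, b)                     # exact solution is [1/11, 7/11]
print(x_refined)
```

The expensive part of the work (the solve) stays in low precision, while the residual and the accumulated solution are kept in double precision, which is what allows the accuracy to be recovered for reasonably conditioned systems.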
ABSTRACT: The increase in performance of the last generations of graphics processors (GPUs) has made this class of platform a coprocessing tool with remarkable success in certain types of operations. In this paper we evaluate the performance of the Level 3 operations in CUBLAS, the implementation of BLAS for NVIDIA® GPUs with unified architecture. From this study, we gain insights on the quality of the kernels in the library and we propose several alternative implementations that are competitive with those in CUBLAS. Experimental results on a GeForce 8800 Ultra compare the performance of CUBLAS and the new variants.
ABSTRACT: The availability of large amounts of data is a fundamental prerequisite for building handwriting recognition systems. Any system needs a test set of labelled samples for measuring its performance throughout its development and guiding it. Moreover, there are systems that need additional samples for learning the recognition task they have to cope with later, i.e. a training set. Thus, the acquisition and distribution of standard databases has become an important issue in the handwriting recognition research community. Examples of widely used databases in the online domain are UNIPEN, IRONOFF, and Pendigits. This paper describes the current state of our own database, UJIpenchars, whose first version contains online representations of 1 364 isolated handwritten characters produced by 11 writers and is freely available at the UCI Machine Learning Repository. Moreover, we have recently concluded a second acquisition phase, totalling more than 11 000 samples from 60 writers, to be made available shortly as UJIpenchars2.
ABSTRACT: We investigate the solution of large-scale generalized algebraic Bernoulli equations such as those arising in control and systems
theory. Here, we discuss algorithms based on a generalization of the Newton iteration for the matrix sign function. The algorithms
are easy to parallelize and provide an efficient numerical tool to solve large-scale problems. Both the accuracy and the parallel
performance of our implementations on a cluster of Intel Xeon processors are reported.
Full-text · Article · Dec 2007 · Numerical Algorithms
ABSTRACT: We investigate the parallel solution of large-scale discrete-time algebraic Riccati equations, such as those arising in control and systems theory, on a cluster of symmetric multiprocessors. The structure-preserving doubling algorithms considered in this paper are composed of matrix operations implemented in existing parallel dense linear algebra libraries. We suggest a parallel implementation that employs a new and efficient update strategy for the doubling iteration and a suitable stopping criterion. Numerical experiments on a cluster of multiprocessor nodes confirm the parallel performance and scalability of the doubling algorithms, which are comparable to those of other parallel DARE solvers based on the matrix sign and disk functions.
ABSTRACT: We discuss and compare two approaches for model reduction of large-scale unstable systems on parallel computers. The first method proceeds by computing the additive decomposition of the transfer function via a block diagonalization, followed by a reduction of the stable part of the system using techniques based on state-space truncation. The second method employs a representation of the controllability and observability Gramians of an unstable system in terms of the Gramians of the stabilized system, where the particular stabilization is obtained via the solution of dual algebraic Bernoulli equations. Based on these Gramians, balanced truncation is then applied in the usual manner. All core computational steps in these methods can be efficiently solved on parallel computers by means of diverse variants of the Newton iteration for the sign function. Numerical experiments on a cluster of Intel Xeon processors show the numerical and parallel performances of these methods.
ABSTRACT: We present our joint effort to develop a web based interface for the GNU Scientific library and its parallelization. The interface
has been developed using standard web services technology to enable the use of non local resources to execute parallel programs.
The final result is a computing service where sequential and parallel routines demanding high performance computing are supplied.
The design allows new servers and platforms to be incorporated with a small number of software requirements. We also introduce
an open source development environment to allow developers to cooperate in the parallelization of the GNU Scientific library
codes. These codes will also be available through the web based interface to end users. Performance results are shown for some
GSL codes on two heterogeneous cluster systems using the interface enabled with web services technology.
ABSTRACT: The Computer-Assisted Translation (CAT) paradigm tries to integrate human expertise into the automatic translation process. In this paradigm, a human translator interacts with a translation system that dynamically offers a list of translations that best completes the part of the sentence that is being translated. This human-machine synergy aims at a double goal: to increase translator productivity and to ease translators' work. In this paper, we present a CAT system based on stochastic finite-state transducer technology. This system has been developed and assessed on two real parallel corpora in the framework of the European project TransType2 (TT2).
ABSTRACT: We present our joint effort to develop a Web based interface for the GNU Scientific library and its parallelization. The interface has been developed using standard Web services technology to enable the use of non local resources to execute parallel programs. The final result is a computing service where sequential and parallel routines demanding high performance computing are supplied. The design allows new servers and platforms to be incorporated with a small number of software requirements.
ABSTRACT: We investigate the numerical solution of algebraic Bernoulli equations via the Newton iteration for the matrix sign function. Bernoulli equations are nonlinear matrix equations arising in control and systems theory in the context of stabilization of linear systems, coprime factorization of rational matrix-valued functions, as well as model reduction. The algorithm proposed here is easily parallelizable and thus provides an efficient tool to solve large-scale problems. We report the parallel performance and scalability of our parallel implementations on an IBM Regatta system. Efficiencies around 80% and higher are obtained when using a reduced number of nodes.
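The Newton iteration for the matrix sign function referred to above can be sketched in a few lines. This shows only the basic unscaled iteration X_{k+1} = (X_k + X_k^{-1}) / 2 on a small dense matrix, not the Bernoulli solver of the paper; the function names `invert` and `matrix_sign`, the tolerance, and the test matrix are assumptions for illustration.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small dense matrix."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matrix_sign(matrix, tol=1e-12, max_iter=100):
    """Unscaled Newton iteration; assumes no eigenvalue on the imaginary axis."""
    x = [[float(v) for v in row] for row in matrix]
    for _ in range(max_iter):
        x_inv = invert(x)
        x_next = [[0.5 * (a + b) for a, b in zip(row, row_inv)]
                  for row, row_inv in zip(x, x_inv)]
        change = max(abs(a - b)
                     for row_new, row_old in zip(x_next, x)
                     for a, b in zip(row_new, row_old))
        x = x_next
        if change < tol:
            break
    return x


# Eigenvalues 2 and -3, so the sign matrix has eigenvalues +1 and -1.
S = matrix_sign([[2.0, 1.0], [0.0, -3.0]])
print(S)
```

Each step needs only a matrix inversion and an addition, operations available in parallel dense linear algebra libraries, which is consistent with the abstract's remark that the method is easily parallelizable.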