Toward the parallelization of GSL.
ABSTRACT: In this paper, we present our joint efforts to design and develop parallel implementations of the GNU Scientific Library (GSL) for
a wide variety of parallel platforms. The proposed multilevel software architecture provides several interfaces: a sequential
interface that hides the parallel nature of the library from sequential users, a parallel interface for parallel programmers,
and a web-services-based interface that provides remote access to the routines of the library. The physical level of the architecture
includes platforms ranging from distributed and shared-memory multiprocessors to hybrid systems and heterogeneous clusters.
Several well-known operations arising in discrete mathematics and sparse linear algebra are used to illustrate the challenges,
benefits, and performance of different parallelization approaches.
- Available from: Jorge González-Domínguez
ABSTRACT: The popularity of Partitioned Global Address Space (PGAS) languages has increased in recent years thanks to their high programmability and performance through efficient exploitation of data locality, especially on hierarchical architectures such as multicore clusters. This paper describes UPCBLAS, a parallel numerical library for dense matrix computations using the PGAS Unified Parallel C (UPC) language. The routines developed in UPCBLAS are built on top of sequential Basic Linear Algebra Subprograms (BLAS) functions and exploit the particularities of the PGAS paradigm, taking data locality into account in order to achieve good performance. Furthermore, the routines implement other optimization techniques, several of which automatically take into account the hardware characteristics of the underlying systems on which they are executed. The library has been experimentally evaluated on a multicore supercomputer and compared with a message-passing-based parallel numerical library, demonstrating good scalability and efficiency. Copyright © 2012 John Wiley & Sons, Ltd. Concurrency and Computation: Practice and Experience 09/2012; 24(14):1645-1667. DOI:10.1002/cpe.1914
- ABSTRACT: In this paper we present an approximate algorithm for detecting and filtering data dependencies with a sufficiently large distance between memory references. A sequence of identical operations (typically enclosed in a 'for' loop) can be replaced with a single SIMD operation if the distance between memory references is greater than or equal to the number of data elements processed in a SIMD register. Some loops that could not be vectorized on traditional vector processors can still be parallelized for short SIMD execution. A number of approximate data-dependence tests have been proposed in the literature, but all of them assume a data dependence even when no dependence actually exists that could restrict parallelization under the short SIMD execution model. By examining the properties of linear subscript expressions of possibly conflicting data references, our algorithm gives the green light to the parallelization process if some sufficient conditions regarding the dependence distance are met. Our method is based on the Banerjee test and checks the minimum and maximum distances between memory references within the iteration space rather than searching for the existence of an integer solution to the dependence equation. The proposed method extends the accuracy and applicability of the classical Banerjee test. Keywords: Data dependency; Multimedia extensions; SIMD instructions; Vectorizing compilers. The Journal of Supercomputing 01/2011; 56(2):226-244. DOI:10.1007/s11227-009-0364-8
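The distance condition described in this abstract can be illustrated with a small sketch. The code below is not the authors' algorithm. It is a simplified bounds check in the spirit of the Banerjee test, assuming a single loop of the form `for (i = lo; i <= hi; i++) a[c1*i + d1] = f(a[c2*i + d2]);` with linear subscripts. The function name `simd_safe`, its parameters, and the loop model are all hypothetical and chosen only for illustration. The check bounds the difference between the written and the read address for every pair of iterations that would share one SIMD register and reports the loop as safe when that difference can never be zero.

```c
#include <stdbool.h>

static long min2(long a, long b) { return a < b ? a : b; }
static long max2(long a, long b) { return a > b ? a : b; }

/*
 * Hypothetical helper (not taken from the paper).
 *
 * Assumed loop model:
 *     for (i = lo; i <= hi; i++)
 *         a[c1*i + d1] = f(a[c2*i + d2]);
 *
 * With `width` elements per SIMD register, all loads of a group of `width`
 * iterations happen before its stores. The only hazard is a value written
 * in iteration i and read in a later iteration i + k of the same group,
 * that is, with 1 <= k <= width - 1.
 *
 * The function bounds
 *     h(i, k) = (c1*i + d1) - (c2*(i + k) + d2)
 * over the box lo <= i <= hi - 1, 1 <= k <= kmax. The box slightly
 * over-approximates the true iteration region, so the test is conservative:
 * it returns true only if h can never be zero, which is a sufficient
 * (not a necessary) condition for safe short-SIMD execution.
 */
bool simd_safe(long c1, long d1, long c2, long d2,
               long lo, long hi, long width)
{
    if (width <= 1 || hi <= lo)
        return true;                      /* nothing shares a register */

    long kmax = min2(width - 1, hi - lo); /* largest in-register distance */

    /* h(i, k) = ci*i + ck*k + c0 is linear, so its extremes over a box
       are reached at the box corners, one variable at a time. */
    long ci = c1 - c2;
    long ck = -c2;
    long c0 = d1 - d2;

    long hmin = min2(ci * lo, ci * (hi - 1)) + min2(ck, ck * kmax) + c0;
    long hmax = max2(ci * lo, ci * (hi - 1)) + max2(ck, ck * kmax) + c0;

    return hmin > 0 || hmax < 0;          /* zero is outside [hmin, hmax] */
}
```

Under these assumptions, `a[i] = f(a[i-4])` passes the check for a register of 4 elements because the dependence distance equals the register width, whereas `a[i] = f(a[i-1])` is rejected. A classical existence test would report a dependence in both cases.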