Chapter

Parallel computations in Java with PCJ library


Abstract

In this paper we present PCJ, a new library for parallel computations in Java inspired by the partitioned global address space (PGAS) approach. We present design details together with examples of usage for basic operations such as point-to-point communication, synchronization and broadcast. The PCJ library is easy to use and allows for fast development of parallel programs. It makes it possible to develop distributed applications in an easy way, hiding communication details and thus allowing the user to focus on the implementation of the algorithm rather than on network or thread programming. Objects and variables are local to the program threads; however, some variables can be marked as shared, which allows them to be accessed from different nodes. For shared variables PCJ offers one-sided communication, which allows for easy access to data stored at different nodes. Parallel programs developed with PCJ can be run on distributed systems with different Java VMs running on the nodes. In the paper we present an evaluation of the performance of PCJ communication on state-of-the-art hardware. The results are compared with a native MPI implementation, showing good performance and scalability of PCJ.
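The shared-variable, one-sided model described above can be illustrated with a deliberately simplified, shared-memory-only toy in plain Java. This is not the PCJ API: the class and method names (ToyPgas, put, get, ownerOf) are invented for illustration of how a partitioned global array with one-sided access behaves.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy partitioned global address space: each "task" owns a slice of a
 *  global array, but any task may read or write any element by global
 *  index (one-sided access). Hypothetical names, not the PCJ API. */
class ToyPgas {
    private final long[] global;   // backing store, partitioned by owner
    private final int perTask;

    ToyPgas(int tasks, int perTask) {
        this.perTask = perTask;
        this.global = new long[tasks * perTask];
    }

    /** Which task "owns" a given global index. */
    int ownerOf(int globalIndex) { return globalIndex / perTask; }

    /** One-sided put: any task may write a remote element directly. */
    synchronized void put(int globalIndex, long value) { global[globalIndex] = value; }

    /** One-sided get: any task may read a remote element directly. */
    synchronized long get(int globalIndex) { return global[globalIndex]; }

    public static void main(String[] args) throws Exception {
        int tasks = 4, perTask = 8;
        ToyPgas pgas = new ToyPgas(tasks, perTask);
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        for (int id = 0; id < tasks; id++) {
            final int me = id;
            pool.submit(() -> {
                for (int i = 0; i < perTask; i++)   // each task writes its own slice
                    pgas.put(me * perTask + i, me);
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS); // acts as a global barrier
        // any task can now read data "owned" by task 3 with a plain get
        System.out.println(pgas.get(3 * perTask));   // prints 3
    }
}
```

In PCJ itself the get and put operations cross node boundaries over the network; the toy above only models the programmer-visible semantics of owner-partitioned data with uniform one-sided access.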


... For these purposes, we have developed the PCJ library [2]. PCJ implements the Partitioned Global Address Space (PGAS) programming paradigm [3], as languages adhering to it are very promising in the context of exascale computing. ...
... In this paper, we focus on the comparison of PCJ with Java-based solutions. The performance comparison with the C/MPI based codes has been presented in previous papers [2,10]. ...
... The PCJ library PCJ [2] is an OpenSource Java library available under the BSD license with the source code hosted on GitHub. PCJ does not require any language extensions or special compiler. ...
Article
Full-text available
With the development of peta- and exascale size computational systems there is growing interest in running Big Data and Artificial Intelligence (AI) applications on them. Big Data and AI applications are implemented in Java, Scala, Python and other languages that are not widely used in High-Performance Computing (HPC), which is still dominated by C and Fortran. Moreover, they are based on dedicated environments such as Hadoop or Spark, which are difficult to integrate with traditional HPC management systems. We have developed the Parallel Computing in Java (PCJ) library, a tool for scalable high-performance computing and Big Data processing in Java. In this paper, we present the basic functionality of the PCJ library with examples of highly scalable applications running on large resources. The performance results are presented for different classes of applications including traditional computationally intensive (HPC) workloads (e.g. stencil), as well as communication-intensive algorithms such as the Fast Fourier Transform (FFT). We present implementation details and performance results for Big Data type processing running on petascale size systems. Examples of large scale AI workloads parallelized using PCJ are presented.
... PCJ is a library [2, 3, 4, 5] for the Java language that helps to perform parallel and distributed calculations. It is able to work on multicore systems with typical interconnects such as Ethernet or InfiniBand, providing users with a uniform view across nodes. ...
... There is one thread per node for receiving incoming data and another one for processing messages. The communication is nonblocking and uses a 256 KB buffer by default [3]. The buffer size can be changed using a dedicated JVM parameter. ...
... The PCJ library has been successfully used to parallelize a number of applications including typical HPC benchmarks [15], receiving the HPC Challenge Award at a recent Supercomputing Conference (SC 2014). Some examples can be viewed in [3]. Recently PCJ has been used to parallelize the problem of traversing a large graph. ...
Chapter
Full-text available
With the wide adoption of multicore and multiprocessor systems, parallel programming became a very important element of computer science. The programming of multicore systems is still complicated and far from easy. The difficulties are caused, amongst others, by the parallel tools, libraries and programming models, which are not easy to use, especially for a non-experienced programmer. In this paper, we present PCJ, a Java library for parallel programming of heterogeneous multicore systems. PCJ adopts the Partitioned Global Address Space paradigm, which makes programming easy. We present the basic functionality of the PCJ library and its usage for the parallelization of selected applications. The scalability of a genetic algorithm implementation is presented. The parallelization of an N-body algorithm implementation with PCJ is also described.
... In previous works, we have shown that the PCJ library allows for easy and feasible development of computational applications as well as Big Data and AI processing running on supercomputers or clusters [3]. The performance comparison with the C/MPI based codes has been presented in previous papers [4,5]. The extensive comparison with Java-based solutions including APGAS (the Java implementation of the X10 language) has also been performed [6,3]. ...
Article
Large-scale computing and data processing with cloud resources is gaining popularity. However, the usage of the cloud differs from traditional high-performance computing (HPC) systems and both algorithms and codes have to be adjusted. This work is often time-consuming and performance is not guaranteed. To address this problem we have developed the PCJ library (parallel computing in Java), a novel tool for scalable HPC and big data processing in Java. In this article, we present a performance evaluation of parallel applications implemented in Java using the PCJ library. The performance evaluation is based on the examples of highly scalable applications of different characteristics focusing on CPU, communication or I/O. They run on the traditional HPC system and Amazon web services Cloud as well as Linaro Developer Cloud. For the clouds, we have used Intel x86 and ARM processors for running Java codes without changing any line of the program code and without the need for time-consuming recompilation. Presented applications have been parallelized using the partitioned global address space programming model and its realization in the PCJ library. Our results prove that the PCJ library, due to its performance and ability to create simple portable code, has great promise to be successful for the parallelization of various applications and run them on the cloud with a performance close to HPC systems.
... In the previous works, we have shown that the PCJ library allows for the easy and feasible development of computational applications as well as Big Data and AI processing running on supercomputers or clusters. The performance comparison with the C/MPI based codes has been presented in previous papers [22,26]. The extensive comparison with Java-based solutions including APGAS (Java implementation of X10 language) has been also performed [27,31]. ...
Chapter
Cloud resources are more often used for large scale computing and data processing. However, the usage of the cloud is different than traditional High-Performance Computing (HPC) systems and both algorithms and codes have to be adjusted. This work is often time-consuming and performance is not guaranteed. To address this problem we have developed the PCJ library (Parallel Computing in Java), a novel tool for scalable high-performance computing and big data processing in Java. In this paper, we present a performance evaluation of parallel applications implemented in Java using the PCJ library. The performance evaluation is based on the examples of highly scalable applications that run on the traditional HPC system and Amazon AWS Cloud. For the cloud, we have used Intel x86 and ARM processors running Java codes without changing any line of the program code and without the need for time-consuming recompilation. Presented applications have been parallelized using the PGAS programming model and its realization in the PCJ library. Our results prove that the PCJ library, due to its performance and ability to create simple portable code, has great promise to be successful for the parallelization of various applications and run them on the cloud with a similar performance as for HPC systems.
... Another notable PGAS implementation is Parallel Computing in Java [29] (PCJ), providing a library of functions and dedicated annotations for distributed memory access over an HPC cluster. The proposed solution uses Java language constructs like classes, interfaces, and annotations for storing and exchanging common data between the cooperating processes, potentially placed in different Java Virtual Machines on separate cluster nodes. ...
Article
Full-text available
This paper provides a review of contemporary methodologies and APIs for parallel programming, with representative technologies selected in terms of target system type (shared memory, distributed, and hybrid), communication patterns (one-sided and two-sided), and programming abstraction level. We analyze representatives in terms of many aspects including programming model, languages, supported platforms, license, optimization goals, ease of programming, debugging, deployment, portability, level of parallelism, constructs enabling parallelism and synchronization, features introduced in recent versions indicating trends, support for hybridity in parallel execution, and disadvantages. Such detailed analysis has led us to the identification of trends in high-performance computing and of the challenges to be addressed in the near future. It can help to shape future versions of programming standards, select technologies best matching programmers’ needs, and avoid potential difficulties while using high-performance computing systems.
... In PCJ, there is a possibility to assign tasks to groups. Groups can be used to simplify collective operations like broadcast or synchronization [8]. Each node has its own identifier, unique for the whole calculation. ...
Conference Paper
With the wide adoption of multicore and multiprocessor systems, parallel programming becomes a very important element of computer science education. However, the number of students exposed to parallel programming is still limited and it is difficult to increase this number using the traditional approach to teaching. The difficulties are caused, amongst others, by the parallel tools, libraries and programming models. Parallel programming using the message passing model is difficult; the shared memory model is easier to learn, but writing code which scales well is not easy. There is considerable potential in the PGAS languages, but they are not widely popularized. Finally, teaching scalable parallel programming requires access to large computational systems, which is not easy to obtain, and even then the operating system and its specific features, like the queueing system, provide students with additional challenges. In this paper we present an extension of the ZawodyWeb system, developed by us for on-line validation of programs sent by students. The ZawodyWeb system has been extended to support parallel programs written in different programming paradigms. With the help of the UNICORE middleware, it allows students' problems to be run on large scale production facilities. The added value is a simple web interface which hides all the peculiarities of large multiprocessor computers. The system has been verified during a parallel programming course for undergraduate students from the computer science program.
Chapter
In this paper, we present performance and scalability of the Java codes parallelized on the Intel KNL platform using Java and PCJ Library. The parallelization is performed using PGAS programming model with no modification to Java language nor Java Virtual Machine. The obtained results show good overall performance, especially for parallel applications. The microbenchmark results, compared to the C/MPI, show that PCJ communication efficiency should be improved.
Chapter
Computations based on graphs are very common problems, but their complexity, the increasing size of analyzed graphs and the huge amount of communication make this analysis a challenging task. In this paper, we present a comparison of two parallel BFS (Breadth-First Search) implementations: MapReduce run on Hadoop infrastructure and an implementation in the PGAS (Partitioned Global Address Space) model. The latter has been developed with the help of PCJ (Parallel Computations in Java), a library for parallel and distributed computations in Java. Both implementations realize the level-synchronous strategy: the Hadoop algorithm assumes iterative MapReduce jobs, whereas PCJ uses explicit synchronization after each level. The scalability of both solutions is similar. However, the PCJ implementation is much faster (about 100 times) than the MapReduce Hadoop solution.
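The level-synchronous strategy shared by both implementations can be sketched in plain, single-threaded Java. In a PGAS version each frontier would additionally be partitioned across tasks, with a barrier and an exchange of newly discovered vertices where the comments indicate; this is a sketch of the general algorithm, not the paper's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Level-synchronous BFS skeleton: the graph is traversed one frontier
 *  (level) at a time. Returns the BFS level of every vertex, or -1 for
 *  unreachable vertices. */
class LevelSyncBfs {
    static int[] bfs(List<List<Integer>> adj, int source) {
        int[] level = new int[adj.size()];
        Arrays.fill(level, -1);
        level[source] = 0;
        List<Integer> frontier = List.of(source);
        int depth = 0;
        while (!frontier.isEmpty()) {
            List<Integer> next = new ArrayList<>();
            for (int u : frontier)          // parallel version: each task scans its share
                for (int v : adj.get(u))
                    if (level[v] == -1) {   // first visit fixes the level
                        level[v] = depth + 1;
                        next.add(v);
                    }
            // parallel version: barrier + exchange of newly discovered vertices here
            frontier = next;
            depth++;
        }
        return level;
    }

    public static void main(String[] args) {
        // tiny undirected graph: edges 0-1, 0-2, 1-3, 2-3, 3-4
        List<List<Integer>> adj = List.of(
            List.of(1, 2), List.of(0, 3), List.of(0, 3), List.of(1, 2, 4), List.of(3));
        System.out.println(Arrays.toString(bfs(adj, 0)));  // [0, 1, 1, 2, 3]
    }
}
```

The explicit per-level synchronization point is exactly what the MapReduce variant realizes with one iterative job per level and the PCJ variant with a barrier.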
Conference Paper
Graph processing is used in many fields of science such as sociology, risk prediction or biology. Although analysis of graphs is important, it also poses numerous challenges, especially for large graphs which have to be processed on multicore systems. In this paper, we present a PGAS (Partitioned Global Address Space) version of the level-synchronous BFS (Breadth-First Search) algorithm and its implementation written in Java. Java so far is not extensively used in high performance computing but, because of its popularity, portability, and increasing capabilities, is becoming more widely exploited, especially for data analysis. The level-synchronous BFS has been implemented using the PCJ (Parallel Computations in Java) library. In this paper, we present implementation details and compare the scalability and performance with the MPI implementation of the Graph500 benchmark. We show good scalability and performance of our implementation in comparison with MPI code written in C. We present the challenges we faced and the optimizations we used in our implementation, necessary to obtain good performance.
Article
This paper describes the Java MPI bindings that have been included in the Open MPI distribution. Open MPI is one of the most popular implementations of MPI, the Message-Passing Interface, which is the predominant programming paradigm for parallel applications on distributed memory computers. We have added Java support to Open MPI, exposing MPI functionality to Java programmers. Our approach is based on the Java Native Interface, and has similarities with previous efforts, as well as important differences. This paper serves as a reference for the application program interface, and in addition we provide details of the internal implementation to justify some of the design decisions. We also show some results to assess the performance of the bindings.
Chapter
Graph-based computations are used in many applications. The increasing size of analyzed data and its complexity make graph analysis a challenging task. In this paper we present a performance evaluation of a Java implementation of the Graph500 benchmark. It has been developed with the help of the PCJ (Parallel Computations in Java) library for parallel and distributed computations in Java. PCJ is based on the PGAS (Partitioned Global Address Space) programming paradigm, where all communication details such as threads or network programming are hidden. In this paper, we present Java implementation details of the first and second kernels from the Graph500 benchmark. The results are compared with the existing MPI implementations of the Graph500 benchmark, showing good scalability of the PCJ library.
Chapter
This paper presents the application of the PCJ library for the parallelization of selected HPC applications implemented in the Java language. The library is motivated by the partitioned global address space (PGAS) model represented by Co-Array Fortran, Unified Parallel C, X10 or Titanium. In PCJ, each task has its own local memory and stores and accesses variables locally. Variables can be shared between tasks and can be accessed, read and modified by other tasks. The library provides methods to perform basic operations like synchronization of tasks, and get and put values in an asynchronous, one-sided way. Additionally, the library offers methods for creating groups of tasks, broadcasting and monitoring variables. PCJ has the ability to work on multinode multicore systems, hiding the details of inter- and intranode communication. The PCJ library fully complies with Java standards, therefore the programmer does not have to use additional libraries which are not part of the standard Java distribution. In this paper the PCJ library has been used to run example HPC applications on multicore nodes. In particular, we present performance results for parallel raytracing, matrix multiplication and map-reduce calculations. Detailed information on the performance of the reduction operation is also presented. The results show good performance and scalability compared to native implementations of the same algorithms. In particular, MPI C++ and Java 8 parallel streams have been used as a reference. It is noteworthy that the PCJ library, due to its performance and ability to create simple code, has great promise to be successful for the parallelization of HPC applications.
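As a rough illustration of the Java 8 parallel streams reference baseline mentioned above, a minimal associative reduction might look as follows. This is our own sketch, not the paper's benchmark code:

```java
import java.util.stream.LongStream;

/** Minimal Java 8 parallel-streams reduction: a sum of squares over a
 *  range, split across the common fork/join pool with no explicit
 *  thread or communication management by the programmer. */
class StreamReduce {
    static long sumOfSquares(long n) {
        return LongStream.rangeClosed(1, n)
                         .parallel()   // partition the range across worker threads
                         .map(x -> x * x)
                         .sum();       // associative reduction, combined per partition
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10)); // prints 385
    }
}
```

Parallel streams only cover a single shared-memory JVM, which is why a PGAS library such as PCJ is needed once the reduction has to span multiple nodes.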
Conference Paper
In this paper we present PCJ, a new library for parallel computations in Java. The PCJ library implements the partitioned global address space approach. It hides communication details and therefore is easy to use and allows for fast development of parallel programs. With PCJ the user can focus on the implementation of the algorithm rather than on thread or network programming. The design details, with examples of usage for basic operations, are described. We also present an evaluation of the performance of PCJ communication on state-of-the-art hardware such as a cluster with gigabit interconnect. The results show good performance and scalability when compared to native MPI implementations.
Chapter
Full-text available
This paper presents the performance of PCJ, a new library for the Java language that helps to perform parallel and distributed calculations. The library is motivated by the partitioned global address space (PGAS) model represented by Co-Array Fortran, Unified Parallel C or Titanium. PCJ has the ability to work on multicore systems, hiding the details of inter- and intranode communication. In PCJ, each task has its own local memory and stores and accesses variables only locally. Some variables can be shared between tasks, and these variables can be accessed, read and modified by other tasks. The library provides methods to perform basic operations like synchronization of tasks, and get and put values in an asynchronous, one-sided way. Additionally, the library offers methods for creating groups of tasks, broadcasting and monitoring variables. The PCJ library fully complies with Java standards, therefore the programmer does not have to use additional libraries which are not part of the standard Java distribution. In this paper the PCJ library with support for multicore, multithreaded systems is presented. It has been used to run example HPC benchmark suite applications on multicore nodes. The results show good performance and scalability compared to native MPI and OpenMP implementations of the same algorithms. It is noteworthy that the PCJ library has great promise to be successful in scientific applications.
Article
Full-text available
UPC is a parallel extension of the C programming language intended for multiprocessors with a common global address space. A descendant of Split-C [CDG 93], AC [CaDr 95], and PCP [BrWa 95], UPC has two primary objectives: 1) to provide efficient access to the underlying machine, and 2) to establish a common syntax and semantics for explicitly parallel programming in C. The quest for high performance means in particular that UPC tries to minimize the overhead involved in communication among cooperating threads. When the underlying hardware enables a processor to read and write remote memory without intervention by the remote processor (as in the SGI/Cray T3D and T3E), UPC provides the programmer with a direct and easy mapping from the language to low-level machine instructions. At the same time, UPC's parallel features can be mapped onto existing message-passing software or onto physically shared memory to make its programs portable from one parallel architecture to another. As a consequence, vendors who wish to implement an explicitly parallel C could use the syntax and semantics of UPC as a basis for a standard.
Article
Full-text available
Lightweight Remote Procedure Call (LRPC) is a communication facility designed and optimized for communication between protection domains on the same machine. In contemporary small-kernel operating systems, existing RPC systems incur an unnecessarily high cost when used for the type of communication that predominates, namely between protection domains on the same machine. This cost leads system designers to coalesce weakly-related subsystems into the same protection domain, trading safety for performance. By reducing the overhead of same-machine communication, LRPC encourages both safety and performance. LRPC combines the control transfer and communication model of capability systems with the programming semantics and large-grained protection model of RPC. LRPC achieves a factor of three performance improvement over more traditional approaches based on independent threads exchanging messages, reducing the cost of same-machine communication to nearly the lower bound imposed by conventional hardware. LRPC has been integrated into the Taos operating system of the DEC SRC Firefly multiprocessor workstation.
Conference Paper
Full-text available
A distributed Java Virtual Machine (DJVM) spanning multiple cluster nodes can provide a true parallel execution environment for multi-threaded Java applications. Most existing DJVMs suffer from the slow Java execution in interpretive mode and thus may not be efficient enough for solving computation-intensive problems. We present JESSICA2, a new DJVM running in JIT compilation mode that can execute multi-threaded Java applications transparently on clusters. JESSICA2 provides a single system image (SSI) illusion to Java applications via an embedded global object space (GOS) layer. It implements a cluster-aware Java execution engine that supports transparent Java thread migration for achieving dynamic load balancing. We discuss the issues of supporting transparent Java thread migration in a JIT compilation environment and propose several lightweight solutions. An adaptive migrating-home protocol used in the implementation of the GOS is introduced. The system has been implemented on x86-based Linux clusters and significant performance improvements over the previous JESSICA system have been observed.
Article
Full-text available
The JLAPACK project provides the LAPACK numerical subroutines translated from their subset Fortran 77 source into class files, executable by the Java Virtual Machine (JVM) and suitable for use by Java programmers. This makes it possible for Java applications or applets, distributed on the World Wide Web (WWW) to use established legacy numerical code that was originally written in Fortran. The translation is accomplished using a special purpose Fortran-to-Java (source-to-source) compiler. The LAPACK API will be considerably simplified to take advantage of Java's object-oriented design. This report describes the research issues involved in the JLAPACK project, and its current implementation and status.
Article
We propose a grid programming approach using the ProActive middleware. The proposed strategy addresses several grid concerns, which we have classified into three categories. I. Grid Infrastructure, which handles resource acquisition and creation using deployment descriptors and Peer-to-Peer. II. Grid Technical Services, which can provide non-functional transparent services like fault tolerance, load balancing, and file transfer. III. Grid Higher Level programming, with group communication and hierarchical components. We have validated our approach with several grid programming experiences, running applications on heterogeneous Grid resources using more than 1000 CPUs.
Article
Clustering (and caching) is a crosscutting infrastructure service that has historically been implemented with API-based solutions. As a result, it has suffered from the same code scattering and tangling problems as other crosscutting concerns. In this paper we will show how Aspect-Oriented Programming (AOP) can help to modularize clustering and turn it into a runtime infrastructure Quality of Service. We will show how AOP can be used to plug in directly into the Java Memory Model, which allows us to maintain the key Java semantics of pass-by-reference, garbage collection and thread coordination across the cluster, i.e. essentially cluster the Java Virtual Machine underneath the user application instead of the user application directly.
Article
Co-Array Fortran, formerly known as F--, is a small extension of Fortran 95 for parallel processing. A Co-Array Fortran program is interpreted as if it were replicated a number of times and all copies were executed asynchronously. Each copy has its own set of data objects and is termed an image. The array syntax of Fortran 95 is extended with additional trailing subscripts in square brackets to give a clear and straightforward representation of any access to data that is spread across images. References without square brackets are to local data, so code that can run independently is uncluttered. Only where there are square brackets, or where there is a procedure call and the procedure contains square brackets, is communication between images involved. There are intrinsic procedures to synchronize images, return the number of images, and return the index of the current image. We introduce the extension; give examples to illustrate how clear, powerful, and flexible it can be; and provide a technical definition.
Conference Paper
The current trend to multicore architectures underscores the need of parallelism. While new languages and alternatives for supporting more efficiently these systems are proposed, MPI faces this new challenge. Therefore, up-to-date performance evaluations of current options for programming multicore systems are needed. This paper evaluates MPI performance against Unified Parallel C (UPC) and OpenMP on multicore architectures. From the analysis of the results, it can be concluded that MPI is generally the best choice on multicore systems with both shared and hybrid shared/distributed memory, as it takes the highest advantage of data locality, the key factor for performance in these systems. Regarding UPC, although it exploits efficiently the data layout in memory, it suffers from remote shared memory accesses, whereas OpenMP usually lacks efficient data locality support and is restricted to shared memory systems, which limits its scalability.
Conference Paper
Java's Remote Method Invocation (RMI) is too slow, especially for high performance computing. RMI is designed for wide-area and high-latency networks, it is based on a slow object serialization, and it does not support high-performance communication networks. The paper demonstrates that a much faster drop-in RMI and an efficient serialization can be designed and implemented completely in Java without any native code. Moreover, the re-designed RMI supports non-TCP/IP communication networks, even with heterogeneous transport protocols. As a by-product, a benchmark collection for RMI is presented. This collection, asked for by the Java Grande Forum from its first meeting, can guide JVM vendors in their performance optimizations. On PCs connected through Ethernet, the better serialization and the improved RMI save a median of 45% (maximum of 71%) of the runtime for some sets of arguments. On our Myrinet-based ParaStation network (a cluster of DEC Alphas) we save a median of 85% (maximum of 96%), compared to standard RMI, standard serialization, and Fast Ethernet; a remote method invocation runs as fast as 115 µs round trip time, compared to about 1.5 ms.
Conference Paper
Parallel Java is a parallel programming API whose goals are (1) to support both shared memory (thread-based) parallel programming and cluster (message-based) parallel programming in a single unified API, allowing one to write parallel programs combining both paradigms; (2) to provide the same capabilities as OpenMP and MPI in an object oriented, 100% Java API; and (3) to be easily deployed and run in a heterogeneous computing environment of single-core CPUs, multi-core CPUs, and clusters thereof. This paper describes Parallel Java's features and architecture; compares and contrasts Parallel Java to other Java-based parallel middleware libraries; and reports performance measurements of Parallel Java programs.
Article
Titanium is a language and system for high-performance parallel scientific computing. Titanium uses Java as its base, thereby leveraging the advantages of that language and allowing us to focus attention on parallel computing issues. The main additions to Java are immutable classes, multidimensional arrays, an explicitly parallel SPMD model of computation with a global address space, and zone-based memory management. We discuss these features and our design approach, and report progress on the development of Titanium, including our current driving application: a three-dimensional adaptive mesh refinement parallel Poisson solver.
Article
Recently, there has been a lot of interest in using Java for parallel programming. Efforts have been hindered by the lack of standard Java parallel programming APIs. To alleviate this problem, various groups started projects to develop Java message passing systems modeled on the successful Message Passing Interface (MPI). Official MPI bindings are currently defined only for C, Fortran, and C++, so early MPI-like environments for Java have been divergent. This paper relates an effort undertaken by a working group of the Java Grande Forum, seeking consensus on an MPI-like API to enhance the viability of parallel programming using Java.
Conference Paper
A distributed JVM on a cluster can provide a high-performance platform for running multithreaded Java applications transparently. Efficient scheduling of Java threads among cluster nodes in a distributed JVM is desired for maintaining a balanced system workload so that the application can achieve maximum speedup. We present a transparent thread migration system that is able to support high-performance native execution of multi-threaded Java programs. To achieve migration transparency, we perform dynamic native code instrumentation inside the JIT compiler. The mechanism has been successfully implemented and integrated in JESSICA2, a JIT-enabled distributed JVM, to enable automatic thread distribution and dynamic load balancing in a cluster environment. We discuss issues related to supporting transparent Java thread migration in a JIT-enabled distributed JVM, and compare our solution with previous approaches that use static bytecode instrumentation and JVMDI. We also propose optimizations including dynamic register patching and pseudo-inlining that can reduce the runtime overhead incurred in a migration act. We use measured experimental results to show that our system is efficient and lightweight
Java Platform, Standard Edition 6, Features and Enhancements. http://www.oracle.com/technetwork/java/javase/features-141434.html
Java Platform, SE 7 Features and Enhancements. http
D. Mallón, G. Taboada, C. Teijeiro, J. Tourino, B. Fraguela, A. Gómez, R. Doallo, J. Mourino. Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures. In: M. Ropo, J. Westerholm, J. Dongarra (Eds.), Recent Advances in Parallel Virtual Machine and Message Passing Interface (Lecture Notes in Computer Science 5759).
K. A. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit, A. Krishnamurthy, P. N. Hilfinger, S. L. Graham, D. Gay, P. Colella, and A. Aiken. Titanium: A High-Performance Java Dialect. Concurrency: Practice and Experience, Vol. 10, No. 11-13, September-November 1998.
A. Kaminsky. Parallel Java: A Unified API for Shared Memory and Cluster Parallel Programming in 100% Java. Parallel and Distributed Processing Symposium, 2007 (IPDPS 2007), IEEE International.
Java Grande Project: benchmark suite. http://www.epcc.ed.ac.uk/research/java-grande/