Conference Paper

PCJ-Java Library for Highly Scalable HPC and Big Data Processing

Abstract

PCJ is a Java library for scalable high-performance computing and Big Data processing. The library implements the partitioned global address space (PGAS) model. The PCJ application is run as a multi-threaded application with the threads distributed over multiple Java Virtual Machines. Each task has its own local memory to store and access variables locally. Selected variables can be shared between tasks and can be accessed, read and modified by other tasks. The library provides methods to perform basic operations like synchronization of tasks, and getting and putting values in an asynchronous, one-sided way. Additionally, PCJ offers methods for creating groups of tasks, broadcasting and monitoring variables. The library hides the details of inter- and intra-node communication, making programming easy and feasible. The PCJ library allows for easy development of highly scalable (up to 200k cores) applications running on large resources. PCJ applications can also be run on systems designed for data analytics such as Hadoop clusters. In this case, performance is higher than for native applications. The PCJ library fully complies with Java standards; therefore, the programmer does not have to use additional libraries which are not part of the standard Java distribution. In this paper, we present details of the PCJ library, its API, and example applications. The results show good performance and scalability. It is noteworthy that the PCJ library, due to its performance and ability to create simple code, has great promise to be successful for the parallelization of HPC and Big Data applications.
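The PGAS operations the abstract describes (task-private local memory, one-sided put/get on shared variables, barriers) can be mimicked with plain Java threads. The sketch below is a conceptual analogue only, not the PCJ API: an array slot per task plays the role of a shared variable, and a CyclicBarrier plays the role of task synchronization.

```java
import java.util.concurrent.CyclicBarrier;

// Conceptual sketch of the PGAS model (NOT the PCJ API): each task keeps
// private local state; a designated array acts as shared variables that any
// task may write ("put") or read ("get") in a one-sided way.
public class PgasSketch {
    static final int TASKS = 4;
    static final long[] shared = new long[TASKS];   // one shared slot per task

    public static void main(String[] args) throws Exception {
        CyclicBarrier barrier = new CyclicBarrier(TASKS); // stands in for task synchronization
        Thread[] threads = new Thread[TASKS];
        long[] result = new long[1];
        for (int id = 0; id < TASKS; id++) {
            final int myId = id;
            threads[id] = new Thread(() -> {
                long local = myId * 10L;     // task-private ("local") memory
                shared[myId] = local;        // "put" into this task's shared slot
                try { barrier.await(); } catch (Exception e) { throw new RuntimeException(e); }
                if (myId == 0) {             // task 0 "gets" every shared value
                    long sum = 0;
                    for (long v : shared) sum += v;
                    result[0] = sum;
                }
            });
            threads[id].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(result[0]);       // 0 + 10 + 20 + 30
    }
}
```

The barrier between the writes and the read is what makes the one-sided accesses safe, mirroring the role of task synchronization in the PGAS model.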

... (Particle p, RangedList<Particle> pairs, tla) -> {
    force(p, pairs, tla);
});
// Merge all the force contributions in the accumulators back into the designated particles
particles.parallelAccept(acc, (Particle p, Sp a) -> p.addForce(a));
// Sum the force contributions across all hosts for each particle
particles.allreduce((out, Particle p) -> {
    out.writeDouble(p.xforce); ...
... Another approach close to ours is PCJ [28]. This pure Java library brings a PGAS programming model to Java, relying on elegant annotations to mark the variables that belong to the global address space. The library also provides collective communications operating on the variables of the global address space, such as broadcast, scatter, reduce, and others [29]. ...
... - localSubmitTime;
// (3) Collect all orders on the "master" place(0)
orderBag.team().gather(place(0));
// (4) Match buy and sell orders, populating contractedOrders
finish(() -> {
    // (4 - optional) balance the agents between places 1..n
    if (iter % lbPeriod == 0) {
        async(() -> {
            // Exchange time information between hosts
            long[] computationTimes = world.allGather1(accumulatedOrderComputeTime);
            CollectiveMoveManager mm = new CollectiveMoveManager(world); // prepare a relocator
            performLoadBalance(computationTimes, mm); // various relocation strategies possible
            mm.sync(); // perform the relocation
            accumulatedOrderComputeTime = 0L; // reset accumulated order-submission time
            agents.updateDist(); ...
Article
Full-text available
In this article, we present our relocatable distributed collection library. Building on top of the APGAS for Java library, we provide a number of useful intra-node parallel patterns as well as the features necessary to support the distributed nature of the computation through clearly identified methods. In particular, the transfer of distributed collections’ entries between processes is supported via an integrated relocation system. This enables dynamic load-balancing capabilities, making it possible for programs to adapt to uneven or evolving cluster performance. The system we developed makes it possible to dynamically control the distribution and the data flow of distributed programs through high-level abstractions. Programmers using our library can, therefore, write complex distributed programs combining computation and communication phases through a consistent API. We evaluate the performance of our library against two programs taken from well-known Java benchmark suites, demonstrating superior programmability and obtaining better performance on one benchmark and reasonable overhead on the second. Finally, we demonstrate the ease and benefits of load balancing on a more complex application, which uses the various features of our library extensively.
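The relocation idea described in this abstract can be illustrated with a toy model: a "distributed" collection keeps one chunk per place, and a rebalancing step moves entries between places. This sketch is purely illustrative (plain Java lists, not the library's API or its actual relocation protocol):

```java
import java.util.*;

// Toy model of entry relocation between "places" (illustrative only):
// gather all entries, then redistribute them evenly across the chunks.
public class RelocationSketch {
    static void rebalance(List<List<Integer>> chunks) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> c : chunks) { all.addAll(c); c.clear(); }
        for (int i = 0; i < all.size(); i++)
            chunks.get(i % chunks.size()).add(all.get(i)); // round-robin redistribution
    }

    public static void main(String[] args) {
        List<List<Integer>> chunks = new ArrayList<>();
        chunks.add(new ArrayList<>(List.of(1, 2, 3, 4, 5, 6)));  // overloaded place
        chunks.add(new ArrayList<>(List.of(7)));
        chunks.add(new ArrayList<>());                            // idle place
        rebalance(chunks);
        System.out.println(chunks);  // [[1, 4, 7], [2, 5], [3, 6]]
    }
}
```

In the real library the redistribution would be driven by measured per-place load rather than a fixed round-robin, but the gather-and-redistribute shape is the same.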
... As a result, higher performance is achieved; unfortunately, it is still lower than for other Java-based solutions. In particular, it has been demonstrated that the performance of applications built using the PCJ library or APGAS (a branch of IBM's X10 language project) is higher than for Hadoop/Spark implementations [10,11]. For calculation-intensive and communication-intensive HPC workloads, PCJ is several to hundreds of times faster [12]. ...
... Scalable computing in Java with PCJ Library. Improved collective operations
i * 2 * blockSize, (i + 1) * 2 * blockSize)));
PCJ.scatter(toSendMap, Shareable. ...
... readAllLines(Paths.get(myFileName), StandardCharsets.ISO_8859_1). ...
Conference Paper
Full-text available
Machine learning and Big Data workloads are becoming as important as traditional HPC ones. AI and Big Data users tend to use new programming languages such as Python, Julia, or Java, while the HPC community is still dominated by C/C++ or Fortran. Hence, there is a need for new programming libraries and languages that will integrate different applications and allow them to run on large computer infrastructure. Since modern computers are multinode and multicore, parallel execution is an additional challenge here. For that purpose, we have developed the PCJ library, which introduces parallel programming capabilities to Java using the Partitioned Global Address Space model. It modifies neither the language nor the running environment (JVM). The PCJ library allows for easy development of parallel code and running it on laptops, workstations, supercomputers, and the cloud. This paper presents an overview of the PCJ library and its usage in parallelizing selected workloads, including HPC, AI, and Big Data. The performance and scalability are presented. We present a recent addition to the PCJ library: collective operations. The collective operations significantly reduce the number of lines of code to write, while ensuring good performance.
... Usage of a well-established industrial language opens the field for easy integration of both workload types. The PCJ library (Parallel Computing in Java) [25] fully complies with core Java standards; therefore, the programmer does not have to use additional libraries which are not part of the standard Java distribution. Thus the user is free from the burden of installing (properly versioned) dependencies, as the library is a single self-contained jar file, which can be easily dropped into the classpath. ...
... PCJ [1,25] is an open-source Java library that does not require any language extensions or special compiler. The user has to download the single jar file (or use build automation tool with dependency resolvers like Maven or Gradle) on any system with Java installed. ...
... java.nio.*). The details of the algorithms used to implement PCJ communication are described in [25]. ...
Chapter
Cloud resources are more often used for large scale computing and data processing. However, the usage of the cloud differs from that of traditional High-Performance Computing (HPC) systems, and both algorithms and codes have to be adjusted. This work is often time-consuming and performance is not guaranteed. To address this problem we have developed the PCJ library (Parallel Computing in Java), a novel tool for scalable high-performance computing and big data processing in Java. In this paper, we present a performance evaluation of parallel applications implemented in Java using the PCJ library. The performance evaluation is based on the examples of highly scalable applications that run on the traditional HPC system and Amazon AWS Cloud. For the cloud, we have used Intel x86 and ARM processors running Java codes without changing any line of the program code and without the need for time-consuming recompilation. Presented applications have been parallelized using the PGAS programming model and its realization in the PCJ library. Our results prove that the PCJ library, due to its performance and ability to create simple portable code, has great promise to be successful for the parallelization of various applications and run them on the cloud with a similar performance as for HPC systems.
... Another approach close to ours is PCJ [28]. This pure Java library brings a PGAS programming model to Java, relying on elegant annotations to mark the variables that belong to the global address space. ...
Preprint
Full-text available
In this article we present our relocatable distributed collections library. Building on top of the APGAS for Java library, we provide a number of useful intra-node parallel patterns as well as the features necessary to support the distributed nature of the computation through clearly identified methods. In particular, the transfer of distributed collections' entries between processes is supported via an integrated relocation system. This enables dynamic load-balancing capabilities, making it possible for programs to adapt to uneven or evolving cluster performance. The system we developed makes it possible to dynamically control the distribution and the data-flow of distributed programs through high-level abstractions. Programmers using our library can therefore write complex distributed programs combining computation and communication phases through a consistent API. We evaluate the performance of our library against two programs taken from well-known Java benchmark suites, demonstrating superior programmability, and obtaining better performance on one benchmark and reasonable overhead on the second. Finally, we demonstrate the ease and benefits of load balancing on a more complex application, which uses the various features of our library extensively.
... The details of the algorithms used to implement PCJ communication are described in our previous publications [2,3]. The PCJ library provides the necessary tools for PGAS programming, including thread numbering, data transfer, and thread synchronization. The communication is one-sided and asynchronous, which makes programming easy and less error-prone. ...
Article
Large-scale computing and data processing with cloud resources is gaining popularity. However, the usage of the cloud differs from traditional high-performance computing (HPC) systems and both algorithms and codes have to be adjusted. This work is often time-consuming and performance is not guaranteed. To address this problem we have developed the PCJ library (parallel computing in Java), a novel tool for scalable HPC and big data processing in Java. In this article, we present a performance evaluation of parallel applications implemented in Java using the PCJ library. The performance evaluation is based on the examples of highly scalable applications of different characteristics, focusing on CPU, communication, or I/O. They run on the traditional HPC system and Amazon Web Services Cloud as well as the Linaro Developer Cloud. For the clouds, we have used Intel x86 and ARM processors for running Java codes without changing any line of the program code and without the need for time-consuming recompilation. Presented applications have been parallelized using the partitioned global address space programming model and its realization in the PCJ library. Our results prove that the PCJ library, due to its performance and ability to create simple portable code, has great promise to be successful for the parallelization of various applications and run them on the cloud with a performance close to HPC systems.
... java.nio.*). The details of the algorithms used to implement PCJ communication are described in [26]. ...
Article
Full-text available
With the development of peta- and exascale size computational systems there is growing interest in running Big Data and Artificial Intelligence (AI) applications on them. Big Data and AI applications are implemented in Java, Scala, Python and other languages that are not widely used in High-Performance Computing (HPC), which is still dominated by C and Fortran. Moreover, they are based on dedicated environments such as Hadoop or Spark, which are difficult to integrate with traditional HPC management systems. We have developed the Parallel Computing in Java (PCJ) library, a tool for scalable high-performance computing and Big Data processing in Java. In this paper, we present the basic functionality of the PCJ library with examples of highly scalable applications running on large resources. The performance results are presented for different classes of applications including traditional computationally intensive (HPC) workloads (e.g. stencil), as well as communication-intensive algorithms such as the Fast Fourier Transform (FFT). We present implementation details and performance results for Big Data type processing running on petascale size systems. Examples of large scale AI workloads parallelized using PCJ are presented.
... More detailed information about the PCJ library can be found in [38,39]. ...
Article
Full-text available
Sorting algorithms are among the most commonly used algorithms in computer science and modern software. Having an efficient implementation of sorting is necessary for a wide spectrum of scientific applications. This paper describes the sorting algorithm written using the partitioned global address space (PGAS) model, implemented using the Parallel Computing in Java (PCJ) library. The iterative implementation description is used to outline the possible performance issues and provide means to resolve them. The key idea of the implementation is to have an efficient building block that can be easily integrated into many application codes. This paper also presents a performance comparison of the PCJ implementation with the MapReduce approach, using the Apache Hadoop TeraSort implementation. The comparison serves to show that the performance of the implementation is good enough, as the PCJ implementation shows similar efficiency to the Hadoop implementation.
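Distributed PGAS sorts of the kind described above commonly follow a sample-sort pattern: pivots chosen from sampled data define bucket boundaries, and each element is routed to the bucket (task) whose range contains it. The following single-process sketch shows only that partitioning step; it is illustrative and not the paper's PCJ implementation.

```java
import java.util.*;

// Illustrative sample-sort partitioning (not the paper's code): pick pivots
// from the sorted data, then route each element to its bucket.
public class SampleSortSketch {
    static List<List<Integer>> partition(int[] data, int buckets) {
        int[] sorted = data.clone();
        Arrays.sort(sorted);
        int[] pivots = new int[buckets - 1];            // bucket boundaries
        for (int i = 1; i < buckets; i++)
            pivots[i - 1] = sorted[i * sorted.length / buckets];
        List<List<Integer>> out = new ArrayList<>();
        for (int i = 0; i < buckets; i++) out.add(new ArrayList<>());
        for (int v : data) {
            int b = 0;
            while (b < pivots.length && v >= pivots[b]) b++;
            out.get(b).add(v);                          // element routed to bucket b
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {42, 7, 99, 3, 56, 21, 88, 14};
        System.out.println(partition(data, 2));  // [[7, 3, 21, 14], [42, 99, 56, 88]]
    }
}
```

In the distributed version each bucket lives on a different task, the routing step becomes an all-to-all exchange, and each task finishes by sorting its bucket locally.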
Conference Paper
Full-text available
In this paper, we present PCJ (Parallel Computing in Java), a novel tool for scalable high-performance computing and big data processing in Java. PCJ is a Java library implementing the PGAS (Partitioned Global Address Space) programming paradigm. It allows for the easy and feasible development of computational applications as well as Big Data processing. The use of Java brings HPC and Big Data types of processing together and enables running on different types of hardware. In particular, the high scalability and good performance of PCJ applications have been demonstrated using Cray XC40 systems. We present the performance and scalability of the PCJ library measured on Cray XC40 systems with standard benchmarks such as ping-pong, broadcast, and random access. We describe the parallelization of example applications of different characteristics including FFT and 2D stencil. Results for standard Big Data benchmarks such as word count are presented. In all cases, measured performance and scalability confirm that PCJ is a good tool to develop parallel applications of different types.
Article
Full-text available
The detailed knowledge of the C. elegans connectome for 3 decades has not contributed dramatically to our understanding of the worm's behavior. One of the main reasons for this situation has been the lack of data on the type of synaptic signaling between particular neurons in the worm's connectome. The aim of this study was to determine synaptic polarities for each connection in a small pre-motor circuit controlling locomotion. Even in this compact network of just 7 neurons, the space of all possible patterns of connection types (excitation vs. inhibition) is huge. To deal effectively with this combinatorial problem we devised a novel and relatively fast technique based on genetic algorithms and large-scale parallel computations, which we combined with detailed neurophysiological modeling of interneuron dynamics, and compared the theory to the available behavioral data. As a result of these massive computations, we found that the optimal connectivity pattern that best matches the locomotory data is the one in which all interneuron connections are inhibitory, even those terminating on motor neurons. This finding is consistent with recent experimental data on cholinergic signaling in C. elegans, and it suggests that the system controlling locomotion is designed to save metabolic energy. Moreover, this result provides a solid basis for more realistic modeling of neural control in these worms, and our novel, powerful computational technique can in principle be applied (possibly with some modifications) to other small-scale functional circuits in C. elegans.
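The combinatorial search over excitatory/inhibitory assignments can be sketched as a genetic algorithm over bit strings, one bit per connection. The fitness function below is a stand-in (it simply rewards inhibitory links, echoing the study's all-inhibitory optimum); the real study scored each pattern against neurophysiological simulations and locomotory data.

```java
import java.util.*;

// Toy genetic algorithm over connection-polarity bit strings
// (true = excitatory, false = inhibitory). The fitness function is a
// stand-in; the actual study evaluated each pattern with detailed
// interneuron-dynamics simulations.
public class PolarityGA {
    static final Random rnd = new Random(42);
    static final int GENES = 12, POP = 30, GENERATIONS = 60;

    static int fitness(boolean[] g) {          // stand-in: count inhibitory links
        int f = 0;
        for (boolean b : g) if (!b) f++;
        return f;
    }

    public static void main(String[] args) {
        boolean[][] pop = new boolean[POP][GENES];
        for (boolean[] g : pop) for (int i = 0; i < GENES; i++) g[i] = rnd.nextBoolean();
        for (int gen = 0; gen < GENERATIONS; gen++) {
            boolean[][] next = new boolean[POP][];
            for (int k = 0; k < POP; k++) {
                boolean[] a = pop[rnd.nextInt(POP)], b = pop[rnd.nextInt(POP)];
                boolean[] parent = fitness(a) >= fitness(b) ? a : b; // tournament selection
                boolean[] child = parent.clone();
                child[rnd.nextInt(GENES)] ^= rnd.nextInt(10) == 0;   // occasional mutation
                next[k] = child;
            }
            pop = next;
        }
        int best = 0;
        for (boolean[] g : pop) best = Math.max(best, fitness(g));
        System.out.println(best);  // typically converges toward the all-inhibitory optimum
    }
}
```

The expensive part in practice is the fitness evaluation, which is why the study parallelized it over many tasks; the GA loop itself is cheap.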
Chapter
Full-text available
With the wide adoption of multicore and multiprocessor systems, parallel programming became a very important element of computer science. The programming of multicore systems is still complicated and far from easy. The difficulties are caused, amongst others, by parallel tools, libraries and programming models which are not easy, especially for a non-experienced programmer. In this paper, we present PCJ, a Java library for parallel programming of heterogeneous multicore systems. PCJ adopts the Partitioned Global Address Space paradigm, which makes programming easy. We present the basic functionality of the PCJ library and its usage for the parallelization of selected applications. The scalability of the genetic algorithm implementation is presented. The parallelization of the N-body algorithm implementation with PCJ is also described.
Conference Paper
Full-text available
Building high-performance virtual machines is a complex and expensive undertaking; many popular languages still have low-performance implementations. We describe a new approach to virtual machine (VM) construction that amortizes much of the effort in initial construction by allowing new languages to be implemented with modest additional effort. The approach relies on abstract syntax tree (AST) interpretation where a node can rewrite itself to a more specialized or more general node, together with an optimizing compiler that exploits the structure of the interpreter. The compiler uses speculative assumptions and deoptimization in order to produce efficient machine code. Our initial experience suggests that high performance is attainable while preserving a modular and layered architecture, and that new high-performance language implementations can be obtained by writing little more than a stylized interpreter.
Conference Paper
Graph processing is used in many fields of science such as sociology, risk prediction or biology. Although analysis of graphs is important, it also poses numerous challenges, especially for large graphs which have to be processed on multicore systems. In this paper, we present a PGAS (Partitioned Global Address Space) version of the level-synchronous BFS (Breadth-First Search) algorithm and its implementation written in Java. Java so far is not extensively used in high performance computing, but because of its popularity, portability, and increasing capabilities it is becoming more widely exploited, especially for data analysis. The level-synchronous BFS has been implemented using the PCJ (Parallel Computations in Java) library. In this paper, we present implementation details and compare the scalability and performance with the MPI implementation of the Graph500 benchmark. We show good scalability and performance of our implementation in comparison with MPI code written in C. We present the challenges we faced and the optimizations we used in our implementation necessary to obtain good performance.
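The level-synchronous structure mentioned above means the frontier of the current BFS level is fully expanded before the next level starts; in the PGAS version each task expands its partition of the frontier and all tasks synchronize between levels. A minimal single-process sketch of the pattern (illustrative, not the paper's PCJ code):

```java
import java.util.*;

// Minimal level-synchronous BFS: expand the whole current frontier, then
// advance to the next level. In the parallel version, the frontier swap is
// where tasks synchronize.
public class LevelSyncBfs {
    static int[] bfs(List<List<Integer>> adj, int source) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);                     // -1 marks unvisited vertices
        dist[source] = 0;
        List<Integer> frontier = List.of(source);
        int level = 0;
        while (!frontier.isEmpty()) {
            List<Integer> next = new ArrayList<>();
            for (int u : frontier)                 // expand the whole level
                for (int v : adj.get(u))
                    if (dist[v] == -1) { dist[v] = level + 1; next.add(v); }
            frontier = next;                       // barrier point in the parallel version
            level++;
        }
        return dist;
    }

    public static void main(String[] args) {
        // Undirected graph with edges 0-1, 0-2, 1-3, 2-3, 3-4
        List<List<Integer>> adj = List.of(
            List.of(1, 2), List.of(0, 3), List.of(0, 3), List.of(1, 2, 4), List.of(3));
        System.out.println(Arrays.toString(bfs(adj, 0)));  // [0, 1, 1, 2, 3]
    }
}
```

Distributing this requires partitioning the vertex set across tasks and exchanging newly discovered frontier vertices between levels, which is where the one-sided PGAS communication comes in.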
Article
This paper describes the Java MPI bindings that have been included in the Open MPI distribution. Open MPI is one of the most popular implementations of MPI, the Message-Passing Interface, which is the predominant programming paradigm for parallel applications on distributed memory computers. We have added Java support to Open MPI, exposing MPI functionality to Java programmers. Our approach is based on the Java Native Interface, and has similarities with previous efforts, as well as important differences. This paper serves as a reference for the application program interface, and in addition we provide details of the internal implementation to justify some of the design decisions. We also show some results to assess the performance of the bindings.
Conference Paper
The fundamental turn of software towards concurrency that we are witnessing today has a strong impact on modeling and programming. How to properly integrate OO modelling/programming and concurrency is still an open problem, in spite of the many ad-hoc mechanisms, libraries and frameworks that have been introduced so far. We believe that such integration requires modelling and programming paradigms that make it possible to naturally exploit concurrency, decentralization of control and interaction as main ingredients of problem solving, as well as of program design and coding. In this paper we elaborate this point by discussing some main approaches in literature that propose such integration: in particular we review actors and concurrent objects first, then a recent proposal based on agent-oriented programming.
Article
We propose a grid programming approach using the ProActive middleware. The proposed strategy addresses several grid concerns, which we have classified into three categories. I. Grid Infrastructure, which handles resource acquisition and creation using deployment descriptors and Peer-to-Peer. II. Grid Technical Services, which can provide non-functional transparent services like fault tolerance, load balancing, and file transfer. III. Grid Higher Level programming, with group communication and hierarchical components. We have validated our approach with several grid programming experiences, running applications on heterogeneous Grid resources using more than 1000 CPUs.
Article
The Titanium language is a Java dialect for high-performance parallel scientific computing. Titanium's differences from Java include multi-dimensional arrays, an explicitly parallel SPMD model of computation with a global address space, a form of value class, and zone-based memory management. This reference manual describes the differences between Titanium and Java.
Conference Paper
Increasing interest is being shown in the use of Java for scientific applications. The Java Grande benchmark suite [4] was designed with such applications primarily in mind. The perceived lack of performance of Java still deters many potential users, despite recent advances in just-in-time (JIT) and adaptive compilers. There are however few benchmark results available comparing Java to more traditional languages such as C and Fortran. To address this issue, a subset of the Java Grande Benchmarks have been re-written in C and Fortran allowing direct performance comparisons between the three languages. The performance of a range of Java execution environments, C and Fortran compilers have been tested across a number of platforms using the suite. These demonstrate that on some platforms (notably Intel Pentium) the performance gap is now quite small.
Article
Increasing interest is being shown in the use of Java for scientific applications. The Java Grande benchmark suite [4] was designed with such applications primarily in mind. The perceived lack of performance of Java still deters many potential users, despite recent advances in just-in-time (JIT) and adaptive compilers. There are however few benchmark results available comparing Java to more traditional languages such as C and Fortran. To address this issue, a subset of the Java Grande Benchmarks have been re-written in C and Fortran allowing direct performance comparisons between the three languages. The performance of a range of Java execution environments, C and Fortran compilers have been tested across a number of platforms using the suite. These demonstrate that on some platforms (notably Intel Pentium) the performance gap is now quite small. Keywords: Java, C, Fortran, performance, benchmarking, scientific applications
Article
Increasing interest is being shown in the use of Java for large-scale or Grande applications. This new use of Java places specific demands on the Java execution environments that could be tested and compared using a standard benchmark suite. We describe the design and implementation of such a suite, paying particular attention to Java-specific issues. Sample results are presented for a number of implementations of the Java Virtual Machine (JVM).
Parallel Java: A unified API for shared memory and cluster parallel programming in 100% Java
  • A Kaminsky
Kaminsky, A.: Parallel Java: A unified API for shared memory and cluster parallel programming in 100% Java. In: Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International, pages 1-8. IEEE, 2007.
Parallel computations in Java with PCJ library
  • M Nowicki
  • P Bała
Nowicki, M. and Bała, P.: Parallel computations in Java with PCJ library. In: 2012 International Conference on High Performance Computing and Simulation (HPCS), pages 381-387. IEEE, 2012.
Opracowanie nowych metod programowania równoległego w Javie w oparciu o paradygmat PGAS (Partitioned Global Address Space)
  • M Nowicki
Nowicki, M.: Opracowanie nowych metod programowania równoległego w Javie w oparciu o paradygmat PGAS (Partitioned Global Address Space). PhD thesis, University of Warsaw 2015. http://ssdnm.mimuw.edu.pl/pliki/prace-studentow/st/pliki/marek-nowicki-d.pdf [Accessed: 15.05.2018]
Big Data analytics in Java with PCJ library - performance comparison with Hadoop
  • M Nowicki
  • M Ryczkowska
  • Ł Górski
  • P Bała
Nowicki, M., Ryczkowska, M., Górski, Ł., Bała, P.: Big Data analytics in Java with PCJ library - performance comparison with Hadoop. In: International Conference on Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10778. Springer, Cham, pp. 318-327 (2018)
Fault-Tolerance Mechanisms for the Java Parallel Codes Implemented with the PCJ Library
  • M Szynkiewicz
  • M Nowicki
Szynkiewicz, M., Nowicki, M.: Fault-Tolerance Mechanisms for the Java Parallel Codes Implemented with the PCJ Library. In: International Conference on Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10778. Springer, Cham, pp. 298-307 (2018)
Parallel Differential Evolution in the PGAS Programming Model Implemented with PCJ Java Library
  • Ł Górski
  • F Rakowski
  • P Bała
Górski, Ł., Rakowski, F., Bała, P.: Parallel Differential Evolution in the PGAS Programming Model Implemented with PCJ Java Library. In: International Conference on Parallel Processing and Applied Mathematics. PPAM 2015. Lecture Notes in Computer Science, vol 9573. Springer, Cham, pp. 448-458 (2016)
Massively Parallel Sequence Alignment with BLAST through Work Distribution Implemented using PCJ Library
  • M Nowicki
  • D Bzhalava
  • P Bała
Nowicki, M., Bzhalava, D., Bała, P.: Massively Parallel Sequence Alignment with BLAST through Work Distribution Implemented using PCJ Library. In: International Conference on Algorithms and Architectures for Parallel Processing. ICA3PP 2017. Lecture Notes in Computer Science, vol 10393. Springer, Cham, pp. 503-512 (2017)
Massively Parallel Implementation of Sequence Alignment with BLAST Using PCJ Library
  • M Nowicki
  • D Bzhalava
  • P Bała
Nowicki, M., Bzhalava, D., Bała, P.: Massively Parallel Implementation of Sequence Alignment with BLAST Using PCJ Library. J. Comput. Biol. (in press) (2018)
One VM to rule them all. In: Proceedings of the 2013 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software
  • T Würthinger
  • C Wimmer
  • A Wöß
  • L Stadler
  • G Duboscq
  • C Humer
  • G Richards
  • D Simon
  • M Wolczko
Level-synchronous BFS algorithm implemented in Java using PCJ Library. In: International Conference on Computational Science and Computational Intelligence (CSCI)
  • M Ryczkowska
  • M Nowicki
  • P Bała
Java Grande Project: benchmark suite
  • Java Grande
Titanium Language Reference Manual U.C. Berkeley Tech Report, UCB/EECS-2005-15, 2005
  • P Hilfinger
  • D Bonachea
  • K Datta
  • D Gay
  • S Graham
  • B Liblit
  • G Pike
  • J Su
  • K Yelick
Performance evaluation of parallel computing and Big Data processing with Java and PCJ library. Cray Users Group
  • M Nowicki
  • Ł Górski
  • P Bała
PCJ - a Java library for heterogenous parallel computing
  • M Nowicki
  • M Ryczkowska
  • Ł Górski
  • M Szynkiewicz
  • P Bała
Nowicki, M., Ryczkowska, M., Górski, Ł., Szynkiewicz, M., Bała, P.: PCJ - a Java library for heterogenous parallel computing. In: Recent Advances in Information Science (Recent Advances in Computer Engineering Series vol 36), WSEAS Press, pp. 66-72 (2016)
Evaluation of the parallel performance of the Java and PCJ on the Intel KNL based systems
  • M Nowicki
  • Ł Górski
  • P Bała
Nowicki, M., Górski, Ł., Bała, P.: Evaluation of the parallel performance of the Java and PCJ on the Intel KNL based systems. In: International Conference on Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10778. Springer, Cham, pp. 288-297 (2018)