Predicting the performance of parallel applications is becoming increasingly complex. The best performance predictor is the application itself, but the time required to run it thoroughly is an onerous requirement. We seek to characterize the behavior of message-passing applications on different systems by extracting a signature that allows us to predict which system will let the application perform best. To achieve this goal, we have developed a method called Parallel Application Signatures for Performance Prediction (PAS2P), which strives to describe an application based on its behavior. Based on the application's message-passing activity, we have been able to identify and extract representative phases, with which we created a Parallel Application Signature that has allowed us to predict the application's performance. We have experimented with different signature-extraction algorithms and found a reduction in the prediction error using different scientific applications on different clusters. We were able to predict execution times with an average accuracy of over 98%.
"The modules of DwarfCode include trace recording, trace merging, repeat compression and dwarf code generation. Although several related studies have been conducted and well-grounded in trace recording and code generation, challenges remain [...]" [Author footnote: W. Zhang is with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China. E-mail: firstname.lastname@example.org.]
ABSTRACT: We present DwarfCode, a performance prediction tool for MPI applications on diverse computing platforms. The goal is to accurately predict the running time of applications for task scheduling and job migration. First, DwarfCode collects the execution traces to record the computing and communication events. Then, it merges the traces from different processes into a single trace. After that, DwarfCode identifies and compresses the repeating patterns in the final trace to reduce the number of events. Finally, a dwarf code is generated to mimic the original program behavior. This smaller benchmark is replayed on the target platform to predict the performance of the original application. In generating such a benchmark, the two major challenges are reducing the time complexity of the trace merging and repeat compression algorithms. We propose an O(mpn) trace merging algorithm to combine the traces generated by separate MPI processes, where m denotes the upper bound of tracing distance, p denotes the number of processes, and n denotes the maximum number of events over all the traces. More importantly, we put forward a novel repeat compression algorithm whose time complexity is O(n log n). Experimental results show that DwarfCode can accurately predict the running time of MPI applications. The error rate is below 10% for compute- and communication-intensive applications. This toolkit has been released for free download under the GNU General Public License v3.
IEEE Transactions on Computers 01/2015; DOI:10.1109/TC.2015.2417526 · 1.66 Impact Factor
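To make the repeat-compression step in the DwarfCode abstract above more concrete, here is a minimal sketch in Python. It is not the O(n log n) algorithm the authors propose, whose details are not given here. It is a naive greedy pass that collapses immediately repeated event patterns into (pattern, count) pairs. The function names `compress_repeats` and `expand`, the `max_len` parameter, and the string event labels are all assumptions made for illustration.

```python
from typing import List, Tuple

Event = str  # hypothetical event label, e.g. "compute", "send", "recv"
Compressed = List[Tuple[Tuple[Event, ...], int]]


def compress_repeats(trace: List[Event], max_len: int = 8) -> Compressed:
    """Greedily collapse back-to-back repeated patterns into (pattern, count).

    Naive illustration only: worst-case cost is far above O(n log n).
    """
    out: Compressed = []
    i, n = 0, len(trace)
    while i < n:
        best_len, best_count = 1, 1
        # Try every pattern length that could repeat at least twice from i.
        for length in range(1, min(max_len, (n - i) // 2) + 1):
            pattern = trace[i:i + length]
            count = 1
            while trace[i + count * length:i + (count + 1) * length] == pattern:
                count += 1
            # Keep the candidate covering the most events; ties keep the
            # shorter pattern found first.
            if count > 1 and count * length > best_len * best_count:
                best_len, best_count = length, count
        out.append((tuple(trace[i:i + best_len]), best_count))
        i += best_len * best_count
    return out


def expand(compressed: Compressed) -> List[Event]:
    """Rebuild the original event sequence from its compressed form."""
    return [e for pattern, count in compressed
            for _ in range(count) for e in pattern]


if __name__ == "__main__":
    trace = ["compute", "send", "recv"] * 4 + ["barrier"]
    packed = compress_repeats(trace)
    print(packed)
    assert expand(packed) == trace
```

On a trace of four compute/send/recv iterations followed by a barrier, the sketch produces two entries: the three-event pattern with count 4, and the barrier with count 1. A replayable benchmark could then loop over each pattern `count` times instead of storing every event.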
"In this sense, performance tools become necessary to determine the most suitable system on which to execute the application. The Parallel Application Signature for Performance Prediction (PAS2P) methodology strives to solve the complexity of the behavior analysis making an application signature, which represents the relevant behavior of a message passing application. The signature is built by selecting a set of relevant parts (phases) and their weights, which are the frequency each phase repeats. "
ABSTRACT: Analyzing and predicting performance in parallel applications is a great challenge for scientific programmers due to its complexity. Analyzing parallel application behavior is not a trivial process, and it requires a large amount of time and effort to understand the behavior of the application algorithms during execution. We have developed the PAS2P toolkit from the PAS2P methodology. This methodology strives to characterize the behavior of MPI applications to identify and extract representative phases and create a signature, which is used to analyze the application behavior and predict its execution time on different target systems. Applying this methodology is a non-trivial process for users. For this reason we have developed the proposed toolkit, which lets users carry out the whole process, from creating a signature to executing it on target systems, in user space in an easy and fully automatic way. The PAS2P toolkit has been validated, making clear the advantages of the signature: its execution time is much lower than that of the whole application (around 7% of the total execution time), with a high prediction quality of around 96%.
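The excerpt above describes a signature as a set of phases with weights, where a weight is how often a phase repeats. A minimal sketch of how such a signature could yield a prediction follows, assuming the predicted execution time is the weighted sum of the phase times measured on the target system. The `Phase` class, the function names, the accuracy formula, and the sample numbers are hypothetical and are not taken from the PAS2P toolkit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phase:
    name: str             # hypothetical label for a representative phase
    weight: int           # how many times the phase repeats in the full run
    measured_time: float  # seconds for one execution on the target system


def predict_execution_time(signature: List[Phase]) -> float:
    """Assumed model: total time = sum of (phase time x phase weight)."""
    return sum(p.weight * p.measured_time for p in signature)


def prediction_accuracy(predicted: float, actual: float) -> float:
    """Accuracy in percent, assumed to be 100 * (1 - relative error)."""
    return 100.0 * (1.0 - abs(predicted - actual) / actual)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    signature = [
        Phase("halo_exchange", weight=1000, measured_time=0.02),
        Phase("reduction", weight=50, measured_time=0.1),
    ]
    predicted = predict_execution_time(signature)
    print(f"predicted: {predicted:.2f} s")
    print(f"accuracy vs. a 26 s run: {prediction_accuracy(predicted, 26.0):.1f}%")
```

Under this model, running the signature executes each phase only once on the target system. That is consistent with the abstract's claim that the signature takes a small fraction of the full application's run time.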
"Some example scenarios are shown in Fig. 2. In addition, we have equipped our models with a set of configuration parameters that allow users to modify the behavior and configuration of the simulated system. Some of these parameters enable the simulation of real-based scenarios from the execution traces of real programs, as well as the inclusion of failure traces of real HPC systems. The most relevant parameters of the simulation models, summarized in Table 4, are: network topology; routing algorithm; traffic pattern; real-program execution traces; real-system failure traces; packet size; link frequency/speed; and router buffer size."
ABSTRACT: Nowadays, the study of high-performance computing (HPC) is one of the essential aspects of postgraduate programmes in Computational Science. However, university education in HPC often suffers from a significant gap between theoretical concepts and the practical experience of students. To face this challenge, we have implemented an innovative teaching strategy to provide students with appropriate resources to ease the assimilation of theoretical concepts, while improving their practical experience through the use of teaching tools and resources specifically designed to promote active learning. We have used the proposed strategy to organize the module of Parallel Computers and Architectures of the Master's in High-Performance Computing at the Universitat Autònoma de Barcelona, obtaining very promising results. In particular, we have observed improvements in both the academic marks of students and their perception of their own expertise and skills in HPC, compared with the previous teaching approach.