About
69 Publications
11,532 Reads
836 Citations (since 2017)
Additional affiliations
December 2005 - January 2016
Education
February 2008 - March 2013
October 2000 - November 2005
Publications (69)
Data repositories like Zenodo offer only a limited set of metadata fields to search for. Metadata catalogues are designed to provide community-specific parameter searches, but their deployment has only just started. These catalogues require metadata standards for interoperability, which in turn are often still in development for many communities. To support publications...
When dealing with research data management, researchers at Helmholtz-Zentrum Dresden-Rossendorf (HZDR) face a variety of systems and tools. These range from the project planning phase (proposal management, data management plans and policies), through documentation during the experiment or simulation campaign, to the publication (collaborative autho...
The current-voltage characteristics of a single-molecule junction are determined by the electronic coupling Γ between the electronic states of the electrodes and the dominant transport channel(s) of the molecule. Γ is profoundly affected by the choice of the anchoring groups and their binding positions on the tip facets and the tip-tip separation....
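For orientation, the connection between the coupling Γ and the measured current is often illustrated with the single-level Breit-Wigner/Landauer picture; the sketch below uses generic placeholder symbols (level energy ε0, electrode couplings Γ_L and Γ_R, Fermi functions f_L, f_R) and is a textbook illustration rather than the specific model used in this work.

    % illustrative single-level transmission and Landauer current
    T(E) = \frac{\Gamma_L \Gamma_R}{(E - \epsilon_0)^2 + \left(\frac{\Gamma_L + \Gamma_R}{2}\right)^2},
    \qquad
    I(V) = \frac{2e}{h} \int T(E)\,\bigl[f_L(E) - f_R(E)\bigr]\,dE

Here the total broadening Γ = Γ_L + Γ_R controls both the width of the transmission resonance and the magnitude of the current.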
In this article, we introduce a parallel algorithm for connected-component analysis (CCA) on GPUs which drastically reduces the volume of data to transfer from GPU to the host. CCA algorithms targeting GPUs typically store the extracted features in arrays large enough to potentially hold the maximum possible number of objects for the given image si...
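To make concrete what connected-component analysis extracts (a label per object plus per-object features such as pixel count and bounding box), here is a deliberately simple CPU-side union-find sketch in C++; it illustrates the technique in general, not the GPU algorithm of the paper, and the names and toy image are made up.

    // Minimal CPU sketch of connected-component analysis (4-connectivity)
    // using union-find; illustrative only, not the paper's GPU algorithm.
    #include <algorithm>
    #include <cstdio>
    #include <map>
    #include <vector>

    struct UnionFind {
        std::vector<int> parent;
        explicit UnionFind(int n) : parent(n) { for (int i = 0; i < n; ++i) parent[i] = i; }
        int find(int x) { while (parent[x] != x) x = parent[x] = parent[parent[x]]; return x; }
        void unite(int a, int b) { parent[find(a)] = find(b); }
    };

    int main() {
        const int W = 8, H = 6;
        const int img[H][W] = {          // toy binary image: 1 = foreground
            {0,1,1,0,0,0,0,0},
            {0,1,1,0,0,1,1,0},
            {0,0,0,0,0,1,1,0},
            {0,0,0,0,0,0,0,0},
            {1,1,0,0,0,0,1,0},
            {1,1,0,0,0,0,1,0}};

        UnionFind uf(W * H);
        for (int y = 0; y < H; ++y)      // merge with left and upper neighbours
            for (int x = 0; x < W; ++x) {
                if (!img[y][x]) continue;
                if (x > 0 && img[y][x - 1]) uf.unite(y * W + x, y * W + x - 1);
                if (y > 0 && img[y - 1][x]) uf.unite(y * W + x, (y - 1) * W + x);
            }

        struct Feature { int count = 0, minx = 1 << 30, miny = 1 << 30, maxx = -1, maxy = -1; };
        std::map<int, Feature> features; // per-component pixel count and bounding box
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x) {
                if (!img[y][x]) continue;
                Feature& f = features[uf.find(y * W + x)];
                ++f.count;
                f.minx = std::min(f.minx, x); f.maxx = std::max(f.maxx, x);
                f.miny = std::min(f.miny, y); f.maxy = std::max(f.maxy, y);
            }

        for (const auto& kv : features)
            std::printf("component %d: %d pixels, bbox (%d,%d)-(%d,%d)\n",
                        kv.first, kv.second.count,
                        kv.second.minx, kv.second.miny, kv.second.maxx, kv.second.maxy);
    }

The point the paper addresses is that a GPU version of such an analysis typically has to pre-allocate feature storage for the worst-case number of objects; reducing the volume of that data that must be copied back to the host is the optimization target.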
Software, as an important method and output of research, should follow the RDA "FAIR for Research Software Principles". In practice, this means that research software, whether open, inner or closed source, should be published with rich metadata to enable FAIR4RS. For research software practitioners, this currently often means following an arduous and...
Publishing your research software in a publication repository is the first step on the path to making your software FAIR! But the publication of just the software itself is not quite enough: To truly enable findability, accessibility and reproducibility, as well as to make your software correctly citable and unlock credit for your work, your softwar...
Software, as an important method and output of research, should follow the RDA "FAIR for Research Software Principles". In practice, this means that research software, whether open, inner or closed source, should be published with rich metadata to enable FAIR4RS. For research software practitioners, this currently often means following an arduous and...
Modern HPC systems are built with innovative system architectures and novel programming models to further push the speed limit of computing. The increased complexity poses challenges for performance portability and performance evaluation. The Standard Performance Evaluation Corporation (SPEC) has a long history of producing industry-standard benchma...
Publication of research software is an important step in making software more discoverable. Ideally, rich metadata are published alongside software artifacts to further enable software comprehension, citation, and reuse ("FAIR software"). The provision of these metadata is currently often an arduous manual process, as is its curation. A new project...
To satisfy the principles of FAIR software, software sustainability and software citation, research software must be formally published. Publication repositories make this possible and provide published software versions with unique and persistent identifiers. However, software publication is still a tedious, mostly manual process. To streamline so...
To satisfy the principles of FAIR software, software sustainability and software citation, research software must be formally published. Publication repositories make this possible and provide published software versions with unique and persistent identifiers. However, software publication is still a tedious, mostly manual process. To streamline so...
HPC systems employ a growing variety of compute accelerators with different architectures and from different vendors. Large scientific applications are required to run efficiently across these systems but need to retain a single code-base in order to not stifle development. Directive-based offloading programming models set out to provide the requir...
Ultrafast X-ray computed tomography (UFXCT) is a fast tomographic imaging technique based on the principle of electron beam scanning. It is used for the investigation of transient multiphase flows. A UFXCT scanner comprises multiple detector modules generating gigabytes of raw data per second for imaging rates of up to 8,000 frames per second. D...
HPC systems employ a growing variety of compute accelerators with different architectures and from different vendors. Large scientific applications are required to run efficiently across these systems but need to retain a single code-base in order to not stifle development. Directive-based offloading programming models set out to provide the requir...
This book constitutes the proceedings of the 7th International Workshop on Accelerator Programming Using Directives, WACCPD 2020, which took place on November 20, 2021. The workshop was initially planned to take place in Atlanta, GA, USA, and changed to an online format due to the COVID-19 pandemic.
WACCPD is one of the major forums for bringing to...
Three kernels, Current Deposition (also known as Compute Current), Particle Push (Move and Mark), and Shift Particles, are known to be some of the most time-consuming kernels in PIConGPU. The Current Deposition kernel and Particle Push kernel both set up the particle attributes for running any physics simulation with PIConGPU, so it is crucial to im...
This is a technical report that summarizes findings from the analysis of PIConGPU's three most intensive kernels using the NVProf profiling tool on the Summit system at Oak Ridge National Laboratory (ORNL). The kernels, Current Deposition (also known as Compute Current), Particle Push (Move and Mark), and Shift Particles, are known to be some of the biggest k...
This book constitutes the refereed proceedings of 3 workshops co-located with the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC19, held in Denver, CO, USA, in November 2019.
The 12 full papers presented in these proceedings feature the outcome of the 6th Annual Workshop on HPC User Support Tools, HUST 20...
This book constitutes the refereed proceedings of the 35th International Conference on High Performance Computing, ISC High Performance 2020, held in Frankfurt/Main, Germany, in June 2020.
The 27 revised full papers presented were carefully reviewed and selected from 87 submissions. The papers cover a broad range of topics such as architectures, n...
This book constitutes the refereed post-conference proceedings of 10 workshops held at the 35th International ISC High Performance 2020 Conference, in Frankfurt, Germany, in June 2020:
First Workshop on Compiler-assisted Correctness Checking and Performance Optimization for HPC (C3PO); First International Workshop on the Application of Machine Lear...
Software engineering (SWE) for modeling, simulation, and data analytics for computational science and engineering (CSE) is challenging, with ever-more sophisticated, higher fidelity simulation of ever-larger, more complex problems involving larger data volumes, more domains, and more researchers. Targeting both commodity and custom high-end compute...
This book constitutes the refereed post-conference proceedings of 13 workshops held at the 34th International ISC High Performance 2019 Conference, in Frankfurt, Germany, in June 2019:
HPC I/O in the Data Center (HPC-IODC), Workshop on Performance & Scalability of Storage Systems (WOPSSS), Workshop on Performance & Scalability of Storage Systems (W...
In the original version of this LNCS volume, four papers were erroneously released as open access papers. This has been corrected to only two papers – papers 5 and 7.
This book constitutes the refereed proceedings of the 34th International Conference on High Performance Computing, ISC High Performance 2019, held in Frankfurt/Main, Germany, in June 2019.
The 17 revised full papers presented were carefully reviewed and selected from 70 submissions. The papers cover a broad range of topics such as next-generation h...
This book constitutes the refereed post-conference proceedings of the 5th International Workshop on Accelerator Programming Using Directives, WACCPD 2018, held in Dallas, TX, USA, in November 2018.
The 6 full papers presented have been carefully reviewed and selected from 12 submissions. The papers share knowledge and experiences to program emergin...
This article highlights the Oak Ridge Leadership Computing Facility's GPU Hackathon, presenting the training format used, trends observed, and reasons for teams' successes and failures. It also summarizes participant outcomes and takeaways while demonstrating how educators could adopt this hackathon format for use in their respective institutions.
The purpose of this chapter is to familiarize the reader with the concept of evolutionary performance improvement and the tools involved when adding other parallelization paradigms to OpenACC applications. Such hybrid applications can suffer from a number of performance bottlenecks and a holistic picture of all activities during the application run...
The OLCF GPU Hackathons are a one-week code-development/learning event to better enable attendees to utilize GPUs. It only took three years to grow from a "Let's give this a try" event to a repeatedly copied format with several spin-offs that inspired HPC centers around the world. Sticking to a few fundamental principles—work on your own code, le...
With the rise of accelerators in high performance computing, programming models for the development of heterogeneous applications have evolved and are continuously being improved to increase program performance and programmer productivity. The concept of computation offloading to massively parallel compute devices has established itself as a new la...
Scientists and technical professionals can use OpenACC to leverage the immense power of modern GPUs without the complexity traditionally associated with programming them. OpenACC™ for Programmers is one of the first comprehensive and practical overviews of OpenACC for massively parallel programming.
This book integrates contributions from 19 leadi...
Ultrafast X-ray tomography is an advanced imaging technique for the study of dynamic processes based on the principles of electron beam scanning. A typical application case for this technique is e.g. the study of multiphase flows, that is, flows of mixtures of substances such as gas–liquid flows in pipelines or chemical reactors. At Helmholtz-Zentr...
Neisseria gonorrhoeae is the causative agent of one of the most common sexually transmitted diseases, gonorrhea. Over the past two decades there has been an alarming increase of reported gonorrhea cases in which the bacteria were resistant to the most commonly used antibiotics, thus prompting the search for alternative antimicrobial treatment strategies. The cruc...
The use of accelerators in heterogeneous systems is an established approach in designing petascale applications. Today, Compute Unified Device Architecture (CUDA) offers a rich programming interface for GPU accelerators but requires developers to incorporate several layers of parallelism on both the CPU and the GPU. From this increasing program com...
With the appearance of the heterogeneous platform OpenPower, many-core accelerator devices have been coupled with Power host processors for the first time. Towards utilizing their full potential, it is worth investigating performance-portable algorithms that allow choosing the best-fitting hardware for each domain-specific compute task. Suiting ev...
Current and next generation HPC systems will exploit accelerators and self-hosting devices within their compute nodes to accelerate applications. This comes at a time when programmer productivity and the ability to produce portable code have been recognized as major concerns. One of the goals of OpenMP and OpenACC is to allow the user to specify pa...
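As a generic illustration of how such directive-based offloading looks in practice (not code from this paper), the OpenMP sketch below marks a loop for execution on an attached device; the array names and sizes are invented for the example.

    // Minimal sketch: offloading a loop with OpenMP target directives.
    // Illustrative only; array names and sizes are made up.
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        std::vector<double> x(n, 1.0), y(n, 2.0);
        double* px = x.data();
        double* py = y.data();
        const double a = 3.0;

        // Map the arrays to the device, run the loop in parallel there,
        // and copy y back to the host afterwards.
        #pragma omp target teams distribute parallel for \
            map(to: px[0:n]) map(tofrom: py[0:n])
        for (int i = 0; i < n; ++i)
            py[i] = a * px[i] + py[i];

        std::printf("y[0] = %f\n", py[0]);
        return 0;
    }

The same loop falls back to host execution when no device is available, which is exactly the portability property such models aim to provide.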
Porting applications to new hardware or programming models is a tedious and error-prone process. Any help that eases these burdens saves developer time that can then be invested in the advancement of the application itself instead of preserving the status quo on a new platform.
The Alpaka library defines and implements an abstract hierarchi...
Porting applications to new hardware or programming models is a tedious and error-prone process. Any help that eases these burdens saves developer time that can then be invested in the advancement of the application itself instead of preserving the status quo on a new platform. The Alpaka library defines and implements an abstract hierarchi...
A fundamental interest in application development for high performance computing (HPC) is close-to-optimal execution efficiency. To systematically achieve this, it is reasonable to use performance analysis tools that provide an insight into the execution of a program. Programming models that also specify tool interfaces enable the design of robus...
This is the archive containing the software used for evaluations in the publication "Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond" submitted to the International Workshop on OpenPOWER for HPC 2016. The archive has the following content: PIConGPU Kelvin-Helmholtz Simulation code (picongpu-alpaka/): Remo...
The OpenACC standard has been developed to simplify parallel programming of heterogeneous systems. Based on a set of high-level compiler directives it allows application developers to offload code regions from a host CPU to an accelerator without the need for low-level programming with CUDA or OpenCL. Details are implicit in the programming model a...
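As a rough illustration of the programming style this enables (a generic example, not taken from the paper), a pair of loops can be kept on the accelerator with an enclosing data region so that the intermediate array never travels back to the host:

    // Minimal OpenACC sketch (generic illustration): a data region keeps the
    // arrays on the accelerator across two offloaded loops.
    #include <cstdio>

    const int N = 1 << 20;
    float a[N], b[N];

    int main() {
        #pragma acc data create(a[0:N]) copyout(b[0:N])
        {
            #pragma acc parallel loop   // initialise directly on the device
            for (int i = 0; i < N; ++i) a[i] = 1.0f;

            #pragma acc parallel loop   // compute; a never touches the host
            for (int i = 0; i < N; ++i) b[i] = 2.0f * a[i] + 3.0f;
        }
        std::printf("b[0] = %f\n", b[0]);
        return 0;
    }

Because data movement and kernel generation are left to the compiler, such implicit behaviour is convenient to write but is precisely what developers later need tools to inspect when performance does not match expectations.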
The popular and diverse hardware accelerator ecosystem makes apples-to-apples comparisons between platforms rather difficult. SPEC ACCEL tries to offer a yardstick to compare different accelerator hardware and software ecosystems. This paper uses this SPEC benchmark to compare an AMD GPU, an NVIDIA GPU and an Intel Xeon Phi with respect to performa...
Utilizing accelerators in heterogeneous systems is an established approach for designing peta-scale applications. Today, CUDA offers a rich programming interface for GPU accelerators but requires developers to incorporate several layers of parallelism on both CPU and GPU. From this increasing program complexity emerges the need for sophisticated pe...
Hybrid nodes with hardware accelerators are becoming very common in systems today. Users often find it difficult to characterize and understand the performance advantage of such accelerators for their applications. The SPEC High Performance Group (HPG) has developed a set of performance metrics to evaluate the performance and power consumption of accelera...
We present a particle-in-cell simulation of the relativistic Kelvin-Helmholtz Instability (KHI) that for the first time delivers angularly resolved radiation spectra of the particle dynamics during the formation of the KHI. This enables studying the formation of the KHI with unprecedented spatial, angular and spectral resolution. Our results are of...
Hardware accelerators have changed the HPC landscape as they open a potential route to exascale computing. At the same time they also complicate the task of application development since they introduce another level of parallelism and, thus, complexity. Performance tool support to aid the developer will be a necessity. While profiling is offered by...
The advent of multi-core processors has made parallel computing techniques mandatory on mainstream systems. With the recent rise in hardware accelerators, hybrid parallelism adds yet another dimension of complexity to the process of software development. The inner workings of a parallel program are usually difficult to understand and verify. This p...
The power of GPUs is giving rise to heterogeneous parallel computing, with new demands on programming environments, runtime systems, and tools to deliver high-performing applications. This paper studies the problems associated with performance measurement of heterogeneous machines with GPUs. A heterogeneous computation model and alternative host-GP...
The particle-in-cell (PIC) algorithm is one of the most widely used algorithms in computational plasma physics. With the advent of graphics processing units (GPUs), large-scale plasma simulations on inexpensive GPU clusters are in reach. We present an implementation of a fully relativistic plasma PIC algorithm for GPUs based on the NVIDIA CUDA lib...
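For readers unfamiliar with the method: one PIC cycle interpolates fields to the particles, pushes the particles, and deposits their charge/current back onto the grid before the fields are advanced. The fragment below is a deliberately simplified, non-relativistic 1D leapfrog push in C++, meant only to illustrate that cycle; it is not the CUDA implementation described here, and all quantities are toy values.

    // Simplified, non-relativistic 1D leapfrog particle push (illustration only).
    #include <cstdio>
    #include <vector>

    struct Particle { double x, v; };

    int main() {
        const double dt = 1e-2, qm = -1.0;     // time step, charge-to-mass ratio
        const int nx = 64;
        const double dx = 1.0 / nx;
        std::vector<double> E(nx, 0.1);        // toy electric field on the grid
        std::vector<Particle> p = {{0.25, 0.0}, {0.5, 0.1}};

        for (int step = 0; step < 100; ++step) {
            for (auto& part : p) {
                int cell = static_cast<int>(part.x / dx) % nx; // nearest-grid-point gather
                part.v += qm * E[cell] * dt;                   // accelerate (kick)
                part.x += part.v * dt;                         // move (drift)
                if (part.x < 0.0) part.x += 1.0;               // periodic boundaries
                if (part.x >= 1.0) part.x -= 1.0;
            }
            // A full PIC step would now deposit the particles' current/charge
            // onto the grid and advance the fields with a Maxwell solver.
        }
        std::printf("particle 0: x = %f, v = %f\n", p[0].x, p[0].v);
        return 0;
    }

Since each particle can be updated independently, particles map naturally to GPU threads, which is what makes the method attractive for CUDA-based clusters.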
New high performance computing (HPC) applications recently have to face scalability over an increasing number of nodes and the programming of special accelerator hardware. Hybrid composition of large computing systems leads to a new dimension in complexity of software development. This paper presents a novel approach to gain insight into accelerato...
The advent of multi-core processors has made parallel computing techniques mandatory on mainstream systems. With the recent rise of hardware accelerators, hybrid parallelism adds yet another dimension of complexity to the process of software development. This article presents a tool for graphical program flow analysis of hardware-accelerated paral...
Not long ago using the graphics chip as a co-processor to push performance of a single workstation to 1 TFLOP/s was unthinkable. Only the really gifted could program it after many years of practice. Today the users actually have the choice between two vendors (NVIDIA and AMD) and, thus, two different ways of using them. In this paper we compare bot...
The goal of the Cluster Challenge is to design, build and operate a compute cluster. Although it is an artificial environment for cluster computing, many of its key constraints on operation of cluster systems are important to real world scenarios: high energy efficiency, reliability and scalability. In this paper, we describe our approach to accomp...
Vampir 7 is a performance visualization tool that provides a comprehensive view of the runtime behavior of parallel programs. It is a new member of the Vampir tool family. This new generation of performance visualizer combines state-of-the-art parallel data processing techniques with an all-new graphical user interface experience. This includes fas...
The goal of the Cluster Challenge is to design, build and operate a compute cluster. Although it is an artificial environment for cluster computing, many of its key constraints on operation of cluster systems are important to real world scenarios: high energy efficiency, reliability and scalability. In this paper, we describe our approach to accomp...
Although common sense says that all nodes of a cluster should behave identically since they consist of exactly the same hardware parts and are running the same software, experience tells otherwise. We present a collection of programs and tools that were gathered over several years during various cluster installations at different sites with cluster...
The SGI Altix system architecture supports very large ccNUMA shared memory systems. Nevertheless, the system layout sets boundaries on the sustained memory performance, which can only be avoided by selecting the "right" data access strategies. The paper presents the results of cache and memory performance studies on the SGI Altix 350. It demon...
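A standard example of such a data access strategy on ccNUMA machines is first-touch placement: memory pages end up on the NUMA node of the thread that first writes them, so initialising data with the same thread layout that later computes on it keeps most accesses node-local. The OpenMP sketch below is a generic illustration of this idea, not code from the paper.

    // First-touch initialisation on a ccNUMA system (generic illustration).
    #include <cstdio>

    int main() {
        const long n = 1L << 26;
        // Plain arrays so no pages are touched before the parallel initialisation.
        double* a = new double[n];
        double* b = new double[n];

        #pragma omp parallel for schedule(static)   // first touch: pages land on the
        for (long i = 0; i < n; ++i) {              // NUMA node of the writing thread
            a[i] = 0.0;
            b[i] = static_cast<double>(i);
        }

        #pragma omp parallel for schedule(static)   // same schedule: mostly local access
        for (long i = 0; i < n; ++i)
            a[i] = 2.0 * b[i];

        std::printf("a[42] = %f\n", a[42]);
        delete[] a;
        delete[] b;
        return 0;
    }

Whether first touch or explicit placement works best depends on the system layout, which is exactly the kind of question measurements like those in this study are meant to answer.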
The BenchIT kernels generate a large number of measurement results, depending on the number of function arguments. Using the web interface, the user is given the chance to show selected results of different measurement programs in a single coordinate system. Often there are different effects that can cause characteristic minima, maxima, or a...
Understanding the performance of modern system architectures is an ever-present and challenging task. BenchIT - a new tool to support the collection and presentation of such measurement data - is being developed by the Center for High Performance Computing Dresden.