Guido Juckeland

Helmholtz-Zentrum Dresden-Rossendorf | HZDR · Department of Information Services and Computing

Dr.-Ing.

About

56
Publications
9,329
Reads
662
Citations
Additional affiliations
February 2016 - present
Helmholtz-Zentrum Dresden-Rossendorf
Position
  • Head of Department
Description
  • I head the newly founded Computational Science Department within the IT Department of HZDR. We work on better connecting the scientists in the center with the central IT resources through education and collaboration.
December 2005 - January 2016
Technische Universität Dresden
Position
  • Hardware Accelerator Group, IT Architect
Education
February 2008 - March 2013
Technische Universität Dresden
Field of study
  • Computer Science
October 2000 - November 2005
Technische Universität Dresden
Field of study
  • Information System Technology

Publications

Publications (56)
Preprint
Full-text available
Modern HPC systems are built with innovative system architectures and novel programming models to further push the speed limit of computing. The increased complexity poses challenges for performance portability and performance evaluation. The Standard Performance Evaluation Corporation (SPEC) has a long history of producing industry standard benchma...
Preprint
Full-text available
To satisfy the principles of FAIR software, software sustainability and software citation, research software must be formally published. Publication repositories make this possible and provide published software versions with unique and persistent identifiers. However, software publication is still a tedious, mostly manual process. To streamline so...
Article
Full-text available
Ultrafast X-ray computed tomography (UFXCT) is a fast tomographic imaging technique based on the principle of electron beam scanning. It is used for the investigation of transient multiphase flows. A UFXCT scanner comprises multiple detector modules generating gigabytes of raw data per second for imaging rates of up to 8,000 frames per second. D...
Preprint
Full-text available
HPC systems employ a growing variety of compute accelerators with different architectures and from different vendors. Large scientific applications are required to run efficiently across these systems but need to retain a single code-base in order to not stifle development. Directive-based offloading programming models set out to provide the requir...
Book
This book constitutes the proceedings of the 7th International Workshop on Accelerator Programming Using Directives, WACCPD 2020, which took place on November 20, 2020. The workshop was initially planned to take place in Atlanta, GA, USA, and changed to an online format due to the COVID-19 pandemic. WACCPD is one of the major forums for bringing to...
Technical Report
Full-text available
Three kernels, Current Deposition (also known as Compute Current), Particle Push (Move and Mark), and Shift Particles are known to be some of the most time-consuming kernels in PIConGPU. The Current Deposition kernel and Particle Push kernel both set up the particle attributes for running any physics simulation with PIConGPU, so it is crucial to im...
Technical Report
Full-text available
This is a technical report that summarizes findings on the analysis of PIConGPU's three most intensive kernels, using the NVProf profiler and the Summit system at the Oak Ridge National Laboratory (ORNL). The kernels, Current Deposition (also known as Compute Current), Particle Push (Move and Mark), and Shift Particles are known to be some of the biggest k...
Book
This book constitutes the refereed proceedings of 3 workshops co-located with International Conference for High Performance Computing, Networking, Storage, and Analysis, SC19, held in Denver, CO, USA, in November 2019. The 12 full papers presented in this proceedings feature the outcome of the 6th Annual Workshop on HPC User Support Tools, HUST 20...
Book
This book constitutes the refereed proceedings of the 35th International Conference on High Performance Computing, ISC High Performance 2020, held in Frankfurt/Main, Germany, in June 2020.* The 27 revised full papers presented were carefully reviewed and selected from 87 submissions. The papers cover a broad range of topics such as architectures, n...
Book
This book constitutes the refereed post-conference proceedings of 10 workshops held at the 35th International ISC High Performance 2020 Conference, in Frankfurt, Germany, in June 2020: First Workshop on Compiler-assisted Correctness Checking and Performance Optimization for HPC (C3PO); First International Workshop on the Application of Machine Lear...
Conference Paper
Full-text available
Software engineering (SWE) for modeling, simulation, and data analytics for computational science and engineering (CSE) is challenging, with ever-more sophisticated, higher fidelity simulation of ever-larger, more complex problems involving larger data volumes, more domains, and more researchers. Targeting both commodity and custom high-end compute...
Book
This book constitutes the refereed post-conference proceedings of 13 workshops held at the 34th International ISC High Performance 2019 Conference, in Frankfurt, Germany, in June 2019: HPC I/O in the Data Center (HPC-IODC); Workshop on Performance & Scalability of Storage Systems (WOPSSS); ...
Book
This book constitutes the refereed proceedings of the 34th International Conference on High Performance Computing, ISC High Performance 2019, held in Frankfurt/Main, Germany, in June 2019. The 17 revised full papers presented were carefully reviewed and selected from 70 submissions. The papers cover a broad range of topics such as next-generation h...
Book
This book constitutes the refereed post-conference proceedings of the 5th International Workshop on Accelerator Programming Using Directives, WACCPD 2018, held in Dallas, TX, USA, in November 2018. The 6 full papers presented have been carefully reviewed and selected from 12 submissions. The papers share knowledge and experiences to program emergin...
Article
This article highlights the Oak Ridge Leadership Compute Facility's GPU Hackathon, presenting the training format used, trends observed, and reasons for teams' successes and failures. It also summarizes participant outcomes and takeaways while demonstrating how educators could adopt this hackathon format for use in their respective institutions.
Chapter
The purpose of this chapter is to familiarize the reader with the concept of evolutionary performance improvement and the tools involved when adding other parallelization paradigms to OpenACC applications. Such hybrid applications can suffer from a number of performance bottlenecks and a holistic picture of all activities during the application run...
Conference Paper
The OLCF GPU Hackathons are a one-week code-development/learning event to better enable attendees to utilize GPUs. It only took three years to grow from a “Let’s give this a try”-event to a repeatedly copied format with several spin-offs that inspired HPC centers around the world. Sticking to a few fundamental principles—work on your own code, le...
Conference Paper
Full-text available
With the rise of accelerators in high performance computing, programming models for the development of heterogeneous applications have evolved and are continuously being improved to increase program performance and programmer productivity. The concept of computation offloading to massively parallel compute devices has established itself as a new la...
Book
Scientists and technical professionals can use OpenACC to leverage the immense power of modern GPUs without the complexity traditionally associated with programming them. OpenACC™ for Programmers is one of the first comprehensive and practical overviews of OpenACC for massively parallel programming. This book integrates contributions from 19 leadi...
Article
Ultrafast X-ray tomography is an advanced imaging technique for the study of dynamic processes based on the principles of electron beam scanning. A typical application case for this technique is, e.g., the study of multiphase flows, that is, flows of mixtures of substances such as gas–liquid flows in pipelines or chemical reactors. At Helmholtz-Zentr...
Article
Full-text available
Neisseria gonorrhoeae is the causative agent of one of the most common sexually transmitted diseases, gonorrhea. Over the past two decades there has been an alarming increase of reported gonorrhea cases where the bacteria were resistant to the most commonly used antibiotics thus prompting for alternative antimicrobial treatment strategies. The cruc...
Article
Full-text available
The use of accelerators in heterogeneous systems is an established approach in designing petascale applications. Today, Compute Unified Device Architecture (CUDA) offers a rich programming interface for GPU accelerators but requires developers to incorporate several layers of parallelism on both the CPU and the GPU. From this increasing program com...
Conference Paper
With the appearance of the heterogeneous platform OpenPower, many-core accelerator devices have been coupled with Power host processors for the first time. Towards utilizing their full potential, it is worth investigating performance portable algorithms that allow choosing the best-fitting hardware for each domain-specific compute task. Suiting ev...
Conference Paper
Current and next generation HPC systems will exploit accelerators and self-hosting devices within their compute nodes to accelerate applications. This comes at a time when programmer productivity and the ability to produce portable code has been recognized as a major concern. One of the goals of OpenMP and OpenACC is to allow the user to specify pa...
Conference Paper
Porting applications to new hardware or programming models is a tedious and error prone process. Every help that eases these burdens is saving developer time that can then be invested into the advancement of the application itself instead of preserving the status-quo on a new platform. The Alpaka library defines and implements an abstract hierarchi...
Article
Full-text available
Porting applications to new hardware or programming models is a tedious and error prone process. Every help that eases these burdens is saving developer time that can then be invested into the advancement of the application itself instead of preserving the status-quo on a new platform. The Alpaka library defines and implements an abstract hierarchi...
Chapter
A fundamental interest in application development for high performance computing (HPC) is a close-to-optimal execution efficiency. To systematically achieve this, it is reasonable to use performance analysis tools that provide an insight into the execution of a program. Programming models that also specify tool interfaces enable the design of robus...
Code
This is the archive containing the software used for evaluations in the publication "Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond" submitted to the international workshop on OpenPOWER for HPC 2016. The archive has the following content: PIConGPU Kelvin-Helmholtz Simulation code (picongpu-alpaka/): Remo...
Conference Paper
The OpenACC standard has been developed to simplify parallel programming of heterogeneous systems. Based on a set of high-level compiler directives it allows application developers to offload code regions from a host CPU to an accelerator without the need for low-level programming with CUDA or OpenCL. Details are implicit in the programming model a...
Conference Paper
The popular and diverse hardware accelerator ecosystem makes apples-to-apples comparisons between platforms rather difficult. SPEC ACCEL tries to offer a yardstick to compare different accelerator hardware and software ecosystems. This paper uses this SPEC benchmark to compare an AMD GPU, an NVIDIA GPU and an Intel Xeon Phi with respect to performa...
Article
Utilizing accelerators in heterogeneous systems is an established approach for designing peta-scale applications. Today, CUDA offers a rich programming interface for GPU accelerators but requires developers to incorporate several layers of parallelism on both CPU and GPU. From this increasing program complexity emerges the need for sophisticated pe...
Conference Paper
Full-text available
Hybrid nodes with hardware accelerators are becoming very common in systems today. Users often find it difficult to characterize and understand the performance advantage of such accelerators for their applications. The SPEC High Performance Group (HPG) has developed a set of performance metrics to evaluate the performance and power consumption of accelera...
Conference Paper
Full-text available
We present a particle-in-cell simulation of the relativistic Kelvin-Helmholtz Instability (KHI) that for the first time delivers angularly resolved radiation spectra of the particle dynamics during the formation of the KHI. This enables studying the formation of the KHI with unprecedented spatial, angular and spectral resolution. Our results are of...
Thesis
Full-text available
Hardware accelerators have changed the HPC landscape as they open a potential route to exascale computing. At the same time they also complicate the task of application development since they introduce another level of parallelism and, thus, complexity. Performance tool support to aid the developer will be a necessity. While profiling is offered by...
Article
The advent of multi-core processors has made parallel computing techniques mandatory on mainstream systems. With the recent rise in hardware accelerators, hybrid parallelism adds yet another dimension of complexity to the process of software development. The inner workings of a parallel program are usually difficult to understand and verify. This p...
Conference Paper
Full-text available
The power of GPUs is giving rise to heterogeneous parallel computing, with new demands on programming environments, runtime systems, and tools to deliver high-performing applications. This paper studies the problems associated with performance measurement of heterogeneous machines with GPUs. A heterogeneous computation model and alternative host-GP...
Article
The particle-in-cell (PIC) algorithm is one of the most widely used algorithms in computational plasma physics. With the advent of graphical processing units (GPUs), large-scale plasma simulations on inexpensive GPU clusters are in reach. We present an implementation of a fully relativistic plasma PIC algorithm for GPUs based on the NVIDIA CUDA lib...
Conference Paper
New high performance computing (HPC) applications recently have to face scalability over an increasing number of nodes and the programming of special accelerator hardware. Hybrid composition of large computing systems leads to a new dimension in complexity of software development. This paper presents a novel approach to gain insight into accelerato...
Conference Paper
The advent of multi-core processors has made parallel computing techniques mandatory on main stream systems. With the recent rise of hardware accelerators, hybrid parallelism adds yet another dimension of complexity to the process of software development. This article presents a tool for graphical program flow analysis of hardware accelerated paral...
Conference Paper
Full-text available
Not long ago using the graphics chip as a co-processor to push performance of a single workstation to 1 TFLOP/s was unthinkable. Only the really gifted could program it after many years of practice. Today the users actually have the choice between two vendors (NVIDIA and AMD) and, thus, two different ways of using them. In this paper we compare bot...
Conference Paper
Full-text available
The goal of the Cluster Challenge is to design, build and operate a compute cluster. Although it is an artificial environment for cluster computing, many of its key constraints on operation of cluster systems are important to real world scenarios: high energy efficiency, reliability and scalability. In this paper, we describe our approach to accomp...
Conference Paper
Vampir 7 is a performance visualization tool that provides a comprehensive view on the runtime behavior of parallel programs. It is a new member of the Vampir tool family. This new generation of performance visualizer combines state-of-the-art parallel data processing techniques with an all-new graphical user interface experience. This includes fas...
Article
Full-text available
The goal of the Cluster Challenge is to design, build and operate a compute cluster. Although it is an artificial environment for cluster computing, many of its key constraints on operation of cluster systems are important to real world scenarios: high energy efficiency, reliability and scalability. In this paper, we describe our approach to accomp...
Conference Paper
Full-text available
Although common sense says that all nodes of a cluster should behave identically since they consist of exactly the same hardware parts and are running the same software, experience tells otherwise. We present a collection of programs and tools that were gathered over several years during various cluster installations at different sites with cluster...
Article
Full-text available
The SGI Altix system architecture supports very large ccNUMA shared memory systems. Nevertheless, the system layout bounds the sustained memory performance, which can only be circumvented by selecting the "right" data access strategies. The paper presents the results of cache and memory performance studies on SGI Altix 350. It demon...
Article
The BenchIT kernels generate a large number of measurement results depending on the number of function arguments. Using the web interface, the user can display selected results of different measurement programs in a single coordinate system. Often there are different reasons that can cause characteristic minima, maxima, or a...
Conference Paper
Full-text available
Understanding performance of modern system architectures is an ever-present and challenging task. BenchIT, a new tool to support the collection and presentation of such measurement data, is being developed by the Center for High Performance Computing Dresden.