Dirk Schmidl
RWTH Aachen University · IT Center

Dr. rer. nat.

About

28 Publications · 21,461 Reads · 721 Citations
Additional affiliations
July 2009 - June 2016: RWTH Aachen University, Research Assistant

Publications

Publications (28)
Conference Paper
A future large-scale high-performance computing (HPC) cluster will likely be power capped since the surrounding infrastructure like power supply and cooling is constrained. For such a cluster, it may be impossible to supply thermal design power (TDP) to all components. The default power supply of current systems guarantees TDP to each computing node...
Conference Paper
Intel’s Knights Landing processor (KNL) is the latest product in the Xeon Phi product line. As a self-hosted system it is the first commercially available many-core architecture which can run unmodified applications. This makes KNL a very interesting option for HPC centers which have to support many different applications including community and IS...
Conference Paper
Full-text available
The tasking feature enriches OpenMP with a method to express parallelism in a more general way than before, as it can be applied not only to loops but also to recursive algorithms without the need for nested parallel regions. However, the performance of a tasking program is very much influenced by the task scheduling inside the OpenMP runtime. Especially on la...
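As a minimal sketch of the tasking pattern this abstract refers to (illustrative only, not taken from the paper; the Fibonacci example and the absence of a task-creation cutoff are simplifications): a recursive computation can be parallelized with OpenMP tasks inside a single parallel region, with no nested parallel regions.

/* Illustrative sketch, not from the paper: recursive Fibonacci with
 * OpenMP tasks. One parallel region is opened once; the recursion
 * creates tasks instead of nested parallel regions. A real code would
 * add a cutoff to limit task creation overhead. */
#include <stdio.h>
#include <omp.h>

static long fib(int n)
{
    long x, y;
    if (n < 2)
        return n;
    #pragma omp task shared(x)
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    #pragma omp taskwait            /* wait for the two child tasks */
    return x + y;
}

int main(void)
{
    long result;
    #pragma omp parallel
    #pragma omp single              /* one thread spawns the root of the task tree */
    result = fib(30);
    printf("fib(30) = %ld\n", result);
    return 0;
}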
Conference Paper
Next-generation sequencing techniques reduced the cost of sequencing a genome rapidly, but came with a relatively high error rate. Therefore, error correction of this data is a necessary task before assembly can take place. Since the input data is huge and error correction is compute intensive, parallelizing this work on a modern shared-memory syst...
Conference Paper
Full-text available
Modern processors contain many features to reduce the energy consumption of the chip. The gain from these features depends strongly on the workload being executed. In this work, we investigate the energy consumption of OpenMP applications on the new Intel processor generation, called Haswell. We start with the basic chip characteristics of the c...
Conference Paper
Full-text available
The extended abstract can be found here: http://sc14.supercomputing.org/sites/all/themes/sc14/files/archive/tech_poster/tech_poster_pages/post108.html
Conference Paper
Full-text available
OpenMP 4.0 extended affinity support to allow pinning of threads to places. Places are an abstraction of machine locations which in many cases do not require extensive hardware knowledge by the user. For memory affinity, i.e. data initialization and migration on NUMA systems, support is still missing in OpenMP. In this work we present an extension...
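A minimal sketch of the OpenMP 4.0 thread-affinity mechanism the abstract refers to (places plus the proc_bind clause); the memory-affinity extension proposed in the paper is not shown, and the environment settings below are only an example.

/* Sketch of OpenMP 4.0 thread affinity: places are defined via the
 * environment, threads are bound with the proc_bind clause.
 * Example run: OMP_PLACES=cores OMP_NUM_THREADS=8 ./a.out */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel proc_bind(spread)
    {
        printf("thread %d runs in place %d of %d\n",
               omp_get_thread_num(),
               omp_get_place_num(),     /* place query routines are OpenMP 4.5 additions */
               omp_get_num_places());
    }
    return 0;
}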
Conference Paper
Full-text available
OpenMP is one of the most widely used standards for enabling thread-level parallelism in high performance computing codes. The recently released version 4.0 of the specification introduces directives that enable application developers to offload portions of the computation to massively-parallel target devices. However, to efficiently utilize these...
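As a minimal sketch of the OpenMP 4.0 offload directives mentioned in the abstract (illustrative only; the vector size and data mapping are assumptions, not taken from the paper): a loop can be offloaded to the default target device with an explicit data mapping.

/* Sketch of OpenMP 4.0 device offloading: vector addition offloaded
 * to the default target device with explicit map clauses. */
#include <stdio.h>

#define N 1024

int main(void)
{
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0f * i; }

    /* copy a and b to the device, bring c back to the host */
    #pragma omp target teams distribute parallel for map(to: a, b) map(from: c)
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}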
Conference Paper
Full-text available
In 2008, task-based parallelism was added to OpenMP as the major update for version 3.0. Tasks provide an easy way to express dynamic parallelism in OpenMP applications. However, achieving good performance with OpenMP task-parallel programs is a challenging task. OpenMP runtime systems are free to schedule, interrupt and resume tasks in many diffe...
Conference Paper
Different types of shared memory machines with large core counts exist today. Standard x86-based servers are built with up to eight sockets per machine. To obtain larger machines, some companies, like SGI or Bull, invented special interconnects to couple a bunch of small servers into one larger SMP; ScaleMP uses a special software layer on top of a...
Conference Paper
Full-text available
The Intel Xeon Phi has been introduced as a new type of compute accelerator that is capable of executing native x86 applications. It supports programming models that are well-established in the HPC community, namely MPI and OpenMP, thus removing the necessity to refactor codes for using accelerator-specific programming paradigms. Because of its nat...
Conference Paper
Full-text available
Parallel programming and performance optimization of parallel programs are not simple tasks. Various HPC and OpenMP courses as well as literature serve as introduction to this topic. Assuming the role of HPC beginners we evaluate how far the knowledge acquired from introductory courses and literature can drive performance optimization of a conjugat...
Conference Paper
Full-text available
With the task construct, the OpenMP 3.0 specification introduces an additional level of parallelism that challenges established schemes of performance profiling. First, a thread may execute a sequence of interleaved task fragments the profiling system must properly distinguish to enable correct performance analyses. Furthermore, the additional para...
Conference Paper
Full-text available
The multicore era has led to a renaissance of shared memory parallel programming models. Moreover, the introduction of task-level parallelization raises the level of abstraction compared to thread-centric expression of parallelism. However, tasks might exhibit poor performance on NUMA systems if locality cannot be controlled and non-local data is a...
Conference Paper
Full-text available
Version 3.0 of the OpenMP specification introduced the task construct for the explicit expression of dynamic task parallelism. Although automated load-balancing capabilities make it an attractive parallelization approach for programmers, the difficulty of integrating this new dimension of parallelism into traditional models of performance data has...
Conference Paper
Full-text available
The introduction of task-level parallelization promises to raise the level of abstraction compared to thread-centric expression of parallelism. However, tasks might exhibit poor performance on NUMA systems if locality cannot be maintained. In contrast to traditional OpenMP worksharing constructs for which threads can be bound, the behavior of tasks...
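For context, a minimal sketch of the traditional NUMA technique the abstract contrasts tasks with (standard first-touch initialization with bound worksharing threads; this is common practice, not the scheme proposed in the paper, and the array size is arbitrary):

/* Standard first-touch initialization on a NUMA system (illustrative):
 * pages land on the NUMA node of the thread that first writes them,
 * so initializing and computing with the same static schedule keeps
 * accesses local. Run with OMP_PROC_BIND=true so threads stay put. */
#include <stdlib.h>

#define N (64 * 1024 * 1024)

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double sum = 0.0;

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = 1.0;                 /* first touch places the page */

    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < N; i++)
        sum += a[i];                /* same schedule -> mostly local accesses */

    free(a);
    return sum == (double)N ? 0 : 1;
}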
Chapter
Full-text available
This paper gives an overview of the Score-P performance measurement infrastructure which is being jointly developed by leading HPC performance tools groups. It motivates the advantages of the joint undertaking from both the developer and the user perspectives, and presents the design and components of the newly developed Score-P performance m...
Chapter
Full-text available
ScaleMP's vSMP software turns commodity InfiniBand clusters with Intel's x86 processors into large shared memory machines providing a single system image at low cost. However, codes need to be tuned to deliver good performance on these machines. TrajSearch, developed at the Institute for Combustion Technology at RWTH Aachen University, is a post-pr...
Chapter
Full-text available
The rapidly growing number of cores on modern supercomputers imposes scalability demands not only on applications but also on the software tools needed for their development. At the same time, increasing application and system complexity makes the optimization of parallel codes more difficult, creating a need for scalable performance-analysis techn...
Conference Paper
Full-text available
Today most multi-socket shared memory systems exhibit a non-uniform memory architecture (NUMA). However, programming models such as OpenMP do not provide explicit support for that. To overcome this limitation, we propose a platform-independent approach to describe the system topology and to place threads on the hardware. A distance matrix provides...
Conference Paper
Full-text available
With version 3.0, the OpenMP specification introduced a task construct and with it an additional dimension of concurrency. While offering a convenient means to express task parallelism, the new construct presents a serious challenge to event-based performance analysis. Since tasking may disrupt the classic sequence of region entry and exit events,...
Conference Paper
Full-text available
The novel ScaleMP vSMP architecture employs commodity x86-based servers with an InfiniBand network to assemble a large shared memory system at an attractive price point. We examine this combined hardware and software approach to a DSM system using both system-level kernel benchmarks and real-world application codes. We compare this architec...
Conference Paper
Full-text available
In this work we discuss the performance problems of nested OpenMP programs concerning thread and data locality particularly on cc-NUMA architectures. We provide a user friendly solution and demonstrate its benefits by comparing the performance of some kernel benchmarks and some real-world applications with and without applying our affinity optimiza...
Conference Paper
MPI and OpenMP are the de-facto standards for distributed-memory and shared-memory parallelization, respectively. By employing a hybrid approach, that is, combining OpenMP and MPI parallelization in one program, a cluster of SMP systems can be exploited. Nevertheless, mixing programming paradigms and writing explicit message passing code might increas...
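A minimal hybrid MPI + OpenMP sketch of the kind of code this abstract refers to (illustrative only; the process/thread layout and the requested thread-support level are assumptions, not taken from the paper):

/* Hybrid MPI + OpenMP sketch: typically one MPI process per node or
 * socket, with OpenMP threads inside each process. */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int provided, rank;
    /* request thread support; FUNNELED: only the master thread calls MPI */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        printf("rank %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}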
Article
Full-text available
The slogan of last year's International Workshop on OpenMP was "A Practical Programming Model for the Multi-Core Era", although OpenMP is still fully hardware architecture agnostic. As a consequence, the programmer is left alone with bad performance if threads and data happen to live apart. In this work we examine the programmer's possibilities to...

Projects

Project (1)
Archived project
The Performance Optimisation and Productivity Centre of Excellence in Computing Applications provides performance optimisation and productivity services for academic and industrial codes in all domains. The services are free of charge to organisations in the EU. More information at http://pop-coe.eu