Carsten Clauss
ParTec AG · FZ Jülich Office
Dr.-Ing. / MBA

About

45
Publications
3,650
Reads
285
Citations
Additional affiliations
April 2015 - July 2017
Hochschule Niederrhein
Position
  • Lecturer
Description
  • Laboratories on Object-oriented Application Development, and on Operating Systems
September 2016 - August 2018
FOM University of Applied Sciences for Economics and Management
Position
  • Lecturer
Description
  • Lectureships in Operating Systems, in Computer Networks, and in IT Infrastructure
September 2018 - March 2020
IU International University of Applied Sciences
Position
  • Professor
Description
  • Professorship in Informatics
Education
April 2010 - June 2012
University of Wales
Field of study
  • Financial Services
October 2004 - February 2007
University of Hagen
Field of study
  • Computer Science
January 2004 - December 2009
RWTH Aachen University
Field of study
  • Doctoral Studies in Computer Engineering

Publications (45)
Article
Full-text available
As a general rule, when writing parallel applications according to the MPI standard, the programmer does not need to worry about the underlying hardware topology. This is because the MPI standard intentionally hides the actual hardware topology from the application programmer for the sake of portability, while at the same time burdening the MPI...
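The impact of this hidden rank-to-node mapping can be illustrated with a toy model that counts inter-node messages in a nearest-neighbor (ring) exchange under two hypothetical placements; all names and the scenario are illustrative, not taken from the paper:

```python
# Toy model: count how many messages cross node boundaries in a ring
# exchange (each rank sends to rank+1 mod size) for two rank placements.

def inter_node_messages(node_of_rank):
    size = len(node_of_rank)
    return sum(1 for r in range(size)
               if node_of_rank[r] != node_of_rank[(r + 1) % size])

ranks, nodes = 8, 2
block = [r * nodes // ranks for r in range(ranks)]   # [0,0,0,0,1,1,1,1]
round_robin = [r % nodes for r in range(ranks)]      # [0,1,0,1,0,1,0,1]

print(inter_node_messages(block))        # 2: only the "seam" hops cross nodes
print(inter_node_messages(round_robin))  # 8: every hop crosses nodes
```

The same communication pattern thus costs four times as many inter-node messages under one placement as under the other, which is exactly the information the MPI standard hides from a portable application.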
Article
Full-text available
The continuous growth of supercomputers is accompanied by increased complexity of the intra-node level and the interconnection topology. Consequently, the whole software stack ranging from the system software to the applications has to evolve, e.g., by means of fault tolerance and support for the rising intra-node parallelism. Migration techniques ar...
Conference Paper
Hierarchy-awareness for message-passing has been around since the early 2000s with the emergence of SMP systems. Since then, many works have dealt with the optimization of collective communication operations (so-called collectives) for such hierarchical topologies. However, until now, all these optimizations basically assume that the hierarchical topolo...
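The core idea of such hierarchy-aware collectives can be sketched as a two-level reduction: reduce inside each node first, then only among one "leader" per node. The following is a sequential toy model under assumed names, not code from the paper:

```python
# Two-level, hierarchy-aware reduction sketch: intra-node partial sums
# first (cheap, shared memory), then an inter-node sum over node leaders.

def hierarchical_sum(values_by_node):
    node_partials = [sum(vals) for vals in values_by_node]  # intra-node step
    return sum(node_partials)                               # inter-node step

def inter_node_msgs_flat(nodes, ranks_per_node):
    # Naive reduce to rank 0: every rank on a remote node sends across nodes.
    return ranks_per_node * (nodes - 1)

def inter_node_msgs_hier(nodes, ranks_per_node):
    # Hierarchical reduce: only the node leaders communicate across nodes.
    return nodes - 1

data = [[1, 2, 3, 4], [5, 6, 7, 8]]   # two nodes, four ranks each
print(hierarchical_sum(data))          # 36
print(inter_node_msgs_flat(2, 4))      # 4 inter-node messages (naive)
print(inter_node_msgs_hier(2, 4))      # 1 inter-node message (hierarchical)
```

The message counts show why the optimization pays off on the slow inter-node links, and also why it depends on knowing the topology, which is exactly the assumption the paper questions.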
Conference Paper
In this paper we describe the design of fault tolerance capabilities for general-purpose offload semantics, based on the OmpSs programming model. Using ParaStation MPI, a production MPI-3.1 implementation, we explore the features that, being standard compliant, an MPI stack must support to provide the necessary fault tolerance guarantees, based on...
Chapter
Heading towards exascale, the challenges for process management with respect to flexibility and efficiency grow accordingly. Running more than one application simultaneously on a node can be the solution for better resource utilization. However, this approach of co-scheduling can also be the way to go for gaining a degree of flexibility with respec...
Conference Paper
Load balancing, maintenance, and energy efficiency are key challenges for upcoming supercomputers. An indispensable tool for the accomplishment of these tasks is the ability to migrate applications during runtime. Especially in HPC, where any performance hit is frowned upon, such migration mechanisms have to come with minimal overhead. This constra...
Conference Paper
Full-text available
Heading towards exascale, the challenges for process management with respect to flexibility and efficiency grow accordingly. Running more than one application simultaneously on a node can be the solution for better resource utilization. However, we believe that this approach of co-scheduling can also be the way to go for gaining a degree of process...
Conference Paper
The Peripheral Component Interconnect Express (PCIe) is the predominant interconnect enabling the CPU to communicate with attached input/output and storage devices. Considering its high performance and capabilities to connect different address domains via the so-called Non-Transparent Bridging (NTB) technology, it starts to be an alternative or add...
Book
This book constitutes thoroughly refereed post-conference proceedings of the workshops of the 19th International Conference on Parallel Computing, Euro-Par 2013, held in Aachen, Germany in August 2013. The 99 papers presented were carefully reviewed and selected from 145 submissions. The papers include seven workshops that have been co-located with...
Technical Report
Full-text available
This paper presents rckSock: A communication layer for the Intel SCC manycore processor that utilizes the SCC's shared on-chip memory and that does not involve the operating system for emulating a Socket-like communication programming interface. By means of this layer, common client/server-based applications can easily be ported to the SCC in a lig...
Conference Paper
The trend towards the integration of many cores per chip will raise the demand for new many-core architectures if established multi-core techniques such as hardware implemented cache-coherence limit scalability. Parallel applications especially with dynamically changing access pattern can benefit from software supported weaker memory consistency mo...
Article
Since the beginning of the multicore era, parallel processing has become prevalent across the board. On a traditional multicore system, a single operating system manages all cores and schedules threads and processes among them, inherently supported by hardware-implemented cache coherence protocols. However, a further growth of the number of cores p...
Conference Paper
Full-text available
The Parallel Random Access Machine (PRAM) model describes an abstract register machine for analyzing the complexity and scalability of parallel algorithms. Unfortunately, it is not possible to implement this model directly in hardware but it is at least possible to emulate this abstract model on more realistic parallel machines. Moreover, the recen...
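A flavor of such PRAM emulation can be given with the classic O(log n)-step parallel prefix sum (Hillis-Steele scan), emulated step-synchronously in software; this is a minimal illustrative sketch, not the implementation from the paper:

```python
# Emulate a step-synchronous PRAM prefix sum: in each of ceil(log2 n)
# "time steps", all n virtual processors update simultaneously, which we
# model by building a fresh copy of the array per step.

def pram_prefix_sum(a):
    x = list(a)
    n = len(x)
    step = 1
    while step < n:
        # All "processors" act in lockstep within one PRAM step.
        x = [x[i] + (x[i - step] if i >= step else 0) for i in range(n)]
        step *= 2
    return x

print(pram_prefix_sum([1, 2, 3, 4, 5]))  # [1, 3, 6, 10, 15]
```

Writing into a fresh copy per step is what makes the emulation faithful to the PRAM's synchronous read-then-write semantics on an otherwise sequential machine.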
Conference Paper
In this paper, we present a prototype implementation of the Multicore Communications API (MCAPI) for the Intel Single-Chip Cloud Computer (SCC). The SCC is a 48 core concept vehicle for future many-core systems that exhibit message-passing oriented architectures. The MCAPI specification, recently developed by the Multicore Association, resembles a...
Conference Paper
The growing number of cores per chip implies an increasing chip complexity, especially with respect to hardware-implemented cache coherence protocols. An attractive alternative for future many-core systems is to waive the hardware-based cache coherency and to introduce a software-oriented approach instead: a so-called Cluster-on-Chip architecture....
Conference Paper
Full-text available
In this paper, we present first successes with building an SCC-related shared virtual memory management system, called MetalSVM, that is implemented using a bare-metal hypervisor, located within a virtualization layer between the SCC's hardware and the operating system. The basic concept is based on a small kernel developed from scratch by the auth...
Conference Paper
The Single-Chip Cloud Computer (SCC) experimental processor is a 48-core concept vehicle created by Intel Labs as a platform for many-core software research. Intel provides a customized programming library for the SCC, called RCCE, that allows for fast message-passing between the cores. For that purpose, RCCE offers an application programming inter...
Conference Paper
Since the beginning of the multicore era, parallel processing has become prevalent across the board. On a traditional multicore system, a single operating system manages all cores and schedules threads and processes among them, inherently supported by hardware-implemented cache coherence protocols. However, a further growth of the number of cores p...
Conference Paper
In this contribution, we present our experiences gained while prototyping the MPIT Configuration and Performance Interface that is currently under discussion for integration into the next MPI standard. The work is based on an API draft that has been recently released by the MPI Tools Working Group [1]. As a use case, we have already developed...
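The query-then-read pattern of such a tools interface (enumerate performance variables, bind a handle, read the counter) can be mocked in a few lines; this is a hypothetical stand-in to show the flow, not the real MPIT API:

```python
# Hypothetical mock of an MPIT-style performance-variable registry:
# a tool enumerates variables, allocates a handle, and reads counters
# that the MPI library updates internally.

class PvarRegistry:
    def __init__(self):
        self._vars = {}              # variable name -> current value

    def register(self, name, value=0):
        self._vars[name] = value

    def get_num_pvars(self):
        return len(self._vars)

    def handle_alloc(self, name):
        if name not in self._vars:
            raise KeyError(name)
        return name                  # the "handle" is just the name here

    def read(self, handle):
        return self._vars[handle]

    def update(self, name, delta):   # called by the "MPI library" side
        self._vars[name] += delta

mpit = PvarRegistry()
mpit.register("msgs_sent")
mpit.update("msgs_sent", 3)          # library increments its counter
h = mpit.handle_alloc("msgs_sent")
print(mpit.read(h))                  # 3
```

The separation between registration/update (library side) and handle/read (tool side) mirrors the division of roles such a configuration and performance interface is meant to standardize.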
Article
Full-text available
The mathematical description and analysis of physical processes has always been a core concern of all natural-science disciplines. In recent decades, with the advent of digital computers, numerical analysis aimed at simulating these processes has gained ever greater importance. The demands for ever...
Conference Paper
Full-text available
Nowadays, common systems in the area of high performance computing exhibit highly hierarchical architectures. As a result, achieving satisfactory application performance demands an adaptation of the respective parallel algorithm to such systems. This, in turn, requires knowledge about the actual hardware structure even at the application level...
Conference Paper
In this contribution, we present a new library (that can be used in addition to any MPI library) which allows writing optimized applications for heterogeneous environments without the need of relying on intrinsic adaptation features provided by a particular MPI implementation. This is achieved by giving the application programmer the ability to int...
Conference Paper
Full-text available
When running large parallel applications with demands for resources that exceed the capacity the local computing site offers, the deployment in a distributed Grid environment may help to satisfy these demands. However, since such an environment is a heterogeneous system by nature, there are some drawbacks that, if not taken into account, are limi...
Conference Paper
Full-text available
When writing parallel applications according to the MPI standard especially for hierarchical computing environments, the recognition of the underlying heterogeneous hardware structure at application level is not trivial at all. Although the MPI standard tries to support the application programmer with some process grouping and mapping facilities...
Conference Paper
Full-text available
Coupled clusters usually exhibit a heterogeneous but also hierarchical structure in terms of communication and computation. Therefore, it is essential to adapt parallel applications to such systems in order to gain reasonable performance results. Moreover, regular benchmark tools are also not capable of exposing the latent potential of such couple...
Conference Paper
MetaMPICH is an MPI implementation which allows the coupling of different computing resources to form a heterogeneous computing system called a meta computer. Such a coupled system may consist of multiple compute clusters, MPPs, and SMP servers, using different network technologies like Ethernet, SCI, and Myrinet. There are several other MPI libra...
Conference Paper
Full-text available
Running large MPI applications with resource demands exceeding the local site's cluster capacity could be distributed across a number of clusters in a Grid instead, to satisfy the demand. However, there are a number of drawbacks limiting the applicability of this approach: communication paths between compute nodes of different clusters usually...
Conference Paper
Full-text available
Since the beginning of computational engineering, the numerical simulation of physical processes has been an essential element in the area of high performance computing. Thus, the domain of metal foundry also demands the computational simulation of casting and solidification processes. A popular software tool for this purpose has been developed by...
Conference Paper
Full-text available
In the field of coupled cluster systems, Windows-based systems lead a niche existence, with few exceptions [17]. In the future, however, the development of dual-core CPUs will give every office worker in a medium-sized company a second execution unit that ordinary office applications will hardly...
Conference Paper
Full-text available
Cluster systems built mainly from commodity hardware components have become more and more usable for high performance computing tasks in the past few years. To increase the parallelism for applications, it is often desirable to combine those clusters at a higher level, commonly called a metacomputer. This class of high performance computing platforms...
