Sergei Gorlatch

University of Münster, Muenster, North Rhine-Westphalia, Germany

Publications (194) · 38.02 Total impact


  • No preview · Conference Paper · Sep 2015
  • Sergei Gorlatch · Tim Humernbrum
    ABSTRACT: We aim at enhancing the Quality of Service (QoS) management in modern Internet applications that heavily rely on the quality of the underlying network. Our goal is to provide the application developers with mechanisms for specifying and controlling high-level, application-related QoS metrics, rather than the traditional low-level, network-related metrics like latency, throughput, packet loss, etc. We study this problem for the challenging class of Real-Time Online Interactive Applications (ROIA) which include, e.g., multiplayer online games and simulation-based e-learning and training. We leverage the dynamic management features of Software-Defined Networking (SDN) to express, monitor and control applications' QoS demands. Our particular contributions are as follows: 1) we propose a Northbound API for specifying the application-level QoS requirements in ROIA, 2) we demonstrate how the application-level metric 'response time' can be automatically translated into network-level metrics understood by the SDN controller, and 3) we report experimental results on managing application-level QoS in an example online game on an OpenFlow-enabled testbed.
    No preview · Article · Mar 2015
  • D. Meiländer · M. Lorke · S. Gorlatch
    ABSTRACT: We consider a challenging class of so-called Real-Time Online Interactive Applications (ROIA); popular examples include multi-player online computer games, e-learning and training based on real-time simulations. ROIA combine high demands on scalability and real-time user interactivity with the problem of an efficient and economic utilization of resources with heterogeneous network capabilities. This paper proposes a generic scalability model for ROIA that analyzes the application performance during runtime and predicts the demand for load balancing, i.e., whether and when to add/remove resources or redistribute workload. In an experimental evaluation, we prove the quality of our model by comparing the proposed workload distribution using our model against the optimal workload distribution for a multi-player online game.
    No preview · Article · Jan 2015
  • S. Gorlatch · M. Steuwer
    ABSTRACT: Application development for modern high-performance systems with many cores, i.e., comprising multiple Graphics Processing Units (GPUs) and multi-core CPUs, currently exploits low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs. In this paper, we advocate a high-level programming approach for such systems, which relies on the following two main principles: (a) the model is based on the current OpenCL standard, such that programs remain portable across various many-core systems, independently of the vendor, and all low-level code optimizations can be applied; (b) the model extends OpenCL with three high-level features which simplify many-core programming and are automatically translated by the system into OpenCL code. The high-level features of our programming model are as follows: (1) memory management is simplified and automated using parallel container data types (vectors and matrices); (2) a data (re)distribution mechanism supports data partitioning and generates automatic data movements between multiple GPUs; (3) computations are precisely and concisely expressed using parallel algorithmic patterns (skeletons). The well-defined skeletons allow for semantics-preserving transformations of SkelCL programs which can be applied in the process of program development, as well as in the compilation and optimization phase. We demonstrate how our programming model and its implementation are used to express several parallel applications, and we report first experimental results on evaluating our approach in terms of program size and target performance.
    No preview · Article · Jan 2015
  • Michael Haidl · Sergei Gorlatch
    ABSTRACT: We present PACXX -- a unified programming model for programming many-core systems that comprise accelerators like Graphics Processing Units (GPUs). One of the main difficulties of the current GPU programming is that two distinct programming models are required: the host code for the CPU is written in C/C++ with the restricted, C-like API for memory management, while the device code for the GPU has to be written using a device-dependent, explicitly parallel programming model, e.g., OpenCL or CUDA. This leads to long, poorly structured and error-prone codes. In PACXX, both host and device programs are written in the same programming language -- the newest C++14 standard, with all modern features including type inference (auto), variadic templates, generic lambda expressions, as well as STL containers and algorithms. We implement PACXX by a custom compiler (based on the Clang front-end and LLVM IR) and a runtime system, that together perform major tasks of memory management and data synchronization automatically and transparently for the programmer. We evaluate our approach by comparing it to CUDA and OpenCL regarding program size and target performance.
    No preview · Conference Paper · Nov 2014
  • Fabian Dütsch · Karim Djelassi · Michael Haidl · Sergei Gorlatch
    ABSTRACT: The development of programs for modern systems with GPUs and other accelerators is a complex and error-prone task. The popular GPU programming approaches like CUDA and OpenCL require a deep knowledge of the underlying architecture to achieve good performance. We present HLSF -- a high-level framework that greatly simplifies the development of stencil-based applications on systems with accelerators. The main novel features of HLSF are as follows: 1) it provides a high-level interface for stencils that hides from the programmer the low-level management of the parallelism and memory on accelerators; 2) it allows the developer to write programs in the pure C++ style, using all convenient features of the most recent C++14 standard. Our experimental evaluation shows that the framework significantly reduces the programming effort for stencil-based applications, while delivering performance competitive to CUDA and OpenCL.
    No preview · Conference Paper · Oct 2014
  • Source
    Michel Steuwer · Michael Haidl · Stefan Breuer · Sergei Gorlatch
    ABSTRACT: The implementation of stencil computations on modern, massively parallel systems with GPUs and other accelerators currently relies on manually-tuned coding using low-level approaches like OpenCL and CUDA. This makes development of stencil applications a complex, time-consuming, and error-prone task. We describe how stencil computations can be programmed in our SkelCL approach that combines high-level programming abstractions with competitive performance on multi-GPU systems. SkelCL extends the OpenCL standard by three high-level features: 1) pre-implemented parallel patterns (a.k.a. skeletons); 2) container data types for vectors and matrices; 3) automatic data (re)distribution mechanism. We introduce two new SkelCL skeletons which specifically target stencil computations – MapOverlap and Stencil – and we describe their use for particular application examples, discuss their efficient parallel implementation, and report experimental results on systems with multiple GPUs. Our evaluation of three real-world applications shows that stencil code written with SkelCL is considerably shorter and offers competitive performance to hand-tuned OpenCL code.
    Full-text · Article · Sep 2014 · Parallel Processing Letters
  • Source
    Michael Olejnik · Michel Steuwer · Sergei Gorlatch · Dominik Heider
    ABSTRACT: Next-generation sequencing (NGS) has a large potential in HIV diagnostics, and genotypic prediction models have been developed and successfully tested in the recent years. However, albeit being highly accurate, these computational models lack computational efficiency to reach their full potential. In this study, we demonstrate the use of graphics processing units (GPUs) in combination with a computational prediction model for HIV tropism. Our new model named gCUP, parallelized and optimized for GPU, is highly accurate and can classify >175 000 sequences per second on an NVIDIA GeForce GTX 460. The computational efficiency of our new model is the next step to enable NGS technologies to reach clinical significance in HIV diagnostics. Moreover, our approach is not limited to HIV tropism prediction, but can also be easily adapted to other settings, e.g. drug resistance prediction. Availability and implementation: The source code can be downloaded at http://www.heiderlab.de Contact: d.heider@wz-straubing.de.
    Full-text · Article · Aug 2014 · Bioinformatics
  • Tim Humernbrum · Frank Glinka · Sergei Gorlatch
    ABSTRACT: Real-Time Online Interactive Applications (ROIA), e.g., multiplayer online games and simulation-based e-learning, make high Quality of Service (QoS) demands on the underlying network. These demands depend on the number of users and the actual application state and, therefore, vary at runtime. Traditional networks have very limited possibilities of influencing the network behavior to meet the dynamic QoS demands, such that most ROIA use the underlying network on a best-effort basis. The emerging Software-Defined Networking (SDN) technology decouples the control and forwarding logic from the network infrastructure, making the network behavior programmable for applications. This paper analyses ROIA requirements on the underlying network and describes the specification and design of an SDN Northbound API that allows ROIA applications to specify their dynamic network requirements and to meet them at runtime using SDN networks.
    No preview · Article · Aug 2014
  • Source
    Michel Steuwer · Malte Friese · Sebastian Albers · Sergei Gorlatch
    ABSTRACT: Algorithmic skeletons simplify software development: they abstract typical patterns of parallelism and provide their efficient implementations, allowing the application developer to focus on the structure of algorithms, rather than on implementation details. This becomes especially important for modern parallel systems with multiple graphics processing units (GPUs) whose programming is complex and error-prone, because state-of-the-art programming approaches like CUDA and OpenCL lack high-level abstractions. We define a new algorithmic skeleton for allpairs computations which occur in real-world applications, ranging from bioinformatics to physics. We develop the skeleton’s generic parallel implementation for multi-GPU systems in OpenCL. To enable the automatic use of the fast GPU memory, we identify and implement an optimized version of the allpairs skeleton with a customizing function that follows a certain memory access pattern. We use matrix multiplication as an application study for the allpairs skeleton and its two implementations and demonstrate that the skeleton greatly simplifies programming, saving up to 90% of lines of code as compared to OpenCL. The performance of our optimized implementation is up to 6.8 times higher as compared with the generic implementation and is competitive with the performance of a manually written optimized OpenCL code.
    Full-text · Article · Aug 2014 · International Journal of Parallel Programming
  • Source
    Michel Steuwer · Sergei Gorlatch
    ABSTRACT: Application development for modern high-performance systems with graphics processing units (GPUs) currently relies on low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs. We present SkelCL—a high-level programming approach for systems with multiple GPUs and its implementation as a library on top of OpenCL. SkelCL makes three main enhancements to the OpenCL standard: (1) memory management is simplified using parallel container data types (vectors and matrices); (2) an automatic data (re)distribution mechanism allows for implicit data movements between GPUs and ensures scalability when using multiple GPUs; (3) computations are conveniently expressed using parallel algorithmic patterns (skeletons). We demonstrate how SkelCL is used to implement parallel applications, and we report experimental evaluation of our approach in terms of programming effort and performance.
    Full-text · Article · Jul 2014 · The Journal of Supercomputing
  • ABSTRACT: Massively Multi-player Online Games (MMOG) are characterized by intensive interactions between many simultaneous users and real-time demands on Quality of Service (QoS). Other examples of similar, real-time online interactive applications include various virtual environments, as well as multi-pupil e-learning and simulation-based training courses. A highly desirable enhancement for MMOG is the use of mobile devices for accessing the game application. However, the limited computing power of mobile devices is an obstacle for implementing computation-intensive parts of MMOG, in particular graphics processing, on mobile devices. This paper proposes a novel runtime system for mobile MMOG and other similar applications that moves computation-intensive tasks, including graphics processing, from the mobile devices to Cloud resources. We report experimental results of our runtime system using a realistic multi-player online game.
    No preview · Conference Paper · Apr 2014
  • Source
    ABSTRACT: We consider a challenging class of highly interactive virtual environments, also known as Real-Time Online Interactive Applications (ROIA). Popular examples of ROIA include multi-player online computer games, e-learning and training applications based on real-time simulations, among others. An emerging enhancement for ROIA is the use of mobile devices for accessing the application (mobile ROIA). However, the limited computing power of mobile devices is an obstacle for implementing computation-intensive parts of ROIA, in particular graphics processing, on mobile devices. This paper proposes a runtime system for mobile ROIA that moves computation-intensive tasks, including graphics processing, from the mobile devices to Cloud resources. We report experimental results of our runtime system using a multi-player online game with real-world characteristics.
    Full-text · Conference Paper · Apr 2014
  • Sergei Gorlatch · Tim Humernbrum · Frank Glinka
    ABSTRACT: Real-Time Online Interactive Applications (ROIA), e.g., multiplayer online games and simulation-based e-learning, are emerging internet applications that make high Quality of Service (QoS) demands on the underlying network. These demands depend on the number of users and the actual application state and, therefore, vary at runtime. Traditional networks have very limited possibilities of influencing the network behaviour to meet the dynamic QoS demands, such that most ROIA use the underlying network on a best-effort basis. The emerging architecture of Software-Defined Networking (SDN) decouples the control and forwarding logic from the network infrastructure, making the network behaviour programmable for applications. This paper analyses ROIA requirements on the underlying network and describes the specification of an SDN Northbound API that allows ROIA applications to specify their dynamic network requirements and to meet them using SDN networks.
    No preview · Conference Paper · Feb 2014
  • T. Humernbrum · F. Glinka · S. Gorlatch
    ABSTRACT: We consider an emerging class of challenging Internet applications called Real-Time Online Interactive Applications (ROIA). Examples of ROIA are multiplayer online games, computation- and interaction-intensive training, simulation-based e-learning, etc. These applications make high QoS demands on the underlying network which involve the number of users and the actual application state and, therefore, vary at runtime. In traditional networks, the reconfiguration possibilities of the network to meet the dynamic QoS demands of ROIA are limited due to the lack of control of the network. The emerging architecture of Software-Defined Networking (SDN) decouples the control and forwarding logic from the network infrastructure, making it programmable for applications. This paper describes the specification, design and implementation of a novel Northbound API for developing ROIA that can use the advantages of SDN. We describe the basic architectural design of the SDN Module which implements the API functionality required by ROIA, and we report experimental testing and evaluation results for its prototype implementation.
    No preview · Article · Jan 2014
  • Source
    Michel Steuwer · Sergei Gorlatch
    ABSTRACT: Application development for modern high-performance systems with Graphics Processing Units (GPUs) relies on low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs. In this paper, we present SkelCL – a high-level programming model for systems with multiple GPUs and its implementation as a library on top of OpenCL. SkelCL provides three main enhancements to the OpenCL standard: 1) computations are conveniently expressed using parallel patterns (skeletons); 2) memory management is simplified using parallel container data types; 3) an automatic data (re)distribution mechanism allows for scalability when using multi-GPU systems. We use a real-world example from the field of medical imaging to motivate the design of our programming model and we show how application development using SkelCL is simplified without sacrificing performance: we were able to reduce the code size in our imaging example application by 50% while introducing only a moderate runtime overhead of less than 5%.
    Full-text · Article · Dec 2013 · Procedia Computer Science
  • Philipp Kegel · Michel Steuwer · Sergei Gorlatch
    ABSTRACT: Modern computer systems become increasingly distributed and heterogeneous by comprising multi-core CPUs, GPUs, and other accelerators. Current programming approaches for such systems usually require the application developer to use a combination of several programming models (e.g., MPI with OpenCL or CUDA) in order to exploit the system’s full performance potential. In this paper, we present dOpenCL (distributed OpenCL)—a uniform approach to programming distributed heterogeneous systems with accelerators. dOpenCL allows the user to run unmodified existing OpenCL applications in a heterogeneous distributed environment. We describe the challenges of implementing the OpenCL programming model for distributed systems, as well as its extension for running multiple applications concurrently. Using several example applications, we compare the performance of dOpenCL with MPI + OpenCL and standard OpenCL implementations.
    No preview · Article · Dec 2013 · Journal of Parallel and Distributed Computing
  • ABSTRACT: The class of distributed Real-time Online Interactive Applications (ROIA) includes such important applications as Massively Multiplayer Online Games (MMOGs), as well as interactive e-Learning and simulation systems. These applications usually work in a persistent environment (also called world) which continues to exist and evolve also while the user is offline and away from the application. The challenge is how to efficiently make the world and the player characters persistent in the system over time. In this paper, we deal with storing persistent data of real-time interactive applications in modern relational databases. We analyze the major requirements on a system for persistency and we describe a preliminary design of the Entity Persistence Module (EPM) middleware which liberates the application developer from writing and maintaining complex and error-prone code for persistent data management. EPM automatically performs the mapping operations to store/retrieve the complex data to/from different types of relational databases, supports the management of persistent data in memory, and integrates it into the main loop of the ROIA client-server architecture.
    No preview · Conference Paper · Sep 2013
  • P. Kegel · M. Steuwer · S. Gorlatch
    ABSTRACT: Application programming for modern heterogeneous systems which comprise multi-core CPUs and multiple GPUs is complex and error-prone. Approaches like OpenCL and CUDA are relatively low-level as they require explicit handling of parallelism and memory, and they do not offer support for multiple GPUs within a stand-alone computer, nor for distributed systems that integrate several computers. In particular, distributed systems require application developers to use a mix of programming models, e.g., MPI together with OpenCL or CUDA. We propose a uniform, high-level approach for programming both stand-alone and distributed systems with many cores and multiple GPUs. The approach consists of two parts: 1) the dOpenCL runtime system for transparent execution of OpenCL programs on several stand-alone computers connected by a network, and 2) the SkelCL library for high-level application programming on heterogeneous stand-alone systems with multi-core CPUs and multiple GPUs. While dOpenCL provides transparent accessibility of arbitrary computing devices (multi-core CPUs and GPUs) across distributed systems, SkelCL offers a set of pre-implemented patterns (skeletons) of parallel computation and communication which greatly simplify programming these devices. Both parts are built on top of OpenCL which ensures their high portability across different kinds of processors and GPUs. We describe dOpenCL and SkelCL, demonstrate how our approach simplifies programming for distributed systems with many cores and multiple GPUs and report experimental results on a real-world application from the field of medical imaging.
    No preview · Article · Jan 2013 · Advances in Parallel Computing
  • Source
    Michel Steuwer · Sergei Gorlatch
    ABSTRACT: Application development for modern high-performance systems with Graphics Processing Units (GPUs) currently relies on low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs. In this paper, we present SkelCL – a high-level programming approach for systems with multiple GPUs and its implementation as a library on top of OpenCL. SkelCL provides three main enhancements to the OpenCL standard: 1) computations are conveniently expressed using parallel algorithmic patterns (skeletons); 2) memory management is simplified using parallel container data types (vectors and matrices); 3) an automatic data (re)distribution mechanism allows for implicit data movements between GPUs and ensures scalability when using multiple GPUs. We demonstrate how SkelCL is used to implement parallel applications on one- and two-dimensional data. We report experimental results to evaluate our approach in terms of programming effort and performance.
    Full-text · Conference Paper · Jan 2013

Publication Stats

2k Citations
38.02 Total Impact Points

Institutions

  • 1970-2014
    • University of Münster
      • Department of Mathematics and Computer Sciences
      • Institute for Geoinformatics
      • Institute of Computer Science
      Muenster, North Rhine-Westphalia, Germany
  • 1994-2006
    • Universität Passau
      • Department of Informatics and Mathematics
      Passau, Bavaria, Germany
  • 2001-2005
    • Technische Universität Berlin
      • School IV Electrical Engineering and Computer Science
      Berlin, Berlin, Germany