Carsten Trinitis's research while affiliated with Technische Universität München and other places

Publications (94)

Chapter
High Performance Computing (HPC) systems are facing severe limitations in both power and memory bandwidth/capacity. So far, these limitations have been addressed individually: to improve performance under a strict power constraint, power capping, which sets power limits to components/nodes/jobs, is an indispensable feature; and for memory bandwidth...
Chapter
The mining of time series data plays an important role in modern information retrieval and analysis systems. In particular, the identification of similarities within and across time series has garnered significant attention and effort over the last few years. For this task, the class of matrix profile algorithms, which create a generic structure th...
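The matrix profile referred to above stores, for every subsequence of a time series, the distance to its nearest non-trivial neighbour. The following is a minimal brute-force sketch, not the chapter's algorithm. It assumes z-normalized Euclidean distance and an exclusion zone of half the window length, both common choices rather than details taken from this abstract. The function names are hypothetical.

```python
import math
from typing import List, Tuple

def znorm(seq: List[float]) -> List[float]:
    """Z-normalize a subsequence (a constant sequence maps to all zeros)."""
    m = len(seq)
    mean = sum(seq) / m
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / m)
    if std == 0.0:
        return [0.0] * m
    return [(v - mean) / std for v in seq]

def naive_matrix_profile(ts: List[float], m: int) -> Tuple[List[float], List[int]]:
    """Brute-force matrix profile, O(n^2 * m); for illustration only."""
    n = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    excl = (m + 1) // 2  # assumed exclusion zone: skip trivial matches near i
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) < excl:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index
```

Low profile values mark repeated patterns (motifs) and high values mark anomalies (discords). Practical matrix profile algorithms avoid this quadratic inner loop.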
Book
This book constitutes the proceedings of the 33rd International Conference on Architecture of Computing Systems, ARCS 2020, held in Aachen, Germany, in May 2020.* The 12 full papers in this volume were carefully reviewed and selected from 33 submissions. 6 workshop papers are also included. ARCS has always been a conference attracting leading-edge...
Conference Paper
Gas turbine power plants generate an ever-growing amount of high frequency dynamic sensor data. One of the applications of this data is the protection against problems induced by combustion dynamics, as, e.g., with the ArgusOMDS system developed by IfTA. In the light of digitalization, this data has the potential to also be used in other areas and...
Conference Paper
On June 28, 2018, the board of directors of the German Informatics Society (GI) adopted new ethical guidelines. Throughout the development process, the principal authors, mostly members of GI's "Informatics and Ethics" special interest group, worked in close cooperation with the president of GI and incorporated feedback and suggestions from numerous GI members on t...
Conference Paper
Most applications running on supercomputers achieve only a fraction of a system’s peak performance. It has been demonstrated that co-scheduling applications can improve overall system utilization. In this case, however, applications being co-scheduled need to fulfill certain criteria such that mutual slowdown is kept at a minimum. In this paper we...
Conference Paper
Despite being versatile and efficient for various use cases, nano- and microsatellites are still plagued by low dependability. The low survivability of many earlier CubeSat missions can be attributed, among others, to low component level failure tolerance and a lack of FDIR functionality. Most nanosatellite developers underestimate the required test...
Conference Paper
We present storage integrity concepts developed for the CubeSat MOVE-II over the past two years, enabling dependable computing without relying solely upon hardened special purpose hardware. Neither component level, nor hardware- or software-side measures individually can guarantee sufficient system consistency with modern highly scaled components. I...
Conference Paper
Future space missions will require vast amounts of data to be stored and processed aboard spacecraft. While satisfying operational mission requirements, storage systems must guarantee data integrity and recover damaged data throughout the mission. NAND-flash memories have become popular for space-borne high performance mass memory scenarios, though...
Conference Paper
A satellite’s on-board computer must guarantee integrity and recover degraded or damaged data over the entire duration of the spacecraft’s mission in an extreme, radiated environment. While redundancy and hardware-side voting can protect Magnetoresistive RAM well from device failure, more sophisticated software-side storage concepts are required if...
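Hardware-side voting over redundant memory copies is usually a bitwise 2-out-of-3 majority vote, as in triple modular redundancy. The following is a minimal software sketch of that idea, not the paper's storage concept. It assumes three copies stored as integers, and the helper names `majority_vote` and `scrub` are hypothetical.

```python
from typing import List

def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority: an output bit is set iff at least two copies set it."""
    return (a & b) | (a & c) | (b & c)

def scrub(copies: List[int]) -> List[int]:
    """Hypothetical scrubbing pass: write the voted value back to all three copies."""
    voted = majority_vote(*copies)
    return [voted, voted, voted]

original = 0b1011_0010
corrupted = original ^ 0b0000_1000  # single bit flip in the second copy
print(bin(majority_vote(original, corrupted, original)))  # 0b10110010
```

A single upset in any one copy is masked because the vote is taken per bit. Even two copies with flips in different bit positions are still corrected. Flips in the same bit of two copies, or permanent device failure, need the more elaborate measures that the abstract alludes to.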
Article
This article highlights the issue of upcoming wider single-instruction, multiple-data units as well as steadily increasing core counts on contemporary and future processor architectures. We present the recent port to and latest results of cache-oblivious algorithms and implementations of our TifaMMy code on four architectures: SGI's UltraVi...
Conference Paper
The number of cores in future CPUs is expected to increase steadily. Balanced CPU designs scale hardware cache coherency functionality according to the number of cores, in order to minimize bottlenecks in parallel applications. An alternative approach is to do away with hardware coherence entirely; the Single-chip Cloud Computer (SCC), a 48 core ex...
Conference Paper
As large scale distributed systems gather and share more and more computing nodes and storage resources, their energy consumption is exponentially increasing. Next generation computing and data centers might require tens of megawatts to be feasible. Thus designing more efficient systems is a major challenge for computer engineers. This challenge is tw...
Conference Paper
In recent years, general purpose x86 architectures have undergone significant modifications towards high performance computing capabilities. Lately, technologies like wider vector units or Fused Multiply-Add (FMA) instructions, which were mainly known from GPU architectures, have been introduced. In this paper, we examine the performance of current x...
Conference Paper
Two electrical engineering applications from industry partners dealing with sparse matrices were analyzed regarding cache efficiency and scalability on modern multi-core systems. Two different contemporary multi-core architectures have been investigated, namely Intel’s Westmere and AMD’s Magny-Cours. This paper can be regarded as a continuation of...
Conference Paper
In this paper, we present the recent port to and latest results of our cache-oblivious algorithms and implementations of parallel LU decomposition code TifaMMy on two new architectures: SGI's UltraViolet distributed shared memory machine, and Intel's latest x86 architecture Sandy Bridge. TifaMMy's matrix multiplication and LU decomposition routines...
Article
This paper compares various contemporary multicore-based microprocessor architectures from different vendors with different memory interconnects regarding performance, speedup, and parallel efficiency. Sparse matrix decomposition is used as a benchmark application. The example matrix used in the experiments comes from an electrical engineering appl...
Article
In this paper we present a framework for automatic detection and application of the best binding between threads of a running parallel application and processor cores in a shared memory system, by making use of hardware performance counters. This is especially important within the scope of multicore architectures with shared cache levels. We demons...
Chapter
In contrast to just a few years ago, the answer to the question “What system should we buy next to best assist our users” has become a lot more complicated for the operators of an HPC center today. In addition to multicore architectures, powerful accelerator systems have emerged, and the future looks heterogeneous. In this paper, we will concentrat...
Conference Paper
Cache-obliviousness represents an important but relatively new concept for cache optimization. As cache-oblivious algorithms perform well on architectures with arbitrary cache configurations, the programming effort required for porting and optimizing for future architectures can be significantly reduced. In [8] and [9], fast parallel cache-obliviou...
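Cache-oblivious algorithms get their locality from recursive subdivision rather than from block sizes tuned to a particular cache. The recursion scheme used in TifaMMy is not reproduced here. This sketch substitutes the simpler textbook divide-and-conquer multiplication, which splits each matrix into quadrants down to a small base case. It assumes square matrices whose size is a power of two. The names `rec_matmul`, `matmul`, and the `BASE` threshold are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
BASE = 2  # hypothetical base-case size; below it, use the naive triple loop

def rec_matmul(A: Matrix, B: Matrix, C: Matrix,
               ai: int, aj: int, bi: int, bj: int,
               ci: int, cj: int, n: int) -> None:
    """C[ci:ci+n, cj:cj+n] += A[ai:ai+n, aj:aj+n] @ B[bi:bi+n, bj:bj+n]."""
    if n <= BASE:
        for i in range(n):
            for k in range(n):
                a = A[ai + i][aj + k]
                for j in range(n):
                    C[ci + i][cj + j] += a * B[bi + k][bj + j]
        return
    h = n // 2
    # Quadrant (x, y) of C accumulates A(x, z) * B(z, y) for z in {0, 1}.
    for x in (0, 1):
        for y in (0, 1):
            for z in (0, 1):
                rec_matmul(A, B, C,
                           ai + x * h, aj + z * h,
                           bi + z * h, bj + y * h,
                           ci + x * h, cj + y * h, h)

def matmul(A: Matrix, B: Matrix) -> Matrix:
    n = len(A)  # assumed square with n a power of two
    C = [[0.0] * n for _ in range(n)]
    rec_matmul(A, B, C, 0, 0, 0, 0, 0, 0, n)
    return C
```

No cache parameter appears in the code. At some recursion depth the three sub-blocks fit into each cache level, whatever its size, which is what makes the scheme oblivious.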
Conference Paper
This paper compares various contemporary multi-core based microprocessor architectures with different memory interconnects regarding performance, speedup, and parallel efficiency. Sparse matrix operations are used as a benchmark application from the area of electrical engineering. Within this context, thread to core pinning and cache optimization...
Article
In recent years, a trend towards multi-core architectures with a growing number of cores for all standard instruction set architectures can be observed. To utilize the full potential of such novel microprocessor architectures, applications running on them must be efficiently parallelized and carefully analyzed regarding runtime, speedup, and parall...
Conference Paper
The efficient use of multicore architectures for sparse matrix-vector multiplication (SpMV) is currently an open challenge. One algorithm which makes use of SpMV is the maximum likelihood expectation maximization (MLEM) algorithm. When using MLEM for positron emission tomography (PET) image reconstruction, one requires a particularly large matrix....
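Each MLEM iteration combines a forward projection, which is an SpMV with the system matrix A, with a back projection, which is an SpMV with Aᵀ. The following is a minimal sketch assuming CSR storage and the standard MLEM update x ← x / (Aᵀ1) · Aᵀ(y / (Ax)). The function names are hypothetical. None of the multicore optimizations the paper investigates are attempted here.

```python
from typing import List, Tuple

CSR = Tuple[List[float], List[int], List[int]]  # (values, column indices, row pointers)

def spmv(A: CSR, x: List[float]) -> List[float]:
    """Forward product y = A x for a CSR matrix."""
    vals, cols, rowptr = A
    return [sum(vals[k] * x[cols[k]] for k in range(rowptr[i], rowptr[i + 1]))
            for i in range(len(rowptr) - 1)]

def spmv_t(A: CSR, y: List[float], ncols: int) -> List[float]:
    """Transposed product x = A^T y, scattering each row into its columns."""
    vals, cols, rowptr = A
    out = [0.0] * ncols
    for i in range(len(rowptr) - 1):
        for k in range(rowptr[i], rowptr[i + 1]):
            out[cols[k]] += vals[k] * y[i]
    return out

def mlem(A: CSR, y: List[float], ncols: int, iters: int) -> List[float]:
    sens = spmv_t(A, [1.0] * len(y), ncols)   # sensitivity image A^T 1
    x = [1.0] * ncols                          # uniform positive start image
    for _ in range(iters):
        proj = spmv(A, x)                      # forward projection A x
        ratio = [yi / pi if pi > 0 else 0.0 for yi, pi in zip(y, proj)]
        back = spmv_t(A, ratio, ncols)         # back projection A^T (y / A x)
        x = [xj * bj / sj if sj > 0 else 0.0
             for xj, bj, sj in zip(x, back, sens)]
    return x
```

The transposed product scatters writes into the output vector. Parallelizing that scatter is a large part of what makes multicore SpMV for MLEM non-trivial.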
Chapter
Computers were conceived as a replacement for the mechanical solution of engineering applications using simple calculators. The basic idea of Konrad Zuse and John von Neumann was to automate this process by introducing programmability, i.e. the description of the sequence of execution of individual steps of a longer algorithm including the possibil...
Conference Paper
In today’s world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Remarkable progress in CPU architecture, system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges for both system architects and software...
Article
An important issue when designing numerical code in High Performance Computing is cache optimization in order to exploit the performance potential of a given target architecture. This includes techniques to improve memory access locality as well as prefetching. Inherent algorithm constraints often limit the first approach, which typically uses a blo...
Conference Paper
In today’s world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Remarkable progress in CPU architecture (multi- and manycore, SMT, transactional memory, virtualization support, etc.), system scalability, and interconnect technology continues to provide new op...
Conference Paper
In today’s world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Remarkable progress in CPU architecture (multi- and manycore, SMT, transactional memory, virtualization support, etc.), system scalability, and interconnect technology continues to provide new op...
Conference Paper
This paper provides a detailed investigation of latency penalties caused by repeated memory writes to nearby memory cells from different threads in parallel programs. When such writes map to the same corresponding cache lines in multiple processors, one can observe the so-called false sharing effect. This effect can unnecessarily hamper parallel co...
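Whether writes from different threads can falsely share depends only on whether their target addresses fall into the same cache line. This minimal sketch covers that address arithmetic and the usual remedy, which is padding each per-thread element to a full line. It assumes a 64-byte line, and the helper names are hypothetical. Real code should use the line size of the target processor.

```python
from typing import Dict, Set

LINE = 64  # assumed cache-line size in bytes

def cache_line(addr: int, line: int = LINE) -> int:
    """Index of the cache line containing byte address addr."""
    return addr // line

def shared_lines(base: int, elem_size: int, n_threads: int, stride: int) -> int:
    """Count cache lines written by more than one thread when thread t
    writes the elem_size-byte element at base + t * stride."""
    owners: Dict[int, Set[int]] = {}
    for t in range(n_threads):
        start = base + t * stride
        for ln in range(cache_line(start), cache_line(start + elem_size - 1) + 1):
            owners.setdefault(ln, set()).add(t)
    return sum(1 for ts in owners.values() if len(ts) > 1)

def padded_stride(elem_size: int, line: int = LINE) -> int:
    """Smallest multiple of the line size that can hold one element."""
    return -(-elem_size // line) * line
```

For example, eight threads updating packed 8-byte counters (`shared_lines(0, 8, 8, 8)`) all write to one line. Spacing the counters by `padded_stride(8)` bytes gives each thread its own line, trading memory for the absence of coherence traffic.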
Conference Paper
This article describes conceptual issues of the tool SafeME (The Safety Modeling Environment). The tool allows for modeling a safety-critical, fault-tolerant system. Several undesirable events such as shutdown, accident, or unavailability can be defined and are analyzed within the same model. Furthermore, interrelations between these events can be defined...
Conference Paper
A strong trend towards individualized consumer parts can be observed in both the automotive and the aeronautic industry, and the demand for individual parts has increased steadily over the last years. Manufacturers who have to deliver customized geometries are searching for new production strategies or...
Conference Paper
A key aspect in the design and optimization process of high voltage apparatus is the precise simulation and geometric optimization of the electric and/or electromagnetic field distribution on electrodes and dielectrics. Since these simulations and optimizations are rather compute intensive, the engineer demands a user-friendly working environment requirin...
Conference Paper
We address several common problems with transparent checkpointing and present solutions to these problems, with focus on graphical user interfaces and thread support. We describe two possible ways to retrieve the GUI state from the X-server, by using a proxy for the X-protocol and enhancing the X-server with an extension, and we will present a perf...
Conference Paper
Traditional modeling methods for high-availability systems like Fault Trees (FT) or Reliability Block Diagrams (RBD) assume that there are no stochastic dependencies between the failure and repair behavior of the system's components. However, this assumption is over-optimistic because of numerous dependencies between the subsystems. Using the too...
Conference Paper
Implementations of algorithms based on shared memory are widely used in the area of high performance computing. The concept of computation clusters, however, does not imply the availability of a globally shared memory across nodes in that cluster. Therefore, several concepts for an emulation of a globally shared memory, called software distributed...
Article
When modeling fault-tolerant systems, state-based methods yield much more realistic results in comparison to traditional combinatorial methods. To avoid the difficult manual design of large state-based models, we advocate an approach, by which a high-level input model is used from which a semantically equivalent low-level model is automatically g...
Conference Paper
Log-based recovery protocols enable process replicas in distributed systems to replay a computation up to the point where a previous computation failed. One fundamental assumption underlying these protocols is the piecewise deterministic (PWD) execution model, stating that recovery must not execute, but simulate the execution of nondeterministic ev...
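Under the piecewise deterministic model, nondeterministic events are logged during normal execution. A replica recovering from the log does not execute those events again. It replays their logged outcomes, and the deterministic code between events then reproduces the original computation. The following minimal record/replay sketch assumes the only nondeterministic event is a draw from a random-number source. The class `LoggedSource` and the function `computation` are hypothetical and far simpler than a real recovery protocol.

```python
import random
from typing import Callable, List, Optional

class LoggedSource:
    """Record mode: execute the event and log its result.
    Replay mode: return logged results until the log is exhausted."""

    def __init__(self, source: Callable[[], float],
                 log: Optional[List[float]] = None):
        self.source = source
        self.replaying = log is not None
        self.log: List[float] = list(log) if log is not None else []
        self.pos = 0

    def draw(self) -> float:
        if self.replaying and self.pos < len(self.log):
            value = self.log[self.pos]   # simulate the event from the log
        else:
            self.replaying = False       # log exhausted: continue live
            value = self.source()        # actually execute the event
            self.log.append(value)
        self.pos += 1
        return value

def computation(src: LoggedSource, steps: int) -> float:
    """Deterministic pieces separated by nondeterministic events."""
    acc = 0.0
    for _ in range(steps):
        acc = acc * 0.5 + src.draw()
    return acc

primary = LoggedSource(random.random)
result = computation(primary, 5)
replica = LoggedSource(random.random, log=primary.log)
assert computation(replica, 5) == result  # replay reproduces the run exactly
```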
Conference Paper
Traditional modeling tools for high availability systems do not combine intuitiveness, efficiency, and modeling power under the same umbrella. For example, combinatorial methods like fault trees or reliability block diagrams are quite intuitive, allow for a stepwise refinement of the models and can be analyzed efficiently. However, their modeling pow...
Conference Paper
We describe a system which enables FPGAs to generate machine code for various CPUs, similar to a conventional assembler. Such conversion from intermediate code to a CPU's native code can be used as the last step in just-in-time compilation for virtual machines like the Java Virtual Machine. The translation system itself and the FPGA logic are indep...
Conference Paper
The use of parallel programming and architectures has become essential for simulating practical problems in engineering disciplines. The remarkable progress in CPU power, system scalability, and interconnect technology, as well as the introduction of new paradigms like computational Grids or E-Services, continues to provide new opportunities, as we...
Conference Paper
The authors recommend quantifying the security of a complex system by first quantifying the security of its components and then calculating the overall security according to a given method. This paper summarizes the state of the art of security measures for components and presents a new method for combining these measures into the...
Conference Paper
Computer systems usually rely on hardware counters and software instrumentation to acquire performance information about the cache access behavior. These approaches either provide only limited data or are restricted in their applicability. This paper introduces a novel approach based on a hardware cache monitoring facility that exhibits both the de...
Conference Paper
The increasing gap between processor and main memory performance underlines the need for cache optimizations, especially for memory-intensive applications. Tools which are able to localize code regions with high cache miss ratios seem to be appropriate for access optimizations. However, a programmer often does not know what to do with the collected i...
Conference Paper
Cluster systems interconnected via fast interconnection networks have been successfully applied to various research fields for parallel execution of large applications. Next to MPI, the conventional programming model, OpenMP is increasingly used for parallelizing sequential codes. Due to its easy programming interface and similar semantics with tra...
Conference Paper
Cache optimizations typically include code transformations to increase the locality of memory accesses. An orthogonal approach is to enable latency hiding by introducing prefetching techniques. With software prefetching, cache load instructions have to be inserted into the program code. To overcome this complexity for the programmer, modern pro...
Conference Paper
In this paper, two tools are presented: an execution driven cache simulator which relates event metrics to a dynamically built-up call-graph, and a graphical front end able to visualize the generated data in various ways. To get a general purpose, easy-to-use tool suite, the simulation approach allows us to take advantage of runtime instrumentati...
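At its core, a cache simulator of this kind feeds a stream of memory addresses through a model of the cache and counts hits and misses. The tool described here also attributes those events to a call graph. This minimal sketch covers only the cache model. It assumes a set-associative cache with LRU replacement, and all parameters and names are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable, Tuple

class SetAssocCache:
    """Set-associative cache with LRU replacement (tracks tags only, no data)."""

    def __init__(self, size: int = 32 * 1024, line: int = 64, ways: int = 8):
        self.line = line
        self.ways = ways
        self.n_sets = size // (line * ways)
        self.sets = [OrderedDict() for _ in range(self.n_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        block = addr // self.line
        s = self.sets[block % self.n_sets]
        tag = block // self.n_sets
        if tag in s:
            s.move_to_end(tag)        # mark as most recently used
            self.hits += 1
            return True
        if len(s) >= self.ways:
            s.popitem(last=False)     # evict the least recently used line
        s[tag] = None
        self.misses += 1
        return False

def simulate(trace: Iterable[int], **params) -> Tuple[int, int]:
    """Return (hits, misses) for an address trace."""
    cache = SetAssocCache(**params)
    for addr in trace:
        cache.access(addr)
    return cache.hits, cache.misses
```

For example, a sequential scan over 4096 doubles (`range(0, 4096 * 8, 8)`) misses once per 64-byte line and hits on the other seven accesses. Attributing each miss to the currently executing function would give the per-call-graph metrics the paper describes.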
Conference Paper
In this paper, a novel modeling method for highly available systems is proposed. As an input, the model accepts common reliability block diagrams, which are widely used because of their excellent manageability. However, unlike traditional solution methods for block diagrams, the proposed method also supports the attribution of the model with severa...
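The proposed method extends the traditional combinatorial solution of reliability block diagrams. For reference, that baseline assumes independent components: series blocks multiply availabilities and parallel blocks multiply unavailabilities. The following minimal sketch of that baseline assumes steady-state availability A = MTTF / (MTTF + MTTR) per component. The nested-tuple encoding of the diagram is hypothetical.

```python
from math import prod
from typing import Tuple, Union

# A block is a component availability (float) or ("series" | "parallel", [blocks]).
Block = Union[float, Tuple[str, list]]

def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability of a single repairable component."""
    return mttf / (mttf + mttr)

def rbd(block: Block) -> float:
    """Evaluate an RBD, assuming statistically independent components."""
    if isinstance(block, float):
        return block
    kind, children = block
    values = [rbd(c) for c in children]
    if kind == "series":
        return prod(values)                          # every block must be up
    if kind == "parallel":
        return 1.0 - prod(1.0 - v for v in values)   # at least one must be up
    raise ValueError(f"unknown block type: {kind}")

# Two redundant servers (A = 0.99 each) in series with one switch (A = 0.999).
system = ("series", [availability(999.0, 1.0), ("parallel", [0.99, 0.99])])
print(rbd(system))  # approximately 0.999 * 0.9999
```

The independence assumption built into `prod` is exactly what shared repair crews or failure propagation violate. The attributes this paper adds to block diagrams target those dependencies.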
Conference Paper
This work describes ViSMI, a software distributed shared memory system for cluster systems connected via InfiniBand. ViSMI implements a kind of home-based lazy release consistency protocol, which uses a multiple-writer coherence scheme to alleviate the traffic introduced by false sharing. For further performance gain, InfiniBand features and optimi...
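Multiple-writer coherence schemes are commonly built on twins and diffs. Before its first write to a shared page, a node saves a copy of the page, called the twin. At release time it sends only the bytes that differ from the twin. Two nodes writing disjoint parts of one page therefore no longer conflict, which is how such schemes reduce false-sharing traffic. The following minimal sketch is not ViSMI's code. It assumes pages are byte arrays and that the home node merges incoming diffs. The function names are hypothetical.

```python
from typing import List, Tuple

Diff = List[Tuple[int, int]]  # (offset, new byte value)

def make_twin(page: bytearray) -> bytes:
    """Pristine copy taken on the first write access to a page."""
    return bytes(page)

def make_diff(page: bytearray, twin: bytes) -> Diff:
    """Bytes modified locally since the twin was taken."""
    return [(i, new) for i, (new, old) in enumerate(zip(page, twin)) if new != old]

def apply_diff(home_page: bytearray, diff: Diff) -> None:
    """Merge one writer's diff into the home copy of the page."""
    for offset, value in diff:
        home_page[offset] = value

# Two nodes fetch the same page and write disjoint halves of it.
home = bytearray(8)
node_a = bytearray(home); twin_a = make_twin(node_a); node_a[0:4] = b"AAAA"
node_b = bytearray(home); twin_b = make_twin(node_b); node_b[4:8] = b"BBBB"
apply_diff(home, make_diff(node_a, twin_a))
apply_diff(home, make_diff(node_b, twin_b))
print(bytes(home))  # b'AAAABBBB'
```

In a home-based protocol like the one named in the abstract, the merged home copy is then the one other nodes fetch after acquiring the corresponding lock.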
Conference Paper
For the first time, the program of the EuroPVM/MPI conference series includes a special session on current trends in Numerical Simulations for Parallel Engineering Environments. Its goal is to provide a discussion forum for scientists from both the engineering disciplines and computer science in order to foster a closer cooperation between them. In...
Conference Paper
The BALANCE project aims at investigating the possibilities for improving overall system performance as well as availability of telecommunication computing systems by the use of high availability middleware. It is our belief that this approach represents a viable step towards "autonomic computing" as all decisions regarding the system's availabilit...
Conference Paper
Simulating practical problems in engineering disciplines has become a key field for the use of parallel programming environments. Remarkable progress in both CPU power and network technology has been paralleled by developments in numerical simulation and software integration resulting in the support for a large variety of engineering applications....
Conference Paper
The optimization process for most modern engineering problems involves a repeated modeling of the target system, simulating its properties, and refining the model based on the results. This process is both time and resource consuming and therefore needs to rely on a distributed resource sharing framework in order to optimally exploit the existing r...
Article
The availability of a comprehensive software infrastructure is essential for the success of a parallel architecture. In order to allow for the greatest possible flexibility, an infrastructure has to be designed in an integrated, easy-to-use manner and with the support of multiple parallel programming paradigms and models to address a wide code base...
Article
The availability of a comprehensive software infrastructure is essential for the success of a parallel architecture. In order to allow for the greatest possible flexibility, an infrastructure has to be designed in an integrated, easy-to-use manner and with the support of multiple parallel programming paradigms and models to address a wide code base...
Conference Paper
A key aspect in the design process of high voltage gear is the exact simulation of the electrostatic and/or electromagnetic field distribution for three dimensional problems. However, such simulation runs are quite compute- and communication-intensive. Therefore, clusters of commodity PCs, equipped with high-speed interconnection technologies, are...
Conference Paper
When designing high voltage equipment like power transformers, it is of essential importance to precisely and efficiently calculate eddy-current problems in a transformer to determine possible losses. A method suitable for such simulations is the Boundary-Element Method (BEM). As far as the simulation is concerned, for electrical devices operating...
Conference Paper
One of the crucial aspects in the design process of high voltage apparatus is the precise simulation of the electrostatic and/or electromagnetic field distribution in 3D domains. This paper summarizes the results obtained on the PC cluster platform installed at ABB Corporate Research using POLOPT, a state-of-the-art parallel simulation environment f...
Conference Paper
The paper proposes a novel modeling method for the evaluation of dependability measures of highly available systems. The proposed method, which has been implemented in the tool OpenSESAME (Simple but Extensive Structured Availability Modeling Environment), combines the advantages of Boolean methods and state space based methods. The tool supports t...
Conference Paper
The exchange of a computer system's components during operation can be accomplished by the so-called Hot Swap technology. This technology makes it possible to continuously run a computer system without the necessity of a shutdown for maintenance purposes, e.g. upgrading of a network adapter. Thus the overall uptime of a system can be drastically in...
Conference Paper
This paper summarizes results obtained using the parallel 3D electromagnetic field simulation program POLOPT on a cluster of PCs connected via a Scalable Coherent Interface (SCI) network interface. Compared to previous measurements carried out with conventional network technologies such as Ethernet or Fast Ethernet, a significant rise in parallel...
Conference Paper
This paper summarizes the results that were obtained using the parallel 3D electric field simulation program POLOPT on a cluster of PCs connected via Fast Ethernet. With the high performance of the CPUs and interconnection technology, the results can be compared to those obtained on multiprocessor machines. Several practical high voltage engineer...
Article
The paper deals with the numerical simulation of transient fields using the Boundary-Element Method (BEM) and the Discrete Fourier Transformation (DFT). Instead of solving the Maxwell equations for transient fields in the time domain, a numerical method using the DFT algorithm for solving in the frequency domain is developed. Impulse voltages and polarity r...
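The frequency-domain route has three steps: transform the impulse voltage with the DFT, solve the field problem once per frequency, and transform the result back. The following minimal sketch of that pipeline uses a plain O(N²) DFT. It assumes a hypothetical per-frequency response `H(f)` that stands in for the actual BEM solve. A first-order low-pass is used only for illustration. The double-exponential time constants in `impulse` are likewise hypothetical.

```python
import cmath
import math
from typing import Callable, List

def dft(x: List[complex]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X: List[complex]) -> List[complex]:
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def frequency_domain_response(u: List[float], dt: float,
                              H: Callable[[float], complex]) -> List[float]:
    """Transform u(t), scale each spectral line by H(f), transform back."""
    n = len(u)
    U = dft([complex(v) for v in u])
    # Map bin k to a signed frequency so that H(-f) = conj(H(f)) keeps the result real.
    freqs = [(k if k <= n // 2 else k - n) / (n * dt) for k in range(n)]
    return [y.real for y in idft([Uk * H(f) for Uk, f in zip(U, freqs)])]

def impulse(t: float, tau_front: float = 0.4e-6, tau_tail: float = 68e-6) -> float:
    """Unnormalized double-exponential impulse shape (hypothetical constants)."""
    return math.exp(-t / tau_tail) - math.exp(-t / tau_front)

dt = 1e-6
u = [impulse(k * dt) for k in range(64)]
tau = 5e-6  # hypothetical low-pass time constant standing in for the field solve
y = frequency_domain_response(u, dt, lambda f: 1 / (1 + 2j * math.pi * f * tau))
```

A discrete transform treats the sampled impulse as periodic. In practice, the record length must therefore be long enough for the response to decay before the window ends.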
Article
The goal of finding an optimal electric field strength distribution for arbitrary three dimensional problems can be achieved by utilizing a parametric CAD modelling system that needs to be coupled with a three dimensional electric field calculation program. These components are to be linked with a numerical optimization algorithm. The package obtai...
Conference Paper
The goal of finding an optimal electric field strength distribution for arbitrary three dimensional problems can be achieved by utilizing a parametric CAD modelling system that needs to be coupled with a three dimensional electric field calculation program. These components are to be linked with a numerical optimization algorithm. The package obtai...
Conference Paper
This paper summarizes the results that were obtained using the parallel 3D electric field simulation program POLOPT on different architectures like clusters of workstations/PCs or multiprocessor machines. With the high performance of the CPUs and interconnection technology, the results obtained on clusters of PCs can be compared to those obtained on mul...
Article
To avoid a large number of iterations, optimization of electrode shapes has been done by artificial neural networks (NN). Two practical examples have been considered, an axisymmetric single-phase GIS bus termination and an axisymmetric transformer shield ring. The shape of the electrodes has been taken as quarter-ellipse or half-ellipse because an...