About
290 Publications
47,676 Reads
4,380 Citations
Introduction
Current institution
Additional affiliations
August 2006 - present
April 2002 - August 2006
October 1995 - March 2002
Education
October 1995 - December 1997
October 1993 - September 1995
October 1991 - September 1993
Publications (290)
Distributed dataflow systems like Spark and Flink enable data-parallel processing of large datasets on clusters of cloud resources. Yet, selecting appropriate computational resources for dataflow jobs is often challenging. For efficient execution, individual resource allocations, such as memory and CPU cores, must meet the specific resource demands...
Distributed dataflow systems like Spark and Flink enable data-parallel processing of large datasets on clusters. Yet, selecting appropriate computational resources for dataflow jobs is often challenging. For efficient execution, individual resource allocations, such as memory and CPU cores, must meet the specific resource requirements of the job. A...
Operation and maintenance of large distributed cloud applications can quickly become unmanageably complex, putting human operators under immense stress when problems occur. Utilizing machine learning for identification and localization of anomalies in such systems supports human experts and enables fast mitigation. However, due to the various inter...
Logs are extensively used during the development and maintenance of software systems. They collect runtime events and allow tracking of code execution, which enables a variety of critical tasks such as troubleshooting and fault detection. However, large-scale software systems generate massive volumes of semi-structured log records, posing a major c...
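As a rough illustration of the log-parsing problem this abstract describes (a generic sketch, not the method proposed in the work), semi-structured log records can be grouped into templates by masking variable tokens:

```python
import re
from collections import defaultdict

def to_template(line: str) -> str:
    """Mask variable parts (IPs, hex IDs, numbers) so similar events share a template."""
    line = re.sub(r"\d+\.\d+\.\d+\.\d+", "<IP>", line)   # IPv4 addresses
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)      # hex identifiers
    line = re.sub(r"\d+", "<NUM>", line)                 # remaining numbers
    return line

logs = [
    "Connection from 10.0.0.5 port 51234",
    "Connection from 10.0.0.7 port 51240",
    "Worker 3 finished task 42 in 317 ms",
]

groups = defaultdict(list)
for line in logs:
    groups[to_template(line)].append(line)

for template, members in groups.items():
    print(f"{len(members)}x  {template}")
```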
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users that communicate, compute, and store information. Therefore, timely and accurate anomaly detection is necessary for reliability, security, safe operation, and mitigation of losses in these increasingly important systems. Recently, the evolu...
Distributed data processing systems like MapReduce, Spark, and Flink are popular tools for analysis of large datasets with cluster resources. Yet, users often overprovision resources for their data processing jobs, while the resource usage of these jobs also typically fluctuates considerably. Therefore, multiple jobs usually get scheduled onto the...
Edge computing was introduced as a technical enabler for the demanding requirements of new network technologies like 5G. It aims to overcome challenges related to centralized cloud computing environments by distributing computational resources to the edge of the network towards the customers. The complexity of the emerging infrastructures increases...
The emergence of the Internet of Things has seen the introduction of numerous connected devices used for the monitoring and control of even Critical Infrastructures. Distributed stream processing has become key to analyzing data generated by these connected devices and improving our ability to make decisions. However, optimizing these systems towar...
Fault tolerance needs deeper consideration when dealing with streaming jobs that require high levels of availability and low-latency processing even in case of failures, where Quality-of-Service constraints must be adhered to. Typically, systems achieve fault tolerance and the ability to recover automatically from partial failures b...
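The checkpoint-based recovery alluded to here can be pictured with a minimal single-process sketch (illustrative only; distributed snapshotting in actual stream processors is considerably more involved, and the file name and interval below are arbitrary):

```python
import os
import pickle

CKPT = "job_state.ckpt"  # hypothetical checkpoint file

def process(events, every=3):
    """Fold a stream into running state, checkpointing every `every` events;
    after a crash, a restart resumes from the last snapshot instead of from scratch."""
    start, state = 0, {"sum": 0}
    if os.path.exists(CKPT):                  # recover from a previous run
        with open(CKPT, "rb") as f:
            start, state = pickle.load(f)
    for i in range(start, len(events)):
        state["sum"] += events[i]
        if (i + 1) % every == 0:
            with open(CKPT, "wb") as f:       # durable snapshot of offset + state
                pickle.dump((i + 1, state), f)
    return state

print(process(list(range(10))))               # {'sum': 45}, also after a restart
```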
Embedded systems have been used to control physical environments for decades. Usually, such use cases require low latencies between commands and actions as well as a high predictability of the expected worst-case delay. To achieve this on small, low-powered microcontrollers, Real-Time Operating Systems (RTOSs) are used to manage the different tasks...
Rotating machines like engines, pumps, or turbines are ubiquitous in modern day societies. Their mechanical parts such as electrical engines, rotors, or bearings are the major components and any failure in them may result in their total shutdown. Anomaly detection in such critical systems is very important to monitor the system's health. As the req...
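As a toy example of the kind of vibration monitoring this abstract refers to (a generic spectral check, not the system described), a shift of the dominant frequency in a vibration signal can indicate a developing fault:

```python
import numpy as np

def dominant_frequency(signal: np.ndarray, fs: float) -> float:
    """Return the frequency (Hz) with the largest spectral magnitude."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC component

fs = 1000.0                                      # assumed sampling rate in Hz
t = np.arange(0, 1.0, 1.0 / fs)
healthy = np.sin(2 * np.pi * 50 * t)                     # 50 Hz rotation
faulty = healthy + 1.5 * np.sin(2 * np.pi * 120 * t)     # strong 120 Hz defect band

for name, sig in [("healthy", healthy), ("faulty", faulty)]:
    print(name, dominant_frequency(sig, fs), "Hz")       # 50.0 Hz vs 120.0 Hz
```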
Forecasting of time series in continuous systems becomes an increasingly relevant task due to recent developments in IoT and 5G. The popular forecasting model ARIMA has been applied to a large variety of applications for decades. An online variant of ARIMA applies the Online Newton Step in order to learn the underlying process of the time series. This op...
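A compact sketch of the idea (a simplified online AR(p) learner with an Online Newton Step style update; the hyperparameters and the reduction from full ARIMA are assumptions, not taken from the paper):

```python
import numpy as np

def online_ar_ons(series, p=3, gamma=1.0, eps=1.0):
    """Predict each value from the last p observations; after each prediction,
    update the coefficients with a Newton-style step using a running Hessian."""
    w = np.zeros(p)                        # AR coefficients, learned online
    A = eps * np.eye(p)                    # regularized Hessian approximation
    preds = []
    for t in range(p, len(series)):
        x = series[t - p:t][::-1]          # most recent observation first
        preds.append(w @ x)
        g = 2 * (preds[-1] - series[t]) * x    # gradient of the squared loss
        A += np.outer(g, g)
        w -= (1.0 / gamma) * np.linalg.solve(A, g)
    return np.array(preds)

rng = np.random.default_rng(0)
ts = np.sin(np.linspace(0, 20, 300)) + 0.1 * rng.standard_normal(300)
preds = online_ar_ons(ts)
print("MSE:", np.mean((preds - ts[3:]) ** 2))
```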
Edge computing was introduced as a technical enabler for the demanding requirements of new network technologies like 5G. It aims to overcome challenges related to centralized cloud computing environments by distributing computational resources to the edge of the network towards the customers. The complexity of the emerging infrastructures increase...
Artificial Intelligence for IT Operations (AIOps) is an emerging interdisciplinary field arising in the intersection between the research areas of machine learning, big data, streaming analytics, and the management of IT operations. AIOps, as a field, is a candidate to produce the future standard for IT operation management. To that end, AIOps has...
This book constitutes the proceedings of the 19th International Conference on Service-Oriented Computing, ICSOC 2020, which was held virtually in November 2021. The 29 full, 28 short, and 3 vision papers included in this volume were carefully reviewed and selected from 189 submissions. They were organized in topical sections named: Blockchains and s...
Microservices represent a popular paradigm to construct large-scale applications in many domains thanks to benefits such as scalability, flexibility, and agility. However, it is difficult to manage and operate a microservice system due to its high dynamics and complexity. In particular, the frequent updates of microservices lead to the absence of h...
Microservice architectures are increasingly adopted to design large-scale applications. However, the highly distributed nature and complex dependencies of microservices complicate automatic performance diagnosis and make it challenging to guarantee service level agreements (SLAs). In particular, identifying the culprits of a microservice performanc...
Data are often formed of multiple modalities, which jointly describe the observed phenomena. Modeling the joint distribution of multimodal data requires larger expressive power to capture high-level concepts and provide better data representations. However, multimodal generative models based on variational inference are limited due to the lack of f...
The detection of anomalies is an essential mining task for the security and reliability of computer systems. Logs are a common and major data source for anomaly detection methods in almost every computer system. They collect a range of significant events describing the runtime system status. Recent studies have focused predominantly on one-class deep...
The rapid growth and distribution of IT systems increases their complexity and aggravates operation and maintenance. To sustain control over large sets of hosts and the connecting networks, monitoring solutions are employed and constantly enhanced. They collect diverse key performance indicators (KPIs) (e.g. CPU utilization, allocated memory, etc.)...
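A minimal sketch of scoring such KPIs against recent history (a plain rolling z-score baseline; the window size and threshold are arbitrary, and the paper's actual method is not shown):

```python
import math
from collections import deque

class RollingZScore:
    """Flag a KPI sample as anomalous if it deviates strongly from a sliding window."""
    def __init__(self, window=60, threshold=3.0):
        self.values = deque(maxlen=window)
        self.threshold = threshold

    def update(self, x: float) -> bool:
        flag = False
        if len(self.values) >= 10:                 # wait for some history
            mean = sum(self.values) / len(self.values)
            var = sum((v - mean) ** 2 for v in self.values) / len(self.values)
            flag = abs(x - mean) / (math.sqrt(var) or 1e-9) > self.threshold
        self.values.append(x)
        return flag

detector = RollingZScore()
cpu = [20 + (i % 5) for i in range(100)] + [95]    # steady load, then a spike
flags = [detector.update(v) for v in cpu]
print("anomaly at index:", flags.index(True))      # 100
```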
Especially in context of critical urban infrastructures, trust in IoT data is of utmost importance. While most technology stacks provide means for authentication and encryption of device-to-cloud traffic, there are currently no mechanisms to rule out physical tampering with an IoT device's sensors. Addressing this gap, we introduce a new method for...
Information and communication technologies have accompanied our everyday life for years. A steadily increasing number of computers, cameras, mobile devices, etc. generate more and more data, but at the same time we realize that the data can only partially be analyzed with classical approaches. The research and development of methods based on artifi...
Distributed data-parallel processing systems like MapReduce, Spark, and Flink are popular for analyzing large datasets using cluster resources. Resource management systems like YARN or Mesos in turn allow multiple data-parallel processing jobs to share cluster resources in temporary containers. Often, the containers do not isolate resource usage to...
Software architecture is undergoing a transition from monolithic architectures to microservices to achieve resilience, agility and scalability in software development. However, with microservices it is difficult to diagnose performance issues due to technology heterogeneity, large number of microservices, and frequent updates to both software feat...
Artificial Intelligence for IT Operations (AIOps) describes the process of maintaining and operating large IT infrastructures using AI-supported methods and tools on different levels. This includes automated anomaly detection and root cause analysis, remediation and optimization, as well as fully automated initiation of self-stabilizing activities....
The emerging field of Artificial Intelligence for IT Operations (AIOps) utilizes monitoring data, big data platforms, and machine learning, to automate operations and maintenance (O&M) tasks in complex IT systems. The available research data usually contain only a single source of information, often logs or metrics. The inability of the single-sour...
As a result of the many technical advances in microcomputers and mobile connectivity, the Internet of Things (IoT) has been on the rise in the recent decade. Due to the broad spectrum of applications, networks facilitating IoT scenarios can be of very different scale and complexity. Additionally, connected devices are uncommonly heterogeneous, incl...
Huge amounts of data are collected every millisecond all around the world. This ranges from images and videos to an increasing amount of sensor data. Thus, it becomes difficult for humans to decide on the most important features. Yet reducing the feature vector is an important and necessary task to achieve higher precision in classification ta...
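A standard baseline for this kind of feature reduction is principal component analysis; a minimal NumPy sketch (generic, not the approach of the publication):

```python
import numpy as np

def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Project samples onto the k directions of largest variance."""
    Xc = X - X.mean(axis=0)                         # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                            # scores in the reduced space

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))                  # 200 samples, 50 raw features
print(X.shape, "->", pca_reduce(X, k=5).shape)      # (200, 50) -> (200, 5)
```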
In the area of computer system programming, theoretical concepts at the interface between hardware and software solutions need to be explained. Even though this topic is highly practical, as it covers the working principles of any modern machine like computers and smartphones, it is still difficult to provide easy access to practical experience, as the gap bet...
Artificial intelligence based methods for the operation of IT systems (AIOps) support the process of maintaining and operating large IT infrastructures on different levels, e.g. anomaly detection, root cause analysis, or initiation of self-stabilizing activities. The foundation for the deployment of such methods is extensive and reliable metric data...
Technologies like machine-to-machine communication, autonomous driving or virtual reality applications form an increasingly diverse service landscape. This entails individual and dynamic requirements regarding scalability, availability, latency or throughput from the underlying IT infrastructure. To meet those, telecommunication and network provi...
Cloud computing provides companies with large-scale access to virtual resources, offering cost-efficient and flexible usage of digital resources at any time. Thus, companies digitalize their dedicated hardware solutions into virtualized services, which can run in a cloud environment. For example, telecommunication providers move their IP multimedia subsys...
Virtualization offers cost-efficient usage of digital resources. Thus, dedicated hardware solutions are transferred into virtualized services running in the cloud. One example of such softwarization of hardware is the IP multimedia subsystem, which telecommunication system providers are currently moving to the cloud. The dedicated hardware solutions provid...
Future services in fields like autonomous driving and virtual reality rely on cloud computing resources located at the edge of Internet Service Provider (ISP) networks. Instead of deploying many service-specific monitoring and reliability platforms, a centralized monitoring solution can reduce the usage of the already sparse edge cloud resources. Th...
Reliable deployment of services is especially challenging in virtualized infrastructures, where the deep technological stack and the multitude of components necessitate automatic anomaly detection and remediation mechanisms. Traditional monitoring solutions observe the system and generate alarms when the collected metrics exceed predefined thresho...
Telecommunication system providers move their IP multimedia subsystems to virtualized services in the cloud. For such systems, dedicated hardware solutions provided a reliability of 99.999% in the past. Although virtualization offers more cost-efficient usage of such services, it comes with higher complexity for providing reliably running software...
The articles in this special section focus on space and terrestrial integrated networks. Currently, many aerial platforms, satellite systems, and space and terrestrial integrated networks (STINs) have been developed, while some of them are still under construction. The basic idea of STIN is to simply connect heterogeneous devices, systems, and netw...
Virtualization technologies have proven to be important drivers for the fast and cost-efficient development and deployment of services. While the benefits are tremendous, there are many challenges to be faced when developing or porting services to virtualized infrastructure. Especially critical applications like Virtualized Network Functions must m...
Cardiac diseases like myocardial infarction, which can result in cardiac death, are still a relevant topic. To achieve recognition in early stages, long-term ECG monitoring devices are used. Such devices produce large amounts of data, either directly streamed or stored in databases. Manual analysis of this data by experts is inefficient. Thus...
Information overload in the medical field is visible both in the increased number of publications and in the volume of patient data. In order to cope with this problem, we propose a novel framework combining patients' health records with medical knowledge, which is based on medical algorithms from frequently used guidelines. The framework us...
Hadoop has been widely used for data analytic tasks in various domains. At the same time, data volume is expected to grow even further in the next years. Hadoop recently introduced the concept of Archival Storage, an automated tiered storage technique for increasing storage capacity for long-term storage. However, Hadoop Distributed File System's scal...
Many network device vendors provide vendor-specific VLAN-based access solutions for WLAN clients. These applications allow network operators to specify WLAN devices which automatically fall into their department-specific networks and allow them to access their local resources, e.g. printers. The configuration of these VLAN mappings is...
Virtualization as a key IT technology has developed to a predominant model in data centers in recent years. The flexibility regarding scaling-out and migration of virtual machines for seamless maintenance has enabled a new level of continuous operation and changed service provisioning significantly. Meanwhile, services from domains striving for hig...
Critical services in the field of Network Function Virtualization require elaborate reliability and high availability mechanisms to meet the high service quality requirements. Traditional monitoring systems detect overload situations and outages in order to automatically scale out services or mask faults. However, faults are often preceded by anoma...
The performance of scalable analytic frameworks supporting data-intensive parallel applications often depends significantly on the time it takes to read input data. Therefore, existing frameworks like Spark and Flink try to achieve a high degree of data locality by scheduling tasks on nodes where the input data resides. However, the set of nodes ru...
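The locality-aware scheduling described here can be sketched as a greedy placement that prefers nodes holding a task's input block (hypothetical data structures; not the actual Spark or Flink scheduler):

```python
def place_tasks(tasks, block_locations, free_slots):
    """Assign each task to a node, preferring nodes that store its input block."""
    assignment = {}
    for task, block in tasks.items():
        local = [n for n in block_locations.get(block, []) if free_slots.get(n, 0) > 0]
        anywhere = [n for n, s in free_slots.items() if s > 0]
        node = (local or anywhere)[0]     # fall back to any free node
        free_slots[node] -= 1
        assignment[task] = node
    return assignment

tasks = {"t1": "blockA", "t2": "blockB", "t3": "blockA"}
block_locations = {"blockA": ["node1", "node2"], "blockB": ["node3"]}
free_slots = {"node1": 1, "node2": 1, "node3": 1}
print(place_tasks(tasks, block_locations, free_slots))
# {'t1': 'node1', 't2': 'node3', 't3': 'node2'} -- all placements are data-local
```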
Nowadays, the energy consumption of data centers is one of the biggest challenges for reducing operational expenditure (OPEX) and the carbon dioxide footprint. Most efforts investigate the modernization of air conditioning and server hardware, but also the optimization of resource allocations. Moreover, virtual servers are migrated from on...
Implementing Internet of Things (IoT) applications is tightly coupled to challenges like sensor integration, sensor management, semantics, heterogeneity, or abstraction. Irrespective of the challenges, a proliferation of IoT-related applications and solutions can be observed. This leads to an ever-expanding amount of sensors and devices deployed in...
The IEEE Intercloud project aims to facilitate intercloud interoperability and portability. While topology elements and basic security and trust models have been developed during the last years, an adequate base protocol for the intercloud communication is still required. This protocol has to be extensible and should eliminate limitations of HTTP...
This paper describes Aura, a parallel dataflow engine for analysis of large-scale datasets on commodity clusters. Aura allows program plans to be composed from relational operators and second-order functions, provides automatic program parallelization and optimization, and offers a scalable and efficient runtime. Furthermore, Aura provides dedicated suppor...
This paper introduces a Software Defined Networking (SDN) based cloud federation approach, which enables a horizontal network federation between different cloud providers, based on a federated SDN layer. Thus, a new cloud federation agent is introduced, designed, and exemplarily evaluated. The described agent is managing the...
Many Big Data applications in science and industry have arisen, that require large amounts of streamed or event data to be analyzed with low latency. This paper presents a reactive strategy to enforce latency guarantees in data flows running on scalable Stream Processing Engines (SPEs), while minimizing resource consumption. We introduce a model fo...
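The reactive strategy can be pictured as a feedback loop that adjusts operator parallelism when measured latency drifts from its target (a schematic sketch with made-up thresholds; the paper's latency model and SPE integration are not reproduced):

```python
def rescale(parallelism, measured_ms, target_ms, headroom=0.8, max_parallelism=64):
    """Scale out when latency exceeds the target; scale in when well below it."""
    if measured_ms > target_ms and parallelism < max_parallelism:
        return parallelism + 1            # enforce the latency guarantee
    if measured_ms < headroom * target_ms and parallelism > 1:
        return parallelism - 1            # release resources no longer needed
    return parallelism

p = 4
for latency in [120, 140, 95, 60, 55]:    # measured end-to-end latencies (ms)
    p = rescale(p, latency, target_ms=100)
    print(f"latency={latency}ms -> parallelism={p}")
```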
The appeal of MapReduce has spawned a family of systems that implement or extend it. In order to enable parallel collection processing with User-Defined Functions (UDFs), these systems expose extensions of the MapReduce programming model as library-based dataflow APIs that are tightly coupled to their underlying runtime engine. Expressing data anal...
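The second-order functions such APIs expose can be illustrated with the classic word count, written here in plain Python rather than any particular engine's dataflow API:

```python
from functools import reduce
from itertools import groupby

lines = ["to be or not to be", "to see or not to see"]

# map: a user-defined function emits (word, 1) pairs
pairs = [(word, 1) for line in lines for word in line.split()]

# shuffle: group the pairs by key
pairs.sort(key=lambda kv: kv[0])
grouped = groupby(pairs, key=lambda kv: kv[0])

# reduce: a user-defined function sums the counts per word
counts = {key: reduce(lambda acc, kv: acc + kv[1], vals, 0) for key, vals in grouped}
print(counts)   # {'be': 2, 'not': 2, 'or': 2, 'see': 2, 'to': 4}
```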
While the cloud paradigm becomes increasingly popular for dynamic resource allocation, the flexibility of a cloud is still limited regarding network services and their autonomous federation between different providers. The following architectural approach introduces a generic layered model to orchestrate and federate heterogeneous networks. In par...
This paper introduces a cloud-federation agent which enables a horizontal network federation between different cloud providers, based on Software Defined Networking (SDN). Furthermore, tenants, using the cloud's Infrastructure as a Service (IaaS) model, have fine-grained access to the resources via an exposed OpenFlow interface, deployed on top o...
Software defined networking is becoming more and more popular in the networking community, but is still missing its triumphal procession into existing networks. Especially data centers could benefit from this evolutionary network paradigm and get rid of many legacy parts which are still blocking the evolution of how their networks work in general. I...
SaaS applications often face a vendor or technical lock-in due to PaaS provider specific specifications, like cloud management APIs. As a solution, this paper presents a novel approach for developing applications more PaaS provider independent. In particular, the approach illustrates advantages of extending JavaEE application servers with a new con...
Software Defined Networking decouples network services from the underlying physical hardware, so agile and secure networks can be built, moved, replaced, and programmatically provided on demand. However, the full range of Software Defined Networking capabilities is not utilized in today's cloud middlewares, especially dynamic Quality of Service...
Managing replicated data in distributed systems that is concurrently accessed by multiple sites is a complex task, because consistency must be ensured. In this paper, we present the Replicated Convergent Data Containers (RCDCs) - a set of distributed data structures that coordinate replicated data and allow for optimistic inserts, updates and delet...
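The coordination-free flavor of such replicated containers can be illustrated with a state-based grow-only counter, a standard CRDT whose replicas always converge under merge (a generic example; the RCDC design itself is not detailed in this excerpt):

```python
class GCounter:
    """Grow-only counter: concurrent increments at different sites merge deterministically."""
    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}                             # per-replica increment totals

    def increment(self, n: int = 1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter"):
        for rid, c in other.counts.items():          # element-wise maximum
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter("siteA"), GCounter("siteB")
a.increment(3); b.increment(2)                       # concurrent updates at two sites
a.merge(b); b.merge(a)                               # replicas exchange state
print(a.value(), b.value())                          # both converge to 5
```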
The paper argues for the need to provide novel methods and tools to support software developers aiming to optimise energy efficiency and minimise the carbon footprint resulting from designing, developing, deploying and running software in Clouds, while maintaining other quality aspects of software at adequate and agreed levels. A cloud architecture to...
The conception and implementation of an on-premise cloud storage is a significant, but also necessary step in the development of modern data centers. Providing seamless access to data from multiple devices and from any network-connected location world-wide is a mandatory requirement for the data management and also expected by users, as most of the...
We present Stratosphere, an open-source software stack for parallel data analysis. Stratosphere brings together a unique set of features that allow the expressive, easy, and efficient programming of analytical applications at very large scale. Stratosphere’s features include “in situ” data processing, a declarative query language, treatment of user...