Fabian Lehmann
Humboldt-Universität zu Berlin | HU Berlin · Department of Computer Science

Master of Science

About

14 Publications
1,609 Reads
48 Citations (since 2017)
14 Research Items
[Figure: research items and citations per year, 2017-2023]
Introduction
In my Ph.D. research, I focus on workflow engines, improving the execution of distributed workflows that analyze large amounts of data. In particular, my goal is to improve scheduling and data management. To understand real-world requirements, I work closely with the Earth Observation Lab at the Humboldt University of Berlin.
Additional affiliations
November 2020 - present
Humboldt-Universität zu Berlin
Position
  • PhD Student
Description
  • In my Ph.D. studies, I focus on improving the execution of large scientific workflows processing hundreds of gigabytes of data.
May 2018 - October 2020
Technische Universität Berlin
Position
  • Student Assistant
Description
  • In my student job, I worked with time-series data in the DIGINET-PS project; for example, we predicted parking-slot occupancy.
September 2016 - October 2016
Reflect IT Solutions GmbH
Position
  • Developer
Description
  • During a semester internship, I helped develop the backend of a construction-progress-management system.
Education
October 2018 - November 2020
Technische Universität Berlin
Field of study
  • Information Systems Management
October 2015 - February 2019
Technische Universität Berlin
Field of study
  • Information Systems Management

Publications

Publications (14)
Preprint
Scientific workflow management systems (SWMSs) and resource managers together ensure that tasks are scheduled on provisioned resources so that all dependencies are obeyed, and some optimization goal, such as makespan minimization, is fulfilled. In practice, however, there is no clear separation of scheduling responsibilities between an SWMS and a resource manager...
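As a minimal, purely illustrative sketch of this split (assumed names and policies, not the preprint's system): a toy workflow engine releases tasks in dependency order while a toy resource manager independently places them on nodes, so neither layer sees the other's scheduling state.

# Illustrative sketch only: scheduling responsibility split across two layers.
class ResourceManager:
    """Places each submitted task on the currently least-loaded node (its own policy)."""
    def __init__(self, nodes):
        self.load = {n: 0.0 for n in nodes}

    def place(self, task, runtime):
        node = min(self.load, key=self.load.get)  # placement decision
        self.load[node] += runtime
        return node

class WorkflowEngine:
    """Releases tasks whose dependencies are satisfied (its own policy)."""
    def __init__(self, deps):
        self.deps = deps  # task -> set of prerequisite tasks

    def ready(self, done):
        return [t for t, pre in self.deps.items() if t not in done and pre <= done]

deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
runtimes = {"a": 3.0, "b": 2.0, "c": 4.0, "d": 1.0}
engine, rm = WorkflowEngine(deps), ResourceManager(["node1", "node2"])
done = set()
while len(done) < len(deps):
    for task in engine.ready(done):
        print(task, "->", rm.place(task, runtimes[task]))
        done.add(task)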
Preprint
Scientific workflows consist of thousands of highly parallelized tasks executed in a distributed environment involving many components. Automatic tracing and investigation of the components' and tasks' performance metrics, traces, and behavior are necessary to support the end user with a level of abstraction since the large amount of data cannot be...
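As a small illustration of the raw signal such automatic tracing starts from (an assumed sketch, not the preprint's tooling), a decorator can record each task invocation's runtime and peak memory for later aggregation:

# Illustrative sketch only: capturing per-task runtime and peak-memory traces.
import functools
import time
import tracemalloc

TRACE = []  # records of (task name, runtime in seconds, peak memory in bytes)

def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            runtime = time.perf_counter() - start
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            TRACE.append((fn.__name__, runtime, peak))
    return wrapper

@traced
def preprocess(n):
    return sum(range(n))

preprocess(1_000_000)
print(TRACE)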
Preprint
Scientific workflows typically comprise a multitude of different processing steps which often are executed in parallel on different partitions of the input data. These executions, in turn, must be scheduled on the compute nodes of the computational infrastructure at hand. This assignment is complicated by the facts that (a) tasks typically have hig...
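A hypothetical illustration of such an assignment (a common greedy earliest-finish-time heuristic, not necessarily the preprint's algorithm): each ready task is placed on the node where it would finish earliest, respecting both dependencies and heterogeneous per-node runtimes.

# Illustrative sketch only: greedy earliest-finish-time assignment of a tiny DAG.
deps = {"split": set(), "map1": {"split"}, "map2": {"split"}, "join": {"map1", "map2"}}
runtime = {  # the same task runs differently on each (hypothetical) node
    ("split", "cpu"): 2.0, ("split", "gpu"): 2.0,
    ("map1", "cpu"): 8.0, ("map1", "gpu"): 3.0,
    ("map2", "cpu"): 8.0, ("map2", "gpu"): 3.0,
    ("join", "cpu"): 1.0, ("join", "gpu"): 1.5,
}
free_at = {"cpu": 0.0, "gpu": 0.0}  # when each node becomes idle
finish = {}                         # task -> finish time
scheduled = set()
while len(scheduled) < len(deps):
    ready = [t for t, pre in deps.items() if t not in scheduled and pre <= scheduled]
    for task in sorted(ready):
        est = max((finish[p] for p in deps[task]), default=0.0)  # earliest start
        node = min(free_at, key=lambda n: max(free_at[n], est) + runtime[(task, n)])
        start = max(free_at[node], est)
        finish[task] = start + runtime[(task, node)]
        free_at[node] = finish[task]
        scheduled.add(task)
        print(f"{task}: {node} [{start:.1f}, {finish[task]:.1f}]")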
Preprint
Many scientific workflow scheduling algorithms need to be informed about task runtimes a priori to conduct efficient scheduling. In heterogeneous cluster infrastructures, this problem becomes aggravated because these runtimes are required for each task-node pair. Using historical data is often not feasible as logs are typically not retained indefinitely...
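As a toy example of how a scheduler could be informed without exhaustive per-pair logs (an assumed illustration, not the preprint's method), a task's runtime on one node can be extrapolated from a few observed input-size/runtime samples:

# Illustrative sketch only: extrapolating runtime from input size for one task-node pair.
import numpy as np

sizes = np.array([1.0, 2.0, 4.0, 8.0])      # input sizes in GB (hypothetical samples)
times = np.array([12.0, 23.0, 45.0, 90.0])  # observed runtimes in seconds

slope, intercept = np.polyfit(sizes, times, deg=1)  # fit runtime ~ slope * size + intercept
print(f"predicted runtime for 6 GB: {slope * 6.0 + intercept:.1f} s")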
Conference Paper
Full-text available
Creating, maintaining, and operating software artifacts is a long-standing challenge. Various management strategies have been developed and are frequently used. Nevertheless, a unified way of describing these management strategies so that they can be compared remains an open question. We present ßMACH as an answer. ßMACH allows systematic descriptions and checks indepe...
Presentation
Full-text available
ßMACH—A Software Management Guidance
Conference Paper
Full-text available
Modern Earth Observation (EO) often analyses hundreds of gigabytes of data from thousands of satellite images. This data is usually processed with hand-made scripts combining several tools that implement the various steps of such an analysis. A fair amount of geographers' work goes into optimization, tuning, and parallelization in such a setting...
Conference Paper
Full-text available
Continuous integration and deployment are established paradigms in modern software engineering. Both intend to ensure the quality of software products and to automate the testing and release process. Today's state of the art, however, focuses on functional tests or small microbenchmarks such as single-method performance, while the overall quality of...
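A minimal sketch of this idea (illustrative only, not the paper's tooling): a micro-benchmark gate in a CI job that times a workload and fails the build when it regresses past a recorded baseline.

# Illustrative sketch only: a CI performance gate with a hypothetical baseline.
import timeit

BASELINE_S = 0.050  # hypothetical runtime recorded for the previous release
THRESHOLD = 1.20    # tolerate up to a 20 % slowdown

def workload():
    return sorted(range(100_000), reverse=True)

# Best of five repetitions, each running the workload ten times.
runtime = min(timeit.repeat(workload, number=10, repeat=5)) / 10
print(f"runtime: {runtime * 1000:.1f} ms")
if runtime > BASELINE_S * THRESHOLD:
    raise SystemExit("performance regression: runtime exceeds 120 % of baseline")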

Projects

Project (1)
Project
FONDA investigates methods for increasing productivity in the development, execution, and maintenance of Data Analysis Workflows for large scientific data sets. We approach the underlying research questions from a fundamental perspective, aiming to find new abstractions, models, and algorithms that can eventually form the basis of a new class of future infrastructures for Data Analysis Workflows.