Fabian Lehmann
Humboldt-Universität zu Berlin | HU Berlin · Department of Computer Science
Master of Science
14 Research Items
In my Ph.D. research, I focus on workflow engines, improving the execution of distributed workflows while analyzing large amounts of data. In particular, my goal is to improve scheduling and data management. Therefore, I work closely with the Earth Observation Lab at the Humboldt University of Berlin to understand real-world requirements.
November 2020 - present
- PhD Student
- In my Ph.D. studies, I focus on improving the execution of large scientific workflows processing hundreds of gigabytes of data.
May 2018 - October 2020
- Student Assistant
- In my student job, we worked with time-series data in DIGINET-PS. For example, we predicted parking-slot occupancy.
September 2016 - October 2016
Reflect IT Solutions GmbH
- In my semester term work, I helped to develop the backend for a construction-progress-management system.
October 2018 - November 2020
October 2015 - February 2019
Scientific workflow management systems (SWMSs) and resource managers together ensure that tasks are scheduled on provisioned resources so that all dependencies are obeyed, and some optimization goal, such as makespan minimization, is fulfilled. In practice, however, there is no clear separation of scheduling responsibilities between an SWMS and a r...
Scientific workflows consist of thousands of highly parallelized tasks executed in a distributed environment involving many components. Automatic tracing and investigation of the components' and tasks' performance metrics, traces, and behavior are necessary to support the end user with a level of abstraction since the large amount of data cannot be...
Scientific workflows typically comprise a multitude of different processing steps which often are executed in parallel on different partitions of the input data. These executions, in turn, must be scheduled on the compute nodes of the computational infrastructure at hand. This assignment is complicated by the facts that (a) tasks typically have hig...
Many scientific workflow scheduling algorithms need to be informed about task runtimes a-priori to conduct efficient scheduling. In heterogeneous cluster infrastructures, this problem becomes aggravated because these runtimes are required for each task-node pair. Using historical data is often not feasible as logs are typically not retained indefin...
Creating, maintaining, and operating software artifacts is a long ongoing challenge. Various management strategies have been developed and are frequently used. Nevertheless, a unification of describing the management strategies to compare them is an open question. We present ßMACH as an answer. ßMACH allows systematic descriptions and checks indepe...
ßMACH—A Software Management Guidance
Modern Earth Observation (EO) often analyses hundreds of gigabytes of data from thousands of satellite images. This data usually is processed with hand-made scripts combining several tools implementing the various steps within such an analysis. A fair amount of geographers' work goes into optimization, tuning, and parallelization in such a setting....
Continuous integration and deployment are established paradigms in modern software engineering. Both intend to ensure the quality of software products and to automate the testing and release process. Today's state of the art, however, focuses on functional tests or small microbenchmarks such as single method performance while the overall quality of...
FONDA investigates methods for increasing productivity in the development, execution, and maintenance of Data Analysis Workflows for large scientific data sets. We approach the underlying research questions from a fundamental perspective, aiming to find new abstractions, models, and algorithms that can eventually form the basis of a new class of future infrastructures for Data Analysis Workflows.