All content in this area was uploaded by Florian Katerndahl on May 27, 2022
Humboldt-Universität zu Berlin | Department of Geography | Earth Observation Lab | Unter den Linden 6 | D-10099 Berlin | Tel.: +49.30.2093.6905 | Fax: +49.30.2093.6848 | e-mail: patrick.hostert@geo.hu-berlin.de
Geography Department
Florian Katerndahl¹*, Dirk Pflugmacher¹, Fabian Lehmann², Andreas Janz¹, Ulf Leser², Patrick Hostert¹
¹ Humboldt-Universität zu Berlin | Department of Geography | Earth Observation Lab | Unter den Linden 6 | D-10099 Berlin
² Humboldt-Universität zu Berlin | Institute for Computer Science | Knowledge Management in Bioinformatics | Unter den Linden 6 | D-10099 Berlin
*contact: florian.katerndahl@geo.hu-berlin.de | https://www.geographie.hu-berlin.de/en/professorships/eol
Geoflow
Novel Workflow Implementations To Facilitate Big EO Data Workflows in Nextflow
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 414984028 – SFB 1404 FONDA
Challenge
●Earth observation (EO) analysis workflows lack reusability and scalability across large areas
●EO analysis workflows combine tasks with heterogeneous resource requirements
●Coupling specific input data, processing back-ends and execution environments creates software and/or hardware infrastructure lock-in
●As a layer of abstraction, workflow engines promise accessible development of portable, adaptable and dependable processing environments (Lehmann et al. 2021; Leser et al. 2021)
Objectives
●Develop a Nextflow workflow that leverages a broad range of existing, widely used open-source tools and programs
●Map annual land cover between 2000 and 2020 across Germany using Landsat time series and the harmonized Europe-wide Land Use and Coverage Area frame Survey (LUCAS) (d’Andrimont et al. 2020)
Nextflow
●Domain-agnostic workflow (execution) engine with its own Groovy-based DSL and connectors to resource managers
●Originated in bioinformatics (Di Tommaso et al. 2017) and has also been used in EO applications (Lehmann et al. 2021)
Workflow Engines
●Help scientists organize and execute workflows
●Take care of errors and dependencies
●Sequences of tasks are declared in a domain-specific language (DSL)
●Create and schedule physical tasks
●Scaling across computational infrastructure is handled by resource managers (e.g. Kubernetes)
●Yield repeatable and portable workflows
Workflows
●Wrap single or multiple program calls (e.g. script execution) into abstract tasks
●Dependencies between tasks are defined via files
●A task can start once all inputs are available
●Independent tasks may run in parallel
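The task model described under Workflows (tasks linked by files, started once their inputs exist, run in parallel when independent) can be illustrated with a minimal Python sketch. It is not how Nextflow or Geoflow is implemented. The task names, file names and the `schedule` helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """An abstract task: a name plus the files it reads and writes."""
    name: str
    inputs: frozenset
    outputs: frozenset


def schedule(tasks, available_files):
    """Group tasks into waves.

    A task is ready once all of its input files are available.
    All tasks within one wave are independent of each other and
    could therefore run in parallel.
    """
    available = set(available_files)
    pending = list(tasks)
    waves = []
    while pending:
        ready = [t for t in pending if t.inputs <= available]
        if not ready:
            names = [t.name for t in pending]
            raise ValueError(f"tasks with unsatisfiable inputs: {names}")
        waves.append([t.name for t in ready])
        for t in ready:
            available |= t.outputs  # outputs become inputs for later tasks
        pending = [t for t in pending if t not in ready]
    return waves


# Hypothetical EO example: two scenes are preprocessed independently,
# then mosaicked, then classified with reference samples.
tasks = [
    Task("preprocess_a", frozenset({"scene_a.tif"}), frozenset({"ard_a.tif"})),
    Task("preprocess_b", frozenset({"scene_b.tif"}), frozenset({"ard_b.tif"})),
    Task("mosaic", frozenset({"ard_a.tif", "ard_b.tif"}), frozenset({"mosaic.tif"})),
    Task("classify", frozenset({"mosaic.tif", "samples.csv"}), frozenset({"land_cover.tif"})),
]

waves = schedule(tasks, {"scene_a.tif", "scene_b.tif", "samples.csv"})
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

In this sketch the two preprocessing tasks land in the same wave because neither reads a file the other writes. A real workflow engine would additionally submit each ready task to a resource manager and handle failures, which is omitted here.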
References
d’Andrimont, R., Yordanov, M., Martinez-Sanchez, L., Eiselt, B., Palmieri, A., Dominici, P., Gallego, J., Reuter, H. I., Joebges, C., Lemoine, G., and van der Velde, M. (2020). “Harmonised LUCAS in-situ land cover and use database for field surveys from 2006 to 2018 in the European Union”. In: Scientific Data 7.1, p. 352. doi: 10.1038/s41597-020-00675-z.
Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., and Notredame, C. (2017). “Nextflow enables reproducible computational workflows”. In: Nature Biotechnology 35.4, pp. 316–319. doi: 10.1038/nbt.3820.
EnMAP-Box Developers (2019). EnMAP-Box 3 - A QGIS Plugin to process and visualize hyperspectral remote sensing data. URL: https://enmap-box.readthedocs.io.
Frantz, D. (2019). “FORCE—Landsat + Sentinel-2 Analysis Ready Data and Beyond”. In: Remote Sensing 11.9, p. 1124. doi: 10.3390/rs11091124. url: http://www.mdpi.com/2072-4292/11/9/1124.
Lehmann, F., Frantz, D., Becker, S., Leser, U., and Hostert, P. (2021). “FORCE on Nextflow: Scalable Analysis of Earth Observation data on Commodity Clusters”. In: Proceedings of the CIKM 2021 Workshops. url: http://ceur-ws.org/Vol-3052/short12.pdf.
Leser, U., Hilbrich, M., Draxl, C., Eisert, P., Grunske, L., Hostert, P., Kainmüller, D., Kao, O., Kehr, B., Kehrer, T., Koch, C., Markl, V., Meyerhenke, H., Rabl, T., Reinefeld, A., Reinert, K., Ritter, K., Scheuermann, B., Schintke, F., Schweikardt, N., and Weidlich, M. (November 2021). “The Collaborative Research Center FONDA”. In: Datenbank-Spektrum 1610-1995. doi: 10.1007/s13222-021-00397-5.