Helmut Neukirchen’s research while affiliated with University of Iceland and other places


Publications (69)


Figures and tables from this publication:
  • Figure 1: Comparison of successive halving in time (top) and halving in time combined with doubling in space (bottom). Each line corresponds to the learning curve of a single HPO combination.
  • HPO for the CV application case, trained for 40 epochs on 64 GPUs on the JURECA-DC-GPU system. Results are averaged over three random seeds. Better results (↑ or ↓ depending on the metric) are underlined.
  • HPO for the CFD application case, trained for 40 epochs on 128 GPUs on the JUWELS BOOSTER module. Results are averaged over three random seeds. Better results (↑ or ↓ depending on the metric) are underlined.
  • Large-scale HPO for the CFD application case, evaluating 64 configurations, trained for a maximum of 20 epochs on 1,024 GPUs on the JUWELS BOOSTER module, including relative improvement to the HPO run on 128 GPUs.

Resource-Adaptive Successive Doubling for Hyperparameter Optimization with Large Datasets on High-Performance Computing Systems
  • Preprint
  • File available

December 2024 · 17 Reads

Helmut Neukirchen · [...]

On High-Performance Computing (HPC) systems, several hyperparameter configurations can be evaluated in parallel to speed up the Hyperparameter Optimization (HPO) process. State-of-the-art HPO methods follow a bandit-based approach and build on top of successive halving, where the final performance of a combination is estimated from a lower-fidelity performance metric obtained before full training, and more promising combinations are assigned more resources over time. Frequently, the number of epochs is treated as a resource, letting more promising combinations train longer. Another option is to treat the number of workers as a resource and directly allocate more workers to more promising configurations via data-parallel training. This article proposes a novel Resource-Adaptive Successive Doubling Algorithm (RASDA), which combines a resource-adaptive successive doubling scheme with the plain Asynchronous Successive Halving Algorithm (ASHA). Scalability of this approach is shown on up to 1,024 Graphics Processing Units (GPUs) on modern HPC systems. It is applied to different types of Neural Networks (NNs) trained on large datasets from the Computer Vision (CV), Computational Fluid Dynamics (CFD), and Additive Manufacturing (AM) domains, where performing more than one full training run is usually infeasible. Empirical results show that RASDA outperforms ASHA by a factor of up to 1.9 in terms of runtime. At the same time, the solution quality of the final ASHA models is maintained or even surpassed by the implicit batch-size scheduling of RASDA. With RASDA, systematic HPO is applied to a terabyte-scale scientific dataset for the first time in the literature, enabling efficient optimization of complex models on massive scientific data. The implementation of RASDA is available at https://github.com/olympiquemarcel/rasda.
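
The scheduling idea from the abstract can be illustrated with a small toy sketch: a synchronous variant of successive halving in which the configurations surviving each rung additionally have their worker count doubled. This is only an illustration under simplifying assumptions, not the RASDA implementation (which is asynchronous and ASHA-based; see the repository above), and train_and_evaluate() is a hypothetical placeholder for a data-parallel training job.

```python
# Illustrative toy sketch (not the RASDA implementation): synchronous successive
# halving in "time" combined with successive doubling in "space" (worker count).
import random

def train_and_evaluate(config, n_workers, n_epochs):
    # Placeholder: a real implementation would launch data-parallel training
    # on n_workers GPUs and report a validation metric after n_epochs.
    return random.random()

def halving_with_doubling(configs, n_workers=8, epochs_per_rung=5):
    while len(configs) > 1:
        results = [(c, train_and_evaluate(c, n_workers, epochs_per_rung)) for c in configs]
        results.sort(key=lambda pair: pair[1], reverse=True)
        # Halving: keep only the better half of the configurations ...
        configs = [c for c, _ in results[: max(1, len(results) // 2)]]
        # Doubling: reassign the freed GPUs so every survivor trains on twice
        # as many workers in the next rung instead of leaving resources idle.
        n_workers *= 2
    return configs[0], n_workers

if __name__ == "__main__":
    candidates = [{"lr": 10 ** -i, "batch_size": 2 ** (5 + i)} for i in range(1, 5)]
    best, final_workers = halving_with_doubling(candidates)
    print("best configuration:", best, "final worker count:", final_workers)
```

The point of the doubling step is that resources released by eliminated configurations are immediately reassigned to the surviving ones rather than sitting idle.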






Reproducible Cross-border High Performance Computing for Scientific Portals

September 2022 · 28 Reads

To reproduce eScience, several challenges need to be solved: scientific workflows need to be automated; the involved software versions need to be provided in an unambiguous way; input data needs to be easily accessible; and High-Performance Computing (HPC) clusters are often involved, where, to achieve bit-to-bit reproducibility, it might even be necessary to execute the code on a particular cluster to avoid differences caused by different HPC platforms (and unless this is a scientist's local cluster, it needs to be accessed across (administrative) borders). Preferably, to allow even inexperienced users to (re-)produce results, all of this should be user-friendly. While some easy-to-use web-based scientific portals already support access to HPC resources, this typically only covers computing and data resources that are local. Using the example of two community-specific portals in the fields of biodiversity and climate research, we present a solution for accessing remote HPC (and cloud) compute and data resources from scientific portals across borders, involving rigorous container-based packaging of the software versions and setup automation, thus enhancing reproducibility.
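
As a rough illustration of the container-based packaging idea, the sketch below runs one workflow step inside a version-pinned container image on a remote cluster over SSH. It is an assumption-laden example, not the portal implementation described in the paper: the host name, image tag, and script path are placeholders, and a real setup would use the portal's own job submission and authentication machinery.

```python
# Illustrative sketch only (not the portal implementation from the paper):
# reproducing a containerized analysis step on a remote HPC login node over SSH.
import subprocess

REMOTE = "user@hpc.example.org"          # placeholder remote cluster, reached across borders
IMAGE = "docker://python:3.11.9"         # software version pinned via an explicit image tag
COMMAND = "python3 /data/analysis.py"    # placeholder for the actual workflow step

def run_remote_containerized(remote: str, image: str, command: str) -> int:
    """Run `command` inside `image` on `remote` using Apptainer/Singularity."""
    ssh_cmd = ["ssh", remote, f"apptainer exec {image} {command}"]
    return subprocess.run(ssh_cmd, check=False).returncode

if __name__ == "__main__":
    exit_code = run_remote_containerized(REMOTE, IMAGE, COMMAND)
    print("remote job finished with exit code", exit_code)
```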


Accelerating Hyperparameter Tuning of a Deep Learning Model for Remote Sensing Image Classification

July 2022 · 120 Reads · 7 Citations

Deep Learning models have proven necessary for dealing with the challenges posed by the continuous growth of data volume acquired from satellites and the increasing complexity of new Remote Sensing applications. To obtain the best performance from such models, it is necessary to fine-tune their hyperparameters. Since the models might have a massive number of parameters that need to be tuned, this process requires many computational resources. In this work, a method to accelerate hyperparameter optimization on a High-Performance Computing system is proposed. The data batch size is increased during training, leading to a more efficient execution on Graphics Processing Units. The experimental results confirm that this method reduces the runtime of the hyperparameter optimization step by a factor of 3 while achieving the same validation accuracy as a standard training procedure with a fixed batch size.
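
The batch-size growth idea can be sketched in a few lines, assuming PyTorch and a toy model and dataset; the concrete schedule (doubling at fixed epochs) is a hypothetical example and not the schedule used in the paper.

```python
# Illustrative sketch: grow the data batch size during training so that later
# epochs make more efficient use of the GPU, while early epochs keep the noisier
# gradients of small batches. Model, data, and schedule are placeholders.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 20), torch.randint(0, 2, (1024,)))
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Hypothetical schedule: double the batch size every few epochs.
batch_schedule = {0: 32, 4: 64, 8: 128}

batch_size = batch_schedule[0]
for epoch in range(12):
    if epoch in batch_schedule:
        batch_size = batch_schedule[epoch]
        # Rebuilding the DataLoader is the simplest way to change the batch size.
        loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: batch size {batch_size}, last loss {loss.item():.3f}")
```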



Figures from this publication:
  • Fig. 1. Heterogenous Modular Supercomputer Architecture
  • Fig. 2. Scalable and Diverse Application Workload Examples of the MSA
  • Fig. 3. Remote Sensing applications taking advantage of the MSA ensuring conceptual interoperability with Clouds.
  • Fig. 4. Health science applications taking advantage of the MSA enabling seamless access for non-technical medical experts.

Practice and Experience in using Parallel and Scalable Machine Learning with Heterogenous Modular Supercomputing Architectures

June 2021 · 617 Reads · 12 Citations

We observe a continuously increasing use of Deep Learning (DL) as a specific type of Machine Learning (ML) for data-intensive problems (i.e., 'big data') that requires powerful computing resources with equally increasing performance. Consequently, innovative heterogeneous High-Performance Computing (HPC) systems based on multi-core CPUs and many-core GPUs require an architectural design that addresses the requirements of end-user communities taking advantage of ML and DL. Still, the workloads of end-user communities in the simulation sciences (e.g., using numerical methods based on known physical laws) need to be equally supported by those architectures. This paper offers insights into the Modular Supercomputer Architecture (MSA) developed in the Dynamic Exascale Entry Platform (DEEP) series of projects to address the requirements of both simulation sciences and data-intensive sciences such as High Performance Data Analytics (HPDA). It shares insights into implementing the MSA at the Jülich Supercomputing Centre (JSC), which hosts Europe's No. 1 supercomputer, the Jülich Wizard for European Leadership Science (JUWELS). We augment the technical findings with experience and lessons learned from two application-community case studies (i.e., remote sensing and health sciences) using the MSA with JUWELS and the DEEP systems in practice. Thus, the paper provides details of specific MSA design elements that enable significant performance improvements of ML and DL algorithms. While this paper focuses on MSA-based HPC systems and application experience, we do not lose sight of advances in Cloud Computing (CC) and Quantum Computing (QC) relevant for ML and DL.


CHICOM: Code for comparing weighted or unweighted histograms in Fortran-77, C++, R and Python

August 2019 · 17 Reads · 3 Citations

Computer Physics Communications

An improved program that calculates test statistics to compare weighted and unweighted histograms is presented, with implementations in Fortran-77, C++, R and Python. The code calculates test statistics for histograms with either normalized or unnormalized weights of events.

New version program summary
  • Program Title: CHICOM
  • Program Files doi: http://dx.doi.org/10.17632/424sd4fhj8.1
  • Licensing provisions: GPLv3
  • Programming language: Fortran-77, C++, Python, R
  • Journal reference of previous version: CHICOM: A code of tests for comparing unweighted and weighted histograms and two weighted histograms, N. D. Gagunashvili, Comput. Phys. Commun. 183 (2012) 193–196
  • Does the new version supersede the previous version?: Yes
  • Reasons for the new version: To use an improved version of the chi-square test with better statistical properties, instead of the median statistic [3] used in the previous version.
  • Summary of revisions: An improved test statistic [2] is used that builds on an improved chi-square test [4]. The algorithm has been implemented in four commonly used programming languages (Fortran-77, C++, Python and R).
  • Nature of problem: The program calculates test statistics for comparing weighted or unweighted histograms.
  • Solution method: An improved test statistic for comparing weighted histograms is calculated using the formulas presented in Ref. [2]. To obtain the test statistic, the probability that minimizes the sum of the goodness-of-fit test statistics of the individual histograms must be found. To do so, the Polak–Ribière conjugate gradient method is used to converge to the minimum from an initial guess suggested in the article.

References:
[1] N. D. Gagunashvili, Comput. Phys. Commun. 183 (2012) 193.
[2] N. D. Gagunashvili, Eur. Phys. J. Plus 132 (2017) 196.
[3] N. D. Gagunashvili, Nucl. Instrum. Meth. A 596 (2008) 439.
[4] N. D. Gagunashvili, J. Instrum. 10 (2015) P05004.
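
For orientation, the sketch below shows the classical chi-square homogeneity test for two unweighted histograms; it is not the improved CHICOM statistic of Ref. [2], and the function name is illustrative only. For the weighted case, the minimization described under "Solution method" could, for instance, be carried out with SciPy's nonlinear conjugate gradient optimizer (scipy.optimize.minimize with method="CG"), which implements a Polak–Ribière variant.

```python
# Illustrative sketch: classical chi-square homogeneity test for two UNWEIGHTED
# histograms (not the improved CHICOM statistic from Ref. [2]).
import numpy as np
from scipy.stats import chi2

def chi2_two_unweighted(n, m):
    """Compare two unweighted histograms given as arrays of bin counts."""
    n, m = np.asarray(n, float), np.asarray(m, float)
    N, M = n.sum(), m.sum()
    mask = (n + m) > 0                       # ignore bins empty in both histograms
    stat = np.sum((M * n[mask] - N * m[mask]) ** 2 / (N * M * (n[mask] + m[mask])))
    dof = mask.sum() - 1                     # degrees of freedom: usable bins minus one
    return stat, chi2.sf(stat, dof)          # test statistic and p-value

if __name__ == "__main__":
    h1 = [25, 40, 60, 30, 10]
    h2 = [20, 45, 55, 35, 15]
    stat, p = chi2_two_unweighted(h1, h2)
    print(f"chi2 = {stat:.2f}, p-value = {p:.3f}")
```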


Citations (47)


... In high-performance computing (HPC) environments, AI models excel in real-time processing and decision-making (Riedel et al., 2023). AI frameworks, such as the Unique AI Framework (UAIF), allow for rapid scaling and frequent updates, significantly outperforming traditional models in time-sensitive applications like healthcare and financial forecasting (Riedel et al., 2023). ...

Reference:

The Impact of Artificial Intelligence on Predictive Customer Behaviour Analytics in E-commerce: A Comparative Study of Traditional and AI-driven Models
Enabling Hyperparameter-Tuning of AI Models for Healthcare using the CoE RAISE Unique AI Framework for HPC
  • Citing Conference Paper
  • May 2023

... Recent trends have popularized numerous techniques in machine learning for modeling, simulation, and the solution of differential equations. Machine learning models being used in this area are essentially powerful function approximators [15]–[17]. Although these algorithms utilize general-purpose processors, they are still computationally intensive. ...

Facilitating Collaboration in Machine Learning and High-Performance Computing Projects with an Interaction Room
  • Citing Conference Paper
  • October 2022

... As the GNS starts out small and increases over time, this justifies the usage of larger batch sizes during the later part of training. This approach has also been used successfully for HPO and scheduling tasks in the past [42,2]. ...

Accelerating Hyperparameter Tuning of a Deep Learning Model for Remote Sensing Image Classification

... The swift advancement in high-performance computing, cluster computing, and computer hardware has led to the prominence of heterogeneous systems as the backbone of distributed computing [1]. Presently, most large-scale computing platforms are a blend of various computing devices, including CPUs, GPUs, and DSPs, each with distinct structures and capabilities [2]. ...

Practice and Experience in using Parallel and Scalable Machine Learning with Heterogenous Modular Supercomputing Architectures

... Our RESNET-50 studies mentioned above use 128 GPUs for many hours; hence, we believe that for the time being RS researchers still need to rely on cost-free HPC computational time grants to make this feasible, unless specific cooperations are formed with vendors. Such HPC grants are provided by e-infrastructures such as the Partnership for Advanced Computing in Europe (PRACE) in the EU (which, e.g., includes free-of-charge A100 GPUs in JUWELS) or the Extreme Science and Engineering Discovery Environment (XSEDE) in the US. Our experience further reveals that free CC resources of commercial vendors typically have drawbacks; for example, Google Colaboratory assigns varying types of GPUs, which makes it relatively hard to perform proper speed-up studies, not to mention the missing possibility to interconnect GPUs for large-scale usage. ...

Scalable Workflows for Remote Sensing Data Processing with the Deep-Est Modular Supercomputing Architecture
  • Citing Conference Paper
  • July 2019

... A series of incremental deformation steps (Δu_{s1}, Δu_{s2}, Δu_{s3}, …) needs to be set in the process of the numerical calculation. Under the condition that the state variables (τ_n, u^p_{sn}) after n incremental steps are known, the state variables of the (n + 1)-th incremental step are solved [47,48]. The flow chart shown in Fig. 7 is used to compile the specific calculation. ...

CHICOM: Code for comparing weighted or unweighted histograms in Fortran-77, C++, R and Python
  • Citing Article
  • August 2019

Computer Physics Communications

... A is on the edge elements and V on the nodal elements. For inter-process communication and parallel processing, the open-source FEA utilizes the standardized Message Passing Interface (MPI), which makes it possible to run analyses in multi-core as well as in multiprocessor environments [16]. The superior benefit of open-source FEA software is that it can use as many cores as any desktop or supercomputer can offer. ...

Scientific workflows applied to the coupling of a continuum (Elmer v8.3) and a discrete element (HiDEM v1.0) ice dynamic model

... In recent years, artificial intelligence (AI) applications have made amazing advances in many areas such as computer vision, speech and language processing, security and cyber-security [11], business intelligence, and robotics. AI is also used in the environmental domain for remote sensing and forecasting [12]. The success of neural networks and the rise of deep learning [13] is attributed to the availability of: ...

Automated Analysis of Remotely Sensed Images Using the Unicore Workflow Management System

... In this paper, we present a discovery model architecture for CPSs based on the Web of Things (WoT) that includes different capabilities in a faceted way: (1) discover proactively in a pull configuration, (2) recommending devices and services such as other Discovery Services using AI, (3) federated searches through a federation of Discovery Services, and (4) query expansion. Furthermore, as the discovery model architecture is based on the WoT, it can be used with other implementations and environments as long as the services are described following the Thing Description (TD) structure. The discovery model architecture with two of four capabilities (i.e., proactive discovery and recommendation) has been implemented in a real example scenario of a smart room with different topologies using Edge Computing facilities. ...

Towards Federated Service Discovery and Identity Management in Collaborative Data and Compute Cloud Infrastructures

Journal of Grid Computing

... Several WMS have been implemented to support physical and data modelling applications, but there are fewer known cases where machine learning scenarios, such as classification methods, have been catered for. In our previous work [2], we proposed to automate only the cross-validation phase of the Support Vector Machine (SVM) [3], which is one of the most used classifiers for analyzing and extracting information from remote sensing data; in this earlier work, we used a standards-based parameter sweep model and the HPC middleware UNICORE [4] for the cross-validation. This paper goes one step further by extending the workflow to the model generation (i.e., training) and prediction (i.e., classification) phases by using UNICORE's more advanced workflow management capabilities and its graphical client, the UNICORE Rich Client (URC) [5]. ...

Facilitating efficient data analysis of remotely sensed images using standards-based parameter sweep models
  • Citing Conference Paper
  • July 2017