For the most part, this report focuses on evaluating and improving the performance of simulation and data analysis applications, as well as of data management and visualization solutions. This topic appears throughout most of the chapters, which present the issue from various perspectives. In general, the analysis and the obtained results are presented from the point of view of the pilot applications. This approach was necessary due to the specificity of the implementation solutions used within the use cases. Another assumption was to use the existing infrastructure for testing, both the infrastructure offered by project partners and that made available through cooperation with the PRACE project. It should be noted, however, that a distinction was made between the so-called standard (regular) infrastructure and the novel infrastructure. Only the first is the subject of this report; the investigation of the second is included in the D5.8 report.
Our benchmark suite complements mature generic benchmark suites such as HPL, HPCG, or SPEC, as well as benchmark suites for global systems science applications such as the CoeGSS benchmark suites, in several ways. It introduces a simple and portable methodology for collecting and reporting benchmark results, a wide set of automation scripts, and a rich set of benchmarks for CPUs and GPUs that cover computationally expensive data pre-/post-processing and simulation kernels in the data flows of the HiDALGO pilots.
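To make the idea of a portable result-collection methodology concrete, the sketch below assembles one self-describing benchmark record per run; the field names, the `benchmark_record` helper, and the example values are illustrative assumptions, not the project's actual schema:

```python
import json
import platform
import time

def benchmark_record(app, kernel, nodes, cores, runtime_s, extra=None):
    """Assemble one portable benchmark result entry (field names illustrative)."""
    record = {
        "application": app,
        "kernel": kernel,
        "nodes": nodes,
        "cores": cores,
        "runtime_s": runtime_s,
        "hostname": platform.node(),
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    record.update(extra or {})
    return record

# One JSON line per run keeps results easy to diff, append, and aggregate.
entry = json.dumps(benchmark_record("flee", "agent-step", 4, 512, 73.2),
                   sort_keys=True)
```

Storing such records as JSON Lines keeps reporting independent of any particular system or tool, which is one way the "simple and portable" goal could be met.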
In the HiDALGO Centre of Excellence (CoE; duration Nov 2018 - Feb 2022), about 60 scientists from different disciplines worked together. A total of 13 partner institutions from seven countries were involved. One of the main challenges was to create a working environment in which the different disciplines interact in an optimal and constructive way. Consequently, an internal community-building process, addressing almost all scientists, management, working groups and tasks, was designed and successfully implemented. In this report, the project team describes and reflects on 11 different measures and actions across four areas: management processes, research work, staff development and transfer. These measures were used to support collaboration within the research teams. The authors elaborate the aspects of internal community building within the HiDALGO project in a general way, so that they can be adapted and applied to other research and technology projects.
This document represents the final report of WP6, concluding its developments. This report presents (i) the final set of requirements and their KPIs, (ii) the Artificial Intelligence (AI) enabled use case workflows, and highlights (iii) the final outcomes of the integration process. It should be noted that all respective objectives have been successfully completed.
This deliverable presents the final version of the HiDALGO Portal and its operations. First of all, it presents the main portal features and the selected architecture, and points out the changes from the second version of this deliverable. Noteworthy changes affected the following services and tools:
• Single Sign-On,
• workflow orchestrator: changes were made to support the pre-processing of ECMWF weather data in the ECMWF Cloud and the installation of the latest version of Cloudify,
• training: now includes more courses to disseminate the HiDALGO technology within the community, and the GUI was customized with the HiDALGO favicon,
• data management,
• visualization,
• Wiki: provides a public page for interaction with the HiDALGO community,
• notebooks,
• matchmaking: extends the functionality of its previous version to refer users by means of a machine-learning algorithm,
• marketplace,
• a new frontend and documentation.
Data scientists spend much time on data cleaning tasks, and this is especially important when dealing with data gathered from sensors, as finding failures is not unusual (there is an abundance of research on anomaly detection in sensor data). This work analyzes several aspects of the data generated by different sensor types to understand particularities in the data, linking them with existing data mining methodologies. Using data from different sources, this work analyzes how the type of sensor used and its measurement units have an important impact on basic statistics such as the variance and mean, because of the statistical distributions of the datasets. The work also analyzes the behavior of outliers, how to detect them, and how they affect the equivalence of sensors, as equivalence is used in many solutions for identifying anomalies. Based on the previous results, the article presents guidance on how to deal with data coming from sensors, in order to understand the characteristics of sensor datasets, and proposes a parallelized implementation. Finally, the article shows that the proposed decision-making processes work well with a new type of sensor and that parallelizing with several cores enables calculations to be executed up to four times faster.
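The combination of per-sensor outlier detection and multi-core parallelization described above can be sketched as follows; the Tukey-fence rule and the one-worker-per-sensor split are illustrative assumptions, not the article's exact method:

```python
import numpy as np
from multiprocessing import Pool

def iqr_outliers(values, k=1.5):
    """Flag readings outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

def outliers_per_sensor(readings, processes=2):
    """Run the outlier test for each sensor's series in a separate worker."""
    names = list(readings)
    series = [np.asarray(readings[n], dtype=float) for n in names]
    with Pool(processes) as pool:
        masks = pool.map(iqr_outliers, series)
    return dict(zip(names, masks))
```

Because each sensor's series is checked independently, the work distributes naturally over cores, which is the kind of structure that can yield the multi-core speedups reported above.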
The common aim of such events is to raise mutual awareness amongst the GC and the HPC/HPDA communities. Subsequently, this deliverable analyses the current state-of-the-art of HiDALGO training activities and innovation workshops, and defines the necessary roadmap for future events. Other equally important aspects of T7.3 that are described in this document include the collection of feedback on the pilots from the stakeholders and the academic community, the interaction with other CoEs, and the identification of best practices in education and training.
The report provides information on new, promising technologies which appear on the market and could have a significant influence on the functionality and performance of the HiDALGO solutions. Furthermore, HiDALGO benchmark tests are delivered based on the available systems. This paper has two-fold implications. First, it makes an inventory of cutting-edge achievements on the technology market applicable to HiDALGO workflows. Second, it delivers a practical approach in the form of benchmarking simulation tools on recently acquired computational nodes (AMD Rome). Based on this elaboration, pilot developers will be able to select the most promising technologies and software solutions to achieve the best possible performance results.
We focus on the High Performance Computing (HPC) and High Performance Data Analytics (HPDA) processing methodologies implemented and used in the HiDALGO project to conduct computation within multi-stage, use-case-specific workflows. An introductory talk provides a view of HPC and HPDA from the HiDALGO perspective, listing the infrastructure and tools that have been used, together with information on the benchmarks, profiling and co-design the consortium has carried out (as a source of information about scalability). This introduction also presents how these tools are applied in HiDALGO to solve use-case-related problems, showing methods for big data handling on two different use cases that can be applied to any data.
This document provides the initial strategies for optimising applications and implementing novel algorithms and methods. In particular, initial strategies for coupling applications in conjunction with WP4 are provided. With respect to our approach to High Performance Data Analytics (HPDA), this document does not aim to introduce and test applications developed within project activities. Rather, it is intended to make an inventory of the HPDA infrastructure and tools, and to check their capabilities in order to verify whether fundamental requirements are completely fulfilled. Later in the project lifetime, when specific HPDA methods have been defined and implemented, the findings of this deliverable will serve as indicators for selecting the best tools and approaches for implementation, as well as a signpost indicating the baseline performance of the hardware and frameworks. The initial tests of HiBench on Spark and Hadoop showed that greater benefits can be obtained by using more executor containers than by increasing the number of cores per executor.
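The executors-versus-cores finding can be illustrated with a hypothetical PySpark session configuration; the concrete numbers (32 executors with 4 cores each for a 128-core allocation) are only an example of favouring more, smaller executor containers over fewer, fatter ones:

```python
from pyspark.sql import SparkSession

# Hypothetical 128-core allocation: prefer many moderate executors
# (32 x 4 cores) over few fat ones (e.g. 8 x 16 cores), in line with
# the HiBench observation above.
spark = (SparkSession.builder
         .appName("hibench-style-benchmark")
         .config("spark.executor.instances", "32")  # more containers...
         .config("spark.executor.cores", "4")       # ...fewer cores each
         .config("spark.executor.memory", "8g")
         .getOrCreate())
```

More executors with fewer cores each tend to reduce per-JVM garbage-collection pressure and contention, which is one common explanation for results of this kind; the best split still depends on the workload and cluster.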
Introduction to the Workshop on High-Performance Data Analytics co-organized by ENCCS and HiDALGO. It introduces the HiDALGO project by presenting the project's motivation and scope along with use case descriptions. The HPDA methodology and initial findings are presented with respect to the various methods tested on the Spark framework.
With over 79 million people forcibly displaced, forced human migration has become a common issue in the modern world and a serious challenge for the global community. Flee is a validated agent-based social simulation framework for forecasting population displacements in armed conflict settings. In this paper, we present two schemes to parallelize Flee, analyze the computational complexity of these schemes, and outline benchmark results for our parallel codes with real-world and synthetic scenarios on four state-of-the-art systems, including a new European pre-exascale system, Hawk. On all testbeds, we observed high scalability of our codes, which scale beyond 16,384 cores in our largest benchmark with 100 million agents on Hawk. The parallelization schemes discussed in this work can be extrapolated to a wide range of ABSS applications with frequent agent movement and little impact of direct communication between agents.
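An agent-parallel scheme of the kind discussed here can be sketched as follows: agents are split across workers, each worker advances its share, and the partial location counts are reduced at the end of the step. The toy route graph, the deterministic movement rule, and the `multiprocessing` backend are illustrative assumptions, not Flee's actual model or its MPI implementation:

```python
from collections import Counter
from multiprocessing import Pool

# Toy stand-in for an agent step: each agent moves to the neighbouring
# location with the highest weight (locations and weights are invented).
ROUTES = {"A": {"B": 0.7, "C": 0.3}, "B": {"C": 1.0}, "C": {"C": 1.0}}

def step_chunk(locations):
    """Advance one chunk of agents; return agent counts per destination."""
    counts = Counter()
    for loc in locations:
        dest = max(ROUTES[loc], key=ROUTES[loc].get)
        counts[dest] += 1
    return counts

def parallel_step(agent_locations, workers=4):
    """Agent-parallel scheme: scatter agents over workers, then reduce."""
    chunks = [agent_locations[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        partials = pool.map(step_chunk, chunks)
    return sum(partials, Counter())
```

Because the reduction is a simple sum of counters, communication per step stays small relative to computation, which is the property that allows schemes like this to scale to very large core counts.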
Deliverable 3.1 focuses on the HPC aspects of HiDALGO and, in particular, sets the guidelines of the HPC benchmarking methodology followed within the project. HiDALGO aims to follow a systematic, reproducible, and interpretable methodology for collecting and storing benchmarking information from the HiDALGO Pilots, to serve their systematic development, optimization, and advancement on the HiDALGO systems and beyond the scope of the HiDALGO project. This deliverable presents the goals and practices of the HiDALGO methodology for HPC benchmarking, which draws from best practices for HPC systems and applications. It then surveys the existing HiDALGO infrastructure and lists the set of tools selected for use within the HiDALGO project. It then presents the preliminary efforts on HPC benchmarking of the HiDALGO Pilots. The HiDALGO HPC benchmarking process has been applied on the Migration Pilot. Similar efforts have been kick-started for benchmarking the Air Pollution and Social Networks pilots as well. This initial experimentation and benchmarking have helped identify various major and minor issues in procuring and/or benchmarking the HiDALGO pilots, which are very diverse, and have significantly impacted the definition of the HiDALGO methodology.
This document aims at describing the implementation of the second release of the HiDALGO Portal (to be renamed as the Global Challenges Portal), which gives access to HiDALGO services in a simple way, as a one-stop shop. This solution consists of a set of tools covering several aspects useful for HiDALGO stakeholders, such as training, execution of simulations, visualization of results, user support, data management and even code testing in a simple way. The document goes through all the features implemented for the second version, including the implementation of the frontend (and backend) that puts all of them together, reducing the complexity of accessing the HiDALGO services.
This paper examines the challenges of leveraging big data in the humanitarian sector in support of UN Sustainable Development Goal 17 “Partnerships for the Goals”. The full promise of Big Data is underpinned by a tacit assumption that the heterogeneous ‘exhaust trail’ of data is contextually relevant and sufficiently granular to be mined for value. This promise, however, relies on relationality – that patterns can be derived from combining different pieces of data that are of corresponding detail or that there are effective mechanisms to resolve differences in detail. Here, we present empirical work integrating eight heterogeneous datasets from the humanitarian domain to provide evidence of the inherent challenge of complexity resulting from differing levels of data granularity. In clarifying this challenge, we explore the reasons why it is manifest, discuss strategies for addressing it and, as our principal contribution, identify five propositions to guide future research.
Presentation "Hardware and Software Co-Design Aspects in Social Science Simulations" was given at HiPEAC 2021 Conference, 5th Heterogeneous Hardware & Software Alliance Workshop, Future Generation of Heterogeneous Computing
HiDALGO’s strategy for external community building includes an introduction to HiDALGO’s offerings, a list of the main target groups for building a community around HiDALGO, a strategy for marketing and collaborations, a training concept and a short characterisation of past and planned events. HiDALGO’s offerings include networking, consulting, easy access to resources and case study results. The target groups include big companies, SMEs, researchers and academia, policy makers, public companies, civil societies and additional bodies like open source communities or alliances.
Understanding major global challenges (GCs) as well as their underlying parameters is a vital issue in our modern world. The importance of assisted decision making by addressing global, multi-dimensional problems is more important than ever. To predict the impact of global decisions with their dependencies, we need an accurate problem representation and a systemic analysis. To achieve this, HiDALGO enables highly accurate simulations, data analytics and data visualization, and also provides knowledge on how to integrate the various workflows as well as the corresponding data. Our project aims to bring together the HPC, HPDA, and Global Systems Science (GSS) communities in order to address GCs and bridge the gap between traditional HPC and data-centric computation.
Presently, development and optimization activities within the HiDALGO project are conducted considering already available hardware and software solutions. However, computer technology develops constantly, and it becomes equally important to use the most recent achievements offered in this field. This leads to the main purpose of this document, which is the analysis of information about cutting-edge technologies available on the market that could be of interest for the HiDALGO use cases. The HiDALGO system constitutes a composition of computation and data flows in which a number of processing methods are involved. Covering all necessary facets requires a comprehensive look at achievements from many computational areas. Certainly, the pilots will benefit from efficient utilization of applicable hardware and software solutions and achieve the best possible yields from the simulation and analysis tools.
This document shows how specific HPC requirements arising when dealing with GCs can be collected in order to come up with such a curriculum. In addition to the collection of requirements, the quality of the HiDALGO curriculum depends on both didactic and technical best practices in training. At the same time, training events are opportunities for exchange among the different communities, and therefore for collecting feedback on the technical pilots of the project. In short, this document describes current and planned events, new requirements, best practices, and collected feedback.
The HiDALGO support system adapts the IT Service Management (ITSM) framework to provide an overview of the system, and defines the support concepts that guide support provisioning. The support concept details the reasoning behind the design of multiple sub-support systems, the selection of multiple supporting tools, best practices in the support process, metrics to evaluate the system, and the roles and responsibilities of the agents, providing a rule of thumb for service provisioning.
HiPEAC Conference HPC and Big Data Technologies for Global Systems Interactive Workshop and Hands-on Session
We have worked on different tasks and objectives. On one side, our work focused on internal community building. On the other side, we used our communication channels to disseminate our results to a wide audience, with a special focus on our main stakeholder groups. Furthermore, we planned and conducted ample event management and collaboration activities. In the first year, we succeeded in bridging the different communities working in the HiDALGO project by using several targeted internal community building activities. Furthermore, our main external communication channels reached a substantial number of potential users.
The document presents the features available, comparing the current implementation with the original plans. Then, for each main feature, the document presents how the feature was implemented, describes the available APIs for each component (both graphical and REST APIs) and provides information about how to use the features. In the case of REST APIs, the document provides examples of calls to the services. In the case of GUIs, it provides screenshots and some guidance about the options.
This document provides the initial strategies for optimising applications and implementing novel algorithms and methods. In particular, strategies for coupling applications in conjunction with WP4 will be provided. Along with this document, we deliver basic knowledge about HPDA applications and their capabilities by providing performance test findings. This constitutes a basis for matching use case requirements with the application(s) that will be used for their implementation. Moreover, we discuss different approaches to the visualization of huge datasets. Four different applications are presented and three of them are benchmarked. Based on that, we can pick out the best possible solution for the specific demands raised by the pilots. This deliverable is also intended as a starting point for discussion on strategies for coupling technologies that are used to combine different (existing) applications. Making them work together could bring additional benefits in multiscale and hybrid simulation approaches. This document is not yet approved by the EC.
Migration, in particular of refugees, is recognized as an important challenge among European citizens and politicians. In fact, it has become so prominent in recent years that elections can be won or lost based on a party's position regarding that issue. In 2017, 68.5 million people were forced to leave their homes due to persecution and insecurity worldwide. While 40 million people are internally displaced persons, about 25 million are classified as refugees and 3 million claimed asylum [1]. In order for governments to efficiently manage such situations, it is essential to understand the movement and behaviour of migrants. Knowing when and to which extent fleeing refugees will arrive at countries' borders is vital for properly handling such movements and avoiding crisis situations doing harm to both refugees and locals. Furthermore, humanitarian aid is only available in limited quantity. Knowing where it is needed most would allow organizations to distribute aid packages and support more efficiently. Thus, knowledge of the routes taken by refugees, and of when and where to expect them in times of humanitarian crises, would be beneficial. To assist organizations and governments, we pursue a data-driven approach utilizing methods of Artificial Intelligence (AI) to detect and analyze cross-border refugee movements. Having knowledge about movement patterns allows these organizations and governments to enact policies and allocate humanitarian aid resources more efficiently and, thus, save human lives. Besides supporting organizations and governments with knowledge about refugee movement, our work should contribute to the success of the ongoing project HiDALGO [2], which is funded by the European Commission and focuses on solving global challenges with the combined power of High-Performance Computing (HPC) and High-Performance Data Analytics (HPDA).
Specifically, the goal of one of the pilot implementations, namely the Migration Use Case, is to provide an Agent-Based Model (ABM) simulation and data analysis tool which helps stakeholders to (a) forecast refugee movements when a conflict erupts, (b) make decisions on where to provide food and infrastructure, (c) acquire approximate refugee population estimates in regions where existing data is incomplete, and (d) prioritize humanitarian resources to the most important areas. This ABM simulation is described in . In order to run such ABM simulations, numerous parameters (e.g., movement speed of refugees) and rule-sets of the agents' behaviour are needed. We aim to provide comprehensible, data-derived values for these parameters.
The authors express their special thanks to Bernhard C. Geiger (Know-Center GmbH, Graz, Austria) for his mentorship and support throughout the work on this extended abstract.
[1] https://www.unhcr.org/figures-at-a-glance, last access: 2019-03-11
[2] https://hidalgo-project.eu, last access: 2019-03-11
This deliverable reports on the initial status of the pilot applications in HiDALGO. This includes the three core applications (migration, urban air pollution and social media), as well as models and data sources that are planned to be coupled into these applications. It serves to provide basic awareness of the HiDALGO applications to the consortium and the general public, and helps inform many other activities in HiDALGO, such as the requirements gathering in WP6 and the performance optimization efforts in WP3.
HiDALGO’s target is the definition of a generic, systematic, reproducible, and interpretable methodology for collecting benchmarking information from the HiDALGO applications, and a systematic way of storing benchmarking results. To achieve that, this deliverable studies the existing HiDALGO infrastructure, surveys available tools, and draws from best practices for HPC systems and applications. It also presents the preliminary effort to apply this methodology on the HiDALGO pilot applications. The HiDALGO methodology has been fully applied on the Migration pilot. Similar efforts have been kick-started for benchmarking the Air Pollution and Social Networks pilots as well. This initial experimentation and benchmarking has helped identify various major and minor issues in procuring and/or benchmarking the HiDALGO pilots, and has significantly impacted the definition of the HiDALGO methodology.
The document describes the hardware, software and services offered by the three supercomputing centres HLRS, PSNC and ECMWF. The main focus is on using the already available resources, including the process of accessing them. The HPC resources require specialized expertise to be used efficiently; thus, the deliverable describes the mechanisms for accessing and using these resources at HLRS, PSNC and ECMWF. Furthermore, to ensure seamless access to compute resources within the HiDALGO project, it is imperative to facilitate cooperation with external initiatives like PRACE to enhance the infrastructure. Consequently, this deliverable also describes the process of accessing the PRACE Research Infrastructure in detail.