About
47
Publications
10,322
Reads
309
Citations
Introduction
Senior Researcher at the Computing Systems Laboratory (CSLab) of the National Technical University of Athens (NTUA). Research interests include Computer Architecture, HPC systems and applications, IoT, Cloud Computing, and Big Data.
Additional affiliations
January 2008 - present
Education
September 2003 - January 2008
September 1998 - July 2003
Publications (47)
Simulations have become a prevalent method in the scientific community for researching climate, environmental, and social phenomena. These simulations aid in understanding how various elements such as air, pollution, smoke, and heat disperse in complex spatial environments. Computational Fluid Dynamics (CFD) is a widely used application for conduct...
This deliverable provides an overview of the components that make up the core foundation of the HiDALGO2 ecosystem and support the project’s use cases in achieving their scientific, technical and environmental goals. It builds upon the work of previous deliverables in HiDALGO2, as well as of D2.1, “Requirements Analysis and Scenario Definition”. I...
Among the key contributions of Vitamin-V (2023-2025 Horizon Europe project), we develop a complete RISC-V open-source software stack for cloud services with comparable performance to the cloud-dominant x86 counterpart. In this paper, we detail the software suites and applications ported plus the three cloud setups under evaluation.
A report that defines the benchmarking methodology that will be followed within the HiDALGO2 project and presents the initial findings of the benchmarking of the HiDALGO2 pilots on various EuroHPC JU systems.
Vitamin-V is a 2023-2025 Horizon Europe project that aims to develop a complete RISC-V open-source software stack for cloud services with comparable performance to the cloud-dominant x86 counterpart and a powerful virtual execution environment for software development, validation, verification, and test that considers the relevant RISC-V ISA extens...
The rich, complementary data provided by Sentinel-1 and Sentinel-2 satellite constellations host considerable potential to transform Earth observation (EO) applications. However, a substantial amount of effort and infrastructure is still required for the generation of analysis-ready data (ARD) from the low-level products provided by the European Sp...
This document is the final report of WP6 and concludes its developments. It presents (i) the final set of requirements and their KPIs, (ii) the Artificial Intelligence (AI) enabled use case workflows, and (iii) the final outcomes of the integration process. It should be noted that all respective objectives have been succe...
This deliverable presents the final version of the HiDALGO Portal and its operations. It first presents the main portal features and the selected architecture, and points out the changes from the second version of this deliverable. Noteworthy changes affected the following services and tools:
• Single-Sign-On,
• workflow orchestrator, the cha...
This document provides the initial strategies for optimising applications and implementing novel algorithms and methods. In particular, initial strategies for coupling applications in conjunction with WP4 are provided.
In respect of our approach to High Performance Data Analytics (HPDA), this document does not aim to introduce and test applications...
This document describes the implementation of the second release of the HiDALGO Portal (to be renamed as the Global Challenges Portal), which gives access to HiDALGO services in a simple way, as a one-stop-shop. This solution consists of a set of tools covering several aspects useful for HiDALGO stakeholders, like training, execution of si...
Deliverable 3.1 focuses on the HPC aspects of HiDALGO, and in particular, sets the guidelines of the HPC benchmarking methodology followed within the project. HiDALGO aims to follow a systematic, reproducible, and interpretable methodology for collecting and storing benchmarking information from the HiDALGO Pilots, to serve their systematic develop...
The common aim of such events is to raise mutual awareness amongst the GC and the HPC/HPDA communities. Subsequently, this deliverable analyses the current state-of-the-art of HiDALGO training activities and innovation workshops, and defines the necessary roadmap for future events. Other equally important aspects of T7.3 that are described in this...
Concurrent search trees (STs) are among the most widely used data structures to store and retrieve data in contemporary multithreaded applications. Despite the large body of prior work, it remains challenging to implement highly efficient concurrent STs. This is mainly due to the fact that both traditional synchronization methods (i.e., lock...
HiDALGO’s strategy for external community building includes an introduction to HiDALGO’s offerings, a list of the main target groups for building a community around HiDALGO, a strategy for marketing and collaborations, a training concept and a short characterisation of past and planned events. HiDALGO’s offerings include networking, consulting, eas...
The Rocket Chip Generator uses a collection of parameterized processor components to produce RISC-V-based SoCs. It is a powerful tool that can produce a wide variety of processor designs ranging from tiny embedded processors to complex multi-core systems. In this paper we extend the features of the Memory Management Unit of the Rocket Chip Generato...
This document shows how specific HPC requirements arising when dealing with GC can be collected in order to come up with such a curriculum. In addition to the collection of requirements, the quality of the HiDALGO curriculum depends on both didactic and technical best practices in training. At the same time, training events are opportunities to ex...
Cloud service providers (CSPs) rely mostly on simplistic and conservative policies regarding resource management, to minimize interference of shared resources between multiple VMs and to provide acceptable performance. However, such approaches may lead to suboptimal allocation and resource underutilization. In this demonstration we present ACTiMana...
According to McKinsey & Company, about a third of food produced is lost or wasted every year, amounting to a $940 billion economic hit. Inefficiencies in planting, harvesting, water use, reduced animal contributions, as well as uncertainty about weather, pests, consumer demand and other intangibles contribute to the loss. Precision Agriculture (PA)...
Workload consolidation has been shown to achieve improved resource utilisation in modern datacentres. In this paper we focus on the extended problem of allocating resources when co-locating High-Priority (HP) and Best-Effort (BE) applications. Current approaches either neglect this prioritisation and focus on maximising the utilisation of the serve...
This document provides the initial strategies for optimising applications and implementing novel algorithms and methods. In particular, strategies for coupling applications in conjunction with WP4 will be provided.
Along with this document we are delivering basic knowledge about HPDA applications and their capabilities by providing performance test...
HiDALGO’s target is the definition of a generic, systematic, reproducible, and interpretable methodology for collecting benchmarking information from the HiDALGO applications, and a systematic way of storing benchmarking results. To achieve that, this deliverable studies the existing HiDALGO infrastructure, surveys available tools, and draws from b...
Despite their proliferation as a dominant computing paradigm, cloud computing systems lack effective mechanisms to manage their vast resources efficiently. Resources are stranded and fragmented, limiting cloud applicability only to classes of applications that pose moderate resource demands. In addition, the need for reduced cost through consolidat...
Despite their proliferation as a dominant computing paradigm, cloud computing systems lack effective mechanisms to manage their vast amounts of resources efficiently. Resources are stranded and fragmented, ultimately limiting cloud systems' applicability to large classes of critical applications that pose non-moderate resource demands. Eliminating...
This paper presents a fast and simple contention-aware scheduling policy for CMP systems that relies on information collected at runtime with no additional hardware support. Our approach is based on a classification scheme that detects activity and possible interference across the entire memory hierarchy, including both shared caches and memo...
In this paper we combine Hardware Transactional Memory (HTM) with Read-Copy-Update (RCU) to implement highly scalable concurrent balanced Binary Search Trees (BSTs). The two key features of our approach are: a) read-only operations require no synchronization or restarts and b) tree modifications are first performed in private copies of sub-trees, t...
Co-execution of multiple workloads in modern multi-core servers may create severe performance degradation and unpredictable execution behavior, impacting significantly their Quality of Service (QoS) levels. To safeguard the QoS levels of high priority workloads, current resource allocation policies are quite conservative, disallowing their co-execu...
In this paper we analyze the performance of concurrent red-black trees using two HTM implementations, Intel's Transactional Synchronization Extensions (TSX) on Haswell processors and IBM's Power8 HTM. We parallelize bottom-up and top-down red-black trees using coarse-grained transactions and evaluate their performance. Our experimental results sho...
This paper presents LCA, a memory Link and Cache-Aware co-scheduling approach for CMPs. It is based on a novel application classification scheme that monitors resource utilization across the entire memory hierarchy from main memory down to CPU cores. This enables us to predict application interference accurately and support a co-scheduling algorith...
In this paper, we apply a method for extracting a running power estimate of applications from hardware performance counters, producing power/time curves which can be integrated over particular intervals to estimate the energy consumption of individual application stages. We use this method to instrument executions of a conjugate gradient solver, to...
Multiphysics simulations are at the core of modern Computer Aided Engineering (CAE) allowing the analysis of multiple, simultaneously acting physical phenomena. These simulations often rely on Finite Element Methods (FEM) and the solution of large linear systems which, in turn, end up in multiple calls of the costly Sparse Matrix-Vector Multiplicat...
In this paper we present a Helper Threading scheme used to parallelize efficiently Kruskal's Minimum Spanning Forest algorithm. This algorithm is known for exhibiting inherently sequential characteristics. More specifically, the strict order by which the algorithm checks the edges of a given graph is the main reason behind the lack of explicit para...
In this paper we work on the parallelization of the inherently serial Dijkstra’s algorithm on modern multicore platforms. Dijkstra’s algorithm is a greedy algorithm that computes Single Source Shortest Paths for graphs with non-negative edges and is based on the iterative extraction of nodes from a priority queue. This property limits th...
In this paper we use Dijkstra's algorithm as a challenging, hard-to-parallelize paradigm to test the efficacy of several parallelization techniques in a multicore architecture. We consider the application of Transactional Memory (TM) as a means of concurrent accesses to shared data and compare its performance with straightforward parallel versi...
This paper investigates the problem of partitioning the last-level shared cache of multicore architectures. Contention for such a shared resource has been shown to severely degrade performance when running multiple applications. As architectures incorporate more cores, multiple application workloads become increasingly attractive, further exacerbat...