Foivos Zakkak

Foivos Zakkak
  • Doctor of Philosophy
  • Developer at Red Hat

About

32
Publications
7,655
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
140
Citations
Introduction
Current institution
Red Hat
Current position
  • Developer

Publications

Publications (32)
Article
Big data analytics frameworks, such as Spark and Giraph, need to process and cache massive datasets that do not always fit on the managed heap. Therefore, frameworks temporarily move long-lived objects outside the heap (off-heap) on a fast storage device. However, this practice results in (1) high serialization/deserialization (S/D) cost and (2) hi...
Preprint
Full-text available
Cache-coherent non-uniform memory access (ccNUMA) systems enable parallel applications to scale-up to thousands of cores and many terabytes of main memory. However, since remote accesses come at an increased cost, extra measures are necessitated to scale the applications to high core-counts and process far greater amounts of data than a typical ser...
Preprint
Full-text available
Managed analytics frameworks (e.g., Spark) cache intermediate results in memory (on-heap) or storage devices (off-heap) to avoid costly recomputations, especially in graph processing. As datasets grow, on-heap caching requires more memory for long-lived objects, resulting in high garbage collection (GC) overhead. On the other hand, off-heap caching...
Preprint
Full-text available
In recent years, heterogeneous computing has emerged as the vital way to increase computers? performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). The rationale behind this trend is that different parts of an application can be offloaded from the...
Conference Paper
With micro-services continuously gaining popularity and low-power processors making their way into data centers, efficient execution of managed runtime systems on low-power architectures is also gaining interest. Apart from the inherent performance differences between high and low power processors, porting a managed runtime system to a low-power ar...
Conference Paper
Full-text available
By utilizing diverse heterogeneous hardware resources, developers can significantly improve the performance of their applications. Currently, in order to determine which parts of an application suit a particular type of hardware accelerator better, an offline analysis that uses a priori knowledge of the target hardware configuration is necessary. T...
Preprint
Full-text available
The proliferation of heterogeneous hardware in recent years means that every system we program is likely to include a mix of compute elements; each with different characteristics. By utilizing these available hardware resources, developers can improve the performance and energy efficiency of their applications. However, existing tools for heterogen...
Conference Paper
Full-text available
In this paper we outline the current state of language Virtual Machines (VMs) running on RISC-V as well as our initiatives in augmenting the existing ecosystem with Maxine VM, a state-of-the-art open source research Virtual Machine (VM). Maxine VM is a metacircular VM for Java and is currently part of the Beehive ecosystem that provides a unified f...
Conference Paper
Full-text available
In the recent years, we have witnessed an explosion of the usages of Virtual Machines (VMs) which are currently found in desktops, smartphones, and cloud deployments. These recent developments create new research opportunities in the VM domain extending from performance to energy efficiency, and scalability studies. Research into these directions n...
Conference Paper
Full-text available
Extending current Virtual Machine implementations to new Instruction Set Architectures entails a significant programming and debugging effort. Meta-circular VMs add another level of complexity towards this aim since they have to compile themselves with the same compiler that is being extended. Therefore, having low-level debugging tools is of vital...
Conference Paper
Full-text available
In this paper, we describe our experiences in co-designing a domain-specific compilation stack. Our motivation stems from the missed optimization opportunities we observed while implementing a computer vision library in Java. To tackle the performance shortcomings, we developed Indigo, a computer vision API co-designed with a compilation plugin for...
Conference Paper
Trying to cope with the constantly growing number of cores per processor, hardware architects are experimenting with modular non cache coherent architectures. Such architectures delegate the memory coherency to the software. On the contrary, high productivity languages like Java are designed to abstract away the hardware details and allow developer...
Conference Paper
This work presents the key challenges that Java Virtual Machines (JVMs) implementers face when targeting future non-cache-coherent architectures. It discusses the techniques used to overcome these challenges by distributed JVMs in the literature, and examines their applicability on future non-cache-coherent architectures. It presents new algorithms...
Technical Report
Full-text available
Trying to cope with the constantly growing number of cores per processor, hardware architects are experimenting with modular non cache coherent architectures. Such architec-tures delegate the memory coherency to the software. On the contrary, high productivity languages, like Java, are designed to abstract away the hardware details and allow develo...
Preprint
Trying to cope with the constantly growing number of cores per processor, hardware architects are experimenting with modular non-cache-coherent architectures. Such architectures delegate the memory coherency to the software. On the contrary, high productivity languages, like Java, are designed to abstract away the hardware details and allow develop...
Conference Paper
This work presents a hierarchical, parallel, dynamic dependence analysis for inferring run-time dependencies between recursively parallel tasks in the OmpSs programming model. To evaluate the dependence analysis we implement PARTEE, a scalable runtime system that supports implicit synchronization between nested parallel tasks. We evaluate the perfo...
Article
We present TProf, an energy profiling tool for OpenMP-like task-parallel programs. To compute the energy consumed by each task in a parallel application, TProf dynamically traces the parallel execution and uses a novel technique to estimate the per-task energy consumption. To achieve this estimation, TProf apportions the total processor energy amon...
Article
As the number of cores continuously grows, processor designers are considering non coherent memories as more scalable and energy efficient alternatives to the current coherent ones. The Java Memory Model (JMM) requires that all cores can access the Java heap. It guarantees sequential consistency for data-race-free programs and no out-of-thin-air va...
Conference Paper
Full-text available
As the number of cores continuously grows, processor designers are considering non coherent memories as more scalable and energy efficient alternatives to the current coherent ones. The Java Memory Model (JMM) requires that all cores can access the Java heap. It guarantees sequential consistency for data-race-free programs and no out-of-thin-air va...
Conference Paper
Full-text available
The inherent difficulty of thread-based shared-memory programming has recently motivated research in high-level, task-parallel programming models. Recent advances of Task-Parallel models add implicit synchronization, where the system automatically detects and satisfies data dependencies among spawned tasks. However, dynamic dependence analysis incu...
Conference Paper
Full-text available
We present a set of static techniques that reduce runtime overheads in task-parallel programs with implicit synchronization. We use a static dependence analysis to detect non-conflicting tasks and remove unnecessary runtime checks. We further reduce overheads by statically optimizing task creation and management of runtime metadata. We implemented...
Thesis
Full-text available
Over the past decade, CPU development has focused mainly on multi-core architectures, and recent trends lead to multi-core designs with ever increasing numbers of cores. In order to get the best out of a many-core system, one has to develop efficient parallel applications. However, reasoning about synchronization, ordering and conflicting memory ac...
Technical Report
Full-text available
Reasoning about synchronization, ordering and conflicting memory accesses makes parallel programming difficult, error-prone and hard to test, debug and maintain. Task-parallel programming models such as OpenMP, Cilk and Sequoia offer a more structured way of expressing parallelism than threads, but still require the programmer to manually find and...
Article
Full-text available
We present SCOOP, a C source-to-source compiler for programming models that use dataflow annotations. SCOOP currently generates code for two runtime systems that use similar APIs to express parallelism. One runtime requires the programmer to enforce any ordering constraints manually, whereas the other detects dependencies automatically. Using eithe...

Network

Cited By