Jean-François Méhaut

Université Grenoble Alpes · Laboratoire d'Informatique de Grenoble

Professor

183
Publications
16,748
Reads
1,538
Citations
Additional affiliations
September 2011 - August 2013
CEA-Leti
Position
  • Research Director
September 2003 - present
University of Grenoble
Position
  • Professor (Full)
September 2000 - August 2003
Université des Antilles
Position
  • Professor (Full)
Education
October 1985 - February 1989
Université de Lille
Field of study
  • computer science

Publications (183)
Article
Full-text available
Graph algorithms have inherent characteristics, including data-driven computations and poor locality. These characteristics expose graph algorithms to several challenges, because most well-studied (parallel) abstractions and implementations are not suitable for them. In our previous work [21, 22, 24], we show how to use some complex-network properti...
Article
Multikernel Operating Systems (OSs) were introduced to cope with challenges in software development and deployment in lightweight manycores. Among the possible structures for a multikernel OS, we focus on designs based on asymmetric kernels. This design delivers better performance isolation, but it suffers from an overhead in energy efficiency. In...
Article
Lightweight manycore processors deliver high performance and energy efficiency by bundling hundreds of low‐power cores, a distributed memory architecture with small local memories and Networks‐on‐Chip in a single die. However, the lack of rich and portable programming models for these processors makes software development a challenging task. Curren...
Article
Global schedulers are components in parallel runtime libraries that distribute the application's workload across physical resources. More often than not, applications showcase dynamic load imbalance and require customized scheduling solutions to avoid wasting resources. Some libraries lack support for user-defined schedulers and developers resort t...
Article
Full-text available
Lightweight manycore processors deliver high performance and scalability by bundling in a single chip hundreds of low-power cores, a distributed memory architecture and Networks-on-Chip (NoCs). Operating Systems (OSes) for these processors feature a distributed design, in which a communication layer enables kernels to exchange information and inter...
Chapter
IoT devices are pillars of Industry 4.0 software applications. However, clustering these edge nodes poses open challenges in several dimensions, because it requires integrating diverse hardware and software packages. Combining different types of industrial cameras and a supercomputer node to support 3D reconstructions is not a trivial appro...
Conference Paper
This paper describes experimental research conducted to combine different types of cameras and a compute node, as IIoT (Industrial IoT) devices, to support digital transformation with 3D reconstruction virtualization for a real engineering application. The computational approach considered was a heterogeneous edge co...
Article
A memory allocation anomaly occurs when the allocation of a set of heap blocks imposes an unnecessary overhead on the execution of an application. This overhead is particularly disturbing for high‐performance computing (HPC) applications running on shared resources—for example, numerical simulations running on clusters or clouds—because it may incr...
Conference Paper
Improvements in I/O architectures are increasingly necessary. They are essential for complex, data-intensive scalable applications. Data-Intensive Scalable Computing (DISC) and High-Performance Computing (HPC) applications frequently need to transfer data between storage resources. In the scientific and industrial fields,...
Conference Paper
The performance and energy efficiency provided by lightweight manycores are undeniable. However, the lack of rich and portable support for these processors makes software development challenging. To address this problem, we propose a portable and lightweight MPI library (LWMPI) designed from scratch to cope with restrictions and intricacies of lightw...
Conference Paper
Dealing with complex networks is often a challenge due to the high computational cost of analyzing a huge amount of data. Partitioning methods can decrease the complexity of large structures by reducing them to smaller, less connected parts. Also, data splitting allows the use of multiprocessing to accelerate the execution of data procedures wi...
Chapter
The Network Search method is not yet widely used in computational simulations because computing its solutions requires a high processing time. This paper therefore analyzes the gains achieved with the parallel implementation of the Network Search method algorithm for shared memory systems. The results achieved with the parallel implement...
Poster
Full-text available
Method: We take previously designed graph numbering algorithms and combine them with graph compression algorithms. We then propose a new algorithm that is expected to both reduce cache misses and compress the graph, and thereby reduce execution time.
Conference Paper
A memory allocation anomaly occurs when the allocation of a set of heap blocks imposes an unnecessary overhead on the execution of an application. In this paper, we propose a method for identifying, locating, characterizing and fixing allocation anomalies, and a tool for developers to apply the method. We evaluate our method and tool with a numer...
Conference Paper
Multikernel operating systems (OSs) were introduced to match the architectural characteristics of lightweight manycores. While several multikernel OS designs are possible, in this work we argue for one that is structured in asymmetric microkernel instances. We deliver an open-source implementation of an OS kernel with these characteristics, and we p...
Article
Workload-aware loop schedulers were introduced to deliver better performance than classical loop scheduling strategies. However, they presented limitations such as inflexible built-in workload estimators and suboptimal chunk scheduling. Targeting these challenges, we previously proposed a workload-aware scheduling strategy called BinLPT, which relies...
Article
In this article, we study the I/O performance of the Santos Dumont supercomputer, since the gap between processing and data access speeds causes many applications to spend a large portion of their execution on I/O operations. For a large-scale expensive supercomputer, it is essential to ensure applications achieve the best I/O performance to promot...
Conference Paper
Full-text available
In this paper, we propose a pattern matching approach for server-side access pattern detection for the HPC I/O stack. More specifically, our proposal concerns file-level accesses, such as the ones made to I/O libraries, I/O nodes, and the parallel file system servers. The goal of this detection is to allow the system to adapt to the current workloa...
Chapter
The new trend in computing systems is to provide solutions using multicore and many-core processors. COTS processors are preferred because they offer high performance with low power consumption at an affordable price. Lately these devices have been used in High Performance Computing systems due to their massive parallelism and low-power bud...
Conference Paper
Lightweight manycores deliver high performance and scalability at low power consumption. However, architectural intricacies of these processors impose programmability challenges that keep them away from mass adoption. While several efforts aim at introducing parallel programming environments to lightweight manycores, few initiatives are concerned...
Chapter
Energy and performance of parallel systems are an increasing concern for new large-scale systems. Research has been developed in response to this challenge, aiming at the manufacture of more energy-efficient systems. In this context, this paper proposes optimization methods to accelerate performance and increase energy efficiency of geophysics applicat...
Article
Full-text available
Modeling turbulent transport is a major goal in order to predict confinement performance in a tokamak plasma. The gyrokinetic framework considers a computational domain in five dimensions to look at kinetic issues in a plasma; this leads to huge computational needs. Therefore, optimization of the code is a particularly important aspect, especially s...
Conference Paper
Global schedulers are components used in parallel solutions, especially in dynamic applications, to optimize resource usage. Nonetheless, their development is a cumbersome process due to necessary adaptations to cope with the programming interfaces and abstractions of runtime systems. This paper proposes a model to dissociate schedulers from runtim...
Conference Paper
Energy and performance of parallel systems are an increasing concern for new large-scale systems. Research has been developed in response to this challenge, aiming at the manufacture of more energy-efficient systems. In this context, this paper proposes optimization methods to accelerate performance and increase energy efficiency of geophysics applicat...
Conference Paper
Full-text available
Performance of parallel scientific applications on many-core processor architectures is a challenge that increases every day, especially when energy efficiency is concerned. To achieve this, it is necessary to explore architectures with high processing power that rely on a network-on-chip to integrate many processing cores and other components. In t...
Article
This paper presents an energy efficiency and I/O performance analysis of low‐power architectures compared to conventional architectures, with the goal of studying the viability of using them as storage servers. Our results show that although the power demand of the storage device accounts for a small fraction of the power demand of the...
Conference Paper
This paper presents a new load balancer for reducing the execution time and energy consumption of parallel applications. The balancer's algorithm collects system information and uses it to make balancing decisions. The CHARM++ parallel programming model was used for the implementation. The preliminary results show...
Article
A parallel program needs to manage the trade‐off between the time spent in synchronisation and computation. This trade‐off is significantly affected by its parallelism degree. A high parallelism degree may decrease computing time while increasing synchronisation cost. Furthermore, thread placement on processor cores may impact program performance,...
Article
This work presents a fault-injection approach to evaluate the impact of soft errors on applications running on a heterogeneous many-core processor. This evaluation is significant for characterizing, in terms of reliability, the behavior of applications implemented on advanced devices. The approach is based...
Article
Full-text available
Multi-core and many-core processors are a promising solution to achieve high performance while maintaining lower power consumption. However, the degree of miniaturization makes them more sensitive to soft errors. To improve the system reliability, this work proposes a fault-tolerance approach based on redundancy and partitioning principles called N-...
Article
Full-text available
The power consumption of High Performance Computing (HPC) systems is an increasing concern as large-scale systems grow in size and, consequently, consume more energy. In response to this challenge, we have developed and evaluated new energy-aware load balancers to reduce the average power demand and save energy of parallel systems when scientific appl...
Conference Paper
Full-text available
A complex network is a set of entities in a relationship, modeled by a graph where nodes represent entities and edges between nodes represent relationships. Graph algorithms have inherent characteristics, including data-driven computations and poor locality. These characteristics expose graph algorithms to several challenges, because most well stud...
Conference Paper
Workload-aware loop schedulers were introduced to deliver better performance than classical strategies, but they present limitations on workload estimation, chunk scheduling and integrability with applications. Targeting these challenges, in this work we propose a novel workload-aware loop scheduler called BinLPT, which is based on three...
Article
The portability of real high-performance computing (HPC) applications on new platforms is an open and very delicate problem. In particular, the performance portability of the underlying computing kernels is problematic as they need to be tuned for each and every platform the application encounters. This article presents BOAST, a metaprogramming framew...
Conference Paper
Full-text available
Energy and performance of parallel systems are an increasing concern for new large-scale systems. Research has been developed in response to this challenge, aiming at the manufacture of more energy-efficient systems. In this context, this paper proposes to accelerate performance and increase the energy efficiency of stencil applications by optimizing the u...
Article
Full-text available
The input workload of an irregular application must be evenly distributed among its threads to enable cutting-edge performance. To address this need in OpenMP, several loop scheduling strategies were proposed. While having this ever-increasing number of strategies at one's disposal is helpful, it has become a non-trivial task to select the best one for a...
Article
Full-text available
Monitoring is the study of a system at runtime, looking for input and output events to discover, check or enforce behavioral properties. Interactive debugging is the study of a system at runtime in order to discover and understand its bugs and fix them, inspecting interactively its internal state. Interactive Runtime Verification (i-RV) combines mo...
Conference Paper
The power consumption of High Performance Computing systems is an increasing concern as large-scale systems grow in size and, consequently, consume more energy. In response to this challenge, we proposed two variants of a new energy-aware load balancer that aim at reducing the energy consumption of parallel platforms running imbalanced scientific a...
Article
Full-text available
Accepted after minor changes on 17 October 2016; finally accepted on 6 April 2017; last version requested for publication on 10 May. One property of social graphs is community structure, that is, subsets where nodes belonging to the same subset have a higher link density between themselves and a low link densi...
Conference Paper
As large-scale parallel platforms are deployed to comply with the increasing performance requirements of scientific applications, a new concern is getting the attention of the HPC community: power consumption. In this paper, we aim at evaluating the viability of using low-power architectures as file system servers in HPC environments, since pr...
Article
The constant need for faster and more energy-efficient processors has been stimulating the development of new architectures, such as low-power many-core architectures. Researchers aiming to study these architectures are challenged by peculiar characteristics of some components such as networks-on-chip and lack of specific tools to evaluate their pe...
Article
Full-text available
This work evaluates the SEE static and dynamic sensitivity of a single-chip many-core processor that implements 16 compute clusters, each with 16 processing cores. The SEU error-rate of an application implemented in the device is predicted by combining experimental results with those issued from fault injection campaigns applying the CEU (Co...
Conference Paper
High-performance computing (HPC) is recognized as one of the pillars for further progress in science, industry, medicine, and education. Current HPC systems are being developed to overcome emerging architectural challenges in order to reach Exascale level of performance, projected for the year 2020. The much larger embedded and mobile market allows...
Conference Paper
This paper presents a performance and energy-efficiency analysis of I/O operations on low-power processors compared with conventional architectures. The goal is to analyze the viability of using these devices to implement file systems for HPC. The results showed that using the MPSoC led to...
Article
In high-performance computing, the application's workload must be evenly balanced among threads to deliver cutting-edge performance and scalability. In OpenMP, the load balancing problem arises when scheduling loop iterations to threads. In this context, several scheduling strategies have been proposed, but they do not take into account the input w...
Conference Paper
Parallel programs need to manage the time tradeoff between synchronization and computation. A high degree of parallelism may decrease computing time while increasing synchronization cost among threads. Software Transactional Memory (STM) has emerged as a promising technique, which bypasses locks, to address synchronization issues through transactions....
Article
Full-text available
The aim of this work is to evaluate the SEE sensitivity of a multi-core processor that implements ECC and parity in its cache memories. Two different application scenarios are studied. The first one configures the multi-core in Asymmetric Multi-Processing mode running a memory-bound application, whereas the second one uses the Symmetric Multi-...
Article
The large processing requirements of seismic wave propagation simulations make High Performance Computing (HPC) architectures a natural choice for their execution. However, to keep both the current pace of performance improvements and the power consumption under a strict power budget, HPC systems must be more energy efficient than ever. As a respon...
Conference Paper
In High Performance Computing, the application's workload must be well balanced among the threads to achieve better performance. In this work, we propose a methodology that enables the design and exploration of new loop scheduling strategies. In this methodology, a simulator is used to evaluate the most relevant existing scheduling strategies, and...
Article
The widespread use of multicore processors in computing systems and the imperative necessity of exploiting massive parallelism to improve performance and dependability make it mandatory to evaluate the impact of SEUs on parallel applications running on multicore processors. This paper presents a method and preliminary results of SEU fault-injection c...
Article
Full-text available
Parallel applications that present tasks with imbalanced loads or complex communication behavior usually do not exploit the underlying resources of parallel platforms to their full potential. In order to mitigate this issue, global scheduling algorithms are employed. As finding the optimal task distribution is an NP-Hard problem, identifying the mo...
Article
Full-text available
Multi-core architectures comprising several graphics processing units (GPUs) have become mainstream in the field of high-performance computing. However, obtaining the maximum performance of such heterogeneous machines is challenging as it requires carefully offloading computations and managing data movements between the different processing units. T...