Riyadh Baghdadi

Massachusetts Institute of Technology | MIT · Computer Science and Artificial Intelligence Laboratory

PhD

About

38 Publications · 11,379 Reads · 688 Citations
Citations since 2017
28 Research Items
630 Citations
[Chart: citations per year, 2017–2023]
Introduction
Riyadh Baghdadi currently works at the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology. Riyadh does research in Parallel Computing and Programming Languages. Their most recent publication is 'Technical Report about Tiramisu: a Three-Layered Abstraction for Hiding Hardware Complexity from DSL Compilers'.
Additional affiliations
November 2012 - March 2013
NVIDIA
Position
  • Research engineer internship
September 2011 - September 2015
École Normale Supérieure de Paris
Position
  • PhD Student

Publications (38)
Article
Availability constraints, machine condition as well as human behavior phenomena were recently introduced in the study of scheduling problems in order to get closer to the industrial reality. In this context, the permutation flowshop scheduling problem (PFSP) under flexible maintenance planning is investigated by incorporating machine deteriorating...
Chapter
In this paper, we address two versions of the permutation flowshop scheduling problem (PFSP) with makespan minimization under availability constraints with learning and deteriorating effects. Availability constraints are due to flexible maintenance activities scheduled based on prognostics and health management (PHM) results. In the first study, hu...
Thesis
Full-text available
In recent years there has been a surge in the usage of deep learning at various steps of the compilation process. Auto-scheduling, the process of automatically optimizing the execution time of a program, is one of the steps that has significantly benefited from this intersection. In this work we propose a deep learning based cost model for speedup...
Preprint
Full-text available
In this paper, we present a work in progress about a deep learning based approach for automatic code optimization in polyhedral compilers. The proposed technique explores combinations of affine and non-affine loop transformations to find the sequence of transformations that minimizes the execution time of a given program. This exploration is guided...
Preprint
Term Rewriting Systems (TRS) are used in compilers to simplify and prove expressions. State-of-the-art TRSs in compilers use a greedy algorithm that applies a set of rewriting rules in a predefined order (where some of the rules are not axiomatic). This leads to a loss in the ability to simplify certain expressions. E-graphs and equality saturation...
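As a toy illustration of the failure mode described above (hypothetical rules, not the paper's actual TRS): a greedy rewriter that commits to a strength-reduction rule before a cancellation rule can destroy the pattern the cancellation rule needs, whereas equality saturation would keep both forms in one equivalence class and extract the simplest.

```python
# Terms are nested tuples: ("mul", x, y), ("div", x, y), ("shl", x, y).
# Two hypothetical rules, tried in a fixed "greedy" order.

def strength_reduce(t):
    # x * 2  ->  x << 1
    if isinstance(t, tuple) and t[0] == "mul" and t[2] == 2:
        return ("shl", t[1], 1)

def mul_div_cancel(t):
    # (x * 2) / 2  ->  x
    if (isinstance(t, tuple) and t[0] == "div" and t[2] == 2
            and isinstance(t[1], tuple) and t[1][0] == "mul" and t[1][2] == 2):
        return t[1][1]

RULES = [strength_reduce, mul_div_cancel]  # fixed order: reduce first

def greedy(term):
    """Bottom-up greedy rewriting: simplify children first, then apply
    the first matching rule at the root, repeating to a fixpoint."""
    if isinstance(term, tuple):
        term = (term[0], *(greedy(a) for a in term[1:]))
    for rule in RULES:
        new = rule(term)
        if new is not None:
            return greedy(new)
    return term

# (x * 2) / 2 should simplify to x, and mul_div_cancel alone would do it:
# mul_div_cancel(("div", ("mul", "x", 2), 2)) == "x".
# But the greedy order rewrites the inner x*2 into x<<1 first, so the
# cancellation pattern never matches and the term stays unsimplified:
# greedy(("div", ("mul", "x", 2), 2)) == ("div", ("shl", "x", 1), 2).
```

An e-graph would record that `x*2` and `x<<1` are equal rather than picking one, so equality saturation can still apply the cancellation and extract `x`.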
Preprint
The low-energy spectrum and scattering of two-nucleon systems are studied with lattice quantum chromodynamics using a variational approach. A wide range of interpolating operators are used: dibaryon operators built from products of plane-wave nucleons, hexaquark operators built from six localized quarks, and quasi-local operators inspired by two-nu...
Preprint
Full-text available
Enabling compilers to automatically optimize code has been a longstanding goal for the compiler community. Efficiently solving this problem requires using precise cost models. These models predict whether applying a sequence of code transformations reduces the execution time of the program. Building an analytical cost model to do so is hard in mode...
Thesis
Full-text available
Programmers spend a lot of time and effort optimizing their code to make it run faster. This has led compiler researchers to focus on automatic optimization techniques that improve program performance without manual tuning. Such techniques aim to transform programs to exploit the underlying hardware more efficiently. Efficient implement...
Preprint
Full-text available
Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these over-parameterized models are compressed by leveraging sparsity, size reduction, and quantization of tensors. Unstructured sparsity and tensors with varying dimensions yield irre...
Thesis
Full-text available
Deep learning is undergoing an unprecedented revolution thanks to advances in computing power. It is becoming necessary to build ever deeper deep-learning architectures, with more and more parameters, to reach higher accuracy. Deep learning is based on the arrangement...
Preprint
In this paper, we demonstrate a compiler that can optimize sparse and recurrent neural networks, both of which are currently outside of the scope of existing neural network compilers (sparse neural networks here stand for networks that can be accelerated with sparse tensor algebra techniques). Our demonstration includes a mapping of sparse and recu...
Preprint
Full-text available
To achieve high performance on modern processors, compilers must optimize programs. In this paper we address the loop unrolling optimization, proposing a novel approach based on deep neural networks to automatically optimize loops in TIRAMISU. TIRAMISU is a new language for writing high-performance code. It separates the algorithm...
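For background on the transformation the abstract refers to, here is a minimal sketch of loop unrolling itself, shown on a plain Python loop rather than Tiramisu's actual API (choosing a good unroll factor is the part the paper's neural network addresses; the factor 4 below is arbitrary):

```python
def saxpy(a, x, y):
    # original loop: one element per iteration
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

def saxpy_unrolled(a, x, y):
    # same computation with the body replicated 4 times per iteration,
    # reducing loop-overhead checks and exposing instruction-level parallelism
    n = len(x)
    out = [0.0] * n
    i = 0
    while i + 4 <= n:
        out[i]     = a * x[i]     + y[i]
        out[i + 1] = a * x[i + 1] + y[i + 1]
        out[i + 2] = a * x[i + 2] + y[i + 2]
        out[i + 3] = a * x[i + 3] + y[i + 3]
        i += 4
    while i < n:  # epilogue for the remaining n % 4 elements
        out[i] = a * x[i] + y[i]
        i += 1
    return out
```

Both versions compute identical results; the benefit of unrolling shows up only in compiled code, which is why picking the factor per loop and per target is a natural prediction task.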
Preprint
Full-text available
The development of lightweight polyhedral compilation algorithms opens polyhedral loop transformation, parallelization and code generation to a larger class of programs. The Pluto affine scheduling algorithm is a central step in state-of-the-art polyhedral compilers, aiming for the simultaneous enhancement of locality and the exploitation of coars...
Article
Full-text available
The scope and scale of biological data are increasing at an exponential rate, as technologies like next-generation sequencing are becoming radically cheaper and more prevalent. Over the last two decades, the cost of sequencing a genome has dropped from $100 million to nearly $100—a factor of over 10⁶—and the amount of data to be analyzed has increa...
Article
We present a new algorithm to automatically schedule Halide programs for high-performance image processing and deep learning. We significantly improve upon the performance of previous methods, which considered a limited subset of schedules. We define a parameterization of possible schedules much larger than prior methods and use a variant of beam s...
Thesis
Full-text available
Improvements in mathematical formulas and increasingly powerful hardware allow neural networks to model various functions with high accuracy in different fields, for instance image processing, time-series prediction, anomaly detection in data, and natural language understanding. Thanks to their classification and generalization capacity, neural...
Article
Full-text available
The performance bottlenecks of graph applications depend not only on the algorithm and the underlying hardware, but also on the size and structure of the input graph. As a result, programmers must try different combinations of a large set of techniques, which make tradeoffs among locality, work-efficiency, and parallelism, to develop the best imple...
Preprint
The performance bottlenecks of graph applications depend not only on the algorithm and the underlying hardware, but also on the size and structure of the input graph. Programmers must try different combinations of a large set of techniques to develop the best implementation for a specific algorithm and type of graph. Existing graph frameworks and d...
Preprint
This paper introduces Tiramisu, an optimization framework designed to generate efficient code for high-performance systems such as multicores, GPUs, FPGAs, distributed machines, or any combination of these. Tiramisu relies on a flexible representation based on the polyhedral model and introduces a novel four-level IR that allows full separation bet...
Article
Full-text available
High-performance DSL developers work hard to take advantage of modern hardware. The DSL compilers have to build their own complex middle-ends before they can target a common back-end such as LLVM, which only handles single instruction streams with SIMD instructions. We introduce Tiramisu, a common middle-end that can generate efficient code for mod...
Article
Field Programmable Gate Arrays (FPGAs) are configurable integrated circuits able to provide a good trade-off in terms of performance, power consumption, and flexibility with respect to other architectures, like CPUs, GPUs and ASICs. The main drawback in using FPGAs, however, is their steep learning curve. An emerging solution to this pro...
Thesis
Multi-core processors are now in widespread use in almost all areas of computing: desktops, laptops and accelerators such as GPGPUs (General Purpose Graphics Processing Units). To harness the power of multi-core processors and complex memory hierarchies, the need for powerful compiler optimizations and especially loop nest transformations is now in...
Article
Full-text available
Programming accelerators such as GPUs with low-level APIs and languages such as OpenCL and CUDA is difficult, error prone, and not performance-portable. Automatic parallelization and domain specific languages (DSLs) have been proposed to hide this complexity and to regain some performance portability. We present PENCIL, a rigorously-defined subset...
Article
We present VOBLA, a domain-specific language designed for programming linear algebra libraries. VOBLA is compiled to PENCIL, a domain independent intermediate language designed for efficient mapping to accelerator architectures such as GPGPUs. PENCIL is compiled to efficient, platform-specific OpenCL code using techniques based on the polyhedral mo...
Conference Paper
Full-text available
We present VOBLA, a domain-specific language designed for programming linear algebra libraries. VOBLA is compiled to PENCIL, a domain independent intermediate language designed for efficient mapping to accelerator architectures such as GPGPUs. PENCIL is compiled to efficient, platform-specific OpenCL code using techniques based on the polyhedral mo...
Article
Full-text available
We motivate the design and implementation of a platform-neutral compute intermediate language (PENCIL) for productive and performance-portable accelerator programming.
Article
Full-text available
To preserve the validity of loop nest transformations and parallelization, data dependences need to be analyzed. Memory dependences come in two varieties: true dependences or false dependences. While true dependences must be satisfied in order to preserve the correct order of computations, false dependences are induced by the reuse of a single memo...
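A toy illustration of the distinction drawn above, sketched in Python rather than compiler IR (where the reused storage would be a scalar register): reusing one temporary across iterations induces false dependences that renaming removes.

```python
def serialized(a, b):
    # Every iteration writes and then reads the same variable `t`.
    # In a compiler's dependence analysis this creates false (output and
    # anti-) dependences between iterations, forcing sequential order,
    # even though no iteration needs another iteration's data.
    out = [0] * len(a)
    for i in range(len(a)):
        t = a[i] + b[i]
        out[i] = t * t
    return out

def expanded(a, b):
    # "Scalar expansion": give each iteration its own copy of `t`.
    # Only true dependences remain (out[i] depends on a[i] and b[i]),
    # so the iterations are independent and may run in parallel.
    t = [a[i] + b[i] for i in range(len(a))]
    return [t[i] * t[i] for i in range(len(a))]
```

Removing the false dependences changes nothing about the result, only about which execution orders are legal, which is exactly why they can be eliminated by renaming memory rather than by reordering computation.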
Article
Full-text available
Research in automatic parallelization of loop-centric programs started with static analysis, then broadened its arsenal to include dynamic inspection-execution and speculative execution, the best results involving hybrid static-dynamic schemes. Beyond the detection of parallelism in a sequential program, scalable parallelization on many-core proces...
Conference Paper
System resource limitations have long been a great obstacle for memory-intensive applications, and especially for high-resolution image processing applications. Memory, and RAM in particular, has a big impact on the performance of those applications. Many solutions to reduce the effects of limited memory have been proposed in the literature,...

Projects (2)