Amir H. Ashouri
Huawei Technologies · Department of R&D

Ph.D.

About

24 Publications
38,645 Reads
471 Citations
Since 2017: 14 research items, 436 citations
[Chart: citations per year, 2017–2023]
Introduction
I defended my Ph.D. thesis, entitled "Compiler Autotuning using Machine Learning Techniques", which won the IEEE Italy Section best Ph.D. thesis award for 2016. As of January 2017, I am a Postdoctoral Fellow at the University of Toronto, where I conduct research on accelerating deep learning applications and automatic tuning.
Additional affiliations
December 2019 - present
Huawei Technologies
Position
  • Senior R&D
January 2017 - December 2019
University of Toronto
Position
  • Postdoctoral Fellow
August 2014 - April 2015
University of Delaware
Position
  • Researcher
Education
January 2013 - December 2016
Politecnico di Milano
Field of study
  • Computer Engineering
September 2010 - December 2012
Politecnico di Milano
Field of study
  • Computer Engineering
January 2005 - September 2009
Iran University of Science and Technology
Field of study
  • Computer Engineering

Publications

Publications (24)
Article
Full-text available
The variety of today’s architectures forces programmers to spend a great deal of time porting and tuning application codes across different platforms. Compilers themselves need additional tuning, which has considerable complexity as the standard optimization levels, usually designed for the average case and the specific target architecture, often f...
Article
Full-text available
Recent compilers offer a vast number of multilayered optimizations targeting different code segments of an application. Choosing among these optimizations can significantly impact the performance of the code being optimized. The selection of the right set of compiler optimizations for a particular code segment is a very hard problem, but finding th...
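As a rough illustration of how the selection problem described above is often attacked in practice by iterative compilation, here is a minimal hypothetical random-search sketch in Python; the GCC flag list, benchmark file names, and trial count are assumptions for illustration only, not details from the article.
```python
# Hypothetical sketch of iterative compilation: randomly sample subsets of
# compiler flags, rebuild, time the binary, and keep the best-performing set.
# The flag list, source file, and trial count are illustrative assumptions.
import random
import subprocess
import time

CANDIDATE_FLAGS = ["-funroll-loops", "-fomit-frame-pointer",
                   "-ftree-vectorize", "-finline-functions"]

def build_and_time(flags, src="benchmark.c", exe="./bench"):
    """Compile `src` with the given flags and return the binary's runtime."""
    subprocess.run(["gcc", "-O2", *flags, src, "-o", exe], check=True)
    start = time.perf_counter()
    subprocess.run([exe], check=True)
    return time.perf_counter() - start

best_flags, best_time = [], build_and_time([])
for _ in range(20):                               # 20 random trials
    trial = [f for f in CANDIDATE_FLAGS if random.random() < 0.5]
    elapsed = build_and_time(trial)
    if elapsed < best_time:
        best_flags, best_time = trial, elapsed

print("best flags:", best_flags, "runtime:", best_time)
```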
Book
This book explores break-through approaches to tackling and mitigating the well-known problems of compiler optimization using design space exploration and machine learning techniques. It demonstrates that not all the optimization passes are suitable for use within an optimization sequence and that, in fact, many of the available passes tend to coun...
Article
Full-text available
Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing w...
Article
We explore retraining-free pruning of CNNs. We propose and evaluate three model-independent methods for sparsification of model weights. Our methods are magnitude-based, efficient, and can be applied on-the-fly during model load time, which is necessary in some deployment contexts. We evaluate the effectiveness of these methods in introducing spars...
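A minimal sketch of what magnitude-based, retraining-free sparsification can look like at model load time; plain NumPy, with a hypothetical layer dictionary and a 70% sparsity target as assumptions — not the paper's actual method or evaluation settings.
```python
# Minimal sketch of magnitude-based, retraining-free pruning with NumPy:
# zero out the weights whose absolute value falls below a percentile
# threshold at model load time. The layer dictionary and the 70% sparsity
# target are illustrative assumptions, not the paper's settings.
import numpy as np

def prune_by_magnitude(weights, sparsity=0.7):
    """Return a copy of `weights` with the smallest-magnitude entries zeroed."""
    threshold = np.percentile(np.abs(weights), sparsity * 100)
    mask = np.abs(weights) >= threshold
    return weights * mask

# Example: prune every layer of a hypothetical model stored as a dict of arrays.
model = {"conv1": np.random.randn(64, 3, 3, 3),
         "fc":    np.random.randn(1000, 512)}
pruned = {name: prune_by_magnitude(w) for name, w in model.items()}
```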
Preprint
Full-text available
For the past 25 years, we have witnessed extensive application of Machine Learning to the compiler space, namely the selection and the phase-ordering problems. However, few works have been upstreamed into state-of-the-art compilers, i.e., LLVM, to seamlessly integrate the former into the optimization pipeline of a compiler so that it can be readily deployed...
Preprint
Full-text available
Modern Convolutional Neural Networks (CNNs) are complex, encompassing millions of parameters. Their deployment exerts computational, storage and energy demands, particularly on embedded platforms. Existing approaches to prune or sparsify CNNs require retraining to maintain inference accuracy. Such retraining is not feasible in some contexts. In thi...
Conference Paper
Designing and optimizing applications for energy-efficient High Performance Computing systems up to the Exascale era is an extremely challenging problem. This paper presents the toolbox developed in the ANTAREX European project for autotuning and adaptivity in energy efficient HPC systems. In particular, the modules of the ANTAREX toolbox are descr...
Chapter
Very Long Instruction Word (VLIW) processors represent an attractive solution for embedded computing, offering significant computational power with reduced hardware complexity. However, they impose higher compiler complexity since the instructions are executed in parallel based on the static compiler schedule. Therefore, finding a promising set of...
Chapter
This chapter proposes our second approach to tackle the phase-ordering problem. We already showed our intermediate speedup prediction method in Chap. 4. Here, we present our full-sequence speedup prediction method called MiCOMP. MiCOMP: Mitigating the Compiler Phase-ordering problem using optimization sub-sequences and machine learning, is an autot...
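For readers unfamiliar with the idea of full-sequence speedup prediction, here is a generic, hypothetical sketch (not the actual MiCOMP implementation): encode each optimization sequence by its pass sub-sequences (n-grams) and fit a regressor from the encoding to a measured speedup. The pass names and speedup values are synthetic.
```python
# Generic illustration of full-sequence speedup prediction (not the actual
# MiCOMP implementation): encode each optimization sequence by its pass
# n-grams and fit a regressor from the encoding to a measured speedup.
# Pass names and speedup values below are synthetic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestRegressor

sequences = ["inline unroll vectorize",           # toy training sequences
             "unroll inline dce",
             "dce vectorize inline"]
speedups = [1.30, 1.12, 1.21]                     # toy measured speedups

encoder = CountVectorizer(ngram_range=(1, 2))     # uni- and bi-grams of passes
X = encoder.fit_transform(sequences)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, speedups)

# Predict the speedup of an unseen full sequence.
print(model.predict(encoder.transform(["inline dce vectorize"])))
```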
Chapter
This chapter presents the first of two methods to tackle the phase-ordering problem of compiler optimizations. Here, we present an intermediate speedup prediction approach followed by a full-sequence prediction approach in the next chapter and we show pros and cons of each approach in detail. Today’s compilers offer a vast number of transformation...
Chapter
Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. The techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing whi...
Chapter
After presenting our DSE approach for finding good compiler optimizations, we present our autotuning framework to tackle the problem of selecting the best compiler passes. It leverages machine learning and an application characterization to find the most promising optimization passes given an application. This chapter proposes COBAYN: Compiler auto...
Thesis
Full-text available
In this Ph.D. thesis, we provide break-through approaches to tackle and mitigate the well-known problems of compiler optimization using design space exploration and machine learning techniques. We show that not all the optimization passes are beneficial within an optimization sequence and that, in fact, many of the available passes are oblitera...
Conference Paper
Full-text available
The diversity of today's architectures has forced programmers and compiler researchers to port their applications across many different platforms. Compiler auto-tuning plays a major role within that process, as the standard optimization levels often fail to bring the best results due to their average performan...
Poster
Full-text available
The diversity of today's architectures has forced programmers and compiler researchers to port their applications across many different platforms. Compiler auto-tuning plays a major role within that process, as the standard pre-defined optimization levels fail to bring the best results due to thei...
Conference Paper
Full-text available
Today's compilers offer a huge number of transformation options to choose among, and this choice can significantly impact the performance of the code being optimized. Not only does the selection of compiler options represent a hard problem to solve, but the ordering of the phases adds further complexity, making it a long-standing proble...
Conference Paper
Full-text available
The complexity and diversity of today's architectures require additional effort from programmers in porting and tuning the application code across different platforms. The problem is even more complex considering that the compiler itself also requires some tuning, since standard optimization options have been customized for specific architec...
Conference Paper
Full-text available
Very Long Instruction Word (VLIW) application-specific processors represent an attractive solution for embedded computing, offering significant computational power with reduced hardware complexity. However, they impose higher compiler complexity since the instructions are executed in parallel based on the static compiler schedule. Therefore, findin...
Poster
Full-text available
Embedded systems can be considered specialized computing systems used for a wide range of applications, varying from mobile phones to military and home-automation devices. Although the functionalities of these devices differ, the computational structure and design are tightly connected with the platform and programmability in which...
Thesis
Full-text available
Embedded systems can be considered specialized computing systems used for a wide range of applications, varying from mobile phones to military and home-automation devices. Although the functionalities of these devices differ, the computational structure and design are tightly connected with the platform and programmability in which...

Questions

Questions (4)
Question
I was wondering whether anyone knows of an automated tool to collect GPU kernel features, e.g., stencil dimension, size, operations, etc. Such tools are widely available for CPU kernels.
Question
Can anyone suggest how to normalize a single dataset containing two different sets of kernel features (preferably in Matlab) so that it can be fed to a machine learning engine?
1. I normalized the two sets separately and then combined them, but the results are not good enough.
2. I used Matlab's normc() and normr() to normalize the dataset as a whole, and the results were even worse.
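One possible alternative to the two attempts above, sketched here in Python/scikit-learn rather than Matlab (the array shapes are illustrative assumptions): concatenate the two feature sets first, then standardize each feature column (z-score) across the combined dataset instead of normalizing per set or per column/row with normc()/normr().
```python
# One common alternative, sketched with NumPy/scikit-learn rather than Matlab
# (array shapes are illustrative): concatenate the two feature sets first,
# then z-score each feature column across the combined dataset, instead of
# normalizing per set or per row/column with normc()/normr().
import numpy as np
from sklearn.preprocessing import StandardScaler

features_a = np.random.rand(100, 10)   # hypothetical first set of kernel features
features_b = np.random.rand(100, 5)    # hypothetical second set, same 100 samples

X = np.hstack([features_a, features_b])        # combine first
X_norm = StandardScaler().fit_transform(X)     # then standardize per column
```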
Question
I am looking for standard reinforcement learning implementations in C, C++, or Python that I can adapt to my problem, which is compiler optimization.
Looking at the link attached below, it seems all the C++ implementations have broken links. Does anyone have any suggestions?
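In case a self-contained starting point helps, here is a minimal tabular Q-learning sketch in plain Python that could be adapted to a custom environment such as selecting compiler passes; the env.reset()/env.step()/env.actions interface and the hyperparameters are placeholder assumptions, not any specific library's API.
```python
# Minimal tabular Q-learning sketch in plain Python. The env.reset()/env.step()
# interface, env.actions, and the hyperparameters are placeholder assumptions,
# not a specific library's API; states must be hashable.
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                     # Q[(state, action)] -> value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # one-step temporal-difference update
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```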

Network

Cited By

Projects

Projects (2)
Project
This project aims at addressing the sparsification and exploitation of sparsity in state-of-the-art Convolutional Neural Networks. Funded by Qualcomm Inc. Canada
Project
Goal: AutoTuning and Adaptivity appRoach for Energy efficient eXascale HPC systems http://www.antarex-project.eu/