Ashkan Tousi

The University of Manchester · School of Computer Science

PhD

About

Publications: 18
Reads: 11,467
Citations: 88
Additional affiliations
November 2018 - present
The University of Manchester
Position
  • Research Fellow
November 2021 - present
Safeguard Global
Position
  • Principal Engineer
March 2015 - November 2015
ARM
Position
  • R&D Software Developer

Publications

Conference Paper
Full-text available
Systems with large numbers of cores have become commonplace. Accordingly, applications are shifting towards increased parallelism. In a general-purpose system, applications residing in the system compete for shared resources. Thread and task scheduling in such a multithreaded multiprogramming environment is a significant challenge. In this study, w...
Conference Paper
Full-text available
The concept of task already exists in many parallel programming models. Programmers express parallelism by defining tasks in their applications, and runtime libraries schedule tasks on threads. However, in many task-based parallel programming models, choosing the right number of threads is still key to performance. Hence, the onus is on the program...
Article
Full-text available
In a general-purpose computing system, several parallel applications run simultaneously on the same platform. Even if each application is highly tuned for that specific platform, additional performance issues are arising in such a dynamic environment in which multiple applications compete for the resources. Different scheduling and resource managem...
Thesis
Full-text available
Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defini...
Article
Full-text available
Simulation-based performance prediction is complicated and time-consuming. In this study, we apply supervised learning to predict the performance scores of Standard Performance Evaluation Corporation (SPEC) benchmarks. The SPEC CPU2017 is a public dataset of results obtained by executing 43 standardised performance benchmarks organised into 4 suite...
Article
Full-text available
Image convolution is widely used for sharpening, blurring and edge detection. In this paper, we review two common algorithms for convolving a 2D image by a separable kernel (filter). After optimising the naive codes using loop unrolling and SIMD vectorisation, we choose the algorithm with better performance as the baseline for parallelisation. We t...
Article
Full-text available
This document is part of the Arm Research Starter Kit on System Modeling. It is intended to guide you through Arm-based system modeling using the gem5 simulator. The gem5 simulator is a modular platform for system architecture research. We first introduce the gem5 simulator and its basics. gem5 provides two main simulation modes: the System call Em...
Conference Paper
Full-text available
OpenMP is a very popular and successful parallel programming API, but efficient parallel traversal of a list (of possibly unknown size) of items linked by pointers is a challenging task: solving the problem with OpenMP worksharing constructs requires either transforming the list into an array for the traversal or for all threads to traverse each of...
Article
The XeonPhi is a highly parallel x86 architecture chip made by Intel. It has a number of novel features which make it a particularly challenging target for the compiler writer. This paper describes the techniques used to port the Glasgow Vector Pascal Compiler to this architecture and assess its performance by comparisons of the XeonPhi with 3...
Conference Paper
Full-text available
The XeonPhi [5] is a highly parallel x86 architecture chip made by Intel. It has a number of novel features which make it a particularly challenging target for the compiler writer. This paper describes the techniques used to port the Glasgow Vector Pascal Compiler (VPC) to this architecture and assess its performance by comparisons of the XeonPhi w...
Conference Paper
Full-text available
Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of concurrency in the program does not necessarily lead to better performance. Parallel programming models have to...
Article
Full-text available
With rapidly evolving technology, multicore and manycore processors have emerged as promising architectures to benefit from increasing transistor numbers. The transition towards these parallel architectures makes today an exciting time to investigate challenges in parallel computing. The TILEPro64 is a manycore accelerator, composed of 64 tiles int...
Chapter
Full-text available
The emergence of multicore and manycore processors is set to change the parallel computing world. Applications are shifting towards increased parallelism in order to utilise these architectures efficiently. This leads to a situation where every application creates its desirable number of threads, based on its parallel nature and the system resource...
Article
Full-text available
We present the Glasgow Parallel Reduction Machine (GPRM), a novel, flexible framework for parallel task-composition based many-core programming. We allow the programmer to structure programs into task code, written as C++ classes, and communication code, written in a restricted subset of C++ with functional semantics and parallel evaluation. In thi...
