
Syed Waqar Nabi
University of Glasgow | UofG · School of Computing Science
Doctor of Engineering
About
Publications: 47
Reads: 3,976
Citations: 201
Introduction
I work on the EPSRC-funded project “Exploiting Parallelism through Type Transformations for Hybrid Manycore Systems”.
The project addresses how best to exploit the parallelism of a given heterogeneous computing platform. My work focuses on FPGAs: developing a programming flow that takes legacy scientific code to FPGAs for High Performance Computing.
Additional affiliations
January 2014 - May 2016
August 2010 - November 2013
Publications (47)
Dynamic High-Level Synthesis (HLS) uses additional hardware to perform memory disambiguation at runtime, increasing loop throughput in irregular codes compared to static HLS. However, most irregular codes consist of multiple sibling loops, which currently have to be executed sequentially by all HLS tools. Static HLS performs loop fusion only on reg...
Irregular codes are bottlenecked by memory and communication latency. Decoupled access/execute (DAE) is a common technique to tackle this problem. It relies on the compiler to separate memory address generation from the rest of the program; however, such a separation is not always possible due to control and data dependencies between the access and...
Competencies may be defined as the knowledge, skills, and professional dispositions that an individual is required to demonstrate in order to be considered professionally competent. Competency-based education has long been a feature of professional degree programs, but the discipline of Computing Science has only recently begun to embrace competenc...
Dynamically scheduled high-level synthesis (HLS) achieves higher throughput than static HLS for codes with unpredictable memory accesses and control flow. However, excessive dataflow scheduling results in circuits that use more resources and have a slower critical path, even when only a part of the circuit exhibits dynamic behavior. Recent work has...
Work-based degree programmes are seen as a means of addressing the reported lack of employability skills in Computing Science (CS) graduates. In the UK, work-based CS degree programmes – or apprenticeships – were established to close this skills gap. In Scotland, a national ‘meta-skills’ framework has been developed, comprising twelve employability...
Competency-based learning has been a successful pedagogical approach for centuries, but only recently has it gained traction within computing. Competencies, as defined in Computing Curricula 2020, comprise knowledge, skills, and professional dispositions. Building on recent developments in competency and computing education, this working group exam...
Competency-based learning has been a successful pedagogical approach for centuries, but only recently has it gained traction within computing education. Building on recent developments in the field, this working group will explore competency-based learning from practical considerations and show how it benefits computing. In particular, the group wi...
Multi-scale models integrating biomolecular data from genetic, transcriptional, and translational levels, coupled with extracellular microenvironments, can assist in decoding the complex mechanisms underlying system-level diseases such as cancer. To investigate the emergent properties and clinical translation of such cancer models, we present Theatr...
PERCEPTRON is a next-generation, freely available, web-based proteoform identification and characterization platform for top-down proteomics (TDP). The PERCEPTRON search pipeline brings together algorithms for (i) intact protein mass tuning, (ii) de novo sequence tags-based filtering, (iii) characterization of terminal as well as post-translational modif...
Learning a second language (L2) usually progresses faster if a learner's L2 is similar to their first language (L1). Yet global similarity between languages is difficult to quantify, obscuring its precise effect on learnability. Further, the combinatorial explosion of possible L1 and L2 language pairs, combined with the difficulty of controlling fo...
There is a large body of legacy scientific code in use today that could benefit from execution on accelerator devices like GPUs and FPGAs. Manually translating such legacy code into device-specific parallel code requires significant effort and is a major obstacle to wider FPGA adoption. We are developing an automated optimizing compiler TyT...
In this paper we present a novel approach to program optimisation based on compiler-based type-driven program transformations and a fast and accurate cost/performance model for the target architecture. We target streaming programs for the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framew...
There is a large body of legacy scientific code written in languages like Fortran that is not optimised to get the best performance out of heterogeneous acceleration devices like GPUs and FPGAs, and manually porting such code into parallel language frameworks like OpenCL requires considerable effort. We are working towards developing a turn-key, s...
High-performance computing on heterogeneous platforms in general and those with FPGAs in particular presents a significant programming challenge. We contend that compiler technology has to evolve to automatically optimize applications by transforming a given original program. We are developing a novel methodology based on type transformations on a...
Slides for our paper at RAW (at IPDPS), which was nominated for the best paper award; the paper relates to our work on developing an optimizing compiler for running scientific code on FPGAs.
We present preliminary results with the TyTra design flow. The TyTra project aims to create a parallelising compiler for high-performance scientific code on heterogeneous platforms, with a focus on Field-Programmable Gate Arrays (FPGAs). Using the functional language Idris, we show how this programming paradigm facilitates generation of different c...
We present preliminary results with the TyTra design flow. Our aim is to create a parallelising compiler for high-performance scientific code on heterogeneous platforms, with a focus on Field-Programmable Gate Arrays (FPGAs). Using the functional language Idris, we show how this programming paradigm facilitates generation of different correct-by-con...
Heterogeneous High-Performance Computing (HPC) platforms present a significant programming challenge, especially because the key users of HPC resources are scientists, not parallel programmers. It is our view that compiler technology has to evolve to automatically create the best compiled program variant by transforming a given original program. We...
The potential of FPGAs for High-Performance Computing is increasingly recognized, but most work focuses on acceleration of small, isolated kernels. We present a parallel FPGA implementation of a legacy algorithm, the seminal scheme for cumulus convection in large-scale models developed by Emanuel. Our design makes use of pipelines both at the arith...
The potential of FPGAs for High-Performance Computing is increasingly being recognised, but most work focuses on acceleration of small, isolated kernels. We present a parallel FPGA implementation of a scientific legacy algorithm, the seminal scheme for cumulus convection in large-scale models developed by Emanuel [1]. Our design makes use of pipeli...
High Performance Computing (HPC) platforms allow scientists to model computationally intensive algorithms. HPC clusters increasingly use General-Purpose Graphics Processing Units (GPGPUs) as accelerators; FPGAs provide an attractive alternative to GPGPUs for use as co-processors, but they are still far from being mainstream due to a number of chall...
We present the TyTra-IR, a new intermediate language intended as a compilation target for high-level language compilers and a front-end for HDL code generators. We develop the requirements of this new language based on the design-space of FPGAs that it should be able to express and the estimation-space in which each configuration from the design-sp...
An RFID system's front end consists of multiple tags being identified by one or more readers. For the reader to uniquely identify all tags within a certain range, an anti-collision mechanism is required. ALOHA and its variations can be used in such a scenario. EPC Global, a standardizing body for RFID technology, has propo...
We present an infrastructure for dynamic reconfiguration of heterogeneous coarse-grained reconfigurable architectures (CGRAs) based on our Gannet SoC platform. We introduce the infrastructure and in particular its domain-specific high-level programming language Gannet-C and discuss the language features that support dynamic reconfiguration and the...
The volume of data communication is increasing every day, and with it come the issues of assuring its security. This research paper explores information security management issues with respect to confidentiality and integrity, and the impact of Information Security Management Standards, Policies and Practices (ISMSPP) on information security. Th...
We have designed a coarse-grained, dynamically reconfigurable architecture, specifically for implementing the wireless MAC layer in consumer hand-held devices. The dynamically reconfigurable MAC Processor is a SoC architecture that uses a reconfigurable hardware co-processor to delegate critical tasks. The co-processor can reconfigure packet-by-pac...
To address the challenges of the consumer wireless device industry, we have designed a dynamically reconfigurable architecture whose flexibility is limited to the MAC layer. It is a software/hardware-partitioned platform in which critical tasks are delegated to a dynamically reconfigurable hardware co-processor. It will handle data streams of m...
The dynamically reconfigurable MAC processor is an innovative architecture specialized for the wireless MAC layer, and aimed at consumer hand-held devices. It is a software/hardware partitioned platform where the microprocessor uses a reconfigurable hardware co-processor to delegate critical tasks. This allows the microprocessor to handle fast and...
This paper presents the architecture of a dynamically reconfigurable platform being developed specifically for implementing the MAC layer of wireless protocols in consumer wireless devices. The cornerstone of this architecture is the exploitation of substantial overlaps in the functionality of the three MACs considered. By using function-specific reconfigu...
This paper presents ongoing research on the design of a dynamically reconfigurable system-on-chip architecture specialized for implementing wireless protocols' MAC layer for consumer handheld devices. We study the various reconfiguration mechanisms made available by recent technological advances in this field to determine what sort of archite...