Maurice Jamieson’s research while affiliated with University of Edinburgh and other places


Publications (20)


Figure and table captions:
  • Fig. 2: MG benchmark performance (higher is better), parallelised via OpenMP
  • Fig. 6: Percentage performance of the MPI-based NPB benchmark compared to the OpenMP benchmark implementation
  • Table: Summary of memory behaviour for NPB benchmarks on a Xeon Platinum 8170
  • Table: Summary of CPUs benchmarked in this section
  • Table: For each pseudo-application, the factor by which a given CPU outperforms the SG2042 at the given core count
Performance characterisation of the 64-core SG2042 RISC-V CPU for HPC
  • Preprint

June 2024 · 138 Reads · Maurice Jamieson

Whilst RISC-V has grown phenomenally quickly in embedded computing, it is yet to gain significant traction in High Performance Computing (HPC). However, as we move further into the exascale era, the flexibility offered by RISC-V has the potential to be very beneficial in future supercomputers, especially as the community places an increased emphasis on decarbonising its workloads. Sophon's SG2042 is the first mass-produced, commodity-available, high-core-count RISC-V CPU designed for high performance workloads. First released in summer 2023, and at the time of writing now becoming widely available, a key question is whether this is a realistic proposition for HPC applications. In this paper we use NASA's NAS Parallel Benchmark (NPB) suite to characterise performance of the SG2042 against other CPUs implementing the RISC-V, x86-64, and AArch64 ISAs. We find that the SG2042 consistently outperforms all other RISC-V solutions, delivering between a 2.6 and 16.7 times performance improvement at the single-core level. When compared against the x86-64 and AArch64 CPUs, which are commonplace for high performance workloads, we find that the SG2042 performs comparatively well with computationally bound algorithms but decreases in relative performance when the algorithms are memory bandwidth or latency bound. Based on this work, we identify that performance of the SG2042's memory subsystem is the greatest bottleneck.
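The compute-bound versus memory-bound distinction the abstract draws can be illustrated with two kernel shapes. This is a sketch of my own for illustration, not code from the paper: a STREAM-style triad does one multiply-add per three array elements touched, so it stresses memory bandwidth, whereas a Horner polynomial evaluation re-uses a few coefficients many times per element loaded, so it stresses the arithmetic units.

```c
#include <stddef.h>

/* STREAM-style triad: a[i] = b[i] + s * c[i].
 * One fused multiply-add per element streamed from three arrays,
 * so performance is limited by memory bandwidth. */
void triad(double *a, const double *b, const double *c, double s, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}

/* Horner evaluation of a degree-(m-1) polynomial at each x[i].
 * m multiply-adds per element loaded, with the coefficients held in
 * cache/registers, so performance is limited by compute throughput. */
void horner(double *y, const double *x, const double *coef,
            size_t m, size_t n) {
    for (size_t i = 0; i < n; i++) {
        double acc = coef[0];
        for (size_t k = 1; k < m; k++)
            acc = acc * x[i] + coef[k];
        y[i] = acc;
    }
}
```

On a CPU with a weak memory subsystem, the triad-like kernels fall behind comparable hardware while the Horner-like kernels remain competitive, which is the pattern the paper reports for the SG2042.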






Is RISC-V ready for HPC prime-time: Evaluating the 64-core Sophon SG2042 RISC-V CPU

September 2023 · 747 Reads

The Sophon SG2042 is the world's first commodity 64-core RISC-V CPU for high performance workloads, and an important question is whether the SG2042 has the potential to encourage the HPC community to embrace RISC-V. In this paper we undertake a performance exploration of the SG2042 against existing RISC-V hardware and the high performance x86 CPUs in use by modern supercomputers. Leveraging the RAJAPerf benchmarking suite, we discover that, on average, the SG2042 delivers, per core, between five and ten times the performance of the nearest widely available RISC-V hardware. We found that, on average, the x86 high performance CPUs under test outperform the SG2042 by between four and eight times for multi-threaded workloads, although some individual kernels do perform faster on the SG2042. The result of this work is a performance study that not only contrasts this new RISC-V CPU against existing technologies, but furthermore shares performance best practice.


Backporting RISC-V Vector Assembly

August 2023 · 15 Reads · 6 Citations

Lecture Notes in Computer Science

Leveraging vectorisation, the ability for a CPU to apply operations to multiple elements of data concurrently, is critical for high performance workloads. However, at the time of writing, commercially available physical RISC-V hardware that provides the RISC-V vector extension (RVV) only supports version 0.7.1, which is incompatible with the latest ratified version 1.0. The challenge is that upstream compiler toolchains, such as Clang, only target the ratified v1.0 and do not support the older v0.7.1. Because v1.0 is not compatible with v0.7.1, the only way to program vectorised code is to use a vendor-provided, older compiler. In this paper we introduce the rvv-rollback tool which translates assembly code generated by the compiler using vector extension v1.0 instructions to v0.7.1. We utilise this tool to compare vectorisation performance of the vendor-provided GNU 8.4 compiler (supports v0.7.1) against LLVM 15.0 (supports only v1.0), where we found that the LLVM compiler is capable of auto-vectorising more computational kernels, and delivers greater performance than GNU in most, but not all, cases. We also tested LLVM vectorisation with vector length agnostic and specific settings, and observed cases with significant difference in performance.

Keywords: RISC-V vector extension, HPC, Clang, RVV Rollback
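To make the workflow concrete, here is a minimal sketch of the kind of loop LLVM can auto-vectorise to RVV v1.0 instructions. The kernel is my own illustration, and the compiler flags and rvv-rollback invocation in the comment are assumptions about typical usage rather than commands taken from the paper.

```c
#include <stddef.h>

/* A simple AXPY kernel of the kind LLVM 15 auto-vectorises to RVV.
 * An assumed (illustrative) pipeline around it might look like:
 *
 *   clang -O3 -march=rv64gcv -S axpy.c -o axpy.s   # emit RVV v1.0 asm
 *   rvv-rollback axpy.s > axpy-0.7.s               # translate to v0.7.1
 *   # then assemble axpy-0.7.s with the vendor GNU 8.4 toolchain
 *
 * so the v1.0-only upstream compiler's output can run on v0.7.1
 * hardware such as the XuanTie C906/C920. */
void axpy(double *y, const double *x, double a, size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```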


Test-Driving RISC-V Vector Hardware for HPC

August 2023 · 19 Reads · 14 Citations

Lecture Notes in Computer Science

Whilst the RISC-V Vector extension (RVV) has been ratified, at the time of writing both hardware implementations and open source software support are still limited for vectorisation on RISC-V. This is important because vectorisation is crucial to obtaining good performance for High Performance Computing (HPC) workloads and, as of April 2023, the Allwinner D1 SoC, containing the XuanTie C906 processor, is the only mass-produced and commercially available hardware supporting RVV. This paper surveys the current state of RISC-V vectorisation as of 2023, reporting the landscape of both the hardware and software ecosystem. Driving our discussion from experiences in setting up the Allwinner D1 as part of the EPCC RISC-V testbed, we report the results of benchmarking the Allwinner D1 using the RAJA Performance Suite, which demonstrated reasonable vectorisation speedup using the vendor-provided compiler, as well as favourable performance compared to the StarFive VisionFive V2 with SiFive's U74 processor.


Performance of the Vipera Framework for DSLs on Micro-Core Architectures

May 2023 · 1 Read

Lecture Notes in Computer Science

Vipera provides a compiler and runtime framework for implementing dynamic Domain-Specific Languages on micro-core architectures. The performance and code size of the generated code is critical on these architectures. In this paper we present the results of our investigations into the efficiency of Vipera in terms of code performance and size.

Keywords: Domain-specific languages, Python, native code generation, RISC-V, micro-core architectures


Figure 1: Performance comparison of two Polybench kernels across the testbed hardware
Experiences of running an HPC RISC-V testbed

April 2023 · 111 Reads

Funded by the UK ExCALIBUR H&ES exascale programme, in early 2022 a RISC-V testbed for HPC was stood up to provide free access for scientific software developers to experiment with RISC-V for their workloads. Here we report on successes, challenges, and lessons learnt from this activity with a view to better understanding the suitability of RISC-V for HPC and important areas to focus RISC-V HPC community efforts upon.


Citations (10)


... Whilst we have focused on intrinsics in this paper, it would also be interesting to extend this work to a wider range of algorithmic patterns such as stencils. Work has already been undertaken mapping an MLIR stencil dialect to FPGAs [3], and we plan on extending this to also target AIEs. ...

Reference:

Seamless acceleration of Fortran intrinsics via AMD AI engines
A shared compilation stack for distributed-memory parallelism in stencil DSLs
  • Citing Conference Paper
  • April 2024

... Compiler-driven optimizations automate transformations for stencil-based computations, including FDTD. MLIR has been applied to stencils [21,22], matrix multiplication [23], FFTs [24][25][26], and climate modeling [27]. The DaCe framework [28] provides similar optimizations for PDE solvers and scientific simulations [29,30]. ...

Stencil-HMLS: A multi-layered approach to the automatic optimisation of stencil codes on FPGA
  • Citing Conference Paper
  • November 2023

... Stencil-HMLS [31] leverages MLIR to automatically transform stencil-based codes to FPGAs. Driven by extracting stencils from existing programming languages [32] and Domain Specific Languages [33], this work operates upon the MLIR stencil dialect [34] to generate resulting code structures that are highly tuned for FPGAs and then provided to AMD Xilinx's HLS tool at the LLVM-IR level. This work demonstrates that based upon domain-specific abstractions, in this case, stencils, one is able to leverage the knowledge and expertise of the FPGA community to transform these abstract representations into an efficient dataflow form. ...

Fortran performance optimisation and auto-parallelisation by leveraging MLIR-based domain specific abstractions in Flang
  • Citing Conference Paper
  • November 2023

... The pseudocode for the optimized algorithm for a lower triangular non-transposed matrix is given in Algorithm 4. It traverses the matrix "bottom-up". For the last columns, calculations are performed using the baseline algorithm (lines 4–14), and for the remaining columns, they are performed by an optimized algorithm that traverses along diagonals (lines 15–31). Then the next unknown x[i] is calculated. ...

Is RISC-V ready for HPC prime-time: Evaluating the 64-core Sophon SG2042 RISC-V CPU
  • Citing Conference Paper
  • November 2023

... Paper [7] contains a performance evaluation of HPC workloads using FPGA simulation. The paper [8] focuses on the important problem of insufficient development of compilers for RISC-V in terms of code vectorization. Indeed, current compilers only target the RVV 1.0 vector extensions, while most of the available devices support only RVV 0.7.1. ...

Backporting RISC-V Vector Assembly
  • Citing Chapter
  • August 2023

Lecture Notes in Computer Science

... A vital extension is RVV, which implements the SIMD approach -single instruction multiple data, allowing parallel processing of data in vector registers. RVV enables the efficient processing of large data arrays, resulting in considerable performance benefits for HPC (High Performance Computing) tasks [8] and artificial intelligence applications [9,10]. Notably, the non-fixed length of RVV's vector register, determined by the VLEN parameter, simplifies development by allowing software to be written once and run on hardware with varying register lengths. ...

Test-driving RISC-V Vector hardware for HPC
  • Citing Preprint
  • April 2023
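The vector-length-agnostic model mentioned in the citation above can be sketched in scalar C. This is an illustrative analogue of my own, not code from the cited paper: on real RVV hardware the `set_vl` step is performed by the `vsetvli` instruction, which returns how many elements the hardware will process this iteration, so the same binary runs correctly on implementations with different VLEN.

```c
#include <stddef.h>

/* Stand-in for vsetvli: ask how many elements to process this pass.
 * vlmax models the hardware's maximum vector length in elements
 * (derived from VLEN on real hardware; a parameter here for
 * illustration). */
static size_t set_vl(size_t remaining, size_t vlmax) {
    return remaining < vlmax ? remaining : vlmax;
}

/* Strip-mined scale: x[i] *= a, processed vl elements at a time.
 * The code never hard-codes a register width, so the same logic works
 * for any vlmax -- the essence of vector-length-agnostic RVV code. */
void scale_vla(double *x, double a, size_t n, size_t vlmax) {
    for (size_t i = 0; i < n; ) {
        size_t vl = set_vl(n - i, vlmax);  /* stands in for vsetvli */
        for (size_t j = 0; j < vl; j++)    /* one "vector" operation */
            x[i + j] *= a;
        i += vl;
    }
}
```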

... vPython can either be run standalone on the device or as a Domain-Specific Language (DSL) within Python running on the host, offloading kernels for execution to the device. More information on the parallel programming, offloading and dynamic code loading capabilities of the language can be found in [22] and [21]. ...

Compact native code generation for dynamic languages on micro-core architectures
  • Citing Conference Paper
  • March 2021

... However, to date, these technologies have tended to result in significant performance overheads, required the programmer to ensure their code fits within the limited on-chip memory, provided limited choices around data location and size, and provided little, if any, portability across architectures. As evidenced by ePython [12], a Python interpreter for the Epiphany-III, dynamic programming languages can significantly reduce the programming effort required to overcome these complexities in comparison to the provided, low-level C software development kits (SDKs) [22]. ...

Having your cake and eating it: Exploiting Python for programmer productivity and performance on micro-core architectures using ePython
  • Citing Conference Paper
  • January 2020

... As described previously, on-core memory is everything with these micro-cores and whilst previous work around memory hierarchies and remote data [24] allow an unlimited amount of data to be streamed through the micro-core memory, there were still fundamental limits to the code size. This resulted in two major impacts, firstly the size of the Python codes that could be executed on the micro-cores and secondly the number of language features that the ePython interpreter could fully support. ...

High level programming abstractions for leveraging hierarchical memories with micro-core architectures
  • Citing Article
  • December 2019

Journal of Parallel and Distributed Computing