Conference Paper

ePython: An Implementation of Python for the Many-Core Epiphany Co-processor


... However, to date, these technologies have tended to result in significant performance overheads, required the programmer to ensure their code fits within the limited on-chip memory, provided limited choices around data location and size, and provided little, if any, portability across architectures. As evidenced by ePython [12], a Python interpreter for the Epiphany-III, dynamic programming languages can significantly reduce the programming effort required to overcome these complexities in comparison to the provided, low-level C software development kits (SDKs) [22]. ...
... Vipera provides two implementations of this; the first compiles down to bytecode that executes on a tiny virtual machine (c. 24KB on the Adapteva Epiphany-III [12]) running on the device and the second generates Olympus abstract machine code that is compiled to provide device native code. In this paper we will focus on the Olympus abstract machine version of vPython. ...
Preprint
Vipera provides a compiler and runtime framework for implementing dynamic Domain-Specific Languages on micro-core architectures. The performance and size of the generated code are critical on these architectures. In this paper we present the results of our investigations into the efficiency of Vipera in terms of code performance and size.
... More recently, a number of dynamic languages have been ported to these very constrained architectures. From an implementation of a Python interpreter [17], to [11] and [30], these are typically provided as interpreters, but the severe memory limits of these cores are a major limiting factor for these approaches. ...
... Furthermore, as the code for dynamic functions is stored within the abstract machine heap, it can be discarded (freed) as required, thereby allowing the execution of much larger kernels than is possible with the previous static code loading model. Crucially, our environment model automatically enables runtime symbol resolution within the compiled C code, enabling dynamic function loading.

    declare_proc(env, 1, "add", NULL);
    ...
    update_proc(env, 1, load_proc("add", env, 2));

Listing 5. Deferred dynamically loaded function example

4 Python - a Vehicle for Testing Our Approach

ePython [17] is an interpreter which implements a subset of Python and is designed to target micro-core architectures. Designed with portability across these architectures in mind, it has evolved from its initial purpose as an educational language for parallel programming, through its use as a research vehicle for understanding how to program micro-core architectures, to supporting real-world applications on the micro-cores. ...
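The deferred loading pattern of Listing 5 can be imitated in plain Python. The sketch below is only an illustration under stated assumptions: the `Env` class, its `call` and `discard_proc` methods, the `loads` counter, and the `LIBRARY` table are hypothetical names and are not part of Vipera or ePython. It shows a slot being declared without code, resolved on first use, freed, and then reloaded on demand.

```python
# Hypothetical sketch: none of these names come from Vipera or ePython.

# Stand-in for code that is held off-device until it is needed.
LIBRARY = {"add": lambda a, b: a + b}


class Env:
    """Environment table mapping slot numbers to (name, code) entries."""

    def __init__(self):
        self.slots = {}
        self.loads = 0  # counts how many times code was fetched

    def declare_proc(self, slot, name):
        # Reserve a slot for the symbol; no code is resident yet.
        self.slots[slot] = {"name": name, "code": None}

    def load_proc(self, name):
        # Fetch the code for a symbol (here, just a dictionary lookup).
        self.loads += 1
        return LIBRARY[name]

    def update_proc(self, slot, code):
        # Bind loaded code to a previously declared slot.
        self.slots[slot]["code"] = code

    def discard_proc(self, slot):
        # Free the code but keep the declaration so it can be reloaded.
        self.slots[slot]["code"] = None

    def call(self, slot, *args):
        entry = self.slots[slot]
        if entry["code"] is None:
            # Resolve the symbol at run time, on first use.
            self.update_proc(slot, self.load_proc(entry["name"]))
        return entry["code"](*args)


env = Env()
env.declare_proc(1, "add")
first = env.call(1, 2, 3)   # triggers a load
second = env.call(1, 4, 5)  # reuses the resident code
env.discard_proc(1)
third = env.call(1, 6, 7)   # triggers a reload
```

Keeping the declaration separate from the code is what allows the code to be discarded without invalidating callers, which is the property the excerpt relies on to run kernels larger than the available memory.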
Preprint
Micro-core architectures combine many simple, low memory, low power-consuming CPU cores onto a single chip. Potentially providing significant performance and low power consumption, this technology is not only of great interest in embedded, edge, and IoT uses, but also potentially as accelerators for data-center workloads. Due to the restricted nature of such CPUs, these architectures have traditionally been challenging to program, not least due to the very constrained amounts of memory (often around 32KB) and idiosyncrasies of the technology. More recently, dynamic languages such as Python have been ported to a number of micro-cores, but these are often delivered as interpreters, which have an associated performance limitation. We target four objectives: performance, unlimited code size, portability between architectures, and maintaining the programmer productivity benefits of dynamic languages. However, the limited memory available means that classic techniques employed by dynamic language compilers, such as just-in-time (JIT) compilation, are simply not feasible. In this paper we describe the construction of a compilation approach for dynamic languages on micro-core architectures which aims to meet these four objectives, and use Python as a vehicle for exploring the application of this in replacing the existing micro-core interpreter. Our experiments focus on the metrics of performance, architecture portability, minimum memory size, and programmer productivity, comparing our approach against that of writing native C code. The outcome of this work is the identification of a series of techniques that are not only suitable for compiling Python code, but also applicable to a wide variety of dynamic languages on micro-cores.
... ePython [28] is an implementation of Python, initially developed for the Epiphany, and now ported to other micro-core architectures including the MicroBlaze. The primary purpose of ePython was initially educational, but it is also applicable as a research vehicle for understanding how best to program these architectures and prototyping applications on them. ...
Preprint
Micro-core architectures combine many low memory, low power computing cores together in a single package. These are attractive for use as accelerators but due to limited on-chip memory and multiple levels of memory hierarchy, the way in which programmers offload kernels needs to be carefully considered. In this paper we use Python as a vehicle for exploring the semantics and abstractions of higher level programming languages to support the offloading of computational kernels to these devices. By moving to a pass by reference model, along with leveraging memory kinds, we demonstrate the ability to easily and efficiently take advantage of multiple levels in the memory hierarchy, even ones that are not directly accessible to the micro-cores. Using a machine learning benchmark, we perform experiments on both Epiphany-III and MicroBlaze based micro-cores, demonstrating the ability to compute with data sets of arbitrarily large size. To provide context of our results, we explore the performance and power efficiency of these technologies, demonstrating that whilst these two micro-core technologies are competitive within their own embedded class of hardware, there is still a way to go to reach HPC class GPUs.
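The idea of computing over data that does not fit in on-chip memory can be sketched in ordinary Python. This is not the paper's API: the function name and the `buffer_size` parameter are assumptions made for illustration. The kernel receives a reference to data held in a larger memory level and copies in only one small window at a time, so the working set stays bounded regardless of the data set size.

```python
def sum_of_squares_by_reference(host_data, buffer_size=1024):
    """Reduce data held in a larger memory level, one small buffer at a time.

    Hypothetical illustration: `host_data` stands in for an array in host
    memory, and `buffer_size` for the capacity of the on-core buffer.
    """
    total = 0
    for start in range(0, len(host_data), buffer_size):
        # Copy only one buffer-sized window into "local" memory.
        local = host_data[start:start + buffer_size]
        total += sum(x * x for x in local)
    return total


result = sum_of_squares_by_reference(list(range(10)), buffer_size=3)
```

Because only `buffer_size` elements are resident at once, the same kernel works unchanged whether the data set holds ten elements or ten million.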
Chapter
Vipera provides a compiler and runtime framework for implementing dynamic Domain-Specific Languages on micro-core architectures. The performance and size of the generated code are critical on these architectures. In this paper we present the results of our investigations into the efficiency of Vipera in terms of code performance and size.
Keywords: Domain-specific languages, Python, native code generation, RISC-V, micro-core architectures
Article
Micro-core architectures combine many low memory, low power computing cores together in a single package. These are attractive for use as accelerators but due to limited on-chip memory and multiple levels of memory hierarchy, the way in which programmers offload kernels needs to be carefully considered. In this paper we use Python as a vehicle for exploring the semantics and abstractions of higher level programming languages to support the offloading of computational kernels to these devices. By moving to a pass by reference model, along with leveraging memory kinds, we demonstrate the ability to easily and efficiently take advantage of multiple levels in the memory hierarchy, even ones that are not directly accessible to the micro-cores. Using a machine learning benchmark, we perform experiments on both Epiphany-III and MicroBlaze based micro-cores, demonstrating the ability to compute with data sets of arbitrarily large size. To provide context of our results, we explore the performance and power efficiency of these technologies, demonstrating that whilst these two micro-core technologies are competitive within their own embedded class of hardware, there is still a way to go to reach HPC class GPUs.
Article
This paper describes the design of a 1024-core processor chip in 16 nm FinFET technology. The chip ("Epiphany-V") contains an array of 1024 64-bit RISC processors, 64 MB of on-chip SRAM, three 136-bit wide mesh Networks-on-Chip, and 1024 programmable I/O pins. The chip has taped out and is being manufactured by TSMC. This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
Article
The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). With fully divergent cores capable of MIMD execution, the physical topology and memory-mapped capabilities of the core and network translate well to partitioned global address space (PGAS) parallel programming models. Following an investigation into the use of two-sided communication using threaded MPI, one-sided communication using SHMEM is being explored. Here we present work in progress on the development of an OpenSHMEM 1.2 implementation for the Epiphany architecture.
Article
PVS is a Prototype Verification System for the development and analysis of formal specifications. The PVS system consists of a specification language, a parser, a typechecker, a prover, specification libraries, and various browsing tools. This document primarily describes the specification language and is meant to be used as a reference manual. The PVS System Guide [9] is to be consulted for information on how to use the system to develop specifications and proofs. The PVS Prover Guide [13] is a reference manual for the commands used...
Article
The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating point calculations as well as parallel scalability. Yet despite the interesting architectural features, a compelling programming model has not been presented to date. This paper demonstrates an efficient parallel programming model for the Epiphany architecture based on the Message Passing Interface (MPI) standard. Using MPI exploits the similarities between the Epiphany architecture and a conventional parallel distributed cluster of serial cores. Our approach enables MPI codes to execute on the RISC array processor with little modification and achieve high performance. We report benchmark results for the threaded MPI implementation of four algorithms (dense matrix-matrix multiplication, N-body particle interaction, a five-point 2D stencil update, and 2D FFT) and highlight the importance of fast inter-core communication for the architecture.
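One of the benchmarks named above, the five-point 2D stencil update, can be sketched serially in Python. The abstract does not give the stencil weights, so this sketch assumes a Jacobi-style average of the four neighbours with fixed boundary values; the function name is hypothetical. In a parallel version, each core would own a tile of the grid and exchange its edge rows and columns with neighbouring cores between sweeps.

```python
def five_point_stencil(grid):
    """Return one Jacobi-style sweep over the interior of a 2D grid.

    Assumption: each interior point becomes the average of its four
    neighbours; boundary points are left unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    # Write into a copy so every update reads the previous sweep's values.
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (
                grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            )
    return new


start = [
    [1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
updated = five_point_stencil(start)
```

Reading from `grid` and writing to `new` keeps the sweep independent of iteration order, which is what makes the update straightforward to partition across cores.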
Article
The IPython project provides an enhanced interactive environment for scientific computing, with features including support for data visualization and facilities for distributed and parallel computation. The most important characteristic of scientific computing is a collection of high-performance code written in Fortran, C, and C++ that runs in batch mode on large systems, clusters, and supercomputers. The IPython project aims to provide a greatly enhanced Python shell, facilities for interactive distributed and parallel computing, and a comprehensive set of tools for building special-purpose interactive environments for scientific computing. This project has been providing tools to extend Python's interactive capabilities and continues to be developed as a base layer for new interactive environments. It offers a set of control commands designed to improve Python's usability in an interactive environment.
COPRTHR API reference
  • D. Richie
ePython parallel odd-even sort
  • N. Brown
High performance Python on Intel many-core architecture
  • R. D. Wargny
A manycore coprocessor architecture for heterogeneous computing
  • A. Olofsson
Python language reference
  • Python Software Foundation