Mulya Agung
University of Edinburgh | UoE · MRC Human Genetics Unit
PhD
About
26
Publications
3,185
Reads
100
Citations
Introduction
I am a computer scientist and engineer specialising in high-performance computing and data-driven science. My work and research contributions span a range of topics in parallel, distributed, and data-intensive computing. My research seeks to develop computational methods and tools to address challenging practical problems, such as those arising in AI and genomic research.
Additional affiliations
April 2020 - March 2021
Publications (26)
MPI process placement is an important step to achieve scalable performance on modern non-uniform memory access (NUMA) systems. A recent study on NUMA architectures has shown that, on modern NUMA systems, the memory congestion problem could cause more severe performance degradation than the data locality problem because heavy congestion on memory co...
The mapping of tasks to processor cores, called task mapping, is crucial to achieving scalable performance on multicore processors. On modern NUMA (non-uniform memory access) systems, the memory congestion problem could degrade the performance more severely than the data locality problem because heavy congestion on shared caches and memory controll...
Dedicated infrastructures are commonly used for urgent computations. However, using dedicated resources is not always affordable due to budget constraints. As a result, utilizing shared infrastructures becomes an alternative solution for urgent computations. Since the infrastructures are meant to serve many users, the urgent jobs may arrive when re...
Genome-wide association studies (GWAS) aim to identify associations of genetic variants with a trait or disease. The scale of genomic datasets has increased to millions of genetic variants and hundreds of thousands of individuals, opening the possibilities for discoveries from GWAS. However, large-scale GWAS analyses are prone to high false positive...
NEC SX-Aurora TSUBASA (SX-AT) is the latest vector supercomputer, consisting of host processors called Vector Hosts (VHs) and vector processors called Vector Engines (VEs). The goal of this work is to simultaneously use both VHs and VEs to increase the resource utilization and improve the system throughput by co-executing more workloads. One diffic...
With the rapid development of heterogeneous multi-core processors, a new High Performance Computing (HPC) system architecture combining the heterogeneous multi-core architecture and NUMA architecture will emerge in the future. However, existing task mapping methods are ineffective on such systems because they do not simultaneously consider multiple...
SX-Aurora TSUBASA (SX-AT) is a vector supercomputer equipped with Vector Engines (VEs). SX-AT offers not only a new system architecture but also several execution modes for achieving high performance on real-world applications, which often consist of vector-friendly and vector-unfriendly parts. Vector Engine Offloading (VEO) is a programming framew...
NEC SX-Aurora TSUBASA is the latest vector supercomputer, consisting of host processors called Vector Hosts (VHs) and vector processors called Vector Engines (VEs). The final goal of this work is to simultaneously use both VHs and VEs to increase the resource utilization and improve the system throughput by co-executing more workloads. However, per...
Mapping MPI processes to processor cores, called process mapping, is crucial to achieving scalable performance on multi-core processors. By analyzing the communication behavior among MPI processes, process mapping can improve the communication locality, and thus reduce the overall communication cost. However, on modern non-uniform memory access...
Recently, many researchers have been investigating quantum annealing as a solver for real-world combinatorial optimization problems. However, due to the format of problems that quantum annealing solves and the structure of the physical annealer, these problems often require additional setup prior to solving. We study how these setup steps affect per...
MPI process mapping is an important step to achieve scalable performance on non-uniform memory access (NUMA) systems. Conventional approaches have focused only on improving the locality of communication. However, related studies have shown that on modern NUMA systems, the memory congestion problem could cause more severe performance degradation tha...
Thread mapping is crucial to improve the performance and energy consumption of modern NUMA systems. In this work, we investigate the impact of locality- and memory congestion-aware thread mapping on energy consumption. Our evaluation shows that considering both locality and memory congestion can significantly reduce not only the performance...
The OpenMP specification introduces the thread team for hierarchical parallelism. A thread team is a team of synchronizable threads, and the number of threads in a thread team is called the thread team size. OpenMP allows static adjustment of the thread team size, where the team size must be specified before executing an application and has to stay constan...
On modern NUMA systems, the memory congestion problem could degrade performance more than the memory access locality problem because a large number of processor cores in the systems can cause heavy congestion on memory controllers. In this work, we propose a thread mapping method that considers the spatio-temporal communication behavior of multi-th...
Checkpointing with a constant checkpoint interval, a so-called constant checkpointing method, is commonly used in the HPC field and has been proven to be the optimal solution for failures whose inter-arrival times are distributed exponentially. On the other hand, previous works have shown that there is a high correlation between processor temperature a...
Checkpointing with a fixed checkpoint interval, a so-called constant checkpointing method, is commonly used in the field of fault-tolerance for high-performance computing (HPC) systems. It can achieve minimum total execution time if the failure follows an exponential distribution. Related work shows that there is a high correlation between temperatu...
A call detail record (CDR) is a data record produced by telecommunication equipment consisting of call detail transaction logs. It contains valuable information for many purposes in several domains, such as billing, fraud detection, and analytics. However, in the real world these needs face a big data challenge. Billions of CDRs are genera...