Mulya Agung

University of Edinburgh | UoE · MRC Human Genetics Unit

PhD

About

26 Publications
3,185 Reads
100 Citations
Introduction
I am a computer scientist and engineer specialising in high-performance computing and data-driven science. My work and research contributions span a range of topics in parallel, distributed, and data-intensive computing. My research develops computational methods and tools to address challenging practical problems, such as those arising in AI and genomic research.
Additional affiliations
April 2020 - March 2021
Tohoku University
Position
  • Researcher

Publications (26)
Conference Paper
Full-text available
MPI process placement is an important step to achieve scalable performance on modern non-uniform memory access (NUMA) systems. A recent study on NUMA architectures has shown that, on modern NUMA systems, the memory congestion problem could cause more severe performance degradation than the data locality problem because heavy congestion on memory co...
Article
Full-text available
The mapping of tasks to processor cores, called task mapping, is crucial to achieving scalable performance on multicore processors. On modern NUMA (non-uniform memory access) systems, the memory congestion problem could degrade the performance more severely than the data locality problem because heavy congestion on shared caches and memory controll...
Article
Full-text available
Dedicated infrastructures are commonly used for urgent computations. However, using dedicated resources is not always affordable due to budget constraints. As a result, utilizing shared infrastructures becomes an alternative solution for urgent computations. Since the infrastructures are meant to serve many users, the urgent jobs may arrive when re...
Preprint
Genome-wide association studies (GWAS) aim to identify associations of genetic variants with a trait or disease. The scale of genomic datasets has increased to millions of genetic variants and hundreds of thousands of individuals, opening the possibilities for discoveries from GWAS. However, large-scale GWAS analyses are prone to high false positive...
Article
Full-text available
NEC SX-Aurora TSUBASA (SX-AT) is the latest vector supercomputer, consisting of host processors called Vector Hosts (VHs) and vector processors called Vector Engines (VEs). The goal of this work is to simultaneously use both VHs and VEs to increase the resource utilization and improve the system throughput by co-executing more workloads. One diffic...
Chapter
With the rapid development of heterogeneous multi-core processors, a new High Performance Computing (HPC) system architecture combining the heterogeneous multi-core architecture and NUMA architecture will emerge in the future. However, existing task mapping methods are ineffective on such systems because they do not simultaneously consider multiple...
Chapter
SX-Aurora TSUBASA (SX-AT) is a vector supercomputer equipped with Vector Engines (VEs). Besides this new system architecture, SX-AT provides several execution modes to achieve high performance when executing real-world applications, which often consist of vector-friendly and vector-unfriendly parts. Vector Engine Offloading (VEO) is a programming framew...
Chapter
NEC SX-Aurora TSUBASA is the latest vector supercomputer, consisting of host processors called Vector Hosts (VHs) and vector processors called Vector Engines (VEs). The final goal of this work is to simultaneously use both VHs and VEs to increase the resource utilization and improve the system throughput by co-executing more workloads. However, per...
Article
Full-text available
Mapping MPI processes to processor cores, called process mapping, is crucial to achieving scalable performance on multi-core processors. By analyzing the communication behavior among MPI processes, process mapping can improve the communication locality, and thus reduce the overall communication cost. However, on modern non-uniform memory access...
Article
Recently, many researchers have been investigating quantum annealing as a solver for real-world combinatorial optimization problems. However, due to the format of problems that quantum annealing solves and the structure of the physical annealer, these problems often require additional setup prior to solving. We study how these setup steps affect per...
Conference Paper
Full-text available
MPI process mapping is an important step to achieve scalable performance on non-uniform memory access (NUMA) systems. Conventional approaches have focused only on improving the locality of communication. However, related studies have shown that on modern NUMA systems, the memory congestion problem could cause more severe performance degradation tha...
Conference Paper
Full-text available
Thread mapping is crucial to improving the performance and energy consumption of modern NUMA systems. In this work, we investigate the impact of locality- and memory congestion-aware thread mapping on energy consumption. Our evaluation shows that considering both locality and memory congestion can significantly reduce not only the performance...
Conference Paper
Full-text available
The OpenMP specification introduces thread teams for hierarchical parallelism. A thread team is a team of synchronizable threads, and the number of threads in a thread team is called the thread team size. OpenMP allows static adjustment of the thread team size, where the team size must be specified before executing an application and has to stay constan...
Poster
Full-text available
On modern NUMA systems, the memory congestion problem could degrade performance more than the memory access locality problem because a large number of processor cores in the systems can cause heavy congestion on memory controllers. In this work, we propose a thread mapping method that considers the spatio-temporal communication behavior of multi-th...
Conference Paper
Full-text available
Checkpointing with a constant checkpoint interval, a so-called constant checkpointing method, is commonly used in the HPC field and has been proven to be the optimal solution for failures whose inter-arrival times are distributed exponentially. On the other hand, previous work has shown that there is a high correlation between processor temperature a...
Poster
Full-text available
Checkpointing with a fixed checkpoint interval, a so-called constant checkpointing method, is commonly used in the field of fault-tolerance for high-performance computing (HPC) systems. It achieves the minimum total execution time if failures follow an exponential distribution. Related work shows that there is a high correlation between temperatu...
Article
Full-text available
A call detail record (CDR) is a data record, produced by telecommunication equipment, that consists of call detail transaction logs. It contains information valuable in several domains, for purposes such as billing, fraud detection, and analytics. However, in the real world these needs face a big data challenge. Billions of CDRs are genera...
