Conference Paper

Reducing Shared Cache Contention by Scheduling Order Adjustment on Commodity Multi-cores

DOI: 10.1109/IPDPS.2011.248
Conference: 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW)
Source: IEEE Xplore

ABSTRACT Due to power and complexity limits on traditional single-core processors, multi-core processors have become mainstream. A key feature of commodity multi-cores is that the last-level cache (LLC) is usually shared, and contention for this shared cache can significantly degrade application performance. Several existing proposals demonstrate that task co-scheduling has the potential to alleviate this contention, but making co-scheduling practical in commodity operating systems remains challenging. In this paper, we propose two lightweight, practical cache-aware co-scheduling methods, static SOA and dynamic SOA, to mitigate cache contention on commodity multi-cores. The central idea of both methods is that cache contention can be reduced by properly adjusting the scheduling order. The two methods differ mainly in how they acquire a process's cache requirement: static SOA (static scheduling order adjustment) obtains this information through offline profiling, while dynamic SOA (dynamic scheduling order adjustment) captures cache requirement statistics at run time using hardware performance counters. Experimental results with multi-programmed NAS workloads suggest that the proposed methods can greatly reduce the effect of cache contention on multi-core systems. With static SOA, execution time is reduced by up to 15.7% and the number of cache misses by up to 11.8%, and the improvement holds across different cache sizes and time-slice lengths; with dynamic SOA, execution time is reduced by up to 7.09%.
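The abstract describes scheduling order adjustment only at a high level. The following minimal sketch (in Python, purely illustrative; the task names, the cache_demand metric, and the pairing heuristic are assumptions, not the authors' implementation) shows the central idea: given per-task cache-demand estimates, obtained offline (static SOA) or from performance counters (dynamic SOA), the run queue is reordered so that each round co-runs one cache-hungry task with one cache-light task on the cores sharing an LLC.

    # Illustrative sketch only: pair cache-hungry tasks with cache-light
    # ones so that co-runners sharing an LLC have complementary demands.
    # 'cache_demand' is a hypothetical per-task metric, e.g. a profiled
    # LLC working-set estimate (static SOA) or a counter-derived miss
    # statistic (dynamic SOA).

    def adjust_scheduling_order(tasks):
        """tasks: list of (name, cache_demand) pairs.
        Returns co-run groups for two cores sharing one LLC."""
        ordered = sorted(tasks, key=lambda t: t[1])   # light -> hungry
        groups = []
        lo, hi = 0, len(ordered) - 1
        while lo < hi:
            # The lightest remaining task runs alongside the hungriest
            # one, keeping their combined demand closer to LLC capacity.
            groups.append((ordered[lo], ordered[hi]))
            lo += 1
            hi -= 1
        if lo == hi:                                  # odd count: run alone
            groups.append((ordered[lo],))
        return groups

    # Example with NAS-style task names and made-up demands (in MB):
    print(adjust_scheduling_order(
        [("bt", 9.0), ("cg", 2.0), ("lu", 7.5), ("mg", 1.0)]))
    # -> [(('mg', 1.0), ('bt', 9.0)), (('cg', 2.0), ('lu', 7.5))]

Pairing extremes is only one plausible heuristic; the point is that the co-run groups, and hence the LLC contention, are determined by the scheduling order alone, with no change to processor assignment.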

  • Related publication:
    ABSTRACT: Given the wide variety of performance demands across workloads, the trend in embedded systems is shifting from homogeneous to heterogeneous processors, which have been shown to yield both performance and energy-saving benefits. A typical heterogeneous processor has cores with different performance and power characteristics: high-performance, power-hungry (“big”) cores and low-power, lower-performance (“small”) cores. To satisfy the memory-bandwidth and computation demands of various threads, it is important (albeit challenging) to map threads to cores appropriately. Such an assignment should take into account that threads can interfere with one another through shared resources (e.g., cache, memory). We propose DIO-E (Distributed Intensity Online-Energy), a scheme for dynamic, energy-efficient assignment of threads to big/small cores that enhances the previously proposed DIO. In contrast to DIO, it takes into account both the CPU and memory demands of threads to characterize their performance when co-running on the same core at run time. Our results show that DIO-E improves the energy-delay-squared product (ED2) by 9% on average over DIO on a performance-asymmetric multicore system; both DIO and DIO-E show about 50% improvement in ED2 over a state-of-the-art solution.
    Proceedings of the International Conference on Embedded Software (EMSOFT); 01/2013
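The DIO-E abstract likewise stays at the policy level. The sketch below is a hedged illustration of the general big/small assignment idea, not the DIO-E algorithm; the metrics (cpu_util, llc_miss_rate) and the greedy policy are assumptions for illustration: compute-bound threads are steered to the big cores, and the remaining threads are interleaved by memory intensity so that LLC-heavy co-runners are spread apart.

    # Hypothetical sketch of a DIO-E-style policy (not the paper's
    # algorithm): rank threads by measured CPU demand, steer the most
    # compute-bound ones to "big" cores, and order the rest by memory
    # intensity so LLC-heavy threads are spread across the small cores.

    from dataclasses import dataclass

    @dataclass
    class ThreadStats:
        name: str
        cpu_util: float       # fraction of time slice used (assumed metric)
        llc_miss_rate: float  # LLC misses per kilo-instruction (assumed)

    def assign_threads(threads, num_big_cores):
        ranked = sorted(threads, key=lambda t: t.cpu_util, reverse=True)
        big = ranked[:num_big_cores]
        # On the small cores, alternate memory-heavy and memory-light
        # threads so co-runners do not thrash the same shared cache.
        rest = sorted(ranked[num_big_cores:], key=lambda t: t.llc_miss_rate)
        small = []
        lo, hi = 0, len(rest) - 1
        while lo <= hi:
            small.append(rest[hi])
            if lo != hi:
                small.append(rest[lo])
            lo += 1
            hi -= 1
        return big, small

    ts = [ThreadStats("t0", 0.95, 1.2), ThreadStats("t1", 0.40, 9.8),
          ThreadStats("t2", 0.85, 0.6), ThreadStats("t3", 0.30, 5.1)]
    big, small = assign_threads(ts, 2)
    print([t.name for t in big], [t.name for t in small])
    # -> ['t0', 't2'] ['t1', 't3']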