ABSTRACT: Performance tuning for data centers is essential and complicated. It is important because a data center comprises thousands of machines, so even a single-digit performance improvement can significantly reduce cost and power consumption. Unfortunately, it is extremely difficult because data centers are dynamic environments where applications are frequently released and servers are continually upgraded. In this paper, we study the effectiveness of different processor prefetch configurations, which can greatly influence the performance of the memory system and of the overall data center. We observe a wide performance gap when comparing the worst and best configurations, from 1.4% to 75.1%, for 11 important data center applications. We then develop a tuning framework which attempts to predict the optimal configuration based on hardware performance counters. The framework achieves performance within 1% of the best performance of any single configuration for the same set of applications.
Proceedings of the ACM/IEEE Conference on High Performance Computing, SC 2009, November 14-20, 2009, Portland, Oregon, USA; 01/2009
ABSTRACT: A typical data center application requires the processor cycles of thousands of machines. Even a single-digit performance improvement can significantly reduce the cost and power consumption of a data center. Unfortunately, achieving sustained improvement, even if modest, is difficult. Data centers are dynamic environments where applications are frequently released and servers are continually upgraded. For maintainability and fault tolerance, the physical capabilities and configuration of the servers are abstracted from the application programmer. We study application performance under different processor prefetch configurations. These configurations are largely transparent to the programmer, yet we observe a wide range of performance when comparing the worst and best configurations, with relative performance improvement ranging from 1.4% to 75.1%. Alarmingly, one application that consumes many processor cycles has a 23.6% improvement. Default prefetch configurations favor aggressively prefetching memory, which benefits most applications, but some data center applications have highly tuned memory behavior, and aggressive prefetching severely decreases their performance. We develop a tuning framework which attempts to predict the optimal configuration based on hardware performance counters. It applies to a large number of performance-critical data center applications without modifying the source code or binaries. The framework achieves performance within 1% of the best performance for a suite of important data center applications.
Proceedings of the 23rd International Conference on Supercomputing, 2009, Yorktown Heights, NY, USA, June 8-12, 2009; 01/2009