Conference Paper

Towards Efficient Supercomputing: A Quest for the Right Metric.

DOI: 10.1109/IPDPS.2005.440 Conference: 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), CD-ROM / Abstracts Proceedings, 4-8 April 2005, Denver, CO, USA
Source: DBLP


Over the past decade, we have been building less and less efficient supercomputers, resulting in the construction of substantially larger machine rooms and even new buildings. In addition, because of the thermal power envelope of these supercomputers, a small fortune must be spent to cool them. These infrastructure costs, coupled with the additional costs of administering and maintaining such (unreliable) supercomputers, dramatically increase their total cost of ownership. As a result, there has been substantial interest in recent years in producing more reliable and more efficient supercomputers that are easy to maintain and use. But how does one quantify efficient supercomputing? That is, what metric should be used to evaluate how efficiently a supercomputer delivers answers? We argue that existing efficiency metrics such as the performance-power ratio are insufficient and motivate the need for a new type of efficiency metric, one that incorporates notions of reliability, availability, productivity, and total cost of ownership (TCO), for instance. In doing so, however, this paper raises more questions than it answers with respect to efficiency. And in the end, we still return to the performance-power ratio as an efficiency metric with respect to power and use it to evaluate a menagerie of processor platforms in order to provide a set of reference data points for the high-performance computing community.
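The performance-power ratio the abstract falls back on is simply delivered performance divided by power draw, typically reported as GFLOPS per watt. A minimal sketch, using hypothetical platform numbers purely for illustration:

```python
# Performance-power ratio as discussed in the abstract: GFLOPS per watt.
# The platform names and figures below are hypothetical, for illustration only.

def perf_power_ratio(gflops: float, watts: float) -> float:
    """Return the performance-power ratio in GFLOPS/W."""
    return gflops / watts

platforms = {
    "platform_a": (1000.0, 500.0),   # 1 TFLOPS at 500 W  -> 2.00 GFLOPS/W
    "platform_b": (4000.0, 1600.0),  # 4 TFLOPS at 1600 W -> 2.50 GFLOPS/W
}

for name, (gflops, watts) in platforms.items():
    print(f"{name}: {perf_power_ratio(gflops, watts):.2f} GFLOPS/W")
```

Note that this single scalar says nothing about reliability, availability, or TCO, which is precisely the paper's objection to it.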



Available from: Wu Feng, Sep 29, 2015
  • Source
    • "This is done due to fear of increased node failures at higher temperatures because Mean Time Between Failure (MTBF) of a processor is directly proportional to the exponential of its temperature [20] [21] [22]. It has also been reported that for every 10 °C increase in temperature, fault rate doubles [20] [23] [24] [25]. Therefore, besides reducing the cooling energy of the data center, restraining the temperature of the processors also increases the MTBF of the system. "
    Dataset: sc13
  • Source
    • "[They] raise awareness about power consumption, promote alternative total cost of ownership performance metrics, and ensure that supercomputers only simulate climate change and not create it.'' Between its inception in 2007 and the November release of 2012, the GREEN500 list was a reordering of the TOP500 list in order of decreasing power efficiency measured as the maximal GFLOPS per Watt (GFLOPS/W), a metric proposed in the parallel and distributed processing community [12]. Power consumption of a system is measured by a digital power meter plugged into the system's power strip, and readings are sent to a profiling computer at a rate of 50 kHz. "
    ABSTRACT: The biannual TOP500 list of the highest performing supercomputers has chronicled, and even fostered, the development of recent supercomputing platforms. Coupled with the GREEN500 list that launched in November 2007, the TOP500 list has enabled analysis of multiple aspects of supercomputer design. In this comparative and retrospective study, we examine all of the available data contained in these two lists through November 2012 and propose a novel representation and analysis of the data, highlighting several major evolutionary trends.
    Parallel Computing 04/2013; 39(6-7):271–279. DOI:10.1016/j.parco.2013.04.007 · 1.51 Impact Factor
  • Source
    • "As a result of this, there has been a substantial increased interest in the pursuit of efficient supercomputing. An immediate question is how to quantify efficiency in supercomputing [16]. One possible metric is the performance-power ratio. "
    ABSTRACT: Efficiency in supercomputing has traditionally focused on execution time. In the early 2000s, the concept of total cost of ownership was re-introduced, broadening the efficiency measure to include aspects such as energy and space. Yet the supercomputing community has never agreed upon a metric that can cover these aspects completely and also provide a fair basis for comparison. This paper examines the metrics that have been proposed in the past decade, and proposes a vector-valued metric for efficient supercomputing. Using this metric, the paper presents a study of where the supercomputing industry has been and where it stands today with respect to efficient supercomputing.