ナオト フクモト (Naoto Fukumoto)

Publications (3)

  • Article: Analyzing the Impact of Data Prefetching on Chip MultiProcessors
    ABSTRACT: Data prefetching is a well-known approach to compensating for poor memory performance, and it has been employed in commercial processor chips. Although a number of prefetching techniques have been proposed so far, most of them assume single-core architectures. Chip multiprocessors (CMPs) share some resources among cores, such as L2 caches and buses, so the effect of prefetching on a CMP should differ from its effect on a traditional single-core processor. In this paper, we analyze the effect of prefetching on CMP performance. We first classify the impact of the prefetches issued during program execution, and then discuss quantitatively the effect of prefetching on memory performance. The experimental results show that the negative effect of invalidating prefetched cache blocks is very small. In addition, we observe that current prefetch algorithms do not effectively exploit a key feature of CMPs, namely cache-to-cache on-chip data transfer.
  • Article: Proposal and Evaluation of a Helper-Thread Execution Scheme for CMPs Considering the Balance between Compute and Memory Performance (演算/メモリ性能バランスを考慮したCMP向けヘルパースレッド実行方式の提案と評価)
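    ABSTRACT (translated from Japanese): Chip multiprocessors (CMPs), which integrate multiple processor cores on a single chip, are currently attracting attention because on-chip thread-level parallelism can deliver high compute performance. However, limited memory bandwidth and the higher memory access frequency caused by integrating multiple cores make the memory-wall problem more severe. As a result, effective performance drops when executing parallel programs that require many memory references. In this paper, with the aim of improving CMP performance, we propose a helper-thread execution scheme that takes the balance between compute performance and memory performance into account. Conventional schemes execute a parallel program on all of the integrated processor cores in order to raise thread-level parallelism. In contrast, the proposed scheme assigns some of the processor cores to helper threads that perform prefetching. Evaluating the proposed scheme under the assumption that the optimal number of helper threads is known, we obtained a performance improvement of up to 47% over the conventional scheme.
    ABSTRACT (original English): Conventional CMPs attempt to exploit thread-level parallelism (TLP) by using all of the cores integrated on a chip. However, this straightforward approach does not always achieve the best performance, because the memory-wall problem becomes more critical in CMPs, resulting in poor performance in spite of high TLP. To solve this issue, we propose an efficient thread management technique called performance balancing. We deliberately throttle the TLP in order to execute software prefetchers as helper threads. Our experimental results show a 47% speedup in the best case compared with conventional parallel execution.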
  • Article: Analysis of the Effect of Data Prefetching on Chip Multiprocessors (チップマルチプロセッサにおけるデータ・プリフェッチ効果の分析)
    ABSTRACT (translated from Japanese): Chip multiprocessors (CMPs), which integrate multiple cores on a single chip, are attracting attention. A CMP can achieve high compute performance by processing in parallel on multiple cores. However, limited memory bandwidth and the higher memory access frequency caused by integrating multiple cores make the memory-wall problem more severe. Data prefetching is one way to hide main-memory access time. When data prefetching is performed on a CMP, interactions between cores produce effects that differ from those on a single-core processor. In this paper, we therefore analyze the impact of data prefetching on CMP performance. The results show that the fraction of prefetched data that is invalidated is extremely small, and that about 5% of prefetches hide memory access time for cores other than the one that issued the prefetch.
    ABSTRACT (original English): Chip multiprocessors (CMPs) can achieve higher performance by exploiting thread-level parallelism. Increasing the number of processor cores on a chip dramatically improves peak performance. However, since memory bandwidth does not scale with the number of cores, the negative impact of the memory-wall problem becomes more critical. Data prefetching is a well-known approach to compensating for poor memory performance, and it has been employed in commercial processor chips. Although a number of prefetching techniques have been proposed so far, most of them assume a chip with only one processor core. CMP chips share some resources among cores, such as L2 caches and buses, so the effect of prefetching on CMPs should differ from that on single-core processors. In this paper, we analyze the effect of prefetching on CMP performance. We first classify the impact of the prefetch operations issued during program execution, and then discuss qualitatively and quantitatively the effect of prefetching on memory performance. The experimental results show that the negative effect of invalidating prefetched data is very small. In addition, we observe that about 5% of prefetch operations improve the cache hit rates of other cores.