The smoothed particle hydrodynamics (SPH)
method has been implemented on graphics processing units
(GPUs) several times to increase performance, yet the demand
for ever faster implementations remains. Modern GPUs have
a complex memory hierarchy whose effective utilization is
paramount for high-performance computing. Use of the GPU’s
shared memory has traditionally been seen as important, as
earlier GPUs had little or no automatic caching. Newer GPUs from
NVIDIA, such as the Fermi and Kepler architectures, have a
more advanced cache implementation than previous generations,
possibly alleviating the shared memory requirement. We present
benchmark results for four different memory-handling strategies
for the SPH algorithm, with computations performed on GPUs and with
kernel support widths of both 2h and 3h. Our results indicate that
modern caching to a great extent alleviates the need for explicit
and manual use of shared memory, and that the kernel support
width has a great influence on the choice of memory strategy.
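
As a rough illustration of why the support width may matter for memory
traffic, the following minimal sketch counts how many particles fall
inside the kernel support for 2h and 3h. It assumes, purely for
illustration, that particles sit on a uniform cubic lattice with spacing
h; the function name neighbours_within is hypothetical and is not taken
from the benchmarked implementations.

```python
from itertools import product


def neighbours_within(support_in_h: int) -> int:
    """Count lattice sites (spacing h) that lie within support_in_h * h
    of a particle at the origin, including the particle itself.

    Assumption: particles on a uniform cubic lattice, which is an
    idealization and not the particle distribution of a real SPH run.
    """
    r = support_in_h
    r2 = r * r
    return sum(
        1
        for dx, dy, dz in product(range(-r, r + 1), repeat=3)
        if dx * dx + dy * dy + dz * dz <= r2
    )


n2 = neighbours_within(2)  # support width 2h
n3 = neighbours_within(3)  # support width 3h
print(n2, n3, round(n3 / n2, 2))  # prints "33 123 3.73"
```

Under this idealization, going from 2h to 3h nearly quadruples the number
of particle reads per interaction sum (the continuum volume ratio is
(3/2)^3 = 3.375), which suggests one reason the best memory strategy
could differ between the two support widths.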