Convex polyhedra represent granular media well. This geometric representation may be critical for obtaining realistic simulations of many industrial processes with the discrete element method (DEM). However, detecting collisions between the polyhedra themselves, and between the polyhedra and the surfaces that make up the environment, is computationally expensive. This paper demonstrates the significant computational benefits that the graphics processing unit (GPU) offers DEM. As we show, realizing these benefits requires careful consideration of the architectural differences between CPU and GPU platforms. This paper describes in detail the DEM algorithms and heuristics we optimized for the parallel NVIDIA Kepler GPU architecture, including a GPU-optimized collision detection algorithm for convex polyhedra based on the separating plane (SP) method. Our algorithms have minimal memory requirements, which enables us to store data in the limited but high-bandwidth constant memory on the GPU. We systematically verify the DEM implementation, after which we demonstrate the computational scaling on two large-scale simulations. We achieve a new performance level in DEM by simulating 34 million polyhedra on a single NVIDIA K6000 GPU. We show that by using the GPU with algorithms tailored to the architecture, large-scale industrial simulations are possible on a single graphics card.
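The separating plane idea rests on a simple fact: two convex polyhedra do not overlap if some plane has every vertex of one body on one side and every vertex of the other on the opposite side. The sketch below illustrates only that basic test and is not the GPU algorithm described in this paper. It tries one candidate plane, whose normal points from the centroid of body A toward the centroid of body B. It projects all vertices onto that normal and reports a plane if the projections do not overlap. The function name `try_separating_plane` and the centroid-direction heuristic are illustrative assumptions. A `None` result means only that this particular candidate failed. A full SP method would go on to search for a better plane or declare contact.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def centroid(verts: List[Vec3]) -> Vec3:
    # Vertex average; adequate as a rough interior point of a convex polyhedron.
    n = len(verts)
    return (sum(v[0] for v in verts) / n,
            sum(v[1] for v in verts) / n,
            sum(v[2] for v in verts) / n)


def try_separating_plane(verts_a: List[Vec3],
                         verts_b: List[Vec3]) -> Optional[Tuple[Vec3, float]]:
    """Hypothetical helper: test one candidate plane n.x = d.

    Returns (n, d) if every vertex of A satisfies n.x < d and every vertex
    of B satisfies n.x > d; otherwise returns None (inconclusive).
    """
    ca, cb = centroid(verts_a), centroid(verts_b)
    normal = (cb[0] - ca[0], cb[1] - ca[1], cb[2] - ca[2])
    if dot(normal, normal) == 0.0:
        return None  # coincident centroids: this heuristic gives no direction
    # Project both vertex sets onto the candidate normal.
    max_a = max(dot(normal, v) for v in verts_a)
    min_b = min(dot(normal, v) for v in verts_b)
    if max_a < min_b:
        # Projection intervals are disjoint; place the plane in the middle of the gap.
        return normal, 0.5 * (max_a + min_b)
    return None  # projections overlap or touch along this normal


def cube(ox: float, oy: float, oz: float, s: float = 1.0) -> List[Vec3]:
    # The 8 corners of an axis-aligned cube with minimum corner (ox, oy, oz).
    return [(ox + i * s, oy + j * s, oz + k * s)
            for i in (0, 1) for j in (0, 1) for k in (0, 1)]
```

For two unit cubes whose minimum corners are 2 units apart along x, the candidate normal is `(2.0, 0.0, 0.0)` and the plane offset falls midway between the projected faces. For cubes shifted by only 0.5 units, the projections overlap and the function returns `None`. Because each test needs only the two vertex lists and a single plane, the per-pair state stays small. That is consistent with the low memory footprint the abstract emphasizes, though the paper's actual data layout is not shown here.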