I read that the K40 and newer cards can opt in to a Fermi-style cache mode, where global loads are cached in L1. That would be very useful for my project, since I can't use shared memory there (for performance portability reasons):
http://docs.nvidia.com/cuda/kepler-tuning-guide/#l1-cache.
My question: does PGI already make use of this tuning option, or is there a way to pass it through to nvcc? I guess I could always use something like 'keepgpu' and then compile the generated CUDA files with nvcc myself, but I imagine there is an easier way.
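For reference, this is roughly what I understand the manual nvcc route to look like. It is only a sketch: the file name `l1_check.cu`, the `scale` kernel and the `global_l1_status` helper are made up for illustration, and `-arch=sm_35` assumes a K40 and a toolkit that still supports that architecture. The `-Xptxas -dlcm=ca` flag is the one the tuning guide names for caching global loads in L1.

```cuda
// Sketch only: file, kernel and helper names are hypothetical.
// Compile with global loads cached in L1 (opt-in flag from the Kepler tuning guide):
//   nvcc -arch=sm_35 -Xptxas -dlcm=ca -c l1_check.cu
#include <cuda_runtime.h>

// Stand-in for a kernel whose global loads I would like to see cached in L1.
__global__ void scale(const float* in, float* out, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a * in[i];
    }
}

// Asks the runtime whether the device supports caching globals in L1.
// Returns 1 if supported, 0 if not, -1 if the device could not be queried.
int global_l1_status(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    return prop.globalL1CacheSupported ? 1 : 0;
}
```

Calling `global_l1_status(0)` before launching would at least tell me whether the flag can have any effect on the card in question. What I am hoping for is a way to get the same result without leaving the PGI toolchain.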