Fermi-style L1 cache in K40 and upwards

I read that the K40 and newer GPUs offer a Fermi-style L1 cache mode. That would be very useful for my project, since I can’t use shared memory there (for performance portability reasons):

http://docs.nvidia.com/cuda/kepler-tuning-guide/#l1-cache.

My question: Does PGI already make use of this tuning option or is there a way to pass this option through to nvcc? I guess I could always use something like ‘keepgpu’ and then pass the generated CUDA files to nvcc myself, but I imagine there is an easier way?

Does PGI already make use of this tuning option or is there a way to pass this option through to nvcc?

Yes, you can enable it with “-ta=tesla:loadcache:L1”.
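
For illustration, here is a minimal sketch of how that flag might be used with an OpenACC kernel. Only the “-ta=tesla:loadcache:L1” sub-option comes from this thread; the file name saxpy.c, the saxpy routine, and the extra “-acc”, “-Minfo=accel” and “-c” flags on the build line are assumptions added for the example. Whether L1 caching of global loads actually helps depends on the kernel's access pattern, so it is worth timing both variants.

```c
/* saxpy.c (hypothetical file name)
 *
 * Assumed build line for the PGI C compiler:
 *   pgcc -acc -ta=tesla:loadcache:L1 -Minfo=accel -c saxpy.c
 *
 * The kernel uses no shared memory; any caching of the global loads
 * is requested through the compiler flag, not through the source.
 */
void saxpy(int n, float a, const float *x, float *y)
{
    int i;

    /* Offload the loop; x is read-only on the device, y is updated in place. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Because the cache mode is selected on the command line, the source stays unchanged, so the same file can be built with and without the sub-option to compare timings.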

  • Mat