GMEM loads: caching vs. non-caching

On Fermi GPUs, global memory loads are caching loads by default (i.e. a granularity of 128 bytes). With CUDA, you can change this to non-caching loads by compiling with nvcc and “-Xptxas -dlcm=cg”.
With PGI’s OpenACC, I assume caching loads are also the default. Is that right? Is there any way to use non-caching loads with OpenACC (compiler flag, environment variable, ...)?
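
For anyone who wants to compare the two load modes on the CUDA side first, here is a minimal sketch of what I mean. The file name copy.cu, the kernel name strided_copy, and the problem sizes are made up for illustration; only the “-Xptxas -dlcm=cg” option comes from the nvcc/ptxas toolchain. The strided access is chosen because each thread then touches a different 128-byte line, which is the case where the smaller 32-byte granularity of non-caching loads should matter most.

```cuda
// Hypothetical file name: copy.cu
// Default build (caching loads, 128-byte granularity on Fermi):
//   nvcc -arch=sm_20 -o copy_ca copy.cu
// Non-caching loads (L2 only, 32-byte granularity):
//   nvcc -arch=sm_20 -Xptxas -dlcm=cg -o copy_cg copy.cu
// Use the -arch value that matches your GPU and toolkit.
#include <cstdio>
#include <cuda_runtime.h>

// Each thread reads one float that lies `stride` elements away from its
// neighbour's, so with a large stride every thread hits a different line.
__global__ void strided_copy(const float *in, float *out, int n, int stride)
{
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) {
        out[i] = in[i];
    }
}

int main()
{
    const int n = 1 << 22;           // number of floats in each buffer
    const int stride = 32;           // 32 floats = 128 bytes between threads
    const int threads = n / stride;  // one thread per strided element
    const int block = 256;
    const int grid = (threads + block - 1) / block;

    float *in = NULL;
    float *out = NULL;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));
    cudaMemset(out, 0, n * sizeof(float));

    strided_copy<<<grid, block>>>(in, out, n, stride);

    // Synchronize so that any launch or execution error is reported here.
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

Timing or profiling the two binaries against each other should show whether the access pattern benefits from non-caching loads. I would like to run the same comparison for the OpenACC version.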

Hi Sandra,

We do have an experimental flag (-Mx,180,8) that will disable the L1 cache. You are welcome to give it a try. The caveat is that, since it has not been exposed at the user level, it is subject to change.

  • Mat

Thanks Mat! I will give it a try and will report my results.


Apologies for resurrecting this thread. Since the K40 once again lets us use caching loads with dlcm=ca, I was wondering how I could enable this in the CUDA Fortran compiler. Could you help me with that, please?

Thank you,

Hi Istvan,

We added this as the flag “-ta=tesla:noL1”.

  • Mat