Excluding a global array from caching

On GF100-series GPUs, global memory is cached by default. How can I exclude a specific global array from being cached? Is there an API provided for this purpose? Thanks.

Unfortunately there is no API yet for marking a specific array as uncached. You can only turn off L1 caching of global loads for everything you compile, by passing -Xptxas -dlcm=cg to nvcc. However, PTX does have cache-operator modifiers that make individual global loads bypass the cache. You could use some inline PTX in your kernel to do the uncached load. I don’t have an example of the inline PTX syntax handy, but a quick Google search of this forum should turn one up.
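
For reference, a minimal sketch of what such an inline PTX load might look like. The names load_cg and scale are hypothetical, made up for illustration. The sketch assumes a 64-bit build (hence the "l" constraint on the pointer) and that the pointer really refers to global memory. It uses the .cg operator, which caches in L2 only and skips L1; another operator such as .cs or .lu could be swapped into the instruction string. Treat it as a starting point, not a tested recipe for every toolkit version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: load one float from global memory with the .cg
// cache operator (cache in L2, bypass L1).
// Assumes a 64-bit build, so the address is passed with the "l" constraint,
// and assumes ptr points into global memory.
__device__ __forceinline__ float load_cg(const float *ptr)
{
    float value;
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(value) : "l"(ptr));
    return value;
}

// Hypothetical kernel: reads `in` through the L1-bypassing load,
// writes `out` with an ordinary store.
__global__ void scale(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 2.0f * load_cg(in + i);
}

int main()
{
    const int n = 256;
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i)
        h_in[i] = (float)i;

    float *d_in = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<(n + 127) / 128, 128>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("out[10] = %f\n", h_out[10]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Only the loads that go through load_cg are affected, so other arrays in the same kernel keep the default caching behaviour.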

Mark it volatile; that should do it. (That will definitely avoid L1. I'm not sure whether it avoids L2 completely, but I think it does.)
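
A rough sketch of the volatile approach, in case it helps. The kernel name copy_volatile is hypothetical. The idea is to read the array through a volatile-qualified pointer, so the compiler has to issue a real memory load for every access instead of reusing an earlier value. How that load interacts with L1 and L2 is the claim made above, not something this code demonstrates.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: reads `in` through a volatile-qualified pointer.
// Every access to vin[i] must be performed as an actual memory load.
__global__ void copy_volatile(const float *in, float *out, int n)
{
    const volatile float *vin = in;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = vin[i];
}

int main()
{
    const int n = 256;
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i)
        h_in[i] = (float)i;

    float *d_in = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    copy_volatile<<<(n + 127) / 128, 128>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("out[10] = %f\n", h_out[10]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Dumping the PTX with nvcc -ptx is one way to check which load instruction the compiler actually generated for the volatile access.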

Thanks for the hint. I’ve checked the PTX ISA documentation for versions 2.0 and 2.1, and it seems that either ld.cs or ld.lu would satisfy my requirements. I’m not sure whether ld.cv (which should be used for volatile variables) achieves the same effect.

I haven’t managed to find any useful examples of inlining PTX code in CUDA C. Are there any good examples available? Thanks.