Seibert: thanks for the reply
The loading-to-shared-memory part assumed that this would aid cache hits, given that the data loaded to shared memory is not used immediately; the intrinsics of global memory caching are not discussed in much detail anywhere, so I was hoping that this might hint the compiler somehow
Perhaps I should rephrase it as follows: given that global memory is cached on devices of sufficient compute capability, are there any cache instructions, or methods in general, that aid cache hits?
My global memory accesses are rather conditional, such that cache misses are all but guaranteed by the time execution reaches the global memory read points
However, my algorithm/kernel is sufficiently large that global memory “pre-fetching” becomes sensible
Sometimes with a little overhead, and sometimes with none, I can pre-determine - or simply know in advance - global memory reads coming down the line, with their addresses known
Is there any way to “manipulate” or “aid” global memory caching under these conditions, to increase cache hits? Put differently, can global memory caching in any way imply global memory “pre-fetching”?
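To make the pattern I have in mind concrete, here is a rough sketch (kernel name, array names, and the surrounding computation are all hypothetical; the point is only the ordering of the load relative to its use):

```cuda
// Hypothetical kernel illustrating manual "software prefetch":
// the address of a later global read is known early, so the load is
// issued early and the value waits in a register while independent
// work proceeds, hiding the global memory latency.
__global__ void kernel(const float* __restrict__ data,
                       const int*   __restrict__ idx,
                       float* out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // The address of the read further down is already known here...
    int addr = idx[tid];

    // ...so issue the load now, well before the value is needed.
    float prefetched = data[addr];

    // ... a large amount of independent computation here ...

    // The value is only consumed at this point.
    out[tid] = prefetched * 2.0f;
}
```

This register-based approach is what I can do by hand today; my question is whether the hardware's global memory caching can be steered the same way, e.g. something along the lines of the `prefetch` instruction I have seen in the PTX ISA (I am not sure whether, or how, that is exposed from CUDA C).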