I have a kernel that uses 400 B of constant memory. Inside the kernel, I read that constant memory serially, one 32-bit word at a time. When there is a constant-cache miss, does the GPU fetch a contiguous block of constant memory, or only the 32-bit word that was requested?
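For reference, this is roughly the access pattern I mean. It is only a sketch and assumes CUDA, since the question doesn't depend on a specific API. The names `c_table`, `sum_table`, and `run_sum_table` are made up, and the table is filled with dummy values 0 to 99. A 100-element `unsigned int` table is exactly 400 B, and every thread walks it one 32-bit word per iteration.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical table: 100 words x 4 bytes = 400 bytes of constant memory.
constexpr int kWords = 100;
__constant__ unsigned int c_table[kWords];

// Each thread reads the table serially, one 32-bit word per iteration.
// On every iteration, all threads in the warp read the same address.
__global__ void sum_table(unsigned int *out)
{
    unsigned int acc = 0;
    for (int i = 0; i < kWords; ++i) {
        acc += c_table[i];
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Host helper (hypothetical): fills the table with 0..99, launches one warp,
// and returns the sum computed by thread 0.
unsigned int run_sum_table()
{
    unsigned int h_table[kWords];
    for (int i = 0; i < kWords; ++i) {
        h_table[i] = static_cast<unsigned int>(i);
    }
    cudaError_t err = cudaMemcpyToSymbol(c_table, h_table, sizeof(h_table));
    assert(err == cudaSuccess);

    const int threads = 32;
    unsigned int *d_out = nullptr;
    err = cudaMalloc(&d_out, threads * sizeof(unsigned int));
    assert(err == cudaSuccess);

    sum_table<<<1, threads>>>(d_out);

    unsigned int h_out[threads];
    // cudaMemcpy waits for the kernel to finish before copying back.
    err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_out);
    return h_out[0];
}
```

With this setup, calling `run_sum_table()` from host code should return 4950, the sum of 0 through 99.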
Also, is it possible to control how much data the GPU fetches on a cache miss? For example, on the first miss I would like the GPU to fetch all 400 B at once and place them in the on-chip constant cache.