I know that constant memory is cached. But my question is: is this cache maintained across kernel launches, or is it reloaded for each kernel?
I believe it gets invalidated for each kernel launch.
So, once some bytes from constant memory have been read during a kernel launch (let's say, by thread tid within block blockId), do all later reads of the same positions by threads in all the other blocks go directly to the cache?
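The cache is per multiprocessor, so only the other blocks running on that same multiprocessor will hit the cache. Blocks on a different multiprocessor have to pull the data into their own cache first.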
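For reference, here is a minimal sketch of the access pattern being discussed: every thread in every block reads the same constant-memory positions. The names `coeffs` and `scale` and the sizes are made up for illustration. Nothing in the code controls or observes the cache; it only shows the kind of read that the per-multiprocessor constant cache would serve after the first access on each multiprocessor.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical lookup table in constant memory (name and size are illustrative).
__constant__ float coeffs[16];

// Every thread in every block reads coeffs[0] and coeffs[1].
// Threads in a warp read the same address, so the read is uniform across the warp.
__global__ void scale(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i] * coeffs[0] + coeffs[1];
    }
}

int main()
{
    const int n = 1 << 20;
    float host_coeffs[16] = {2.0f, 1.0f};

    // Constant memory is written from the host before the launch.
    cudaMemcpyToSymbol(coeffs, host_coeffs, sizeof(host_coeffs));

    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    // Many blocks, spread over all multiprocessors, all reading the same constants.
    scale<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();

    float first = 0.0f;
    cudaMemcpy(&first, out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", first);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Since the cache state at the start of a launch is not something the code can rely on, it is probably safest to assume the first read of each constant on each multiprocessor misses, and only the reads after that are cheap.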