Is there any documentation describing the size of the I-cache on any of the NVIDIA GPUs? Alternatively, does anyone know of research groups or optimization gurus who have tried to determine the I-cache size experimentally, for example by porting CPU-based techniques to GPUs?
I’ve searched around quite a bit but haven’t found any details. I even dug into the nvopencc source code to see what metrics drive the ‘automatic unrolling of small loops’ performed by the CUDA-to-PTX compiler, since I-cache considerations are very important for loop unrolling. The unroller is not very sophisticated, though, and gives no hints: it just uses hard-wired limits that are independent of the device being compiled for (max unroll factor = 16 and max node count in the AST of the loop body = 40).
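To make the point concrete, here is roughly how I read those two limits, written as a small Python sketch. This is my own paraphrase, not the actual nvopencc code: the function name `pick_unroll_factor` and the exact rule for choosing the factor are assumptions, and only the two constants come from the source. Note that nothing in it depends on the target device, which is why it says nothing about the I-cache.

```python
# Hypothetical paraphrase of the unroller's limits; NOT the real nvopencc logic.
MAX_UNROLL_FACTOR = 16  # hard-wired maximum unroll factor
MAX_BODY_NODES = 40     # hard-wired maximum AST node count for the loop body


def pick_unroll_factor(trip_count: int, body_nodes: int) -> int:
    """Return an assumed unroll factor for a loop; 1 means 'do not unroll'."""
    # Loop bodies larger than the node limit are left alone.
    if body_nodes > MAX_BODY_NODES:
        return 1
    # Otherwise unroll up to the trip count, capped at the fixed maximum.
    return max(1, min(trip_count, MAX_UNROLL_FACTOR))
```

For example, a loop with 100 iterations and a 10-node body would be capped at a factor of 16 under this reading, while the same loop with a 41-node body would not be unrolled at all.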
So if anyone has any pointers on this, please let me know.