I-cache size: Question about I-cache size on GTX280

Hi folks,

Is there any documentation anywhere describing the size of the I-cache on any of the NVIDIA GPUs? Alternatively, do you know of any research groups or optimization gurus who have experimented with determining the I-cache size, for example by porting CPU-based techniques to GPUs?

I’ve searched around quite a bit but haven’t found any details. I even dug into the nvopencc source code to see what metrics drive the ‘automatic unrolling of small loops’ performed by the CUDA-to-PTX compiler, since I-cache considerations are very important for loop unrolling. The unroller is not very sophisticated, though, and gives no hints: it just uses hard-wired limits that are independent of the target device (max unroll factor = 16 and max node count in the AST of the loop body = 40).
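Since the automatic unroller uses fixed limits, the unroll factor can also be set by hand with `#pragma unroll`, which makes it possible to experiment with code size directly. A minimal sketch follows; the kernel name `row_sum`, the sizes and the factor 4 are made up for illustration and are not taken from nvopencc.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread sums n consecutive floats.
// No bounds check, so the launch must cover exactly the allocated rows.
__global__ void row_sum(const float *in, float *out, int n)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;

    // Request an unroll factor of 4 instead of relying on the compiler's
    // built-in heuristic. "#pragma unroll 1" would disable unrolling.
    #pragma unroll 4
    for (int i = 0; i < n; ++i)
        acc += in[row * n + i];

    out[row] = acc;
}

int main()
{
    const int rows = 32, n = 8;   // illustrative sizes only
    float *d_in, *d_out;
    cudaMalloc(&d_in, rows * n * sizeof(float));
    cudaMalloc(&d_out, rows * sizeof(float));
    cudaMemset(d_in, 0, rows * n * sizeof(float));

    row_sum<<<1, rows>>>(d_in, d_out, n);
    cudaDeviceSynchronize();
    printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Varying the factor and comparing the generated code size is one way to see how quickly the loop body grows.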

So if anyone has any pointers on this, please let me know.


I’ve been toying with unrolling loops on my 8800GTS (G92) and found that I got the best times when the unroll factor produced a .cubin with, IIRC, 8 KB worth of instructions (and operands). I’m not sure of that number now, but I’m certain it was nice and round (a power of two), so it fits as a possible size of the cache.
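An experiment along these lines could be sketched as below. This is not the code I used back then, just an illustration: the names `probe` and `ns_per_fma`, the unroll factors and the total amount of work are all assumptions. The idea is to keep the total number of multiply-adds fixed, grow the straight-line loop body, and look for the point where the time per instruction jumps, which would suggest the body no longer fits in the I-cache.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical probe kernel: the inner loop is fully unrolled into UNROLL
// dependent multiply-adds, so the loop body grows linearly with UNROLL.
template <int UNROLL>
__global__ void probe(float *out, int trips, float a, float b)
{
    float x = out[threadIdx.x];
    #pragma unroll 1                 // keep the outer loop as a real loop
    for (int t = 0; t < trips; ++t) {
        #pragma unroll               // constant trip count: unroll fully
        for (int i = 0; i < UNROLL; ++i)
            x = x * a + b;           // a, b are runtime values, nothing folds
    }
    out[threadIdx.x] = x;            // store so the work is not optimized away
}

// Times one launch with CUDA events and returns nanoseconds per multiply-add.
template <int UNROLL>
float ns_per_fma(float *d_buf, int trips)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    probe<UNROLL><<<1, 32>>>(d_buf, 1, 1.0f, 0.0f);   // warm-up launch

    cudaEventRecord(start);
    probe<UNROLL><<<1, 32>>>(d_buf, trips, 1.0f, 0.0f);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms * 1e6f / (float(trips) * UNROLL);
}

int main()
{
    float *d_buf;
    cudaMalloc(&d_buf, 32 * sizeof(float));
    cudaMemset(d_buf, 0, 32 * sizeof(float));

    const int total = 1 << 24;       // same number of multiply-adds per run
    printf("unroll  ns/FMA\n");
    printf("%6d  %.3f\n", 64,   ns_per_fma<64>(d_buf, total / 64));
    printf("%6d  %.3f\n", 256,  ns_per_fma<256>(d_buf, total / 256));
    printf("%6d  %.3f\n", 1024, ns_per_fma<1024>(d_buf, total / 1024));
    printf("%6d  %.3f\n", 4096, ns_per_fma<4096>(d_buf, total / 4096));

    cudaFree(d_buf);
    return 0;
}
```

The unroll factor alone does not give the size in bytes, so the disassembly still has to be checked (for example with `cuobjdump -sass`) to confirm that the inner loop really was unrolled and to read off how large the body is. A single warp is used here so that instruction fetch is not hidden by other warps; whether that is the right setup for a given GPU is a guess on my part.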