Article by David Kanter: mysteries unveiled

I found an article over at RealWorldTech which I have not yet seen in Nvidia's own listings. Among other things, it compares the older 1.1 compute model to the newer 1.3 model, with pretty pictures and everything.

It lists quite a few prominent people from this forum as references, but I doubt the truthiness of this sentence:

Instructions? … Me believes that must be a typo, no?

I can’t find anything that agrees with that, but it does raise a question that’s been on my mind: where does the code reside? Constant memory seems as good a place as any, and the way it works would certainly suit code execution better than the other memory areas.

Better still would be for code to have its own cache, undocumented and known only to the scheduler. (IMHO)

Instructions have a separate cache - slide 25 of http://developer.nvidia.com/object/siggraph-2008-CUDA.html

Paulius

As far as I have understood from reading these forums, code resides in device memory like everything else, and there is an instruction cache in each SM. As far as I remember, there is also a limit on the size of a kernel (number of instructions); I think that is mentioned in the programming guide as well.

Constant memory also resides in device memory, is limited to 64 KB, and is cached in each SM (the cache working set is 8 KB per SM, going by the programming guide).
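For anyone who wants to poke at the constant memory side of this, here is a minimal sketch, assuming a single CUDA device at index 0. The names `coeffs` and `scale` are made up for illustration. It asks the runtime how much constant memory the device reports and then reads a small table through the constant cache from a kernel. As far as I know, the runtime does not expose the per-SM constant or instruction cache sizes, so this only shows the total.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical lookup table in constant memory: 256 floats = 1 KB,
// well under the 64 KB limit discussed above.
__constant__ float coeffs[256];

// Every thread in a block reads the same coefficient, so the read can be
// served as a broadcast from the per-SM constant cache.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= coeffs[blockIdx.x % 256];
}

int main()
{
    // Ask the runtime how much constant memory the device reports.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    printf("%s: %zu bytes of constant memory\n", prop.name, prop.totalConstMem);

    // Fill the constant table from the host.
    float host_coeffs[256];
    for (int i = 0; i < 256; ++i)
        host_coeffs[i] = 2.0f;
    cudaMemcpyToSymbol(coeffs, host_coeffs, sizeof(host_coeffs));

    // Run the kernel over a small array and copy the result back.
    const int n = 1024;
    float host_data[n];
    for (int i = 0; i < n; ++i)
        host_data[i] = 1.0f;

    float *dev_data = NULL;
    cudaMalloc(&dev_data, n * sizeof(float));
    cudaMemcpy(dev_data, host_data, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(dev_data, n);
    cudaMemcpy(host_data, dev_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_data);

    printf("data[0] = %f\n", host_data[0]);
    return 0;
}
```

Build it with `nvcc` and run it on the card in question. Error checking on the individual runtime calls is left out to keep the sketch short.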

Oops. This is interesting, and it would explain why big “for” loops (like one encompassing a whole big fat kernel) can cause latencies: the instructions might not be in the SM's instruction cache, which would result in a cache miss… Just my guess based on the discussion going on here.