Decoding Fermi PTXinfo

Does anyone out there have a good guide as to how to decode the ptxinfo for Fermis? For example:

 ptxas info    : Function properties for progno_cloud
     1000 bytes stack frame, 1192 bytes spill stores, 1776 bytes spill loads
 ptxas info    : Used 63 registers, 48 bytes cmem[0], 8 bytes cmem[14], 280 bytes cmem[16]

Now, I’m pretty sure that “stack frame” is per-thread local memory use because in my kernels with lots of local memory, “stack frame” goes up. Likewise, I think the “spills” refer to spilling of registers into local memory. Again, my big kernels where I easily crash into the 63 register limit get spills.

But, I’m confused by the cmems. My first thought was that it refers to constant and shared memory use. Yet, when I have a kernel with lots of constant memory used:

ptxas info    : Used 63 registers, 4+0 bytes lmem, 64 bytes cmem[0], 4 bytes cmem[14], 276 bytes cmem[16]

while a kernel with no constant memory used:

 ptxas info    : Used 28 registers, 120 bytes cmem[0], 5600 bytes cmem[2], 8 bytes cmem[16]

Now, I can say that the latter code is one where I have four or five global kernels that often share some “working space” in global memory, i.e., there is an array common to all kernels that is allocated but never copied to or from the host. (Oh, and neither of these uses shared memory!)
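In case it helps anyone compare kernels side by side, here is a minimal Python sketch that pulls the numbers out of lines like the ones above. It assumes the output looks exactly like my samples; the function name parse_ptxas_info and the dictionary keys are made up for illustration, and it only collects the per-bank sizes without interpreting what each cmem bank means.

```python
import re


def parse_ptxas_info(text):
    """Collect the numeric fields from ptxas info output into a dict.

    Hypothetical helper: the line formats are assumed to match the
    samples quoted in this thread and may differ in other releases.
    """
    info = {"cmem": {}}

    # Single-valued fields: "<number> <label>" or "Used <number> registers".
    patterns = {
        "registers": r"Used (\d+) registers",
        "stack_frame": r"(\d+) bytes stack frame",
        "spill_stores": r"(\d+) bytes spill stores",
        "spill_loads": r"(\d+) bytes spill loads",
    }
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            info[key] = int(match.group(1))

    # Older-style "A+B bytes lmem" field, kept as a raw pair of integers.
    match = re.search(r"(\d+)\+(\d+) bytes lmem", text)
    if match:
        info["lmem"] = (int(match.group(1)), int(match.group(2)))

    # Every "<size> bytes cmem[<bank>]" entry, keyed by bank number.
    for size, bank in re.findall(r"(\d+) bytes cmem\[(\d+)\]", text):
        info["cmem"][int(bank)] = int(size)

    return info


sample = """ptxas info    : Function properties for progno_cloud
    1000 bytes stack frame, 1192 bytes spill stores, 1776 bytes spill loads
ptxas info    : Used 63 registers, 48 bytes cmem[0], 8 bytes cmem[14], 280 bytes cmem[16]"""

print(parse_ptxas_info(sample))
```

Running it over the output for each kernel makes it easier to see which banks show up where, even without knowing what the bank numbers stand for.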

So: What does cmem refer to and what do the various [#] refer to? And is there something I should be careful of? Like, avoid cmem[14] but cheer cmem[16]?

Thanks,
Matt

Hey Matt,

I’m trying to get an answer for you, though no one has responded to me yet. I’ll push a bit and see what I can find out.

  • Mat

Thanks. It seems to be a state secret at NVIDIA. They don’t explain it anywhere in their own documentation that I can see. Even their forums only know bits and pieces.

Of course, once you get an answer about this, they’ll change it up in 4.1 or 4.2…just to keep us honest!

Matt