I have a simple question here. I could test it to get my answer, but I would also like a “hardware” explanation.
I have read in a couple of articles (e.g. article) that it is possible to save registers by using constant memory instead of kernel parameters. I really wonder why. I thought that kernel parameters were stored in shared memory. What would be the difference, then?
You are right, kernel arguments are stored in shared memory (at least on compute capability 1.x hardware), and you wouldn’t save registers by moving kernel arguments to constant memory. But I didn’t find any reference to that in the article you linked to. They did say they precomputed some things and stored them in constant memory, rather than keeping them in registers, to improve occupancy, but that is somewhat different from what you are suggesting, I think.
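To make the comparison concrete, here is a minimal sketch of the two ways of handing the same values to a kernel. The kernels `scaleWithArgs` and `scaleWithConst` and the array `c_coeffs` are hypothetical names made up for this illustration; they do not come from the article. The sketch only shows that both variants compute the same result. It says nothing by itself about register usage, which has to be checked in the compiler output.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical coefficients, used only for this illustration.
__constant__ float c_coeffs[2];

// Variant 1: the coefficients arrive as ordinary kernel arguments.
__global__ void scaleWithArgs(float *data, int n, float a, float b)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = a * data[i] + b;
}

// Variant 2: the same coefficients are read from constant memory.
__global__ void scaleWithConst(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = c_coeffs[0] * data[i] + c_coeffs[1];
}

int main()
{
    const int n = 256;
    const float a = 2.0f, b = 1.0f;
    float h_in[n], h_args[n], h_const[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_data = NULL;
    cudaMalloc(&d_data, n * sizeof(float));

    // Run variant 1 with the coefficients passed as arguments.
    cudaMemcpy(d_data, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    scaleWithArgs<<<(n + 127) / 128, 128>>>(d_data, n, a, b);
    cudaMemcpy(h_args, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Run variant 2 after copying the coefficients to constant memory.
    const float h_coeffs[2] = {a, b};
    cudaMemcpyToSymbol(c_coeffs, h_coeffs, sizeof(h_coeffs));
    cudaMemcpy(d_data, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    scaleWithConst<<<(n + 127) / 128, 128>>>(d_data, n);
    cudaMemcpy(h_const, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Compare the two result arrays element by element.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_args[i] != h_const[i]) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_data);
    return mismatches != 0;
}
```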
So it only meant that they avoid some computations and consequently save some registers… OK!
All of this is because I’m trying to find a way to count the number of registers my kernels need on the GPU (by tracking the lifetime of temporary registers). I do have the real number (from the cubin or the profiler), but when I try to count them myself I end up below the real number!
I tried it with a simple example, the naive transpose from the SDK, with the help of the PTX file. It requires 6 registers, and I only count up to 5…
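For reference, the register count that the toolchain settled on can also be read back at runtime, which may help when comparing against a hand count. The sketch below assumes a naive transpose kernel written for this post (modelled on the SDK sample, not copied from it; the name `transposeNaive` is my own) and queries it with `cudaFuncGetAttributes`. Keep in mind that PTX is an intermediate form using virtual registers, and the final allocation is done later by `ptxas`, so a count made from the PTX file will not necessarily match.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Naive transpose written for this sketch (an approximation of the SDK sample).
__global__ void transposeNaive(float *odata, const float *idata, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        odata[x * height + y] = idata[y * width + x];
}

int main()
{
    // Ask the runtime for the attributes of the compiled kernel.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, transposeNaive);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // numRegs is the per-thread register count chosen for the target architecture.
    printf("registers per thread: %d\n", attr.numRegs);
    printf("static shared memory: %zu bytes\n", attr.sharedSizeBytes);
    return 0;
}
```

Compiling with `nvcc --ptxas-options=-v` prints the same per-kernel register usage at build time. The number depends on the target architecture and compiler version, so it may differ from the 6 reported above.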