Maxrregcount?

If I take the C code I have and compile it with -maxrregcount 16 and -ptxas-options=-v, it reports that 16 registers are used for each kernel in the source, as expected.

When I compile without -maxrregcount on the command line, the register counts are higher (18 and 20). However, the .ptx code I get is identical.

I must be missing something but I don’t see what. I’m trying to reduce the number of registers so that I can get higher occupancy.

Also, the extra storage seems to come from local memory (lmem). Since shared memory is faster, is it possible to force the compiler to use shared memory instead?

Any help would be appreciated.


The PTX code shown is unoptimised; the -maxrregcount flag is only taken into account later, when ptxas optimises it and assigns real registers. If you look at the cubin files you will notice the difference.

If you want to use shared memory, I'd do it explicitly. As far as I know, it is not possible to get the compiler to do it for you.

Thanks, I’ll look at the cubin files.

How do I use shared memory explicitly?
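
For illustration, here is a minimal sketch of what "explicitly" could look like. It is not taken from the code discussed above: the kernel sumOfPowers, the helper runSumOfPowers, and the sizes THREADS_PER_BLOCK and SCRATCH_PER_THREAD are all made up for the example. The idea is that per-thread temporaries, which might otherwise end up in local memory, are declared in a __shared__ array with one column per thread.

```cuda
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK  64  // hypothetical block size
#define SCRATCH_PER_THREAD 4   // hypothetical number of per-thread temporaries

// Computes out[i] = x + x^2 + x^3 + x^4 for x = in[i].
// The intermediate powers are kept in shared memory rather than in a
// per-thread array, which the compiler may place in local memory.
__global__ void sumOfPowers(const float *in, float *out, int n)
{
    // One row per temporary, one column per thread. Consecutive threads
    // access consecutive addresses within a row.
    __shared__ float scratch[SCRATCH_PER_THREAD][THREADS_PER_BLOCK];

    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;
    if (gid >= n) return;  // safe here: no __syncthreads() is used below

    float x = in[gid];
    float p = x;
    for (int k = 0; k < SCRATCH_PER_THREAD; ++k) {
        scratch[k][tid] = p;  // stores x^(k+1) in this thread's own column
        p *= x;
    }

    // Each thread reads back only its own column, so no barrier is needed.
    float sum = 0.0f;
    for (int k = 0; k < SCRATCH_PER_THREAD; ++k)
        sum += scratch[k][tid];
    out[gid] = sum;
}

// Hypothetical host helper: copies the input to the device, launches the
// kernel, and copies the result back. Returns 0 on success, -1 on error.
int runSumOfPowers(const float *hIn, float *hOut, int n)
{
    float *dIn = NULL, *dOut = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    if (cudaMalloc((void **)&dIn, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&dOut, bytes) != cudaSuccess) {
        cudaFree(dIn);
        return -1;
    }
    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);

    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    sumOfPowers<<<blocks, THREADS_PER_BLOCK>>>(dIn, dOut, n);

    // This copy waits for the kernel to finish before reading the result.
    cudaError_t err = cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return err == cudaSuccess ? 0 : -1;
}
```

Keep in mind that shared memory is allocated per block (here 4 x 64 floats, i.e. 1024 bytes), and the amount a block uses also limits how many blocks can be resident on a multiprocessor, so moving data there trades one occupancy limit for another. If threads read values written by other threads, a __syncthreads() is needed between the write and the read.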