Register demand

Hi all!

I have two questions about registers:

I wrote a kernel which needs 11 registers per thread. But if I want an occupancy of 100%, I’m only allowed to use 10 registers. So, is there any way to reduce the register usage? I moved all variables into shared memory, but this doesn’t help.

I limit the register usage with the nvcc flag --maxrregcount=10. Now the kernel uses only 10 registers, but neither the shared memory nor the global memory usage has increased. Which memory is the graphics card using now?

Thanks in advance!

Read the programming guide: you are now using local memory, which is actually located in global device memory. Also, 100% occupancy does not necessarily equal maximum performance.
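One way to check where the missing register went is to ask the runtime what the compiled kernel actually uses. The following is only a sketch: `scale` is a hypothetical stand-in kernel, not the one from the question, and the query relies on `cudaFuncGetAttributes` from the CUDA runtime API. A non-zero local memory figure after compiling with a register cap would suggest that values were spilled.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; replace it with the kernel you want to inspect.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scale);
    if (err != cudaSuccess) {
        std::printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Per-thread registers, per-thread local memory (spills land here),
    // and statically allocated shared memory per block.
    std::printf("registers per thread : %d\n", attr.numRegs);
    std::printf("local memory / thread: %zu bytes\n", attr.localSizeBytes);
    std::printf("static shared / block: %zu bytes\n", attr.sharedSizeBytes);
    return 0;
}
```

Compiling with `nvcc -Xptxas -v` should report comparable register and local memory figures at build time, without running anything.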

The compiler either spills to local memory or generates a less efficient kernel that needs more instructions.
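The same trade-off can be requested for a single kernel instead of the whole compilation unit by using the `__launch_bounds__` qualifier. This is a minimal sketch under assumptions: `saxpy` and `runSaxpy` are hypothetical names, and the values 256 and 3 are chosen only for illustration. On GPUs with a large register file the bound may not constrain anything.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the thread). __launch_bounds__(256, 3) tells
// the compiler that blocks have at most 256 threads and that at least 3 blocks
// should be resident per multiprocessor. The compiler limits registers for
// this kernel only, and may spill to local memory or emit more instructions.
__global__ void __launch_bounds__(256, 3)
saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

// Runs the kernel on n elements and checks the result on the host.
bool runSaxpy(int n)
{
    float *x = nullptr, *y = nullptr;
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess)
        return false;
    if (cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return false;
    }
    for (int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    const int threads = 256;  // must not exceed the launch bound above
    const int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);
    bool ok = (cudaGetLastError() == cudaSuccess) &&
              (cudaDeviceSynchronize() == cudaSuccess);

    // 2 * 1 + 2 is exactly 4 in floating point, so an exact compare is safe.
    for (int i = 0; ok && i < n; ++i)
        ok = (y[i] == 4.0f);

    cudaFree(x);
    cudaFree(y);
    return ok;
}

int main()
{
    std::printf("%s\n", runSaxpy(1 << 20) ? "ok" : "failed");
    return 0;
}
```

Whether this is faster than the uncapped kernel has to be measured; the bound only changes how the compiler allocates registers.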

As jjp said, achieving 100% occupancy doesn’t really help you much (if at all) over, say, 50% occupancy. Read the best practices guide - it covers optimisation in quite some detail. Chances are that you’ve actually slowed your program down by removing that extra register, especially if it’s gone into local memory.
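To put numbers on the occupancy question, here is a rough host-side calculation. It is a simplified sketch, not the official occupancy calculator. The limits (8192 registers, 768 threads and 8 blocks per multiprocessor) are assumed values for compute capability 1.0/1.1 hardware, which is consistent with the 10-register figure in the question but is not stated in the thread. The block size of 256 is also an assumption, and shared memory and allocation granularity are ignored.

```cuda
#include <algorithm>
#include <cstdio>

// Assumed per-multiprocessor limits for compute capability 1.0/1.1 hardware.
// These are not stated in the thread; substitute the values for your GPU.
constexpr int kRegistersPerSM = 8192;
constexpr int kMaxThreadsPerSM = 768;
constexpr int kMaxBlocksPerSM = 8;

// Occupancy in percent (rounded down), considering only the register,
// thread and block limits of one multiprocessor.
int occupancyPercent(int regsPerThread, int threadsPerBlock)
{
    int byRegisters = kRegistersPerSM / (regsPerThread * threadsPerBlock);
    int byThreads = kMaxThreadsPerSM / threadsPerBlock;
    int blocks = std::min({byRegisters, byThreads, kMaxBlocksPerSM});
    return 100 * blocks * threadsPerBlock / kMaxThreadsPerSM;
}

int main()
{
    const int threadsPerBlock = 256;  // assumed block size
    const int regs[] = {10, 11};
    for (int r : regs)
        std::printf("%d registers/thread -> %d%% occupancy\n",
                    r, occupancyPercent(r, threadsPerBlock));
    return 0;
}
```

Under these assumptions, 10 registers allow three resident blocks (100%), while 11 registers allow only two (66%, rounded down). That is still above the 50% mentioned above, so the extra register is unlikely to be what limits performance.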