We have an application that can save a lot of memory loads if we can use more registers.
Is it possible to use more than 128 registers per thread?
The nvcc documentation for the --maxrregcount flag says:
"specified value will be rounded to the next multiple of 4 registers until
the GPU specific maximum of 128 registers."
The above sentence is not quite clear to me. The GTX280 has twice as many registers
as earlier cards, so it would be nice if we
could use them this way. Or can they only be used to run more threads?
I don’t know about the compiler flag, but remember that if you can avoid bank conflicts in your reads and writes, shared memory is as fast as registers (see the shared memory section of the programming guide).
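To illustrate what I mean, here is a rough sketch of keeping a per-thread working set in shared memory instead of registers. The kernel name, the helper function, the block size of 64 and the 8 values per thread are all made up for the example and are not taken from your application. The layout is scratch[k][threadIdx.x], so for a fixed k neighbouring threads touch neighbouring 32-bit words, which should land in different banks. Compiling with nvcc --ptxas-options=-v should report how many registers and how much shared memory the kernel ends up using.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 64   // assumed block size for this sketch
#define VALUES_PER_THREAD 8    // assumed per-thread working set

// Hypothetical kernel: each thread stages VALUES_PER_THREAD floats in
// shared memory and then reuses them, instead of holding them in registers.
__global__ void sum_of_squares(const float* in, float* out, int n_threads)
{
    // Thread index is the fastest-varying dimension, so threads in a
    // half-warp access consecutive words for any fixed k.
    __shared__ float scratch[VALUES_PER_THREAD][THREADS_PER_BLOCK];

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_threads) return;  // safe: no __syncthreads() is used below

    // Load once from global memory into this thread's own column.
    for (int k = 0; k < VALUES_PER_THREAD; ++k)
        scratch[k][threadIdx.x] = in[k * n_threads + tid];

    // Reuse the staged values; each thread reads only its own column,
    // so no synchronisation between threads is needed.
    float acc = 0.0f;
    for (int k = 0; k < VALUES_PER_THREAD; ++k)
        acc += scratch[k][threadIdx.x] * scratch[k][threadIdx.x];

    out[tid] = acc;
}

// Hypothetical host helper: `in` holds VALUES_PER_THREAD * n_threads floats,
// stored as in[k * n_threads + tid]. Error checking of CUDA calls is omitted.
std::vector<float> run_sum_of_squares(const std::vector<float>& in, int n_threads)
{
    assert(in.size() == (size_t)VALUES_PER_THREAD * n_threads);

    float* d_in = 0;
    float* d_out = 0;
    std::vector<float> out(n_threads, 0.0f);

    cudaMalloc(&d_in, in.size() * sizeof(float));
    cudaMalloc(&d_out, n_threads * sizeof(float));
    cudaMemcpy(d_in, in.data(), in.size() * sizeof(float), cudaMemcpyHostToDevice);

    int blocks = (n_threads + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    sum_of_squares<<<blocks, THREADS_PER_BLOCK>>>(d_in, d_out, n_threads);

    // This copy waits for the kernel to finish before reading the results.
    cudaMemcpy(out.data(), d_out, n_threads * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

For example, calling run_sum_of_squares with every input set to 1.0f should give 8.0f for each thread. Whether this is actually as fast as the register version in your case is something you would have to measure, since the shared memory used per block also limits how many blocks can run at once.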