We have an application that can save a lot of memory loads if we can use more registers.
Is it possible to use more than 128 registers per thread?
The nvcc documentation for the --maxrregcount flag says:
"Otherwise the specified value will be rounded to the next multiple of 4 registers until the GPU specific maximum of 128 registers."
That sentence is not quite clear to me. The GTX280 has twice as many registers as earlier cards, so it would be nice if we could use them this way. Or can they only be used to run more threads?
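One way to see how many registers a kernel actually ends up with, and whether anything spilled to local memory, is to ask the runtime. The sketch below is only an illustration: the kernel `accumulate` and the helper `report_kernel_registers` are hypothetical names made up for this post, not anything from our application. It assumes a toolkit that provides `cudaFuncGetAttributes` and that device 0 is the card of interest.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that keeps a few values live in registers.
__global__ void accumulate(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float a = in[i];
    float b = a * a;
    float c = b * a;
    float d = c * a;
    out[i] = a + b + c + d;
}

// Prints the register file size per block and the per-thread usage of
// `accumulate`. Returns 0 on success, 1 if a query failed.
int report_kernel_registers()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no usable CUDA device\n");
        return 1;
    }

    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, accumulate) != cudaSuccess) {
        std::printf("could not query kernel attributes\n");
        return 1;
    }

    std::printf("registers available per block : %d\n", prop.regsPerBlock);
    std::printf("registers used per thread     : %d\n", attr.numRegs);
    // A non-zero local size usually means the compiler spilled registers.
    std::printf("local memory per thread       : %zu bytes\n", attr.localSizeBytes);
    // Largest block this kernel can launch with, given its resource usage.
    std::printf("max threads per block         : %d\n", attr.maxThreadsPerBlock);
    return 0;
}

int main()
{
    return report_kernel_registers();
}
```

Building the same file with different --maxrregcount values and comparing the printed numbers shows how the register limit trades off against the block size the kernel can run with.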
I don’t know about the compiler flag, but remember that if you are able to avoid bank conflicts in your reads/writes, shared memory is just as fast as the registers (section 5.1.2.5 of the programming guide).
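To make the shared memory suggestion concrete, here is a minimal sketch of using shared memory as per-thread scratch space in place of extra registers. Everything in it is an assumption for illustration: the kernel `scratch_sum`, the helper `count_mismatches`, and the sizes `THREADS` and `SLOTS` are hypothetical. The layout `scratch[slot][thread]` is chosen so that neighbouring threads touch neighbouring 32-bit words, which is the usual way to avoid bank conflicts.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define THREADS 128   // threads per block (assumed)
#define SLOTS   8     // scratch values per thread (assumed)

// Each thread owns one column of `scratch`. For a fixed slot k, consecutive
// threads access consecutive words, so the accesses fall in different banks.
__global__ void scratch_sum(const float *in, float *out)
{
    __shared__ float scratch[SLOTS][THREADS];   // 8 * 128 * 4 = 4096 bytes
    int tid = threadIdx.x;
    int gid = blockIdx.x * THREADS + tid;

    for (int k = 0; k < SLOTS; ++k)
        scratch[k][tid] = in[gid] * (k + 1);

    // No __syncthreads() needed: a thread only reads what it wrote itself.
    float sum = 0.0f;
    for (int k = 0; k < SLOTS; ++k)
        sum += scratch[k][tid];
    out[gid] = sum;
}

// Runs the kernel on `blocks` blocks and returns the number of wrong results.
int count_mismatches(int blocks)
{
    int n = blocks * THREADS;
    size_t bytes = n * sizeof(float);
    float *h_in  = (float *)std::malloc(bytes);
    float *h_out = (float *)std::malloc(bytes);
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in = 0, *d_out = 0;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    scratch_sum<<<blocks, THREADS>>>(d_in, d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    // Treat a failed launch or copy as "everything wrong".
    int bad = (err == cudaSuccess) ? 0 : n;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != 36.0f * i) ++bad;   // 1 + 2 + ... + 8 = 36

    cudaFree(d_in);
    cudaFree(d_out);
    std::free(h_in);
    std::free(h_out);
    return bad;
}

int main()
{
    std::printf("mismatches: %d\n", count_mismatches(4));
    return 0;
}
```

The catch is that shared memory is per block, so every scratch slot per thread costs `THREADS * 4` bytes of it, and that limits how many blocks fit on a multiprocessor in much the same way heavy register use does.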
Yeah, as far as I know that looks to be the challenge. For us the problem seems to be getting data into the PC fast enough, and I can only imagine it will be the same with all the data you will be getting.