More than 128 registers per thread?

Hi all,

We have an application that could save a lot of memory loads if it could use more registers.
Is it possible to use more than 128 registers per thread?
The nvcc documentation for the --maxrregcount flag says:

Otherwise the specified value will be rounded to the next multiple of 4 registers until the GPU specific maximum of 128 registers.

The above sentence is not quite clear to me. The GTX280 has twice as many registers as earlier cards, so it would be nice if we could use them this way. Or can they only be used to run more threads?

Cheers,

Rob
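
For anyone following along, here is a rough back-of-the-envelope sketch of the arithmetic behind the question. It is plain Python, not device code, and the function names are made up for illustration. It rests on two assumptions: my reading of the quoted sentence (round the requested value up to a multiple of 4, then cap it at 128), and the register file sizes from the programming guide's compute capability tables (16384 registers and 1024 threads per multiprocessor on GTX280-class parts, 8192 and 768 on earlier ones). Register allocation granularity and per-block limits are ignored, so the results are upper bounds only.

```python
WARP_SIZE = 32  # threads per warp on these GPUs


def effective_reg_limit(requested, gpu_max=128):
    """One reading of the quoted nvcc sentence (an assumption, not
    confirmed behaviour): round up to a multiple of 4, cap at gpu_max."""
    rounded = ((requested + 3) // 4) * 4
    return min(rounded, gpu_max)


def max_resident_threads(regs_per_sm, regs_per_thread, thread_limit):
    """Upper bound on resident threads per multiprocessor.

    Only the register file and the hardware thread limit are considered;
    the result is rounded down to whole warps.
    """
    by_registers = regs_per_sm // regs_per_thread
    bounded = min(by_registers, thread_limit)
    return (bounded // WARP_SIZE) * WARP_SIZE


if __name__ == "__main__":
    for regs in (16, 32, 64, 128):
        newer = max_resident_threads(16384, regs, 1024)  # GTX280-class
        older = max_resident_threads(8192, regs, 768)    # earlier cards
        print(regs, "regs/thread:", newer, "vs", older, "threads")
```

Under this simplified model, a kernel at the 128-register cap can keep at most 128 threads resident per multiprocessor on a GTX280-class part and 64 on the earlier ones. So with the cap unchanged, the doubled register file shows up as more resident threads rather than as more registers per thread.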

I don’t know about the compiler flag, but remember that if you are able to avoid bank conflicts in your reads/writes, shared memory is just as fast as the registers (section 5.1.2.5 of the programming guide).
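
To make the bank conflict point concrete, here is a small host-side model in plain Python (not device code, and the helper names are invented for illustration). It assumes the layout the programming guide describes for this hardware generation: 16 banks of 32-bit words, with conflicts resolved per half-warp of 16 threads. It counts how many distinct words of one half-warp request land in the same bank, and shows the usual padding trick of making a 16-wide tile 17 words wide.

```python
from collections import defaultdict

NUM_BANKS = 16   # shared memory banks (assumed: compute capability 1.x)
HALF_WARP = 16   # threads whose requests are checked together


def conflict_degree(word_indices):
    """Largest number of distinct 32-bit words that fall in one bank.

    1 means conflict-free; n means the request is serialized n ways.
    Threads reading the very same word are counted once.
    """
    words_per_bank = defaultdict(set)
    for w in word_indices:
        words_per_bank[w % NUM_BANKS].add(w)
    return max(len(words) for words in words_per_bank.values())


def column_access(row_stride, column):
    """Word indices touched when thread t reads tile[t][column]."""
    return [t * row_stride + column for t in range(HALF_WARP)]


if __name__ == "__main__":
    print("16-wide tile:", conflict_degree(column_access(16, 3)))
    print("17-wide tile:", conflict_degree(column_access(17, 3)))
```

In this model, reading a column of a 16-wide tile is a 16-way conflict, while the same read on a tile padded to 17 words per row is conflict-free. Whether padding is practical depends on the access pattern, which is exactly where irregular accesses get difficult.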

Yes, I already tried that, but in our case it is very difficult to avoid bank conflicts…

Even with bank conflicts, shared memory is a lot faster than loading from global memory.

Yes, this is true of course. However, we use the texture cache.

The cache works extremely well in our case. I found that it was faster than using shared memory with bank conflicts…

Yes, for accesses that hit the cache often that is the best option. Are you doing a correlator for LOFAR?

Yep, that’s what I’m doing. Currently, we are still in the research stage. We are also looking at ATI hardware, the Cell, multi-core CPUs, etc.

Many-core systems could be a nice alternative for our Blue Gene/P…

The challenge is that we also have to do a lot of I/O!

Yeah, that looks to be the challenge as far as I know. For us the bottleneck seems to be getting data into the PC fast enough, and I can only imagine it will be the same for you with all the data you will be getting.