For example, part of my code is as follows:
…
__shared__ float sh[N];  // N: placeholder for the actual array size
…
register float tmp = sh[i];
a[0] += tmp;
a[1] += tmp;
a[2] += tmp;
a[3] += tmp;
…
I want to load sh[i] into a register once and then use that register for the computations that follow. However, according to the output generated by decuda, no register is used for it: every reference to "tmp" in the source code is replaced by a shared memory access. I checked the PTX file, and a register is used there, so I assume the code is fine all the way through the nvcc compiler and that ptxas is what replaces the register with shared memory accesses.
Any idea how to force the use of a register? Thanks a lot! This is very important for me!
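For anyone who wants to reproduce this, the fragment above can be turned into a minimal self-contained sketch like the one below. The kernel name `accumulate`, the size `N = 64`, the use of `threadIdx.x` as the index `i`, the initial values of `a`, and all of the host code are assumptions made for illustration; none of them come from the original code. Plain `float tmp` is used because `register` is only a hint to the compiler.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 64  // assumed shared array size (hypothetical, not from the original)

// Hypothetical kernel: stage one value per thread in shared memory,
// read one element into a local variable, and reuse it four times.
__global__ void accumulate(const float *in, float *out)
{
    __shared__ float sh[N];
    int i = threadIdx.x;          // assumed index choice
    sh[i] = in[i];
    __syncthreads();

    // Distinct starting values so the four additions stay separate.
    float a[4] = {0.0f, 1.0f, 2.0f, 3.0f};

    float tmp = sh[i];            // intended: one shared-memory load into a register
    a[0] += tmp;
    a[1] += tmp;
    a[2] += tmp;
    a[3] += tmp;

    for (int k = 0; k < 4; ++k)
        out[4 * i + k] = a[k];
}

// Host helper (hypothetical): runs the kernel once and returns out[4*idx + k].
// Error checking is omitted to keep the sketch short.
float run_accumulate(int idx, int k)
{
    float h_in[N], h_out[4 * N];
    for (int j = 0; j < N; ++j)
        h_in[j] = 2.0f * j;

    float *d_in, *d_out;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    accumulate<<<1, N>>>(d_in, d_out);

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out[4 * idx + k];
}

int main()
{
    printf("%f\n", run_accumulate(3, 2));
    return 0;
}
```

Compiling this with `nvcc -ptx` gives the PTX to compare against the decuda output of the final binary. Whether the four additions end up reading a register or shared memory may depend on the toolkit version and target architecture.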