As the programming guide mentions, declaring a variable in shared memory can speed up execution. My colleague and I wonder whether it would be even better to declare it as a register variable. If so, how do we do that? The programming guide says that registers and shared memory achieve almost the same speed, both much faster than local memory, but I can't find a way to force a variable into a register rather than local memory.
Everything goes into registers automatically. The compiler only spills values to local memory when it decides the register usage of your kernel is getting too high, or when you limit the number of registers per thread with -maxrregcount.
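To make that concrete, here is a minimal sketch (the kernel `axpb` and the host helper `run_axpb` are made-up names for illustration, not anything from the programming guide). The scalar locals need no special keyword. If you want to check what the compiler actually did, building with `nvcc -Xptxas -v` should print the register count per thread and any spill loads and stores, and nvcc's `--maxrregcount` option is the register limit that can push values into local memory.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel. i, a, b and t are ordinary scalar locals, so the
// compiler is expected to keep them in registers with no special keyword.
// __launch_bounds__(256) states the largest block size the kernel is
// launched with, which the compiler can use when budgeting registers.
__global__ void __launch_bounds__(256) axpb(const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float a = 2.0f;
        float b = 1.0f;
        float t = x[i];
        y[i] = a * t + b;
    }
}

// Hypothetical host helper: copies x to the device, runs the kernel,
// and returns y = 2 * x + 1.
std::vector<float> run_axpb(const std::vector<float>& h_x)
{
    const int n = static_cast<int>(h_x.size());
    assert(n > 0);
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_y(n);

    float* d_x = nullptr;
    float* d_y = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);

    axpb<<<(n + 255) / 256, 256>>>(d_x, d_y, n);

    // The blocking copy back also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    (void)err;

    cudaFree(d_x);
    cudaFree(d_y);
    return h_y;
}
```

Calling `run_axpb` on a small vector and comparing against `2 * x + 1` on the host is enough to confirm the kernel runs; the register and spill numbers come from the compiler output, not from the program itself.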
Ah, right. I forgot about that. Array variables only go into registers if they have few elements and are accessed with constant indices known at compile time. Then the array can be treated like a struct.
The problem is that, at the hardware level, I don't think there is a way to do offset-based addressing with registers. Array indexing requires memory address arithmetic, and registers don't have addresses. This is what makes arrays in registers impossible except in the special case I mentioned. (I'm not aware of any CPU that can do this either, although most CPUs have so few registers that it wouldn't make sense anyway.)
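A rough sketch of the two cases, in case it helps (kernel and helper names are hypothetical, and what the compiler really does depends on the toolkit version and target, so treat the comments as expectations rather than guarantees). In the first kernel the loops are fully unrolled, so every index is a compile-time constant. In the second the index is only known at run time.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Case 1: after full unrolling, v[0]..v[3] are each accessed with a
// constant index, so the compiler can treat v like a four-field struct
// and is expected to keep it in registers.
__global__ void sum_static(const float* in, float* out)
{
    float v[4];
    #pragma unroll
    for (int k = 0; k < 4; ++k) v[k] = in[threadIdx.x * 4 + k];
    float s = 0.0f;
    #pragma unroll
    for (int k = 0; k < 4; ++k) s += v[k];
    out[threadIdx.x] = s;
}

// Case 2: the read index comes from memory at run time, so v needs an
// address and will typically be placed in local memory instead.
__global__ void pick_dynamic(const float* in, const int* idx, float* out)
{
    float v[4];
    #pragma unroll
    for (int k = 0; k < 4; ++k) v[k] = in[threadIdx.x * 4 + k];
    out[threadIdx.x] = v[idx[threadIdx.x] & 3];  // & 3 keeps the index in 0..3
}

struct DemoResult { std::vector<float> sums, picks; };

// Hypothetical host helper: one block, one thread per entry of h_idx,
// four input floats per thread.
DemoResult run_array_demo(const std::vector<float>& h_in, const std::vector<int>& h_idx)
{
    const int threads = static_cast<int>(h_idx.size());
    assert(threads > 0 && h_in.size() == static_cast<size_t>(threads) * 4);

    float *d_in = nullptr, *d_out = nullptr;
    int* d_idx = nullptr;
    cudaMalloc(&d_in, h_in.size() * sizeof(float));
    cudaMalloc(&d_out, threads * sizeof(float));
    cudaMalloc(&d_idx, threads * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, h_idx.data(), threads * sizeof(int), cudaMemcpyHostToDevice);

    DemoResult r{std::vector<float>(threads), std::vector<float>(threads)};
    sum_static<<<1, threads>>>(d_in, d_out);
    cudaError_t e1 = cudaMemcpy(r.sums.data(), d_out, threads * sizeof(float), cudaMemcpyDeviceToHost);
    pick_dynamic<<<1, threads>>>(d_in, d_idx, d_out);
    cudaError_t e2 = cudaMemcpy(r.picks.data(), d_out, threads * sizeof(float), cudaMemcpyDeviceToHost);
    assert(e1 == cudaSuccess && e2 == cudaSuccess);
    (void)e1; (void)e2;

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_idx);
    return r;
}
```

Both kernels produce correct results either way; the difference only shows up in where `v` lives. Compiling with `nvcc -Xptxas -v` and looking at the reported local memory (stack frame) usage for each kernel is one way to see which case you ended up in.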
Use shared memory for your array, with a separate copy for each thread. That satisfies both the speed and the addressability requirements (unless the array is too big to fit in shared memory, but then it will certainly be too big for registers…)
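Something along these lines, as a sketch only (the names `pick_shared`, `run_pick_shared`, `THREADS` and `ELEMS` are invented for the example, and the sizes are arbitrary). Each thread gets its own column of a shared array and can index it with a run-time value. The `buf[ELEMS][THREADS]` layout is my own assumption, chosen so that thread t always lands in bank t % 32 on hardware with 32 four-byte banks, whatever index it picks.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

constexpr int THREADS = 64;  // block size; a multiple of 32 keeps banks aligned
constexpr int ELEMS   = 8;   // per-thread array length (power of two for the mask)

__global__ void pick_shared(const float* in, const int* idx, float* out)
{
    // Element k of thread t's private array is buf[k][t].
    __shared__ float buf[ELEMS][THREADS];
    const int t = threadIdx.x;

    for (int k = 0; k < ELEMS; ++k) buf[k][t] = in[t * ELEMS + k];

    // No __syncthreads() is needed here because each thread only ever
    // touches its own column.
    const int j = idx[t] & (ELEMS - 1);  // run-time index, kept in 0..ELEMS-1
    out[t] = buf[j][t];
}

// Hypothetical host helper: one block of THREADS threads, ELEMS inputs per thread.
std::vector<float> run_pick_shared(const std::vector<float>& h_in, const std::vector<int>& h_idx)
{
    assert(h_in.size() == static_cast<size_t>(THREADS) * ELEMS);
    assert(h_idx.size() == static_cast<size_t>(THREADS));

    float *d_in = nullptr, *d_out = nullptr;
    int* d_idx = nullptr;
    cudaMalloc(&d_in, h_in.size() * sizeof(float));
    cudaMalloc(&d_out, THREADS * sizeof(float));
    cudaMalloc(&d_idx, THREADS * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, h_idx.data(), THREADS * sizeof(int), cudaMemcpyHostToDevice);

    pick_shared<<<1, THREADS>>>(d_in, d_idx, d_out);

    std::vector<float> h_out(THREADS);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, THREADS * sizeof(float), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    (void)err;

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_idx);
    return h_out;
}
```

With these sizes the block uses 8 × 64 × 4 = 2048 bytes of shared memory. The per-thread array length times the block size has to fit in the shared memory available per block, which is the size limit mentioned above.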