Shorts obviously take up less memory than ints in device memory and shared memory. My question is what happens with shorts in registers.
If I understand correctly, section 7.6.1 (page 36) of the PTX documentation seems to say that shorts are promoted to ints. If so, one short takes up just as much register space as one int, and there is no register-space benefit from using shorts instead of ints for register variables.
There would be performance benefits if there were too many register variables in the device code, so that some had to spill into local memory (which resides in device memory).
Also, the compiler should translate multiplication of two shorts into use of the faster __mul24 operation, so using shorts would give cleaner code when going for max performance.
Can anyone confirm or contradict these comments?
Presumably the same comments apply to chars?
What about a short2? Is it stored in a single register, or in 2 registers?
If squeezed for register space, I know we can always manually code things to use bit shifting to concatenate 2 shorts into 1 int to go into 1 register. It’s a bit tedious though, and would involve a performance hit because of the cost of the bit shifting. But if the alternative is spillover into local memory then it might be worthwhile in some circumstances.
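For reference, the manual packing described above could look roughly like the sketch below. The names pack_shorts, unpack_lo and unpack_hi and the HD macro are made up for illustration and are not part of CUDA. The macro is only there so the same helpers compile as device code under nvcc and as ordinary host code elsewhere. Whether this actually beats a spill to local memory is something that would have to be measured.

```cpp
#include <cstdint>

// Hypothetical helper macro: lets the functions below build as host and
// device code under nvcc, and as plain C++ with any other compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Pack two signed 16-bit values into one 32-bit word:
// lo goes into bits 0-15, hi into bits 16-31.
// The casts through uint16_t stop sign extension from clobbering the other half.
HD inline uint32_t pack_shorts(int16_t lo, int16_t hi) {
    return (static_cast<uint32_t>(static_cast<uint16_t>(hi)) << 16)
         | static_cast<uint32_t>(static_cast<uint16_t>(lo));
}

// Recover the low half; the cast back to int16_t restores the sign
// (two's complement behaviour assumed, as on all CUDA targets).
HD inline int16_t unpack_lo(uint32_t w) {
    return static_cast<int16_t>(static_cast<uint16_t>(w & 0xFFFFu));
}

// Recover the high half: one shift, then the same sign-restoring cast.
HD inline int16_t unpack_hi(uint32_t w) {
    return static_cast<int16_t>(static_cast<uint16_t>(w >> 16));
}
```

Each unpack costs a shift or a mask, and each update of one half costs an unpack plus a repack, so this is the overhead that has to be weighed against the cost of a local-memory access.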