When presented with code such as:
int a = 128;
short i = 32;
int result = a >> i;
NVCC assigns to ‘result’ the value 128 instead of 0. If i=33, then result==64 instead of 0. In short, only the lowest five bits of i are used as the shift count. This is because nvcc directly uses the x86 SAR instruction (shift-arithmetic-right), which DOES NOT HAVE C-MANDATED BEHAVIOR! On top of that, SAR can only take an 8-bit count, even though the variable in my code is clearly 16-bit.
For comparison, VC++ emits a call to _allshr.
CUDA hardware seems to work correctly.
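For anyone who wants to reproduce or sidestep this, here is a minimal sketch in plain C. Both helper names (shift_masked, shift_guarded) are made up for illustration and are not part of nvcc or any library. The first imitates a shift that keeps only the low five bits of the count, which matches the 128 and 64 results above. The second guards the count explicitly so the outcome does not depend on which instruction the compiler picks. It assumes a 32-bit int and leaves negative counts alone, which is an arbitrary choice on my part.

```c
/* Hypothetical helper: imitates a shift that looks only at the low
   five bits of the count, as described for the SAR-based code above. */
static int shift_masked(int value, short count)
{
    return value >> (count & 31);
}

/* Hypothetical helper: checks the count before shifting.
   Assumes int is 32 bits wide. */
static int shift_guarded(int value, short count)
{
    if (count < 0) {
        return value;               /* assumption: treat a negative count as "no shift" */
    }
    if (count >= 32) {
        return value < 0 ? -1 : 0;  /* every value bit shifted out; only the sign remains */
    }
    return value >> count;          /* in-range count; for negative values the result is
                                       implementation-defined (usually an arithmetic shift) */
}
```

With a = 128, shift_masked(a, 32) gives 128 and shift_masked(a, 33) gives 64, while shift_guarded(a, 32) and shift_guarded(a, 33) both give 0.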