>=32bit shift on 32bit integer

ocanela · September 10, 2021, 10:00am

Hi,

Shifting 32bit integers using a number > 31 can lead to undefined behaviour in CPUs. I read in old forum posts that this does not seem to be the case in GPUs. The current PTX documentation seems to indicate:

"Signed shifts fill with the sign bit, unsigned and untyped shifts fill with 0. "

and

“Shift amounts greater than the register width N are clamped to N .”

Is it safe to assume that the results of left/right shifting a 32bit unsigned integer by more than >=32bits (when using nvcc) is going to return 0 as a result in current and previous nVidia GPUs?

Is there any expectation of keeping this behaviour in future architectures?

Many thanks for the help.

njuffa · September 10, 2021, 10:19am

The behavior of shifts with shift count greater than or equal to the bit-width of the integer data type is undefined at the language level, in particular in C++, which is the language family to which CUDA belongs.

In decades of experience with all kind of processors, what happens with large shift counts is well-defined at the hardware level, i.e. the relevant machine instructions. The two common models are to either wrap the shift count (modulo the bit-width) or saturate it to a limit. x86 chose the former and NVIDIA GPUs generally chose the latter (although I seem to recall that for some NVIDIA GPUs wrap/clamp mode on shifts was selectable). In general, the machine instruction set of NVIDIA GPUs has seen large changes historically, and specifically there has been no notion of binary compatibility at the machine code level. So there are no guarantees regarding future GPUs.

For NVIDIA GPUs in particular, hardware instructions are publicly documented in a superficial manner only, and there is not supported way to program at the machine code level. However, you can rely on the behavior defined in the PTX specification when programming at that level. You cannot rely on anything that the PTX specification does not guarantee. In HLL CUDA code, you would want to stick to C++ specifications (or CUDA specifications where deviating from C++). What is __device__ code today may become __host__ __device__ code tomorrow.

ocanela · September 10, 2021, 11:00am

Hi njuffa,

I was hoping for a confirmation that I may be able to use shifts larger than 32bits, but perhaps it would be better to find alternative approaches.

Many thanks for your answer and taking the time to write it.

Best,

rs277 · September 10, 2021, 7:11pm

Depending on your exact usage requirement, you could possibly achieve the desired result through use of the “shf” instruction. Using 0x00000000 as the second operand appropriately for shf.l or shf.r would guarantee a zero result:

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#logic-and-shift-instructions-shf

Recent, (limited), exposure to Cuda 8.6 SASS makes me wonder how much longer the “shl” and shr" instructions will be around in future hardware, as all shift operations were being done using only the shf instruction.

ocanela · September 13, 2021, 10:29am

oh, many thanks. That is a good solution.

Thanks for taking the time to post it.

Best,

Topic		Replies	Views
Unexpected result with 64-bit integer left-shift CUDA NVCC Compiler	4	780	April 7, 2022
problems reported with __ffs()? CUDA Programming and Performance	9	7601	October 31, 2010
multiword bit shifting (long intgers) CUDA Programming and Performance	5	6952	October 19, 2010
16-bit vs 32-bit Integer Arithmetic Performance CUDA Programming and Performance cuda	3	928	April 21, 2024
signed vs unsigned int for indexes and sizes CUDA Programming and Performance	9	13886	October 8, 2016
Incorrect x86 instruction emitted by Emu nvcc Right-shift is broken! CUDA Programming and Performance	13	6231	August 18, 2008
do bool/ char types imply inherent type conversion? CUDA Programming and Performance	3	1495	November 2, 2014
compiler bug? bit shift CUDA Programming and Performance	7	4529	November 28, 2008
Switch oddities Compiler bug? CUDA Programming and Performance	16	4565	September 10, 2008
Problem with Shift Left Can anyone help? CUDA Programming and Performance	4	4637	May 27, 2009

>=32bit shift on 32bit integer

Related topics