When I try to compile my program I obtain a very strange error:
### Assertion failure at line 1923 of ../../be/cg/cgemit.cxx:
### Compiler Error in file /tmp/tmpxft_000022e5_00000000-7_kernel.cpp3.i during Assembly phase:
### incorrect register class for operand 1
nvopencc INTERNAL ERROR: /opt/cuda/open64/lib//be returned non-zero status 1
This error disappears if, instead of using the “<<”, I use the “>>” (right shift). I do not understand this behavior. By now I’m using the following “trick”
I submitted a similar bug to NVIDIA a few weeks ago which is clearly a compiler bug. This is probably the same one. It had to do with a shifts of 64 bit long values when the shift had integer math in it.
It only failed for sm_20 and not sm_13.
As a workaround, just put the shift amount into a temporary variable.
// compile with nvcc -arch sm_20 crash.cu
__device__ unsigned long circularShift(unsigned long v, int shift)
{
return (v<<(64-shift));
}
int main() {}
I recall a compiler bug being reported regarding 64-bit rotates recently. If memory serves, one workaround is to assign the left-shift and right-shift portions to temporary variables, then combine them (with | or + for example) in a separate step. Unfortunately that won’t work for a macro due to the need for temp variables. Could you use a forceinline function instead?
BTW, negative shift counts cause undefined behavior in C/C++, so assuming that offset is >= 0, ((a) >> (0-offset)) will not in general give the desired result.
Just as an aside, since shifts aren’t a common topic usually worth posting for: GPU shifts behave differently than CPU shifts when the shift amount exceeds the word size in bits.
This is undefined behavior in the C standard, and is one of the few cases where NVIDIA GPUs differ in their integer computation from Intel and AMD CPUs.
On the CPUs, a shift amount >= word size will end up using the low order bits of the shift, so a integer shift of 33 is the same as a shift of 1, for example.
On the GPU, the shift value is saturated, so a shift of 32 or more acts like 32.
This is true for right or left shifts.
Again this is not a bug, it’s documented as undefined behavior in C itself. It’s interesting there’s a difference though.