Functional Difference Emu vs. G80 (shift macro)

Dan_Johnson · April 25, 2007, 3:44pm

It took me forever to track down a bug in my code; it seems to be this:

#define SHL(x, s) ((unsigned int) ((x) << (s) ))
#define SHR(x, s) ((unsigned int) (( (unsigned int)(x)) >> (32 - (s)) ))

This code yields different results in the emu and on the G80 hardware. Specifically, the emu results are as expected, while the G80 results do not match.

Changing s to (s&31) in the macros fixed the problem.

I was wondering if there is a reason the emu and the hardware would behave differently in this case. Are shifts treated differently on each? IIRC, the emu just runs CPU threads, not any PTX/NV-generated code. I am guessing that out-of-range shift counts are handled inconsistently.

wumpus · April 25, 2007, 6:29pm

Indeed, shifting more to the left or right than the width of the type can result in different behaviour on different architectures.

Dan_Johnson · April 25, 2007, 9:06pm

My point is more that they produce different behavior on the G80 and under emulation, making kernel results inconsistent. Is there a reason it is not compiled to perform the same operation on the G80 as on the CPU (since we know very little about the actual G80 hardware at that level)? If it's just an arbitrary decision, it seems the G80 code should be made to match the emulation results.

mfatica · April 25, 2007, 9:52pm

These macros invoke undefined behavior under both the C and C++ standards (and thus also in CUDA) if the effective shift count is negative or is greater than or equal to 32. For SHL this happens when s < 0 or s >= 32; for SHR, whose effective count is 32 - s, it happens when s = 0 or s > 32. The difference between x86 and G80 is that 32-bit x86 truncates the shift count to the lower 5 bits, whereas G80 does not.

C standard (1999), section 6.5.7 “Bitwise shift operators”

[…]

The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

If you are using these to construct a rotate from SHR and SHL, the following should work better within the restrictions of the standards:

/* Rotate x left by s bits; both shift counts are masked into 0..31. */
#define ROTL(x,s) ((((unsigned int)(x))<<((s)&31)) | \
                   (((unsigned int)(x))>>((-(int)(s))&31)))

wumpus · April 26, 2007, 7:22am

It’s not really an arbitrary decision: << is compiled to a shift-left instruction and >> to a shift-right instruction. No clamping or other preprocessing is applied to the values (imagine how inefficient that would be), so what you get is whatever your architecture's shift instruction produces.

As you can see, the ‘emulator’ doesn’t emulate the G80 instruction set at all, just the grid/block/thread architecture.

osiris1 · April 26, 2007, 8:45am

Dan, you would have a beef if they called it a simulator, but it is not one. A simulator would be at least an order of magnitude slower and should produce bit-identical results to the hardware, including all floating-point operations, and should also trap operations with indeterminate output (such as concurrent writes from different threads).
Cheers, Eric

Dan_Johnson · April 26, 2007, 8:34pm

Thanks for all the replies :)

I can see why I get different results: the G80 hardware implements shifts differently than x86 does. But I still have the same question, only now about the hardware: why do it differently on the G80? (I suspect the answer could be unrelated to software, but maybe there is a graphics-specific reason to provide different behavior?)

I’m not aware of any hw manual, so what does the G80 do? (I am sure I could write code to find out, but I’ll be lazy and just ask…)

As wumpus says, clamping is inefficient, and that is now what I am doing (well, masking really, which would be free in hardware) because the hardware doesn't do it.

Although, I suppose this is really a side issue now and not CUDA-specific anymore…
