Handling of uint and the modulo operator

I believe I have found a bug either in the handling of unsigned integer arguments to kernels, or in the implementation of the modulo (%) operator.

I have listed a small example kernel below that demonstrates both the erroneous and the expected behavior. The expected result is the three numbers 47, 48 and 49, but the first set of calculations (buffer[0] to buffer[2]) produces 47, 4294967248 and 4294967249. The only difference between the two sets is whether the second operand to % is a kernel argument or a local variable initialized from a numeric literal.

Running the same kernel on a Radeon card produces the expected result for both variants.

I have attached a zip file containing the .cl file and a .cpp file that launches the kernel and prints the results in a human-readable format.

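__kernel void bug_kernel(__global uint* buffer, uint small)
{
  uint large = 2147483648U;

  // Variant 1: the divisor is passed in as a kernel argument
  buffer[0] = (large-1) % small;
  buffer[1] = large % small;
  buffer[2] = (large+1) % small;

  // Variant 2: the divisor is a local variable initialized from a literal
  uint otherSmall = 100U;
  buffer[3] = (large-1) % otherSmall;
  buffer[4] = (large-0) % otherSmall;
  buffer[5] = (large+1) % otherSmall;

  // Echo both divisors back to the host
  buffer[6] = small;
  buffer[7] = otherSmall;
}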



By the way, it's better to define it as a constant outside the kernel :)

Has anyone had similar problems? Is there a known workaround?

Which driver version are you using? Have you tried the latest CUDA 3.0 release? The output is correct when I run it with CUDA 3.0.

I was using the 3.0 beta. Upgrading to 3.0 does indeed solve the problem. Thanks.