I believe I have found a bug either in the handling of unsigned integer arguments to kernels, or in the implementation of the modulo (%) operator.
I have listed a small example kernel below that demonstrates both the erroneous and the expected behavior. The expected result is the three numbers 47, 48 and 49, but the first set of calculations (buffer[0] to buffer[2], which use the kernel argument small) produces 47, 4294967248 and 4294967249. The only difference between the two sets is whether the second operand to % is a passed argument or a local variable initialized to a numerical literal.
Running the same kernel using a Radeon card produces the expected result for both variants.
I have attached a zip file containing both the .cl file and a .cpp file that launches the kernel and prints the results in a human-readable format.
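__kernel void bug_kernel(__global uint* buffer, uint small)
{
    uint large = 2147483648U;  // 2^31, one more than the largest signed 32-bit value

    // Divisor passed in as a kernel argument: these give the wrong results.
    buffer[0] = (large-1) % small;
    buffer[1] = large % small;
    buffer[2] = (large+1) % small;

    // Divisor is a local variable set from a literal: these give the expected results.
    uint otherSmall = 100U;
    buffer[3] = (large-1) % otherSmall;
    buffer[4] = (large-0) % otherSmall;
    buffer[5] = (large+1) % otherSmall;

    // Echo both divisors so the host can check what the kernel actually saw.
    buffer[6] = small;
    buffer[7] = otherSmall;
}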
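
The figures in this report can be cross-checked on the host with a small C sketch like the one below. It is not part of the attached files, and the function names rem_unsigned and rem_signed_as_unsigned are hypothetical. The second function models one possible explanation only, namely that the dividend is treated as a signed 32-bit value before the remainder is taken; that is an assumption, not a confirmed diagnosis of the driver or compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Remainder on unsigned 32-bit operands, which is what the kernel asks for. */
static uint32_t rem_unsigned(uint32_t a, uint32_t b)
{
    assert(b != 0);
    return a % b;
}

/*
 * ASSUMPTION (hypothetical model of the fault): both operands are
 * reinterpreted as signed 32-bit values, the remainder is taken with
 * C's truncating semantics, and the result is stored back as unsigned.
 * The arithmetic is done in 64 bits to keep every step well-defined.
 */
static uint32_t rem_signed_as_unsigned(uint32_t a, uint32_t b)
{
    assert(b != 0);
    /* Two's-complement reinterpretation of a 32-bit pattern. */
    int64_t sa = (a >= 0x80000000u) ? (int64_t)a - 4294967296LL : (int64_t)a;
    int64_t sb = (b >= 0x80000000u) ? (int64_t)b - 4294967296LL : (int64_t)b;
    int64_t r = sa % sb;   /* the sign of the result follows the dividend */
    return (uint32_t)r;    /* wraps modulo 2^32, e.g. -48 becomes 4294967248 */
}
```

With a divisor of 100, rem_unsigned gives 47, 48 and 49 for 2147483647, 2147483648 and 2147483649, while rem_signed_as_unsigned gives 47, 4294967248 and 4294967249 for the same inputs.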
moduloOverflow_nvidia.zip (1.69 KB)