Strange program output; possibly a compiler bug

I’m posting below a snippet of code that seems to produce wrong results. I wasn’t able to compile it on a different platform, so I can’t tell how reproducible this is. My setup is as follows:

platform: 2.6.31-16 x86_64 GNU/Linux (Ubuntu)

nvcc: Built on Tue_Jul_21_09:08:51_PDT_2009, release 2.3, V0.2.1221

gcc: version 4.3.4 (Ubuntu 4.3.4-5ubuntu1)

cuda sdk: version 2.3

driver version: 190.42

hardware: nVidia Corporation G86 [GeForce 8500 GT] (rev a1)

All variables are of type ‘unsigned int’. 4 bytes of mapped memory are used to pass the value ‘0’ to variable ‘a’. ‘a’ is ANDed with a constant value, so the ternary operator should assign value ‘0’ to ‘b’. The ‘switch’ should then select the value ‘0’. The result, however, is the value 100.
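As a sanity check on that expectation, here is a minimal host-side sketch of the same logic in plain C, with no GPU involved. The names `my_func_ref` and `expected_output` are hypothetical and introduced only for this illustration; they are not part of the program below.

```c
/* Host-side reference for the kernel's arithmetic. Plain C, no CUDA needed.
   my_func_ref and expected_output are hypothetical names for illustration. */

/* Same switch as the device function, cases in the same order. */
static unsigned int my_func_ref(unsigned int x)
{
    switch (x)
    {
        case 1: return 10;
        case 2: return 20;
        case 3: return 30;
        case 0: return 0;
        default: return 100;
    }
}

/* Mirrors one pass of the kernel body: mask 'a', derive 'b', run the switch. */
static unsigned int expected_output(unsigned int a)
{
    unsigned int b = ((a & 0x10) ? 1 : 0);
    return my_func_ref(b);
}
```

Since ‘b’ can only ever be 0 or 1, `expected_output` can only return 0 or 10 for any input; the ‘default’ branch (100) should be unreachable from the kernel.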

If I comment or uncomment the indicated lines, the result changes. It is interesting to note that removing the redundant ‘for’, shuffling the cases, simplifying the computation of ‘b’, changing the ‘0x10’ constant, implementing an ‘if’ instead of the ‘switch’ and possibly making other changes is quite likely to produce the right result, so simplifying the code to more accurately pinpoint the error was difficult.

So, am I missing something here?

[codebox]#include <stdio.h>
#include <stdlib.h>
#include <cutil_inline.h>

__device__ unsigned int my_func(unsigned int x)
{
    switch (x)
    {
        case 1: return 10;
        case 2: return 20;
        case 3: return 30;
        case 0: return 0;
        default: return 100;
    }
}

__global__ void my_kernel(unsigned int *output)
{
    unsigned int i, a, b;

    // Comment next line
    for (i = 0; i < 1; i++)
    {
        a = output[0];
        b = ((a & 0x10) ? 1 : 0);

        // Uncomment next line
        //if (b != 0) { output[0] = 0xDEADBEEF; } else
        output[0] = my_func(b);
    }
}

int main()
{
    unsigned int *output, *output_d;

    cudaSetDevice(0);
    cutilSafeCall(cudaSetDeviceFlags(cudaDeviceMapHost));

    // Allocate 4 bytes of mapped host memory and get the device-side pointer
    if (cudaHostAlloc(&output, sizeof(unsigned int), cudaHostAllocMapped) != cudaSuccess ||
        cudaHostGetDevicePointer(&output_d, output, 0) != cudaSuccess)
        exit(-1);

    output[0] = 0;

    my_kernel<<<1, 1>>>(output_d);
    cutilSafeCall(cudaThreadSynchronize());

    // Expected: 0; observed: 100
    printf("%u\n", output[0]);

    return 0;
}
[/codebox]

Anyone?

What value does the variable “l” have?

I ask because I sometimes get wrong results because of a loop. Here you have a discussion about my problem. Anyway, if l is less than 1000000, disregard this suggestion.