nvcc appears to generate incorrect code when an int array is read through a char pointer:

    __global__ void mixedaccesskernelok(int *result)
    {
        int ai[] = { 0x01020304 };
        char* ac = (char*)ai;
        *result = ac[0] + ac[1] + ac[2] + ac[3];
    }

    __global__ void mixedaccesskernelbug(int *result)
    {
        int ai[] = { 0x01020304 };
        char* ac = (char*)ai;
        int s = 0;
        for (int i = 0; i != sizeof(ai); i++)
            s += ac[i];
        *result = s;
    }

Both mixedaccesskernelok and mixedaccesskernelbug should return 10 (1 + 2 + 3 + 4). The former does, but the latter returns 0x01020304.
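For comparison, here is a host-side C++ sketch of the same computation as mixedaccesskernelbug. The function name byte_sum_via_char_pointer is made up for this example, and a 4-byte int is assumed. Reading an object's bytes through a char pointer is explicitly permitted by the C and C++ aliasing rules, so the loop is well-defined. It must yield 1 + 2 + 3 + 4 = 10 regardless of byte order, because a sum does not depend on the order of its terms.

```cpp
#include <cstddef>

// Hypothetical host-side reference for mixedaccesskernelbug: sum the bytes of
// an int array by reading them through a char pointer. Accessing an object's
// bytes via char* is allowed by the aliasing rules, so this is well-defined.
int byte_sum_via_char_pointer(const int* ai, std::size_t n_ints)
{
    const char* ac = reinterpret_cast<const char*>(ai);
    int s = 0;
    for (std::size_t i = 0; i != n_ints * sizeof(int); i++)
        s += ac[i];  // each byte here is below 0x80, so char signedness does not matter
    return s;
}
```

With `int ai[] = { 0x01020304 };`, `byte_sum_via_char_pointer(ai, 1)` gives 10 on a platform with 4-byte int. The host compiler gets this right, so the device result of 0x01020304 points to a code-generation problem rather than undefined behaviour in the source.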
In the generated PTX I see odd instructions such as “shl.b32 %r4, %r2, -8;”, i.e. shifts by a negative amount. I assume the shifts are there because the loop is unrolled and ai is kept in a register, so each byte has to be extracted from that register by shifting.
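For reference, extracting byte i from a 32-bit word only needs a right shift by 8*i, an amount between 0 and 24. A negative shift amount is not required anywhere in that computation. The sketch below does the extraction arithmetically instead of through a char pointer, using hypothetical helper names byte_of and byte_sum_via_shifts. On a little-endian device (NVIDIA GPUs are little-endian), byte_of(x, i) is the byte that ac[i] reads. Declaring these helpers `__host__ __device__` might serve as a workaround inside the kernel, but that is an untested assumption for toolkit 2.3.

```cpp
#include <cstdint>

// Hypothetical helper: byte i (0 = least significant) of a 32-bit word.
// On a little-endian machine this is the byte stored at address offset i.
// The shift amount 8*i is always in 0..24, never negative.
inline unsigned byte_of(std::uint32_t word, unsigned i)
{
    return (word >> (8u * i)) & 0xFFu;
}

// Hypothetical helper: sum all bytes of an array of 32-bit words
// using shifts and masks only, with no char* aliasing.
int byte_sum_via_shifts(const std::uint32_t* ai, unsigned n_ints)
{
    int s = 0;
    for (unsigned k = 0; k != n_ints; k++)
        for (unsigned i = 0; i != 4; i++)
            s += static_cast<int>(byte_of(ai[k], i));
    return s;
}
```

Unlike the char-pointer loop, this version treats every byte as unsigned. The two versions agree as long as each byte is below 0x80, which holds for 0x01020304.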
The result is the same if I prevent the compiler from unrolling the loop with “#pragma unroll 1”, even though ai is then placed in local memory and I cannot spot an error in the generated PTX.
I compiled the attached program with

    nvcc -O2 -o mixedcharintaccess mixedcharintaccess.cu

and generated the PTX with

    nvcc -O2 --ptx mixedcharintaccess.cu

Environment:
- Operating system: Debian Lenny amd64, 2.6.31 kernel
- CUDA toolkit 2.3
- SDK 2.3
- Host compiler: gcc 4.3.2 (Debian 4.3.2-1.1)
Attachment: mixedcharintaccess.cu (1.22 KB)