The following code fails when ARRAYSIZE is 1. It works for ARRAYSIZE > 1.
The failure seems to need all three of these conditions:
[*]passing an object as a parameter to a kernel
[*]the object contains members that are arrays
[*]the array length is 1
#include <stdio.h>
#include <stdlib.h>

#define ARRAYSIZE 1

class MyClass {
public:
    float *m_ptr[ARRAYSIZE];
    int m_val[ARRAYSIZE];
};

// Each thread copies one element of obj.m_val into out.
__global__ void mykernel(MyClass obj, int *out) {
    unsigned int tid = threadIdx.y * blockDim.x + threadIdx.x;
    out[tid] = obj.m_val[tid];
}

int main() {
    MyClass obj;
    for (int i = 0; i < ARRAYSIZE; i++) {
        obj.m_ptr[i] = NULL;
        obj.m_val[i] = 666 + i;
    }

    int *out_dev;
    cudaMalloc((void **)&out_dev, ARRAYSIZE * sizeof(int));

    // The object is passed to the kernel by value.
    mykernel<<<1, ARRAYSIZE>>>(obj, out_dev);

    int *out_host = (int *)malloc(ARRAYSIZE * sizeof(int));
    cudaMemcpy(out_host, out_dev, ARRAYSIZE * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < ARRAYSIZE; i++) printf("%d\n", out_host[i]);

    free(out_host);
    cudaFree(out_dev);
    return 0;
}
Expected output:
[*]For ARRAYSIZE = 1, it should output the single line “666”.
[*]For ARRAYSIZE = 2, it should output the lines “666” and “667”.
[*]etc.
Details:
[*]This occurs under CUDA 3.0 but not under 2.3.
[*]It does not occur when compiling with the -G option (debug mode).
[*]I’m running it on a GTX 480.
[*]I’m using Ubuntu 9.04 (64-bit).
[*]I’ve tried many GCC versions from 4.2 to 4.4, same results.
[*]I found this problem in a larger program. The above code is the smallest example that reproduces the bug.
Jim