CUDA 3.0 compiler bug with repro code

The following code fails when ARRAYSIZE is 1. It works for ARRAYSIZE > 1.

The failure case seems to be:

    passing an object as a parameter to a kernel

    the object contains members that are arrays

    the array length is 1

#include <stdio.h>
#include <stdlib.h>

#define ARRAYSIZE 1

// Passed to the kernel by value; both members are arrays of length ARRAYSIZE.
class MyClass {
public:
	float *m_ptr[ARRAYSIZE];
	int	m_val[ARRAYSIZE];
};

// Each thread copies one element of obj.m_val into the output buffer.
__global__ void mykernel(MyClass obj, int *out) {
	unsigned int tid = threadIdx.y * blockDim.x + threadIdx.x;
	out[tid] = obj.m_val[tid];
}

int main() {
	MyClass obj;
	for (int i = 0; i < ARRAYSIZE; i++) {
		obj.m_ptr[i] = NULL;
		obj.m_val[i] = 666 + i;
	}

	int *out_dev;
	cudaMalloc((void **)&out_dev, ARRAYSIZE * sizeof(int));

	mykernel<<<1, ARRAYSIZE>>>(obj, out_dev);

	int *out_host = (int *)malloc(ARRAYSIZE * sizeof(int));
	cudaMemcpy(out_host, out_dev, ARRAYSIZE * sizeof(int), cudaMemcpyDeviceToHost);

	// The values are signed ints, so print them with %d.
	for (int i = 0; i < ARRAYSIZE; i++) printf("%d\n", out_host[i]);

	return 0;
}

Expected output:

    For ARRAYSIZE = 1, it should output the single line “666”.

    For ARRAYSIZE = 2, it should output the lines “666” and “667”.



Additional notes:

    This occurs under CUDA 3.0 but not under 2.3.

    It does not occur when compiling with the -G option (debug mode).

    I’m running it on a GTX 480.

    I’m using Ubuntu 9.04 (64-bit).

    I’ve tried many GCC versions from 4.2 to 4.4, all with the same result.

    I found this problem in a larger program. The above code is the smallest example that reproduces the bug.


Thank you for bringing this to our attention. I was able to reproduce the behavior you observed with a recent compiler. It does look like a compiler issue to me, so I went ahead and filed a compiler bug.

Glad it helped. Good luck.