alignment issue in passing arrays to the GPU in kernel parameters

I’ve come across an issue while porting Scott Draves’ Fractal Flames to the GPU. I had intended to pass an array of 4 real numbers (actually floats) to the GPU. Worked just fine in emulation.

typedef float real;

On the host, the bounds array was declared like this

real bounds[4];

I was trying to pass the array by value into a kernel

global void iterate_kernel(int n, int fuse, point *points, int width, int height, real bounds[4]);

The kernel would segfault (actually unspecified launch fail) when making use of bounds elements in calculations.
The code ran fine in emulation however.

I worked around it by passing bounds[0], bounds[1], etc… explicitly

global void iterate_kernel(int n, int fuse, point *points, int width, int height, real bounds0, real bounds1, real bounds2, real bounds3);

So I conclude that there must be a problem with alignment of such arrays when being passed by value.

This is with CUDA toolkit 2.3 on OpenSuse Linux 32 bit.

Has anyone stepped onto this problem before?

Christian

The following program reproduces the behavior you describe:

[codebox]

#include

global void foo(int *out, int v[1])

{

*out = v[0];

}

int main(void)

{

int *out = 0;

cudaMalloc((void**)&out, sizeof(int));

int v[1] = {0};

foo<<<1,1>>>(out, v);

cudaThreadSynchronize();

cudaFree(out);

std::cerr << "CUDA Error: " << cudaGetErrorString(cudaGetLastError()) << std::endl;

return 0;

}

[/codebox]

Most likely the compiler is implementing the global function call by passing v’s pointer instead of copying the array by value. Maybe passing an array by value to a global function should be illegal similar to how global function reference arguments are illegal.

Here’s a workaround:

[codebox]

#include

struct wrapped_array

{

int v[1];

};

global void foo(int *out, wrapped_array a)

{

*out = a.v[0];

}

int main(void)

{

int *out = 0;

cudaMalloc((void**)&out, sizeof(int));

wrapped_array v = {0};

foo<<<1,1>>>(out, v);

cudaFree(out);

std::cerr << "CUDA Error: " << cudaGetErrorString(cudaGetLastError()) << std::endl;

return 0;

}

[/codebox]

Your explanation sounds reasonable. Yes, the compiler should definitely warn about this behavior and not let the user figure out what the heck is going on. ;)

My mistake; this is not a compiler bug. Arrays are always passed by reference in C/C++, so your code passes a pointer to an array on the host to the global function, which promptly crashes.